Training DeepRUOTv2

DeepRUOTv2 uses a flexible configuration system in which users specify the parameters used to train DeepRUOT. We provide the example configurations used to train on four scRNA-seq datasets: Mouse Blood Hematopoiesis (50D), Embryoid Body (50D), Pancreatic \(\beta\)-cell differentiation (30D), and A549 EMT (10D). The configurations are stored in the config/ folder.

To train DeepRUOTv2 on your own dataset, convert it to a CSV file and store it in the data/ folder. The samples column holds the biological time point of each row, starting from time 0; we recommend normalizing the time scale to a reasonable range. The remaining columns, starting from x1, hold the gene expression features. Once the dataset is prepared, modify these parts of the configuration file:

device: 'cuda' # device to run the model

exp:
  name: "my_experiment"     # Experiment name

data:
  file_path: "data.csv"     # Path to your dataset, prepared as a CSV file
  dim: 50                   # Data dimension

model:
  in_out_dim: 50 # Data dimension
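
The CSV layout described above can be sketched in Python. This is a minimal illustration using only the standard library, not part of the DeepRUOTv2 code base: the helper names `to_deepruot_rows` and `write_csv`, the `scale` argument, and the toy values are all hypothetical. It also assumes that samples is the first column and that the feature columns are named x1, x2, and so on.

```python
import csv
import io


def to_deepruot_rows(times, features, scale=1.0):
    """Build a header and rows with a `samples` column followed by x1..xd.

    Times are shifted so the earliest time point becomes 0, then divided
    by `scale` to bring them into a smaller range (hypothetical choice).
    """
    if len(times) != len(features):
        raise ValueError("times and features must have the same length")
    t0 = min(times)
    dim = len(features[0])
    header = ["samples"] + [f"x{i}" for i in range(1, dim + 1)]
    rows = []
    for t, x in zip(times, features):
        if len(x) != dim:
            raise ValueError("all feature rows must have the same dimension")
        rows.append([(t - t0) / scale] + list(x))
    return header, rows


def write_csv(fileobj, header, rows):
    """Write the header and rows to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(header)
    writer.writerows(rows)


# Toy data: four rows measured on days 2, 4 and 6, with 3 features each.
times = [2.0, 2.0, 4.0, 6.0]
features = [
    [0.1, 0.2, 0.3],
    [0.0, 0.5, 0.1],
    [1.0, 0.9, 0.8],
    [2.0, 1.5, 1.1],
]
header, rows = to_deepruot_rows(times, features, scale=2.0)

# Written to an in-memory buffer here; use open("data/my_data.csv", "w",
# newline="") to write an actual file (the file name is an assumption).
buf = io.StringIO()
write_csv(buf, header, rows)
csv_text = buf.getvalue()

# The number of feature columns is the value to use for both
# data.dim and model.in_out_dim in the configuration file.
dim = len(header) - 1
```

In this sketch the number of feature columns (`dim`) is what both `data.dim` and `model.in_out_dim` should be set to, since both are described as the data dimension.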

For other hyperparameters, we recommend using the same settings as config/weinreb_config.yaml. The hyperparameter use_pinn controls whether the score model is updated in the final training phase and defaults to False. Setting it to True may achieve better performance but significantly increases training time, so we recommend keeping it False for more efficient training. If you encounter a CUDA out-of-memory error, set the parameters sample_size and score_batch_size to smaller values.
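
As a rough illustration of the out-of-memory advice, the sketch below halves sample_size and score_batch_size wherever they appear in a nested configuration dictionary. The helper `scale_keys` is hypothetical, and the nesting of the example dictionary is invented for the demonstration; check config/weinreb_config.yaml for where these keys actually live. In practice the dictionary would come from loading the YAML file (for example with PyYAML), which is omitted here to keep the example dependency-free.

```python
import copy


def scale_keys(config, keys, factor=0.5, minimum=1):
    """Return a copy of a nested config with the given integer keys scaled.

    Only integer values are changed, so booleans such as use_pinn and
    string values are left untouched.
    """
    out = copy.deepcopy(config)

    def visit(node):
        if isinstance(node, dict):
            for key, value in node.items():
                is_int = isinstance(value, int) and not isinstance(value, bool)
                if key in keys and is_int:
                    node[key] = max(minimum, int(value * factor))
                else:
                    visit(value)
        elif isinstance(node, list):
            for item in node:
                visit(item)

    visit(out)
    return out


# Hypothetical layout: the section names and values below are assumptions.
example = {
    "device": "cuda",
    "training": {"sample_size": 1024, "use_pinn": False},
    "score_train": {"score_batch_size": 512},
}

smaller = scale_keys(example, {"sample_size", "score_batch_size"})
```

Repeating the call halves the values again, which gives a simple way to step down until training fits in GPU memory.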

To train, pass the path to your configuration file to train_RUOT.py:

python train_RUOT.py --config config/<config_name>.yaml

For example, to reproduce our results on the Mouse Blood Hematopoiesis dataset, run:

python train_RUOT.py --config config/weinreb_config.yaml

Evaluation

After training, the model checkpoints model_final and score_final are saved in the results/ directory and can then be used to infer trajectories. We provide a Jupyter notebook, evaluation/plot.ipynb, to plot the learned results. Downstream analysis can be conducted using the notebook evaluation/analysis.ipynb.