cryovit.run

User-facing functions for running DINO feature extraction, training, evaluation, and inference.

Functions

run_dino(train_data, result_dir, batch_size)

Run DINO feature extraction on the specified training data and save the results as .hdf files.

run_evaluation(test_data, test_labels, ...)

Run evaluation on the specified test data and labels, saving result metrics as a .csv file.

run_inference(data_files, model_path, result_dir)

Run inference on the specified data files and save the results.

run_training(train_data, train_labels, ...)

Run training on the specified data and labels.

run_dino(train_data: list[Path], result_dir: Path, batch_size: int, window_size: int | None = 630, visualize: bool = False) → None

Run DINO feature extraction on the specified training data and save the results as .hdf files. Each saved result file contains the data and dino_features entries, plus any labels present in the source tomogram under the labels/ group.

Parameters:
  • train_data (list[Path]) – List of paths to the training tomograms.

  • result_dir (Path) – Directory where the results will be saved.

  • batch_size (int) – Number of samples to process in each batch.

  • window_size (Optional[int], optional) – Size of the sliding window used for feature extraction. Defaults to 630. If None, the default size is used.

  • visualize (bool, optional) – Whether to visualize the extracted features. Defaults to False.
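A minimal usage sketch for run_dino follows. It assumes the tomograms sit in a single directory and use a .mrc extension; the extension, the directory layout, and the helper names collect_tomograms and extract_features are hypothetical. The cryovit import is deferred into the wrapper so that the path helper works without cryovit installed.

```python
from pathlib import Path


def collect_tomograms(data_dir: Path, pattern: str = "*.mrc") -> list[Path]:
    """Return the files in data_dir that match pattern, sorted by name.

    The "*.mrc" default is an assumption; adjust it to your tomograms' extension.
    """
    files = sorted(p for p in data_dir.glob(pattern) if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {data_dir}")
    return files


def extract_features(data_dir: Path, result_dir: Path, batch_size: int = 8) -> None:
    """Hypothetical wrapper: run DINO feature extraction on every tomogram in data_dir."""
    # Imported lazily so collect_tomograms can be used without cryovit installed.
    from cryovit.run import run_dino

    # Creating the output directory up front is a precaution; run_dino may also do it.
    result_dir.mkdir(parents=True, exist_ok=True)
    run_dino(
        train_data=collect_tomograms(data_dir),
        result_dir=result_dir,
        batch_size=batch_size,  # 8 is an arbitrary example value
        window_size=630,  # the documented default, passed explicitly
        visualize=False,
    )
```

Passing window_size=630 only restates the documented default. Choose a batch size that fits your hardware.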

run_evaluation(test_data: list[Path], test_labels: list[Path], labels: list[str], model_path: Path, result_dir: Path, visualize: bool = True) → Path

Run evaluation on the specified test data and labels, saving result metrics as a .csv file.

Parameters:
  • test_data (list[Path]) – List of paths to the test tomograms.

  • test_labels (list[Path]) – List of paths to the test labels.

  • labels (list[str]) – List of label names to evaluate.

  • model_path (Path) – Path to the trained model file.

  • result_dir (Path) – Directory where the evaluation results will be saved.

  • visualize (bool, optional) – Whether to visualize the evaluation results. Defaults to True.

Returns:

Path to the evaluation results file.

Return type:

Path
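The sketch below shows one way to call run_evaluation and load the metrics file it returns. It assumes, hypothetically, that each label file shares its file stem with the matching tomogram. The names pair_by_stem, read_metrics, and evaluate are illustrative and not part of cryovit. This section does not document the column names of the metrics CSV, so the sketch reads rows generically as dictionaries.

```python
import csv
from pathlib import Path


def pair_by_stem(data: list[Path], labels: list[Path]) -> tuple[list[Path], list[Path]]:
    """Align tomograms with label files that share the same file stem.

    Hypothetical convention: tomo_01.mrc pairs with tomo_01.<ext> in labels.
    """
    by_stem = {p.stem: p for p in labels}
    missing = [p.name for p in data if p.stem not in by_stem]
    if missing:
        raise ValueError(f"no label file for: {', '.join(missing)}")
    return list(data), [by_stem[p.stem] for p in data]


def read_metrics(csv_path: Path) -> list[dict[str, str]]:
    """Load a metrics CSV as one dict per row, keyed by the header columns."""
    with csv_path.open(newline="") as f:
        return list(csv.DictReader(f))


def evaluate(
    test_data: list[Path],
    test_labels: list[Path],
    labels: list[str],
    model_path: Path,
    result_dir: Path,
) -> list[dict[str, str]]:
    """Hypothetical wrapper: evaluate a trained model and return its metrics rows."""
    from cryovit.run import run_evaluation  # requires cryovit to be installed

    data, label_files = pair_by_stem(test_data, test_labels)
    csv_path = run_evaluation(
        test_data=data,
        test_labels=label_files,
        labels=labels,
        model_path=model_path,
        result_dir=result_dir,
        visualize=False,  # skip visualization; the documented default is True
    )
    return read_metrics(csv_path)
```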

run_inference(data_files: list[Path], model_path: Path, result_dir: Path, threshold: float = 0.5) → list[Path]

Run inference on the specified data files and save the results.

Parameters:
  • data_files (list[Path]) – List of paths to the input data files.

  • model_path (Path) – Path to the trained model file.

  • result_dir (Path) – Directory where the inference results will be saved.

  • threshold (float, optional) – Threshold for binary classification. Defaults to 0.5.

Returns:

List of paths to the saved result files.

Return type:

list[Path]
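The sketch below uses NumPy to binarize an array of per-voxel scores, which illustrates what the threshold parameter controls. It assumes the model outputs probabilities in [0, 1] and that values at or above the threshold count as foreground. This section does not document CryoViT's exact comparison. The helpers binarize and run_segmentation are hypothetical.

```python
from pathlib import Path

import numpy as np


def binarize(probabilities: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Convert per-voxel foreground probabilities into a boolean mask."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must lie in [0, 1], got {threshold}")
    # Assumption: a voxel exactly at the threshold is treated as foreground.
    return probabilities >= threshold


def run_segmentation(
    data_files: list[Path], model_path: Path, result_dir: Path, threshold: float = 0.5
) -> list[Path]:
    """Hypothetical wrapper: validate the threshold, then delegate to run_inference."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must lie in [0, 1], got {threshold}")
    from cryovit.run import run_inference  # requires cryovit to be installed

    return run_inference(data_files, model_path, result_dir, threshold=threshold)
```

A higher threshold yields a more conservative mask with fewer foreground voxels. A lower threshold includes more uncertain voxels.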

run_training(train_data: list[Path], train_labels: list[Path], labels: list[str], model_type: ModelType, model_name: str, label_key: str, result_dir: Path, val_data: list[Path] | None = None, val_labels: list[Path] | None = None, num_epochs: int = 50, log_training: bool = False) → Path

Run training on the specified data and labels.

Parameters:
  • train_data (list[Path]) – List of paths to the training tomograms.

  • train_labels (list[Path]) – List of paths to the training labels.

  • labels (list[str]) – List of label names to train on.

  • model_type (ModelType) – Type of the model to train.

  • model_name (str) – Name of the model.

  • label_key (str) – Key for the label in the dataset.

  • result_dir (Path) – Directory where the training results will be saved.

  • val_data (Optional[list[Path]], optional) – List of paths to the validation tomograms. Defaults to None.

  • val_labels (Optional[list[Path]], optional) – List of paths to the validation labels. Defaults to None.

  • num_epochs (int, optional) – Number of training epochs. Defaults to 50.

  • log_training (bool, optional) – Whether to log training metrics to TensorBoard. Defaults to False.

Returns:

Path to the saved model file.

Return type:

Path
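The sketch below shows a training call with a held-out validation split. It assumes the caller already has paired lists of tomograms and labels. The helpers split_train_val and train are hypothetical, and the 20% validation fraction and seed are arbitrary example values. This section does not document where ModelType is defined, so the caller passes a ModelType value in.

```python
import random
from pathlib import Path


def split_train_val(
    data: list[Path], labels: list[Path], val_fraction: float = 0.2, seed: int = 0
) -> tuple[list[Path], list[Path], list[Path], list[Path]]:
    """Shuffle (tomogram, label) pairs reproducibly and hold out a validation subset."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length")
    if not 0.0 <= val_fraction < 1.0:
        raise ValueError(f"val_fraction must lie in [0, 1), got {val_fraction}")
    pairs = list(zip(data, labels))
    random.Random(seed).shuffle(pairs)  # a seeded RNG keeps the split reproducible
    n_val = int(len(pairs) * val_fraction)
    val, train = pairs[:n_val], pairs[n_val:]
    return (
        [d for d, _ in train],
        [l for _, l in train],
        [d for d, _ in val],
        [l for _, l in val],
    )


def train(
    data: list[Path],
    labels: list[Path],
    label_names: list[str],
    model_type: object,  # a cryovit ModelType value supplied by the caller
    model_name: str,
    label_key: str,
    result_dir: Path,
    num_epochs: int = 50,
) -> Path:
    """Hypothetical wrapper: split the data, train, and return the saved model path."""
    from cryovit.run import run_training  # requires cryovit to be installed

    tr_d, tr_l, va_d, va_l = split_train_val(data, labels)
    return run_training(
        train_data=tr_d,
        train_labels=tr_l,
        labels=label_names,
        model_type=model_type,
        model_name=model_name,
        label_key=label_key,
        result_dir=result_dir,
        val_data=va_d or None,  # an empty split falls back to the documented default
        val_labels=va_l or None,
        num_epochs=num_epochs,
        log_training=True,  # write TensorBoard logs
    )
```

With a 0.2 fraction and fewer than five pairs, the split leaves no validation samples. In that case the wrapper passes None, which matches the documented defaults for val_data and val_labels.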