cryovit.config

Hydra configuration classes for CryoViT experiments.

Functions

validate_dino_config(cfg)

Validates the configuration for DINOv2 feature extraction.

validate_experiment_config(cfg)

Validates an experiment configuration.

Classes

BaseDataModule([_target_, _partial_, ...])

Base configuration for datasets in CryoViT experiments.

BaseExperimentConfig([name, label_key, ...])

Base configuration for running CryoViT experiments.

BaseModel([_target_, name, input_key, ...])

Base configuration for models used in CryoViT experiments.

BaseTrainer([_target_, accelerator, ...])

Base configuration for the trainer used in CryoViT experiments.

DinoFeaturesConfig([batch_size, dino_dir, ...])

Base configuration for computing DINOv2 features in CryoViT experiments.

ExperimentPaths([model_dir, data_dir, ...])

Configuration for managing experiment paths in CryoViT experiments.

class BaseModel(_target_: str = '???', name: str = '???', input_key: str = '???', model_dir: Path | None = None, lr: float = '???', weight_decay: float = 0.001, losses: dict = '???', metrics: dict = '???', custom_kwargs: dict | None = None)

Bases: object

Base configuration for models used in CryoViT experiments.

name

Name of the model for identification purposes.

Type:

str

input_key

Key to get the input data from a tomogram.

Type:

str

model_dir

Optional directory to download model weights to (for SAMv2 models).

Type:

Optional[Path]

lr

Learning rate for the model training.

Type:

float

weight_decay

Weight decay (L2 penalty) rate. Default is 1e-3.

Type:

float

losses

Configurations for loss functions used in training.

Type:

dict[str, Any]

metrics

Configurations for metrics used during model evaluation.

Type:

dict[str, Any]

custom_kwargs

Optional dictionary of custom keyword arguments to pass to the model.

Type:

Optional[dict[str, Any]]
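
The '???' defaults in the signature above mark fields that have no usable default and must be supplied by the user. The following is a minimal sketch of that pattern using only the standard library; ModelConfigSketch and unset_fields are hypothetical names, not part of cryovit.config, and the example values are invented.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import Any, Optional

MISSING = "???"  # the placeholder the signature above shows for mandatory fields


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in that mirrors the documented BaseModel fields."""

    _target_: str = MISSING
    name: str = MISSING
    input_key: str = MISSING
    model_dir: Optional[Path] = None
    lr: Any = MISSING  # a float once set; "???" until then
    weight_decay: float = 1e-3
    losses: Any = MISSING
    metrics: Any = MISSING
    custom_kwargs: Optional[dict] = None


def unset_fields(cfg) -> list:
    """Return the names of fields still holding the "???" placeholder."""
    return [f.name for f in fields(cfg) if getattr(cfg, f.name) == MISSING]


# Invented values; only some mandatory fields are filled in.
cfg = ModelConfigSketch(name="example_model", input_key="example_input", lr=1e-4)
print(unset_fields(cfg))
```

Fields with real defaults (model_dir, weight_decay, custom_kwargs) never show up as unset, which is why only the mandatory ones need to be provided.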

class BaseTrainer(_target_: str = 'pytorch_lightning.Trainer', accelerator: str = 'gpu', devices: str = '1', precision: str = '16-mixed', default_root_dir: Path | None = None, max_epochs: int | None = None, enable_checkpointing: bool = False, enable_model_summary: bool = True, log_every_n_steps: int | None = None)

Bases: object

Base configuration for the trainer used in CryoViT experiments.

accelerator

Type of hardware acceleration. Default is ‘gpu’.

Type:

str

devices

Number of devices to use for training. Default is ‘1’.

Type:

str

precision

Precision configuration for training (e.g., ‘16-mixed’).

Type:

str

default_root_dir

Default root directory for saving checkpoints and logs.

Type:

Optional[Path]

max_epochs

The maximum number of epochs to train for.

Type:

Optional[int]

enable_checkpointing

Flag to enable or disable model checkpointing. Default is False.

Type:

bool

enable_model_summary

Enable model summarization. Default is True.

Type:

bool

log_every_n_steps

Frequency of logging in terms of training steps.

Type:

Optional[int]
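
In Hydra-style configurations, _target_ names a dotted import path and the remaining fields are passed to that callable as keyword arguments. The sketch below approximates this convention with the standard library only, assuming that is how this config is consumed; instantiate_sketch is a hypothetical helper, and datetime.timedelta stands in for pytorch_lightning.Trainer so the example runs without extra dependencies.

```python
import importlib


def instantiate_sketch(cfg: dict):
    """Import the callable named by `_target_` and call it with the other keys."""
    kwargs = dict(cfg)  # copy so the caller's config is left unchanged
    module_name, _, attr = kwargs.pop("_target_").rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    return target(**kwargs)


# Stand-in target from the standard library; with the documented defaults the
# target would be "pytorch_lightning.Trainer" and the remaining keys would be
# accelerator, devices, precision, and so on.
delta = instantiate_sketch({"_target_": "datetime.timedelta", "hours": 1, "minutes": 30})
print(delta.total_seconds())
```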

class BaseDataModule(_target_: str = '', _partial_: bool = True, sample: Any = '???', split_id: int | None = None, split_key: str | None = 'split_id', test_sample: Any | None = None, dataset: dict = '???', dataloader: dict = '???')

Bases: object

Base configuration for datasets in CryoViT experiments.

sample

Specific sample or samples used for training.

Type:

Union[Sample, tuple[Sample]]

split_id

Optional split_id to use for validation.

Type:

Optional[int]

split_key

Key in the sample .csv file to use for splitting the data. Default is “split_id”.

Type:

Optional[str]

test_sample

Specific sample or samples used for testing.

Type:

Optional[Any]

dataset

Configuration for the dataset.

Type:

dict[str, Any]

dataloader

Configuration for the dataloader.

Type:

dict[str, Any]
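
One plausible reading of split_key and split_id is that rows of the sample .csv file whose split_key column equals split_id are held out for validation. The sketch below illustrates that reading only; it is an assumption, not the library's implementation. The partition_rows helper, the tomo_name column, and all row values are invented for illustration.

```python
import csv
import io

# Hypothetical split file contents; the column names other than "split_id"
# and all values are invented for illustration.
SPLITS_CSV = """tomo_name,split_id
tomo_a,0
tomo_b,1
tomo_c,0
tomo_d,2
"""


def partition_rows(csv_text, split_key="split_id", split_id=None):
    """Split rows into (train, val); rows whose split_key equals split_id go to val."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if split_id is None:
        return rows, []  # assumption: no held-out split when split_id is unset
    val = [r for r in rows if int(r[split_key]) == split_id]
    train = [r for r in rows if int(r[split_key]) != split_id]
    return train, val


train, val = partition_rows(SPLITS_CSV, split_id=0)
print([r["tomo_name"] for r in train], [r["tomo_name"] for r in val])
```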

class ExperimentPaths(model_dir: Path = '???', data_dir: Path = '???', exp_dir: Path = '???', results_dir: Path = '???', tomo_name: str = 'tomograms', feature_name: str = 'dino_features', dino_name: str = 'DINOv2', sam_name: str = 'SAM2', csv_name: str = 'csv', split_name: str = 'splits.csv')

Bases: object

Configuration for managing experiment paths in CryoViT experiments.

model_dir

Path to the folder containing downloaded models.

Type:

Path

data_dir

Path to the parent directory containing tomogram data and .csv files.

Type:

Path

exp_dir

Path to the parent directory for saving results from an experiment.

Type:

Path

results_dir

Path to the parent directory for saving overall results.

Type:

Path

tomo_name

Name of the folder in data_dir with tomograms.

Type:

str

feature_name

Name of the folder in data_dir with DINOv2 features.

Type:

str

dino_name

Name of the folder in model_dir to save DINOv2 model.

Type:

str

sam_name

Name of the folder in model_dir to save SAMv2 model.

Type:

str

csv_name

Name of the folder in data_dir with .csv files.

Type:

str

split_name

Name of the .csv file with training splits.

Type:

str
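
The folder names above are relative to either data_dir or model_dir. A minimal sketch of how they might be joined with pathlib follows; resolve_paths is a hypothetical helper and the directories are placeholders. The location of the split_name file is not stated above, so it is left out.

```python
from pathlib import Path


def resolve_paths(model_dir, data_dir, tomo_name="tomograms",
                  feature_name="dino_features", dino_name="DINOv2", csv_name="csv"):
    """Join each folder name onto the parent directory its description names."""
    model_dir, data_dir = Path(model_dir), Path(data_dir)
    return {
        "tomograms": data_dir / tomo_name,
        "features": data_dir / feature_name,
        "csv": data_dir / csv_name,
        "dino_model": model_dir / dino_name,
    }


# Placeholder directories, not real CryoViT paths.
paths = resolve_paths("/tmp/models", "/tmp/data")
for key, value in paths.items():
    print(key, value)
```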

class DinoFeaturesConfig(batch_size: int = 128, dino_dir: Path = '???', paths: ExperimentPaths = '???', datamodule: dict = '???', sample: Sample | None = '???', export_features: bool = False)

Bases: object

Base configuration for computing DINOv2 features in CryoViT experiments.

batch_size

Number of tomogram slices to process as one batch. Default is 128.

Type:

int

dino_dir

Path to the DINOv2 foundation model.

Type:

Path

paths

Configuration for experiment paths.

Type:

ExperimentPaths

datamodule

Configuration for the datamodule to use for loading tomograms.

Type:

dict[str, Any]

sample

Sample to calculate features for. None means to calculate features for all samples.

Type:

Optional[Sample]

export_features

Whether to additionally compute PCA colormaps for the calculated features.

Type:

bool
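
To make the effect of batch_size concrete, the sketch below splits the slice indices of one tomogram into consecutive batches. It only illustrates the arithmetic; slice_batches is a hypothetical helper and the slice count is invented.

```python
def slice_batches(num_slices: int, batch_size: int = 128):
    """Yield (start, stop) index pairs covering num_slices in order."""
    for start in range(0, num_slices, batch_size):
        yield start, min(start + batch_size, num_slices)


# A hypothetical tomogram with 300 slices needs three batches at the default size.
batches = list(slice_batches(300))
print(batches)
```

The last batch is shorter than batch_size whenever the slice count is not a multiple of it.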

class BaseExperimentConfig(name: str = '???', label_key: str = '???', additional_keys: tuple[str] = (), random_seed: int = 42, paths: ExperimentPaths = '???', model: BaseModel = '???', trainer: BaseTrainer = '???', callbacks: dict[str, Any] = '???', logger: dict[str, Any] = '???', datamodule: BaseDataModule = '???', ckpt_path: Path | None = None, resume_ckpt: bool = False)

Bases: object

Base configuration for running CryoViT experiments.

name

Name of the experiment, should be unique for each configuration.

Type:

str

label_key

Key used to specify the training labels.

Type:

str

additional_keys

Keys to pass through additional data from the dataset.

Type:

tuple[str]

random_seed

Random seed set for reproducibility. Default is 42.

Type:

int

paths

Configuration for experiment paths.

Type:

ExperimentPaths

model

Configuration for the model to use.

Type:

BaseModel

trainer

Configuration for the trainer to use.

Type:

BaseTrainer

callbacks

Configuration for the callbacks passed to the trainer.

Type:

dict[str, Any]

logger

Configuration for the loggers passed to the trainer.

Type:

dict[str, Any]

datamodule

Configuration for the datamodule to use.

Type:

BaseDataModule

ckpt_path

Optional path to a checkpoint file to resume training from.

Type:

Optional[Path]

resume_ckpt

Whether to resume training from the checkpoint. Default is False.

Type:

bool

validate_dino_config(cfg: DinoFeaturesConfig) → None

Validates the configuration for DINOv2 feature extraction.

Checks if all necessary parameters are present in the configuration. If any required parameters are missing, it logs an error message and exits the script.

Parameters:

cfg (DinoFeaturesConfig) – The configuration object containing settings for feature extraction.

Raises:

SystemExit – If any configuration parameters are missing.
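
The behaviour described above (find missing parameters, log an error, exit) can be sketched as follows. This is an illustration under assumptions, not the function's actual code: it operates on a plain dict rather than a DinoFeaturesConfig, treats the '???' placeholder as "missing", and uses the hypothetical names find_missing and validate_sketch.

```python
import logging
import sys

MISSING = "???"


def find_missing(cfg: dict, prefix: str = "") -> list:
    """Recursively collect dotted keys whose value is still the "???" placeholder."""
    missing = []
    for key, value in cfg.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            missing.extend(find_missing(value, prefix=f"{path}."))
        elif isinstance(value, str) and value == MISSING:
            missing.append(path)
    return missing


def validate_sketch(cfg: dict) -> None:
    """Log every missing key, then exit if there were any."""
    missing = find_missing(cfg)
    for key in missing:
        logging.error("Missing configuration parameter: %s", key)
    if missing:
        sys.exit(1)


example = {"batch_size": 128, "dino_dir": MISSING, "paths": {"data_dir": MISSING}}
try:
    validate_sketch(example)
except SystemExit as exc:
    print("exited with code", exc.code)
```

Because sys.exit raises SystemExit, callers that want to recover (for example in tests) can catch it as shown.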

validate_experiment_config(cfg: BaseExperimentConfig) → None

Validates an experiment configuration.

Checks if all necessary parameters are present in the configuration. If any required parameters are missing, it logs an error message and exits the script.

Additionally checks that every specified Sample is valid. If any sample is invalid, it logs an error message and exits the script.

Parameters:

cfg (BaseExperimentConfig) – The configuration object to validate.

Raises:

SystemExit – If any configuration parameters are missing or any samples are invalid.
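
The sample check can be pictured as a membership test against the set of known samples. The sketch below assumes Sample behaves like an enumeration; SampleSketch, its member names, and invalid_samples are all hypothetical and only illustrate the idea.

```python
from enum import Enum


class SampleSketch(Enum):
    """Hypothetical stand-in for the Sample type; member names are invented."""

    sample_a = "sample_a"
    sample_b = "sample_b"


def invalid_samples(names) -> list:
    """Return the requested names that are not members of SampleSketch."""
    if isinstance(names, str):
        names = (names,)  # accept a single sample as well as a tuple
    return [n for n in names if n not in SampleSketch.__members__]


print(invalid_samples(("sample_a", "sample_x")))
```

A non-empty result would correspond to the case where the validator logs an error and exits.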