cryovit.datasets

Implementations of PyTorch datasets for loading Cryo-EM tomograms.

Classes

VITDataset(data_root, records)

Dataset class for Vision Transformer models, loading and processing tomograms.

TomoDataset(records, input_key, label_key, ...)

A dataset class for handling and preprocessing tomographic data for CryoVIT models.

FileDataset(files, input_key, label_key[, ...])

A dataset class for loading and preprocessing tomographic data for CryoVIT models from an explicit list of FileData objects.

class VITDataset(data_root: Path, records: list[str])

Bases: Dataset

Dataset class for Vision Transformer models, loading and processing tomograms.

__init__(data_root: Path, records: list[str]) → None

Initializes a dataset object to load tomograms, applying normalization and resizing for DINOv2 models.

Parameters:
  • data_root (Path) – Root directory where tomogram files are stored.

  • records (list[str]) – A list of paths to tomogram files, given relative to data_root.
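The following is a minimal sketch of the (data_root, records) contract. It assumes that each record is a path relative to data_root and that "resizing for DINOv2" means rounding each spatial side down to a multiple of the DINOv2 patch size (14 pixels). SketchVITDataset and dino_size are hypothetical illustrations and are not part of cryovit.datasets. The real class subclasses torch.utils.data.Dataset and also loads and normalizes the tomogram data, which this sketch omits.

```python
from pathlib import Path

PATCH_SIZE = 14  # patch size of the released DINOv2 ViT models, in pixels


def dino_size(height: int, width: int, patch: int = PATCH_SIZE) -> tuple[int, int]:
    """Hypothetical helper: round each side down to a multiple of the patch size (minimum one patch)."""
    return max(patch, height // patch * patch), max(patch, width // patch * patch)


class SketchVITDataset:
    """Hypothetical stand-in mirroring the (data_root, records) constructor.

    Implements the map-style protocol (__len__ / __getitem__) that
    torch.utils.data.Dataset expects, but returns file paths instead of tensors.
    """

    def __init__(self, data_root: Path, records: list[str]) -> None:
        self.data_root = Path(data_root)
        self.records = list(records)

    def __len__(self) -> int:
        return len(self.records)

    def __getitem__(self, idx: int) -> Path:
        # Each record is assumed to be a path relative to data_root.
        return self.data_root / self.records[idx]


ds = SketchVITDataset(Path("/data/tomograms"), ["sample_a/tomo_01.hdf", "sample_b/tomo_07.hdf"])
print(len(ds), ds[1])
print(dino_size(512, 720))  # (504, 714)
```

The file names are placeholders. Rounding down rather than padding up is only one possible choice. Padding to the next multiple of 14 would keep every voxel but enlarge the input.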

class TomoDataset(records: DataFrame, input_key: str, label_key: str, split_key: str, data_root: Path, aux_keys: list[str] | None = None, train: bool = False)

Bases: Dataset

A dataset class for handling and preprocessing tomographic data for CryoVIT models.

__init__(records: DataFrame, input_key: str, label_key: str, split_key: str, data_root: Path, aux_keys: list[str] | None = None, train: bool = False) → None

Initializes a dataset object to load tomograms for model training, applying optional training crops.

Parameters:
  • records (pd.DataFrame) – A DataFrame containing records of tomograms.

  • input_key (str) – The key in the HDF5 file to access input features.

  • label_key (str) – The key in the HDF5 file to access labels.

  • split_key (str) – The key in the DataFrame to access the split identifier.

  • data_root (Path) – The root directory where the tomograms are stored.

  • aux_keys (Optional[List[str]]) – Optional additional keys for auxiliary data to load from the HDF5 files.

  • train (bool) – Flag to determine if the dataset is for training (enables transformations).
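The sketch below shows one way the key parameters could combine when a single sample is assembled. It makes several assumptions:

- Each tomogram file behaves like a mapping from key to a depth × height × width volume. A plain dict of nested lists stands in for the HDF5 file, so h5py is not needed.
- The training crop is a random height × width window shared by input and label, with all slices kept along depth.

The helpers (assemble_sample, random_window, crop_hw), the crop size, and the key names "data" and "mito" are hypothetical. The actual TomoDataset may crop, name, and return fields differently.

```python
import random
from typing import Any, Mapping, Optional

Volume = list[list[list[Any]]]  # depth x height x width


def crop_hw(vol: Volume, hs: slice, ws: slice) -> Volume:
    """Keep every slice along depth; crop height and width."""
    return [[row[ws] for row in plane[hs]] for plane in vol]


def random_window(size: int, crop: int, rng: random.Random) -> slice:
    """Random window of length `crop` along one axis (whole axis if it is too small)."""
    if size <= crop:
        return slice(0, size)
    start = rng.randint(0, size - crop)  # randint is inclusive on both ends
    return slice(start, start + crop)


def assemble_sample(
    tomo: Mapping[str, Any],
    input_key: str,
    label_key: str,
    aux_keys: Optional[list[str]] = None,
    train: bool = False,
    crop_size: tuple[int, int] = (2, 2),
    rng: Optional[random.Random] = None,
) -> dict[str, Any]:
    """Hypothetical: gather one tomogram's input, label, and auxiliary entries."""
    inp: Volume = tomo[input_key]
    lbl: Volume = tomo[label_key]
    if train:
        rng = rng or random.Random()
        hs = random_window(len(inp[0]), crop_size[0], rng)
        ws = random_window(len(inp[0][0]), crop_size[1], rng)
        # The same window is applied to input and label so they stay aligned.
        inp, lbl = crop_hw(inp, hs, ws), crop_hw(lbl, hs, ws)
    sample = {"input": inp, "label": lbl}
    for key in aux_keys or []:
        sample[key] = tomo[key]  # auxiliary entries are passed through uncropped here
    return sample


# A 2 x 4 x 4 toy "tomogram" where each voxel stores its own (z, y, x) index.
vol = [[[(z, y, x) for x in range(4)] for y in range(4)] for z in range(2)]
tomo = {"data": vol, "mito": vol, "name": "toy"}
sample = assemble_sample(tomo, "data", "mito", aux_keys=["name"], train=True, rng=random.Random(0))
print(len(sample["input"]), len(sample["input"][0]), len(sample["input"][0][0]))  # 2 2 2
```

Sharing one window between input and label is the key design point. Cropping them independently would misalign voxels and their labels.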

class FileDataset(files: list[FileData], input_key: str | None, label_key: str | None, train: bool = False, for_dino: bool = False)

Bases: Dataset

A dataset class for loading and preprocessing tomographic data for CryoVIT models from an explicit list of FileData objects.

__init__(files: list[FileData], input_key: str | None, label_key: str | None, train: bool = False, for_dino: bool = False) → None

Creates a new FileDataset object.

Parameters:
  • files (list[FileData]) – A list of FileData objects containing file paths and metadata.

  • input_key (Optional[str]) – The key in an HDF5 file to access input features.

  • label_key (Optional[str]) – The key in an HDF5 file to access labels.

  • train (bool) – Flag to determine if the dataset is for training (enables transformations).

  • for_dino (bool) – Flag to determine if the dataset is for DINO feature extraction (enables DINO transformations).
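FileData is not documented in this section. The sketch below therefore substitutes a hypothetical SketchFileData dataclass with a path and a free-form metadata dict. It shows one plausible reading of the optional keys and flags, under two assumptions:

- A None key means that entry is simply not read. An example is label_key=None for unlabeled data.
- for_dino takes precedence over train when both flags are set.

Neither SketchFileData, keys_to_load, nor transform_mode is part of cryovit, and the source does not confirm either behaviour.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class SketchFileData:
    """Hypothetical stand-in for FileData: a file path plus free-form metadata."""
    path: Path
    metadata: dict = field(default_factory=dict)


def keys_to_load(input_key: Optional[str], label_key: Optional[str]) -> list[str]:
    """Assumed semantics: a None key means that entry is not read from the HDF5 file."""
    return [key for key in (input_key, label_key) if key is not None]


def transform_mode(train: bool, for_dino: bool) -> str:
    """Assumed precedence: DINO preprocessing, then training augmentation, else none."""
    if for_dino:
        return "dino"
    if train:
        return "train"
    return "eval"


files = [SketchFileData(Path("tomo_01.hdf"), {"sample": "A"}), SketchFileData(Path("tomo_02.hdf"))]
# Which keys would be read from each file when labels are unavailable.
plan = [(f.path.name, keys_to_load("data", None)) for f in files]
print(plan)  # [('tomo_01.hdf', ['data']), ('tomo_02.hdf', ['data'])]
print(transform_mode(train=False, for_dino=True))  # dino
```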