cryovit.datasets
Implementations of PyTorch datasets for loading Cryo-EM tomograms.
Classes
VITDataset – Dataset class for Vision Transformer models, loading and processing tomograms.
TomoDataset – A dataset class for handling and preprocessing tomographic data for CryoVIT models.
FileDataset – A dataset class for handling and preprocessing tomographic data for CryoVIT models.
- class VITDataset(data_root: Path, records: list[str])
Bases: Dataset
Dataset class for Vision Transformer models, loading and processing tomograms.
- __init__(data_root: Path, records: list[str]) -> None
Initializes a dataset object to load tomograms, applying normalization and resizing for DINOv2 models.
- Parameters:
data_root (Path) – Root directory where tomogram files are stored.
records (list[str]) – A list of strings representing paths to tomogram files in the root directory.
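The relationship between data_root and records can be illustrated with a minimal sketch. It assumes each record is a path relative to data_root; PathListDataset and the file names are hypothetical stand-ins, not part of cryovit, and the real VITDataset additionally loads, normalizes and resizes the tomogram.

```python
from pathlib import Path


class PathListDataset:
    """Hypothetical stand-in that mirrors the (data_root, records) signature."""

    def __init__(self, data_root: Path, records: list[str]) -> None:
        self.data_root = Path(data_root)
        self.records = list(records)

    def __len__(self) -> int:
        # One item per record, following the map-style dataset protocol.
        return len(self.records)

    def __getitem__(self, idx: int) -> Path:
        # Assumption: each record is a path relative to data_root.
        return self.data_root / self.records[idx]


# Made-up directory and file names, for illustration only.
dataset = PathListDataset(
    Path("tomograms"),
    ["sample_a/tomo_01.hdf", "sample_b/tomo_02.hdf"],
)
first_path = dataset[0]
```

Implementing `__len__` and `__getitem__` is what lets a map-style dataset be indexed and wrapped by a PyTorch DataLoader.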
- class TomoDataset(records: DataFrame, input_key: str, label_key: str, split_key: str, data_root: Path, aux_keys: list[str] | None = None, train: bool = False)
Bases: Dataset
A dataset class for handling and preprocessing tomographic data for CryoVIT models.
- __init__(records: DataFrame, input_key: str, label_key: str, split_key: str, data_root: Path, aux_keys: list[str] | None = None, train: bool = False) -> None
Initializes a dataset object to load tomograms for model training, applying optional training crops.
- Parameters:
records (pd.DataFrame) – A DataFrame containing records of tomograms.
input_key (str) – The key in the HDF5 file to access input features.
label_key (str) – The key in the HDF5 file to access labels.
split_key (str) – The key in the DataFrame to access the split identifier.
data_root (Path) – The root directory where the tomograms are stored.
aux_keys (Optional[List[str]]) – Optional additional keys for auxiliary data to load from the HDF5 files.
train (bool) – Flag to determine if the dataset is for training (enables transformations).
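As a rough illustration of how input_key, label_key and aux_keys might select entries from one tomogram file, the sketch below uses a plain dict in place of an opened HDF5 file (h5py file objects are indexed by key in the same way). The helper load_record, the output field names and the key names "data", "mito" and "mask" are all assumptions made for this example, not cryovit's actual implementation.

```python
from typing import Any, Mapping, Optional


def load_record(
    store: Mapping[str, Any],
    input_key: str,
    label_key: str,
    aux_keys: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: collect input, label and optional auxiliary entries."""
    sample = {"input": store[input_key], "label": store[label_key]}
    # Treat a missing aux_keys argument as "no auxiliary data".
    for key in aux_keys or []:
        sample[key] = store[key]
    return sample


# A dict stands in for an opened HDF5 file; the key names are made up.
fake_file = {"data": [[0.1, 0.2]], "mito": [[0, 1]], "mask": [[1, 1]]}
sample = load_record(fake_file, "data", "mito", aux_keys=["mask"])
```

Defaulting aux_keys to None rather than an empty list avoids sharing one mutable default across calls, which matches the `list[str] | None = None` signature shown above.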
- class FileDataset(files: list[FileData], input_key: str | None, label_key: str | None, train: bool = False, for_dino: bool = False)
Bases: Dataset
A dataset class for handling and preprocessing tomographic data for CryoVIT models.
- __init__(files: list[FileData], input_key: str | None, label_key: str | None, train: bool = False, for_dino: bool = False) -> None
Creates a new FileDataset object.
- Parameters:
files (list[FileData]) – A list of FileData objects containing file paths and metadata.
input_key (Optional[str]) – The key in an HDF5 file to access input features.
label_key (Optional[str]) – The key in an HDF5 file to access labels.
train (bool) – Flag to determine if the dataset is for training (enables transformations).
for_dino (bool) – Flag to determine if the dataset is for DINO feature extraction (enables DINO transformations).