cryovit.utils

Utility functions to process data and models in a format recognizable by CryoVIT.

Functions

id_generator([size, chars])

Generates a random string of fixed size.

load_data(file_path[, key])

Load data or labels from a given file path.

load_files_from_path(path)

Load files from a given directory or a .txt file listing file paths.

load_labels(file_path, label_keys, key)

Load labels from a given file path, given a list of label names in ascending-value order.

load_model(model_path[, load_model])

Load a model from a given path.

read_hdf(hdf_file[, key])

Read data from an HDF5 file.

read_mrc(mrc_file)

Read data from an MRC file.

read_tiff(tiff_file)

Read data from a TIFF file.

save_model(model_name, label_key, model, ...)

Save a model to a given path.

save_model_from_weights(model_name, ...)

Save a model to a given path from a weights file.

Classes

FileMetadata(drange, dshape, dtype[, nunique])

Metadata information for a file.

SavedModel(name, model_type, label_key, ...)

A class to represent a pre-trained model.

id_generator(size: int = 6, chars='abcdefghijklmnopqrstuvwxyz0123456789')

Generates a random string of fixed size.
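The signature above suggests uniform sampling from an alphabet. A minimal sketch of that idea follows; `id_generator_sketch` and `DEFAULT_CHARS` are hypothetical names, and the use of `random.choices` is an assumption, not the library's confirmed implementation.

```python
import random
import string

# Assumed alphabet: lowercase letters plus digits, which matches the
# documented default value of `chars`.
DEFAULT_CHARS = string.ascii_lowercase + string.digits


def id_generator_sketch(size: int = 6, chars: str = DEFAULT_CHARS) -> str:
    """Return `size` characters drawn uniformly at random from `chars`."""
    return "".join(random.choices(chars, k=size))


run_id = id_generator_sketch()
print(run_id)  # a 6-character string; the value differs on every call
```

Such identifiers are convenient for naming runs or temporary outputs, but `random` is not suitable where unpredictability matters.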

class FileMetadata(drange: tuple[float, float], dshape: tuple[int, ...], dtype: dtype, nunique: int = 0)

Bases: object

Metadata information for a file.

drange

The dynamic range of the data.

Type:

tuple[float, float]

dshape

The shape of the data.

Type:

tuple[int, ...]

dtype

The data type of the data.

Type:

numpy.dtype

nunique

The number of unique values in the data.

Type:

int

read_hdf(hdf_file: str | Path, key: str | None = None) -> tuple[str, ndarray, FileMetadata]

Read data from an HDF5 file. If no key is specified, the dataset with the most unique values is assumed to be the data.

Parameters:
  • hdf_file – The path to the HDF5 file.

  • key – The key to read from the HDF5 file. If None, or if the key is not valid, all keys are read and the one with the most unique values is used.

Returns:

A tuple of the key used, the data, and the metadata.
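The key-selection rule described above can be sketched without any HDF5 dependency. In this illustration, `pick_key` is a hypothetical helper and flat Python lists stand in for the arrays stored in a file; the real function presumably counts unique values on arrays instead.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def pick_key(datasets: Mapping[str, Sequence[float]], key: str | None = None) -> str:
    """Return `key` if it names a dataset, else the key with the most unique values."""
    if key is not None and key in datasets:
        return key
    if not datasets:
        raise ValueError("no datasets to choose from")
    # Fallback heuristic: count distinct values per dataset and keep the largest.
    return max(datasets, key=lambda name: len(set(datasets[name])))


# Flat lists stand in for the arrays stored in an HDF5 file.
datasets = {
    "data": [0.1, 0.5, 0.9, 1.3, 2.0],  # 5 distinct values (image-like)
    "labels": [0, 1, 1, 0, 2],          # 3 distinct values (label-like)
}
print(pick_key(datasets))             # data
print(pick_key(datasets, "labels"))   # labels
print(pick_key(datasets, "missing"))  # data
```

The heuristic works because raw image data usually has far more distinct values than an integer label map stored alongside it.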

read_mrc(mrc_file: str | Path) -> tuple[ndarray, FileMetadata]

Read data from an MRC file.

Parameters:

mrc_file – The path to the MRC file.

Returns:

A tuple of the data and the metadata.

read_tiff(tiff_file: str | Path) -> tuple[ndarray, FileMetadata]

Read data from a TIFF file.

Parameters:

tiff_file – The path to the TIFF file.

Returns:

A tuple of the data and the metadata.

load_data(file_path: str | Path, key: str | None = None) -> tuple[ndarray, str]

Load data or labels from a given file path. Supports the .h5, .hdf5, .mrc, and .mrcs formats.

Parameters:
  • file_path – The path to the data file.

  • key – An optional key to specify which dataset to load from an HDF5 file.

Raises:
  • ValueError – If the file format is unsupported.

  • FileNotFoundError – If the specified file does not exist.

Returns:

A tuple of the data and the key used (empty string if not applicable).
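The documented behavior amounts to dispatching on the file extension. The sketch below shows one way to do that; `pick_reader` and `READERS` are hypothetical names, and checking for existence before checking the extension is an assumption about ordering.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Extensions come from the load_data description. The values are the names
# of the documented reader functions, used here only as labels.
READERS = {
    ".h5": "read_hdf",
    ".hdf5": "read_hdf",
    ".mrc": "read_mrc",
    ".mrcs": "read_mrc",
}


def pick_reader(file_path: str | Path) -> str:
    """Return the name of the reader that would handle `file_path`."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"{path} does not exist")
    suffix = path.suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"unsupported file format: {suffix!r}")
    return READERS[suffix]


with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp) / "tomogram.mrc"
    volume.touch()
    chosen = pick_reader(volume)
print(chosen)  # read_mrc
```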

load_labels(file_path: str | Path, label_keys: list[str], key: str | None) -> dict[str, ndarray]

Load labels from a given file path, given a list of label names in ascending-value order. Supports .h5, .hdf5, .mrc, .mrcs, .tiff, and .tif formats.

Parameters:
  • file_path – The path to the label file.

  • label_keys – A list of label names in ascending-value order (e.g., ['mito', 'cristae'] for 0=background, 1=mito, 2=cristae).

  • key – An optional key to specify which dataset to load from an HDF5 file.

Raises:
  • ValueError – If the number of unique values in the label data does not match the number of provided label keys.

  • ValueError – If the file format is unsupported.

  • ValueError – If the specified key is not found in the label data.

  • FileNotFoundError – If the specified file does not exist.

Returns:

A dictionary of label name to int8 label array.
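To illustrate what "ascending-value order" means, the sketch below pairs each label name with the next-smallest non-background value and builds one 0/1 mask per name. `split_labels` is a hypothetical helper operating on a flat list; treating 0 as background, not counting it against label_keys, and returning binary masks are all assumptions rather than confirmed behavior of load_labels.

```python
from __future__ import annotations


def split_labels(values: list[int], label_keys: list[str]) -> dict[str, list[int]]:
    """Split an integer label map into one 0/1 mask per label name."""
    # Assumption: 0 is background and is not counted against label_keys.
    foreground = sorted(set(values) - {0})
    if len(foreground) != len(label_keys):
        raise ValueError(
            f"found {len(foreground)} label values but got {len(label_keys)} label keys"
        )
    # Ascending-value order: the i-th name is paired with the i-th smallest value.
    return {
        name: [1 if v == value else 0 for v in values]
        for name, value in zip(label_keys, foreground)
    }


masks = split_labels([0, 1, 1, 2, 0, 2], ["mito", "cristae"])
print(masks["mito"])     # [0, 1, 1, 0, 0, 0]
print(masks["cristae"])  # [0, 0, 0, 1, 0, 1]
```

Passing the names in the wrong order would silently swap the masks, which is why the order of label_keys matters.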

load_files_from_path(path: Path) -> list[Path]

Load files from a given directory or a .txt file listing file paths.

Parameters:

path (Path) – The path to the directory or .txt file.

Raises:

ValueError – If the path is not a directory or a .txt file.

Returns:

A list of file paths.

Return type:

list[Path]
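The two accepted inputs can be handled with pathlib alone. The following is a sketch under stated assumptions: `list_files` is a hypothetical name, directory entries are returned sorted, and blank lines in the .txt listing are skipped; the library may differ on each of these points.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def list_files(path: Path) -> list[Path]:
    """Return files in a directory, or the paths listed in a .txt file."""
    if path.is_dir():
        # Assumption: only regular files, returned in sorted order.
        return sorted(p for p in path.iterdir() if p.is_file())
    if path.is_file() and path.suffix == ".txt":
        lines = path.read_text().splitlines()
        # Assumption: one path per line, blank lines ignored.
        return [Path(line.strip()) for line in lines if line.strip()]
    raise ValueError(f"{path} is not a directory or a .txt file")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.mrc").touch()
    (root / "a.mrc").touch()
    from_dir = [p.name for p in list_files(root)]
print(from_dir)  # ['a.mrc', 'b.mrc']
```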

class SavedModel(name: str, model_type: ModelType, label_key: str, model_cfg: BaseModel, weights: dict[str, Any])

Bases: object

A class to represent a pre-trained model.

name

The name of the model.

Type:

str

model_type

The type of the model, e.g., 'CryoVIT', '3D U-Net'.

Type:

cryovit.types.ModelType

label_key

The label key used for training the model.

Type:

str

model_cfg

The config dictionary to instantiate the model.

Type:

cryovit.config.BaseModel

weights

The saved weights of the model.

Type:

dict[str, Any]

save_model(model_name: str, label_key: str, model: Module, model_cfg: BaseModel, save_path: str | Path) -> None

Save a model to a given path.

Parameters:
  • model_name – The name of the model.

  • label_key – The label key used for training the model.

  • model – The model to save.

  • model_cfg – The config dictionary to instantiate the model.

  • save_path – The path to save the model to.

save_model_from_weights(model_name: str, label_key: str, model_type: ModelType, weights_path: str | Path, save_path: str | Path, **kwargs) -> None

Save a model to a given path from a weights file.

Parameters:
  • model_name – The name of the model.

  • label_key – The label key used for training the model.

  • model_type – The type of the model.

  • weights_path – The path to the weights file.

  • save_path – The path to save the model to.

  • **kwargs – Additional keyword arguments to pass to the model config. To access nested config parameters, use double underscores (e.g., a.b -> a__b).
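The double-underscore convention for nested config parameters can be illustrated as follows. `expand_kwargs` is a hypothetical helper written for this example, and the parameter names in the call are made up; how the library actually applies such overrides to its config is not specified here.

```python
from __future__ import annotations

from typing import Any


def expand_kwargs(**kwargs: Any) -> dict[str, Any]:
    """Turn keys such as a__b into nested dictionaries such as {'a': {'b': ...}}."""
    nested: dict[str, Any] = {}
    for flat_key, value in kwargs.items():
        *parents, leaf = flat_key.split("__")
        node = nested
        for part in parents:
            # Walk down, creating intermediate dictionaries as needed.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


overrides = expand_kwargs(lr=1e-3, a__b=2, a__c__d=3)
print(overrides)
# {'lr': 0.001, 'a': {'b': 2, 'c': {'d': 3}}}
```

Double underscores are needed because Python keyword arguments cannot contain dots, so `a.b` has to be spelled `a__b`.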

Raises:

FileNotFoundError – If the weights file does not exist.

load_model(model_path: str | Path, load_model: bool = True) -> tuple[Module | None, ModelType, str, str]

Load a model from a given path.

Parameters:
  • model_path – The path to the model file.

  • load_model – Whether to load the model weights. If False, only returns the model type, name, and label key.

Raises:

FileNotFoundError – If the specified file does not exist.

Returns:

A tuple of the model (or None if load_model is False), the model type, the model name, and the label key.