Quick Start Guide

This section goes over a quick example of using CryoViT to segment mitochondria in a group of Cryo-ET tomograms using a pre-trained model.

Note

This guide assumes you have already installed CryoViT by following the instructions in Installing CryoViT. If you have not done so, please do that first.

You do not need a working installation of napari to follow this guide, but a GPU is recommended for faster inference.

First, download the example data from here and its contents:

$ tar -xzf example_data.tar.gz

This will extract a directory example_data containing a folder of tomograms data/ and a pre-trained model file pretrained_model.model.

Viewing Tomogram Data

CryoViT supports most common file formats for tomogram data, including .mrc, .tiff, and .hdf formats, expecting the tomogram data to be stored as a 3D array with shape (D, H, W).

You can preview the tomogram data with cryovit.utils.load_data(), which returns the data as a numpy array:

Tip

For .hdf files, which can contain multiple keyed datasets, you can specify which dataset to load by passing in the key argument to cryovit.utils.load_data().

Otherwise, the dataset with the most unique values will be loaded by default, and cryovit.utils.load_data() will return the key found.

>>> from cryovit.utils import load_data
>>> data, key = load_data("example_data/data/HD_iPSC_sample_bin4.hdf")
>>> data
array([[[[0.69411767, 0.5647059 , 0.62352943, ..., 0.49411765,
      0.50980395, 0.47843137],
     [0.5921569 , 0.63529414, 0.6509804 , ..., 0.49803922,
      0.5058824 , 0.50980395],
     [0.6       , 0.6627451 , 0.56078434, ..., 0.5254902 ,
      0.4862745 , 0.5058824 ],
     ...,
     [0.5019608 , 0.5019608 , 0.49019608, ..., 0.50980395,
      0.4745098 , 0.49411765],
     [0.49803922, 0.5058824 , 0.5058824 , ..., 0.49411765,
      0.49411765, 0.52156866],
     [0.49803922, 0.49803922, 0.5019608 , ..., 0.4745098 ,
      0.5411765 , 0.49411765]]]],
  shape=(1, 128, 512, 512), dtype=float32)
>>> key
'data'
>>> print(type(data))
<class 'numpy.ndarray'>
>>> print(data.shape)
(1, 128, 512, 512)  # load_data adds an additional channel dimension
>>> print(data.dtype.name)
float32

Viewing Model Information

CryoViT uses a custom file extension .model to save pre-trained model weights and metadata about the model. You can view the model data with cryovit.utils.load_model(), which returns a tuple containing the model (a pytorch model), and its metadata.

Tip

If you only want to view the metadata without loading the model, you can pass in the argument load_model=False to cryovit.utils.load_model().

>>> from cryovit.utils import load_model
>>> model, model_type, name, label = load_model("example_data/pretrained_mito.model")
>>> print(model)
CryoVIT(
    (metric_fns): ModuleDict(
        ...
    )
    (layers): Sequential(
        ...
    )
    (output_layer): Sequential(
        ...
    )
)
>>> print(model_type)
ModelType.CRYOVIT
>>> print(name)
pretrained_mito
>>> print(label)
mito

We see that the model_type is ModelType.CRYOVIT, indicating that this is a CryoViT segmentation model, and the label is mito, indicating that this model segments mitochondria.

Running Inference Script

The main utilities of CryoViT can be run through command-line scripts. You can see all available scripts by running:

$ cryovit --help
# or
$ cryovit

../_images/cryovit_cli_output.png — Output of `cryovit --help` command.

and the arguments for a specific script by running:

$ cryovit <script_name> --help
# or
$ cryovit <script_name>

We see the available scripts are features, train, evaluate, and inference. For this quick start guide, we will be using the inference script to segment the tomograms using the pre-trained model.

Important

Since the model is a CryoViT model, we need to run the features script first to extract the high-level ViT features from the tomograms.

../_images/cryovit_cli_features_output.png — Output of `cryovit features --help` command.

To run the features script, we need to specify the input tomogram folder and the output directory to save the extracted features:

$ cryovit features example_data/data example_data/features

Note

This step requires a GPU, and is possibly very memory-intensive. If you run into out-of-memory issues, try reducing the --batch-size or --window-size arguments. Reducing the batch size is preferable, as reducing the window size will affect the quality of the extracted features.

Then, we can run the infer script on the extracted features, storing the results in a predictions folder:

$ cryovit infer example_data/features --model example_data/pretrained_model.model --result-folder example_data/predictions

Viewing Segmentation Results

The segmentation results will be saved as .hdf files in the example_data/predictions folder, each containing a data dataset with the original data, and a <label>_preds dataset with the predicted segmentation masks.

While you can still load the predicted segmentations using cryovit.utils.load_data() or cryovit.utils.load_labels(), it is recommended to use a visualization tool like ChimeraX to view the results in 3D, as shown below:

../_images/tutorial_chimerax.png — Visualization of tomogram and segmentation results in ChimeraX.