Note

This page was generated from 1. Basic usage of Spateo alignment for 2D slices.ipynb. Some tutorial content may look better in light mode.

1. Basic Usage of Spateo Alignment for 2D Slices

In this tutorial, we will provide a brief introduction to the basic usage of Spateo alignment for 2D slices. We’ll assume that you have two consecutive spatial transcriptomics slices, each capturing both gene expression data and spatial coordinates. After slicing, library preparation and sequencing, the relative coordinates of the cells/spots across sections are often lost. Our goal is to align the two samples in such a way that corresponding cells/spots between them have similar readouts, while also preserving the spatial distributions of spots across the samples.
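To make the "lost relative coordinates" problem concrete, here is a minimal NumPy sketch on toy data. It rotates and shifts a point cloud, then recovers the rigid transform with the Kabsch (orthogonal Procrustes) method. This is not Spateo's algorithm: it assumes the point-to-point correspondences are already known, whereas Spateo has to infer them from expression and spatial information. The function name `kabsch_rigid_fit` and all data below are hypothetical.

```python
import numpy as np

def kabsch_rigid_fit(source, target):
    """Least-squares rotation R and translation t with source @ R.T + t ~ target.

    Assumes row i of `source` corresponds to row i of `target`.
    """
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # Cross-covariance of the centered point sets
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0] * (source.shape[1] - 1) + [d])
    R = Vt.T @ D @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

rng = np.random.default_rng(0)
reference = rng.uniform(0, 100, size=(200, 2))  # toy "slice 1" coordinates

theta = np.deg2rad(35.0)
true_R = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
moved = reference @ true_R.T + np.array([12.0, -7.0])  # toy "slice 2"

R, t = kabsch_rigid_fit(moved, reference)
recovered = moved @ R.T + t
max_error = np.abs(recovered - reference).max()
print(f"max residual after rigid fit: {max_error:.2e}")
```

With real slices the correspondences are unknown and the tissue differs between sections, which is why a probabilistic mapping between cells is estimated together with the transformation.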

See also:

For a more detailed introduction to the method, please see our technical page on 3D alignment of spatial transcriptomics datasets: Spatial transcriptomics alignment

[1]:
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print("Running this notebook on: ", device)

import spateo as st
print("Last run with spateo version:", st.__version__)

# Other imports
import warnings
warnings.filterwarnings('ignore')
import matplotlib.pyplot as plt
import scanpy as sc
import anndata as ad

%config InlineBackend.print_figure_kwargs={'facecolor' : "w"}
%config InlineBackend.figure_format='retina'
Running this notebook on:  cuda
Last run with spateo version: 1.1.0.dev30+084c763.dirty

Loading the Data

In this tutorial, we’ll be using data from a mouse embryo at the E9.5 developmental stage, obtained using the Stereo-Seq technique. Specifically, we’ll work with slices #32 and #33 as our demo data. These slices contain 17,425 and 19,939 cells, respectively. You can download the processed data from the links below. Once downloaded, ensure you place the data in the appropriate directory.

[2]:
# Load the slices
slice1 = st.read('./data/basic_usage_demo_1.h5ad')
slice2 = st.read('./data/basic_usage_demo_2.h5ad')

slice1, slice2
[2]:
(AnnData object with n_obs × n_vars = 17425 × 26137
     obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'louvain', 'cellbin_SpatialDomain'
     uns: '__type', 'cellbin_SpatialDomain_colors', 'louvain', 'neighbors', 'pca', 'pearson_residuals_normalization', 'spatial'
     obsm: 'X_pca', 'X_spatial', 'bbox', 'spatial',
 AnnData object with n_obs × n_vars = 19939 × 26137
     obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'louvain', 'cellbin_SpatialDomain'
     uns: '__type', 'cellbin_SpatialDomain_colors', 'louvain', 'neighbors', 'pca', 'pearson_residuals_normalization', 'spatial'
     obsm: 'X_pca', 'X_spatial', 'bbox', 'spatial')

(Optional & Recommended) Pre-processing the Data

Before proceeding to the next step, we highly recommend performing basic quality control, normalization, and feature selection. These preprocessing steps can improve the stability and performance of downstream applications, including 3D alignment in the Spateo package. We follow the standard Scanpy preprocessing workflow for scRNA-seq data [cite] and process the two slices separately.

Warning:

If your data is already preprocessed, skip this step. Running the preprocessing more than once may affect downstream analyses.
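If you are unsure whether a matrix has already been normalized, a rough heuristic is to check whether it still looks like raw counts, i.e. non-negative integers. The helper below is a hypothetical sketch, not part of Spateo or Scanpy. It expects a dense array, so a sparse `.X` would first need to be converted (for example a small row subset via `.toarray()`).

```python
import numpy as np

def looks_like_raw_counts(X, n_check=1000):
    """Heuristic: raw counts are non-negative integers.

    Only the first `n_check` rows are inspected to keep the check cheap.
    """
    sample = np.asarray(X[:n_check], dtype=float)
    return bool(np.all(sample >= 0) and np.all(sample == np.round(sample)))

raw = np.array([[0, 3, 1], [2, 0, 5]])
# Toy "already normalized and log-transformed" version of the same matrix
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 10.0)

print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

A positive result does not guarantee that the data are unprocessed (filtering leaves counts as integers), so treat it as a sanity check only.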

[3]:
# preprocess slice1
sc.pp.filter_cells(slice1, min_genes=10)  # we use min_genes=10 as 100 is too large for ST data
sc.pp.filter_genes(slice1, min_cells=3)
# Saving count data
slice1.layers["counts"] = slice1.X.copy()
# Normalizing to median total counts
sc.pp.normalize_total(slice1)
# Logarithmize the data
sc.pp.log1p(slice1)
# annotates highly variable genes
sc.pp.highly_variable_genes(slice1, n_top_genes=2000)

# preprocess slice2
sc.pp.filter_cells(slice2, min_genes=10)
sc.pp.filter_genes(slice2, min_cells=3)
# Saving count data
slice2.layers["counts"] = slice2.X.copy()
# Normalizing to median total counts
sc.pp.normalize_total(slice2)
# Logarithmize the data
sc.pp.log1p(slice2)
# annotates highly variable genes
sc.pp.highly_variable_genes(slice2, n_top_genes=2000)
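For intuition, the two numerical steps in the cell above (total-count normalization followed by a log transform) can be sketched in plain NumPy on a toy count matrix. The sketch assumes the behavior noted in the cell's comments, where each cell is scaled to the median of the total counts. It is an illustration, not Scanpy's implementation.

```python
import numpy as np

counts = np.array([
    [4.0, 0.0, 6.0],    # cell with 10 total counts
    [1.0, 1.0, 2.0],    # cell with 4 total counts
    [10.0, 5.0, 5.0],   # cell with 20 total counts
])

totals = counts.sum(axis=1, keepdims=True)
target = np.median(totals)              # median total counts across cells
normalized = counts / totals * target   # every cell now sums to `target`
logged = np.log1p(normalized)           # log(1 + x), keeps zeros at zero

print(normalized.sum(axis=1))
```

After normalization, differences between cells reflect relative expression rather than how many counts each cell captured.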

Visualize the slices before the alignment

We can visualize the spatial distribution of the two slices using spateo’s plotting function. As expected, the two slices are not aligned, which poses challenges for downstream 3D analysis.

[4]:
spatial_key = 'spatial'
cluster_key = 'cellbin_SpatialDomain'

st.pl.slices_2d(
    slices = [slice1, slice2],
    label_key = cluster_key,
    spatial_key = spatial_key,
    height=4,
    center_coordinate=True,
    show_legend=True,
    legend_kwargs={'loc': 'upper center', 'bbox_to_anchor': (0.5, 0) ,'ncol': 5, 'borderaxespad': -4, 'frameon': False},
)
[Figure: the two slices before alignment, colored by cellbin_SpatialDomain]

(Optional & Recommended) Perform PCA Between Two Slices

Principal Component Analysis (PCA), a classic linear dimensionality reduction algorithm, can extract the main features from the data while denoising it. Compared to directly using the original high-dimensional gene expression matrix, leveraging PCA features to generate mapping probabilities often results in better performance. Additionally, using features with fewer dimensions can significantly reduce computational overhead when calculating the similarity matrix.

It is important to note that PCA should be performed across both slices, rather than independently on each slice, to ensure that the feature representations are aligned in PCA space. Therefore, in the following steps, we first concatenate the two slices into one dataset and then perform PCA using the function provided by Scanpy. Finally, we extract the corresponding PCA features using the “batch” key.

[5]:
st.align.group_pca([slice1,slice2], pca_key='X_pca')

slice1, slice2
[5]:
(AnnData object with n_obs × n_vars = 17425 × 19506
     obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'louvain', 'cellbin_SpatialDomain', 'n_genes'
     var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: '__type', 'louvain', 'neighbors', 'pca', 'pearson_residuals_normalization', 'spatial', 'log1p', 'hvg'
     obsm: 'X_pca', 'X_spatial', 'bbox', 'spatial'
     layers: 'counts',
 AnnData object with n_obs × n_vars = 19939 × 19699
     obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'louvain', 'cellbin_SpatialDomain', 'n_genes'
     var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: '__type', 'louvain', 'neighbors', 'pca', 'pearson_residuals_normalization', 'spatial', 'log1p', 'hvg'
     obsm: 'X_pca', 'X_spatial', 'bbox', 'spatial'
     layers: 'counts')
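The joint-PCA idea described above (concatenate, fit once, split back) can be sketched with NumPy alone. This is a toy illustration under the assumption that both matrices share the same genes in the same order. It is not the implementation of st.align.group_pca, and the function name `joint_pca` is hypothetical.

```python
import numpy as np

def joint_pca(features_a, features_b, n_components=2):
    """Fit PCA on the row-wise concatenation of two matrices, then split the scores.

    Both inputs must have the same columns (features) in the same order.
    """
    stacked = np.vstack([features_a, features_b])
    centered = stacked - stacked.mean(axis=0)
    # Rows of Vt are the principal axes shared by both inputs
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ Vt[:n_components].T
    n_a = features_a.shape[0]
    return scores[:n_a], scores[n_a:]

rng = np.random.default_rng(0)
expr_a = rng.normal(size=(50, 8))  # toy "slice 1": 50 cells x 8 shared genes
expr_b = rng.normal(size=(70, 8))  # toy "slice 2": 70 cells x 8 shared genes

pcs_a, pcs_b = joint_pca(expr_a, expr_b, n_components=3)
print(pcs_a.shape, pcs_b.shape)
```

Because both slices are projected onto the same axes, distances between a cell in one slice and a cell in the other are meaningful, which would not hold if PCA were fitted on each slice independently.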

See also:

Spateo is flexible with respect to the input features and even supports multiple input modalities. For example, we can incorporate labels and deep features generated by neural networks. For a more detailed introduction, please see the following notebooks:

Spateo Alignment

Spateo alignment is simple to use, scalable, and highly efficient. With a single call to st.align.morpho_align, you can obtain the aligned slices and the corresponding mapping matrix in just a few seconds (even faster if CUDA is available). In this example, we use the highly variable genes extracted earlier and their PCA representations to perform the alignment (see the commented options in the cell below). The function's input parameters are:

  • models: The slices to be aligned, where highly variable genes are used.

  • rep_layer: The name of the representation to be used.

  • rep_field: The field of AnnData in which the representation is stored (for example, 'obsm').

  • dissimilarity: The method used to calculate dissimilarity.

  • spatial_key: The key in .obsm of AnnData corresponding to the spatial coordinates.

  • key_added: The key under which the aligned spatial coordinates are added in .obsm.

  • device: The device to use for computation, either "cpu" or "cuda".
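To illustrate what a dissimilarity between feature representations can look like, here is a small NumPy sketch of pairwise cosine dissimilarity between two sets of feature vectors (for example, PCA embeddings of two slices). We assume here that a setting such as 'cos' refers to a cosine-based measure. The exact formula Spateo uses internally may differ, and the function name below is hypothetical.

```python
import numpy as np

def cosine_dissimilarity(A, B, eps=1e-12):
    """Pairwise 1 - cosine similarity between rows of A (n x d) and B (m x d)."""
    A_unit = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), eps)
    B_unit = B / np.maximum(np.linalg.norm(B, axis=1, keepdims=True), eps)
    return 1.0 - A_unit @ B_unit.T

A = np.array([[1.0, 0.0], [1.0, 1.0]])
B = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, 0.0]])

D = cosine_dissimilarity(A, B)
print(D.round(3))
```

Values range from 0 (same direction) to 2 (opposite direction) and ignore vector length, so cells are compared by expression pattern rather than by overall magnitude.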

[12]:
key_added = 'align_spatial'
# Spateo returns the aligned slices as well as the mapping matrix
aligned_slices, pis = st.align.morpho_align(
    models=[slice1, slice2],
    ## Uncomment this to use highly variable genes
    # models=[slice1[:, slice1.var.highly_variable], slice2[:, slice2.var.highly_variable]],
    ## Uncomment the following to use PCA embeddings
    # rep_layer='X_pca',
    # rep_field='obsm',
    # dissimilarity='cos',
    verbose=False,
    spatial_key=spatial_key,
    key_added=key_added,
    device=device,
)
|-----> [Models alignment based on morpho, mode: SN-S.] in progress: 100.0000%
|-----> [Models alignment based on morpho, mode: SN-S.] finished [4.5336s]

Important:

Spateo alignment adds three keys to .obsm: key_added, key_added + "_rigid", and key_added + "_nonrigid".

By default, Spateo incorporates both rigid and nonrigid (also known as elastic) deformation, and calculates an optimal rigid transformation based on both the rigid and nonrigid deformations. For more details about nonrigid deformation, please refer to the following notebook:

[7]:
aligned_slices
[7]:
[AnnData object with n_obs × n_vars = 17425 × 2000
     obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'louvain', 'cellbin_SpatialDomain', 'n_genes'
     var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: '__type', 'louvain', 'neighbors', 'pca', 'pearson_residuals_normalization', 'spatial', 'log1p', 'hvg'
     obsm: 'X_pca', 'X_spatial', 'bbox', 'spatial', 'align_spatial', 'align_spatial_rigid', 'align_spatial_nonrigid'
     layers: 'counts',
 AnnData object with n_obs × n_vars = 19939 × 2000
     obs: 'area', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'n_counts', 'louvain', 'cellbin_SpatialDomain', 'n_genes'
     var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm'
     uns: '__type', 'louvain', 'neighbors', 'pca', 'pearson_residuals_normalization', 'spatial', 'log1p', 'hvg', 'iter_spatial', 'VecFld_morpho'
     obsm: 'X_pca', 'X_spatial', 'bbox', 'spatial', 'align_spatial', 'align_spatial_rigid', 'align_spatial_nonrigid'
     layers: 'counts']
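A mapping matrix can be read as soft correspondences between the cells of two slices. The sketch below uses a small hypothetical matrix `P` rather than the actual `pis` output, whose exact shape, orientation, and storage format we do not assume here. It shows how one might extract the most likely partner of each cell and flag cells with little mapping mass.

```python
import numpy as np

# Hypothetical mapping matrix: rows index cells of one slice,
# columns index cells of the other, entries are mapping weights.
P = np.array([
    [0.70, 0.20, 0.10],
    [0.05, 0.15, 0.80],
    [0.30, 0.40, 0.30],
    [0.01, 0.01, 0.02],
])

best_match = P.argmax(axis=1)   # most likely partner for each row cell
row_mass = P.sum(axis=1)        # total weight assigned to each row cell
confident = row_mass > 0.5      # hypothetical threshold for "has a partner"

print(best_match)
print(confident)
```

Cells with low total mass, such as the last row here, are candidates for regions present in one slice but absent in the other.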

Visualization

After the alignment, we can check the results by overlaying the two slices with st.pl.overlay_slices_2d. In the following, we visualize both the rigid and nonrigid alignments. The rigid alignment corrects rotation and translation, while the nonrigid alignment provides a better fit for local structure.

Warning:

Although nonrigid alignment is better for aligning local structures, it may not always be the best choice in 3D reconstruction. Consecutive slices are inherently different, and it’s often unclear whether the deformation in a slice is due to distortions during slicing or reflects the original structure. To help with this, we offer a tutorial on how to determine when to use nonrigid alignment in 3D alignment: 3D reconstruction with pairwise alignment

[10]:
st.pl.overlay_slices_2d(slices = aligned_slices, spatial_key = key_added, height=5, overlay_type='backward')
[Figure: overlay of the two slices after rigid alignment]
[11]:
st.pl.overlay_slices_2d(slices = aligned_slices, spatial_key = key_added+'_nonrigid', height=5, overlay_type='backward')
[Figure: overlay of the two slices after nonrigid alignment]
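Beyond visual inspection, a rough numeric check can complement the overlays. The sketch below computes the mean nearest-neighbor distance from the points of one coordinate set to another, using toy data. This metric is our own illustrative choice and not a Spateo function. Lower values suggest a tighter overlay.

```python
import numpy as np

def mean_nn_distance(query, reference):
    """Mean distance from each query point to its nearest reference point (brute force)."""
    diffs = query[:, None, :] - reference[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    return dists.min(axis=1).mean()

rng = np.random.default_rng(1)
reference = rng.uniform(0, 50, size=(300, 2))
aligned = reference + rng.normal(scale=0.1, size=reference.shape)  # toy "well aligned" copy
misaligned = aligned + np.array([15.0, 0.0])                       # toy "shifted" copy

print(mean_nn_distance(aligned, reference) < mean_nn_distance(misaligned, reference))
```

With the tutorial data, the inputs would be the coordinate arrays stored under the alignment keys in .obsm of the two aligned slices. The brute-force distance matrix used here would be too large for tens of thousands of cells, so a KD-tree or a random subsample is the more practical choice at that scale.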

Conclusion

In this tutorial, we’ve demonstrated the basic usage of Spateo’s alignment functionality, which is simple to use yet accurate and efficient. Spateo offers many other powerful features, ranging from 2D slice alignment and 3D spatial transcriptomics reconstruction to 3D-aware digitization, cell-cell interaction analysis, and 4D spatiotemporal mapping.

For 3D alignment, the following tutorials cover other key features:

  • Nonrigid Alignment: Aligns local structures with greater precision.

  • Partial Alignment: Allows for alignment of specific regions of interest.

  • Sparse Calculation: Scales up computations for larger datasets.

  • 3D reconstruction (Pairwise-Based): Reconstructs 3D structures by aligning slices in a sequential pairwise manner.

  • 3D reconstruction (Global-Based): Performs global alignment across all slices by jointly considering multiple slices.

  • 3D reconstruction mesh correction: Enhances 3D reconstruction by integrating mesh information, improving structural accuracy.

  • Integrate other modalities: Allows for the incorporation of various data modalities and features.

  • Align images: Aligns image pairs using Spateo.

Additionally, after 3D alignment, Spateo integrates advanced 3D visualization functions and tools, enabling users to explore and demonstrate their 3D data in a more interactive and detailed manner. Please refer to the next section for more information.