
1. Spatially constrained clustering (SCC) with binning data

This notebook demonstrates how to perform basic clustering on a raw AnnData object (spatial transcriptomic data) using Spateo and SCC clustering.

Binned AnnData objects can be obtained from multiple spatial transcriptomic assays using the corresponding functions. (See the docs for
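
As a rough illustration of what binning means here, the sketch below sums spot-level counts into square bins by integer-dividing the coordinates by a bin size. The function name `bin_counts`, the toy data, and the reading of `bin60` as a bin size of 60 are assumptions made for illustration; this is not Spateo's I/O code.

```python
import numpy as np

def bin_counts(coords, counts, bin_size=60):
    """Sum per-spot counts into square bins of side `bin_size` (hypothetical helper).

    coords: (n_spots, 2) array of x, y positions.
    counts: (n_spots, n_genes) array of expression counts.
    Returns (bin_coords, binned_counts).
    """
    coords = np.asarray(coords)
    counts = np.asarray(counts)
    # Integer bin index of every spot along x and y.
    bin_idx = np.floor_divide(coords, bin_size).astype(int)
    # Unique bins and, for each spot, the row of the bin it falls into.
    bin_coords, inverse = np.unique(bin_idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    binned = np.zeros((bin_coords.shape[0], counts.shape[1]), dtype=counts.dtype)
    # Unbuffered accumulation so that all spots sharing a bin are added.
    np.add.at(binned, inverse, counts)
    return bin_coords, binned

# Toy data: four spots, two genes.
coords = np.array([[0, 0], [59, 10], [60, 0], [130, 70]])
counts = np.array([[1, 0], [2, 1], [0, 5], [3, 3]])
bin_coords, binned = bin_counts(coords, counts)
```

The first two spots fall into the same bin, so their counts are summed; the other two spots each occupy their own bin.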


import spateo as st
import dynamo as dyn

Data source


# Load binning data
fname_bin60 = "mousebrain_bin60.h5ad"
adata_bin60 = st.sample_data.mousebrain(fname_bin60)

Normalization & dimensionality reduction

# Preprocessing
st.pp.filter.filter_genes(adata_bin60, min_cells=3, inplace=True)

# Normalization
dyn.pp.normalize_cell_expr_by_size_factors(adata_bin60, layers="X")

# Linear reduction (PCA) with n_pca_components=30

# Identify neighbors (KNN) with n_neighbors=30
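
For readers who want to see what the normalization and linear reduction steps in this section amount to numerically, here is a minimal NumPy sketch: size-factor normalization, a log transform, and PCA via SVD. It makes simple assumptions (size factor = total counts divided by the median total, `log1p` transform) and uses hypothetical helper names; it is not the dynamo or Spateo implementation.

```python
import numpy as np

def normalize_by_size_factor(X):
    """Divide each cell/bin by its size factor, then log-transform (assumed recipe)."""
    totals = X.sum(axis=1, keepdims=True).astype(float)
    size_factors = totals / np.median(totals)
    return np.log1p(X / size_factors)

def pca(X, n_components):
    """PCA scores from the SVD of the column-centered matrix."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Scores are the left singular vectors scaled by the singular values.
    return U[:, :n_components] * S[:n_components]

# Toy count matrix: 100 bins x 50 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(100, 50))
X_norm = normalize_by_size_factor(counts)
X_pca = pca(X_norm, n_components=30)
```

The resulting `X_pca` plays the role of the low-dimensional representation on which the KNN graph is then built.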

Vanilla louvain clustering

# Louvain clustering with resolution=1
# Plot arguments: color=['louvain'], show_legend="upper left",
#                 figsize=(4, 3), color_key_cmap="tab20"

# Smoothed labels are stored in adata_bin60.obs['louvain_smooth'] (radius=5, key='louvain')
# Plot arguments: color=['louvain_smooth'], show_legend="upper left",
#                 figsize=(4, 3), color_key_cmap="tab20"
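
One simple way to picture spatial label smoothing is a majority vote: each spot takes the most common label among the spots within a given radius. The sketch below implements that rule with NumPy. The majority-vote rule and the name `smooth_labels` are assumptions for illustration; Spateo's smoothing routine may use a different rule.

```python
from collections import Counter

import numpy as np

def smooth_labels(coords, labels, radius):
    """Replace each label by the majority label within `radius` (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared distances; fine for a toy example, use a KD-tree for real data.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    smoothed = []
    for i in range(len(labels)):
        # The neighborhood includes spot i itself (distance 0).
        neighbors = labels[d2[i] <= radius ** 2]
        smoothed.append(Counter(neighbors.tolist()).most_common(1)[0][0])
    return np.array(smoothed)

# 5 x 5 grid where a single central spot carries a different label.
xs, ys = np.meshgrid(np.arange(5), np.arange(5))
coords = np.column_stack([xs.ravel(), ys.ravel()])
labels = np.array(["0"] * 25)
labels[12] = "1"
smoothed = smooth_labels(coords, labels, radius=1.5)
```

In this toy case the isolated label in the center is outvoted by its eight neighbors, which is the kind of salt-and-pepper noise that smoothing is meant to remove.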

Spatially constrained clustering (SCC)

The SCC clustering function builds on standard clustering methods (e.g. Louvain, Leiden, …) by replacing the input K-nearest neighbor (KNN) network with the fusion of the KNN network and a spatial neighbor network.
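
The fusion idea can be sketched with plain NumPy: build a KNN adjacency in expression (PCA) space, build another in physical space, and take their union before handing the graph to a community detection method. The union rule, the function names, and the toy data are assumptions for illustration; Spateo's own way of combining and weighting the two graphs may differ.

```python
import numpy as np

def knn_adjacency(points, k):
    """Symmetric boolean adjacency of the k nearest neighbors (hypothetical helper)."""
    points = np.asarray(points, dtype=float)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nearest = np.argsort(d2, axis=1)[:, :k]
    adj = np.zeros(d2.shape, dtype=bool)
    rows = np.repeat(np.arange(len(points)), k)
    adj[rows, nearest.ravel()] = True
    return adj | adj.T  # symmetrize

def fused_graph(embedding, coords, n_neighbors, s_neigh):
    """Union of the expression KNN graph and the spatial neighbor graph."""
    expr = knn_adjacency(embedding, n_neighbors)
    spatial = knn_adjacency(coords, s_neigh)
    return expr | spatial

# Toy data: 50 spots with a 10-dimensional embedding and 2D positions.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(50, 10))
coords = rng.uniform(0, 100, size=(50, 2))
adj = fused_graph(embedding, coords, n_neighbors=5, s_neigh=4)
```

Because the spatial edges are added on top of the expression edges, spots that are physically adjacent are pulled toward the same community even when their expression profiles are only moderately similar.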

We control the weight given to spatial proximity through the s_neigh argument. Typically, s_neigh is set according to the spatial arrangement of spots (i.e. the assay used). For example, s_neigh could be 4, 8, 12, etc. on a square-array sequencing platform (such as Stereo-seq, …), and 6, 18, etc. on a hexagonal platform (such as Visium, …). A larger s_neigh gives more weight to spatial information, but we do not recommend setting s_neigh too large.
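
The suggested values follow from lattice geometry: they are the numbers of spots that lie within successive distance shells around a spot. The short check below counts neighbors on idealized unit-spaced square and hexagonal lattices; the lattice construction and the helper name `count_within` are illustrative assumptions, and real arrays have edges and missing spots.

```python
import numpy as np

def count_within(points, center, radius, tol=1e-9):
    """Number of lattice points within `radius` of `center`, excluding the center."""
    d = np.linalg.norm(points - center, axis=1)
    return int(np.sum((d > tol) & (d <= radius + tol)))

grid = np.arange(-3, 4)
origin = np.zeros(2)

# Square lattice with unit spacing (idealized square-array layout).
square = np.array([(x, y) for x in grid for y in grid], dtype=float)
square_counts = [count_within(square, origin, r) for r in (1.0, np.sqrt(2), 2.0)]

# Hexagonal lattice with unit spacing (idealized hexagonal layout),
# generated from the basis vectors (1, 0) and (1/2, sqrt(3)/2).
hexagonal = np.array([(i + 0.5 * j, np.sqrt(3) / 2 * j) for i in grid for j in grid])
hex_counts = [count_within(hexagonal, origin, r) for r in (1.0, np.sqrt(3), 2.0)]

print(square_counts)  # [4, 8, 12]
print(hex_counts)     # [6, 12, 18]
```

On the square lattice the shells give 4, 8, and 12 neighbors; on the hexagonal lattice the first ring gives 6 and the first two rings together give 18.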

# SCC clustering
# Plot arguments: color=['scc'], show_legend="upper left",
#                 figsize=(4, 3), color_key_cmap="tab20"
# Smoothed labels are stored in adata_bin60.obs['scc_smooth'] (radius=10, key='scc')
# Plot arguments: color=['scc_smooth'], show_legend="upper left",
#                 figsize=(4, 3), color_key_cmap="tab20"
|-----> Finish smoothing the label. The new label is stored in adata.obs['label_smooth']


SCC cluster annotation

# create a dictionary to map cluster to annotation label
cluster2annotation = {
     '0': 'DORsm',
     '1': 'Fiber tracts',
     '2': 'Iso cortex L6',
     '3': 'Iso cortex L5',
     '4': 'AMY',
     '5': 'HY',
     '6': 'Iso cortex L2/3',
     '7': 'STRd',
     '8': 'HIP & CTXpl L1',
     '9': 'DORpm',
     '10': 'PAL',
     '11': 'OLF',
     '12': 'Iso cortex L4',
     '13': 'CTXsp',
     '14': 'DG',
     '15': 'CA',
     '16': 'RT',
     '17': 'VS',
}
adata_bin60.obs['scc_anno'] = adata_bin60.obs['scc_smooth'].map(cluster2annotation).astype('category')
# Plot arguments: show_legend="upper left", figsize=(4, 3)
adata_bin60.write("mousebrain_bin60_clustered.h5ad", compression="gzip")

STAGATE clustering

STAGATE learns low-dimensional latent embeddings that combine spatial information and gene expression via a graph attention auto-encoder. The method adopts an attention mechanism in the middle layer of the encoder and decoder, which adaptively learns the edge weights of the spatial neighbor network and uses them to update each spot's representation by aggregating information from its neighbors. The latent embeddings and the reconstructed expression profiles can be used for downstream tasks such as spatial domain identification, visualization, spatial trajectory inference, data denoising, and 3D expression domain extraction.
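
The attention step can be pictured in a few lines of NumPy: each spot scores its spatial neighbors, the scores are normalized with a softmax over the neighborhood, and the spot's new representation is the attention-weighted sum of its neighbors' features. The scoring form used below (a sigmoid of two projections), the random weights, the radius of 3.0, and the function name are illustrative assumptions. This is not the STAGATE implementation, which learns these weights end to end inside the auto-encoder.

```python
import numpy as np

def attention_aggregate(H, adj, v_self, v_neigh):
    """One attention-weighted aggregation over a spatial graph (hypothetical helper).

    H: (n_spots, n_features) spot representations.
    adj: (n_spots, n_spots) boolean adjacency that includes self-loops.
    v_self, v_neigh: (n_features,) projection vectors (random here, learned in practice).
    """
    s = H @ v_self
    t = H @ v_neigh
    # Raw score for every ordered pair (i, j), squashed with a sigmoid.
    scores = 1.0 / (1.0 + np.exp(-(s[:, None] + t[None, :])))
    # Softmax restricted to the edges of the graph.
    weights = np.where(adj, np.exp(scores), 0.0)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ H, weights

rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))
# Spatial neighbor graph from a distance cutoff; distance 0 keeps the self-loops.
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
adj = dist <= 3.0
H = rng.normal(size=(30, 8))
H_new, att = attention_aggregate(H, adj, rng.normal(size=8), rng.normal(size=8))
```

Each row of `att` sums to one and is zero outside the spot's neighborhood, so the distance cutoff decides which spots can exchange information and the attention weights decide how much.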

Dong, K., Zhang, S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun 13, 1739 (2022).


If you want to try STAGATE, you need to install pytorch-geometric.

import numpy as np
st.pp.select_hvf_seurat(adata_bin60, n_top=2000)
# STAGATE arguments: spatial_key=['X','Y'], rad_cutoff=200, num_epoch=1000, lr=0.001,
#                    weight_decay=1e-4, hidden_dims=[512, 30],
The rex values are stored in adata.layers["STAGATE_ReX"].

Here, we implemented mclust from the R language in Python and provide the same parameters as mclust for the analysis. Note that if an error occurs with modelNames=EEV, you can manually change it to modelNames=EEE.
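
In mclust's naming scheme, EEE denotes ellipsoidal components with equal volume, shape, and orientation, which means that all components share one covariance matrix; EEV lets the orientation vary per component and is therefore less constrained. To make the EEE case concrete, below is a small expectation-maximization sketch for a Gaussian mixture with a single shared covariance. It is a toy written for this explanation with hypothetical names and a simple farthest-point initialization, not the mclust port used in this notebook.

```python
import numpy as np

def fit_gmm_shared_cov(X, n_components, n_iter=100):
    """EM for a Gaussian mixture whose components share one covariance (EEE-like)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Farthest-point initialization keeps the sketch deterministic.
    means = [X[0]]
    for _ in range(1, n_components):
        dist = np.min([((X - m) ** 2).sum(axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    cov = np.cov(X, rowvar=False) + 1e-6 * np.eye(d)
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: with a shared covariance the Gaussian normalizer cancels out.
        prec = np.linalg.inv(cov)
        diff = X[:, None, :] - means[None, :, :]
        maha = np.einsum("nkd,de,nke->nk", diff, prec, diff)
        log_r = np.log(weights) - 0.5 * maha
        log_r -= log_r.max(axis=1, keepdims=True)
        resp = np.exp(log_r)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and the single pooled covariance.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        cov = np.einsum("nk,nkd,nke->de", resp, diff, diff) / n + 1e-6 * np.eye(d)
    return resp.argmax(axis=1), means, cov

# Two well-separated toy clusters standing in for a low-dimensional embedding.
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 1.0, size=(40, 2))
blob_b = rng.normal(0.0, 1.0, size=(40, 2)) + np.array([8.0, 8.0])
X = np.vstack([blob_a, blob_b])
labels, means, cov = fit_gmm_shared_cov(X, n_components=2)
```

Because only one covariance matrix is estimated from all the data, the shared-covariance model has fewer parameters to fit than a model with per-component orientations.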

# The result is assigned to adata_bin60.obs['mclust_STAGATE']; smoothing arguments: radius=5, key='mclust'
# Plot arguments: color=['mclust','mclust_smooth'], show_legend="upper left",
#                 figsize=(4, 3), color_key_cmap="tab20"

CAST clustering

CAST is a Python library for physically aligning different spatial transcriptomes regardless of technology, magnification, individual variation, and experimental batch effects. CAST is composed of three modules: CAST Mark, CAST Stack, and CAST Projection.

Tang, Z., Luo, S., Zeng, H. et al. Search and match across spatial omics samples at single-cell resolution. Nat Methods (2024).


If you want to use CAST, you need to install dgl.

# CAST arguments: output_path='output/CAST_Mark', gpu_t=0, device='cuda:0'
# Clustering arguments: n_clusters=20, use_rep='X_cast', random_state=42,
# Plot arguments: color=['CAST_clusters'], show_legend="upper left",
#                 figsize=(4, 3), color_key_cmap="tab20"