Note

This page was generated from 4_svg_archetype.ipynb. Interactive online version: Colab badge. Some tutorial content may look better in light mode.

4.Identification of spatial variable genes and archetypes

This notebook demonstrates how to use spateo to identify spatial variable genes, by calculating Moran‘s I score, and how to get spatial gene archetypes.

[1]:
import spateo as st

import numpy as np

st.configuration.set_pub_style_mpltex()
%matplotlib inline
2022-11-13 20:18:33.040154: I tensorflow/core/util/util.cc:169] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
network.py (36): The next major release of pysal/spaghetti (2.0.0) will drop support for all ``libpysal.cg`` geometries. This change is a first step in refactoring ``spaghetti`` that is expected to result in dramatically reduced runtimes for network instantiation and operations. Users currently requiring network and point pattern input as ``libpysal.cg`` geometries should prepare for this simply by converting to ``shapely`` geometries.

Data source

bin60_clustered_h5ad: https://www.dropbox.com/s/wxgkim87uhpaz1c/mousebrain_bin60_clustered.h5ad?dl=0

[2]:
# Load annotated binning data
fname_bin60 = "mousebrain_bin60_clustered.h5ad"
adata_bin60 = st.sample_data.mousebrain(fname_bin60)

adata_bin60
[2]:
AnnData object with n_obs × n_vars = 7765 × 21667
    obs: 'area', 'n_counts', 'Size_Factor', 'initial_cell_size', 'louvain', 'scc', 'scc_anno'
    var: 'pass_basic_filter'
    uns: '__type', 'louvain', 'louvain_colors', 'neighbors', 'pp', 'scc', 'scc_anno_colors', 'scc_colors', 'spatial', 'spatial_neighbors'
    obsm: 'X_pca', 'X_spatial', 'bbox', 'contour', 'spatial'
    layers: 'count', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'

SVG identification (Moran’s I)

[3]:
# Filter out low expressed and uninterested, such as mitochondrial genes
mito_genes = adata_bin60.var_names.str.startswith("mt-")
adata_bin60 = adata_bin60[: ,~mito_genes]

st.pp.filter.filter_genes(adata_bin60,min_cells= adata_bin60.n_obs*0.05, inplace=True)
adata_bin60
[3]:
AnnData object with n_obs × n_vars = 7765 × 5059
    obs: 'area', 'n_counts', 'Size_Factor', 'initial_cell_size', 'louvain', 'scc', 'scc_anno'
    var: 'pass_basic_filter'
    uns: '__type', 'louvain', 'louvain_colors', 'neighbors', 'pp', 'scc', 'scc_anno_colors', 'scc_colors', 'spatial', 'spatial_neighbors'
    obsm: 'X_pca', 'X_spatial', 'bbox', 'contour', 'spatial'
    layers: 'count', 'spliced', 'unspliced'
    obsp: 'connectivities', 'distances', 'spatial_connectivities', 'spatial_distances'
[ ]:
# Calculate Moran's I score for genes
m = st.tl.moran_i(adata_bin60, n_jobs=30)
[5]:
# Filter genes by Moran's I score and adjusted p-value
m_filter = m[(m.moran_i > 0.15)&(m.moran_q_val<0.05)].sort_values(by=['moran_i'],ascending=False)
m_filter[0:10]
[5]:
moran_i moran_p_val moran_z moran_q_val
Plp1 0.529320 0.005 78.720267 0.014116
Gm28928 0.516217 0.005 72.395455 0.014116
Ntng1 0.511843 0.005 72.667184 0.014116
Mbp 0.510432 0.005 77.079408 0.014116
Gm42418 0.468743 0.005 69.624860 0.014116
Phactr1 0.468161 0.005 70.079342 0.014116
Mobp 0.457718 0.005 61.732588 0.014116
Cmss1 0.436734 0.005 65.651464 0.014116
Kcnip4 0.421696 0.005 61.525662 0.014116
Camk1d 0.406778 0.005 56.055133 0.014116
[6]:
# Visualize top SVGs
st.pl.space(
    adata_bin60,
    genes=m_filter.index[0:6].tolist(),
    pointsize=0.1,
    ncols=3,
    show_legend="upper left",
    figsize=(2,2),
)
../../../_images/tutorials_notebooks_5_cluster_digitization_4_svg_archetype_8_0.png

SVG archetypes

[7]:
# Cluster identified SVGs into archetypes
num_clusters = 15

archetypes, clusters, gene_corrs = st.tl.find_spatial_archetypes(
    exp_mat = adata_bin60[:,m_filter.index].X.A.T,
    num_clusters = num_clusters,
)
|-----> [Finding gene archetypes] in progress: 100.0000%
|-----> [Finding gene archetypes] finished [0.1889s]
|-----> done!
[8]:
# Create dataframe with genes and its archetype
df_clusters = pd.DataFrame({"gene":m_filter.index.to_list(), "archetype":clusters-1})
df_clusters[0:10]

# Save gene cluster info
# df_clusters.to_csv("archetype_clusters.csv")
[8]:
gene archetype
0 Plp1 11
1 Gm28928 14
2 Ntng1 8
3 Mbp 11
4 Gm42418 2
5 Phactr1 5
6 Mobp 11
7 Cmss1 1
8 Kcnip4 4
9 Camk1d 1
[9]:
# Calculate genes having significant spatial correlation with each archetype
archetypes_dict = st.tl.archetypes_genes(
    adata=adata_bin60,
    archetypes=archetypes,
    num_clusters=num_clusters,
    moran_i_genes=m_filter.index,
)
|-----? No genes with significant correlation were found at the current p-value threshold.
[10]:
# Calculate expression scores for archetypes
adata_bin60.obs = st.tl.archetypes(adata_bin60,m_filter.index,num_clusters=num_clusters)
|-----> [Finding gene archetypes] in progress: 100.0000%
|-----> [Finding gene archetypes] finished [0.1891s]
|-----> done!
[11]:
# Visualize archetypes
arch_cols = ["archetype %d" % i for i in np.arange(num_clusters)]
st.pl.space(
    adata_bin60,
    color = adata_bin60.obs[arch_cols].columns.tolist(),
    pointsize=0.1,
    ncols=5,
    show_legend="upper left",
    figsize=(2,2),
)
../../../_images/tutorials_notebooks_5_cluster_digitization_4_svg_archetype_14_0.png