`spateo.tools.lisa`#

Spatial markers.

Module Contents#

Functions#

`lisa_geo_df`(→ geopandas.GeoDataFrame)	Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas dataframe for downstream LISA plots.
`local_moran_i`(adata, group[, spatial_key, genes, ...])	Identify cell type specific genes with local Moran's I test.
`GM_lag_model`(adata, group[, spatial_key, genes, ...])	Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).

spateo.tools.lisa.lisa_geo_df(adata: anndata.AnnData, gene: str, spatial_key: str = 'spatial', n_neighbors: int = 8, layer: Tuple[None, str] = None) → geopandas.GeoDataFrame[source]#

Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas dataframe for downstream LISA plots, which show the quantile plots and the hotspot, coldspot, doughnut and diamond regions.

Parameters:

adata: An adata object that has spatial information (via spatial_key key in adata.obsm).
gene: The gene to use for the LISA analysis; it must be present in the data.
spatial_key: The key in adata.obsm that holds the spatial coordinates of each bucket.
n_neighbors: The number of nearest neighbors of each bucket used to calculate the spatial lag.
layer: The key to the layer. If it is None, adata.X is used by default.

Returns:

A geopandas dataframe that includes the coordinates (x and y columns), the expression (exp column), the lagged expression (w_exp column), their z-scores (exp_zscore and w_exp_zscore columns) and the LISA score (Is column).

Return type:

geopandas.GeoDataFrame
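The quantities listed for lisa_geo_df (spatial lag, z-scores, local Moran's I and the four region types) can be illustrated with a small NumPy sketch. This is not spateo's implementation: the helper names `knn_weights` and `local_moran` are hypothetical, the weights are assumed to be row-standardised k-nearest-neighbour weights, the local statistic is given only up to a constant scaling factor, and no permutation-based significance test is performed. The region names follow the usual Moran scatterplot quadrants (high-high, low-low, low-high, high-low).

```python
import numpy as np


def knn_weights(coords, n_neighbors):
    """Dense, row-standardised k-nearest-neighbour weights (fine for small inputs)."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a bucket is never its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    w = np.zeros(dist.shape)
    rows = np.arange(len(coords))[:, None]
    w[rows, nearest] = 1.0 / n_neighbors  # each row sums to 1
    return w


def local_moran(exp, w):
    """Return lagged expression, z-scores, lagged z-scores, local I and quadrant labels."""
    z = (exp - exp.mean()) / exp.std()
    w_z = w @ z  # spatial lag of the z-scored expression
    local_i = z * w_z  # local Moran's I, up to a constant factor
    quadrant = np.select(
        [
            (z > 0) & (w_z > 0),  # high value, high neighbours
            (z < 0) & (w_z < 0),  # low value, low neighbours
            (z < 0) & (w_z > 0),  # low value, high neighbours
            (z > 0) & (w_z < 0),  # high value, low neighbours
        ],
        ["hotspot", "coldspot", "doughnut", "diamond"],
        default="none",
    )
    return w @ exp, z, w_z, local_i, quadrant


# Toy data: six buckets on a line, high expression on the left half.
coords = np.array([[x, 0.0] for x in range(6)])
exp = np.array([10.0, 10.0, 10.0, 0.0, 0.0, 0.0])

w = knn_weights(coords, n_neighbors=2)
w_exp, z, w_z, local_i, quadrant = local_moran(exp, w)
print(quadrant)
```

The two buckets at the boundary have one high and one low neighbour, so their lagged z-score is zero and they fall into none of the four regions.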

spateo.tools.lisa.local_moran_i(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, layer: Tuple[None, str] = None, n_neighbors: int = 5, copy: bool = False, n_jobs: int = 30)[source]#

Identify cell type specific genes with local Moran’s I test.

Parameters:

adata: An adata object that has spatial information (via spatial_key key in adata.obsm).
group: The key to the cell group in adata.obs.
spatial_key: The key in adata.obsm that holds the spatial coordinates of each bucket.
genes: The genes to use for the LISA analysis; they must be present in the data.
layer: The key to the layer. If it is None, adata.X is used by default.
n_neighbors: The number of nearest neighbors of each bucket used to calculate the spatial lag.
copy: Whether to copy the adata object.

Returns:

Depending on the copy argument, returns either a deep copy of the adata object (when copy = True) or the adata object updated in place. The resulting adata includes the following new columns in adata.var:

{*}_num_val: The maximum number of categories ({"hotspot", "coldspot", "doughnut", "diamond"}) across all cell groups.
{*}_frac_val: The maximum fraction of categories across all cell groups.
{*}_spec_val: The maximum specificity of categories across all cell groups.
{*}_num_group: The corresponding cell group with the largest number of each category (this can be affected by the cell group size).
{*}_frac_group: The corresponding cell group with the highest fraction of each category.
{*}_spec_group: The corresponding cell group with the highest specificity of each category.

{*} can be one of {"hotspot", "coldspot", "doughnut", "diamond"}.
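The column descriptions above do not spell out how "fraction" and "specificity" are computed, so the following pandas sketch only shows one plausible reading for a single gene and the "hotspot" category. It assumes that the fraction is the number of hotspot buckets in a group divided by the group size, and that the specificity is the number of hotspot buckets in a group divided by the number of hotspot buckets in all groups. The toy `buckets` table and its column names are hypothetical.

```python
import pandas as pd

# One row per bucket: its cell group and whether it was called a hotspot
# for the gene under study (toy values).
buckets = pd.DataFrame(
    {
        "group": ["A"] * 6 + ["B"] * 2,
        "is_hotspot": [True, True, True, False, False, False, True, True],
    }
)

num = buckets.groupby("group")["is_hotspot"].sum()  # hotspot buckets per group
frac = num / buckets.groupby("group").size()  # assumed: share of the group
spec = num / num.sum()  # assumed: share of all hotspot buckets

summary = {
    "hotspot_num_val": num.max(),
    "hotspot_num_group": num.idxmax(),
    "hotspot_frac_val": frac.max(),
    "hotspot_frac_group": frac.idxmax(),
    "hotspot_spec_val": spec.max(),
    "hotspot_spec_group": spec.idxmax(),
}
print(summary)
```

In this toy table the larger group A wins on count and specificity while the smaller group B wins on fraction, which is the group-size effect mentioned for {*}_num_group.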

Examples:

>>> import pandas as pd
>>> import spateo as st
>>> # assumes adata.var was already populated by local_moran_i and that
>>> # `group` holds the adata.obs key that was passed to it
>>> markers_df = (
...     pd.DataFrame(adata.var)
...     .query("hotspot_frac_val > 0.05 & mean > 0.05")
...     .groupby(["hotspot_spec_group"])["hotspot_spec_val"]
...     .nlargest(5)
... )
>>> markers = markers_df.index.get_level_values(1)
>>>
>>> for i in adata.obs[group].unique():
...     if i in markers_df.index.get_level_values(0):
...         print(markers_df[i])
...         st.pl.space(adata, color=group, highlights=[i], pointsize=0.1, alpha=1, figsize=(12, 8))
...         st.pl.space(adata, color=markers_df[i].index, pointsize=0.1, alpha=1, figsize=(12, 8))
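To make the shape of `markers_df` in the example above easier to follow, here is a self-contained sketch on a toy table that stands in for adata.var. The gene and group names are made up, and only the columns used by the query are included. The result is a Series with a two-level index: the cell group first, the gene name second.

```python
import pandas as pd

# Toy stand-in for adata.var (hypothetical genes and groups).
var = pd.DataFrame(
    {
        "hotspot_frac_val": [0.30, 0.20, 0.01, 0.40],
        "hotspot_spec_val": [0.90, 0.60, 0.95, 0.70],
        "hotspot_spec_group": ["neuron", "neuron", "neuron", "glia"],
        "mean": [0.5, 0.4, 0.6, 0.3],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

markers_df = (
    var.query("hotspot_frac_val > 0.05 & mean > 0.05")  # drops geneC
    .groupby("hotspot_spec_group")["hotspot_spec_val"]
    .nlargest(1)  # keep the most specific gene per group
)

groups = markers_df.index.get_level_values(0)  # cell groups
markers = markers_df.index.get_level_values(1)  # gene names
print(markers_df)
```

Indexing with a group name, for example `markers_df["neuron"]`, returns the genes selected for that group, which is what the loop in the example relies on.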

spateo.tools.lisa.GM_lag_model(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, drop_dummy: Tuple[None, str] = None, n_neighbors: int = 5, layer: Tuple[None, str] = None, copy: bool = False, n_jobs=30)[source]#

Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).

math:

`\log{P_i} = \alpha + \rho \log{P_{lag-i}} + \sum_k \beta_k X_{ki} + \epsilon_i`

Reference:
https://geographicdata.science/book/notebooks/11_regression.html
http://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html
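The two stages of S2SLS for a spatial lag model of this form can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the code that GM_lag_model runs: the function name `s2sls_lag` is hypothetical, the response is a generic vector y rather than log expression, the instruments are assumed to be the exogenous variables and their first spatial lag, and no standard errors, z-statistics or diagnostics are computed.

```python
import numpy as np


def s2sls_lag(y, X, W):
    """Estimate y = alpha + rho * W y + X beta + eps by two stage least squares.

    Returns the coefficients ordered as [alpha, beta_1..beta_k, rho].
    """
    n = len(y)
    exog = np.column_stack([np.ones(n), X])  # constant plus exogenous variables
    Wy = W @ y  # endogenous spatial lag of the response
    instruments = np.column_stack([exog, W @ X])  # assumed instrument set: [1, X, WX]

    # Stage 1: project the spatial lag onto the instruments.
    first_stage, *_ = np.linalg.lstsq(instruments, Wy, rcond=None)
    Wy_hat = instruments @ first_stage

    # Stage 2: regress y on the exogenous variables and the fitted lag.
    design = np.column_stack([exog, Wy_hat])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated check on a ring lattice where each bucket has two neighbours.
rng = np.random.default_rng(0)
n = 400
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx - 1) % n] = 0.5
W[idx, (idx + 1) % n] = 0.5

X = rng.normal(size=(n, 1))
alpha, beta, rho = 1.0, 2.0, 0.5
eps = 0.1 * rng.normal(size=n)
# Reduced form: y = (I - rho W)^{-1} (alpha + X beta + eps)
y = np.linalg.solve(np.eye(n) - rho * W, alpha + X[:, 0] * beta + eps)

alpha_hat, beta_hat, rho_hat = s2sls_lag(y, X, W)
print(alpha_hat, beta_hat, rho_hat)
```

Ordinary least squares on [1, X, Wy] would be biased because Wy depends on the error term; replacing Wy with its projection onto the instruments removes that dependence.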

Args:
adata: An adata object that has spatial information (via spatial_key key in adata.obsm).
group: The key to the cell group in the adata object.
spatial_key: The key in adata.obsm that holds the spatial coordinates of each bucket.
genes: The genes to use for the S2SLS analysis; they must be present in the data.
drop_dummy: The name of the dummy group.
n_neighbors: The number of nearest neighbors of each bucket used to calculate the spatial lag.
layer: The key to the layer. If it is None, adata.X is used by default.
copy: Whether to copy the adata object.

Returns:
Depending on the copy argument, returns either a deep copy of the adata object (when copy = True) or the adata object updated in place. The resulting adata includes the following new columns in adata.var:

{*}_GM_lag_coeff: coefficient of the GM test for each cell group (denoted by {*}).
{*}_GM_lag_zstat: z-score of the GM test for each cell group (denoted by {*}).
{*}_GM_lag_pval: p-value of the GM test for each cell group (denoted by {*}).

Examples:

>>> import spateo as st
>>> st.tl.GM_lag_model(adata, group='simpleanno')
>>> coef_cols = adata.var.columns[adata.var.columns.str.endswith('_GM_lag_coeff')]
>>> adata.var.loc[["Hbb-bt", "Hbb-bh1", "Hbb-y", "Hbb-bs"], :].T
>>> for i in coef_cols[1:-1]:
...     print(i)
...     top_markers = adata.var.sort_values(i, ascending=False).index[:5]
...     st.pl.space(adata, basis='spatial', color=top_markers, ncols=5, pointsize=0.1, alpha=1)
...     st.pl.space(adata.copy(), basis='spatial', color=['simpleanno'],
...         highlights=[i.split('_GM_lag_coeff')[0]], pointsize=0.1, alpha=1, show_legend='on data')
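The column naming scheme in the Returns section can also be used to rank genes per cell group without plotting. The sketch below works on a toy table that stands in for adata.var; the gene names, group names and values are made up, and the p-value cutoff of 0.05 is an arbitrary choice for illustration.

```python
import pandas as pd

# Toy stand-in for adata.var after GM_lag_model (hypothetical values).
var = pd.DataFrame(
    {
        "neuron_GM_lag_coeff": [2.1, 0.1, -0.3],
        "neuron_GM_lag_pval": [0.001, 0.40, 0.20],
        "glia_GM_lag_coeff": [-0.2, 1.5, 0.0],
        "glia_GM_lag_pval": [0.30, 0.01, 0.90],
    },
    index=["geneA", "geneB", "geneC"],
)

suffix = "_GM_lag_coeff"
coef_cols = var.columns[var.columns.str.endswith(suffix)]

top = {}
for col in coef_cols:
    group = col[: -len(suffix)]  # recover the cell group from the column name
    significant = var[var[f"{group}_GM_lag_pval"] < 0.05]
    # Keep the gene with the largest coefficient among the significant ones.
    top[group] = significant.sort_values(col, ascending=False).index[:1].tolist()

print(top)
```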
