spateo.tools.lisa

Spatial markers.

Module Contents

Functions

lisa_geo_df(→ geopandas.GeoDataFrame)

Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas dataframe for downstream LISA plots.

local_moran_i(adata, group[, spatial_key, genes, ...])

Identify cell type specific genes with local Moran's I test.

GM_lag_model(adata, group[, spatial_key, genes, ...])

Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).

spateo.tools.lisa.lisa_geo_df(adata: anndata.AnnData, gene: str, spatial_key: str = 'spatial', n_neighbors: int = 8, layer: Tuple[None, str] = None) → geopandas.GeoDataFrame

Perform Local Indicators of Spatial Association (LISA) analyses on a specific gene and prepare a geopandas dataframe for downstream LISA plots, which show the quantile plots and the hotspot, coldspot, doughnut and diamond regions.

Parameters:
adata

An adata object that has spatial information (via spatial_key key in adata.obsm).

gene

The gene that will be used for LISA analyses. It must be included in the data.

spatial_key

The spatial key of the spatial coordinate of each bucket.

n_neighbors

The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.

layer

The key to the layer. If it is None, adata.X will be used by default.
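The spatial lag that n_neighbors controls can be illustrated with a small NumPy sketch. It assumes a brute-force k-nearest-neighbor search and row-standardized weights, so each bucket's lag is the plain average over its neighbors. The helper name `knn_spatial_lag` is hypothetical and is not part of spateo; the library's own neighbor search and weighting may differ.

```python
import numpy as np

def knn_spatial_lag(coords, values, n_neighbors=8):
    """Average `values` over each point's k nearest neighbors (hypothetical helper)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Pairwise squared Euclidean distances (brute force, fine for small inputs).
    diff = coords[:, None, :] - coords[None, :, :]
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, np.inf)  # a point is never its own neighbor
    # Indices of the k closest points for every row.
    nbrs = np.argsort(dist2, axis=1)[:, :n_neighbors]
    # Row-standardized weights reduce to a simple mean over the neighbors.
    return values[nbrs].mean(axis=1)

# Four buckets on a line; each lag is the mean of the 2 nearest buckets.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
expr = np.array([1.0, 2.0, 3.0, 10.0])
lag = knn_spatial_lag(coords, expr, n_neighbors=2)
print(lag)
```

A larger n_neighbors smooths the lag over a wider area, which tends to blur small hotspots.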

Returns:

A geopandas dataframe that includes the coordinates (x, y columns), the expression (exp column) and lagged expression (w_exp column), their z-scores (exp_zscore, w_exp_zscore columns) and the LISA score (Is column).

Return type:

geopandas.GeoDataFrame
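To make the four region names concrete, here is a minimal sketch of a local Moran's I computation and quadrant labeling. It uses the textbook form I_i = z_i * (Wz)_i with z standardized by the population standard deviation, and the naming convention common in PySAL tutorials: hotspot (high value, high neighbors), coldspot (low, low), doughnut (low value, high neighbors) and diamond (high value, low neighbors). The function `local_moran_quadrants` and the dense weight matrix `W` are assumptions for illustration. Scaling constants differ between implementations, and the permutation-based significance test that LISA analyses usually include is left out.

```python
import numpy as np

def local_moran_quadrants(x, W):
    """Local Moran's I and quadrant labels for values x and row-standardized W."""
    x = np.asarray(x, dtype=float)
    W = np.asarray(W, dtype=float)
    z = (x - x.mean()) / x.std()  # population std (ddof=0)
    lag_z = W @ z                 # neighbor average of the z-scores
    local_i = z * lag_z
    # Ties at exactly zero fall into "diamond" or "coldspot" in this sketch.
    labels = np.where(
        z > 0,
        np.where(lag_z > 0, "hotspot", "diamond"),
        np.where(lag_z > 0, "doughnut", "coldspot"),
    )
    return local_i, labels

# Six buckets with hand-written neighbor sets (each row of W sums to 1).
x = [10, 10, 0, 0, 10, 0]
W = np.zeros((6, 6))
W[0, 1] = W[1, 0] = 1.0    # two high buckets next to each other
W[2, 3] = W[3, 2] = 1.0    # two low buckets next to each other
W[4, [2, 3]] = 0.5         # high bucket surrounded by low ones
W[5, [0, 1]] = 0.5         # low bucket surrounded by high ones
local_i, labels = local_moran_quadrants(x, W)
print(list(zip(labels.tolist(), local_i.tolist())))
```

Positive scores mark buckets that resemble their neighborhood (hotspots and coldspots); negative scores mark spatial outliers (doughnuts and diamonds).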

spateo.tools.lisa.local_moran_i(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, layer: Tuple[None, str] = None, n_neighbors: int = 5, copy: bool = False, n_jobs: int = 30)

Identify cell type specific genes with local Moran’s I test.

Parameters:
adata

An adata object that has spatial information (via spatial_key key in adata.obsm).

group

The key to the cell group in the adata.obs.

spatial_key

The spatial key of the spatial coordinate of each bucket.

genes

The genes that will be used for LISA analyses. Each must be included in the data.

layer

The key to the layer. If it is None, adata.X will be used by default.

n_neighbors

The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.

copy

Whether to copy the adata object.

Returns:

Depending on the copy argument, return a deep-copied adata object (when copy = True) or the adata object updated in place. The resulting adata will include the following new columns in adata.var:

{*}_num_val: The maximum number of categories ({"hotspot", "coldspot", "doughnut", "diamond"}) across all cell groups.

{*}_frac_val: The maximum fraction of categories across all cell groups.

{*}_spec_val: The maximum specificity of categories across all cell groups.

{*}_num_group: The corresponding cell group with the largest number of each category (this can be affected by the cell group size).

{*}_frac_group: The corresponding cell group with the highest fraction of each category.

{*}_spec_group: The corresponding cell group with the highest specificity of each category.

{*} can be one of {"hotspot", "coldspot", "doughnut", "diamond"}.
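The difference between the num, frac and spec summaries can be seen on a toy table. The sketch below is only an illustration under stated assumptions: it takes "number" to be the count of hotspot buckets in a group, "fraction" to be that count divided by the group size, and "specificity" to be the share of all hotspot buckets that fall in the group. These definitions are not confirmed by the documentation above, and the column names `group` and `category` are hypothetical.

```python
import pandas as pd

# One row per bucket: its cell group and its LISA category for a single gene.
obs = pd.DataFrame(
    {
        "group": ["A"] * 6 + ["B"] * 2,
        "category": ["hotspot", "hotspot", "hotspot", "coldspot",
                     "coldspot", "diamond", "hotspot", "hotspot"],
    }
)

is_hot = obs["category"].eq("hotspot")
num = is_hot.groupby(obs["group"]).sum()    # hotspot buckets per group
frac = is_hot.groupby(obs["group"]).mean()  # assumed: share of the group that is hotspot
spec = num / num.sum()                      # assumed: share of all hotspots in the group

summary = {
    "hotspot_num_val": int(num.max()), "hotspot_num_group": num.idxmax(),
    "hotspot_frac_val": float(frac.max()), "hotspot_frac_group": frac.idxmax(),
    "hotspot_spec_val": float(spec.max()), "hotspot_spec_group": spec.idxmax(),
}
print(summary)
```

Group A wins on count only because it is larger, while the smaller group B has the higher fraction, which is the group-size effect noted for {*}_num_group.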

Examples:

>>> import pandas as pd
>>> import spateo as st
>>> markers_df = (
...     pd.DataFrame(adata.var)
...     .query("hotspot_frac_val > 0.05 & mean > 0.05")
...     .groupby(['hotspot_spec_group'])['hotspot_spec_val']
...     .nlargest(5)
... )
>>> markers = markers_df.index.get_level_values(1)
>>>
>>> for i in adata.obs[group].unique():
...     if i in markers_df.index.get_level_values(0):
...         print(markers_df[i])
...         st.pl.space(adata, color=group, highlights=[i], pointsize=0.1, alpha=1, figsize=(12, 8))
...         st.pl.space(adata, color=markers_df[i].index, pointsize=0.1, alpha=1, figsize=(12, 8))
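The marker selection in the example relies on the two-level index that a grouped nlargest returns: level 0 holds the cell group and level 1 holds the gene. The following sketch reproduces that pattern on a small made-up table standing in for adata.var; the gene names and values are invented for illustration only.

```python
import pandas as pd

# Toy stand-in for adata.var after running the local Moran's I analysis.
var = pd.DataFrame(
    {
        "hotspot_frac_val": [0.20, 0.10, 0.01, 0.30],
        "mean": [0.5, 0.2, 0.9, 0.4],
        "hotspot_spec_group": ["A", "A", "A", "B"],
        "hotspot_spec_val": [0.9, 0.7, 0.8, 0.6],
    },
    index=["gene1", "gene2", "gene3", "gene4"],
)

markers_df = (
    var.query("hotspot_frac_val > 0.05 & mean > 0.05")   # drops gene3
    .groupby("hotspot_spec_group")["hotspot_spec_val"]
    .nlargest(2)                                          # top 2 genes per group
)

groups = markers_df.index.get_level_values(0).unique().tolist()
markers = markers_df.index.get_level_values(1).tolist()
genes_for_a = markers_df["A"].index.tolist()  # genes selected for group "A"
print(groups, markers, genes_for_a)
```

Indexing the result with a group name, as in `markers_df[i]`, returns a Series whose index is the list of genes to plot for that group.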

spateo.tools.lisa.GM_lag_model(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, drop_dummy: Tuple[None, str] = None, n_neighbors: int = 5, layer: Tuple[None, str] = None, copy: bool = False, n_jobs=30)

Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).

math:

`\log{P_i} = \alpha + \rho \log{P_{lag-i}} + \sum_k \beta_k X_{ki} + \epsilon_i`

Reference:

https://geographicdata.science/book/notebooks/11_regression.html

http://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html
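For readers unfamiliar with S2SLS, the estimator behind a spatial lag model can be sketched in a few lines of NumPy. This is a generic illustration and not spateo's or PySAL's code: it writes the model as y = rho * W y + X beta + eps, uses X together with its first and second spatial lags as instruments for the endogenous term W y, and checks the estimates on simulated data. The function name `s2sls`, the simulated coordinates and the chosen coefficients are all assumptions made for the example.

```python
import numpy as np

def s2sls(y, X, W):
    """Spatial two-stage least squares; X must have the constant as column 0.

    Returns the coefficients for the columns of X followed by rho.
    """
    Wy = W @ y
    X_vars = X[:, 1:]  # skip the constant: its spatial lag is again constant
    # Instruments: exogenous regressors plus their first and second spatial lags.
    H = np.column_stack([X, W @ X_vars, W @ (W @ X_vars)])
    Z = np.column_stack([X, Wy])
    # Stage 1: project all regressors onto the instrument space.
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]
    # Stage 2: ordinary least squares of y on the projected regressors.
    return np.linalg.lstsq(Z_hat, y, rcond=None)[0]

# Simulate data from a known spatial lag process on random 2-D coordinates.
rng = np.random.default_rng(0)
n, k = 300, 5
coords = rng.uniform(size=(n, 2))
dist2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
np.fill_diagonal(dist2, np.inf)
nbrs = np.argsort(dist2, axis=1)[:, :k]
W = np.zeros((n, n))
W[np.arange(n)[:, None], nbrs] = 1.0 / k  # row-standardized kNN weights

alpha, beta, rho = 1.0, 2.0, 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.1, size=n)
# Reduced form: y = (I - rho W)^{-1} (X b + eps)
y = np.linalg.solve(np.eye(n) - rho * W, X @ np.array([alpha, beta]) + eps)

alpha_hat, beta_hat, rho_hat = s2sls(y, X, W)
print(alpha_hat, beta_hat, rho_hat)
```

Plain least squares on W y would be biased because W y depends on the error term; instrumenting it with spatially lagged regressors is what the two stages are for.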

Args:

adata: An adata object that has spatial information (via spatial_key key in adata.obsm).

group: The key to the cell group in the adata object.

spatial_key: The spatial key of the spatial coordinate of each bucket.

genes: The genes that will be used for S2SLS analyses. Each must be included in the data.

drop_dummy: The name of the dummy group.

n_neighbors: The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.

layer: The key to the layer. If it is None, adata.X will be used by default.

copy: Whether to copy the adata object.
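One plausible reading of drop_dummy, offered here as an assumption and not as documented behavior, is that cell groups enter the regression as indicator (dummy) columns and that one named group is left out to serve as the reference level. The sketch below shows that encoding with pandas; the group labels and the choice of reference are made up.

```python
import pandas as pd

# Hypothetical cell-group annotation for four buckets.
groups = pd.Series(["neuron", "glia", "blood", "neuron"], name="simpleanno")

# One indicator column per group.
dummies = pd.get_dummies(groups, dtype=float)

# Drop one group so the remaining columns are not collinear with an intercept.
reference = "blood"  # assumed role of a drop_dummy-style argument
X = dummies.drop(columns=reference)
print(X)
```

With this encoding, each remaining coefficient is read relative to the dropped group, and buckets from the reference group have zeros in every column.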

Returns:

Depending on the copy argument, return a deep-copied adata object (when copy = True) or the adata object updated in place. The resulting adata will include the following new columns in adata.var:

{*}_GM_lag_coeff: coefficient of GM test for each cell group (denoted by {*})

{*}_GM_lag_zstat: z-score of GM test for each cell group (denoted by {*})

{*}_GM_lag_pval: p-value of GM test for each cell group (denoted by {*})

Examples:

>>> import spateo as st
>>> st.tl.GM_lag_model(adata, group='simpleanno')
>>> coef_cols = adata.var.columns[adata.var.columns.str.endswith('_GM_lag_coeff')]
>>> adata.var.loc[["Hbb-bt", "Hbb-bh1", "Hbb-y", "Hbb-bs"], :].T
>>> for i in coef_cols[1:-1]:
...     print(i)
...     top_markers = adata.var.sort_values(i, ascending=False).index[:5]
...     st.pl.space(adata, basis='spatial', color=top_markers, ncols=5, pointsize=0.1, alpha=1)
...     st.pl.space(adata.copy(), basis='spatial', color=['simpleanno'],
...         highlights=[i.split('_GM_lag_coeff')[0]], pointsize=0.1, alpha=1, show_legend='on data')
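The column handling in the example can be tried without spatial data. The sketch below builds a small made-up table shaped like the adata.var output described above, selects the coefficient columns by suffix, and ranks genes per cell group. The group names A and B, the gene names and all numbers are invented for illustration.

```python
import pandas as pd

# Toy stand-in for adata.var after fitting the spatial lag model.
var = pd.DataFrame(
    {
        "A_GM_lag_coeff": [0.8, 0.1, -0.2],
        "A_GM_lag_pval": [0.001, 0.40, 0.30],
        "B_GM_lag_coeff": [-0.1, 0.9, 0.2],
        "B_GM_lag_pval": [0.50, 0.002, 0.20],
    },
    index=["gene1", "gene2", "gene3"],
)

suffix = "_GM_lag_coeff"
coef_cols = var.columns[var.columns.str.endswith(suffix)]

# For every group, keep the two genes with the largest coefficient.
top_markers = {
    col.split(suffix)[0]: var.sort_values(col, ascending=False).index[:2].tolist()
    for col in coef_cols
}
print(top_markers)
```

Stripping the suffix recovers the cell group name, which is what the example passes to highlights when plotting.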