spateo.tools.lisa
Spatial markers.
Functions
- lisa_geo_df: Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas dataframe for downstream LISA plots.
- local_moran_i: Identify cell type specific genes with local Moran's I test.
- GM_lag_model: Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).
Module Contents
- spateo.tools.lisa.lisa_geo_df(adata: anndata.AnnData, gene: str, spatial_key: str = 'spatial', n_neighbors: int = 8, layer: Tuple[None, str] = None) -> geopandas.GeoDataFrame
Perform Local Indicators of Spatial Association (LISA) analyses on a specific gene and prepare a geopandas dataframe for downstream LISA plots, which show the quantile plots and the hotspot, coldspot, doughnut and diamond regions.
- Parameters:
- adata
An adata object that has spatial information (via spatial_key key in adata.obsm).
- gene
The gene to use for the LISA analysis. It must be present in the data.
- spatial_key
The spatial key of the spatial coordinate of each bucket.
- n_neighbors
The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.
- layer
The key to the layer. If it is None, adata.X will be used by default.
- Returns:
A geopandas dataframe that includes the coordinates (x, y columns), expression (exp column), lagged expression (w_exp column), z-scores (exp_zscore, w_exp_zscore columns) and the LISA score (Is column).
- Return type:
geopandas.GeoDataFrame
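The quantities described above can be illustrated without spateo. The following is a minimal NumPy sketch, assuming row-standardized k-nearest-neighbor weights and the textbook local Moran's I (the z-score of a bucket times the spatial lag of the z-scores). The helper names `knn_weights` and `local_moran_sketch` and the `lag_z` and `quadrant` keys are hypothetical, and the library's own scaling, column definitions and significance testing may differ. The quadrant labels follow the usual PySAL convention (high-high hotspot, low-low coldspot, low-high doughnut, high-low diamond) and ignore significance.

```python
import numpy as np


def knn_weights(coords, n_neighbors=8):
    """Row-standardized k-nearest-neighbor weight matrix (hypothetical helper)."""
    n = coords.shape[0]
    # Pairwise Euclidean distances; a bucket is never its own neighbor.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    nearest = np.argsort(dist, axis=1)[:, :n_neighbors]
    w = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    w[rows, nearest.ravel()] = 1.0 / n_neighbors
    return w


def local_moran_sketch(coords, exp, n_neighbors=8):
    """Spatial lag, z-scores, local Moran's I and quadrant labels (a sketch)."""
    w = knn_weights(coords, n_neighbors)
    z = (exp - exp.mean()) / exp.std()
    lag_z = w @ z        # spatial lag of the standardized expression
    local_i = z * lag_z  # textbook local Moran's I
    quadrant = np.where(
        z > 0,
        np.where(lag_z > 0, "hotspot", "diamond"),
        np.where(lag_z > 0, "doughnut", "coldspot"),
    )
    return {"exp": exp, "w_exp": w @ exp, "lag_z": lag_z, "Is": local_i, "quadrant": quadrant}


# Toy data: a 10 x 10 grid with high expression on the left half.
coords = np.array([(x, y) for x in range(10) for y in range(10)], dtype=float)
exp = np.where(coords[:, 0] < 5, 1.0, 0.0)
result = local_moran_sketch(coords, exp)
print(result["quadrant"][0], result["quadrant"][-1])
```

A bucket deep inside the high-expression half is surrounded by high neighbors and is labelled a hotspot, while one deep inside the low half is labelled a coldspot.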
- spateo.tools.lisa.local_moran_i(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, layer: Tuple[None, str] = None, n_neighbors: int = 5, copy: bool = False, n_jobs: int = 30)
Identify cell type specific genes with local Moran’s I test.
- Parameters:
- adata
An adata object that has spatial information (via spatial_key key in adata.obsm).
- group
The key to the cell group in the adata.obs.
- spatial_key
The spatial key of the spatial coordinate of each bucket.
- genes
The genes to use for the LISA analyses. They must be present in the data.
- layer
The key to the layer. If it is None, adata.X will be used by default.
- n_neighbors
The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.
- copy
Whether to copy the adata object.
- Returns:
Depending on the copy argument, returns a deep-copied adata object (when copy = True) or the adata object updated in place. The resulting adata will include the following new columns in adata.var:
- {*}_num_val: The maximum number of categories ({"hotspot", "coldspot", "doughnut", "diamond"}) across all cell groups.
- {*}_frac_val: The maximum fraction of categories across all cell groups.
- {*}_spec_val: The maximum specificity of categories across all cell groups.
- {*}_num_group: The corresponding cell group with the largest number of each category (this can be affected by the cell group size).
- {*}_frac_group: The corresponding cell group with the highest fraction of each category.
- {*}_spec_group: The corresponding cell group with the highest specificity of each category.
{*} can be one of {"hotspot", "coldspot", "doughnut", "diamond"}.
Examples:
>>> import pandas as pd
>>> import spateo as st
>>> # `adata` and `group` must already be defined.
>>> markers_df = (
...     pd.DataFrame(adata.var)
...     .query("hotspot_frac_val > 0.05 & mean > 0.05")
...     .groupby(["hotspot_spec_group"])["hotspot_spec_val"]
...     .nlargest(5)
... )
>>> markers = markers_df.index.get_level_values(1)
>>>
>>> for i in adata.obs[group].unique():
...     if i in markers_df.index.get_level_values(0):
...         print(markers_df[i])
...         st.pl.space(adata, color=group, highlights=[i], pointsize=0.1, alpha=1, figsize=(12, 8))
...         st.pl.space(adata, color=markers_df[i].index, pointsize=0.1, alpha=1, figsize=(12, 8))
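The marker selection in the example above relies on a pandas filter, group and nlargest chain. The sketch below replays that chain on a toy table so the shape of the result is easier to see. The table stands in for adata.var after running local_moran_i; its gene names, group names and values are invented for illustration only.

```python
import pandas as pd

# Toy stand-in for adata.var; all values are made up for illustration.
var = pd.DataFrame(
    {
        "hotspot_frac_val": [0.20, 0.10, 0.01, 0.30, 0.15],
        "mean": [0.5, 0.2, 0.9, 0.4, 0.02],
        "hotspot_spec_group": ["A", "A", "A", "B", "B"],
        "hotspot_spec_val": [0.9, 0.7, 0.8, 0.6, 0.5],
    },
    index=["g1", "g2", "g3", "g4", "g5"],
)

top_n = 1  # the docstring example keeps the top 5 per group
markers_df = (
    var.query("hotspot_frac_val > 0.05 & mean > 0.05")  # drop rare or lowly expressed genes
    .groupby("hotspot_spec_group")["hotspot_spec_val"]
    .nlargest(top_n)                                     # most specific genes per cell group
)

# The result is a Series indexed by (cell group, gene).
markers = markers_df.index.get_level_values(1)
print(markers_df)
print(list(markers_df["A"].index))
```

Because the result carries a two-level index, `markers_df[i]` selects the genes of one cell group and `get_level_values(1)` collects all selected genes, which is how the plotting loop in the example uses it.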
- spateo.tools.lisa.GM_lag_model(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, drop_dummy: Tuple[None, str] = None, n_neighbors: int = 5, layer: Tuple[None, str] = None, copy: bool = False, n_jobs=30)
Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988).
- math:
`\log{P_i} = \alpha + \rho \log{P_{lag-i}} + \sum_k \beta_k X_{ki} + \epsilon_i`
- Reference:
https://geographicdata.science/book/notebooks/11_regression.html
http://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html
- Parameters:
- adata
An adata object that has spatial information (via spatial_key key in adata.obsm).
- group
The key to the cell group in the adata object.
- spatial_key
The spatial key of the spatial coordinate of each bucket.
- genes
The genes that will be used for S2SLS analyses. They must be present in the data.
- drop_dummy
The name of the dummy group.
- n_neighbors
The number of nearest neighbors of each bucket that will be used in calculating the spatial lag.
- layer
The key to the layer. If it is None, adata.X will be used by default.
- copy
Whether to copy the adata object.
- Returns:
Depending on the copy argument, returns a deep-copied adata object (when copy = True) or the adata object updated in place. The resulting adata will include the following new columns in adata.var:
- {*}_GM_lag_coeff: coefficient of GM test for each cell group (denoted by {*})
- {*}_GM_lag_zstat: z-score of GM test for each cell group (denoted by {*})
- {*}_GM_lag_pval: p-value of GM test for each cell group (denoted by {*})
Examples:
>>> import spateo as st
>>> st.tl.GM_lag_model(adata, group='simpleanno')
>>> coef_cols = adata.var.columns[adata.var.columns.str.endswith('_GM_lag_coeff')]
>>> adata.var.loc[["Hbb-bt", "Hbb-bh1", "Hbb-y", "Hbb-bs"], :].T
>>> for i in coef_cols[1:-1]:
...     print(i)
...     top_markers = adata.var.sort_values(i, ascending=False).index[:5]
...     st.pl.space(adata, basis='spatial', color=top_markers, ncols=5, pointsize=0.1, alpha=1)
...     st.pl.space(adata.copy(), basis='spatial', color=['simpleanno'],
...                 highlights=[i.split('_GM_lag_coeff')[0]], pointsize=0.1, alpha=1, show_legend='on data')
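For readers unfamiliar with S2SLS, the idea behind the spatial lag model y = alpha + rho * W y + X beta + eps can be sketched in plain NumPy. This is an illustrative sketch only, not the routine GM_lag_model calls: the function name `s2sls_sketch` is hypothetical, the instruments are assumed to be the constant, X and the first-order lag W X, and no diagnostics, z-statistics or p-values are computed. In GM_lag_model the regressors appear to be cell-group indicators (judging by drop_dummy and the per-group coefficients), whereas X here is generic simulated data.

```python
import numpy as np


def s2sls_sketch(y, X, W):
    """Two-stage least squares for y = alpha + rho * W y + X beta + eps.

    X must not contain a constant column; an intercept is added here.
    Returns (alpha_hat, rho_hat, beta_hat).
    """
    n = len(y)
    ones = np.ones((n, 1))
    wy = W @ y  # endogenous spatial lag of the response
    # Assumed instrument set: constant, X and spatially lagged X.
    H = np.hstack([ones, X, W @ X])
    # Stage 1: project W y onto the instruments.
    wy_hat = H @ np.linalg.lstsq(H, wy, rcond=None)[0]
    # Stage 2: ordinary least squares of y on constant, fitted lag and X.
    Z = np.hstack([ones, wy_hat[:, None], X])
    coef = np.linalg.lstsq(Z, y, rcond=None)[0]
    return coef[0], coef[1], coef[2:]


# Simulated data with known parameters (all values are illustrative).
rng = np.random.default_rng(0)
n, k = 400, 5
coords = rng.uniform(size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
np.fill_diagonal(dist, np.inf)
W = np.zeros((n, n))  # row-standardized k-nearest-neighbor weights
W[np.repeat(np.arange(n), k), np.argsort(dist, axis=1)[:, :k].ravel()] = 1.0 / k

X = rng.normal(size=(n, 2))
alpha, rho, beta = 0.5, 0.4, np.array([1.0, -2.0])
eps = rng.normal(scale=0.1, size=n)
# Reduced form: y = (I - rho W)^{-1} (alpha + X beta + eps).
y = np.linalg.solve(np.eye(n) - rho * W, alpha + X @ beta + eps)

alpha_hat, rho_hat, beta_hat = s2sls_sketch(y, X, W)
print(rho_hat, beta_hat)
```

The first stage is needed because W y is correlated with the error term, so regressing y on W y directly would bias rho. Replacing W y with its projection onto exogenous instruments removes that correlation.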