spateo.tools.lisa
=================

.. py:module:: spateo.tools.lisa

.. autoapi-nested-parse::

   Spatial markers.


Functions
---------

.. autoapisummary::

   spateo.tools.lisa.lisa_geo_df
   spateo.tools.lisa.local_moran_i
   spateo.tools.lisa.GM_lag_model


Module Contents
---------------

.. py:function:: lisa_geo_df(adata: anndata.AnnData, gene: str, spatial_key: str = 'spatial', n_neighbors: int = 8, layer: Tuple[None, str] = None) -> geopandas.GeoDataFrame

   Perform a Local Indicators of Spatial Association (LISA) analysis on a specific gene and prepare a geopandas
   dataframe for downstream LISA plots that show the quantile plots and the hotspot, coldspot, doughnut and
   diamond regions.

   :param adata: An adata object that has spatial information (via the `spatial_key` key in `adata.obsm`).
   :param gene: The gene to use for the LISA analysis; it must be included in the data.
   :param spatial_key: The key in `adata.obsm` that holds the spatial coordinates of each bucket.
   :param n_neighbors: The number of nearest neighbors of each bucket used to calculate the spatial lag.
   :param layer: The key to the layer. If it is None, `adata.X` is used by default.

   :returns: A geopandas dataframe that includes the coordinates (`x`, `y` columns), the expression (`exp` column),
      the lagged expression (`w_exp` column), the z-scores (`exp_zscore`, `w_exp_zscore` columns) and the LISA
      score (`Is` column).
   :rtype: geopandas.GeoDataFrame

.. py:function:: local_moran_i(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, layer: Tuple[None, str] = None, n_neighbors: int = 5, copy: bool = False, n_jobs: int = 30)

   Identify cell type specific genes with the local Moran's I test.

   :param adata: An adata object that has spatial information (via the `spatial_key` key in `adata.obsm`).
   :param group: The key to the cell group in `adata.obs`.
   :param spatial_key: The key in `adata.obsm` that holds the spatial coordinates of each bucket.
   :param genes: The genes to use for the LISA analyses; they must be included in the data.
   :param layer: The key to the layer. If it is None, `adata.X` is used by default.
   :param n_neighbors: The number of nearest neighbors of each bucket used to calculate the spatial lag.
   :param copy: Whether to copy the adata object.

   :returns: Depending on the `copy` argument, either a deep copy of the adata object (when `copy = True`) or the
      adata object updated in place. The resulting adata will include the following new columns in `adata.var`:

      - {*}_num_val: The maximum number of buckets in each category across all cell groups.
      - {*}_frac_val: The maximum fraction of each category across all cell groups.
      - {*}_spec_val: The maximum specificity of each category across all cell groups.
      - {*}_num_group: The cell group with the largest number of each category (this can be affected by the
        cell group size).
      - {*}_frac_group: The cell group with the highest fraction of each category.
      - {*}_spec_group: The cell group with the highest specificity of each category.

      {*} can be one of `{"hotspot", "coldspot", "doughnut", "diamond"}`.

   Examples:

   >>> import pandas as pd
   >>> import spateo as st
   >>> markers_df = (
   ...     pd.DataFrame(adata.var)
   ...     .query("hotspot_frac_val > 0.05 & mean > 0.05")
   ...     .groupby(['hotspot_spec_group'])['hotspot_spec_val']
   ...     .nlargest(5)
   ... )
   >>> markers = markers_df.index.get_level_values(1)
   >>> # `group` is the same `adata.obs` key that was passed to `local_moran_i`.
   >>> for i in adata.obs[group].unique():
   ...     if i in markers_df.index.get_level_values(0):
   ...         print(markers_df[i])
   ...         st.pl.space(adata, color=group, highlights=[i], pointsize=0.1, alpha=1, figsize=(12, 8))
   ...         st.pl.space(adata, color=markers_df[i].index, pointsize=0.1, alpha=1, figsize=(12, 8))

.. py:function:: GM_lag_model(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, drop_dummy: Tuple[None, str] = None, n_neighbors: int = 5, layer: Tuple[None, str] = None, copy: bool = False, n_jobs=30)

   Fit a spatial lag model by spatial two-stage least squares (S2SLS), with results and diagnostics; see
   Anselin (1988).

   :math:`\log{P_i} = \alpha + \rho \log{P_{lag-i}} + \sum_k \beta_k X_{ki} + \epsilon_i`

   Reference:
       https://geographicdata.science/book/notebooks/11_regression.html
       http://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html

   :param adata: An adata object that has spatial information (via the `spatial_key` key in `adata.obsm`).
   :param group: The key to the cell group in the adata object.
   :param spatial_key: The key in `adata.obsm` that holds the spatial coordinates of each bucket.
   :param genes: The genes to use for the S2SLS analyses; they must be included in the data.
   :param drop_dummy: The name of the dummy group.
   :param n_neighbors: The number of nearest neighbors of each bucket used to calculate the spatial lag.
   :param layer: The key to the layer. If it is None, `adata.X` is used by default.
   :param copy: Whether to copy the adata object.

   :returns: Depending on the `copy` argument, either a deep copy of the adata object (when `copy = True`) or the
      adata object updated in place.
      The resulting adata will include the following new columns in `adata.var`:

      - {*}_GM_lag_coeff: The coefficient of the GM test for each cell group (denoted by {*}).
      - {*}_GM_lag_zstat: The z-score of the GM test for each cell group (denoted by {*}).
      - {*}_GM_lag_pval: The p-value of the GM test for each cell group (denoted by {*}).

   Examples:

   >>> import spateo as st
   >>> st.tl.GM_lag_model(adata, group='simpleanno')
   >>> coef_cols = adata.var.columns[adata.var.columns.str.endswith('_GM_lag_coeff')]
   >>> adata.var.loc[["Hbb-bt", "Hbb-bh1", "Hbb-y", "Hbb-bs"], :].T
   >>> for i in coef_cols[1:-1]:
   ...     print(i)
   ...     top_markers = adata.var.sort_values(i, ascending=False).index[:5]
   ...     st.pl.space(adata, basis='spatial', color=top_markers, ncols=5, pointsize=0.1, alpha=1)
   ...     st.pl.space(adata.copy(), basis='spatial', color=['simpleanno'],
   ...         highlights=[i.split('_GM_lag_coeff')[0]], pointsize=0.1, alpha=1, show_legend='on data')
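
All three functions in this module rely on a spatial lag built from the `n_neighbors` nearest buckets: it is the `w_exp` column returned by `lisa_geo_df` and the lagged term in the `GM_lag_model` equation. The following dependency-free sketch illustrates the idea on toy data. It is not spateo's implementation: the helper names (`knn_weights`, `spatial_lag`, `zscore`, `local_moran`) are hypothetical, the local statistic is computed only up to a constant scaling factor, significance testing is omitted, and the mapping of quadrants to "hotspot" (high value, high lag), "coldspot" (low, low), "doughnut" (low, high) and "diamond" (high, low) is assumed from the usual LISA convention.

```python
import math
from statistics import mean, pstdev


def knn_weights(coords, n_neighbors):
    """Row-standardized k-nearest-neighbor weights, one {j: w_ij} dict per point."""
    weights = []
    for i, (xi, yi) in enumerate(coords):
        # Rank every other point by its Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        nearest = others[:n_neighbors]
        # Equal weights that sum to 1 for each point.
        weights.append({j: 1.0 / len(nearest) for j in nearest})
    return weights


def spatial_lag(values, weights):
    """Weighted mean of the neighbors' values for every point."""
    return [sum(w * values[j] for j, w in row.items()) for row in weights]


def zscore(values):
    """Standardize values (assumes they are not all identical)."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Assumed quadrant naming: (value above mean, lag above mean) -> label.
QUADRANTS = {
    (True, True): "hotspot",
    (False, False): "coldspot",
    (False, True): "doughnut",
    (True, False): "diamond",
}


def local_moran(values, weights):
    """Unscaled local Moran's I (z_i times the lag of z) and a quadrant label per point."""
    z = zscore(values)
    lag_z = spatial_lag(z, weights)
    local_i = [zi * li for zi, li in zip(z, lag_z)]
    labels = [QUADRANTS[(zi > 0, li > 0)] for zi, li in zip(z, lag_z)]
    return local_i, labels


# Toy data: a high-expression cluster near the origin, a low one far away.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
values = [5, 6, 5, 1, 0, 1]
local_i, labels = local_moran(values, knn_weights(coords, n_neighbors=2))
print(labels)
# ['hotspot', 'hotspot', 'hotspot', 'coldspot', 'coldspot', 'coldspot']
```

Row-standardizing the weights makes the lag a plain average of the neighbors, so it stays on the same scale as the expression values and the sign of each product of z-score and lagged z-score directly indicates whether a bucket resembles its surroundings.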