spateo.tools.cluster
====================

.. py:module:: spateo.tools.cluster


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/spateo/tools/cluster/cluster_spagcn/index
   /autoapi/spateo/tools/cluster/find_clusters/index
   /autoapi/spateo/tools/cluster/leiden/index
   /autoapi/spateo/tools/cluster/spagcn_utils/index
   /autoapi/spateo/tools/cluster/utils/index


Functions
---------

.. autoapisummary::

   spateo.tools.cluster.spagcn_vanilla
   spateo.tools.cluster.scc
   spateo.tools.cluster.spagcn_pyg
   spateo.tools.cluster.compute_pca_components
   spateo.tools.cluster.ecp_silhouette
   spateo.tools.cluster.integrate
   spateo.tools.cluster.pca_spateo
   spateo.tools.cluster.pearson_residuals


Package Contents
----------------

.. py:function:: spagcn_vanilla(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'spagcn_pred', n_pca_components: Optional[int] = None, e_neigh: int = 10, resolution: float = 0.4, n_clusters: Optional[int] = None, refine_shape: Literal['hexagon', 'square'] = 'hexagon', p: float = 0.5, seed: int = 100, numIterMaxSpa: int = 2000, copy: bool = False) -> Optional[anndata.AnnData]

   Integrate gene expression and spatial location to identify spatial domains via SpaGCN.

   Original Code Repository: https://github.com/jianhuupenn/SpaGCN

   Reference: Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nature Methods volume 18, pages 1342–1351 (2021)

   :param adata: An AnnData object after normalization.
   :param spatial_key: The key in `.obsm` that corresponds to the spatial coordinate of each bucket.
   :param key_added: The `adata.obs` key under which to add the cluster labels. The initial clustering results of SpaGCN are stored under `key_added`, and the refined clustering results are stored under `f'{key_added}_refined'`.
   :param n_pca_components: Number of principal components to compute. If `n_pca_components` is None, the value at the inflection point of the PCA curve is automatically calculated and used as n_comps.
   :param e_neigh: Number of nearest neighbors in gene expression space. Used in dyn.pp.neighbors(adata, n_neighbors=e_neigh).
   :param resolution: Resolution in the Louvain clustering method. Used when `n_clusters` is None.
   :param n_clusters: Number of spatial domains wanted. If `n_clusters` is not None, a suitable resolution for the initial Louvain clustering is searched automatically based on `n_clusters`.
   :param refine_shape: Smooth the spatial domains with the given spatial topology: "hexagon" for Visium data, "square" for ST data. Defaults to "hexagon".
   :param p: Percentage of total expression contributed by neighborhoods.
   :param seed: Global seed for `random`, `torch`, `numpy`. Defaults to 100.
   :param numIterMaxSpa: Maximum number of SpaGCN training iterations.
   :param copy: Whether to copy `adata` or modify it in place.

   :returns: Depending on the parameter `copy`: when True, return an updated adata with the fields ``adata.obs[key_added]`` and ``adata.obs[f'{key_added}_refined']`` containing the SpaGCN cluster results; otherwise update the adata object in place.


.. py:function:: scc(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'scc', pca_key: str = 'pca', e_neigh: int = 30, s_neigh: int = 6, resolution: Optional[float] = None) -> Optional[anndata.AnnData]

   Spatially constrained clustering (scc) to identify continuous tissue domains.

   Reference: Ao Chen, Sha Liao, Mengnan Cheng, Kailong Ma, Liang Wu, Yiwei Lai, Xiaojie Qiu, Jin Yang, Wenjiao Li, Jiangshan Xu, Shijie Hao, Xin Wang, Huifang Lu, Xi Chen, Xing Liu, Xin Huang, Feng Lin, Zhao Li, Yan Hong, Defeng Fu, Yujia Jiang, Jian Peng, Shuai Liu, Mengzhe Shen, Chuanyu Liu, Quanshui Li, Yue Yuan, Huiwen Zheng, Zhifeng Wang, H Xiang, L Han, B Qin, P Guo, PM Cánoves, JP Thiery, Q Wu, F Zhao, M Li, H Kuang, J Hui, O Wang, B Wang, M Ni, W Zhang, F Mu, Y Yin, H Yang, M Lisby, RJ Cornall, J Mulder, M Uhlen, MA Esteban, Y Li, L Liu, X Xu, J Wang. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell, 2022.

   :param adata: An AnnData object, after normalization.
   :param spatial_key: The key in `.obsm` that corresponds to the spatial coordinate of each bucket.
   :param key_added: The `adata.obs` key under which to add the cluster labels.
   :param pca_key: Label for the `.obsm` key containing PCA information (without the potential prefix "X_").
   :param e_neigh: The number of nearest neighbors in gene expression space.
   :param s_neigh: The number of nearest neighbors in physical space.
   :param resolution: The resolution parameter of the Louvain clustering algorithm.

   :returns: An `~anndata.AnnData` object with cluster info in `.obs`.
   :rtype: adata


.. py:function:: spagcn_pyg(adata: anndata.AnnData, n_clusters: int, p: float = 0.5, s: int = 1, b: int = 49, refine_shape: Optional[str] = None, his_img_path: Optional[str] = None, total_umi: Optional[str] = None, x_pixel: str = None, y_pixel: str = None, x_array: str = None, y_array: str = None, seed: int = 100, copy: bool = False) -> Optional[anndata.AnnData]

   Find clusters with SpaGCN.

   Reference: Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network.
   Nature Methods volume 18, pages 1342–1351 (2021)

   :param adata: An AnnData object, after normalization.
   :param n_clusters: Desired number of clusters.
   :param p: Parameter `p` in the SpaGCN algorithm. See `SpaGCN` for details. Defaults to 0.5.
   :param s: Alpha, which controls the color scale when calculating the adjacency matrix. Defaults to 1.
   :param b: Beta, which controls the range of the neighbourhood used to calculate the grey value of a spot when calculating the adjacency matrix. Defaults to 49.
   :param refine_shape: Smooth the spatial domains with the given spatial topology: "hexagon" for Visium data, "square" for ST data. Defaults to None.
   :param his_img_path: The file path of the histology image used to calculate the adjacency matrix in the SpaGCN algorithm. Defaults to None.
   :param total_umi: The key (column name) in `adata.obs` that contains the total UMIs (counts) for each spot. When no histology image is provided, the function uses the total counts as a grayscale image. Ignored if `his_img_path` is not `None`. Defaults to None.
   :param x_pixel: The key (column name) in `adata.obs` that contains the corresponding x-pixels in the histology image. Defaults to None.
   :param y_pixel: The key (column name) in `adata.obs` that contains the corresponding y-pixels in the histology image. Defaults to None.
   :param x_array: The key (column name) in `adata.obs` that contains the corresponding x-coordinates. Defaults to None.
   :param y_array: The key (column name) in `adata.obs` that contains the corresponding y-coordinates. Defaults to None.
   :param seed: Global seed for `random`, `torch`, `numpy`. Defaults to 100.
   :param copy: Whether to return a new deep copy of `adata` instead of updating the `adata` object passed in. Defaults to False.

   :returns: `~anndata.AnnData`: An `~anndata.AnnData` object with cluster info in "spagcn_pred", and in "spagcn_pred_refined" if `refine_shape` is set. The adjacency matrix used in the SpaGCN algorithm is saved in `adata.uns["adj_spagcn"]`.
   :rtype: class


.. py:function:: compute_pca_components(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], random_state: Optional[int] = 1, save_curve_img: Optional[str] = None) -> Tuple[Any, int, float]

   Calculate the inflection point of the PCA curve to obtain the number of principal components that the PCA should retain.

   :param matrix: A dense or sparse matrix.
   :param save_curve_img: If `save_curve_img` is not None, save the image of the PCA curve and inflection points.

   :returns: The number of principal components that PCA should retain.
             new_components_stored: Percentage of variance explained by the retained principal components.
   :rtype: new_n_components


.. py:function:: ecp_silhouette(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], cluster_labels: numpy.ndarray) -> float

   Evaluate clustering performance by calculating the Silhouette Coefficient.

   Silhouette analysis is used to choose an optimal value for the clustering resolution. The Silhouette Coefficient is a widely used measure of clustering performance: a higher score relates to a model with better-defined clusters and indicates good separation between the cell types.

   Advantages of the Silhouette Coefficient:

   * The score is bounded between -1 for incorrect clustering and +1 for highly dense clustering. Scores around zero indicate overlapping clusters.
   * The score is higher when clusters are dense and well separated, which relates to a standard concept of a cluster.

   Original Code Repository: https://scikit-learn.org/stable/modules/clustering.html#silhouette-coefficient

   :param matrix: A dense or sparse matrix of features.
   :param cluster_labels: An array of cluster labels, one per sample (row of `matrix`).

   :returns: Mean Silhouette Coefficient for all clusters.

   .. rubric:: Examples

   >>> ecp_silhouette(matrix=adata.obsm["X_pca"], cluster_labels=adata.obs["leiden"].values)


.. py:function:: integrate(adatas: List[anndata.AnnData], batch_key: str = 'slices', fill_value: Union[int, float] = 0) -> anndata.AnnData

   Concatenate all AnnData objects.

   :param adatas: AnnData matrices to concatenate.
   :param batch_key: Add the batch annotation to :attr:`obs` using this key.
   :param fill_value: Scalar value used to fill newly missing values in arrays.

   :returns: The concatenated AnnData, where adata.obs[batch_key] stores a categorical variable labeling the batch.
   :rtype: integrated_adata


.. py:function:: pca_spateo(adata: anndata.AnnData, X_data: Optional[numpy.ndarray] = None, n_pca_components: Optional[int] = None, pca_key: Optional[str] = 'X_pca', genes: Union[list, None] = None, layer: Union[str, None] = None, random_state: Optional[int] = 1)

   Perform PCA for dimensionality reduction.

   :param adata: An AnnData object.
   :param X_data: User-supplied data that will be used directly for dimension reduction.
   :param n_pca_components: The number of principal components that PCA will retain. If None, the inflection point of the PCA curve is calculated to obtain the number of principal components to retain.
   :param pca_key: Add the PCA result to :attr:`obsm` using this key.
   :param genes: The list of genes that will be used to subset the data for dimension reduction and clustering. If `None`, all genes will be used.
   :param layer: The layer that will be used to retrieve data for dimension reduction and clustering. If `None`, ``adata.X`` is used.

   :returns: The processed AnnData, where adata.obsm[pca_key] stores the PCA result.
   :rtype: adata_after_pca


.. py:function:: pearson_residuals(adata: anndata.AnnData, n_top_genes: Optional[int] = 3000, subset: bool = False, theta: float = 100, clip: Optional[float] = None, check_values: bool = True)

   Preprocess UMI count data with analytic Pearson residuals.

   Pearson residuals transform raw UMI counts into a representation where three aims are achieved:

   1. Remove the technical variation that comes from differences in total counts between cells.
   2. Stabilize the mean-variance relationship across genes, i.e. ensure that biological signal from both low and high expression genes can contribute similarly to downstream processing.
   3. Genes that are homogeneously expressed (like housekeeping genes) have small variance, while genes that are differentially expressed (like marker genes) have high variance.

   :param adata: An AnnData object.
   :param n_top_genes: Number of highly variable genes to keep.
   :param subset: If `True`, subset to highly variable genes in place; otherwise merely flag the highly variable genes.
   :param theta: The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and `theta=np.Inf` corresponds to a Poisson model.
   :param clip: Determines if and how residuals are clipped:

                * If `None`, residuals are clipped to the interval [-sqrt(n), sqrt(n)], where n is the number of cells in the dataset (default behavior).
                * If any scalar c, residuals are clipped to the interval [-c, c]. Set `clip=np.Inf` for no clipping.
   :param check_values: Check whether counts in the selected layer are integers. If `True`, a warning is issued when non-integer values are found.

   :returns: Updates adata with the field ``adata.obsm["pearson_residuals"]``, containing the Pearson residuals.
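
The roles of `theta` and `clip` described above can be illustrated with a minimal NumPy sketch. This is not spateo's implementation: the function name `analytic_pearson_residuals` is hypothetical, and the expected count per cell and gene is assumed to be (cell total × gene total) / grand total, as in the usual analytic Pearson residuals formulation. The sketch also assumes a dense cells × genes matrix in which every cell and every gene has at least one count.

```python
import numpy as np


def analytic_pearson_residuals(counts, theta=100.0, clip=None):
    """Illustrative sketch only; `counts` is a dense cells x genes array."""
    counts = np.asarray(counts, dtype=float)
    n_cells = counts.shape[0]

    # Assumed expected counts: outer product of cell and gene totals,
    # divided by the grand total.
    cell_totals = counts.sum(axis=1, keepdims=True)
    gene_totals = counts.sum(axis=0, keepdims=True)
    mu = cell_totals @ gene_totals / counts.sum()

    # Negative binomial variance as documented: var = mean + mean^2 / theta.
    # theta = np.inf reduces this to the Poisson variance (var = mean).
    variance = mu + mu**2 / theta
    residuals = (counts - mu) / np.sqrt(variance)

    # Default clipping interval is [-sqrt(n), sqrt(n)], n = number of cells.
    if clip is None:
        clip = np.sqrt(n_cells)
    return np.clip(residuals, -clip, clip)


rng = np.random.default_rng(0)
# Adding 1 keeps every cell and gene total strictly positive.
toy_counts = rng.poisson(5, size=(50, 20)) + 1
toy_residuals = analytic_pearson_residuals(toy_counts, theta=100.0)
```

Passing `clip=np.inf` disables clipping, and passing `theta=np.inf` gives the Poisson version of the residuals.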