spateo.svg.get_svg
==================

.. py:module:: spateo.svg.get_svg


Functions
---------

.. autoapisummary::

   spateo.svg.get_svg.svg_iden_reg
   spateo.svg.get_svg.get_std_wasserstein
   spateo.svg.get_svg.smoothing_and_sampling
   spateo.svg.get_svg.smoothing
   spateo.svg.get_svg.downsampling
   spateo.svg.get_svg.cal_wass_dis_for_genes
   spateo.svg.get_svg.cal_wass_dist_bs
   spateo.svg.get_svg.cal_wass_dis_nobs
   spateo.svg.get_svg.bin_scale_adata_get_distance
   spateo.svg.get_svg.cal_wass_dis_target_on_genes


Module Contents
---------------

.. py:function:: svg_iden_reg(adata: spateo.svg.utils.AnnData, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 8, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 500, max_dis_cutoff: float = 1000, n_neighbors_for_std: int = 30) -> spateo.svg.utils.pd.DataFrame

   Identify spatially variable genes (SVGs) using a spatial uniform distribution as the reference.

   :param adata: AnnData object.
   :param bin_layer: Data in this layer will be binned according to the spatial information.
   :param cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
   :param distance_layer: Data in this layer will be used to calculate the spatial distance.
   :param n_neighbors: The number of nearest neighbors that will be considered for calculating spatial distance.
   :param numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
   :param gene_set: Gene set that will be used to identify spatially variable genes; the default is all genes.
   :param target: The target gene expression distribution or the target gene name.
   :param min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
   :param max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.
   :param n_neighbors_for_std: Number of neighbors that will be used to calculate the standard deviation of the Wasserstein distances.

   :returns: A pandas DataFrame that stores the results for spatially variable genes. It includes the following columns:

      - "raw_pos_rate": The raw positive rate (the fraction of cells that have non-zero expression) of the gene across all cells.
      - "Wasserstein_distance": The computed Wasserstein distance of each gene to the reference uniform distribution.
      - "expectation_reg": The predicted Wasserstein distance after fitting a loess regression using the gene positive rate as the predictor.
      - "std": Standard deviation of the Wasserstein distance.
      - "std_reg": The predicted standard deviation of the Wasserstein distance after fitting a loess regression using the gene positive rate as the predictor.
      - "zscore": The z-score of the Wasserstein distance.
      - "pvalue": The p-value based on the z-score.
      - "adj_pvalue": Adjusted p-value.

      In addition, the input adata object is updated with the following information:

      - adata.var["raw_pos_rate"]: The positive rate of each gene.
   :rtype: w0


.. py:function:: get_std_wasserstein(l: spateo.svg.utils.Union[spateo.svg.utils.np.ndarray, spateo.svg.utils.pd.DataFrame], n_neighbors: int = 30) -> spateo.svg.utils.np.ndarray

   Calculate the standard deviation of the Wasserstein distance.

   :param l: The vector of the Wasserstein distance.
   :param n_neighbors: Number of nearest neighbors.

   :returns: The standard deviation of the Wasserstein distance.
   :rtype: std


.. py:function:: smoothing_and_sampling(adata: spateo.svg.utils.AnnData, smoothing: bool = True, downsampling: int = 400, device: str = 'cpu') -> Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.AnnData]

   Smooth the gene expression using a graph neural network and downsample the cells from the adata object.

   :param adata: The input AnnData object.
   :param smoothing: Whether to smooth the gene expression.
   :param downsampling: The number of cells to downsample to.
   :param device: The device on which to run the deep learning smoothing model. Can be either "cpu" or a proper "cuda"-related device, such as "cuda:0".

   :returns: The adata after smoothing and downsampling.
             adata_smoothed: The adata after smoothing but not downsampling.
   :rtype: adata


.. py:function:: smoothing(adata: spateo.svg.utils.AnnData, device: str = 'cpu') -> spateo.svg.utils.AnnData

   Smooth the gene expression using a graph neural network.

   :param adata: The input AnnData object.
   :param device: The device on which to run the deep learning smoothing model. Can be either "cpu" or a proper "cuda"-related device, such as "cuda:0".

   :returns: The imputation result.
   :rtype: adata_smoothed


.. py:function:: downsampling(adata: spateo.svg.utils.AnnData, downsampling: int = 400) -> spateo.svg.utils.AnnData

   Downsample the cells from the adata object.

   :param adata: The input AnnData object.
   :param downsampling: The number of cells to downsample to.

   :returns: The adata after downsampling.
   :rtype: adata


.. py:function:: cal_wass_dis_for_genes(inp0: Tuple[spateo.svg.utils.csr_matrix, spateo.svg.utils.AnnData], inp1: Tuple[int, spateo.svg.utils.List, spateo.svg.utils.np.ndarray, int]) -> Tuple[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, spateo.svg.utils.np.ndarray]

   Calculate Wasserstein distances for a list of genes.

   :param inp0: A tuple of the sparse matrix of spatial distance between nearest neighbors, and the adata object.
   :param inp1: A tuple of the seed, the list of genes, the target gene expression vector (which needs to be normalized to sum to 1), and the maximal number of iterations.

   :returns: The gene list that is used to calculate the Wasserstein distances.
             ws: The Wasserstein distances from each gene to the target gene.
             pos_rs: The expression positive rate vector related to the gene list.
   :rtype: gene_ids


.. py:function:: cal_wass_dist_bs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], processes: int = 1, bootstrap: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, rank_p: bool = True, bin_num: int = 100, larger_or_small: str = 'larger') -> Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData]

   Compute Wasserstein distances for an AnnData object to identify spatially variable genes.

   :param adata: AnnData object.
   :param bin_size: Bin size for merging cells.
   :param bin_layer: Data in this layer will be binned according to the spatial information.
   :param cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
   :param distance_layer: Data in this layer will be used to calculate distance.
   :param n_neighbors: The number of neighbors for calculating spatial distance.
   :param numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
   :param gene_set: Gene set that will be used to compute Wasserstein distances; the default is all genes.
   :param target: The target gene expression distribution or the target gene name.
   :param processes: The number of processes for parallel computing.
   :param bootstrap: Number of bootstrap permutations used to calculate the p-value.
   :param min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
   :param max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.
   :param rank_p: Whether to calculate the p-value in a ranking manner.
   :param bin_num: Classify genes into bin_num groups according to the mean Wasserstein distance from the bootstrap.
   :param larger_or_small: The direction in which to compute the p-value. Larger means the right-tail area of the null distribution.

   :returns: A dataframe storing information related to the Wasserstein distances.
             bin_scale_adata: Binned AnnData object.
   :rtype: w_df


.. py:function:: cal_wass_dis_nobs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) -> Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData]

   Compute Wasserstein distances for an AnnData object to identify spatially variable genes.

   :param adata: AnnData object.
   :param bin_size: Bin size for merging cells.
   :param bin_layer: Data in this layer will be binned according to the spatial information.
   :param cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
   :param distance_layer: Data in this layer will be used to calculate distance.
   :param n_neighbors: The number of neighbors for calculating geodesic distance.
   :param numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
   :param gene_set: Gene set used for the computation; the default is all genes.
   :param target: The target distribution or the target gene name.
   :param min_dis_cutoff: Cells/bins whose min distance to 30 neighbors is larger than this cutoff are filtered out.
   :param max_dis_cutoff: Cells/bins whose max distance to 30 neighbors is larger than this cutoff are filtered out.

   :returns: A dataframe storing information related to the Wasserstein distances.
   :rtype: w_df


.. py:function:: bin_scale_adata_get_distance(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, n_neighbors: int = 30) -> Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.csr_matrix]

   Bin (based on spatial information) and scale the adata object, then calculate the distance matrix based on the specified method (either geodesic or euclidean).

   :param adata: AnnData object.
   :param bin_size: Bin size for merging cells.
   :param bin_layer: Data in this layer will be binned according to the spatial information.
   :param distance_layer: Data in this layer will be used to calculate distance.
   :param cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
   :param min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
   :param max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.
   :param n_neighbors: The number of nearest neighbors that will be considered for calculating spatial distance.

   :returns: Binned, scaled AnnData object.
             M: The scipy sparse matrix of the calculated distance of nearest neighbors.
   :rtype: bin_scale_adata


.. py:function:: cal_wass_dis_target_on_genes(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', n_neighbors: int = 30, numItermax: int = 1000000, target_genes: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, processes: int = 1, bootstrap: int = 0, top_n: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) -> Tuple[dict, spateo.svg.utils.AnnData]

   Find genes in gene_set whose distribution is similar to that of each gene in target_genes.

   :param adata: AnnData object.
   :param bin_size: Bin size for merging cells.
   :param bin_layer: Data in this layer will be binned according to the spatial information.
   :param distance_layer: Data in this layer will be used to calculate distance.
   :param cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
   :param n_neighbors: The number of neighbors for calculating spatial distance.
   :param numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
   :param target_genes: The list of the target genes.
   :param gene_set: Gene set that will be used to compute Wasserstein distances; the default is all genes.
   :param processes: The number of processes for parallel computing.
   :param bootstrap: Number of bootstraps.
   :param top_n: Number of top genes to select.
   :param min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
   :param max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.

   :returns: The dictionary of the Wasserstein distances. Each key corresponds to a gene name, and the corresponding value is the pandas DataFrame of the Wasserstein distance related information.
             bin_scale_adata: Binned, scaled AnnData object.
   :rtype: w_genes
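
The quantities reported by svg_iden_reg (positive rate, Wasserstein distance to a uniform reference, z-score, p-value) can be illustrated with a toy sketch. This is not the spateo implementation: it assumes cells sit at unit-spaced positions on a line rather than on a spatial neighbor graph, assumes a one-sided upper-tail normal p-value, and uses hypothetical helper names (``positive_rate``, ``wasserstein_to_uniform_1d``, ``upper_tail_pvalue``) that do not exist in the module.

.. code-block:: python

   from statistics import NormalDist


   def positive_rate(expr):
       """Fraction of cells with non-zero expression (cf. "raw_pos_rate")."""
       return sum(1 for x in expr if x != 0) / len(expr)


   def wasserstein_to_uniform_1d(expr):
       """Wasserstein-1 distance between normalized expression and a uniform reference.

       Assumption for illustration only: cells lie at positions 0, 1, 2, ...
       on a line, so the distance is the sum of absolute CDF differences.
       """
       total = sum(expr)
       n = len(expr)
       cum_diff = 0.0
       dist = 0.0
       # The CDFs coincide at the last position, so it contributes nothing.
       for x in expr[:-1]:
           cum_diff += x / total - 1.0 / n
           dist += abs(cum_diff)
       return dist


   def upper_tail_pvalue(w, expected, std):
       """z-score of a distance against a fitted expectation, with an
       upper-tail normal p-value (the tail direction is an assumption)."""
       z = (w - expected) / std
       return z, 1.0 - NormalDist().cdf(z)


   # Toy gene: all expression concentrated in the first of four cells.
   gene = [4, 0, 0, 0]
   w = wasserstein_to_uniform_1d(gene)
   # The expectation and std would come from the loess fits; these are made up.
   z, p = upper_tail_pvalue(w, expected=1.0, std=0.25)
   print(positive_rate(gene), w, z, p)

A gene whose expression is spread evenly across cells has a distance of zero to the uniform reference, while a spatially concentrated gene has a large distance and therefore a large z-score.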