spateo.svg.get_svg¶
Functions¶
|
Identifying SVGs using a spatial uniform distribution as the reference. |
|
Calculate the standard deviation of the Wasserstein distance. |
Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object. |
|
|
Smoothing the gene expression using a graph neural network. |
|
Downsampling the cells from the adata object. |
|
Calculate Wasserstein distances for a list of genes. |
|
Computing Wasserstein distance for an AnnData to identify spatially variable genes. |
|
Computing Wasserstein distance for a AnnData to identify spatially variable genes. |
Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified |
|
|
Find genes in gene_set that have similar distribution to each target_genes. |
Module Contents¶
- spateo.svg.get_svg.svg_iden_reg(adata: spateo.svg.utils.AnnData, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 8, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 500, max_dis_cutoff: float = 1000, n_neighbors_for_std: int = 30) spateo.svg.utils.pd.DataFrame [source]¶
Identifying SVGs using a spatial uniform distribution as the reference.
- Parameters:
- adata
AnnData object
- bin_layer
Data in this layer will be binned according to the spatial information.
- cell_distance_method
The method for calculating distance between two cells, either geodesic or euclidean.
- distance_layer
Data in this layer will be used to calculate the spatial distance.
- n_neighbors
The number of nearest neighbors that will be considered for calculating spatial distance.
- numItermax
The maximum number of iterations before stopping the optimization algorithm if it has not converged.
- gene_set
Gene set that will be used to identified spatial variable genes, default is for all genes.
- target
The target gene expression distribution or the target gene name.
- min_dis_cutoff
Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
- max_dis_cutoff
Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
- n_neighbors_for_std
Number of neighbors that will be used to calculate the standard deviation of the Wasserstein distances.
- Returns:
a pandas data frame that stores the information of spatial variable genes results. It includes the following columns:
- ”raw_pos_rate”: The raw positive ratio (the fraction of cells that have non-zero expression ) of the gene
across all cells.
- ”Wasserstein_distance”: The computed Wasserstein distance of each gene to the reference uniform
distribution.
- ”expectation_reg”: The predicted Wasserstein distance after fitting a loess regression using the gene
positive rate as the predictor.
”std”: Standard deviation of the Wasserstein distance. “std_reg”: The predicted standard deviation of the Wasserstein distance after fitting a loess regression
using the gene positive rate as the predictor.
”zscore”: The z-score of the Wasserstein distance. “pvalue”: The p-value based on the z-score. “adj_pvalue”: Adjusted p-value.
- In addition, the input adata object has updated with the following information:
adata.var[“raw_pos_rate”]: The positive rate of each gene.
- Return type:
w0
- spateo.svg.get_svg.get_std_wasserstein(l: spateo.svg.utils.Union[spateo.svg.utils.np.ndarray, spateo.svg.utils.pd.DataFrame], n_neighbors: int = 30) spateo.svg.utils.np.ndarray [source]¶
Calculate the standard deviation of the Wasserstein distance.
- Parameters:
- l
The vector of the Wasserstein distance.
- n_neighbors
number of nearest neighbors.
- Returns:
The standard deviation of the Wasserstein distance.
- Return type:
std
- spateo.svg.get_svg.smoothing_and_sampling(adata: spateo.svg.utils.AnnData, smoothing: bool = True, downsampling: int = 400, device: str = 'cpu') Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.AnnData] [source]¶
Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.
- Parameters:
- adata
The input AnnData object.
- smoothing
Whether to do smooth the gene expression.
- downsampling
The number of cells to down sample.
- device
The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.
- Returns:
The adata after smoothing and downsampling. adata_smoothed: The adata after smoothing but not downsampling.
- Return type:
adata
- spateo.svg.get_svg.smoothing(adata: spateo.svg.utils.AnnData, device: str = 'cpu') spateo.svg.utils.AnnData [source]¶
Smoothing the gene expression using a graph neural network.
- Parameters:
- adata
The input AnnData object.
- device
The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.
- Returns:
imputation result
- Return type:
adata_smoothed
- spateo.svg.get_svg.downsampling(adata: spateo.svg.utils.AnnData, downsampling: int = 400) spateo.svg.utils.AnnData [source]¶
Downsampling the cells from the adata object.
- Parameters:
- adata
The input AnnData object.
- downsampling
The number of cells to down sample.
- Returns:
adata after the downsampling.
- Return type:
adata
- spateo.svg.get_svg.cal_wass_dis_for_genes(inp0: Tuple[spateo.svg.utils.csr_matrix, spateo.svg.utils.AnnData], inp1: Tuple[int, spateo.svg.utils.List, spateo.svg.utils.np.ndarray, int]) Tuple[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, spateo.svg.utils.np.ndarray] [source]¶
Calculate Wasserstein distances for a list of genes.
- Parameters:
- inp0
A tuple of the sparse matrix of spatial distance between nearest neighbors, and the adata object.
- inp1
A tuple of the seed, the list of genes, the target gene expression vector (need to be normalized to have a sum of 1), and the maximal number of iterations.
- Returns:
The gene list that is used to calculate the Wasserstein distribution. ws: The Wasserstein distances from each gene to the target gene. pos_rs: The expression positive rate vector related to the gene list.
- Return type:
gene_ids
- spateo.svg.get_svg.cal_wass_dist_bs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], processes: int = 1, bootstrap: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, rank_p: bool = True, bin_num: int = 100, larger_or_small: str = 'larger') Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData] [source]¶
Computing Wasserstein distance for an AnnData to identify spatially variable genes.
- Parameters:
- adata
AnnData object.
- bin_size
Bin size for mergeing cells.
- bin_layer
Data in this layer will be binned according to the spatial information.
- cell_distance_method
The method for calculating distance between two cells, either geodesic or euclidean.
- distance_layer
The data of this layer would be used to calculate distance
- n_neighbors
The number of neighbors for calculating spatial distance.
- numItermax
The maximum number of iterations before stopping the optimization algorithm if it has not converged.
- gene_set
Gene set that will be used to compute Wasserstein distances, default is for all genes.
- target
The target gene expression distribution or the target gene name.
- processes
The process number for parallel computing
- bootstrap
Bootstrap number for permutation to calculate p-value
- min_dis_cutoff
Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
- max_dis_cutoff
Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
- rank_p
Whether to calculate p value in ranking manner.
- bin_num
Classy genes into bin_num groups according to mean Wasserstein distance from bootstrap.
- larger_or_small
In what direction to get p value. Larger means the right tail area of the null distribution.
- Returns:
A dataframe storing information related to the Wasserstein distances. bin_scale_adata: Binned AnnData object
- Return type:
w_df
- spateo.svg.get_svg.cal_wass_dis_nobs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData] [source]¶
Computing Wasserstein distance for a AnnData to identify spatially variable genes.
- Parameters:
- adata
AnnData object
- bin_size
bin size for mergeing cells.
- bin_layer
data in this layer will be binned according to spatial information.
- cell_distance_method
the method for calculating distance of two cells. geodesic or euclidean
- distance_layer
the data of this layer would be used to calculate distance
- n_neighbors
the number of neighbors for calculation geodesic distance
- numItermax
The maximum number of iterations before stopping the optimization algorithm if it has not converged
- gene_set
Gene set for computing, default is for all genes.
- target
the target distribution or the target gene name.
- min_dis_cutoff
Cells/Bins whose min distance to 30 neighbors are larger than this cutoff would be filtered.
- max_dis_cutoff
Cells/Bins whose max distance to 30 neighbors are larger than this cutoff would be filtered.
- Returns:
A dataframe storing information related to the Wasserstein distances.
- Return type:
w_df
- spateo.svg.get_svg.bin_scale_adata_get_distance(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, n_neighbors: int = 30) Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.csr_matrix] [source]¶
Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified method (either geodesic or euclidean).
- Parameters:
- adata
AnnData object.
- bin_size
Bin size for mergeing cells.
- bin_layer
Data in this layer will be binned according to the spatial information.
- distance_layer
The data of this layer would be used to calculate distance
- cell_distance_method
The method for calculating distance between two cells, either geodesic or euclidean.
- min_dis_cutoff
Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
- max_dis_cutoff
Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
- n_neighbors
The number of nearest neighbors that will be considered for calculating spatial distance.
- Returns:
Bin, scaled anndata object. M: The scipy sparse matrix of the calculated distance of nearest neighbors.
- Return type:
bin_scale_adata
- spateo.svg.get_svg.cal_wass_dis_target_on_genes(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', n_neighbors: int = 30, numItermax: int = 1000000, target_genes: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, processes: int = 1, bootstrap: int = 0, top_n: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) Tuple[dict, spateo.svg.utils.AnnData] [source]¶
Find genes in gene_set that have similar distribution to each target_genes.
- Parameters:
- adata
AnnData object.
- bin_size
Bin size for mergeing cells.
- bin_layer
Data in this layer will be binned according to the spatial information.
- distance_layer
The data of this layer would be used to calculate distance
- cell_distance_method
The method for calculating distance between two cells, either geodesic or euclidean.
- n_neighbors
The number of neighbors for calculating spatial distance.
- numItermax
The maximum number of iterations before stopping the optimization algorithm if it has not converged.
- target_genes
The list of the target genes.
- gene_set
Gene set that will be used to compute Wasserstein distances, default is for all genes.
- processes
The process number for parallel computing.
- bootstrap
Number of bootstraps.
- top_n
Number of top genes to select.
- min_dis_cutoff
Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
- max_dis_cutoff
Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
- Returns:
- The dictionary of the Wasserstein distance. Each key corresponds to a gene name while the corresponding
value the pandas DataFrame of the Wasserstein distance related information.
bin_scale_adata: binned, scaled anndata object.
- Return type:
w_genes