spateo.svg.get_svg¶

Functions¶

`svg_iden_reg`(→ spateo.svg.utils.pd.DataFrame)	Identifying SVGs using a spatial uniform distribution as the reference.
`get_std_wasserstein`(→ spateo.svg.utils.np.ndarray)	Calculate the standard deviation of the Wasserstein distance.
`smoothing_and_sampling`(...)	Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.
`smoothing`(→ spateo.svg.utils.AnnData)	Smoothing the gene expression using a graph neural network.
`downsampling`(→ spateo.svg.utils.AnnData)	Downsampling the cells from the adata object.
`cal_wass_dis_for_genes`(→ Tuple[spateo.svg.utils.List, ...)	Calculate Wasserstein distances for a list of genes.
`cal_wass_dist_bs`(...)	Computing Wasserstein distance for an AnnData to identify spatially variable genes.
`cal_wass_dis_nobs`(...)	Computing Wasserstein distance for a AnnData to identify spatially variable genes.
`bin_scale_adata_get_distance`(...)	Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified
`cal_wass_dis_target_on_genes`(→ Tuple[dict, ...)	Find genes in gene_set that have similar distribution to each target_genes.

Module Contents¶

spateo.svg.get_svg.svg_iden_reg(adata: spateo.svg.utils.AnnData, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 8, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 500, max_dis_cutoff: float = 1000, n_neighbors_for_std: int = 30) → spateo.svg.utils.pd.DataFrame[source]¶

Identifying SVGs using a spatial uniform distribution as the reference.

Parameters:

adata: AnnData object
bin_layer: Data in this layer will be binned according to the spatial information.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
distance_layer: Data in this layer will be used to calculate the spatial distance.
n_neighbors: The number of nearest neighbors that will be considered for calculating spatial distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
gene_set: Gene set that will be used to identified spatial variable genes, default is for all genes.
target: The target gene expression distribution or the target gene name.
min_dis_cutoff: Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
max_dis_cutoff: Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
n_neighbors_for_std: Number of neighbors that will be used to calculate the standard deviation of the Wasserstein distances.

Returns:

a pandas data frame that stores the information of spatial variable genes results. It includes the following columns:

”raw_pos_rate”: The raw positive ratio (the fraction of cells that have non-zero expression ) of the gene
across all cells.

”Wasserstein_distance”: The computed Wasserstein distance of each gene to the reference uniform
distribution.

”expectation_reg”: The predicted Wasserstein distance after fitting a loess regression using the gene
positive rate as the predictor.

”std”: Standard deviation of the Wasserstein distance. “std_reg”: The predicted standard deviation of the Wasserstein distance after fitting a loess regression

using the gene positive rate as the predictor.

”zscore”: The z-score of the Wasserstein distance. “pvalue”: The p-value based on the z-score. “adj_pvalue”: Adjusted p-value.

In addition, the input adata object has updated with the following information:: adata.var[“raw_pos_rate”]: The positive rate of each gene.

Return type:

spateo.svg.get_svg.get_std_wasserstein(l: spateo.svg.utils.Union[spateo.svg.utils.np.ndarray, spateo.svg.utils.pd.DataFrame], n_neighbors: int = 30) → spateo.svg.utils.np.ndarray[source]¶

Calculate the standard deviation of the Wasserstein distance.

Parameters:

l: The vector of the Wasserstein distance.
n_neighbors: number of nearest neighbors.

Returns:

The standard deviation of the Wasserstein distance.

Return type:

std

spateo.svg.get_svg.smoothing_and_sampling(adata: spateo.svg.utils.AnnData, smoothing: bool = True, downsampling: int = 400, device: str = 'cpu') → Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.AnnData][source]¶

Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.

Parameters:

adata: The input AnnData object.
smoothing: Whether to do smooth the gene expression.
downsampling: The number of cells to down sample.
device: The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.

Returns:

The adata after smoothing and downsampling. adata_smoothed: The adata after smoothing but not downsampling.

Return type:

adata

spateo.svg.get_svg.smoothing(adata: spateo.svg.utils.AnnData, device: str = 'cpu') → spateo.svg.utils.AnnData[source]¶

Smoothing the gene expression using a graph neural network.

Parameters:

adata: The input AnnData object.
device: The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.

Returns:

imputation result

Return type:

adata_smoothed

spateo.svg.get_svg.downsampling(adata: spateo.svg.utils.AnnData, downsampling: int = 400) → spateo.svg.utils.AnnData[source]¶

Downsampling the cells from the adata object.

Parameters:

adata: The input AnnData object.
downsampling: The number of cells to down sample.

Returns:

adata after the downsampling.

Return type:

adata

spateo.svg.get_svg.cal_wass_dis_for_genes(inp0: Tuple[spateo.svg.utils.csr_matrix, spateo.svg.utils.AnnData], inp1: Tuple[int, spateo.svg.utils.List, spateo.svg.utils.np.ndarray, int]) → Tuple[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, spateo.svg.utils.np.ndarray][source]¶

Calculate Wasserstein distances for a list of genes.

Parameters:

inp0: A tuple of the sparse matrix of spatial distance between nearest neighbors, and the adata object.
inp1: A tuple of the seed, the list of genes, the target gene expression vector (need to be normalized to have a sum of 1), and the maximal number of iterations.

Returns:

The gene list that is used to calculate the Wasserstein distribution. ws: The Wasserstein distances from each gene to the target gene. pos_rs: The expression positive rate vector related to the gene list.

Return type:

gene_ids

spateo.svg.get_svg.cal_wass_dist_bs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], processes: int = 1, bootstrap: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, rank_p: bool = True, bin_num: int = 100, larger_or_small: str = 'larger') → Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData][source]¶

Computing Wasserstein distance for an AnnData to identify spatially variable genes.

Parameters:

adata: AnnData object.
bin_size: Bin size for mergeing cells.
bin_layer: Data in this layer will be binned according to the spatial information.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
distance_layer: The data of this layer would be used to calculate distance
n_neighbors: The number of neighbors for calculating spatial distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
gene_set: Gene set that will be used to compute Wasserstein distances, default is for all genes.
target: The target gene expression distribution or the target gene name.
processes: The process number for parallel computing
bootstrap: Bootstrap number for permutation to calculate p-value
min_dis_cutoff: Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
max_dis_cutoff: Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
rank_p: Whether to calculate p value in ranking manner.
bin_num: Classy genes into bin_num groups according to mean Wasserstein distance from bootstrap.
larger_or_small: In what direction to get p value. Larger means the right tail area of the null distribution.

Returns:

A dataframe storing information related to the Wasserstein distances. bin_scale_adata: Binned AnnData object

Return type:

w_df

spateo.svg.get_svg.cal_wass_dis_nobs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) → Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData][source]¶

Computing Wasserstein distance for a AnnData to identify spatially variable genes.

Parameters:

adata: AnnData object
bin_size: bin size for mergeing cells.
bin_layer: data in this layer will be binned according to spatial information.
cell_distance_method: the method for calculating distance of two cells. geodesic or euclidean
distance_layer: the data of this layer would be used to calculate distance
n_neighbors: the number of neighbors for calculation geodesic distance
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged
gene_set: Gene set for computing, default is for all genes.
target: the target distribution or the target gene name.
min_dis_cutoff: Cells/Bins whose min distance to 30 neighbors are larger than this cutoff would be filtered.
max_dis_cutoff: Cells/Bins whose max distance to 30 neighbors are larger than this cutoff would be filtered.

Returns:

A dataframe storing information related to the Wasserstein distances.

Return type:

w_df

spateo.svg.get_svg.bin_scale_adata_get_distance(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, n_neighbors: int = 30) → Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.csr_matrix][source]¶

Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified method (either geodesic or euclidean).

Parameters:

adata: AnnData object.
bin_size: Bin size for mergeing cells.
bin_layer: Data in this layer will be binned according to the spatial information.
distance_layer: The data of this layer would be used to calculate distance
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
min_dis_cutoff: Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
max_dis_cutoff: Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.
n_neighbors: The number of nearest neighbors that will be considered for calculating spatial distance.

Returns:

Bin, scaled anndata object. M: The scipy sparse matrix of the calculated distance of nearest neighbors.

Return type:

bin_scale_adata

spateo.svg.get_svg.cal_wass_dis_target_on_genes(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', n_neighbors: int = 30, numItermax: int = 1000000, target_genes: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, processes: int = 1, bootstrap: int = 0, top_n: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) → Tuple[dict, spateo.svg.utils.AnnData][source]¶

Find genes in gene_set that have similar distribution to each target_genes.

Parameters:

adata: AnnData object.
bin_size: Bin size for mergeing cells.
bin_layer: Data in this layer will be binned according to the spatial information.
distance_layer: The data of this layer would be used to calculate distance
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
n_neighbors: The number of neighbors for calculating spatial distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
target_genes: The list of the target genes.
gene_set: Gene set that will be used to compute Wasserstein distances, default is for all genes.
processes: The process number for parallel computing.
bootstrap: Number of bootstraps.
top_n: Number of top genes to select.
min_dis_cutoff: Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.
max_dis_cutoff: Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.

Returns:

The dictionary of the Wasserstein distance. Each key corresponds to a gene name while the corresponding: value the pandas DataFrame of the Wasserstein distance related information.

bin_scale_adata: binned, scaled anndata object.

Return type:

w_genes