spateo.svg.get_svg#

Module Contents#

Functions#

svg_iden_reg(→ spateo.svg.utils.pd.DataFrame)

Identifying SVGs using a spatial uniform distribution as the reference.

get_std_wasserstein(→ spateo.svg.utils.np.ndarray)

Calculate the standard deviation of the Wasserstein distance.

smoothing_and_sampling(...)

Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.

smoothing(→ spateo.svg.utils.AnnData)

Smoothing the gene expression using a graph neural network.

downsampling(→ spateo.svg.utils.AnnData)

Downsampling the cells from the adata object.

cal_wass_dis_for_genes(→ Tuple[spateo.svg.utils.List, ...)

Calculate Wasserstein distances for a list of genes.

cal_wass_dist_bs(...)

Computing Wasserstein distance for an AnnData to identify spatially variable genes.

cal_wass_dis_nobs(...)

Computing Wasserstein distance for a AnnData to identify spatially variable genes.

bin_scale_adata_get_distance(...)

Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified

cal_wass_dis_target_on_genes(→ Tuple[dict, ...)

Find genes in gene_set that have similar distribution to each target_genes.

spateo.svg.get_svg.svg_iden_reg(adata: spateo.svg.utils.AnnData, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 8, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 500, max_dis_cutoff: float = 1000, n_neighbors_for_std: int = 30) spateo.svg.utils.pd.DataFrame[source]#

Identifying SVGs using a spatial uniform distribution as the reference.

Parameters:
adata

AnnData object

bin_layer

Data in this layer will be binned according to the spatial information.

cell_distance_method

The method for calculating distance between two cells, either geodesic or euclidean.

distance_layer

Data in this layer will be used to calculate the spatial distance.

n_neighbors

The number of nearest neighbors that will be considered for calculating spatial distance.

numItermax

The maximum number of iterations before stopping the optimization algorithm if it has not converged.

gene_set

Gene set that will be used to identified spatial variable genes, default is for all genes.

target

The target gene expression distribution or the target gene name.

min_dis_cutoff

Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.

max_dis_cutoff

Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.

n_neighbors_for_std

Number of neighbors that will be used to calculate the standard deviation of the Wasserstein distances.

Returns:

a pandas data frame that stores the information of spatial variable genes results. It includes the following columns:

”raw_pos_rate”: The raw positive ratio (the fraction of cells that have non-zero expression ) of the gene

across all cells.

”Wasserstein_distance”: The computed Wasserstein distance of each gene to the reference uniform

distribution.

”expectation_reg”: The predicted Wasserstein distance after fitting a loess regression using the gene

positive rate as the predictor.

”std”: Standard deviation of the Wasserstein distance. “std_reg”: The predicted standard deviation of the Wasserstein distance after fitting a loess regression

using the gene positive rate as the predictor.

”zscore”: The z-score of the Wasserstein distance. “pvalue”: The p-value based on the z-score. “adj_pvalue”: Adjusted p-value.

In addition, the input adata object has updated with the following information:

adata.var[“raw_pos_rate”]: The positive rate of each gene.

Return type:

w0

spateo.svg.get_svg.get_std_wasserstein(l: spateo.svg.utils.Union[spateo.svg.utils.np.ndarray, spateo.svg.utils.pd.DataFrame], n_neighbors: int = 30) spateo.svg.utils.np.ndarray[source]#

Calculate the standard deviation of the Wasserstein distance.

Parameters:
l

The vector of the Wasserstein distance.

n_neighbors

number of nearest neighbors.

Returns:

The standard deviation of the Wasserstein distance.

Return type:

std

spateo.svg.get_svg.smoothing_and_sampling(adata: spateo.svg.utils.AnnData, smoothing: bool = True, downsampling: int = 400, device: str = 'cpu') Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.AnnData][source]#

Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.

Parameters:
adata

The input AnnData object.

smoothing

Whether to do smooth the gene expression.

downsampling

The number of cells to down sample.

device

The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.

Returns:

The adata after smoothing and downsampling. adata_smoothed: The adata after smoothing but not downsampling.

Return type:

adata

spateo.svg.get_svg.smoothing(adata: spateo.svg.utils.AnnData, device: str = 'cpu') spateo.svg.utils.AnnData[source]#

Smoothing the gene expression using a graph neural network.

Parameters:
adata

The input AnnData object.

device

The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.

Returns:

imputation result

Return type:

adata_smoothed

spateo.svg.get_svg.downsampling(adata: spateo.svg.utils.AnnData, downsampling: int = 400) spateo.svg.utils.AnnData[source]#

Downsampling the cells from the adata object.

Parameters:
adata

The input AnnData object.

downsampling

The number of cells to down sample.

Returns:

adata after the downsampling.

Return type:

adata

spateo.svg.get_svg.cal_wass_dis_for_genes(inp0: Tuple[spateo.svg.utils.csr_matrix, spateo.svg.utils.AnnData], inp1: Tuple[int, spateo.svg.utils.List, spateo.svg.utils.np.ndarray, int]) Tuple[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, spateo.svg.utils.np.ndarray][source]#

Calculate Wasserstein distances for a list of genes.

Parameters:
inp0

A tuple of the sparse matrix of spatial distance between nearest neighbors, and the adata object.

inp1

A tuple of the seed, the list of genes, the target gene expression vector (need to be normalized to have a sum of 1), and the maximal number of iterations.

Returns:

The gene list that is used to calculate the Wasserstein distribution. ws: The Wasserstein distances from each gene to the target gene. pos_rs: The expression positive rate vector related to the gene list.

Return type:

gene_ids

spateo.svg.get_svg.cal_wass_dist_bs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], processes: int = 1, bootstrap: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, rank_p: bool = True, bin_num: int = 100, larger_or_small: str = 'larger') Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData][source]#

Computing Wasserstein distance for an AnnData to identify spatially variable genes.

Parameters:
adata

AnnData object.

bin_size

Bin size for mergeing cells.

bin_layer

Data in this layer will be binned according to the spatial information.

cell_distance_method

The method for calculating distance between two cells, either geodesic or euclidean.

distance_layer

The data of this layer would be used to calculate distance

n_neighbors

The number of neighbors for calculating spatial distance.

numItermax

The maximum number of iterations before stopping the optimization algorithm if it has not converged.

gene_set

Gene set that will be used to compute Wasserstein distances, default is for all genes.

target

The target gene expression distribution or the target gene name.

processes

The process number for parallel computing

bootstrap

Bootstrap number for permutation to calculate p-value

min_dis_cutoff

Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.

max_dis_cutoff

Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.

rank_p

Whether to calculate p value in ranking manner.

bin_num

Classy genes into bin_num groups according to mean Wasserstein distance from bootstrap.

larger_or_small

In what direction to get p value. Larger means the right tail area of the null distribution.

Returns:

A dataframe storing information related to the Wasserstein distances. bin_scale_adata: Binned AnnData object

Return type:

w_df

spateo.svg.get_svg.cal_wass_dis_nobs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData][source]#

Computing Wasserstein distance for a AnnData to identify spatially variable genes.

Parameters:
adata

AnnData object

bin_size

bin size for mergeing cells.

bin_layer

data in this layer will be binned according to spatial information.

cell_distance_method

the method for calculating distance of two cells. geodesic or euclidean

distance_layer

the data of this layer would be used to calculate distance

n_neighbors

the number of neighbors for calculation geodesic distance

numItermax

The maximum number of iterations before stopping the optimization algorithm if it has not converged

gene_set

Gene set for computing, default is for all genes.

target

the target distribution or the target gene name.

min_dis_cutoff

Cells/Bins whose min distance to 30 neighbors are larger than this cutoff would be filtered.

max_dis_cutoff

Cells/Bins whose max distance to 30 neighbors are larger than this cutoff would be filtered.

Returns:

A dataframe storing information related to the Wasserstein distances.

Return type:

w_df

spateo.svg.get_svg.bin_scale_adata_get_distance(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, n_neighbors: int = 30) Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.csr_matrix][source]#

Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified method (either geodesic or euclidean).

Parameters:
adata

AnnData object.

bin_size

Bin size for mergeing cells.

bin_layer

Data in this layer will be binned according to the spatial information.

distance_layer

The data of this layer would be used to calculate distance

cell_distance_method

The method for calculating distance between two cells, either geodesic or euclidean.

min_dis_cutoff

Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.

max_dis_cutoff

Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.

n_neighbors

The number of nearest neighbors that will be considered for calculating spatial distance.

Returns:

Bin, scaled anndata object. M: The scipy sparse matrix of the calculated distance of nearest neighbors.

Return type:

bin_scale_adata

spateo.svg.get_svg.cal_wass_dis_target_on_genes(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', n_neighbors: int = 30, numItermax: int = 1000000, target_genes: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, processes: int = 1, bootstrap: int = 0, top_n: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) Tuple[dict, spateo.svg.utils.AnnData][source]#

Find genes in gene_set that have similar distribution to each target_genes.

Parameters:
adata

AnnData object.

bin_size

Bin size for mergeing cells.

bin_layer

Data in this layer will be binned according to the spatial information.

distance_layer

The data of this layer would be used to calculate distance

cell_distance_method

The method for calculating distance between two cells, either geodesic or euclidean.

n_neighbors

The number of neighbors for calculating spatial distance.

numItermax

The maximum number of iterations before stopping the optimization algorithm if it has not converged.

target_genes

The list of the target genes.

gene_set

Gene set that will be used to compute Wasserstein distances, default is for all genes.

processes

The process number for parallel computing.

bootstrap

Number of bootstraps.

top_n

Number of top genes to select.

min_dis_cutoff

Cells/Bins whose min distance to 30th neighbors are larger than this cutoff would be filtered.

max_dis_cutoff

Cells/Bins whose max distance to 30th neighbors are larger than this cutoff would be filtered.

Returns:

The dictionary of the Wasserstein distance. Each key corresponds to a gene name while the corresponding

value the pandas DataFrame of the Wasserstein distance related information.

bin_scale_adata: binned, scaled anndata object.

Return type:

w_genes