spateo.svg

Submodules

Attributes

`lm`

Functions

`svg_iden_reg`(→ spateo.svg.utils.pd.DataFrame)	Identifying SVGs using a spatial uniform distribution as the reference.
`get_std_wasserstein`(→ spateo.svg.utils.np.ndarray)	Calculate the standard deviation of the Wasserstein distance.
`smoothing_and_sampling`(...)	Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.
`smoothing`(→ spateo.svg.utils.AnnData)	Smoothing the gene expression using a graph neural network.
`downsampling`(→ spateo.svg.utils.AnnData)	Downsampling the cells from the adata object.
`cal_wass_dis_for_genes`(→ Tuple[spateo.svg.utils.List, ...)	Calculate Wasserstein distances for a list of genes.
`cal_wass_dist_bs`(...)	Computing Wasserstein distance for an AnnData to identify spatially variable genes.
`cal_wass_dis_nobs`(...)	Computing Wasserstein distance for an AnnData to identify spatially variable genes.
`bin_scale_adata_get_distance`(...)	Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified method (either geodesic or euclidean).
`cal_wass_dis_target_on_genes`(→ Tuple[dict, ...)	Find genes in gene_set whose distribution is similar to that of each gene in target_genes.
`bin_adata`(→ anndata.AnnData)	Aggregate cell-based adata by bin size. Cells within a bin would be aggregated together as one cell.
`shuffle_adata`(adata[, seed, replace])	Shuffle X in anndata object randomly.
`filter_adata_by_pos_ratio`(adata, pos_ratio)	Filter out cells with positive ratio lower than a setting value.
`get_genes_by_pos_ratio`(→ list)	Get genes that have a positive ratio higher than a set value.
`add_pos_ratio_to_adata`(adata[, layer, var_name])	Calculate positive ratios for all genes, and return to AnnData.
`cal_geodesic_distance`(→ anndata.AnnData)	Calculate geodesic distance between any pair of genes.
`cal_euclidean_distance`(→ anndata.AnnData)
`scale_to`(→ anndata.AnnData)	Scale the X array in AnnData.
`cal_wass_dis`(M, a[, b, numItermax])	Computing Wasserstein distance.
`cal_rank_p`(genes, ws, w_df[, bin_num])
`loess_reg`(→ anndata.AnnData)
`cal_gro_wass_bs`(adata1, adata2[, bin_size1, ...])
`cal_gw_dis_on_genes`(inp1, inp2)

Package Contents

spateo.svg.lm

spateo.svg.svg_iden_reg(adata: spateo.svg.utils.AnnData, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 8, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 500, max_dis_cutoff: float = 1000, n_neighbors_for_std: int = 30) → spateo.svg.utils.pd.DataFrame

Identifying SVGs using a spatial uniform distribution as the reference.

Parameters:

adata: AnnData object
bin_layer: Data in this layer will be binned according to the spatial information.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
distance_layer: Data in this layer will be used to calculate the spatial distance.
n_neighbors: The number of nearest neighbors that will be considered for calculating spatial distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
gene_set: Gene set that will be used to identify spatially variable genes, default is for all genes.
target: The target gene expression distribution or the target gene name.
min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.
n_neighbors_for_std: Number of neighbors that will be used to calculate the standard deviation of the Wasserstein distances.

Returns:

A pandas DataFrame that stores the results for the spatially variable genes. It includes the following columns:

"raw_pos_rate": The raw positive ratio (the fraction of cells that have non-zero expression) of the gene across all cells.
"Wasserstein_distance": The computed Wasserstein distance of each gene to the reference uniform distribution.
"expectation_reg": The predicted Wasserstein distance after fitting a loess regression using the gene positive rate as the predictor.
"std": Standard deviation of the Wasserstein distance.
"std_reg": The predicted standard deviation of the Wasserstein distance after fitting a loess regression using the gene positive rate as the predictor.
"zscore": The z-score of the Wasserstein distance.
"pvalue": The p-value based on the z-score.
"adj_pvalue": Adjusted p-value.

In addition, the input adata object is updated with the following information:

adata.var["raw_pos_rate"]: The positive rate of each gene.
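The relationship between the "zscore", "pvalue" and "adj_pvalue" columns can be sketched as follows. This is only an illustration under stated assumptions: the z-score is taken as (distance - expectation_reg) / std_reg, the p-value as a one-sided upper-tail normal probability, and the adjustment as Benjamini-Hochberg. The reference above does not state which tail or which correction the library uses, and the helper names `zscore_pvalues` and `benjamini_hochberg` are hypothetical.

```python
import math

def zscore_pvalues(ws, expectation, std):
    """Z-scores and one-sided upper-tail p-values (assumed convention)."""
    zs = [(w - e) / s for w, e, s in zip(ws, expectation, std)]
    # Upper-tail probability of a standard normal: P(Z >= z)
    ps = [0.5 * math.erfc(z / math.sqrt(2)) for z in zs]
    return zs, ps

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

# Toy values: three genes with the same fitted expectation and spread
zs, ps = zscore_pvalues([0.9, 0.5, 0.2], [0.5, 0.5, 0.5], [0.1, 0.1, 0.1])
adj = benjamini_hochberg(ps)
```

A gene whose distance sits far above the fitted expectation gets a small p-value under this convention; a gene at the expectation gets 0.5.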


spateo.svg.get_std_wasserstein(l: spateo.svg.utils.Union[spateo.svg.utils.np.ndarray, spateo.svg.utils.pd.DataFrame], n_neighbors: int = 30) → spateo.svg.utils.np.ndarray

Calculate the standard deviation of the Wasserstein distance.

Parameters:

l: The vector of the Wasserstein distance.
n_neighbors: number of nearest neighbors.

Returns:

std: The standard deviation of the Wasserstein distance.

spateo.svg.smoothing_and_sampling(adata: spateo.svg.utils.AnnData, smoothing: bool = True, downsampling: int = 400, device: str = 'cpu') → Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.AnnData]

Smoothing the gene expression using a graph neural network and downsampling the cells from the adata object.

Parameters:

adata: The input AnnData object.
smoothing: Whether to smooth the gene expression.
downsampling: The number of cells to downsample to.
device: The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.

Returns:

adata: The adata after smoothing and downsampling.
adata_smoothed: The adata after smoothing but not downsampling.

spateo.svg.smoothing(adata: spateo.svg.utils.AnnData, device: str = 'cpu') → spateo.svg.utils.AnnData

Smoothing the gene expression using a graph neural network.

Parameters:

adata: The input AnnData object.
device: The device to run the deep learning smoothing model. Can be either “cpu” or proper “cuda” related devices, such as: “cuda:0”.

Returns:

adata_smoothed: The imputation result.

spateo.svg.downsampling(adata: spateo.svg.utils.AnnData, downsampling: int = 400) → spateo.svg.utils.AnnData

Downsampling the cells from the adata object.

Parameters:

adata: The input AnnData object.
downsampling: The number of cells to downsample to.

Returns:

adata: The adata after the downsampling.
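As a rough picture of what downsampling to a fixed number of cells means, the sketch below keeps a random subset of rows from a cells-by-genes matrix. Uniform random sampling without replacement is an assumption made for illustration; the reference above does not say which sampling strategy the library uses, and `downsample_rows` is a hypothetical helper, not part of spateo.

```python
import numpy as np

def downsample_rows(X, n_cells, seed=0):
    """Keep at most n_cells randomly chosen rows (cells) of X."""
    rng = np.random.default_rng(seed)
    n_total = X.shape[0]
    if n_cells >= n_total:
        # Nothing to drop when the request exceeds the number of cells
        return X, np.arange(n_total)
    # Sorted indices preserve the original cell order among kept cells
    keep = np.sort(rng.choice(n_total, size=n_cells, replace=False))
    return X[keep], keep

X = np.arange(20).reshape(10, 2)  # 10 toy cells, 2 genes
X_small, kept = downsample_rows(X, n_cells=4)
```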

spateo.svg.cal_wass_dis_for_genes(inp0: Tuple[spateo.svg.utils.csr_matrix, spateo.svg.utils.AnnData], inp1: Tuple[int, spateo.svg.utils.List, spateo.svg.utils.np.ndarray, int]) → Tuple[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, spateo.svg.utils.np.ndarray]

Calculate Wasserstein distances for a list of genes.

Parameters:

inp0: A tuple of the sparse matrix of spatial distance between nearest neighbors, and the adata object.
inp1: A tuple of the seed, the list of genes, the target gene expression vector (which needs to be normalized to sum to 1), and the maximal number of iterations.

Returns:

gene_ids: The gene list that is used to calculate the Wasserstein distances.
ws: The Wasserstein distances from each gene to the target gene.
pos_rs: The expression positive rate vector related to the gene list.
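Because the target vector in inp1 has to sum to 1, a caller would typically normalize an expression vector before passing it in. The sketch below shows that step, plus a uniform reference of the kind svg_iden_reg describes. The helper names `to_distribution` and `uniform_reference` are hypothetical and only illustrate the normalization.

```python
import numpy as np

def to_distribution(expr):
    """Rescale a non-negative expression vector so that it sums to 1."""
    expr = np.asarray(expr, dtype=float)
    total = expr.sum()
    if total <= 0:
        raise ValueError("expression vector must have a positive sum")
    return expr / total

def uniform_reference(n_cells):
    """Uniform distribution over n_cells cells or bins."""
    return np.full(n_cells, 1.0 / n_cells)

target = to_distribution([1, 3, 0, 4])
reference = uniform_reference(4)
```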

spateo.svg.cal_wass_dist_bs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], processes: int = 1, bootstrap: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, rank_p: bool = True, bin_num: int = 100, larger_or_small: str = 'larger') → Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData]

Computing Wasserstein distance for an AnnData to identify spatially variable genes.

Parameters:

adata: AnnData object.
bin_size: Bin size for merging cells.
bin_layer: Data in this layer will be binned according to the spatial information.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
distance_layer: The data of this layer would be used to calculate distance.
n_neighbors: The number of neighbors for calculating spatial distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
gene_set: Gene set that will be used to compute Wasserstein distances, default is for all genes.
target: The target gene expression distribution or the target gene name.
processes: The number of processes for parallel computing.
bootstrap: The number of bootstrap permutations used to calculate the p-value.
min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.
rank_p: Whether to calculate the p-value in a rank-based manner.
bin_num: Classify genes into bin_num groups according to the mean Wasserstein distance from bootstrap.
larger_or_small: The direction in which to compute the p-value. "larger" means the right-tail area of the null distribution.

Returns:

w_df: A dataframe storing information related to the Wasserstein distances.
bin_scale_adata: Binned AnnData object.
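The role of the bootstrap and larger_or_small parameters can be pictured with a generic permutation p-value: the observed distance is compared against distances obtained from shuffled data. The sketch below is an assumption-laden illustration, not the library's code. `permutation_pvalue` is a hypothetical helper, the add-one correction is a common convention chosen here, and treating any value other than "larger" as the left tail is likewise an assumption.

```python
import numpy as np

def permutation_pvalue(observed, null_values, larger_or_small="larger"):
    """Empirical tail probability of observed under a permutation null."""
    null_values = np.asarray(null_values, dtype=float)
    if larger_or_small == "larger":
        # Right tail: how often the null is at least as large as observed
        extreme = np.sum(null_values >= observed)
    else:
        # Left tail: how often the null is at least as small as observed
        extreme = np.sum(null_values <= observed)
    # Add-one correction keeps the estimate strictly positive
    return float((extreme + 1) / (null_values.size + 1))

null = [1.0, 2.0, 3.0, 4.0]  # toy distances from four shuffles
p_right = permutation_pvalue(5.0, null)
p_mid = permutation_pvalue(2.5, null)
```

With only four shuffles the smallest attainable p-value is 1/5, which is one reason a larger bootstrap count gives finer resolution.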

spateo.svg.cal_wass_dis_nobs(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, numItermax: int = 1000000, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, target: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray, str] = [], min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) → Tuple[spateo.svg.utils.pd.DataFrame, spateo.svg.utils.AnnData]

Computing Wasserstein distance for an AnnData to identify spatially variable genes.

Parameters:

adata: AnnData object.
bin_size: Bin size for merging cells.
bin_layer: Data in this layer will be binned according to the spatial information.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
distance_layer: The data of this layer would be used to calculate distance.
n_neighbors: The number of neighbors for calculating geodesic distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
gene_set: Gene set for computing, default is for all genes.
target: The target distribution or the target gene name.
min_dis_cutoff: Cells/bins whose min distance to 30 neighbors is larger than this cutoff are filtered out.
max_dis_cutoff: Cells/bins whose max distance to 30 neighbors is larger than this cutoff are filtered out.

Returns:

w_df: A dataframe storing information related to the Wasserstein distances.

spateo.svg.bin_scale_adata_get_distance(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, n_neighbors: int = 30) → Tuple[spateo.svg.utils.AnnData, spateo.svg.utils.csr_matrix]

Bin (based on spatial information), scale adata object and calculate the distance matrix based on the specified method (either geodesic or euclidean).

Parameters:

adata: AnnData object.
bin_size: Bin size for merging cells.
bin_layer: Data in this layer will be binned according to the spatial information.
distance_layer: The data of this layer would be used to calculate distance.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.
n_neighbors: The number of nearest neighbors that will be considered for calculating spatial distance.

Returns:

bin_scale_adata: Binned, scaled anndata object.
M: The scipy sparse matrix of the calculated distance of nearest neighbors.

spateo.svg.cal_wass_dis_target_on_genes(adata: spateo.svg.utils.AnnData, bin_size: int = 1, bin_layer: str = 'spatial', distance_layer: str = 'spatial', cell_distance_method: str = 'geodesic', n_neighbors: int = 30, numItermax: int = 1000000, target_genes: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, processes: int = 1, bootstrap: int = 0, top_n: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0) → Tuple[dict, spateo.svg.utils.AnnData]

Find genes in gene_set whose distribution is similar to that of each gene in target_genes.

Parameters:

adata: AnnData object.
bin_size: Bin size for merging cells.
bin_layer: Data in this layer will be binned according to the spatial information.
distance_layer: The data of this layer would be used to calculate distance.
cell_distance_method: The method for calculating distance between two cells, either geodesic or euclidean.
n_neighbors: The number of neighbors for calculating spatial distance.
numItermax: The maximum number of iterations before stopping the optimization algorithm if it has not converged.
target_genes: The list of the target genes.
gene_set: Gene set that will be used to compute Wasserstein distances, default is for all genes.
processes: The process number for parallel computing.
bootstrap: Number of bootstraps.
top_n: Number of top genes to select.
min_dis_cutoff: Cells/bins whose min distance to 30th neighbors is larger than this cutoff are filtered out.
max_dis_cutoff: Cells/bins whose max distance to 30th neighbors is larger than this cutoff are filtered out.

Returns:

w_genes: The dictionary of the Wasserstein distances. Each key corresponds to a gene name, while the corresponding value is the pandas DataFrame of the Wasserstein distance related information.
bin_scale_adata: Binned, scaled anndata object.

spateo.svg.bin_adata(adata: anndata.AnnData, bin_size: int = 1, layer: str = 'spatial') → anndata.AnnData

Aggregate cell-based adata by bin size. Cells within a bin would be aggregated together as one cell.

Parameters:

adata: The input adata.
bin_size: The size of the square used to bin adata.

Returns:

Aggregated adata.
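Binning by a square of side bin_size can be sketched on plain arrays: each cell's coordinates are floor-divided by the bin size, and cells that land in the same bin are merged. Summing expression within a bin is an assumption made for this illustration (the description above only says cells are aggregated), and `bin_counts` is a hypothetical helper rather than the library's implementation.

```python
import numpy as np

def bin_counts(coords, X, bin_size):
    """Aggregate rows of X whose coordinates fall in the same square bin."""
    coords = np.asarray(coords)
    X = np.asarray(X, dtype=float)
    # Integer bin index per cell along each spatial axis
    bins = np.floor_divide(coords, bin_size).astype(int)
    # Unique bins and, for every cell, the row index of its bin
    uniq, inverse = np.unique(bins, axis=0, return_inverse=True)
    inverse = np.ravel(inverse)  # the returned shape varies across NumPy versions
    binned = np.zeros((uniq.shape[0], X.shape[1]))
    # Unbuffered add so repeated bin indices accumulate (assumed: sum)
    np.add.at(binned, inverse, X)
    return uniq, binned

coords = [[0, 0], [1, 1], [2, 0], [3, 3]]  # toy spatial coordinates
X = [[1, 0], [2, 1], [0, 5], [4, 4]]       # toy cells-by-genes counts
bin_ids, binned = bin_counts(coords, X, bin_size=2)
```

Here the first two cells share bin (0, 0), so four cells collapse into three bins.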

spateo.svg.shuffle_adata(adata: anndata.AnnData, seed: int = 0, replace: bool = False)

Shuffle X in anndata object randomly.

Parameters:

adata: AnnData object.
seed: Seed for the random shuffling.

Returns:

adata: AnnData object.
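A seeded shuffle of this kind can be sketched on a plain matrix. The sketch assumes that shuffling means reordering rows of X relative to their positions and that replace switches between a permutation and sampling with replacement. The reference above does not document replace, so that reading is a guess, and `shuffle_rows` is a hypothetical helper.

```python
import numpy as np

def shuffle_rows(X, seed=0, replace=False):
    """Reorder the rows of X at random, reproducibly for a given seed."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Without replacement this is a permutation of the row indices;
    # with replacement some rows may repeat and others may be dropped.
    idx = rng.choice(n, size=n, replace=replace)
    return X[idx]

X = np.arange(12).reshape(6, 2)
shuffled = shuffle_rows(X, seed=1)
resampled = shuffle_rows(X, seed=1, replace=True)
```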

spateo.svg.filter_adata_by_pos_ratio(adata, pos_ratio)

Filter out cells with a positive ratio lower than a set value.

Parameters:

adata: AnnData object.
pos_ratio: Cells with positive ratio lower than this value would be discarded.

Returns:

AnnData object.

spateo.svg.get_genes_by_pos_ratio(adata: anndata.AnnData, pos_ratio: float = 0.1) → list

Get genes that have a positive ratio higher than a set value.

Parameters:

adata: AnnData object.
pos_ratio: The threshold of positive ratio.

Returns:

Gene list. AnnData object.

spateo.svg.add_pos_ratio_to_adata(adata: anndata.AnnData, layer: str = None, var_name: str = 'raw_pos_rate')

Calculate positive ratios for all genes, and store them in the AnnData object. We define the positive ratio of a gene as the percentage of cells that express this gene.

Parameters:

adata: AnnData object.
layer: The layer of AnnData, in which the data are used. If not given, we use data in X.
var_name: The var name for storing positive ratios.

Returns:

None
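The positive ratio defined above is easy to reproduce on a dense cells-by-genes matrix, together with the thresholding that get_genes_by_pos_ratio describes. This is a minimal sketch with hypothetical helper names (`positive_ratio`, `genes_above`). It assumes non-negative expression values, so "expressed" is taken to mean strictly greater than zero, and it returns a fraction between 0 and 1 rather than a percentage.

```python
import numpy as np

def positive_ratio(X):
    """Fraction of cells (rows) with non-zero expression, per gene (column)."""
    X = np.asarray(X)
    # For non-negative data, X > 0 is the same as non-zero
    return (X > 0).mean(axis=0)

def genes_above(gene_names, ratios, pos_ratio=0.1):
    """Names of genes whose positive ratio is strictly above pos_ratio."""
    return [g for g, r in zip(gene_names, ratios) if r > pos_ratio]

X = [[1, 0, 0],
     [2, 0, 3],
     [0, 0, 1],
     [4, 0, 0]]  # 4 toy cells, 3 genes
ratios = positive_ratio(X)
kept = genes_above(["a", "b", "c"], ratios)
```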

spateo.svg.cal_geodesic_distance(adata: anndata.AnnData, layer: str = 'spatial', n_neighbors: int = 30, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 4.0) → anndata.AnnData

Calculate geodesic distance between any pair of genes.

Parameters:

adata: AnnData object.
layer: The layer of AnnData, in which the data are used.
n_neighbors: The number of nearest neighbors to connect each cell to.
min_dis_cutoff: Remove cells whose minimal distance to their neighbors is larger than this value. These cells are likely isolated cells.
max_dis_cutoff: Remove cells whose maximal distance to their neighbors is larger than this value. These cells are likely sparse cells.

Returns:

AnnData object.
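For intuition, a geodesic distance on spatial data is usually a shortest-path length along a nearest-neighbor graph, which follows the shape of the tissue instead of cutting across empty space. The sketch below builds such a graph and runs Dijkstra's algorithm on it. It is a generic illustration, not the library's implementation: `knn_graph` and `geodesic_from` are hypothetical helpers, the points are assumed to be distinct, and the dense distance matrix is only suitable for small inputs.

```python
import heapq
import numpy as np

def knn_graph(coords, n_neighbors):
    """Symmetric k-nearest-neighbor graph as {node: {neighbor: distance}}."""
    coords = np.asarray(coords, dtype=float)
    # Dense pairwise Euclidean distances (fine for small examples only)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    graph = {i: {} for i in range(len(coords))}
    for i in range(len(coords)):
        # Position 0 of the argsort is the point itself (distinct points assumed)
        for j in np.argsort(dist[i])[1:n_neighbors + 1]:
            graph[i][int(j)] = float(dist[i, j])
            graph[int(j)][i] = float(dist[i, j])
    return graph

def geodesic_from(graph, source):
    """Shortest-path distances from source to every reachable node (Dijkstra)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < best.get(v, float("inf")):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return best

# Five toy cells along an L-shaped path
coords = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
geo = geodesic_from(knn_graph(coords, n_neighbors=2), source=0)
```

In this toy layout the straight-line distance between the two ends of the L is about 2.83, while the path along the graph is longer because it has to follow the bend.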

spateo.svg.cal_euclidean_distance(adata: anndata.AnnData, layer: str = 'spatial', min_dis_cutoff: float = np.inf, max_dis_cutoff: float = np.inf) → anndata.AnnData

spateo.svg.scale_to(adata: anndata.AnnData, to_median: bool = True, N: int = 10000) → anndata.AnnData

Scale the X array in AnnData.

Parameters:

adata: AnnData object.
to_median: Whether to scale to the median of the cells' total expression.
N: If to_median is False, scale data to this value.

Returns:

AnnData object.
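The two scaling modes can be sketched on a dense matrix: every cell's total expression is rescaled either to the median total across cells or to a fixed value N. This is an illustration of the idea and not the library's code. `scale_rows` is a hypothetical helper, and leaving all-zero cells untouched is a choice made here to avoid division by zero.

```python
import numpy as np

def scale_rows(X, to_median=True, N=10000):
    """Rescale each row (cell) so its total equals the median total or N."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1, keepdims=True)
    target = np.median(totals) if to_median else N
    # Leave all-zero cells untouched instead of dividing by zero
    safe = np.where(totals == 0, 1.0, totals)
    return X / safe * target

X = [[1, 1], [2, 6], [0, 4]]  # row totals: 2, 8, 4 (median 4)
X_median = scale_rows(X)
X_fixed = scale_rows(X, to_median=False, N=100)
```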

spateo.svg.cal_wass_dis(M, a, b=[], numItermax=1000000)

Computing Wasserstein distance.

Parameters:

M: (ns,nt) array-like, float – Loss matrix (c-order array in numpy with type float64)
a: (ns,) array-like, float – Source histogram (uniform weight if empty list)
b: (nt,) array-like, float – Target histogram (uniform weight if empty list)

Returns:

(float, array-like) – Optimal transportation loss for the given parameters
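To make the "optimal transportation loss" concrete, consider the one-dimensional special case where cells sit on an evenly spaced line and the loss matrix is M[i, j] = |i - j| * spacing. There the optimal loss has a closed form: the summed absolute difference between the two cumulative distributions. The sketch below uses that closed form. It is not the general solver behind this function, and `wasserstein_1d` is a hypothetical helper for illustration only.

```python
import numpy as np

def wasserstein_1d(a, b, spacing=1.0):
    """Wasserstein-1 distance between two histograms on an evenly spaced 1D grid."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Normalize both histograms so that each sums to 1
    a = a / a.sum()
    b = b / b.sum()
    # In 1D the optimal cost is the area between the two CDFs
    return float(np.abs(np.cumsum(a) - np.cumsum(b)).sum() * spacing)

# All mass at one end versus all mass at the other end, two steps apart
d_far = wasserstein_1d([1, 0, 0], [0, 0, 1])
# All mass in one cell versus a uniform reference over three cells
d_uniform = wasserstein_1d([1, 0, 0], [1, 1, 1])
```

A gene concentrated in one region ends up far from the uniform reference, while a gene spread evenly across the grid has a distance near zero.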


spateo.svg.cal_rank_p(genes, ws, w_df, bin_num=100)

spateo.svg.loess_reg(adata: anndata.AnnData, layers: str = 'X') → anndata.AnnData


spateo.svg.cal_gro_wass_bs(adata1: spateo.svg.utils.AnnData, adata2: spateo.svg.utils.AnnData, bin_size1: int = 1, bin_size2: int = 1, bin_layer: str = 'spatial', cell_distance_method: str = 'geodesic', distance_layer: str = 'spatial', n_neighbors: int = 30, gene_set: spateo.svg.utils.Union[spateo.svg.utils.List, spateo.svg.utils.np.ndarray] = None, processes: int = 1, bootstrap: int = 100, min_dis_cutoff: float = 2.0, max_dis_cutoff: float = 6.0, larger_or_small: str = 'larger')

spateo.svg.cal_gw_dis_on_genes(inp1, inp2)
