spateo.tools.gene_expression_variance

Characterizing cell-to-cell variability within spatial domains

Module Contents

Functions

compute_gene_groups_p_val(→ Tuple[str, float])

Calculate the Mann-Whitney U test p-value for a gene between two groups.

get_highvar_genes(→ Tuple[pandas.DataFrame, Dict])

Find highly-variable genes in single-cell data matrices.

get_highvar_genes_sparse(→ Tuple[pandas.DataFrame, Dict])

Find highly-variable genes in sparse single-cell data matrices.

compute_variance_decomposition(adata, ...)

Computes and then optionally visualizes the variance decomposition for an AnnData object.

genewise_variance_decomposition(adata, ...)

For each gene in the chosen subset, computes a variance decomposition by computing the intra-cell type variance and the inter-cell type variance.

plot_variance_decomposition(var_df, figsize, ...)

Visualization of the parts-wise intra-cell type variation, cell type-independent gene variation to the total variation within the data.

spateo.tools.gene_expression_variance.compute_gene_groups_p_val(gene: str, group1: anndata.AnnData, group2: anndata.AnnData) → Tuple[str, float]

Calculate the Mann-Whitney U test p-value for a gene between two groups.

Parameters:
gene

Name of the gene

group1

AnnData object containing cells from the first group to compare

group2

AnnData object containing cells from the second group to compare

Returns:

gene: Name of the gene

p_val: Mann-Whitney U test p-value

Return type:

Tuple[str, float]
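
As an illustration of the test named above, the following is a minimal sketch that runs a two-sided Mann-Whitney U test on two plain arrays with `scipy.stats.mannwhitneyu`. The helper `gene_p_val`, the synthetic Poisson data, and the choice of a two-sided alternative are assumptions made for this example; they are not taken from the spateo source, which operates on AnnData objects.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def gene_p_val(gene, values1, values2):
    """Hypothetical helper: Mann-Whitney U p-value for one gene's expression
    in two groups of cells, returned together with the gene name."""
    # Assumption: a two-sided alternative; the library's choice is not documented here.
    result = mannwhitneyu(values1, values2, alternative="two-sided")
    return gene, float(result.pvalue)


# Synthetic expression values for one gene in two groups of 200 cells each.
rng = np.random.default_rng(0)
group1_expr = rng.poisson(lam=2.0, size=200)
group2_expr = rng.poisson(lam=6.0, size=200)

gene, p_val = gene_p_val("GeneA", group1_expr, group2_expr)
print(gene, p_val)
```

The return value mirrors the documented `Tuple[str, float]` shape, so results for many genes can be collected into a list and corrected for multiple testing afterwards.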

spateo.tools.gene_expression_variance.get_highvar_genes(expression: numpy.ndarray | scipy.sparse.csr_matrix | scipy.sparse.csc_matrix | scipy.sparse.coo_matrix, expected_fano_threshold: float | None = None, numgenes: int | None = None, minimal_mean: float = 0.5) → Tuple[pandas.DataFrame, Dict]

Find highly-variable genes in single-cell data matrices.

Parameters:
expression

Gene expression matrix

expected_fano_threshold

Optionally can be used to set a manual dispersion threshold (for definition of “highly-variable”)

numgenes

Optionally can be used to find the n most variable genes

minimal_mean

Sets a threshold on the minimum mean expression to consider
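
The parameters above suggest a dispersion-based selection. The sketch below shows one simplified way such a selection could work: it ranks genes by their Fano factor (variance divided by mean) after discarding genes below a minimum mean. The function `fano_highvar`, the cells-by-genes orientation, and the direct use of the raw Fano factor are assumptions for illustration; the library may instead compare each gene against an expected Fano value, as the name `expected_fano_threshold` hints.

```python
import numpy as np


def fano_highvar(expression, numgenes=None, fano_threshold=None, minimal_mean=0.5):
    """Hypothetical sketch: flag highly variable genes (columns) by Fano factor."""
    expression = np.asarray(expression, dtype=float)
    mean = expression.mean(axis=0)
    var = expression.var(axis=0)
    # Fano factor = variance / mean; genes with zero mean get a Fano factor of 0.
    fano = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
    # Only genes with sufficient mean expression are eligible.
    eligible = mean >= minimal_mean
    if fano_threshold is not None:
        selected = eligible & (fano > fano_threshold)
    elif numgenes is not None:
        # Rank eligible genes by Fano factor, highest first.
        ranked = np.argsort(np.where(eligible, fano, -np.inf))[::-1]
        selected = np.zeros(mean.shape, dtype=bool)
        selected[ranked[:numgenes]] = True
        selected &= eligible
    else:
        selected = eligible
    return fano, selected


# Rows are cells, columns are genes (an assumed orientation).
expression = np.array([[0, 5, 1],
                       [0, 5, 9],
                       [1, 5, 1],
                       [0, 5, 9]])
fano, selected = fano_highvar(expression, numgenes=1)
print(fano, selected)
```

In this toy matrix the first gene falls below `minimal_mean`, the second is constant, and only the third is both expressed and variable.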

spateo.tools.gene_expression_variance.get_highvar_genes_sparse(expression: numpy.ndarray | scipy.sparse.csr_matrix | scipy.sparse.csc_matrix | scipy.sparse.coo_matrix, expected_fano_threshold: float | None = None, numgenes: int | None = None, minimal_mean: float = 0.5) → Tuple[pandas.DataFrame, Dict]

Find highly-variable genes in sparse single-cell data matrices.

Parameters:
expression

Gene expression matrix

expected_fano_threshold

Optionally can be used to set a manual dispersion threshold (for definition of “highly-variable”)

numgenes

Optionally can be used to find the n most variable genes

minimal_mean

Sets a threshold on the minimum mean expression to consider

Returns:

gene_counts_stats: Results dataframe containing pertinent information for each gene

gene_fano_parameters: Additional informative dictionary (with records of dispersion for each gene, threshold, etc.)

Return type:

Tuple[pandas.DataFrame, Dict]
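
For sparse input, the per-gene mean and variance needed for a dispersion estimate can be computed without converting the matrix to a dense array. The sketch below uses the identity Var[X] = E[X²] − (E[X])² with `scipy.sparse` operations. The helper `sparse_mean_var` and the cells-by-genes orientation are assumptions for this example and do not describe the library's internals.

```python
import numpy as np
from scipy import sparse


def sparse_mean_var(expression):
    """Hypothetical helper: per-gene (column) mean and population variance
    of a sparse matrix, computed without densifying it."""
    expression = sparse.csr_matrix(expression)
    mean = np.asarray(expression.mean(axis=0)).ravel()
    # Element-wise square stays sparse, so E[X^2] is cheap to compute.
    mean_sq = np.asarray(expression.multiply(expression).mean(axis=0)).ravel()
    var = mean_sq - mean ** 2
    return mean, var


dense = np.array([[0, 5, 1],
                  [0, 5, 9],
                  [1, 5, 1],
                  [0, 5, 9]], dtype=float)
mean, var = sparse_mean_var(sparse.csr_matrix(dense))
print(mean, var)
```

The same helper accepts CSC or COO input because `sparse.csr_matrix` converts between formats.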

spateo.tools.gene_expression_variance.compute_variance_decomposition(adata: anndata.AnnData, spatial_label_id: str, celltype_label_id: str, genes: Union[None, str, List[str]] = None, figsize: Union[None, Tuple[float, float]] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {})

Computes and then optionally visualizes the variance decomposition for an AnnData object.

Within spatial regions, determines the proportion of the total variation that occurs within the same cell type, the proportion of the variation that occurs between cell types in the region, and the proportion of the variation that comes from baseline differences in the expression levels of the genes in the data. The within-cell type variation could potentially come from differences in cell-cell communication.

Parameters:
adata

AnnData object containing data

spatial_label_id

Key in .obs containing spatial domain labels

celltype_label_id

Key in .obs containing cell type labels

genes

Can be used to filter to chosen subset of genes for variance computation

figsize

Can be optionally used to set the size of the plotted figure

save_show_or_return

Whether to save, show or return the figure. Only used if ‘visualize’ is True. If “both”, the figure is saved and plotted at the same time. If “all”, the figure is saved and displayed, and the associated axis and other objects are returned.

save_kwargs

A dictionary that will be passed to the save_fig function. Only used if ‘visualize’ is True. By default it is an empty dictionary, and the save_fig function uses {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, provide a dictionary that modifies those keys according to your needs.

Returns:

var_decomposition: Dataframe containing four columns, for the category label, intra-celltype variation, inter-celltype variation and gene-level variation
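
The three proportions described above can be illustrated with a sum-of-squares partition for the cells of a single spatial region. In the sketch below, each value's deviation from the grand mean is split into a within-cell type part, a between-cell type part, and a part due to baseline differences between gene means; the three sums of squares add up exactly to the total. The function `variance_fractions`, its dictionary keys, and this particular partition are assumptions for illustration and may differ from how the library computes its result.

```python
import numpy as np


def variance_fractions(expression, celltypes):
    """Hypothetical sketch: split total variation of a cells-by-genes matrix
    into intra-cell type, inter-cell type and gene-level fractions."""
    expression = np.asarray(expression, dtype=float)
    celltypes = np.asarray(celltypes)
    grand_mean = expression.mean()
    gene_means = expression.mean(axis=0)
    # For every cell, the per-gene mean of that cell's own cell type.
    type_means = np.empty_like(expression)
    for label in np.unique(celltypes):
        mask = celltypes == label
        type_means[mask] = expression[mask].mean(axis=0)
    ss_within = ((expression - type_means) ** 2).sum()
    ss_between = ((type_means - gene_means) ** 2).sum()
    ss_gene = expression.shape[0] * ((gene_means - grand_mean) ** 2).sum()
    ss_total = ((expression - grand_mean) ** 2).sum()
    return {
        "intra_celltype": ss_within / ss_total,
        "inter_celltype": ss_between / ss_total,
        "gene_level": ss_gene / ss_total,
    }


# Four cells from one region, two genes, two cell types.
expression = np.array([[1, 10], [3, 10], [5, 14], [7, 14]])
celltypes = ["A", "A", "B", "B"]
fractions = variance_fractions(expression, celltypes)
print(fractions)
```

Repeating this for the cells of each spatial domain label would give one row of fractions per region, which matches the four-column shape described in the Returns entry.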

spateo.tools.gene_expression_variance.genewise_variance_decomposition(adata: anndata.AnnData, celltype_label_id: str, genes: Union[str, List[str]], figsize: Union[None, Tuple[float, float]] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {})

For each gene in the chosen subset, computes a variance decomposition by computing the intra-cell type variance and the inter-cell type variance.

Parameters:
adata

AnnData object containing data

celltype_label_id

Key in .obs containing cell type labels

genes

Can be used to filter to chosen subset of genes for variance computation

figsize

Can be used to optionally set the size of the plotted figure

save_show_or_return

Whether to save, show or return the figure. Only used if ‘visualize’ is True. If “both”, the figure is saved and plotted at the same time. If “all”, the figure is saved and displayed, and the associated axis and other objects are returned.

save_kwargs

A dictionary that will be passed to the save_fig function. Only used if ‘visualize’ is True. By default it is an empty dictionary, and the save_fig function uses {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, provide a dictionary that modifies those keys according to your needs.

Returns:

var_decomposition: Dataframe containing three columns, for the gene, intra-celltype variation and inter-celltype variation
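
A per-gene version of this decomposition can be sketched with a pandas group-wise transform: for each gene, the total sum of squares around the gene mean splits into a part within cell types and a part between cell type means. The function `genewise_fractions` and the column names used below are hypothetical, chosen only to mirror the three-column description above.

```python
import numpy as np
import pandas as pd


def genewise_fractions(expr_df, celltypes):
    """Hypothetical sketch: per-gene intra- and inter-cell type variance fractions.

    expr_df is a cells-by-genes DataFrame; celltypes holds one label per cell.
    """
    labels = pd.Series(np.asarray(celltypes), index=expr_df.index)
    # Replace each value by the mean of its gene within the cell's own cell type.
    type_means = expr_df.groupby(labels).transform("mean")
    ss_within = ((expr_df - type_means) ** 2).sum(axis=0)
    ss_between = ((type_means - expr_df.mean(axis=0)) ** 2).sum(axis=0)
    # A gene with no variation at all yields NaN fractions (0 / 0).
    ss_total = ss_within + ss_between
    return pd.DataFrame({
        "gene": expr_df.columns,
        "intra_celltype": (ss_within / ss_total).to_numpy(),
        "inter_celltype": (ss_between / ss_total).to_numpy(),
    })


expr_df = pd.DataFrame(
    [[1.0, 10.0], [3.0, 10.0], [5.0, 14.0], [7.0, 14.0]],
    columns=["GeneA", "GeneB"],
)
result = genewise_fractions(expr_df, ["A", "A", "B", "B"])
print(result)
```

Here GeneB varies only between the two cell types, so its intra-cell type fraction is zero, while GeneA shows some variation inside each type.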

spateo.tools.gene_expression_variance.plot_variance_decomposition(var_df: pandas.DataFrame, figsize: Tuple[float, float] = (6, 2), cmap: str = 'Blues_r', multiindex: bool = False, title: Union[None, str] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {})

Visualizes how much each component, such as intra-cell type variation and cell type-independent gene variation, contributes to the total variation within the data.

Parameters:
var_df

Output from compute_variance_decomposition()

figsize

(width, height) of the figure window

cmap

Name of the matplotlib colormap to use

multiindex

Specifies whether to set labels to record multi-level index information. Should only be used if var_df has a multi-index.

title

Optionally, provide custom title to plot. If not given, will use default title.

save_show_or_return

Whether to save, show or return the figure. If “both”, the figure is saved and plotted at the same time. If “all”, the figure is saved and displayed, and the associated axis and other objects are returned.

save_kwargs

A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function uses {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, provide a dictionary that modifies those keys according to your needs.