spateo.tools.cluster_degs

Module Contents

Functions

find_spatial_cluster_degs(→ pandas.DataFrame)

Function to search nearest neighbor groups in spatial space

find_cluster_degs(→ pandas.DataFrame)

Find marker genes between one group and other groups based on gene expression.

find_all_cluster_degs(→ anndata.AnnData)

Find marker genes for each group of buckets based on gene expression.

top_n_degs(adata, group[, custom_score_func, sort_by, ...])

Find top n marker genes for each group of buckets based on differential gene expression analysis results.

spateo.tools.cluster_degs.find_spatial_cluster_degs(adata: anndata.AnnData, test_group: str, x: List[int] | None = None, y: List[int] | None = None, group: str | None = None, genes: List[str] | None = None, k: int = 10, ratio_thresh: float = 0.5) → pandas.DataFrame

Function to search nearest neighbor groups in spatial space for the given test group.

Parameters:
adata

An AnnData object.

test_group

The name of the group (a value in the group column) for which neighbors are to be found.

x

x-coordinates of all buckets.

y

y-coordinates of all buckets.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets.

genes

The list of genes that will be used to subset the data for identifying DEGs. If None, all genes will be used.

k

Number of neighbors to use for kneighbors queries.

ratio_thresh

For each non-test group, if more than 50% (default) of its buckets are in the neighboring set, this group is then selected as a neighboring group.

Returns:

A pandas DataFrame of the differential expression analysis result between the test group and neighbor groups.
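The ratio_thresh rule described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: the helper name select_neighbor_groups, the brute-force Euclidean k-nearest-neighbor search, and the choice to build the neighboring set from the neighbors of the test group's buckets are all assumptions made for illustration.

```python
from math import dist


def select_neighbor_groups(x, y, labels, test_group, k=10, ratio_thresh=0.5):
    """Hypothetical helper: pick groups that mostly lie next to the test group."""
    points = list(zip(x, y))
    test_idx = [i for i, lab in enumerate(labels) if lab == test_group]

    # Assumption: the neighboring set is the union of the k nearest
    # neighbors (Euclidean, brute force) of every test-group bucket.
    neighbor_set = set()
    for i in test_idx:
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        neighbor_set.update(others[:k])

    selected = []
    for g in sorted(set(labels) - {test_group}):
        members = [i for i, lab in enumerate(labels) if lab == g]
        # Fraction of this group's buckets that fall in the neighboring set.
        ratio = sum(i in neighbor_set for i in members) / len(members)
        if ratio > ratio_thresh:
            selected.append(g)
    return selected


# Toy layout: group "B" sits next to "A", group "C" is far away.
x = [0, 1, 0, 1, 10, 11, 10]
y = [0, 0, 1, 1, 10, 10, 11]
labels = ["A", "A", "B", "B", "C", "C", "C"]
neighbors_of_a = select_neighbor_groups(x, y, labels, "A", k=2)
```

In this toy layout only group "B" is selected, because all of its buckets are among the nearest neighbors of the "A" buckets while none of the "C" buckets are.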

spateo.tools.cluster_degs.find_cluster_degs(adata: anndata.AnnData, test_group: str, control_groups: List[str], genes: List[str] | None = None, layer: str | None = None, X_data: numpy.ndarray | None = None, group: str | None = None, qval_thresh: float = 0.05, ratio_expr_thresh: float = 0.1, diff_ratio_expr_thresh: float = 0, log2fc_thresh: float = 0, method: Literal['multiple', 'pairwise'] = 'multiple') → pandas.DataFrame

Find marker genes between one group and other groups based on gene expression.

Each gene is tested for differential expression between the buckets in the test group and those in the control groups with a Mann-Whitney U test. For each gene, the following are reported: the percentage of buckets expressing the gene in the test group (ratio_expr); the difference between the percentages of buckets expressing the gene in the test group and in the control groups (diff_ratio_expr); the expression fold change between the test and control groups (log2fc); and the p-value adjusted with the Benjamini-Hochberg procedure (qval). Three specificity scores are also calculated: jsd_adj_score, which is 1 minus the Jensen-Shannon distance between the distribution of the percentage of cells expressing the gene across all groups and a hypothetical perfect distribution in which only the test group expresses the gene; ppc_score, which is Pearson's correlation coefficient between the gene's observed expression vector across all cells and that of an ideal marker gene expressed only in test_group cells; and cosine_score.
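As a rough illustration of the three simplest statistics, the sketch below computes ratio_expr, diff_ratio_expr and log2fc for a single gene from plain Python lists. The helper name expression_stats, the "expressed means value greater than zero" rule, and the pseudocount added before taking the log ratio are assumptions for this example, not confirmed details of the library.

```python
from math import log2


def expression_stats(test_values, control_values, pseudocount=1.0):
    """Hypothetical helper: per-gene summary statistics for two bucket sets."""
    # Fraction of buckets with non-zero expression in each set.
    ratio_expr = sum(v > 0 for v in test_values) / len(test_values)
    ratio_ctrl = sum(v > 0 for v in control_values) / len(control_values)

    mean_test = sum(test_values) / len(test_values)
    mean_ctrl = sum(control_values) / len(control_values)

    return {
        "ratio_expr": ratio_expr,
        "diff_ratio_expr": ratio_expr - ratio_ctrl,
        # Assumption: a pseudocount keeps the ratio finite when a mean is zero.
        "log2fc": log2((mean_test + pseudocount) / (mean_ctrl + pseudocount)),
    }


stats = expression_stats([0, 2, 4, 6], [0, 0, 0, 4])
```

Here three of four test buckets express the gene against one of four control buckets, so ratio_expr is 0.75 and diff_ratio_expr is 0.5; the means are 3 and 1, giving a log2fc of 1.0 with a pseudocount of 1.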

Parameters:
adata

An AnnData object.

test_group

The name of the group (a value in the group column) for which markers are to be found.

control_groups

The list of group name(s) from group against which the test group is compared.

genes

The list of genes that will be used to subset the data for identifying DEGs. If None, all genes will be used.

layer

The layer that will be used to retrieve data for DEG analyses. If None and X_data is not given, .X is used.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes.

X_data

The user supplied data that will be used for marker gene detection directly.

qval_thresh

The maximal threshold of qval to be considered as significant genes.

ratio_expr_thresh

The minimum percentage of buckets expressing the gene in the test group.

diff_ratio_expr_thresh

The minimum difference between the percentages of buckets expressing the gene in the test group and in the control groups.

log2fc_thresh

The minimum expression log2 fold change.

method

Whether to identify differentially expressed genes by comparing the test group with each of the other groups one by one ("pairwise") or with the other groups combined ("multiple", the default). Valid values are "multiple" and "pairwise".

Returns:

A pandas DataFrame of the differential expression analysis result between the two groups.

Raises:

ValueError – If the method is not one of “pairwise” or “multiple”.
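The Benjamini-Hochberg adjustment and the four thresholds can be sketched without any dependencies as follows. This is an illustrative sketch only: the helper names benjamini_hochberg and passes_thresholds are hypothetical, and whether the library applies strict or non-strict comparisons at each threshold is an assumption.

```python
def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    qvals = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        qvals[i] = running_min
    return qvals


def passes_thresholds(row, qval_thresh=0.05, ratio_expr_thresh=0.1,
                      diff_ratio_expr_thresh=0, log2fc_thresh=0):
    """Hypothetical filter; strict comparisons are an assumption."""
    return (
        row["qval"] < qval_thresh
        and row["ratio_expr"] > ratio_expr_thresh
        and row["diff_ratio_expr"] > diff_ratio_expr_thresh
        and row["log2fc"] > log2fc_thresh
    )


qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
row = {"qval": qvals[0], "ratio_expr": 0.75, "diff_ratio_expr": 0.5, "log2fc": 1.0}
keep = passes_thresholds(row)
```

With four tests, the smallest p-value of 0.01 is adjusted to 0.04, which still passes the default qval_thresh of 0.05, so the example row is kept.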

spateo.tools.cluster_degs.find_all_cluster_degs(adata: anndata.AnnData, group: str, genes: List[str] | None = None, layer: str | None = None, X_data: numpy.ndarray | None = None, copy: bool = True, n_jobs: int = 1) → anndata.AnnData

Find marker genes for each group of buckets based on gene expression.

Parameters:
adata

An AnnData object.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes.

genes

The list of genes that will be used to subset the data for identifying DEGs. If None, all genes will be used.

layer

The layer that will be used to retrieve data for DEG analyses. If None and X_data is not given, .X is used.

X_data

The user supplied data that will be used for marker gene detection directly.

copy

If True (default), a new copy of the adata object is returned; if False, adata is updated in place.

n_jobs

int (default=1) The maximum number of concurrently running jobs. With the default of 1, no parallel computing code is used at all. When -1, all CPUs are used.

Returns:

An anndata.AnnData object with a new property cluster_markers in the .uns attribute. It includes a concatenated pandas DataFrame of the differential expression analysis results for all groups and a dictionary whose keys are cluster numbers and whose values are lists of marker genes for the corresponding clusters. Note that these markers are not the top marker genes. To identify the top n marker genes, use st.tl.cluster_degs.top_n_degs(adata, group='louvain').

spateo.tools.cluster_degs.top_n_degs(adata: anndata.AnnData, group: str, custom_score_func: None | Callable = None, sort_by: str | List[str] = 'log2fc', top_n_genes=10, only_deg_list: bool = True)

Find top n marker genes for each group of buckets based on differential gene expression analysis results.

Parameters:
adata

An AnnData object.

group

The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes.

custom_score_func

A custom function to calculate the score based on the DEG analysis results. Note that the columns in adata.uns["cluster_markers"]["deg_tables"] include:

  • "test_group",

  • "control_group",

  • "ratio_expr",

  • "diff_ratio_expr",

  • "person_score",

  • "cosine_score",

  • "jsd_adj_score",

  • "log2fc",

  • "combined_score",

  • "pval",

  • "qval".

sort_by

str or list Column name or names to sort by.

top_n_genes

int The number of top sorted markers.

only_deg_list

bool Whether to only return the marker gene list for each cluster.
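The selection that top_n_degs performs can be pictured with a small sketch over a list of dictionaries standing in for the rows of the DEG table. Everything here is illustrative: the helper name top_n_per_group, the "gene" key, the "custom_score" key, and the descending sort order are assumptions, not the library's confirmed behavior.

```python
from collections import defaultdict


def top_n_per_group(rows, sort_by="log2fc", top_n_genes=10, custom_score_func=None):
    """Hypothetical helper: top-ranked genes per test group."""
    sort_keys = [sort_by] if isinstance(sort_by, str) else list(sort_by)

    grouped = defaultdict(list)
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not modified
        if custom_score_func is not None:
            # Assumption: a custom score, when given, replaces sort_by.
            row["custom_score"] = custom_score_func(row)
        grouped[row["test_group"]].append(row)

    if custom_score_func is not None:
        sort_keys = ["custom_score"]

    result = {}
    for g, members in grouped.items():
        # Assumption: larger values rank first.
        members.sort(key=lambda r: tuple(r[k] for k in sort_keys), reverse=True)
        result[g] = [r["gene"] for r in members[:top_n_genes]]
    return result


rows = [
    {"test_group": "0", "gene": "A", "log2fc": 1.0, "qval": 0.01},
    {"test_group": "0", "gene": "B", "log2fc": 3.0, "qval": 0.04},
    {"test_group": "0", "gene": "C", "log2fc": 2.0, "qval": 0.001},
    {"test_group": "1", "gene": "D", "log2fc": 0.5, "qval": 0.02},
]
by_log2fc = top_n_per_group(rows, top_n_genes=2)
by_qval = top_n_per_group(rows, top_n_genes=2, custom_score_func=lambda r: -r["qval"])
```

Sorting by log2fc ranks genes B and C first in group "0", whereas the custom score (negative qval, so that smaller q-values rank higher) ranks C and A first.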