spateo.preprocessing.filter

Filter functions.

Module Contents

Functions

filter_cells(→ Optional[anndata.AnnData])

Select valid cells based on a collection of filters.

filter_genes(→ Optional[anndata.AnnData])

Select valid genes based on a collection of filters.

filter_by_coordinates(→ Optional[anndata.AnnData])

Select valid cells by coordinates.

spateo.preprocessing.filter.filter_cells(adata: anndata.AnnData, filter_bool: numpy.ndarray | None = None, keep_filtered: bool = False, min_expr_genes: int = 50, max_expr_genes: float = np.inf, min_area: float = 0, max_area: float = np.inf, inplace: bool = False) → anndata.AnnData | None

Select valid cells based on a collection of filters. This function is partially based on dynamo (https://github.com/aristoteleo/dynamo-release).

TODO: What layers need to be considered? Argument shared_count?

Parameters:
adata

AnnData object.

filter_bool

A boolean array from the user to select cells for downstream analysis.

keep_filtered

Whether to keep cells that don’t pass the filtering in the adata object.

min_expr_genes

Minimal number of genes with expression for a cell in the data from X.

max_expr_genes

Maximal number of genes with expression for a cell in the data from X.

min_area

Minimum area of a cell in the data from X.

max_area

Maximum area of a cell in the data from X.

inplace

Whether to perform the computation in place or return the result.

Returns:

An updated AnnData object with pass_basic_filter as a new column in obs that indicates which cells are selected for downstream analysis. If keep_filtered is False, adata is subset to only the cells that pass filtering.
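The thresholds above can be read as one boolean mask per cell. The following is a minimal plain-Python sketch of that logic, not the spateo implementation. The helper name cell_filter_mask, the toy data, the inclusive bounds, the "count > 0 means expressed" rule, and the AND-combination with filter_bool are all assumptions made for illustration.

```python
import math


def cell_filter_mask(counts, areas, filter_bool=None, min_expr_genes=50,
                     max_expr_genes=math.inf, min_area=0, max_area=math.inf):
    """Hypothetical helper: return one boolean per cell (True = passes)."""
    mask = []
    for i, (row, area) in enumerate(zip(counts, areas)):
        # Assumption: a gene counts as expressed in a cell if its value is > 0.
        n_expressed = sum(1 for value in row if value > 0)
        # Assumption: both bounds are inclusive.
        passes = (min_expr_genes <= n_expressed <= max_expr_genes
                  and min_area <= area <= max_area)
        # Assumption: a user-supplied mask is combined with a logical AND.
        if filter_bool is not None:
            passes = passes and bool(filter_bool[i])
        mask.append(passes)
    return mask


# Toy data: 3 cells (rows) x 4 genes (columns), plus one area per cell.
cell_counts = [[1, 0, 2, 0], [0, 0, 0, 3], [5, 1, 1, 1]]
cell_areas = [10.0, 12.0, 500.0]

cell_mask = cell_filter_mask(cell_counts, cell_areas,
                             min_expr_genes=2, max_area=100.0)

# Mimics keep_filtered=False: only rows whose mask entry is True survive.
kept_cells = [row for row, ok in zip(cell_counts, cell_mask) if ok]
```

Here only the first cell passes: the second expresses a single gene and the third exceeds max_area. With keep_filtered=True, the mask would be recorded but no rows dropped.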

spateo.preprocessing.filter.filter_genes(adata: anndata.AnnData, filter_bool: numpy.ndarray | None = None, keep_filtered: bool = False, min_cells: int = 1, max_cells: float = np.inf, min_avg_exp: float = 0, max_avg_exp: float = np.inf, min_counts: float = 0, max_counts: float = np.inf, inplace: bool = False) → anndata.AnnData | None

Select valid genes based on a collection of filters. This function is partially based on dynamo (https://github.com/aristoteleo/dynamo-release).

Parameters:
adata

AnnData object.

filter_bool

A boolean array from the user to select genes for downstream analysis.

keep_filtered

Whether to keep genes that don’t pass the filtering in the adata object.

min_cells

Minimal number of cells with expression in the data from X.

max_cells

Maximal number of cells with expression in the data from X.

min_avg_exp

Minimal average expression across cells for the data.

max_avg_exp

Maximal average expression across cells for the data.

min_counts

Minimal number of counts (UMI/expression) for the data.

max_counts

Maximal number of counts (UMI/expression) for the data.

inplace

Whether to perform the computation in place or return the result.

Returns:

An updated AnnData object with pass_basic_filter as a new column in var that indicates which genes are selected for downstream analysis. If keep_filtered is False, adata is subset to only the genes that pass filtering.
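The gene-level thresholds can be sketched the same way, with one boolean per gene (column). This is an illustrative plain-Python sketch under stated assumptions, not the library's code. The helper name gene_filter_mask, the toy matrix, the inclusive bounds, and the reading of min_counts/max_counts as limits on a gene's total count across cells are all assumptions.

```python
import math


def gene_filter_mask(counts, filter_bool=None, min_cells=1, max_cells=math.inf,
                     min_avg_exp=0, max_avg_exp=math.inf,
                     min_counts=0, max_counts=math.inf):
    """Hypothetical helper: return one boolean per gene (True = passes)."""
    n_cells = len(counts)
    mask = []
    # zip(*counts) iterates over columns, i.e. over genes.
    for j, column in enumerate(zip(*counts)):
        n_expressing = sum(1 for value in column if value > 0)
        total = sum(column)      # assumed meaning of "counts" for a gene
        average = total / n_cells  # average expression across all cells
        # Assumption: every bound is inclusive.
        passes = (min_cells <= n_expressing <= max_cells
                  and min_avg_exp <= average <= max_avg_exp
                  and min_counts <= total <= max_counts)
        # Assumption: a user-supplied mask is combined with a logical AND.
        if filter_bool is not None:
            passes = passes and bool(filter_bool[j])
        mask.append(passes)
    return mask


# Toy data: 3 cells (rows) x 4 genes (columns).
gene_counts = [[1, 0, 2, 0], [0, 0, 0, 3], [5, 0, 1, 1]]

gene_mask = gene_filter_mask(gene_counts, min_cells=2, max_counts=5)
```

In this toy matrix the first gene fails max_counts (total 6) and the second is expressed in no cell, so only the last two genes pass.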

spateo.preprocessing.filter.filter_by_coordinates(adata: anndata.AnnData, filter_bool: numpy.ndarray | None = None, keep_filtered: bool = False, x_range: Sequence[float] = (-np.inf, np.inf), y_range: Sequence[float] = (-np.inf, np.inf), inplace: bool = False) → anndata.AnnData | None

Select valid cells by coordinates. TODO: lasso tool

Parameters:
adata

AnnData object.

filter_bool

A boolean array from the user to select cells for downstream analysis.

keep_filtered

Whether to keep cells that don’t pass the filtering in the adata object.

x_range

The X-axis range of cell coordinates.

y_range

The Y-axis range of cell coordinates.

inplace

Whether to perform the computation in place or return the result.

Returns:

An updated AnnData object with pass_basic_filter as a new column in obs that indicates which cells are selected for downstream analysis. If keep_filtered is False, adata is subset to only the cells that pass filtering.
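Coordinate filtering amounts to a bounding-box test on each cell's position. The sketch below illustrates that idea in plain Python and is not the spateo implementation. The helper name coordinate_filter_mask, the toy coordinates, and the inclusive range endpoints are assumptions, and where spateo reads the coordinates from inside adata is not stated on this page.

```python
import math


def coordinate_filter_mask(coords, filter_bool=None,
                           x_range=(-math.inf, math.inf),
                           y_range=(-math.inf, math.inf)):
    """Hypothetical helper: True for cells inside the x/y bounding box."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    mask = []
    for i, (x, y) in enumerate(coords):
        # Assumption: range endpoints are inclusive.
        passes = x_min <= x <= x_max and y_min <= y <= y_max
        # Assumption: a user-supplied mask is combined with a logical AND.
        if filter_bool is not None:
            passes = passes and bool(filter_bool[i])
        mask.append(passes)
    return mask


# Toy data: one (x, y) coordinate pair per cell.
cell_coords = [(0.0, 0.0), (5.0, 20.0), (12.0, 3.0)]

coord_mask = coordinate_filter_mask(cell_coords,
                                    x_range=(0, 10), y_range=(0, 10))
```

With the default infinite ranges every cell passes, so restricting only one axis leaves the other unconstrained.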