spateo.preprocessing
Submodules
Functions
- normalize_total: Normalize counts per cell.
- select_hvf_seurat: Select highly variable features using Seurat method.
- log1p: Computes the natural logarithm of the data matrix (unless different base is chosen using the base argument)
- scale: Scale variables to unit variance and optionally zero mean. Variables that are constant across all observations will be set to 0.
Package Contents
- spateo.preprocessing.normalize_total(adata: anndata.AnnData, target_sum: float | None = None, norm_factor: numpy.ndarray | None = None, exclude_highly_expressed: bool = False, max_fraction: float = 0.05, key_added: str | None = None, layer: str | None = None, inplace: bool = True, copy: bool = False) → anndata.AnnData | Dict[str, numpy.ndarray]
Normalize counts per cell. Normalize each cell by total counts over all genes, so that every cell has the same total count after normalization.
If exclude_highly_expressed=True, very highly expressed genes are excluded from the computation of the normalization factor (size factor) for each cell. This is meaningful as these can strongly influence the resulting normalized values for all other genes.
- Parameters:
- adata
The annotated data matrix of shape n_obs × n_vars. Rows correspond to cells and columns to genes.
- target_sum
Desired sum of counts for each cell post-normalization. If None, each observation (cell) will have a total count equal to the median of the total counts of all observations (cells) before normalization. 1e4 is a suitable choice; if no value is given, the median-based value derived from the library sizes is used.
- norm_factor
Optional array of shape n_obs × 1, where n_obs is the number of observations (cells). Each entry contains a pre-computed normalization factor for that cell.
- exclude_highly_expressed
Exclude (very) highly expressed genes for the computation of the normalization factor for each cell. A gene is considered highly expressed if it has more than max_fraction of the total counts in at least one cell.
- max_fraction
If exclude_highly_expressed=True, this is the cutoff threshold for excluding genes.
- key_added
Name of the field in adata.obs where the normalization factor is stored.
- layer
Layer to normalize instead of X. If None, X is normalized.
- inplace
Whether to update adata or return a dictionary with normalized copies of adata.X and adata.layers.
- copy
Whether to modify a copy of the input object instead of the input itself. Not compatible with inplace=False.
- Returns:
Returns dictionary with normalized copies of adata.X and adata.layers or updates adata with normalized version of the original adata.X and adata.layers, depending on inplace.
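The normalization described above can be sketched in plain NumPy. This is only an illustration of the arithmetic, not Spateo's implementation: the helper name normalize_total_sketch is hypothetical, it handles dense arrays only, and leaving cells with zero counts unchanged is an assumption made here to avoid division by zero.

```python
import numpy as np


def normalize_total_sketch(X, target_sum=None, exclude_highly_expressed=False, max_fraction=0.05):
    """Hypothetical NumPy-only sketch of per-cell total-count normalization."""
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1)  # total counts per cell (rows are cells)

    if exclude_highly_expressed:
        # A gene is highly expressed if it holds more than max_fraction
        # of the total counts in at least one cell.
        highly_expressed = (X > max_fraction * totals[:, None]).any(axis=0)
        size_counts = X[:, ~highly_expressed].sum(axis=1)
    else:
        size_counts = totals

    if target_sum is None:
        # Default: median of the per-cell counts before normalization.
        target_sum = np.median(size_counts)

    norm_factor = size_counts / target_sum
    # Assumption: cells without counts are left unchanged.
    norm_factor = np.where(norm_factor == 0, 1.0, norm_factor)
    return X / norm_factor[:, None], norm_factor


counts = np.array([[1, 1, 2], [2, 2, 4], [0, 5, 5]])
normalized, factors = normalize_total_sketch(counts, target_sum=1e4)
median_normalized, _ = normalize_total_sketch(counts)

# One gene dominates the first cell, so it is left out of the size factor.
skewed = np.array([[90, 5, 5], [10, 45, 45]])
robust, robust_factors = normalize_total_sketch(
    skewed, target_sum=10, exclude_highly_expressed=True, max_fraction=0.5
)
```

In the last call the first gene exceeds half of the counts in the first cell, so the size factors are computed from the remaining two genes only. Those two genes then sum to target_sum in every cell, while the excluded gene is still divided by the same factor.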
- spateo.preprocessing.select_hvf_seurat(data: anndata.AnnData, n_top: int = 2000, min_disp: float = 0.5, max_disp: float = np.inf, min_mean: float = 0.0125, max_mean: float = 7) → None
Select highly variable features using Seurat method.
- spateo.preprocessing.log1p(X: anndata.AnnData | numpy.ndarray | scipy.sparse.spmatrix, base: int | None = None, copy: bool = False)
Computes the natural logarithm of one plus the data matrix, log(1 + X), unless a different base is chosen using the base argument.
- Parameters:
- X
Either full AnnData object or .X. Rows correspond to cells and columns to genes.
- base
Base of the logarithm. Natural log is used by default.
- copy
If an AnnData is passed, determines whether a copy is returned.
- layer
Layer to transform. If None, will transform .X. If both layer and obsm are given, layer takes priority.
- obsm
Entry in .obsm to transform. If None, will transform .X.
- Returns:
If copy is True or input is numpy array/sparse matrix, returns updated data array. Otherwise, returns updated AnnData object.
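The effect of the base argument can be sketched with NumPy. The sketch assumes, as the function name suggests, that the transform is log(1 + x), and that a non-default base is applied by dividing by the natural logarithm of the base. The helper name log1p_sketch is hypothetical and is not part of Spateo.

```python
import numpy as np


def log1p_sketch(X, base=None):
    """Hypothetical sketch: log(1 + X), optionally in another base."""
    X = np.asarray(X, dtype=float)
    out = np.log1p(X)  # natural logarithm of 1 + X
    if base is not None:
        # Change of base: log_b(y) = ln(y) / ln(b)
        out = out / np.log(base)
    return out


values = np.array([0.0, 1.0, 3.0])
natural = log1p_sketch(values)        # ln(1), ln(2), ln(4)
base_two = log1p_sketch(values, base=2)  # log2(1), log2(2), log2(4)
```

Zero counts map to zero in any base, which is why this transform is convenient for sparse count matrices.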
- spateo.preprocessing.scale(X: anndata.AnnData | scipy.sparse.spmatrix | numpy.ndarray, zero_center: bool = True, max_value: float | None = None, copy: bool = False, layer: str | None = None, obsm: str | None = None, return_mean_std: bool = False)
Scale variables to unit variance and optionally zero mean. Variables that are constant across all observations will be set to 0.
- Parameters:
- X
Either full AnnData object or .X. Rows correspond to cells and columns to genes.
- zero_center
If False, will not center variables.
- max_value
Truncate to this value after scaling.
- copy
If an AnnData is passed, determines whether a copy is returned.
- layer
Layer to transform. If None, will transform .X. If both layer and obsm are given, layer takes priority.
- obsm
Entry in .obsm to transform. If None, will transform .X.
- return_mean_std
Set True to return computed feature means and feature standard deviations.
- Returns:
Depending on copy, returns a scaled copy of adata or updates adata in place with a scaled adata.X, annotated with ‘mean’ and ‘std’ in adata.var.
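The scaling behaviour described above can be sketched for a dense matrix as follows. This is an illustration under stated assumptions rather than Spateo's code: the helper name scale_sketch is hypothetical, the standard deviation is taken with ddof=1, and max_value is applied as an upper bound only. The library may differ on either of the last two points.

```python
import numpy as np


def scale_sketch(X, zero_center=True, max_value=None):
    """Hypothetical sketch of per-variable scaling to unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # assumption: sample standard deviation

    constant = std == 0
    safe_std = np.where(constant, 1.0, std)  # avoid division by zero

    scaled = (X - mean) / safe_std if zero_center else X / safe_std
    scaled[:, constant] = 0.0  # constant variables are set to 0

    if max_value is not None:
        # Assumption: truncate only values above max_value.
        scaled = np.minimum(scaled, max_value)
    return scaled, mean, std


X = np.array([[1.0, 5.0, 0.0],
              [2.0, 5.0, 10.0],
              [3.0, 5.0, 20.0]])

scaled, mean, std = scale_sketch(X)
uncentered, _, _ = scale_sketch(X, zero_center=False)
clipped, _, _ = scale_sketch(X, max_value=0.5)
```

The second column is constant across all observations, so it comes out as zeros in every variant. The returned mean and std correspond to what return_mean_std=True is documented to expose.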