spateo.preprocessing
====================

.. py:module:: spateo.preprocessing


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/spateo/preprocessing/_fast_utils/index
   /autoapi/spateo/preprocessing/aggregate/index
   /autoapi/spateo/preprocessing/auxseg/index
   /autoapi/spateo/preprocessing/filter/index
   /autoapi/spateo/preprocessing/image/index
   /autoapi/spateo/preprocessing/normalize/index
   /autoapi/spateo/preprocessing/transform/index


Functions
---------

.. autoapisummary::

   spateo.preprocessing.normalize_total
   spateo.preprocessing.select_hvf_seurat
   spateo.preprocessing.log1p
   spateo.preprocessing.scale


Package Contents
----------------

.. py:function:: normalize_total(adata: anndata.AnnData, target_sum: Optional[float] = None, norm_factor: Optional[numpy.ndarray] = None, exclude_highly_expressed: bool = False, max_fraction: float = 0.05, key_added: Optional[str] = None, layer: Optional[str] = None, inplace: bool = True, copy: bool = False) -> Union[anndata.AnnData, Dict[str, numpy.ndarray]]

   Normalize counts per cell.

   Normalize each cell by its total counts over all genes, so that every cell has the same total count after
   normalization.

   If `exclude_highly_expressed=True`, very highly expressed genes are excluded from the computation of the
   normalization factor (size factor) for each cell. This is useful because such genes can strongly influence the
   normalized values of all other genes.

   :param adata: The annotated data matrix of shape `n_obs` × `n_vars`. Rows correspond to cells and columns to genes.
   :param target_sum: Desired sum of counts for each cell after normalization. If `None`, each observation (cell)
                      will have a total count equal to the median of the total counts across observations (cells)
                      before normalization. `1e4` is a reasonable choice; if no value is given, the target is
                      derived from the library sizes as described above.
   :param norm_factor: Optional array of shape `n_obs` × `1`, where `n_obs` is the number of observations (cells).
                       Each entry contains a pre-computed normalization factor for that cell.
   :param exclude_highly_expressed: Exclude (very) highly expressed genes from the computation of the normalization
                                    factor for each cell. A gene is considered highly expressed if it accounts for
                                    more than `max_fraction` of the total counts in at least one cell.
   :param max_fraction: If `exclude_highly_expressed=True`, the cutoff threshold for excluding genes.
   :param key_added: Name of the field in `adata.obs` where the normalization factor is stored.
   :param layer: Layer to normalize instead of `X`. If `None`, `X` is normalized.
   :param inplace: Whether to update `adata` or return a dictionary with normalized copies of `adata.X` and
                   `adata.layers`.
   :param copy: Whether to modify a copy of the input object. Not compatible with `inplace=False`.

   :returns: Depending on `inplace`, either a dictionary with normalized copies of `adata.X` and `adata.layers`, or
             `adata` updated with the normalized version of the original `adata.X` and `adata.layers`.


.. py:function:: select_hvf_seurat(data: anndata.AnnData, n_top: int = 2000, min_disp: float = 0.5, max_disp: float = np.inf, min_mean: float = 0.0125, max_mean: float = 7) -> None

   Select highly variable features using the Seurat method.


.. py:function:: log1p(X: Union[anndata.AnnData, numpy.ndarray, scipy.sparse.spmatrix], base: Optional[int] = None, copy: bool = False)

   Computes the logarithm of one plus the data matrix, log(1 + X). The natural logarithm is used unless a different
   base is chosen with the `base` argument.

   :param X: Either a full AnnData object or .X. Rows correspond to cells and columns to genes.
   :param base: Base of the logarithm. If `None`, the natural logarithm is used.
   :param copy: If an :class:`~anndata.AnnData` is passed, determines whether a copy is returned.
   :param layer: Layer to transform. If `None`, .X is transformed. If both `layer` and `obsm` are given, `layer`
                 takes priority.
   :param obsm: Entry in .obsm to transform. If `None`, .X is transformed.

   :returns: If `copy` is True or the input is a numpy array or sparse matrix, returns the updated data array.
             Otherwise, returns the updated AnnData object.


.. py:function:: scale(X: Union[anndata.AnnData, scipy.sparse.spmatrix, numpy.ndarray], zero_center: bool = True, max_value: Optional[float] = None, copy: bool = False, layer: Optional[str] = None, obsm: Optional[str] = None, return_mean_std: bool = False)

   Scale variables to unit variance and optionally zero mean. Variables that are constant across all observations
   will be set to 0.

   :param X: Either a full AnnData object or .X. Rows correspond to cells and columns to genes.
   :param zero_center: If False, variables are not centered.
   :param max_value: Truncate to this value after scaling.
   :param copy: If an :class:`~anndata.AnnData` is passed, determines whether a copy is returned.
   :param layer: Layer to transform. If `None`, .X is transformed. If both `layer` and `obsm` are given, `layer`
                 takes priority.
   :param obsm: Entry in .obsm to transform. If `None`, .X is transformed.
   :param return_mean_std: Set to True to return the computed feature means and feature standard deviations.

   :returns: Depending on `copy`, returns or updates `adata` with a scaled `adata.X`, annotated with `'mean'` and
             `'std'` in `adata.var`.
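
The arithmetic described for `normalize_total`, `log1p`, and `scale` can be sketched in plain Python. This is an illustration only, not spateo's implementation. The helper names `normalize_rows`, `log1p_rows`, and `scale_columns` are hypothetical, nested lists stand in for the cells × genes matrix, and the sketch assumes a population standard deviation, which may differ from what the library uses. Options such as `exclude_highly_expressed`, layers, and sparse inputs are left out.

.. code-block:: python

   import math
   import statistics


   def normalize_rows(counts, target_sum=None):
       """Rescale each row (cell) so that its total equals target_sum.

       If target_sum is None, the median of the per-row totals is used,
       following the documented default. Rows whose total is zero are
       returned unchanged.
       """
       totals = [sum(row) for row in counts]
       if target_sum is None:
           target_sum = statistics.median(totals)
       return [
           [value * target_sum / total for value in row] if total > 0 else list(row)
           for row, total in zip(counts, totals)
       ]


   def log1p_rows(matrix, base=None):
       """Apply log(1 + x) elementwise; divide by log(base) if a base is given."""
       divisor = 1.0 if base is None else math.log(base)
       return [[math.log1p(value) / divisor for value in row] for row in matrix]


   def scale_columns(matrix, zero_center=True, max_value=None):
       """Scale each column (gene) to unit variance, optionally centering it.

       Constant columns are set to 0. Values above max_value are truncated.
       """
       n_rows = len(matrix)
       columns = list(zip(*matrix))
       means = [sum(col) / n_rows for col in columns]
       # Assumption: population standard deviation (ddof = 0).
       stds = [statistics.pstdev(col) for col in columns]
       scaled = []
       for row in matrix:
           new_row = []
           for value, mean, std in zip(row, means, stds):
               if std == 0:
                   new_row.append(0.0)
                   continue
               result = (value - mean) / std if zero_center else value / std
               if max_value is not None:
                   result = min(result, max_value)
               new_row.append(result)
           scaled.append(new_row)
       return scaled


   # Toy matrix: 3 cells (rows) x 2 genes (columns).
   counts = [[1.0, 3.0], [2.0, 2.0], [10.0, 30.0]]
   normalized = normalize_rows(counts)  # per-row totals are 4, 4, 40; median is 4
   logged = log1p_rows(normalized)
   scaled = scale_columns(logged)

The three steps are chained in the order normalize, log-transform, scale, which is a common preprocessing order; the functions documented above can also be applied independently.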