spateo.preprocessing.normalize
==============================

.. py:module:: spateo.preprocessing.normalize

.. autoapi-nested-parse::

   Functions to scale single-cell data, normalize it so that the row-wise sums are identical, or select highly variable features.


Functions
---------

.. autoapisummary::

   spateo.preprocessing.normalize._normalize_data
   spateo.preprocessing.normalize.normalize_total
   spateo.preprocessing.normalize.calcFactorRLE
   spateo.preprocessing.normalize.calcFactorQuantile
   spateo.preprocessing.normalize.calcFactorTMM
   spateo.preprocessing.normalize.calcFactorTMMwsp
   spateo.preprocessing.normalize.calcNormFactors
   spateo.preprocessing.normalize.factor_normalization
   spateo.preprocessing.normalize.calc_mean_and_var
   spateo.preprocessing.normalize.calc_expm1
   spateo.preprocessing.normalize.select_hvf_seurat_single
   spateo.preprocessing.normalize.select_hvf_seurat


Module Contents
---------------

.. py:function:: _normalize_data(X, counts, after=None, copy=False, rows=True, round=False)

   Normalize a sparse data array row-wise or column-wise.

   :param X: Sparse data array to modify.
   :param counts: Array of shape [1, n], where n is the number of buckets or the number of genes, containing the total
                  counts of each cell or of each gene, respectively.
   :param after: Target total count for each cell or each gene. Defaults to `None`, in which case each observation
                 (cell) will have a total count equal to the median of total counts for observations (cells) before
                 normalization.
   :param copy: Whether to operate on a copy of `X`.
   :param rows: Whether to normalize over rows (each cell gets the same total count) or over columns (each gene gets
                the same total count).
   :param round: Whether to round to three decimal places so that the totals more closely match the desired count.
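
   The following is a minimal NumPy sketch of this normalization, not the spateo implementation: it assumes a dense
   array, computes the totals itself rather than taking a `counts` argument, and uses the median default described for
   `after`. `normalize_rows` is a hypothetical name.

   .. code-block:: python

      import numpy as np


      def normalize_rows(X, after=None, rows=True):
          """Scale each row (or column) of a dense array so that it sums to `after`."""
          X = np.asarray(X, dtype=float)
          counts = X.sum(axis=1 if rows else 0)
          if after is None:
              # Default target: the median total count before normalization.
              after = np.median(counts)
          # Rows/columns with a zero total stay all-zero instead of dividing by 0.
          scale = np.divide(after, counts, out=np.zeros_like(counts), where=counts > 0)
          return X * scale[:, None] if rows else X * scale[None, :]


      X = np.array([[1.0, 3.0], [2.0, 2.0], [5.0, 5.0]])
      print(normalize_rows(X).sum(axis=1))  # -> [4. 4. 4.]  (median of totals 4, 4, 10)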


.. py:function:: normalize_total(adata: anndata.AnnData, target_sum: Optional[float] = None, norm_factor: Optional[numpy.ndarray] = None, exclude_highly_expressed: bool = False, max_fraction: float = 0.05, key_added: Optional[str] = None, layer: Optional[str] = None, inplace: bool = True, copy: bool = False) -> Union[anndata.AnnData, Dict[str, numpy.ndarray]]

   Normalize counts per cell: each cell is divided by its total count over all genes, so that every cell has the same
   total count after normalization.

   If `exclude_highly_expressed=True`, very highly expressed genes are excluded from the computation of the
   normalization factor (size factor) for each cell. This is useful because such genes can otherwise dominate the size
   factor and distort the normalized values of all other genes.

   :param adata: The annotated data matrix of shape `n_obs` × `n_vars`. Rows correspond to cells and columns to genes.
   :param target_sum: Desired total count for each cell after normalization. If `None`, each observation (cell) will
                      have a total count equal to the median of total counts for observations (cells) before
                      normalization. 1e4 is a suitable choice.
   :param norm_factor: Optional array of shape `n_obs` × `1`, where `n_obs` is the number of observations (cells). Each
                       entry contains a pre-computed normalization factor for that cell.
   :param exclude_highly_expressed: Exclude (very) highly expressed genes from the computation of the normalization
                                    factor for each cell. A gene is considered highly expressed if it has more than
                                    `max_fraction` of the total counts in at least one cell.
   :param max_fraction: If `exclude_highly_expressed=True`, this is the cutoff threshold for excluding genes.
   :param key_added: Name of the field in `adata.obs` where the normalization factor is stored.
   :param layer: Layer to normalize instead of `X`. If `None`, `X` is normalized.
   :param inplace: Whether to update `adata` or return a dictionary with normalized copies of `adata.X` and
                   `adata.layers`.
   :param copy: Whether to modify a copy of the input object. Not compatible with `inplace=False`.

   :returns: If `inplace=False`, a dictionary with normalized copies of `adata.X` and `adata.layers`; otherwise `adata`
             is updated with normalized versions of the original `adata.X` and `adata.layers`.
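
   A NumPy sketch of the `exclude_highly_expressed` logic described above, assuming a dense cells × genes array.
   `normalize_total_sketch` is a hypothetical helper, not spateo's code; it omits `norm_factor`, layers and AnnData
   handling.

   .. code-block:: python

      import numpy as np


      def normalize_total_sketch(X, target_sum=None, exclude_highly_expressed=False, max_fraction=0.05):
          """Per-cell total-count scaling; cells are rows, genes are columns."""
          X = np.asarray(X, dtype=float)
          totals = X.sum(axis=1)
          if exclude_highly_expressed:
              # A gene is "highly expressed" if it exceeds max_fraction of the total
              # counts in at least one cell; such genes are left out of the size factor.
              high = (X > totals[:, None] * max_fraction).any(axis=0)
              totals = X[:, ~high].sum(axis=1)
          if target_sum is None:
              # Default target: median of the (possibly reduced) per-cell totals.
              target_sum = np.median(totals)
          scale = np.divide(target_sum, totals, out=np.zeros_like(totals), where=totals > 0)
          # Every gene, including excluded ones, is rescaled by its cell's factor.
          return X * scale[:, None]

   Because excluded genes are still rescaled but do not contribute to the size factor, the row sums after
   normalization can differ from `target_sum` when genes are excluded.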


.. py:function:: calcFactorRLE(data: numpy.ndarray) -> numpy.ndarray

   Calculate scaling factors using the Relative Log Expression (RLE) method. Python implementation of the same-named
   function from edgeR:

   Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for
   differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

   :param data: An array-like object representing the data matrix.

   :returns: An array of scaling factors, one for each cell.
   :rtype: numpy.ndarray
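
   For reference, a NumPy sketch of the RLE calculation as defined in edgeR: each sample's factor is the median ratio of
   its counts to the gene-wise geometric means, using only genes with no zero counts. It assumes edgeR's genes × samples
   layout, which may be transposed relative to spateo's input, and `rle_factors` is a hypothetical name. In edgeR, the
   ``calcNormFactors`` function then divides these factors by the library sizes.

   .. code-block:: python

      import numpy as np


      def rle_factors(data):
          """RLE factors for `data` with genes as rows and samples as columns (edgeR layout)."""
          data = np.asarray(data, dtype=float)
          with np.errstate(divide="ignore"):
              # Gene-wise geometric mean across samples; 0 if the gene has any zero count.
              gm = np.exp(np.log(data).mean(axis=1))
          keep = gm > 0
          # Per sample: median ratio of its counts to the geometric-mean pseudo-reference.
          return np.median(data[keep] / gm[keep, None], axis=0)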


.. py:function:: calcFactorQuantile(data: numpy.ndarray, lib_size: float, p: float = 0.95) -> numpy.ndarray

   Calculate scaling factors using the Quantile method. Python implementation of the same-named function from edgeR:

   Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for
   differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

   :param data: An array-like object representing the data matrix.
   :param lib_size: The library size or total count to normalize against.
   :param p: The quantile to compute (default: 0.95).

   :returns: An array of scaling factors, one for each cell.
   :rtype: numpy.ndarray
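
   A NumPy sketch of the edgeR quantile calculation, assuming edgeR's genes × samples layout and one library size per
   sample; `quantile_factors` is a hypothetical name. It defaults to edgeR's `p = 0.75` (the upper quartile) rather than
   the 0.95 in the signature above.

   .. code-block:: python

      import warnings

      import numpy as np


      def quantile_factors(data, lib_size, p=0.75):
          """Quantile factors; genes are rows, samples are columns (edgeR layout)."""
          data = np.asarray(data, dtype=float)
          lib_size = np.asarray(lib_size, dtype=float)
          # p-th quantile of each sample's counts, divided by its library size.
          # NumPy's default "linear" method matches R's default quantile() (type 7).
          f = np.quantile(data, p, axis=0) / lib_size
          if np.any(f == 0):
              warnings.warn("One or more quantiles are zero")
          return f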


.. py:function:: calcFactorTMM(obs: Union[float, numpy.ndarray], ref: Union[float, numpy.ndarray], libsize_obs: Optional[float] = None, libsize_ref: Optional[float] = None, logratioTrim: float = 0.3, sumTrim: float = 0.05, doWeighting: bool = True, Acutoff: float = -10000000000.0) -> float

   Calculate scaling factors using the Trimmed Mean of M-values (TMM) method. Python implementation of the
   same-named function from edgeR:

   Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for
   differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

   :param obs: An array-like object representing the observed library counts.
   :param ref: An array-like object representing the reference library counts.
   :param libsize_obs: The library size of the observed library (default: sum of observed counts).
   :param libsize_ref: The library size of the reference library (default: sum of reference counts).
   :param logratioTrim: The fraction of genes with the most extreme log-ratios (M-values) to trim from each end
                        (default: 0.3).
   :param sumTrim: The fraction of genes with the most extreme average log expression (A-values) to trim from each end
                   (default: 0.05).
   :param doWeighting: Whether to perform weighted TMM estimation (default: True).
   :param Acutoff: Cutoff on the A-values; genes at or below it are discarded before trimming, as are genes with
                   infinite M- or A-values (default: -1e10, which in practice only removes infinite values).

   :returns: Floating-point scaling factor.
   :rtype: float
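
   The TMM steps (compute M- and A-values, drop non-finite values, trim both ends by rank, then take a
   precision-weighted mean of the remaining M-values) can be sketched in NumPy as below. This follows edgeR's published
   algorithm rather than spateo's source; `tmm_factor` and `_rank_average` are hypothetical names, and `_rank_average`
   reproduces R's tie-averaged `rank()`.

   .. code-block:: python

      import numpy as np


      def _rank_average(x):
          """1-based ranks with ties given their average rank, like R's rank()."""
          order = np.argsort(x, kind="mergesort")
          xs = x[order]
          change = np.r_[True, xs[1:] != xs[:-1]]  # True where a run of equal values starts
          starts = np.flatnonzero(np.r_[change, True])  # run starts plus an end sentinel
          run_rank = (starts[:-1] + 1 + starts[1:]) / 2.0
          ranks = np.empty(len(x))
          ranks[order] = run_rank[np.cumsum(change) - 1]
          return ranks


      def tmm_factor(obs, ref, libsize_obs=None, libsize_ref=None,
                     logratioTrim=0.3, sumTrim=0.05, doWeighting=True, Acutoff=-1e10):
          obs = np.asarray(obs, dtype=float)
          ref = np.asarray(ref, dtype=float)
          nO = obs.sum() if libsize_obs is None else libsize_obs
          nR = ref.sum() if libsize_ref is None else libsize_ref
          with np.errstate(divide="ignore", invalid="ignore"):
              logR = np.log2((obs / nO) / (ref / nR))             # M-values
              absE = (np.log2(obs / nO) + np.log2(ref / nR)) / 2  # A-values
              v = (nO - obs) / nO / obs + (nR - ref) / nR / ref   # approx. variance of M
          # Drop genes with a zero in either library (non-finite M or A) or A <= Acutoff.
          fin = np.isfinite(logR) & np.isfinite(absE) & (absE > Acutoff)
          logR, absE, v = logR[fin], absE[fin], v[fin]
          if logR.size == 0 or np.max(np.abs(logR)) < 1e-6:
              return 1.0
          n = logR.size
          loL = np.floor(n * logratioTrim) + 1
          hiL = n + 1 - loL
          loS = np.floor(n * sumTrim) + 1
          hiS = n + 1 - loS
          rM, rA = _rank_average(logR), _rank_average(absE)
          keep = (rM >= loL) & (rM <= hiL) & (rA >= loS) & (rA <= hiS)
          if not keep.any():
              return 1.0
          if doWeighting:
              # Inverse-variance weighted mean of the retained M-values.
              f = np.sum(logR[keep] / v[keep]) / np.sum(1 / v[keep])
          else:
              f = np.mean(logR[keep])
          return float(2.0 ** f)

   For example, with `ref = np.full(100, 50.0)` and ten genes of `obs` multiplied by 10, the ten changed genes are
   trimmed away and the factor is about 0.526 (5000/9500), offsetting the extra library size they contribute.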


.. py:function:: calcFactorTMMwsp(obs: Union[float, numpy.ndarray], ref: Union[float, numpy.ndarray], libsize_obs: Optional[float] = None, libsize_ref: Optional[float] = None, logratioTrim: float = 0.3, sumTrim: float = 0.05, doWeighting: bool = True) -> float

   Calculate scaling factors using the Trimmed Mean of M-values with singleton pairing (TMMwsp) method. Python
   implementation of the same-named function from edgeR:

   Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for
   differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139-140.

   :param obs: An array-like object representing the observed library counts.
   :param ref: An array-like object representing the reference library counts.
   :param libsize_obs: The library size of the observed library (default: sum of observed counts).
   :param libsize_ref: The library size of the reference library (default: sum of reference counts).
   :param logratioTrim: The fraction of genes with the most extreme log-ratios (M-values) to trim from each end
                        (default: 0.3).
   :param sumTrim: The fraction of genes with the most extreme average log expression (A-values) to trim from each end
                   (default: 0.05).
   :param doWeighting: Whether to perform weighted TMM estimation (default: True).

   :returns: Floating-point scaling factor.
   :rtype: float
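
   The pairing step that distinguishes TMMwsp can be sketched as follows, based on edgeR's description of the method:
   genes with a positive count in only one library are paired up between the libraries in decreasing order of count
   size, instead of being discarded as in plain TMM. `pair_singletons` is a hypothetical name and `eps` is an assumed
   threshold for treating a count as positive. A slightly modified TMM is then applied to the re-paired libraries; that
   part is not shown.

   .. code-block:: python

      import numpy as np


      def pair_singletons(obs, ref, eps=1e-14):
          """Singleton-pairing step of TMMwsp (sketch); returns the re-paired libraries."""
          obs = np.asarray(obs, dtype=float)
          ref = np.asarray(ref, dtype=float)
          # Drop genes with zero counts in both libraries.
          informative = (obs > eps) | (ref > eps)
          obs, ref = obs[informative], ref[informative]
          only_ref = (obs <= eps) & (ref > eps)  # zero in obs, positive in ref
          only_obs = (obs > eps) & (ref <= eps)  # positive in obs, zero in ref
          single = only_ref | only_obs
          n_pairs = int(min(only_ref.sum(), only_obs.sum()))
          # Pair the largest singleton counts of each library in decreasing order of size.
          obs_k = np.sort(obs[single])[::-1][:n_pairs]
          ref_k = np.sort(ref[single])[::-1][:n_pairs]
          return (np.concatenate([obs[~single], obs_k]),
                  np.concatenate([ref[~single], ref_k]))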


.. py:function:: calcNormFactors(counts: Union[numpy.ndarray, scipy.sparse.spmatrix], lib_size: Optional[numpy.ndarray] = None, method: str = 'TMM', refColumn: Optional[int] = None, logratioTrim: float = 0.3, sumTrim: float = 0.05, doWeighting: bool = True, Acutoff: float = -10000000000.0, p: float = 0.75) -> numpy.ndarray

   Calculate scaling normalization factors for a count matrix, such as RNA-seq data.
   This is a Python translation of the same-named R function from the edgeR package.

   :param counts: Array or sparse array of shape [n_samples, n_features] containing gene expression data. A sparse
                  array is converted to dense before the calculations.
   :param lib_size: The library sizes for each sample.
   :param method: The normalization method. One of:

                  - ``"TMM"``: trimmed mean of M-values,
                  - ``"TMMwsp"``: trimmed mean of M-values with singleton pairings,
                  - ``"RLE"``: relative log expression, or
                  - ``"upperquartile"``: the quantile method.

                  Defaults to ``"TMM"``.
   :param refColumn: Optional index of the sample to use as the reference for normalization.
   :param logratioTrim: For TMM normalization, the fraction of genes with the most extreme log-ratios (M-values) to
                        trim from each end (default: 0.3).
   :param sumTrim: For TMM normalization, the fraction of genes with the most extreme average log expression
                   (A-values) to trim from each end (default: 0.05).
   :param doWeighting: For TMM normalization, whether to perform weighted estimation (default: True).
   :param Acutoff: For TMM normalization, the cutoff on A-values at or below which genes are discarded before trimming
                   (default: -1e10, which in practice only removes infinite values).
   :param p: The quantile used by ``"upperquartile"`` normalization. Defaults to 0.75.

   :returns: The normalization factors for each sample.
   :rtype: numpy.ndarray
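
   As in edgeR, the raw factors are typically rescaled so that they multiply to one, and each library's effective size
   is its library size times its factor. A sketch of that final step, with `scale_to_unit_product` as a hypothetical
   name (spateo's own code is not reproduced here):

   .. code-block:: python

      import numpy as np


      def scale_to_unit_product(factors):
          """Rescale factors so that their product (equivalently, geometric mean) is 1."""
          f = np.asarray(factors, dtype=float).copy()
          f[np.isnan(f)] = 1.0  # missing factors are set to 1, as in edgeR
          return f / np.exp(np.mean(np.log(f)))


      lib_size = np.array([1000.0, 2000.0, 1500.0])
      factors = scale_to_unit_product([0.9, 1.2, 1.0])
      effective_lib_size = lib_size * factors  # library sizes used downstream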


.. py:function:: factor_normalization(adata: anndata.AnnData, norm_factors: Optional[numpy.ndarray] = None, compute_norm_factors: bool = False, **kwargs)

   Wrapper to apply factor normalization to an AnnData object.

   :param adata: The annotated data matrix of shape `n_obs` × `n_vars`. Rows correspond to cells and columns to genes.
   :param norm_factors: Array of shape (`n_obs`, ), the normalization factors for each sample. If not given, they are
                        computed with :func:`calcNormFactors` using any arguments given in `kwargs`.
   :param compute_norm_factors: Set True to compute (or recompute) normalization factors using :func:`calcNormFactors`.
   :param \*\*kwargs: Keyword arguments passed to :func:`calcNormFactors` or :func:`normalize_total`. Options:

                      - `lib_size`: The library sizes for each sample.
                      - `method`: The normalization method: ``"TMM"`` (trimmed mean of M-values), ``"TMMwsp"``
                        (trimmed mean of M-values with singleton pairings), ``"RLE"`` (relative log expression), or
                        ``"upperquartile"`` (the quantile method). Defaults to ``"TMM"``.
                      - `refColumn`: Optional index of the reference sample for normalization.
                      - `logratioTrim`: For TMM normalization, the fraction of extreme log-ratios to trim from each end
                        (default: 0.3).
                      - `sumTrim`: For TMM normalization, the fraction of extreme average log expression values
                        (A-values) to trim from each end (default: 0.05).
                      - `doWeighting`: Whether to perform weighted TMM estimation (default: True).
                      - `Acutoff`: For TMM normalization, the cutoff on A-values at or below which genes are discarded
                        (default: -1e10, which in practice only removes infinite values).
                      - `p`: The quantile used by ``"upperquartile"`` normalization. Defaults to 0.75.
                      - `target_sum`: Desired total count for each cell after normalization. If `None`, each
                        observation (cell) will have a total count equal to the median of total counts for
                        observations (cells) before normalization.
                      - `exclude_highly_expressed`: Exclude (very) highly expressed genes from the computation of the
                        normalization factor for each cell. A gene is considered highly expressed if it has more than
                        `max_fraction` of the total counts in at least one cell.
                      - `max_fraction`: If `exclude_highly_expressed=True`, the cutoff threshold for excluding genes.
                      - `key_added`: Name of the field in `adata.obs` where the normalization factor is stored.
                      - `layer`: Layer to normalize instead of `X`. If `None`, `X` is normalized.
                      - `inplace`: Whether to update `adata` or return a dictionary with normalized copies of
                        `adata.X` and `adata.layers`.
                      - `copy`: Whether to modify a copy of the input object. Not compatible with `inplace=False`.

   :returns: The normalized AnnData object.
   :rtype: anndata.AnnData
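
   One common way to use such factors, following edgeR's ``cpm()``, is to divide each library by its effective library
   size (library size × normalization factor) and rescale. The sketch below assumes cells are rows and uses a
   hypothetical `target_sum` in place of edgeR's 1e6; whether spateo's `factor_normalization` combines the factors with
   :func:`normalize_total` exactly this way is an assumption, and `apply_norm_factors` is a hypothetical name.

   .. code-block:: python

      import numpy as np


      def apply_norm_factors(X, norm_factors, target_sum=1e4):
          """Divide each cell (row) by its effective library size, then scale to target_sum."""
          X = np.asarray(X, dtype=float)
          eff_lib_size = X.sum(axis=1) * np.asarray(norm_factors, dtype=float)
          return X / eff_lib_size[:, None] * target_sum


      X = np.array([[1.0, 3.0], [2.0, 2.0]])
      normalized = apply_norm_factors(X, [1.0, 2.0])  # rows: [2500, 7500] and [2500, 2500]

   A factor of 2 halves the second cell's normalized values relative to plain total-count scaling, which is how the
   factors correct for composition differences between libraries.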


.. py:function:: calc_mean_and_var(X: Union[scipy.sparse.csr_matrix, numpy.ndarray], axis: int)

   Calculate the mean and variance of `X` along `axis`.

.. py:function:: calc_expm1(X: Union[scipy.sparse.csr_matrix, numpy.ndarray]) -> numpy.ndarray

   Compute ``exp(X) - 1`` element-wise for a dense or sparse matrix, for example to undo a ``log1p`` transform.
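
   A NumPy sketch of what these two helpers compute, under stated assumptions: the variance is obtained as
   E[x^2] - E[x]^2, which only needs sums over the stored values and therefore suits sparse input, and a ddof=1
   correction is assumed. `mean_and_var` is a hypothetical name, not spateo's code.

   .. code-block:: python

      import numpy as np


      def mean_and_var(X, axis=0):
          """Mean and sample variance along `axis` via E[x^2] - E[x]^2."""
          X = np.asarray(X, dtype=float)
          n = X.shape[axis]
          mean = X.mean(axis=axis)
          mean_sq = np.square(X).mean(axis=axis)
          # Population variance, rescaled by n / (n - 1) (ddof=1; an assumption here).
          var = (mean_sq - mean ** 2) * n / (n - 1)
          return mean, var


      X = np.log1p(np.array([[0.0, 2.0], [1.0, 0.0], [3.0, 5.0]]))
      counts = np.expm1(X)  # undo log1p; expm1(0) == 0, so zeros stay zero
      mean, var = mean_and_var(counts, axis=0)

   For a SciPy sparse matrix, applying ``np.expm1`` to the stored values (``X.data``) alone gives the same result as
   the dense computation, because expm1(0) = 0.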


.. py:function:: select_hvf_seurat_single(X: Union[scipy.sparse.csr_matrix, numpy.ndarray], n_top: int, min_disp: float, max_disp: float, min_mean: float, max_mean: float)

   Select highly variable features (HVFs) for a single channel using the Seurat method.
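
   A NumPy sketch of dispersion-based HVF selection in the style of Seurat v2, modeled on scanpy's
   ``highly_variable_genes(flavor="seurat")`` rather than on spateo's source. It undoes the log1p transform, computes
   each gene's log dispersion (variance / mean), z-scores it within equal-width bins of log mean expression, and keeps
   either the `n_top` genes or those inside the mean and dispersion cutoffs. The hypothetical name
   `seurat_hvf_sketch`, the 20 bins, ddof=1 and leaving single-gene bins as NaN are simplifying assumptions.

   .. code-block:: python

      import numpy as np


      def seurat_hvf_sketch(X, n_top=None, min_disp=0.5, max_disp=np.inf,
                            min_mean=0.0125, max_mean=7.0, n_bins=20):
          """Boolean HVF mask for log1p-scaled X (cells x genes), Seurat v2 style."""
          counts = np.expm1(np.asarray(X, dtype=float))  # undo log1p
          mean = counts.mean(axis=0)
          var = counts.var(axis=0, ddof=1)
          mean[mean == 0] = 1e-12  # avoid division by zero
          with np.errstate(divide="ignore", invalid="ignore"):
              disp = np.log(var / mean)  # log dispersion
          disp[~np.isfinite(disp)] = np.nan  # e.g. genes with zero variance
          log_mean = np.log1p(mean)
          # Equal-width bins over log mean; z-score the log dispersion within each bin.
          edges = np.linspace(log_mean.min(), log_mean.max(), n_bins + 1)
          bins = np.searchsorted(edges[1:-1], log_mean, side="left")
          norm_disp = np.full_like(disp, np.nan)
          with np.errstate(divide="ignore", invalid="ignore"):
              for b in np.unique(bins):
                  idx = (bins == b) & ~np.isnan(disp)
                  if idx.sum() > 1:  # simplification: single-gene bins stay NaN
                      d = disp[idx]
                      norm_disp[idx] = (d - d.mean()) / d.std(ddof=1)
          if n_top is not None:
              # Rank by normalized dispersion, treating NaN as the lowest value.
              order = np.argsort(np.where(np.isnan(norm_disp), -np.inf, norm_disp))[::-1]
              mask = np.zeros(norm_disp.shape, dtype=bool)
              mask[order[:n_top]] = True
              return mask
          with np.errstate(invalid="ignore"):
              return ((log_mean > min_mean) & (log_mean < max_mean)
                      & (norm_disp > min_disp) & (norm_disp < max_disp))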


.. py:function:: select_hvf_seurat(data: anndata.AnnData, n_top: int = 2000, min_disp: float = 0.5, max_disp: float = np.inf, min_mean: float = 0.0125, max_mean: float = 7) -> None

   Select highly variable features using the Seurat method.