spateo.segmentation.em
Implementation of the EM algorithm to estimate the parameters of a negative binomial mixture model. https://iopscience.iop.org/article/10.1088/1742-6596/1324/1/012093/meta
Written by @HailinPan, optimized by @Lioscro.
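The model fitted by this module can be summarised as a two-component mixture. The block below is a sketch of that model. It assumes the (r, p) negative binomial parametrization used by scipy.stats.nbinom (the distribution nbn_pmf wraps) and indexes background as component 0 and cell as component 1. Both conventions are assumptions, not statements taken from the spateo source.

```latex
\begin{aligned}
P(X = x) &= w_0 \, \mathrm{NB}(x; r_0, p_0) + w_1 \, \mathrm{NB}(x; r_1, p_1), \qquad w_0 + w_1 = 1, \\
\mathrm{NB}(x; r, p) &= \frac{\Gamma(x + r)}{x! \, \Gamma(r)} \, p^{r} (1 - p)^{x}, \\
\mu &= \frac{r (1 - p)}{p}, \qquad \sigma^{2} = \frac{r (1 - p)}{p^{2}} = \frac{\mu}{p}.
\end{aligned}
```

Because sigma^2 = mu / p exceeds mu whenever 0 < p < 1, the initial variances passed to nbn_em and run_em must be larger than the corresponding means, as they are in the defaults (20 > 10 and 400 > 300).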
Module Contents
Functions
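| Function | Description |
|---|---|
| lamtheta_to_r(lam, theta) | Convert lambda and theta to r. |
| muvar_to_lamtheta(mu, var) | Convert the mean and variance to lambda and theta. |
| lamtheta_to_muvar(lam, theta) | Convert the lambda and theta to mean and variance. |
| nbn_pmf(n, p, X) | Helper function to compute the PMF of a negative binomial distribution. |
| nbn_em(X[, w, mu, var, max_iter, precision]) | Run the EM algorithm to estimate the parameters for background and cell UMIs. |
| conditionals(X, em_results[, bins]) | Compute the conditional probabilities, for each pixel, of observing the observed number of UMIs given that the pixel is background/foreground. |
| confidence(X, em_results[, bins]) | Compute confidence of each pixel being a cell, using the parameters estimated by the EM algorithm. |
| run_em(X[, downsample, params, max_iter, precision, bins, seed]) | Run the EM algorithm to estimate background and foreground (cell) parameters from UMI counts. |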
- spateo.segmentation.em.lamtheta_to_r(lam: float, theta: float) -> float
Convert lambda and theta to r.
- spateo.segmentation.em.muvar_to_lamtheta(mu: float, var: float) -> Tuple[float, float]
Convert the mean and variance to lambda and theta.
- spateo.segmentation.em.lamtheta_to_muvar(lam: float, theta: float) -> Tuple[float, float]
Convert the lambda and theta to mean and variance.
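The three conversion functions above relate the mean/variance parametrization to a (lambda, theta) parametrization. The sketch below re-implements the documented signatures under one assumed convention: the compound Poisson-logarithmic representation of the negative binomial, in which theta = p = mu / var, r = mu^2 / (var - mu) and lambda = -r ln(theta). These formulas are an assumption about the convention, not copied from spateo, and they require var > mu.

```python
import math
from typing import Tuple


def lamtheta_to_r(lam: float, theta: float) -> float:
    # Assumed relation lam = -r * ln(theta), solved for r.
    return -lam / math.log(theta)


def muvar_to_lamtheta(mu: float, var: float) -> Tuple[float, float]:
    # A negative binomial needs overdispersion (var > mu); otherwise r is undefined.
    if var <= mu:
        raise ValueError("negative binomial requires var > mu")
    r = mu**2 / (var - mu)
    theta = mu / var  # theta plays the role of the success probability p
    lam = -r * math.log(theta)
    return lam, theta


def lamtheta_to_muvar(lam: float, theta: float) -> Tuple[float, float]:
    r = lamtheta_to_r(lam, theta)
    mu = r * (1 - theta) / theta  # NB mean r(1 - p) / p
    var = mu / theta  # NB variance r(1 - p) / p^2 = mu / p
    return mu, var
```

For example, muvar_to_lamtheta(10.0, 20.0) gives theta = 0.5 and r = 10, and converting back with lamtheta_to_muvar recovers the mean 10 and variance 20 up to floating-point error.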
- spateo.segmentation.em.nbn_pmf(n, p, X)
Helper function to compute the PMF of a negative binomial distribution.
This function is used instead of calling stats.nbinom() directly because stats.nbinom() behaves unexpectedly when float32 values are passed. This function casts the n and p parameters to floats before evaluating the PMF.
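A minimal sketch of such a helper, assuming it simply casts n and p to Python floats and then evaluates scipy.stats.nbinom. This mirrors the documented behaviour, not spateo's exact source.

```python
import numpy as np
from scipy import stats


def nbn_pmf(n, p, X):
    """Negative binomial PMF with n and p cast to Python floats first."""
    # Casting turns e.g. numpy.float32 scalars into 64-bit Python floats.
    n = float(n)
    p = float(p)
    return stats.nbinom(n, p).pmf(np.asarray(X))
```

With float32 inputs, nbn_pmf(np.float32(2), np.float32(0.5), np.array([0, 1, 2])) evaluates NB(2, 0.5) at 0, 1 and 2, which is 0.25, 0.25 and 0.1875.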
- spateo.segmentation.em.nbn_em(X: numpy.ndarray, w: Tuple[float, float] = (0.99, 0.01), mu: Tuple[float, float] = (10.0, 300.0), var: Tuple[float, float] = (20.0, 400.0), max_iter: int = 2000, precision: float = 0.001) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]
Run the EM algorithm to estimate the parameters for background and cell UMIs.
- Parameters:
- X
NumPy array containing the mixture counts.
- w
Initial proportions of background and cell as a tuple.
- mu
Initial means of the background and cell negative binomial distributions.
- var
Initial variances of the background and cell negative binomial distributions.
- max_iter
Maximum number of iterations.
- precision
Desired precision. The algorithm stops once this precision is reached, even if max_iter has not been hit.
- Returns:
Estimated w, r, p.
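The sketch below shows the overall shape of an EM loop for a two-component negative binomial mixture, using the same arguments as nbn_em. It is a simplification under stated assumptions: the M-step matches weighted means and variances (method of moments) rather than the maximum-likelihood updates of the referenced paper, and "precision" is interpreted as the largest change in w and mu between iterations. The name nbn_em_sketch is hypothetical.

```python
from typing import Tuple

import numpy as np
from scipy import stats


def nbn_em_sketch(
    X: np.ndarray,
    w: Tuple[float, float] = (0.99, 0.01),
    mu: Tuple[float, float] = (10.0, 300.0),
    var: Tuple[float, float] = (20.0, 400.0),
    max_iter: int = 2000,
    precision: float = 0.001,
) -> Tuple[np.ndarray, np.ndarray, np.ndarray]:
    x = np.asarray(X, dtype=np.float64).ravel()
    w = np.asarray(w, dtype=np.float64)
    mu = np.asarray(mu, dtype=np.float64)
    var = np.asarray(var, dtype=np.float64)
    for _ in range(max_iter):
        # Convert mean/variance to scipy's (r, p); keep var > mu (overdispersion).
        var = np.maximum(var, mu * (1 + 1e-6))
        r = mu**2 / (var - mu)
        p = mu / var
        # E-step: responsibility of each component (row) for each count (column).
        lik = np.stack([w[k] * stats.nbinom.pmf(x, r[k], p[k]) for k in range(2)])
        resp = lik / np.maximum(lik.sum(axis=0), 1e-300)
        # M-step (moment matching): weighted proportions, means and variances.
        nk = resp.sum(axis=1)
        new_w = nk / nk.sum()
        new_mu = (resp @ x) / nk
        new_var = (resp * (x - new_mu[:, None]) ** 2).sum(axis=1) / nk
        delta = max(np.abs(new_w - w).max(), np.abs(new_mu - mu).max())
        w, mu, var = new_w, new_mu, new_var
        if delta < precision:
            break
    var = np.maximum(var, mu * (1 + 1e-6))
    r = mu**2 / (var - mu)
    p = mu / var
    return w, r, p
```

On well-separated synthetic data (for example 1800 background counts with mean 10 and 200 cell counts with mean 300), the loop settles within a few iterations, and the returned (r, p) pairs reproduce the component means via r(1 - p)/p.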
- spateo.segmentation.em.conditionals(X: numpy.ndarray, em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]], bins: numpy.ndarray | None = None) -> Tuple[numpy.ndarray, numpy.ndarray]
Compute the conditional probabilities, for each pixel, of observing the observed number of UMIs given that the pixel is background/foreground.
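- Parameters:
- X
UMI counts per pixel.
- em_results
Parameters estimated by the EM algorithm, as returned by run_em: either a single tuple of parameters or a dictionary of such tuples keyed by bin label.
- bins
Bins of pixels, as passed to run_em. Required when em_results is a dictionary.
- Returns:
Two NumPy arrays, the first corresponding to the background conditional probabilities, and the second to the foreground conditional probabilities.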
- Raises:
SegmentationError – If em_results is a dictionary but bins was not provided.
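A minimal sketch of how these conditionals can be computed, assuming em_results holds (w, r, p) with background first and that each probability is the negative binomial PMF of the observed count under the corresponding component. The function name conditionals_sketch is hypothetical, and it raises ValueError where spateo raises SegmentationError so that the example stays standalone.

```python
from typing import Dict, Optional, Tuple, Union

import numpy as np
from scipy import stats

Params = Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]


def conditionals_sketch(
    X: np.ndarray,
    em_results: Union[Params, Dict[int, Params]],
    bins: Optional[np.ndarray] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    X = np.asarray(X)
    if isinstance(em_results, dict):
        if bins is None:
            raise ValueError("bins must be provided when em_results is a dictionary")
        # Pixels whose bin label has no entry (e.g. label 0) keep probability 0.
        bg = np.zeros(X.shape, dtype=np.float64)
        fg = np.zeros(X.shape, dtype=np.float64)
        for label, params in em_results.items():
            mask = bins == label
            bg[mask], fg[mask] = conditionals_sketch(X[mask], params)
        return bg, fg
    _w, r, p = em_results
    # P(count | background) and P(count | foreground) under each NB component.
    bg = stats.nbinom.pmf(X, r[0], p[0])
    fg = stats.nbinom.pmf(X, r[1], p[1])
    return bg, fg
```

The per-bin branch reuses the single-tuple branch on each masked subset, which keeps the binned and unbinned cases consistent.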
- spateo.segmentation.em.confidence(X: numpy.ndarray, em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]], bins: numpy.ndarray | None = None) -> numpy.ndarray
Compute confidence of each pixel being a cell, using the parameters estimated by the EM algorithm.
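One natural way to turn the EM parameters into a per-pixel confidence is the posterior probability of the cell component via Bayes' rule, w1 * P(x | cell) / (w0 * P(x | background) + w1 * P(x | cell)). The sketch below assumes that definition and handles only the single-tuple (unbinned) case; both the formula choice and the name confidence_sketch are assumptions, not spateo's confirmed implementation.

```python
from typing import Tuple

import numpy as np
from scipy import stats


def confidence_sketch(
    X: np.ndarray,
    em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]],
) -> np.ndarray:
    w, r, p = em_results
    X = np.asarray(X)
    # Prior-weighted likelihoods of background (index 0) and cell (index 1).
    bg = w[0] * stats.nbinom.pmf(X, r[0], p[0])
    fg = w[1] * stats.nbinom.pmf(X, r[1], p[1])
    total = bg + fg
    # Bayes' rule; pixels where both terms underflow to 0 get confidence 0.
    return np.divide(fg, total, out=np.zeros_like(total, dtype=np.float64), where=total > 0)
```

For binned results, the same computation would be applied separately within each bin using that bin's parameters.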
- spateo.segmentation.em.run_em(X: numpy.ndarray, downsample: int | float = 0.001, params: Dict[str, Tuple[float, float]] | Dict[int, Dict[str, Tuple[float, float]]] = dict(w=(0.5, 0.5), mu=(10.0, 300.0), var=(20.0, 400.0)), max_iter: int = 2000, precision: float = 1e-06, bins: numpy.ndarray | None = None, seed: int | None = None) -> Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]]
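Run the EM algorithm to estimate background and foreground (cell) parameters from UMI counts, optionally separately for each bin of pixels.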
- Parameters:
- X
UMI counts per pixel.
- downsample
If an int, use at most this many samples; if a float, downsample to this fraction of pixels. Samples are chosen randomly with probability proportional to the log UMI counts. When bins is provided, the size of each bin is used as a scaling factor.
- params
Initial parameters. This is a dictionary with keys w, mu and var, corresponding to the initial proportions, means and variances of background and foreground pixels. Each value must be a 2-element tuple containing the values for background and foreground, in that order. This may also be a nested dictionary whose outermost keys are the bin labels provided in the bins argument; in this case, each inner dictionary is used as the initial parameters for the corresponding bin.
- max_iter
Maximum number of EM iterations.
- precision
Stop the EM algorithm once the desired precision has been reached.
- bins
Bins of pixels to estimate separately, such as those obtained by density segmentation. Pixels with a bin label of 0 are ignored.
- seed
Random seed.
- Returns:
Tuple of parameters estimated by the EM algorithm if bins is not provided. Otherwise, a dictionary of parameter tuples with bin labels as keys.
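The downsampling step described for run_em can be illustrated with a hypothetical helper, downsample_pixels, that draws pixel counts without replacement with probability proportional to their log UMI counts. Several details are assumptions: log1p is used so that zero-count pixels get weight 0 instead of log(0) = -inf, a float downsample is taken as a fraction of all pixels, and the per-bin size scaling is not shown. The params dictionaries at the end illustrate the documented format; the values for bin 2 are illustrative only.

```python
from typing import Optional, Union

import numpy as np


def downsample_pixels(
    X: np.ndarray, downsample: Union[int, float] = 0.001, seed: Optional[int] = None
) -> np.ndarray:
    """Hypothetical helper: sample pixel counts with probability proportional to log1p(X)."""
    x = np.asarray(X).ravel()
    weights = np.log1p(x.astype(np.float64))
    n_nonzero = int(np.count_nonzero(weights))
    if n_nonzero == 0:
        raise ValueError("no pixels with nonzero counts to sample")
    # A float is treated as a fraction of all pixels, an int as an absolute cap.
    n = int(x.size * downsample) if isinstance(downsample, float) else int(downsample)
    # Sampling without replacement cannot exceed the number of nonzero weights.
    n = max(1, min(n, n_nonzero))
    rng = np.random.default_rng(seed)
    idx = rng.choice(x.size, size=n, replace=False, p=weights / weights.sum())
    return x[idx]


# Initial parameters in the documented format: (background, foreground) pairs.
params = dict(w=(0.5, 0.5), mu=(10.0, 300.0), var=(20.0, 400.0))
# Nested form keyed by bin label (illustrative values for bin 2).
params_by_bin = {1: params, 2: dict(w=(0.9, 0.1), mu=(10.0, 300.0), var=(20.0, 400.0))}
```

Weighting by log counts rather than raw counts keeps high-count pixels over-represented in the sample without letting a few very bright pixels dominate it.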