spateo.segmentation.em

Implementation of the EM algorithm for estimating the parameters of a negative binomial mixture model. See https://iopscience.iop.org/article/10.1088/1742-6596/1324/1/012093/meta

Written by @HailinPan, optimized by @Lioscro.

Attributes

progress

Functions

`lamtheta_to_r`(→ float)	Convert lambda and theta to r.
`muvar_to_lamtheta`(→ Tuple[float, float])	Convert the mean and variance to lambda and theta.
`lamtheta_to_muvar`(→ Tuple[float, float])	Convert the lambda and theta to mean and variance.
`nbn_pmf`(n, p, X)	Helper function to compute PMF of negative binomial distribution.
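`nbn_em`(X[, w, mu, var, max_iter, precision])	Run the EM algorithm to estimate the parameters for background and cell UMIs.
`conditionals`(→ Tuple[numpy.ndarray, numpy.ndarray])	Compute the conditional probabilities, for each pixel, of observing the observed number of UMIs given that the pixel is background/foreground.
`confidence`(→ numpy.ndarray)	Compute confidence of each pixel being a cell, using the parameters estimated by the EM algorithm.
`run_em`(X[, downsample, params, max_iter, precision, bins, seed])	Run the EM algorithm on UMI counts, optionally per bin.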

Module Contents

spateo.segmentation.em.progress

spateo.segmentation.em.lamtheta_to_r(lam: float, theta: float) → float

Convert lambda and theta to r.

spateo.segmentation.em.muvar_to_lamtheta(mu: float, var: float) → Tuple[float, float]

Convert the mean and variance to lambda and theta.

spateo.segmentation.em.lamtheta_to_muvar(lam: float, theta: float) → Tuple[float, float]

Convert lambda and theta to the mean and variance.
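
This page does not spell out the lambda/theta parametrization, so the sketch below illustrates only the related, standard conversion between the mean/variance and the (r, p) parameters used by scipy.stats.nbinom. The helper names muvar_to_rp and rp_to_muvar are hypothetical and are not part of this module.

```python
from typing import Tuple


def muvar_to_rp(mu: float, var: float) -> Tuple[float, float]:
    """Hypothetical helper: mean/variance -> (r, p) in scipy's nbinom convention."""
    if var <= mu:
        # A negative binomial is overdispersed, so its variance exceeds its mean.
        raise ValueError("variance must be greater than the mean")
    r = mu**2 / (var - mu)
    p = mu / var
    return r, p


def rp_to_muvar(r: float, p: float) -> Tuple[float, float]:
    """Hypothetical helper: (r, p) -> mean/variance, the inverse of muvar_to_rp."""
    mu = r * (1.0 - p) / p
    var = mu / p  # equals r * (1 - p) / p**2
    return mu, var


# The default initial background values on this page, mu=10 and var=20:
r, p = muvar_to_rp(10.0, 20.0)
mu, var = rp_to_muvar(r, p)
```

Under this convention the round trip recovers the original mean and variance, which is a convenient sanity check when switching parametrizations.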

spateo.segmentation.em.nbn_pmf(n, p, X)

Helper function to compute PMF of negative binomial distribution.

This function is used instead of calling stats.nbinom() directly because stats.nbinom() behaves unexpectedly when float32 values are used. To avoid this, the function casts the n and p parameters to floats before evaluating the PMF.
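
A minimal sketch of such a wrapper, assuming the cast is a plain float() conversion before the distribution is constructed. The name nbn_pmf_sketch is hypothetical, and the actual implementation may differ in detail.

```python
import numpy as np
from scipy import stats


def nbn_pmf_sketch(n, p, X):
    """Evaluate the negative binomial PMF at X after casting n and p to Python floats."""
    # float() turns NumPy scalars such as np.float32 into double-precision floats.
    return stats.nbinom(n=float(n), p=float(p)).pmf(X)


# float32 parameters are accepted and evaluated in double precision.
pmf = nbn_pmf_sketch(np.float32(10), np.float32(0.5), np.arange(5))
```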

spateo.segmentation.em.nbn_em(X: numpy.ndarray, w: Tuple[float, float] = (0.99, 0.01), mu: Tuple[float, float] = (10.0, 300.0), var: Tuple[float, float] = (20.0, 400.0), max_iter: int = 2000, precision: float = 0.001) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Run the EM algorithm to estimate the parameters for background and cell UMIs.

Parameters:

X: Numpy array containing mixture counts.
w: Initial proportions of background and cell, as a 2-element tuple.
mu: Initial means of the background and cell negative binomial distributions.
var: Initial variances of the background and cell negative binomial distributions.
max_iter: Maximum number of iterations.
precision: Desired precision. Algorithm will stop once this is reached.

Returns:

Estimated w, r, p: the mixture proportions and the negative binomial parameters r and p of each component.
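
To illustrate the overall shape of such a procedure, here is a rough sketch of an EM-style loop for a two-component negative binomial mixture. It is not the implementation behind nbn_em: the M-step uses weighted moment matching instead of an exact maximum-likelihood update, the stopping rule (largest change in w and mu) is an assumption, and the name nb_mixture_em_sketch is hypothetical.

```python
import numpy as np
from scipy import stats


def nb_mixture_em_sketch(X, w=(0.99, 0.01), mu=(10.0, 300.0), var=(20.0, 400.0),
                         max_iter=2000, precision=1e-3):
    """EM-style fit of a two-component NB mixture with a moment-matching M-step."""
    X = np.asarray(X, dtype=float).ravel()
    w, mu, var = (np.array(v, dtype=float) for v in (w, mu, var))
    for _ in range(max_iter):
        # Convert mean/variance to scipy's (r, p) parametrization.
        r, p = mu**2 / (var - mu), mu / var
        # E-step: responsibility of each component for each observation.
        tau = np.stack([
            w[k] * stats.nbinom(n=float(r[k]), p=float(p[k])).pmf(X)
            for k in range(2)
        ])
        tau /= np.maximum(tau.sum(axis=0), 1e-300)
        # M-step (assumption): weighted proportions, means and variances.
        weight = np.maximum(tau.sum(axis=1), 1e-300)
        new_w = weight / weight.sum()
        new_mu = np.maximum((tau * X).sum(axis=1) / weight, 1e-8)
        new_var = (tau * (X - new_mu[:, None]) ** 2).sum(axis=1) / weight
        # Keep each component overdispersed so that r stays finite and positive.
        new_var = np.maximum(new_var, new_mu * (1.0 + 1e-6))
        change = max(np.abs(new_w - w).max(), np.abs(new_mu - mu).max())
        w, mu, var = new_w, new_mu, new_var
        if change < precision:
            break
    return w, mu**2 / (var - mu), mu / var


# Synthetic example: many low-count background pixels, few high-count cell pixels.
rng = np.random.default_rng(0)
counts = np.concatenate([
    rng.negative_binomial(10, 0.5, 5000),  # background, mean 10
    rng.negative_binomial(30, 0.1, 500),   # cells, mean 270
])
w_est, r_est, p_est = nb_mixture_em_sketch(counts)
```

Moment matching keeps the sketch short. An exact M-step for r has no closed form and would need a numerical solver.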

spateo.segmentation.em.conditionals(X: numpy.ndarray, em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]], bins: numpy.ndarray | None = None) → Tuple[numpy.ndarray, numpy.ndarray]

Compute, for each pixel, the conditional probability of its observed UMI count given that the pixel is background, and given that it is foreground.

Parameters:

X: UMI counts per pixel
em_results: Return value of run_em().
bins: Pixel bins, as was passed to run_em().

Returns:

Two Numpy arrays, the first containing the background conditional probabilities and the second the foreground conditional probabilities.

Raises:

SegmentationError – If em_results is a dictionary but bins was not provided.
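
The behaviour described above can be sketched as follows, assuming em_results holds (w, r, p) tuples ordered as (background, foreground) and that the conditionals are negative binomial PMFs. The name conditionals_sketch is hypothetical, ValueError stands in for SegmentationError, and leaving pixels outside the listed bins at zero is an assumption.

```python
import numpy as np
from scipy import stats


def conditionals_sketch(X, em_results, bins=None):
    """Per-pixel P(count | background) and P(count | foreground)."""
    X = np.asarray(X)
    if isinstance(em_results, dict):
        if bins is None:
            # Stand-in for the SegmentationError raised by the real function.
            raise ValueError("bins must be provided when em_results is a dict")
        background = np.zeros(X.shape)
        foreground = np.zeros(X.shape)
        for label, (_, r, p) in em_results.items():
            mask = bins == label
            background[mask] = stats.nbinom(n=float(r[0]), p=float(p[0])).pmf(X[mask])
            foreground[mask] = stats.nbinom(n=float(r[1]), p=float(p[1])).pmf(X[mask])
    else:
        _, r, p = em_results
        background = stats.nbinom(n=float(r[0]), p=float(p[0])).pmf(X)
        foreground = stats.nbinom(n=float(r[1]), p=float(p[1])).pmf(X)
    return background, foreground


# Illustrative parameters: background mean 10, foreground mean 300.
em = ((0.9, 0.1), (10.0, 900.0), (0.5, 0.75))
counts = np.array([[0, 5], [50, 300]])
bg, fg = conditionals_sketch(counts, em)
```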

spateo.segmentation.em.confidence(X: numpy.ndarray, em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]], bins: numpy.ndarray | None = None) → numpy.ndarray

Compute confidence of each pixel being a cell, using the parameters estimated by the EM algorithm.

Parameters:

X: Numpy array containing mixture counts.
em_results: Return value of run_em().
bins: Pixel bins, as was passed to run_em().

Returns:

Numpy array of confidence scores within the range [0, 1].
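
One natural way to obtain a score in [0, 1] from the EM estimates is the posterior probability of the foreground component under Bayes' rule. The sketch below assumes that interpretation, which this page does not confirm, and handles only the non-binned case. The name confidence_sketch is hypothetical.

```python
import numpy as np
from scipy import stats


def confidence_sketch(X, em_results):
    """Posterior probability that each pixel belongs to the foreground component."""
    w, r, p = em_results
    X = np.asarray(X)
    # Joint weights: prior proportion times the conditional PMF of each component.
    bg = w[0] * stats.nbinom(n=float(r[0]), p=float(p[0])).pmf(X)
    fg = w[1] * stats.nbinom(n=float(r[1]), p=float(p[1])).pmf(X)
    total = bg + fg
    # Where both terms underflow to zero, report 0 instead of dividing by zero.
    return np.divide(fg, total, out=np.zeros_like(total), where=total > 0)


# Illustrative parameters: background mean 10, foreground mean 300.
em = ((0.9, 0.1), (10.0, 900.0), (0.5, 0.75))
scores = confidence_sketch(np.array([0, 5, 300]), em)
```

With these illustrative parameters, low counts score close to 0 and counts near the foreground mean score close to 1.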

spateo.segmentation.em.run_em(X: numpy.ndarray, downsample: int | float = 0.001, params: Dict[str, Tuple[float, float]] | Dict[int, Dict[str, Tuple[float, float]]] = dict(w=(0.5, 0.5), mu=(10.0, 300.0), var=(20.0, 400.0)), max_iter: int = 2000, precision: float = 1e-06, bins: numpy.ndarray | None = None, seed: int | None = None) → Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]]

Run the EM algorithm on a downsampled set of pixels, optionally separately for each bin.

Parameters:

X: UMI counts per pixel.
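use_peaks: Whether to use peaks of the convolved image as samples for the EM algorithm. (This parameter does not appear in the signature above.)
min_distance: Minimum distance between peaks when use_peaks=True. (This parameter does not appear in the signature above.)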
downsample: Use at most this many samples. If use_peaks is False, samples are chosen randomly with probability proportional to the log UMI counts. When bins is provided, the size of each bin is used as a scaling factor. If this is a float, then samples are downsampled by this fraction.
params: Initial parameters. This is a dictionary with the keys w, mu and var, corresponding to the initial proportions, means and variances of background and foreground pixels. Each value must be a 2-element tuple containing the values for background and foreground, in that order. This may also be a nested dictionary whose outer keys are the bin labels provided in the bins argument. In this case, each inner dictionary is used as the initial parameters for the corresponding bin.
max_iter: Maximum number of EM iterations.
precision: Stop EM algorithm once desired precision has been reached.
bins: Bins of pixels to estimate separately, such as those obtained by density segmentation. Zeros are ignored.
seed: Random seed.

Returns:

Tuple of parameters estimated by the EM algorithm if bins is not provided. Otherwise, a dictionary mapping each bin label to its tuple of estimated parameters.
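
The downsampling described for the downsample parameter can be sketched as below for the case without bins. This is an illustration under stated assumptions, not the module's code: the name downsample_sketch is hypothetical, and log1p is used instead of a plain log so that zero counts receive zero weight without producing infinities.

```python
import numpy as np


def downsample_sketch(X, downsample=0.001, seed=None):
    """Sample pixel counts with probability proportional to their log UMI count."""
    values = np.asarray(X).ravel()
    n = values.size
    if isinstance(downsample, float):
        size = int(n * downsample)      # a float is treated as a fraction of all pixels
    else:
        size = min(int(downsample), n)  # an int is treated as a maximum sample count
    weights = np.log1p(values.astype(float))  # assumption: log1p so zeros get weight 0
    # Sampling without replacement cannot draw more items than have nonzero weight.
    size = min(size, int(np.count_nonzero(weights)))
    if size == 0:
        return values[:0]
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=size, replace=False, p=weights / weights.sum())
    return values[idx]


# Keep 10% of 1000 pixels, favouring pixels with higher counts.
samples = downsample_sketch(np.arange(1, 1001), downsample=0.1, seed=0)
```

Fixing seed makes the selection reproducible, which matches the role of the seed parameter described above.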