spateo.segmentation.em

Implementation of the EM algorithm for estimating the parameters of a negative binomial mixture model (see https://iopscience.iop.org/article/10.1088/1742-6596/1324/1/012093/meta).

Written by @HailinPan, optimized by @Lioscro.

Module Contents

Functions

lamtheta_to_r(lam, theta) → float

Convert lambda and theta to r.

muvar_to_lamtheta(mu, var) → Tuple[float, float]

Convert the mean and variance to lambda and theta.

lamtheta_to_muvar(lam, theta) → Tuple[float, float]

Convert the lambda and theta to mean and variance.

nbn_pmf(n, p, X)

Helper function to compute PMF of negative binomial distribution.

nbn_em(X[, w, mu, var, max_iter, precision]) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Run the EM algorithm to estimate the parameters for background and cell UMIs.

conditionals(X, em_results[, bins]) → Tuple[numpy.ndarray, numpy.ndarray]

Compute the conditional probabilities, for each pixel, of observing the observed number of UMIs given that the pixel is background/foreground.

confidence(X, em_results[, bins]) → numpy.ndarray

Compute confidence of each pixel being a cell, using the parameters estimated by the EM algorithm.

run_em(X[, downsample, params, max_iter, precision, bins, seed])

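Run the EM algorithm on (optionally downsampled) UMI counts, either for all pixels at once or separately for each bin.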

Attributes

spateo.segmentation.em.progress
spateo.segmentation.em.lamtheta_to_r(lam: float, theta: float) → float

Convert lambda and theta to r.
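The docstring does not state which parametrization lambda and theta belong to. The sketch below assumes theta is the negative binomial success probability p (scipy's convention) and that lambda = -r ln(theta), the Poisson rate in the compound-Poisson representation of the negative binomial. Both the parametrization and the name `lamtheta_to_r_sketch` are assumptions for illustration, not spateo's code.

```python
import math


def lamtheta_to_r_sketch(lam: float, theta: float) -> float:
    # Hypothetical helper. Assumed parametrization: theta is the NB success
    # probability p (0 < p < 1) and lam = -r * ln(theta), so r = -lam / ln(theta).
    return -lam / math.log(theta)
```

For example, `lamtheta_to_r_sketch(-2 * math.log(0.5), 0.5)` recovers r = 2.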

spateo.segmentation.em.muvar_to_lamtheta(mu: float, var: float) → Tuple[float, float]

Convert the mean and variance to lambda and theta.
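A minimal sketch of this conversion, assuming theta is the negative binomial success probability p and lambda = -r ln(theta) (an assumed parametrization; `muvar_to_lamtheta_sketch` is a hypothetical name). It relies on the standard identities mean = r(1 - p)/p and variance = r(1 - p)/p², which give p = mean/variance and r = mean²/(variance - mean).

```python
import math
from typing import Tuple


def muvar_to_lamtheta_sketch(mu: float, var: float) -> Tuple[float, float]:
    # A negative binomial needs var > mu (overdispersion); otherwise r would be
    # negative or infinite.
    if var <= mu:
        raise ValueError("negative binomial requires var > mu")
    theta = mu / var            # success probability p
    r = mu ** 2 / (var - mu)    # size parameter r
    lam = -r * math.log(theta)  # assumed: lam = -r * ln(theta)
    return lam, theta
```

With the default background initialization from `nbn_em` (mu = 10, var = 20), this gives theta = 0.5, r = 10 and lambda = 10 ln 2.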

spateo.segmentation.em.lamtheta_to_muvar(lam: float, theta: float) → Tuple[float, float]

Convert the lambda and theta to mean and variance.
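The inverse direction can be sketched under the same assumed parametrization (theta = p, lambda = -r ln(theta)); `lamtheta_to_muvar_sketch` is a hypothetical name, not spateo's implementation.

```python
import math
from typing import Tuple


def lamtheta_to_muvar_sketch(lam: float, theta: float) -> Tuple[float, float]:
    r = -lam / math.log(theta)          # invert the assumed lam = -r * ln(theta)
    mu = r * (1 - theta) / theta        # negative binomial mean
    var = r * (1 - theta) / theta ** 2  # variance, equal to mu + mu**2 / r
    return mu, var
```

Feeding in lambda = 10 ln 2 and theta = 0.5 returns mean 10 and variance 20, so this round-trips with the mean/variance conversion above.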

spateo.segmentation.em.nbn_pmf(n, p, X)

Helper function to compute PMF of negative binomial distribution.

This function is used instead of calling scipy.stats.nbinom directly because of unexpected behavior when float32 parameters are passed. It casts the n and p parameters to floats before evaluating the PMF.
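A minimal sketch of the casting workaround described above, using `scipy.stats.nbinom.pmf(k, n, p)`. The name `nbn_pmf_sketch` is hypothetical, and this is an illustration of the idea rather than spateo's exact code.

```python
import numpy as np
from scipy import stats


def nbn_pmf_sketch(n, p, X):
    # Cast the shape parameters to Python floats (float64) so float32 scalars
    # such as np.float32(0.3) are never passed through to scipy.
    return stats.nbinom.pmf(X, float(n), float(p))


# Example: float32 parameters, integer counts.
probs = nbn_pmf_sketch(np.float32(2.0), np.float32(0.5), np.array([0, 1, 2]))
```

With n = 2 and p = 0.5 the PMF at 0, 1 and 2 is 0.25, 0.25 and 0.1875.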

spateo.segmentation.em.nbn_em(X: numpy.ndarray, w: Tuple[float, float] = (0.99, 0.01), mu: Tuple[float, float] = (10.0, 300.0), var: Tuple[float, float] = (20.0, 400.0), max_iter: int = 2000, precision: float = 0.001) → Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

Run the EM algorithm to estimate the parameters for background and cell UMIs.

Parameters:
X

NumPy array containing the mixture counts.

w

Initial proportions of background and cell, as a tuple.

mu

Initial means of the background and cell negative binomial distributions.

var

Initial variances of the background and cell negative binomial distributions.

max_iter

Maximum number of iterations.

precision

Desired precision. The algorithm stops once this precision is reached.

Returns:

Tuple of NumPy arrays containing the estimated w, r and p.
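The EM loop for a two-component negative binomial mixture can be sketched as follows. This is not spateo's implementation: it swaps the cited paper's M-step for a weighted method-of-moments update (an approximation, not exact maximum likelihood), it assumes the (background, cell) ordering of the defaults, and `nb_mixture_em_sketch` is a hypothetical name. Convergence is tested on the relative change of w, mu and var, which is also an assumption about what "precision" means.

```python
import numpy as np
from scipy import stats


def nb_mixture_em_sketch(X, w=(0.99, 0.01), mu=(10.0, 300.0), var=(20.0, 400.0),
                         max_iter=2000, precision=1e-3):
    """Hypothetical two-component NB mixture fit, ordered (background, cell).

    E-step: posterior responsibilities. M-step: weighted method of moments.
    Returns (w, r, p) as NumPy arrays, with p in scipy's nbinom convention.
    """
    X = np.asarray(X, dtype=np.float64).ravel()
    w, mu, var = (np.asarray(a, dtype=np.float64) for a in (w, mu, var))
    for _ in range(max_iter):
        p = mu / var              # success probability
        r = mu ** 2 / (var - mu)  # size parameter (needs var > mu)
        # E-step: tau[k, i] = P(component k | X[i]).
        lik = np.stack([w[k] * stats.nbinom.pmf(X, r[k], p[k]) for k in range(2)])
        tau = lik / np.maximum(lik.sum(axis=0), 1e-300)
        # M-step: mixture weights plus weighted mean and variance per component.
        total = np.maximum(tau.sum(axis=1), 1e-300)
        new_w = total / total.sum()
        new_mu = np.maximum((tau * X).sum(axis=1) / total, 1e-6)
        new_var = (tau * (X - new_mu[:, None]) ** 2).sum(axis=1) / total
        # Clamp so each component stays overdispersed (var > mu).
        new_var = np.maximum(new_var, new_mu * (1 + 1e-6))
        converged = all(
            np.allclose(new, old, rtol=precision, atol=0.0)
            for new, old in ((new_w, w), (new_mu, mu), (new_var, var))
        )
        w, mu, var = new_w, new_mu, new_var
        if converged:
            break
    p = mu / var
    r = mu ** 2 / (var - mu)
    return w, r, p
```

The returned r and p can be passed straight to `scipy.stats.nbinom`, and the component means are recovered as r(1 - p)/p.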

spateo.segmentation.em.conditionals(X: numpy.ndarray, em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]], bins: numpy.ndarray | None = None) → Tuple[numpy.ndarray, numpy.ndarray]

Compute the conditional probabilities, for each pixel, of observing the observed number of UMIs given that the pixel is background/foreground.

Parameters:
X

UMI counts per pixel.

em_results

Return value of run_em().

bins

Pixel bins, as was passed to run_em().

Returns:

Two NumPy arrays: the first holds the background conditional probabilities and the second the foreground conditional probabilities.

Raises:

SegmentationError – If em_results is a dictionary but bins was not provided.
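A minimal sketch of how the per-pixel conditionals could be computed, assuming `em_results` is a (w, r, p) tuple whose entries are ordered (background, foreground), or a dictionary mapping bin labels to such tuples. `conditionals_sketch` is a hypothetical name, and it raises `ValueError` where spateo raises `SegmentationError`, so the block runs without spateo.

```python
import numpy as np
from scipy import stats


def conditionals_sketch(X, em_results, bins=None):
    """Hypothetical helper: per-pixel P(X | background) and P(X | foreground)."""
    X = np.asarray(X)
    background = np.zeros(X.shape, dtype=np.float64)
    foreground = np.zeros(X.shape, dtype=np.float64)
    if isinstance(em_results, dict):
        if bins is None:
            # spateo raises SegmentationError here; ValueError keeps this self-contained.
            raise ValueError("bins must be provided when em_results is a dict")
        for label, (_, r, p) in em_results.items():
            mask = bins == label
            background[mask] = stats.nbinom.pmf(X[mask], r[0], p[0])
            foreground[mask] = stats.nbinom.pmf(X[mask], r[1], p[1])
        # Pixels whose bin label has no entry (e.g. bin 0) are left at 0.
    else:
        _, r, p = em_results
        background = stats.nbinom.pmf(X, r[0], p[0])
        foreground = stats.nbinom.pmf(X, r[1], p[1])
    return background, foreground
```

Only r and p enter the conditionals; the mixture weights w matter once the two conditionals are combined into a confidence score.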

spateo.segmentation.em.confidence(X: numpy.ndarray, em_results: Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]], bins: numpy.ndarray | None = None) → numpy.ndarray

Compute confidence of each pixel being a cell, using the parameters estimated by the EM algorithm.

Parameters:
X

NumPy array containing the mixture counts.

em_results

Return value of run_em().

bins

Pixel bins, as was passed to run_em().

Returns:

Numpy array of confidence scores within the range [0, 1].
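One natural way to turn the fitted parameters into a score in [0, 1] is the posterior probability of the cell component, w₁P(X | cell) / (w₀P(X | background) + w₁P(X | cell)). The sketch below assumes that definition and a (background, cell) ordering of w, r and p; both are assumptions, and `confidence_sketch` is a hypothetical name rather than spateo's implementation.

```python
import numpy as np
from scipy import stats


def confidence_sketch(X, w, r, p):
    """Hypothetical helper: posterior probability that each pixel is a cell."""
    X = np.asarray(X)
    bg = w[0] * stats.nbinom.pmf(X, r[0], p[0])  # weighted background likelihood
    fg = w[1] * stats.nbinom.pmf(X, r[1], p[1])  # weighted cell likelihood
    total = bg + fg
    # Where both likelihoods underflow to zero, return 0 instead of NaN.
    return np.divide(fg, total, out=np.zeros(X.shape, dtype=np.float64), where=total > 0)
```

With well-separated components (background mean 10, cell mean 300), a pixel with 5 UMIs scores near 0 and a pixel with 300 UMIs scores near 1.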

spateo.segmentation.em.run_em(X: numpy.ndarray, downsample: int | float = 0.001, params: Dict[str, Tuple[float, float]] | Dict[int, Dict[str, Tuple[float, float]]] = dict(w=(0.5, 0.5), mu=(10.0, 300.0), var=(20.0, 400.0)), max_iter: int = 2000, precision: float = 1e-06, bins: numpy.ndarray | None = None, seed: int | None = None) → Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]] | Dict[int, Tuple[Tuple[float, float], Tuple[float, float], Tuple[float, float]]]

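Run the EM algorithm to estimate negative binomial mixture parameters for background and foreground pixels, either for all pixels at once or separately for each bin.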

Parameters:
X

UMI counts per pixel.

use_peaks

Whether to use peaks of convolved image as samples for the EM algorithm.

min_distance

Minimum distance between peaks when use_peaks=True.

downsample

If this is an integer, use at most this many samples; if it is a float, downsample to this fraction of the pixels. When use_peaks is False, samples are chosen randomly with probability proportional to the log UMI counts. When bins is provided, the size of each bin is used as a scaling factor.

params

Initial parameters. This is a dictionary with the keys w, mu and var, corresponding to the initial proportions, means and variances of background and foreground pixels. Each value must be a 2-element tuple containing the values for background and foreground, in that order. This may also be a nested dictionary whose outermost keys are the bin labels provided in the bins argument; in this case, each inner dictionary is used as the initial parameters for the corresponding bin.

max_iter

Maximum number of EM iterations.

precision

Stop the EM algorithm once the desired precision has been reached.

bins

Bins of pixels to estimate separately, such as those obtained by density segmentation. Pixels with bin label 0 are ignored.

seed

Random seed.

Returns:

Tuple of parameters estimated by the EM algorithm if bins is not provided. Otherwise, a dictionary of parameter tuples keyed by bin label.
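The downsample rule above (an integer caps the sample count, a float is a fraction, and pixels are drawn with probability proportional to log UMI counts) can be sketched with `numpy.random.Generator.choice`. This is a hypothetical helper, `downsample_indices_sketch`, not spateo's code: it assumes `log1p` so empty pixels get zero weight instead of log(0), and it ignores both use_peaks and the per-bin scaling factor.

```python
import numpy as np


def downsample_indices_sketch(X, downsample, seed=None):
    """Hypothetical helper: choose flat pixel indices to feed into EM."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(X, dtype=np.float64).ravel()
    if isinstance(downsample, float):
        n = int(downsample * counts.size)  # fraction of all pixels
    else:
        n = int(downsample)                # absolute cap on the sample count
    weights = np.log1p(counts)             # assumed log1p to avoid log(0)
    # Zero-count pixels get zero weight, so they can never be drawn.
    n = min(n, np.count_nonzero(weights))
    if n == 0:
        return np.array([], dtype=np.intp)
    return rng.choice(counts.size, size=n, replace=False, p=weights / weights.sum())


# Example: keep 10% of a 10 x 10 image, reproducibly.
idx = downsample_indices_sketch(np.arange(100).reshape(10, 10), 0.1, seed=0)
```

Sampling without replacement keeps each pixel from being counted twice, and passing the same seed reproduces the same subset.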