Cell segmentation#

This section describes the technical details behind Spateo’s cell segmentation pipeline.

Alignment of stain and RNA coordinates#

Correct alignment between the stain and RNA coordinates is imperative to obtain correct cell segmentation. Slight misalignments can cause inaccuracies when aggregating UMIs in a patch of pixels into a single cell. We’ve found that for the most part, the alignments produced by the spatial assays themselves are quite good, but there are sometimes slight misalignments. Therefore, Spateo includes a couple of strategies to further refine this alignment to achieve the best possible cell segmentation results. In both cases, the stain image is considered as the “source” image (a.k.a. the image that will be transfomed), and the RNA coordinates represent the “target” image. This convention was chosen because the original RNA coordinates should be maintained as much as possible.

Rigid alignment#

The goal is to find the affine transformation of the stain such that the normalized cross-correlation (NCC) between the stain and RNA is minimized. Mathematically, we wish to find matrix \(T\) such that

$$\begin{bmatrix} x_{target}\\y_{target}\\1 \end{bmatrix} = T\begin{bmatrix} x_{source}\\y_{source}\\1 \end{bmatrix}$$

The matrix \(T\) is optimized with PyTorch’s automatic differentiation capabilities. Internally, \(T\) is represented as a \(2 \times 3\) matrix, following PyTorch convention, and the affine_grid and grid_sample functions are used.

Non-rigid alignment#

The goal is to find a transformation from the source coordinates to the target coordinates (also minimizing the NCC as with ) where the transformation is defined using a set of reference (“control”) points on the source image arranged in a grid (or a mesh) and displacements in each dimension (X and Y) from these points to a set of target points on the target image. Then, to obtain the transformation from the source to the target, the displacement of every source coordinate is computed by thin-plate-spline (TPS) interpolation. Here is an illustration of an image warped using control points and TPS interpolation [1].

Thin Plate Spline

The displacements for each of the control points are also optimized using PyTorch. Internally, the thin_plate_spline module in the Kornia library is used for TPS calculations.

Stain segmentation#

Segmentation functions

Labeling functions

Deep learning functions

Watershed-based segmentation#

Watershed-based nuclei segmentation works in three broad steps. First, a boolean mask, indicating where the nuclei are, is constructed (a.k.a. segmentation). Second, Watershed markers are obtained by iteratively eroding the mask. Finally, the Watershed algorithm is used to label each nuclei (a.k.a. labeling). We will go over each of these steps in more detail.


First, the boolean mask is obtained by a combination of global and local thresholding (spateo.segmentation.mask_nuclei_from_stain()). A Multi-Otsu threshold is obtained and the first class is classified as the “background” pixels. Then, adaptive thresholding (with a gaussian-weighted sum) is applied to identify nuclei. The downside of the adaptive (“local”) thresholding approach is that regions with sparse nuclei tend to be very noisy. Therefore, the background mask is used to remove noise in background regions by taking the pixels that are False in the background mask (indicating foreground) and True in the local mask. The final mask is obtained by applying the morphological closing and opening operations to fill in holes and remove noise.


With the nuclei mask in hand, we need to identify initial markers in order to run the Watershed algorithm. Spateo offers a few different ways to obtain these markers. spateo.segmentation.find_peaks() can be used to find local peaks in the staining image and use those as the Watershed markers, or spateo.segmentation.find_peaks_from_mask() can be used to find markers from the nuclei mask itself by finding the peaks in the distance transform (spateo.segmentation.find_peaks_with_erosion() approximates this process and may be faster in some cases).

Finally, the Watershed algorithm is applied on a blurred stain image, limiting the labels to within the mask obtained in the first step, using the initial markers from the previous step. This is done using the function spateo.segmentation.watershed().

Deep-learning-based segmentation#

Spateo provides a variety of existing deep-learning-based segmentation models. These include:

In our experiments, StarDist performed the most consistently and is the recommended method to try first. We’ve also found that performing Contrast Limited Adaptive Histogram Equalization (CLAHE) improves performance of these models, and is implemented with the equalize argument.

Integrating both segmentation approaches#

The Watershed-based and deep learning-based segmentation approaches have their own advantages. The former tends to perform reasonably well on most staining images and does not require any kind of training. However, the resulting segmentation tends to be “choppy” and sometimes breaks convexity. The latter performs very well on images similar to those they were trained on and results in more evenly-shaped segmentations, but does poorly on outlier regions. In particular, we’ve found that the deep learning-based approaches have difficulty identifying nuclei in dense regions, often resulting in segmentations spanning multiple nuclei, or missing them entirely.

Therefore, we implemented a simple method of integrating the Watershed-based segmentation into the deep-learning-based segmentation. For every label in the Watershed segmentation that does not overlap with any label in the deep learning-based segmentation, the label is copied to the deep-learning-based segmentation, and for every label in the deep learning-based segmentation that does not overlap with any label in the Watershed-based segementation, the label is removed. This effectively results in “filling the gaps” (“augmenting”) in the deep-learning segmentation using the Watershed segmentation, and removing noise amplified when applying CLAHE.

Identifying cytoplasm#

Taking advantage of the fact that most nuclei labeling strategies also very weakly label the cytoplasm, cytoplasmic regions can be identified by using a very lenient threshold. Then, cells can be labeled by iteratively expanding the nuclei labels by a certain distance.

RNA segmentation#

Spateo features several novel methods for cell segmentation using only RNA signal. As with any approximation/estimation method, some level of dropouts and noise is expected. A key requirement for RNA-only segmentation is that the unit of measure (“pixel”) is much smaller than the expected size of a cell. Otherwise, there is no real benefit of attempting cell segmentation because each pixel most likely contains more than one cell.

Segmentation functions

Labeling functions

Density binning#

We’ve found in testing that the RNA-only segmentation method, without separating out regions with different RNA (UMI) densities, tends to be too lenient in RNA-dense regions and too strict in RNA-sparse regions. Therefore, unless the UMI counts are sufficiently sparse (such as the case when only certain genes are being used), we recommend to first separate out the pixels into different “bins” according to their RNA density. Then, RNA- only segmentation can be performed on each bin separately, resulting in a better-calibrated algorithm.

One important detail is that density binning, as with any kind of clustering, is highly subjective. We suggest testing various parameters to find one that qualitatively “makes sense” to you.

Spateo employs a spatially-constrained hierarchical Ward clustering approach to segment the tissue into different RNA densities. Each pixel (or bin, if binning is used) is considered as an observation of a single feature: the number of UMIs observed for that pixel. Spatial constraints are imposed by providing a connectivity matrix with an edge between each neighboring pixel (so each pixel has four neighbors).

There are a few additional considerations in this approach.

  • As briefly noted in the previous paragraph, pixels can be grouped in the square bins (where the UMI count of each bin is simply the sum of UMIs of its constituent pixels). We highly recommend binning as it drastically reduces runtime with minimal downsides.

  • The pixel UMI counts (or equivalently, the binned UMI counts) are Gaussian-blurred prior to clustering to reduce noise. The size of the kernel is controlled by the k parameter.

  • After the pixels (or bins) have been clustered, each bin is “dilated” one at a time, in ascending order of mean UMI counts. The size of the kernel is controlled by the dk parameter. This is done to reduce sharp transitions between bins in downstream RNA-based cell segmentation that we’ve observed in our testing.

Segmentation approaches#

All approaches follow the following general steps.

  1. Apply a 2D convolution (which may be Gaussian or summation) to the UMI count image. The size of the convolution is controlled with the k parameter.

  2. Obtain per-pixel scores, usually in the range [0, 1], indicating how likely each pixel is occupied by a cell.

  3. Apply a threshold to these scores, which is either computed using Otsu’s method or manually provided with the threshold parameter.

  4. Apply morphological opening and closing with size mk to fill in holes and remove noise. By default, this value is set to k+2 when using the Negative binomial mixture model, otherwise to k-2.

Each of the supported methods differ in how the per-pixel scores (step 2) are calculated.

Gaussian blur (gauss)#

Apply Gaussian blur of the specified size, followed by [0, 1]-normalization to obtain per-pixel scores.

Moran’s I (moran)#

Apply Gaussian blur of the specified size, followed by computing the Moran’s I using a Gaussian kernel (the size of which is controlled by providing k to moran_kwargs.) Per-pixel scores are computed as the Moran’s I value of each pixel, given that the p-value is less than p_threshold (default 0.05). Pixels who’s p-value is greater than or equal to this threshold are set to have a score of zero (indicating no spatial autocorrelation). This is the only method which may produce negative score values (indicating anticorrelation).

Negative binomial mixture model (methods including EM or VI)#

The negative binomial distribution is widely used to model count data. It has been applied to many different kinds of biological data (such as sequencing data [5]), and this is also how Spateo models the number of UMIs detected per pixel. For the purpose of cell segmentation, any pixel is either occupied or unoccupied by a cell. This is modeled as a two-component negative binomial mixture model, with one component for the UMIs detected in pixels occupied by a cell and the other for those unoccupied (“background”). Mathematically,

$$P(X|p,r,r',\theta,\theta')=p \cdot NB(X|r,\theta)+(1-p) \cdot NB(X|r',\theta')$$


\begin{align} X &\text{: number of observed UMIs}\\ p &\text{: proportion of occupied pixels}\\ NB(\cdot|r,\theta) &\text{: negative binomial PDF with parameters } r, \theta\\ r,\theta &\text{: parameters to the negative binomial for occupied pixels}\\ r',\theta' &\text{: parameters to the negative binomial for unoccupied pixels} \end{align}

Ultimately, we wish to obtain estimates for \(r,r',\theta,\theta'\). Spateo offers two strategies: expectation-maximization (EM) [6] and a custom variational inference (VI) model. The latter is implemented with Pyro, which is a Bayesian modeling and estimation framework built on top of PyTorch.

Once the desired parameter estimates are obtained, likelihoods of obtaining the number of observed UMIs \(X\), for each pixel \((x,y)\), conditional on the pixel being occupied and unoccupied, are calculated.

\begin{align} P_{(x,y)}(X|r,\theta) &\triangleq P_{(x,y)}(X|occupied)\\ P_{(x,y)}(X|r',\theta') &\triangleq P_{(x,y)}(X|unoccupied) \end{align}

Finally, using Bayes’ theorem,

$$P_{(x,y)}(occupied|X_{(x,y)}) = \frac{p \cdot P_{(x,y)}(X|r,\theta)}{P(X|p,r,r',\theta,\theta')}$$

which are used as the per-pixel scores.

Belief propgataion (methods including BP)#

One important caveat of the Negative binomial mixture model is that it does not yield the marginal probabilities \(P_{(x,y)}(occupied),P_{(x,y)}(unoccupied)\). In order to obtain these probabilities directly, Spateo can apply an efficient belief propagation algorithm. An undirected graphical model is constructed by considering each pixel as a node, and edges (“potentials”) between neighboring pixels (a.k.a. a grid Markov random field).

Markov Random Field [7]

Each pixel has two possible states: occupied or unoccupied. The conditional probabilities obtained with the Negative binomial mixture model are used as the node potentials, and the edge potentials are defined in such a way that it is more probable for two connected nodes have the same state. This encodes the expectation that if a pixel is occupied, then its neighbors are also likely to also be occupied (and vice-versa). Loopy belief propagation is run on this graph until convergence, which yields estimates for the desired marginal probabilities. The estimated marginal probability that each pixel is occupied, \(\tilde{P}_{(x,y)}(occupied)\) is used as the per-pixel scores.

Labeling approaches#

A variety of labeling approaches can be used to identify individual cells from the RNA-based mask.


In testing, we have found that the labeling approach for RNA-based segmentation perform poorly when using total RNA. Therefore, we recommend only using this approach when using fairly sparse RNA, such as specific genes (i.e. nuclearly localized genes) or unspliced RNA.

Connected components#

Connected-component labeling may be used to identify cell labels. Traditionally, connected-component labeling simply labels each connected “blob” of pixels as a single label. However, Spateo includes a modified version of this algorithm, spateo.segmentation.label_connected_components(), that provides various options to control its behavior. area_threshold=np.inf may be provided to perform traditional connected-component labeling.

The key difference is the addition of an area-aware splitting procedure. Briefly, after obtaining initial labels via traditional connected-component labeling, we attempt to split components that are larger than area_threshold pixels (default 500) by iteratively eroding these large components (this procedure will slowly “split” large connected components), while retaining components less than or equal to min_area (default 100). Then, all large connected components have been split, these new, smaller components are dilated up to a distance of distance (default 8), up to a maximum area of max_area (default 400).

Additionally, the seed_layer argument can be used to provide a set of “seed” labels. When this option is provided, any connected component that contains at least one seed label will not undergo the iterative erosion operation described in the previous paragraph. Instead, it will only undergo the iterative dilation operation using the seed labels as the initial components. This has the effect of labeling new components (using the modified connected-component labeling strategy described in the previous paragraph), while also retaining labels identified previously. This option is useful when transfering nuclei labels (obtained from nuclearly localized RNA and/or unspliced RNA) to cell labels (obtained from total RNA).


An approach similar to that used to label cells in Watershed-based segmentation may also be used. spateo.segmentation.find_peaks() can be used to find local peaks in the RNA image and use those as the Watershed markers, or spateo.segmentation.find_peaks_from_mask() can be used to find markers from the RNA-based mask itself by finding the peaks in the distance transform (spateo.segmentation.find_peaks_with_erosion() approximates this process and may be faster in some cases).

Finally, the Watershed algorithm is applied, limiting the labels to within the mask obtained in the first step, using the initial markers from the previous step. This is done using the function spateo.segmentation.watershed(). Either a blurred RNA image or the distance transform (if spateo.segmentation.find_peaks_from_mask() was used) may be used as the raw values for Watershed.


Some spatial assays may have artificial “patterns” on the spatial surface (a.k.a. chip). In particular, Stereo-seq chips contain horizontal and veritcal “tracklines” that are used to register the stain and RNA images, but also interferes with RNA capture. In such cases, using the RNA image directly as the Watershed values will result in discontinuities in the labels at these regions and is not recommended.

Expand labels#

The spateo.segmentation.expand_labels() function with the mask_layer argument can also be used to obtain a labeling. This approach requires you to already have a set of initial labels (via one of the two previously labeling approaches) and is used to “expand” each of these labels by a certain distance distance (default 5), up to a maximum area max_area (default 400), while keeping the labels within the mask provided with mask_layer.