spateo.segmentation¶

Spatiotemporal modeling of spatial transcriptomics

Submodules¶

Functions¶

`refine_alignment`(adata[, stain_layer, rna_layer, ...])	Refine the alignment between the staining image and RNA coordinates.
`compare`(adata, true_layer, pred_layer[, data_layer, ...])	Compute segmentation statistics.
`merge_densities`(adata, layer[, mapping, out_layer])	Merge density bins either using an explicit mapping or in a semi-supervised way.
`segment_densities`(adata, layer, binsize, k, dk[, ...])	Segment into regions by UMI density.
`cellpose`(adata[, model, diameter, normalize, ...])	Run Cellpose to label cells from a staining image.
`deepcell`(adata[, model, equalize, layer, out_layer])	Run DeepCell to label cells from a staining image.
`stardist`(adata[, model, tilesize, min_overlap, ...])	Run StarDist to label cells from a staining image.
`mask_cells_from_stain`(adata[, otsu_classes, ...])	Create a boolean mask indicating cells from stained image.
`mask_nuclei_from_stain`(adata[, otsu_classes, ...])	Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.
`score_and_mask_pixels`(adata, layer, k, method[, ...])	Score and mask pixels by how likely each is to be occupied.
`augment_labels`(adata, source_layer, target_layer[, ...])	Augment the labels in one label array using the labels in another.
`expand_labels`(adata, layer[, distance, max_area, ...])	Expand labels up to a certain distance.
`find_peaks`(adata, layer, k, min_distance[, ...])	Find peaks from an array.
`find_peaks_from_mask`(adata, layer, min_distance[, ...])	Find peaks from a boolean mask. Used to obtain Watershed markers.
`find_peaks_with_erosion`(adata[, layer, k, square, ...])	Find peaks for use in Watershed via iterative erosion.
`label_connected_components`(adata, layer[, seed_layer, ...])	Label connected components while splitting components that are too large.
`watershed`(adata[, layer, k, mask_layer, ...])	Assign individual nuclei/cells using the Watershed algorithm.
`generate_random_labels`(adata, areas[, seed, out_layer])	Create random labels, usually for benchmarking and QC purposes.
`generate_random_labels_like`(adata, layer[, seed, ...])	Create random labels, using another layer as a template.
`select_qc_regions`(adata[, regions, n, size, seed, ...])	Select regions to use for segmentation quality control purposes.

Package Contents¶

spateo.segmentation.refine_alignment(adata: anndata.AnnData, stain_layer: str = SKM.STAIN_LAYER_KEY, rna_layer: str = SKM.UNSPLICED_LAYER_KEY, mode: typing_extensions.Literal['rigid', 'non-rigid'] = 'rigid', downscale: float = 1, k: int = 5, n_epochs: int = 100, transform_layers: str | List[str] | None = None, **kwargs)[source]¶

Refine the alignment between the staining image and RNA coordinates.

There are often small misalignments between the staining image and RNA, which result in incorrect aggregation of pixels into cells based on staining. This function attempts to refine these alignments based on the staining and (unspliced) RNA masks.

Parameters:

adata

Input Anndata

stain_layer

Layer containing staining image.

rna_layer

Layer containing (unspliced) RNA.

mode

The alignment mode. Two modes are supported:

* rigid: A global alignment method that finds a rigid (affine) transformation matrix.
* non-rigid: A semi-local alignment method that finds a thin-plate-spline with a mesh of a certain size. By default, each cell in the mesh consists of 1000 x 1000 pixels. This value can be modified by providing a binsize argument to this function (specifically, as part of the additional **kwargs).

downscale

Downscale matrices by this factor to reduce memory and runtime.

k

Kernel size for Gaussian blur of the RNA matrix.

n_epochs

Number of epochs to run optimization

transform_layers

Layers to transform and overwrite inplace.

**kwargs

Additional keyword arguments to pass to the PyTorch module.

spateo.segmentation.compare(adata: anndata.AnnData, true_layer: str, pred_layer: str, data_layer: str = SKM.X_LAYER, umi_pixels_only: bool = True, random_background: bool = True, ap_taus: Tuple[int, Ellipsis] = tuple(np.arange(0.5, 1, 0.05)), seed: int | None = None) → pandas.DataFrame[source]¶

Compute segmentation statistics.

Parameters:

adata: Input Anndata
true_layer: Layer containing true labels
pred_layer: Layer containing predicted labels
data_layer: Layer containing UMIs
umi_pixels_only: Whether or not to only consider pixels that have at least one UMI captured (as determined by data_layer).
random_background: Simulate random background by randomly permuting the pred_layer labels and computing the same statistics against true_layer. The returned DataFrame will have an additional column for these statistics.
ap_taus: Tau thresholds to calculate average precision. Defaults to 0.05 increments starting at 0.5 and ending at (and including) 0.95.
seed: Random seed.

Returns:

Pandas DataFrame containing classification and labeling statistics
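
The role of the ap_taus thresholds can be illustrated with a small NumPy sketch. It assumes the common instance-segmentation definition of average precision at an IoU threshold tau, AP = TP / (TP + FP + FN), which may differ in detail from what compare computes internally. The helper names iou_matrix and average_precision are hypothetical and are not part of spateo.

```python
import numpy as np

def iou_matrix(true, pred):
    """IoU between every true label and every predicted label (0 is background)."""
    true_ids = np.unique(true[true > 0])
    pred_ids = np.unique(pred[pred > 0])
    ious = np.zeros((len(true_ids), len(pred_ids)))
    for i, t in enumerate(true_ids):
        for j, p in enumerate(pred_ids):
            inter = np.logical_and(true == t, pred == p).sum()
            union = np.logical_or(true == t, pred == p).sum()
            ious[i, j] = inter / union
    return ious

def average_precision(true, pred, tau):
    """AP = TP / (TP + FP + FN) at IoU threshold tau (assumed definition)."""
    ious = iou_matrix(true, pred)
    n_true, n_pred = ious.shape
    # For tau >= 0.5, a label can have IoU > tau with at most one partner,
    # so counting true labels with any match counts the matched pairs.
    tp = int((ious > tau).any(axis=1).sum())
    fp = n_pred - tp
    fn = n_true - tp
    total = tp + fp + fn
    return tp / total if total else 0.0

true = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 2, 2],
                 [0, 0, 2, 2]])
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 2, 2],
                 [0, 0, 2, 0]])  # label 2 misses one pixel, so its IoU is 3/4

taus = np.arange(0.5, 1, 0.05)  # same expression as the ap_taus default
for tau in taus:
    print(round(float(tau), 2), average_precision(true, pred, tau))
```

Stricter thresholds stop counting the imperfect label as a match, which lowers the score.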

spateo.segmentation.merge_densities(adata: anndata.AnnData, layer: str, mapping: Dict[int, int] | None = None, out_layer: str | None = None)[source]¶

Merge density bins either using an explicit mapping or in a semi-supervised way.

Parameters:

adata: Input Anndata
layer: Layer that was used to generate density bins. The bins are read from {layer}_bins; if that layer is not present, layer is taken as a literal.
mapping: Mapping to use to transform bins
out_layer: Layer to store results. Defaults to same layer as input.

spateo.segmentation.segment_densities(adata: anndata.AnnData, layer: str, binsize: int, k: int, dk: int, distance_threshold: float | None = None, background: Tuple[int, int] | typing_extensions.Literal[False] | None = None, out_layer: str | None = None)[source]¶

Segment into regions by UMI density.

The tissue is segmented into UMI density bins according to the following procedure.

1. The UMI matrix is binned according to binsize (recommended >= 20).
2. The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size k. Note that k is in terms of bins, not pixels.
3. The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold distance_threshold, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) the same shape as the binned matrix.
4. Each density bin is dilated with kernel size dk, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and “choppy” borders in subsequent steps.
5. If background is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing a (x, y) tuple instead. This feature can be turned off by providing False.
6. The density bin matrix is resized to be the same size as the original UMI matrix.
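
The binning, background-selection, and resizing steps of this procedure can be sketched with NumPy alone. This is an illustration under stated assumptions, not spateo's implementation: the helper names are hypothetical, edge bins are zero-padded, and a plain threshold stands in for the Gaussian blur, Ward clustering, and dilation steps.

```python
import numpy as np

def bin_matrix(X, binsize):
    """Sum counts within binsize x binsize blocks (edges are zero-padded)."""
    rows = -(-X.shape[0] // binsize) * binsize  # round up to a multiple of binsize
    cols = -(-X.shape[1] // binsize) * binsize
    padded = np.zeros((rows, cols), dtype=X.dtype)
    padded[: X.shape[0], : X.shape[1]] = X
    blocks = padded.reshape(rows // binsize, binsize, cols // binsize, binsize)
    return blocks.sum(axis=(1, 3))

def perimeter_background(bins):
    """Return the label that occurs most often on the matrix perimeter."""
    edge = np.concatenate([bins[0], bins[-1], bins[1:-1, 0], bins[1:-1, -1]])
    values, counts = np.unique(edge, return_counts=True)
    return values[np.argmax(counts)]

def resize_to_original(bins, binsize, shape):
    """Repeat each bin label binsize times along both axes, then crop."""
    full = np.repeat(np.repeat(bins, binsize, axis=0), binsize, axis=1)
    return full[: shape[0], : shape[1]]

X = np.zeros((6, 6), dtype=int)
X[2:4, 2:4] = 5                       # a dense patch in the middle

binned = bin_matrix(X, binsize=2)     # 3 x 3 matrix of summed counts
# Stand-in for blur + clustering + dilation: two density labels by threshold.
labels = np.where(binned > 0, 2, 1)
background = perimeter_background(labels)
labels[labels == background] = 0      # background label becomes 0
resized = resize_to_original(labels, 2, X.shape)
```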

Parameters:

adata: Input Anndata
layer: Layer that contains UMI counts to segment based on.
binsize: Size of bins to use. For density segmentation, pixels are binned to reduce runtime. 20 is usually a good starting point. Note that this value is relative to the original binsize used to read in the AnnData.
k: Kernel size for Gaussian blur, in bins
dk: Kernel size for final dilation, in bins
distance_threshold: Distance threshold for the Ward linkage such that clusters will not be merged if they have greater than this distance.
background: Pixel that should be categorized as background, given as a (x, y) tuple. By default, the bin that is most often assigned to the outermost pixels is categorized as background. Set to False to turn off background detection.
out_layer: Layer to put resulting bins. Defaults to {layer}_bins.

spateo.segmentation.cellpose(adata: anndata.AnnData, model: typing_extensions.Literal['cyto', 'nuclei'] | cellpose.models.CellposeModel = 'nuclei', diameter: int | None = None, normalize: bool = True, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None, **kwargs)[source]¶

Run Cellpose to label cells from a staining image.

Parameters:

adata: Input Anndata
model: Cellpose model to use. Can be one of the two pretrained models:
* cyto: Labeled cytoplasm
* nuclei: Labeled nuclei
Or any generic CellposeModel model.
diameter: Expected diameter of each segmentation (cells for model="cyto", nuclei for model="nuclei"). Can be None to run automatic detection.
normalize: Whether or not to percentile-normalize the image. This is an argument to Cellpose.eval().
equalize: Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.
layer: Layer that contains staining image. Defaults to stain.
out_layer: Layer to put resulting labels. Defaults to {layer}_labels.
**kwargs: Additional keyword arguments to Cellpose.eval() function.

Returns:

Numpy array containing cell labels.

spateo.segmentation.deepcell(adata: anndata.AnnData, model: deepcell.applications.Application | None = None, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None, **kwargs)[source]¶

Run DeepCell to label cells from a staining image.

Parameters:

adata: Input Anndata
model: DeepCell model to use
equalize: Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.
layer: Layer that contains staining image. Defaults to stain.
out_layer: Layer to put resulting labels. Defaults to {layer}_labels.
**kwargs: Additional keyword arguments to Application.predict() function.

Returns:

Numpy array containing cell labels.

spateo.segmentation.stardist(adata: anndata.AnnData, model: Union[typing_extensions.Literal['2D_versatile_fluo', '2D_versatile_he', '2D_paper_dsb2018'], stardist.models.StarDist2D] = '2D_versatile_fluo', tilesize: int = 2000, min_overlap: Optional[int] = None, context: Optional[int] = None, normalizer: Optional[csbdeep.data.Normalizer] = PercentileNormalizer(), equalize: float = 2.0, sanitize: bool = True, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None, **kwargs)[source]¶

Run StarDist to label cells from a staining image.

Note

When using min_overlap, the crucial assumption is that all predicted object instances are smaller than the provided min_overlap. Also, it must hold that: min_overlap + 2*context < tilesize. https://github.com/stardist/stardist/blob/858cae17cf17f979122000ad2294a156d0547135/stardist/models/base.py#L776
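
The constraint in this note can be checked before calling the function. The helper below is a hypothetical convenience, not part of spateo or StarDist, and the context value in the example is an assumed number (by default an automatic estimate is used).

```python
def tiling_is_valid(tilesize, min_overlap, context):
    """Return True when min_overlap + 2 * context < tilesize."""
    return min_overlap + 2 * context < tilesize

# Default tilesize of 2000 with an assumed overlap of 200 and context of 100.
print(tiling_is_valid(tilesize=2000, min_overlap=200, context=100))
```

Objects larger than min_overlap still violate the first assumption in the note even when this check passes.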

Parameters:

adata: Input Anndata
img: Image as a Numpy array.
model: StarDist model to use. Can be one of the three pretrained models from StarDist2D:
1. '2D_versatile_fluo': Versatile (fluorescent nuclei)
2. '2D_versatile_he': Versatile (H&E nuclei)
3. '2D_paper_dsb2018': DSB 2018 (from StarDist 2D paper)
Or any generic StarDist2D model.
tilesize: Run prediction separately on tiles of size tilesize x tilesize and merge them afterwards. Useful to avoid out-of-memory errors. Can be set to <= 0 to disable tiling. When min_overlap is also provided, this becomes the block_size parameter to StarDist2D.predict_instances_big().
min_overlap: Amount of guaranteed overlaps between tiles.
context: Amount of image context on all sides of a tile, which is discarded. Only used when min_overlap is not None. By default, an automatic estimate is used.
normalizer: Normalizer to use to perform normalization prior to prediction. By default, percentile-based normalization is performed. None may be provided to disable normalization.
equalize: Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.
sanitize: Whether to sanitize disconnected labels.
layer: Layer that contains staining image. Defaults to stain.
out_layer: Layer to put resulting labels. Defaults to {layer}_labels.
**kwargs: Additional keyword arguments to pass to StarDist2D.predict_instances().

spateo.segmentation.mask_cells_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, mk: int = 7, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None)[source]¶

Create a boolean mask indicating cells from stained image.

Parameters:

adata: Input Anndata
otsu_classes: Number of classes to assign pixels to for cell detection
otsu_index: Which threshold index should be used for classifying cells. All pixel intensities >= the value at this index will be classified as cell.
mk: Size of the kernel used for morphological close and open operations applied at the very end.
layer: Layer that contains staining image.
out_layer: Layer to put resulting cell mask. Defaults to {layer}_mask.
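
How otsu_classes and otsu_index interact can be shown with a short NumPy sketch. The threshold values are hypothetical stand-ins for the output of a multi-level Otsu method, which yields otsu_classes - 1 thresholds in increasing order. The morphological close and open operations controlled by mk are not reproduced here.

```python
import numpy as np

# Hypothetical thresholds, as if computed with otsu_classes=3 (two thresholds).
thresholds = np.array([60, 150])
otsu_index = 0  # use the lowest threshold

stain = np.array([[10, 70],
                  [140, 200]])

# Pixels at or above the selected threshold are classified as cell.
cell_mask = stain >= thresholds[otsu_index]
print(cell_mask)
```

Raising otsu_index to 1 in this example would keep only the brightest pixel.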

spateo.segmentation.mask_nuclei_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, local_k: int = 55, offset: int = 5, mk: int = 5, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None)[source]¶

Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.

Parameters:

adata: Input Anndata
otsu_classes: Number of classes to assign pixels to for background detection.
otsu_index: Which threshold index should be used for background. All pixel intensities less than the value at this index will be classified as background.
local_k: The size of the local neighborhood of each pixel to use for local (adaptive) thresholding to identify the foreground (i.e. nuclei).
offset: Offset to local thresholding values such that values > 0 lead to more “strict” thresholding, and therefore may be helpful in distinguishing nuclei in dense regions.
mk: Size of the kernel used for morphological close and open operations applied at the very end.
layer: Layer containing nuclei staining
out_layer: Layer to put resulting nuclei mask. Defaults to {layer}_mask.

spateo.segmentation.score_and_mask_pixels(adata: anndata.AnnData, layer: str, k: int, method: typing_extensions.Literal['gauss', 'moran', 'EM', 'EM+gauss', 'EM+BP', 'VI+gauss', 'VI+BP'], moran_kwargs: dict | None = None, em_kwargs: dict | None = None, vi_kwargs: dict | None = None, bp_kwargs: dict | None = None, threshold: float | None = None, use_knee: bool | None = False, mk: int | None = None, bins_layer: typing_extensions.Literal[False] | str | None = None, certain_layer: str | None = None, scores_layer: str | None = None, mask_layer: str | None = None)[source]¶

Score and mask pixels by how likely each is to be occupied.

Parameters:

adata

Input Anndata

layer

Layer that contains UMI counts to use

k

Kernel size for convolution

method

Method to use to obtain per-pixel scores. Valid methods are:

* gauss: Gaussian blur
* moran: Moran’s I based method
* EM: EM algorithm to estimate cell and background expression parameters.
* EM+gauss: Negative binomial EM algorithm followed by Gaussian blur.
* EM+BP: EM algorithm followed by belief propagation to estimate the marginal probabilities of cell and background.
* VI+gauss: Negative binomial VI algorithm followed by Gaussian blur. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing zero_inflated=True.
* VI+BP: VI algorithm followed by belief propagation. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing zero_inflated=True.

moran_kwargs

Keyword arguments to the moran.run_moran() function.

em_kwargs

Keyword arguments to the em.run_em() function.

vi_kwargs

Keyword arguments to the VI algorithm, such as zero_inflated=True.

bp_kwargs

Keyword arguments to the bp.run_bp() function.

threshold

Score cutoff, above which pixels are considered occupied. By default, a threshold is automatically determined by using Otsu thresholding.

use_knee

Whether to use the knee point as threshold. Defaults to False. If True, threshold is ignored.

mk

Kernel size of morphological open and close operations to reduce noise in the mask. Defaults to k+2 if EM or VI is run. Otherwise, defaults to k-2.

bins_layer

Layer containing assignment of pixels into bins. Each bin is considered separately. Defaults to {layer}_bins. This can be set to False to disable binning, even if the layer exists.

certain_layer

Layer containing a boolean mask indicating which pixels are certain to be occupied. If the array is not a boolean array, it is cast to boolean.

scores_layer

Layer to save pixel scores before thresholding. Defaults to {layer}_scores.

mask_layer

Layer to save the final mask. Defaults to {layer}_mask.
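
When threshold is not given, the cutoff is determined by Otsu thresholding. The sketch below is a plain NumPy version of the standard histogram-based Otsu method, offered as an illustration only. The function name otsu_threshold is hypothetical, and the per-bin handling and the morphological clean-up controlled by mk are not reproduced.

```python
import numpy as np

def otsu_threshold(scores, nbins=256):
    """Pick the cutoff that maximizes between-class variance of the histogram.

    Assumes the scores are not all identical.
    """
    counts, edges = np.histogram(scores.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    weight_low = np.cumsum(counts)                # pixels in bins 0..i
    weight_high = np.cumsum(counts[::-1])[::-1]   # pixels in bins i..end
    mean_low = np.cumsum(counts * centers) / weight_low
    mean_high = (np.cumsum((counts * centers)[::-1]) / weight_high[::-1])[::-1]
    # Variance between the two classes when splitting after bin i.
    variance = weight_low[:-1] * weight_high[1:] * (mean_low[:-1] - mean_high[1:]) ** 2
    return centers[np.argmax(variance)]

# Toy score matrix: half the pixels score low, half score high.
scores = np.concatenate([np.full(50, 0.1), np.full(50, 0.9)]).reshape(10, 10)
cutoff = otsu_threshold(scores)
mask = scores > cutoff   # pixels above the cutoff are considered occupied
```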

spateo.segmentation.augment_labels(adata: anndata.AnnData, source_layer: str, target_layer: str, out_layer: str | None = None)[source]¶

Augment the labels in one label array using the labels in another.

Parameters:

adata: Input Anndata
source_layer: Layer containing source labels to (possibly) take labels from.
target_layer: Layer containing target labels to augment.
out_layer: Layer to save results. Defaults to {target_layer}_augmented.

spateo.segmentation.expand_labels(adata: anndata.AnnData, layer: str, distance: int = 5, max_area: int = 400, mask_layer: str | None = None, out_layer: str | None = None)[source]¶

Expand labels up to a certain distance.

Parameters:

adata: Input Anndata
layer: Layer from which the labels were derived. Then, {layer}_labels is used as the labels. If not present, it is taken as a literal.
distance: Distance to expand. Internally, this is used as the number of iterations of distance 1 dilations.
max_area: Maximum area of each label.
mask_layer: Layer containing mask to restrict expansion to within.
out_layer: Layer to save results. By default, uses {layer}_labels_expanded.
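
The description of distance as a number of distance-1 dilations can be sketched as follows. This is a simplified illustration, not spateo's implementation: the function name is hypothetical, and the 4-connected neighborhood and the order in which competing labels claim a pixel are assumptions.

```python
import numpy as np

def expand_labels_sketch(labels, distance=5, max_area=400, mask=None):
    """Grow each label by one pixel per iteration, capped at max_area pixels."""
    expanded = labels.copy()
    areas = np.bincount(expanded.ravel())  # areas[i] is the pixel count of label i
    for _ in range(distance):
        padded = np.pad(expanded, 1)  # zero border so every pixel has 4 neighbors
        neighbors = (padded[:-2, 1:-1], padded[2:, 1:-1],
                     padded[1:-1, :-2], padded[1:-1, 2:])
        candidate = np.zeros_like(expanded)
        for neighbor in neighbors:
            # A pixel adopts the first labeled neighbor that is found.
            take = (candidate == 0) & (neighbor > 0)
            candidate[take] = neighbor[take]
        candidate[expanded > 0] = 0       # already labeled pixels keep their label
        if mask is not None:
            candidate[~mask] = 0          # restrict expansion to the mask
        for r, c in zip(*np.nonzero(candidate)):
            label = candidate[r, c]
            if areas[label] < max_area:
                expanded[r, c] = label
                areas[label] += 1
    return expanded

labels = np.zeros((5, 5), dtype=int)
labels[2, 2] = 1
print(expand_labels_sketch(labels, distance=1))
```

With distance=1 the single seed pixel grows into a plus shape; a small max_area stops the growth early.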

spateo.segmentation.find_peaks(adata: anndata.AnnData, layer: str, k: int, min_distance: int, mask_layer: str | None = None, out_layer: str | None = None)[source]¶

Find peaks from an array.

Parameters:

adata: Input AnnData
layer: Layer to use as values to find peaks from.
k: Apply a Gaussian blur with this kernel size prior to peak detection.
min_distance: Minimum distance, in pixels, between peaks.
mask_layer: Find peaks only in regions specified by the mask.
out_layer: Layer to save identified peaks as markers. By default, uses {layer}_markers.

spateo.segmentation.find_peaks_from_mask(adata: anndata.AnnData, layer: str, min_distance: int, distances_layer: str | None = None, markers_layer: str | None = None)[source]¶

Find peaks from a boolean mask. Used to obtain Watershed markers.

Parameters:

adata: Input AnnData
layer: Layer containing boolean mask. This will default to {layer}_mask. If not present in the provided AnnData, this argument is used as a literal.
min_distance: Minimum distance, in pixels, between peaks.
distances_layer: Layer to save distance from each pixel to the nearest zero (False) pixel (a.k.a. distance transform). By default, uses {layer}_distances.
markers_layer: Layer to save identified peaks as markers. By default, uses {layer}_markers.

spateo.segmentation.find_peaks_with_erosion(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, square: bool = False, min_area: int = 80, n_iter: int = -1, float_k: int = 5, float_threshold: float | None = None, out_layer: str | None = None)[source]¶

Find peaks for use in Watershed via iterative erosion.

Parameters:

adata: Input Anndata
layer: Layer that was used to create scores or masks. If {layer}_scores is present, that is used. Otherwise if {layer}_mask is present, that is used. Otherwise, the layer is taken as a literal.
k: Erosion kernel size
square: Whether to use a square kernel
min_area: Minimum area
n_iter: Number of erosions to perform.
float_k: Morphological close and open kernel size when X is a float array.
float_threshold: Threshold to use to determine connected components when X is a float array. By default, a threshold is automatically determined by using Otsu method.
out_layer: Layer to save results. By default, this will be {layer}_markers.

spateo.segmentation.label_connected_components(adata: anndata.AnnData, layer: str, seed_layer: str | None = None, area_threshold: int = 500, k: int = 3, min_area: int = 100, n_iter: int = -1, distance: int = 8, max_area: int = 400, out_layer: str | None = None)[source]¶

Label connected components while splitting components that are too large.

Parameters:

adata: Input Anndata
layer: Data layer that was used to generate the mask. First, will look for {layer}_mask. Otherwise, this will be used as a literal.
seed_layer: Layer containing seed labels. These are labels that should be used whenever possible in labeling connected components.
area_threshold: Connected components with area greater than this value will be split into smaller portions by first eroding and then expanding.
k: Kernel size for erosion.
min_area: Don’t erode labels smaller than this area.
n_iter: Number of erosion operations. -1 means continue eroding until every label is less than min_area.
distance: Distance to expand eroded labels.
max_area: Maximum area when expanding labels.
out_layer: Layer to save results. Defaults to {layer}_labels.

Returns:

New label array

spateo.segmentation.watershed(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, mask_layer: str | None = None, markers_layer: str | None = None, out_layer: str | None = None)[source]¶

Assign individual nuclei/cells using the Watershed algorithm.

Parameters:

adata: Input AnnData
layer: Original data layer from which the segmentation is derived.
k: Size of the kernel to use for Gaussian blur.
mask_layer: Layer containing mask. This will default to {layer}_mask.
markers_layer: Layer containing Watershed markers. This will default to {layer}_markers. May either be a boolean or integer array. If this is a boolean array, the markers are identified by calling cv2.connectedComponents.
out_layer: Layer to save results. Defaults to {layer}_labels.

spateo.segmentation.generate_random_labels(adata: anndata.AnnData, areas: List[int], seed: int | None = None, out_layer: str = 'random_labels')[source]¶

Create random labels, usually for benchmarking and QC purposes.

Parameters:

adata: Input Anndata
areas: List of desired areas.
seed: Random seed.
out_layer: Layer to save results.

spateo.segmentation.generate_random_labels_like(adata: anndata.AnnData, layer: str, seed: int | None = None, out_layer: str = 'random_labels')[source]¶

Create random labels, using another layer as a template.

Parameters:

adata: Input Anndata
layer: Layer containing template labels
seed: Random seed.
out_layer: Layer to save results.

spateo.segmentation.select_qc_regions(adata: anndata.AnnData, regions: List[Tuple[int, int]] | List[Tuple[int, int, int, int]] = None, n: int = 4, size: int = 2000, seed: int | None = None, use_scale: bool = True, absolute: bool = False, weight_func: Callable[[anndata.AnnData], float] | None = lambda adata: ...)[source]¶

Select regions to use for segmentation quality control purposes.

Note

All coordinates are in terms of “real” coordinates (i.e. the coordinates in adata.obs_names and adata.var_names) so that slicing the AnnData retains the regions correctly.

Parameters:

adata: Input AnnData
regions: List of tuples in the form (xmin, ymin) or (xmin, xmax, ymin, ymax). If the former, the size argument is used to compute the bounding box.
n: Number of regions to select if regions is not provided.
size: Width and height, in pixels, of each randomly selected region.
seed: Random seed.
use_scale: Whether or not the provided regions are in scale units. This option only has effect when regions are provided. False means the provided coordinates are in terms of pixels.
absolute: Whether or not the provided regions are in terms of absolute X and Y coordinates. This option only has effect when regions are provided. False means the provided coordinates are relative with respect to the coordinates in the provided adata.
weight_func: Weighting function when regions is not provided. The probability of selecting each size x size region will be weighted by this function, which accepts a single AnnData (the region) as its argument, and returns a single float weight, such that higher weights mean higher probability. By default, the log1p of the sum of the counts in the X layer is used. Set to None to weight each region equally.
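
The default weighting described for weight_func can be sketched with NumPy. This is an illustration under assumptions rather than the library's code: the function name sample_regions is hypothetical, candidate regions are taken from a non-overlapping grid, and a plain count matrix stands in for the AnnData.

```python
import numpy as np

def sample_regions(X, n=4, size=2000, seed=None):
    """Pick n distinct size x size tiles, weighted by log1p of their total counts."""
    rng = np.random.default_rng(seed)
    corners = [(x, y)
               for x in range(0, X.shape[0] - size + 1, size)
               for y in range(0, X.shape[1] - size + 1, size)]
    weights = np.array([np.log1p(X[x:x + size, y:y + size].sum()) for x, y in corners])
    probs = weights / weights.sum()  # higher weight means higher probability
    chosen = rng.choice(len(corners), size=n, replace=False, p=probs)
    # Return regions as (xmin, xmax, ymin, ymax) tuples.
    return [(corners[i][0], corners[i][0] + size, corners[i][1], corners[i][1] + size)
            for i in chosen]

X = np.zeros((4, 4), dtype=int)
X[0:2, 0:2] = 1     # sparse tile
X[0:2, 2:4] = 3     # denser tile
X[2:4, 2:4] = 10    # densest tile; the tile at rows 2-3, columns 0-1 stays empty

regions = sample_regions(X, n=2, size=2, seed=0)
print(regions)
```

An empty tile has weight log1p(0) = 0 and is therefore never selected under this weighting.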