spateo.segmentation

Spatiotemporal modeling of spatial transcriptomics

Package Contents

Functions

refine_alignment(adata[, stain_layer, rna_layer, ...])

Refine the alignment between the staining image and RNA coordinates.

compare(adata, true_layer, pred_layer[, data_layer, ...])

Compute segmentation statistics.

merge_densities(adata, layer[, mapping, out_layer])

Merge density bins either using an explicit mapping or in a semi-supervised way.

segment_densities(adata, layer, binsize, k, dk[, ...])

Segment into regions by UMI density.

cellpose(adata[, model, diameter, normalize, ...])

Run Cellpose to label cells from a staining image.

deepcell(adata[, model, equalize, layer, out_layer])

Run DeepCell to label cells from a staining image.

stardist(adata[, model, tilesize, min_overlap, ...])

Run StarDist to label cells from a staining image.

mask_cells_from_stain(adata[, otsu_classes, ...])

Create a boolean mask indicating cells from stained image.

mask_nuclei_from_stain(adata[, otsu_classes, ...])

Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.

score_and_mask_pixels(adata, layer, k, method[, ...])

Score and mask pixels by how likely each one is to be occupied.

augment_labels(adata, source_layer, target_layer[, ...])

Augment the labels in one label array using the labels in another.

expand_labels(adata, layer[, distance, max_area, ...])

Expand labels up to a certain distance.

find_peaks(adata, layer, k, min_distance[, ...])

Find peaks from an array.

find_peaks_from_mask(adata, layer, min_distance[, ...])

Find peaks from a boolean mask. Used to obtain Watershed markers.

find_peaks_with_erosion(adata[, layer, k, square, ...])

Find peaks for use in Watershed via iterative erosion.

label_connected_components(adata, layer[, seed_layer, ...])

Label connected components while splitting components that are too large.

watershed(adata[, layer, k, mask_layer, ...])

Assign individual nuclei/cells using the Watershed algorithm.

generate_random_labels(adata, areas[, seed, out_layer])

Create random labels, usually for benchmarking and QC purposes.

generate_random_labels_like(adata, layer[, seed, ...])

Create random labels, using another layer as a template.

select_qc_regions(adata[, regions, n, size, seed, ...])

Select regions to use for segmentation quality control purposes.

spateo.segmentation.refine_alignment(adata: anndata.AnnData, stain_layer: str = SKM.STAIN_LAYER_KEY, rna_layer: str = SKM.UNSPLICED_LAYER_KEY, mode: typing_extensions.Literal['rigid', 'non-rigid'] = 'rigid', downscale: float = 1, k: int = 5, n_epochs: int = 100, transform_layers: str | List[str] | None = None, **kwargs)

Refine the alignment between the staining image and RNA coordinates.

There are often small misalignments between the staining image and RNA, which results in incorrect aggregation of pixels into cells based on staining. This function attempts to refine these alignments based on the staining and (unspliced) RNA masks.

Parameters:
adata

Input Anndata

stain_layer

Layer containing staining image.

rna_layer

Layer containing (unspliced) RNA.

mode

The alignment mode. Two modes are supported:

  • rigid: A global alignment method that finds a rigid (affine) transformation matrix.

  • non-rigid: A semi-local alignment method that finds a thin-plate-spline with a mesh of certain size. By default, each cell in the mesh consists of 1000 x 1000 pixels. This value can be modified by providing a binsize argument to this function (specifically, as part of additional **kwargs).

downscale

Downscale matrices by this factor to reduce memory and runtime.

k

Kernel size for Gaussian blur of the RNA matrix.

n_epochs

Number of epochs to run optimization

transform_layers

Layers to transform and overwrite inplace.

**kwargs

Additional keyword arguments to pass to the PyTorch module.
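To make the rigid mode more concrete, the sketch below shows what a rigid transformation (a rotation plus a translation) does to pixel coordinates. It does not reproduce how refine_alignment estimates the transformation; the helper names and the example angle and shift are hypothetical.

```python
import math


def rigid_matrix(theta, tx, ty):
    """Build a 2x3 matrix for a rotation by `theta` radians plus a translation."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [[cos_t, -sin_t, tx],
            [sin_t, cos_t, ty]]


def apply_transform(matrix, points):
    """Apply a 2x3 transformation matrix to a list of (x, y) points."""
    (a, b, tx), (c, d, ty) = matrix
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


# Hypothetical misalignment: a 90-degree rotation and a shift of (2, 3) pixels.
matrix = rigid_matrix(math.pi / 2, tx=2.0, ty=3.0)
moved = apply_transform(matrix, [(1.0, 0.0), (0.0, 1.0)])
```

A rigid transformation preserves distances between points, which is why it can only correct a global offset or rotation and not local warping.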

spateo.segmentation.compare(adata: anndata.AnnData, true_layer: str, pred_layer: str, data_layer: str = SKM.X_LAYER, umi_pixels_only: bool = True, random_background: bool = True, ap_taus: Tuple[int, Ellipsis] = tuple(np.arange(0.5, 1, 0.05)), seed: int | None = None) -> pandas.DataFrame

Compute segmentation statistics.

Parameters:
adata

Input Anndata

true_layer

Layer containing true labels

pred_layer

Layer containing predicted labels

data_layer

Layer containing UMIs

umi_pixels_only

Whether or not to only consider pixels that have at least one UMI captured (as determined by data_layer).

random_background

Simulate random background by randomly permuting the pred_layer labels and computing the same statistics against true_layer. The returned DataFrame will have an additional column for these statistics.

ap_taus

Tau thresholds to calculate average precision. Defaults to 0.05 increments starting at 0.5 and ending at (and including) 0.95.

seed

Random seed.

Returns:

Pandas DataFrame containing classification and labeling statistics
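The ap_taus thresholds are easier to read with a small example. The sketch below builds the default grid of ten thresholds and checks at which of them a predicted cell would count as a match, assuming the match criterion is intersection over union (IoU), as is common for segmentation benchmarks. That criterion and the helper iou are assumptions for illustration, not a description of how compare() is implemented.

```python
def iou(a, b):
    """Intersection over union of two sets of (row, col) pixels."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Default-style tau grid: 0.50, 0.55, ..., 0.95 (ten thresholds).
taus = [round(0.5 + 0.05 * i, 2) for i in range(10)]

# A 4 x 6 true cell and a prediction shifted right by one column.
true_cell = {(r, c) for r in range(4) for c in range(0, 6)}
pred_cell = {(r, c) for r in range(4) for c in range(1, 7)}

score = iou(true_cell, pred_cell)               # 20 shared pixels / 28 in the union
matched_at = [tau for tau in taus if score >= tau]
```

With an IoU of 20/28 the prediction counts as a match for the five thresholds from 0.5 to 0.7 and as a miss for the stricter ones.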

spateo.segmentation.merge_densities(adata: anndata.AnnData, layer: str, mapping: Dict[int, int] | None = None, out_layer: str | None = None)

Merge density bins either using an explicit mapping or in a semi-supervised way.

Parameters:
adata

Input Anndata

layer

Layer that was used to generate density bins. Defaults to using {layer}_bins. If not present, will be taken as a literal.

mapping

Mapping to use to transform bins

out_layer

Layer to store results. Defaults to same layer as input.
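As a minimal sketch of what an explicit mapping does, the example below relabels a small grid of density bins with a Dict[int, int]. The helper apply_bin_mapping is hypothetical, and keeping labels that are absent from the mapping unchanged is an assumption rather than documented behavior.

```python
def apply_bin_mapping(bins, mapping):
    """Relabel a 2D grid of density bins; labels missing from `mapping` are kept."""
    return [[mapping.get(label, label) for label in row] for row in bins]


bins = [
    [0, 1, 1],
    [2, 2, 3],
]
# Merge bin 3 into bin 2 and leave every other bin as it is.
merged = apply_bin_mapping(bins, {3: 2})
```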

spateo.segmentation.segment_densities(adata: anndata.AnnData, layer: str, binsize: int, k: int, dk: int, distance_threshold: float | None = None, background: Tuple[int, int] | typing_extensions.Literal[False] | None = None, out_layer: str | None = None)

Segment into regions by UMI density.

The tissue is segmented into UMI density bins according to the following procedure.

  1. The UMI matrix is binned according to binsize (recommended >= 20).

  2. The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size k. Note that k is in terms of bins, not pixels.

  3. The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold distance_threshold, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) the same shape as the binned matrix.

  4. Each density bin is dilated with kernel size dk, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and “choppy” borders in subsequent steps.

  5. If background is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing an (x, y) tuple instead. This feature can be turned off by providing False.

  6. The density bin matrix is resized to be the same size as the original UMI matrix.
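Two parts of the procedure above, the initial binning and the perimeter-based background selection, can be sketched in plain Python as follows. This is only an illustration on toy grids under simple assumptions (non-overlapping square bins, ties broken arbitrarily); bin_counts and perimeter_background are hypothetical helpers, not Spateo functions.

```python
from collections import Counter


def bin_counts(matrix, binsize):
    """Sum UMI counts over non-overlapping binsize x binsize blocks."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    binned = []
    for r0 in range(0, n_rows, binsize):
        row = []
        for c0 in range(0, n_cols, binsize):
            row.append(sum(
                matrix[r][c]
                for r in range(r0, min(r0 + binsize, n_rows))
                for c in range(c0, min(c0 + binsize, n_cols))
            ))
        binned.append(row)
    return binned


def perimeter_background(labels):
    """Return the label that occurs most often on the border of the grid."""
    n_rows, n_cols = len(labels), len(labels[0])
    border = [
        labels[r][c]
        for r in range(n_rows)
        for c in range(n_cols)
        if r in (0, n_rows - 1) or c in (0, n_cols - 1)
    ]
    return Counter(border).most_common(1)[0][0]


umi = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 6, 5],
]
binned = bin_counts(umi, binsize=2)

density_bins = [
    [1, 1, 1, 1],
    [1, 2, 2, 1],
    [1, 2, 2, 1],
    [1, 1, 1, 1],
]
background = perimeter_background(density_bins)
# The background bin takes the value 0; all other bins keep their label.
relabeled = [[0 if v == background else v for v in row] for row in density_bins]
```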

Parameters:
adata

Input Anndata

layer

Layer that contains UMI counts to segment based on.

binsize

Size of bins to use. For density segmentation, pixels are binned to reduce runtime. 20 is usually a good starting point. Note that this value is relative to the original binsize used to read in the AnnData.

k

Kernel size for Gaussian blur, in bins

dk

Kernel size for final dilation, in bins

distance_threshold

Distance threshold for the Ward linkage such that clusters will not be merged if they have greater than this distance.

background

Pixel that should be categorized as background. By default, the bin that is most often assigned to the outermost pixels is categorized as background. Set to False to turn off background detection.

out_layer

Layer to put resulting bins. Defaults to {layer}_bins.

spateo.segmentation.cellpose(adata: anndata.AnnData, model: typing_extensions.Literal['cyto', 'nuclei'] | cellpose.models.CellposeModel = 'nuclei', diameter: int | None = None, normalize: bool = True, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None, **kwargs)

Run Cellpose to label cells from a staining image.

Parameters:
adata

Input Anndata

model

Cellpose model to use. Can be one of the two pretrained models:

  • cyto: Labeled cytoplasm

  • nuclei: Labeled nuclei

Or any generic CellposeModel model.

diameter

Expected diameter of each segmentation (cells for model="cyto", nuclei for model="nuclei"). Can be None to run automatic detection.

normalize

Whether or not to percentile-normalize the image. This is an argument to Cellpose.eval().

equalize

Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.

layer

Layer that contains staining image. Defaults to stain.

out_layer

Layer to put resulting labels. Defaults to {layer}_labels.

**kwargs

Additional keyword arguments to Cellpose.eval() function.

Returns:

Numpy array containing cell labels.

spateo.segmentation.deepcell(adata: anndata.AnnData, model: deepcell.applications.Application | None = None, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None, **kwargs)

Run DeepCell to label cells from a staining image.

Parameters:
adata

Input Anndata

model

DeepCell model to use

equalize

Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.

layer

Layer that contains staining image. Defaults to stain.

out_layer

Layer to put resulting labels. Defaults to {layer}_labels.

**kwargs

Additional keyword arguments to Application.predict() function.

Returns:

Numpy array containing cell labels.

spateo.segmentation.stardist(adata: anndata.AnnData, model: Union[typing_extensions.Literal['2D_versatile_fluo', '2D_versatile_he', '2D_paper_dsb2018'], stardist.models.StarDist2D] = '2D_versatile_fluo', tilesize: int = 2000, min_overlap: Optional[int] = None, context: Optional[int] = None, normalizer: Optional[csbdeep.data.Normalizer] = PercentileNormalizer(), equalize: float = 2.0, sanitize: bool = True, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None, **kwargs)

Run StarDist to label cells from a staining image.

Note

When using min_overlap, the crucial assumption is that all predicted object instances are smaller than the provided min_overlap. Also, it must hold that: min_overlap + 2*context < tilesize. https://github.com/stardist/stardist/blob/858cae17cf17f979122000ad2294a156d0547135/stardist/models/base.py#L776
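The inequality in the note can be checked before launching a long prediction. The helper below is a hypothetical convenience, not part of Spateo or StarDist; it only evaluates min_overlap + 2*context < tilesize for explicit values.

```python
def tiling_is_valid(tilesize, min_overlap, context):
    """Check the constraint min_overlap + 2 * context < tilesize."""
    return min_overlap + 2 * context < tilesize


# With the default tilesize of 2000 there is plenty of room.
ok = tiling_is_valid(tilesize=2000, min_overlap=128, context=64)

# 1800 + 2 * 100 equals the tile size, so the strict inequality fails.
too_large = tiling_is_valid(tilesize=2000, min_overlap=1800, context=100)
```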

Parameters:
adata

Input Anndata

img

Image as a Numpy array.

model

StarDist model to use. Can be one of the three pretrained models from StarDist2D:

  1. ‘2D_versatile_fluo’: ‘Versatile (fluorescent nuclei)’

  2. ‘2D_versatile_he’: ‘Versatile (H&E nuclei)’

  3. ‘2D_paper_dsb2018’: ‘DSB 2018 (from StarDist 2D paper)’

Or any generic StarDist2D model.

tilesize

Run prediction separately on tiles of size tilesize x tilesize and merge them afterwards. Useful to avoid out-of-memory errors. Can be set to <= 0 to disable tiling. When min_overlap is also provided, this becomes the block_size parameter to StarDist2D.predict_instances_big().

min_overlap

Amount of guaranteed overlaps between tiles.

context

Amount of image context on all sides of a tile, which is discarded. Only used when min_overlap is not None. By default, an automatic estimate is used.

normalizer

Normalizer to use to perform normalization prior to prediction. By default, percentile-based normalization is performed. None may be provided to disable normalization.

equalize

Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.

sanitize

Whether to sanitize disconnected labels.

layer

Layer that contains staining image. Defaults to stain.

out_layer

Layer to put resulting labels. Defaults to {layer}_labels.

**kwargs

Additional keyword arguments to pass to StarDist2D.predict_instances().

spateo.segmentation.mask_cells_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, mk: int = 7, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None)

Create a boolean mask indicating cells from stained image.

Parameters:
adata

Input Anndata

otsu_classes

Number of classes to assign pixels to for cell detection

otsu_index

Which threshold index should be used for classifying cells. All pixel intensities >= the value at this index will be classified as cell.

mk

Size of the kernel used for morphological close and open operations applied at the very end.

layer

Layer that contains staining image.

out_layer

Layer to put resulting cell mask. Defaults to {layer}_mask.
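The interaction between otsu_classes and otsu_index can be illustrated as follows. Multi-level Otsu thresholding with n classes yields n - 1 thresholds, so otsu_classes=3 gives two thresholds and otsu_index selects one of them. In this sketch the thresholds are hand-picked, hypothetical values rather than computed ones, and mask_from_thresholds is an illustrative helper.

```python
def mask_from_thresholds(image, thresholds, otsu_index=0):
    """Mark pixels with intensity >= thresholds[otsu_index] as cell."""
    cutoff = thresholds[otsu_index]
    return [[pixel >= cutoff for pixel in row] for row in image]


image = [
    [10, 60, 200],
    [40, 160, 90],
]
thresholds = [50, 150]  # hypothetical thresholds for three intensity classes

lenient = mask_from_thresholds(image, thresholds, otsu_index=0)
strict = mask_from_thresholds(image, thresholds, otsu_index=1)
```

A higher otsu_index keeps only the brightest class, which shrinks the mask.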

spateo.segmentation.mask_nuclei_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, local_k: int = 55, offset: int = 5, mk: int = 5, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None)

Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.

Parameters:
adata

Input Anndata

otsu_classes

Number of classes to assign pixels to for background detection.

otsu_index

Which threshold index should be used for background. All pixel intensities less than the value at this index will be classified as background.

local_k

The size of the local neighborhood of each pixel to use for local (adaptive) thresholding to identify the foreground (i.e. nuclei).

offset

Offset to local thresholding values such that values > 0 lead to more “strict” thresholding, and therefore may be helpful in distinguishing nuclei in dense regions.

mk

Size of the kernel used for morphological close and open operations applied at the very end.

layer

Layer containing nuclei staining

out_layer

Layer to put resulting nuclei mask. Defaults to {layer}_mask.

spateo.segmentation.score_and_mask_pixels(adata: anndata.AnnData, layer: str, k: int, method: typing_extensions.Literal['gauss', 'moran', 'EM', 'EM+gauss', 'EM+BP', 'VI+gauss', 'VI+BP'], moran_kwargs: dict | None = None, em_kwargs: dict | None = None, vi_kwargs: dict | None = None, bp_kwargs: dict | None = None, threshold: float | None = None, use_knee: bool | None = False, mk: int | None = None, bins_layer: typing_extensions.Literal[False] | str | None = None, certain_layer: str | None = None, scores_layer: str | None = None, mask_layer: str | None = None)

Score and mask pixels by how likely each one is to be occupied.

Parameters:
adata

Input Anndata

layer

Layer that contains UMI counts to use

k

Kernel size for convolution

method

Method to use to obtain per-pixel scores. Valid methods are:

  • gauss: Gaussian blur

  • moran: Moran’s I based method

  • EM: EM algorithm to estimate cell and background expression parameters.

  • EM+gauss: Negative binomial EM algorithm followed by Gaussian blur.

  • EM+BP: EM algorithm followed by belief propagation to estimate the marginal probabilities of cell and background.

  • VI+gauss: Negative binomial VI algorithm followed by Gaussian blur. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing zero_inflated=True.

  • VI+BP: VI algorithm followed by belief propagation. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing zero_inflated=True.

moran_kwargs

Keyword arguments to the moran.run_moran() function.

em_kwargs

Keyword arguments to the em.run_em() function.

bp_kwargs

Keyword arguments to the bp.run_bp() function.

threshold

Score cutoff, above which pixels are considered occupied. By default, a threshold is automatically determined by using Otsu thresholding.

use_knee

Whether to use the knee point as the threshold. Defaults to False. If True, threshold is ignored.

mk

Kernel size of morphological open and close operations to reduce noise in the mask. Defaults to k+2 if EM or VI is run. Otherwise, defaults to k-2.

bins_layer

Layer containing assignment of pixels into bins. Each bin is considered separately. Defaults to {layer}_bins. This can be set to False to disable binning, even if the layer exists.

certain_layer

Layer containing a boolean mask indicating which pixels are certain to be occupied. If the array is not a boolean array, it is cast to boolean.

scores_layer

Layer to save pixel scores before thresholding. Defaults to {layer}_scores.

mask_layer

Layer to save the final mask. Defaults to {layer}_mask.
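The default for mk depends on the chosen method, which a short sketch makes explicit. The helper default_mk is hypothetical, and parsing the method string by splitting on "+" is an assumption made for illustration.

```python
def default_mk(k, method):
    """Return k + 2 when EM or VI is part of the method, otherwise k - 2."""
    uses_em_or_vi = method.split("+")[0] in ("EM", "VI")
    return k + 2 if uses_em_or_vi else k - 2


defaults = {method: default_mk(5, method)
            for method in ("gauss", "moran", "EM", "EM+BP", "VI+gauss")}
```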

spateo.segmentation.augment_labels(adata: anndata.AnnData, source_layer: str, target_layer: str, out_layer: str | None = None)

Augment the labels in one label array using the labels in another.

Parameters:
adata

Input Anndata

source_layer

Layer containing source labels to (possibly) take labels from.

target_layer

Layer containing target labels to augment.

out_layer

Layer to save results. Defaults to {target_layer}_augmented.

spateo.segmentation.expand_labels(adata: anndata.AnnData, layer: str, distance: int = 5, max_area: int = 400, mask_layer: str | None = None, out_layer: str | None = None)

Expand labels up to a certain distance.

Parameters:
adata

Input Anndata

layer

Layer from which the labels were derived. Then, {layer}_labels is used as the labels. If not present, it is taken as a literal.

distance

Distance to expand. Internally, this is used as the number of iterations of distance 1 dilations.

max_area

Maximum area of each label.

mask_layer

Layer containing mask to restrict expansion to within.

out_layer

Layer to save results. By default, uses {layer}_labels_expanded.
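The idea of expanding labels through repeated distance-1 dilations, capped by max_area, can be sketched on a toy grid. This is not Spateo's implementation: the 4-connected neighborhood, the neighbor order used to break ties, and the omission of mask_layer are all simplifying assumptions, and expand_labels_sketch is a hypothetical name.

```python
def expand_labels_sketch(labels, distance=1, max_area=400):
    """Grow nonzero labels into background, one pixel per iteration."""
    n_rows, n_cols = len(labels), len(labels[0])
    labels = [row[:] for row in labels]
    areas = {}
    for row in labels:
        for value in row:
            if value:
                areas[value] = areas.get(value, 0) + 1
    for _ in range(distance):
        # Read neighbors from a snapshot so each iteration grows by one pixel only.
        snapshot = [row[:] for row in labels]
        for r in range(n_rows):
            for c in range(n_cols):
                if snapshot[r][c]:
                    continue
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < n_rows and 0 <= cc < n_cols:
                        neighbor = snapshot[rr][cc]
                        if neighbor and areas[neighbor] < max_area:
                            labels[r][c] = neighbor
                            areas[neighbor] += 1
                            break
    return labels


row = [[1, 0, 0, 0, 2]]
once = expand_labels_sketch(row, distance=1)
twice = expand_labels_sketch(row, distance=2)
capped = expand_labels_sketch(row, distance=2, max_area=1)
```

After one iteration each label has grown by one pixel; after two, the remaining gap is claimed by whichever neighbor is checked first; with max_area=1 nothing grows because both labels are already at the cap.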

spateo.segmentation.find_peaks(adata: anndata.AnnData, layer: str, k: int, min_distance: int, mask_layer: str | None = None, out_layer: str | None = None)

Find peaks from an array.

Parameters:
adata

Input AnnData

layer

Layer to use as values to find peaks from.

k

Apply a Gaussian blur with this kernel size prior to peak detection.

min_distance

Minimum distance, in pixels, between peaks.

mask_layer

Find peaks only in regions specified by the mask.

out_layer

Layer to save identified peaks as markers. By default, uses {layer}_markers.

spateo.segmentation.find_peaks_from_mask(adata: anndata.AnnData, layer: str, min_distance: int, distances_layer: str | None = None, markers_layer: str | None = None)

Find peaks from a boolean mask. Used to obtain Watershed markers.

Parameters:
adata

Input AnnData

layer

Layer containing boolean mask. This will default to {layer}_mask. If not present in the provided AnnData, this argument is used as a literal.

min_distance

Minimum distance, in pixels, between peaks.

distances_layer

Layer to save distance from each pixel to the nearest zero (False) pixel (a.k.a. distance transform). By default, uses {layer}_distances.

markers_layer

Layer to save identified peaks as markers. By default, uses {layer}_markers.

spateo.segmentation.find_peaks_with_erosion(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, square: bool = False, min_area: int = 80, n_iter: int = -1, float_k: int = 5, float_threshold: float | None = None, out_layer: str | None = None)

Find peaks for use in Watershed via iterative erosion.

Parameters:
adata

Input Anndata

layer

Layer that was used to create scores or masks. If {layer}_scores is present, that is used. Otherwise if {layer}_mask is present, that is used. Otherwise, the layer is taken as a literal.

k

Erosion kernel size

square

Whether to use a square kernel

min_area

Minimum area

n_iter

Number of erosions to perform.

float_k

Morphological close and open kernel size when X is a float array.

float_threshold

Threshold to use to determine connected components when X is a float array. By default, a threshold is automatically determined using Otsu's method.

out_layer

Layer to save results. By default, this will be {layer}_markers.
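The lookup order described for the layer argument ({layer}_scores, then {layer}_mask, then the literal name) can be written as a small helper. Here a plain dict stands in for the AnnData layers mapping, and resolve_layer is a hypothetical function shown only to illustrate the fallback order.

```python
def resolve_layer(layers, layer, suffixes=("_scores", "_mask")):
    """Return the first existing '{layer}{suffix}' key, else the layer itself."""
    for suffix in suffixes:
        key = f"{layer}{suffix}"
        if key in layers:
            return key
    return layer


layers = {"stain": "image", "stain_mask": "mask"}
with_mask = resolve_layer(layers, "stain")

layers["stain_scores"] = "scores"
with_scores = resolve_layer(layers, "stain")

literal = resolve_layer(layers, "unspliced")
```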

spateo.segmentation.label_connected_components(adata: anndata.AnnData, layer: str, seed_layer: str | None = None, area_threshold: int = 500, k: int = 3, min_area: int = 100, n_iter: int = -1, distance: int = 8, max_area: int = 400, out_layer: str | None = None)

Label connected components while splitting components that are too large.

Parameters:
adata

Input Anndata

layer

Data layer that was used to generate the mask. First, will look for {layer}_mask. Otherwise, this will be used as a literal.

seed_layer

Layer containing seed labels. These are labels that should be used whenever possible in labeling connected components.

area_threshold

Connected components with area greater than this value will be split into smaller portions by first eroding and then expanding.

k

Kernel size for erosion.

min_area

Don’t erode labels smaller than this area.

n_iter

Number of erosion operations. -1 means continue eroding until every label is less than min_area.

distance

Distance to expand eroded labels.

max_area

Maximum area when expanding labels.

out_layer

Layer to save results. Defaults to {layer}_labels.

Returns:

New label array

spateo.segmentation.watershed(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, mask_layer: str | None = None, markers_layer: str | None = None, out_layer: str | None = None)

Assign individual nuclei/cells using the Watershed algorithm.

Parameters:
adata

Input AnnData

layer

Original data layer from which the segmentation will be derived.

k

Size of the kernel to use for Gaussian blur.

mask_layer

Layer containing mask. This will default to {layer}_mask.

markers_layer

Layer containing Watershed markers. This will default to {layer}_markers. May either be a boolean or integer array. If this is a boolean array, the markers are identified by calling cv2.connectedComponents.

out_layer

Layer to save results. Defaults to {layer}_labels.
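When the markers are given as a boolean array, they have to be turned into integer labels, one per connected blob. The sketch below shows that conversion in plain Python with a breadth-first search, assuming 8-connectivity (diagonal neighbors count as connected). It stands in for the cv2.connectedComponents call mentioned above and is not Spateo's code; label_markers is a hypothetical name.

```python
from collections import deque


def label_markers(mask):
    """Assign 1, 2, ... to each 8-connected group of True pixels."""
    n_rows, n_cols = len(mask), len(mask[0])
    labels = [[0] * n_cols for _ in range(n_rows)]
    next_label = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, nc = cr + dr, cc + dc
                        if (0 <= rr < n_rows and 0 <= nc < n_cols
                                and mask[rr][nc] and not labels[rr][nc]):
                            labels[rr][nc] = next_label
                            queue.append((rr, nc))
    return labels, next_label


separate = [
    [True, False, False],
    [False, False, False],
    [False, False, True],
]
markers, n_markers = label_markers(separate)

diagonal = [
    [True, False],
    [False, True],
]
_, n_diagonal = label_markers(diagonal)
```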

spateo.segmentation.generate_random_labels(adata: anndata.AnnData, areas: List[int], seed: int | None = None, out_layer: str = 'random_labels')

Create random labels, usually for benchmarking and QC purposes.

Parameters:
adata

Input Anndata

areas

List of desired areas.

seed

Random seed.

out_layer

Layer to save results.

spateo.segmentation.generate_random_labels_like(adata: anndata.AnnData, layer: str, seed: int | None = None, out_layer: str = 'random_labels')

Create random labels, using another layer as a template.

Parameters:
adata

Input Anndata

layer

Layer containing template labels

seed

Random seed.

out_layer

Layer to save results.

spateo.segmentation.select_qc_regions(adata: anndata.AnnData, regions: List[Tuple[int, int]] | List[Tuple[int, int, int, int]] = None, n: int = 4, size: int = 2000, seed: int | None = None, use_scale: bool = True, absolute: bool = False, weight_func: Callable[[anndata.AnnData], float] | None = lambda adata: ...)

Select regions to use for segmentation quality control purposes.

Note

All coordinates are in terms of “real” coordinates (i.e. the coordinates in adata.obs_names and adata.var_names) so that slicing the AnnData retains the regions correctly.

Parameters:
adata

Input AnnData

regions

List of tuples in the form (xmin, ymin) or (xmin, xmax, ymin, ymax). If the former, the size argument is used to compute the bounding box.

n

Number of regions to select if regions is not provided.

size

Width and height, in pixels, of each randomly selected region.

seed

Random seed.

use_scale

Whether or not the provided regions are in scale units. This option only has effect when regions are provided. False means the provided coordinates are in terms of pixels.

absolute

Whether or not the provided regions are in terms of absolute X and Y coordinates. This option only has effect when regions are provided. False means the provided coordinates are relative with respect to the coordinates in the provided adata.

weight_func

Weighting function when regions is not provided. The probability of selecting each size x size region will be weighted by this function, which accepts a single AnnData (the region) as its argument, and returns a single float weight, such that higher weights mean higher probability. By default, the log1p of the sum of the counts in the X layer is used. Set to None to weight each region equally.
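The default weighting and the use of size for two-element regions can be illustrated with the standard library alone. In this sketch each candidate region is represented only by its total count, the bounding box is assumed to extend size pixels from (xmin, ymin), and sampling is done with replacement for brevity; all helper names are hypothetical and the real selection logic may differ.

```python
import math
import random


def region_weights(region_totals):
    """Weight each region by log1p of its total count, as in the default."""
    return [math.log1p(total) for total in region_totals]


def bounding_box(xmin, ymin, size):
    """Expand an (xmin, ymin) corner into (xmin, xmax, ymin, ymax)."""
    return (xmin, xmin + size, ymin, ymin + size)


totals = [0, 10, 1000]            # total counts of three candidate regions
weights = region_weights(totals)  # an empty region gets weight 0

rng = random.Random(0)
# random.choices samples with replacement; used here only to show the weighting.
picked = rng.choices(range(len(totals)), weights=weights, k=2)

box = bounding_box(100, 200, size=2000)
```

Because log1p(0) is 0, an empty region is never selected under this weighting, while denser regions are favored without dominating as strongly as raw counts would.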