spateo.segmentation
Spatiotemporal modeling of spatial transcriptomics
Submodules
- spateo.segmentation.align
- spateo.segmentation.benchmark
- spateo.segmentation.bp
- spateo.segmentation.density
- spateo.segmentation.em
- spateo.segmentation.external
- spateo.segmentation.icell
- spateo.segmentation.label
- spateo.segmentation.moran
- spateo.segmentation.qc
- spateo.segmentation.simulation
- spateo.segmentation.simulation_evaluation
- spateo.segmentation.utils
- spateo.segmentation.vi
Functions
- refine_alignment: Refine the alignment between the staining image and RNA coordinates.
- compare: Compute segmentation statistics.
- merge_densities: Merge density bins either using an explicit mapping or in a semi-supervised way.
- segment_densities: Segment into regions by UMI density.
- cellpose: Run Cellpose to label cells from a staining image.
- deepcell: Run DeepCell to label cells from a staining image.
- stardist: Run StarDist to label cells from a staining image.
- mask_cells_from_stain: Create a boolean mask indicating cells from stained image.
- mask_nuclei_from_stain: Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.
- score_and_mask_pixels: Score and mask pixels by how likely each one is to be occupied.
- augment_labels: Augment the labels in one label array using the labels in another.
- expand_labels: Expand labels up to a certain distance.
- find_peaks: Find peaks from an array.
- find_peaks_from_mask: Find peaks from a boolean mask. Used to obtain Watershed markers.
- find_peaks_with_erosion: Find peaks for use in Watershed via iterative erosion.
- label_connected_components: Label connected components while splitting components that are too large.
- watershed: Assign individual nuclei/cells using the Watershed algorithm.
- generate_random_labels: Create random labels, usually for benchmarking and QC purposes.
- generate_random_labels_like: Create random labels, using another layer as a template.
- select_qc_regions: Select regions to use for segmentation quality control purposes.
Package Contents
- spateo.segmentation.refine_alignment(adata: anndata.AnnData, stain_layer: str = SKM.STAIN_LAYER_KEY, rna_layer: str = SKM.UNSPLICED_LAYER_KEY, mode: typing_extensions.Literal['rigid', 'non-rigid'] = 'rigid', downscale: float = 1, k: int = 5, n_epochs: int = 100, transform_layers: str | List[str] | None = None, **kwargs)
Refine the alignment between the staining image and RNA coordinates.
There are often small misalignments between the staining image and RNA, which result in incorrect aggregation of pixels into cells based on staining. This function attempts to refine these alignments based on the staining and (unspliced) RNA masks.
- Parameters:
- adata
Input Anndata
- stain_layer
Layer containing staining image.
- rna_layer
Layer containing (unspliced) RNA.
- mode
The alignment mode. Two modes are supported:
* rigid: A global alignment method that finds a rigid (affine) transformation matrix.
* non-rigid: A semi-local alignment method that finds a thin-plate spline with a mesh of a certain size. By default, each cell in the mesh consists of 1000 x 1000 pixels. This value can be modified by providing a binsize argument to this function (specifically, as part of the additional **kwargs).
- downscale
Downscale matrices by this factor to reduce memory and runtime.
- k
Kernel size for Gaussian blur of the RNA matrix.
- n_epochs
Number of epochs to run optimization
- transform_layers
Layers to transform and overwrite inplace.
- **kwargs
Additional keyword arguments to pass to the PyTorch module.
- spateo.segmentation.compare(adata: anndata.AnnData, true_layer: str, pred_layer: str, data_layer: str = SKM.X_LAYER, umi_pixels_only: bool = True, random_background: bool = True, ap_taus: Tuple[int, ...] = tuple(np.arange(0.5, 1, 0.05)), seed: int | None = None) -> pandas.DataFrame
Compute segmentation statistics.
- Parameters:
- adata
Input Anndata
- true_layer
Layer containing true labels
- pred_layer
Layer containing predicted labels
- data_layer
Layer containing UMIs
- umi_pixels_only
Whether or not to only consider pixels that have at least one UMI captured (as determined by data_layer).
- random_background
Simulate random background by randomly permuting the pred_layer labels and computing the same statistics against true_layer. The returned DataFrame will have an additional column for these statistics.
- ap_taus
Tau thresholds to calculate average precision. Defaults to 0.05 increments starting at 0.5 and ending at (and including) 0.95.
- seed
Random seed.
- Returns:
Pandas DataFrame containing classification and labeling statistics
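The default ap_taus value can be inspected directly with NumPy. The short sketch below only evaluates the expression shown in the signature; it does not call spateo, and the variable names are illustrative.

```python
import numpy as np

# Same expression as the default value of ap_taus in the signature.
ap_taus = tuple(np.arange(0.5, 1, 0.05))

# Floating-point steps are not exact multiples of 0.05, so round for display.
rounded = [round(float(t), 2) for t in ap_taus]
print(len(ap_taus), rounded)
```

This gives ten thresholds, from 0.5 up to and including 0.95, matching the description of ap_taus.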
- spateo.segmentation.merge_densities(adata: anndata.AnnData, layer: str, mapping: Dict[int, int] | None = None, out_layer: str | None = None)
Merge density bins either using an explicit mapping or in a semi-supervised way.
- Parameters:
- adata
Input Anndata
- layer
Layer that was used to generate density bins. Defaults to using {layer}_bins. If not present, will be taken as a literal.
- mapping
Mapping to use to transform bins
- out_layer
Layer to store results. Defaults to same layer as input.
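As an illustration of what an explicit mapping means, the sketch below relabels a small bin array with a Dict[int, int]. It is a NumPy-only approximation under the assumption that bins absent from the mapping are left unchanged; apply_bin_mapping is a hypothetical helper, not part of spateo.

```python
import numpy as np

def apply_bin_mapping(bins, mapping):
    """Relabel density bins according to an {old_bin: new_bin} mapping."""
    out = bins.copy()
    for old, new in mapping.items():
        # Compare against the original array so that chained entries
        # (e.g. {1: 2, 2: 3}) do not remap a pixel twice.
        out[bins == old] = new
    return out

bins = np.array([[1, 2], [3, 0]])
merged = apply_bin_mapping(bins, {2: 1, 3: 1})  # merge bins 2 and 3 into bin 1
```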
- spateo.segmentation.segment_densities(adata: anndata.AnnData, layer: str, binsize: int, k: int, dk: int, distance_threshold: float | None = None, background: Tuple[int, int] | typing_extensions.Literal[False] | None = None, out_layer: str | None = None)
Segment into regions by UMI density.
The tissue is segmented into UMI density bins according to the following procedure.
1. The UMI matrix is binned according to binsize (recommended >= 20).
2. The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size k. Note that k is in terms of bins, not pixels.
3. The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold distance_threshold, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) the same shape as the binned matrix.
4. Each density bin is dilated with kernel size dk, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and “choppy” borders in subsequent steps.
5. If background is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing a (x, y) tuple instead. This feature can be turned off by providing False.
6. The density bin matrix is resized to be the same size as the original UMI matrix.
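Three of the steps in this procedure (binning, picking the background bin from the perimeter, and resizing back to the original shape) can be sketched with NumPy alone. This is a simplified illustration, not spateo's implementation: the blur, clustering and dilation steps are omitted, edge bins are zero-padded by assumption, and all function names are hypothetical.

```python
import numpy as np

def bin_matrix(X, binsize):
    """Sum counts in non-overlapping binsize x binsize blocks (edges zero-padded)."""
    pad_rows, pad_cols = -X.shape[0] % binsize, -X.shape[1] % binsize
    Xp = np.pad(X, ((0, pad_rows), (0, pad_cols)))
    rows, cols = Xp.shape
    blocks = Xp.reshape(rows // binsize, binsize, cols // binsize, binsize)
    return blocks.sum(axis=(1, 3))

def perimeter_mode(labels):
    """Return the label that occurs most often on the outer border."""
    border = np.concatenate(
        [labels[0, :], labels[-1, :], labels[1:-1, 0], labels[1:-1, -1]]
    )
    values, counts = np.unique(border, return_counts=True)
    return values[np.argmax(counts)]

def upscale(labels, binsize, shape):
    """Repeat each bin label binsize times per axis, then crop to shape."""
    block = np.ones((binsize, binsize), dtype=labels.dtype)
    return np.kron(labels, block)[: shape[0], : shape[1]]

binned = bin_matrix(np.ones((5, 5)), binsize=2)
background = perimeter_mode(np.array([[1, 1, 1], [1, 2, 1], [1, 1, 1]]))
full = upscale(np.array([[1, 2], [3, 4]]), binsize=2, shape=(3, 3))
```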
- Parameters:
- adata
Input Anndata
- layer
Layer that contains UMI counts to segment based on.
- binsize
Size of bins to use. For density segmentation, pixels are binned to reduce runtime. 20 is usually a good starting point. Note that this value is relative to the original binsize used to read in the AnnData.
- k
Kernel size for Gaussian blur, in bins
- dk
Kernel size for final dilation, in bins
- distance_threshold
Distance threshold for the Ward linkage such that clusters will not be merged if they have greater than this distance.
- background
Pixel that should be categorized as background. By default, the bin that is most often assigned to the outermost pixels is categorized as background. Set to False to turn off background detection.
- out_layer
Layer to put resulting bins. Defaults to {layer}_bins.
- spateo.segmentation.cellpose(adata: anndata.AnnData, model: typing_extensions.Literal['cyto', 'nuclei'] | cellpose.models.CellposeModel = 'nuclei', diameter: int | None = None, normalize: bool = True, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None, **kwargs)
Run Cellpose to label cells from a staining image.
- Parameters:
- adata
Input Anndata
- model
Cellpose model to use. Can be one of the two pretrained models:
* cyto: Labeled cytoplasm
* nuclei: Labeled nuclei
Or any generic CellposeModel model.
- diameter
Expected diameter of each segmentation (cells for model="cyto", nuclei for model="nuclei"). Can be None to run automatic detection.
- normalize
Whether or not to percentile-normalize the image. This is an argument to Cellpose.eval().
- equalize
Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.
- layer
Layer that contains staining image. Defaults to stain.
- out_layer
Layer to put resulting labels. Defaults to {layer}_labels.
- **kwargs
Additional keyword arguments to pass to the Cellpose.eval() function.
- Returns:
Numpy array containing cell labels.
- spateo.segmentation.deepcell(adata: anndata.AnnData, model: deepcell.applications.Application | None = None, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None, **kwargs)
Run DeepCell to label cells from a staining image.
- Parameters:
- adata
Input Anndata
- model
DeepCell model to use
- equalize
Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.
- layer
Layer that contains staining image. Defaults to stain.
- out_layer
Layer to put resulting labels. Defaults to {layer}_labels.
- **kwargs
Additional keyword arguments to pass to the Application.predict() function.
- Returns:
Numpy array containing cell labels.
- spateo.segmentation.stardist(adata: anndata.AnnData, model: Union[typing_extensions.Literal['2D_versatile_fluo', '2D_versatile_he', '2D_paper_dsb2018'], stardist.models.StarDist2D] = '2D_versatile_fluo', tilesize: int = 2000, min_overlap: Optional[int] = None, context: Optional[int] = None, normalizer: Optional[csbdeep.data.Normalizer] = PercentileNormalizer(), equalize: float = 2.0, sanitize: bool = True, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None, **kwargs)
Run StarDist to label cells from a staining image.
Note
When using min_overlap, the crucial assumption is that all predicted object instances are smaller than the provided min_overlap. Also, it must hold that: min_overlap + 2*context < tilesize. https://github.com/stardist/stardist/blob/858cae17cf17f979122000ad2294a156d0547135/stardist/models/base.py#L776
- Parameters:
- adata
Input Anndata
- img
Image as a Numpy array.
- model
StarDist model to use. Can be one of the three pretrained models from StarDist2D:
1. ‘2D_versatile_fluo’: ‘Versatile (fluorescent nuclei)’
2. ‘2D_versatile_he’: ‘Versatile (H&E nuclei)’
3. ‘2D_paper_dsb2018’: ‘DSB 2018 (from StarDist 2D paper)’
Or any generic StarDist2D model.
- tilesize
Run prediction separately on tiles of size tilesize x tilesize and merge them afterwards. Useful to avoid out-of-memory errors. Can be set to <= 0 to disable tiling. When min_overlap is also provided, this becomes the block_size parameter to StarDist2D.predict_instances_big().
- min_overlap
Amount of guaranteed overlap between tiles.
- context
Amount of image context on all sides of a tile, which is discarded. Only used when min_overlap is not None. By default, an automatic estimate is used.
- normalizer
Normalizer to use to perform normalization prior to prediction. By default, percentile-based normalization is performed. None may be provided to disable normalization.
- equalize
Controls the clip_limit argument to the clahe() function. Set this value to a non-positive value to turn off equalization.
- sanitize
Whether to sanitize disconnected labels.
- layer
Layer that contains staining image. Defaults to stain.
- out_layer
Layer to put resulting labels. Defaults to {layer}_labels.
- **kwargs
Additional keyword arguments to pass to StarDist2D.predict_instances().
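The constraint stated in the note for stardist (min_overlap + 2*context < tilesize, with every object smaller than min_overlap) can be checked before starting a long prediction. The helper below is a hypothetical pre-flight check written for illustration; it is not part of spateo or StarDist, and max_object_size is an assumed user-supplied estimate.

```python
def tiling_is_valid(tilesize, min_overlap, context, max_object_size=None):
    """Check the tiling constraints described in the note for stardist."""
    # Overlap plus the discarded context on both sides must fit in a tile.
    if min_overlap + 2 * context >= tilesize:
        return False
    # Every predicted object is assumed to be smaller than min_overlap.
    if max_object_size is not None and max_object_size >= min_overlap:
        return False
    return True

ok = tiling_is_valid(tilesize=2000, min_overlap=100, context=50)
too_tight = tiling_is_valid(tilesize=200, min_overlap=100, context=50)
```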
- spateo.segmentation.mask_cells_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, mk: int = 7, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None)
Create a boolean mask indicating cells from stained image.
- Parameters:
- adata
Input Anndata
- otsu_classes
Number of classes to assign pixels to for cell detection
- otsu_index
Which threshold index should be used for classifying cells. All pixel intensities >= the value at this index will be classified as cell.
- mk
Size of the kernel used for morphological close and open operations applied at the very end.
- layer
Layer that contains staining image.
- out_layer
Layer to put resulting cell mask. Defaults to {layer}_mask.
- spateo.segmentation.mask_nuclei_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, local_k: int = 55, offset: int = 5, mk: int = 5, layer: str = SKM.STAIN_LAYER_KEY, out_layer: str | None = None)
Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.
- Parameters:
- adata
Input Anndata
- otsu_classes
Number of classes to assign pixels to for background detection.
- otsu_index
Which threshold index should be used for background. All pixel intensities less than the value at this index will be classified as background.
- local_k
The size of the local neighborhood of each pixel to use for local (adaptive) thresholding to identify the foreground (i.e. nuclei).
- offset
Offset to local thresholding values such that values > 0 lead to more “strict” thresholding, and therefore may be helpful in distinguishing nuclei in dense regions.
- mk
Size of the kernel used for morphological close and open operations applied at the very end.
- layer
Layer containing nuclei staining
- out_layer
Layer to put resulting nuclei mask. Defaults to {layer}_mask.
- spateo.segmentation.score_and_mask_pixels(adata: anndata.AnnData, layer: str, k: int, method: typing_extensions.Literal['gauss', 'moran', 'EM', 'EM+gauss', 'EM+BP', 'VI+gauss', 'VI+BP'], moran_kwargs: dict | None = None, em_kwargs: dict | None = None, vi_kwargs: dict | None = None, bp_kwargs: dict | None = None, threshold: float | None = None, use_knee: bool | None = False, mk: int | None = None, bins_layer: typing_extensions.Literal[False] | str | None = None, certain_layer: str | None = None, scores_layer: str | None = None, mask_layer: str | None = None)
Score and mask pixels by how likely each one is to be occupied.
- Parameters:
- adata
Input Anndata
- layer
Layer that contains UMI counts to use
- k
Kernel size for convolution
- method
Method to use to obtain per-pixel scores. Valid methods are:
* gauss: Gaussian blur
* moran: Moran’s I based method
* EM: EM algorithm to estimate cell and background expression parameters.
* EM+gauss: Negative binomial EM algorithm followed by Gaussian blur.
* EM+BP: EM algorithm followed by belief propagation to estimate the marginal probabilities of cell and background.
* VI+gauss: Negative binomial VI algorithm followed by Gaussian blur. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing zero_inflated=True.
* VI+BP: VI algorithm followed by belief propagation. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing zero_inflated=True.
- moran_kwargs
Keyword arguments to the moran.run_moran() function.
- em_kwargs
Keyword arguments to the em.run_em() function.
- bp_kwargs
Keyword arguments to the bp.run_bp() function.
- threshold
Score cutoff, above which pixels are considered occupied. By default, a threshold is automatically determined by using Otsu thresholding.
- use_knee
Whether to use the knee point as the threshold. Defaults to False. If True, threshold is ignored.
- mk
Kernel size of morphological open and close operations to reduce noise in the mask. Defaults to k+2 if EM or VI is run. Otherwise, defaults to k-2.
- bins_layer
Layer containing assignment of pixels into bins. Each bin is considered separately. Defaults to {layer}_bins. This can be set to False to disable binning, even if the layer exists.
- certain_layer
Layer containing a boolean mask indicating which pixels are certain to be occupied. If the array is not a boolean array, it is cast to boolean.
- scores_layer
Layer to save pixel scores before thresholding. Defaults to {layer}_scores.
- mask_layer
Layer to save the final mask. Defaults to {layer}_mask.
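When threshold is not given, the description for score_and_mask_pixels says the cutoff is chosen by Otsu thresholding. The sketch below shows the general idea on a toy score array using a histogram-based Otsu threshold written in NumPy. It is an illustration only: otsu_threshold and the nbins value are assumptions, and spateo may compute the threshold differently.

```python
import numpy as np

def otsu_threshold(scores, nbins=256):
    """Pick the cutoff that maximizes between-class variance of the scores."""
    hist, edges = np.histogram(scores.ravel(), bins=nbins)
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(hist)            # pixels at or below each candidate cutoff
    w1 = w0[-1] - w0                # pixels above each candidate cutoff
    csum = np.cumsum(hist * centers)
    with np.errstate(divide="ignore", invalid="ignore"):
        m0 = csum / w0              # mean score of the lower class
        m1 = (csum[-1] - csum) / w1 # mean score of the upper class
        between = w0 * w1 * (m0 - m1) ** 2
    # Empty classes produce NaN; treat them as zero variance.
    between = np.nan_to_num(between)
    return centers[np.argmax(between)]

# Toy scores: 100 background-like pixels and 100 occupied-like pixels.
scores = np.concatenate([np.full(100, 0.1), np.full(100, 0.9)])
cutoff = otsu_threshold(scores)
mask = scores > cutoff  # pixels above the cutoff are considered occupied
```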
- spateo.segmentation.augment_labels(adata: anndata.AnnData, source_layer: str, target_layer: str, out_layer: str | None = None)
Augment the labels in one label array using the labels in another.
- Parameters:
- adata
Input Anndata
- source_layer
Layer containing source labels to (possibly) take labels from.
- target_layer
Layer containing target labels to augment.
- out_layer
Layer to save results. Defaults to {target_layer}_augmented.
- spateo.segmentation.expand_labels(adata: anndata.AnnData, layer: str, distance: int = 5, max_area: int = 400, mask_layer: str | None = None, out_layer: str | None = None)
Expand labels up to a certain distance.
- Parameters:
- adata
Input Anndata
- layer
Layer from which the labels were derived. Then, {layer}_labels is used as the labels. If not present, it is taken as a literal.
- distance
Distance to expand. Internally, this is used as the number of iterations of distance 1 dilations.
- max_area
Maximum area of each label.
- mask_layer
Layer containing mask to restrict expansion to within.
- out_layer
Layer to save results. By default, uses {layer}_labels_expanded.
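The distance parameter of expand_labels is described as a number of distance-1 dilations, capped by max_area. A minimal NumPy sketch of that idea follows. It assumes 4-connected growth and checks each label's area once per iteration, so it is an approximation and not spateo's implementation; expand_labels_sketch is a hypothetical name.

```python
import numpy as np

def expand_labels_sketch(labels, distance=5, max_area=400, mask=None):
    """Grow integer labels into background (0) one pixel per iteration."""
    out = labels.copy()
    h, w = out.shape
    for _ in range(distance):
        areas = np.bincount(out.ravel())  # area of each label before this step
        padded = np.pad(out, 1)           # zero border, so edges see background
        grown = out.copy()
        # Offsets into the padded array for the up, down, left, right neighbors.
        for dy, dx in ((0, 1), (2, 1), (1, 0), (1, 2)):
            neighbor = padded[dy:dy + h, dx:dx + w]
            can_grow = (grown == 0) & (neighbor > 0) & (areas[neighbor] < max_area)
            if mask is not None:
                can_grow &= mask          # restrict expansion to the mask
            grown[can_grow] = neighbor[can_grow]
        out = grown
    return out

seed = np.zeros((7, 7), dtype=int)
seed[3, 3] = 1
one_step = expand_labels_sketch(seed, distance=1)
two_steps = expand_labels_sketch(seed, distance=2)
capped = expand_labels_sketch(seed, distance=3, max_area=5)
```

Because the area is checked at the start of each iteration, a label can overshoot max_area by up to one ring of pixels in this sketch.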
- spateo.segmentation.find_peaks(adata: anndata.AnnData, layer: str, k: int, min_distance: int, mask_layer: str | None = None, out_layer: str | None = None)
Find peaks from an array.
- Parameters:
- adata
Input AnnData
- layer
Layer to use as values to find peaks from.
- k
Apply a Gaussian blur with this kernel size prior to peak detection.
- min_distance
Minimum distance, in pixels, between peaks.
- mask_layer
Find peaks only in regions specified by the mask.
- out_layer
Layer to save identified peaks as markers. By default, uses {layer}_markers.
- spateo.segmentation.find_peaks_from_mask(adata: anndata.AnnData, layer: str, min_distance: int, distances_layer: str | None = None, markers_layer: str | None = None)
Find peaks from a boolean mask. Used to obtain Watershed markers.
- Parameters:
- adata
Input AnnData
- layer
Layer containing boolean mask. This will default to {layer}_mask. If not present in the provided AnnData, this argument is used as a literal.
- min_distance
Minimum distance, in pixels, between peaks.
- distances_layer
Layer to save distance from each pixel to the nearest zero (False) pixel (a.k.a. distance transform). By default, uses {layer}_distances.
- markers_layer
Layer to save identified peaks as markers. By default, uses {layer}_markers.
- spateo.segmentation.find_peaks_with_erosion(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, square: bool = False, min_area: int = 80, n_iter: int = -1, float_k: int = 5, float_threshold: float | None = None, out_layer: str | None = None)
Find peaks for use in Watershed via iterative erosion.
- Parameters:
- adata
Input Anndata
- layer
Layer that was used to create scores or masks. If {layer}_scores is present, that is used. Otherwise if {layer}_mask is present, that is used. Otherwise, the layer is taken as a literal.
- k
Erosion kernel size
- square
Whether to use a square kernel
- min_area
Minimum area
- n_iter
Number of erosions to perform.
- float_k
Morphological close and open kernel size when X is a float array.
- float_threshold
Threshold to use to determine connected components when X is a float array. By default, a threshold is automatically determined using Otsu's method.
- out_layer
Layer to save results. By default, this will be {layer}_markers.
- spateo.segmentation.label_connected_components(adata: anndata.AnnData, layer: str, seed_layer: str | None = None, area_threshold: int = 500, k: int = 3, min_area: int = 100, n_iter: int = -1, distance: int = 8, max_area: int = 400, out_layer: str | None = None)
Label connected components while splitting components that are too large.
- Parameters:
- adata
Input Anndata
- layer
Data layer that was used to generate the mask. First, will look for {layer}_mask. Otherwise, this will be used as a literal.
- seed_layer
Layer containing seed labels. These are labels that should be used whenever possible in labeling connected components.
- area_threshold
Connected components with area greater than this value will be split into smaller portions by first eroding and then expanding.
- k
Kernel size for erosion.
- min_area
Don’t erode labels smaller than this area.
- n_iter
Number of erosion operations. -1 means continue eroding until every label is less than min_area.
- distance
Distance to expand eroded labels.
- max_area
Maximum area when expanding labels.
- out_layer
Layer to save results. Defaults to {layer}_labels.
- Returns:
New label array
- spateo.segmentation.watershed(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, mask_layer: str | None = None, markers_layer: str | None = None, out_layer: str | None = None)
Assign individual nuclei/cells using the Watershed algorithm.
- Parameters:
- adata
Input AnnData
- layer
Original data layer from which the segmentation will be derived.
- k
Size of the kernel to use for Gaussian blur.
- mask_layer
Layer containing mask. This will default to {layer}_mask.
- markers_layer
Layer containing Watershed markers. This will default to {layer}_markers. May either be a boolean or integer array. If this is a boolean array, the markers are identified by calling cv2.connectedComponents.
- out_layer
Layer to save results. Defaults to {layer}_labels.
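The default layer names documented for mask_nuclei_from_stain, find_peaks_from_mask and watershed line up so that the three calls can be chained without naming intermediate layers. The sketch below wires them together under that reading of the defaults. The wrapper segment_nuclei, its seg argument and the min_distance value of 7 are illustrative assumptions, and the snippet has not been run against a real dataset.

```python
def segment_nuclei(adata, layer="stain", min_distance=7, seg=None):
    """Chain mask -> markers -> watershed using the documented default layers."""
    if seg is None:
        # Imported lazily so the sketch can be read without spateo installed.
        import spateo as st
        seg = st.segmentation
    # Writes the boolean nuclei mask to {layer}_mask.
    seg.mask_nuclei_from_stain(adata, layer=layer)
    # Reads {layer}_mask and writes markers to {layer}_markers.
    seg.find_peaks_from_mask(adata, layer, min_distance)
    # Reads {layer}_mask and {layer}_markers, writes {layer}_labels.
    seg.watershed(adata, layer=layer)
    return f"{layer}_labels"
```

With spateo installed, calling segment_nuclei(adata) would be expected to leave the labels in the stain_labels layer, provided the staining image is stored in the stain layer.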
- spateo.segmentation.generate_random_labels(adata: anndata.AnnData, areas: List[int], seed: int | None = None, out_layer: str = 'random_labels')
Create random labels, usually for benchmarking and QC purposes.
- Parameters:
- adata
Input Anndata
- areas
List of desired areas.
- seed
Random seed.
- out_layer
Layer to save results.
- spateo.segmentation.generate_random_labels_like(adata: anndata.AnnData, layer: str, seed: int | None = None, out_layer: str = 'random_labels')
Create random labels, using another layer as a template.
- Parameters:
- adata
Input Anndata
- layer
Layer containing template labels
- seed
Random seed.
- out_layer
Layer to save results.
- spateo.segmentation.select_qc_regions(adata: anndata.AnnData, regions: List[Tuple[int, int]] | List[Tuple[int, int, int, int]] = None, n: int = 4, size: int = 2000, seed: int | None = None, use_scale: bool = True, absolute: bool = False, weight_func: Callable[[anndata.AnnData], float] | None = lambda adata: ...)
Select regions to use for segmentation quality control purposes.
Note
All coordinates are in terms of “real” coordinates (i.e. the coordinates in adata.obs_names and adata.var_names) so that slicing the AnnData retains the regions correctly.
- Parameters:
- adata
Input AnnData
- regions
List of tuples in the form (xmin, ymin) or (xmin, xmax, ymin, ymax). If the former, the size argument is used to compute the bounding box.
- n
Number of regions to select if regions is not provided.
- size
Width and height, in pixels, of each randomly selected region.
- seed
Random seed.
- use_scale
Whether or not the provided regions are in scale units. This option only has effect when regions are provided. False means the provided coordinates are in terms of pixels.
- absolute
Whether or not the provided regions are in terms of absolute X and Y coordinates. This option only has effect when regions are provided. False means the provided coordinates are relative with respect to the coordinates in the provided adata.
- weight_func
Weighting function when regions is not provided. The probability of selecting each size x size region will be weighted by this function, which accepts a single AnnData (the region) as its argument, and returns a single float weight, such that higher weights mean higher probability. By default, the log1p of the sum of the counts in the X layer is used. Set to None to weight each region equally.
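The weighted sampling described for weight_func can be illustrated on a plain count matrix. The sketch below assumes non-overlapping size x size tiles and samples them without replacement, with probability proportional to log1p of each tile's total count. sample_regions is a hypothetical helper operating on a NumPy array, not on an AnnData, and spateo's own candidate regions may be chosen differently.

```python
import numpy as np

def sample_regions(X, n=4, size=2000, seed=None,
                   weight_func=lambda region: np.log1p(region.sum())):
    """Sample up to n tiles as (xmin, xmax, ymin, ymax), weighted by weight_func."""
    rng = np.random.default_rng(seed)
    corners = [
        (x, y)
        for x in range(0, X.shape[0] - size + 1, size)
        for y in range(0, X.shape[1] - size + 1, size)
    ]
    if weight_func is None:
        weights = np.ones(len(corners))  # every tile equally likely
    else:
        weights = np.array(
            [weight_func(X[x:x + size, y:y + size]) for x, y in corners],
            dtype=float,
        )
    if weights.sum() == 0:
        weights = np.ones(len(corners))  # fall back to uniform if all tiles are empty
    p = weights / weights.sum()
    # Sampling without replacement cannot return more tiles than have p > 0.
    n = min(n, int(np.count_nonzero(p)))
    chosen = rng.choice(len(corners), size=n, replace=False, p=p)
    return [(x, x + size, y, y + size) for x, y in (corners[i] for i in chosen)]

counts = np.zeros((4, 4))
counts[:2, :2] = 5  # only the first tile contains any counts
weighted = sample_regions(counts, n=1, size=2, seed=0)
uniform = sample_regions(counts, n=4, size=2, seed=0, weight_func=None)
```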