spateo.segmentation
===================

.. py:module:: spateo.segmentation

.. autoapi-nested-parse::

   Spatiotemporal modeling of spatial transcriptomics


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/spateo/segmentation/align/index
   /autoapi/spateo/segmentation/benchmark/index
   /autoapi/spateo/segmentation/bp/index
   /autoapi/spateo/segmentation/density/index
   /autoapi/spateo/segmentation/em/index
   /autoapi/spateo/segmentation/external/index
   /autoapi/spateo/segmentation/icell/index
   /autoapi/spateo/segmentation/label/index
   /autoapi/spateo/segmentation/moran/index
   /autoapi/spateo/segmentation/qc/index
   /autoapi/spateo/segmentation/simulation/index
   /autoapi/spateo/segmentation/simulation_evaluation/index
   /autoapi/spateo/segmentation/utils/index
   /autoapi/spateo/segmentation/vi/index


Functions
---------

.. autoapisummary::

   spateo.segmentation.refine_alignment
   spateo.segmentation.compare
   spateo.segmentation.merge_densities
   spateo.segmentation.segment_densities
   spateo.segmentation.cellpose
   spateo.segmentation.deepcell
   spateo.segmentation.stardist
   spateo.segmentation.mask_cells_from_stain
   spateo.segmentation.mask_nuclei_from_stain
   spateo.segmentation.score_and_mask_pixels
   spateo.segmentation.augment_labels
   spateo.segmentation.expand_labels
   spateo.segmentation.find_peaks
   spateo.segmentation.find_peaks_from_mask
   spateo.segmentation.find_peaks_with_erosion
   spateo.segmentation.label_connected_components
   spateo.segmentation.watershed
   spateo.segmentation.generate_random_labels
   spateo.segmentation.generate_random_labels_like
   spateo.segmentation.select_qc_regions


Package Contents
----------------

.. py:function:: refine_alignment(adata: anndata.AnnData, stain_layer: str = SKM.STAIN_LAYER_KEY, rna_layer: str = SKM.UNSPLICED_LAYER_KEY, mode: typing_extensions.Literal[rigid, non-rigid] = 'rigid', downscale: float = 1, k: int = 5, n_epochs: int = 100, transform_layers: Optional[Union[str, List[str]]] = None, **kwargs)

   Refine the alignment between the staining image and RNA coordinates.

   There are often small misalignments between the staining image and RNA, which result in incorrect aggregation of pixels into cells based on staining. This function attempts to refine these alignments based on the staining and (unspliced) RNA masks.

   :param adata: Input Anndata
   :param stain_layer: Layer containing staining image.
   :param rna_layer: Layer containing (unspliced) RNA.
   :param mode: The alignment mode. Two modes are supported:

      * rigid: A global alignment method that finds a rigid (affine) transformation matrix
      * non-rigid: A semi-local alignment method that finds a thin-plate-spline with a mesh of certain size. By default, each cell in the mesh consists of 1000 x 1000 pixels. This value can be modified by providing a `binsize` argument to this function (specifically, as part of additional `**kwargs`).
   :param downscale: Downscale matrices by this factor to reduce memory and runtime.
   :param k: Kernel size for Gaussian blur of the RNA matrix.
   :param n_epochs: Number of epochs to run optimization
   :param transform_layers: Layers to transform and overwrite inplace.
   :param \*\*kwargs: Additional keyword arguments to pass to the Pytorch module.


.. py:function:: compare(adata: anndata.AnnData, true_layer: str, pred_layer: str, data_layer: str = SKM.X_LAYER, umi_pixels_only: bool = True, random_background: bool = True, ap_taus: Tuple[int, Ellipsis] = tuple(np.arange(0.5, 1, 0.05)), seed: Optional[int] = None) -> pandas.DataFrame

   Compute segmentation statistics.

   :param adata: Input Anndata
   :param true_layer: Layer containing true labels
   :param pred_layer: Layer containing predicted labels
   :param data_layer: Layer containing UMIs
   :param umi_pixels_only: Whether or not to only consider pixels that have at least one UMI captured (as determined by `data_layer`).
   :param random_background: Simulate random background by randomly permuting the `pred_layer` labels and computing the same statistics against `true_layer`. The returned DataFrame will have an additional column for these statistics.
   :param ap_taus: Tau thresholds to calculate average precision. Defaults to 0.05 increments starting at 0.5 and ending at (and including) 0.95.
   :param seed: Random seed.

   :returns: Pandas DataFrame containing classification and labeling statistics


.. py:function:: merge_densities(adata: anndata.AnnData, layer: str, mapping: Optional[Dict[int, int]] = None, out_layer: Optional[str] = None)

   Merge density bins either using an explicit mapping or in a semi-supervised way.

   :param adata: Input Anndata
   :param layer: Layer that was used to generate density bins. Defaults to using `{layer}_bins`. If not present, will be taken as a literal.
   :param mapping: Mapping to use to transform bins
   :param out_layer: Layer to store results. Defaults to same layer as input.


.. py:function:: segment_densities(adata: anndata.AnnData, layer: str, binsize: int, k: int, dk: int, distance_threshold: Optional[float] = None, background: Optional[Union[Tuple[int, int], typing_extensions.Literal[False]]] = None, out_layer: Optional[str] = None)

   Segment into regions by UMI density.

   The tissue is segmented into UMI density bins according to the following procedure.

   1. The UMI matrix is binned according to `binsize` (recommended >= 20).
   2. The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size `k`. Note that `k` is in terms of bins, not pixels.
   3. The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold `distance_threshold`, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) the same shape as the binned matrix.
   4. Each density bin is dilated with kernel size `dk`, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and "choppy" borders in subsequent steps.
   5. If `background` is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing an `(x, y)` tuple instead. This feature can be turned off by providing `False`.
   6. The density bin matrix is resized to be the same size as the original UMI matrix.

   :param adata: Input Anndata
   :param layer: Layer that contains UMI counts to segment based on.
   :param binsize: Size of bins to use. For density segmentation, pixels are binned to reduce runtime. 20 is usually a good starting point. Note that this value is relative to the original binsize used to read in the AnnData.
   :param k: Kernel size for Gaussian blur, in bins
   :param dk: Kernel size for final dilation, in bins
   :param distance_threshold: Distance threshold for the Ward linkage such that clusters will not be merged if they have greater than this distance.
   :param background: Pixel that should be categorized as background. By default, the bin that is most often assigned to the outermost pixels is categorized as background. Set to False to turn off background detection.
   :param out_layer: Layer to put resulting bins. Defaults to `{layer}_bins`.


.. py:function:: cellpose(adata: anndata.AnnData, model: Union[typing_extensions.Literal[cyto, nuclei], cellpose.models.CellposeModel] = 'nuclei', diameter: Optional[int] = None, normalize: bool = True, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None, **kwargs)

   Run Cellpose to label cells from a staining image.

   :param adata: Input Anndata
   :param model: Cellpose model to use. Can be one of the two pretrained models:

      * cyto: Labeled cytoplasm
      * nuclei: Labeled nuclei

      Or any generic CellposeModel model.
   :param diameter: Expected diameter of each segmentation (cells for `model="cyto"`, nuclei for `model="nuclei"`). Can be `None` to run automatic detection.
   :param normalize: Whether or not to percentile-normalize the image. This is an argument to :func:`Cellpose.eval`.
   :param equalize: Controls the `clip_limit` argument to the :func:`clahe` function. Set this value to a non-positive value to turn off equalization.
   :param layer: Layer that contains staining image. Defaults to `stain`.
   :param out_layer: Layer to put resulting labels. Defaults to `{layer}_labels`.
   :param \*\*kwargs: Additional keyword arguments to :func:`Cellpose.eval` function.

   :returns: Numpy array containing cell labels.


.. py:function:: deepcell(adata: anndata.AnnData, model: Optional[deepcell.applications.Application] = None, equalize: float = 2.0, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None, **kwargs)

   Run DeepCell to label cells from a staining image.

   :param adata: Input Anndata
   :param model: DeepCell model to use
   :param equalize: Controls the `clip_limit` argument to the :func:`clahe` function. Set this value to a non-positive value to turn off equalization.
   :param layer: Layer that contains staining image. Defaults to `stain`.
   :param out_layer: Layer to put resulting labels. Defaults to `{layer}_labels`.
   :param \*\*kwargs: Additional keyword arguments to :func:`Application.predict` function.

   :returns: Numpy array containing cell labels.


.. py:function:: stardist(adata: anndata.AnnData, model: Union[typing_extensions.Literal[2D_versatile_fluo, 2D_versatile_he, 2D_paper_dsb2018], stardist.models.StarDist2D] = '2D_versatile_fluo', tilesize: int = 2000, min_overlap: Optional[int] = None, context: Optional[int] = None, normalizer: Optional[csbdeep.data.Normalizer] = PercentileNormalizer(), equalize: float = 2.0, sanitize: bool = True, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None, **kwargs)

   Run StarDist to label cells from a staining image.

   .. note::
      When using `min_overlap`, the crucial assumption is that all predicted object instances are smaller than the provided `min_overlap`. Also, it must hold that: min_overlap + 2*context < tilesize.

      https://github.com/stardist/stardist/blob/858cae17cf17f979122000ad2294a156d0547135/stardist/models/base.py#L776

   :param adata: Input Anndata
   :param img: Image as a Numpy array.
   :param model: Stardist model to use. Can be one of the three pretrained models from StarDist2D:

      1. '2D_versatile_fluo': 'Versatile (fluorescent nuclei)'
      2. '2D_versatile_he': 'Versatile (H&E nuclei)'
      3. '2D_paper_dsb2018': 'DSB 2018 (from StarDist 2D paper)'

      Or any generic Stardist2D model.
   :param tilesize: Run prediction separately on tiles of size `tilesize` x `tilesize` and merge them afterwards. Useful to avoid out-of-memory errors. Can be set to <= 0 to disable tiling. When `min_overlap` is also provided, this becomes the `block_size` parameter to :func:`StarDist2D.predict_instances_big`.
   :param min_overlap: Amount of guaranteed overlap between tiles.
   :param context: Amount of image context on all sides of a tile, which is discarded. Only used when `min_overlap` is not None. By default, an automatic estimate is used.
   :param normalizer: Normalizer to use to perform normalization prior to prediction. By default, percentile-based normalization is performed. `None` may be provided to disable normalization.
   :param equalize: Controls the `clip_limit` argument to the :func:`clahe` function. Set this value to a non-positive value to turn off equalization.
   :param sanitize: Whether to sanitize disconnected labels.
   :param layer: Layer that contains staining image. Defaults to `stain`.
   :param out_layer: Layer to put resulting labels. Defaults to `{layer}_labels`.
   :param \*\*kwargs: Additional keyword arguments to pass to :func:`StarDist2D.predict_instances`.


.. py:function:: mask_cells_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, mk: int = 7, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None)

   Create a boolean mask indicating cells from stained image.

   :param adata: Input Anndata
   :param otsu_classes: Number of classes to assign pixels to for cell detection
   :param otsu_index: Which threshold index should be used for classifying cells. All pixel intensities >= the value at this index will be classified as cell.
   :param mk: Size of the kernel used for morphological close and open operations applied at the very end.
   :param layer: Layer that contains staining image.
   :param out_layer: Layer to put resulting nuclei mask. Defaults to `{layer}_mask`.


.. py:function:: mask_nuclei_from_stain(adata: anndata.AnnData, otsu_classes: int = 3, otsu_index: int = 0, local_k: int = 55, offset: int = 5, mk: int = 5, layer: str = SKM.STAIN_LAYER_KEY, out_layer: Optional[str] = None)

   Create a boolean mask indicating nuclei from stained nuclei image, and save this mask in the AnnData as an additional layer.

   :param adata: Input Anndata
   :param otsu_classes: Number of classes to assign pixels to for background detection.
   :param otsu_index: Which threshold index should be used for background. All pixel intensities less than the value at this index will be classified as background.
   :param local_k: The size of the local neighborhood of each pixel to use for local (adaptive) thresholding to identify the foreground (i.e. nuclei).
   :param offset: Offset to local thresholding values such that values > 0 lead to more "strict" thresholding, and therefore may be helpful in distinguishing nuclei in dense regions.
   :param mk: Size of the kernel used for morphological close and open operations applied at the very end.
   :param layer: Layer containing nuclei staining
   :param out_layer: Layer to put resulting nuclei mask. Defaults to `{layer}_mask`.


.. py:function:: score_and_mask_pixels(adata: anndata.AnnData, layer: str, k: int, method: typing_extensions.Literal[gauss, moran, EM, EM+gauss, EM+BP, VI+gauss, VI+BP], moran_kwargs: Optional[dict] = None, em_kwargs: Optional[dict] = None, vi_kwargs: Optional[dict] = None, bp_kwargs: Optional[dict] = None, threshold: Optional[float] = None, use_knee: Optional[bool] = False, mk: Optional[int] = None, bins_layer: Optional[Union[typing_extensions.Literal[False], str]] = None, certain_layer: Optional[str] = None, scores_layer: Optional[str] = None, mask_layer: Optional[str] = None)

   Score and mask pixels by how likely each is to be occupied.

   :param adata: Input Anndata
   :param layer: Layer that contains UMI counts to use
   :param k: Kernel size for convolution
   :param method: Method to use to obtain per-pixel scores. Valid methods are:

      * gauss: Gaussian blur
      * moran: Moran's I based method
      * EM: EM algorithm to estimate cell and background expression parameters.
      * EM+gauss: Negative binomial EM algorithm followed by Gaussian blur.
      * EM+BP: EM algorithm followed by belief propagation to estimate the marginal probabilities of cell and background.
      * VI+gauss: Negative binomial VI algorithm followed by Gaussian blur. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing `zero_inflated=True`.
      * VI+BP: VI algorithm followed by belief propagation. Note that VI also supports the zero-inflated negative binomial (ZINB) by providing `zero_inflated=True`.
   :param moran_kwargs: Keyword arguments to the :func:`moran.run_moran` function.
   :param em_kwargs: Keyword arguments to the :func:`em.run_em` function.
   :param bp_kwargs: Keyword arguments to the :func:`bp.run_bp` function.
   :param threshold: Score cutoff, above which pixels are considered occupied. By default, a threshold is automatically determined by using Otsu thresholding.
   :param use_knee: Whether to use the knee point as the threshold. Defaults to False. If True, `threshold` is ignored.
   :param mk: Kernel size of morphological open and close operations to reduce noise in the mask. Defaults to `k`+2 if EM or VI is run. Otherwise, defaults to `k`-2.
   :param bins_layer: Layer containing assignment of pixels into bins. Each bin is considered separately. Defaults to `{layer}_bins`. This can be set to `False` to disable binning, even if the layer exists.
   :param certain_layer: Layer containing a boolean mask indicating which pixels are certain to be occupied. If the array is not a boolean array, it is cast to boolean.
   :param scores_layer: Layer to save pixel scores before thresholding. Defaults to `{layer}_scores`.
   :param mask_layer: Layer to save the final mask. Defaults to `{layer}_mask`.


.. py:function:: augment_labels(adata: anndata.AnnData, source_layer: str, target_layer: str, out_layer: Optional[str] = None)

   Augment the labels in one label array using the labels in another.

   :param adata: Input Anndata
   :param source_layer: Layer containing source labels to (possibly) take labels from.
   :param target_layer: Layer containing target labels to augment.
   :param out_layer: Layer to save results. Defaults to `{target_layer}_augmented`.


.. py:function:: expand_labels(adata: anndata.AnnData, layer: str, distance: int = 5, max_area: int = 400, mask_layer: Optional[str] = None, out_layer: Optional[str] = None)

   Expand labels up to a certain distance.

   :param adata: Input Anndata
   :param layer: Layer from which the labels were derived. Then, `{layer}_labels` is used as the labels. If not present, it is taken as a literal.
   :param distance: Distance to expand.
      Internally, this is used as the number of iterations of distance 1 dilations.
   :param max_area: Maximum area of each label.
   :param mask_layer: Layer containing mask to restrict expansion to within.
   :param out_layer: Layer to save results. By default, uses `{layer}_labels_expanded`.


.. py:function:: find_peaks(adata: anndata.AnnData, layer: str, k: int, min_distance: int, mask_layer: Optional[str] = None, out_layer: Optional[str] = None)

   Find peaks from an array.

   :param adata: Input AnnData
   :param layer: Layer to use as values to find peaks from.
   :param k: Apply a Gaussian blur with this kernel size prior to peak detection.
   :param min_distance: Minimum distance, in pixels, between peaks.
   :param mask_layer: Find peaks only in regions specified by the mask.
   :param out_layer: Layer to save identified peaks as markers. By default, uses `{layer}_markers`.


.. py:function:: find_peaks_from_mask(adata: anndata.AnnData, layer: str, min_distance: int, distances_layer: Optional[str] = None, markers_layer: Optional[str] = None)

   Find peaks from a boolean mask. Used to obtain Watershed markers.

   :param adata: Input AnnData
   :param layer: Layer containing boolean mask. This will default to `{layer}_mask`. If not present in the provided AnnData, this argument is used as a literal.
   :param min_distance: Minimum distance, in pixels, between peaks.
   :param distances_layer: Layer to save distance from each pixel to the nearest zero (False) pixel (a.k.a. distance transform). By default, uses `{layer}_distances`.
   :param markers_layer: Layer to save identified peaks as markers. By default, uses `{layer}_markers`.


.. py:function:: find_peaks_with_erosion(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, square: bool = False, min_area: int = 80, n_iter: int = -1, float_k: int = 5, float_threshold: Optional[float] = None, out_layer: Optional[str] = None)

   Find peaks for use in Watershed via iterative erosion.

   :param adata: Input Anndata
   :param layer: Layer that was used to create scores or masks. If `{layer}_scores` is present, that is used. Otherwise if `{layer}_mask` is present, that is used. Otherwise, the layer is taken as a literal.
   :param k: Erosion kernel size
   :param square: Whether to use a square kernel
   :param min_area: Minimum area
   :param n_iter: Number of erosions to perform.
   :param float_k: Morphological close and open kernel size when `X` is a float array.
   :param float_threshold: Threshold to use to determine connected components when `X` is a float array. By default, a threshold is automatically determined by using the Otsu method.
   :param out_layer: Layer to save results. By default, this will be `{layer}_markers`.


.. py:function:: label_connected_components(adata: anndata.AnnData, layer: str, seed_layer: Optional[str] = None, area_threshold: int = 500, k: int = 3, min_area: int = 100, n_iter: int = -1, distance: int = 8, max_area: int = 400, out_layer: Optional[str] = None)

   Label connected components while splitting components that are too large.

   :param adata: Input Anndata
   :param layer: Data layer that was used to generate the mask. First, will look for `{layer}_mask`. Otherwise, this will be used as a literal.
   :param seed_layer: Layer containing seed labels. These are labels that should be used whenever possible in labeling connected components.
   :param area_threshold: Connected components with area greater than this value will be split into smaller portions by first eroding and then expanding.
   :param k: Kernel size for erosion.
   :param min_area: Don't erode labels smaller than this area.
   :param n_iter: Number of erosion operations. -1 means continue eroding until every label is less than `min_area`.
   :param distance: Distance to expand eroded labels.
   :param max_area: Maximum area when expanding labels.
   :param out_layer: Layer to save results. Defaults to `{layer}_labels`.

   :returns: New label array


.. py:function:: watershed(adata: anndata.AnnData, layer: str = SKM.STAIN_LAYER_KEY, k: int = 3, mask_layer: Optional[str] = None, markers_layer: Optional[str] = None, out_layer: Optional[str] = None)

   Assign individual nuclei/cells using the Watershed algorithm.

   :param adata: Input AnnData
   :param layer: Original data layer from which the segmentation is derived.
   :param k: Size of the kernel to use for Gaussian blur.
   :param mask_layer: Layer containing mask. This will default to `{layer}_mask`.
   :param markers_layer: Layer containing Watershed markers. This will default to `{layer}_markers`. May either be a boolean or integer array. If this is a boolean array, the markers are identified by calling `cv2.connectedComponents`.
   :param out_layer: Layer to save results. Defaults to `{layer}_labels`.


.. py:function:: generate_random_labels(adata: anndata.AnnData, areas: List[int], seed: Optional[int] = None, out_layer: str = 'random_labels')

   Create random labels, usually for benchmarking and QC purposes.

   :param adata: Input Anndata
   :param areas: List of desired areas.
   :param seed: Random seed.
   :param out_layer: Layer to save results.


.. py:function:: generate_random_labels_like(adata: anndata.AnnData, layer: str, seed: Optional[int] = None, out_layer: str = 'random_labels')

   Create random labels, using another layer as a template.

   :param adata: Input Anndata
   :param layer: Layer containing template labels
   :param seed: Random seed.
   :param out_layer: Layer to save results.


.. py:function:: select_qc_regions(adata: anndata.AnnData, regions: Union[List[Tuple[int, int]], List[Tuple[int, int, int, int]]] = None, n: int = 4, size: int = 2000, seed: Optional[int] = None, use_scale: bool = True, absolute: bool = False, weight_func: Optional[Callable[[anndata.AnnData], float]] = lambda adata: np.log1p(adata.X.sum()))

   Select regions to use for segmentation quality control purposes.

   .. note::
      All coordinates are in terms of "real" coordinates (i.e. the coordinates in `adata.obs_names` and `adata.var_names`) so that slicing the AnnData retains the regions correctly.

   :param adata: Input AnnData
   :param regions: List of tuples in the form `(xmin, ymin)` or `(xmin, xmax, ymin, ymax)`. If the former, the `size` argument is used to compute the bounding box.
   :param n: Number of regions to select if `regions` is not provided.
   :param size: Width and height, in pixels, of each randomly selected region.
   :param seed: Random seed.
   :param use_scale: Whether or not the provided `regions` are in scale units. This option only has effect when `regions` are provided. `False` means the provided coordinates are in terms of pixels.
   :param absolute: Whether or not the provided `regions` are in terms of absolute X and Y coordinates. This option only has effect when `regions` are provided. `False` means the provided coordinates are relative with respect to the coordinates in the provided `adata`.
   :param weight_func: Weighting function when `regions` is not provided. The probability of selecting each `size x size` region will be weighted by this function, which accepts a single AnnData (the region) as its argument, and returns a single float weight, such that higher weights mean higher probability. By default, the log1p of the sum of the counts in the `X` layer is used. Set to `None` to weight each region equally.
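
The weighting described for `weight_func` can be illustrated with a small standalone sketch. It is not the Spateo implementation: the helper names (`tile_regions`, `region_total`, `sample_regions`), the regular non-overlapping tiling, and sampling without replacement are all assumptions made for illustration. The hypothetical `weight_func` here receives the summed counts of a region rather than an AnnData, with `math.log1p` standing in for the documented default.

```python
import math
import random
from typing import Callable, List, Optional, Sequence, Tuple

# (xmin, xmax, ymin, ymax), mirroring the 4-tuple form of `regions`.
Region = Tuple[int, int, int, int]


def tile_regions(width: int, height: int, size: int) -> List[Region]:
    """Enumerate non-overlapping size x size tiles that fit inside the grid.

    Tiling on a regular grid is an assumption of this sketch.
    """
    return [
        (x, x + size, y, y + size)
        for x in range(0, width - size + 1, size)
        for y in range(0, height - size + 1, size)
    ]


def region_total(counts: Sequence[Sequence[float]], region: Region) -> float:
    """Sum the counts inside one region; `counts` is indexed as counts[x][y]."""
    xmin, xmax, ymin, ymax = region
    return sum(sum(row[ymin:ymax]) for row in counts[xmin:xmax])


def sample_regions(
    counts: Sequence[Sequence[float]],
    size: int,
    n: int,
    weight_func: Optional[Callable[[float], float]] = math.log1p,
    seed: Optional[int] = None,
) -> List[Region]:
    """Pick up to `n` regions with probability proportional to their weight."""
    rng = random.Random(seed)
    candidates = tile_regions(len(counts), len(counts[0]), size)
    if weight_func is None:
        # No weighting function: every region is equally likely.
        weights = [1.0] * len(candidates)
    else:
        weights = [weight_func(region_total(counts, r)) for r in candidates]

    chosen: List[Region] = []
    # Draw without replacement (an assumption), stopping early once no
    # candidate with positive weight remains.
    while len(chosen) < n and sum(weights) > 0:
        i = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        chosen.append(candidates.pop(i))
        weights.pop(i)
    return chosen
```

With a grid in which only one tile contains counts, every other tile has weight `log1p(0) = 0`, so only the non-empty tile can be drawn. Passing `weight_func=None` makes all tiles equally likely.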