spateo.segmentation.density

Functions to segment regions of a slice by UMI density.

Module Contents

Functions

_create_spatial_adjacency(→ scipy.sparse.csr_matrix)

Create a sparse adjacency matrix for a 2D grid graph of specified shape.

_schc(→ numpy.ndarray)

Spatially-constrained hierarchical clustering.

_segment_densities(→ numpy.ndarray)

Segment a matrix containing UMI counts into regions by UMI density.

segment_densities(adata, layer, binsize, k, dk[, ...])

Segment into regions by UMI density.

merge_densities(adata, layer[, mapping, out_layer])

Merge density bins either using an explicit mapping or in a semi-supervised way.

spateo.segmentation.density._create_spatial_adjacency(shape: Tuple[int, int]) → scipy.sparse.csr_matrix

Create a sparse adjacency matrix for a 2D grid graph of specified shape. https://stackoverflow.com/a/16342639

Parameters:
shape

Shape of grid

Returns:

A sparse adjacency matrix
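As an illustration of what such a matrix looks like, here is a minimal sketch that builds a 4-neighbor grid adjacency with NumPy and SciPy. The name grid_adjacency is hypothetical, and this is not necessarily how _create_spatial_adjacency is implemented; it only shows the kind of output described above, assuming nodes are numbered in row-major order.

```python
import numpy as np
from scipy import sparse


def grid_adjacency(shape):
    """Return a symmetric 4-neighbor adjacency matrix for a 2D grid.

    Hypothetical helper for illustration; nodes are numbered row-major.
    """
    n_rows, n_cols = shape
    n = n_rows * n_cols
    # Node index of each grid cell in row-major order.
    idx = np.arange(n).reshape(n_rows, n_cols)
    # Edges between horizontally adjacent cells.
    right = np.stack([idx[:, :-1].ravel(), idx[:, 1:].ravel()])
    # Edges between vertically adjacent cells.
    down = np.stack([idx[:-1, :].ravel(), idx[1:, :].ravel()])
    rows, cols = np.concatenate([right, down], axis=1)
    data = np.ones(rows.size, dtype=np.int8)
    adj = sparse.coo_matrix((data, (rows, cols)), shape=(n, n))
    # Mirror the edges so that the matrix is symmetric.
    return (adj + adj.T).tocsr()


adj = grid_adjacency((2, 3))
```

For a grid of shape (2, 3) this yields a 6 x 6 matrix in which each node is connected only to the cells directly above, below, left, and right of it.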

spateo.segmentation.density._schc(X: numpy.ndarray, distance_threshold: float | None = None) → numpy.ndarray

Spatially-constrained hierarchical clustering.

Perform hierarchical clustering with Ward linkage on an array containing UMI counts per pixel. Spatial constraints are imposed by limiting the neighbors of each node to its four immediate pixel neighbors.

This function runs in two steps. First, it computes a Ward linkage tree by calling sklearn.cluster.ward_tree() with return_distance=True, which yields the distances between clusters. Then, if distance_threshold is not provided, a dynamic threshold is calculated by finding the inflection point (knee) of the distance (x) versus number of clusters (y) curve using the top 1000 distances, on the assumption that in the vast majority of cases there will be fewer than 1000 density clusters.

Parameters:
X

UMI counts per pixel

distance_threshold

Distance threshold for the Ward linkage. Clusters are not merged if the distance between them exceeds this value.

Returns:

Clustering result as a NumPy array of the same shape as X, where clusters are indicated by integers.
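The dynamic threshold described above can be sketched as follows. This is an assumption-laden illustration, not the library's implementation: the helper name knee_threshold is hypothetical, and the knee is located with a simple "farthest from the chord" heuristic, which may differ from the knee-finding method spateo actually uses. It assumes at least two distinct merge distances.

```python
import numpy as np


def knee_threshold(distances, top=1000):
    """Pick a distance threshold at the knee of the clusters-vs-distance curve.

    Hypothetical helper; uses a max-distance-from-chord heuristic.
    """
    # Keep the largest `top` merge distances, sorted ascending.
    x = np.sort(np.asarray(distances, dtype=float))[-top:]
    # Cutting the tree at x[i] leaves one cluster plus one per larger merge.
    y = np.arange(x.size, 0, -1, dtype=float)
    # Rescale both axes to [0, 1] so neither dominates.
    xn = (x - x[0]) / (x[-1] - x[0])
    yn = (y - y[-1]) / (y[0] - y[-1])
    # The rescaled curve runs from (0, 1) to (1, 0); take the point
    # that lies farthest below the chord xn + yn = 1.
    return x[np.argmax(1.0 - xn - yn)]


# Five small merges followed by three much larger ones.
threshold = knee_threshold([10.0, 1.2, 30.0, 1.0, 1.4, 20.0, 1.1, 1.3])
```

In this toy input the small distances correspond to merges within a density region and the large ones to merges across regions, so the selected threshold sits at the last small distance.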

spateo.segmentation.density._segment_densities(X: scipy.sparse.spmatrix | numpy.ndarray, k: int, dk: int, distance_threshold: float | None = None) → numpy.ndarray

Segment a matrix containing UMI counts into regions by UMI density.

Parameters:
X

UMI counts per pixel

k

Kernel size for Gaussian blur

dk

Kernel size for final dilation

distance_threshold

Distance threshold for the Ward linkage. Clusters are not merged if the distance between them exceeds this value.

Returns:

Clustering result as a NumPy array of the same shape as X, where clusters are indicated by positive integers.

spateo.segmentation.density.segment_densities(adata: anndata.AnnData, layer: str, binsize: int, k: int, dk: int, distance_threshold: float | None = None, background: Tuple[int, int] | typing_extensions.Literal[False] | None = None, out_layer: str | None = None)

Segment into regions by UMI density.

The tissue is segmented into UMI density bins according to the following procedure.

  1. The UMI matrix is binned according to binsize (recommended >= 20).

  2. The binned UMI matrix (from the previous step) is Gaussian blurred with kernel size k. Note that k is in terms of bins, not pixels.

  3. The elements of the blurred, binned UMI matrix are hierarchically clustered with Ward linkage, distance threshold distance_threshold, and spatial constraints (immediate neighbors). This yields pixel density bins (a.k.a. labels) the same shape as the binned matrix.

  4. Each density bin is dilated with kernel size dk, starting from the bin with the smallest mean UMI (a.k.a. least dense) and going to the bin with the largest mean UMI (a.k.a. most dense). This is done in an effort to mitigate RNA diffusion and "choppy" borders in subsequent steps.

  5. If background is not provided, the density bin that is most common in the perimeter of the matrix is selected to be background, and thus its label is changed to take a value of 0. A pixel can be manually selected to be background by providing an (x, y) tuple instead. This feature can be turned off by providing False.

  6. The density bin matrix is resized to be the same size as the original UMI matrix.
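Two of the steps in this procedure, binning the UMI matrix and picking the background label from the perimeter, can be sketched with plain NumPy. The helper names bin_counts and perimeter_background are hypothetical, and the handling of edge pixels that do not fill a whole bin is an assumption; the library's own behavior may differ.

```python
import numpy as np


def bin_counts(X, binsize):
    """Sum counts over non-overlapping binsize x binsize blocks."""
    rows, cols = X.shape
    # Assumption: trailing pixels that do not fill a whole bin are dropped.
    r, c = rows // binsize, cols // binsize
    trimmed = X[: r * binsize, : c * binsize]
    return trimmed.reshape(r, binsize, c, binsize).sum(axis=(1, 3))


def perimeter_background(labels):
    """Set the label that is most common on the border of `labels` to 0."""
    border = np.concatenate(
        [labels[0, :], labels[-1, :], labels[1:-1, 0], labels[1:-1, -1]]
    )
    values, counts = np.unique(border, return_counts=True)
    background = values[np.argmax(counts)]
    out = labels.copy()
    out[labels == background] = 0
    return out


binned = bin_counts(np.arange(16).reshape(4, 4), binsize=2)
labels = np.array([[2, 2, 2], [2, 1, 2], [2, 2, 3]])
relabeled = perimeter_background(labels)
```

Here label 2 covers most of the border of labels, so it becomes the background (0) while labels 1 and 3 are left unchanged.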

Parameters:
adata

Input AnnData

layer

Layer that contains UMI counts to segment based on.

binsize

Size of bins to use. For density segmentation, pixels are binned to reduce runtime. 20 is usually a good starting point. Note that this value is relative to the original binsize used to read in the AnnData.

k

Kernel size for Gaussian blur, in bins

dk

Kernel size for final dilation, in bins

distance_threshold

Distance threshold for the Ward linkage. Clusters are not merged if the distance between them exceeds this value.

background

Pixel that should be categorized as background. By default, the bin most frequently assigned to the outermost pixels is categorized as background. Set to False to turn off background detection.

out_layer

Layer in which to store the resulting bins. Defaults to {layer}_bins.

spateo.segmentation.density.merge_densities(adata: anndata.AnnData, layer: str, mapping: Dict[int, int] | None = None, out_layer: str | None = None)

Merge density bins either using an explicit mapping or in a semi-supervised way.

Parameters:
adata

Input AnnData

layer

Layer that was used to generate density bins. The bins are read from {layer}_bins by default; if that layer is not present, layer is taken as a literal layer name.

mapping

Mapping to use to transform bins

out_layer

Layer to store results. Defaults to same layer as input.
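To illustrate what merging with an explicit mapping means, the sketch below relabels a small array of density bins with a dictionary. The helper apply_mapping is hypothetical and operates on a bare NumPy array rather than an AnnData layer; keeping labels that are absent from the mapping unchanged is an assumption, not documented behavior of merge_densities.

```python
import numpy as np


def apply_mapping(bins, mapping):
    """Relabel density bins; labels missing from `mapping` are kept as-is."""
    out = bins.copy()
    for old, new in mapping.items():
        # Mask against the original array so that chained entries such as
        # {1: 2, 2: 3} do not cascade into each other.
        out[bins == old] = new
    return out


bins = np.array([[0, 1], [2, 3]])
merged = apply_mapping(bins, {1: 2, 2: 3})
```

Bins 2 and 3 end up sharing label 3 in this example, which merges them into one region, while the old bin 1 takes over label 2.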