spateo.digitization#

Spatiotemporal modeling of spatial transcriptomics

Submodules#

Package Contents#

Functions#

get_borderline(→ spateo.digitization.utils.np.ndarray)

Identify the borderline at the interface of the source and target cell clusters.

grid_borderline(→ None)

Extend the borderline to the interior or exterior side, creating layer_num layers on each, and segment these layers into column_num columns.

identify_boundary(adata, cluster_key, source_id, target_id)

boundary_gridding(adata, boundary_line_img, ...[, ...])

gen_cluster_image(→ numpy.ndarray)

Generate matrix/image of spatial clusters with distinct labels/colors.

extract_cluster_contours(→ Tuple[Tuple, numpy.ndarray, ...)

Extract contour(s) for area(s) formed by buckets of the same spatial cluster.

set_domains(→ None)

Set the domains for each bucket based on spatial clusters. Use the low-resolution adata object for contour identification and the high-resolution adata object for domain assignment.

fill_grid_label(adata, spatial_key, seg_grid_img, ...)

format_boundary_line(boundary_line_img, pt_start, pt_end)

draw_seg_grid(boundary_line_img, bdl_seg_coor_x, ...)

euclidean_dist(point_x, point_y)

segment_bd_line(boundary_line_list, n_column)

extend_layer(boundary_line_img, boundary_line_list[, ...])

field_contour_line(ctr_seq, pnt_pos, min_pnt, max_pnt)

field_contours(contour, pnt_xy, pnt_Xy, pnt_xY, pnt_XY)

Identify four boundary lines according to given corner points.

add_ep_boundary(op_field, op_line, value)

Add equal weight boundary to op_field.

add_gp_boundary(op_field, op_line, value_s, value_e)

Add growing weight boundary to op_field.

effective_L2_error(op_field_i, op_field_j, field_mask)

Calculate effective L2 error between two fields.

calc_op_field(op_field, min_line, max_line, ...[, ...])

Calculate op_field (weights) for given boundary weights.

digitize(→ None)

Calculate the "heat" for a closed area of interest by solving a partial differential equation (PDE), the heat equation.

gridit(→ None)

Segment the area of interest into a specified number of layers/columns, according to the precomputed digitization heat value.

order_borderline(→ Tuple[List, numpy.ndarray])

Retrieve the borderline segment between the given start and end points, with the coordinates ordered.

Attributes#

spateo.digitization.get_borderline(adata: spateo.digitization.utils.AnnData, cluster_key: str, source_clusters: int, target_clusters: int, bin_size: int = 1, spatial_key: str = 'spatial', borderline_key: str = 'borderline', k_size: int = 8, min_area: int = 30, dilate_k_size: int = 3) spateo.digitization.utils.np.ndarray[source]#

Identify the borderline at the interface of the source and target cell clusters.

The borderline will be identified by first retrieving the outline/contour formed by the source clusters, which will then be cleaned up to retrieve the borderline by masking with the expanded contours formed by the target clusters.

Parameters:
adata

The adata object to be used for identifying the borderline.

cluster_key

The key name of the spatial cluster in adata.obs

source_clusters

The source cluster(s) that will interface with the target clusters.

target_clusters

The target cluster(s) that will interface with the source clusters.

bin_size

The size of the binning.

spatial_key

The key name of the spatial coordinates in adata.obsm

borderline_key

The key name in adata.obs that will be used to store the borderline.

k_size

Kernel size of the elliptic structuring element.

min_area

Minimal area threshold corresponding to the resulting contour(s).

dilate_k_size

Kernel size of the cv2.dilate function.

Returns:

borderline_img: The matrix that stores the image information of the borderline between the source and target cluster(s). Note that the adata object will also be updated with the borderline_key entry in adata.obs, which records whether each bucket is on the borderline.

Return type:

np.ndarray
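
The two-step idea described above (outline the source region, then keep only the outline pixels that fall inside the expanded target region) can be sketched on a toy grid. This is a minimal pure-Python illustration and not spateo's implementation, which, judging by the parameter descriptions, relies on OpenCV structuring elements and cv2.dilate. The helper names `outline`, `dilate`, and `toy_borderline` are hypothetical.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]

# 4-connected neighbourhood offsets (row, col).
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def outline(region: Set[Pixel]) -> Set[Pixel]:
    """Pixels of `region` that touch at least one pixel outside it."""
    return {
        (r, c)
        for (r, c) in region
        if any((r + dr, c + dc) not in region for dr, dc in NEIGHBOURS)
    }


def dilate(region: Set[Pixel], steps: int = 1) -> Set[Pixel]:
    """Grow `region` by `steps` pixels along the four axis directions."""
    grown = set(region)
    for _ in range(steps):
        grown |= {(r + dr, c + dc) for (r, c) in grown for dr, dc in NEIGHBOURS}
    return grown


def toy_borderline(source: Set[Pixel], target: Set[Pixel]) -> Set[Pixel]:
    """Hypothetical helper: source outline pixels inside the dilated target."""
    return outline(source) & dilate(target)


# Source cluster occupies columns 0-2, target cluster columns 3-5 (rows 0-3).
source = {(r, c) for r in range(4) for c in range(3)}
target = {(r, c) for r in range(4) for c in range(3, 6)}
borderline = toy_borderline(source, target)
print(sorted(borderline))
```

Only the source pixels in column 2, which face the target cluster, survive the masking step; the rest of the source outline is discarded.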

spateo.digitization.grid_borderline(adata: spateo.digitization.utils.AnnData, borderline_img: spateo.digitization.utils.np.ndarray, borderline_list: spateo.digitization.utils.List, layer_num: int = 3, column_num: int = 25, layer_width: int = 10, spatial_key: str = 'spatial', init: bool = False) None[source]#
Extend the borderline to the interior or exterior side, creating layer_num layers on each, and segment these layers into column_num columns.

Parameters:
adata

The adata object to be used for identifying the interior/exterior layers and columns.

borderline_img

The matrix that stores the image information of the borderline between the source and target cluster(s).

borderline_list

An ordered list of np.arrays holding the coordinates of the borderline.

layer_num

Number of layers to extend on either interior or exterior side.

column_num

Number of columns to segment for each layer.

layer_width

Layer/column boundary width. This only affects grid_label.

spatial_key

The key name in adata.obsm of the spatial coordinates. Defaults to “spatial”. Passed to the fill_grid_label function.

init

Whether to generate (and potentially overwrite) the layer_label_key and column_label_key in the fill_grid_label function.

Returns:

Nothing, but updates the adata object with the following keys in .obs:

  1. layer_label_key: this key points to layer labels.

  2. column_label_key: this key points to column labels.

Return type:

None
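
As a rough picture of what "segmenting into column_num columns" means, the sketch below assigns contiguous, nearly equal-sized column labels to the points of an already ordered borderline. It assumes columns are split by point count; spateo may segment differently (for example by arc length), and `split_into_columns` is a hypothetical helper, not part of the package.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def split_into_columns(ordered_points: List[Point], column_num: int) -> List[int]:
    """Hypothetical helper: give each ordered point a 1-based column label."""
    if column_num < 1:
        raise ValueError("column_num must be at least 1")
    n = len(ordered_points)
    # Integer arithmetic keeps labels contiguous and nearly equal in size.
    return [i * column_num // n + 1 for i in range(n)]


# Ten points along a straight, already ordered borderline.
points = [(0, x) for x in range(10)]
labels = split_into_columns(points, column_num=5)
print(labels)
```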

spateo.digitization.SKM[source]#
spateo.digitization.lm#
spateo.digitization.identify_boundary(adata: spateo.digitization.contour.AnnData, cluster_key, source_id, target_id, bin_size: int = 1, spatial_key: str = 'spatial', boundary_key: str = 'boundary_line', k_size=8, min_area=30, dilate_k_size: int = 3)[source]#
spateo.digitization.boundary_gridding(adata: spateo.digitization.contour.AnnData, boundary_line_img, boundary_line_list, n_layer=3, n_column=25, layer_width=10, spatial_key: str = 'spatial', init: bool = False)[source]#
spateo.digitization.gen_cluster_image(adata: anndata.AnnData, bin_size: int | None = None, spatial_key: str = 'spatial', cluster_key: str = 'scc', label_mapping_key: str = 'cluster_img_label', cmap: str = 'tab20', show: bool = True) numpy.ndarray[source]#

Generate matrix/image of spatial clusters with distinct labels/colors.

Parameters:
adata

The adata object used to create the matrix/image for clusters.

bin_size

The size of the binning.

spatial_key

The key name of the spatial coordinates in adata.obsm

cluster_key

The key name of the spatial cluster in adata.obs

label_mapping_key

The key name to store the label index values, mapped from the cluster names in adata.obs. Note that background is 0 so label_mapping_key starts from 1.

cmap

The colormap that will be used to draw colors for the resultant cluster image.

show

Whether to visualize the cluster image.

Returns:

cluster_label_image: A numpy array that stores the image of clusters, each with a distinct color. When show is True, plt.imshow(cluster_rgb_image) will be used to plot the clusters, each with a distinct label prepared from the designated cmap.

Return type:

numpy.ndarray
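
The label convention described above (background is 0, cluster labels start from 1) can be illustrated with a small rasterization sketch. It assumes integer (row, col) bucket coordinates and an alphabetical name-to-label order; both are assumptions made for illustration, and `toy_cluster_image` is a hypothetical helper rather than the library function.

```python
from typing import Dict, List, Sequence, Tuple


def toy_cluster_image(
    coords: Sequence[Tuple[int, int]], clusters: Sequence[str]
) -> Tuple[List[List[int]], Dict[str, int]]:
    """Rasterize buckets into a label image; 0 is reserved for background."""
    # Map cluster names to integer labels starting from 1 (assumed ordering).
    mapping = {name: i for i, name in enumerate(sorted(set(clusters)), start=1)}
    n_rows = max(r for r, _ in coords) + 1
    n_cols = max(c for _, c in coords) + 1
    image = [[0] * n_cols for _ in range(n_rows)]
    for (r, c), name in zip(coords, clusters):
        image[r][c] = mapping[name]
    return image, mapping


coords = [(0, 0), (0, 1), (1, 1), (2, 2)]
clusters = ["cortex", "cortex", "medulla", "medulla"]
image, mapping = toy_cluster_image(coords, clusters)
print(mapping)
for row in image:
    print(row)
```

Pixels without any bucket keep the background value 0, which is why the mapping has to start from 1.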

spateo.digitization.extract_cluster_contours(cluster_label_image: numpy.ndarray, cluster_labels: int | List, bin_size: int, k_size: float = 2, min_area: float = 9, close_kernel: int = cv2.MORPH_ELLIPSE, show: bool = True) Tuple[Tuple, numpy.ndarray, numpy.ndarray][source]#

Extract contour(s) for area(s) formed by buckets of the same spatial cluster.

Parameters:
cluster_label_image

The image that sets the pixels of the cluster(s) of interest to the foreground color (background is 0).

cluster_labels

The label value(s) of the cluster(s) of interest.

bin_size

The size of the binning.

k_size

Kernel size of the elliptic structuring element.

min_area

Minimal area threshold corresponding to the resulting contour(s).

close_kernel

The value to indicate the structuring element. By default, we use a circular structuring element.

show

Visualize the result.

Returns:

contours: The Tuple coordinates of the contours identified.

cluster_image_close: The resultant image of the area of interest with small areas removed.

cluster_image_contour: The resultant image of the contour, generated from cluster_image_close.

Return type:

Tuple[Tuple, numpy.ndarray, numpy.ndarray]
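
To make the role of min_area concrete, here is a sketch that groups foreground pixels into connected areas and drops those below the threshold. It measures area as a pixel count and keeps areas that reach the threshold; OpenCV-based contour area is computed differently, so treat this as an illustration of the filtering idea only. `connected_areas` and `drop_small_areas` are hypothetical helpers.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def connected_areas(pixels: Set[Pixel]) -> List[Set[Pixel]]:
    """Group pixels into 4-connected areas with an iterative flood fill."""
    remaining = set(pixels)
    areas = []
    while remaining:
        stack = [remaining.pop()]
        area = set(stack)
        while stack:
            r, c = stack.pop()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    area.add(nb)
                    stack.append(nb)
        areas.append(area)
    return areas


def drop_small_areas(pixels: Set[Pixel], min_area: int) -> Set[Pixel]:
    """Keep only areas whose pixel count reaches `min_area` (assumed rule)."""
    kept = [a for a in connected_areas(pixels) if len(a) >= min_area]
    return set().union(*kept)


# A 3x3 block (area 9) plus one isolated pixel (area 1).
block = {(r, c) for r in range(3) for c in range(3)}
speck = {(10, 10)}
cleaned = drop_small_areas(block | speck, min_area=9)
print(len(cleaned))
```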

spateo.digitization.set_domains(adata_high_res: anndata.AnnData, adata_low_res: anndata.AnnData | None = None, spatial_key: str = 'spatial', cluster_key: str = 'scc', domain_key_prefix: str = 'domain', bin_size_high: int | None = None, bin_size_low: int | None = None, k_size: float = 2, min_area: float = 9) None[source]#
Set the domains for each bucket based on spatial clusters. Use the low-resolution adata object for contour identification and the high-resolution adata object for domain assignment.

Parameters:
adata_high_res

The anndata object in high spatial resolution. Data with smaller binning (or single-cell segmentation) is better suited to defining fine-grained spatial domains.

adata_low_res

The anndata object in low spatial resolution. When using data with big binning, it can often produce better spatial domain clustering results with the scc method and thus domain/domain contour identification.

spatial_key

The key in .obsm of the spatial coordinate for each bucket. Should be same key in both adata_high_res and adata_low_res.

cluster_key

The key in .obs (adata_low_res) to the spatial cluster.

domain_key_prefix

The key prefix in .obs (in adata_high_res) that will be used to store the spatial domain for each bucket. The full key name will be set as: domain_key_prefix + “_” + cluster_key.

bin_size_high

The binning size of the adata_high_res object.

bin_size_low

The binning size of the adata_low_res object (only works when adata_low_res is provided).

k_size

Kernel size of the elliptic structuring element.

min_area

Minimal area threshold corresponding to the resulting contour(s).

Returns:

Nothing, but updates adata_high_res with the domain stored in .obs under domain_key_prefix + “_” + cluster_key.
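
The naming rule for the output column can be checked with a one-line sketch. The function `domain_key` is hypothetical and only mirrors the documented rule (prefix, underscore, cluster key); the second call uses made-up key names.

```python
def domain_key(domain_key_prefix: str = "domain", cluster_key: str = "scc") -> str:
    """Compose the .obs column name as documented: prefix + "_" + cluster_key."""
    return domain_key_prefix + "_" + cluster_key


print(domain_key())
print(domain_key("region", "leiden"))  # hypothetical, non-default key names
```

With the default arguments shown in the signature, the domain would therefore be expected under the `domain_scc` column.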

spateo.digitization.fill_grid_label(adata, spatial_key, seg_grid_img, bdl_seg_coor_x, bdl_seg_coor_y, curr_layer, curr_sign, layer_label_key: str = 'layer_label', column_label_key: str = 'column_label', init: bool = False)[source]#
spateo.digitization.format_boundary_line(boundary_line_img, pt_start, pt_end)[source]#
spateo.digitization.draw_seg_grid(boundary_line_img, bdl_seg_coor_x, bdl_seg_coor_y, gridline_width=1, mode='grid')[source]#
spateo.digitization.euclidean_dist(point_x: Tuple, point_y: Tuple)[source]#
spateo.digitization.segment_bd_line(boundary_line_list, n_column)[source]#
spateo.digitization.extend_layer(boundary_line_img, boundary_line_list, extend_width=10)[source]#
spateo.digitization.field_contour_line(ctr_seq, pnt_pos, min_pnt, max_pnt)[source]#
spateo.digitization.field_contours(contour, pnt_xy, pnt_Xy, pnt_xY, pnt_XY)[source]#

Identify four boundary lines according to given corner points.

Parameters:
contour _type_

_description_

pnt_xy _type_

_description_

pnt_Xy _type_

_description_

pnt_xY _type_

_description_

pnt_XY _type_

_description_

Returns:

_description_

Return type:

_type_

spateo.digitization.add_ep_boundary(op_field, op_line, value)[source]#

Add equal weight boundary to op_field.

Parameters:
op_field _type_

_description_

op_line _type_

_description_

value _type_

_description_

Returns:

_description_

Return type:

_type_

spateo.digitization.add_gp_boundary(op_field, op_line, value_s, value_e)[source]#

Add growing weight boundary to op_field.

Parameters:
op_field _type_

_description_

op_line _type_

_description_

value_s _type_

_description_

value_e _type_

_description_

Returns:

_description_

Return type:

_type_
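
Since the parameters of add_ep_boundary and add_gp_boundary are undocumented, the following sketch shows one plausible reading of "equal" versus "growing" weights: a constant value along a line versus a linear ramp from value_s to value_e. The data layout (a nested list as the field, an ordered list of (row, col) pixels as the line) and the linear ramp are assumptions; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple

Field = List[List[float]]
Line = Sequence[Tuple[int, int]]


def toy_equal_boundary(field: Field, line: Line, value: float) -> Field:
    """Return a copy of `field` with every pixel on `line` set to `value`."""
    out = [row[:] for row in field]
    for r, c in line:
        out[r][c] = value
    return out


def toy_growing_boundary(field: Field, line: Line, value_s: float, value_e: float) -> Field:
    """Return a copy of `field` with values ramping linearly along `line`."""
    out = [row[:] for row in field]
    last = max(len(line) - 1, 1)  # avoid dividing by zero for one-pixel lines
    for i, (r, c) in enumerate(line):
        out[r][c] = value_s + (value_e - value_s) * i / last
    return out


field = [[0.0] * 5 for _ in range(2)]
top_row = [(0, c) for c in range(5)]
bottom_row = [(1, c) for c in range(5)]
field = toy_equal_boundary(field, top_row, value=1.0)
field = toy_growing_boundary(field, bottom_row, value_s=1.0, value_e=100.0)
print(field[0])
print(field[1])
```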

spateo.digitization.effective_L2_error(op_field_i, op_field_j, field_mask)[source]#

Calculate effective L2 error between two fields.

Parameters:
op_field_i _type_

_description_

op_field_j _type_

_description_

field_mask _type_

_description_

Returns:

_description_

Return type:

_type_
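
The source does not define "effective" here. One plausible interpretation, sketched below, is a relative L2 difference between the two fields that counts only the pixels selected by field_mask. This is an assumption for illustration, and `toy_masked_relative_l2` is a hypothetical helper, not the library function.

```python
import math
from typing import List

Field = List[List[float]]


def toy_masked_relative_l2(field_i: Field, field_j: Field, mask: List[List[int]]) -> float:
    """Relative L2 difference between two fields, counting only masked pixels."""
    diff_sq = 0.0
    ref_sq = 0.0
    for row_i, row_j, row_m in zip(field_i, field_j, mask):
        for a, b, m in zip(row_i, row_j, row_m):
            if m:
                diff_sq += (b - a) ** 2
                ref_sq += b ** 2
    if ref_sq == 0.0:
        raise ValueError("reference field is zero everywhere inside the mask")
    return math.sqrt(diff_sq / ref_sq)


field_i = [[1.0, 2.0], [3.0, 4.0]]
field_j = [[1.0, 2.0], [3.0, 5.0]]
full_mask = [[1, 1], [1, 1]]
partial_mask = [[1, 1], [1, 0]]  # excludes the only pixel that differs
print(toy_masked_relative_l2(field_i, field_j, full_mask))
print(toy_masked_relative_l2(field_i, field_j, partial_mask))
```

An error of this kind is a natural stopping criterion for an iterative solver: iteration can end once successive fields differ by less than a tolerance such as max_err.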

spateo.digitization.calc_op_field(op_field, min_line, max_line, edge_line_a, edge_line_b, field_border, field_mask, max_err=1e-05, max_itr=100000.0, lp=1, hp=100)[source]#

Calculate op_field (weights) for given boundary weights.

Parameters:
op_field _type_

_description_

min_line _type_

_description_

max_line _type_

_description_

edge_line_a _type_

_description_

edge_line_b _type_

_description_

field_border _type_

_description_

field_mask _type_

_description_

max_err _type_, optional

_description_. Defaults to 1e-5.

max_itr _type_, optional

_description_. Defaults to 1e5.

lp int, optional

_description_. Defaults to 1.

hp int, optional

_description_. Defaults to 100.

Returns:

_description_

Return type:

_type_

spateo.digitization.digitize(adata: spateo.digitization.utils.AnnData, ctrs: spateo.digitization.utils.Tuple, ctr_idx: int, pnt_xy: spateo.digitization.utils.Tuple[int, int], pnt_Xy: spateo.digitization.utils.Tuple[int, int], pnt_xY: spateo.digitization.utils.Tuple[int, int], pnt_XY: spateo.digitization.utils.Tuple[int, int], spatial_key: str = 'spatial', dgl_layer_key: str = 'digital_layer', dgl_column_key: str = 'digital_column', max_itr: int = 100000.0, lh: float = 1, hh: float = 100) None[source]#
Calculate the “heat” for a closed area of interest by solving a partial differential equation (PDE), the heat equation. Boundary conditions are defined by four user-provided coordinates that set the direction of heat diffusion. The value of “heat” will be used to define different spatial layers, domains and grids.

Parameters:
adata

The adata object to digitize.

ctrs

Contours generated by cv2.findContours.

ctr_idx

The index of the contour of interest.

pnt_xy

Corner point to define an area of interest. pnt_xy corresponds to the point with minimal layer and minimal column value.

pnt_Xy

Corner point corresponding to the point with maximal column value but minimal layer value.

pnt_xY

Corner point corresponding to the point with minimal column value but maximal layer value.

pnt_XY

Corner point corresponding to the point with maximal layer and maximal column value.

spatial_key

The key name in adata.obsm of the spatial coordinates. Defaults to “spatial”.

dgl_layer_key

The key name in adata.obs to store layer digital-heat (temperature). Defaults to “digital_layer”.

dgl_column_key

The key name in adata.obs to store column digital-heat (temperature). Defaults to “digital_column”.

max_itr

Maximum number of iterations dedicated to solving the heat equation.

lh

Lowest digital-heat (temperature). Defaults to 1.

hh

Highest digital-heat (temperature). Defaults to 100.

Returns:

Nothing, but updates the adata object with the following keys in .obs:

  1. dgl_layer_key: The key in adata.obs that points to the values of layer digital-heat (temperature).

  2. dgl_column_key: The key in adata.obs that points to the values of column digital-heat (temperature).

Return type:

None
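
To show how boundary temperatures turn into a smooth "heat" coordinate, the sketch below relaxes a small rectangular grid to the steady state of the heat equation with a Jacobi iteration. It assumes a rectangular area, one edge held at lh, the opposite edge at hh, and the two side edges ramping linearly between them. spateo works on arbitrary closed contours, so this is a simplified stand-in, and `toy_heat_field` is a hypothetical helper.

```python
from typing import List


def toy_heat_field(
    n_rows: int, n_cols: int, lh: float = 1.0, hh: float = 100.0,
    max_err: float = 1e-5, max_itr: int = 100_000,
) -> List[List[float]]:
    """Relax a rectangular field to the steady state of the heat equation."""
    # Assumed boundary conditions: first row held at lh, last row at hh,
    # and the two side columns ramp linearly between them.
    ramp = [lh + (hh - lh) * r / (n_rows - 1) for r in range(n_rows)]
    field = [[lh] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        field[r][0] = field[r][-1] = ramp[r]
    field[-1] = [hh] * n_cols

    for _ in range(max_itr):
        new = [row[:] for row in field]
        max_change = 0.0
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                # Jacobi update: each interior value becomes the mean of its neighbours.
                new[r][c] = 0.25 * (
                    field[r - 1][c] + field[r + 1][c] + field[r][c - 1] + field[r][c + 1]
                )
                max_change = max(max_change, abs(new[r][c] - field[r][c]))
        field = new
        if max_change < max_err:
            break
    return field


heat = toy_heat_field(n_rows=5, n_cols=6)
print([round(row[2], 2) for row in heat])
```

On this rectangle the converged values increase steadily from lh on the first row to hh on the last, which is what makes the heat value usable as a continuous layer coordinate.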

spateo.digitization.gridit(adata: spateo.digitization.utils.AnnData, layer_num: int, column_num: int, lh: float = 1, hh: float = 100, dgl_layer_key: str = 'digital_layer', dgl_column_key: str = 'digital_column', layer_border_width: int = 2, column_border_width: int = 2, layer_label_key: str = 'layer_label', column_label_key: str = 'column_label', grid_label_key: str = 'grid_label') None[source]#

Segment the area of interest into a specified number of layers/columns, according to the precomputed digitization heat value.

Parameters:
adata

The adata object to do layer/column/grid segmentation.

layer_num

Number of layers to segment.

column_num

Number of columns to segment.

lh

Lowest digital-heat. Defaults to 1.

hh

Highest digital-heat. Defaults to 100.

layer_border_width

Layer boundary width. Only affects grid_label.

column_border_width

Column boundary width. Only affects grid_label.

dgl_layer_key

The key name of the precomputed layer digitization heat in adata.obs. Defaults to “digital_layer”.

dgl_column_key

The key name of the precomputed column digitization heat in adata.obs. Defaults to “digital_column”.

layer_label_key

The key name under which layer labels will be stored in adata.obs. Defaults to “layer_label”.

column_label_key

The key name under which column labels will be stored in adata.obs. Defaults to “column_label”.

grid_label_key

The key name under which grid labels will be stored in adata.obs. Defaults to “grid_label”.

Returns:

Nothing, but updates the adata object with the following keys in .obs:

  1. layer_label_key: this key points to layer labels.

  2. column_label_key: this key points to column labels.

  3. grid_label_key: this key points to grid labels.

Return type:

None
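
Turning a continuous heat value into a discrete layer or column label amounts to binning the range from lh to hh. The sketch below assumes equal-width bins and uses 0 for values outside the range; both conventions are assumptions, and `toy_heat_to_label` is a hypothetical helper rather than the gridit implementation.

```python
from typing import List


def toy_heat_to_label(heat: float, n_bins: int, lh: float = 1.0, hh: float = 100.0) -> int:
    """Map a heat value in [lh, hh] to a 1-based label among equal-width bins."""
    if not lh <= heat <= hh:
        return 0  # hypothetical convention: 0 marks buckets outside the area
    width = (hh - lh) / n_bins
    # The top edge (heat == hh) is folded into the last bin.
    return min(int((heat - lh) // width) + 1, n_bins)


layer_heat: List[float] = [1.0, 20.0, 34.0, 67.0, 100.0, 150.0]
labels = [toy_heat_to_label(h, n_bins=3) for h in layer_heat]
print(labels)
```

The same mapping would apply independently to the layer heat and the column heat, yielding one label of each kind per bucket.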

spateo.digitization.order_borderline(borderline_img: numpy.ndarray, pt_start: Tuple[int, int], pt_end: Tuple[int, int]) Tuple[List, numpy.ndarray][source]#

Retrieve the borderline segment between the given start and end points, with the coordinates ordered.

Parameters:
borderline_img

The matrix that stores the image of the borderline.

pt_start

The coordinate tuple of the start point.

pt_end

The coordinate tuple of the end point.

Returns:

ordered_bdl_list: List of points along the borderline segment.

ordered_bdl_img: A numpy array that stores the image of the borderline segment.

Return type:

Tuple[List, numpy.ndarray]
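
Ordering the pixels of a borderline between two end points can be pictured as a path search over the line's pixels. The sketch below uses a breadth-first search over 8-connected neighbours; it is an illustrative stand-in under that assumption, not necessarily how order_borderline works, and `toy_order_line` is a hypothetical helper.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

Pixel = Tuple[int, int]


def toy_order_line(line: Set[Pixel], pt_start: Pixel, pt_end: Pixel) -> List[Pixel]:
    """Order line pixels from pt_start to pt_end via breadth-first search."""
    if pt_start not in line or pt_end not in line:
        raise ValueError("both end points must lie on the line")
    parent: Dict[Pixel, Optional[Pixel]] = {pt_start: None}
    queue = deque([pt_start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == pt_end:
            break
        # Visit the 8-connected neighbours that belong to the line.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nb = (r + dr, c + dc)
                if nb in line and nb not in parent:
                    parent[nb] = (r, c)
                    queue.append(nb)
    if pt_end not in parent:
        raise ValueError("pt_end is not connected to pt_start")
    # Walk back from the end point and reverse to get start-to-end order.
    path: List[Pixel] = []
    node: Optional[Pixel] = pt_end
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Unordered line pixels; (2, 5) lies beyond the requested end point.
line = {(2, 3), (0, 0), (2, 5), (1, 1), (2, 4), (2, 2)}
ordered = toy_order_line(line, pt_start=(0, 0), pt_end=(2, 4))
print(ordered)
```

Pixels that lie past the end point are left out of the result, matching the idea of retrieving only the segment between the two given points.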