spateo.tools.spatial_smooth

Module Contents

Functions

smooth(→ Tuple[scipy.sparse.csr_matrix, ...)

Leverages neighborhood information to smooth gene expression.

compute_jaccard_similarity_matrix(→ numpy.ndarray)

Compute the Jaccard similarity matrix for input data with rows corresponding to samples and columns corresponding to features, processing in chunks for memory efficiency.

sparse_matrix_median(→ scipy.sparse.csr_matrix)

Computes the median value of a sparse matrix, used here for determining a threshold value for Jaccard similarity.

smooth_process_column(→ scipy.sparse.csr_matrix)

Helper function for parallelization of smoothing via probabilistic selection of expression values.

get_eligible_rows(→ numpy.ndarray)

Helper function for parallelization of smoothing: finds the rows with enough feature-expressing neighbors to meet the threshold.

sample_from_eligible_neighbors(W, feat, eligible_rows)

Sample feature values probabilistically based on weights matrix W.

subsample_neighbors_dense(→ numpy.ndarray)

Given dense spatial weights matrix W and number of random neighbors n to take, perform subsampling.

subsample_neighbors_sparse(→ scipy.sparse.csr_matrix)

Given sparse spatial weights matrix W and number of random neighbors n to take, perform subsampling.

spateo.tools.spatial_smooth.smooth(X: numpy.ndarray | scipy.sparse.csr_matrix, W: numpy.ndarray | scipy.sparse.csr_matrix, ct: numpy.ndarray | None = None, gene_expr_subset: numpy.ndarray | scipy.sparse.csr_matrix | None = None, min_jaccard: float | None = 0.05, manual_mask: numpy.ndarray | None = None, normalize_W: bool = True, return_discrete: bool = False, smoothing_threshold: int | None = None, n_subsample: int | None = None, return_W: bool = False) → Tuple[scipy.sparse.csr_matrix, numpy.ndarray | scipy.sparse.csr_matrix | None, numpy.ndarray | None]

Leverages neighborhood information to smooth gene expression.

Parameters:
X

Gene expression array or sparse matrix (shape n x m, where n is the number of cells and m is the number of genes)

W

Spatial weights matrix (shape n x n)

ct

Optional, indicates the cell type label for each cell (shape n x 1). If given, will smooth only within each cell type.

gene_expr_subset

Optional, array corresponding to the expression of select genes (shape n x k, where k is the number of genes in the subset). If given, will smooth only over cells whose expression patterns over these genes largely match (a pair of cells counts as matching when its Jaccard index is greater than the median score).

min_jaccard

Optional, and only used if ‘gene_expr_subset’ is also given. Minimum Jaccard similarity score to be considered “nonzero”.

manual_mask

Optional, binary array of shape n x n. For each cell (row), manually indicate which neighbors (if any) to use for smoothing.

normalize_W

Set True to scale the rows of the weights matrix to sum to 1. Use this to smooth by taking an average over the entire neighborhood, including zeros. Set False to take the average over only the nonzero elements in the neighborhood.

return_discrete

Set True to return

smoothing_threshold

Optional, sets the threshold for smoothing in terms of the number of neighboring cells that must express each gene for a cell to be smoothed for that gene. The more gene-expressing neighbors, the more confidence in the biological signal.

n_subsample

Optional, sets the number of random neighbor samples to use in the smoothing. If not given, will use all neighbors (nonzero weights) for each cell.

return_W

Set True to return the weights matrix post-processing

Returns:
x_new

Smoothed gene expression array or sparse matrix

W

If return_W is True, the weights matrix post-processing

d

Only if normalize_W is True, the row sums of the weights matrix

Return type:

Tuple[scipy.sparse.csr_matrix, numpy.ndarray | scipy.sparse.csr_matrix | None, numpy.ndarray | None]
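To make the effect of normalize_W concrete, here is a minimal dense NumPy sketch of neighborhood averaging. It is not spateo's implementation: smooth_sketch is a hypothetical name, and cell type restriction, masking, thresholding and subsampling are all omitted. The normalize_W=False branch reflects one reading of "average over only the nonzero elements", namely dividing by the total weight of the neighbors that express the gene.

```python
import numpy as np


def smooth_sketch(X, W, normalize_W=True):
    """Hypothetical illustration of neighborhood smoothing (not spateo's code)."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    if normalize_W:
        # Scale each row of W to sum to 1, then average over the whole
        # neighborhood, zeros included.
        d = W.sum(axis=1, keepdims=True)
        d[d == 0] = 1.0  # cells without neighbors stay at zero
        return (W / d) @ X
    # Assumed reading of normalize_W=False: divide the weighted sum by the
    # weight of the neighbors with nonzero expression of each gene.
    num = W @ X
    den = W @ (X > 0).astype(float)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


# Three cells, two genes, each cell neighboring the other two.
X = np.array([[2.0, 0.0], [0.0, 4.0], [4.0, 0.0]])
W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
x_avg_all = smooth_sketch(X, W, normalize_W=True)
x_avg_nonzero = smooth_sketch(X, W, normalize_W=False)
```

With these inputs the first cell has two neighbors, only one of which expresses each gene, so averaging over the whole neighborhood gives half the value obtained by averaging over the expressing neighbor alone.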

spateo.tools.spatial_smooth.compute_jaccard_similarity_matrix(data: numpy.ndarray | scipy.sparse.csr_matrix, chunk_size: int = 1000, min_jaccard: float = 0.1) → numpy.ndarray

Compute the Jaccard similarity matrix for input data with rows corresponding to samples and columns corresponding to features, processing in chunks for memory efficiency.

Parameters:
data

A dense numpy array or a sparse matrix in CSR format, with rows as samples and columns as features

chunk_size

The number of rows to process in a single chunk

min_jaccard

Minimum Jaccard similarity to be considered “nonzero”

Returns:
jaccard_matrix

A square matrix of Jaccard similarity coefficients

Return type:

numpy.ndarray
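As an illustration of chunked Jaccard computation, the sketch below processes a dense array in row blocks. It rests on several assumptions that the documentation above does not confirm: jaccard_chunked is a hypothetical name, expression is binarized with > 0, similarities below min_jaccard are set to zero, and two all-zero rows get similarity 0.

```python
import numpy as np


def jaccard_chunked(data, chunk_size=1000, min_jaccard=0.1):
    """Hypothetical chunked Jaccard similarity between rows (samples)."""
    B = (np.asarray(data) > 0).astype(np.int64)  # assumed binarization
    n = B.shape[0]
    row_sums = B.sum(axis=1)
    out = np.zeros((n, n), dtype=float)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Intersection sizes between this block of rows and all rows.
        inter = B[start:stop] @ B.T
        # |A ∪ B| = |A| + |B| - |A ∩ B|
        union = row_sums[start:stop, None] + row_sums[None, :] - inter
        sim = np.divide(
            inter, union, out=np.zeros(inter.shape, dtype=float), where=union > 0
        )
        sim[sim < min_jaccard] = 0.0  # assumed meaning of "nonzero"
        out[start:stop] = sim
    return out


data = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1]])
jaccard_matrix = jaccard_chunked(data, chunk_size=2)
```

Only one block of shape (chunk_size, n) of intermediate values is held at a time, which is the point of chunking; the full n x n result is still allocated.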

spateo.tools.spatial_smooth.sparse_matrix_median(spmat: scipy.sparse.csr_matrix, nonzero_only: bool = False) → scipy.sparse.csr_matrix

Computes the median value of a sparse matrix, used here for determining a threshold value for Jaccard similarity.

Parameters:
spmat

The sparse matrix to compute the median value of

nonzero_only

If True, only consider nonzero values in the sparse matrix

Returns:
median_value

The median value of the sparse matrix

Return type:

scipy.sparse.csr_matrix
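The difference between the two settings of nonzero_only can be seen in a small sketch. It is an illustration only: sparse_median_sketch is a hypothetical name, it returns a plain float, and for clarity it materializes the implicit zeros, which a memory-conscious implementation would avoid.

```python
import numpy as np
from scipy import sparse


def sparse_median_sketch(spmat, nonzero_only=False):
    """Hypothetical median of a sparse matrix, with or without its zeros."""
    spmat = sparse.csr_matrix(spmat)
    values = spmat.data[spmat.data != 0]  # ignore explicitly stored zeros
    if nonzero_only:
        return float(np.median(values)) if values.size else 0.0
    # Count the zeros instead of reading them from the matrix, then take the
    # median over all entries (this concatenation is dense, for clarity only).
    n_total = spmat.shape[0] * spmat.shape[1]
    n_zeros = n_total - values.size
    return float(np.median(np.concatenate([np.zeros(n_zeros), values])))


M = sparse.csr_matrix(np.array([[0, 0, 3], [0, 5, 0], [1, 0, 0]]))
median_all = sparse_median_sketch(M)
median_nonzero = sparse_median_sketch(M, nonzero_only=True)
```

For a mostly empty similarity matrix the median over all entries is typically 0, so restricting to nonzero values is what yields a usable threshold.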

spateo.tools.spatial_smooth.smooth_process_column(i: int, X: numpy.ndarray | scipy.sparse.csr_matrix, W: numpy.ndarray | scipy.sparse.csr_matrix, threshold: float) → scipy.sparse.csr_matrix

Helper function for parallelization of smoothing via probabilistic selection of expression values.

Parameters:
i

Index of the column to be processed

X

Dense or sparse array input data matrix

W

Dense or sparse array pairwise spatial weights matrix

threshold

Threshold value for the number of feature-expressing neighbors for a given row to be included in the smoothing.

random_state

Optional, set a random seed for reproducibility

Returns:
smoothed_column

Processed column after probabilistic smoothing

Return type:

scipy.sparse.csr_matrix

spateo.tools.spatial_smooth.get_eligible_rows(W: numpy.ndarray | scipy.sparse.csr_matrix, feat: numpy.ndarray | scipy.sparse.csr_matrix, threshold: float) → numpy.ndarray

Helper function for parallelization of smoothing: finds the rows with enough feature-expressing neighbors to meet the threshold.

Parameters:
W

Dense or sparse array pairwise spatial weights matrix

feat

1D array of feature expression values

threshold

Threshold value for the number of feature-expressing neighbors for a given row to be included in the smoothing.

Returns:
eligible_rows

Array of row indices that meet the threshold criterion

Return type:

numpy.ndarray
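The threshold criterion can be sketched for the dense case as a count of expressing neighbors per row. This is an assumption about the behavior rather than spateo's code: eligible_rows_sketch is a hypothetical name, a neighbor is taken to be any nonzero weight, and the comparison is assumed to be "greater than or equal to".

```python
import numpy as np


def eligible_rows_sketch(W, feat, threshold):
    """Hypothetical selection of rows with enough expressing neighbors."""
    W = np.asarray(W)
    feat = np.asarray(feat).ravel()
    # For each row, count neighbors (nonzero weight) whose feature value is nonzero.
    counts = (W != 0).astype(int) @ (feat != 0).astype(int)
    return np.flatnonzero(counts >= threshold)  # assumed inclusive comparison


W = np.array([[0, 1, 1], [1, 0, 0], [1, 1, 0]])
feat = np.array([0, 2, 3])
eligible = eligible_rows_sketch(W, feat, threshold=2)
```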

spateo.tools.spatial_smooth.sample_from_eligible_neighbors(W: numpy.ndarray | scipy.sparse.csr_matrix, feat: numpy.ndarray | scipy.sparse.csr_matrix, eligible_rows: numpy.ndarray)

Sample feature values probabilistically based on weights matrix W.

Parameters:
W

Dense or sparse array pairwise spatial weights matrix

feat

1D array of feature expression values

eligible_rows

Array of row indices that meet a prior-determined threshold criterion

Returns:
sampled_values

Array of sampled values
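One plausible reading of weight-based sampling is that each eligible row draws a single neighbor with probability proportional to its weight and takes that neighbor's feature value. The sketch below follows that reading; it is not confirmed by the documentation above. The name sample_neighbors_sketch and the seed argument are hypothetical, and weights are assumed to be nonnegative.

```python
import numpy as np


def sample_neighbors_sketch(W, feat, eligible_rows, seed=0):
    """Hypothetical weighted draw of one neighbor value per eligible row."""
    W = np.asarray(W, dtype=float)
    feat = np.asarray(feat).ravel()
    rng = np.random.default_rng(seed)
    sampled = np.zeros(len(eligible_rows), dtype=feat.dtype)
    for k, i in enumerate(eligible_rows):
        weights = W[i]
        total = weights.sum()
        if total == 0:
            continue  # no neighbors to draw from; value stays 0
        # Pick one neighbor index with probability proportional to its weight.
        j = rng.choice(W.shape[1], p=weights / total)
        sampled[k] = feat[j]
    return sampled


W = np.array([[0.0, 0.0, 1.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]])
feat = np.array([10, 20, 30])
sampled_values = sample_neighbors_sketch(W, feat, eligible_rows=np.array([0, 1, 2]))
```

Rows 0 and 2 each have a single neighbor, so their sampled values are fixed; row 1 receives either of its two neighbors' values with equal probability.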

spateo.tools.spatial_smooth.subsample_neighbors_dense(W: numpy.ndarray, n: int, verbose: bool = False) → numpy.ndarray

Given dense spatial weights matrix W and number of random neighbors n to take, perform subsampling.

Parameters:
W

Spatial weights matrix

n

Number of neighbors to keep for each row

verbose

Set True to print warnings for cells with fewer than n neighbors

Returns:
W_new

Subsampled spatial weights matrix

Return type:

numpy.ndarray
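A minimal sketch of dense neighbor subsampling follows, assuming that rows with at most n neighbors are left unchanged and that kept entries retain their original weights. The name subsample_dense_sketch and the seed argument are hypothetical, and the verbose warnings are omitted.

```python
import numpy as np


def subsample_dense_sketch(W, n, seed=0):
    """Hypothetical random subsampling of up to n neighbors per row."""
    W = np.asarray(W)
    rng = np.random.default_rng(seed)
    W_new = np.zeros_like(W)
    for i in range(W.shape[0]):
        neighbors = np.flatnonzero(W[i])  # columns with nonzero weight
        if neighbors.size > n:
            neighbors = rng.choice(neighbors, size=n, replace=False)
        W_new[i, neighbors] = W[i, neighbors]  # keep original weights
    return W_new


W = np.array([[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0]])
W_new = subsample_dense_sketch(W, n=2)
```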

spateo.tools.spatial_smooth.subsample_neighbors_sparse(W: scipy.sparse.csr_matrix, n: int, verbose: bool = False) → scipy.sparse.csr_matrix

Given sparse spatial weights matrix W and number of random neighbors n to take, perform subsampling.

Parameters:
W

Spatial weights matrix

n

Number of neighbors to keep for each row

verbose

Set True to print warnings for cells with fewer than n neighbors

Returns:
W_new

Subsampled spatial weights matrix

Return type:

scipy.sparse.csr_matrix
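For a CSR matrix the same idea can work directly on the row pointers, so no dense copy is needed. The sketch below is an assumption about the approach, not spateo's implementation: subsample_sparse_sketch and seed are hypothetical names, every stored entry of a row is treated as a neighbor, and the verbose warnings are omitted.

```python
import numpy as np
from scipy import sparse


def subsample_sparse_sketch(W, n, seed=0):
    """Hypothetical CSR subsampling of up to n stored neighbors per row."""
    W = sparse.csr_matrix(W)
    rng = np.random.default_rng(seed)
    rows, cols, vals = [], [], []
    for i in range(W.shape[0]):
        # Positions of row i's stored entries in W.indices / W.data.
        idx = np.arange(W.indptr[i], W.indptr[i + 1])
        if idx.size > n:
            idx = rng.choice(idx, size=n, replace=False)
        rows.extend([i] * idx.size)
        cols.extend(W.indices[idx])
        vals.extend(W.data[idx])
    return sparse.csr_matrix((vals, (rows, cols)), shape=W.shape)


W = sparse.csr_matrix(np.array([[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0], [5, 6, 0, 0]]))
W_new = subsample_sparse_sketch(W, n=2)
```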