spateo.tools.spatial_smooth
Module Contents
Functions
| smooth | Leverages neighborhood information to smooth gene expression. |
| compute_jaccard_similarity_matrix | Compute the Jaccard similarity matrix for input data with rows corresponding to samples and columns corresponding to features, processing in chunks for memory efficiency. |
| sparse_matrix_median | Computes the median value of a sparse matrix, used here for determining a threshold value for Jaccard similarity. |
| smooth_process_column | Helper function for parallelization of smoothing via probabilistic selection of expression values. |
| get_eligible_rows | Helper function for parallelization of smoothing via probabilistic selection of expression values. |
| sample_from_eligible_neighbors | Sample feature values probabilistically based on weights matrix W. |
| subsample_neighbors_dense | Given dense spatial weights matrix W and number of random neighbors n to take, perform subsampling. |
| subsample_neighbors_sparse | Given sparse spatial weights matrix W and number of random neighbors n to take, perform subsampling. |
- spateo.tools.spatial_smooth.smooth(X: numpy.ndarray | scipy.sparse.csr_matrix, W: numpy.ndarray | scipy.sparse.csr_matrix, ct: numpy.ndarray | None = None, gene_expr_subset: numpy.ndarray | scipy.sparse.csr_matrix | None = None, min_jaccard: float | None = 0.05, manual_mask: numpy.ndarray | None = None, normalize_W: bool = True, return_discrete: bool = False, smoothing_threshold: int | None = None, n_subsample: int | None = None, return_W: bool = False) -> Tuple[scipy.sparse.csr_matrix, numpy.ndarray | scipy.sparse.csr_matrix | None, numpy.ndarray | None]
Leverages neighborhood information to smooth gene expression.
- Parameters:
- X
Gene expression array or sparse matrix (shape n x m, where n is the number of cells and m is the number of genes)
- W
Spatial weights matrix (shape n x n)
- ct
Optional, indicates the cell type label for each cell (shape n x 1). If given, will smooth only within each cell type.
- gene_expr_subset
Optional, array corresponding to the expression of select genes (shape n x k, where k is the number of genes in the subset). If given, will smooth only over cells that largely match the expression patterns over these genes (assessed using a Jaccard index threshold that is greater than the median score).
- min_jaccard
Optional, and only used if ‘gene_expr_subset’ is also given. Minimum Jaccard similarity score to be considered “nonzero”.
- manual_mask
Optional, binary array of shape n x n. For each cell (row), manually indicate which neighbors (if any) to use for smoothing.
- normalize_W
Set True to scale the rows of the weights matrix to sum to 1. Use this to smooth by taking an average over the entire neighborhood, including zeros. Set False to take the average over only the nonzero elements in the neighborhood.
- return_discrete
Set True to return
- smoothing_threshold
Optional, sets the threshold for smoothing in terms of the number of neighboring cells that must express each gene for a cell to be smoothed for that gene. The more gene-expressing neighbors, the more confidence in the biological signal.
- n_subsample
Optional, sets the number of random neighbor samples to use in the smoothing. If not given, will use all neighbors (nonzero weights) for each cell.
- return_W
Set True to return the weights matrix post-processing
- Returns:
x_new: Smoothed gene expression array or sparse matrix.
W: If return_W is True, the weights matrix after processing.
d: Only if normalize_W is True, the row sums of the weights matrix.
- Return type:
Tuple[scipy.sparse.csr_matrix, numpy.ndarray | scipy.sparse.csr_matrix | None, numpy.ndarray | None]
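The core idea described above (averaging expression over a spatial neighborhood, optionally restricted to cells of the same type) can be sketched with NumPy and SciPy alone. This is an illustrative approximation, not spateo's implementation: the helper name `smooth_sketch` is hypothetical, and it covers only the `ct` and `normalize_W` behaviors as the parameter descriptions state them.

```python
import numpy as np
import scipy.sparse as sp


def smooth_sketch(X, W, ct=None, normalize_W=True):
    """Illustrative neighborhood smoothing (hypothetical helper, not spateo's code)."""
    X = sp.csr_matrix(X, dtype=float)
    W = sp.csr_matrix(W, dtype=float)

    if ct is not None:
        # Keep only weights between cells that share a cell type label.
        # The dense n x n comparison is acceptable for a small demonstration only.
        ct = np.asarray(ct).ravel()
        same_type = sp.csr_matrix(ct[:, None] == ct[None, :], dtype=float)
        W = W.multiply(same_type).tocsr()

    d = None
    if normalize_W:
        # Scale each row to sum to 1 so that W @ X is a neighborhood average.
        d = np.asarray(W.sum(axis=1)).ravel()
        inv = np.divide(1.0, d, out=np.zeros_like(d), where=d > 0)
        W = sp.diags(inv) @ W

    return (W @ X).tocsr(), d


# Three cells, two genes; every cell neighbors the other two.
X = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])
W = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)

x_all, d_all = smooth_sketch(X, W)
x_ct, d_ct = smooth_sketch(X, W, ct=np.array(["a", "a", "b"]))
```

In the second call, the third cell has no neighbor of its own type, so its row sum is zero and its smoothed expression is all zeros in this sketch.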
- spateo.tools.spatial_smooth.compute_jaccard_similarity_matrix(data: numpy.ndarray | scipy.sparse.csr_matrix, chunk_size: int = 1000, min_jaccard: float = 0.1) -> numpy.ndarray
Compute the Jaccard similarity matrix for input data with rows corresponding to samples and columns corresponding to features, processing in chunks for memory efficiency.
- Parameters:
- data
A dense numpy array or a sparse matrix in CSR format, with rows as samples and columns as features
- chunk_size
The number of rows to process in a single chunk
- min_jaccard
Minimum Jaccard similarity to be considered “nonzero”
- Returns:
A square matrix of Jaccard similarity coefficients
- Return type:
jaccard_matrix
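A chunked Jaccard computation of the kind described above can be sketched as follows. This is a minimal sketch under assumptions: features are binarized to presence/absence, the diagonal is kept, and similarities below `min_jaccard` are set to zero. The name `jaccard_chunked` is hypothetical and the details may differ from spateo's function.

```python
import numpy as np
import scipy.sparse as sp


def jaccard_chunked(data, chunk_size=1000, min_jaccard=0.1):
    """Pairwise Jaccard similarity between rows (hypothetical sketch)."""
    B = sp.csr_matrix(data, dtype=np.float64, copy=True)
    B.eliminate_zeros()
    B.data[:] = 1.0                              # presence/absence per feature
    sizes = np.asarray(B.sum(axis=1)).ravel()    # nonzero features per row
    n = B.shape[0]
    out = np.zeros((n, n))
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Intersection sizes between this chunk of rows and all rows.
        inter = (B[start:stop] @ B.T).toarray()
        # |A or B| = |A| + |B| - |A and B|
        union = sizes[start:stop, None] + sizes[None, :] - inter
        sim = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
        sim[sim < min_jaccard] = 0.0             # treat weak similarity as zero
        out[start:stop] = sim
    return out


data = np.array([[1, 2, 0], [0, 5, 3], [0, 0, 0]])
J = jaccard_chunked(data, chunk_size=2)
J_strict = jaccard_chunked(data, chunk_size=2, min_jaccard=0.5)
```

Only one chunk of rows is densified at a time, which is what keeps peak memory bounded by `chunk_size * n` for the intermediate arrays.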
- spateo.tools.spatial_smooth.sparse_matrix_median(spmat: scipy.sparse.csr_matrix, nonzero_only: bool = False) -> scipy.sparse.csr_matrix
Computes the median value of a sparse matrix, used here for determining a threshold value for Jaccard similarity.
- Parameters:
- spmat
The sparse matrix to compute the median value of
- nonzero_only
If True, only consider nonzero values in the sparse matrix
- Returns:
The median value of the sparse matrix
- Return type:
median_value
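One way to take the median of a sparse matrix without densifying it is to count the implicit zeros and index into the sorted stored values. The sketch below is an assumption about how such a computation could work, not spateo's code; `sparse_median_sketch` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def sparse_median_sketch(spmat, nonzero_only=False):
    """Median of a sparse matrix without converting it to dense (sketch)."""
    spmat = sp.csr_matrix(spmat)
    values = spmat.data[spmat.data != 0]         # stored, truly nonzero entries
    if nonzero_only:
        return float(np.median(values)) if values.size else 0.0

    n_total = spmat.shape[0] * spmat.shape[1]
    n_zeros = n_total - values.size
    neg = np.sort(values[values < 0])
    pos = np.sort(values[values > 0])

    def kth(k):
        # k-th smallest element of the conceptual array [neg, zeros, pos].
        if k < neg.size:
            return neg[k]
        if k < neg.size + n_zeros:
            return 0.0
        return pos[k - neg.size - n_zeros]

    mid = n_total // 2
    if n_total % 2 == 1:
        return float(kth(mid))
    return float(0.5 * (kth(mid - 1) + kth(mid)))


M = sp.csr_matrix(np.array([[0, 0, 3], [1, 0, 2]]))
median_all = sparse_median_sketch(M)
median_nonzero = sparse_median_sketch(M, nonzero_only=True)
```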
- spateo.tools.spatial_smooth.smooth_process_column(i: int, X: numpy.ndarray | scipy.sparse.csr_matrix, W: numpy.ndarray | scipy.sparse.csr_matrix, threshold: float) -> scipy.sparse.csr_matrix
Helper function for parallelization of smoothing via probabilistic selection of expression values.
- Parameters:
- i
Index of the column to be processed
- X
Dense or sparse array input data matrix
- W
Dense or sparse array pairwise spatial weights matrix
- threshold
Threshold value for the number of feature-expressing neighbors for a given row to be included in the smoothing.
- random_state
Optional, set a random seed for reproducibility
- Returns:
Processed column after probabilistic smoothing
- Return type:
smoothed_column
- spateo.tools.spatial_smooth.get_eligible_rows(W: numpy.ndarray | scipy.sparse.csr_matrix, feat: numpy.ndarray | scipy.sparse.csr_matrix, threshold: float) -> numpy.ndarray
Helper function for parallelization of smoothing via probabilistic selection of expression values.
- Parameters:
- W
Dense or sparse array pairwise spatial weights matrix
- feat
1D array of feature expression values
- threshold
Threshold value for the number of feature-expressing neighbors for a given row to be included in the smoothing.
- Returns:
Array of row indices that meet the threshold criterion
- Return type:
eligible_rows
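The threshold criterion can be illustrated by counting, for each row of W, how many neighbors express the feature. This sketch assumes "expressing" means a value greater than zero and that the threshold is inclusive; neither detail is confirmed by the documentation above, and `eligible_rows_sketch` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def eligible_rows_sketch(W, feat, threshold):
    """Rows with at least `threshold` feature-expressing neighbors (sketch)."""
    A = sp.csr_matrix(W, dtype=float, copy=True)
    A.eliminate_zeros()
    A.data[:] = 1.0                               # binary neighbor indicator
    if sp.issparse(feat):
        feat = feat.toarray()
    expressing = (np.asarray(feat).ravel() > 0).astype(float)
    counts = np.asarray(A @ expressing).ravel()   # expressing neighbors per row
    return np.where(counts >= threshold)[0]


W = np.array([[0, 1, 1], [1, 0, 0], [1, 1, 0]], dtype=float)
feat = np.array([0.0, 2.0, 5.0])
rows_at_least_1 = eligible_rows_sketch(W, feat, threshold=1)
rows_at_least_2 = eligible_rows_sketch(W, feat, threshold=2)
```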
- spateo.tools.spatial_smooth.sample_from_eligible_neighbors(W: numpy.ndarray | scipy.sparse.csr_matrix, feat: numpy.ndarray | scipy.sparse.csr_matrix, eligible_rows: numpy.ndarray)
Sample feature values probabilistically based on weights matrix W.
- Parameters:
- W
Dense or sparse array pairwise spatial weights matrix
- feat
1D array of feature expression values
- eligible_rows
Array of row indices that meet a prior-determined threshold criterion
- Returns:
Array of sampled values
- Return type:
sampled_values
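Probabilistic sampling based on W could look like the sketch below, in which each eligible row draws one neighbor with probability proportional to its weight and takes that neighbor's feature value. This reading of "based on weights matrix W" is an assumption; the function name and the `seed` argument are hypothetical.

```python
import numpy as np
import scipy.sparse as sp


def sample_neighbors_sketch(W, feat, eligible_rows, seed=None):
    """Draw one neighbor value per eligible row, weighted by W (sketch)."""
    rng = np.random.default_rng(seed)
    W = sp.csr_matrix(W, dtype=float)
    feat = np.asarray(feat, dtype=float).ravel()
    sampled = np.zeros(len(eligible_rows))
    for k, i in enumerate(eligible_rows):
        row = W.getrow(i)
        idx, w = row.indices, row.data
        keep = w > 0
        idx, w = idx[keep], w[keep]
        if idx.size == 0:
            continue                              # no neighbors: leave as zero
        # Choose a neighbor index with probability proportional to its weight.
        sampled[k] = feat[rng.choice(idx, p=w / w.sum())]
    return sampled


W = np.array([[0, 0, 1], [0.5, 0, 0.5], [0, 0, 0]])
feat = np.array([7.0, 1.0, 7.0])
sampled_values = sample_neighbors_sketch(W, feat, np.array([0, 1, 2]), seed=0)
```

The toy data is chosen so that every reachable neighbor carries the same value, which makes the result independent of the random draw.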
- spateo.tools.spatial_smooth.subsample_neighbors_dense(W: numpy.ndarray, n: int, verbose: bool = False) -> numpy.ndarray
Given dense spatial weights matrix W and number of random neighbors n to take, perform subsampling.
- Parameters:
- W
Spatial weights matrix
- n
Number of neighbors to keep for each row
- verbose
Set True to print warnings for cells with fewer than n neighbors
- Returns:
Subsampled spatial weights matrix
- Return type:
W_new
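For a dense matrix, subsampling can be sketched as keeping at most n randomly chosen nonzero entries per row and zeroing the rest. This is an illustration only; `subsample_dense_sketch` and its `seed` argument are hypothetical, and rows with fewer than n neighbors are assumed to be kept unchanged.

```python
import numpy as np


def subsample_dense_sketch(W, n, seed=None):
    """Keep at most n random nonzero weights per row of a dense matrix (sketch)."""
    rng = np.random.default_rng(seed)
    W = np.asarray(W, dtype=float)
    W_new = np.zeros_like(W)
    for i in range(W.shape[0]):
        nbrs = np.flatnonzero(W[i])               # column indices of neighbors
        if nbrs.size > n:
            nbrs = rng.choice(nbrs, size=n, replace=False)
        W_new[i, nbrs] = W[i, nbrs]               # retain the original weights
    return W_new


W_dense = np.array(
    [[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0], [0, 5, 6, 0]], dtype=float
)
W_dense_sub = subsample_dense_sketch(W_dense, n=2, seed=0)
```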
- spateo.tools.spatial_smooth.subsample_neighbors_sparse(W: scipy.sparse.csr_matrix, n: int, verbose: bool = False) -> scipy.sparse.csr_matrix
Given sparse spatial weights matrix W and number of random neighbors n to take, perform subsampling.
- Parameters:
- W
Spatial weights matrix
- n
Number of neighbors to keep for each row
- verbose
Set True to print warnings for cells with fewer than n neighbors
- Returns:
Subsampled spatial weights matrix
- Return type:
W_new
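With a CSR matrix, the same subsampling can work directly on the `indptr`, `indices` and `data` arrays so the matrix is never densified. The sketch below is one possible approach, not spateo's implementation; `subsample_sparse_sketch` and `seed` are hypothetical names.

```python
import numpy as np
import scipy.sparse as sp


def subsample_sparse_sketch(W, n, seed=None):
    """Keep at most n random stored weights per row of a CSR matrix (sketch)."""
    rng = np.random.default_rng(seed)
    W = sp.csr_matrix(W, dtype=float, copy=True)
    W.eliminate_zeros()
    rows, cols, vals = [], [], []
    for i in range(W.shape[0]):
        # Positions of row i's entries inside W.indices / W.data.
        pos = np.arange(W.indptr[i], W.indptr[i + 1])
        if pos.size > n:
            pos = rng.choice(pos, size=n, replace=False)
        rows.append(np.full(pos.size, i))
        cols.append(W.indices[pos])
        vals.append(W.data[pos])
    return sp.csr_matrix(
        (np.concatenate(vals), (np.concatenate(rows), np.concatenate(cols))),
        shape=W.shape,
    )


W_dense_ref = np.array(
    [[0, 1, 2, 3], [4, 0, 0, 0], [0, 0, 0, 0], [0, 5, 6, 0]], dtype=float
)
W_sparse_sub = subsample_sparse_sketch(sp.csr_matrix(W_dense_ref), n=2, seed=0)
```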