spateo.tools.find_neighbors
Functions for finding nearest neighbors and the distances between them in spatial transcriptomics data.
Module Contents
Functions
- weighted_spatial_graph: Given an AnnData object, compute distance array with either a fixed number of neighbors for each bucket or a fixed search radius for each bucket.
- weighted_expr_neighbors_graph: Given an AnnData object, compute distance array in gene expression space.
- transcriptomic_connectivity: Given an AnnData object, compute pairwise connectivity matrix in transcriptomic space.
- remove_greater_than: Remove values greater than a threshold from a sparse matrix.
- generate_spatial_distance_graph: Creates graph based on distance in space.
- generate_spatial_weights_fixed_nbrs: Starting from a k-nearest neighbor graph, generate a weighted nearest neighbor graph.
- gaussian_weight_2d: Calculate normalized Gaussian value for a given distance from central point.
- p_equiv_radius: Find radius at which you eliminate fraction p of a radial Gaussian probability distribution with standard deviation sigma.
- generate_spatial_weights_fixed_radius: Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge weights decay with distance.
- calculate_distance: Given array of x- and y-coordinates, compute pairwise distances between all samples.
- construct_spatial_distance_matrix: Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all samples.
- construct_geodesic_distance_matrix: Given AnnData object and key to array of x- and y-coordinates, compute geodesic distance between each sample and its nearest neighbors.
- construct_binned_spatial_distance: Given AnnData object and key to array of x- and y-coordinates, first “collapse” the dataset by aggregating nearby cells together into bins, and then compute pairwise spatial distances between all samples.
- construct_nn_graph: Constructing bucket-to-bucket nearest neighbors graph.
- normalize_adj: Symmetrically normalize adjacency matrix, set diagonal to 1 and return processed adjacency array.
- spateo.tools.find_neighbors.weighted_spatial_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', fixed: str = 'n_neighbors', n_neighbors_method: str = 'ball_tree', n_neighbors: int = 30, decay_type: str = 'reciprocal', p: float = 0.05, sigma: float = 100) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]
Given an AnnData object, compute a distance array with either a fixed number of neighbors or a fixed search radius for each bucket.
Note: the parameters ‘p’ and ‘sigma’ (used only if ‘fixed’ is ‘radius’) set the search radius when neighbors are defined by a fixed radius. ‘sigma’ is the standard deviation (e.g. in pixels or micrometers) of a Gaussian distribution centered on a bucket with peak height ‘a’; ‘p’ is the cutoff height of that Gaussian, expressed as a proportion of the peak height ‘a’. To define the radius used for all buckets, the function measures how far from a bucket one must go before the Gaussian decays to, e.g., 0.05 of its peak height. Knowledge of, e.g., the diffusion kinetics of particular soluble factors can be used to choose these values.
- Parameters
- adata
An AnnData object.
- spatial_key
Key in .obsm containing coordinates for each bucket.
- fixed
Options: ‘n_neighbors’, ‘radius’. Sets either a fixed number of neighbors or a fixed search radius for each bucket.
- n_neighbors_method
Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”. Unused unless ‘fixed’ is ‘n_neighbors’.
- n_neighbors
Number of neighbors each bucket has. Unused unless ‘fixed’ is ‘n_neighbors’.
- decay_type
Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”. Unused unless ‘fixed’ is ‘n_neighbors’.
- p
Cutoff for Gaussian (used to find where distribution drops below p * (max_value)). Unused unless ‘fixed’ is ‘radius’.
- sigma
Standard deviation of the Gaussian. Unused unless ‘fixed’ is ‘radius’.
- Returns
out_graph: Weighted nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.
- Return type
Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]
- spateo.tools.find_neighbors.weighted_expr_neighbors_graph(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30, num_neighbors: int = 30, decay_type: str = 'reciprocal') → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]
Given an AnnData object, compute distance array in gene expression space.
- Parameters
- adata
An AnnData object.
- nbr_object
Optional sklearn.neighbors.NearestNeighbors object; pass a preconfigured instance to customize the neighbor search.
- basis
str, default ‘pca’. The space used for the nearest neighbor search. Valid names include, for example, ‘pca’, ‘umap’, or ‘X’.
- n_neighbors_method
str, default ‘ball_tree’. Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
- n_pca_components
Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.
- num_neighbors
Number of neighbors for each bucket, used in computing the distance graph.
- decay_type
Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.
- Returns
out_graph: Weighted k-nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distance’ in .obsp and ‘spatial_neighbors’ in .uns.
- Return type
Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]
- spateo.tools.find_neighbors.transcriptomic_connectivity(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30) → Tuple[sklearn.neighbors.NearestNeighbors, anndata.AnnData]
Given an AnnData object, compute pairwise connectivity matrix in transcriptomic space.
- Parameters
- adata
An AnnData object.
- nbr_object
Optional sklearn.neighbors.NearestNeighbors object; pass a preconfigured instance to customize the neighbor search.
- basis
str, default ‘pca’. The space used for the nearest neighbor search. Valid names include, for example, ‘pca’, ‘umap’, or ‘X’.
- n_neighbors_method
str, default ‘ball_tree’. Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
- n_pca_components
Only used if ‘basis’ is ‘pca’. Sets number of principal components to compute.
- num_neighbors
Number of neighbors for each bucket, used in computing the distance graph.
- Returns
nbrs: Object of class sklearn.neighbors.NearestNeighbors.
adata: Modified AnnData object.
- Return type
Tuple[sklearn.neighbors.NearestNeighbors, anndata.AnnData]
- spateo.tools.find_neighbors.remove_greater_than(graph: scipy.sparse.csr_matrix, threshold: float, copy: bool = False, verbose: bool = False) → scipy.sparse.csr_matrix
Remove values greater than a threshold from a sparse matrix.
- Parameters
- graph
The input scipy matrix of the graph.
- threshold
Upper threshold; entries with values greater than this are removed.
- copy
Set True to avoid altering original graph.
- verbose
Set True to display messages at runtime. Generally not recommended, since this will print entire arrays.
- Returns
graph: The updated graph with values greater than the threshold removed.
- Return type
scipy.sparse.csr_matrix
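The thresholding described above can be sketched with plain SciPy. This is an illustrative reading of the behavior, not the Spateo source: the helper name `drop_above` is hypothetical, and it assumes that "removing" a value means dropping the stored entry from the sparse structure.

```python
import numpy as np
from scipy.sparse import csr_matrix


def drop_above(graph: csr_matrix, threshold: float, copy: bool = True) -> csr_matrix:
    """Hypothetical helper: drop stored entries whose value exceeds `threshold`."""
    out = graph.copy() if copy else graph
    out.data[out.data > threshold] = 0.0  # mark entries above the threshold
    out.eliminate_zeros()                 # remove them from the sparse structure
    return out


# Toy symmetric distance graph with three buckets.
dists = csr_matrix(np.array([[0.0, 1.5, 4.0],
                             [1.5, 0.0, 2.5],
                             [4.0, 2.5, 0.0]]))
pruned = drop_above(dists, threshold=3.0)
print(pruned.toarray())
```

With `copy=True` the input graph is left untouched, which mirrors the purpose of the `copy` parameter documented above.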
- spateo.tools.find_neighbors.generate_spatial_distance_graph(locations: numpy.ndarray, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', num_neighbors: Union[None, int] = None, radius: Union[None, float] = None) → Tuple[sklearn.neighbors.NearestNeighbors, scipy.sparse.csr_matrix]
Creates graph based on distance in space.
- Parameters
- locations
Spatial coordinates for each bucket with shape [n_samples, 2].
- nbr_object
Optional sklearn.neighbors.NearestNeighbors object; pass a preconfigured instance to customize the neighbor search.
- method
Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
- num_neighbors
Number of neighbors for each bucket.
- radius
Search radius around each bucket.
- Returns
nbrs: sklearn NearestNeighbors object.
graph_out: A sparse matrix of the spatial graph.
- Return type
Tuple[sklearn.neighbors.NearestNeighbors, scipy.sparse.csr_matrix]
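The two neighborhood definitions (fixed neighbor count versus fixed radius) can be illustrated directly with scikit-learn. This is a sketch on toy coordinates, assuming the function wraps `sklearn.neighbors.NearestNeighbors` as the parameter descriptions suggest; it does not call Spateo itself.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
locations = rng.uniform(0, 100, size=(50, 2))  # toy [n_samples, 2] coordinates

# Fit once; "ball_tree" and "kd_tree" are both valid sklearn algorithms.
nbrs = NearestNeighbors(algorithm="ball_tree").fit(locations)

# Fixed number of neighbors: each row stores distances to its 5 nearest buckets
# (the bucket itself is excluded because no query array is passed).
knn_graph = nbrs.kneighbors_graph(n_neighbors=5, mode="distance")

# Fixed radius: each row stores distances to every bucket within 20 units.
radius_graph = nbrs.radius_neighbors_graph(radius=20.0, mode="distance")

print(knn_graph.shape, radius_graph.nnz)
```

Both calls return sparse CSR matrices of shape [n_samples, n_samples]; the k-nearest-neighbor graph has the same number of entries per row, while the radius graph does not.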
- spateo.tools.find_neighbors.generate_spatial_weights_fixed_nbrs(adata: anndata.AnnData, spatial_key: str = 'spatial', num_neighbors: int = 10, method: str = 'ball_tree', decay_type: str = 'reciprocal', nbr_object: sklearn.neighbors.NearestNeighbors = None) → Union[Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]]
Starting from a k-nearest neighbor graph, generate a weighted nearest neighbor graph.
- Parameters
- spatial_key
Key in .obsm where x- and y-coordinates are stored.
- num_neighbors
Number of neighbors each bucket has.
- method
Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
- decay_type
Sets method by which edge weights are defined. Options: “reciprocal”, “ranked”, “uniform”.
- Returns
out_graph: Weighted k-nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.
- Return type
Union[Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]]
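To make the idea of distance-dependent edge weights concrete, the sketch below converts a sparse distance graph into weights for two of the documented ‘decay_type’ options. The formulas are assumptions based on the option names (“reciprocal” as 1/distance, “uniform” as a constant weight of 1), the helper `edge_weights` is hypothetical, and “ranked” is left out because its definition is not given here.

```python
import numpy as np
from scipy.sparse import csr_matrix


def edge_weights(distance_graph: csr_matrix, decay_type: str = "reciprocal") -> csr_matrix:
    """Hypothetical helper: turn stored distances into edge weights."""
    weights = distance_graph.copy()
    if decay_type == "reciprocal":
        weights.data = 1.0 / weights.data          # closer neighbors get larger weights
    elif decay_type == "uniform":
        weights.data = np.ones_like(weights.data)  # every edge counts equally
    else:
        raise ValueError(f"decay_type {decay_type!r} is not covered by this sketch")
    return weights


# Toy distance graph: bucket 0 is 2 units from bucket 1 and 4 units from bucket 2.
distance_graph = csr_matrix(np.array([[0.0, 2.0, 4.0],
                                      [2.0, 0.0, 0.0],
                                      [4.0, 0.0, 0.0]]))
reciprocal = edge_weights(distance_graph, "reciprocal")
uniform = edge_weights(distance_graph, "uniform")
print(reciprocal.toarray())
```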
- spateo.tools.find_neighbors.gaussian_weight_2d(distance: float, sigma: float) → float
Calculate the normalized Gaussian value for a given distance from a central point. Normalized by 2*pi*sigma-squared.
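As a minimal sketch of this quantity, assuming the standard isotropic 2D Gaussian density (the function name below is hypothetical and the body is not copied from Spateo):

```python
import math


def gaussian_weight_2d_sketch(distance: float, sigma: float) -> float:
    """Isotropic 2D Gaussian density at `distance` from the center."""
    # exp(-d^2 / (2 sigma^2)) normalized by 2 * pi * sigma^2
    return math.exp(-distance**2 / (2 * sigma**2)) / (2 * math.pi * sigma**2)


peak = gaussian_weight_2d_sketch(0.0, sigma=100.0)
one_sigma = gaussian_weight_2d_sketch(100.0, sigma=100.0)
print(one_sigma / peak)  # ratio relative to the peak at one standard deviation
```

The ratio to the peak depends only on distance/sigma, which is why a cutoff expressed as a fraction of the peak maps to a single radius for all buckets.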
- spateo.tools.find_neighbors.p_equiv_radius(p: float, sigma: float) → float
Find radius at which you eliminate fraction p of a radial Gaussian probability distribution with standard deviation sigma.
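One way to read this, assuming an unnormalized Gaussian profile exp(-r^2 / (2 sigma^2)): setting the profile equal to p and solving for r gives r = sigma * sqrt(-2 ln p). For a radial 2D Gaussian, the same r is also the radius outside which a fraction p of the probability mass lies. The sketch below uses a hypothetical function name and is not taken from the Spateo source.

```python
import math


def cutoff_radius(p: float, sigma: float) -> float:
    """Radius at which exp(-r^2 / (2 sigma^2)) has decayed to p (0 < p < 1)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return sigma * math.sqrt(-2.0 * math.log(p))


r = cutoff_radius(p=0.05, sigma=100.0)
print(r)  # roughly 245 in the same units as sigma
```

With the documented defaults (p = 0.05, sigma = 100), the search radius under this reading is about 2.45 standard deviations.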
- spateo.tools.find_neighbors.generate_spatial_weights_fixed_radius(adata: anndata.AnnData, spatial_key: str = 'spatial', p: float = 0.05, sigma: float = 100, nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', verbose: bool = False) → Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]
Starting from a radius-based neighbor graph, generate a sparse graph (csr format) with weighted edges, where edge weights decay with distance.
Note that decay is assumed to follow a Gaussian distribution.
- Parameters
- spatial_key
Key in .obsm where x- and y-coordinates are stored.
- p
Cutoff for Gaussian (used to find where distribution drops below p * (max_value)).
- sigma
Standard deviation of the Gaussian.
- method
Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
- Returns
out_graph: Weighted nearest neighbors graph with shape [n_samples, n_samples].
distance_graph: Unweighted graph with shape [n_samples, n_samples].
adata: Updated AnnData object containing ‘spatial_distances’, ‘spatial_weights’, ‘spatial_connectivities’ in .obsp and ‘spatial_neighbors’ in .uns.
- Return type
Tuple[scipy.sparse.csr_matrix, scipy.sparse.csr_matrix, anndata.AnnData]
- spateo.tools.find_neighbors.calculate_distance(position: numpy.ndarray, dist_metric: str = 'euclidean') → numpy.ndarray
Given an array of x- and y-coordinates, compute pairwise distances between all samples using the metric given by ‘dist_metric’ (Euclidean by default).
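A pairwise distance matrix of this kind can be sketched with SciPy. Whether Spateo computes it this way is an assumption; the metric names listed elsewhere on this page match those accepted by `scipy.spatial.distance.pdist`.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Toy x/y coordinates for three samples.
position = np.array([[0.0, 0.0],
                     [3.0, 4.0],
                     [6.0, 8.0]])

# pdist returns the condensed upper triangle; squareform expands it
# to a symmetric [n_samples, n_samples] matrix with a zero diagonal.
dist = squareform(pdist(position, metric="euclidean"))
print(dist)
```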
- spateo.tools.find_neighbors.construct_spatial_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) → anndata.AnnData
Given AnnData object and key to array of x- and y-coordinates, compute pairwise spatial distances between all samples.
- Parameters
- adata
An AnnData object.
- spatial_key
Key in .obsm in which x- and y-coordinates are stored.
- dist_metric
Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
- min_dist_threshold
Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.
- max_dist_threshold
Optional, used to remove clusters of isolated cells close to one another but far from all other cells.
- Returns
adata: Input AnnData object with spatial distance matrix in .obsp.
- Return type
anndata.AnnData
- spateo.tools.find_neighbors.construct_geodesic_distance_matrix(adata: anndata.AnnData, spatial_key: str = 'spatial', nbr_object: sklearn.neighbors.NearestNeighbors = None, method: str = 'ball_tree', n_neighbors: int = 30, min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None) → anndata.AnnData
Given an AnnData object and a key to an array of x- and y-coordinates, compute the geodesic distance between each sample and its nearest neighbors (geodesic distance is the shortest path between vertices, where paths are lines in space that connect points).
- Parameters
- adata
AnnData object.
- spatial_key
Key in .obsm in which x- and y-coordinates are stored.
- nbr_object
Optional sklearn.neighbors.NearestNeighbors object; pass a preconfigured instance to customize the neighbor search.
- method
Specifies algorithm to use in computing neighbors using sklearn’s implementation. Options: “ball_tree” and “kd_tree”.
- n_neighbors
For each bucket, number of neighbors to include in the distance matrix.
- min_dist_threshold
Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.
- max_dist_threshold
Optional, used to remove clusters of isolated cells close to one another but far from all other cells.
- Returns
adata: Input AnnData object with spatial distance matrix and geodesic distance matrix in .obsp.
- Return type
anndata.AnnData
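The notion of geodesic distance used here (shortest path through a neighbor graph) can be sketched with scikit-learn and SciPy. This is an illustration on toy coordinates under the assumption that paths follow k-nearest-neighbor edges; it is not the Spateo implementation.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import NearestNeighbors

# Toy coordinates: seven points spaced evenly along a unit semicircle.
angles = np.linspace(0.0, np.pi, 7)
coords = np.column_stack([np.cos(angles), np.sin(angles)])

# Distance-weighted k-nearest-neighbor graph (each point linked to its 2 nearest points).
nbrs = NearestNeighbors(algorithm="ball_tree").fit(coords)
knn_graph = nbrs.kneighbors_graph(n_neighbors=2, mode="distance")

# Geodesic distance = length of the shortest path through the graph (Dijkstra).
geodesic = shortest_path(knn_graph, method="D", directed=False)

straight = np.linalg.norm(coords[0] - coords[-1])  # Euclidean distance between the two ends
print(geodesic[0, -1], straight)
```

The two ends of the semicircle are close in a straight line but far apart along the chain of neighbors, so the geodesic distance between them exceeds the Euclidean one.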
- spateo.tools.find_neighbors.construct_binned_spatial_distance(adata: anndata.AnnData, bin_size: int = 1, coords_key: str = 'spatial', distance_method: str = 'spatial', min_dist_threshold: Optional[float] = None, max_dist_threshold: Optional[float] = None, distance_metric: Optional[str] = 'euclidean', n_neighbors: Optional[int] = 30)
Given AnnData object and key to array of x- and y-coordinates, first “collapse” the dataset by aggregating nearby cells together into bins, and then compute pairwise spatial distances between all samples.
- Parameters
- adata
AnnData object.
- bin_size
Shrinking factor to be applied to spatial coordinates; the size of this factor dictates the size of the regions that will be combined into one pseudo-cell (a larger factor generally puts more cells in each bin).
- coords_key
Key in .obsm in which spatial coordinates are stored.
- distance_method
Options: “spatial” and “geodesic”, indicating that pairwise spatial distance or pairwise geodesic distance should be computed, respectively.
- distance_metric
Optional, can be used to change the distance metric used when “distance_method” is “spatial”. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
- min_dist_threshold
Optional, sets the max allowable distance that a cell can be from its nearest neighbor to avoid being filtered out. Used to remove singular isolated cells.
- max_dist_threshold
Optional, used to remove clusters of isolated cells close to one another but far from all other cells.
- n_neighbors
For each bucket, number of neighbors to include in the distance matrix. Must be given if “distance_method” is “geodesic”.
- Returns
adata_binned: New AnnData object generated by the binning process.
M: Pairwise distance array.
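The binning step can be pictured as follows. The sketch assumes that "shrinking" means dividing coordinates by ‘bin_size’ and flooring, and that cells sharing a bin are summarized by their mean position; both are assumptions made for illustration, and all variable names are hypothetical.

```python
import numpy as np

coords = np.array([[0.2, 0.7],
                   [1.9, 1.1],
                   [2.5, 0.4],
                   [3.1, 3.9]])
bin_size = 2

# Shrink and floor the coordinates so nearby cells share an integer bin index.
bin_ids = np.floor(coords / bin_size).astype(int)

# Group cells by bin; `inverse` maps each cell to its row in `unique_bins`.
unique_bins, inverse, counts = np.unique(
    bin_ids, axis=0, return_inverse=True, return_counts=True
)
inverse = np.ravel(inverse)  # guard against shape differences across NumPy versions

# Mean position of the cells in each bin (one pseudo-cell per bin).
centroids = np.zeros((len(unique_bins), 2))
np.add.at(centroids, inverse, coords)
centroids /= counts[:, None]
print(unique_bins, counts)
```

Pairwise distances would then be computed between the pseudo-cells rather than between the original cells, which reduces the size of the distance matrix.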
- spateo.tools.find_neighbors.construct_nn_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', n_neighbors: int = 8, exclude_self: bool = True, save_id: Union[None, str] = None) → None
Constructing bucket-to-bucket nearest neighbors graph.
- Parameters
- adata
An AnnData object.
- spatial_key
Key in .obsm in which x- and y-coordinates are stored.
- dist_metric
Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’.
- n_neighbors
Number of nearest neighbors to compute for each bucket.
- exclude_self
Set True to set elements along the diagonal to zero.
- save_id
Optional string; if not None, will save distance matrix and neighbors matrix to ‘./neighbors/{save_id}_distance.csv’ and ‘./neighbors/{save_id}_neighbors.csv’, respectively.
- spateo.tools.find_neighbors.normalize_adj(adj: numpy.ndarray, exclude_self: bool = True) → numpy.ndarray
Symmetrically normalize adjacency matrix, set diagonal to 1 and return processed adjacency array.
- Parameters
- adj
Pairwise distance matrix of shape [n_samples, n_samples].
- exclude_self
Set True to set diagonal of adjacency matrix to 1.
- Returns
adj_proc: The normalized adjacency matrix.
- Return type
numpy.ndarray
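Symmetric normalization is commonly written as D^(-1/2) A D^(-1/2), where D is the diagonal matrix of row sums of A. The sketch below applies that formula and then sets the diagonal to 1, as the description states; it is an assumption that Spateo follows exactly this recipe, and the function name is hypothetical.

```python
import numpy as np


def normalize_adj_sketch(adj: np.ndarray) -> np.ndarray:
    """Compute D^(-1/2) A D^(-1/2), then set the diagonal to 1."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    with np.errstate(divide="ignore"):
        d_inv_sqrt = np.power(degree, -0.5)
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.0  # isolated nodes contribute nothing
    norm = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    np.fill_diagonal(norm, 1.0)
    return norm


# Toy graph: node 0 is connected to nodes 1 and 2.
adj = np.array([[0.0, 1.0, 1.0],
                [1.0, 0.0, 0.0],
                [1.0, 0.0, 0.0]])
adj_proc = normalize_adj_sketch(adj)
print(adj_proc)
```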