spateo.tools.CCI_effects_modeling¶
General and generalized linear modeling of spatial transcriptomics
Functions can be called either through the submodule (e.g. st.tl.CCI_effects_modeling.Niche_Model) or directly from Spateo's tools namespace (e.g. st.tl.Niche_Model).
Submodules¶
Classes¶
- Spatially weighted regression on spatial omics data with parallel processing. Runs after being called from the command line.
- Interpretation and downstream analysis of spatially weighted regression models.
- Various methods to select initial targets or predictors for intercellular analyses.
Functions¶
- Defines and returns MPI and argparse objects for model fitting and interpretation.
Package Contents¶
- class spateo.tools.CCI_effects_modeling.MuSIC(parser: argparse.ArgumentParser, args_list: List[str] | None = None, verbose: bool = True, save_subsampling: bool = True)[source]¶
Spatially weighted regression on spatial omics data with parallel processing. Runs after being called from the command line.
- Parameters:
- comm
MPI communicator object initialized with mpi4py, to control parallel processing operations
- parser
ArgumentParser object initialized with argparse, used to parse the command-line arguments pertinent to modeling.
- args_list
If parser is provided by function call, the arguments to parse must be provided as a separate list. It is recommended to use the return from define_spateo_argparse() for this.
- verbose
Set True to print updates to screen. Will be set False when initializing downstream analysis object, which inherits from this class but for which the information is generally not as useful.
- save_subsampling
Set True to save the subsampled data to a .json file. Defaults to True, recommended to set True for ease of access to the subsampling results.
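The relationship between parser and args_list can be sketched with the standard library alone. The flags below (--mod_type, --distr, --bw) are hypothetical stand-ins that mirror attribute names in this reference; the real parser would come from define_spateo_argparse(), whose flags are not listed here.

```python
import argparse

# Hypothetical parser: the flag names mirror attributes documented here,
# but they are assumptions, not Spateo's actual command-line interface.
parser = argparse.ArgumentParser(description="Illustrative modeling arguments")
parser.add_argument("--mod_type", type=str, default="niche")
parser.add_argument("--distr", type=str, default="gaussian")
parser.add_argument("--bw", type=float, default=None)

# When not running from the command line, the arguments are supplied as a
# separate list of strings, written as they would appear in a shell call.
args_list = ["--mod_type", "lr", "--distr", "poisson", "--bw", "12"]
args = parser.parse_args(args_list)

print(args.mod_type, args.distr, args.bw)
```

Passing the list explicitly keeps argparse from reading sys.argv, which is why a separate args_list is needed when the object is built inside a script or notebook.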
- mod_type¶
The type of model that will be employed; this dictates how the data will be processed and prepared. Options:
- “niche”: Spatially-aware, uses categorical cell type labels as independent variables.
- “lr”: Spatially-aware, essentially uses the combination of receptor expression in the “target” cell and spatially lagged ligand expression in the neighboring cells as independent variables.
- “ligand”: Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables.
- “receptor”: Uses receptor expression in the “target” cell as independent variables.
- adata_path¶
Path to the AnnData object from which to extract data for modeling
- csv_path¶
Can also be used to specify path to non-AnnData .csv object. Assumes the first three columns contain x- and y-coordinates and then dependent variable values, in that order, with all subsequent columns containing independent variable values.
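As a rough illustration of the expected .csv layout, the sketch below splits a small in-memory file into coordinates, dependent variable and independent variables. The column names and values are made up for the example, and the parsing shown is not Spateo's own loader.

```python
import csv
import io

# Hypothetical file contents: x, y, dependent variable, then predictors.
raw = io.StringIO(
    "x,y,target,feat_a,feat_b\n"
    "0.0,1.0,5,0.2,1.1\n"
    "2.0,3.0,7,0.4,0.9\n"
)
reader = csv.reader(raw)
header = next(reader)
rows = [[float(v) for v in row] for row in reader]

coords = [row[:2] for row in rows]  # columns 1-2: x- and y-coordinates
y = [row[2] for row in rows]        # column 3: dependent variable
X = [row[3:] for row in rows]       # remaining columns: independent variables
feature_names = header[3:]
```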
- normalize¶
Set True to perform library size normalization, to set total counts in each cell to the same number (adjust for cell size).
- smooth¶
Set True to correct for dropout effects by leveraging gene expression neighborhoods to smooth expression. It is advisable not to do this if performing Poisson or negative binomial regression.
- log_transform¶
Set True if log-transformation should be applied to expression. It is advisable not to do this if performing Poisson or negative binomial regression.
- normalize_signaling¶
Set True to min-max scale the final ligand expression array (for mod_type “ligand”), or the final ligand-receptor array (for mod_type “lr”). This is recommended to associate downstream expression with rarer/less prevalent signaling mechanisms.
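Min-max scaling can be sketched as follows. This assumes the scaling is applied per feature (column), which this reference does not state explicitly; the helper name and data are illustrative only.

```python
def minmax_scale_columns(rows):
    """Scale each column of a list-of-rows array to the [0, 1] range."""
    n_cols = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_cols)]
    highs = [max(r[j] for r in rows) for j in range(n_cols)]
    scaled = []
    for r in rows:
        scaled.append([
            # A constant column has no spread; map it to 0 to avoid dividing by zero.
            (r[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
            for j in range(n_cols)
        ])
    return scaled

# Two hypothetical signaling features on very different scales.
signal = [[100.0, 1.0], [300.0, 3.0], [500.0, 2.0]]
scaled = minmax_scale_columns(signal)
```

After scaling, an abundant and a rare signal both span the same range, so neither dominates the fit purely because of its magnitude.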
- target_expr_threshold¶
Only used if mod_type is “lr” or “ligand” and targets_path is not given. When targets are selected from the data rather than supplied, only genes expressed in more than this proportion of cells are kept, which filters to a smaller subset of interesting genes. Defaults to 0.2.
- multicollinear_threshold¶
Variance inflation factor threshold used to filter out multicollinear features. A value of 5 or 10 is recommended.
- custom_lig_path¶
Optional path to a .txt file containing a list of ligands for the model, separated by newlines. Only used if mod_type is “lr” or “ligand” (and thus uses ligand expression directly in the inference). If not provided, will select ligands using a threshold based on expression levels in the data.
- custom_ligands¶
Optional list of ligands for the model, can be used as an alternative to custom_lig_path. Only used if mod_type is “lr” or “ligand”.
- custom_rec_path¶
Optional path to a .txt file containing a list of receptors for the model, separated by newlines. Only used if mod_type is “lr” (and thus uses receptor expression directly in the inference). If not provided, will select receptors using a threshold based on expression levels in the data.
- custom_receptors¶
Optional list of receptors for the model, can be used as an alternative to custom_rec_path. Only used if mod_type is “lr”.
- custom_pathways_path¶
Rather than providing a list of receptors, can provide a list of signaling pathways; all receptors with annotations in this pathway will be included in the model. Only used if mod_type is “lr”.
- custom_pathways¶
Optional list of signaling pathways for the model, can be used as an alternative to custom_pathways_path. Only used if mod_type is “lr”.
- targets_path¶
Optional path to a .txt file containing a list of prediction target genes for the model, separated by newlines. If not provided, targets will be strategically selected from the given receptors.
- custom_targets¶
Optional list of prediction target genes for the model, can be used as an alternative to targets_path.
- init_betas_path¶
Optional path to a .json file or .csv file containing initial coefficient values for the model for each target variable. If encoded in .json, keys should be target gene names, values should be numpy arrays containing coefficients. If encoded in .csv, columns should be target gene names. Initial coefficients should have shape [n_features, ].
- cci_dir¶
Full path to the directory containing cell-cell communication databases
- species¶
Selects the cell-cell communication database the relevant ligands will be drawn from. Options: “human”, “mouse”.
- output_path¶
Full path name for the .csv file in which results will be saved
- coords_key¶
Key in .obsm of the AnnData object that contains the coordinates of the cells
- group_key¶
Key in .obs of the AnnData object that contains the category grouping for each cell
- group_subset¶
Subset of cell types to include in the model (provided as a whitespace-separated list in command line). If given, will consider only cells of these types in modeling. Defaults to all cell types.
- covariate_keys¶
Can be used to optionally provide any number of keys in .obs or .var containing a continuous covariate (e.g. expression of a particular TF, avg. distance from a perturbed cell, etc.)
- total_counts_key¶
Entry in adata.obs that contains total counts for each cell. Required if subsetting by total counts.
- total_counts_threshold¶
Threshold for total counts to subset cells by- cells with total counts greater than this threshold will be retained.
- bw¶
Used to provide previously obtained bandwidth for the spatial kernel. Consists of either a distance value or N for the number of nearest neighbors. Pass “np.inf” if all other points should have the same spatial weight.
- minbw¶
For use in automated bandwidth selection- the lower-bound bandwidth to test.
- maxbw¶
For use in automated bandwidth selection- the upper-bound bandwidth to test.
- distr¶
Distribution family for the dependent variable; one of “gaussian”, “poisson”, “nb”
- kernel¶
Type of kernel function used to weight observations; one of “bisquare”, “exponential”, “gaussian”, “quadratic”, “triangular” or “uniform”.
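For orientation, several of these kernels can be written in their common textbook forms as below. This is a sketch under that assumption: Spateo's exact scaling constants and cutoffs may differ, and “quadratic” is omitted because its definition varies between libraries.

```python
import math

def kernel_weight(distance, bw, kernel="bisquare"):
    """Weight for one observation at `distance` from the focal sample."""
    z = distance / bw
    if kernel == "gaussian":
        return math.exp(-0.5 * z ** 2)
    if kernel == "exponential":
        return math.exp(-z)
    if kernel == "bisquare":
        return (1.0 - z ** 2) ** 2 if z < 1.0 else 0.0
    if kernel == "triangular":
        return 1.0 - z if z < 1.0 else 0.0
    if kernel == "uniform":
        return 1.0 if z <= 1.0 else 0.0
    raise ValueError(f"unsupported kernel: {kernel}")

# Weights at half the bandwidth and at the bandwidth itself.
w_half = kernel_weight(5.0, 10.0, "bisquare")
w_edge = kernel_weight(10.0, 10.0, "bisquare")
```

Compactly supported kernels (bisquare, triangular, uniform) give zero weight beyond the bandwidth, whereas the gaussian and exponential kernels decay smoothly and never reach zero.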
- n_neighbors_membrane_bound¶
For mod_type “ligand” or “lr”, ligand expression will be taken from the neighboring cells; this defines the number of cells to use for membrane-bound ligands.
- n_neighbors_secreted¶
For mod_type “ligand” or “lr”, ligand expression will be taken from the neighboring cells; this defines the number of cells to use for secreted or ECM ligands.
- use_expression_neighbors¶
The default for finding spatial neighborhoods for the modeling process is to use neighbors in physical space. If this argument is provided, expression will instead be used to find neighbors.
- bw_fixed¶
Set True for distance-based kernel function and False for nearest neighbor-based kernel function
- exclude_self¶
If True, ignore each sample itself when computing the kernel density estimation
- fit_intercept¶
Set True to include intercept in the model and False to exclude intercept
- logger¶
- parser¶
- args_list¶
- verbose¶
- save_subsampling¶
- mod_type = None¶
- species = None¶
- ligands = None¶
- receptors = None¶
- targets = None¶
- normalize = None¶
- smooth = None¶
- log_transform = None¶
- target_expr_threshold = None¶
- coords = None¶
- groups = None¶
- y = None¶
- X = None¶
- bw = None¶
- minbw = None¶
- maxbw = None¶
- distr = None¶
- kernel = None¶
- n_samples = None¶
- n_features = None¶
- set_up = False¶
- X_df = None¶
- adata = None¶
- cell_categories = None¶
- clip = None¶
- cof_db = None¶
- ct_vec = None¶
- feature_distance = None¶
- feature_names = None¶
- grn = None¶
- ligands_expr = None¶
- ligands_expr_nonlag = None¶
- lr_db = None¶
- lr_pairs = None¶
- n_samples_subsampled = None¶
- n_samples_subset = None¶
- neighboring_unsampled = None¶
- optimal_bw = None¶
- r_tf_db = None¶
- receptors_expr = None¶
- sample_names = None¶
- subsampled = None¶
- subsampled_sample_names = None¶
- subset = None¶
- subset_indices = None¶
- subset_sample_names = None¶
- targets_expr = None¶
- tf_tf_db = None¶
- x_chunk = None¶
- load_and_process(upstream: bool = False)[source]¶
Load AnnData object and process it for modeling.
- Parameters:
- upstream
Set False if performing the actual model fitting process, True to define only the AnnData object for upstream purposes.
- downstream
Set True if setting up a downstream model- in this case, ligand/receptor preprocessing will be skipped.
- setup_downstream(adata: anndata.AnnData | None = None)[source]¶
Setup for downstream tasks- namely, models for inferring signaling-associated differential expression.
- define_sig_inputs(adata: anndata.AnnData | None = None, recompute: bool = False)[source]¶
For signaling-relevant models, define necessary quantities that will later be used to define the independent variable array- the one-hot cell-type array, the ligand expression array and the receptor expression array.
- Parameters:
- recompute
Re-calculate all quantities and re-save even if already-existing file can be found in path
- run_subsample(verbose: bool = True, y: pandas.DataFrame | None = None)[source]¶
To reduce the computational cost of the regression, subsampling is performed when there are >= 5000 cells or when specific cell types are manually selected for fitting. The local fit is then performed only on this subset, under the assumption that the discovered signals will not differ significantly for the subsampled data.
- New Attributes:
- subsampled_indices: Dictionary containing indices of the subsampled cells for each dependent variable
- n_samples_subsampled: Dictionary containing number of samples to be fit (not total number of samples) for each dependent variable
- subsampled_sample_names: Dictionary containing lists of names of the subsampled cells for each dependent variable
- neighboring_unsampled: Dictionary containing a mapping between each unsampled point and the closest sampled point
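The idea behind neighboring_unsampled can be sketched with a brute-force nearest-neighbor search. The function name and the flat index-to-index dictionary are assumptions for illustration; the class stores one such mapping per dependent variable and may use a different search structure.

```python
import math

def map_unsampled_to_sampled(coords, sampled_indices):
    """Return {unsampled index: index of the closest sampled point}."""
    sampled = set(sampled_indices)
    mapping = {}
    for i, point in enumerate(coords):
        if i in sampled:
            continue
        mapping[i] = min(
            sampled_indices,
            key=lambda j: math.dist(point, coords[j]),
        )
    return mapping

coords = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 4.9), (0.0, 0.3)]
sampled_indices = [0, 2]
neighboring_unsampled = map_unsampled_to_sampled(coords, sampled_indices)
```

With such a mapping, results fit at a sampled point can later be copied to the unsampled points assigned to it.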
- map_new_cells()[source]¶
There may be instances where new cells are added to an AnnData object that has already been fit to- in this instance, accelerate the process by using neighboring results to project model fit to the new cells.
- _set_search_range()[source]¶
Set the search range for the bandwidth selection procedure.
- Parameters:
- y
Array of dependent variable values, used to determine the search range for the bandwidth selection
- _compute_all_wi(bw: float | int, bw_fixed: bool | None = None, exclude_self: bool | None = None, kernel: str | None = None, verbose: bool = False) scipy.sparse.spmatrix [source]¶
Compute spatial weights for all samples in the dataset given a specified bandwidth.
- Parameters:
- bw
Bandwidth for the spatial kernel
- bw_fixed
Whether the bandwidth considers a uniform distance for each sample (True) or a nonconstant distance for each sample that depends on the number of neighbors (False). If not given, will default to self.bw_fixed.
- exclude_self
Whether to exclude each sample itself from its own nearest neighbors. If not given, will default to self.exclude_self.
- kernel
Kernel to use for the spatial weights. If not given, will default to self.kernel.
- verbose
Whether to display messages during runtime
- Returns:
wi: Array of weights for all samples in the dataset
- Return type:
scipy.sparse.spmatrix
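The difference between a fixed and a neighbor-based bandwidth can be sketched for a single focal sample. This is a simplified stand-in, not the method itself: it uses a Gaussian kernel, returns a plain list rather than a sparse matrix, and assumes the adaptive radius is the distance to the bw-th nearest neighbor.

```python
import math

def spatial_weights(coords, i, bw, bw_fixed=True, exclude_self=False):
    """Gaussian-kernel weights of all samples relative to sample i."""
    dists = [math.dist(coords[i], c) for c in coords]
    if bw_fixed:
        radius = float(bw)  # the same distance is used for every sample
    else:
        # Adaptive: the radius is the distance to the bw-th nearest neighbor.
        others = sorted(d for j, d in enumerate(dists) if j != i)
        radius = others[int(bw) - 1]
    weights = [math.exp(-0.5 * (d / radius) ** 2) for d in dists]
    if exclude_self:
        weights[i] = 0.0  # the focal sample does not contribute to its own fit
    return weights

coords = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
w_fixed = spatial_weights(coords, 0, bw=2.0)
w_adaptive = spatial_weights(coords, 0, bw=2, bw_fixed=False, exclude_self=True)
```

In dense regions an adaptive bandwidth shrinks the radius and in sparse regions it widens it, so each local fit draws on a comparable number of neighbors.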
- local_fit(i: int, y: numpy.ndarray, X: numpy.ndarray, bw: float | int, y_label: str, coords: numpy.ndarray | None = None, mask_indices: numpy.ndarray | None = None, feature_mask: numpy.ndarray | None = None, final: bool = False, fit_predictor: bool = False) numpy.ndarray | List[float] [source]¶
Fit a local regression model for each sample.
- Parameters:
- i
Index of sample for which local regression model is to be fitted
- y
Response variable
- X
Independent variable array
- bw
Bandwidth for the spatial kernel
- y_label
Name of the response variable
- coords
Can be optionally used to provide coordinates for samples- used if subsampling was performed to maintain all original sample coordinates (to take original neighborhoods into account)
- mask_indices
Can be optionally used to provide indices of samples to mask out of the dataset
- feature_mask
Can be optionally used to provide a mask for features to mask out of the dataset
- final
Set True to indicate that no additional parameter selection needs to be performed; the model can be fit and more stats can be returned.
- fit_predictor
Set True to indicate that dependent variable to fit is a linear predictor rather than a true response variable
- Returns:
A single output will be given for each case, and can contain either betas or a list with combinations of the following:
- i: Index of sample for which local regression model was fitted
- diagnostic: Portion of the output to be used for diagnostic purposes. For Gaussian regression, this is the residual for the fitted response variable value compared to the observed value. For non-Gaussian generalized linear regression, this is the fitted response variable value (which will be used to compute deviance and log-likelihood later on).
- hat_i: Row i of the hat matrix, which is the effect of deleting sample i from the dataset on the estimated predicted value for sample i
- bw_diagnostic: Output to be used for diagnostic purposes during bandwidth selection. For Gaussian regression, this is the squared residual; for non-Gaussian generalized linear regression, this is the fitted response variable value. One of the returns if final is False.
- betas: Estimated coefficients for sample i
- leverages: Leverages for sample i, representing the influence of each independent variable on the predicted values (linear predictor for GLMs, response variable for Gaussian regression).
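For the Gaussian case, a local fit amounts to weighted least squares with the spatial weights of the focal sample. The sketch below covers only an intercept plus one predictor in closed form; the helper name is hypothetical, and the actual method handles multiple predictors and non-Gaussian families.

```python
def local_wls(x, y, w):
    """Weighted least squares for y ~ b0 + b1 * x with observation weights w."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]  # exactly y = 1 + 2x
w = [1.0, 0.8, 0.3, 0.0]  # spatial weights relative to the focal sample (index 0)
b0, b1 = local_wls(x, y, w)
residual_0 = y[0] - (b0 + b1 * x[0])  # Gaussian "diagnostic" for the focal sample
```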
- find_optimal_bw(range_lowest: float, range_highest: float, function: Callable) float [source]¶
Perform golden section search to find the optimal bandwidth.
- Parameters:
- range_lowest
Lower bound of the search range
- range_highest
Upper bound of the search range
- function
Function to be minimized
- Returns:
bw: Optimal bandwidth
- Return type:
float
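A generic golden section search looks roughly like the following. The tolerance argument and the toy objective are assumptions for the example; within the class, the function to minimize would be a model score such as AICc evaluated at a candidate bandwidth.

```python
import math

def golden_section_search(range_lowest, range_highest, function, tol=1e-6):
    """Minimize a unimodal function on [range_lowest, range_highest]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1/phi, about 0.618
    a, b = range_lowest, range_highest
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = function(c), function(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = function(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = function(d)
    return (a + b) / 2.0

# Toy objective with its minimum at a bandwidth of 12.
optimal_bw = golden_section_search(1.0, 50.0, lambda bw: (bw - 12.0) ** 2 + 3.0)
```

Each iteration reuses one of the two previous evaluations, so only one new model fit is needed per step, which matters when every evaluation is a full round of local regressions.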
- mpi_fit(y: numpy.ndarray | None, X: numpy.ndarray | None, X_labels: List[str], y_label: str, bw: float | int, coords: numpy.ndarray | None = None, mask_indices: numpy.ndarray | None = None, feature_mask: numpy.ndarray | None = None, final: bool = False, fit_predictor: bool = False) None [source]¶
Fit local regression model for each sample in parallel, given a specified bandwidth.
- Parameters:
- y
Response variable
- X
Independent variable array; if not given, will default to the X attribute. Note that if the object was initialized using an AnnData object, this will be overridden with the X attribute even if a different array is given.
- X_labels
Optional list of labels for the features in the X array. Needed if the X passed to the function is not identical to the independent variable array compiled in preprocessing.
- y_label
Used to provide a unique ID for the dependent variable for saving purposes and to query keys from various dictionaries
- bw
Bandwidth for the spatial kernel
- coords
Coordinates of each point in the X array
- mask_indices
Optional array used to mask out indices in the fitting process
- feature_mask
Optional array used to mask out features in the fitting process
- final
Set True to indicate that no additional parameter selection needs to be performed; the model can be fit and more stats can be returned.
- fit_predictor
Set True to indicate that dependent variable to fit is a linear predictor rather than a true response variable
- fit(y: pandas.DataFrame | None = None, X: numpy.ndarray | None = None, fit_predictor: bool = False, verbose: bool = True) Tuple[None | Dict[str, numpy.ndarray], Dict[str, float]] | None [source]¶
For each column of the dependent variable array, fit a model. If a bandwidth is given, run mpi_fit() with that bandwidth. Otherwise, compute the optimal bandwidth using find_optimal_bw(), minimizing AICc.
- Parameters:
- y
Optional dataframe, can be used to provide the dependent variable array directly to the fit function. If None, will use the targets_expr attribute computed from the given AnnData object (each individual column will serve as a dependent variable). Must be given as a dataframe so that the columns are labeled and each result can be associated with a labeled dependent variable.
- X
Optional array, can be used to provide the independent variable array directly to the fit function. If None, will use the X attribute computed using the given AnnData object and the type of the model to create.
- n_feat
Optional int, can be used to specify one column of the X array to fit to.
- init_betas
Optional dictionary containing arrays with initial values for the coefficients. Keys should correspond to target genes and values should be arrays of shape [n_features, 1].
- fit_predictor
Set True to indicate that dependent variable to fit is a linear predictor rather than a response variable
- verbose
Set True to print out information about the bandwidth selection and/or fitting process.
- predict(input: pandas.DataFrame | None = None, coeffs: numpy.ndarray | Dict[str, pandas.DataFrame] | None = None, adjust_for_subsampling: bool = False) pandas.DataFrame [source]¶
Given input data and learned coefficients, predict the dependent variables.
- Parameters:
- input
Input data to be predicted on.
- coeffs
Coefficients to be used in the prediction. If None, will attempt to load the coefficients learned in the fitting process from file.
- compute_aicc_linear(RSS: float, trace_hat: float, n_samples: int | None = None) float [source]¶
Compute the corrected Akaike Information Criterion (AICc) for the linear GWR model.
- compute_aicc_glm(ll: float, trace_hat: float, n_samples: int | None = None) float [source]¶
Compute the corrected Akaike Information Criterion (AICc) for the generalized linear GWR models. Given by: $\mathrm{AICc} = -2\ell + 2k + \frac{2k(k+1)}{n_{\mathrm{eff}} - k - 1}$, where $\ell$ is the log-likelihood.
- Parameters:
- ll
Model log-likelihood
- trace_hat
Trace of the hat matrix
- n_samples
Number of samples model was fitted to
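The formula can be evaluated directly, as in this sketch. It assumes that k corresponds to trace_hat (the effective number of parameters) and that n_eff corresponds to n_samples; this reference does not spell out either correspondence.

```python
def aicc_glm(ll, trace_hat, n_samples):
    """AICc = -2*ll + 2k + 2k(k+1)/(n - k - 1), with k taken as trace_hat."""
    k = trace_hat  # assumption: effective number of parameters
    return -2.0 * ll + 2.0 * k + (2.0 * k * (k + 1.0)) / (n_samples - k - 1.0)

# -2*(-100) + 2*4 + (2*4*5)/(50 - 4 - 1) = 208 + 40/45
score = aicc_glm(ll=-100.0, trace_hat=4.0, n_samples=50)
```

The last term grows quickly as k approaches the number of samples, which penalizes very small bandwidths that would otherwise overfit locally.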
- output_diagnostics(aicc: float | None = None, ENP: float | None = None, r_squared: float | None = None, deviance: float | None = None, y_label: str | None = None) None [source]¶
Output diagnostic information about the GWR model.
- save_results(data: numpy.ndarray, header: str, label: str | None) None [source]¶
Save the results of the GWR model to file, and return the coefficients.
- Parameters:
- data
Elements of data to save to .csv
- header
Column names
- label
Optional, can be used to provide unique ID to save file- notably used when multiple dependent variables with different names are fit during this process.
- Returns:
betas: Model coefficients
- predict_and_save(input: numpy.ndarray | None = None, coeffs: numpy.ndarray | Dict[str, pandas.DataFrame] | None = None, adjust_for_subsampling: bool = True)[source]¶
Given input data and learned coefficients, predict the dependent variables and then save the output.
- Parameters:
- input
Input data to be predicted on.
- coeffs
Coefficients to be used in the prediction. If None, will attempt to load the coefficients learned in the fitting process from file.
- adjust_for_subsampling
Set True if subsampling was performed; this indicates that the coefficients for the subsampled points need to be extended to the neighboring non-sampled points.
- return_outputs(adjust_for_subsampling: bool = True, load_for_interpreter: bool = False, load_from_downstream: Literal['ligand', 'receptor', 'target_gene'] | None = None) Tuple[Dict[str, pandas.DataFrame], Dict[str, pandas.DataFrame]] [source]¶
Return final coefficients for all fitted models.
- Parameters:
- adjust_for_subsampling
Set True if subsampling was performed; this indicates that the coefficients for the subsampled points need to be extended to the neighboring non-sampled points.
- load_for_interpreter
Set True if this is being called from within an instance of MuSIC_Interpreter.
- load_from_downstream
Set to “ligand”, “receptor”, or “target_gene” to load coefficients from downstream models where targets are ligands, receptors or target genes. Must be given if “load_downstream” is True.
- Outputs:
all_coeffs: Dictionary containing dataframe consisting of coefficients for each target gene
all_se: Dictionary containing dataframe consisting of standard errors for each target gene
- return_intercepts() None | numpy.ndarray | Dict[str, numpy.ndarray] [source]¶
Return final intercepts for all fitted models.
- class spateo.tools.CCI_effects_modeling.MuSIC_Interpreter(parser: argparse.ArgumentParser, args_list: List[str] | None = None, keep_column_threshold_proportion_cells: float | None = None)[source]¶
Bases:
spateo.tools.CCI_effects_modeling.MuSIC.MuSIC
Interpretation and downstream analysis of spatially weighted regression models.
- Parameters:
- parser
ArgumentParser object initialized with argparse, used to parse the command-line arguments pertinent to modeling.
- args_list
If parser is provided by function call, the arguments to parse must be provided as a separate list. It is recommended to use the return from define_spateo_argparse() for this.
- keep_column_threshold_proportion_cells
If provided, will threshold columns to only keep those that are nonzero in a proportion of cells greater than this threshold. For example, if this is set to 0.5, more than half of the cells must have a nonzero value for a given column for it to be retained for further inspection. Intended to be used to filter out likely false positives.
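The thresholding rule can be sketched on a plain dictionary of coefficient columns. The helper name and the data layout are assumptions for illustration; the class operates on its own coefficient dataframes.

```python
def keep_columns(coeffs, threshold):
    """Names of columns that are nonzero in more than `threshold` of the cells."""
    kept = []
    for name, values in coeffs.items():
        prop_nonzero = sum(1 for v in values if v != 0) / len(values)
        if prop_nonzero > threshold:
            kept.append(name)
    return kept

coeffs = {
    "Igf1:Igf1r": [0.4, 0.0, 0.2, 0.1],  # nonzero in 3 of 4 cells
    "Fgf1": [0.0, 0.0, 0.3, 0.0],        # nonzero in 1 of 4 cells
}
retained = keep_columns(coeffs, threshold=0.5)
```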
- k¶
- n_cells_expressing_targets¶
- downstream_parent_dir¶
- id¶
- dm_dir¶
- downstream_model_ligand_design_matrix¶
- downstream_model_receptor_design_matrix¶
- downstream_model_target_design_matrix¶
- design_matrix¶
- parent_dir¶
- filter_targets¶
- filter_target_threshold¶
- ligand_for_downstream¶
- receptor_for_downstream¶
- pathway_for_downstream¶
- target_for_downstream¶
- sender_ct_for_downstream¶
- receiver_ct_for_downstream¶
- cci_degs_model_interactions¶
- no_cell_type_markers¶
- compute_pathway_effect¶
- diff_sending_or_receiving¶
- compute_coeff_significance(method: str = 'fdr_bh', significance_threshold: float = 0.05)[source]¶
Computes local statistical significance for fitted coefficients.
- Parameters:
- method
Method to use for correction. Available methods can be found in the documentation for statsmodels.stats.multitest.multipletests(), and are also listed below (in correct case) for convenience:
- Named methods: bonferroni, sidak, holm-sidak, holm, simes-hochberg, hommel
- Abbreviated methods:
fdr_bh: Benjamini-Hochberg correction
fdr_by: Benjamini-Yekutieli correction
fdr_tsbh: Two-stage Benjamini-Hochberg
fdr_tsbky: Two-stage Benjamini-Krieger-Yekutieli method
- significance_threshold
p-value (or q-value) needed to call a parameter significant.
- Returns:
is_significant: Dataframe of identical shape to coeffs, where each element is True or False if it meets the threshold for significance
pvalues: Dataframe of identical shape to coeffs, where each element is a p-value for that instance of that feature
qvalues: Dataframe of identical shape to coeffs, where each element is a q-value for that instance of that feature
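To make the relationship between p-values, q-values and the significance call concrete, here is a plain-Python sketch of the Benjamini-Hochberg adjustment that the default fdr_bh method names. It is not the statsmodels implementation, and comparing with a strict "less than" is an assumption.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted q-values in the original order of `pvalues`."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues

pvalues = [0.01, 0.04, 0.03, 0.005]
qvalues = benjamini_hochberg(pvalues)
is_significant = [q < 0.05 for q in qvalues]
```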
- filter_adata_spatial(instructions: List[str])[source]¶
Based on spatial coordinates, filter the adata object to only include cells that meet the criteria. Criteria provided in the form of a list of instructions of the form “x less than 0.5 and y greater than 0.5”, etc., where each instruction is executed sequentially.
- Parameters:
- instructions
List of instructions to filter adata object by. Each instruction is a string of the form “x less than 0.5 and y greater than 0.5”, etc., where each instruction is executed sequentially.
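One way such an instruction string could be turned into a filter is sketched below. Only the “less than” / “greater than” phrasing from the example above is handled; any further grammar, and the helper itself, are assumptions rather than the method's actual parser.

```python
import operator

OPS = {"less than": operator.lt, "greater than": operator.gt}
AXES = {"x": 0, "y": 1, "z": 2}

def parse_instruction(instruction):
    """Turn 'x less than 0.5 and y greater than 0.5' into a predicate on coords."""
    conditions = []
    for clause in instruction.split(" and "):
        axis, rest = clause.strip().split(" ", 1)
        for phrase, op in OPS.items():
            if rest.startswith(phrase):
                value = float(rest[len(phrase):])
                conditions.append((AXES[axis], op, value))
                break
        else:
            raise ValueError(f"could not parse clause: {clause!r}")
    return lambda coord: all(op(coord[i], v) for i, op, v in conditions)

coords = [(0.2, 0.9), (0.7, 0.9), (0.2, 0.1)]
keep = parse_instruction("x less than 0.5 and y greater than 0.5")
mask = [keep(c) for c in coords]
```

Executing several instructions sequentially then corresponds to applying each resulting mask to the cells that survived the previous one.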
- filter_adata_custom(cell_ids: List[str])[source]¶
Filter AnnData object to only the cells specified by the custom list.
- Parameters:
- cell_ids
List of cell IDs to keep. Each ID must be found in adata.obs_names
- add_interaction_effect_to_adata(targets: str | List[str], interactions: str | List[str], visualize: bool = False) anndata.AnnData [source]¶
For each specified interaction/list of interactions, add the predicted interaction effect to the adata object.
- Parameters:
- targets
Target(s) to add interaction effect for. Can be a single target or a list of targets.
- interactions
Interaction(s) to add interaction effect for. Can be a single interaction or a list of interactions. Should be the name of a gene for ligand models, or an L:R pair for L:R models (for example, “Igf1:Igf1r”).
- visualize
Whether to visualize the interaction effect for each target/interaction pair. If True, will generate spatial scatter plot and save to HTML file.
- Returns:
adata: AnnData object with interaction effects added to .obs.
- Return type:
anndata.AnnData
- compute_and_visualize_diagnostics(type: Literal['correlations', 'confusion', 'rmse'], n_genes_per_plot: int = 20)[source]¶
For true and predicted gene expression, compute and visualize one of: confusion matrices, correlations (Pearson and Spearman), or root-mean-squared error (RMSE).
- Parameters:
- type
Type of diagnostic to compute and visualize. Options: “correlations” for Pearson & Spearman correlation, “confusion” for confusion matrix, “rmse” for root mean-squared-error.
- n_genes_per_plot
Only used if “type” is “confusion”. Number of genes to plot per figure. If there are more than this number of genes, multiple figures will be generated.
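Two of these diagnostics are simple enough to sketch in plain Python for a single gene. The helper names and values are illustrative; the method itself computes them across genes and produces figures.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def rmse(a, b):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # constant offset of 0.5 from the observed values
r = pearson(observed, predicted)
err = rmse(observed, predicted)
```

The example shows why both views are useful: a constant offset leaves the correlation at 1 while the RMSE still reports the 0.5 error.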
- plot_interaction_effect_3D(target: str, interaction: str, save_path: str, pcutoff: float | None = 99.7, min_value: float | None = 0, zero_opacity: float = 1.0, size: float = 2.0, n_neighbors_smooth: int | None = 0)[source]¶
Quick-visualize the magnitude of the predicted effect on target for a given interaction.
- Parameters:
- target
Target gene to visualize
- interaction
Interaction to visualize (e.g. “Igf1:Igf1r” for L:R model, “Igf1” for ligand model)
- save_path
Path to save the figure to (will save as HTML file)
- pcutoff
Percentile cutoff for the colorbar. Will set all values above this percentile to this value.
- min_value
Minimum value to set the colorbar to. Will set all values below this value to this value. Defaults to 0.
- zero_opacity
Opacity of points with zero expression. Between 0.0 and 1.0. Default is 1.0.
- size
Size of the points in the scatter plot. Default is 2.
- n_neighbors_smooth
Number of neighbors to use for smoothing (to make effect patterns more apparent). If 0, no smoothing is applied. Default is 0.
- plot_multiple_interaction_effects_3D(effects: List[str], save_path: str, include_combos_of_two: bool = False)[source]¶
Quick-visualize the magnitude of the predicted effect on target for a given interaction.
- Parameters:
- effects
List of effects to visualize (e.g. [“Igf1:Igf1r”, “Igf1:InsR”] for L:R model, [“Igf1”] for ligand model)
- save_path
Path to save the figure to (will save as HTML file)
- include_combos_of_two
Whether to include paired combinations of effects (e.g. “Igf1:Igf1r and Igf1:InsR”) as separate categories. If False, will include these in the generic “Multiple interactions” category.
- plot_tf_effect_3D(target: str, tf: str, save_path: str, ligand_targets: bool = True, receptor_targets: bool = False, target_gene_targets: bool = False, pcutoff: float = 99.7, min_value: float = 0, zero_opacity: float = 1.0, size: float = 2.0)[source]¶
Quick-visualize the magnitude of the predicted effect on target for a given TF. Can only find the files necessary for this if CCI_deg_detection() has been run.
- Parameters:
- target
Target gene of interest
- tf
TF of interest (e.g. “Foxo1”)
- save_path
Path to save the figure to (will save as HTML file)
- ligand_targets
Set True if ligands were used as the target genes for the CCI_deg_detection() model.
- receptor_targets
Set True if receptors were used as the target genes for the CCI_deg_detection() model.
- target_gene_targets
Set True if target genes were used as the target genes for the CCI_deg_detection() model.
- pcutoff
Percentile cutoff for the colorbar. Will set all values above this percentile to this value.
- min_value
Minimum value to set the colorbar to. Will set all values below this value to this value.
- zero_opacity
Opacity of points with zero expression. Between 0.0 and 1.0. Default is 1.0.
- size
Size of the points in the scatter plot. Default is 2.
- visualize_overlap_between_interacting_components_3D(target: str, interaction: str, save_path: str, size: float = 2.0)[source]¶
Visualize the spatial distribution of signaling features (ligand, receptor, or L:R field) and target gene, as well as the overlapping region. Intended for use with 3D spatial coordinates.
- Parameters:
- target
Target gene to visualize
- interaction
Interaction to visualize (e.g. “Igf1:Igf1r” for L:R model, “Igf1” for ligand model)
- save_path
Path to save the figure to (will save as HTML file)
- size
Size of the points in the plot. Defaults to 2.
- gene_expression_heatmap(use_ligands: bool = False, use_receptors: bool = False, use_target_genes: bool = False, genes: List[str] | None = None, position_key: str = 'spatial', coord_column: int | str | None = None, reprocess: bool = False, neatly_arrange_y: bool = True, window_size: int = 3, recompute: bool = False, title: str | None = None, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'magma', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the distribution of gene expression across cells in the spatial coordinates of cells; provides an idea of the simultaneous relative positions/patternings of different genes.
- Parameters:
- use_ligands
Set True to use ligands as the genes to visualize. If True, will ignore “genes” argument. “ligands_expr” file must be present in the model’s directory.
- use_receptors
Set True to use receptors as the genes to visualize. If True, will ignore “genes” argument. “receptors_expr” file must be present in the model’s directory.
- use_target_genes
Set True to use target genes as the genes to visualize. If True, will ignore “genes” argument. “targets” file must be present in the model’s directory.
- genes
Optional list of genes to visualize. If “use_ligands”, “use_receptors”, and “use_target_genes” are all False, this must be given. This can also be used to visualize only a subset of the genes once processing & saving has already completed using e.g. “use_ligands”, “use_receptors”, etc.
- position_key
Key in adata.obs or adata.obsm that provides a relative indication of the position of cells, i.e. spatial coordinates. Defaults to “spatial”. For each value in the position array (each coordinate, each category), multiple cells must have the same value.
- coord_column
Optional, only used if “position_key” points to an entry in .obsm. In this case, this is the index or name of the column to be used to provide the positional context. Can also provide “xy”, “yz”, “xz”, “-xy”, “-yz”, “-xz” to draw a line between the two coordinate axes. “xy” will extend the new axis in the direction of increasing x and increasing y starting from x=0 and y=0 (or min. x/min. y), “-xy” will extend the new axis in the direction of decreasing x and increasing y starting from x=minimum x and y=maximum y, and so on.
- reprocess
Set to True to reprocess the data and overwrite the existing files. Use if the genes to visualize have changed compared to the saved file (if existing), e.g. if “use_ligands” is True when the initial analysis used “use_target_genes”.
- neatly_arrange_y
Set True to order the y-axis by how early along the position axis the maximum z-score of each row occurs. Produces a more uniform plot in which similarly patterned interaction-target pairs are grouped together. If False, will sort this axis by the identity of the interaction (i.e. all “Fgf1” rows will be grouped together).
- window_size
Size of window to use for smoothing. Must be an odd integer. If 1, no smoothing is applied.
- recompute
Set to True to recompute the data and overwrite the existing files
- title
Optional, can be used to provide title for plot
- fontsize
Size of font for x and y labels.
- figsize
Size of figure.
- cmap
Colormap to use. Options: Any divergent matplotlib colormap.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
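The constraint on “window_size” (an odd integer, with 1 disabling smoothing) can be illustrated with a centered moving average. This is only a sketch: the documentation does not state which smoothing kernel is applied, and `smooth_row` is a hypothetical helper, not part of Spateo.

```python
from typing import List


def smooth_row(values: List[float], window_size: int = 3) -> List[float]:
    """Centered moving average; windows are truncated at the edges.

    Hypothetical helper for illustration only (not a Spateo function).
    """
    if window_size < 1 or window_size % 2 == 0:
        raise ValueError("window_size must be a positive odd integer")
    half = window_size // 2
    smoothed = []
    for i in range(len(values)):
        # Clip the window so it never runs past either end of the row.
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


row = [0.0, 3.0, 0.0, 3.0, 0.0]  # toy values, not real data
unsmoothed = smooth_row(row, window_size=1)  # a window of 1 leaves the row unchanged
smoothed = smooth_row(row, window_size=3)
```

An odd window keeps the average centered on the position being smoothed, which is one plausible reason for the restriction.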
- effect_distribution_heatmap(target_subset: List[str] | None = None, interaction_subset: List[str] | None = None, position_key: str = 'spatial', coord_column: int | str | None = None, effect_threshold: float | None = None, check_downstream_ligand_effects: bool = False, check_downstream_receptor_effects: bool = False, check_downstream_target_effects: bool = False, use_significant: bool = False, sort_by_target: bool = False, neatly_arrange_y: bool = True, window_size: int = 3, recompute: bool = False, title: str | None = None, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'magma', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the distribution of interaction effects across cells in the spatial coordinates of cells; provides an idea of the simultaneous relative positions of different interaction effects.
- Parameters:
- target_subset
List of targets to consider. If None, will use all targets used in model fitting.
- interaction_subset
List of interactions to consider. If None, will use all interactions used in model.
- position_key
Key in adata.obs or adata.obsm that provides a relative indication of the position of cells, i.e. spatial coordinates. Defaults to “spatial”. For each value in the position array (each coordinate, each category), multiple cells must have the same value.
- coord_column
Optional, only used if “position_key” points to an entry in .obsm. In this case, this is the index or name of the column to be used to provide the positional context. Can also provide “xy”, “yz”, “xz”, “-xy”, “-yz”, “-xz” to draw a line between the two coordinate axes. “xy” will extend the new axis in the direction of increasing x and increasing y starting from x=0 and y=0 (or min. x/min. y), “-xy” will extend the new axis in the direction of decreasing x and increasing y starting from x=minimum x and y=maximum y, and so on.
- effect_threshold
Optional threshold minimum effect size to consider an effect for further analysis, as an absolute value. Use this to choose only the cells for which an interaction is predicted to have a strong effect. If None, use the median interaction effect.
- check_downstream_ligand_effects
Set True to check the coefficients of downstream ligand models instead of coefficients of the upstream CCI model. Note that this may not necessarily look nice because TF-target relationships are not spatially dependent like L:R effects are.
- check_downstream_receptor_effects
Set True to check the coefficients of downstream receptor models instead of coefficients of the upstream CCI model. Note that this may not necessarily look nice because TF-target relationships are not spatially dependent like L:R effects are.
- check_downstream_target_effects
Set True to check the coefficients of downstream target models instead of coefficients of the upstream CCI model. Note that this may not necessarily look nice because TF-target relationships are not spatially dependent like L:R effects are.
- use_significant
Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if compute_coeff_significance() has been run.
- sort_by_target
Set True to order the y-axis in terms of the identity of the target gene. Incompatible with “neatly_arrange_y”. If both this and “neatly_arrange_y” are False, will sort this axis by the identity of the interaction (i.e. all “Fgf1” rows will be grouped together).
- neatly_arrange_y
Set True to order the y-axis by how early along the position axis the maximum z-score of each row occurs. Produces a more uniform plot in which similarly patterned interaction-target pairs are grouped together. If False, will sort this axis by the identity of the interaction (i.e. all “Fgf1” rows will be grouped together).
- window_size
Size of window to use for smoothing. Must be an odd integer. If 1, no smoothing is applied.
- recompute
Set to True to recompute the data and overwrite the existing files
- title
Optional, can be used to provide title for plot
- fontsize
Size of font for x and y labels.
- figsize
Size of figure.
- cmap
Colormap to use. Options: Any divergent matplotlib colormap.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
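One way to picture the “xy” option of “coord_column” is as a projection of each cell onto the diagonal of increasing x and increasing y. The sketch below assumes the new axis starts at (min x, min y) and that positions are scalar projections onto the unit diagonal; the documentation does not confirm the exact computation, and `project_xy` is a hypothetical helper.

```python
import math
from typing import List, Tuple


def project_xy(coords: List[Tuple[float, float]]) -> List[float]:
    """Position of each point along the diagonal of increasing x and y.

    Hypothetical helper: assumes the "xy" axis starts at (min x, min y)
    and that positions are scalar projections onto (1, 1) / sqrt(2).
    """
    min_x = min(x for x, _ in coords)
    min_y = min(y for _, y in coords)
    return [((x - min_x) + (y - min_y)) / math.sqrt(2) for x, y in coords]


cells = [(1.0, 1.0), (2.0, 2.0), (3.0, 1.0)]  # toy coordinates
positions = project_xy(cells)
```

Under this assumption, cells that lie on the same anti-diagonal share a position value, which is consistent with the requirement that multiple cells map to each position.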
- effect_distribution_density(effect_names: List[str], position_key: str = 'spatial', coord_column: int | str | None = None, max_coord_val: float = 1.0, title: str | None = None, x_label: str | None = None, region_lower_bound: float | None = None, region_upper_bound: float | None = None, region_label: str | None = None, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the spatial enrichment of cell-cell interaction effects using density plots over spatial coordinates. Uses the existing dataframe saved by effect_distribution_heatmap(), which must be run first.
- Parameters:
- effect_names
List of interaction effects to include in plot, in format “Target-Ligand:Receptor” (for L:R models) or “Target-Ligand” (for ligand models).
- position_key
Key in adata.obs or adata.obsm that provides a relative indication of the position of cells, i.e. spatial coordinates. Defaults to “spatial”. For each value in the position array (each coordinate, each category), multiple cells must have the same value.
- coord_column
Optional, only used if “position_key” points to an entry in .obsm. In this case, this is the index or name of the column to be used to provide the positional context. Can also provide “xy”, “yz”, “xz”, “-xy”, “-yz”, “-xz” to draw a line between the two coordinate axes. “xy” will extend the new axis in the direction of increasing x and increasing y starting from x=0 and y=0 (or min. x/min. y), “-xy” will extend the new axis in the direction of decreasing x and increasing y starting from x=minimum x and y=maximum y, and so on.
- max_coord_val
Optional, can be used to adjust the numbers displayed along the x-axis for the relative position along the coordinate axis. Defaults to 1.0.
- title
Optional, can be used to provide title for plot
- x_label
Optional, can be used to provide x-axis label for plot
- region_lower_bound
Optional, can be used to provide a lower bound for the region of interest to label on the plot; this can correspond to a spatial domain, etc.
- region_upper_bound
Optional, can be used to provide an upper bound for the region of interest to label on the plot; this can correspond to a spatial domain, etc.
- region_label
Optional, can be used to provide a label for the region of interest to label on the plot
- fontsize
Size of font for x and y labels.
- figsize
Size of figure.
- cmap
Colormap to use. Options: Any divergent matplotlib colormap.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
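The defaults listed for “save_kwargs” can be overridden key by key. The sketch below assumes that user-supplied keys simply replace the documented defaults; `resolve_save_kwargs` is a hypothetical helper that mimics this behavior, and the path shown is a placeholder.

```python
from typing import Any, Dict, Optional

# Defaults as listed in the "save_kwargs" parameter description.
DEFAULT_SAVE_KWARGS: Dict[str, Any] = {
    "path": None,
    "prefix": "scatter",
    "dpi": None,
    "ext": "pdf",
    "transparent": True,
    "close": True,
    "verbose": True,
}


def resolve_save_kwargs(save_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical helper: user-supplied keys replace the documented defaults."""
    return {**DEFAULT_SAVE_KWARGS, **(save_kwargs or {})}


# Placeholder path; only the keys that differ from the defaults are given.
resolved = resolve_save_kwargs({"path": "./figures/effects", "ext": "png", "dpi": 300})
```

Passing only the keys that differ keeps calls short while leaving the remaining defaults in place.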
- visualize_effect_specificity(agg_method: Literal['mean', 'percentage'] = 'mean', plot_type: Literal['heatmap', 'volcano'] = 'heatmap', target_subset: List[str] | None = None, interaction_subset: List[str] | None = None, ct_subset: List[str] | None = None, group_key: str | None = None, n_anchors: int | None = None, effect_threshold: float | None = None, use_significant: bool = False, target_cooccurrence_threshold: float = 0.1, significance_cutoff: float = 1.3, fold_change_cutoff: float = 1.5, fold_change_cutoff_for_labels: float = 3.0, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'seismic', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {}, save_df: bool = False)[source]¶
Computes and visualizes the specificity of each interaction on each target. This is done by first separating the target-expressing cells (and their neighbors) from the rest of the cells (conditioned on predicted effect and also conditioned on receptor expression if L:R model is used). Then, computing the fold change of the average expression of the ligand in the neighborhood of the first subset vs. the neighborhoods of the second subset.
- Parameters:
- agg_method
Method to use for aggregating the specificity of each interaction on each target. Options: “mean” for mean ligand expression, “percentage” for the percentage of cells expressing the ligand.
- plot_type
Type of plot to use for visualization. Options: “heatmap” for heatmap, “volcano” for volcano plot.
- target_subset
List of targets to consider. If None, will use all targets used in model fitting.
- interaction_subset
List of interactions to consider. If None, will use all interactions used in model.
- ct_subset
Can be used to constrain the first group of cells (the query group) to the target-expressing cells of a particular type (conditioned on any other relevant variables). If given, will search for cell types in “group_key” attribute from model initialization. If not given, will use all cell types.
- group_key
Can be used to specify entry in adata.obs that contains cell type groupings. If None, will use the group_key attribute from model initialization.
- n_anchors
Optional, number of target gene-expressing cells to use as anchors for analysis. Will be selected randomly from the set of target gene-expressing cells (conditioned on any other relevant values).
- effect_threshold
Optional threshold minimum effect size to consider an effect for further analysis, as an absolute value. Use this to choose only the cells for which an interaction is predicted to have a strong effect. If None, use the median interaction effect.
- use_significant
Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if compute_coeff_significance() has been run.
- significance_cutoff
Cutoff for negative log-10 q-value to consider an interaction/effect significant. Only used if “plot_type” is “volcano”. Defaults to 1.3 (corresponding to an approximate q-value of 0.05).
- fold_change_cutoff
Cutoff for fold change to consider an interaction/effect significant. Only used if “plot_type” is “volcano”. Defaults to 1.5.
- fold_change_cutoff_for_labels
Cutoff for fold change to include the label for an interaction/effect. Only used if “plot_type” is “volcano”. Defaults to 3.0.
- fontsize
Size of font for x and y labels.
- figsize
Size of figure.
- cmap
Colormap to use. Options: Any divergent matplotlib colormap.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
- save_df
Set True to save the metric dataframe in the end
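The two “agg_method” options can be contrasted with a toy fold-change computation between the neighborhoods of the query cells and those of the remaining cells. This is a minimal sketch under stated assumptions: the helper names, the pseudocount, and the use of "expression greater than zero" to define an expressing neighborhood are illustrative and not taken from Spateo.

```python
from typing import List


def aggregate(ligand_expr: List[float], agg_method: str = "mean") -> float:
    """Summarize neighborhood ligand expression for one group of cells."""
    if agg_method == "mean":
        return sum(ligand_expr) / len(ligand_expr)
    if agg_method == "percentage":
        # Share of neighborhoods in which the ligand is detected at all.
        return 100.0 * sum(v > 0 for v in ligand_expr) / len(ligand_expr)
    raise ValueError("agg_method must be 'mean' or 'percentage'")


def fold_change(query: List[float], rest: List[float],
                agg_method: str = "mean", pseudocount: float = 1e-6) -> float:
    """Ratio of the aggregated value in the query group to that in the rest.

    The pseudocount is an assumption, added to avoid division by zero.
    """
    numerator = aggregate(query, agg_method) + pseudocount
    denominator = aggregate(rest, agg_method) + pseudocount
    return numerator / denominator


query_neighborhoods = [2.0, 4.0, 0.0, 6.0]  # toy values, not real data
other_neighborhoods = [1.0, 0.0, 0.0, 1.0]
fc_mean = fold_change(query_neighborhoods, other_neighborhoods, "mean")
fc_pct = fold_change(query_neighborhoods, other_neighborhoods, "percentage")
```

With these toy values the mean-based fold change is much larger than the percentage-based one, which shows why the choice of aggregation can change which interactions look specific.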
- visualize_neighborhood(target: str, interaction: str, interaction_type: Literal['secreted', 'membrane-bound'], select_examples_criterion: Literal['positive', 'negative'] = 'positive', effect_threshold: float | None = None, cell_type: str | None = None, group_key: str | None = None, use_significant: bool = False, n_anchors: int = 100, n_neighbors_expressing: int = 20, display_plot: bool = True) anndata.AnnData [source]¶
Sets up AnnData object for visualization of interaction effects: cells will be colored by expression of the target gene, potentially conditioned on receptor expression, and neighboring cells will be colored by ligand expression.
- Parameters:
- target
Target gene of interest
- interaction
Interaction feature to visualize, given in the same form as in the design matrix (if model is a ligand-based model or receptor-based model, this will be of form “Col4a1”. If model is a ligand-receptor based model, this will be of form “Col4a1:Itgb1”, for example).
- interaction_type
Specifies whether the chosen interaction is secreted or membrane-bound. Options: “secreted” or “membrane-bound”.
- select_examples_criterion
Whether to select cells with positive or negative interaction effects for visualization. Defaults to “positive”, which searches for cells for which the predicted interaction effect is above the given threshold. “negative” will select cells for which the predicted interaction has no effect on the target expression.
- effect_threshold
Optional threshold for the effect size of an interaction/effect to be considered for analysis; only used if “to_plot” is “percentage”. If not given, will use the upper quartile value among all interaction effect values to determine the threshold.
- cell_type
Optional, can be used to select anchor cells from only a particular cell type. If None, will select from all cells.
- group_key
Can be used to specify entry in adata.obs that contains cell type groupings. If None, will use the group_key attribute from model initialization. Only used if “cell_type” is not None.
- use_significant
Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if compute_coeff_significance() has been run.
- n_anchors
Number of target gene-expressing cells to use as anchors for visualization. Will be selected randomly from the set of target gene-expressing cells.
- n_neighbors_expressing
Filters the set of cells that can be selected as anchors based on the number of their neighbors that express the chosen ligand. Only used for models that incorporate ligand expression.
- display_plot
Whether to save a plot. If False, will return the AnnData object without doing anything else- this can then be visualized e.g. using spateo-viewer.
- Returns:
- Modified AnnData object containing the expression information for the target gene and neighboring ligand expression.
- Return type:
adata
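The interplay of “n_anchors” and “n_neighbors_expressing” can be sketched as a filter followed by a random draw. The code below is an illustration only: `select_anchors` is a hypothetical helper, and treating “n_neighbors_expressing” as a minimum count is an assumption that the documentation does not spell out.

```python
import random
from typing import List


def select_anchors(target_expr: List[float], n_ligand_neighbors: List[int],
                   n_anchors: int = 100, n_neighbors_expressing: int = 20,
                   seed: int = 0) -> List[int]:
    """Hypothetical helper: indices of randomly chosen anchor cells.

    Candidates express the target and have at least
    `n_neighbors_expressing` neighbors expressing the ligand (assumed rule).
    """
    candidates = [
        i for i, (expr, n) in enumerate(zip(target_expr, n_ligand_neighbors))
        if expr > 0 and n >= n_neighbors_expressing
    ]
    rng = random.Random(seed)
    # Never request more anchors than there are candidates.
    return sorted(rng.sample(candidates, min(n_anchors, len(candidates))))


anchors = select_anchors(
    target_expr=[0.0, 1.2, 3.4, 0.5, 2.0],   # toy values, not real data
    n_ligand_neighbors=[25, 30, 5, 22, 40],
    n_anchors=2,
)
```

Fixing the seed makes the random draw repeatable, which helps when comparing plots across runs.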
- cell_type_specific_interactions(to_plot: Literal['mean', 'percentage'] = 'mean', plot_type: Literal['heatmap', 'barplot'] = 'heatmap', group_key: str | None = None, ct_subset: List[str] | None = None, target_subset: List[str] | None = None, interaction_subset: List[str] | None = None, lower_threshold: float = 0.3, upper_threshold: float = 1.0, effect_threshold: float | None = None, use_significant: bool = False, row_normalize: bool = False, col_normalize: bool = False, normalize_targets: bool = False, hierarchical_cluster_ct: bool = False, group_y_cell_type: bool = False, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, center: float | None = None, cmap: str = 'Reds', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {}, save_df: bool = False)[source]¶
Map interactions and interaction effects that are specific to particular cell type groupings. Returns a heatmap representing the enrichment of the interaction/effect within cells of that grouping (if “to_plot” is effect, this will be enrichment of the effect on cell type-specific expression). Enrichment determined by mean effect size or expression.
- Parameters:
- to_plot
Whether to plot the mean effect size or the proportion of cells in a cell type w/ effect on target. Options are “mean” or “percentage”.
- plot_type
Whether to plot the results as a heatmap or barplot. Options are “heatmap” or “barplot”. If “barplot”, must provide a subset of up to four interactions to visualize.
- group_key
Can be used to specify entry in adata.obs that contains cell type groupings. If None, will use the group_key attribute from model initialization.
- ct_subset
Can be used to restrict the enrichment analysis to only cells of a particular type. If given, will search for cell types in “group_key” attribute from model initialization. Recommended to use to subset to cell types with sufficient numbers.
- target_subset
List of targets to consider. If None, will use all targets used in model fitting.
- interaction_subset
List of interactions to consider. If None, will use all interactions used in model. Is necessary if “plot_type” is “barplot”, since the barplot is only designed to accommodate up to three interactions at once.
- lower_threshold
Lower threshold for the proportion of cells in a cell type group that must express a particular interaction/effect for it to be colored on the plot, as a proportion of the max value. Threshold will be applied to the non-normalized values (if normalization is applicable). Defaults to 0.3.
- upper_threshold
Upper threshold for the proportion of cells in a cell type group that must express a particular interaction/effect for it to be colored on the plot, as a proportion of the max value. Threshold will be applied to the non-normalized values (if normalization is applicable). Defaults to 1.0 (the max value).
- effect_threshold
Optional threshold for the effect size of an interaction/effect to be considered for analysis; only used if “to_plot” is “percentage”. If not given, will use the upper quartile value among all interaction effect values to determine the threshold.
- use_significant
Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if compute_coeff_significance() has been run.
- row_normalize
Whether to minmax scale the metric values by row (i.e. for each interaction/effect). Helps to alleviate visual differences that result from scale rather than differences in mean value across cell types.
- col_normalize
Whether to minmax scale the metric values by column (i.e. for each interaction/effect). Helps to alleviate visual differences that result from scale rather than differences in mean value across cell types.
- normalize_targets
Whether to minmax scale the metric values by column for each target (i.e. for each interaction/effect), to remove differences that occur as a result of scale of expression. Provides a clearer picture of enrichment for each target.
- hierarchical_cluster_ct
Whether to cluster the x-axis (target gene in cell type) using hierarchical clustering. If False, will order the x-axis by the order of the target genes for organization purposes.
- group_y_cell_type
Whether to group the y-axis (target gene in cell type) by cell type. If False, will group by target gene instead. Defaults to False.
- fontsize
Size of font for x and y labels.
- figsize
Size of figure.
- center
Optional, determines position of the colormap center. Between 0 and 1.
- cmap
Colormap to use for heatmap. If metric is “number”, “proportion”, “specificity”, the bottom end of the range is 0. It is recommended to use a sequential colormap (e.g. “Reds”, “Blues”, “Viridis”, etc.). For metric = “fc”, if a divergent colormap is not provided, “seismic” will automatically be used.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
- save_df
Set True to save the metric dataframe in the end
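The effect of “row_normalize” and “col_normalize” can be illustrated with plain min-max scaling of a small metric matrix. This is a sketch, not Spateo's implementation: the helper names are hypothetical, and mapping constant rows to zero is an assumption.

```python
from typing import List


def minmax_rows(matrix: List[List[float]]) -> List[List[float]]:
    """Scale each row to [0, 1]; constant rows map to all zeros."""
    scaled = []
    for row in matrix:
        lo, hi = min(row), max(row)
        span = hi - lo
        scaled.append([(v - lo) / span if span else 0.0 for v in row])
    return scaled


def minmax_cols(matrix: List[List[float]]) -> List[List[float]]:
    """Scale each column to [0, 1] by transposing, scaling rows, transposing back."""
    transposed = [list(col) for col in zip(*matrix)]
    return [list(row) for row in zip(*minmax_rows(transposed))]


# Toy metric matrix: two rows with the same pattern but different scale.
metric = [[1.0, 2.0, 3.0],
          [10.0, 20.0, 30.0]]
by_row = minmax_rows(metric)
by_col = minmax_cols(metric)
```

Row scaling makes the two rows identical and so exposes their shared pattern, whereas column scaling keeps the difference in magnitude between them.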
- cell_type_interaction_fold_change(ref_ct: str, query_ct: str, group_key: str | None = None, target_subset: List[str] | None = None, interaction_subset: List[str] | None = None, to_plot: Literal['mean', 'percentage'] = 'mean', plot_type: Literal['volcano', 'barplot'] = 'barplot', source_data: Literal['interaction', 'effect', 'target'] = 'effect', top_n_to_plot: int | None = None, significance_cutoff: float = 1.3, fold_change_cutoff: float = 1.5, fold_change_cutoff_for_labels: float = 3.0, plot_query_over_ref: bool = False, plot_ref_over_query: bool = False, plot_only_significant: bool = False, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'seismic', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {}, save_df: bool = False)[source]¶
Computes fold change in predicted interaction effects between two cell types, and visualizes result.
- Parameters:
- ref_ct
Label of the first cell type to consider. Fold change will be computed with respect to the level in this cell type.
- query_ct
Label of the second cell type to consider
- group_key
Name of the key in .obs containing cell type information. If not given, will use the group_key attribute from model initialization.
- target_subset
List of targets to consider. If None, will use all targets used in model fitting.
- interaction_subset
List of interactions to consider. If None, will use all interactions used in model.
- to_plot
Whether to plot the mean effect size or the proportion of cells in a cell type w/ effect on target. Options are “mean” or “percentage”.
- plot_type
Whether to plot the results as a volcano plot or barplot. Options are “volcano” or “barplot”.
- source_data
Selects what to use in computing fold changes. Options:
“interaction”: will use the design matrix (e.g. neighboring ligand expression or L:R mapping)
“effect”: will use the coefficient arrays for each target
“target”: will use the target gene expression
- top_n_to_plot
If given, will only include the top n features in the visualization. Recommended if “source_data” is “effect”, as all combinations of interaction and target will be considered in this case.
- significance_cutoff
Cutoff for negative log-10 q-value to consider an interaction/effect significant. Only used if “plot_type” is “volcano”. Defaults to 1.3 (corresponding to an approximate q-value of 0.05).
- fold_change_cutoff
Cutoff for fold change to consider an interaction/effect significant. Only used if “plot_type” is “volcano”. Defaults to 1.5.
- fold_change_cutoff_for_labels
Cutoff for fold change to include the label for an interaction/effect. Only used if “plot_type” is “volcano”. Defaults to 3.0.
- plot_query_over_ref
Whether to plot/visualize only the portion that corresponds to the fold change of the query cell type over the reference cell type (and the portion that is significant). If False (and “plot_ref_over_query” is False), will plot the entire volcano plot. Only used if “plot_type” is “volcano”.
- plot_ref_over_query
Whether to plot/visualize only the portion that corresponds to the fold change of the reference cell type over the query cell type (and the portion that is significant). If False (and “plot_query_over_ref” is False), will plot the entire volcano plot. Only used if “plot_type” is “volcano”.
- plot_only_significant
Whether to plot/visualize only the portion that passes the “significance_cutoff” p-value threshold. Only used if “plot_type” is “volcano”.
- fontsize
Size of font for x and y labels.
- figsize
Size of figure.
- cmap
Colormap to use for heatmap. If metric is “number”, “proportion”, “specificity”, the bottom end of the range is 0. It is recommended to use a sequential colormap (e.g. “Reds”, “Blues”, “Viridis”, etc.). For metric = “fc”, if a divergent colormap is not provided, “seismic” will automatically be used.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
- save_df
Set True to save the metric dataframe in the end
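The default “significance_cutoff” of 1.3 is on the negative log-10 scale, so it corresponds to a q-value of roughly 0.05. The sketch below shows the conversion and a simple combined check; `passes_cutoffs` is a hypothetical helper, and comparing the raw fold change in one direction only is an assumption, since the documentation does not say whether fold changes are log-transformed.

```python
import math


def neg_log10_q(q_value: float) -> float:
    """Transform a q-value onto the scale used by "significance_cutoff"."""
    return -math.log10(q_value)


def passes_cutoffs(q_value: float, fold_change: float,
                   significance_cutoff: float = 1.3,
                   fold_change_cutoff: float = 1.5) -> bool:
    """Hypothetical check: both cutoffs must be met or exceeded."""
    significant = neg_log10_q(q_value) >= significance_cutoff
    return significant and fold_change >= fold_change_cutoff


q_at_default_cutoff = 10 ** -1.3  # q-value implied by the default cutoff, about 0.05
```

A q-value of 0.05 maps to about 1.301 and therefore passes the default cutoff, while 0.06 maps to about 1.222 and does not.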
- enriched_interactions_barplot(interactions: str | List[str] | None = None, targets: str | List[str] | None = None, plot_type: Literal['average', 'proportion'] = 'average', effect_size_threshold: float = 0.0, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'Reds', top_n: int | None = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the top predicted effect sizes for each interaction on particular target gene(s).
- Parameters:
- interactions
Optional subset of interactions to focus on, given in the form ligand(s):receptor(s), following the formatting in the design matrix. If not given, will consider all interactions that were specified in model fitting.
- targets
Can optionally specify a subset of the targets to compute this on. If not given, will use all targets that were specified in model fitting. If multiple targets are given, “save_show_or_return” should be “save” (and provide appropriate keyword arguments for saving using “save_kwargs”), otherwise only the last target will be shown.
- plot_type
Options: “average” or “proportion”. Whether to plot the average effect size or the proportion of cells expressing the target predicted to be affected by the interaction.
- effect_size_threshold
Lower bound for average effect size to include a particular interaction in the barplot
- fontsize
Size of font for x and y labels
- figsize
Size of figure
- cmap
Colormap to use for barplot. It is recommended to use a sequential colormap (e.g. “Reds”, “Blues”, “Viridis”, etc.).
- top_n
If given, will only include the top n features in the visualization. If not given, will include all features that pass the “effect_size_threshold”.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved, displayed and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.
- summarize_interaction_effects(interactions: str | List[str] | None = None, targets: str | List[str] | None = None, effect_size_threshold: float = 0.0)[source]¶
Summarize the interaction effects for each target gene in dataframe format. Each element will be the average effect size for a particular interaction on a particular target gene.
- Parameters:
- interactions
Optional subset of interactions to focus on. If not given, will consider all interactions.
- targets
Can optionally specify a subset of the targets. If not given, will use all targets.
- effect_size_threshold
Lower bound for average effect size to include a particular interaction.
- Returns:
- Dataframe with the average effect size for each interaction (rows) on each target gene (columns).
- Return type:
effects_df
- enriched_tfs_barplot(tfs: str | List[str] | None = None, targets: str | List[str] | None = None, target_type: Literal['ligand', 'receptor', 'target_gene'] = 'target_gene', plot_type: Literal['average', 'proportion'] = 'average', effect_size_threshold: float = 0.0, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'Reds', top_n: int | None = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the top predicted effect sizes for each transcription factor on particular target gene(s).
- Parameters:
- tfs
Optional subset of transcription factors to focus on. If not given, will consider all transcription factors that were specified in model fitting.
- targets
Can optionally specify a subset of the targets to compute this on. If not given, will use all targets that were specified in model fitting. If multiple targets are given, “save_show_or_return” should be “save” (and provide appropriate keyword arguments for saving using “save_kwargs”), otherwise only the last target will be shown.
- target_type
Set whether the given targets are ligands, receptors or target genes. Used to determine which folder to check for outputs.
- plot_type
Options: “average” or “proportion”. Whether to plot the average effect size or the proportion of cells expressing the target predicted to be affected by the interaction.
- effect_size_threshold
Lower bound for average effect size to include a particular interaction in the barplot
- fontsize
Size of font for x and y labels
- figsize
Size of figure
- cmap
Colormap to use for barplot. It is recommended to use a sequential colormap (e.g. “Reds”, “Blues”, “viridis”, etc.).
- top_n
If given, will only include the top n features in the visualization. If not given, will include all features that pass the “effect_size_threshold”.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved and displayed, and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, you can provide a dictionary that modifies those keys according to your needs.
- summarize_tf_effects(tfs: str | List[str] | None = None, targets: str | List[str] | None = None, target_type: Literal['ligand', 'receptor', 'target_gene'] = 'target_gene', effect_size_threshold: float = 0.0)[source]¶
Return a DataFrame with effect sizes for each transcription factor (TF) against given targets.
- Parameters:
- tfs
Optional subset of TFs to focus on. If not given, considers all TFs.
- targets
Subset of targets. If not given, uses all targets.
- target_type
Whether the targets are ligands, receptors, or target genes.
- effect_size_threshold
Lower bound for including an effect size.
- Returns:
DataFrame with the average effect size for each TF (rows) on each target gene (columns).
- Return type:
effects_df
- get_effect_potential(target: str | None = None, ligand: str | None = None, receptor: str | None = None, sender_cell_type: str | None = None, receiver_cell_type: str | None = None, spatial_weights_membrane_bound: numpy.ndarray | scipy.sparse.spmatrix | None = None, spatial_weights_secreted: numpy.ndarray | scipy.sparse.spmatrix | None = None, spatial_weights_niche: numpy.ndarray | scipy.sparse.spmatrix | None = None, store_summed_potential: bool = True) Tuple[scipy.sparse.spmatrix, numpy.ndarray, numpy.ndarray] [source]¶
For each cell, computes the ‘signaling effect potential’, interpreted as a quantification of the strength of effect of intercellular communication on downstream expression in a given cell mediated by any given other cell with any combination of ligands and/or cognate receptors, as inferred from the model results. Computations are similar to those of inferred_effect_direction(), but stop short of computing vector fields.
- Parameters:
- target
Optional string to select target from among the genes used to fit the model to compute signaling effects for. Note that this function takes only one target at a time. If not given, will take the first name from among all targets.
- ligand
Needed if :attr mod_type is ‘ligand’; select ligand from among the ligands used to fit the model to compute signaling potential.
- receptor
Needed if :attr mod_type is ‘lr’; together with ‘ligand’, used to select ligand-receptor pair from among the ligand-receptor pairs used to fit the model to compute signaling potential.
- sender_cell_type
Can optionally be used to select cell type from among the cell types used to fit the model to compute sent potential. Must be given if :attr mod_type is ‘niche’.
- receiver_cell_type
Can optionally be used to condition sent potential on receiver cell type.
- store_summed_potential
If True, will store both sent and received signaling potential as entries in .obs of the AnnData object.
- Returns:
- Sparse array of shape [n_samples, n_samples]; proxy for the “signaling effect potential” with respect to a particular target gene between each sender-receiver pair of cells.
- normalized_effect_potential_sum_sender: Array of shape [n_samples,]; for each sending cell, the sum of the signaling potential to all receiver cells for a given target gene, normalized between 0 and 1.
- normalized_effect_potential_sum_receiver: Array of shape [n_samples,]; for each receiving cell, the sum of the signaling potential from all sender cells for a given target gene, normalized between 0 and 1.
- Return type:
effect_potential
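To make the three return values more concrete, here is a minimal sketch of how per-cell sent and received sums could be derived from a pairwise matrix. It is not Spateo's implementation: the convention that rows are senders and columns are receivers, and the use of min-max scaling for the "normalized between 0 and 1" step, are assumptions for illustration.

```python
import numpy as np
from scipy import sparse

# Toy pairwise matrix; assumed convention: entry [i, j] is sent from cell i to cell j.
effect_potential = sparse.csr_matrix(
    np.array(
        [
            [0.0, 2.0, 0.0],
            [1.0, 0.0, 1.0],
            [0.0, 0.0, 0.0],
        ]
    )
)


def min_max(x):
    # Scale to [0, 1]; a constant vector maps to all zeros.
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


# Row sums: total potential each cell sends. Column sums: total each cell receives.
sent = np.asarray(effect_potential.sum(axis=1)).ravel()
received = np.asarray(effect_potential.sum(axis=0)).ravel()

norm_sent = min_max(sent)
norm_received = min_max(received)
```

With `store_summed_potential=True`, the documented behavior is that the sent and received summaries are also written to `.obs` of the AnnData object.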
- get_pathway_potential(pathway: str | None = None, target: str | None = None, spatial_weights_secreted: numpy.ndarray | scipy.sparse.spmatrix | None = None, spatial_weights_membrane_bound: numpy.ndarray | scipy.sparse.spmatrix | None = None, store_summed_potential: bool = True)[source]¶
For each cell, computes the ‘pathway effect potential’, which is an aggregation of the effect potentials of all pathway member ligand-receptor pairs (or all pathway member ligands, for ligand-only models).
- Parameters:
- pathway
Name of pathway to compute pathway effect potential for.
- target
Optional string to select target from among the genes used to fit the model to compute signaling effects for. Note that this function takes only one target at a time. If not given, will take the first name from among all targets.
- spatial_weights_secreted
Optional pairwise spatial weights matrix for secreted factors
- spatial_weights_membrane_bound
Optional pairwise spatial weights matrix for membrane-bound factors
- store_summed_potential
If True, will store both sent and received signaling potential as entries in .obs of the AnnData object.
- Returns:
- Array of shape [n_samples, n_samples]; proxy for the combined “signaling effect potential” with respect to a particular target gene for ligand-receptor pairs in a pathway.
- normalized_pathway_effect_potential_sum_sender: Array of shape [n_samples,]; for each sending cell, the sum of the pathway sum potential to all receiver cells for a given target gene, normalized between 0 and 1.
- normalized_pathway_effect_potential_sum_receiver: Array of shape [n_samples,]; for each receiving cell, the sum of the pathway sum potential from all sender cells for a given target gene, normalized between 0 and 1.
- Return type:
pathway_sum_potential
- inferred_effect_direction(targets: str | List[str] | None = None, compute_pathway_effect: bool = False)[source]¶
For visualization purposes, used for models that consider ligand expression (:attr mod_type is ‘ligand’ or ‘lr’); for receptor models, assigning directionality is impossible, and for niche models, it makes much less sense to draw/compute a vector field. Constructs spatial vector fields to infer the directionality of observed effects (the “sources” of the downstream expression).
Parts of this function are inspired by ‘communication_direction’ from COMMOT: https://github.com/zcang/COMMOT
- Parameters:
- targets
Optional string or list of strings to select targets from among the genes used to fit the model to compute signaling effects for. If not given, will use all targets.
- compute_pathway_effect
Whether to compute the effect potential for each pathway in the model. If True, will collectively take the effect potential of all pathway components. If False, will compute the effect potential for each individual signal.
- define_effect_vf(effect_potential: scipy.sparse.spmatrix, normalized_effect_potential_sum_sender: numpy.ndarray, normalized_effect_potential_sum_receiver: numpy.ndarray, sig: str, target: str, max_val: float = 0.05)[source]¶
Given the pairwise effect potential array, computes the effect vector field.
- Parameters:
- effect_potential
Sparse array containing computed effect potentials; output from get_effect_potential().
- normalized_effect_potential_sum_sender
Array containing the sum of the effect potentials sent by each cell. Output from get_effect_potential().
- normalized_effect_potential_sum_receiver
Array containing the sum of the effect potentials received by each cell. Output from get_effect_potential().
- max_val
Constrains the size of the vector field vectors. Recommended to set within the order of magnitude of 1/100 of the desired plot dimensions.
- sig
Label for the mediating interaction (e.g. name of a ligand, name of a ligand-receptor pair, etc.)
- target
Name of the target that the vector field describes the effect for
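One way to picture how a pairwise potential matrix can be turned into a vector field is sketched below. This is an illustrative approach in the spirit of the COMMOT-style direction computation mentioned above, not the code used by define_effect_vf(): the `sender_vector_field` helper, the dense input matrix, and the choice to scale each unit direction by the normalized sent sum times `max_val` are all assumptions.

```python
import numpy as np


def sender_vector_field(coords, potential, norm_sum_sender, max_val=0.05):
    """For each sender, point toward its receivers, weighted by pairwise potential."""
    coords = np.asarray(coords, dtype=float)
    potential = np.asarray(potential, dtype=float)
    vf = np.zeros_like(coords)
    for i in range(coords.shape[0]):
        offsets = coords - coords[i]
        dists = np.linalg.norm(offsets, axis=1)
        dists[dists == 0] = 1.0  # the cell itself has a zero offset; avoid 0/0
        # Potential-weighted sum of unit vectors from cell i to every other cell.
        direction = (potential[i][:, None] * offsets / dists[:, None]).sum(axis=0)
        norm = np.linalg.norm(direction)
        if norm > 0:
            # Vector length is bounded by max_val because norm_sum_sender is in [0, 1].
            vf[i] = direction / norm * norm_sum_sender[i] * max_val
    return vf


coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
potential = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
norm_sum_sender = np.array([1.0, 0.0, 0.0])
vf = sender_vector_field(coords, potential, norm_sum_sender, max_val=0.05)
```

In this toy case only the first cell sends signal, so it is the only cell with a nonzero vector, pointing along the x-axis toward its single receiver.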
- visualize_effect_vf_3D(interaction: str, target: str, vf_key: str | None = None, vector_magnitude_lower_bound: float = 0.0, manual_vector_scale_factor: float | None = None, bin_size: float | Tuple[float] | None = None, plot_cells: bool = True, cell_size: float = 1.0, alpha: float = 0.3, no_color_coding: bool = False, only_view_effect_region: bool = False, add_group_label: str | None = None, group_label_obs_key: str | None = None, title_position: Tuple[float, float] = (0.5, 0.9), save_path: str | None = None, **kwargs)[source]¶
Visualize the directionality of the effect on target for a given interaction, overlaid onto the 3D spatial plot. Can only be used for models that use ligand expression (:attr mod_type is ‘ligand’ or ‘lr’).
- Parameters:
- interaction
Interaction to incorporate into the visualization (e.g. “Igf1:Igf1r” for L:R model, “Igf1” for ligand model)
- target
Name of the target gene of interest. Will search key “spatial_effect_sender_vf_{interaction}_{target}” to create vector field plot.
- vf_key
Optional key in .obsm to specify which vector field to use. If not given, will use the provided “interaction” and “target” to find the key specifying the vector field.
- vector_magnitude_lower_bound
Lower bound for the magnitude of the vector field vectors to be plotted, as a fraction of the maximum vector magnitude. Defaults to 0.0.
- manual_vector_scale_factor
If not None, will manually scale the vector field by this factor (multiplicatively). Used for visualization purposes; not recommended to set above 2.0 (otherwise likely to get misleading results with vectors that are too long).
- bin_size
Optional, can be used to de-clutter plotting space by splitting the space into 3D bins and displaying one vector per bin. Can be given as a floating point number to create cubic bins, or as a tuple of floats to specify different bin sizes for each dimension. If not given, will plot one vector per cell. Defaults to None.
- plot_cells
If False, will not plot any of the cells (unless a group label is given), so will only visualize vector field. Defaults to True.
- cell_size
Size of the cells in the 3D plot. Defaults to 1.0.
- alpha
If visualizing cells not affected by the interaction, this argument specifies the transparency of those cells.
- no_color_coding
If True, will color all cells the same color (except cells of given category, if given).
- only_view_effect_region
If True, will only plot the region where the effect is predicted to be found, rather than the entire 3D object
- add_group_label
This optional argument represents a cell type category. Will color the cells belonging to this particular category orange. If given, it is recommended to also provide group_label_obs_key (which will be :attr group_key if not given).
- group_label_obs_key
If add_group_label is given, this argument represents the observation key in the AnnData object that contains the group label. If not given, will default to :attr group_key.
- title_position
Position of the title in the plot, given as a tuple of floats (i.e. (x, y)). Defaults to (0.5, 0.9).
- save_path
Path to save the figure to (will save as HTML file)
- kwargs
Additional arguments that can be passed to :func plotly.graph_objects.Cone. Common arguments:
- “colorscale”: Sets the colorscale. The colorscale must be an array containing arrays mapping a normalized value to an rgb, rgba, hex, hsl, hsv, or named color string.
- “sizemode”: Determines whether sizeref is set as a “scaled” (i.e. unitless) scalar (normalized by the max u/v/w norm in the vector field) or as an “absolute” value (in the same units as the vector field). Defaults to “scaled”.
- “sizeref”: The scalar reference for the cone size. The cone size is determined by its u/v/w norm multiplied by sizeref. Defaults to 2.0.
- “showscale”: Determines whether or not a colorbar is displayed for this trace.
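The effect of `vector_magnitude_lower_bound` and `bin_size` can be pictured with a small NumPy sketch: drop vectors shorter than a fraction of the maximum magnitude, then show one averaged vector per 3D bin. The `filter_and_bin_vectors` helper below is hypothetical, and averaging positions and vectors within each bin is an assumption; the plotting function may aggregate differently.

```python
import numpy as np


def filter_and_bin_vectors(coords, vectors, magnitude_lower_bound=0.0, bin_size=None):
    """Drop short vectors, then optionally average the rest within 3D bins."""
    coords = np.asarray(coords, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    magnitudes = np.linalg.norm(vectors, axis=1)
    # The lower bound is expressed as a fraction of the largest magnitude.
    keep = magnitudes >= magnitude_lower_bound * magnitudes.max()
    coords, vectors = coords[keep], vectors[keep]
    if bin_size is None:
        return coords, vectors  # one vector per remaining cell

    # A float gives cubic bins; a tuple gives one bin size per dimension.
    bin_ids = np.floor(coords / np.asarray(bin_size, dtype=float)).astype(int)
    _, inverse = np.unique(bin_ids, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_bins = inverse.max() + 1
    counts = np.bincount(inverse, minlength=n_bins)[:, None]

    binned_coords = np.zeros((n_bins, coords.shape[1]))
    binned_vectors = np.zeros((n_bins, vectors.shape[1]))
    np.add.at(binned_coords, inverse, coords)
    np.add.at(binned_vectors, inverse, vectors)
    return binned_coords / counts, binned_vectors / counts


coords = np.array([[0.1, 0.1, 0.1], [0.4, 0.2, 0.3], [1.5, 0.1, 0.1], [1.6, 0.2, 0.2]])
vectors = np.array([[1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.01, 0.0]])
binned_coords, binned_vectors = filter_and_bin_vectors(coords, vectors, 0.1, 1.0)
```

Here the last vector falls below 10% of the maximum magnitude and is removed, and the remaining three cells collapse into two unit-sized bins.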
- CCI_deg_detection_setup(group_key: str | None = None, custom_tfs: List[str] | None = None, sender_receiver_or_target_degs: Literal['sender', 'receiver', 'target'] = 'sender', use_ligands: bool = True, use_receptors: bool = False, use_pathways: bool = False, use_targets: bool = False, use_cell_types: bool = False, compute_dim_reduction: bool = False)[source]¶
Computes differential expression signatures of cells with various levels of ligand expression.
- Parameters:
- group_key
Key to add to .obs of the AnnData object created by this function, containing cell type labels for each cell. If not given, will use :attr group_key.
- custom_tfs
Optional list of transcription factors to make sure to be included in analysis. If given, these TFs will be included among the regulators regardless of the expression-based thresholding done in preprocessing.
- sender_receiver_or_target_degs
Only makes a difference if ‘use_pathways’ or ‘use_cell_types’ is specified. Determines whether to compute DEGs for ligands, receptors or target genes. If ‘use_pathways’ is True, the value of this argument will determine whether ligands or receptors are used to define the model. Note that in either case, differential expression of TFs, binding factors, etc. will be computed in association with ligands/receptors/target genes (only valid if ‘use_cell_types’ and not ‘use_pathways’ is specified).
- use_ligands
Use ligand array for differential expression analysis. Will take precedence over sender/receiver cell type if also provided.
- use_receptors
Use receptor array for differential expression analysis. Will take precedence over sender/receiver cell type if also provided.
- use_pathways
Use pathway array for differential expression analysis. Will use ligands in these pathways to collectively compute signaling potential score. Will take precedence over sender cell types if also provided.
- use_targets
Use target array for differential expression analysis.
- use_cell_types
Use cell types for differential expression analysis. If given, will preprocess/construct the necessary components to initialize cell type-specific models. Note: should be used alongside ‘use_ligands’, ‘use_receptors’, ‘use_pathways’ or ‘use_targets’ to select which molecules to investigate in each cell type.
- compute_dim_reduction
Whether to compute PCA representation of the data subsetted to targets.
- CCI_deg_detection(group_key: str, cci_dir_path: str, sender_receiver_or_target_degs: Literal['sender', 'receiver', 'target'] = 'sender', use_ligands: bool = True, use_receptors: bool = False, use_pathways: bool = False, use_targets: bool = False, ligand_subset: List[str] | None = None, receptor_subset: List[str] | None = None, target_subset: List[str] | None = None, cell_type: str | None = None, use_dim_reduction: bool = False, **kwargs)[source]¶
Downstream method that, when called, creates a separate instance of :class MuSIC specifically designed for the downstream task of detecting differentially expressed genes associated with ligand expression.
- Parameters:
- group_key
Key in adata.obs that corresponds to the cell type (or other grouping) labels
- cci_dir_path
Path to directory containing all Spateo databases
- sender_receiver_or_target_degs
Only makes a difference if ‘use_pathways’ or ‘use_cell_types’ is specified. Determines whether to compute DEGs for ligands, receptors or target genes. If ‘use_pathways’ is True, the value of this argument will determine whether ligands or receptors are used to define the model. Note that in either case, differential expression of TFs, binding factors, etc. will be computed in association with ligands/receptors/target genes (only valid if ‘use_cell_types’ and not ‘use_pathways’ is specified).
- use_ligands
Use ligand array for differential expression analysis. Will take precedence over receptors and sender/receiver cell types if also provided. Should match the input to :func CCI_sender_deg_detection_setup.
- use_receptors
Use receptor array for differential expression analysis.
- use_pathways
Use pathway array for differential expression analysis. Will use ligands in these pathways to collectively compute signaling potential score. Will take precedence over sender cell types if also provided. Should match the input to :func CCI_sender_deg_detection_setup.
- use_targets
Use target genes array for differential expression analysis.
- ligand_subset
Subset of ligands to use for differential expression analysis. If not given, will use all ligands from the upstream model.
- receptor_subset
Subset of receptors to use for differential expression analysis. If not given, will use all receptors from the upstream model.
- target_subset
Subset of target genes to use for differential expression analysis. If not given, will use all target genes from the upstream model.
- cell_type
Cell type to use for differential expression analysis. If given, will use the ligand/receptor subset obtained from CCI_deg_detection_setup() and cells of the chosen cell type in the model.
- use_dim_reduction
Whether to use PCA representation of the data to find nearest neighbors. If False, will instead use the Jaccard distance. Defaults to False. Note that this will ultimately fail if dimensionality reduction was not performed in CCI_deg_detection_setup().
- kwargs
Keyword arguments for any of the Spateo argparse arguments. Should not include ‘adata_path’, ‘custom_lig_path’ & ‘ligand’ or ‘custom_pathways_path’ & ‘pathway’ (depending on whether ligands or pathways are being used for the analysis), and should not include ‘output_path’ (which will be determined by the output path used for the main model). Should also not include any of the other arguments for this function.
- Returns:
Fitted model instance that can be used for further downstream applications
- Return type:
downstream_model
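When `use_dim_reduction` is False, neighbors are found with the Jaccard distance rather than in PCA space. The sketch below shows what a Jaccard-based nearest-neighbor lookup could look like on a small expression matrix. It is only an illustration: treating a gene as "present" when its count is above zero, and the `jaccard_distance_matrix` helper itself, are assumptions rather than Spateo's procedure.

```python
import numpy as np


def jaccard_distance_matrix(expr):
    """expr: cells x genes array; a gene counts as present where expr > 0."""
    present = np.asarray(expr) > 0
    inter = (present[:, None, :] & present[None, :, :]).sum(axis=-1)
    union = (present[:, None, :] | present[None, :, :]).sum(axis=-1)
    # Two cells with no expressed genes are treated as identical (similarity 1).
    similarity = np.divide(
        inter, union, out=np.ones(inter.shape, dtype=float), where=union > 0
    )
    return 1.0 - similarity


expr = np.array([[1, 0, 2, 0], [3, 0, 0, 1], [1, 0, 5, 0]])
dist = jaccard_distance_matrix(expr)

# Mask the diagonal so a cell is never its own nearest neighbor.
masked = dist + np.diag(np.full(len(dist), np.inf))
nearest = masked.argmin(axis=1)
```

Cells 0 and 2 express the same set of genes, so their Jaccard distance is zero and each is the other's nearest neighbor even though their counts differ.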
- deg_effect_barplot(target: str, interaction_subset: List[str] | None = None, top_n_interactions: int | None = None, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'Blues', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the proportion of cells expressing a particular target (ligand, receptor, or target gene involved in an upstream CCI model) that are predicted to be affected by each transcription factor, or that are predicted to be affected by each L:R pair/ligand.
- Parameters:
- target
Target gene
- interaction_subset
Optional, can be used to specify subset of interactions (transcription factors, L:R pairs, etc.) to visualize, e.g. [“Sox2”, “Irx3”]. If not given, will default to all TFs, L:R pairs, etc.
- top_n_interactions
Optional, can be used to specify the top n interactions (transcription factors, L:R pair, ligand, etc.) to visualize. If not given, will default to all TFs, L:R pairs, etc.
- fontsize
Font size to determine size of the axis labels, ticks, title, etc.
- figsize
Width and height of plotting window
- cmap
Name of matplotlib colormap specifying colormap to use. Must be a sequential colormap.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved and displayed, and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, you can provide a dictionary that modifies those keys according to your needs.
- deg_effect_heatmap(target_subset: List[str] | None = None, target_type: Literal['ligand', 'receptor', 'target_gene', 'tf_target'] = 'target_gene', to_plot: Literal['proportion', 'specificity'] = 'proportion', interaction_subset: List[str] | None = None, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'magma', lower_proportion_threshold: float = 0.1, order_interactions: bool = False, order_targets: bool = False, remove_rows_and_cols_threshold: int | None = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {}, save_df: bool = False)[source]¶
Visualize the proportion of cells expressing any target (ligand, receptor, or target gene involved in an upstream CCI model) that are predicted to be affected by each transcription factor, or that are predicted to be affected by each L:R pair/ligand, using a heatmap for visualization.
- Parameters:
- target_subset
Optional, can be used to specify subset of targets (ligands, receptors, target genes, or “TF_target” for target genes where the interaction to plot is TF effect) to visualize, e.g. [“Tubb1a”, “Tubb1b”]. If not given, will default to all targets.
- target_type
Type of target gene to visualize. Must be one of “ligand”, “receptor”, “target_gene”, or “tf_target”. Defaults to “target_gene”. Used to specify where to search for the target genes to process.
- to_plot
Two options, “proportion” or “specificity”: for proportion, plot the proportion of cells expressing the target that are affected by each interaction. For specificity, take the proportion of cells affected by each interaction for which the interaction is predicted to affect a specific target.
- interaction_subset
Optional, can be used to specify subset of interactions (transcription factors, L:R pairs, etc.) to visualize, e.g. [“Sox2”, “Irx3”]. If not given, will default to all TFs, L:R pairs, etc.
- fontsize
Font size to determine size of the axis labels, ticks, title, etc.
- figsize
Width and height of plotting window
- cmap
Name of matplotlib colormap specifying colormap to use. Must be a sequential colormap.
- lower_proportion_threshold
Proportion threshold below which to set the proportion to 0 in the display. Defaults to 0.1.
- order_interactions
Whether to hierarchically sort the y-axis/interactions (transcription factors, L:R pairs, etc.).
- order_targets
Whether to hierarchically sort the x-axis/targets (ligands, receptors, target genes)
- remove_rows_and_cols_threshold
Optional, can be used to specify the threshold for the number of nonzero interactions/TFs a row/column needs to be displayed. If not given, all rows and columns will be displayed.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved and displayed, and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, you can provide a dictionary that modifies those keys according to your needs.
- save_df
Set True to save the metric dataframe in the end
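The two filtering parameters of the heatmap can be illustrated on a toy proportion table. The sketch below zeroes out entries below `lower_proportion_threshold` and then drops rows and columns with too few nonzero entries. The `prepare_heatmap_values` helper and the toy values are hypothetical, and filtering rows before columns is an assumption about ordering.

```python
import pandas as pd


def prepare_heatmap_values(
    prop_df, lower_proportion_threshold=0.1, remove_rows_and_cols_threshold=None
):
    # Proportions below the threshold are displayed as 0.
    df = prop_df.where(prop_df >= lower_proportion_threshold, 0.0)
    if remove_rows_and_cols_threshold is not None:
        # Keep rows (interactions), then columns (targets), with enough nonzero entries.
        df = df.loc[(df != 0).sum(axis=1) >= remove_rows_and_cols_threshold]
        df = df.loc[:, (df != 0).sum(axis=0) >= remove_rows_and_cols_threshold]
    return df


# Toy table: interactions (rows) by targets (columns).
prop_df = pd.DataFrame(
    {"T1": [0.5, 0.02], "T2": [0.05, 0.08], "T3": [0.3, 0.2]},
    index=["Sox2", "Irx3"],
)
heatmap_df = prepare_heatmap_values(
    prop_df, lower_proportion_threshold=0.1, remove_rows_and_cols_threshold=1
)
```

Target T2 has no proportion at or above 0.1, so its column is entirely zero after thresholding and is removed.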
- top_target_barplot(interaction: str, target_subset: List[str] | None = None, use_ligand_targets: bool = False, use_receptor_targets: bool = False, use_target_gene_targets: bool = True, use_target_gene_tf_targets: bool = False, top_n_targets: int | None = None, fontsize: None | int = None, figsize: None | Tuple[float, float] = None, cmap: str = 'Blues', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: dict | None = {})[source]¶
Visualize the proportion of cells expressing each target (ligand, receptor, or target gene involved in an upstream CCI model) that are predicted to be affected by a given interaction, i.e. transcription factor, L:R pair/ligand.
- Parameters:
- interaction
The interaction to investigate, in the form specified in the design matrix, e.g. “Sox9” or “Igf1:Igf1r”.
- target_subset
Optional, specify subset of target genes to visualize. If not given, defaults to all targets.
- use_ligand_targets
Whether ligands should be used as targets, i.e. if “interaction” is a TF and the target genes being influenced by the TF are ligands. If True, will ignore “use_receptor_targets” and “use_target_gene_targets”.
- use_receptor_targets
Whether receptors should be used as targets, i.e. if “interaction” is a TF and the target genes being influenced by the TF are receptors. If True, will ignore “use_target_gene_targets”.
- use_target_gene_targets
Whether target genes should be used as targets, i.e. if “interaction” is an L:R interaction
- use_target_gene_tf_targets
Whether target genes should be used as targets, i.e. if “interaction” is a TF and the target genes being influenced by the TF are target genes (that are not ligands or receptors).
- top_n_targets
Number of top targets to visualize. Defaults to 10.
- fontsize
Font size to determine size of the axis labels, ticks, title, etc.
- figsize
Width and height of plotting window
- cmap
Name of matplotlib colormap specifying colormap to use. Must be a sequential colormap.
- save_show_or_return
Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved and displayed, and the associated axis and other objects will be returned.
- save_kwargs
A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise, you can provide a dictionary that modifies those keys according to your needs.
- permutation_test(gene: str, n_permutations: int = 100, permute_nonzeros_only: bool = False, **kwargs)[source]¶
Sets up permutation test for determination of statistical significance of model diagnostics. Can be used to identify true/the strongest signal-responsive expression patterns.
- Parameters:
- gene
Target gene to perform permutation test on.
- n_permutations
Number of permutations of the gene expression to perform. Default is 100.
- permute_nonzeros_only
Whether to only perform the permutation over the gene-expressing cells
- kwargs
Keyword arguments for any of the Spateo argparse arguments. Should not include ‘adata_path’, ‘target_path’, or ‘output_path’ (which will be determined by the output path used for the main model). Also should not include ‘custom_lig_path’, ‘custom_rec_path’, ‘mod_type’, ‘bw_fixed’ or ‘kernel’ (which will be determined by the initial model instantiation).
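The difference between permuting all cells and permuting only the gene-expressing cells can be seen in a short NumPy sketch. This only shows how permuted expression vectors might be generated; it does not refit any model, and the `permute_expression` helper and its `seed` argument are hypothetical names introduced for the example.

```python
import numpy as np


def permute_expression(expr, n_permutations=100, permute_nonzeros_only=False, seed=0):
    """Return an (n_permutations, n_cells) array of shuffled expression vectors."""
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    permuted = np.tile(expr, (n_permutations, 1))
    nonzero_idx = np.flatnonzero(expr)
    for row in permuted:  # each row is a view, so edits land in `permuted`
        if permute_nonzeros_only:
            # Shuffle values among expressing cells; zero cells stay zero.
            row[nonzero_idx] = rng.permutation(row[nonzero_idx])
        else:
            row[:] = rng.permutation(row)
    return permuted


expr = np.array([0.0, 3.0, 0.0, 1.0, 2.0])
perms = permute_expression(expr, n_permutations=5, permute_nonzeros_only=True)
```

With `permute_nonzeros_only=True` the set of expressing cells is preserved and only the expression levels are exchanged between them, which tests whether the model captures magnitude rather than presence/absence.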
- eval_permutation_test(gene: str)[source]¶
Evaluation function for permutation tests. Will compute multiple metrics (correlation coefficients, F1 scores, AUROC in the case that all cells were permuted, etc.) to compare true and model-predicted gene expression vectors.
- Parameters:
- gene
Target gene for which to evaluate permutation test
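Two of the metric types named above, a correlation coefficient and an F1 score, can be computed with plain NumPy as sketched below. This is not the evaluation code itself: the `compare_expression` helper, the use of Pearson correlation specifically, and the rule that a cell counts as "expressing" when its value exceeds `expr_cutoff` are assumptions for illustration.

```python
import numpy as np


def compare_expression(y_true, y_pred, expr_cutoff=0.0):
    """Compare true and predicted expression vectors for one gene."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    pearson = float(np.corrcoef(y_true, y_pred)[0, 1])

    # F1 on the binary question "is the gene expressed in this cell?"
    true_pos = y_true > expr_cutoff
    pred_pos = y_pred > expr_cutoff
    tp = int(np.sum(true_pos & pred_pos))
    fp = int(np.sum(~true_pos & pred_pos))
    fn = int(np.sum(true_pos & ~pred_pos))
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"pearson": pearson, "f1": f1}


metrics = compare_expression([0.0, 1.0, 2.0, 0.0], [0.0, 1.0, 2.0, 1.0])
perfect = compare_expression([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
```

Running the same comparison on each permuted fit and on the unpermuted fit gives a null distribution against which the real model's metrics can be judged.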
- class spateo.tools.CCI_effects_modeling.MuSIC_Molecule_Selector(parser: argparse.ArgumentParser, args_list: List[str] | None = None)[source]¶
Bases:
spateo.tools.CCI_effects_modeling.MuSIC.MuSIC
Various methods to select initial targets or predictors for intercellular analyses.
- Parameters:
- parser
ArgumentParser object initialized with argparse, to parse command line arguments for arguments pertinent to modeling.
- mod_type¶
The type of model that will be employed for eventual downstream modeling. Will dictate how predictors will be found (if applicable). Options:
- “niche”: Spatially-aware, uses categorical cell type labels as independent variables.
- “lr”: Spatially-aware, essentially uses the combination of receptor expression in the “target” cell and spatially lagged ligand expression in the neighboring cells as independent variables.
- “ligand”: Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables.
- “receptor”: Uses receptor expression in the “target” cell as independent variables.
- distr¶
Distribution family for the dependent variable; one of “gaussian”, “poisson”, “nb”
- adata_path¶
Path to the AnnData object from which to extract data for modeling
- normalize¶
Set True to perform library size normalization, setting total counts in each cell to the same number (adjusts for cell size).
- smooth¶
Set True to correct for dropout effects by leveraging gene expression neighborhoods to smooth expression.
- log_transform¶
Set True if log-transformation should be applied to expression.
- target_expr_threshold¶
When selecting targets, expression above a threshold percentage of cells will be used to filter to a smaller subset of interesting genes. Defaults to 0.1.
- r_squared_threshold¶
When selecting targets, only genes with an R^2 above this threshold will be used as targets
- custom_lig_path¶
Optional path to a .txt file containing a list of ligands for the model, separated by newlines. If provided, will find targets for which this set of ligands collectively explains the most variance (on a gene-by-gene basis) when taking neighborhood expression into account.
- custom_ligands¶
Optional list of ligands for the model, can be used as an alternative to :attr custom_lig_path. If provided, will find targets for which this set of ligands collectively explains the most variance (on a gene-by-gene basis) when taking neighborhood expression into account.
- custom_rec_path¶
Optional path to a .txt file containing a list of receptors for the model, separated by newlines. If provided, will find targets for which this set of receptors collectively explains the most variance.
- custom_receptors¶
Optional list of receptors for the model, can be used as an alternative to :attr custom_rec_path. If provided, will find targets for which this set of receptors collectively explains the most variance.
- custom_pathways_path¶
Rather than providing a list of receptors, can provide a list of signaling pathways; all receptors with annotations in this pathway will be included in the model. If provided, will find targets for which receptors in these pathways collectively explain the most variance.
- custom_pathways¶
Optional list of signaling pathways for the model, can be used as an alternative to :attr custom_pathways_path. If provided, will find targets for which receptors in these pathways collectively explain the most variance.
- targets_path¶
Optional path to a .txt file containing a list of prediction target genes for the model, separated by newlines. If not provided, targets will be strategically selected from the given receptors.
- custom_targets¶
Optional list of prediction target genes for the model, can be used as an alternative to :attr targets_path.
- cci_dir¶
Full path to the directory containing cell-cell communication databases
- species¶
Selects the cell-cell communication database the relevant ligands will be drawn from. Options: “human”, “mouse”.
- output_path¶
Full path name for the .csv file in which results will be saved
- group_key¶
Key in .obs of the AnnData object that contains the cell type labels, used if targeting molecules that have cell type-specific activity
- coords_key¶
Key in .obsm of the AnnData object that contains the coordinates of the cells
- n_neighbors¶
Number of nearest neighbors to use in the case that ligands are provided or in the case that ligands of interest should be found
- find_targets(save_id: str | None = None, bw_membrane_bound: float | int = 8, bw_secreted: float | int = 25, kernel: Literal['bisquare', 'exponential', 'gaussian', 'quadratic', 'triangular', 'uniform'] = 'bisquare', **kwargs)[source]¶
Find genes that may serve as interesting targets by computing the intersection over union (IoU) with receptor signal. Will find genes that are highly coexpressed with receptors or ligand:receptor signals.
- Parameters:
- save_id
Optional string to append to the end of the saved file name. Will save signaling molecule names as “ligand_{save_id}.txt”, etc.
- bw_membrane_bound
Bandwidth used to compute spatial weights for membrane-bound ligands. If integer, will convert to appropriate distance bandwidth.
- bw_secreted
Bandwidth used to compute spatial weights for secreted ligands. If integer, will convert to appropriate distance bandwidth.
- kernel
Type of kernel function used to weight observations when computing spatial weights; one of “bisquare”, “exponential”, “gaussian”, “quadratic”, “triangular” or “uniform”.
- kwargs
Keyword arguments for any of the Spateo argparse arguments. Should not include ‘output_path’ (which will be determined by the output path used for the main model). Should also not include any of ‘ligands’ or ‘receptors’, which will be determined by this function.
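The kernel names accepted by `find_targets` correspond to common families of distance-decay functions. The sketch below uses the conventional definitions from geographically weighted regression. Whether Spateo uses exactly these forms (for example, the Gaussian scaling or the truncation at the bandwidth) is an assumption, and `kernel_weights` is a hypothetical helper rather than part of the Spateo API.

```python
import numpy as np

def kernel_weights(dist, bw, kernel="bisquare"):
    """Hypothetical helper: conventional distance-decay kernels (not Spateo's code)."""
    u = np.asarray(dist, dtype=float) / bw  # distance scaled by the bandwidth
    inside = u <= 1.0                       # compact kernels vanish beyond the bandwidth
    if kernel == "bisquare":
        return np.where(inside, (1.0 - u**2) ** 2, 0.0)
    if kernel == "quadratic":
        return np.where(inside, 1.0 - u**2, 0.0)
    if kernel == "triangular":
        return np.where(inside, 1.0 - u, 0.0)
    if kernel == "uniform":
        return inside.astype(float)
    if kernel == "gaussian":
        return np.exp(-0.5 * u**2)
    if kernel == "exponential":
        return np.exp(-u)
    raise ValueError(f"Unknown kernel: {kernel}")

dists = np.array([0.0, 4.0, 8.0, 12.0])
w_bisquare = kernel_weights(dists, bw=8, kernel="bisquare")
w_gaussian = kernel_weights(dists, bw=8, kernel="gaussian")
print(w_bisquare, w_gaussian)
```

The compact kernels (bisquare, quadratic, triangular, uniform) give zero weight beyond the bandwidth, while the Gaussian and exponential kernels decay smoothly and never reach exactly zero.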
- spateo.tools.CCI_effects_modeling.define_spateo_argparse(**kwargs)[source]¶
Defines and returns MPI and argparse objects for model fitting and interpretation.
- Parameters:
- kwargs
Keyword arguments for any of the argparse arguments defined below.
- Parser arguments:
- run_upstream: Flag to run the upstream target selection step.
- adata_path: Path to AnnData object containing gene expression data. This or ‘csv_path’ must be given to run.
- csv_path: Path to .csv file containing gene expression data. This or ‘adata_path’ must be given to run.
- n_spatial_dim_csv: Number of spatial dimensions to the data provided to ‘csv_path’. Defaults to 2.
- spatial_subsample: Flag to subsample the data- at a big picture level, this will be done by dividing the tissue into regions and subsampling from each of these regions. Recommended for large datasets (>5000 samples).
- multiscale: Flag to create multiscale models. Currently, it is recommended to only create multiscale models
for Gaussian data.
- multiscale_params_only: Flag to return additional metrics along with the coefficients for multiscale models (specifying this argument sets Flag to True)
- mod_type: The type of model that will be employed- this dictates how the data will be processed and prepared. Options:
    - “niche”: Spatially-aware, uses categorical cell type labels as independent variables.
    - “lr”: Spatially-aware, essentially uses the combination of receptor expression in the “target” cell and spatially lagged ligand expression in the neighboring cells as independent variables.
    - “ligand”: Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables.
    - “receptor”: Uses receptor expression in the “target” cell as independent variables.
    - “downstream”: For the purposes of downstream analysis, used to model ligand expression as a function of upstream regulators
- include_unpaired_lr: Only if mod_type is “lr”- if True, will include individual ligands/complexes and individual receptors in the design matrix if their cognate interacting partners cannot also be found.
- cci_dir: Path to directory containing cell-cell interaction databases
- species: Selects the cell-cell communication database the relevant ligands will be drawn from. Options: “human”, “mouse”.
- output_path: Full path name for the .csv file in which results will be saved. Make sure the parent directory is empty- any existing files will be deleted. It is recommended to create a new folder to serve as the output directory. This should be supplied in the form ‘/path/to/file.csv’, where file.csv will store coefficients. The name of the target will be appended at runtime.
- custom_lig_path: Path to .txt file containing a custom list of ligands. Each ligand should have its own line
in the .txt file.
- ligand: Alternative to the custom ligand path, can be used to provide a single ligand or a list of ligands (separated by whitespace in the command line).
- custom_rec_path: Path to .txt file containing a custom list of receptors. Each receptor should have its own
line in the .txt file.
- receptor: Alternative to the custom receptor path, can be used to provide a single receptor or a list of
receptors (separated by whitespace in the command line).
- custom_pathways_path: Path to .txt file containing a custom list of pathways. Each pathway should have its own
line in the .txt file.
- pathway: Alternative to the custom pathway path, can be used to provide a single pathway or a list of pathways (separated by whitespace in the command line).
- targets_path: Path to .txt file containing a custom list of targets. Each target should have its own line in
the .txt file.
- target: Alternative to the custom target path, can be used to provide a single target or a list of targets (separated by whitespace in the command line).
- init_betas_path: Optional path to a .json file or .csv file containing initial coefficient values for the model
for each target variable. If encoded in .json, keys should be target gene names, values should be numpy arrays containing coefficients. If encoded in .csv, columns should be target gene names. Initial coefficients should have shape [n_features, ].
- normalize: Flag to perform library size normalization, to set total counts in each cell to the same
number (adjust for cell size). Will be set to True if provided.
- smooth: Flag to correct for dropout effects by leveraging gene expression neighborhoods to smooth
expression. It is advisable not to do this if performing Poisson or negative binomial regression. Will be set to True if provided.
- log_transform: Flag for whether log-transformation should be applied to expression. It is advisable not to do
this if performing Poisson or negative binomial regression. Will be set to True if provided.
- normalize_signaling: Flag to minmax scale the final ligand expression array (for :attr mod_type =
“ligand”), or the final ligand-receptor array (for :attr mod_type = “lr”). This is recommended to associate downstream expression with rarer/less prevalent signaling mechanisms.
- target_expr_threshold: Only used when automatically selecting targets- finds the L:R-downstream TFs and their
targets and searches for expression above a threshold proportion of cells to filter to a subset of candidate target genes. This argument sets that proportion, and defaults to 0.05.
- multicollinear_threshold: Variance inflation factor threshold used to filter out multicollinear features. A
value of 5 or 10 is recommended.
- coords_key: Entry in adata.obsm that contains spatial coordinates. Defaults to “spatial”.
- group_key: Entry in adata.obs that contains cell type labels. Required for ‘mod_type’ = “niche”.
- group_subset: Subset of cell types to include in the model (provided as a whitespace-separated list in the command line). If given, will consider only cells of these types in modeling. Defaults to all cell types.
- covariate_keys: Entries in adata.obs or adata.var that contain covariates to include in the model. Can be provided as a whitespace-separated list in the command line. Numerical covariates should be minmax scaled between 0 and 1.
- total_counts_key: Entry in adata.obs that contains total counts for each cell. Required if subsetting by total counts. Defaults to “total_counts”.
- total_counts_threshold: Threshold for total counts to subset cells by- cells with total counts greater than
this threshold will be retained.
- bw: Bandwidth for kernel density estimation. Consists of either a distance value or N for the number of nearest neighbors, depending on bw_fixed
- minbw: For use in automated bandwidth selection- the lower-bound bandwidth to test.
- maxbw: For use in automated bandwidth selection- the upper-bound bandwidth to test.
- bw_fixed: Flag to use a fixed bandwidth (True) or to automatically select a bandwidth (False). This should be True if the input to/values to test for bw are distance values, and False if they are numbers of neighbors.
- exclude_self: Flag to exclude the target cell from the neighborhood when computing spatial weights. Note that if True and bw is defined by the number of neighbors, your desired bw should be 1 + the number of neighbors you want to include.
- kernel: Type of kernel function used to weight observations when computing spatial weights and fitting the model; one of “bisquare”, “exponential”, “gaussian”, “quadratic”, “triangular” or “uniform”.
- distance_membrane_bound: In model setup, distance threshold to consider cells as neighbors for membrane-bound
ligands. If provided, will take priority over :attr ‘n_neighbors_membrane_bound’.
- distance_secreted: In model setup, distance threshold to consider cells as neighbors for secreted or ECM
ligands. If provided, will take priority over :attr ‘n_neighbors_secreted’.
- n_neighbors_membrane_bound: For mod_type “ligand” or “lr”- ligand expression will be taken from the neighboring cells- this defines the number of cells to use for membrane-bound ligands. Defaults to 8.
- n_neighbors_secreted: For mod_type “ligand” or “lr”- ligand expression will be taken from the neighboring cells- this defines the number of cells to use for secreted or ECM ligands.
- distr: Distribution family for the dependent variable; one of “gaussian”, “poisson”, “nb”
- fit_intercept: Flag to fit an intercept term in the model. Will be set to True if provided.
- tolerance: Convergence tolerance for IWLS
- max_iter: Maximum number of iterations for IWLS
- patience: When checking various values for the bandwidth, this is the number of iterations to wait for without the score changing before stopping. Defaults to 5.
- ridge_lambda: Sets the strength of the regularization, between 0 and 1. Higher values will typically result in more features being removed.
- search_bw: For downstream analysis; specifies the bandwidth to search for senders/receivers. Recommended to
set equal to the bandwidth of a fitted model.
- top_k_receivers: For downstream analysis, specifically when constructing vector fields of signaling effects.
Specifies the number of nearest neighbors to consider when computing signaling effect vectors.
- filter_targets: For downstream analysis, specifically :func infer_effect_direction; if True, will subset to
only the targets that were predicted well by the model.
- filter_target_threshold: For downstream analysis, specifically :func infer_effect_direction; specifies the
threshold Pearson coefficient for target subsetting. Only used if filter_targets is True.
- diff_sending_or_receiving: For downstream analyses, specifically :func sender_receiver_effect_deg_detection; specifies whether to compute differential expression of genes in cells with high or low sending effect potential (‘sending cells’) or high or low receiving effect potential (‘receiving cells’).
- target_for_downstream: A string or a list (provided as a whitespace-separated list in the command line) of target genes for :func get_effect_potential, :func get_pathway_potential and :func calc_and_group_sender_receiver_effect_degs (provide only one target), as well as :func compute_cell_type_coupling (can provide multiple targets).
- ligand_for_downstream: For downstream analyses; used for :func get_effect_potential and :func
calc_and_group_sender_receiver_effect_degs, used to specify the ligand gene to consider with respect to the target.
- receptor_for_downstream: For downstream analyses; used for :func get_effect_potential and :func
calc_and_group_sender_receiver_effect_degs, used to specify the receptor gene to consider with respect to the target.
- pathway_for_downstream: For downstream analyses; used for :func get_pathway_potential and :func
calc_and_group_sender_receiver_effect_degs, used to specify the pathway to consider with respect to the target.
- sender_ct_for_downstream: For downstream analyses; used for :func get_effect_potential and :func
calc_and_group_sender_receiver_effect_degs, used to specify the cell type to consider as a sender.
- receiver_ct_for_downstream: For downstream analyses; used for :func get_effect_potential and :func
calc_and_group_sender_receiver_effect_degs, used to specify the cell type to consider as a receiver.
- n_components: Used for :func CCI_sender_deg_detection and :func CCI_receiver_deg_detection;
determines the dimensionality of the space to embed into using UMAP.
- cci_degs_model_interactions: Used for :func CCI_sender_deg_detection; if True, will consider transcription
factor interactions with cofactors and other transcription factors, with these interactions combined into features. If False, will use each cofactor independently in the prediction.
- no_cell_type_markers: Used for :func CCI_receiver_deg_detection; if True, will exclude cell type markers
from the set of genes for which to compare to sent/received signal.
- compute_pathway_effect: Used for :func inferred_effect_direction; if True, will summarize the effects of all
ligands/ligand-receptor interactions in a pathway.
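The `multicollinear_threshold` argument above filters features by variance inflation factor (VIF). As a reminder of what that quantity measures, the sketch below computes VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on all other features with an intercept. The helper `vif` and the simulated data are assumptions made for illustration; Spateo's own filtering procedure may differ (for example, it may remove features iteratively).

```python
import numpy as np

def vif(X):
    """Hypothetical helper: variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    out = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        y = X[:, j]
        # Regress feature j on the remaining features plus an intercept
        A = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if np.isclose(r2, 1.0) else 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.05 * rng.normal(size=200)  # nearly collinear with a
X = np.column_stack([a, b, c])

vifs = vif(X)
flagged = vifs > 5  # features exceeding an example threshold of 5
print(vifs, flagged)
```

Note that both members of a collinear pair are flagged here; a practical filter would usually drop one feature at a time and recompute the VIFs.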
- Returns:
parser: Argparse object defining important arguments for model fitting and interpretation
args_list: If argparse object is returned from a function, the parser must read in arguments in the form of a list- this return contains that processed list.
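To illustrate why the arguments are returned as a separate list, the sketch below shows how a standard `argparse` parser reads a pre-built list, including a whitespace-separated list argument and a flag that becomes True when provided. The parser here is a hypothetical stand-in, and the flag spellings (`--mod_type`, `--target`, `--normalize`) and gene names are illustrative assumptions, not necessarily the spellings Spateo registers.

```python
import argparse

# Hypothetical stand-in for the parser returned by define_spateo_argparse
parser = argparse.ArgumentParser()
parser.add_argument("--mod_type", type=str, default=None)
# nargs="+" collects one or more whitespace-separated values into a list
parser.add_argument("--target", nargs="+", type=str, default=None)
# store_true flags default to False and become True when present
parser.add_argument("--normalize", action="store_true")

# When not reading from the command line, arguments are passed as a list of strings
args_list = ["--mod_type", "lr", "--target", "GENE_A", "GENE_B", "--normalize"]
args = parser.parse_args(args_list)

print(args.mod_type, args.target, args.normalize)
```

Passing the list explicitly to `parse_args` avoids reading `sys.argv`, which is what makes it possible to configure a parser from within a function call or notebook.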