spateo.tools.CCI_effects_modeling.SWR
Attributes
Functions
define_spateo_argparse(**kwargs): Defines and returns MPI and argparse objects for model fitting and interpretation.
Module Contents
- spateo.tools.CCI_effects_modeling.SWR.define_spateo_argparse(**kwargs)
Defines and returns MPI and argparse objects for model fitting and interpretation.
- Parameters:
- kwargs
Keyword arguments for any of the argparse arguments defined below.
- Parser arguments:
- run_upstream: Flag to run the upstream target selection step. If True, the target selection step will be run.
- adata_path: Path to the AnnData object containing gene expression data. Either this or ‘csv_path’ must be given to run.
- csv_path: Path to the .csv file containing gene expression data. Either this or ‘adata_path’ must be given to run.
- n_spatial_dim_csv: Number of spatial dimensions in the data provided to ‘csv_path’. Defaults to 2.
- spatial_subsample: Flag to subsample the data. At a high level, this is done by dividing the tissue into regions and subsampling from each of these regions. Recommended for large datasets (>5000 samples).
- multiscale: Flag to create multiscale models. Currently, it is recommended to only create multiscale models
for Gaussian data.
- multiscale_params_only: Flag to return additional metrics along with the coefficients for multiscale models (specifying this argument sets the flag to True).
- mod_type: The type of model that will be employed; this dictates how the data will be processed and prepared. Options:
  - “niche”: Spatially-aware, uses categorical cell type labels as independent variables.
  - “lr”: Spatially-aware, essentially uses the combination of receptor expression in the “target” cell and spatially lagged ligand expression in the neighboring cells as independent variables.
  - “ligand”: Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables.
  - “receptor”: Uses receptor expression in the “target” cell as independent variables.
  - “downstream”: For the purposes of downstream analysis, used to model ligand expression as a function of upstream regulators.
- include_unpaired_lr: Only used if mod_type is “lr”. If True, will include individual ligands/complexes and individual receptors in the design matrix if their cognate interacting partners cannot also be found.
- cci_dir: Path to the directory containing cell-cell interaction databases.
- species: Selects the cell-cell communication database the relevant ligands will be drawn from. Options: “human”, “mouse”.
- output_path: Full path to the .csv file in which results will be saved. Make sure the parent directory is empty, because any existing files will be deleted; it is recommended to create a new folder to serve as the output directory. The path should be supplied in the form ‘/path/to/file.csv’, where file.csv will store coefficients. The name of the target will be appended at runtime.
- custom_lig_path: Path to .txt file containing a custom list of ligands. Each ligand should have its own line in the .txt file.
- ligand: Alternative to the custom ligand path, can be used to provide a single ligand or a list of ligands (separated by whitespace in the command line).
- custom_rec_path: Path to .txt file containing a custom list of receptors. Each receptor should have its own line in the .txt file.
- receptor: Alternative to the custom receptor path, can be used to provide a single receptor or a list of receptors (separated by whitespace in the command line).
- custom_pathways_path: Path to .txt file containing a custom list of pathways. Each pathway should have its own line in the .txt file.
- pathway: Alternative to the custom pathway path, can be used to provide a single pathway or a list of pathways (separated by whitespace in the command line).
- targets_path: Path to .txt file containing a custom list of targets. Each target should have its own line in the .txt file.
- target: Alternative to the custom target path, can be used to provide a single target or a list of targets (separated by whitespace in the command line).
- init_betas_path: Optional path to a .json file or .csv file containing initial coefficient values for the model
for each target variable. If encoded in .json, keys should be target gene names, values should be numpy arrays containing coefficients. If encoded in .csv, columns should be target gene names. Initial coefficients should have shape [n_features, ].
- normalize: Flag to perform library size normalization, to set total counts in each cell to the same
number (adjust for cell size). Will be set to True if provided.
- smooth: Flag to correct for dropout effects by leveraging gene expression neighborhoods to smooth
expression. It is advisable not to do this if performing Poisson or negative binomial regression. Will be set to True if provided.
- log_transform: Flag for whether log-transformation should be applied to expression. It is advisable not to do
this if performing Poisson or negative binomial regression. Will be set to True if provided.
- normalize_signaling: Flag to minmax scale the final ligand expression array (for mod_type = “ligand”), or the final ligand-receptor array (for mod_type = “lr”). This is recommended to associate downstream expression with rarer/less prevalent signaling mechanisms.
- target_expr_threshold: Only used when automatically selecting targets- finds the L:R-downstream TFs and their
targets and searches for expression above a threshold proportion of cells to filter to a subset of candidate target genes. This argument sets that proportion, and defaults to 0.05.
- multicollinear_threshold: Variance inflation factor threshold used to filter out multicollinear features. A
value of 5 or 10 is recommended.
- coords_key: Entry in adata.obsm that contains spatial coordinates. Defaults to “spatial”.
- group_key: Entry in adata.obs that contains cell type labels. Required for ‘mod_type’ = “niche”.
- group_subset: Subset of cell types to include in the model (provided as a whitespace-separated list in the command line). If given, will consider only cells of these types in modeling. Defaults to all cell types.
- covariate_keys: Entries in adata.obs or adata.var that contain covariates to include in the model. Can be provided as a whitespace-separated list in the command line. Numerical covariates should be minmax scaled between 0 and 1.
- total_counts_key: Entry in adata.obs that contains total counts for each cell. Required if subsetting by total counts. Defaults to “total_counts”.
- total_counts_threshold: Threshold for total counts to subset cells by- cells with total counts greater than
this threshold will be retained.
- bw: Bandwidth for kernel density estimation. Consists of either a distance value or N for the number of nearest neighbors, depending on bw_fixed.
- minbw: For use in automated bandwidth selection; the lower-bound bandwidth to test.
- maxbw: For use in automated bandwidth selection; the upper-bound bandwidth to test.
- bw_fixed: Flag to use a fixed bandwidth (True) or to automatically select a bandwidth (False). This should be True if the input to/values to test for bw are distance values, and False if they are numbers of neighbors.
- exclude_self: Flag to exclude the target cell from the neighborhood when computing spatial weights. Note that if True and bw is defined by the number of neighbors, your desired bw should be 1 + the number of neighbors you want to include.
- kernel: Type of kernel function used to weight observations when computing spatial weights and fitting the model; one of “bisquare”, “exponential”, “gaussian”, “quadratic”, “triangular” or “uniform”.
- distance_membrane_bound: In model setup, distance threshold to consider cells as neighbors for membrane-bound ligands. If provided, will take priority over ‘n_neighbors_membrane_bound’.
- distance_secreted: In model setup, distance threshold to consider cells as neighbors for secreted or ECM ligands. If provided, will take priority over ‘n_neighbors_secreted’.
- n_neighbors_membrane_bound: For mod_type “ligand” or “lr”, ligand expression will be taken from the neighboring cells; this defines the number of cells to use for membrane-bound ligands. Defaults to 8.
- n_neighbors_secreted: For mod_type “ligand” or “lr”, ligand expression will be taken from the neighboring cells; this defines the number of cells to use for secreted or ECM ligands.
- distr: Distribution family for the dependent variable; one of “gaussian”, “poisson”, “nb”.
- fit_intercept: Flag to fit an intercept term in the model. Will be set to True if provided.
- tolerance: Convergence tolerance for IWLS.
- max_iter: Maximum number of iterations for IWLS.
- patience: When checking various values for the bandwidth, this is the number of iterations to wait without the score changing before stopping. Defaults to 5.
- ridge_lambda: Sets the strength of the regularization, between 0 and 1. Higher values will typically result in more features being removed.
- search_bw: For downstream analysis; specifies the bandwidth to search for senders/receivers. Recommended to
set equal to the bandwidth of a fitted model.
- top_k_receivers: For downstream analysis, specifically when constructing vector fields of signaling effects.
Specifies the number of nearest neighbors to consider when computing signaling effect vectors.
- filter_targets: For downstream analysis, specifically infer_effect_direction(); if True, will subset to only the targets that were predicted well by the model.
- filter_target_threshold: For downstream analysis, specifically infer_effect_direction(); specifies the threshold Pearson coefficient for target subsetting. Only used if filter_targets is True.
- diff_sending_or_receiving: For downstream analyses, specifically sender_receiver_effect_deg_detection(); specifies whether to compute differential expression of genes in cells with high or low sending effect potential (‘sending cells’) or high or low receiving effect potential (‘receiving cells’).
- target_for_downstream: A string or a list (provided as a whitespace-separated list in the command line) of target genes for get_effect_potential(), get_pathway_potential() and calc_and_group_sender_receiver_effect_degs() (provide only one target), as well as compute_cell_type_coupling() (can provide multiple targets).
- ligand_for_downstream: For downstream analyses; used for get_effect_potential() and calc_and_group_sender_receiver_effect_degs() to specify the ligand gene to consider with respect to the target.
- receptor_for_downstream: For downstream analyses; used for get_effect_potential() and calc_and_group_sender_receiver_effect_degs() to specify the receptor gene to consider with respect to the target.
- pathway_for_downstream: For downstream analyses; used for get_pathway_potential() and calc_and_group_sender_receiver_effect_degs() to specify the pathway to consider with respect to the target.
- sender_ct_for_downstream: For downstream analyses; used for get_effect_potential() and calc_and_group_sender_receiver_effect_degs() to specify the cell type to consider as a sender.
- receiver_ct_for_downstream: For downstream analyses; used for get_effect_potential() and calc_and_group_sender_receiver_effect_degs() to specify the cell type to consider as a receiver.
- n_components: Used for CCI_sender_deg_detection() and CCI_receiver_deg_detection(); determines the dimensionality of the space to embed into using UMAP.
- cci_degs_model_interactions: Used for CCI_sender_deg_detection(); if True, will consider transcription factor interactions with cofactors and other transcription factors, with these interactions combined into features. If False, will use each cofactor independently in the prediction.
- no_cell_type_markers: Used for CCI_receiver_deg_detection(); if True, will exclude cell type markers from the set of genes for which to compare to sent/received signal.
- compute_pathway_effect: Used for inferred_effect_direction(); if True, will summarize the effects of all ligands/ligand-receptor interactions in a pathway.
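To make the kernel and bw arguments more concrete, the following is a minimal sketch of how the six named kernels are commonly defined in geographically weighted regression. The formulas are textbook forms and are an assumption here; they are not taken from the Spateo source, and the function name kernel_weight is hypothetical. The sketch treats bw as a distance (the bw_fixed = True case); for a neighbor-count bandwidth, one common convention (also an assumption) is to use the distance to the N-th nearest neighbor as the effective bandwidth.

```python
import math

# Kernel names as listed for the `kernel` argument.
KERNELS = ("bisquare", "exponential", "gaussian", "quadratic", "triangular", "uniform")


def kernel_weight(distance, bw, kernel="bisquare"):
    """Return a textbook kernel weight for an observation at `distance`.

    Hypothetical helper for illustration only; Spateo's own implementation
    may scale or truncate the kernels differently.
    """
    if bw <= 0:
        raise ValueError("bw must be positive")
    if kernel not in KERNELS:
        raise ValueError(f"unknown kernel: {kernel}")

    z = abs(distance) / bw  # distance in units of the bandwidth

    # Kernels with infinite support: weights decay but never reach zero.
    if kernel == "gaussian":
        return math.exp(-0.5 * z * z)
    if kernel == "exponential":
        return math.exp(-z)

    # Remaining kernels have compact support: zero weight beyond the bandwidth.
    if z > 1.0:
        return 0.0
    if kernel == "bisquare":
        return (1.0 - z * z) ** 2
    if kernel == "quadratic":
        return 1.0 - z * z
    if kernel == "triangular":
        return 1.0 - z
    return 1.0  # uniform: every observation inside the bandwidth counts equally


# Weights at a few distances for a distance bandwidth of 10.
distances = [0.0, 5.0, 10.0, 20.0]
weights = {k: [kernel_weight(d, 10.0, k) for d in distances] for k in KERNELS}
```

In this sketch the compact-support kernels ignore anything farther than bw, which is why the choice of bandwidth and kernel together determines how local each fitted model is.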
- Returns:
parser: Argparse object defining important arguments for model fitting and interpretation.
args_list: If the argparse object is returned from a function, the parser must read in arguments in the form of a list; this return value contains that processed list.
- Return type:
parser
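The relationship between keyword arguments, the returned parser and args_list can be illustrated with the standard library alone. The sketch below is not the Spateo implementation: build_toy_parser and kwargs_to_args_list are hypothetical helpers, the "--" prefix on option names is an assumption, and only four of the documented arguments are mirrored. It shows the general pattern of turning keyword arguments into a list of strings that argparse can read through parse_args.

```python
import argparse


def build_toy_parser():
    """Toy stand-in for the real parser (hypothetical, four arguments only)."""
    parser = argparse.ArgumentParser(description="Toy parser for illustration")
    parser.add_argument("--mod_type", type=str, default="niche")
    # Flags are set to True simply by being present.
    parser.add_argument("--normalize", action="store_true")
    # Whitespace-separated lists map onto nargs="+".
    parser.add_argument("--target", nargs="+", type=str)
    parser.add_argument("--bw", type=float)
    return parser


def kwargs_to_args_list(**kwargs):
    """Flatten keyword arguments into the list-of-strings form argparse reads."""
    args_list = []
    for key, value in kwargs.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            if value:  # a False flag is expressed by omitting it
                args_list.append(flag)
        elif isinstance(value, (list, tuple)):
            args_list.append(flag)
            args_list.extend(str(v) for v in value)
        else:
            args_list.extend([flag, str(value)])
    return args_list


parser = build_toy_parser()
args_list = kwargs_to_args_list(
    mod_type="lr", normalize=True, target=["GeneA", "GeneB"], bw=25
)
# Passing an explicit list keeps argparse from reading sys.argv.
args = parser.parse_args(args_list)
```

Passing the list explicitly is what allows a parser to be built and used inside a function or notebook, where sys.argv does not hold the intended arguments.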