spateo.tools.CCI_effects_modeling.SWR

Attributes

Functions

define_spateo_argparse(**kwargs)

Defines and returns MPI and argparse objects for model fitting and interpretation.

Module Contents

spateo.tools.CCI_effects_modeling.SWR.define_spateo_argparse(**kwargs)

Defines and returns MPI and argparse objects for model fitting and interpretation.

Parameters:
kwargs

Keyword arguments for any of the argparse arguments defined below.

Parser arguments:

run_upstream: Flag to run the upstream target selection step.

adata_path: Path to AnnData object containing gene expression data. This or ‘csv_path’ must be given to run.

csv_path: Path to .csv file containing gene expression data. This or ‘adata_path’ must be given to run.

n_spatial_dim_csv: Number of spatial dimensions in the data provided to ‘csv_path’. Defaults to 2.

spatial_subsample: Flag to subsample the data. At a high level, this is done by dividing the tissue into regions and subsampling from each of these regions. Recommended for large datasets (>5000 samples).

multiscale: Flag to create multiscale models. Currently, it is recommended to create multiscale models only for Gaussian data.

multiscale_params_only: Flag to return additional metrics along with the coefficients for multiscale models (specifying this argument sets the flag to True).

mod_type: The type of model that will be employed; this dictates how the data will be processed and prepared. Options:
  • “niche”: Spatially-aware, uses categorical cell type labels as independent variables.

  • “lr”: Spatially-aware, uses the combination of receptor expression in the “target” cell and spatially lagged ligand expression in the neighboring cells as independent variables.

  • “ligand”: Spatially-aware, uses ligand expression in the neighboring cells as independent variables.

  • “receptor”: Uses receptor expression in the “target” cell as independent variables.

  • “downstream”: For the purposes of downstream analysis, used to model ligand expression as a function of upstream regulators.

include_unpaired_lr: Only used if mod_type is “lr”. If True, will include individual ligands/complexes and individual receptors in the design matrix if their cognate interacting partners cannot also be found.

cci_dir: Path to directory containing cell-cell interaction databases.

species: Selects the cell-cell communication database the relevant ligands will be drawn from. Options: “human”, “mouse”.

output_path: Full path name for the .csv file in which results will be saved. Make sure the parent directory is empty; any existing files will be deleted. It is recommended to create a new folder to serve as the output directory. This should be supplied in the form ‘/path/to/file.csv’, where file.csv will store coefficients. The name of the target will be appended at runtime.

custom_lig_path: Path to .txt file containing a custom list of ligands. Each ligand should have its own line in the .txt file.

ligand: Alternative to the custom ligand path, can be used to provide a single ligand or a list of ligands (separated by whitespace in the command line).

custom_rec_path: Path to .txt file containing a custom list of receptors. Each receptor should have its own line in the .txt file.

receptor: Alternative to the custom receptor path, can be used to provide a single receptor or a list of receptors (separated by whitespace in the command line).

custom_pathways_path: Path to .txt file containing a custom list of pathways. Each pathway should have its own line in the .txt file.

pathway: Alternative to the custom pathway path, can be used to provide a single pathway or a list of pathways (separated by whitespace in the command line).

targets_path: Path to .txt file containing a custom list of targets. Each target should have its own line in the .txt file.

target: Alternative to the custom target path, can be used to provide a single target or a list of targets (separated by whitespace in the command line).
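The two ways of supplying ligands, receptors, pathways or targets (a .txt file with one name per line, or an inline list) can be sketched as follows. This is a minimal illustration, not spateo's own loader: the helper names `read_name_list` and `resolve_names` are hypothetical, and the choice to let the file take priority when both are given is an assumption.

```python
import tempfile
from pathlib import Path


def read_name_list(path):
    """Return the non-empty, stripped lines of a text file (one name per line)."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def resolve_names(list_path=None, inline_names=None):
    """Hypothetical helper: prefer the file if given, else use the inline list."""
    if list_path is not None:
        return read_name_list(list_path)
    return list(inline_names or [])


# Write a small example file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    ligand_file = Path(tmp) / "ligands.txt"
    ligand_file.write_text("LIG_A\nLIG_B\n\nLIG_C\n")
    from_file = resolve_names(list_path=ligand_file)

# A whitespace-separated command-line value splits into a list the same way.
from_cli = resolve_names(inline_names="LIG_A LIG_B".split())
```

Blank lines in the file are skipped here so that a trailing newline does not produce an empty name.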

init_betas_path: Optional path to a .json file or .csv file containing initial coefficient values for the model for each target variable. If encoded in .json, keys should be target gene names, values should be numpy arrays containing coefficients. If encoded in .csv, columns should be target gene names. Initial coefficients should have shape [n_features, ].
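As a rough sketch of the .json layout described for init_betas_path, the snippet below builds a mapping from target gene names to coefficient vectors of length n_features and round-trips it through JSON. The gene names and values are made up for illustration. JSON cannot store numpy arrays directly, so plain lists are used here; whether and how the loader converts them back to arrays is not stated in this reference and is assumed.

```python
import json

n_features = 3  # illustrative number of model features

# Hypothetical initial coefficients: one vector of length n_features per target.
init_betas = {
    "GENE_A": [0.0, 0.5, -0.2],
    "GENE_B": [0.1, 0.0, 0.3],
}

# Check the documented shape [n_features, ] before writing anything out.
for target_name, coeffs in init_betas.items():
    if len(coeffs) != n_features:
        raise ValueError(f"{target_name}: expected {n_features} coefficients")

encoded = json.dumps(init_betas, indent=2)  # text that would go in the .json file
decoded = json.loads(encoded)               # what a reader would get back
```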

normalize: Flag to perform library size normalization, to set total counts in each cell to the same number (adjust for cell size). Will be set to True if provided.

smooth: Flag to correct for dropout effects by leveraging gene expression neighborhoods to smooth expression. It is advisable not to do this if performing Poisson or negative binomial regression. Will be set to True if provided.

log_transform: Flag for whether log-transformation should be applied to expression. It is advisable not to do this if performing Poisson or negative binomial regression. Will be set to True if provided.

normalize_signaling: Flag to minmax scale the final ligand expression array (for mod_type = “ligand”), or the final ligand-receptor array (for mod_type = “lr”). This is recommended to associate downstream expression with rarer/less prevalent signaling mechanisms.

target_expr_threshold: Only used when automatically selecting targets. Finds the L:R-downstream TFs and their targets and searches for expression above a threshold proportion of cells to filter to a subset of candidate target genes. This argument sets that proportion, and defaults to 0.05.

multicollinear_threshold: Variance inflation factor threshold used to filter out multicollinear features. A value of 5 or 10 is recommended.

coords_key: Entry in adata .obsm that contains spatial coordinates. Defaults to “spatial”.

group_key: Entry in adata .obs that contains cell type labels. Required for ‘mod_type’ = “niche”.

group_subset: Subset of cell types to include in the model (provided as a whitespace-separated list in command line). If given, will consider only cells of these types in modeling. Defaults to all cell types.

covariate_keys: Entries in adata .obs or adata .var that contain covariates to include in the model. Can be provided as a whitespace-separated list in the command line. Numerical covariates should be minmax scaled between 0 and 1.

total_counts_key: Entry in adata .obs that contains total counts for each cell. Required if subsetting by total counts. Defaults to “total_counts”.

total_counts_threshold: Threshold for total counts to subset cells by; cells with total counts greater than this threshold will be retained.

bw: Bandwidth for kernel density estimation. Consists of either a distance value or N for the number of nearest neighbors, depending on bw_fixed.

minbw: For use in automated bandwidth selection; the lower-bound bandwidth to test.

maxbw: For use in automated bandwidth selection; the upper-bound bandwidth to test.

bw_fixed: Flag to use a fixed bandwidth (True) or to automatically select a bandwidth (False). This should be True if the input to/values to test for bw are distance values, and False if they are numbers of neighbors.

exclude_self: Flag to exclude the target cell from the neighborhood when computing spatial weights. Note that if True and bw is defined by the number of neighbors, your desired bw should be 1 + the number of neighbors you want to include.
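The "1 + the number of neighbors" note for exclude_self can be illustrated with a small sketch. It assumes the neighborhood is formed by taking the bw nearest cells (the target cell itself being nearest, at distance 0) and then dropping the target cell; this is an interpretation of the note, not a description of the library's internals, and the function name is hypothetical.

```python
import math


def neighbors_within_bw(coords, index, bw, exclude_self):
    """Take the bw nearest cells to coords[index], optionally dropping the cell itself."""
    order = sorted(range(len(coords)), key=lambda j: math.dist(coords[index], coords[j]))
    nearest = order[:bw]  # includes the target cell, which is at distance 0
    if exclude_self:
        nearest = [j for j in nearest if j != index]
    return nearest


coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]

# To end up with 2 true neighbors of cell 0, bw is set to 1 + 2 = 3.
with_exclusion = neighbors_within_bw(coords, index=0, bw=3, exclude_self=True)
without_exclusion = neighbors_within_bw(coords, index=0, bw=3, exclude_self=False)
```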

kernel: Type of kernel function used to weight observations when computing spatial weights and fitting the model; one of “bisquare”, “exponential”, “gaussian”, “quadratic”, “triangular” or “uniform”.
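For orientation, the kernel names above can be sketched using forms commonly seen in geographically weighted regression, where u = distance / bw. These formulas are assumptions for illustration only; the exact definitions used by this module (for example, the constant returned by the uniform kernel, or whether the compact kernels cut off at u = 1) may differ.

```python
import math

COMPACT_KERNELS = {"bisquare", "quadratic", "triangular", "uniform"}
ALL_KERNELS = COMPACT_KERNELS | {"gaussian", "exponential"}


def kernel_weight(distance, bw, kernel="bisquare"):
    """Weight for an observation at `distance`, given bandwidth `bw` (illustrative forms)."""
    if kernel not in ALL_KERNELS:
        raise ValueError(f"unknown kernel: {kernel}")
    u = distance / bw
    if kernel == "gaussian":
        return math.exp(-0.5 * u ** 2)
    if kernel == "exponential":
        return math.exp(-u)
    if u > 1:
        return 0.0  # compact kernels ignore observations beyond the bandwidth
    if kernel == "bisquare":
        return (1 - u ** 2) ** 2
    if kernel == "quadratic":
        return 1 - u ** 2
    if kernel == "triangular":
        return 1 - u
    return 1.0  # uniform: equal weight inside the bandwidth


weights_at_half_bw = {name: kernel_weight(5.0, 10.0, name) for name in sorted(ALL_KERNELS)}
```

In this sketch the Gaussian and exponential kernels give every observation a non-zero weight, while the other four drop observations beyond the bandwidth entirely.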

distance_membrane_bound: In model setup, distance threshold to consider cells as neighbors for membrane-bound ligands. If provided, will take priority over ‘n_neighbors_membrane_bound’.

distance_secreted: In model setup, distance threshold to consider cells as neighbors for secreted or ECM ligands. If provided, will take priority over ‘n_neighbors_secreted’.

n_neighbors_membrane_bound: For mod_type “ligand” or “lr”, ligand expression will be taken from the neighboring cells; this defines the number of cells to use for membrane-bound ligands. Defaults to 8.

n_neighbors_secreted: For mod_type “ligand” or “lr”, ligand expression will be taken from the neighboring cells; this defines the number of cells to use for secreted or ECM ligands.

distr: Distribution family for the dependent variable; one of “gaussian”, “poisson”, “nb”.

fit_intercept: Flag to fit an intercept term in the model. Will be set to True if provided.

tolerance: Convergence tolerance for IWLS.

max_iter: Maximum number of iterations for IWLS.

patience: When checking various values for the bandwidth, this is the number of iterations to wait for without the score changing before stopping. Defaults to 5.
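The role of patience in a bandwidth search can be sketched as a simple early-stopping loop. This is a generic illustration under stated assumptions: the function name and toy score are hypothetical, a lower score is taken to be better, and "without the score changing" is interpreted as "without the best score improving". The actual search strategy used by the module is not described in this reference.

```python
def search_bandwidth(candidates, score_fn, patience=5):
    """Try candidate bandwidths in order; stop after `patience` non-improving steps."""
    best_bw, best_score = None, float("inf")
    stalled = 0
    for bw in candidates:
        score = score_fn(bw)
        if score < best_score:
            best_bw, best_score = bw, score
            stalled = 0  # progress was made, so reset the counter
        else:
            stalled += 1
            if stalled >= patience:
                break
    return best_bw, best_score


evaluated = []


def toy_score(bw):
    """Toy score with its minimum at bw = 30; records which values were tried."""
    evaluated.append(bw)
    return (bw - 30) ** 2


best_bw, best_score = search_bandwidth(range(10, 101, 10), toy_score, patience=2)
```

With patience=2 the loop gives up after two consecutive candidates fail to beat the best score, so the larger bandwidths in the range are never evaluated.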

ridge_lambda: Sets the strength of the regularization, between 0 and 1. Higher values typically result in more features being removed.

search_bw: For downstream analysis; specifies the bandwidth to search for senders/receivers. Recommended to set equal to the bandwidth of a fitted model.

top_k_receivers: For downstream analysis, specifically when constructing vector fields of signaling effects. Specifies the number of nearest neighbors to consider when computing signaling effect vectors.

filter_targets: For downstream analysis, specifically infer_effect_direction; if True, will subset to only the targets that were predicted well by the model.

filter_target_threshold: For downstream analysis, specifically infer_effect_direction; specifies the threshold Pearson coefficient for target subsetting. Only used if filter_targets is True.
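Filtering targets by a Pearson threshold can be sketched as below. The snippet assumes the coefficient is computed between observed and model-predicted expression for each target and that targets at or above the threshold are kept; both points, along with the function names and toy data, are assumptions for illustration rather than documented behavior.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # correlation is undefined for constant input; treated as 0 here
    return cov / (sd_x * sd_y)


def filter_well_predicted(observed, predicted, threshold):
    """Keep targets whose observed/predicted correlation meets the threshold."""
    return [t for t in observed if pearson(observed[t], predicted[t]) >= threshold]


observed = {"GENE_A": [1.0, 2.0, 3.0, 4.0], "GENE_B": [1.0, 2.0, 3.0, 4.0]}
predicted = {"GENE_A": [1.1, 1.9, 3.2, 3.9], "GENE_B": [4.0, 1.0, 3.0, 2.0]}

kept = filter_well_predicted(observed, predicted, threshold=0.65)
```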

diff_sending_or_receiving: For downstream analyses, specifically sender_receiver_effect_deg_detection; specifies whether to compute differential expression of genes in cells with high or low sending effect potential (‘sending cells’) or high or low receiving effect potential (‘receiving cells’).

target_for_downstream: A string or a list (provided as a whitespace-separated list in the command line) of target genes for get_effect_potential, get_pathway_potential and calc_and_group_sender_receiver_effect_degs (provide only one target), as well as compute_cell_type_coupling (can provide multiple targets).

ligand_for_downstream: For downstream analyses; used for get_effect_potential and calc_and_group_sender_receiver_effect_degs, used to specify the ligand gene to consider with respect to the target.

receptor_for_downstream: For downstream analyses; used for get_effect_potential and calc_and_group_sender_receiver_effect_degs, used to specify the receptor gene to consider with respect to the target.

pathway_for_downstream: For downstream analyses; used for get_pathway_potential and calc_and_group_sender_receiver_effect_degs, used to specify the pathway to consider with respect to the target.

sender_ct_for_downstream: For downstream analyses; used for get_effect_potential and calc_and_group_sender_receiver_effect_degs, used to specify the cell type to consider as a sender.

receiver_ct_for_downstream: For downstream analyses; used for get_effect_potential and calc_and_group_sender_receiver_effect_degs, used to specify the cell type to consider as a receiver.

n_components: Used for CCI_sender_deg_detection and CCI_receiver_deg_detection; determines the dimensionality of the space to embed into using UMAP.

cci_degs_model_interactions: Used for CCI_sender_deg_detection; if True, will consider transcription factor interactions with cofactors and other transcription factors, with these interactions combined into features. If False, will use each cofactor independently in the prediction.

no_cell_type_markers: Used for CCI_receiver_deg_detection; if True, will exclude cell type markers from the set of genes for which to compare to sent/received signal.

compute_pathway_effect: Used for inferred_effect_direction; if True, will summarize the effects of all ligands/ligand-receptor interactions in a pathway.

Returns:

parser: Argparse object defining important arguments for model fitting and interpretation.

args_list: If the argparse object is returned from a function, the parser must read in arguments in the form of a list; this return value contains that processed list.
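The relationship between keyword arguments, the parser and the processed argument list can be sketched with the standard library alone. The function below is a hypothetical stand-in, not spateo's implementation: it registers only four of the documented arguments, leaves out the MPI object entirely, and the rule used to convert keyword arguments into command-line tokens is an assumption.

```python
import argparse


def build_parser_sketch(**kwargs):
    """Hypothetical stand-in: returns a parser plus kwargs flattened into a token list."""
    parser = argparse.ArgumentParser(description="Sketch of a kwargs-driven parser")
    parser.add_argument("--mod_type", type=str, default="niche")
    parser.add_argument("--bw", type=float)
    parser.add_argument("--target", nargs="+", type=str)
    parser.add_argument("--normalize", action="store_true")

    args_list = []
    for key, value in kwargs.items():
        flag = f"--{key}"
        if isinstance(value, bool):
            if value:
                args_list.append(flag)  # flags carry no value; presence means True
        elif isinstance(value, (list, tuple)):
            args_list.append(flag)
            args_list.extend(str(v) for v in value)  # whitespace-separated on a CLI
        else:
            args_list.extend([flag, str(value)])
    return parser, args_list


parser, args_list = build_parser_sketch(
    mod_type="lr", bw=12.5, target=["GENE_A", "GENE_B"], normalize=True
)
args = parser.parse_args(args_list)
```

Passing the list to parse_args explicitly is what lets the parser be used from inside a function or notebook, where there is no real command line to read.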

spateo.tools.CCI_effects_modeling.SWR.parser