spateo.tools ============ .. py:module:: spateo.tools Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/spateo/tools/CCI_effects_modeling/index /autoapi/spateo/tools/architype/index /autoapi/spateo/tools/cci_fdr/index /autoapi/spateo/tools/cci_two_cluster/index /autoapi/spateo/tools/cell_communication/index /autoapi/spateo/tools/cluster/index /autoapi/spateo/tools/cluster_degs/index /autoapi/spateo/tools/cluster_lasso/index /autoapi/spateo/tools/coarse_align/index /autoapi/spateo/tools/dimensionality_reduction/index /autoapi/spateo/tools/find_neighbors/index /autoapi/spateo/tools/gene_expression_variance/index /autoapi/spateo/tools/glm/index /autoapi/spateo/tools/labels/index /autoapi/spateo/tools/lisa/index /autoapi/spateo/tools/live_wire/index /autoapi/spateo/tools/roi/index /autoapi/spateo/tools/spatial_correlation/index /autoapi/spateo/tools/spatial_degs/index /autoapi/spateo/tools/spatial_smooth/index /autoapi/spateo/tools/spatially_variable_gene_ot/index /autoapi/spateo/tools/utils/index Classes ------- .. autoapisummary:: spateo.tools.MuSIC spateo.tools.MuSIC_Interpreter spateo.tools.MuSIC_Molecule_Selector spateo.tools.pySTAGATE spateo.tools.Lasso spateo.tools.Label spateo.tools.LiveWireSegmentation Functions --------- .. 
autoapisummary:: spateo.tools.archetypes spateo.tools.archetypes_genes spateo.tools.find_spatial_archetypes spateo.tools.find_spatially_related_genes spateo.tools.get_genes_from_spatial_archetype spateo.tools.define_spateo_argparse spateo.tools.find_cci_two_group spateo.tools.prepare_cci_cellpair_adata spateo.tools.prepare_cci_df spateo.tools.niches spateo.tools.predict_ligand_activities spateo.tools.predict_target_genes spateo.tools.spagcn_vanilla spateo.tools.CAST spateo.tools.kmeans_clustering spateo.tools.mclust_py spateo.tools.scc spateo.tools.smooth spateo.tools.spagcn_pyg spateo.tools.compute_pca_components spateo.tools.ecp_silhouette spateo.tools.integrate spateo.tools.pca_spateo spateo.tools.pearson_residuals spateo.tools.scc spateo.tools.spagcn_pyg spateo.tools.find_all_cluster_degs spateo.tools.find_cluster_degs spateo.tools.find_spatial_cluster_degs spateo.tools.top_n_degs spateo.tools.AffineTrans spateo.tools.align_slices_pca spateo.tools.pca_align spateo.tools.procrustes spateo.tools.construct_nn_graph spateo.tools.neighbors spateo.tools.glm_degs spateo.tools.create_label_class spateo.tools.GM_lag_model spateo.tools.lisa_geo_df spateo.tools.local_moran_i spateo.tools.compute_shortest_path spateo.tools.live_wire spateo.tools.spatial_bv_local_moran spateo.tools.spatial_bv_moran_obs_genes spateo.tools.cellbin_morani spateo.tools.moran_i Package Contents ---------------- .. py:function:: archetypes(adata: anndata.AnnData, moran_i_genes: Union[numpy.ndarray, list], num_clusters: int = 5, layer: Union[str, None] = None) -> numpy.ndarray Identify archetypes from the AnnData object. :param adata: AnnData object of interest. :param moran_i_genes: genes identified as significantly spatially autocorrelated based on Moran's I. :param num_clusters: number of archetypes. :param layer: the layer for the gene expression, can be None which corresponds to adata.X. :returns: the archetypes within the genes with high Moran's I scores. :rtype: archetypes .. 
rubric:: Examples >>> archetypes = st.tl.archetypes(adata) >>> adata.obs = pd.concat((adata.obs, df), 1) >>> arch_cols = adata.obs.columns >>> st.pl.space(adata, basis="spatial", color=arch_cols, pointsize=0.1, alpha=1) .. py:function:: archetypes_genes(adata: anndata.AnnData, archetypes: numpy.ndarray, num_clusters: int, moran_i_genes: Union[numpy.ndarray, list], layer: Union[str, None] = None) -> dict Identify genes that belong to each expression archetype. :param adata: AnnData object of interest. :param archetypes: the archetypes output of find_spatial_archetypes :param num_clusters: number of archetypes. :param moran_i_genes: genes identified as significantly spatially autocorrelated based on Moran's I. :param layer: the layer for the gene expression, can be None which corresponds to adata.X. :returns: a dictionary where the key is the index of the archetype and the values are the top genes for that particular archetype. :rtype: archetypes_dict .. rubric:: Examples >>> st.tl.archetypes_genes(adata) >>> dyn.pl.scatters(subset_adata, >>> basis="spatial", >>> color=['archetype %d'% i] + typical_genes.to_list(), >>> pointsize=0.03, >>> alpha=1, >>> figsize=(3, ptp_vec[1]/ptp_vec[0] * 3) >>> ) .. py:function:: find_spatial_archetypes(num_clusters: int, exp_mat: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] Clusters the expression data and finds gene archetypes. The current implementation is based on hierarchical clustering with the Ward method. Each archetype is simply the average of the genes belonging to the same cluster. :param num_clusters: number of gene clusters or archetypes. :param exp_mat: expression matrix. Rows are genes and columns are buckets. :returns: Returns the archetypes, the gene sets (clusters) and the Pearson correlations of every gene with respect to each archetype. .. 
py:function:: find_spatially_related_genes(exp_mat: numpy.ndarray, gene_names: Union[numpy.ndarray, list], archetypes: numpy.ndarray, gene: int, pval_threshold: float = 0) Given a gene, find other genes which correlate well spatially. :param exp_mat: expression matrix. :param gene_names: gene name list that associates with the rows of expression matrix. :param archetypes: the archetypes output of find_spatial_archetypes :param gene: the index of the gene to be queried :param pval_threshold: the pvalue returned from the pearsonr function :returns: a list of genes which are the best representatives of the archetype .. py:function:: get_genes_from_spatial_archetype(exp_mat: numpy.ndarray, gene_names: Union[numpy.ndarray, list], archetypes: numpy.ndarray, archetype: int, pval_threshold: float = 0) -> Union[numpy.ndarray, list] Get a list of genes which are the best representatives of the archetype. :param exp_mat: expression matrix. :param gene_names: the gene names list that associates with the rows of expression matrix :param archetypes: the archetypes output of find_spatial_archetypes :param archetype: a number denoting the archetype :param pval_threshold: the pvalue returned from the pearsonr function :returns: a list of genes which are the best representatives of the archetype .. py:class:: MuSIC(parser: argparse.ArgumentParser, args_list: Optional[List[str]] = None, verbose: bool = True, save_subsampling: bool = True) Spatially weighted regression on spatial omics data with parallel processing. Runs after being called from the command line. :param comm: MPI communicator object initialized with mpi4py, to control parallel processing operations :param parser: ArgumentParser object initialized with argparse, to parse command line arguments for arguments pertinent to modeling. :param args_list: If parser is provided by function call, the arguments to parse must be provided as a separate list. 
It is recommended to use the return from :func:`define_spateo_argparse()` for this. :param verbose: Set True to print updates to screen. Will be set False when initializing downstream analysis object, which inherits from this class but for which the information is generally not as useful. :param save_subsampling: Set True to save the subsampled data to a .json file. Defaults to True, recommended to set True for ease of access to the subsampling results. .. attribute:: mod_type The type of model that will be employed; this dictates how the data will be processed and prepared. Options: - "niche": Spatially-aware, uses categorical cell type labels as independent variables. - "lr": Spatially-aware, essentially uses the combination of receptor expression in the "target" cell and spatially lagged ligand expression in the neighboring cells as independent variables. - "ligand": Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables. - "receptor": Uses receptor expression in the "target" cell as independent variables. .. attribute:: adata_path Path to the AnnData object from which to extract data for modeling .. attribute:: csv_path Can also be used to specify path to non-AnnData .csv object. Assumes the first three columns contain x- and y-coordinates and then dependent variable values, in that order, with all subsequent columns containing independent variable values. .. attribute:: normalize Set True to perform library size normalization, to set total counts in each cell to the same number (adjust for cell size). .. attribute:: smooth Set True to correct for dropout effects by leveraging gene expression neighborhoods to smooth expression. It is advisable not to do this if performing Poisson or negative binomial regression. .. attribute:: log_transform Set True if log-transformation should be applied to expression. It is advisable not to do this if performing Poisson or negative binomial regression. .. 
attribute:: normalize_signaling Set True to minmax scale the final ligand expression array (for :attr:`mod_type` = "ligand"), or the final ligand-receptor array (for :attr:`mod_type` = "lr"). This is recommended to associate downstream expression with rarer/less prevalent signaling mechanisms. .. attribute:: target_expr_threshold Only used if :attr:`mod_type` is "lr" or "ligand" and :attr:`targets_path` is not given. When manually selecting targets, expression above a threshold percentage of cells will be used to filter to a smaller subset of interesting genes. Defaults to 0.2. .. attribute:: multicollinear_threshold Variance inflation factor threshold used to filter out multicollinear features. A value of 5 or 10 is recommended. .. attribute:: custom_lig_path Optional path to a .txt file containing a list of ligands for the model, separated by newlines. Only used if :attr:`mod_type` is "lr" or "ligand" (and thus uses ligand expression directly in the inference). If not provided, will select ligands using a threshold based on expression levels in the data. .. attribute:: custom_ligands Optional list of ligands for the model, can be used as an alternative to :attr:`custom_lig_path`. Only used if :attr:`mod_type` is "lr" or "ligand". .. attribute:: custom_rec_path Optional path to a .txt file containing a list of receptors for the model, separated by newlines. Only used if :attr:`mod_type` is "lr" (and thus uses receptor expression directly in the inference). If not provided, will select receptors using a threshold based on expression levels in the data. .. attribute:: custom_receptors Optional list of receptors for the model, can be used as an alternative to :attr:`custom_rec_path`. Only used if :attr:`mod_type` is "lr". .. attribute:: custom_pathways_path Rather than providing a list of receptors, can provide a list of signaling pathways; all receptors with annotations in this pathway will be included in the model. Only used if :attr:`mod_type` is "lr". .. 
attribute:: custom_pathways Optional list of signaling pathways for the model, can be used as an alternative to :attr `custom_pathways_path`. Only used if :attr `mod_type` is "lr". .. attribute:: targets_path Optional path to a .txt file containing a list of prediction target genes for the model, separated by newlines. If not provided, targets will be strategically selected from the given receptors. .. attribute:: custom_targets Optional list of prediction target genes for the model, can be used as an alternative to :attr `targets_path`. .. attribute:: init_betas_path Optional path to a .json file or .csv file containing initial coefficient values for the model for each target variable. If encoded in .json, keys should be target gene names, values should be numpy arrays containing coefficients. If encoded in .csv, columns should be target gene names. Initial coefficients should have shape [n_features, ]. .. attribute:: cci_dir Full path to the directory containing cell-cell communication databases .. attribute:: species Selects the cell-cell communication database the relevant ligands will be drawn from. Options: "human", "mouse". .. attribute:: output_path Full path name for the .csv file in which results will be saved .. attribute:: coords_key Key in .obsm of the AnnData object that contains the coordinates of the cells .. attribute:: group_key Key in .obs of the AnnData object that contains the category grouping for each cell .. attribute:: group_subset Subset of cell types to include in the model (provided as a whitespace-separated list in command line). If given, will consider only cells of these types in modeling. Defaults to all cell types. .. attribute:: covariate_keys Can be used to optionally provide any number of keys in .obs or .var containing a continuous covariate (e.g. expression of a particular TF, avg. distance from a perturbed cell, etc.) .. attribute:: total_counts_key Entry in :attr:`adata` .obs that contains total counts for each cell. 
Required if subsetting by total counts. .. attribute:: total_counts_threshold Threshold for total counts to subset cells by- cells with total counts greater than this threshold will be retained. .. attribute:: bw Used to provide previously obtained bandwidth for the spatial kernel. Consists of either a distance value or N for the number of nearest neighbors. Pass "np.inf" if all other points should have the same spatial weight. .. attribute:: minbw For use in automated bandwidth selection- the lower-bound bandwidth to test. .. attribute:: maxbw For use in automated bandwidth selection- the upper-bound bandwidth to test. .. attribute:: distr Distribution family for the dependent variable; one of "gaussian", "poisson", "nb" .. attribute:: kernel Type of kernel function used to weight observations; one of "bisquare", "exponential", "gaussian", "quadratic", "triangular" or "uniform". .. attribute:: n_neighbors_membrane_bound For :attr:`mod_type` "ligand" or "lr"- ligand expression will be taken from the neighboring cells- this defines the number of cells to use for membrane-bound ligands. .. attribute:: n_neighbors_secreted For :attr:`mod_type` "ligand" or "lr"- ligand expression will be taken from the neighboring cells- this defines the number of cells to use for secreted or ECM ligands. .. attribute:: use_expression_neighbors The default for finding spatial neighborhoods for the modeling process is to use neighbors in physical space. If this argument is provided, expression will instead be used to find neighbors. .. attribute:: bw_fixed Set True for distance-based kernel function and False for nearest neighbor-based kernel function .. attribute:: exclude_self If True, ignore each sample itself when computing the kernel density estimation .. attribute:: fit_intercept Set True to include intercept in the model and False to exclude intercept .. py:attribute:: logger .. py:attribute:: parser .. py:attribute:: args_list :value: None .. 
py:attribute:: verbose :value: True .. py:attribute:: save_subsampling :value: True .. py:attribute:: mod_type :value: None .. py:attribute:: species :value: None .. py:attribute:: ligands :value: None .. py:attribute:: receptors :value: None .. py:attribute:: targets :value: None .. py:attribute:: normalize :value: None .. py:attribute:: smooth :value: None .. py:attribute:: log_transform :value: None .. py:attribute:: target_expr_threshold :value: None .. py:attribute:: coords :value: None .. py:attribute:: groups :value: None .. py:attribute:: y :value: None .. py:attribute:: X :value: None .. py:attribute:: bw :value: None .. py:attribute:: minbw :value: None .. py:attribute:: maxbw :value: None .. py:attribute:: distr :value: None .. py:attribute:: kernel :value: None .. py:attribute:: n_samples :value: None .. py:attribute:: n_features :value: None .. py:attribute:: set_up :value: False .. py:attribute:: X_df :value: None .. py:attribute:: adata :value: None .. py:attribute:: cell_categories :value: None .. py:attribute:: clip :value: None .. py:attribute:: cof_db :value: None .. py:attribute:: ct_vec :value: None .. py:attribute:: feature_distance :value: None .. py:attribute:: feature_names :value: None .. py:attribute:: grn :value: None .. py:attribute:: ligands_expr :value: None .. py:attribute:: ligands_expr_nonlag :value: None .. py:attribute:: lr_db :value: None .. py:attribute:: lr_pairs :value: None .. py:attribute:: n_samples_subsampled :value: None .. py:attribute:: n_samples_subset :value: None .. py:attribute:: neighboring_unsampled :value: None .. py:attribute:: optimal_bw :value: None .. py:attribute:: r_tf_db :value: None .. py:attribute:: receptors_expr :value: None .. py:attribute:: sample_names :value: None .. py:attribute:: subsampled :value: None .. py:attribute:: subsampled_sample_names :value: None .. py:attribute:: subset :value: None .. py:attribute:: subset_indices :value: None .. py:attribute:: subset_sample_names :value: None .. 
py:attribute:: targets_expr :value: None .. py:attribute:: tf_tf_db :value: None .. py:attribute:: x_chunk :value: None .. py:method:: _set_up_model(verbose: bool = True) .. py:method:: parse_stgwr_args() Parse command line arguments for arguments pertinent to modeling. .. py:method:: load_and_process(upstream: bool = False) Load AnnData object and process it for modeling. :param upstream: Set False if performing the actual model fitting process, True to define only the AnnData object for upstream purposes. :param downstream: Set True if setting up a downstream model- in this case, ligand/receptor preprocessing will be skipped. .. py:method:: setup_downstream(adata: Optional[anndata.AnnData] = None) Setup for downstream tasks- namely, models for inferring signaling-associated differential expression. .. py:method:: define_sig_inputs(adata: Optional[anndata.AnnData] = None, recompute: bool = False) For signaling-relevant models, define necessary quantities that will later be used to define the independent variable array- the one-hot cell-type array, the ligand expression array and the receptor expression array. :param recompute: Re-calculate all quantities and re-save even if already-existing file can be found in path .. py:method:: run_subsample(verbose: bool = True, y: Optional[pandas.DataFrame] = None) To combat computational intensiveness of this regressive protocol, subsampling will be performed in cases where there are >= 5000 cells or in cases where specific cell types are manually selected for fitting- local fit will be performed only on this subset under the assumption that discovered signals will not be significantly different for the subsampled data. 
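The subsampling idea described for ``run_subsample`` can be illustrated with a minimal sketch. This is not the library's implementation: the function name ``subsample_and_map``, the uniform random selection, and the Euclidean nearest-neighbor rule are all assumptions made for illustration. It only shows how a subset of cells might be chosen and how each unsampled cell could be mapped to its closest sampled cell, which is the role the documentation gives to ``neighboring_unsampled``.

```python
import math
import random


def subsample_and_map(coords, n_keep, seed=0):
    """Hypothetical helper: keep `n_keep` random points and map every
    unsampled point to the index of its nearest sampled point."""
    rng = random.Random(seed)
    # Indices of the points that would actually be fit.
    sampled = sorted(rng.sample(range(len(coords)), n_keep))
    sampled_set = set(sampled)

    neighboring_unsampled = {}
    for i, point in enumerate(coords):
        if i in sampled_set:
            continue
        # Closest sampled point by Euclidean distance (brute force).
        neighboring_unsampled[i] = min(
            sampled, key=lambda j: math.dist(point, coords[j])
        )
    return sampled, neighboring_unsampled


# Toy data: 200 random 2D coordinates, of which 50 are kept.
data_rng = random.Random(1)
coords = [(data_rng.random(), data_rng.random()) for _ in range(200)]
sampled, mapping = subsample_and_map(coords, n_keep=50)
```

With such a mapping, results obtained for a sampled cell could be copied to the unsampled cells assigned to it, which is what the ``adjust_for_subsampling`` option described elsewhere in this class appears to do.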
New Attributes: subsampled_indices: Dictionary containing indices of the subsampled cells for each dependent variable n_samples_subsampled: Dictionary containing number of samples to be fit (not total number of samples) for each dependent variable subsampled_sample_names: Dictionary containing lists of names of the subsampled cells for each dependent variable neighboring_unsampled: Dictionary containing a mapping between each unsampled point and the closest sampled point .. py:method:: map_new_cells() There may be instances where new cells are added to an AnnData object that has already been fit to; in this instance, accelerate the process by using neighboring results to project model fit to the new cells. .. py:method:: _set_search_range() Set the search range for the bandwidth selection procedure. :param y: Array of dependent variable values, used to determine the search range for the bandwidth selection .. py:method:: _compute_all_wi(bw: Union[float, int], bw_fixed: Optional[bool] = None, exclude_self: Optional[bool] = None, kernel: Optional[str] = None, verbose: bool = False) -> scipy.sparse.spmatrix Compute spatial weights for all samples in the dataset given a specified bandwidth. :param bw: Bandwidth for the spatial kernel :param bw_fixed: Whether the bandwidth considers a uniform distance for each sample (True) or a nonconstant distance for each sample that depends on the number of neighbors (False). If not given, will default to self.bw_fixed. :param exclude_self: Whether to exclude each sample itself from its own set of nearest neighbors. If not given, will default to self.exclude_self. :param kernel: Kernel to use for the spatial weights. If not given, will default to self.kernel. :param verbose: Whether to display messages during runtime :returns: Array of weights for all samples in the dataset :rtype: wi .. 
py:method:: local_fit(i: int, y: numpy.ndarray, X: numpy.ndarray, bw: Union[float, int], y_label: str, coords: Optional[numpy.ndarray] = None, mask_indices: Optional[numpy.ndarray] = None, feature_mask: Optional[numpy.ndarray] = None, final: bool = False, fit_predictor: bool = False) -> Union[numpy.ndarray, List[float]] Fit a local regression model for each sample. :param i: Index of sample for which local regression model is to be fitted :param y: Response variable :param X: Independent variable array :param bw: Bandwidth for the spatial kernel :param y_label: Name of the response variable :param coords: Can be optionally used to provide coordinates for samples- used if subsampling was performed to maintain all original sample coordinates (to take original neighborhoods into account) :param mask_indices: Can be optionally used to provide indices of samples to mask out of the dataset :param feature_mask: Can be optionally used to provide a mask for features to mask out of the dataset :param final: Set True to indicate that no additional parameter selection needs to be performed; the model can be fit and more stats can be returned. :param fit_predictor: Set True to indicate that dependent variable to fit is a linear predictor rather than a true response variable :returns: A single output will be given for each case, and can contain either `betas` or a list w/ combinations of the following: - i: Index of sample for which local regression model was fitted - diagnostic: Portion of the output to be used for diagnostic purposes- for Gaussian regression, this is the residual for the fitted response variable value compared to the observed value. For non-Gaussian generalized linear regression, this is the fitted response variable value (which will be used to compute deviance and log-likelihood later on). 
- hat_i: Row i of the hat matrix, which is the effect of deleting sample i from the dataset on the estimated predicted value for sample i - bw_diagnostic: Output to be used for diagnostic purposes during bandwidth selection; for Gaussian regression, this is the squared residual, and for non-Gaussian generalized linear regression, this is the fitted response variable value. Returned if ``final`` is False - betas: Estimated coefficients for sample i - leverages: Leverages for sample i, representing the influence of each independent variable on the predicted values (linear predictor for GLMs, response variable for Gaussian regression). .. py:method:: find_optimal_bw(range_lowest: float, range_highest: float, function: Callable) -> float Perform golden section search to find the optimal bandwidth. :param range_lowest: Lower bound of the search range :param range_highest: Upper bound of the search range :param function: Function to be minimized :returns: Optimal bandwidth :rtype: bw .. py:method:: mpi_fit(y: Optional[numpy.ndarray], X: Optional[numpy.ndarray], X_labels: List[str], y_label: str, bw: Union[float, int], coords: Optional[numpy.ndarray] = None, mask_indices: Optional[numpy.ndarray] = None, feature_mask: Optional[numpy.ndarray] = None, final: bool = False, fit_predictor: bool = False) -> None Fit local regression model for each sample in parallel, given a specified bandwidth. :param y: Response variable :param X: Independent variable array. If not given, will default to :attr:`X`. Note that if object was initialized using an AnnData object, this will be overridden with :attr:`X` even if a different array is given. :param X_labels: Optional list of labels for the features in the X array. Needed if the X array passed to the function is not identical to the independent variable array compiled in preprocessing. 
:param y_label: Used to provide a unique ID for the dependent variable for saving purposes and to query keys from various dictionaries :param bw: Bandwidth for the spatial kernel :param coords: Coordinates of each point in the X array :param mask_indices: Optional array used to mask out indices in the fitting process :param feature_mask: Optional array used to mask out features in the fitting process :param final: Set True to indicate that no additional parameter selection needs to be performed; the model can be fit and more stats can be returned. :param fit_predictor: Set True to indicate that dependent variable to fit is a linear predictor rather than a true response variable .. py:method:: fit(y: Optional[pandas.DataFrame] = None, X: Optional[numpy.ndarray] = None, fit_predictor: bool = False, verbose: bool = True) -> Optional[Tuple[Union[None, Dict[str, numpy.ndarray]], Dict[str, float]]] For each column of the dependent variable array, fit model. If given bandwidth, run :func:`SWR.mpi_fit()` with the given bandwidth. Otherwise, compute optimal bandwidth using :func:`SWR.find_optimal_bw()`, minimizing AICc. :param y: Optional dataframe, can be used to provide dependent variable array directly to the fit function. If None, will use :attr:`targets_expr` computed using the given AnnData object to create this (each individual column will serve as a dependent variable). Must be given as a dataframe so that column(s) are labeled, so each result can be associated with a labeled dependent variable. :param X: Optional array, can be used to provide independent variable array directly to the fit function. If None, will use :attr:`X` computed using the given AnnData object and the type of the model to create. :param n_feat: Optional int, can be used to specify one column of the X array to fit to. :param init_betas: Optional dictionary containing arrays with initial values for the coefficients. 
Keys should correspond to target genes and values should be arrays of shape [n_features, 1]. :param fit_predictor: Set True to indicate that dependent variable to fit is a linear predictor rather than a response variable :param verbose: Set True to print out information about the bandwidth selection and/or fitting process. .. py:method:: predict(input: Optional[pandas.DataFrame] = None, coeffs: Optional[Union[numpy.ndarray, Dict[str, pandas.DataFrame]]] = None, adjust_for_subsampling: bool = False) -> pandas.DataFrame Given input data and learned coefficients, predict the dependent variables. :param input: Input data to be predicted on. :param coeffs: Coefficients to be used in the prediction. If None, will attempt to load the coefficients learned in the fitting process from file. .. py:method:: compute_aicc_linear(RSS: float, trace_hat: float, n_samples: Optional[int] = None) -> float Compute the corrected Akaike Information Criterion (AICc) for the linear GWR model. .. py:method:: compute_aicc_glm(ll: float, trace_hat: float, n_samples: Optional[int] = None) -> float Compute the corrected Akaike Information Criterion (AICc) for the generalized linear GWR models. Given by: :math:`\mathrm{AICc} = -2 \log L + 2k + \frac{2k(k+1)}{n_{\mathrm{eff}} - k - 1}`, where :math:`\log L` is the log-likelihood. :param ll: Model log-likelihood :param trace_hat: Trace of the hat matrix :param n_samples: Number of samples model was fitted to .. py:method:: output_diagnostics(aicc: Optional[float] = None, ENP: Optional[float] = None, r_squared: Optional[float] = None, deviance: Optional[float] = None, y_label: Optional[str] = None) -> None Output diagnostic information about the GWR model. .. py:method:: save_results(data: numpy.ndarray, header: str, label: Optional[str]) -> None Save the results of the GWR model to file, and return the coefficients. 
:param data: Elements of data to save to .csv :param header: Column names :param label: Optional, can be used to provide unique ID to save file; notably used when multiple dependent variables with different names are fit during this process. :returns: Model coefficients :rtype: betas .. py:method:: predict_and_save(input: Optional[numpy.ndarray] = None, coeffs: Optional[Union[numpy.ndarray, Dict[str, pandas.DataFrame]]] = None, adjust_for_subsampling: bool = True) Given input data and learned coefficients, predict the dependent variables and then save the output. :param input: Input data to be predicted on. :param coeffs: Coefficients to be used in the prediction. If None, will attempt to load the coefficients learned in the fitting process from file. :param adjust_for_subsampling: Set True if subsampling was performed; this indicates that the coefficients for the subsampled points need to be extended to the neighboring non-sampled points. .. py:method:: return_outputs(adjust_for_subsampling: bool = True, load_for_interpreter: bool = False, load_from_downstream: Optional[Literal['ligand', 'receptor', 'target_gene']] = None) -> Tuple[Dict[str, pandas.DataFrame], Dict[str, pandas.DataFrame]] Return final coefficients for all fitted models. :param adjust_for_subsampling: Set True if subsampling was performed; this indicates that the coefficients for the subsampled points need to be extended to the neighboring non-sampled points. :param load_for_interpreter: Set True if this is being called from within instance of :class:`MuSIC_Interpreter`. :param load_from_downstream: Set to "ligand", "receptor", or "target_gene" to load coefficients from downstream models where targets are ligands, receptors or target genes. Must be given if "load_downstream" is True. Outputs: all_coeffs: Dictionary containing dataframe consisting of coefficients for each target gene all_se: Dictionary containing dataframe consisting of standard errors for each target gene .. 
py:method:: return_intercepts() -> Union[None, numpy.ndarray, Dict[str, numpy.ndarray]] Return final intercepts for all fitted models. .. py:class:: MuSIC_Interpreter(parser: argparse.ArgumentParser, args_list: Optional[List[str]] = None, keep_column_threshold_proportion_cells: Optional[float] = None) Bases: :py:obj:`spateo.tools.CCI_effects_modeling.MuSIC.MuSIC` Interpretation and downstream analysis of spatially weighted regression models. :param parser: ArgumentParser object initialized with argparse, to parse command line arguments for arguments pertinent to modeling. :param args_list: If parser is provided by function call, the arguments to parse must be provided as a separate list. It is recommended to use the return from :func:`define_spateo_argparse()` for this. :param keep_column_threshold_proportion_cells: If provided, will threshold columns to only keep those that are nonzero in a proportion of cells greater than this threshold. For example, if this is set to 0.5, more than half of the cells must have a nonzero value for a given column for it to be retained for further inspection. Intended to be used to filter out likely false positives. .. py:attribute:: k .. py:attribute:: downstream_model_ligand_design_matrix :value: None .. py:attribute:: downstream_model_receptor_design_matrix :value: None .. py:attribute:: downstream_model_target_design_matrix :value: None .. py:attribute:: design_matrix :value: None .. py:attribute:: filter_targets .. py:attribute:: filter_target_threshold .. py:attribute:: ligand_for_downstream .. py:attribute:: receptor_for_downstream .. py:attribute:: pathway_for_downstream .. py:attribute:: target_for_downstream .. py:attribute:: sender_ct_for_downstream .. py:attribute:: receiver_ct_for_downstream .. py:attribute:: cci_degs_model_interactions .. py:attribute:: no_cell_type_markers .. py:attribute:: compute_pathway_effect .. py:attribute:: diff_sending_or_receiving .. 
py:method:: compute_coeff_significance(method: str = 'fdr_bh', significance_threshold: float = 0.05) Computes local statistical significance for fitted coefficients. :param method: Method to use for correction. Available methods can be found in the documentation for statsmodels.stats.multitest.multipletests(), and are also listed below (in correct case) for convenience: - Named methods: - bonferroni - sidak - holm-sidak - holm - simes-hochberg - hommel - Abbreviated methods: - fdr_bh: Benjamini-Hochberg correction - fdr_by: Benjamini-Yekutieli correction - fdr_tsbh: Two-stage Benjamini-Hochberg - fdr_tsbky: Two-stage Benjamini-Krieger-Yekutieli method :param significance_threshold: p-value (or q-value) needed to call a parameter significant. :returns: Three dataframes, each of identical shape to coeffs: ``is_significant``, where each element is True or False depending on whether it meets the threshold for significance; ``pvalues``, where each element is a p-value for that instance of that feature; and ``qvalues``, where each element is a q-value for that instance of that feature. .. py:method:: filter_adata_spatial(instructions: List[str]) Based on spatial coordinates, filter the adata object to only include cells that meet the criteria. Criteria are provided as a list of instructions of the form "x less than 0.5 and y greater than 0.5", etc., where each instruction is executed sequentially. :param instructions: List of instructions to filter adata object by. Each instruction is a string of the form "x less than 0.5 and y greater than 0.5", etc., where each instruction is executed sequentially. .. py:method:: filter_adata_custom(cell_ids: List[str]) Filter AnnData object to only the cells specified by the custom list. :param cell_ids: List of cell IDs to keep. Each ID must be found in adata.obs_names .. 
py:method:: add_interaction_effect_to_adata(targets: Union[str, List[str]], interactions: Union[str, List[str]], visualize: bool = False) -> anndata.AnnData For each specified interaction/list of interactions, add the predicted interaction effect to the adata object. :param targets: Target(s) to add interaction effect for. Can be a single target or a list of targets. :param interactions: Interaction(s) to add interaction effect for. Can be a single interaction or a list of interactions. Should be the name of a gene for ligand models, or an L:R pair for L:R models (for example, "Igf1:Igf1r"). :param visualize: Whether to visualize the interaction effect for each target/interaction pair. If True, will generate a spatial scatter plot and save it to an HTML file. :returns: adata, the AnnData object with interaction effects added to .obs. :rtype: anndata.AnnData .. py:method:: compute_and_visualize_diagnostics(type: Literal['correlations', 'confusion', 'rmse'], n_genes_per_plot: int = 20) For true and predicted gene expression, compute and visualize one of: Pearson and Spearman correlations, confusion matrices, or root mean-squared error (RMSE). :param type: Type of diagnostic to compute and visualize. Options: "correlations" for Pearson & Spearman correlation, "confusion" for confusion matrix, "rmse" for root mean-squared error. :param n_genes_per_plot: Only used if "type" is "confusion". Number of genes to plot per figure. If there are more than this number of genes, multiple figures will be generated. .. py:method:: plot_interaction_effect_3D(target: str, interaction: str, save_path: str, pcutoff: Optional[float] = 99.7, min_value: Optional[float] = 0, zero_opacity: float = 1.0, size: float = 2.0, n_neighbors_smooth: Optional[int] = 0) Quick-visualize the magnitude of the predicted effect on target for a given interaction. :param target: Target gene to visualize :param interaction: Interaction to visualize (e.g.
"Igf1:Igf1r" for L:R model, "Igf1" for ligand model) :param save_path: Path to save the figure to (will save as HTML file) :param pcutoff: Percentile cutoff for the colorbar. Will set all values above this percentile to this value. :param min_value: Minimum value to set the colorbar to. Will set all values below this value to this value. Defaults to 0. :param zero_opacity: Opacity of points with zero expression. Between 0.0 and 1.0. Default is 1.0. :param size: Size of the points in the scatter plot. Default is 2. :param n_neighbors_smooth: Number of neighbors to use for smoothing (to make effect patterns more apparent). If 0, no smoothing is applied. Default is 0. .. py:method:: plot_multiple_interaction_effects_3D(effects: List[str], save_path: str, include_combos_of_two: bool = False) Quick-visualize the magnitude of the predicted effect on target for a given interaction. :param effects: List of effects to visualize (e.g. ["Igf1:Igf1r", "Igf1:InsR"] for L:R model, ["Igf1"] for ligand model) :param save_path: Path to save the figure to (will save as HTML file) :param include_combos_of_two: Whether to include paired combinations of effects (e.g. "Igf1:Igf1r and Igf1:InsR") as separate categories. If False, will include these in the generic "Multiple interactions" category. .. py:method:: plot_tf_effect_3D(target: str, tf: str, save_path: str, ligand_targets: bool = True, receptor_targets: bool = False, target_gene_targets: bool = False, pcutoff: float = 99.7, min_value: float = 0, zero_opacity: float = 1.0, size: float = 2.0) Quick-visualize the magnitude of the predicted effect on target for a given TF. Can only find the files necessary for this if :func:`CCI_deg_detection()` has been run. :param target: Target gene of interest :param tf: TF of interest (e.g. "Foxo1") :param save_path: Path to save the figure to (will save as HTML file) :param ligand_targets: Set True if ligands were used as the target genes for the :func:`CCI_deg_detection()` model.
:param receptor_targets: Set True if receptors were used as the target genes for the :func:`CCI_deg_detection()` model. :param target_gene_targets: Set True if target genes were used as the target genes for the :func:`CCI_deg_detection()` model. :param pcutoff: Percentile cutoff for the colorbar. Will set all values above this percentile to this value. :param min_value: Minimum value to set the colorbar to. Will set all values below this value to this value. :param zero_opacity: Opacity of points with zero expression. Between 0.0 and 1.0. Default is 1.0. :param size: Size of the points in the scatter plot. Default is 2. .. py:method:: visualize_overlap_between_interacting_components_3D(target: str, interaction: str, save_path: str, size: float = 2.0) Visualize the spatial distribution of signaling features (ligand, receptor, or L:R field) and target gene, as well as the overlapping region. Intended for use with 3D spatial coordinates. :param target: Target gene to visualize :param interaction: Interaction to visualize (e.g. "Igf1:Igf1r" for L:R model, "Igf1" for ligand model) :param save_path: Path to save the figure to (will save as HTML file) :param size: Size of the points in the plot. Defaults to 2. .. py:method:: gene_expression_heatmap(use_ligands: bool = False, use_receptors: bool = False, use_target_genes: bool = False, genes: Optional[List[str]] = None, position_key: str = 'spatial', coord_column: Optional[Union[int, str]] = None, reprocess: bool = False, neatly_arrange_y: bool = True, window_size: int = 3, recompute: bool = False, title: Optional[str] = None, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'magma', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the distribution of gene expression across cells in the spatial coordinates of cells; provides an idea of the simultaneous relative positions/patterning of different genes.
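As a rough illustration of the kind of position-resolved matrix such a heatmap displays, the sketch below bins cells along one coordinate, averages expression per bin, z-scores each gene across bins, smooths with an odd-sized moving window, and orders genes by where their peak falls along the axis. It is not Spateo's implementation: the function name ``position_heatmap_matrix``, the ``n_bins`` argument, and the equal-width binning are assumptions made for this example, and only NumPy is used.

```python
import numpy as np


def position_heatmap_matrix(expr, position, n_bins=10, window_size=3):
    """Hypothetical helper: build a (genes x position bins) z-score matrix.

    expr: array of shape (n_cells, n_genes); position: array of shape (n_cells,).
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be an odd integer")

    # Assign each cell to one of n_bins equal-width bins along the position axis.
    edges = np.linspace(position.min(), position.max(), n_bins + 1)
    bin_idx = np.digitize(position, edges[1:-1])  # values in 0 .. n_bins - 1

    # Average expression of every gene within each bin (empty bins stay at 0).
    binned = np.zeros((n_bins, expr.shape[1]))
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            binned[b] = expr[mask].mean(axis=0)

    # Z-score each gene across bins so genes on different scales are comparable.
    std = binned.std(axis=0)
    std[std == 0] = 1.0
    z = (binned - binned.mean(axis=0)) / std

    # Moving-average smoothing along the position axis; a window of 1 is a no-op.
    if window_size > 1:
        kernel = np.ones(window_size) / window_size
        z = np.apply_along_axis(
            lambda col: np.convolve(col, kernel, mode="same"), 0, z
        )

    # Order rows by how early along the axis each gene reaches its maximum.
    order = np.argsort(z.argmax(axis=0), kind="stable")
    return z.T[order], order


# Synthetic data: three genes peaking at different positions along the axis.
pos = np.linspace(0.0, 1.0, 1000)
centers = np.array([0.85, 0.15, 0.55])
expr = np.exp(-((pos[:, None] - centers[None, :]) ** 2) / (2 * 0.05**2))
z, order = position_heatmap_matrix(expr, pos, n_bins=10, window_size=3)
```

Here ``z`` has one row per gene and one column per position bin, and ``order`` gives the gene indices sorted from earliest to latest peak, which is the arrangement that a "neatly arranged" y-axis aims for.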
:param use_ligands: Set True to use ligands as the genes to visualize. If True, will ignore "genes" argument. "ligands_expr" file must be present in the model's directory. :param use_receptors: Set True to use receptors as the genes to visualize. If True, will ignore "genes" argument. "receptors_expr" file must be present in the model's directory. :param use_target_genes: Set True to use target genes as the genes to visualize. If True, will ignore "genes" argument. "targets" file must be present in the model's directory. :param genes: Optional list of genes to visualize. If "use_ligands", "use_receptors", and "use_target_genes" are all False, this must be given. This can also be used to visualize only a subset of the genes once processing & saving has already completed using e.g. "use_ligands", "use_receptors", etc. :param position_key: Key in adata.obs or adata.obsm that provides a relative indication of the position of cells. i.e. spatial coordinates. Defaults to "spatial". For each value in the position array (each coordinate, each category), multiple cells must have the same value. :param coord_column: Optional, only used if "position_key" points to an entry in .obsm. In this case, this is the index or name of the column to be used to provide the positional context. Can also provide "xy", "yz", "xz", "-xy", "-yz", "-xz" to draw a line between the two coordinate axes. "xy" will extend the new axis in the direction of increasing x and increasing y starting from x=0 and y=0 (or min. x/min. y), "-xy" will extend the new axis in the direction of decreasing x and increasing y starting from x=minimum x and y=maximum y, and so on. :param reprocess: Set to True to reprocess the data and overwrite the existing files. Use if the genes to visualize have changed compared to the saved file (if existing), e.g. if "use_ligands" is True when the initial analysis used "use_target_genes". 
:param neatly_arrange_y: Set True to order the y-axis by how early along the position axis the max z-score for each row occurs. Used for a more uniform plot where similarly patterned interaction-target pairs are grouped together. If False, will sort this axis by the identity of the interaction (i.e. all "Fgf1" rows will be grouped together). :param window_size: Size of window to use for smoothing. Must be an odd integer. If 1, no smoothing is applied. :param recompute: Set to True to recompute the data and overwrite the existing files :param title: Optional, can be used to provide title for plot :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use. Options: Any divergent matplotlib colormap. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. 
py:method:: effect_distribution_heatmap(target_subset: Optional[List[str]] = None, interaction_subset: Optional[List[str]] = None, position_key: str = 'spatial', coord_column: Optional[Union[int, str]] = None, effect_threshold: Optional[float] = None, check_downstream_ligand_effects: bool = False, check_downstream_receptor_effects: bool = False, check_downstream_target_effects: bool = False, use_significant: bool = False, sort_by_target: bool = False, neatly_arrange_y: bool = True, window_size: int = 3, recompute: bool = False, title: Optional[str] = None, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'magma', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the distribution of interaction effects across cells in the spatial coordinates of cells; provides an idea of the simultaneous relative positions of different interaction effects. :param target_subset: List of targets to consider. If None, will use all targets used in model fitting. :param interaction_subset: List of interactions to consider. If None, will use all interactions used in model. :param position_key: Key in adata.obs or adata.obsm that provides a relative indication of the position of cells. i.e. spatial coordinates. Defaults to "spatial". For each value in the position array (each coordinate, each category), multiple cells must have the same value. :param coord_column: Optional, only used if "position_key" points to an entry in .obsm. In this case, this is the index or name of the column to be used to provide the positional context. Can also provide "xy", "yz", "xz", "-xy", "-yz", "-xz" to draw a line between the two coordinate axes. "xy" will extend the new axis in the direction of increasing x and increasing y starting from x=0 and y=0 (or min. x/min. 
y), "-xy" will extend the new axis in the direction of decreasing x and increasing y starting from x=minimum x and y=maximum y, and so on. :param effect_threshold: Optional threshold minimum effect size to consider an effect for further analysis, as an absolute value. Use this to choose only the cells for which an interaction is predicted to have a strong effect. If None, use the median interaction effect. :param check_downstream_ligand_effects: Set True to check the coefficients of downstream ligand models instead of coefficients of the upstream CCI model. Note that this may not necessarily look nice because TF-target relationships are not spatially dependent like L:R effects are. :param check_downstream_receptor_effects: Set True to check the coefficients of downstream receptor models instead of coefficients of the upstream CCI model. Note that this may not necessarily look nice because TF-target relationships are not spatially dependent like L:R effects are. :param check_downstream_target_effects: Set True to check the coefficients of downstream target models instead of coefficients of the upstream CCI model. Note that this may not necessarily look nice because TF-target relationships are not spatially dependent like L:R effects are. :param use_significant: Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if :func:`compute_coeff_significance()` has been run. :param sort_by_target: Set True to order the y-axis in terms of the identity of the target gene. Incompatible with "neatly_arrange_y". If both this and "neatly_arrange_y" are False, will sort this axis by the identity of the interaction (i.e. all "Fgf1" rows will be grouped together). :param neatly_arrange_y: Set True to order the y-axis in terms of how early along the position axis the max z-scores for each row occur in.
Used for a more uniform plot where similarly patterned interaction-target pairs are grouped together. If False, will sort this axis by the identity of the interaction (i.e. all "Fgf1" rows will be grouped together). :param window_size: Size of window to use for smoothing. Must be an odd integer. If 1, no smoothing is applied. :param recompute: Set to True to recompute the data and overwrite the existing files :param title: Optional, can be used to provide title for plot :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use. Options: Any divergent matplotlib colormap. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. py:method:: effect_distribution_density(effect_names: List[str], position_key: str = 'spatial', coord_column: Optional[Union[int, str]] = None, max_coord_val: float = 1.0, title: Optional[str] = None, x_label: Optional[str] = None, region_lower_bound: Optional[float] = None, region_upper_bound: Optional[float] = None, region_label: Optional[str] = None, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the spatial enrichment of cell-cell interaction effects using density plots over spatial coordinates.
Uses existing dataframe saved by :func:`effect_distribution_heatmap()`, which must be run first. :param effect_names: List of interaction effects to include in plot, in format "Target-Ligand:Receptor" (for L:R models) or "Target-Ligand" (for ligand models). :param position_key: Key in adata.obs or adata.obsm that provides a relative indication of the position of cells. i.e. spatial coordinates. Defaults to "spatial". For each value in the position array (each coordinate, each category), multiple cells must have the same value. :param coord_column: Optional, only used if "position_key" points to an entry in .obsm. In this case, this is the index or name of the column to be used to provide the positional context. Can also provide "xy", "yz", "xz", "-xy", "-yz", "-xz" to draw a line between the two coordinate axes. "xy" will extend the new axis in the direction of increasing x and increasing y starting from x=0 and y=0 (or min. x/min. y), "-xy" will extend the new axis in the direction of decreasing x and increasing y starting from x=minimum x and y=maximum y, and so on. :param max_coord_val: Optional, can be used to adjust the numbers displayed along the x-axis for the relative position along the coordinate axis. Defaults to 1.0. :param title: Optional, can be used to provide title for plot :param x_label: Optional, can be used to provide x-axis label for plot :param region_lower_bound: Optional, can be used to provide a lower bound for the region of interest to label on the plot- this can correspond to a spatial domain, etc. :param region_upper_bound: Optional, can be used to provide an upper bound for the region of interest to label on the plot- this can correspond to a spatial domain, etc. :param region_label: Optional, can be used to provide a label for the region of interest to label on the plot :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use. Options: Any divergent matplotlib colormap. 
:param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. py:method:: visualize_effect_specificity(agg_method: Literal['mean', 'percentage'] = 'mean', plot_type: Literal['heatmap', 'volcano'] = 'heatmap', target_subset: Optional[List[str]] = None, interaction_subset: Optional[List[str]] = None, ct_subset: Optional[List[str]] = None, group_key: Optional[str] = None, n_anchors: Optional[int] = None, effect_threshold: Optional[float] = None, use_significant: bool = False, target_cooccurrence_threshold: float = 0.1, significance_cutoff: float = 1.3, fold_change_cutoff: float = 1.5, fold_change_cutoff_for_labels: float = 3.0, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'seismic', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}, save_df: bool = False) Computes and visualizes the specificity of each interaction on each target. This is done by first separating the target-expressing cells (and their neighbors) from the rest of the cells (conditioned on predicted effect and also conditioned on receptor expression if L:R model is used). Then, the fold change of the average expression of the ligand in the neighborhood of the first subset vs. the neighborhoods of the second subset is computed. :param agg_method: Method to use for aggregating the specificity of each interaction on each target.
Options: "mean" for mean ligand expression, "percentage" for the percentage of cells expressing the ligand. :param plot_type: Type of plot to use for visualization. Options: "heatmap" for heatmap, "volcano" for volcano plot. :param target_subset: List of targets to consider. If None, will use all targets used in model fitting. :param interaction_subset: List of interactions to consider. If None, will use all interactions used in model. :param ct_subset: Can be used to constrain the first group of cells (the query group) to the target-expressing cells of a particular type (conditioned on any other relevant variables). If given, will search for cell types in "group_key" attribute from model initialization. If not given, will use all cell types. :param group_key: Can be used to specify entry in adata.obs that contains cell type groupings. If None, will use :attr:`group_key` from model initialization. :param n_anchors: Optional, number of target gene-expressing cells to use as anchors for analysis. Will be selected randomly from the set of target gene-expressing cells (conditioned on any other relevant values). :param effect_threshold: Optional threshold minimum effect size to consider an effect for further analysis, as an absolute value. Use this to choose only the cells for which an interaction is predicted to have a strong effect. If None, use the median interaction effect. :param use_significant: Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if :func:`compute_coeff_significance()` has been run. :param significance_cutoff: Cutoff for negative log-10 q-value to consider an interaction/effect significant. Only used if "plot_type" is "volcano". Defaults to 1.3 (corresponding to an approximate q-value of 0.05). :param fold_change_cutoff: Cutoff for fold change to consider an interaction/effect significant.
Only used if "plot_type" is "volcano". Defaults to 1.5. :param fold_change_cutoff_for_labels: Cutoff for fold change to include the label for an interaction/effect. Only used if "plot_type" is "volcano". Defaults to 3.0. :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use. Options: Any divergent matplotlib colormap. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. :param save_df: Set True to save the metric dataframe in the end .. py:method:: visualize_neighborhood(target: str, interaction: str, interaction_type: Literal['secreted', 'membrane-bound'], select_examples_criterion: Literal['positive', 'negative'] = 'positive', effect_threshold: Optional[float] = None, cell_type: Optional[str] = None, group_key: Optional[str] = None, use_significant: bool = False, n_anchors: int = 100, n_neighbors_expressing: int = 20, display_plot: bool = True) -> anndata.AnnData Sets up AnnData object for visualization of interaction effects: cells will be colored by expression of the target gene, potentially conditioned on receptor expression, and neighboring cells will be colored by ligand expression. :param target: Target gene of interest :param interaction: Interaction feature to visualize, given in the same form as in the design matrix (if model is a ligand-based model or receptor-based model, this will be of form "Col4a1".
If model is a ligand-receptor based model, this will be of form "Col4a1:Itgb1", for example). :param interaction_type: Specifies whether the chosen interaction is secreted or membrane-bound. Options: "secreted" or "membrane-bound". :param select_examples_criterion: Whether to select cells with positive or negative interaction effects for visualization. Defaults to "positive", which searches for cells for which the predicted interaction effect is above the given threshold. "negative" will select cells for which the predicted interaction has no effect on the target expression. :param effect_threshold: Optional threshold for the effect size of an interaction/effect to be considered for analysis; only used if "to_plot" is "percentage". If not given, will use the upper quartile value among all interaction effect values to determine the threshold. :param cell_type: Optional, can be used to select anchor cells from only a particular cell type. If None, will select from all cells. :param group_key: Can be used to specify entry in adata.obs that contains cell type groupings. If None, will use :attr:`group_key` from model initialization. Only used if "cell_type" is not None. :param use_significant: Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if :func:`compute_coeff_significance()` has been run. :param n_anchors: Number of target gene-expressing cells to use as anchors for visualization. Will be selected randomly from the set of target gene-expressing cells. :param n_neighbors_expressing: Filters the set of cells that can be selected as anchors based on the number of their neighbors that express the chosen ligand. Only used for models that incorporate ligand expression. :param display_plot: Whether to save a plot. If False, will return the AnnData object without doing anything else; this can then be visualized e.g. using spateo-viewer.
:returns: adata, the modified AnnData object containing the expression information for the target gene and neighboring ligand expression. :rtype: anndata.AnnData .. py:method:: cell_type_specific_interactions(to_plot: Literal['mean', 'percentage'] = 'mean', plot_type: Literal['heatmap', 'barplot'] = 'heatmap', group_key: Optional[str] = None, ct_subset: Optional[List[str]] = None, target_subset: Optional[List[str]] = None, interaction_subset: Optional[List[str]] = None, lower_threshold: float = 0.3, upper_threshold: float = 1.0, effect_threshold: Optional[float] = None, use_significant: bool = False, row_normalize: bool = False, col_normalize: bool = False, normalize_targets: bool = False, hierarchical_cluster_ct: bool = False, group_y_cell_type: bool = False, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, center: Optional[float] = None, cmap: str = 'Reds', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}, save_df: bool = False) Map interactions and interaction effects that are specific to particular cell type groupings. Returns a heatmap representing the enrichment of the interaction/effect within cells of that grouping (if "to_plot" is effect, this will be enrichment of the effect on cell type-specific expression). Enrichment determined by mean effect size or expression. :param to_plot: Whether to plot the mean effect size or the proportion of cells in a cell type w/ effect on target. Options are "mean" or "percentage". :param plot_type: Whether to plot the results as a heatmap or barplot. Options are "heatmap" or "barplot". If "barplot", must provide a subset of up to four interactions to visualize. :param group_key: Can be used to specify entry in adata.obs that contains cell type groupings. If None, will use :attr:`group_key` from model initialization. :param ct_subset: Can be used to restrict the enrichment analysis to only cells of a particular type.
If given, will search for cell types in "group_key" attribute from model initialization. Recommended to use to subset to cell types with sufficient numbers. :param target_subset: List of targets to consider. If None, will use all targets used in model fitting. :param interaction_subset: List of interactions to consider. If None, will use all interactions used in model. Is necessary if "plot_type" is "barplot", since the barplot is only designed to accommodate up to three interactions at once. :param lower_threshold: Lower threshold for the proportion of cells in a cell type group that must express a particular interaction/effect for it to be colored on the plot, as a proportion of the max value. Threshold will be applied to the non-normalized values (if normalization is applicable). Defaults to 0.3. :param upper_threshold: Upper threshold for the proportion of cells in a cell type group that must express a particular interaction/effect for it to be colored on the plot, as a proportion of the max value. Threshold will be applied to the non-normalized values (if normalization is applicable). Defaults to 1.0 (the max value). :param effect_threshold: Optional threshold for the effect size of an interaction/effect to be considered for analysis; only used if "to_plot" is "percentage". If not given, will use the upper quartile value among all interaction effect values to determine the threshold. :param use_significant: Whether to use only significant effects in computing the specificity. If True, will filter to cells + interactions where the interaction is significant for the target. Only valid if :func:`compute_coeff_significance()` has been run. :param row_normalize: Whether to minmax scale the metric values by row (i.e. for each interaction/effect). Helps to alleviate visual differences that result from scale rather than differences in mean value across cell types. :param col_normalize: Whether to minmax scale the metric values by column (i.e.
for each interaction/effect). Helps to alleviate visual differences that result from scale rather than differences in mean value across cell types. :param normalize_targets: Whether to minmax scale the metric values by column for each target (i.e. for each interaction/effect), to remove differences that occur as a result of scale of expression. Provides a clearer picture of enrichment for each target. :param hierarchical_cluster_ct: Whether to cluster the x-axis (target gene in cell type) using hierarchical clustering. If False, will order the x-axis by the order of the target genes for organization purposes. :param group_y_cell_type: Whether to group the y-axis (target gene in cell type) by cell type. If False, will group by target gene instead. Defaults to False. :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param center: Optional, determines position of the colormap center. Between 0 and 1. :param cmap: Colormap to use for heatmap. If metric is "number", "proportion", "specificity", the bottom end of the range is 0. It is recommended to use a sequential colormap (e.g. "Reds", "Blues", "viridis", etc.). For metric = "fc", if a divergent colormap is not provided, "seismic" will automatically be used. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary, and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. :param save_df: Set True to save the metric dataframe in the end .. 
py:method:: cell_type_interaction_fold_change(ref_ct: str, query_ct: str, group_key: Optional[str] = None, target_subset: Optional[List[str]] = None, interaction_subset: Optional[List[str]] = None, to_plot: Literal['mean', 'percentage'] = 'mean', plot_type: Literal['volcano', 'barplot'] = 'barplot', source_data: Literal['interaction', 'effect', 'target'] = 'effect', top_n_to_plot: Optional[int] = None, significance_cutoff: float = 1.3, fold_change_cutoff: float = 1.5, fold_change_cutoff_for_labels: float = 3.0, plot_query_over_ref: bool = False, plot_ref_over_query: bool = False, plot_only_significant: bool = False, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'seismic', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}, save_df: bool = False) Computes fold change in predicted interaction effects between two cell types, and visualizes result. :param ref_ct: Label of the first cell type to consider. Fold change will be computed with respect to the level in this cell type. :param query_ct: Label of the second cell type to consider :param group_key: Name of the key in .obs containing cell type information. If not given, will use :attr:`group_key` from model initialization. :param target_subset: List of targets to consider. If None, will use all targets used in model fitting. :param interaction_subset: List of interactions to consider. If None, will use all interactions used in model. :param to_plot: Whether to plot the mean effect size or the proportion of cells in a cell type w/ effect on target. Options are "mean" or "percentage". :param plot_type: Whether to plot the results as a volcano plot or barplot. Options are "volcano" or "barplot". :param source_data: Selects what to use in computing fold changes. Options: - "interaction": will use the design matrix (e.g.
neighboring ligand expression or L:R mapping) - "effect": will use the coefficient arrays for each target - "target": will use the target gene expression :param top_n_to_plot: If given, will only include the top n features in the visualization. Recommended if "source_data" is "effect", as all combinations of interaction and target will be considered in this case. :param significance_cutoff: Cutoff for negative log-10 q-value to consider an interaction/effect significant. Only used if "plot_type" is "volcano". Defaults to 1.3 (corresponding to an approximate q-value of 0.05). :param fold_change_cutoff: Cutoff for fold change to consider an interaction/effect significant. Only used if "plot_type" is "volcano". Defaults to 1.5. :param fold_change_cutoff_for_labels: Cutoff for fold change to include the label for an interaction/effect. Only used if "plot_type" is "volcano". Defaults to 3.0. :param plot_query_over_ref: Whether to plot/visualize only the portion that corresponds to the fold change of the query cell type over the reference cell type (and the portion that is significant). If False (and "plot_ref_over_query" is False), will plot the entire volcano plot. Only used if "plot_type" is "volcano". :param plot_ref_over_query: Whether to plot/visualize only the portion that corresponds to the fold change of the reference cell type over the query cell type (and the portion that is significant). If False (and "plot_query_over_ref" is False), will plot the entire volcano plot. Only used if "plot_type" is "volcano". :param plot_only_significant: Whether to plot/visualize only the portion that passes the "significance_cutoff" threshold. Only used if "plot_type" is "volcano". :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use for heatmap. If metric is "number", "proportion", "specificity", the bottom end of the range is 0. It is recommended to use a sequential colormap (e.g. 
"Reds", "Blues", "Viridis", etc.). For metric = "fc", if a divergent colormap is not provided, "seismic" will automatically be used. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved, displayed and the associated axis and other object will be return. :param save_kwargs: A dictionary that will passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use the {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. :param save_df: Set True to save the metric dataframe in the end .. py:method:: enriched_interactions_barplot(interactions: Optional[Union[str, List[str]]] = None, targets: Optional[Union[str, List[str]]] = None, plot_type: Literal['average', 'proportion'] = 'average', effect_size_threshold: float = 0.0, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'Reds', top_n: Optional[int] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the top predicted effect sizes for each interaction on particular target gene(s). :param interactions: Optional subset of interactions to focus on, given in the form ligand(s):receptor(s), following the formatting in the design matrix. If not given, will consider all interactions that were specified in model fitting. :param targets: Can optionally specify a subset of the targets to compute this on. If not given, will use all targets that were specified in model fitting. If multiple targets are given, "save_show_or_return" should be "save" (and provide appropriate keyword arguments for saving using "save_kwargs"), otherwise only the last target will be shown. 
:param plot_type: Options: "average" or "proportion". Whether to plot the average effect size or the proportion of cells expressing the target predicted to be affected by the interaction. :param effect_size_threshold: Lower bound for average effect size to include a particular interaction in the barplot. :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use for barplot. It is recommended to use a sequential colormap (e.g. "Reds", "Blues", "Viridis", etc.). :param top_n: If given, will only include the top n features in the visualization. If not given, will include all features that pass the "effect_size_threshold". :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. py:method:: summarize_interaction_effects(interactions: Optional[Union[str, List[str]]] = None, targets: Optional[Union[str, List[str]]] = None, effect_size_threshold: float = 0.0) Summarize the interaction effects for each target gene in dataframe format. Each element will be the average effect size for a particular interaction on a particular target gene. :param interactions: Optional subset of interactions to focus on. If not given, will consider all interactions. :param targets: Can optionally specify a subset of the targets. If not given, will use all targets. :param effect_size_threshold: Lower bound for average effect size to include a particular interaction. 
:returns: Dataframe with the average effect size for each interaction (rows) on each target gene (columns). :rtype: effects_df .. py:method:: enriched_tfs_barplot(tfs: Optional[Union[str, List[str]]] = None, targets: Optional[Union[str, List[str]]] = None, target_type: Literal['ligand', 'receptor', 'target_gene'] = 'target_gene', plot_type: Literal['average', 'proportion'] = 'average', effect_size_threshold: float = 0.0, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'Reds', top_n: Optional[int] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the top predicted effect sizes for each transcription factor on particular target gene(s). :param tfs: Optional subset of transcription factors to focus on. If not given, will consider all transcription factors that were specified in model fitting. :param targets: Can optionally specify a subset of the targets to compute this on. If not given, will use all targets that were specified in model fitting. If multiple targets are given, "save_show_or_return" should be "save" (and provide appropriate keyword arguments for saving using "save_kwargs"), otherwise only the last target will be shown. :param target_type: Set whether the given targets are ligands, receptors or target genes. Used to determine which folder to check for outputs. :param plot_type: Options: "average" or "proportion". Whether to plot the average effect size or the proportion of cells expressing the target predicted to be affected by the interaction. :param effect_size_threshold: Lower bound for average effect size to include a particular interaction in the barplot. :param fontsize: Size of font for x and y labels. :param figsize: Size of figure. :param cmap: Colormap to use for barplot. It is recommended to use a sequential colormap (e.g. "Reds", "Blues", "Viridis", etc.). 
:param top_n: If given, will only include the top n features in the visualization. If not given, will include all features that pass the "effect_size_threshold". :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. py:method:: summarize_tf_effects(tfs: Optional[Union[str, List[str]]] = None, targets: Optional[Union[str, List[str]]] = None, target_type: Literal['ligand', 'receptor', 'target_gene'] = 'target_gene', effect_size_threshold: float = 0.0) Return a DataFrame with effect sizes for each transcription factor (TF) against given targets. :param tfs: Optional subset of TFs to focus on. If not given, considers all TFs. :param targets: Subset of targets. If not given, uses all targets. :param target_type: Whether the targets are ligands, receptors, or target genes. :param effect_size_threshold: Lower bound for including an effect size. :returns: DataFrame with the average effect size for each TF (rows) on each target gene (columns). :rtype: effects_df .. 
py:method:: get_effect_potential(target: Optional[str] = None, ligand: Optional[str] = None, receptor: Optional[str] = None, sender_cell_type: Optional[str] = None, receiver_cell_type: Optional[str] = None, spatial_weights_membrane_bound: Optional[Union[numpy.ndarray, scipy.sparse.spmatrix]] = None, spatial_weights_secreted: Optional[Union[numpy.ndarray, scipy.sparse.spmatrix]] = None, spatial_weights_niche: Optional[Union[numpy.ndarray, scipy.sparse.spmatrix]] = None, store_summed_potential: bool = True) -> Tuple[scipy.sparse.spmatrix, numpy.ndarray, numpy.ndarray] For each cell, computes the 'signaling effect potential', interpreted as a quantification of the strength of effect of intercellular communication on downstream expression in a given cell mediated by any given other cell with any combination of ligands and/or cognate receptors, as inferred from the model results. Computations are similar to those of :func:`~.inferred_effect_direction`, but stop short of computing vector fields. :param target: Optional string to select target from among the genes used to fit the model to compute signaling effects for. Note that this function takes only one target at a time. If not given, will take the first name from among all targets. :param ligand: Needed if :attr:`mod_type` is 'ligand'; select ligand from among the ligands used to fit the model to compute signaling potential. :param receptor: Needed if :attr:`mod_type` is 'lr'; together with 'ligand', used to select ligand-receptor pair from among the ligand-receptor pairs used to fit the model to compute signaling potential. :param sender_cell_type: Can optionally be used to select cell type from among the cell types used to fit the model to compute sent potential. Must be given if :attr:`mod_type` is 'niche'. :param receiver_cell_type: Can optionally be used to condition sent potential on receiver cell type. 
:param store_summed_potential: If True, will store both sent and received signaling potential as entries in .obs of the AnnData object. :returns: Sparse array of shape [n_samples, n_samples]; proxy for the "signaling effect potential" with respect to a particular target gene between each sender-receiver pair of cells. normalized_effect_potential_sum_sender: Array of shape [n_samples,]; for each sending cell, the sum of the signaling potential to all receiver cells for a given target gene, normalized between 0 and 1. normalized_effect_potential_sum_receiver: Array of shape [n_samples,]; for each receiving cell, the sum of the signaling potential from all sender cells for a given target gene, normalized between 0 and 1. :rtype: effect_potential .. py:method:: get_pathway_potential(pathway: Optional[str] = None, target: Optional[str] = None, spatial_weights_secreted: Optional[Union[numpy.ndarray, scipy.sparse.spmatrix]] = None, spatial_weights_membrane_bound: Optional[Union[numpy.ndarray, scipy.sparse.spmatrix]] = None, store_summed_potential: bool = True) For each cell, computes the 'pathway effect potential', which is an aggregation of the effect potentials of all pathway member ligand-receptor pairs (or all pathway member ligands, for ligand-only models). :param pathway: Name of pathway to compute pathway effect potential for. :param target: Optional string to select target from among the genes used to fit the model to compute signaling effects for. Note that this function takes only one target at a time. If not given, will take the first name from among all targets. :param spatial_weights_secreted: Optional pairwise spatial weights matrix for secreted factors :param spatial_weights_membrane_bound: Optional pairwise spatial weights matrix for membrane-bound factors :param store_summed_potential: If True, will store both sent and received signaling potential as entries in .obs of the AnnData object. 
:returns: Array of shape [n_samples, n_samples]; proxy for the combined "signaling effect potential" with respect to a particular target gene for ligand-receptor pairs in a pathway. normalized_pathway_effect_potential_sum_sender: Array of shape [n_samples,]; for each sending cell, the sum of the pathway sum potential to all receiver cells for a given target gene, normalized between 0 and 1. normalized_pathway_effect_potential_sum_receiver: Array of shape [n_samples,]; for each receiving cell, the sum of the pathway sum potential from all sender cells for a given target gene, normalized between 0 and 1. :rtype: pathway_sum_potential .. py:method:: inferred_effect_direction(targets: Optional[Union[str, List[str]]] = None, compute_pathway_effect: bool = False) For visualization purposes, used for models that consider ligand expression (:attr:`mod_type` is 'ligand' or 'lr'). For receptor models, assigning directionality is impossible, and for niche models, it makes much less sense to draw/compute a vector field. Construct spatial vector fields to infer the directionality of observed effects (the "sources" of the downstream expression). Parts of this function are inspired by 'communication_direction' from COMMOT: https://github.com/zcang/COMMOT :param targets: Optional string or list of strings to select targets from among the genes used to fit the model to compute signaling effects for. If not given, will use all targets. :param compute_pathway_effect: Whether to compute the effect potential for each pathway in the model. If True, will collectively take the effect potential of all pathway components. If False, will compute effect potential for each individual signal. .. 
py:method:: define_effect_vf(effect_potential: scipy.sparse.spmatrix, normalized_effect_potential_sum_sender: numpy.ndarray, normalized_effect_potential_sum_receiver: numpy.ndarray, sig: str, target: str, max_val: float = 0.05) Given the pairwise effect potential array, computes the effect vector field. :param effect_potential: Sparse array containing computed effect potentials. Output from :func:`get_effect_potential`. :param normalized_effect_potential_sum_sender: Array containing the sum of the effect potentials sent by each cell. Output from :func:`get_effect_potential`. :param normalized_effect_potential_sum_receiver: Array containing the sum of the effect potentials received by each cell. Output from :func:`get_effect_potential`. :param max_val: Constrains the size of the vector field vectors. Recommended to set within the order of magnitude of 1/100 of the desired plot dimensions. :param sig: Label for the mediating interaction (e.g. name of a ligand, name of a ligand-receptor pair, etc.) :param target: Name of the target that the vector field describes the effect for. .. py:method:: visualize_effect_vf_3D(interaction: str, target: str, vf_key: Optional[str] = None, vector_magnitude_lower_bound: float = 0.0, manual_vector_scale_factor: Optional[float] = None, bin_size: Optional[Union[float, Tuple[float]]] = None, plot_cells: bool = True, cell_size: float = 1.0, alpha: float = 0.3, no_color_coding: bool = False, only_view_effect_region: bool = False, add_group_label: Optional[str] = None, group_label_obs_key: Optional[str] = None, title_position: Tuple[float, float] = (0.5, 0.9), save_path: Optional[str] = None, **kwargs) Visualize the directionality of the effect on target for a given interaction, overlaid onto the 3D spatial plot. Can only be used for models that use ligand expression (:attr:`mod_type` is 'ligand' or 'lr'). :param interaction: Interaction to incorporate into the visualization (e.g. 
"Igf1:Igf1r" for L:R model, "Igf1" for ligand model) :param target: Name of the target gene of interest. Will search key "spatial_effect_sender_vf_{interaction}_{ target}" to create vector field plot. :param vf_key: Optional key in .obsm to specify which vector field to use. If not given, will use the provided "interaction" and "target" to find the key specifying the vector field. :param vector_magnitude_lower_bound: Lower bound for the magnitude of the vector field vectors to be plotted, as a fraction of the maximum vector magnitude. Defaults to 0.0. :param manual_vector_scale_factor: If not None, will manually scale the vector field by this factor ( multiplicatively). Used for visualization purposes, not recommended to set above 2.0 (otherwise likely to get misleading results with vectors that are too long). :param bin_size: Optional, can be used to de-clutter plotting space by splitting the space into 3D bins and displaying one vector per bin. Can be given as a floating point number to create cubic bins, or as a tuple of floats to specify different bin sizes for each dimension. If not given, will plot one vector per cell. Defaults to None. :param plot_cells: If False, will not plot any of the cells (unless a group label is given), so will only visualize vector field. Defaults to True. :param cell_size: Size of the cells in the 3D plot. Defaults to 1.0. :param alpha: If visualizing cells not affected by the interaction, this argument specifies the transparency of those cells. :param no_color_coding: If True, will color all cells the same color (except cells of given category, if given). :param only_view_effect_region: If True, will only plot the region where the effect is predicted to be found, rather than the entire 3D object :param add_group_label: This optional argument represents a cell type category. Will color the cells belonging to this particular category orange. 
If given, it is recommended to also provide `group_label_obs_key` (which will be :attr:`group_key` if not given). :param group_label_obs_key: If `add_group_label` is given, this argument represents the observation key in the AnnData object that contains the group label. If not given, will default to :attr:`group_key`. :param title_position: Position of the title in the plot, given as a tuple of floats (i.e. (x, y)). Defaults to (0.5, 0.9). :param save_path: Path to save the figure to (will save as HTML file). :param kwargs: Additional arguments that can be passed to :class:`plotly.graph_objects.Cone`. Common arguments: - "colorscale": Sets the colorscale. The colorscale must be an array containing arrays mapping a normalized value to an rgb, rgba, hex, hsl, hsv, or named color string. - "sizemode": Determines whether sizeref is set as a “scaled” (i.e. unitless) scalar (normalized by the max u/v/w norm in the vector field) or as “absolute” value (in the same units as the vector field). Defaults to "scaled". - "sizeref": The scalar reference for the cone size. The cone size is determined by its u/v/w norm multiplied by sizeref. Defaults to 2.0. - "showscale": Determines whether or not a colorbar is displayed for this trace. .. py:method:: CCI_deg_detection_setup(group_key: Optional[str] = None, custom_tfs: Optional[List[str]] = None, sender_receiver_or_target_degs: Literal['sender', 'receiver', 'target'] = 'sender', use_ligands: bool = True, use_receptors: bool = False, use_pathways: bool = False, use_targets: bool = False, use_cell_types: bool = False, compute_dim_reduction: bool = False) Computes differential expression signatures of cells with various levels of ligand expression. :param group_key: Key to add to .obs of the AnnData object created by this function, containing cell type labels for each cell. If not given, will use :attr:`group_key`. :param custom_tfs: Optional list of transcription factors to make sure to be included in analysis. 
If given, these TFs will be included among the regulators regardless of the expression-based thresholding done in preprocessing. :param sender_receiver_or_target_degs: Only makes a difference if 'use_pathways' or 'use_cell_types' is specified. Determines whether to compute DEGs for ligands, receptors or target genes. If 'use_pathways' is True, the value of this argument will determine whether ligands or receptors are used to define the model. Note that in either case, differential expression of TFs, binding factors, etc. will be computed in association with ligands/receptors/target genes (only valid if 'use_cell_types' and not 'use_pathways' is specified). :param use_ligands: Use ligand array for differential expression analysis. Will take precedence over sender/receiver cell type if also provided. :param use_receptors: Use receptor array for differential expression analysis. Will take precedence over sender/receiver cell type if also provided. :param use_pathways: Use pathway array for differential expression analysis. Will use ligands in these pathways to collectively compute signaling potential score. Will take precedence over sender cell types if also provided. :param use_targets: Use target array for differential expression analysis. :param use_cell_types: Use cell types for differential expression analysis. If given, will preprocess/construct the necessary components to initialize cell type-specific models. Note: should be used alongside 'use_ligands', 'use_receptors', 'use_pathways' or 'use_targets' to select which molecules to investigate in each cell type. :param compute_dim_reduction: Whether to compute PCA representation of the data subsetted to targets. .. 
py:method:: CCI_deg_detection(group_key: str, cci_dir_path: str, sender_receiver_or_target_degs: Literal['sender', 'receiver', 'target'] = 'sender', use_ligands: bool = True, use_receptors: bool = False, use_pathways: bool = False, use_targets: bool = False, ligand_subset: Optional[List[str]] = None, receptor_subset: Optional[List[str]] = None, target_subset: Optional[List[str]] = None, cell_type: Optional[str] = None, use_dim_reduction: bool = False, **kwargs) Downstream method that, when called, creates a separate instance of :class:`MuSIC` specifically designed for the downstream task of detecting differentially expressed genes associated with ligand expression. :param group_key: Key in `adata.obs` that corresponds to the cell type (or other grouping) labels. :param cci_dir_path: Path to directory containing all Spateo databases. :param sender_receiver_or_target_degs: Only makes a difference if 'use_pathways' or 'use_cell_types' is specified. Determines whether to compute DEGs for ligands, receptors or target genes. If 'use_pathways' is True, the value of this argument will determine whether ligands or receptors are used to define the model. Note that in either case, differential expression of TFs, binding factors, etc. will be computed in association with ligands/receptors/target genes (only valid if 'use_cell_types' and not 'use_pathways' is specified). :param use_ligands: Use ligand array for differential expression analysis. Will take precedence over receptors and sender/receiver cell types if also provided. Should match the input to :func:`CCI_deg_detection_setup`. :param use_receptors: Use receptor array for differential expression analysis. :param use_pathways: Use pathway array for differential expression analysis. Will use ligands in these pathways to collectively compute signaling potential score. Will take precedence over sender cell types if also provided. Should match the input to :func:`CCI_deg_detection_setup`. 
:param use_targets: Use target genes array for differential expression analysis. :param ligand_subset: Subset of ligands to use for differential expression analysis. If not given, will use all ligands from the upstream model. :param receptor_subset: Subset of receptors to use for differential expression analysis. If not given, will use all receptors from the upstream model. :param target_subset: Subset of target genes to use for differential expression analysis. If not given, will use all target genes from the upstream model. :param cell_type: Cell type to use for differential expression analysis. If given, will use the ligand/receptor subset obtained from :func:`~CCI_deg_detection_setup` and cells of the chosen cell type in the model. :param use_dim_reduction: Whether to use PCA representation of the data to find nearest neighbors. If False, will instead use the Jaccard distance. Defaults to False. Note that this will ultimately fail if dimensionality reduction was not performed in :func:`~CCI_deg_detection_setup`. :param kwargs: Keyword arguments for any of the Spateo argparse arguments. Should not include 'adata_path', 'custom_lig_path' & 'ligand' or 'custom_pathways_path' & 'pathway' (depending on whether ligands or pathways are being used for the analysis), and should not include 'output_path' (which will be determined by the output path used for the main model). Should also not include any of the other arguments for this function. :returns: Fitted model instance that can be used for further downstream applications. :rtype: downstream_model .. 
py:method:: deg_effect_barplot(target: str, interaction_subset: Optional[List[str]] = None, top_n_interactions: Optional[int] = None, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'Blues', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the proportion of cells expressing a particular target (ligand, receptor, or target gene involved in an upstream CCI model) that are predicted to be affected by each transcription factor, or that are predicted to be affected by each L:R pair/ligand. :param target: Target gene. :param interaction_subset: Optional, can be used to specify subset of interactions (transcription factors, L:R pairs, etc.) to visualize, e.g. ["Sox2", "Irx3"]. If not given, will default to all TFs, L:R pairs, etc. :param top_n_interactions: Optional, can be used to specify the top n interactions (transcription factors, L:R pair, ligand, etc.) to visualize. If not given, will default to all TFs, L:R pairs, etc. :param fontsize: Font size to determine size of the axis labels, ticks, title, etc. :param figsize: Width and height of plotting window. :param cmap: Name of matplotlib colormap specifying colormap to use. Must be a sequential colormap. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. 
py:method:: deg_effect_heatmap(target_subset: Optional[List[str]] = None, target_type: Literal['ligand', 'receptor', 'target_gene', 'tf_target'] = 'target_gene', to_plot: Literal['proportion', 'specificity'] = 'proportion', interaction_subset: Optional[List[str]] = None, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'magma', lower_proportion_threshold: float = 0.1, order_interactions: bool = False, order_targets: bool = False, remove_rows_and_cols_threshold: Optional[int] = None, save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}, save_df: bool = False) Visualize the proportion of cells expressing any target (ligand, receptor, or target gene involved in an upstream CCI model) that are predicted to be affected by each transcription factor, or that are predicted to be affected by each L:R pair/ligand, using a heatmap for visualization. :param target_subset: Optional, can be used to specify subset of targets (ligands, receptors, target genes, or "TF_target" for target genes where the interaction to plot is TF effect) to visualize, e.g. ["Tubb1a", "Tubb1b"]. If not given, will default to all targets. :param target_type: Type of target gene to visualize. Must be one of "ligand", "receptor", "target_gene", or "tf_target" (the options listed in the signature). Defaults to "target_gene". Used to specify where to search for the target genes to process. :param to_plot: Two options, "proportion" or "specificity": for proportion, plot the proportion of cells expressing the target that are affected by each interaction. For specificity, take the proportion of cells affected by each interaction for which the interaction is predicted to affect a specific target. :param interaction_subset: Optional, can be used to specify subset of interactions (transcription factors, L:R pairs, etc.) to visualize, e.g. ["Sox2", "Irx3"]. If not given, will default to all TFs, L:R pairs, etc. 
:param fontsize: Font size to determine size of the axis labels, ticks, title, etc. :param figsize: Width and height of plotting window. :param cmap: Name of matplotlib colormap specifying colormap to use. Must be a sequential colormap. :param lower_proportion_threshold: Proportion threshold below which to set the proportion to 0 in the display. Defaults to 0.1. :param order_interactions: Whether to hierarchically sort the y-axis/interactions (transcription factors, L:R pairs, etc.). :param order_targets: Whether to hierarchically sort the x-axis/targets (ligands, receptors, target genes). :param remove_rows_and_cols_threshold: Optional, can be used to specify the threshold for the number of nonzero interactions/TFs a row/column needs to be displayed. If not given, all rows and columns will be displayed. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. :param save_df: Set True to save the metric dataframe at the end. .. 
py:method:: top_target_barplot(interaction: str, target_subset: Optional[List[str]] = None, use_ligand_targets: bool = False, use_receptor_targets: bool = False, use_target_gene_targets: bool = True, use_target_gene_tf_targets: bool = False, top_n_targets: Optional[int] = None, fontsize: Union[None, int] = None, figsize: Union[None, Tuple[float, float]] = None, cmap: str = 'Blues', save_show_or_return: Literal['save', 'show', 'return', 'both', 'all'] = 'show', save_kwargs: Optional[dict] = {}) Visualize the proportion of cells expressing each target (ligand, receptor, or target gene involved in an upstream CCI model) that are predicted to be affected by a given interaction, i.e. transcription factor, L:R pair/ligand. :param interaction: The interaction to investigate, in the form specified in the design matrix, e.g. "Sox9" or "Igf1:Igf1r". :param target_subset: Optional, specify subset of target genes to visualize. If not given, defaults to all targets. :param use_ligand_targets: Whether ligands should be used as targets, i.e. if "interaction" is a TF and the target genes being influenced by the TF are ligands. If True, will ignore "use_receptor_targets" and "use_target_gene_targets". :param use_receptor_targets: Whether receptors should be used as targets, i.e. if "interaction" is a TF and the target genes being influenced by the TF are receptors. If True, will ignore "use_target_gene_targets". :param use_target_gene_targets: Whether target genes should be used as targets, i.e. if "interaction" is an L:R interaction :param use_target_gene_tf_targets: Whether target genes should be used as targets, i.e. if "interaction" is a TF and the target genes being influenced by the TF are target genes (that are not ligands or receptors). :param top_n_targets: Number of top targets to visualize. Defaults to 10. :param fontsize: Font size to determine size of the axis labels, ticks, title, etc. 
:param figsize: Width and height of the plotting window :param cmap: Name of the matplotlib colormap to use. Must be a sequential colormap. :param save_show_or_return: Whether to save, show or return the figure. If "both", it will save and plot the figure at the same time. If "all", the figure will be saved and displayed, and the associated axis and other objects will be returned. :param save_kwargs: A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {"path": None, "prefix": 'scatter', "dpi": None, "ext": 'pdf', "transparent": True, "close": True, "verbose": True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs. .. py:method:: permutation_test(gene: str, n_permutations: int = 100, permute_nonzeros_only: bool = False, **kwargs) Sets up a permutation test to determine the statistical significance of model diagnostics. Can be used to identify true/the strongest signal-responsive expression patterns. :param gene: Target gene to perform permutation test on. :param n_permutations: Number of permutations of the gene expression to perform. Default is 100. :param permute_nonzeros_only: Whether to only perform the permutation over the gene-expressing cells :param kwargs: Keyword arguments for any of the Spateo argparse arguments. Should not include 'adata_path', 'target_path', or 'output_path' (which will be determined by the output path used for the main model). Also should not include 'custom_lig_path', 'custom_rec_path', 'mod_type', 'bw_fixed' or 'kernel' (which will be determined by the initial model instantiation). .. py:method:: eval_permutation_test(gene: str) Evaluation function for permutation tests. Will compute multiple metrics (correlation coefficients, F1 scores, AUROC in the case that all cells were permuted, etc.) to compare true and model-predicted gene expression vectors. 
:param gene: Target gene for which to evaluate permutation test .. py:class:: MuSIC_Molecule_Selector(parser: argparse.ArgumentParser, args_list: Optional[List[str]] = None) Bases: :py:obj:`spateo.tools.CCI_effects_modeling.MuSIC.MuSIC` Various methods to select initial targets or predictors for intercellular analyses. :param parser: ArgumentParser object initialized with argparse, to parse command line arguments for arguments pertinent to modeling. .. attribute:: mod_type The type of model that will be employed for eventual downstream modeling. Will dictate how predictors will be found (if applicable). Options: - "niche": Spatially-aware, uses categorical cell type labels as independent variables. - "lr": Spatially-aware, essentially uses the combination of receptor expression in the "target" cell and spatially lagged ligand expression in the neighboring cells as independent variables. - "ligand": Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables. - "receptor": Uses receptor expression in the "target" cell as independent variables. .. attribute:: distr Distribution family for the dependent variable; one of "gaussian", "poisson", "nb" .. attribute:: adata_path Path to the AnnData object from which to extract data for modeling .. attribute:: normalize Set True to Perform library size normalization, to set total counts in each cell to the same number (adjust for cell size). .. attribute:: smooth Set True to correct for dropout effects by leveraging gene expression neighborhoods to smooth expression. .. attribute:: log_transform Set True if log-transformation should be applied to expression. .. attribute:: target_expr_threshold When selecting targets, expression above a threshold percentage of cells will be used to filter to a smaller subset of interesting genes. Defaults to 0.1. .. attribute:: r_squared_threshold When selecting targets, only genes with an R^2 above this threshold will be used as targets .. 
attribute:: custom_lig_path Optional path to a .txt file containing a list of ligands for the model, separated by newlines. If provided, will find targets for which this set of ligands collectively explains the most variance (on a gene-by-gene basis) when taking neighborhood expression into account .. attribute:: custom_ligands Optional list of ligands for the model, can be used as an alternative to :attr:`custom_lig_path`. If provided, will find targets for which this set of ligands collectively explains the most variance (on a gene-by-gene basis) when taking neighborhood expression into account .. attribute:: custom_rec_path Optional path to a .txt file containing a list of receptors for the model, separated by newlines. If provided, will find targets for which this set of receptors collectively explains the most variance .. attribute:: custom_receptors Optional list of receptors for the model, can be used as an alternative to :attr:`custom_rec_path`. If provided, will find targets for which this set of receptors collectively explains the most variance .. attribute:: custom_pathways_path Rather than providing a list of receptors, can provide a list of signaling pathways; all receptors with annotations in these pathways will be included in the model. If provided, will find targets for which receptors in these pathways collectively explain the most variance .. attribute:: custom_pathways Optional list of signaling pathways for the model, can be used as an alternative to :attr:`custom_pathways_path`. If provided, will find targets for which receptors in these pathways collectively explain the most variance .. attribute:: targets_path Optional path to a .txt file containing a list of prediction target genes for the model, separated by newlines. If not provided, targets will be strategically selected from the given receptors. .. 
attribute:: custom_targets Optional list of prediction target genes for the model, can be used as an alternative to :attr:`targets_path`. .. attribute:: cci_dir Full path to the directory containing cell-cell communication databases .. attribute:: species Selects the cell-cell communication database the relevant ligands will be drawn from. Options: "human", "mouse". .. attribute:: output_path Full path name for the .csv file in which results will be saved .. attribute:: group_key Key in .obs of the AnnData object that contains the cell type labels, used if targeting molecules that have cell type-specific activity .. attribute:: coords_key Key in .obsm of the AnnData object that contains the coordinates of the cells .. attribute:: n_neighbors Number of nearest neighbors to use in the case that ligands are provided or in the case that ligands of interest should be found .. py:method:: find_targets(save_id: Optional[str] = None, bw_membrane_bound: Union[float, int] = 8, bw_secreted: Union[float, int] = 25, kernel: Literal['bisquare', 'exponential', 'gaussian', 'quadratic', 'triangular', 'uniform'] = 'bisquare', **kwargs) Find genes that may serve as interesting targets by computing the IoU with receptor signal. Will find genes that are highly coexpressed with receptors or ligand:receptor signals. :param save_id: Optional string to append to the end of the saved file name. Will save signaling molecule names as "ligand_{save_id}.txt", etc. :param bw_membrane_bound: Bandwidth used to compute spatial weights for membrane-bound ligands. If integer, will convert to appropriate distance bandwidth. :param bw_secreted: Bandwidth used to compute spatial weights for secreted ligands. If integer, will convert to appropriate distance bandwidth. :param kernel: Type of kernel function used to weight observations when computing spatial weights; one of "bisquare", "exponential", "gaussian", "quadratic", "triangular" or "uniform". 
:param kwargs: Keyword arguments for any of the Spateo argparse arguments. Should not include 'output_path' ( which will be determined by the output path used for the main model). Should also not include any of 'ligands' or 'receptors', which will be determined by this function. .. py:function:: define_spateo_argparse(**kwargs) Defines and returns argparse objects for model fitting and interpretation. :param kwargs: Keyword arguments for any of the argparse arguments defined below. Parser arguments: run_upstream: Flag to run the upstream target selection step. If True, will run the target selection step adata_path: Path to AnnData object containing gene expression data. This or 'csv_path' must be given to run. csv_path: Path to .csv file containing gene expression data. This or 'adata_path' must be given to run. n_spatial_dim_csv: Number of spatial dimensions to the data provided to 'csv_path'. Defaults to 2. spatial_subsample: Flag to subsample the data- at a big picture level, this will be done by dividing the tissue into regions and subsampling from each of these regions. Recommended for large datasets (>5000 samples). multiscale: Flag to create multiscale models. Currently, it is recommended to only create multiscale models for Gaussian data. multiscale_params_only: Flag to return additional metrics along with the coefficients for multiscale models ( specifying this argument sets Flag to True) mod_type: The type of model that will be employed- this dictates how the data will be processed and prepared. Options: - "niche": Spatially-aware, uses categorical cell type labels as independent variables. - "lr": Spatially-aware, essentially uses the combination of receptor expression in the "target" cell and spatially lagged ligand expression in the neighboring cells as independent variables. - "ligand": Spatially-aware, essentially uses ligand expression in the neighboring cells as independent variables. 
- "receptor": Uses receptor expression in the "target" cell as independent variables. - "downstream": For the purposes of downstream analysis, used to model ligand expression as a function of upstream regulators include_unpaired_lr: Only if :attr:`mod_type` is "lr": if True, will include individual ligands/complexes and individual receptors in the design matrix if their cognate interacting partners cannot also be found. cci_dir: Path to directory containing cell-cell interaction databases species: Selects the cell-cell communication database the relevant ligands will be drawn from. Options: "human", "mouse". output_path: Full path name for the .csv file in which results will be saved. Make sure the parent directory is empty, since any existing files will be deleted. It is recommended to create a new folder to serve as the output directory. This should be supplied in the form '/path/to/file.csv', where file.csv will store coefficients. The name of the target will be appended at runtime. custom_lig_path: Path to .txt file containing a custom list of ligands. Each ligand should have its own line in the .txt file. ligand: Alternative to the custom ligand path, can be used to provide a single ligand or a list of ligands (separated by whitespace in the command line). custom_rec_path: Path to .txt file containing a custom list of receptors. Each receptor should have its own line in the .txt file. receptor: Alternative to the custom receptor path, can be used to provide a single receptor or a list of receptors (separated by whitespace in the command line). custom_pathways_path: Path to .txt file containing a custom list of pathways. Each pathway should have its own line in the .txt file. pathway: Alternative to the custom pathway path, can be used to provide a single pathway or a list of pathways (separated by whitespace in the command line). targets_path: Path to .txt file containing a custom list of targets. Each target should have its own line in the .txt file. 
target: Alternative to the custom target path, can be used to provide a single target or a list of targets (separated by whitespace in the command line). init_betas_path: Optional path to a .json file or .csv file containing initial coefficient values for the model for each target variable. If encoded in .json, keys should be target gene names, values should be numpy arrays containing coefficients. If encoded in .csv, columns should be target gene names. Initial coefficients should have shape [n_features, ]. normalize: Flag to perform library size normalization, to set total counts in each cell to the same number (adjust for cell size). Will be set to True if provided. smooth: Flag to correct for dropout effects by leveraging gene expression neighborhoods to smooth expression. It is advisable not to do this if performing Poisson or negative binomial regression. Will be set to True if provided. log_transform: Flag for whether log-transformation should be applied to expression. It is advisable not to do this if performing Poisson or negative binomial regression. Will be set to True if provided. normalize_signaling: Flag to minmax scale the final ligand expression array (for :attr:`mod_type` = "ligand"), or the final ligand-receptor array (for :attr:`mod_type` = "lr"). This is recommended to associate downstream expression with rarer/less prevalent signaling mechanisms. target_expr_threshold: Only used when automatically selecting targets; finds the L:R-downstream TFs and their targets and searches for expression above a threshold proportion of cells to filter to a subset of candidate target genes. This argument sets that proportion, and defaults to 0.05. multicollinear_threshold: Variance inflation factor threshold used to filter out multicollinear features. A value of 5 or 10 is recommended. coords_key: Entry in :attr:`adata` .obsm that contains spatial coordinates. Defaults to "spatial". group_key: Entry in :attr:`adata` .obs that contains cell type labels. 
Required for 'mod_type' = "niche". group_subset: Subset of cell types to include in the model (provided as a whitespace-separated list in command line). If given, will consider only cells of these types in modeling. Defaults to all cell types. covariate_keys: Entries in :attr:`adata` .obs or :attr:`adata` .var that contain covariates to include in the model. Can be provided as a whitespace-separated list in the command line. Numerical covariates should be minmax scaled between 0 and 1. total_counts_key: Entry in :attr:`adata` .obs that contains total counts for each cell. Required if subsetting by total counts. Defaults to "total_counts". total_counts_threshold: Threshold for total counts to subset cells by; cells with total counts greater than this threshold will be retained. bw: Bandwidth for kernel density estimation. Consists of either a distance value or N for the number of nearest neighbors, depending on :attr:`bw_fixed` minbw: For use in automated bandwidth selection: the lower-bound bandwidth to test. maxbw: For use in automated bandwidth selection: the upper-bound bandwidth to test. bw_fixed: Flag to use a fixed bandwidth (True) or to automatically select a bandwidth (False). This should be True if the input to/values to test for :attr:`bw` are distance values, and False if they are numbers of neighbors. exclude_self: Flag to exclude the target cell from the neighborhood when computing spatial weights. Note that if True and :attr:`bw` is defined by the number of neighbors, your desired bw should be 1 + the number of neighbors you want to include. kernel: Type of kernel function used to weight observations when computing spatial weights and fitting the model; one of "bisquare", "exponential", "gaussian", "quadratic", "triangular" or "uniform". distance_membrane_bound: In model setup, distance threshold to consider cells as neighbors for membrane-bound ligands. If provided, will take priority over :attr:`n_neighbors_membrane_bound`. 
distance_secreted: In model setup, distance threshold to consider cells as neighbors for secreted or ECM ligands. If provided, will take priority over :attr:`n_neighbors_secreted`. n_neighbors_membrane_bound: For :attr:`mod_type` "ligand" or "lr", ligand expression will be taken from the neighboring cells; this defines the number of cells to use for membrane-bound ligands. Defaults to 8. n_neighbors_secreted: For :attr:`mod_type` "ligand" or "lr", ligand expression will be taken from the neighboring cells; this defines the number of cells to use for secreted or ECM ligands. distr: Distribution family for the dependent variable; one of "gaussian", "poisson", "nb" fit_intercept: Flag to fit an intercept term in the model. Will be set to True if provided. tolerance: Convergence tolerance for IWLS max_iter: Maximum number of iterations for IWLS patience: When checking various values for the bandwidth, this is the number of iterations to wait for without the score changing before stopping. Defaults to 5. ridge_lambda: Sets the strength of the regularization, between 0 and 1. Higher values typically result in more features being removed. search_bw: For downstream analysis; specifies the bandwidth to search for senders/receivers. Recommended to set equal to the bandwidth of a fitted model. top_k_receivers: For downstream analysis, specifically when constructing vector fields of signaling effects. Specifies the number of nearest neighbors to consider when computing signaling effect vectors. filter_targets: For downstream analysis, specifically :func:`infer_effect_direction`; if True, will subset to only the targets that were predicted well by the model. filter_target_threshold: For downstream analysis, specifically :func:`infer_effect_direction`; specifies the threshold Pearson coefficient for target subsetting. Only used if `filter_targets` is True. 
diff_sending_or_receiving: For downstream analyses, specifically :func:`sender_receiver_effect_deg_detection`; specifies whether to compute differential expression of genes in cells with high or low sending effect potential ('sending cells') or high or low receiving effect potential ('receiving cells'). target_for_downstream: A string or a list (provided as a whitespace-separated list in the command line) of target genes for :func:`get_effect_potential`, :func:`get_pathway_potential` and :func:`calc_and_group_sender_receiver_effect_degs` (provide only one target), as well as :func:`compute_cell_type_coupling` (can provide multiple targets). ligand_for_downstream: For downstream analyses; used for :func:`get_effect_potential` and :func:`calc_and_group_sender_receiver_effect_degs`, used to specify the ligand gene to consider with respect to the target. receptor_for_downstream: For downstream analyses; used for :func:`get_effect_potential` and :func:`calc_and_group_sender_receiver_effect_degs`, used to specify the receptor gene to consider with respect to the target. pathway_for_downstream: For downstream analyses; used for :func:`get_pathway_potential` and :func:`calc_and_group_sender_receiver_effect_degs`, used to specify the pathway to consider with respect to the target. sender_ct_for_downstream: For downstream analyses; used for :func:`get_effect_potential` and :func:`calc_and_group_sender_receiver_effect_degs`, used to specify the cell type to consider as a sender. receiver_ct_for_downstream: For downstream analyses; used for :func:`get_effect_potential` and :func:`calc_and_group_sender_receiver_effect_degs`, used to specify the cell type to consider as a receiver. n_components: Used for :func:`CCI_sender_deg_detection` and :func:`CCI_receiver_deg_detection`; determines the dimensionality of the space to embed into using UMAP. 
cci_degs_model_interactions: Used for :func:`CCI_sender_deg_detection`; if True, will consider transcription factor interactions with cofactors and other transcription factors, with these interactions combined into features. If False, will use each cofactor independently in the prediction. no_cell_type_markers: Used for :func:`CCI_receiver_deg_detection`; if True, will exclude cell type markers from the set of genes for which to compare to sent/received signal. compute_pathway_effect: Used for :func:`inferred_effect_direction`; if True, will summarize the effects of all ligands/ligand-receptor interactions in a pathway. :returns: Argparse object defining important arguments for model fitting and interpretation args_list: If argparse object is returned from a function, the parser must read in arguments in the form of a list; this return contains that processed list. :rtype: parser .. py:function:: find_cci_two_group(adata: anndata.AnnData, path: str, species: Literal['human', 'mouse', 'drosophila', 'zebrafish', 'axolotl'] = 'human', layer: Tuple[None, str] = None, group: str = None, lr_pair: list = None, sender_group: str = None, receiver_group: str = None, mode: Literal['mode1', 'mode2'] = 'mode2', filter_lr: Literal['outer', 'inner'] = 'outer', top: int = 20, spatial_neighbors: str = 'spatial_neighbors', spatial_distances: str = 'spatial_distances', min_cells_by_counts: int = 0, min_pairs: int = 5, min_pairs_ratio: float = 0.01, num: int = 1000, pvalue: float = 0.05, fdr: bool = False) -> dict Performs cell-cell transformation on an anndata object, while also limiting the nearest neighbors per cell to n_neighbors. This function returns a dictionary with the keys 'cell_pair' and 'lr_pair'. :param adata: An AnnData object. :param path: Path to ligand_receptor network of NicheNet (prior lr_network). :param species: The species the adata was generated from. Will be used to determine the proper ligand-receptor database. :param layer: the key to the layer. 
If it is None, adata.X will be used by default. :param group: The group name in adata.obs :param lr_pair: A given list of ligand-receptor pairs. :param sender_group: the name of the cell group that sends ligands. :param receiver_group: the name of the cell group that expresses the receptors. :param spatial_neighbors: spatial neighbor key {spatial_neighbors} in adata.uns.keys(). :param spatial_distances: spatial neighbor distance key {spatial_distances} in adata.obsp.keys(). :param min_cells_by_counts: threshold for minimum number of cells expressing ligand/receptor to avoid being filtered out. Only used if 'lr_pair' is None. :param min_pairs: minimum number of cell pairs between cells from two groups. :param min_pairs_ratio: minimum ratio of cell pairs to theoretical cell pairs (n x M / 2) between cells from two groups. :param num: number of permutations. It is recommended that this number be at least 1000. :param pvalue: the p-value threshold that will be used to filter for significant ligand-receptor pairs. :param filter_lr: filter ligands and receptors based on specific expression in the sender and receiver groups. 'inner': specific in both the sender groups and the receiver groups; 'outer': specific in the sender groups or the receiver groups. :param top: the number of top expressed fraction in the given sender groups (receiver groups) for each gene (ligand or receptor). :returns: a dictionary with the keys 'cell_pair' and 'lr_pair'. :rtype: result_dict .. py:function:: prepare_cci_cellpair_adata(adata: anndata.AnnData, sender_group: str = None, receiver_group: str = None, group: str = None, cci_dict: dict = None, all_cell_pair: bool = False) -> anndata.AnnData Prepare cell pairs for visualization by :func:`st.tl.space`: plot either all cell pairs, or only the cell pairs constrained by spatial distance (output of :func:`cci_two_cluster`). Args: adata: An AnnData object. sender_group: the name of the cell group that sends ligands. receiver_group: the name of the cell group that expresses the receptors. 
group: The group name in adata.obs. Unused unless 'all_cell_pair' is True. cci_dict: a dictionary result from :func:`cci_two_cluster`, with the keys 'cell_pair' and 'lr_pair'. Unused unless 'all_cell_pair' is False. all_cell_pair: show all cells of the sender and receiver cell group. Default `False`. spatial_key: Key in .obsm containing coordinates for each bucket. Returns: adata: Updated AnnData object containing 'spec' in .obs. .. py:function:: prepare_cci_df(cci_df: pandas.DataFrame, means_col: str, pval_col: str, lr_pair_col: str, sr_pair_col: str) Given a dataframe generated from the output of :func:`cci_two_cluster`, prepare for visualization by heatmap by splitting into two dataframes, corresponding to the mean cell type-cell type L:R product and probability values from the permutation test. :param cci_df: CCI dataframe with columns for: ligand name, receptor name, L:R product, p value, and sender-receiver cell types :param means_col: Label for the column corresponding to the mean product of L:R expression between two cell types :param pval_col: Label for the column corresponding to the p-value of the interaction :param lr_pair_col: Label for the column corresponding to the ligand-receptor pair in format "{ligand}-{receptor}" :param sr_pair_col: Label for the column corresponding to the sending-receiving cell type pair in format "{sender}-{receiver}" :returns: Dictionary with keys 'means' and 'pvalues', whose values are the mean cell type-cell type L:R product and the probability values, respectively :rtype: dict .. rubric:: Example res = find_cci_two_group(adata, ...) # The df to save can be found under "lr_pair": res["lr_pair"].to_csv(...) # prepare_cci_df returns a dict and needs the four column labels: dfs = prepare_cci_df(res["lr_pair"], means_col=..., pval_col=..., lr_pair_col=..., sr_pair_col=...) .. 
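
As a rough illustration of the split that `prepare_cci_df` describes, the sketch below pivots long-format rows into two nested tables keyed by L:R pair and sender-receiver pair. It uses plain Python dictionaries rather than pandas, and the helper `split_cci_rows`, the column labels, and the sample values are all hypothetical; it is not Spateo's implementation.

```python
from collections import defaultdict


def split_cci_rows(rows, means_col, pval_col, lr_pair_col, sr_pair_col):
    """Hypothetical helper: pivot long-format CCI rows into two wide tables."""
    means = defaultdict(dict)
    pvalues = defaultdict(dict)
    for row in rows:
        lr, sr = row[lr_pair_col], row[sr_pair_col]
        # One cell per (L:R pair, sender-receiver pair) in each table.
        means[lr][sr] = row[means_col]
        pvalues[lr][sr] = row[pval_col]
    return {"means": dict(means), "pvalues": dict(pvalues)}


# Made-up rows; the column labels "lr", "sr", "mean" and "p" are placeholders.
rows = [
    {"lr": "Igf1-Igf1r", "sr": "A-B", "mean": 0.8, "p": 0.01},
    {"lr": "Igf1-Igf1r", "sr": "B-A", "mean": 0.1, "p": 0.60},
]
tables = split_cci_rows(rows, "mean", "p", "lr", "sr")
print(tables["means"]["Igf1-Igf1r"]["A-B"])
```

Each of the two tables can then be drawn as a heatmap with L:R pairs on one axis and sender-receiver pairs on the other.

.. 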
py:function:: niches(adata: anndata.AnnData, path: str, layer: Tuple[None, str] = None, weighted: bool = False, spatial_neighbors: str = 'spatial_neighbors', spatial_distances: str = 'spatial_distances', species: Literal['human', 'mouse', 'drosophila', 'zebrafish', 'axolotl'] = 'human', system: Literal['niches_c2c', 'niches_n2c', 'niches_c2n', 'niches_n2n'] = 'niches_n2n', method: Literal['gmean', 'mean', 'sum'] = 'sum') -> anndata.AnnData Performs cell-cell transformation on an anndata object, while also limiting the nearest neighbors per cell to k. This function returns another anndata object, in which the columns of the matrix are bucket-bucket pairs, while the rows are ligand-receptor mechanisms. This resultant anndata object allows flexible downstream manipulations such as dimensionality reduction of the rows or columns of this object. Our method is adapted from: Micha Sam Brickman Raredon, Junchen Yang, Neeharika Kothapalli, Naftali Kaminski, Laura E. Niklason, Yuval Kluger. Comprehensive visualization of cell-cell interactions in single-cell and spatial transcriptomics with NICHES. doi: https://doi.org/10.1101/2022.01.23.477401 :param adata: An AnnData object. :param path: Path to ligand_receptor network of NicheNet (prior lr_network). :param layer: the key to the layer. If it is None, adata.X will be used by default. :param weighted: Whether to supply the edge weights according to the actual spatial distance (as in weighted kNN). Default is 'False', meaning all neighbor edge weights are equal to 1 and all others are 0. :param spatial_neighbors: neighbor_key {spatial_neighbors} in adata.uns.keys(). :param spatial_distances: neighbor_key {spatial_distances} in adata.obsp.keys(). 
:param system: 'niches_n2n' (default). Options include cell-cell signaling ('niches_c2c'), defined as the signals passed between cells, determined by the product of the ligand expression of the sending cell and the receptor expression of the receiving cell, and system-cell signaling ('niches_n2c'), defined as the signaling input to a cell, determined by taking the geometric mean of the ligand profiles of the surrounding cells and the receptor profile of the receiving cell. 'niches_c2n' and 'niches_n2n' are defined similarly. :returns: An anndata of Niches, in which rows are mechanisms and columns are all possible cell x cell interactions. .. py:function:: predict_ligand_activities(adata: anndata.AnnData, path: str, sender_cells: Optional[List[str]] = None, receiver_cells: Optional[List[str]] = None, geneset: Optional[List[str]] = None, ratio_expr_thresh: float = 0.01, species: Literal['human', 'mouse'] = 'human') -> pandas.DataFrame Function to predict the ligand activity. Our method is adapted from: Robin Browaeys, Wouter Saelens & Yvan Saeys. NicheNet: modeling intercellular communication by linking ligands to target genes. Nature Methods volume 17, pages 159–162 (2020). :param path: Path to ligand_target_matrix, lr_network (human and mouse). :param adata: An AnnData object. :param sender_cells: Ligand cells. :param receiver_cells: Receptor cells. :param geneset: The gene set of interest. This may be the differentially expressed genes in receiver cells (comparing cells in case and control group). Ligand activity prediction is based on this gene set. By default, all genes expressed in receiver cells are used. :param ratio_expr_thresh: The minimum percentage of buckets expressing the ligand (target) in sender (receiver) cells. :returns: A pandas DataFrame of the predicted ligand activities. .. 
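
To make `ratio_expr_thresh` concrete, here is a minimal plain-Python sketch of filtering genes by the fraction of cells that express them. The helper `genes_above_ratio`, the reading of "expressing" as a value greater than zero, and the inclusive comparison are assumptions made for illustration; they do not describe the library's internals.

```python
def genes_above_ratio(expr_by_gene, ratio_expr_thresh=0.01):
    """Hypothetical helper: keep genes expressed in at least the given fraction of cells."""
    kept = []
    for gene, values in expr_by_gene.items():
        if not values:
            continue  # no cells to evaluate for this gene
        # Assumption: a cell "expresses" a gene when its value is > 0.
        ratio = sum(1 for v in values if v > 0) / len(values)
        if ratio >= ratio_expr_thresh:
            kept.append(gene)
    return kept


# Made-up expression values for four cells per gene.
expr = {"Igf1": [0, 3, 0, 1], "Sox9": [0, 0, 0, 0]}
kept = genes_above_ratio(expr, ratio_expr_thresh=0.25)
print(kept)
```

With these toy values, `Igf1` is expressed in half of the cells and passes the 0.25 threshold, while `Sox9` is never expressed and is dropped.

.. 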
py:function:: predict_target_genes(adata: anndata.AnnData, path: str, sender_cells: Optional[List[str]] = None, receiver_cells: Optional[List[str]] = None, geneset: Optional[List[str]] = None, species: Literal['human', 'mouse'] = 'human', top_ligand: int = 20, top_target: int = 300) -> pandas.DataFrame Function to predict the target genes. :param path: Path to ligand_target_matrix of NicheNet. :param adata: An AnnData object. :param sender_cells: Ligand cells. :param receiver_cells: Receptor cells. :param geneset: The gene set of interest. This may be the differentially expressed genes in receiver cells (comparing cells in case and control group). Ligand activity prediction is based on this gene set. By default, all genes expressed in receiver cells are used. :param top_ligand: `int` (default=20) Number of top-ranked ligands to select for further biological interpretation. :param top_target: `int` (default=300) Infer target genes of top-ranked ligands, and choose the top targets according to the general prior model. :returns: A pandas DataFrame of the predicted target genes. .. py:class:: pySTAGATE(adata: anndata.AnnData, num_batch_x, num_batch_y, basis='spatial', spatial_key: list = ['X', 'Y'], batch_size: int = 1, rad_cutoff: int = 200, num_epoch: int = 1000, lr: float = 0.001, weight_decay: float = 0.0001, hidden_dims: list = [512, 30], device: str = 'cuda:0') Class representing the object of pySTAGATE. .. py:attribute:: device .. py:attribute:: loader .. py:attribute:: num_epoch :value: 1000 .. py:attribute:: lr :value: 0.001 .. py:attribute:: weight_decay :value: 0.0001 .. py:attribute:: hidden_dims :value: [512, 30] .. py:attribute:: adata .. py:attribute:: data .. py:attribute:: model .. py:attribute:: optimizer .. py:method:: train() Train the STAGATE model. .. py:method:: predicted() Predict the STAGATE representation and ReX values for all cells. .. 
py:method:: cal_pSM(n_neighbors: int = 20, resolution: int = 1, max_cell_for_subsampling: int = 5000, psm_key='pSM_STAGATE') Calculate the pseudo-spatial map using diffusion pseudotime (DPT) algorithm. :param n_neighbors: Number of neighbors for constructing the kNN graph. :type n_neighbors: int :param resolution: Resolution for clustering. :type resolution: float :param max_cell_for_subsampling: Maximum number of cells for subsampling. If the number of cells is larger than this value, the subsampling will be performed. :type max_cell_for_subsampling: int :returns: **pSM_values** -- The pseudo-spatial map values. :rtype: numpy.ndarray .. py:function:: spagcn_vanilla(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'spagcn_pred', n_pca_components: Optional[int] = None, e_neigh: int = 10, resolution: float = 0.4, n_clusters: Optional[int] = None, refine_shape: Literal['hexagon', 'square'] = 'hexagon', p: float = 0.5, seed: int = 100, numIterMaxSpa: int = 2000, copy: bool = False) -> Optional[anndata.AnnData] Integrating gene expression and spatial location to identify spatial domains via SpaGCN. Original Code Repository: https://github.com/jianhuupenn/SpaGCN Reference: Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nature Methods volume 18, pages1342–1351 (2021) :param adata: An Anndata object after normalization. :param spatial_key: the key in `.obsm` that corresponds to the spatial coordinate of each bucket. :param key_added: adata.obs key under which to add the cluster labels. The initial clustering results of SpaGCN are under `key_added`, and the refined clustering results are under `f'{key_added}_refined'`. :param n_pca_components: Number of principal components to compute. 
If `n_pca_components` == None, the value at the inflection point of the PCA curve is automatically calculated as n_comps. :param e_neigh: Number of nearest neighbors in gene expression space. Used in dyn.pp.neighbors(adata, n_neighbors=e_neigh). :param resolution: Resolution in the Louvain clustering method. Used when `n_clusters`==None. :param n_clusters: Number of spatial domains wanted. If `n_clusters` != None, the suitable resolution in the initial Louvain clustering method will be automatically searched based on n_clusters. :param refine_shape: Smooth the spatial domains with given spatial topology, "hexagon" for Visium data, "square" for ST data. Defaults to "hexagon". :param p: Percentage of total expression contributed by neighborhoods. :param seed: Global seed for `random`, `torch`, `numpy`. Defaults to 100. :param numIterMaxSpa: SpaGCN maximum number of training iterations. :param copy: Whether to copy `adata` or modify it inplace. :returns: Depending on the parameter `copy`: when True, return an updated adata with the fields ``adata.obs[key_added]`` and ``adata.obs[f'{key_added}_refined']``, containing the cluster result based on SpaGCN; otherwise update the adata object in place. .. py:function:: CAST(adata, sample_key=None, basis='spatial', layer='norm_1e4', n_components=10, output_path='output/CAST_Mark', gpu_t=0, device='cuda:0', **kwargs) CAST is a Python library for physically aligning different spatial transcriptomics datasets regardless of technologies, magnification, individual variation, and experimental batch effects. CAST is composed of three modules: CAST Mark, CAST Stack, and CAST Projection. :param adata: an Anndata object, after normalization. :param sample_key: str, optional, default: None The key in `.obs` that corresponds to the sample labels. :param basis: str, optional, default: 'spatial' The basis used for CAST. :param layer: str, optional, default: 'norm_1e4' The layer used for CAST. 
:param output_path: str, optional, default: 'output/CAST_Mark' The path to save the CAST results. :param gpu_t: int, optional, default: 0 The GPU index to be used. :param device: str, optional, default: 'cuda:0' The device to be used. :param kwargs: additional parameters for CAST. .. py:function:: kmeans_clustering(adata, n_clusters=10, use_rep='X_cast', random_state=42, cluster_key='kmeans_clusters') KMeans clustering for spatial transcriptomics data. :param adata: an Anndata object, after normalization. :param n_clusters: int, optional, default: 10 The number of clusters. :param use_rep: str, optional, default: 'X_cast' The representation to be used for clustering. :param random_state: int, optional, default: 42 Random seed for reproducibility. :param cluster_key: str, optional, default: 'kmeans_clusters' The key in `.obs` that corresponds to the cluster labels .. py:function:: mclust_py(adata, n_components=None, use_rep: str = 'X_pca', modelNames='EEE', random_seed=42) Clustering using Gaussian Mixture Model (GMM), similar to mclust in R. :param adata: an Anndata object, after normalization. :param n_components: int, optional, default: None The number of mixture components. :param use_rep: str, optional, default: 'X_pca' The representation to be used for clustering. :param modelNames: str, optional, default: 'EEE' The model name to be used for clustering. - EEE: represents Equal volume, shape, and orientation (spherical). - VVV: represents Variable volume, shape, and orientation. - EEV: represents Equal volume and shape, variable orientation (tied). - VVI: represents Variable volume and shape, equal orientation (diag). :param random_seed: int, optional, default: 42 Random seed for reproducibility. .. 
py:function:: scc(adata: anndata.AnnData, spatial_key: str = 'spatial', key_added: Optional[str] = 'scc', pca_key: str = 'pca', e_neigh: int = 30, s_neigh: int = 6, resolution: Optional[float] = None, cluster_method: str = 'louvain') -> Optional[anndata.AnnData] Spatially constrained clustering (scc) to identify continuous tissue domains. Reference: Ao Chen, Sha Liao, Mengnan Cheng, Kailong Ma, Liang Wu, Yiwei Lai, Xiaojie Qiu, Jin Yang, Wenjiao Li, Jiangshan Xu, Shijie Hao, Xin Wang, Huifang Lu, Xi Chen, Xing Liu, Xin Huang, Feng Lin, Zhao Li, Yan Hong, Defeng Fu, Yujia Jiang, Jian Peng, Shuai Liu, Mengzhe Shen, Chuanyu Liu, Quanshui Li, Yue Yuan, Huiwen Zheng, Zhifeng Wang, H Xiang, L Han, B Qin, P Guo, PM Cánoves, JP Thiery, Q Wu, F Zhao, M Li, H Kuang, J Hui, O Wang, B Wang, M Ni, W Zhang, F Mu, Y Yin, H Yang, M Lisby, RJ Cornall, J Mulder, M Uhlen, MA Esteban, Y Li, L Liu, X Xu, J Wang. Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays. Cell, 2022. :param adata: an Anndata object, after normalization. :param spatial_key: the key in `.obsm` that corresponds to the spatial coordinate of each bucket. :param key_added: adata.obs key under which to add the cluster labels. :param pca_key: label for the .obsm key containing PCA information (without the potential prefix "X_") :param e_neigh: the number of nearest neighbor in gene expression space. :param s_neigh: the number of nearest neighbor in physical space. :param resolution: the resolution parameter of the leiden clustering algorithm. :returns: An `~anndata.AnnData` object with cluster info in .obs. :rtype: adata .. py:function:: smooth(adata: anndata.AnnData, radius: int = 50, key: str = 'label') -> list Optimize the label by majority voting in the neighborhood. :param adata: an Anndata object, after normalization. :param radius: the radius of the neighborhood. :param key: the key in `.obs` that corresponds to the cluster labels. .. 
py:function:: spagcn_pyg(adata: anndata.AnnData, n_clusters: int, p: float = 0.5, s: int = 1, b: int = 49, refine_shape: Optional[str] = None, his_img_path: Optional[str] = None, total_umi: Optional[str] = None, x_pixel: str = None, y_pixel: str = None, x_array: str = None, y_array: str = None, seed: int = 100, copy: bool = False) -> Optional[anndata.AnnData] Function to find clusters with spagcn. Reference: Jian Hu, Xiangjie Li, Kyle Coleman, Amelia Schroeder, Nan Ma, David J. Irwin, Edward B. Lee, Russell T. Shinohara & Mingyao Li. SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network. Nature Methods volume 18, pages1342–1351 (2021) :param adata: an Anndata object, after normalization. :param n_clusters: Desired number of clusters. :param p: parameter `p` in spagcn algorithm. See `SpaGCN` for details. Defaults to 0.5. :param s: alpha to control the color scale in calculating adjacent matrix. Defaults to 1. :param b: beta to control the range of neighbourhood when calculate grey value for one spot in calculating adjacent matrix. Defaults to 49. :param refine_shape: Smooth the spatial domains with given spatial topology, "hexagon" for Visium data, "square" for ST data. Defaults to None. :param his_img_path: The file path of histology image used to calculate adjacent matrix in spagcn algorithm. Defaults to None. :param total_umi: By providing the key(colname) in `adata.obs` which contains total UMIs(counts) for each spot, the function use the total counts as a grayscale image when histology image is not provided. Ignored if his_img_path is not `None`. Defaults to "total_umi". :param x_pixel: The key(colname) in `adata.obs` which contains corresponding x-pixels in histology image. Defaults to None. :param y_pixel: The key(colname) in `adata.obs` which contains corresponding y-pixels in histology image. Defaults to None. 
:param x_array: The key(colname) in `adata.obs` which contains corresponding x-coordinates. Defaults to None. :param y_array: The key(colname) in `adata.obs` which contains corresponding y-coordinates. Defaults to None. :param seed: Global seed for `random`, `torch`, `numpy`. Defaults to 100. :param copy: Whether to return a new deep copy of `adata` instead of updating `adata` object passed in arguments. Defaults to False. :returns: `~anndata.AnnData`: An `~anndata.AnnData` object with cluster info in "spagcn_pred", and in "spagcn_pred_refined" if `refine_shape` is set. The adjacent matrix used in spagcn algorithm is saved in `adata.uns["adj_spagcn"]`. :rtype: class .. py:function:: compute_pca_components(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], random_state: Optional[int] = 1, save_curve_img: Optional[str] = None) -> Tuple[Any, int, float] Calculate the inflection point of the PCA curve to obtain the number of principal components that the PCA should retain. :param matrix: A dense or sparse matrix. :param save_curve_img: If save_curve_img != None, save the image of the PCA curve and inflection points. :returns: The number of principal components that PCA should retain. new_components_stored: Percentage of variance explained by the retained principal components. :rtype: new_n_components .. py:function:: ecp_silhouette(matrix: Union[numpy.ndarray, scipy.sparse.spmatrix], cluster_labels: numpy.ndarray) -> float Here we evaluate the clustering performance by calculating the Silhouette Coefficient. The silhouette analysis is used to choose an optimal value for clustering resolution. The Silhouette Coefficient is a widely used method for evaluating clustering performance, where a higher Silhouette Coefficient score relates to a model with better defined clusters and indicates a good separation between the celltypes. Advantages of the Silhouette Coefficient: * The score is bounded between -1 for incorrect clustering and +1 for highly dense clustering. 
Scores around zero indicate overlapping clusters. * The score is higher when clusters are dense and well separated, which relates to a standard concept of a cluster. Original Code Repository: https://scikit-learn.org/stable/modules/clustering.html#silhouette-coefficient :param matrix: A dense or sparse matrix of features. :param cluster_labels: An array of cluster labels, one per sample. :returns: Mean Silhouette Coefficient over all samples. .. rubric:: Examples >>> ecp_silhouette(matrix=adata.obsm["X_pca"], cluster_labels=adata.obs["leiden"].values) .. py:function:: integrate(adatas: List[anndata.AnnData], batch_key: str = 'slices', fill_value: Union[int, float] = 0) -> anndata.AnnData Concatenate all AnnData objects. :param adatas: AnnData matrices to concatenate with. :param batch_key: Add the batch annotation to :attr:`obs` using this key. :param fill_value: Scalar value to fill newly missing values in arrays with. :returns: The concatenated AnnData, where adata.obs[batch_key] stores a categorical variable labeling the batch. :rtype: integrated_adata .. py:function:: pca_spateo(adata: anndata.AnnData, X_data: Optional[numpy.ndarray] = None, n_pca_components: Optional[int] = None, pca_key: Optional[str] = 'X_pca', genes: Union[list, None] = None, layer: Union[str, None] = None, random_state: Optional[int] = 1) Perform PCA for dimensionality reduction. :param adata: An Anndata object. :param X_data: The user supplied data that will be used for dimension reduction directly. :param n_pca_components: The number of principal components that PCA will retain. If `None`, the inflection point of the PCA curve is calculated to obtain the number of principal components that the PCA should retain. :param pca_key: Add the PCA result to :attr:`obsm` using this key. :param genes: The list of genes that will be used to subset the data for dimension reduction and clustering. If `None`, all genes will be used. 
:param layer: The layer that will be used to retrieve data for dimension reduction and clustering. If `None`, will use ``adata.X``. :returns: The processed AnnData, where adata.obsm[pca_key] stores the PCA result. :rtype: adata_after_pca .. py:function:: pearson_residuals(adata: anndata.AnnData, n_top_genes: Optional[int] = 3000, subset: bool = False, theta: float = 100, clip: Optional[float] = None, check_values: bool = True) Preprocess UMI count data with analytic Pearson residuals. Pearson residuals transform raw UMI counts into a representation where three aims are achieved: 1.Remove the technical variation that comes from differences in total counts between cells; 2.Stabilize the mean-variance relationship across genes, i.e. ensure that biological signal from both low and high expression genes can contribute similarly to downstream processing 3.Genes that are homogeneously expressed (like housekeeping genes) have small variance, while genes that are differentially expressed (like marker genes) have high variance :param adata: An anndata object. :param n_top_genes: Number of highly-variable genes to keep. :param subset: Inplace subset to highly-variable genes if `True` otherwise merely indicate highly variable genes. :param theta: The negative binomial overdispersion parameter theta for Pearson residuals. Higher values correspond to less overdispersion (var = mean + mean^2/theta), and `theta=np.Inf` corresponds to a Poisson model. :param clip: Determines if and how residuals are clipped: * If `None`, residuals are clipped to the interval [-sqrt(n), sqrt(n)], where n is the number of cells in the dataset (default behavior). * If any scalar c, residuals are clipped to the interval [-c, c]. Set `clip=np.Inf` for no clipping. :param check_values: Check if counts in selected layer are integers. A Warning is returned if set to True. :returns: Updates adata with the field ``adata.obsm["pearson_residuals"]``, containing pearson_residuals. .. 
py:function:: find_all_cluster_degs(adata: anndata.AnnData, group: str, genes: Optional[List[str]] = None, layer: Optional[str] = None, X_data: Optional[numpy.ndarray] = None, copy: bool = True, n_jobs: int = 1) -> anndata.AnnData Find marker genes for each group of buckets based on gene expression. :param adata: An Annadata object :param group: The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes. :param genes: The list of genes that will be used to subset the data for identifying DEGs. If `None`, all genes will be used. :param layer: The layer that will be used to retrieve data for DEG analyses. If `None` and `X_data` is not given, .X is used. :param X_data: The user supplied data that will be used for marker gene detection directly. :param copy: If True (default) a new copy of the adata object will be returned, otherwise if False, the adata will be updated inplace. :param n_cores: `int` (default=1) The maximum number of concurrently running jobs. By default it is 1 and thus no parallel computing code is used at all. When -1 all CPUs are used. 
:returns: An `~anndata.AnnData` with a new property `cluster_markers` in the .uns attribute, which includes a concatenated pandas DataFrame of the differential expression analysis result for all groups and a dictionary where keys are cluster numbers and values are lists of marker genes for the corresponding clusters. Please note that the markers are not the top marker genes. To identify top `n` marker genes, use `st.tl.cluster_degs.top_n_degs(adata, group='louvain')`. .. py:function:: find_cluster_degs(adata: anndata.AnnData, test_group: str, control_groups: List[str], genes: Optional[List[str]] = None, layer: Optional[str] = None, X_data: Optional[numpy.ndarray] = None, group: Optional[str] = None, qval_thresh: float = 0.05, ratio_expr_thresh: float = 0.1, diff_ratio_expr_thresh: float = 0, log2fc_thresh: float = 0, method: Literal['multiple', 'pairwise'] = 'multiple') -> pandas.DataFrame Find marker genes between one group and the other groups based on gene expression. Test each gene for differential expression between buckets in one group and the other groups via Mann-Whitney U test. We calculate the percentage of buckets expressing the gene in the test group (ratio_expr), the difference between the percentages of buckets expressing the gene in the test group and control groups (diff_ratio_expr), the expression fold change between the test and control groups (log2fc), and the q-value (qval), calculated using the Benjamini-Hochberg procedure. In addition, the `1 - Jensen-Shannon` distance between the distribution of percentage of cells with expression across all groups and the hypothetical perfect distribution in which only the test group of cells has expression (jsd_adj_score), and Pearson's correlation coefficient between the gene's observed expression vector across all cells and that of an ideal marker gene which is only expressed in test_group cells (ppc_score), as well as cosine_score, are also calculated. 
:param adata: an AnnData object :param test_group: The group name from `group` for which markers have to be found. :param control_groups: The list of group name(s) from `group` against which markers have to be tested. :param genes: The list of genes that will be used to subset the data for identifying DEGs. If `None`, all genes will be used. :param layer: The layer that will be used to retrieve data for DEG analyses. If `None` and `X_data` is not given, .X is used. :param group: The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes. :param X_data: The user supplied data that will be used for marker gene detection directly. :param qval_thresh: The maximal qval for a gene to be considered significant. :param ratio_expr_thresh: The minimum percentage of buckets expressing the gene in the test group. :param diff_ratio_expr_thresh: The minimum difference between the percentages of buckets expressing the gene in the test group and the control groups. :param log2fc_thresh: The minimum expression log2 fold change. :param method: Whether to test the test group against the other groups one by one or against all of them combined (default: 'multiple'). Valid values are "multiple" and "pairwise". :returns: A pandas DataFrame of the differential expression analysis result between the two groups. :raises ValueError: If the `method` is not one of "pairwise" or "multiple". .. py:function:: find_spatial_cluster_degs(adata: anndata.AnnData, test_group: str, x: Optional[List[int]] = None, y: Optional[List[int]] = None, group: Optional[str] = None, genes: Optional[List[str]] = None, k: int = 10, ratio_thresh: float = 0.5) -> pandas.DataFrame Function to search nearest neighbor groups in spatial space for the given test group. :param adata: an AnnData object. :param test_group: The group name from `group` for which neighbors have to be found. 
:param x: x-coordinates of all buckets. :param y: y-coordinates of all buckets. :param group: The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. :param genes: The list of genes that will be used to subset the data for identifying DEGs. If `None`, all genes will be used. :param k: Number of neighbors to use for kneighbors queries. :param ratio_thresh: For each non-test group, if more than 50% (default) of its buckets are in the neighboring set, this group is then selected as a neighboring group. :returns: A pandas DataFrame of the differential expression analysis result between the test group and neighbor groups. .. py:function:: top_n_degs(adata: anndata.AnnData, group: str, custom_score_func: Union[None, Callable] = None, sort_by: Union[str, List[str]] = 'log2fc', top_n_genes=10, only_deg_list: bool = True) Find top `n` marker genes for each group of buckets based on differential gene expression analysis results. :param adata: an AnnData object :param group: The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. This will be used for calculating group-specific genes. :param custom_score_func: A custom function to calculate the score based on the DEG analyses result. Note the columns in adata.uns["cluster_markers"]["deg_tables"] include: * "test_group", * "control_group", * "ratio_expr", * "diff_ratio_expr", * "person_score", * "cosine_score", * "jsd_adj_score", * "log2fc", * "combined_score", * "pval", * "qval". :param sort_by: `str` or `list` Column name or names to sort by. :param top_n_genes: `int` The number of top sorted markers. :param only_deg_list: `bool` Whether to only return the marker gene list for each cluster. .. py:class:: Lasso(adata) Lasso a region of interest (ROI) based on spatial cluster. .. 
rubric:: Examples L = st.tl.Lasso(adata) L.vi_plot(group='group', group_color='group_color') .. py:attribute:: __sub_inde :value: [] .. py:attribute:: sub_adata :value: None .. py:attribute:: adata .. py:method:: vi_plot(key='spatial', group: Optional[str] = None, group_color: Optional[str] = None) Plot spatial cluster result and lasso ROI. :param key: The column key in .obsm, defaults to 'spatial'. :param group: The column key/name that identifies the grouping information (for example, clusters that correspond to different cell types) of buckets. :param group_color: The key in .uns that corresponds to a dictionary mapping group names to group colors. :returns: subset of adata. :rtype: sub_adata .. py:function:: AffineTrans(x: numpy.ndarray, y: numpy.ndarray, centroid_x: float, centroid_y: float, theta: Tuple[None, float], R: Tuple[None, numpy.ndarray]) -> Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray] Translate the x/y coordinates of data points by translating the centroid to the origin. Then the data will be rotated by angle theta. :param x: x coordinates for the data points (bins). 1D np.array. :param y: y coordinates for the data points (bins). 1D np.array. :param centroid_x: x coordinates for the centroid of data points (bins). :param centroid_y: y coordinates for the centroid of data points (bins). :param theta: the angle of rotation. The unit is radians (so 90 degrees is `np.pi / 2`), and the value is defined in the clockwise direction. :param R: the rotation matrix. If `R` is provided, `theta` will be ignored. :returns: The translation matrix used in affine transformation. T_r: The rotation matrix used in affine transformation. trans_xy_coord: The matrix that stores the translated and rotated coordinates. :rtype: T_t .. 
py:function:: align_slices_pca(adata: anndata.AnnData, spatial_key: str = 'spatial', inplace: bool = False, result_key: Tuple[None, str] = None) -> None Coarsely align the slices based on the major axis, identified via PCA. :param adata: the input adata object that contains the spatial key in .obsm. :param spatial_key: the key in .obsm that points to the spatial information. :param inplace: whether the spatial coordinates will be inplace updated or a new key `spatial_. :param result_key: when inplace is False, this points to the key in .obsm that stores the corrected spatial coordinates. :returns: Nothing but updates the spatial coordinates either inplace or with the `result_key` key based on the major axis identified via PCA. .. py:function:: pca_align(X: numpy.ndarray) -> Tuple[numpy.ndarray, numpy.ndarray] Use PCA to rotate a coordinate matrix to reveal the largest variance on each dimension. This can be used to `correct`, for example, embryo slices to the right orientation. :param X: The input coordinate matrix. :returns: The rotated coordinate matrix that has the major variances on each dimension. R: The rotation matrix that was used to convert the input X matrix to output Y matrix. :rtype: Y .. py:function:: procrustes(X: numpy.ndarray, Y: numpy.ndarray, scaling: bool = True, reflection: str = 'best') -> Tuple[float, numpy.ndarray, dict] A port of MATLAB's `procrustes` function to NumPy. This function will need to be rewritten just with scipy.spatial.procrustes and scipy.linalg.orthogonal_procrustes later. Procrustes analysis determines a linear transformation (translation, reflection, orthogonal rotation and scaling) of the points in Y to best conform them to the points in matrix X, using the sum of squared errors as the goodness of fit criterion. d, Z, [tform] = procrustes(X, Y) :param X: matrix of target coordinates. It must have the same number of points (rows) as Y. 
:param Y: matrix of input coordinates. It must have the same number of points (rows) as X, but may have fewer dimensions (columns) than X. :param scaling: if False, the scaling component of the transformation is forced to 1. :param reflection: if 'best' (default), the transformation solution may or may not include a reflection component, depending on which fits the data best. Setting reflection to True or False forces a solution with reflection or no reflection respectively. :returns: the residual sum of squared errors, normalized according to a measure of the scale of X, ((X - X.mean(0))**2).sum() Z: the matrix of transformed Y-values tform: a dict specifying the rotation, translation and scaling that maps X --> Y :rtype: d .. py:function:: construct_nn_graph(adata: anndata.AnnData, spatial_key: str = 'spatial', dist_metric: str = 'euclidean', n_neighbors: int = 8, exclude_self: bool = True, make_symmetrical: bool = False, save_id: Union[None, str] = None) -> None Construct a bucket-to-bucket nearest neighbors graph. :param adata: An anndata object. :param spatial_key: Key in .obsm in which x- and y-coordinates are stored. :param dist_metric: Distance metric to use. Options: ‘braycurtis’, ‘canberra’, ‘chebyshev’, ‘cityblock’, ‘correlation’, ‘cosine’, ‘dice’, ‘euclidean’, ‘hamming’, ‘jaccard’, ‘jensenshannon’, ‘kulczynski1’, ‘mahalanobis’, ‘matching’, ‘minkowski’, ‘rogerstanimoto’, ‘russellrao’, ‘seuclidean’, ‘sokalmichener’, ‘sokalsneath’, ‘sqeuclidean’, ‘yule’. :param n_neighbors: Number of nearest neighbors to compute for each bucket. :param exclude_self: Set True to set elements along the diagonal to zero. :param make_symmetrical: Set True to make sure adjacency matrix is symmetrical (i.e. 
ensure that if A is a neighbor of B, B is also included among the neighbors of A) :param save_id: Optional string; if not None, will save distance matrix and neighbors matrix to './neighbors/{save_id}_distance.csv' and './neighbors/{save_id}_neighbors.csv', respectively. .. py:function:: neighbors(adata: anndata.AnnData, nbr_object: sklearn.neighbors.NearestNeighbors = None, basis: str = 'pca', spatial_key: str = 'spatial', n_neighbors_method: str = 'ball_tree', n_pca_components: int = 30, n_neighbors: int = 10) -> Tuple[sklearn.neighbors.NearestNeighbors, anndata.AnnData] Given an AnnData object, compute pairwise connectivity matrix in transcriptomic or physical space :param adata: an anndata object. :param nbr_object: An optional sklearn.neighbors.NearestNeighbors object. Can optionally create a nearest neighbor object with custom functionality. :param basis: str, default 'pca' The space that will be used for nearest neighbor search. Valid names include, for example, `pca`, `umap`, or `X` for gene expression neighbors, 'spatial' for neighbors in the physical space. :param spatial_key: Optional, can be used to specify .obsm entry in adata that contains spatial coordinates. Only used if basis is 'spatial'. :param n_neighbors_method: str, default 'ball_tree' Specifies algorithm to use in computing neighbors using sklearn's implementation. Options: "ball_tree" and "kd_tree". :param n_pca_components: Only used if 'basis' is 'pca'. Sets number of principal components to compute (if PCA has not already been computed for this dataset). :param n_neighbors: Number of neighbors for kneighbors queries. :returns: Object of class `sklearn.neighbors.NearestNeighbors` adata : Modified AnnData object :rtype: nbrs .. 
py:function:: glm_degs(adata: anndata.AnnData, X_data: Optional[numpy.ndarray] = None, genes: Optional[list] = None, layer: Optional[str] = None, key_added: str = 'glm_degs', fullModelFormulaStr: str = '~cr(time, df=3)', reducedModelFormulaStr: str = '~1', qval_threshold: Optional[float] = 0.05, llf_threshold: Optional[float] = -2000, ci_alpha: float = 0.05, use_zinb: bool = False, zero_infl_formula: Optional[str] = None, inplace: bool = True) -> Optional[anndata.AnnData] Differential gene expression tests using generalized linear regressions. Only a size factor normalized gene expression matrix can be used here; SCT/Pearson residuals transformed gene expression cannot be used. Tests each gene for differential expression as a function of integral time (the time estimated via the reconstructed vector field function) or pseudo-time using generalized additive models with a natural spline basis. This function can also use other covariates, as specified in the full (e.g. `~clusters`) and reduced model formulas, to identify differentially expressed genes across different categories, groups, etc. glm_degs relies on the statsmodels package and is adapted from the `differentialGeneTest` function in Monocle. Note that glm_degs supports performing DEG analysis for any layer or normalized data in your adata object. That is, you can use the total, new, unspliced or velocity layers, etc. for the differential expression analysis. :param adata: An Anndata object. The anndata object must contain a size factor normalized gene expression matrix. :param X_data: The user supplied data that will be used for differential expression analysis directly. :param genes: The list of genes that will be used to subset the data for differential expression analysis. If ``genes = None``, all genes will be used. :param layer: The layer that will be used to retrieve data for differential expression analysis. If ``layer = None``, ``.X`` is used.
:param key_added: The key that will be used for the glm_degs key in ``.uns``. :param fullModelFormulaStr: A formula string specifying the full model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature. :param reducedModelFormulaStr: A formula string specifying the reduced model in differential expression tests (i.e. likelihood ratio tests) for each gene/feature. :param qval_threshold: Only keep the glm test results whose qval is less than the ``qval_threshold``. :param llf_threshold: Only keep the glm test results whose log-likelihood is less than the ``llf_threshold``. :param ci_alpha: The significance level for the confidence interval. The default ``ci_alpha = .05`` returns a 95% confidence interval. :param use_zinb: Whether to use a zero-inflated negative binomial model. :param zero_infl_formula: A formula string specifying the zero-inflated part of the model. :param inplace: Whether to modify adata in place rather than working on a copy. :returns: An ``AnnData`` object is updated/copied with the ``key_added`` dictionary in the ``.uns`` attribute, storing the differential expression test results after the GLM test. .. py:class:: Label(labels_dense: Union[numpy.ndarray, list], str_map: Union[None, dict] = None, verbose: bool = False) Bases: :py:obj:`object` Given categorizations for a set of points, wrap into a Label class. labels_dense: Numerical labels. str_map: Optional mapping of numerical labels (keys) to strings (values). verbose: Whether to print running info of row_normalize. .. py:attribute:: dense .. py:attribute:: num_samples .. py:attribute:: bins .. py:attribute:: ids .. py:attribute:: counts .. py:attribute:: max_id .. py:attribute:: num_labels .. py:attribute:: verbose :value: False .. py:attribute:: onehot :value: None .. py:attribute:: normalized_onehot :value: None .. py:method:: __repr__() -> str .. py:method:: __str__() -> str .. py:method:: get_onehot() -> scipy.sparse.csr_matrix Return one-hot sparse array of labels.
If not already computed, generate the sparse array from the dense label array. .. py:method:: get_normalized_onehot() -> scipy.sparse.csr_matrix Return normalized one-hot sparse array of labels. .. py:method:: generate_normalized_onehot() -> scipy.sparse.csr_matrix Generate a normalized one-hot matrix where each row is normalized by the count of that label, e.g. a row [0 1 1 0 0] will be converted to [0 0.5 0.5 0 0]. .. py:method:: generate_onehot() -> scipy.sparse.csr_matrix Convert an array of labels to a num_labels x num_samples sparse one-hot matrix. Labels MUST be integers starting from 0, but can have gaps in between, e.g. [0,1,5,9]. .. py:function:: create_label_class(adata: anndata.AnnData, cat_key: Union[str, List[str]]) -> Union[Label, List[Label]] Wraps categorical labels into custom Label class for downstream processing. :param adata: An anndata object. :param cat_key: Keys in .obs containing categorical labels. This function and the Label class provide the most utility when this is used in conjunction with the results of multiple different runs of the Louvain algorithm. :returns: Either an object of Label class or a list where each element is an object of Label class. Will return a list if given multiple arguments to 'cat_key'. :rtype: label .. py:function:: GM_lag_model(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, drop_dummy: Tuple[None, str] = None, n_neighbors: int = 5, layer: Tuple[None, str] = None, copy: bool = False, n_jobs=30) Spatial lag model with spatial two stage least squares (S2SLS) with results and diagnostics; Anselin (1988). :math:`\log{P_i} = \alpha + \rho \log{P_{lag-i}} + \sum_k \beta_k X_{ki} + \epsilon_i` Reference: https://geographicdata.science/book/notebooks/11_regression.html http://darribas.org/gds_scipy16/ipynb_md/08_spatial_regression.html Args: adata: An adata object that has spatial information (via `spatial_key` key in adata.obsm).
group: The key to the cell group in the adata object. spatial_key: The spatial key of the spatial coordinate of each bucket. genes: The genes that will be used for S2SLS analyses; they must be included in the data. drop_dummy: The name of the dummy group. n_neighbors: The number of nearest neighbors of each bucket that will be used in calculating the spatial lag. layer: The key to the layer. If it is None, adata.X will be used by default. copy: Whether to copy the adata object. Returns: Depending on the `copy` argument, return a deep copied adata object (when `copy = True`) or an inplace updated adata object. The resulting adata will include the following new columns in `adata.var`: {*}_GM_lag_coeff: coefficient of GM test for each cell group (denoted by {*}) {*}_GM_lag_zstat: z-score of GM test for each cell group (denoted by {*}) {*}_GM_lag_pval: p-value of GM test for each cell group (denoted by {*}) Examples:
>>> import spateo as st
>>> st.tl.GM_lag_model(adata, group='simpleanno')
>>> coef_cols = adata.var.columns[adata.var.columns.str.endswith('_GM_lag_coeff')]
>>> adata.var.loc[["Hbb-bt", "Hbb-bh1", "Hbb-y", "Hbb-bs"], :].T
>>> for i in coef_cols[1:-1]:
...     print(i)
...     top_markers = adata.var.sort_values(i, ascending=False).index[:5]
...     st.pl.space(adata, basis='spatial', color=top_markers, ncols=5, pointsize=0.1, alpha=1)
...     st.pl.space(adata.copy(), basis='spatial', color=['simpleanno'],
...                 highlights=[i.split('_GM_lag_coeff')[0]], pointsize=0.1, alpha=1, show_legend='on data')

.. py:function:: lisa_geo_df(adata: anndata.AnnData, gene: str, spatial_key: str = 'spatial', n_neighbors: int = 8, layer: Tuple[None, str] = None) -> geopandas.GeoDataFrame Perform Local Indicators of Spatial Association (LISA) analyses on specific genes and prepare a geopandas dataframe for downstream LISA plots to reveal the quantile plots and the hotspot, coldspot, doughnut and diamond regions.
:param adata: An adata object that has spatial information (via `spatial_key` key in adata.obsm). :param gene: The gene that will be used for LISA analyses; it must be included in the data. :param spatial_key: The spatial key of the spatial coordinate of each bucket. :param n_neighbors: The number of nearest neighbors of each bucket that will be used in calculating the spatial lag. :param layer: the key to the layer. If it is None, adata.X will be used by default. :returns: a geopandas dataframe that includes the coordinate (`x`, `y` columns), expression (`exp` column) and lagged expression (`w_exp` column), z-score (`exp_zscore`, `w_exp_zscore`) and the LISA score (`Is` column). :rtype: df .. py:function:: local_moran_i(adata: anndata.AnnData, group: str, spatial_key: str = 'spatial', genes: Tuple[None, list] = None, layer: Tuple[None, str] = None, n_neighbors: int = 5, copy: bool = False, n_jobs: int = 30) Identify cell type specific genes with local Moran's I test. :param adata: An adata object that has spatial information (via `spatial_key` key in adata.obsm). :param group: The key to the cell group in the adata.obs. :param spatial_key: The spatial key of the spatial coordinate of each bucket. :param genes: The genes that will be used for LISA analyses; they must be included in the data. :param layer: the key to the layer. If it is None, adata.X will be used by default. :param n_neighbors: The number of nearest neighbors of each bucket that will be used in calculating the spatial lag. :param copy: Whether to copy the adata object. :returns: Depending on the `copy` argument, return a deep copied adata object (when `copy = True`) or an inplace updated adata object.
The resultant adata will include the following new columns in `adata.var`: {*}_num_val: The maximum number of categories (`{"hotspot", "coldspot", "doughnut", "diamond"}`) across all cell groups. {*}_frac_val: The maximum fraction of categories across all cell groups. {*}_spec_val: The maximum specificity of categories across all cell groups. {*}_num_group: The corresponding cell group with the largest number of each category (this can be affected by the cell group size). {*}_frac_group: The corresponding cell group with the highest fraction of each category. {*}_spec_group: The corresponding cell group with the highest specificity of each category. {*} can be one of `{"hotspot", "coldspot", "doughnut", "diamond"}`. Examples:
>>> import pandas as pd
>>> import spateo as st
>>> # `group` is the same key in adata.obs that was passed to local_moran_i
>>> markers_df = (
...     pd.DataFrame(adata.var)
...     .query("hotspot_frac_val > 0.05 & mean > 0.05")
...     .groupby(['hotspot_spec_group'])['hotspot_spec_val']
...     .nlargest(5)
... )
>>> markers = markers_df.index.get_level_values(1)
>>>
>>> for i in adata.obs[group].unique():
...     if i in markers_df.index.get_level_values(0):
...         print(markers_df[i])
...         st.pl.space(adata, color=group, highlights=[i], pointsize=0.1, alpha=1, figsize=(12, 8))
...         st.pl.space(adata, color=markers_df[i].index, pointsize=0.1, alpha=1, figsize=(12, 8))

.. py:class:: LiveWireSegmentation(image: Optional = None, smooth_image: bool = False, threshold_gradient_image: bool = False) Bases: :py:obj:`object` .. py:attribute:: _image :value: None .. py:attribute:: edges :value: None .. py:attribute:: G :value: None .. py:attribute:: smooth_image :value: False .. py:attribute:: threshold_gradient_image :value: False .. py:property:: image .. py:method:: _smooth_image() .. py:method:: _compute_gradient_image() .. py:method:: _threshold_gradient_image() .. py:method:: _compute_graph() .. py:method:: compute_shortest_path(startPt, endPt) ..
py:function:: compute_shortest_path(image: numpy.ndarray, startPt: Tuple[float, float], endPt: Tuple[float, float]) -> List Inline function for easier computation of the shortest path in an image. This function creates a new instance of the LiveWireSegmentation class every time it is called, which triggers a recomputation of the gradient image and the shortest path graph. If you need to compute the shortest path in one image more than once, use the class-form initialization instead. :param image: image on which the shortest path should be computed :param startPt: starting point for path computation :param endPt: target point for path computation :returns: shortest path as a list of tuples (x, y), including startPt and endPt :rtype: path .. py:function:: live_wire(image: numpy.ndarray, smooth_image: bool = False, threshold_gradient_image: bool = False, interactive: bool = True) -> List[numpy.ndarray] Use the LiveWire segmentation algorithm (also known as intelligent scissors) for image segmentation. The general idea of the algorithm is to use image information for segmentation and avoid crossing object boundaries. A gradient image highlights the boundaries, and Dijkstra’s shortest path algorithm computes a path using gradient differences as segment costs. Thus the line avoids strong gradients in the gradient image, which corresponds to following object boundaries in the original image. The image is displayed using the matplotlib front end. A click on the image starts LiveWire segmentation. The suggestion for the best segmentation appears as you move the mouse across the image. To submit a suggestion, click on the image a second time. To finish the segmentation, press the Escape key. :param image: image on which the shortest path should be computed. :param smooth_image: Whether to smooth the original image using a bilateral smoothing filter. :param threshold_gradient_image: Whether to use Otsu's method to generate a thresholded gradient image for shortest path computation.
:param interactive: Whether to generate the path interactively. :returns: A list of paths that are generated when running this algorithm. Paths can be used to segment a particular spatial domain of interest. .. py:function:: spatial_bv_local_moran(adata: anndata.AnnData, feature1_key: str, feature2_key: str, connectivity_key: str = 'spatial_connectivities', n_neighbors: int = 10, mode: str = 'moran', transformation: str = 'r', permutations: Optional[int] = 999, copy: bool = False) -> Optional[pandas.DataFrame] Calculate global bivariate Moran's I between a spatial variable and gene expression :param adata: AnnData object containing spatial data :param feature1_key: Key in `adata.obs` for the first variable or gene_name :param feature2_key: Key in `adata.obs` for the second variable or gene_name :param connectivity_key: Key in `adata.obsp` for spatial connectivity matrix (default: 'spatial_connectivities') :param mode: Spatial correlation mode (only 'moran' supported) :param transformation: Weight transformation method ('r' for row-standardization) :param permutations: Number of permutations for significance testing :param copy: Return a DataFrame instead of storing in AnnData :returns: * If ``copy = True``, returns a :class:`pandas.DataFrame` with the following keys -- I : float value of bivariate Moran's I q : array (if permutations>0) values indicate quadrant location 1 HH, 2 LH, 3 LL, 4 HL sim : array (if permutations>0) vector of I values for permuted samples p_sim : float (if permutations>0) p-value based on permutations (one-sided); null: spatial randomness; alternative: the observed I is extreme, i.e. either extremely high or extremely low z_sim : array (if permutations>0) standardized I based on permutations p_z_sim : float (if permutations>0) p-value based on standard normal approximation from permutations * Otherwise, modifies the ``adata`` with the following key -- - :attr:`anndata.AnnData.uns` ``['{feature1_key}_{feature2_key}_bv_local_moranI']`` - the
above-mentioned dataframe. .. py:function:: spatial_bv_moran_obs_genes(adata: anndata.AnnData, obs_key: str, connectivity_key: str = 'spatial_connectivities', genes: Union[str, int, Sequence[str], Sequence[int], None] = None, n_neighbors: int = 10, mode: str = 'moran', transformation: str = 'r', permutations: Optional[int] = 999, copy: bool = False) -> Optional[pandas.DataFrame] Calculate global bivariate Moran's I between a spatial variable and gene expression :param adata: AnnData object containing spatial data :param obs_key: Key in `adata.obs` for the variable :param connectivity_key: Key in `adata.obsp` for spatial connectivity matrix (default: 'spatial_connectivities') :param genes: Genes to calculate (names or indices). If None, use all genes. :param mode: Spatial correlation mode (only 'moran' supported) :param transformation: Weight transformation method ('r' for row-standardization) :param permutations: Number of permutations for significance testing :param copy: Return a DataFrame instead of storing in AnnData :returns: * If ``copy = True``, returns a :class:`pandas.DataFrame` with the following keys -- I : float value of bivariate Moran's I sim : array (if permutations>0) vector of I values for permuted samples p_sim : float (if permutations>0) p-value based on permutations (one-sided); null: spatial randomness; alternative: the observed I is extreme, i.e. either extremely high or extremely low z_sim : array (if permutations>0) standardized I based on permutations p_z_sim : float (if permutations>0) p-value based on standard normal approximation from permutations * Otherwise, modifies the ``adata`` with the following key -- - :attr:`anndata.AnnData.uns` ``['{obs_key}_gene_bv_moranI']`` - the above-mentioned dataframe. .. py:function:: cellbin_morani(adata_cellbin: anndata.AnnData, binsize: int, cluster_key: str = 'Celltype') -> pandas.DataFrame Calculate Moran's I score for each celltype (in segmented cell adata).
Because the presence of a cell is a boolean value, this function first summarizes the number of cells of each celltype using a given binsize, creating a spatial 2D matrix of cell counts. It then calculates Moran's I on that matrix to obtain a spatial score for each celltype. :param adata_cellbin: An AnnData object for segmented cells. :type adata_cellbin: :class:`~anndata.AnnData` :param binsize: The binsize used to summarize cell counts for each celltype. :type binsize: int :param cluster_key: The key in adata.obs including celltype labels. :type cluster_key: `str` (default="Celltype") :rtype: A pandas DataFrame containing the Moran's I score for celltypes. .. py:function:: moran_i(adata: anndata.AnnData, genes: Optional[List[str]] = None, layer: Optional[str] = None, spatial_key: str = 'spatial', model: Literal['2d', '3d'] = '2d', x: Optional[List[int]] = None, y: Optional[List[int]] = None, z: Optional[List[int]] = None, k: int = 5, weighted: Optional[List[str]] = None, permutations: int = 199, n_jobs: int = 1) -> pandas.DataFrame Identify genes with strong spatial autocorrelation with Moran's I test. This can be used to identify genes that are potentially related to clusters. :param adata: an AnnData object :type adata: :class:`~anndata.AnnData` :param genes: The list of genes that will be used to subset the data for the Moran's I test. If `None`, all genes will be used. :type genes: `list` or None (default: `None`) :param layer: The layer that will be used to retrieve data for the Moran's I test. If `None`, .X is used. :type layer: `str` or None (default: `None`) :param spatial_key: The key in ``.obsm`` that corresponds to the spatial coordinate of each cell. :type spatial_key: `str` (default: `'spatial'`) :param x: x-coordinates of all buckets. :type x: 'list' or None(default: `None`) :param y: y-coordinates of all buckets. :type y: 'list' or None(default: `None`) :param z: z-coordinates of all buckets.
:type z: 'list' or None(default: `None`) :param k: Number of neighbors to use by default for kneighbors queries. :type k: 'int' (default=5) :param weighted: Spatial weights, default is None, 'kernel' is based on kernel functions. :type weighted: 'str' or None (default: `None`) :param permutations: Number of random permutations for calculation of pseudo-p_values. :type permutations: `int` (default=199) :param n_jobs: The maximum number of concurrently running jobs. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all. :type n_jobs: `int` (default=1) :rtype: A pandas DataFrame of the Moran's I test results.
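As a rough illustration of the statistic that ``moran_i`` reports, the sketch below computes global Moran's I with NumPy on a toy grid, using row-standardized k-nearest-neighbor weights. It is a minimal sketch under stated assumptions, not spateo's implementation: the helper names ``knn_weights`` and ``global_moran_i`` are hypothetical, the weight matrix is dense, and no permutation test is performed.

```python
import numpy as np


def knn_weights(coords: np.ndarray, k: int) -> np.ndarray:
    """Dense, row-standardized k-nearest-neighbor weights (hypothetical helper)."""
    n = coords.shape[0]
    # Pairwise Euclidean distances between all points.
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is never its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest neighbors
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), idx.ravel()] = 1.0 / k  # each row sums to 1
    return w


def global_moran_i(values: np.ndarray, w: np.ndarray) -> float:
    """Global Moran's I: (n / sum(w)) * (z' W z) / (z' z), with z = x - mean(x)."""
    z = values - values.mean()
    return float((len(z) / w.sum()) * (z @ w @ z) / (z @ z))


# Toy data: a 6 x 6 grid of buckets with two synthetic "expression" patterns.
coords = np.array([(x, y) for x in range(6) for y in range(6)], dtype=float)
gradient = coords[:, 0]                        # smooth left-to-right gradient
checker = (coords[:, 0] + coords[:, 1]) % 2    # alternating checkerboard

w = knn_weights(coords, k=4)
i_gradient = global_moran_i(gradient, w)
i_checker = global_moran_i(checker, w)
print(i_gradient, i_checker)
```

The smooth gradient yields a clearly positive value (neighboring buckets have similar values), while the checkerboard yields a negative one (neighboring buckets tend to differ). That is the contrast a spatial autocorrelation test is meant to detect.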