spateo.tools.ST_regression.regression_utils#

Auxiliary functions that support the interpretation functions for the spatial and spatially-lagged regression models.

Module Contents#

Functions#

softplus(z)

Numerically stable version of log(1 + exp(z)).

L1_penalty(→ float)

Implementation of the L1 penalty that penalizes based on absolute value of coefficient magnitude.

L2_penalty(→ float)

Implementation of the L2 penalty that penalizes based on the square of coefficient magnitudes.

L1_L2_penalty(→ float)

Combination of the L1 and L2 penalties.

get_fisher_inverse(→ numpy.ndarray)

Computes the Fisher matrix that measures the amount of information each feature in x provides about y; that is, whether the log-likelihood is sensitive to changes in the parameter x.

wald_test(→ numpy.ndarray)

Perform single-coefficient Wald test, informing whether a given coefficient deviates significantly from the supplied reference value (theta0).

multitesting_correction(→ numpy.ndarray)

In the case of testing multiple hypotheses from the same experiment, perform multiple test correction to adjust the p-values.

_get_p_value(→ numpy.ndarray)

Computes p-values for differential expression for each feature

compute_wald_test(→ Tuple[numpy.ndarray, ...)

Computes Wald-test significance calls, p-values and q-values from the fitted parameters and the inverse Fisher information matrix.

mae(→ float)

Mean absolute error; in this context, actually the log1p mean absolute error.

mse(→ float)

Mean squared error; in this context, actually the log1p mean squared error.

plot_prior_vs_data(reconst, adata, kind, target_name, ...)

Plots distribution of observed vs. predicted counts in the form of a comparative density barplot.

spateo.tools.ST_regression.regression_utils.softplus(z)[source]#

Numerically stable version of log(1 + exp(z)).
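To illustrate why a stable formulation matters, here is a minimal sketch, not Spateo's actual implementation. The helper name softplus_sketch is hypothetical; it relies on NumPy's logaddexp, which evaluates log(exp(a) + exp(b)) without overflowing for large inputs.

```python
import numpy as np


def softplus_sketch(z):
    """Stable log(1 + exp(z)). Hypothetical helper, not Spateo's code."""
    z = np.asarray(z, dtype=float)
    # log(1 + exp(z)) == log(exp(0) + exp(z)); logaddexp avoids overflow in exp(z)
    return np.logaddexp(0.0, z)


z = np.array([-1000.0, 0.0, 1000.0])
stable = softplus_sketch(z)
print(stable)
```

A naive np.log(1 + np.exp(z)) would overflow to inf for the last entry, whereas the stable form returns a value close to z itself.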

spateo.tools.ST_regression.regression_utils.L1_penalty(beta: numpy.ndarray) float[source]#

Implementation of the L1 penalty that penalizes based on absolute value of coefficient magnitude.

Parameters
beta

Array of shape [n_features,]; learned model coefficients

Returns

L1penalty: value for the regularization parameter (typically stylized by lambda)

Return type

float
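As a rough sketch of the quantity described above, the L1 penalty can be taken as the sum of absolute coefficient values. The helper name l1_penalty_sketch is hypothetical and the code is an illustration under that assumption, not the library's source.

```python
import numpy as np


def l1_penalty_sketch(beta):
    """Sum of absolute coefficient values (hypothetical helper)."""
    beta = np.asarray(beta, dtype=float)
    return float(np.sum(np.abs(beta)))


beta = np.array([0.5, -2.0, 0.0, 1.5])
l1 = l1_penalty_sketch(beta)
print(l1)
```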

spateo.tools.ST_regression.regression_utils.L2_penalty(beta: numpy.ndarray, Tau: Union[None, numpy.ndarray] = None) float[source]#

Implementation of the L2 penalty that penalizes based on the square of coefficient magnitudes.

Parameters
beta

Array of shape [n_features,]; learned model coefficients

Tau

Optional array of shape [n_features, n_features]; the Tikhonov matrix for ridge regression. If not provided, Tau will default to the identity matrix.
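The role of the Tikhonov matrix can be sketched as follows, assuming the penalty is the squared Euclidean norm of Tau applied to beta, which is the usual Tikhonov form. The helper name l2_penalty_sketch is hypothetical, and the exact scaling used by the library may differ.

```python
import numpy as np


def l2_penalty_sketch(beta, Tau=None):
    """Squared norm of Tau @ beta; identity Tau gives plain ridge (hypothetical helper)."""
    beta = np.asarray(beta, dtype=float)
    if Tau is None:
        # With the identity matrix every coefficient is penalized equally
        Tau = np.eye(beta.shape[0])
    return float(np.sum((Tau @ beta) ** 2))


beta = np.array([1.0, -2.0, 3.0])
plain = l2_penalty_sketch(beta)
# A diagonal Tau reweights coefficients: here the second one is left unpenalized
weighted = l2_penalty_sketch(beta, Tau=np.diag([1.0, 0.0, 2.0]))
print(plain, weighted)
```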

spateo.tools.ST_regression.regression_utils.L1_L2_penalty(alpha: float, beta: numpy.ndarray, Tau: Union[None, numpy.ndarray] = None) float[source]#

Combination of the L1 and L2 penalties.

Parameters
alpha

The weighting between L1 penalty (alpha=1.) and L2 penalty (alpha=0.) term of the loss function.

beta

Array of shape [n_features,]; learned model coefficients

Tau

Optional array of shape [n_features, n_features]; the Tikhonov matrix for ridge regression. If not provided, Tau will default to the identity matrix.

Returns

P: value for the regularization parameter

Return type

float
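The weighting by alpha can be sketched as an elastic-net style combination. The 0.5 factor on the L2 term follows a common glmnet-style convention and is an assumption here, as are the helper name elastic_net_penalty_sketch and the choice of squared norm for the L2 part.

```python
import numpy as np


def elastic_net_penalty_sketch(alpha, beta, Tau=None):
    """alpha * L1 + 0.5 * (1 - alpha) * L2 (hypothetical helper, assumed convention)."""
    beta = np.asarray(beta, dtype=float)
    if Tau is None:
        Tau = np.eye(beta.shape[0])
    l1 = np.sum(np.abs(beta))
    l2 = np.sum((Tau @ beta) ** 2)
    return float(alpha * l1 + 0.5 * (1.0 - alpha) * l2)


beta = np.array([1.0, -2.0, 3.0])
pure_l1 = elastic_net_penalty_sketch(1.0, beta)   # only the L1 term contributes
pure_l2 = elastic_net_penalty_sketch(0.0, beta)   # only the L2 term contributes
mixed = elastic_net_penalty_sketch(0.5, beta)
print(pure_l1, pure_l2, mixed)
```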

spateo.tools.ST_regression.regression_utils.get_fisher_inverse(x: numpy.ndarray, y: numpy.ndarray) numpy.ndarray[source]#

Computes the Fisher matrix that measures the amount of information each feature in x provides about y; that is, whether the log-likelihood is sensitive to changes in the parameter x.

Parameters
x

Independent variable array

y

Dependent variable array

Returns

inverse_fisher: inverse of the Fisher information matrix

Return type

np.ndarray
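For intuition, the following sketch computes an inverse Fisher information matrix under a Gaussian linear model, where the Fisher information is X^T X divided by the noise variance. Approximating the noise variance by the sample variance of y, using a single 1-D y, and the helper name fisher_inverse_sketch are all assumptions made for illustration; the library's function may handle shapes and variance estimation differently.

```python
import numpy as np


def fisher_inverse_sketch(x, y):
    """Inverse of X^T X / var(y) for a Gaussian linear model (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sigma2 = np.var(y)             # assumed stand-in for the noise variance
    fisher = x.T @ x / sigma2      # Fisher information, shape [n_features, n_features]
    # The pseudo-inverse tolerates collinear features that would make inv() fail
    return np.linalg.pinv(fisher)


x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([0.0, 2.0, 4.0])
inv_fisher = fisher_inverse_sketch(x, y)
print(inv_fisher)
```

The diagonal of the result gives approximate variances of the coefficient estimates, which is what a Wald test consumes.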

spateo.tools.ST_regression.regression_utils.wald_test(theta_mle: numpy.ndarray, theta_sd: numpy.ndarray, theta0: Union[float, numpy.ndarray] = 0) numpy.ndarray[source]#

Perform single-coefficient Wald test, informing whether a given coefficient deviates significantly from the supplied reference value (theta0), based on the standard deviation of the posterior of the parameter estimate.

Parameters
theta_mle

Maximum likelihood estimation of given parameter by feature

theta_sd

Standard deviation of the maximum likelihood estimation

theta0

Value(s) to test theta_mle against. Must be either a single number or an array w/ equal number of entries to theta_mle.

Returns

pvals: p-values from the Wald test

Return type

np.ndarray
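A single-coefficient Wald test can be sketched as below, assuming a two-sided test against a standard normal reference distribution. The helper name wald_test_sketch is hypothetical; the two-sided tail probability is computed with math.erfc to avoid an extra dependency.

```python
import math

import numpy as np


def wald_test_sketch(theta_mle, theta_sd, theta0=0.0):
    """Two-sided Wald p-values under a standard normal reference (hypothetical helper)."""
    theta_mle = np.asarray(theta_mle, dtype=float)
    theta_sd = np.asarray(theta_sd, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    # Wald statistic: distance from the reference value in units of standard deviation
    z = np.abs(theta_mle - theta0) / theta_sd
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    flat = [math.erfc(v / math.sqrt(2.0)) for v in np.ravel(z)]
    return np.array(flat).reshape(z.shape)


pvals = wald_test_sketch(theta_mle=[0.0, 1.96, 5.0], theta_sd=[1.0, 1.0, 1.0])
print(pvals)
```

An estimate equal to theta0 gives a p-value of 1, while an estimate 1.96 standard deviations away gives a p-value close to 0.05.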

spateo.tools.ST_regression.regression_utils.multitesting_correction(pvals: numpy.ndarray, method: str = 'fdr_bh', alpha: float = 0.05) numpy.ndarray[source]#

In the case of testing multiple hypotheses from the same experiment, perform multiple test correction to adjust the p-values, yielding q-values.

Parameters
pvals

Uncorrected p-values; must be given as a one-dimensional array

method

Method to use for correction. Available methods can be found in the documentation for statsmodels.stats.multitest.multipletests(), and are also listed below (in correct case) for convenience:
  • Named methods:
    • bonferroni
    • sidak
    • holm-sidak
    • holm
    • simes-hochberg
    • hommel
  • Abbreviated methods:
    • fdr_bh: Benjamini-Hochberg correction
    • fdr_by: Benjamini-Yekutieli correction
    • fdr_tsbh: Two-stage Benjamini-Hochberg
    • fdr_tsbky: Two-stage Benjamini-Krieger-Yekutieli method

alpha

Family-wise error rate (FWER)

Returns

qval: p-values post-correction
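To show what the default fdr_bh method does to a vector of p-values, here is a hand-written Benjamini-Hochberg sketch in NumPy. It is an illustration only: the documented function delegates to statsmodels, and the helper name benjamini_hochberg_sketch is hypothetical.

```python
import numpy as np


def benjamini_hochberg_sketch(pvals):
    """Benjamini-Hochberg adjusted p-values for a 1-D array (hypothetical helper)."""
    pvals = np.asarray(pvals, dtype=float)
    n = pvals.size
    order = np.argsort(pvals)
    # Scale each sorted p-value by n / rank
    scaled = pvals[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    qvals = np.empty(n)
    qvals[order] = np.clip(scaled, 0.0, 1.0)  # restore the input order
    return qvals


qvals = benjamini_hochberg_sketch([0.01, 0.04, 0.03, 0.005])
print(qvals)
```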

spateo.tools.ST_regression.regression_utils._get_p_value(variables: numpy.array, fisher_inv: numpy.array, coef_loc_totest: int) numpy.ndarray[source]#

Computes p-values for differential expression for each feature

Parameters
variables

Array where each column corresponds to a feature

fisher_inv

Inverse Fisher information matrix

coef_loc_totest

Numerical column of the array corresponding to the coefficient to test

Returns

pvalues: Array of identical shape to variables, where each element is a p-value for that instance of that feature

Return type

np.ndarray
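The link between the inverse Fisher information matrix and the test is the standard error: the square root of the diagonal entry for a coefficient approximates the standard deviation of its estimate. The sketch below assumes a single [n_params, n_params] matrix and uses the hypothetical helper name standard_errors_sketch; the shapes handled by the library function may differ.

```python
import numpy as np


def standard_errors_sketch(fisher_inv):
    """Standard errors from the diagonal of an inverse Fisher matrix (hypothetical helper)."""
    fisher_inv = np.asarray(fisher_inv, dtype=float)
    # Diagonal entries approximate the variance of each coefficient estimate
    return np.sqrt(np.diag(fisher_inv))


fisher_inv = np.array([[0.04, 0.01], [0.01, 0.25]])
se = standard_errors_sketch(fisher_inv)

coef_loc_totest = 1                 # index of the coefficient under test
estimate = 1.0                      # made-up coefficient value for illustration
z = estimate / se[coef_loc_totest]  # Wald statistic against a reference value of 0
print(se, z)
```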

spateo.tools.ST_regression.regression_utils.compute_wald_test(params: numpy.ndarray, fisher_inv: numpy.ndarray, significance_threshold: float = 0.01) Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray][source]#
Parameters
params

Array of shape [n_features, n_params]

fisher_inv

Inverse Fisher information matrix

significance_threshold

Upper threshold to be considered significant

Returns

significance: Array of identical shape to variables, where each element is True if it meets the threshold for significance and False otherwise

pvalues: Array of identical shape to variables, where each element is a p-value for that instance of that feature

qvalues: Array of identical shape to variables, where each element is a q-value for that instance of that feature

Return type

Tuple[numpy.ndarray, numpy.ndarray, numpy.ndarray]

spateo.tools.ST_regression.regression_utils.mae(y_true, y_pred) float[source]#

Mean absolute error; in this context, actually the log1p mean absolute error.

Parameters
y_true

Observed values for the dependent variable

y_pred

Regression model output

Returns

mae: Mean absolute error value across all samples

Return type

float

spateo.tools.ST_regression.regression_utils.mse(y_true, y_pred) float[source]#

Mean squared error; in this context, actually the log1p mean squared error.

Parameters
y_true

Observed values for the dependent variable

y_pred

Regression model output

Returns

mse: Mean squared error value across all samples

Return type

float
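The "log1p" qualifier on both error metrics can be illustrated with a short sketch, assuming the inputs are counts that were log1p-transformed before the errors are computed. The helper names mae_sketch and mse_sketch and the example counts are hypothetical.

```python
import numpy as np


def mae_sketch(y_true, y_pred):
    """Mean absolute difference (hypothetical helper)."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def mse_sketch(y_true, y_pred):
    """Mean squared difference (hypothetical helper)."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


# Made-up counts; log1p compresses large values so that they do not dominate the error
observed = np.log1p(np.array([0.0, 1.0, 3.0]))
predicted = np.log1p(np.array([0.0, 3.0, 1.0]))

mae_value = mae_sketch(observed, predicted)
mse_value = mse_sketch(observed, predicted)
print(mae_value, mse_value)
```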

spateo.tools.ST_regression.regression_utils.plot_prior_vs_data(reconst: pandas.DataFrame, adata: anndata.AnnData, kind: str = 'barplot', target_name: Union[None, str] = None, title: Union[None, str] = None, figsize: Union[None, Tuple[float, float]] = None, save_show_or_return: Literal[save, show, return, both, all] = 'save', save_kwargs: dict = {})[source]#

Plots distribution of observed vs. predicted counts in the form of a comparative density barplot.

Parameters
reconst

DataFrame containing values for reconstruction/prediction of targets of a regression model

adata

AnnData object containing observed counts

kind

Kind of plot to generate. Options: “barplot”, “scatterplot”. Case sensitive, defaults to “barplot”.

target_name

Optional, can be:
  • Column name in DataFrame/AnnData object: name of gene to subset to
  • “sum”: computes sum over all features present in ‘reconst’ to compare to the corresponding subset of ‘adata’.
  • “mean”: computes mean over all features present in ‘reconst’ to compare to the corresponding subset of ‘adata’.

If not given, will subset AnnData to features in ‘reconst’ and flatten both arrays to compare all values.

If not given, will compute the sum over all features present in ‘reconst’ and compare to the corresponding subset of ‘adata’.

save_show_or_return

Whether to save, show or return the figure. If “both”, it will save and plot the figure at the same time. If “all”, the figure will be saved and displayed, and the associated axis and other objects will be returned.

save_kwargs

A dictionary that will be passed to the save_fig function. By default it is an empty dictionary and the save_fig function will use {“path”: None, “prefix”: ‘scatter’, “dpi”: None, “ext”: ‘pdf’, “transparent”: True, “close”: True, “verbose”: True} as its parameters. Otherwise you can provide a dictionary that properly modifies those keys according to your needs.