Master mars function

The primary function for data input and model estimation. It takes the data inputs and selects the estimation routine and covariance structure based on the structure of the data. It handles univariate, multivariate, longitudinal, and multilevel meta-analytic models.

Usage

mars(
  data,
  studyID,
  effectID,
  sample_size,
  effectsize_type = NULL,
  formula = NULL,
  scale_formula = NULL,
  variable_names = NULL,
  effectsize_name = NULL,
  estimation_method = "REML",
  variance = NULL,
  varcov_type,
  weights = NULL,
  structure = "UN",
  intercept = FALSE,
  missing = "remove",
  optim_method = "L-BFGS-B",
  robustID = NULL,
  multivariate_covs = NULL,
  lasso = FALSE,
  lasso_args = list(lambda_grid = 10^seq(1, -3, length.out = 5), K = 5, all_lasso_metrics
    = FALSE, lambda_tolerance = 0),
  tau2 = NULL,
  tol = 1e+07,
  ...
)

Arguments

data: Data used for analysis
studyID: Character string representing the study ID
effectID: Character string representing the effect size ID
sample_size: Character string representing the sample size of the studies.
effectsize_type: Type of effect size being analyzed
formula: The formula used for specifying the fixed and random structure. Used for univariate and multilevel structures.
scale_formula: Optional one-sided formula for modeling the log-heterogeneity in location-scale models. Currently supported for structure = "univariate" and structure = "multilevel". For multilevel models, this can be either a single one-sided formula applied to every random-effect component or a list of one-sided formulas aligned to the random-effect components. Multilevel scale predictors must be invariant within the top-level studyID cluster.
variable_names: Vector of character strings naming the correlations to analyze. Each name joins the two correlated attributes with an underscore, for example "Cognitive_Performance".
effectsize_name: Character string representing the name of the effect size column in the data.
estimation_method: Type of estimation used, either "REML" or "MLE". "REML" is the default.
variance: Character string representing the name of the variance of the effect size in the data.
varcov_type: Type of variance covariance matrix computed. Default is 'cor_weighted' for correlations or 'smd_outcome' for standardized mean differences.
weights: User specified matrix of weights for analysis.
structure: Between studies covariance structure, default is "UN" or unstructured. See details for more specifics.
intercept: Whether a model intercept should be specified, default is FALSE meaning no intercept. See details for more information.
missing: Missing-data handling mode. Use "remove" to drop incomplete rows, "keep" to keep rows as-is, or "em" to impute missing moderator values via EM before dropping remaining incomplete rows.
optim_method: Optimization method that is passed to the optim function. Default is 'L-BFGS-B'.
robustID: A character vector specifying the cluster group to use for computing the robust standard errors.
multivariate_covs: A one-sided formula to specify the covariates used in a multivariate analysis.
lasso: TRUE/FALSE indicator that specifies whether LASSO is run. If TRUE and the number of predictors is less than the number of effect sizes, both LASSO and non-LASSO results are returned. If TRUE and the number of predictors is equal to or greater than the number of effect sizes, only the LASSO results are returned. Numerical predictors are automatically standardized for optimization efficiency when doing LASSO.
lasso_args: A list of LASSO specific arguments. lambda_tolerance controls tie-breaking across lambda values by choosing the smaller lambda when CV metrics are within this tolerance.
tau2: Optional user-supplied between-study variance or covariance. If NULL, heterogeneity is estimated. For univariate models, supply one non-negative value. For multilevel models, supply one non-negative value per random-effect component. For multivariate DIAG2 models, supply one common value; for DIAG1, one value per outcome; and for UN, a positive semidefinite covariance matrix.
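tol: Tolerance for the optimization, default is 1e7. The value is forwarded to the L-BFGS-B factr control, so smaller values request tighter convergence.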
...: Not currently used.
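
As a small illustration of the lasso_args defaults, the sketch below evaluates the default lambda_grid expression from the Usage section and shows one possible reading of the lambda_tolerance tie-breaking rule. The cross-validation errors and the helper pick_lambda() are hypothetical and are not part of the mars package; the actual selection logic inside mars() may differ.

```r
# Default grid from the Usage section: five values from 10^1 down to 10^-3.
lambda_grid <- 10^seq(1, -3, length.out = 5)
lambda_grid

# Hypothetical helper (an assumption, not the package's code): among lambdas
# whose CV error is within `tolerance` of the best error, take the smallest.
pick_lambda <- function(lambda, cv_error, tolerance = 0) {
  candidates <- cv_error <= min(cv_error) + tolerance
  min(lambda[candidates])
}

cv_error <- c(0.90, 0.45, 0.40, 0.41, 0.55)  # made-up CV errors, one per lambda

strict  <- pick_lambda(lambda_grid, cv_error, tolerance = 0)     # best error only
relaxed <- pick_lambda(lambda_grid, cv_error, tolerance = 0.02)  # near-ties allowed
```

With a tolerance of 0 only the lambda with the lowest error is eligible; a positive tolerance lets a smaller lambda with a nearly identical error be chosen instead.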

Value

Returns a list of class mars containing elements from the estimation.

Details

For LASSO, it is recommended to use MLE instead of REML for estimation.

The core estimator treats the observed effect-size vector in study $i$ as approximately multivariate normal, $$ y_i \sim N(X_i \beta, V_i), $$ where $X_i$ is the fixed-effect design matrix and $V_i$ is the model-implied marginal covariance matrix. The matrix $V_i$ is the sum of the within-study sampling covariance matrix $S_i$ and a between-study heterogeneity component. For univariate models this component is a scalar $\tau^2$; for multivariate models it is the selected covariance structure across outcomes; and for multilevel models it is assembled from random-effect design matrices as $$ V_i = S_i + \sum_j \tau_j^2 Z_{ij} Z_{ij}'. $$ Location-scale models replace constant heterogeneity components with log-linear variance models, so study-specific components are computed as $\tau_i^2 = \exp(W_i \gamma)$.
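
To make the covariance assembly concrete, the following sketch builds $V_i$ for one study with three effect sizes and two random-effect components, then computes a location-scale variance from $\exp(W_i \gamma)$. All object names and numbers are illustrative assumptions; they do not come from the mars source code.

```r
# Toy inputs for a single study i (hypothetical values).
S_i      <- diag(c(0.04, 0.05, 0.03))       # within-study sampling covariance
Z_study  <- matrix(1, nrow = 3, ncol = 1)   # design for a study-level random effect
Z_effect <- diag(3)                         # design for an effect-level random effect
tau2     <- c(study = 0.10, effect = 0.02)  # one variance per random-effect component

# V_i = S_i + sum_j tau_j^2 * Z_ij Z_ij'
V_i <- S_i +
  tau2[["study"]]  * tcrossprod(Z_study) +
  tau2[["effect"]] * tcrossprod(Z_effect)
V_i

# Location-scale variant: tau_i^2 = exp(W_i gamma)
W_i    <- matrix(c(1, 0.5), nrow = 1)       # intercept plus one scale predictor
gamma  <- c(-2, 0.4)                        # log-linear scale coefficients
tau2_i <- exp(drop(W_i %*% gamma))
tau2_i
```

The study-level component adds 0.10 to every entry of $V_i$, which induces the within-study dependence, while the effect-level component adds 0.02 to the diagonal only.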

For maximum likelihood (estimation_method = "MLE"), mars() minimizes the deviance-scale objective $$ \sum_i \left\{\log |V_i| + (y_i - X_i \beta)' V_i^{-1} (y_i - X_i \beta)\right\}, $$ omitting constants that do not affect the optimizer. For restricted maximum likelihood (estimation_method = "REML"), the objective adds the usual fixed-effect adjustment $$ \log |X' V^{-1} X|, $$ where $X' V^{-1} X = \sum_i X_i' V_i^{-1} X_i$. This term adjusts the likelihood for the estimation of the fixed effects. REML is the default estimator.
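
The two objectives can be written out directly from the formulas above. The function deviance_objective() below is a hypothetical helper written for illustration, using plain solve() and determinant() for readability; it is a sketch of the formulas, not the implementation used by mars().

```r
# Hypothetical helper: deviance-scale objective over a list of studies.
# y_list, X_list, V_list hold y_i, X_i, and V_i for each study i.
deviance_objective <- function(beta, y_list, X_list, V_list, reml = FALSE) {
  p <- length(beta)
  dev <- 0
  XtVinvX <- matrix(0, p, p)
  for (i in seq_along(y_list)) {
    V <- V_list[[i]]
    X <- X_list[[i]]
    r <- y_list[[i]] - X %*% beta                 # residual y_i - X_i beta
    logdet <- as.numeric(determinant(V, logarithm = TRUE)$modulus)
    dev <- dev + logdet + drop(crossprod(r, solve(V, r)))
    XtVinvX <- XtVinvX + crossprod(X, solve(V, X))  # accumulates X_i' V_i^{-1} X_i
  }
  if (reml) {
    # REML adds log|X' V^{-1} X|
    dev <- dev + as.numeric(determinant(XtVinvX, logarithm = TRUE)$modulus)
  }
  dev
}

# Two single-effect studies with an intercept-only model (made-up numbers).
y_list <- list(0.3, 0.5)
X_list <- list(matrix(1), matrix(1))
V_list <- list(matrix(0.25), matrix(0.5))

ml_value   <- deviance_objective(0.4, y_list, X_list, V_list)
reml_value <- deviance_objective(0.4, y_list, X_list, V_list, reml = TRUE)
```

In this toy case the REML value exceeds the ML value by $\log(1/0.25 + 1/0.5) = \log 6$, which is the fixed-effect adjustment term.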

Optimization is performed with optim. The default optim_method = "L-BFGS-B" is used because variance components and covariance-structure parameters require box constraints. Variance parameters are constrained to be positive with a lower bound of 1e-6; correlation parameters in structured covariance models are bounded to their allowable ranges when the structure supplies such bounds. The objective is evaluated using Cholesky decompositions, so $\log |V_i|$ is computed from the Cholesky diagonal and quadratic forms are computed by triangular solves rather than by explicitly inverting $V_i$. The main likelihood paths pass analytic gradients to optim(), and the fitted Hessian is retained for downstream uncertainty calculations when available. The tol argument is forwarded to the L-BFGS-B factr control; smaller values request tighter convergence.
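
The numerical steps described above can be sketched in base R. The first part computes $\log |V_i|$ and the quadratic form from a Cholesky factor; the second part runs a bounded L-BFGS-B fit of a toy univariate random-effects model. The data and the function ml_objective() are assumptions made for illustration, and unlike the main likelihood paths described above, this sketch lets optim() use numerical gradients.

```r
# Illustrative values only; not taken from the mars source code.
V <- matrix(c(0.16, 0.10,
              0.10, 0.17), nrow = 2, byrow = TRUE)
r <- c(0.2, -0.1)                        # residual y_i - X_i beta

R <- chol(V)                             # upper triangular factor, V = R'R
logdet <- 2 * sum(log(diag(R)))          # log|V| from the Cholesky diagonal
z <- backsolve(R, r, transpose = TRUE)   # triangular solve of R'z = r
quad <- sum(z^2)                         # r' V^{-1} r without forming V^{-1}

# Toy univariate random-effects fit: profile out beta, optimize tau^2 only.
y  <- c(0.10, 0.35, 0.52, 0.20, 0.41)        # hypothetical effect sizes
s2 <- c(0.020, 0.030, 0.015, 0.040, 0.025)   # hypothetical sampling variances

ml_objective <- function(tau2) {
  v <- s2 + tau2
  beta <- sum(y / v) / sum(1 / v)        # weighted mean for the given tau^2
  sum(log(v) + (y - beta)^2 / v)
}

fit <- optim(
  par = 0.05, fn = ml_objective,
  method = "L-BFGS-B",
  lower = 1e-6,                          # keeps the variance parameter positive
  control = list(factr = 1e7),           # same value as the documented tol default
  hessian = TRUE                         # retain a Hessian for uncertainty calculations
)
fit$par
```

Lowering factr (for example to 1e5) asks L-BFGS-B for a tighter convergence criterion, at the cost of more iterations.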

Examples

if (FALSE) { # \dontrun{
fit <- mars(
  data = teacher_expectancy,
  studyID = "study",
  effectID = NULL,
  sample_size = NULL,
  formula = yi ~ 1,
  variance = "vi",
  varcov_type = "univariate",
  structure = "univariate"
)
summary(fit)

mv_fit <- mars(
  data = becker09,
  studyID = "ID",
  effectID = "numID",
  sample_size = "N",
  effectsize_type = "cor",
  varcov_type = "weighted",
  variable_names = c(
    "Cognitive_Performance",
    "Somatic_Performance",
    "Selfconfidence_Performance",
    "Somatic_Cognitive",
    "Selfconfidence_Cognitive",
    "Selfconfidence_Somatic"
  )
)
summary(mv_fit)

ml_fit <- mars(
  data = school,
  formula = effect ~ 1 + (1 | district/study),
  studyID = "district",
  effectID = NULL,
  sample_size = NULL,
  variance = "var",
  varcov_type = "multilevel",
  structure = "multilevel",
  estimation_method = "MLE"
)
summary(ml_fit)
} # }