
Master mars function
The primary function used for input and estimation. It takes the data inputs and selects the estimation routine and model structure based on the structure of the data. The function can handle univariate, multivariate, longitudinal, and multilevel meta-analytic models.
Usage
mars(
data,
studyID,
effectID,
sample_size,
effectsize_type = NULL,
formula = NULL,
scale_formula = NULL,
variable_names = NULL,
effectsize_name = NULL,
estimation_method = "REML",
variance = NULL,
varcov_type,
weights = NULL,
structure = "UN",
intercept = FALSE,
missing = "remove",
optim_method = "L-BFGS-B",
robustID = NULL,
multivariate_covs = NULL,
lasso = FALSE,
lasso_args = list(lambda_grid = 10^seq(1, -3, length.out = 5), K = 5,
  all_lasso_metrics = FALSE, lambda_tolerance = 0),
tau2 = NULL,
tol = 1e+07,
...
)
Arguments
- data
Data used for analysis
- studyID
Character string giving the name of the study ID column in data.
- effectID
Character string giving the name of the effect size ID column in data.
- sample_size
Character string giving the name of the sample size column in data.
- effectsize_type
Type of effect size being analyzed
- formula
The formula used for specifying the fixed and random structure. Used for univariate and multilevel structures.
- scale_formula
Optional one-sided formula for modeling the log-heterogeneity in location-scale models. Currently supported for structure = "univariate" and structure = "multilevel". For multilevel models, this can be either a single one-sided formula applied to every random-effect component or a list of one-sided formulas aligned with the random-effect components. Multilevel scale predictors must be invariant within the top-level studyID cluster.
- variable_names
Vector of character strings naming the correlations to be analyzed. Each string joins the two correlated attributes with an underscore (for example, "Cognitive_Performance").
- effectsize_name
Character string representing the name of the effect size column in the data.
- estimation_method
Type of estimation used, either "REML" or "MLE". REML is the default.
- variance
Character string representing the name of the variance of the effect size in the data.
- varcov_type
Type of variance-covariance matrix computed. Default is 'cor_weighted' for correlations or 'smd_outcome' for standardized mean differences.
- weights
User specified matrix of weights for analysis.
- structure
Between-study covariance structure; the default is "UN" (unstructured). See Details for more information.
- intercept
Logical; whether a model intercept should be included. The default, FALSE, fits no intercept. See Details for more information.
- missing
Missing-data handling mode. Use "remove" to drop incomplete rows, "keep" to keep rows as-is, or "em" to impute missing moderator values via EM before dropping the remaining incomplete rows.
- optim_method
Optimization method that is passed to the optim function. Default is 'L-BFGS-B'.
- robustID
A character vector specifying the cluster group to use for computing the robust standard errors.
- multivariate_covs
A one-sided formula to specify the covariates used in a multivariate analysis.
- lasso
Logical; if TRUE, LASSO results are computed. When the number of predictors is less than the number of effect sizes, both LASSO and non-LASSO results are returned; when it is equal to or greater than the number of effect sizes, only LASSO results are returned. Numerical predictors are automatically standardized for optimization efficiency when lasso = TRUE.
- lasso_args
A list of LASSO-specific arguments. lambda_tolerance controls tie-breaking across lambda values by choosing the smaller lambda when cross-validation (CV) metrics are within this tolerance.
- tau2
Optional user-supplied between-study variance or covariance. If NULL, heterogeneity is estimated. For univariate models, supply one non-negative value. For multilevel models, supply one non-negative value per random-effect component. For multivariate DIAG2 models, supply one common value; for DIAG1, one value per outcome; and for UN, a positive semidefinite covariance matrix.
- tol
Tolerance of the optimization, forwarded to the factr control of optim (see Details); default is 1e7.
- ...
Not currently used.
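The lambda_tolerance rule under lasso_args can be illustrated with a small sketch. The helper select_lambda() is hypothetical and is not part of mars; it assumes that a lower CV metric is better and that "within this tolerance" means within lambda_tolerance of the best metric on the grid. The CV values are made up purely for illustration.

```r
# Hypothetical helper, not mars() internals: among lambdas whose CV metric is
# within lambda_tolerance of the best (assumed: lower is better), pick the
# smallest lambda.
select_lambda <- function(lambda_grid, cv_metric, lambda_tolerance = 0) {
  stopifnot(length(lambda_grid) == length(cv_metric), lambda_tolerance >= 0)
  best <- min(cv_metric)
  near_best <- (cv_metric - best) <= lambda_tolerance
  min(lambda_grid[near_best])
}

lambda_grid <- 10^seq(1, -3, length.out = 5)    # default grid: 10, 1, 0.1, 0.01, 0.001
cv_metric   <- c(0.90, 0.52, 0.50, 0.51, 0.60)  # illustrative CV errors, not real output

select_lambda(lambda_grid, cv_metric)                            # 0.1 (strict minimum)
select_lambda(lambda_grid, cv_metric, lambda_tolerance = 0.015)  # 0.01 (tie broken toward smaller lambda)
```

With the default lambda_tolerance = 0, only the exact minimizer is selected; a positive tolerance lets a nearly equivalent, smaller lambda win.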
Details
For LASSO, it is recommended to use MLE instead of REML for estimation.
The core estimator treats the observed effect-size vector in study \(i\) as approximately multivariate normal, $$ y_i \sim N(X_i \beta, V_i), $$ where \(X_i\) is the fixed-effect design matrix and \(V_i\) is the model-implied marginal covariance matrix. The matrix \(V_i\) is the sum of the within-study sampling covariance matrix \(S_i\) and a between-study heterogeneity component. For univariate models this component is a scalar \(\tau^2\); for multivariate models it is the selected covariance structure across outcomes; and for multilevel models it is assembled from random-effect design matrices as $$ V_i = S_i + \sum_j \tau_j^2 Z_{ij} Z_{ij}'. $$ Location-scale models replace constant heterogeneity components with log-linear variance models, so study-specific components are computed as \(\tau_i^2 = \exp(W_i \gamma)\).
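A minimal sketch of how the multilevel marginal covariance \(V_i = S_i + \sum_j \tau_j^2 Z_{ij} Z_{ij}'\) and a location-scale heterogeneity term \(\tau_i^2 = \exp(W_i \gamma)\) can be assembled for one top-level cluster. The helper build_V(), the design matrices, and all numeric values are hypothetical illustrations, not mars internals.

```r
# Hypothetical helper (not mars() internals) for the multilevel marginal
# covariance V_i = S_i + sum_j tau_j^2 Z_ij Z_ij'.
build_V <- function(S, Z_list, tau2) {
  stopifnot(length(Z_list) == length(tau2))
  V <- S
  for (j in seq_along(Z_list)) {
    V <- V + tau2[j] * tcrossprod(Z_list[[j]])  # adds tau_j^2 * Z Z'
  }
  V
}

# One top-level cluster (e.g. a district) with 3 effect sizes from 2 studies
S  <- diag(c(0.04, 0.05, 0.03))       # within-study sampling covariance S_i
Z1 <- matrix(1, nrow = 3, ncol = 1)   # district-level random intercept
Z2 <- cbind(c(1, 1, 0), c(0, 0, 1))   # study-within-district random intercepts
V  <- build_V(S, list(Z1, Z2), tau2 = c(0.02, 0.01))

# Location-scale variant: tau_i^2 = exp(W_i gamma) for one cluster
W_i    <- c(1, 0.5)                   # intercept plus a hypothetical scale predictor
gamma  <- c(log(0.02), 0.3)
tau2_i <- exp(sum(W_i * gamma))       # equals 0.02 * exp(0.15)
```

Effects from the same study share both variance components (V[1, 2] = 0.03), while effects from different studies in the same district share only the district component (V[1, 3] = 0.02).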
For maximum likelihood (estimation_method = "MLE"), mars()
minimizes the deviance-scale objective
$$
\sum_i \left\{\log |V_i| +
(y_i - X_i \beta)' V_i^{-1} (y_i - X_i \beta)\right\},
$$
omitting constants that do not affect the optimizer. For restricted maximum
likelihood (estimation_method = "REML"), the objective adds the usual
fixed-effect adjustment
$$
\log |X' V^{-1} X|,
$$
where \(X' V^{-1} X = \sum_i X_i' V_i^{-1} X_i\). This term adjusts the
likelihood for the estimation of the fixed effects; REML is the default estimator.
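The ML and REML objectives above can be evaluated directly for given \(V_i\). The sketch below is a hypothetical helper, not the mars implementation; it assumes \(\beta\) is set to its generalized least squares (GLS) estimate and uses explicit inverses for readability, which the Cholesky approach described in the next paragraph avoids. The two-study, two-outcome data are made up.

```r
# Hypothetical helper (not mars() internals): deviance-scale ML/REML objective
# for given V_i, with beta set to its GLS estimate (an assumption).
ml_reml_objective <- function(y_list, X_list, V_list, method = c("REML", "MLE")) {
  method <- match.arg(method)
  p <- ncol(X_list[[1]])
  XtVX <- matrix(0, p, p)  # accumulates sum_i X_i' V_i^{-1} X_i
  XtVy <- matrix(0, p, 1)  # accumulates sum_i X_i' V_i^{-1} y_i
  for (i in seq_along(y_list)) {
    Vinv <- solve(V_list[[i]])
    XtVX <- XtVX + crossprod(X_list[[i]], Vinv %*% X_list[[i]])
    XtVy <- XtVy + crossprod(X_list[[i]], Vinv %*% y_list[[i]])
  }
  beta <- solve(XtVX, XtVy)  # GLS estimate of beta
  obj <- 0
  for (i in seq_along(y_list)) {
    r <- y_list[[i]] - X_list[[i]] %*% beta
    obj <- obj +
      as.numeric(determinant(V_list[[i]], logarithm = TRUE)$modulus) +  # log|V_i|
      sum(r * solve(V_list[[i]], r))                                     # r' V_i^{-1} r
  }
  if (method == "REML") {
    obj <- obj + as.numeric(determinant(XtVX, logarithm = TRUE)$modulus)  # log|X'V^{-1}X|
  }
  list(objective = obj, beta = drop(beta))
}

# Two studies, two outcomes each, outcome-specific means (X_i = identity)
Tau    <- diag(c(0.02, 0.03))
V_list <- list(matrix(c(0.04, 0.01, 0.01, 0.05), 2) + Tau,
               matrix(c(0.03, 0.00, 0.00, 0.06), 2) + Tau)
y_list <- list(c(0.30, 0.10), c(0.25, 0.20))
X_list <- list(diag(2), diag(2))

ml   <- ml_reml_objective(y_list, X_list, V_list, method = "MLE")
reml <- ml_reml_objective(y_list, X_list, V_list, method = "REML")
reml$objective - ml$objective  # the log|X' V^{-1} X| adjustment
```

Because \(\beta\) enters both objectives the same way, the two methods differ here only by the fixed-effect adjustment term.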
Optimization is performed with optim. The default
optim_method = "L-BFGS-B" is used because variance components and
covariance-structure parameters require box constraints. Variance parameters
are constrained to be positive with a lower bound of 1e-6; correlation
parameters in structured covariance models are bounded to their allowable
ranges when the structure supplies such bounds. The objective is evaluated
using Cholesky decompositions, so \(\log |V_i|\) is computed from the
Cholesky diagonal and quadratic forms are computed by triangular solves
rather than by explicitly inverting \(V_i\). The main likelihood paths pass
analytic gradients to optim(), and the fitted Hessian is retained for
downstream uncertainty calculations when available. The tol argument
is forwarded to the L-BFGS-B factr control; smaller values request
tighter convergence.
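The optimization strategy above can be sketched for the simplest case, a univariate random-effects model with REML. This is an illustration under stated assumptions, not the mars code path: reml_objective() is a hypothetical helper, the data are simulated, and the sketch relies on optim's finite-difference gradients, whereas mars passes analytic gradients. It shows \(\log |V|\) taken from the Cholesky diagonal, quadratic forms from triangular solves (backsolve), the 1e-6 lower bound, and factr playing the role of tol.

```r
# Hypothetical sketch (not mars() internals): Cholesky-based REML objective
# in tau2 for y_i ~ N(mu, v_i + tau2), with mu set to its GLS estimate.
reml_objective <- function(tau2, yi, vi) {
  V <- diag(vi + tau2, nrow = length(yi))
  X <- matrix(1, nrow = length(yi), ncol = 1)
  R <- chol(V)                               # V = t(R) %*% R, R upper triangular
  Xw <- backsolve(R, X, transpose = TRUE)    # t(R)^{-1} X via triangular solve
  yw <- backsolve(R, yi, transpose = TRUE)   # t(R)^{-1} y
  XtVX <- crossprod(Xw)                      # X' V^{-1} X without inverting V
  beta <- solve(XtVX, crossprod(Xw, yw))     # GLS estimate of mu
  r <- yw - Xw %*% beta                      # whitened residual
  logdetV <- 2 * sum(log(diag(R)))           # log|V| from the Cholesky diagonal
  as.numeric(logdetV + sum(r^2) + determinant(XtVX, logarithm = TRUE)$modulus)
}

set.seed(1)
vi <- runif(20, 0.01, 0.1)                          # simulated sampling variances
yi <- rnorm(20, mean = 0.3, sd = sqrt(vi + 0.05))   # simulated effects, true tau2 = 0.05

fit <- optim(par = 0.1, fn = reml_objective, yi = yi, vi = vi,
             method = "L-BFGS-B", lower = 1e-6,     # keep the variance positive
             control = list(factr = 1e7))           # analogous to mars()'s tol
fit$par          # estimated tau2
fit$convergence  # 0 indicates successful convergence
```

Working with the triangular factor avoids forming \(V^{-1}\) explicitly, which is cheaper and numerically more stable as \(V_i\) grows.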
Examples
if (FALSE) { # \dontrun{
fit <- mars(
data = teacher_expectancy,
studyID = "study",
effectID = NULL,
sample_size = NULL,
formula = yi ~ 1,
variance = "vi",
varcov_type = "univariate",
structure = "univariate"
)
summary(fit)
mv_fit <- mars(
data = becker09,
studyID = "ID",
effectID = "numID",
sample_size = "N",
effectsize_type = "cor",
varcov_type = "weighted",
variable_names = c(
"Cognitive_Performance",
"Somatic_Performance",
"Selfconfidence_Performance",
"Somatic_Cognitive",
"Selfconfidence_Cognitive",
"Selfconfidence_Somatic"
)
)
summary(mv_fit)
ml_fit <- mars(
data = school,
formula = effect ~ 1 + (1 | district/study),
studyID = "district",
effectID = NULL,
sample_size = NULL,
variance = "var",
varcov_type = "multilevel",
structure = "multilevel",
estimation_method = "MLE"
)
summary(ml_fit)
} # }