
Fit MARS Meta-Analytic Models
mars.RdPrimary interface for fitting univariate, multivariate/correlation, standardized-mean-difference, and multilevel meta-analytic models. The function inspects the supplied modeling arguments and routes the data to the appropriate estimator.
Usage
mars(
data,
studyID,
effectID,
sample_size,
effectsize_type = NULL,
formula = NULL,
scale_formula = NULL,
variable_names = NULL,
effectsize_name = NULL,
estimation_method = "REML",
variance = NULL,
varcov_type,
weights = NULL,
structure = "UN",
intercept = FALSE,
missing = "remove",
optim_method = "L-BFGS-B",
robustID = NULL,
multivariate_covs = NULL,
lasso = FALSE,
lasso_args = list(lambda_grid = 10^seq(1, -3, length.out = 5), K = 5, all_lasso_metrics
= FALSE, lambda_tolerance = 0),
tau2 = NULL,
tol = 1e+07,
...
)Arguments
- data
Data frame containing the effect sizes, sampling variances or inputs needed to construct sampling covariance matrices, study identifiers, and any moderators.
- studyID
Character string naming the study-level cluster column.
- effectID
Character string naming an effect-size or outcome identifier column for dependent-effect workflows. Use
NULLfor one effect size per study or when the structure does not require a separate effect ID.- sample_size
Character string naming the study sample-size column. This is required for correlation and some standardized-mean-difference covariance estimators and can be
NULLfor formula-based univariate or multilevel models with an observedvariancecolumn.- effectsize_type
Optional effect-size family for non-formula workflows. Use
"cor"for correlation synthesis. Other values, such as"smd","md","rom","rate", or"reliability", useeffectsize_namewith the selectedvarcov_type. LeaveNULLfor formula-based univariate or multilevel models.- formula
Model formula used for univariate and multilevel models. The left-hand side is the effect-size column. Fixed-effect moderators appear on the right-hand side. Multilevel models can add random effects with lme4-style syntax such as
effect ~ 1 + x + (1 | district/study).- scale_formula
Optional one-sided formula for modeling the log-heterogeneity in location-scale models. Currently supported for
structure = "univariate"andstructure = "multilevel". For multilevel models, this can be either a single one-sided formula applied to every random-effect component or a list of one-sided formulas aligned to the random-effect components. Multilevel scale predictors must be invariant within the top-levelstudyIDcluster.- variable_names
Vector of character strings naming effect-size columns for multivariate/correlation workflows. For correlations, names should identify the variable pair separated by an underscore, for example
"Cognitive_Performance".- effectsize_name
Character string representing the name of the effect size column in the data.
- estimation_method
Estimation method. Use
"REML"for restricted maximum likelihood or"MLE"for maximum likelihood. REML is the default for ordinary model fitting; MLE is recommended for LASSO model selection and likelihood-ratio style comparisons.- variance
Character string naming the sampling-variance column for formula-based models.
- varcov_type
Type of within-study variance-covariance matrix to use or construct. Common values include
"univariate"for one supplied variance per row,"multilevel"for formula-based dependent effects,"weighted"or"cor_weighted"for correlation synthesis, and"smd_outcome"for standardized mean differences. Additional metric-aware options include shared-control continuous outcomes ("mean_diff","log_response_ratio"), repeated-measures standardized change ("standardized_mean_change"), binary and rate outcomes ("log_rr","log_or","log_rate_ratio","rate_difference"), survival summaries ("log_hr"), transformed proportions ("logit_proportion","arcsine_proportion","freeman_tukey_proportion"), partial correlations ("partial_cor","partial_fisher_z"), reliability coefficients ("reliability","log_reliability"), and Fisher-z correlation-matrix options ("fisher_z_simple","fisher_z_weighted","fisher_z_average"). See theWithin-Study-VarCov-Metricsvignette for metric-specific options.- weights
Optional user-specified weight or inverse-covariance matrix. When supplied, these weights override the internally constructed within-study covariance for the corresponding analysis path.
- structure
Between-study heterogeneity structure. Use
"univariate"for single-effect models,"multilevel"for random-effect formulas,"UN"for an unstructured multivariate covariance,"DIAG1"for outcome-specific diagonal heterogeneity, or"DIAG2"for a common diagonal heterogeneity component.- intercept
Logical; controls whether the multivariate design matrix includes an intercept. Formula-based univariate and multilevel models use the intercept implied by
formula.- missing
Missing-data handling mode. Use
"remove"to drop incomplete rows,"keep"to keep rows as-is, or"em"to impute missing moderator values via EM before dropping remaining incomplete rows.- optim_method
Optimization method that is passed to the optim function. Default is 'L-BFGS-B'.
- robustID
Optional character string naming a cluster variable for cluster-robust standard errors when supported by the fitted workflow.
- multivariate_covs
Optional one-sided formula specifying moderators for multivariate/correlation workflows, for example
~ publication_year + setting.- lasso
TRUE/FALSE indicator that specifies if LASSO results are returned. TRUE means LASSO results will be run, if number of predictors is less than number of effect sizes, both LASSO and non-LASSO results will be returned, if number of predictors is equal to or greater than the number of effect sizes, the LASSO results will only be returned. Numerical predictors are automatically standardized for optimization efficiency when doing LASSO.
- lasso_args
A list of LASSO specific arguments.
lambda_gridgives candidate penalty values,Kcontrols cross-validation folds,all_lasso_metricsrequests full fold-level output, andlambda_tolerancecontrols tie-breaking across lambda values by choosing the smaller lambda when CV metrics are within this tolerance.- tau2
Optional user-supplied between-study variance or covariance. If
NULL, heterogeneity is estimated. For univariate models, supply one non-negative value. For multilevel models, supply one non-negative value per random-effect component. For multivariateDIAG2models, supply one common value; forDIAG1, one value per outcome; and forUN, a positive semidefinite covariance matrix.- tol
Optimization tolerance forwarded to the L-BFGS-B
factrcontrol. Smaller values request tighter convergence.- ...
Reserved for future extensions.
Value
A list with class "mars". Common components include:
data: analysis data after the selected missing-data handling.formula,study_id,outcome_name, andstructure: stored model specification.beta_randvarcov_beta: fixed-effect estimates and their estimated covariance matrix.est_values: estimated heterogeneity, covariance, or scale-location parameters.fit_stats: model-fit statistics such as deviance, AIC, BIC, and related criteria when available.residuals,fitted_values,relative_weights, and heterogeneity summaries used by diagnostics and plotting helpers.lasso,selection, or cross-validation components whenlasso = TRUE.
The exact component set varies by analysis path because univariate, multivariate, multilevel, and LASSO fits retain different intermediate matrices for downstream diagnostics and summaries.
Details
For LASSO, it is recommended to use MLE instead of REML for estimation.
The main analysis paths are:
Formula-based univariate models: provide
formula,variance,varcov_type = "univariate", andstructure = "univariate".Formula-based multilevel models: provide a random-effect
formula,variance,varcov_type = "multilevel", andstructure = "multilevel".Correlation synthesis: provide
effectsize_type = "cor",sample_size,variable_names, and an appropriate correlationvarcov_type.Non-correlation metric workflows: provide
effectsize_type,effectsize_name, and the columns required by the selected metric-specificvarcov_type.
When multiple effect sizes are available within a study, effectID,
variable_names, random-effect terms, and the selected
varcov_type determine how within-study dependence and between-study
heterogeneity are represented. If a user-supplied weights matrix or
tau2 value is provided, mars() treats those as fixed inputs for
the corresponding part of the model rather than estimating or constructing
them internally.
The core estimator treats the observed effect-size vector in study \(i\) as approximately multivariate normal, $$ y_i \sim N(X_i \beta, V_i), $$ where \(X_i\) is the fixed-effect design matrix and \(V_i\) is the model-implied marginal covariance matrix. The matrix \(V_i\) is the sum of the within-study sampling covariance matrix \(S_i\) and a between-study heterogeneity component. For univariate models this component is a scalar \(\tau^2\); for multivariate models it is the selected covariance structure across outcomes; and for multilevel models it is assembled from random-effect design matrices as $$ V_i = S_i + \sum_j \tau_j^2 Z_{ij} Z_{ij}'. $$ Location-scale models replace constant heterogeneity components with log-linear variance models, so study-specific components are computed as \(\tau_i^2 = \exp(W_i \gamma)\).
For maximum likelihood (estimation_method = "MLE"), mars()
minimizes the deviance-scale objective
$$
\sum_i \left\{\log |V_i| +
(y_i - X_i \beta)' V_i^{-1} (y_i - X_i \beta)\right\},
$$
omitting constants that do not affect the optimizer. For restricted maximum
likelihood (estimation_method = "REML"), the objective adds the usual
fixed-effect adjustment
$$
\log |X' V^{-1} X|,
$$
where \(X' V^{-1} X = \sum_i X_i' V_i^{-1} X_i\). This penalizes the
likelihood for estimating the fixed effects and is the default estimator.
Optimization is performed with optim. The default
optim_method = "L-BFGS-B" is used because variance components and
covariance-structure parameters require box constraints. Variance parameters
are constrained to be positive with a lower bound of 1e-6; correlation
parameters in structured covariance models are bounded to their allowable
ranges when the structure supplies such bounds. The objective is evaluated
using Cholesky decompositions, so \(\log |V_i|\) is computed from the
Cholesky diagonal and quadratic forms are computed by triangular solves
rather than by explicitly inverting \(V_i\). The main likelihood paths pass
analytic gradients to optim(), and the fitted Hessian is retained for
downstream uncertainty calculations when available. The tol argument
is forwarded to the L-BFGS-B factr control; smaller values request
tighter convergence.
Examples
# \donttest{
fit <- mars(
data = teacher_expectancy,
studyID = "study",
effectID = NULL,
sample_size = NULL,
formula = yi ~ 1,
variance = "vi",
varcov_type = "univariate",
structure = "univariate"
)
summary(fit)
#> Results generated with MARS:v 0.5.2
#> Tuesday, June 9, 2026
#>
#> Model Type:
#> univariate
#>
#> Estimation Method:
#> Restricted Maximum Likelihood
#>
#> Model Formula:
#> yi ~ 1
#> <environment: 0x55cecfdc1940>
#>
#> Data Summary:
#> Number of Effect Sizes: 19
#> Number of Fixed Effects: 1
#> Number of Random Effects: 1
#>
#> Random Components:
#> term var SD
#> intercept 0.01882 0.1372
#>
#> Fixed Effects Estimates:
#> attribute estimate SE z_test p_value lower upper
#> (Intercept) 0.0837 0.05164 1.621 0.1051 -0.01751 0.1849
#>
#> Model Fit Statistics:
#> logLik Dev AIC BIC AICc
#> -3.736 7.472 11.47 13.25 16.09
#>
#> Q Error: 35.83 (18), p = 0.0074
#>
#> I2 (General): 43.24383
#>
#> Residual Diagnostics:
#> n n_finite_raw mean_raw sd_raw rmse mae q_pearson mean_abs_studentized
#> 19 19 0.07998 0.3589 0.3583 0.2491 25.6 0.9379
#> max_abs_studentized prop_abs_studentized_gt2 prop_abs_studentized_gt3
#> 2.782 0.1053 0
#>
#> Normality (whitened residuals): test n_tested statistic p_value
#> shapiro_wilk_whitened 19 0.9381 0.2432
#>
#> Heteroscedasticity trend (|raw residual| ~ fitted): n corr_abs_raw_fitted slope p_value
#> 19 NA NA NA
mv_fit <- mars(
data = becker09,
studyID = "ID",
effectID = "numID",
sample_size = "N",
effectsize_type = "cor",
varcov_type = "weighted",
variable_names = c(
"Cognitive_Performance",
"Somatic_Performance",
"Selfconfidence_Performance",
"Somatic_Cognitive",
"Selfconfidence_Cognitive",
"Selfconfidence_Somatic"
)
)
summary(mv_fit)
#> Results generated with MARS:v 0.5.2
#> Tuesday, June 9, 2026
#>
#> Model Type:
#> multivariate
#>
#> Estimation Method:
#> Restricted Maximum Likelihood
#>
#> Model Formula:
#> NULL
#>
#> Data Summary:
#> Number of Effect Sizes: 54
#> Number of Fixed Effects: 6
#> Number of Random Effects: 6
#>
#> Random Components:
#> ri_1 ri_2 ri_3 ri_4 ri_5 ri_6
#> ri_1 0.1221 0.07342 -0.02923 0.002719 -0.015945 0.003653
#> ri_2 0.8992 0.05460 -0.02882 0.003697 -0.012822 0.001786
#> ri_3 -0.3645 -0.53747 0.05266 -0.009075 0.012947 -0.013151
#> ri_4 0.1934 0.39320 -0.98293 0.001619 -0.002136 0.002353
#> ri_5 -0.5668 -0.68157 0.70076 -0.659537 0.006482 -0.002520
#> ri_6 0.1487 0.10871 -0.81499 0.831698 -0.445197 0.004945
#>
#> Fixed Effects Estimates:
#> attribute estimate SE z_test p_value lower upper
#> 1 -0.02286 0.12024 -0.1901 8.492e-01 -0.25853 0.2128
#> 2 -0.06819 0.08607 -0.7923 4.282e-01 -0.23688 0.1005
#> 3 0.23486 0.08614 2.7264 6.403e-03 0.06602 0.4037
#> 4 0.54220 0.03456 15.6895 1.784e-55 0.47447 0.6099
#> 5 -0.44472 0.04717 -9.4278 4.187e-21 -0.53717 -0.3523
#> 6 -0.38870 0.04407 -8.8199 1.145e-18 -0.47507 -0.3023
#>
#> Model Fit Statistics:
#> logLik Dev AIC BIC AICc
#> 15.39 -30.79 23.21 73.73 33.98
#>
#> Q Error: 217.704 (48), p < 0.0001
#>
#> I2 (General):
#> names values
#> ri_1 92.91
#> ri_2 85.43
#> ri_3 84.98
#> ri_4 14.81
#> ri_5 41.05
#> ri_6 34.69
#>
#> I2 (Jackson):
#> names values
#> ri_1 89.19
#> ri_2 79.35
#> ri_3 82.78
#> ri_4 17.29
#> ri_5 39.51
#> ri_6 33.65
#>
#> I2 (Between): 81.26981
#>
#> Residual Diagnostics:
#> n n_finite_raw mean_raw sd_raw rmse mae q_pearson mean_abs_studentized
#> 54 54 -0.01122 0.2431 0.2411 0.184 109.9 1.217
#> max_abs_studentized prop_abs_studentized_gt2 prop_abs_studentized_gt3
#> 4.1 0.1852 0.05556
#>
#> Normality (whitened residuals): test n_tested statistic p_value
#> shapiro_wilk_whitened 54 0.9797 0.4891
#>
#> Heteroscedasticity trend (|raw residual| ~ fitted): n corr_abs_raw_fitted slope p_value
#> 54 0.0002964 0.0001388 0.9983
ml_fit <- mars(
data = school,
formula = effect ~ 1 + (1 | district/study),
studyID = "district",
effectID = NULL,
sample_size = NULL,
variance = "var",
varcov_type = "multilevel",
structure = "multilevel",
estimation_method = "MLE"
)
summary(ml_fit)
#> Results generated with MARS:v 0.5.2
#> Tuesday, June 9, 2026
#>
#> Model Type:
#> multilevel
#>
#> Estimation Method:
#> Maximum Likelihood
#>
#> Model Formula:
#> effect ~ 1 + (1 | district/study)
#> <environment: 0x55cecfdc1940>
#>
#> Data Summary:
#> Number of Effect Sizes: 56
#> Number of Fixed Effects: 1
#> Number of Random Effects: 2
#>
#> Random Data Summary:
#> Number unique district: 11
#> Number unique study: 56
#>
#> Random Components:
#> term var SD
#> district 0.05774 0.2403
#> study 0.03286 0.1813
#>
#> Fixed Effects Estimates:
#> attribute estimate SE z_test p_value lower upper
#> (Intercept) 0.1845 0.08048 2.292 0.02191 0.02671 0.3422
#>
#> Model Fit Statistics:
#> logLik Dev AIC BIC AICc
#> -8.395 24.79 22.79 28.87 24.5
#>
#> Q Error: 578.864 (55), p < 0.0001
#>
#> I2 (General):
#> names values
#> district 92.38
#> study 87.34
#>
#> I2 (Jackson): 98.69621
#> I2 (Between): 86.3727
#>
#> Residual Diagnostics:
#> n n_finite_raw mean_raw sd_raw rmse mae q_pearson mean_abs_studentized
#> 56 56 -0.06446 0.3258 0.3293 0.2632 96.8 1.065
#> max_abs_studentized prop_abs_studentized_gt2 prop_abs_studentized_gt3
#> 4.196 0.1071 0.03571
#>
#> Normality (whitened residuals): test n_tested statistic p_value
#> shapiro_wilk_whitened 56 0.9555 0.03766
#>
#> Heteroscedasticity trend (|raw residual| ~ fitted): n corr_abs_raw_fitted slope p_value
#> 56 NA NA NA
# }