Random Forest Meta-Analysis Models

Fits a random forest to meta-analytic data using study-level bootstrap resampling, without modifying the core estimation() pipeline. The function reuses the data preparation steps that mars applies to univariate, multivariate, and multilevel inputs, then trains an ensemble of weighted regression trees with random predictor subsampling.

Usage

mars_rf(
  data,
  studyID,
  effectID = NULL,
  sample_size = NULL,
  effectsize_type = NULL,
  formula = NULL,
  variable_names = NULL,
  effectsize_name = NULL,
  variance = NULL,
  varcov_type,
  structure = "univariate",
  intercept = FALSE,
  missing = "remove",
  multivariate_covs = NULL,
  num_trees = 500L,
  mtry = NULL,
  minsplit = 10L,
  minbucket = 5L,
  cp = 0.001,
  maxdepth = 30L,
  sample_fraction = 1,
  seed = NULL,
  importance = TRUE,
  ...
)

Arguments

data: Data used for analysis.
studyID: Character string naming the study ID column.
effectID: Character string naming the effect size ID column for multivariate inputs.
sample_size: Character string naming the study sample size column.
effectsize_type: Type of effect size being analyzed, such as "cor" or "smd".
formula: Formula used for univariate or multilevel models.
variable_names: Vector of variables used for correlation synthesis in multivariate models.
effectsize_name: Character string naming the effect size column.
variance: Character string naming the variance column for formula-based fits.
varcov_type: Type of within-study variance-covariance structure.
structure: Structure label. Use "univariate" or "multilevel" for formula-based models. Multivariate models are inferred from effectsize_type.
intercept: Whether the multivariate design matrix should include an intercept.
missing: Missing-data handling mode. Use "remove", "keep", or "em".
multivariate_covs: One-sided formula specifying moderator covariates for multivariate models.
num_trees: Number of trees in the forest.
mtry: Number of candidate predictors randomly selected for each tree. Defaults to floor(sqrt(p)), where p is the number of predictors.
minsplit, minbucket: Tree-growing control parameters.
cp: Minimum split-improvement threshold for growing a new node.
maxdepth: Maximum tree depth.
sample_fraction: Fraction of studies sampled with replacement for each tree.
seed: Optional random seed.
importance: Logical; if TRUE, aggregate split-based variable importance across trees.
...: Not currently used.

Value

A list of class "mars_rf" containing the fitted forest, fitted and OOB predictions, variable importance, and the preprocessed meta-analytic data used by the model.
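
As a rough illustration of what out-of-bag (OOB) summaries measure, the base R sketch below computes coverage, RMSE, and R-squared from a vector of observed effect sizes and a vector of OOB predictions. The vectors y and oob_pred are made-up toy values, and the unweighted formulas are an assumption for illustration; they are not taken from the package and may differ from what summary() reports.

```r
# Toy observed effects and OOB predictions (hypothetical values).
# NA marks an observation that was never out-of-bag in any tree.
y        <- c(0.20, 0.40, 0.10, 0.50, 0.30)
oob_pred <- c(0.25, 0.35,   NA, 0.45, 0.30)

covered <- !is.na(oob_pred)

# Share of observations with at least one OOB prediction, in percent
oob_coverage <- 100 * mean(covered)

# Unweighted OOB error metrics over the covered observations
resid    <- y[covered] - oob_pred[covered]
oob_rmse <- sqrt(mean(resid^2))
oob_r2   <- 1 - sum(resid^2) / sum((y[covered] - mean(y[covered]))^2)

round(c(coverage = oob_coverage, rmse = oob_rmse, r2 = oob_r2), 4)
```

Because OOB predictions come only from trees that did not see the study, these metrics act as a built-in estimate of out-of-sample performance.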

Details

This is a first release of structure-aware random forests in mars. It is implemented entirely within the package and does not depend on rpart, ranger, or randomForest. The univariate path uses weighted tree fitting, the multivariate path uses covariance-aware node fitting and leaf-level meta-analytic GLS refinement, and the multilevel path uses random-effect-aware node fitting with node-level mixed-model refinement.

Split search is still approximate for speed: candidate splits are screened with the package's internal structure-aware objective, and the nodes that result are then refined with likelihood-based meta-analytic fits when possible. mars_rf() is therefore suited to exploratory structure-aware forest modeling, but it does not perform exact likelihood optimization at every candidate split and should not be described as doing so.
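
To make the idea of screening candidate splits concrete, here is a minimal base R sketch for the simplest (univariate) case. It scores each cut point by the reduction in inverse-variance weighted sum of squares, which is the same quantity as Cochran's Q under a fixed-effect model. The helpers weighted_sse() and split_gain() are hypothetical names invented for this sketch; the package's internal objective is not documented here and may differ.

```r
# Hypothetical helpers, not part of mars.
# Weighted sum of squares around the inverse-variance weighted mean.
weighted_sse <- function(y, w) {
  mu <- sum(w * y) / sum(w)
  sum(w * (y - mu)^2)
}

# Gain from splitting on x <= cut: parent heterogeneity minus the
# summed heterogeneity of the two child nodes.
split_gain <- function(y, v, x, cut) {
  w <- 1 / v                               # inverse-variance weights
  left <- x <= cut
  if (!any(left) || all(left)) return(0)   # degenerate split, no gain
  weighted_sse(y, w) -
    (weighted_sse(y[left], w[left]) + weighted_sse(y[!left], w[!left]))
}

# Toy data: effect sizes shift upward once the moderator exceeds 3
y <- c(0.10, 0.12, 0.08, 0.50, 0.55, 0.45)
v <- c(0.02, 0.03, 0.02, 0.02, 0.03, 0.02)
x <- c(1, 2, 3, 4, 5, 6)

# Screen every cut point between observed moderator values
cuts  <- head(sort(unique(x)), -1)
gains <- vapply(cuts, function(cut) split_gain(y, v, x, cut), numeric(1))
best_cut <- cuts[which.max(gains)]
best_cut
```

A cheap screening score like this can be evaluated for many cut points, while a full likelihood-based fit is reserved for the nodes that are actually created.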

mars_rf() currently supports three analysis paths:

Univariate: weighted tree ensembles using study-level resampling and inverse-variance style case weights.
Multivariate: covariance-aware forests that use within-study covariance in node scoring and fit node-level GLS refinements for prediction.
Multilevel: forests that respect nested grouping through grouped resampling, random-effect-aware node scoring, and node-level mixed-model refinements for prediction.
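
The study-level resampling shared by these paths can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the toy data frame dat, the rule for turning sample_fraction into a number of draws, and the value p = 4 are all made up for the example.

```r
set.seed(123)

# Toy data: 6 studies contributing 1 to 3 effect sizes each
dat <- data.frame(
  study = rep(paste0("s", 1:6), times = c(1, 2, 3, 1, 2, 3)),
  yi = rnorm(12, mean = 0.3, sd = 0.2),
  vi = runif(12, min = 0.01, max = 0.05)
)

sample_fraction <- 1
studies <- unique(dat$study)

# Assumed rule: number of studies to draw for one tree
n_draw <- max(1L, floor(sample_fraction * length(studies)))

# Draw whole studies with replacement; every draw brings in all
# effect sizes from that study, so dependent effects stay together
drawn  <- sample(studies, size = n_draw, replace = TRUE)
in_bag <- do.call(rbind, lapply(drawn, function(s) dat[dat$study == s, ]))

# Studies that were never drawn are out-of-bag for this tree
oob <- dat[!dat$study %in% drawn, ]

# Inverse-variance style case weights for the in-bag rows
in_bag$weight <- 1 / in_bag$vi

# Documented default for mtry with p predictors (p = 4 is a toy value)
p <- 4
mtry <- floor(sqrt(p))
```

Resampling studies rather than individual rows keeps each study's effect sizes entirely in-bag or entirely out-of-bag, so OOB predictions are never informed by other effects from the same study.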

The function is intended for exploratory moderator detection, nonlinear pattern discovery, and structure-aware prediction in meta-analytic data. Users who need the exact likelihood-based estimators reported by the core package should continue to use mars() for confirmatory model fitting.

Examples

rf_uni <- mars_rf(
  data = teacher_expectancy,
  formula = yi ~ year + weeks + factor(setting) + factor(tester),
  studyID = "study",
  variance = "vi",
  varcov_type = "univariate",
  structure = "univariate",
  num_trees = 25,
  seed = 123
)
summary(rf_uni)
#> Random forest meta-analysis
#> Structure: univariate 
#> Trees: 25 
#> Predictors: year, weeks, factor.setting., factor.tester. 
#> OOB coverage: 100 %
#> OOB RMSE: 0.2145 
#> OOB R-squared: 0.0341 
#> 
#> Top variable importance:
#>        predictor importance
#>            weeks  54.363200
#>   factor.tester.  12.337263
#>             year   8.323145
#>  factor.setting.   0.000000

rf_multi <- mars_rf(
  data = becker09,
  studyID = "ID",
  sample_size = "N",
  effectID = "numID",
  effectsize_type = "cor",
  varcov_type = "weighted",
  variable_names = c(
    "Cognitive_Performance",
    "Somatic_Performance",
    "Selfconfidence_Performance",
    "Somatic_Cognitive",
    "Selfconfidence_Cognitive",
    "Selfconfidence_Somatic"
  ),
  multivariate_covs = ~ Team,
  num_trees = 25,
  seed = 123
)
head(rf_importance(rf_multi))
#>   predictor importance
#> 1 effect_id 0.90689609
#> 2      Team 0.09310391