Fits a study-bootstrap random forest for meta-analytic data without modifying the core estimation() pipeline. The function borrows the same data preparation patterns used in mars for univariate, multivariate, and multilevel inputs, then trains a forest-style ensemble of weighted regression trees with random predictor subsampling.

Usage

mars_rf(
  data,
  studyID,
  effectID = NULL,
  sample_size = NULL,
  effectsize_type = NULL,
  formula = NULL,
  variable_names = NULL,
  effectsize_name = NULL,
  variance = NULL,
  varcov_type,
  structure = "univariate",
  intercept = FALSE,
  missing = "remove",
  multivariate_covs = NULL,
  num_trees = 500L,
  mtry = NULL,
  minsplit = 10L,
  minbucket = 5L,
  cp = 0.001,
  maxdepth = 30L,
  sample_fraction = 1,
  seed = NULL,
  importance = TRUE,
  ...
)

Arguments

data

Data used for analysis.

studyID

Character string representing the study ID.

effectID

Character string representing the effect size ID for multivariate inputs.

sample_size

Character string representing study sample size.

effectsize_type

Type of effect size being analyzed, such as "cor" or "smd".

formula

Formula used for univariate or multilevel models.

variable_names

Vector of variables used for correlation synthesis in multivariate models.

effectsize_name

Character string naming the effect size column.

variance

Character string naming the variance column for formula-based fits.

varcov_type

Type of within-study variance-covariance structure.

structure

Structure label. Use "univariate" or "multilevel" for formula-based models. Multivariate models are inferred from effectsize_type.

intercept

Whether the multivariate design matrix should include an intercept.

missing

Missing-data handling mode. Use "remove", "keep", or "em".

multivariate_covs

One-sided formula specifying moderator covariates for multivariate models.

num_trees

Number of trees in the forest.

mtry

Number of candidate predictors randomly selected for each tree. Defaults to floor(sqrt(p)).

minsplit, minbucket

Tree-growing control parameters.

cp

Minimum split-improvement threshold for growing a new node.

maxdepth

Maximum tree depth.

sample_fraction

Fraction of studies sampled with replacement for each tree.

seed

Optional random seed.

importance

Logical; if TRUE, aggregate split-based variable importance across trees.

...

Not currently used.
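The resampling arguments can be illustrated with a small base R sketch. It assumes that `sample_fraction` scales the number of studies drawn and that `mtry` predictors are drawn once per tree, as the argument descriptions state. The helper `draw_study_bootstrap()` is hypothetical and is not part of the mars API; the package's internal sampling may differ.

```r
# Hypothetical helper for illustration only; not exported by mars.
draw_study_bootstrap <- function(study, predictors,
                                 sample_fraction = 1, mtry = NULL) {
  studies <- unique(study)
  n_draw <- max(1L, floor(sample_fraction * length(studies)))

  # Resample whole studies with replacement so that all effect sizes
  # from one study enter or leave the tree's sample together.
  drawn <- studies[sample.int(length(studies), n_draw, replace = TRUE)]

  # A study drawn twice contributes its rows twice.
  in_bag <- unlist(lapply(drawn, function(s) which(study == s)))

  # Rows from studies that were never drawn are out-of-bag for this tree.
  oob <- which(!(study %in% drawn))

  # Default mtry mirrors the documented floor(sqrt(p)).
  p <- length(predictors)
  if (is.null(mtry)) mtry <- max(1L, floor(sqrt(p)))

  list(
    in_bag = in_bag,
    oob = oob,
    predictors = sample(predictors, min(mtry, p))
  )
}

set.seed(1)
study <- c("a", "a", "b", "c", "c", "c", "d")
draw <- draw_study_bootstrap(study, c("year", "weeks", "setting", "tester"))
```

With four predictors the default draws two of them for the tree, and no row can be both in-bag and out-of-bag because membership is decided at the study level.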

Value

A list of class "mars_rf" containing the fitted forest, fitted and OOB predictions, variable importance, and the preprocessed meta-analytic data used by the model.
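As a rough guide to how out-of-bag (OOB) predictions and the summary statistics could be derived, the sketch below averages each row's predictions over the trees for which that row was out-of-bag. The helper `oob_summary()`, its inputs, and the unweighted RMSE and R-squared formulas are assumptions made for illustration; the element names inside a "mars_rf" object and the exact formulas used by summary() may differ.

```r
# Hypothetical helper for illustration only; not exported by mars.
oob_summary <- function(y, tree_pred, oob_mask) {
  # tree_pred: n x num_trees matrix of per-tree predictions.
  # oob_mask:  n x num_trees logical matrix, TRUE where the row was OOB.
  tree_pred[!oob_mask] <- NA

  # Average over the trees that did not see the row; rows that were
  # never out-of-bag come back as NaN and are excluded below.
  oob_pred <- rowMeans(tree_pred, na.rm = TRUE)
  ok <- is.finite(oob_pred)
  resid <- y[ok] - oob_pred[ok]

  list(
    coverage = mean(ok),                      # share of rows with an OOB prediction
    rmse = sqrt(mean(resid^2)),               # unweighted, by assumption
    r_squared = 1 - sum(resid^2) / sum((y[ok] - mean(y[ok]))^2)
  )
}

y <- c(0.1, 0.3, 0.5, 0.7)
tree_pred <- cbind(c(0.2, 0.3, 0.4, 0.6), c(0.0, 0.3, 0.6, 0.8))
oob_mask <- cbind(c(TRUE, TRUE, FALSE, TRUE), c(TRUE, FALSE, TRUE, TRUE))
res <- oob_summary(y, tree_pred, oob_mask)
```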

Details

This is a first release of structure-aware random forests in mars. The implementation is fully internal to the package and does not depend on rpart, ranger, or randomForest. Each analysis path fits nodes differently: the univariate path uses weighted tree fitting; the multivariate path uses covariance-aware node fitting with leaf-level meta-analytic GLS refinement; and the multilevel path uses random-effect-aware node fitting with node-level mixed-model refinement.

Split search is still approximate for speed: candidate splits are screened with the package's internal structure-aware objective, while realized nodes are then refined with likelihood-based meta-analytic fits when possible. This means mars_rf() is already useful for exploratory structure-aware forest modeling, but it should not yet be described as a fully exact likelihood-optimized meta-analytic forest at every candidate split.
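To make the idea of screening candidate splits concrete, here is a minimal sketch for the simplest case: one numeric predictor scored by the reduction in inverse-variance weighted sum of squares. The helper `best_weighted_split()` is hypothetical, and this objective is a stand-in chosen for illustration; it is not the package's internal structure-aware objective, and it omits the likelihood-based refinement applied to realized nodes.

```r
# Hypothetical helper for illustration only; not exported by mars.
best_weighted_split <- function(x, y, w, minbucket = 5L) {
  # Weighted sum of squared deviations around the weighted mean.
  wsse <- function(y, w) sum(w * (y - weighted.mean(y, w))^2)
  parent <- wsse(y, w)

  xs <- sort(unique(x))
  if (length(xs) < 2L) return(NULL)

  # Candidate thresholds are midpoints between adjacent distinct values.
  cuts <- (xs[-1] + xs[-length(xs)]) / 2

  best <- NULL
  for (cut in cuts) {
    left <- x <= cut
    # Skip splits that would leave either child smaller than minbucket.
    if (sum(left) < minbucket || sum(!left) < minbucket) next
    gain <- parent - wsse(y[left], w[left]) - wsse(y[!left], w[!left])
    if (is.null(best) || gain > best$gain) {
      best <- list(cut = cut, gain = gain)
    }
  }
  best
}

x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0.10, 0.12, 0.09, 0.11, 0.50, 0.52, 0.48, 0.51)
vi <- rep(0.01, 8)  # sampling variances; weights are 1 / vi
split <- best_weighted_split(x, y, w = 1 / vi, minbucket = 2L)
split$cut
```

In this toy data the effect sizes jump between x = 4 and x = 5, so the selected threshold is the midpoint 4.5.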

mars_rf() currently supports three analysis paths:

  • Univariate: weighted tree ensembles using study-level resampling and inverse-variance style case weights.

  • Multivariate: covariance-aware forests that use within-study covariance in node scoring and fit node-level GLS refinements for prediction.

  • Multilevel: forests that respect nested grouping through grouped resampling, random-effect-aware node scoring, and node-level mixed-model refinements for prediction.

The function is intended for exploratory moderator detection, nonlinear pattern discovery, and structure-aware prediction inside meta-analytic data. Users who need the exact likelihood-based estimators reported by the core package should continue to use mars() for confirmatory model fitting.

Examples

rf_uni <- mars_rf(
  data = teacher_expectancy,
  formula = yi ~ year + weeks + factor(setting) + factor(tester),
  studyID = "study",
  variance = "vi",
  varcov_type = "univariate",
  structure = "univariate",
  num_trees = 25,
  seed = 123
)
summary(rf_uni)
#> Random forest meta-analysis
#> Structure: univariate 
#> Trees: 25 
#> Predictors: year, weeks, factor.setting., factor.tester. 
#> OOB coverage: 100 %
#> OOB RMSE: 0.2145 
#> OOB R-squared: 0.0341 
#> 
#> Top variable importance:
#>        predictor importance
#>            weeks  54.363200
#>   factor.tester.  12.337263
#>             year   8.323145
#>  factor.setting.   0.000000

rf_multi <- mars_rf(
  data = becker09,
  studyID = "ID",
  sample_size = "N",
  effectID = "numID",
  effectsize_type = "cor",
  varcov_type = "weighted",
  variable_names = c(
    "Cognitive_Performance",
    "Somatic_Performance",
    "Selfconfidence_Performance",
    "Somatic_Cognitive",
    "Selfconfidence_Cognitive",
    "Selfconfidence_Somatic"
  ),
  multivariate_covs = ~ Team,
  num_trees = 25,
  seed = 123
)
head(rf_importance(rf_multi))
#>   predictor importance
#> 1 effect_id 0.90689609
#> 2      Team 0.09310391