
Random Forest Meta-Analysis Models
Fits a study-bootstrap random forest for meta-analytic data without modifying
the core estimation() pipeline. The function borrows the same data
preparation patterns used in mars for univariate, multivariate, and
multilevel inputs, then trains a forest-style ensemble of weighted
regression trees with random predictor subsampling.
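The "study-bootstrap" part means that each tree is grown on a resample of whole studies rather than individual effect sizes, so dependent effect sizes from one study stay together. The block below is a minimal sketch of that resampling step under this assumption. `study_bootstrap()` is a hypothetical helper, not part of mars, and the package's internal resampler may differ in detail.

```r
# Hypothetical helper (not the mars implementation): study-level bootstrap
# for one tree. Whole studies are drawn with replacement so that dependent
# effect sizes from the same study always enter or leave the bag together.
study_bootstrap <- function(study_id, sample_fraction = 1) {
  studies <- unique(study_id)
  n_draw <- max(1L, floor(sample_fraction * length(studies)))
  drawn <- studies[sample.int(length(studies), n_draw, replace = TRUE)]
  # A study drawn k times contributes all of its rows k times
  in_bag <- unlist(lapply(drawn, function(s) which(study_id == s)))
  # Rows from studies that were never drawn are out-of-bag for this tree
  oob <- which(!study_id %in% drawn)
  list(in_bag = in_bag, oob = oob)
}

set.seed(123)
ids <- c("s1", "s1", "s2", "s3", "s3", "s3", "s4")
bag <- study_bootstrap(ids)
table(ids[bag$in_bag])  # row counts per resampled study
bag$oob                 # row indices held out for OOB prediction
```

Resampling at the study level keeps the out-of-bag rows for a tree free of effect sizes that share a study with its training rows.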
Usage
mars_rf(
data,
studyID,
effectID = NULL,
sample_size = NULL,
effectsize_type = NULL,
formula = NULL,
variable_names = NULL,
effectsize_name = NULL,
variance = NULL,
varcov_type,
structure = "univariate",
intercept = FALSE,
missing = "remove",
multivariate_covs = NULL,
num_trees = 500L,
mtry = NULL,
minsplit = 10L,
minbucket = 5L,
cp = 0.001,
maxdepth = 30L,
sample_fraction = 1,
seed = NULL,
importance = TRUE,
...
)
Arguments
- data
Data used for analysis.
- studyID
Character string naming the study ID column.
- effectID
Character string naming the effect size ID column for multivariate inputs.
- sample_size
Character string naming the sample size column.
- effectsize_type
Type of effect size being analyzed, such as "cor" or "smd".
- formula
Formula used for univariate or multilevel models.
- variable_names
Character vector naming the correlation variables to synthesize in multivariate models.
- effectsize_name
Character string naming the effect size column.
- variance
Character string naming the variance column for formula-based fits.
- varcov_type
Type of within-study variance-covariance structure.
- structure
Structure label. Use "univariate" or "multilevel" for formula-based models. Multivariate models are inferred from effectsize_type.
- intercept
Whether the multivariate design matrix should include an intercept.
- missing
Missing-data handling mode. Use "remove", "keep", or "em".
- multivariate_covs
One-sided formula specifying moderator covariates for multivariate models.
- num_trees
Number of trees in the forest.
- mtry
Number of candidate predictors randomly selected for each tree. Defaults to floor(sqrt(p)), where p is the number of predictors.
- minsplit, minbucket
Tree-growing control parameters.
- cp
Minimum split-improvement threshold for growing a new node.
- maxdepth
Maximum tree depth.
- sample_fraction
Fraction of studies sampled with replacement for each tree.
- seed
Optional random seed.
- importance
Logical; if TRUE, aggregate split-based variable importance across trees.
- ...
Not currently used.
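As documented, mtry counts candidate predictors drawn once per tree and defaults to floor(sqrt(p)). This differs from randomForest, whose mtry is redrawn at every split. The sketch below shows the documented default and a per-tree draw. `default_mtry()` and `draw_predictors()` are hypothetical helpers for illustration only, not functions exported by mars.

```r
# Hypothetical helper: documented default mtry = floor(sqrt(p)),
# with a floor of 1 so that at least one predictor is always available.
default_mtry <- function(p) max(1L, as.integer(floor(sqrt(p))))

# Hypothetical helper: draw the candidate predictors for one tree
# (per-tree subsampling, as the mtry argument describes).
draw_predictors <- function(predictors, mtry = NULL) {
  if (is.null(mtry)) mtry <- default_mtry(length(predictors))
  mtry <- min(mtry, length(predictors))  # never ask for more than exist
  sample(predictors, size = mtry)        # without replacement
}

preds <- c("year", "weeks", "setting", "tester")
default_mtry(length(preds))  # 4 predictors -> mtry of 2
set.seed(1)
draw_predictors(preds)
```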
Value
A list of class "mars_rf" containing the fitted forest, fitted and
out-of-bag (OOB) predictions, variable importance, and the preprocessed meta-analytic
data used by the model.
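The OOB predictions feed the OOB coverage, RMSE, and R-squared lines printed by summary(). The sketch below shows one way to compute them from a rows-by-trees matrix of per-tree OOB predictions. It assumes that coverage is the percentage of rows receiving at least one OOB prediction and that R-squared is 1 - MSE / variance of the covered outcomes. Both definitions are a common convention, not confirmed for mars, and `oob_summary()` is a hypothetical helper.

```r
# Hypothetical helper: summarise OOB performance.
# oob_pred_matrix is rows x trees, NA wherever a row was in-bag for a tree.
oob_summary <- function(y, oob_pred_matrix) {
  n_oob <- rowSums(!is.na(oob_pred_matrix))
  covered <- n_oob > 0                      # rows with at least one OOB prediction
  oob_pred <- rep(NA_real_, length(y))
  oob_pred[covered] <- rowMeans(oob_pred_matrix[covered, , drop = FALSE],
                                na.rm = TRUE)
  resid <- y[covered] - oob_pred[covered]
  mse <- mean(resid^2)
  list(
    oob_pred = oob_pred,
    coverage = 100 * mean(covered),         # assumed meaning of "OOB coverage"
    rmse = sqrt(mse),
    r_squared = 1 - mse / mean((y[covered] - mean(y[covered]))^2)
  )
}

y <- c(1, 2, 3, 4)
m <- matrix(c(1, NA, 3, NA,                 # tree 1
              NA, 2, 3, NA), nrow = 4)      # tree 2; row 4 is never OOB
res <- oob_summary(y, m)
res$coverage  # 75
```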
Details
This implementation is a first release of structure-aware random forests in
mars. It is implemented entirely within the package and does not depend on
rpart, ranger, or randomForest. The univariate path uses weighted tree
fitting, the multivariate path uses covariance-aware node fitting with
leaf-level meta-analytic GLS refinement, and the multilevel path uses
random-effect-aware node fitting with node-level mixed-model refinement.
Split search remains approximate for speed: candidate splits are screened
with the package's internal structure-aware objective, and the resulting
nodes are then refined with likelihood-based meta-analytic fits when
possible. mars_rf() is therefore already useful for exploratory
structure-aware forest modeling, but it does not optimize the exact
meta-analytic likelihood at every candidate split.
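To make "approximate split screening" concrete, the sketch below scores every threshold of one numeric predictor by the inverse-variance weighted within-node sum of squares and keeps the largest improvement, while respecting a minimum node size. The package's actual structure-aware objective is internal and not documented here, so weighted SSE is a stand-in for the univariate case only. `best_weighted_split()` is a hypothetical helper.

```r
# Hypothetical helper: screen thresholds of one numeric predictor x using
# inverse-variance weighted SSE as a stand-in univariate objective.
best_weighted_split <- function(x, y, v, minbucket = 5L) {
  w <- 1 / v                                   # inverse-variance weights
  wsse <- function(idx) {
    mu <- sum(w[idx] * y[idx]) / sum(w[idx])   # weighted node mean
    sum(w[idx] * (y[idx] - mu)^2)
  }
  parent <- wsse(seq_along(y))
  best <- list(threshold = NA_real_, improvement = 0)
  # Candidate thresholds: every observed value except the largest
  for (t in head(sort(unique(x)), -1)) {
    left <- which(x <= t)
    right <- which(x > t)
    if (length(left) < minbucket || length(right) < minbucket) next
    gain <- parent - wsse(left) - wsse(right)  # reduction in weighted SSE
    if (gain > best$improvement) best <- list(threshold = t, improvement = gain)
  }
  best
}

x <- 1:10
y <- rep(c(0, 1), each = 5)
v <- rep(1, 10)
best_weighted_split(x, y, v, minbucket = 2L)  # splits at x <= 5
```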
mars_rf() currently supports three analysis paths:
- Univariate: weighted tree ensembles using study-level resampling and inverse-variance-style case weights.
- Multivariate: covariance-aware forests that use within-study covariance in node scoring and fit node-level GLS refinements for prediction.
- Multilevel: forests that respect nested grouping through grouped resampling, random-effect-aware node scoring, and node-level mixed-model refinements for prediction.
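The node-level GLS refinement in the multivariate path rests on the standard generalized least squares estimator, beta_hat = (X' V^-1 X)^-1 X' V^-1 y, where V is the within-study variance-covariance matrix of the effect sizes in a node. The sketch below is that generic estimator, not the package's code. It omits any between-study (random-effects) components, and `leaf_gls()` is a hypothetical helper. With a single intercept column and diagonal V it reduces to the inverse-variance weighted mean used in the univariate case.

```r
# Hypothetical helper: generic GLS estimate for the effect sizes in one node.
# V is the (block-diagonal) within-study variance-covariance matrix.
leaf_gls <- function(y, X, V) {
  Vinv_X <- solve(V, X)              # V^-1 X without forming V^-1 explicitly
  XtVinvX <- crossprod(X, Vinv_X)    # X' V^-1 X
  XtVinvy <- crossprod(Vinv_X, y)    # (V^-1 X)' y = X' V^-1 y since V is symmetric
  list(beta = drop(solve(XtVinvX, XtVinvy)), vcov = solve(XtVinvX))
}

# Univariate node: intercept only, diagonal V -> inverse-variance weighted mean
y <- c(0.2, 0.4, 0.6)
v <- c(0.1, 0.2, 0.4)
fe <- leaf_gls(y, X = matrix(1, 3, 1), V = diag(v))

# Multivariate node: two studies, two correlated outcomes each
X2 <- rbind(diag(2), diag(2))
V2 <- matrix(0, 4, 4)
V2[1:2, 1:2] <- matrix(c(0.10, 0.03, 0.03, 0.20), 2)
V2[3:4, 3:4] <- matrix(c(0.15, 0.02, 0.02, 0.10), 2)
mv <- leaf_gls(c(0.30, 0.10, 0.25, 0.20), X2, V2)
mv$beta  # one pooled estimate per outcome
```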
The function is intended for exploratory moderator detection, nonlinear
pattern discovery, and structure-aware prediction within meta-analytic data.
Users who need the exact likelihood-based estimators reported by the core
package should continue to use mars() for confirmatory model fitting.
Examples
rf_uni <- mars_rf(
data = teacher_expectancy,
formula = yi ~ year + weeks + factor(setting) + factor(tester),
studyID = "study",
variance = "vi",
varcov_type = "univariate",
structure = "univariate",
num_trees = 25,
seed = 123
)
summary(rf_uni)
#> Random forest meta-analysis
#> Structure: univariate
#> Trees: 25
#> Predictors: year, weeks, factor.setting., factor.tester.
#> OOB coverage: 100 %
#> OOB RMSE: 0.2145
#> OOB R-squared: 0.0341
#>
#> Top variable importance:
#> predictor importance
#> weeks 54.363200
#> factor.tester. 12.337263
#> year 8.323145
#> factor.setting. 0.000000
rf_multi <- mars_rf(
data = becker09,
studyID = "ID",
sample_size = "N",
effectID = "numID",
effectsize_type = "cor",
varcov_type = "weighted",
variable_names = c(
"Cognitive_Performance",
"Somatic_Performance",
"Selfconfidence_Performance",
"Somatic_Cognitive",
"Selfconfidence_Cognitive",
"Selfconfidence_Somatic"
),
multivariate_covs = ~ Team,
num_trees = 25,
seed = 123
)
head(rf_importance(rf_multi))
#> predictor importance
#> 1 effect_id 0.90689609
#> 2 Team 0.09310391