
Random Forest Meta-Analysis
mars authors
2026-05-15
library(mars)

mars_rf() is the package’s internal random-forest
workflow for exploratory, structure-aware meta-analysis. It is designed
as an honest first release:
- it supports univariate, multivariate, and multilevel data structures
- it uses package-internal tree fitting rather than rpart or ranger
- it uses structure-aware node fitting and prediction
- it still uses approximate split screening for speed, rather than full likelihood optimization at every candidate split
This vignette shows one example for each supported path.
Univariate Example
rf_uni <- mars_rf_univariate(
  data = teacher_expectancy,
  formula = yi ~ year + weeks + factor(setting) + factor(tester),
  studyID = "study",
  variance = "vi",
  varcov_type = "univariate",
  num_trees = 10,
  seed = 123
)
summary(rf_uni)
#> Random forest meta-analysis
#> Structure: univariate
#> Trees: 10
#> Predictors: year, weeks, factor.setting., factor.tester.
#> OOB coverage: 100 %
#> OOB RMSE: 0.2547
#> OOB R-squared: -0.3615
#>
#> Top variable importance:
#> predictor importance
#> weeks 24.179760
#> year 7.242460
#> factor.tester. 3.907405
#> factor.setting. 0.000000
head(rf_importance(rf_uni))
#> predictor importance
#> 1 weeks 0.6844047
#> 2 year 0.2049968
#> 3 factor.tester. 0.1105985
#> 4 factor.setting. 0.0000000
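
Comparing the two printouts above, summary() appears to show raw importance totals, while rf_importance() shows the same values rescaled to sum to 1. The base-R sketch below reproduces that rescaling from the printed numbers. It is an observation about this output, not a statement of how the package computes importance internally, and the names raw_importance and relative_importance are ours rather than part of mars.

```r
# Raw importance values as printed by summary(rf_uni) above.
raw_importance <- c(
  weeks = 24.179760,
  year = 7.242460,
  factor.tester. = 3.907405,
  factor.setting. = 0
)

# Rescale so that the importances sum to 1.
relative_importance <- raw_importance / sum(raw_importance)
round(relative_importance, 4)
```

The rescaled values agree with the rf_importance(rf_uni) column to the printed precision, so the two displays rank predictors identically and differ only in scale.
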
predict(rf_uni, newdata = teacher_expectancy[1:5, , drop = FALSE])
#> [1] 0.06393069 0.04316834 0.04041068 0.11715802 0.13678954

Multivariate Example
rf_multi <- mars_rf_multivariate(
  data = becker09,
  studyID = "ID",
  effectID = "numID",
  sample_size = "N",
  effectsize_type = "cor",
  varcov_type = "weighted",
  variable_names = c(
    "Cognitive_Performance",
    "Somatic_Performance",
    "Selfconfidence_Performance",
    "Somatic_Cognitive",
    "Selfconfidence_Cognitive",
    "Selfconfidence_Somatic"
  ),
  multivariate_covs = ~ Team,
  num_trees = 10,
  seed = 123
)
summary(rf_multi)
#> Random forest meta-analysis
#> Structure: multivariate
#> Trees: 10
#> Predictors: effect_id, Team
#> OOB coverage: 100 %
#> OOB RMSE: 0.2709
#> OOB R-squared: 0.6072
#>
#> Top variable importance:
#> predictor importance
#> effect_id 3301.379
#> Team 224.976
head(rf_importance(rf_multi))
#> predictor importance
#> 1 effect_id 0.93620156
#> 2 Team 0.06379844
predict(rf_multi, newdata = rf_multi$data[1:6, , drop = FALSE])
#> [1] -0.08298025 -0.11205409 0.18601101 0.33692012 -0.30669645 -0.28303643

Multilevel Example
rf_multi_level <- mars_rf_multilevel(
  data = school,
  formula = effect ~ year + (1 | district/study),
  studyID = "district",
  variance = "var",
  varcov_type = "multilevel",
  num_trees = 10,
  seed = 123
)
summary(rf_multi_level)
#> Random forest meta-analysis
#> Structure: multilevel
#> Trees: 10
#> Predictors: year
#> OOB coverage: 100 %
#> OOB RMSE: 0.2949
#> OOB R-squared: -0.7787
#>
#> Top variable importance:
#> predictor importance
#> year 0
rf_multi_level$random_tau
#> [1] 0.06506195 0.03273651
predict(rf_multi_level, newdata = school[1:5, , drop = FALSE])
#> [1] 0.07806810 0.07132870 0.17304830 0.04687979 0.17574037

Interpretation Notes
- rf_importance() reports split-based importance accumulated across trees.
- OOB metrics are useful for quick exploration, but they are still forest-style diagnostics rather than formal confirmatory model-fit statistics.
- The multivariate and multilevel paths are structure-aware and use likelihood-based node refinement where possible, but the full split search is still approximate.
- For confirmatory estimation and formal inferential output, use mars() and related likelihood-based modeling functions.
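
The negative OOB R-squared values in the univariate and multilevel examples are easier to read with the conventional definition in mind. The sketch below assumes that OOB R-squared is computed as 1 - SSE/SST on out-of-bag predictions. That is the usual convention, but this vignette does not confirm it for mars. The helper oob_metrics() and the toy vectors are hypothetical and are not part of the package.

```r
# Hypothetical helper (not a mars function): conventional RMSE and
# R-squared from observed effect sizes and out-of-bag predictions.
oob_metrics <- function(observed, predicted) {
  sse <- sum((observed - predicted)^2)        # squared error of the predictions
  sst <- sum((observed - mean(observed))^2)   # squared error of the mean
  c(rmse = sqrt(sse / length(observed)), r_squared = 1 - sse / sst)
}

# Toy data: predictions that track the outcome worse than its mean does.
observed  <- c(0.10, 0.20, 0.30, 0.40)
predicted <- c(0.40, 0.30, 0.20, 0.10)
oob_metrics(observed, predicted)
```

Under this definition, a negative value means the out-of-bag predictions have a larger squared error than simply predicting the mean effect size, which is a signal to treat the forest as exploratory rather than as evidence of moderator effects.
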