Running Your First Multiverse Meta-Analysis
A step-by-step guide to exploring analytical robustness with the metaMultiverse package
Why Multiverse Meta-Analysis?
When conducting a meta-analysis, researchers face dozens of decisions: Which studies to include? How to handle outliers? Which statistical model to use? Each choice seems reasonable, but different choices can lead to different conclusions.
Rather than making one set of arbitrary decisions, multiverse meta-analysis systematically explores how different reasonable analytical choices affect your conclusions. This approach transforms researcher degrees of freedom from a source of concern into a tool for understanding robustness.
This tutorial walks you through running your first multiverse meta-analysis using real data from the Metapsy database.
What You’ll Need
Basic R knowledge is helpful, but I’ll explain each step. You’ll need:
R (version 4.0 or higher)
RStudio (recommended but not required)
About 30 minutes
Step 1: Install the Package
First, we need to install metaMultiverse for running the multiverse analysis and metapsyTools for validating the data format. We’ll get the data directly from the Metapsy API (no extra package needed for that!).
Download metaMultiverse

# Install devtools if you don't have it
if (!require("devtools")) install.packages("devtools")

# Install metaMultiverse from GitHub
devtools::install_github("cyplessen/metaMultiverse",
                         force = TRUE,
                         upgrade = "never")

Download metapsyTools

# You could also use remotes instead of devtools:
if (!require("remotes")) install.packages("remotes")

remotes::install_github("metapsy-project/metapsyTools")

Now load the packages we’ll need:
library(metaMultiverse)
library(metapsyTools)
library(dplyr) # for data manipulation
library(jsonlite) # for reading API data
library(knitr)     # for formatted tables

Step 2: Get the Data from Metapsy
Metapsy maintains databases of psychotherapy trials across different mental health conditions. For this guide, we’ll use their depression psychotherapy database. You can either download it directly via their API or work from a saved copy; I use a saved copy here so I don’t put unnecessary load on their service :)

Loading a saved .csv copy looks like this (note that read.csv2() expects semicolon-separated files; use read.csv() for comma-separated ones):
data <- read.csv2("data-guide.csv")
# You can get the depression psychotherapy database via API like this:
# The API requires 'shorthand' parameter and optional 'version' (defaults to 'latest')
# api_url <- "http://api.metapsy.org/v1/get_data?shorthand=depression-psyctr&version=latest"
# api_response <- fromJSON(api_url)
#
# # Extract the data
# data <- as.data.frame(api_response$data)
#
# Take a look at what we have
skimr::skim(data)

| Data summary | |
|---|---|
| Name | data |
| Number of rows | 900 |
| Number of columns | 69 |
| Column type frequency: character | 22 |
| Column type frequency: numeric | 47 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| study | 0 | 1.00 | 8 | 25 | 0 | 480 | 0 |
| condition_arm1 | 0 | 1.00 | 3 | 9 | 0 | 9 | 0 |
| condition_arm2 | 0 | 1.00 | 2 | 9 | 0 | 3 | 0 |
| outcome_type | 0 | 1.00 | 3 | 31 | 0 | 14 | 0 |
| instrument | 0 | 1.00 | 2 | 25 | 0 | 70 | 0 |
| rating | 0 | 1.00 | 9 | 11 | 0 | 2 | 0 |
| time | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
| comorbid_mental | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| format | 0 | 1.00 | 3 | 5 | 0 | 8 | 0 |
| format_details | 5 | 0.99 | 3 | 43 | 0 | 37 | 0 |
| country | 0 | 1.00 | 2 | 3 | 0 | 7 | 0 |
| age_group | 2 | 1.00 | 3 | 13 | 0 | 6 | 0 |
| recruitment | 0 | 1.00 | 3 | 4 | 0 | 3 | 0 |
| diagnosis | 0 | 1.00 | 3 | 4 | 0 | 6 | 0 |
| target_group | 0 | 1.00 | 3 | 11 | 0 | 13 | 0 |
| ba | 0 | 1.00 | 1 | 2 | 0 | 4 | 0 |
| full_ref | 0 | 1.00 | 115 | 532 | 0 | 481 | 0 |
| .id | 39 | 0.96 | 48 | 113 | 0 | 861 | 0 |
| multi_arm1 | 553 | 0.39 | 3 | 63 | 0 | 143 | 0 |
| multi_arm2 | 553 | 0.39 | 2 | 24 | 0 | 9 | 0 |
| dich_paper | 843 | 0.06 | 1 | 113 | 0 | 44 | 0 |
| other_stat | 894 | 0.01 | 38 | 121 | 0 | 6 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| X | 0 | 1.00 | 450.50 | 259.95 | 1.00 | 225.75 | 450.50 | 675.25 | 900.00 | ▇▇▇▇▇ |
| .g | 0 | 1.00 | 0.81 | 0.77 | -0.70 | 0.32 | 0.65 | 1.06 | 5.24 | ▆▇▁▁▁ |
| .g_se | 0 | 1.00 | 0.31 | 0.13 | 0.07 | 0.21 | 0.28 | 0.39 | 0.95 | ▇▇▃▁▁ |
| mean_arm1 | 83 | 0.91 | 13.32 | 10.11 | 0.54 | 8.11 | 11.30 | 15.50 | 89.10 | ▇▁▁▁▁ |
| sd_arm1 | 90 | 0.90 | 7.05 | 4.10 | 0.13 | 4.60 | 6.38 | 8.90 | 59.86 | ▇▁▁▁▁ |
| n_arm1 | 84 | 0.91 | 43.69 | 47.31 | 5.96 | 16.00 | 29.00 | 49.00 | 418.00 | ▇▁▁▁▁ |
| mean_arm2 | 83 | 0.91 | 18.85 | 11.71 | 0.90 | 12.00 | 17.11 | 22.67 | 85.80 | ▇▆▁▁▁ |
| sd_arm2 | 90 | 0.90 | 7.62 | 4.40 | 0.13 | 5.00 | 7.00 | 9.50 | 51.00 | ▇▂▁▁▁ |
| n_arm2 | 84 | 0.91 | 43.51 | 49.04 | 4.00 | 14.75 | 30.00 | 50.00 | 514.00 | ▇▁▁▁▁ |
| baseline_m_arm1 | 97 | 0.89 | 22.22 | 12.02 | 0.60 | 15.60 | 20.60 | 26.78 | 94.20 | ▇▇▁▁▁ |
| baseline_sd_arm1 | 104 | 0.88 | 6.20 | 3.67 | 0.12 | 4.15 | 5.70 | 7.90 | 56.60 | ▇▁▁▁▁ |
| baseline_n_arm1 | 93 | 0.90 | 47.90 | 55.70 | 6.00 | 18.00 | 32.00 | 54.00 | 614.00 | ▇▁▁▁▁ |
| baseline_m_arm2 | 97 | 0.89 | 22.16 | 12.05 | 1.00 | 15.53 | 20.60 | 26.90 | 88.31 | ▆▇▁▁▁ |
| baseline_sd_arm2 | 104 | 0.88 | 6.23 | 3.75 | 0.11 | 4.10 | 5.69 | 8.00 | 68.69 | ▇▁▁▁▁ |
| baseline_n_arm2 | 91 | 0.90 | 46.85 | 56.05 | 6.00 | 15.00 | 31.00 | 53.00 | 635.00 | ▇▁▁▁▁ |
| rand_arm1 | 49 | 0.95 | 51.84 | 57.21 | 7.00 | 20.00 | 36.00 | 60.50 | 578.00 | ▇▁▁▁▁ |
| rand_arm2 | 45 | 0.95 | 50.26 | 57.23 | 6.00 | 19.00 | 35.00 | 59.00 | 562.00 | ▇▁▁▁▁ |
| attr_arm1 | 93 | 0.90 | 10.28 | 17.27 | 0.00 | 2.00 | 6.00 | 12.00 | 241.00 | ▇▁▁▁▁ |
| attr_arm2 | 90 | 0.90 | 8.68 | 14.60 | 0.00 | 1.00 | 4.00 | 11.00 | 217.00 | ▇▁▁▁▁ |
| rand_ratio | 0 | 1.00 | 1.04 | 0.22 | 1.00 | 1.00 | 1.00 | 1.00 | 3.00 | ▇▁▁▁▁ |
| year | 0 | 1.00 | 2009.36 | 12.08 | 1977.00 | 2004.00 | 2013.00 | 2018.00 | 2024.00 | ▂▂▂▅▇ |
| time_weeks | 159 | 0.82 | 8.80 | 10.23 | 0.00 | 0.00 | 8.00 | 13.00 | 78.00 | ▇▂▁▁▁ |
| percent_women | 15 | 0.98 | 0.72 | 0.21 | 0.00 | 0.62 | 0.74 | 0.85 | 1.00 | ▁▁▃▇▆ |
| sg | 0 | 1.00 | 0.60 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
| ac | 0 | 1.00 | 0.47 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| itt | 0 | 1.00 | 0.60 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
| rob | 0 | 1.00 | 2.63 | 1.22 | 0.00 | 1.00 | 3.00 | 4.00 | 4.00 | ▁▆▃▆▇ |
| n_sessions_arm1 | 3 | 1.00 | 9.09 | 5.33 | 1.00 | 6.00 | 8.00 | 11.00 | 60.00 | ▇▂▁▁▁ |
| mean_age | 37 | 0.96 | 43.79 | 14.13 | 18.00 | 35.00 | 41.73 | 50.84 | 81.94 | ▃▇▅▂▂ |
| event_arm1 | 846 | 0.06 | 34.94 | 79.03 | 7.00 | 13.00 | 16.50 | 25.75 | 576.00 | ▇▁▁▁▁ |
| event_arm2 | 846 | 0.06 | 23.37 | 56.23 | 1.00 | 4.00 | 9.50 | 18.00 | 402.00 | ▇▁▁▁▁ |
| totaln_arm1 | 845 | 0.06 | 60.27 | 95.43 | 15.00 | 22.00 | 34.00 | 50.50 | 578.00 | ▇▁▁▁▁ |
| totaln_arm2 | 845 | 0.06 | 58.65 | 94.46 | 10.00 | 18.00 | 30.00 | 61.00 | 562.00 | ▇▁▁▁▁ |
| .log_rr | 841 | 0.07 | 0.63 | 0.57 | -0.24 | 0.23 | 0.55 | 1.00 | 2.40 | ▆▇▅▁▁ |
| .log_rr_se | 841 | 0.07 | 0.34 | 0.21 | 0.03 | 0.21 | 0.29 | 0.45 | 1.02 | ▆▇▃▂▁ |
| .event_arm1 | 841 | 0.07 | 36.07 | 76.69 | 7.00 | 13.00 | 18.00 | 27.50 | 576.00 | ▇▁▁▁▁ |
| .event_arm2 | 841 | 0.07 | 24.36 | 54.44 | 1.00 | 4.00 | 12.00 | 21.50 | 402.00 | ▇▁▁▁▁ |
| .totaln_arm1 | 841 | 0.07 | 60.92 | 93.83 | 15.00 | 22.50 | 34.00 | 48.50 | 578.00 | ▇▁▁▁▁ |
| .totaln_arm2 | 841 | 0.07 | 59.25 | 93.05 | 10.00 | 19.00 | 34.00 | 57.00 | 562.00 | ▇▁▁▁▁ |
| mean_change_arm1 | 881 | 0.02 | -8.74 | 3.68 | -17.50 | -9.45 | -8.30 | -7.50 | -0.65 | ▁▁▇▁▁ |
| sd_change_arm1 | 881 | 0.02 | 6.45 | 3.18 | 0.80 | 4.46 | 6.30 | 8.97 | 11.81 | ▅▁▇▁▅ |
| n_change_arm1 | 881 | 0.02 | 33.11 | 21.60 | 12.00 | 15.50 | 29.00 | 40.00 | 101.00 | ▇▃▂▁▁ |
| mean_change_arm2 | 881 | 0.02 | -4.94 | 3.35 | -11.10 | -6.66 | -5.59 | -1.90 | -0.53 | ▃▁▇▁▇ |
| sd_change_arm2 | 881 | 0.02 | 5.73 | 3.05 | 0.81 | 3.98 | 6.10 | 6.50 | 12.65 | ▅▃▇▁▂ |
| n_change_arm2 | 881 | 0.02 | 32.32 | 22.38 | 10.00 | 15.00 | 32.00 | 40.00 | 102.00 | ▇▇▃▁▁ |
| precalc_g | 884 | 0.02 | 0.52 | 0.55 | -0.14 | 0.15 | 0.43 | 0.71 | 2.11 | ▇▇▃▁▁ |
| precalc_g_se | 884 | 0.02 | 0.26 | 0.14 | 0.08 | 0.15 | 0.21 | 0.29 | 0.60 | ▇▃▁▁▂ |
This dataset contains information from randomized controlled trials comparing psychotherapy to control conditions for depression. Each row is a comparison from a study.
Note: The API has a rate limit, so please don’t make rapid repeated requests. For this tutorial, you only need to run this once - the data will be stored in your R session.
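If you do pull the data from the API, one way to avoid repeated requests across sessions is to cache the download on disk. A minimal sketch, reusing the commented API code from above (the cache file name is just an example):

# Download once, then reuse the cached copy in later sessions
cache_file <- "metapsy-depression.rds"  # example file name
if (file.exists(cache_file)) {
  data <- readRDS(cache_file)
} else {
  api_url <- "http://api.metapsy.org/v1/get_data?shorthand=depression-psyctr&version=latest"
  api_response <- fromJSON(api_url)  # from jsonlite, loaded above
  data <- as.data.frame(api_response$data)
  saveRDS(data, cache_file)
}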
Validate Data Set
Validate your data set with the metapsyTools::checkDataFormat() function, as this ensures that the metaMultiverse package runs smoothly. See the documentation for the Metapsy data standard and how your data needs to be structured here.
Validate Data Format
data <- data %>%
  # Validate data structure for metaMultiverse's internal functions
  metaMultiverse::check_data_multiverse() %>%
  metapsyTools::checkDataFormat(
    must.contain = c(
      "study",
      "condition_arm1",
      "condition_arm2",
      "yi",
      "vi"),
    variable.class = list(
      yi = "numeric",
      vi = "numeric"))

Generated es_id column using row numbers (1 to 900).
Converted .g and .g_se to yi and vi for metaMultiverse compatibility.
Warning in metaMultiverse::check_data_multiverse(.): Found 34 unreasonably
large d detected (|d| > 2.5). Check if SD and SE were confused when calculating
SMD.
250 studies contribute only one effect size, 230 studies contribute multiple effect sizes.
Data validation passed. Dataset is ready for multiverse analysis.
- [OK] Data set contains all variables in 'must.contain'.
- [OK] 'yi' has desired class numeric.
- [OK] 'vi' has desired class numeric.
You could also use the online tool to validate the data here.
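The warning about unusually large effect sizes deserves a quick look before you move on. Since the validation step converted .g and .g_se to yi and vi, a small dplyr sketch to pull up the flagged comparisons could look like this:

# Inspect the comparisons flagged as unusually large (|d| > 2.5)
data %>%
  filter(abs(yi) > 2.5) %>%
  select(study, year, instrument, yi, vi) %>%
  arrange(desc(abs(yi)))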
Step 3: Understand the Data Structure
The dataset includes:
Effect sizes (.g, .g_se): How effective was the treatment?
Study characteristics: Sample size, year, country
Treatment details: Type of therapy, format, number of sessions
Population info: Age group, recruitment setting, comorbidity
Risk of bias: Quality ratings for each study
Let’s look at a few key variables:
# See what types of psychotherapy are included
condition_arm1_table <- as.data.frame(table(data$condition_arm1))
kable(condition_arm1_table,
col.names = c("Intervention Type", "Count"),
caption = "Types of psychotherapy interventions in the dataset")| Intervention Type | Count |
|---|---|
| 3rd | 67 |
| bat | 69 |
| cbt | 487 |
| dyn | 21 |
| ipt | 48 |
| lrt | 27 |
| other psy | 101 |
| pst | 53 |
| sup | 27 |
# See what types of control conditions are included
condition_arm2_table <- as.data.frame(table(data$condition_arm2))
kable(condition_arm2_table,
col.names = c("Control Condition", "Count"),
caption = "Types of control conditions in the dataset")| Control Condition | Count |
|---|---|
| cau | 354 |
| other ctr | 149 |
| wl | 397 |
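The validation output noted that many studies contribute more than one effect size, which is exactly why we will vary dependency handling in Step 5. You can confirm the split yourself with a couple of dplyr verbs:

# How many studies contribute one vs. several effect sizes?
data %>%
  count(study, name = "n_effect_sizes") %>%
  summarise(
    single_es   = sum(n_effect_sizes == 1),
    multiple_es = sum(n_effect_sizes > 1)
  )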
Step 4: Define Your Multiverse
Now comes the interesting part: defining which analytical decisions to vary. We’ll use the E/U/N framework (Del Giudice & Gangestad, 2021):
The E/U/N Decision Framework
Not all analytical decisions are created equal. The framework distinguishes three types:
Type E (Equivalent): Options are theoretically interchangeable for your research question.
Example: Age groups (adults vs. mixed) when studying a universal phenomenon
In multiverse: Creates variations within a single multiverse
Interpretation: All options are included; adds “total” option combining all levels
Type U (Uncertain): Unclear which option is methodologically “correct.”
Example: Risk of bias thresholds—exclude only high-risk studies, or also “some concerns”?
In multiverse: Creates variations within a single multiverse
Interpretation: Exploring sensitivity to debated methodological choices
Type N (Non-equivalent): Options address fundamentally different research questions.
Example: Post-treatment vs. follow-up outcomes represent different constructs
In multiverse: Creates separate multiverses analyzed independently
Interpretation: These shouldn’t be combined; report separately
Defining Analytical Choices
Let’s define some decisions for our depression intervention data:
multiverse_specs_example_1 <- data %>%
define_factors(
Age = "age_group|U",
Risk_of_Bias = list(
"rob",
decision = "U",
groups = list(
low_only = "4",
low_moderate = c("4", "3"),
all_studies = c("4", "3", "2", "1", "0")
)
)
)
[OK] Factor setup complete
========================================================
[*] Age (simple)
Column: age_group | Decision: U (Uncertain - will create multiverse options)
Levels: adul, old, olderold, yadul, adol&yadul, Not specified (+ all combined)
[*] Risk_of_Bias (custom)
Column: rob | Decision: U (Uncertain - will create multiverse options)
Groups:
- low_only: 4
- low_moderate: 4, 3
- all_studies: 4, 3, 2, 1, 0
========================================================
Total: 2 factors (1 simple, 1 custom)
# Display the factor setup as a formatted table
kable(multiverse_specs_example_1$factors,
caption = "Defined factors for multiverse analysis")| label | column | decision | wf_internal | grouping_type |
|---|---|---|---|---|
| Age | age_group | U | wf_1 | simple |
| Risk_of_Bias | rob | U | wf_2 | custom |
The |E or |U suffix tells the package whether the decision for that variable is Type E (equivalent) or Type U (uncertain).
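For completeness, a Type E factor uses the same shorthand with |E. As a hypothetical sketch (not run in this tutorial), if you considered delivery format theoretically interchangeable for your research question, you could write:

# Hypothetical: treat delivery format as a Type E (equivalent) decision
multiverse_specs_E_example <- data %>%
  define_factors(
    Format = "format|E",    # Type E: options treated as interchangeable
    Age    = "age_group|U"  # Type U: sensitivity to an uncertain choice
  )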
Step 5: Create Analysis Specifications
Now we tell the package to create all possible combinations of our decisions, paired with different meta-analytic methods:
# Create all combinations
multiverse_full_example_1 <- multiverse_specs_example_1 %>%
create_multiverse_specifications(
# Try different statistical models
ma_methods = c("reml", "p-uniform", "waap", "rve"),
# How to handle multiple comparisons from same study
dependencies = c("select_max", "aggregate", "modeled")
)
[*] Multiverse Specifications Created
========================================================
126 specifications
1 multiverse(s)
2 factors included
4 methods x 3 dependencies
# Display the number of specifications created
specs_summary <- data.frame(
Metric = "Total specifications created",
Value = nrow(multiverse_full_example_1$specifications)
)
kable(specs_summary,
caption = "Multiverse specification summary")| Metric | Value |
|---|---|
| Total specifications created | 126 |
This creates dozens or hundreds of unique meta-analyses, each with different combinations of inclusion criteria and statistical methods.
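Before running anything, you can inspect the specification grid itself; each row describes one fully specified meta-analysis:

# Peek at the first few specifications
head(multiverse_full_example_1$specifications)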
Step 6: Run the Multiverse Analysis
Now we run all these analyses. This might take a few minutes:
# Run all analyses
results_example_1 <- run_multiverse_analysis(multiverse_full_example_1)

# Create a formatted summary of the multiverse analysis results
n_failed <- results_example_1$n_attempted - results_example_1$n_successful
analysis_summary <- data.frame(
Metric = c("Total specifications", "Successful", "Failed", "Success rate"),
Value = c(
as.character(results_example_1$n_attempted),
as.character(results_example_1$n_successful),
as.character(n_failed),
paste0(round(100 * results_example_1$n_successful / results_example_1$n_attempted, 1), "%")
)
)
kable(analysis_summary,
caption = "Multiverse analysis execution summary")| Metric | Value |
|---|---|
| Total specifications | 126 |
| Successful | 90 |
| Failed | 36 |
| Success rate | 71.4% |
The package runs each meta-analysis and stores the results: effect size, confidence interval, p-value, and heterogeneity statistics.
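The per-specification results live in the results element of the returned object (Example 2 below accesses it the same way via multiverse_specs_example_2$results). To see exactly which columns your version of the package returns:

# Inspect the stored per-specification results
glimpse(results_example_1$results)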
Step 7: Visualize the Results
The real power comes from visualization. A specification curve shows how effect sizes vary across all your analytical choices:
# Plot specification curve
plot_spec_curve(results_example_1)

This plot shows:
Top panel: Effect size for each analysis (sorted by magnitude)
Bottom panels: Which analytical choices were used for each analysis
You can immediately see:
How much do results vary?
Are some choices driving the results?
Is the effect robust across specifications?
Another useful visualization is the Vibration of Effects (VoE) plot:
# Explore relationship between effect size and significance
plot_voe(results_example_1)

This shows the relationship between effect sizes and p-values across your multiverse.
Example 2: Type N Decisions - Separate Multiverses by Study Quality
In this example, we’ll use Type N decisions to create separate multiverses based on study quality. This is appropriate when different quality standards represent fundamentally different research questions.
Research question: How does the effectiveness of psychotherapy for depression vary when we apply different study quality criteria?
Since these represent different questions about evidence quality (not just sensitivity analyses), we use decision = "N":
multiverse_specs_example_2 <- data %>%
define_factors(
Age = "age_group|U",
# Type N: Each quality threshold is a separate research question
Risk_of_Bias = list(
"rob",
decision = "N", # Non-equivalent: creates separate multiverses
groups = list(
low_only = "4", # Only highest quality
low_moderate = c("4", "3"), # High + moderate quality
all_studies = c("4", "3", "2", "1", "0") # All studies
)
)
) %>%
create_multiverse_specifications(
ma_methods = c("reml", "p-uniform", "waap", "rve"),
dependencies = c("select_max", "aggregate", "modeled")
) %>%
  run_multiverse_analysis()

What’s different with Type N?
Creates 3 separate multiverses (one per quality threshold)
No “total_” option added (they shouldn’t be combined)
Each multiverse is analyzed independently
Results should be reported separately, not pooled
# Display the multiverse structure as a formatted table
multiverse_structure <- as.data.frame(table(multiverse_specs_example_2$results$multiverse_id))
kable(multiverse_structure,
col.names = c("Multiverse ID", "Number of Specifications"),
caption = "Separate multiverses by study quality threshold")| Multiverse ID | Number of Specifications |
|---|---|
| all_studies | 30 |
| low_moderate | 29 |
| low_only | 28 |
This shows we have three independent multiverses, each answering a distinct question about intervention effectiveness under different quality standards.
Visualize Quality-Stratified Results
# Plot specification curve showing all three multiverses
plot_spec_curve(multiverse_specs_example_2)

Interpreting N-type multiverses:
The specification curve now shows results colored by multiverse_id. You can see:
Whether effect sizes are consistent across quality thresholds
If stricter quality criteria lead to larger/smaller effects
The range of uncertainty within each quality tier
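To put numbers on these patterns, you can summarize the stored results per multiverse. A sketch, assuming the pooled estimate is stored in a column named b; check names(multiverse_specs_example_2$results) for the exact column names in your package version:

# Effect-size range within each quality-threshold multiverse
# NOTE: 'b' is an assumed column name for the pooled estimate
multiverse_specs_example_2$results %>%
  group_by(multiverse_id) %>%
  summarise(
    k_specs   = n(),
    min_es    = min(b, na.rm = TRUE),
    median_es = median(b, na.rm = TRUE),
    max_es    = max(b, na.rm = TRUE)
  )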
# VoE plot across multiverses
plot_voe(multiverse_specs_example_2)

Step 8: Interpret Your Results
Ask yourself:
How much do results vary? If all analyses point in the same direction with similar magnitudes, your conclusion is robust. If effect sizes range from positive to negative, your conclusion depends heavily on analytical choices.
Which choices matter most? Look at the bottom panels of the specification curve. Do certain inclusion criteria consistently produce larger or smaller effects?
Statistical significance: Do most analyses show significant effects, or does significance depend on which models you choose?
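If you want a single number for the significance question, you can compute the share of specifications below the conventional threshold. A sketch, assuming the per-specification p-value is stored in a column named p (adjust to whatever names(results_example_1$results) shows):

# Share of specifications with p < .05
# NOTE: 'p' is an assumed column name
results_example_1$results %>%
  summarise(prop_significant = mean(p < 0.05, na.rm = TRUE))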
What This Tells Us
Multiverse meta-analysis doesn’t give you “the answer”—it gives you transparency about how confident you should be in your answer. If your conclusion holds across most reasonable analytical choices, you can be more confident. If it’s sensitive to specific decisions, that’s important to report.
This approach moves us from “the effect size is X” to “across plausible specifications, effect sizes range from Y to Z, with most falling around X.”
Going Further
This tutorial covered the practical workflow for multiverse meta-analysis. The metaMultiverse package can do much more:
Include bias-adjustment methods (PET-PEESE, selection models)
Explore moderator effects across specifications
Export results for custom visualizations
Use the interactive Shiny app for exploration
Advanced custom factor groupings
Bayesian meta-analytic methods
Package Documentation
For quick start: See vignette("getting-started", package = "metaMultiverse") for a streamlined introduction focusing on essential workflow steps.
For comprehensive theoretical background: See vignette("multiverse-theory-practice", package = "metaMultiverse") for in-depth coverage of:
The E/U/N decision framework (Del Giudice & Gangestad, 2021)
Advanced interpretation and reporting guidelines
Troubleshooting and edge cases
Complete methodological reference
Resources
Key References:
Del Giudice, M., & Gangestad, S. W. (2021). A traveler’s guide to the multiverse. Advances in Methods and Practices in Psychological Science, 4(1), 1-15. DOI
Steegen, S., et al. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712. DOI
Voracek, M., Kossmeier, M., & Tran, U. S. (2019). Which data to meta-analyze, and how? Zeitschrift für Psychologie, 227(1), 64-82. DOI
Have questions or run into issues? Open an issue on GitHub or reach out.