Running Your First Multiverse Meta-Analysis
A step-by-step guide to exploring analytical robustness with the metaMultiverse package
Why Multiverse Meta-Analysis?
When conducting a meta-analysis, researchers face dozens of decisions: Which studies to include? How to handle outliers? Which statistical model to use? Each choice seems reasonable, but different choices can lead to different conclusions.
Rather than making one set of arbitrary decisions, multiverse meta-analysis systematically explores how different reasonable analytical choices affect your conclusions. This approach transforms researcher degrees of freedom from a source of concern into a tool for understanding robustness.
This tutorial walks you through running your first multiverse meta-analysis using real data from the Metapsy database.
What You’ll Need
Basic R knowledge is helpful, but I’ll explain each step. You’ll need:
R (version 4.0 or higher)
RStudio (recommended but not required)
About 30 minutes
Step 1: Install the Package
First, we need to install metaMultiverse for running the multiverse analysis and metapsyTools for validating the data format. We’ll get the data directly from the Metapsy API (no extra package needed for that!).
Download metaMultiverse

# Install devtools if you don't have it
if (!require("devtools")) install.packages("devtools")

# Install metaMultiverse from GitHub
devtools::install_github("cyplessen/metaMultiverse",
                         force = TRUE,
                         upgrade = "never")

Download metapsyTools

# You could also use remotes instead of devtools:
if (!require("remotes")) install.packages("remotes")

remotes::install_github("metapsy-project/metapsyTools")

Now load the packages we’ll need:
library(metaMultiverse)
library(metapsyTools)
library(dplyr) # for data manipulation
library(jsonlite) # for reading API data
library(knitr)     # for formatted tables

Step 2: Get the Data from Metapsy
Metapsy maintains databases of psychotherapy trials across different mental health conditions. For this guide, we’ll use their depression psychotherapy database. You can either download it directly via their API or work from a saved copy; I use a saved copy here so I don’t put unnecessary load on their service :)

Loading a saved .csv copy looks like this (note that read.csv2() expects semicolon-separated files; use read.csv() for comma-separated ones):
data <- read.csv2("data-guide.csv")
# You can get the depression psychotherapy database via API like this:
# The API requires 'shorthand' parameter and optional 'version' (defaults to 'latest')
# api_url <- "http://api.metapsy.org/v1/get_data?shorthand=depression-psyctr&version=latest"
# api_response <- fromJSON(api_url)
#
# # Extract the data
# data <- as.data.frame(api_response$data)
#
# Take a look at what we have
skimr::skim(data)

| Data summary | |
|---|---|
| Name | data |
| Number of rows | 900 |
| Number of columns | 69 |
| Column type frequency: character | 22 |
| Column type frequency: numeric | 47 |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| study | 0 | 1.00 | 8 | 25 | 0 | 480 | 0 |
| condition_arm1 | 0 | 1.00 | 3 | 9 | 0 | 9 | 0 |
| condition_arm2 | 0 | 1.00 | 2 | 9 | 0 | 3 | 0 |
| outcome_type | 0 | 1.00 | 3 | 31 | 0 | 14 | 0 |
| instrument | 0 | 1.00 | 2 | 25 | 0 | 70 | 0 |
| rating | 0 | 1.00 | 9 | 11 | 0 | 2 | 0 |
| time | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
| comorbid_mental | 0 | 1.00 | 1 | 1 | 0 | 2 | 0 |
| format | 0 | 1.00 | 3 | 5 | 0 | 8 | 0 |
| format_details | 5 | 0.99 | 3 | 43 | 0 | 37 | 0 |
| country | 0 | 1.00 | 2 | 3 | 0 | 7 | 0 |
| age_group | 2 | 1.00 | 3 | 13 | 0 | 6 | 0 |
| recruitment | 0 | 1.00 | 3 | 4 | 0 | 3 | 0 |
| diagnosis | 0 | 1.00 | 3 | 4 | 0 | 6 | 0 |
| target_group | 0 | 1.00 | 3 | 11 | 0 | 13 | 0 |
| ba | 0 | 1.00 | 1 | 2 | 0 | 4 | 0 |
| full_ref | 0 | 1.00 | 115 | 532 | 0 | 481 | 0 |
| .id | 39 | 0.96 | 48 | 113 | 0 | 861 | 0 |
| multi_arm1 | 553 | 0.39 | 3 | 63 | 0 | 143 | 0 |
| multi_arm2 | 553 | 0.39 | 2 | 24 | 0 | 9 | 0 |
| dich_paper | 843 | 0.06 | 1 | 113 | 0 | 44 | 0 |
| other_stat | 894 | 0.01 | 38 | 121 | 0 | 6 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| X | 0 | 1.00 | 450.50 | 259.95 | 1.00 | 225.75 | 450.50 | 675.25 | 900.00 | ▇▇▇▇▇ |
| .g | 0 | 1.00 | 0.81 | 0.77 | -0.70 | 0.32 | 0.65 | 1.06 | 5.24 | ▆▇▁▁▁ |
| .g_se | 0 | 1.00 | 0.31 | 0.13 | 0.07 | 0.21 | 0.28 | 0.39 | 0.95 | ▇▇▃▁▁ |
| mean_arm1 | 83 | 0.91 | 13.32 | 10.11 | 0.54 | 8.11 | 11.30 | 15.50 | 89.10 | ▇▁▁▁▁ |
| sd_arm1 | 90 | 0.90 | 7.05 | 4.10 | 0.13 | 4.60 | 6.38 | 8.90 | 59.86 | ▇▁▁▁▁ |
| n_arm1 | 84 | 0.91 | 43.69 | 47.31 | 5.96 | 16.00 | 29.00 | 49.00 | 418.00 | ▇▁▁▁▁ |
| mean_arm2 | 83 | 0.91 | 18.85 | 11.71 | 0.90 | 12.00 | 17.11 | 22.67 | 85.80 | ▇▆▁▁▁ |
| sd_arm2 | 90 | 0.90 | 7.62 | 4.40 | 0.13 | 5.00 | 7.00 | 9.50 | 51.00 | ▇▂▁▁▁ |
| n_arm2 | 84 | 0.91 | 43.51 | 49.04 | 4.00 | 14.75 | 30.00 | 50.00 | 514.00 | ▇▁▁▁▁ |
| baseline_m_arm1 | 97 | 0.89 | 22.22 | 12.02 | 0.60 | 15.60 | 20.60 | 26.78 | 94.20 | ▇▇▁▁▁ |
| baseline_sd_arm1 | 104 | 0.88 | 6.20 | 3.67 | 0.12 | 4.15 | 5.70 | 7.90 | 56.60 | ▇▁▁▁▁ |
| baseline_n_arm1 | 93 | 0.90 | 47.90 | 55.70 | 6.00 | 18.00 | 32.00 | 54.00 | 614.00 | ▇▁▁▁▁ |
| baseline_m_arm2 | 97 | 0.89 | 22.16 | 12.05 | 1.00 | 15.53 | 20.60 | 26.90 | 88.31 | ▆▇▁▁▁ |
| baseline_sd_arm2 | 104 | 0.88 | 6.23 | 3.75 | 0.11 | 4.10 | 5.69 | 8.00 | 68.69 | ▇▁▁▁▁ |
| baseline_n_arm2 | 91 | 0.90 | 46.85 | 56.05 | 6.00 | 15.00 | 31.00 | 53.00 | 635.00 | ▇▁▁▁▁ |
| rand_arm1 | 49 | 0.95 | 51.84 | 57.21 | 7.00 | 20.00 | 36.00 | 60.50 | 578.00 | ▇▁▁▁▁ |
| rand_arm2 | 45 | 0.95 | 50.26 | 57.23 | 6.00 | 19.00 | 35.00 | 59.00 | 562.00 | ▇▁▁▁▁ |
| attr_arm1 | 93 | 0.90 | 10.28 | 17.27 | 0.00 | 2.00 | 6.00 | 12.00 | 241.00 | ▇▁▁▁▁ |
| attr_arm2 | 90 | 0.90 | 8.68 | 14.60 | 0.00 | 1.00 | 4.00 | 11.00 | 217.00 | ▇▁▁▁▁ |
| rand_ratio | 0 | 1.00 | 1.04 | 0.22 | 1.00 | 1.00 | 1.00 | 1.00 | 3.00 | ▇▁▁▁▁ |
| year | 0 | 1.00 | 2009.36 | 12.08 | 1977.00 | 2004.00 | 2013.00 | 2018.00 | 2024.00 | ▂▂▂▅▇ |
| time_weeks | 159 | 0.82 | 8.80 | 10.23 | 0.00 | 0.00 | 8.00 | 13.00 | 78.00 | ▇▂▁▁▁ |
| percent_women | 15 | 0.98 | 0.72 | 0.21 | 0.00 | 0.62 | 0.74 | 0.85 | 1.00 | ▁▁▃▇▆ |
| sg | 0 | 1.00 | 0.60 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
| ac | 0 | 1.00 | 0.47 | 0.50 | 0.00 | 0.00 | 0.00 | 1.00 | 1.00 | ▇▁▁▁▇ |
| itt | 0 | 1.00 | 0.60 | 0.49 | 0.00 | 0.00 | 1.00 | 1.00 | 1.00 | ▅▁▁▁▇ |
| rob | 0 | 1.00 | 2.63 | 1.22 | 0.00 | 1.00 | 3.00 | 4.00 | 4.00 | ▁▆▃▆▇ |
| n_sessions_arm1 | 3 | 1.00 | 9.09 | 5.33 | 1.00 | 6.00 | 8.00 | 11.00 | 60.00 | ▇▂▁▁▁ |
| mean_age | 37 | 0.96 | 43.79 | 14.13 | 18.00 | 35.00 | 41.73 | 50.84 | 81.94 | ▃▇▅▂▂ |
| event_arm1 | 846 | 0.06 | 34.94 | 79.03 | 7.00 | 13.00 | 16.50 | 25.75 | 576.00 | ▇▁▁▁▁ |
| event_arm2 | 846 | 0.06 | 23.37 | 56.23 | 1.00 | 4.00 | 9.50 | 18.00 | 402.00 | ▇▁▁▁▁ |
| totaln_arm1 | 845 | 0.06 | 60.27 | 95.43 | 15.00 | 22.00 | 34.00 | 50.50 | 578.00 | ▇▁▁▁▁ |
| totaln_arm2 | 845 | 0.06 | 58.65 | 94.46 | 10.00 | 18.00 | 30.00 | 61.00 | 562.00 | ▇▁▁▁▁ |
| .log_rr | 841 | 0.07 | 0.63 | 0.57 | -0.24 | 0.23 | 0.55 | 1.00 | 2.40 | ▆▇▅▁▁ |
| .log_rr_se | 841 | 0.07 | 0.34 | 0.21 | 0.03 | 0.21 | 0.29 | 0.45 | 1.02 | ▆▇▃▂▁ |
| .event_arm1 | 841 | 0.07 | 36.07 | 76.69 | 7.00 | 13.00 | 18.00 | 27.50 | 576.00 | ▇▁▁▁▁ |
| .event_arm2 | 841 | 0.07 | 24.36 | 54.44 | 1.00 | 4.00 | 12.00 | 21.50 | 402.00 | ▇▁▁▁▁ |
| .totaln_arm1 | 841 | 0.07 | 60.92 | 93.83 | 15.00 | 22.50 | 34.00 | 48.50 | 578.00 | ▇▁▁▁▁ |
| .totaln_arm2 | 841 | 0.07 | 59.25 | 93.05 | 10.00 | 19.00 | 34.00 | 57.00 | 562.00 | ▇▁▁▁▁ |
| mean_change_arm1 | 881 | 0.02 | -8.74 | 3.68 | -17.50 | -9.45 | -8.30 | -7.50 | -0.65 | ▁▁▇▁▁ |
| sd_change_arm1 | 881 | 0.02 | 6.45 | 3.18 | 0.80 | 4.46 | 6.30 | 8.97 | 11.81 | ▅▁▇▁▅ |
| n_change_arm1 | 881 | 0.02 | 33.11 | 21.60 | 12.00 | 15.50 | 29.00 | 40.00 | 101.00 | ▇▃▂▁▁ |
| mean_change_arm2 | 881 | 0.02 | -4.94 | 3.35 | -11.10 | -6.66 | -5.59 | -1.90 | -0.53 | ▃▁▇▁▇ |
| sd_change_arm2 | 881 | 0.02 | 5.73 | 3.05 | 0.81 | 3.98 | 6.10 | 6.50 | 12.65 | ▅▃▇▁▂ |
| n_change_arm2 | 881 | 0.02 | 32.32 | 22.38 | 10.00 | 15.00 | 32.00 | 40.00 | 102.00 | ▇▇▃▁▁ |
| precalc_g | 884 | 0.02 | 0.52 | 0.55 | -0.14 | 0.15 | 0.43 | 0.71 | 2.11 | ▇▇▃▁▁ |
| precalc_g_se | 884 | 0.02 | 0.26 | 0.14 | 0.08 | 0.15 | 0.21 | 0.29 | 0.60 | ▇▃▁▁▂ |
This dataset contains information from randomized controlled trials comparing psychotherapy to control conditions for depression. Each row is a comparison from a study.
Note: The API has a rate limit, so please don’t make rapid repeated requests. For this tutorial, you only need to run this once - the data will be stored in your R session.
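If you do pull the data from the API, one way to avoid repeated requests across sessions is to cache the download on disk. A minimal sketch, reusing the commented API code from above (the cache file name is just an example):

# Download once, then reuse the cached copy in later sessions
cache_file <- "metapsy-depression.rds"  # example file name
if (file.exists(cache_file)) {
  data <- readRDS(cache_file)
} else {
  api_url <- "http://api.metapsy.org/v1/get_data?shorthand=depression-psyctr&version=latest"
  api_response <- fromJSON(api_url)  # from jsonlite, loaded above
  data <- as.data.frame(api_response$data)
  saveRDS(data, cache_file)
}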
Validate Data Set
Validate your data set with the metapsyTools::checkDataFormat() function, as this ensures that the metaMultiverse package runs smoothly. See the documentation for the Metapsy data standard and how your data needs to be structured here.
Validate Data Format
data <- data %>%
  # Validate data structure for metaMultiverse's internal functions
  metaMultiverse::check_data_multiverse() %>%
  metapsyTools::checkDataFormat(
    must.contain = c(
      "study",
      "condition_arm1",
      "condition_arm2",
      "yi",
      "vi"),
    variable.class = list(
      yi = "numeric",
      vi = "numeric"))

Generated es_id column using row numbers (1 to 900).
Converted .g and .g_se to yi and vi for metaMultiverse compatibility.
Warning in metaMultiverse::check_data_multiverse(.): Found 34 unreasonably
large d detected (|d| > 2.5). Check if SD and SE were confused when calculating
SMD.
250 studies contribute only one effect size, 230 studies contribute multiple effect sizes.
Data validation passed. Dataset is ready for multiverse analysis.
- [OK] Data set contains all variables in 'must.contain'.
- [OK] 'yi' has desired class numeric.
- [OK] 'vi' has desired class numeric.
You could also use the online tool to validate the data here.
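The warning about unusually large effect sizes deserves a quick look before you move on. Since the validation step converted .g and .g_se to yi and vi, a small dplyr sketch to pull up the flagged comparisons could look like this:

# Inspect the comparisons flagged as unusually large (|d| > 2.5)
data %>%
  filter(abs(yi) > 2.5) %>%
  select(study, year, instrument, yi, vi) %>%
  arrange(desc(abs(yi)))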
Step 3: Understand the Data Structure
The dataset includes:
Effect sizes (.g, .g_se): How effective was the treatment?
Study characteristics: Sample size, year, country
Treatment details: Type of therapy, format, number of sessions
Population info: Age group, recruitment setting, comorbidity
Risk of bias: Quality ratings for each study
Let’s look at a few key variables:
# See what types of psychotherapy are included
condition_arm1_table <- as.data.frame(table(data$condition_arm1))
kable(condition_arm1_table,
col.names = c("Intervention Type", "Count"),
caption = "Types of psychotherapy interventions in the dataset")| Intervention Type | Count |
|---|---|
| 3rd | 67 |
| bat | 69 |
| cbt | 487 |
| dyn | 21 |
| ipt | 48 |
| lrt | 27 |
| other psy | 101 |
| pst | 53 |
| sup | 27 |
# See what types of control conditions are included
condition_arm2_table <- as.data.frame(table(data$condition_arm2))
kable(condition_arm2_table,
col.names = c("Control Condition", "Count"),
caption = "Types of control conditions in the dataset")| Control Condition | Count |
|---|---|
| cau | 354 |
| other ctr | 149 |
| wl | 397 |
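The validation output noted that many studies contribute more than one effect size, which is exactly why we will vary dependency handling in Step 5. You can confirm the split yourself with a couple of dplyr verbs:

# How many studies contribute one vs. several effect sizes?
data %>%
  count(study, name = "n_effect_sizes") %>%
  summarise(
    single_es   = sum(n_effect_sizes == 1),
    multiple_es = sum(n_effect_sizes > 1)
  )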
Step 4: Define Your Multiverse
Now comes the interesting part: defining which analytical decisions to vary. We’ll use the E/U/N framework (Del Giudice & Gangestad, 2021):
The E/U/N Decision Framework
Not all analytical decisions are created equal. The framework distinguishes three types:
Type E (Equivalent): Options are theoretically interchangeable for your research question.
Example: Age groups (adults vs. mixed) when studying a universal phenomenon
In multiverse: Creates variations within a single multiverse
Interpretation: All options are included; adds “total” option combining all levels
Type U (Uncertain): Unclear which option is methodologically “correct.”
Example: Risk of bias thresholds—exclude only high-risk studies, or also “some concerns”?
In multiverse: Creates variations within a single multiverse
Interpretation: Exploring sensitivity to debated methodological choices
Type N (Non-equivalent): Options address fundamentally different research questions.
Example: Post-treatment vs. follow-up outcomes represent different constructs
In multiverse: Creates separate multiverses analyzed independently
Interpretation: These shouldn’t be combined; report separately
Defining Analytical Choices
Let’s define some decisions for our depression intervention data:
multiverse_specs_example_1 <- data %>%
define_factors(
Age = "age_group|U",
Risk_of_Bias = list(
"rob",
decision = "U",
groups = list(
low_only = "4",
low_moderate = c("4", "3"),
all_studies = c("4", "3", "2", "1", "0")
)
)
)
[OK] Factor setup complete
========================================================
[*] Age (simple)
Column: age_group | Decision: U (Uncertain - will create multiverse options)
Levels: adul, old, olderold, yadul, adol&yadul, Not specified (+ all combined)
[*] Risk_of_Bias (custom)
Column: rob | Decision: U (Uncertain - will create multiverse options)
Groups:
- low_only: 4
- low_moderate: 4, 3
- all_studies: 4, 3, 2, 1, 0
========================================================
Total: 2 factors (1 simple, 1 custom)
# Display the factor setup as a formatted table
kable(multiverse_specs_example_1$factors,
caption = "Defined factors for multiverse analysis")| label | column | decision | wf_internal | grouping_type |
|---|---|---|---|---|
| Age | age_group | U | wf_1 | simple |
| Risk_of_Bias | rob | U | wf_2 | custom |
The |E or |U suffix tells the package whether the decision for that variable is Type E (equivalent) or Type U (uncertain).
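For completeness, a Type E factor uses the same shorthand with |E. As a hypothetical sketch (not run in this tutorial), if you considered delivery format theoretically interchangeable for your research question, you could write:

# Hypothetical: treat delivery format as a Type E (equivalent) decision
multiverse_specs_E_example <- data %>%
  define_factors(
    Format = "format|E",    # Type E: options treated as interchangeable
    Age    = "age_group|U"  # Type U: sensitivity to an uncertain choice
  )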
Step 5: Create Analysis Specifications
Now we tell the package to create all possible combinations of our decisions, paired with different meta-analytic methods:
# Create all combinations
multiverse_full_example_1 <- multiverse_specs_example_1 %>%
create_multiverse_specifications(
# Try different statistical models
ma_methods = c("reml", "p-uniform", "waap", "rve"),
# How to handle multiple comparisons from same study
dependencies = c("select_max", "aggregate", "modeled")
)
[*] Multiverse Specifications Created
========================================================
126 specifications
1 multiverse(s)
2 factors included
4 methods x 3 dependencies
# Display the number of specifications created
specs_summary <- data.frame(
Metric = "Total specifications created",
Value = nrow(multiverse_full_example_1$specifications)
)
kable(specs_summary,
caption = "Multiverse specification summary")| Metric | Value |
|---|---|
| Total specifications created | 126 |
This creates dozens or hundreds of unique meta-analyses, each with different combinations of inclusion criteria and statistical methods.
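Before running anything, you can inspect the specification grid itself; each row describes one fully specified meta-analysis:

# Peek at the first few specifications
head(multiverse_full_example_1$specifications)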
Step 6: Run the Multiverse Analysis
Now we run all these analyses. This might take a few minutes:
# Run all analyses
results_example_1 <- run_multiverse_analysis(multiverse_full_example_1)

# Create a formatted summary of the multiverse analysis results
n_failed <- results_example_1$n_attempted - results_example_1$n_successful
analysis_summary <- data.frame(
Metric = c("Total specifications", "Successful", "Failed", "Success rate"),
Value = c(
as.character(results_example_1$n_attempted),
as.character(results_example_1$n_successful),
as.character(n_failed),
paste0(round(100 * results_example_1$n_successful / results_example_1$n_attempted, 1), "%")
)
)
kable(analysis_summary,
caption = "Multiverse analysis execution summary")| Metric | Value |
|---|---|
| Total specifications | 126 |
| Successful | 90 |
| Failed | 36 |
| Success rate | 71.4% |
The package runs each meta-analysis and stores the results: effect size, confidence interval, p-value, and heterogeneity statistics.
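The per-specification results live in the results element of the returned object (Example 2 below accesses it the same way via multiverse_specs_example_2$results). To see exactly which columns your version of the package returns:

# Inspect the stored per-specification results
glimpse(results_example_1$results)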
Step 7: Visualize the Results
The real power comes from visualization. A specification curve shows how effect sizes vary across all your analytical choices:
# Plot specification curve
plot_spec_curve(results_example_1)

This plot shows:
Top panel: Effect size for each analysis (sorted by magnitude)
Bottom panels: Which analytical choices were used for each analysis
You can immediately see:
How much do results vary?
Are some choices driving the results?
Is the effect robust across specifications?
Another useful visualization is the Vibration of Effects (VoE) plot:
# Explore relationship between effect size and significance
plot_voe(results_example_1)

This shows the relationship between effect sizes and p-values across your multiverse.
Example 2: Type N Decisions - Separate Multiverses by Study Quality
In this example, we’ll use Type N decisions to create separate multiverses based on study quality. This is appropriate when different quality standards represent fundamentally different research questions.
Research question: How does the effectiveness of psychotherapy for depression vary when we apply different study quality criteria?
Since these represent different questions about evidence quality (not just sensitivity analyses), we use decision = "N":
multiverse_specs_example_2 <- data %>%
define_factors(
Age = "age_group|U",
# Type N: Each quality threshold is a separate research question
Risk_of_Bias = list(
"rob",
decision = "N", # Non-equivalent: creates separate multiverses
groups = list(
low_only = "4", # Only highest quality
low_moderate = c("4", "3"), # High + moderate quality
all_studies = c("4", "3", "2", "1", "0") # All studies
)
)
) %>%
create_multiverse_specifications(
ma_methods = c("reml", "p-uniform", "waap", "rve"),
dependencies = c("select_max", "aggregate", "modeled")
) %>%
  run_multiverse_analysis()

What’s different with Type N?
Creates 3 separate multiverses (one per quality threshold)
No “total_” option added (they shouldn’t be combined)
Each multiverse is analyzed independently
Results should be reported separately, not pooled
# Display the multiverse structure as a formatted table
multiverse_structure <- as.data.frame(table(multiverse_specs_example_2$results$multiverse_id))
kable(multiverse_structure,
col.names = c("Multiverse ID", "Number of Specifications"),
caption = "Separate multiverses by study quality threshold")| Multiverse ID | Number of Specifications |
|---|---|
| all_studies | 30 |
| low_moderate | 29 |
| low_only | 28 |
This shows we have three independent multiverses, each answering a distinct question about intervention effectiveness under different quality standards.
Visualize Quality-Stratified Results
# Plot specification curve showing all three multiverses
plot_spec_curve(multiverse_specs_example_2)

Interpreting N-type multiverses:
The specification curve now shows results colored by multiverse_id. You can see:
Whether effect sizes are consistent across quality thresholds
If stricter quality criteria lead to larger/smaller effects
The range of uncertainty within each quality tier
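To put numbers on these patterns, you can summarize the stored results per multiverse. A sketch, assuming the pooled estimate is stored in a column named b; check names(multiverse_specs_example_2$results) for the exact column names in your package version:

# Effect-size range within each quality-threshold multiverse
# NOTE: 'b' is an assumed column name for the pooled estimate
multiverse_specs_example_2$results %>%
  group_by(multiverse_id) %>%
  summarise(
    k_specs   = n(),
    min_es    = min(b, na.rm = TRUE),
    median_es = median(b, na.rm = TRUE),
    max_es    = max(b, na.rm = TRUE)
  )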
# VoE plot across multiverses
plot_voe(multiverse_specs_example_2)

Step 8: Interpret Your Results
Ask yourself:
How much do results vary? If all analyses point in the same direction with similar magnitudes, your conclusion is robust. If effect sizes range from positive to negative, your conclusion depends heavily on analytical choices.
Which choices matter most? Look at the bottom panels of the specification curve. Do certain inclusion criteria consistently produce larger or smaller effects?
Statistical significance: Do most analyses show significant effects, or does significance depend on which models you choose?
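If you want a single number for the significance question, you can compute the share of specifications below the conventional threshold. A sketch, assuming the per-specification p-value is stored in a column named p (adjust to whatever names(results_example_1$results) shows):

# Share of specifications with p < .05
# NOTE: 'p' is an assumed column name
results_example_1$results %>%
  summarise(prop_significant = mean(p < 0.05, na.rm = TRUE))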
What This Tells Us
Multiverse meta-analysis doesn’t give you “the answer”—it gives you transparency about how confident you should be in your answer. If your conclusion holds across most reasonable analytical choices, you can be more confident. If it’s sensitive to specific decisions, that’s important to report.
This approach moves us from “the effect size is X” to “across plausible specifications, effect sizes range from Y to Z, with most falling around X.”
Going Further
This tutorial covered the practical workflow for multiverse meta-analysis. The metaMultiverse package can do much more:
Include bias-adjustment methods (PET-PEESE, selection models)
Explore moderator effects across specifications
Export results for custom visualizations
Use the interactive Shiny app for exploration
Advanced custom factor groupings
Bayesian meta-analytic methods
Package Documentation
For quick start: See vignette("getting-started", package = "metaMultiverse") for a streamlined introduction focusing on essential workflow steps.
For comprehensive theoretical background: See vignette("multiverse-theory-practice", package = "metaMultiverse") for in-depth coverage of:
The E/U/N decision framework (Del Giudice & Gangestad, 2021)
Advanced interpretation and reporting guidelines
Troubleshooting and edge cases
Complete methodological reference
Resources
Key References:
Del Giudice, M., & Gangestad, S. W. (2021). A traveler’s guide to the multiverse. Advances in Methods and Practices in Psychological Science, 4(1), 1-15. DOI
Steegen, S., et al. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712. DOI
Voracek, M., Kossmeier, M., & Tran, U. S. (2019). Which data to meta-analyze, and how? Zeitschrift für Psychologie, 227(1), 64-82. DOI
Have questions or run into issues? Open an issue on GitHub or reach out.