
Running Your First Multiverse Meta-Analysis

A step-by-step guide to exploring analytical robustness with the metaMultiverse package

Tags: tutorial, meta-analysis, R, multiverse

Published: October 21, 2025

Why Multiverse Meta-Analysis?

When conducting a meta-analysis, researchers face dozens of decisions: Which studies to include? How to handle outliers? Which statistical model to use? Each choice seems reasonable, but different choices can lead to different conclusions.

Rather than making one set of arbitrary decisions, multiverse meta-analysis systematically explores how different reasonable analytical choices affect your conclusions. This approach transforms researcher degrees of freedom from a source of concern into a tool for understanding robustness.

This tutorial walks you through running your first multiverse meta-analysis using real data from the Metapsy database.

What You’ll Need

Basic R knowledge is helpful, but I’ll explain each step. You’ll need:

  • R (version 4.0 or higher; see the quick check below)

  • RStudio (recommended but not required)

  • About 30 minutes
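
If you are not sure which R version you are running, a quick check from the console settles it:

# Check the installed R version (this tutorial assumes 4.0 or higher)
R.version.string
getRversion() >= "4.0.0"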


Step 1: Install the Package


First, we need to install metaMultiverse for running the multiverse analysis. We’ll get the data directly from the Metapsy API (no package installation needed for that!).

Download metaMultiverse

# Install devtools if you don't have it
if (!require("devtools")) install.packages("devtools")

# Install metaMultiverse from GitHub
devtools::install_github("cyplessen/metaMultiverse", 
                         force = TRUE, 
                         upgrade = "never")

Download metapsyTools

# You could also use remotes instead of devtools:
if (!require("remotes"))
  install.packages("remotes")

remotes::install_github(
  "metapsy-project/metapsyTools")

Now load the packages we’ll need:

library(metaMultiverse)
library(metapsyTools)

library(dplyr)     # for data manipulation
library(jsonlite)  # for reading API data
library(knitr)     # for formatted tables

Step 2: Get the Data from Metapsy

Metapsy maintains databases of psychotherapy trials across different mental health conditions. For this guide, you can either use their API to download the data on psychotherapy for depression or load a local copy; I have already downloaded the data so I don't put unnecessary load on their service :)

You could also download their data as a .csv file and load it into R as below:

# Load a local copy of the data (read.csv2() expects semicolon-separated values)
data <- read.csv2("data-guide.csv")
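
Before moving on, it's worth a quick sanity check that the file parsed correctly, since a wrong separator can silently produce a one-column data frame. Per the summary below, the depression database should have 900 rows and 69 columns:

# Quick sanity check: the dataset used in this guide has 900 rows and 69 columns
dim(data)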

# You can get the depression psychotherapy database via API like this:
# The API requires 'shorthand' parameter and optional 'version' (defaults to 'latest')
# api_url <- "http://api.metapsy.org/v1/get_data?shorthand=depression-psyctr&version=latest"
# api_response <- fromJSON(api_url)
# 
# # Extract the data
# data <- as.data.frame(api_response$data)
# 
# Take a look at what we have
skimr::skim(data)
Data summary
Name data
Number of rows 900
Number of columns 69
_______________________
Column type frequency:
character 22
numeric 47
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
study 0 1.00 8 25 0 480 0
condition_arm1 0 1.00 3 9 0 9 0
condition_arm2 0 1.00 2 9 0 3 0
outcome_type 0 1.00 3 31 0 14 0
instrument 0 1.00 2 25 0 70 0
rating 0 1.00 9 11 0 2 0
time 0 1.00 4 4 0 1 0
comorbid_mental 0 1.00 1 1 0 2 0
format 0 1.00 3 5 0 8 0
format_details 5 0.99 3 43 0 37 0
country 0 1.00 2 3 0 7 0
age_group 2 1.00 3 13 0 6 0
recruitment 0 1.00 3 4 0 3 0
diagnosis 0 1.00 3 4 0 6 0
target_group 0 1.00 3 11 0 13 0
ba 0 1.00 1 2 0 4 0
full_ref 0 1.00 115 532 0 481 0
.id 39 0.96 48 113 0 861 0
multi_arm1 553 0.39 3 63 0 143 0
multi_arm2 553 0.39 2 24 0 9 0
dich_paper 843 0.06 1 113 0 44 0
other_stat 894 0.01 38 121 0 6 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
X 0 1.00 450.50 259.95 1.00 225.75 450.50 675.25 900.00 ▇▇▇▇▇
.g 0 1.00 0.81 0.77 -0.70 0.32 0.65 1.06 5.24 ▆▇▁▁▁
.g_se 0 1.00 0.31 0.13 0.07 0.21 0.28 0.39 0.95 ▇▇▃▁▁
mean_arm1 83 0.91 13.32 10.11 0.54 8.11 11.30 15.50 89.10 ▇▁▁▁▁
sd_arm1 90 0.90 7.05 4.10 0.13 4.60 6.38 8.90 59.86 ▇▁▁▁▁
n_arm1 84 0.91 43.69 47.31 5.96 16.00 29.00 49.00 418.00 ▇▁▁▁▁
mean_arm2 83 0.91 18.85 11.71 0.90 12.00 17.11 22.67 85.80 ▇▆▁▁▁
sd_arm2 90 0.90 7.62 4.40 0.13 5.00 7.00 9.50 51.00 ▇▂▁▁▁
n_arm2 84 0.91 43.51 49.04 4.00 14.75 30.00 50.00 514.00 ▇▁▁▁▁
baseline_m_arm1 97 0.89 22.22 12.02 0.60 15.60 20.60 26.78 94.20 ▇▇▁▁▁
baseline_sd_arm1 104 0.88 6.20 3.67 0.12 4.15 5.70 7.90 56.60 ▇▁▁▁▁
baseline_n_arm1 93 0.90 47.90 55.70 6.00 18.00 32.00 54.00 614.00 ▇▁▁▁▁
baseline_m_arm2 97 0.89 22.16 12.05 1.00 15.53 20.60 26.90 88.31 ▆▇▁▁▁
baseline_sd_arm2 104 0.88 6.23 3.75 0.11 4.10 5.69 8.00 68.69 ▇▁▁▁▁
baseline_n_arm2 91 0.90 46.85 56.05 6.00 15.00 31.00 53.00 635.00 ▇▁▁▁▁
rand_arm1 49 0.95 51.84 57.21 7.00 20.00 36.00 60.50 578.00 ▇▁▁▁▁
rand_arm2 45 0.95 50.26 57.23 6.00 19.00 35.00 59.00 562.00 ▇▁▁▁▁
attr_arm1 93 0.90 10.28 17.27 0.00 2.00 6.00 12.00 241.00 ▇▁▁▁▁
attr_arm2 90 0.90 8.68 14.60 0.00 1.00 4.00 11.00 217.00 ▇▁▁▁▁
rand_ratio 0 1.00 1.04 0.22 1.00 1.00 1.00 1.00 3.00 ▇▁▁▁▁
year 0 1.00 2009.36 12.08 1977.00 2004.00 2013.00 2018.00 2024.00 ▂▂▂▅▇
time_weeks 159 0.82 8.80 10.23 0.00 0.00 8.00 13.00 78.00 ▇▂▁▁▁
percent_women 15 0.98 0.72 0.21 0.00 0.62 0.74 0.85 1.00 ▁▁▃▇▆
sg 0 1.00 0.60 0.49 0.00 0.00 1.00 1.00 1.00 ▅▁▁▁▇
ac 0 1.00 0.47 0.50 0.00 0.00 0.00 1.00 1.00 ▇▁▁▁▇
itt 0 1.00 0.60 0.49 0.00 0.00 1.00 1.00 1.00 ▅▁▁▁▇
rob 0 1.00 2.63 1.22 0.00 1.00 3.00 4.00 4.00 ▁▆▃▆▇
n_sessions_arm1 3 1.00 9.09 5.33 1.00 6.00 8.00 11.00 60.00 ▇▂▁▁▁
mean_age 37 0.96 43.79 14.13 18.00 35.00 41.73 50.84 81.94 ▃▇▅▂▂
event_arm1 846 0.06 34.94 79.03 7.00 13.00 16.50 25.75 576.00 ▇▁▁▁▁
event_arm2 846 0.06 23.37 56.23 1.00 4.00 9.50 18.00 402.00 ▇▁▁▁▁
totaln_arm1 845 0.06 60.27 95.43 15.00 22.00 34.00 50.50 578.00 ▇▁▁▁▁
totaln_arm2 845 0.06 58.65 94.46 10.00 18.00 30.00 61.00 562.00 ▇▁▁▁▁
.log_rr 841 0.07 0.63 0.57 -0.24 0.23 0.55 1.00 2.40 ▆▇▅▁▁
.log_rr_se 841 0.07 0.34 0.21 0.03 0.21 0.29 0.45 1.02 ▆▇▃▂▁
.event_arm1 841 0.07 36.07 76.69 7.00 13.00 18.00 27.50 576.00 ▇▁▁▁▁
.event_arm2 841 0.07 24.36 54.44 1.00 4.00 12.00 21.50 402.00 ▇▁▁▁▁
.totaln_arm1 841 0.07 60.92 93.83 15.00 22.50 34.00 48.50 578.00 ▇▁▁▁▁
.totaln_arm2 841 0.07 59.25 93.05 10.00 19.00 34.00 57.00 562.00 ▇▁▁▁▁
mean_change_arm1 881 0.02 -8.74 3.68 -17.50 -9.45 -8.30 -7.50 -0.65 ▁▁▇▁▁
sd_change_arm1 881 0.02 6.45 3.18 0.80 4.46 6.30 8.97 11.81 ▅▁▇▁▅
n_change_arm1 881 0.02 33.11 21.60 12.00 15.50 29.00 40.00 101.00 ▇▃▂▁▁
mean_change_arm2 881 0.02 -4.94 3.35 -11.10 -6.66 -5.59 -1.90 -0.53 ▃▁▇▁▇
sd_change_arm2 881 0.02 5.73 3.05 0.81 3.98 6.10 6.50 12.65 ▅▃▇▁▂
n_change_arm2 881 0.02 32.32 22.38 10.00 15.00 32.00 40.00 102.00 ▇▇▃▁▁
precalc_g 884 0.02 0.52 0.55 -0.14 0.15 0.43 0.71 2.11 ▇▇▃▁▁
precalc_g_se 884 0.02 0.26 0.14 0.08 0.15 0.21 0.29 0.60 ▇▃▁▁▂

This dataset contains information from randomized controlled trials comparing psychotherapy to control conditions for depression. Each row is a comparison from a study.

Note: The API has a rate limit, so please don't make rapid repeated requests. For this tutorial, you only need to run the download once; the data will stay in your R session.
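
One easy way to respect the rate limit is to cache the download locally, so that re-running your script reads from disk instead of hitting the API again. A minimal sketch, assuming you are happy with an .rds file in your working directory (the file name is arbitrary):

# Only query the Metapsy API if no cached copy exists yet
if (file.exists("metapsy-depression.rds")) {
  data <- readRDS("metapsy-depression.rds")
} else {
  api_url <- "http://api.metapsy.org/v1/get_data?shorthand=depression-psyctr&version=latest"
  data <- as.data.frame(jsonlite::fromJSON(api_url)$data)
  saveRDS(data, "metapsy-depression.rds")
}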

Validate Data Set

Validate your data set with the metapsyTools::checkDataFormat() function; this ensures that the metaMultiverse package runs smoothly. See the documentation for the Metapsy data standard and how your data needs to be structured here.

Validate Data Format

data <- data %>% 
  # Validate data structure for internal functions metaMultiverse
  metaMultiverse::check_data_multiverse()  %>% 
  metapsyTools::checkDataFormat(
    must.contain = c(
      "study", 
      "condition_arm1",
      "condition_arm2",
      "yi",
      "vi"),
    variable.class = list(
      yi = "numeric",
      vi = "numeric"))
Generated es_id column using row numbers (1 to 900).
Converted .g and .g_se to yi and vi for metaMultiverse compatibility.
Warning in metaMultiverse::check_data_multiverse(.): Found 34 unreasonably
large d detected (|d| > 2.5). Check if SD and SE were confused when calculating
SMD.
250 studies contribute only one effect size, 230 studies contribute multiple effect sizes.
Data validation passed. Dataset is ready for multiverse analysis.
- [OK] Data set contains all variables in 'must.contain'.
- [OK] 'yi' has desired class numeric.
- [OK] 'vi' has desired class numeric.
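
The warning about unusually large effect sizes deserves a quick look before moving on. Since check_data_multiverse() converted .g and .g_se to yi and vi, you can pull out the flagged comparisons directly:

# Inspect the comparisons flagged as unreasonably large (|d| > 2.5)
data %>%
  filter(abs(yi) > 2.5) %>%
  select(study, yi, vi)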

You could also use the online tool to validate the data here.


Step 3: Understand the Data Structure

The dataset includes:

  • Effect sizes (.g, .g_se): How effective was the treatment?

  • Study characteristics: Sample size, year, country

  • Treatment details: Type of therapy, format, number of sessions

  • Population info: Age group, recruitment setting, comorbidity

  • Risk of bias: Quality ratings for each study

Let’s look at a few key variables:

# See what types of psychotherapy are included
condition_arm1_table <- as.data.frame(table(data$condition_arm1))
kable(condition_arm1_table,
      col.names = c("Intervention Type", "Count"),
      caption = "Types of psychotherapy interventions in the dataset")
Types of psychotherapy interventions in the dataset

  Intervention Type   Count
  3rd                    67
  bat                    69
  cbt                   487
  dyn                    21
  ipt                    48
  lrt                    27
  other psy             101
  pst                    53
  sup                    27

# See what types of control conditions are included
condition_arm2_table <- as.data.frame(table(data$condition_arm2))
kable(condition_arm2_table,
      col.names = c("Control Condition", "Count"),
      caption = "Types of control conditions in the dataset")
Types of control conditions in the dataset

  Control Condition   Count
  cau                   354
  other ctr             149
  wl                    397
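
Since we will stratify by risk of bias later, it is also worth checking how the rob ratings (0 to 4) are distributed; the quality groupings defined in Step 4 treat a rating of 4 as the strictest tier:

# Distribution of risk-of-bias ratings, used for the quality groupings in Step 4
table(data$rob, useNA = "ifany")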

Step 4: Define Your Multiverse

Now comes the interesting part: defining which analytical decisions to vary. We’ll use the E/U/N framework (Del Giudice & Gangestad, 2021):

The E/U/N Decision Framework

Not all analytical decisions are created equal. The framework distinguishes three types:

Type E (Equivalent): Options are theoretically interchangeable for your research question.

  • Example: Age groups (adults vs. mixed) when studying a universal phenomenon

  • In multiverse: Creates variations within a single multiverse

  • Interpretation: All options are included; adds “total” option combining all levels

Type U (Uncertain): Unclear which option is methodologically “correct.”

  • Example: Risk of bias thresholds—exclude only high-risk studies, or also “some concerns”?

  • In multiverse: Creates variations within a single multiverse

  • Interpretation: Exploring sensitivity to debated methodological choices

Type N (Non-equivalent): Options address fundamentally different research questions.

  • Example: Post-treatment vs. follow-up outcomes represent different constructs

  • In multiverse: Creates separate multiverses analyzed independently

  • Interpretation: These shouldn’t be combined; report separately

Defining Analytical Choices

Let’s define some decisions for our depression intervention data:

multiverse_specs_example_1 <- data %>%
  define_factors(
    Age = "age_group|U",
    Risk_of_Bias = list(
      "rob",
      decision = "U",
      groups = list(
        low_only = "4",
        low_moderate = c("4", "3"),
        all_studies = c("4", "3", "2", "1", "0")
      )
    )
  )

[OK] Factor setup complete
========================================================
[*] Age (simple)
   Column: age_group | Decision: U (Uncertain - will create multiverse options)
   Levels: adul, old, olderold, yadul, adol&yadul, Not specified (+ all combined)

[*] Risk_of_Bias (custom)
   Column: rob | Decision: U (Uncertain - will create multiverse options)
   Groups:
     - low_only: 4
     - low_moderate: 4, 3
     - all_studies: 4, 3, 2, 1, 0

========================================================
Total: 2 factors (1 simple, 1 custom)
# Display the factor setup as a formatted table
kable(multiverse_specs_example_1$factors,
      caption = "Defined factors for multiverse analysis")
Defined factors for multiverse analysis

  label          column      decision   wf_internal   grouping_type
  Age            age_group   U          wf_1          simple
  Risk_of_Bias   rob         U          wf_2          custom

The |E or |U suffix tells the package whether the decision for this variable is Type E (equivalent) or Type U (uncertain); custom factors take the decision argument instead, as in the Risk_of_Bias example above.
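
For comparison, declaring age as a Type E decision would only change the suffix; everything else stays the same (a sketch, not run here):

# Type E: treat all age groups as interchangeable; adds a "total" option combining all levels
multiverse_specs_e <- data %>%
  define_factors(Age = "age_group|E")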


Step 5: Create Analysis Specifications

Now we tell the package to create all possible combinations of our decisions, paired with different meta-analytic methods:

# Create all combinations
multiverse_full_example_1 <- multiverse_specs_example_1 %>%
  create_multiverse_specifications(
    
    # Try different statistical models
    ma_methods = c("reml", "p-uniform", "waap", "rve"),
    
    # How to handle multiple comparisons from same study
    dependencies = c("select_max", "aggregate", "modeled")
  )

[*] Multiverse Specifications Created
========================================================
  126 specifications
  1 multiverse(s)
  2 factors included
  4 methods x 3 dependencies
# Display the number of specifications created
specs_summary <- data.frame(
  Metric = "Total specifications created",
  Value = nrow(multiverse_full_example_1$specifications)
)
kable(specs_summary,
      caption = "Multiverse specification summary")
Multiverse specification summary

  Metric                         Value
  Total specifications created   126

This creates dozens or even hundreds of unique meta-analyses (126 in our case), each with a different combination of inclusion criteria and statistical methods.
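
Before running anything, you can also peek at the specification grid itself, where each row describes one planned analysis:

# Inspect the first few specifications (one row = one planned meta-analysis)
head(multiverse_full_example_1$specifications)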


Step 6: Run the Multiverse Analysis

Now we run all these analyses. This might take a few minutes:

# Run all analyses
results_example_1 <- run_multiverse_analysis(multiverse_full_example_1)
# Create a formatted summary of the multiverse analysis results
n_failed <- results_example_1$n_attempted - results_example_1$n_successful

analysis_summary <- data.frame(
  Metric = c("Total specifications", "Successful", "Failed", "Success rate"),
  Value = c(
    as.character(results_example_1$n_attempted),
    as.character(results_example_1$n_successful),
    as.character(n_failed),
    paste0(round(100 * results_example_1$n_successful / results_example_1$n_attempted, 1), "%")
  )
)

kable(analysis_summary,
      caption = "Multiverse analysis execution summary")
Multiverse analysis execution summary

  Metric                 Value
  Total specifications   126
  Successful             90
  Failed                 36
  Success rate           71.4%

The package runs each meta-analysis and stores the results: effect size, confidence interval, p-value, and heterogeneity statistics.
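
These stored results are also what the plotting functions in the next step consume, and you can inspect them directly (exact column names may vary across package versions):

# Look at the per-specification results
head(results_example_1$results)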


Step 7: Visualize the Results

The real power comes from visualization. A specification curve shows how effect sizes vary across all your analytical choices:

# Plot specification curve
plot_spec_curve(results_example_1)

This plot shows:

  • Top panel: Effect size for each analysis (sorted by magnitude)

  • Bottom panels: Which analytical choices were used for each analysis

You can immediately see:

  • How much do results vary?

  • Are some choices driving the results?

  • Is the effect robust across specifications?

Another useful visualization is the Vibration of Effects (VoE) plot:

# Explore relationship between effect size and significance
plot_voe(results_example_1)

This shows the relationship between effect sizes and p-values across your multiverse.
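
If you want to keep these figures for a manuscript, you can capture the plot object and write it to disk. A sketch, assuming plot_spec_curve() returns a ggplot object (I have not verified this for every package version):

# Save the specification curve; assumes the plotting function returns a ggplot object
p_curve <- plot_spec_curve(results_example_1)
ggplot2::ggsave("spec_curve_example_1.png", p_curve, width = 10, height = 8, dpi = 300)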


Example 2: Type N Decisions - Separate Multiverses by Study Quality

In this example, we’ll use Type N decisions to create separate multiverses based on study quality. This is appropriate when different quality standards represent fundamentally different research questions.

Research question: How does the effectiveness of psychotherapy for depression vary when we apply different study quality criteria?

Since these represent different questions about evidence quality (not just sensitivity analyses), we use decision = "N":

multiverse_specs_example_2 <- data %>%
  define_factors(
    Age = "age_group|U",

    # Type N: Each quality threshold is a separate research question
    Risk_of_Bias = list(
      "rob",
      decision = "N",  # Non-equivalent: creates separate multiverses
      groups = list(
        low_only = "4",                          # Only highest quality
        low_moderate = c("4", "3"),              # High + moderate quality
        all_studies = c("4", "3", "2", "1", "0") # All studies
      )
    )
  ) %>%
  create_multiverse_specifications(
    ma_methods = c("reml", "p-uniform", "waap", "rve"),
    dependencies = c("select_max", "aggregate", "modeled")
  ) %>%
  run_multiverse_analysis()

What’s different with Type N?

  • Creates 3 separate multiverses (one per quality threshold)

  • No “total_” option added (they shouldn’t be combined)

  • Each multiverse is analyzed independently

  • Results should be reported separately, not pooled

# Display the multiverse structure as a formatted table
multiverse_structure <- as.data.frame(table(multiverse_specs_example_2$results$multiverse_id))
kable(multiverse_structure,
      col.names = c("Multiverse ID", "Number of Specifications"),
      caption = "Separate multiverses by study quality threshold")
Separate multiverses by study quality threshold

  Multiverse ID   Number of Specifications
  all_studies     30
  low_moderate    29
  low_only        28

This shows we have three independent multiverses, each answering a distinct question about intervention effectiveness under different quality standards.

Visualize Quality-Stratified Results

# Plot specification curve showing all three multiverses
plot_spec_curve(multiverse_specs_example_2)

Interpreting N-type multiverses:

The specification curve now shows results colored by multiverse_id. You can see:

  • Whether effect sizes are consistent across quality thresholds

  • If stricter quality criteria lead to larger/smaller effects

  • The range of uncertainty within each quality tier

# VoE plot across multiverses
plot_voe(multiverse_specs_example_2)

Step 8: Interpret Your Results

Ask yourself:

  1. How much do results vary? If all analyses point in the same direction with similar magnitudes, your conclusion is robust. If effect sizes range from positive to negative, your conclusion depends heavily on analytical choices.

  2. Which choices matter most? Look at the bottom panels of the specification curve. Do certain inclusion criteria consistently produce larger or smaller effects?

  3. Statistical significance: Do most analyses show significant effects, or does significance depend on which models you choose?
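
A few summary numbers can anchor these questions. A sketch, with one loud caveat: the column names b and p below are assumptions, so run names(results_example_1$results) first to confirm what your package version calls them:

# Summarize the multiverse: effect-size range and share of significant results
# NOTE: the columns `b` (effect size) and `p` (p-value) are assumed names
results_example_1$results %>%
  summarise(
    min_effect       = min(b, na.rm = TRUE),
    median_effect    = median(b, na.rm = TRUE),
    max_effect       = max(b, na.rm = TRUE),
    prop_significant = mean(p < 0.05, na.rm = TRUE)
  )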


What This Tells Us

Multiverse meta-analysis doesn’t give you “the answer”—it gives you transparency about how confident you should be in your answer. If your conclusion holds across most reasonable analytical choices, you can be more confident. If it’s sensitive to specific decisions, that’s important to report.

This approach moves us from “the effect size is X” to “across plausible specifications, effect sizes range from Y to Z, with most falling around X.”


Going Further

This tutorial covered the practical workflow for multiverse meta-analysis. The metaMultiverse package can do much more:

  • Include bias-adjustment methods (PET-PEESE, selection models)

  • Explore moderator effects across specifications

  • Export results for custom visualizations

  • Use the interactive Shiny app for exploration

  • Advanced custom factor groupings

  • Bayesian meta-analytic methods

Package Documentation

For quick start: See vignette("getting-started", package = "metaMultiverse") for a streamlined introduction focusing on essential workflow steps.

For comprehensive theoretical background: See vignette("multiverse-theory-practice", package = "metaMultiverse") for in-depth coverage of:

  • The E/U/N decision framework (Del Giudice & Gangestad, 2021)

  • Advanced interpretation and reporting guidelines

  • Troubleshooting and edge cases

  • Complete methodological reference


Resources

Package:

  • metaMultiverse GitHub

  • Package Documentation

Data Source:

  • Metapsy Database

  • Metapsy Documentation

Key References:

  • Del Giudice, M., & Gangestad, S. W. (2021). A traveler’s guide to the multiverse. Advances in Methods and Practices in Psychological Science, 4(1), 1-15. DOI

  • Steegen, S., et al. (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702-712. DOI

  • Voracek, M., Kossmeier, M., & Tran, U. S. (2019). Which data to meta-analyze, and how? Zeitschrift für Psychologie, 227(1), 64-82. DOI


Have questions or run into issues? Open an issue on GitHub or reach out.

 