Using Large Language Models to Measure U.S. Retirement Plan Design at Scale: Methods and Evidence

Forthcoming, 2026

We show how large language models (LLMs) can transform labor-intensive data collection in economics by constructing a comprehensive dataset of U.S. retirement plan characteristics from regulatory filings. Using a hand-collected dataset of 6,200 plans as training data, we fine-tune LLMs to automatically extract matching formulas, vesting schedules, and auto-enrollment provisions from the unstructured text of these filings. Our approach combines snippet extraction, retrieval-augmented generation, and double-coding validation to ensure data quality. With this LLM-based pipeline, we produce a dataset covering nearly 150,000 distinct plans over 21 years (2003-2024), yielding over one million plan-year observations, more than twenty times the coverage of the original hand-collected data. This expansion enables the first comprehensive population-level analysis of retirement plan design. We document substantial increases in plan generosity, the increasingly powerful role of federal safe-harbor regulations in shaping firms' plan offerings, and the dramatic rise of automatic enrollment from near zero to 40% of plans, illustrating how LLMs can make previously infeasible empirical research practical at scale.
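As a rough illustration of what double-coding validation can look like, the sketch below compares two independent codings of the same plans and reports per-field agreement. It is not the authors' implementation: the field names, the `compare_codings` helper, and the toy values in the usage note are all hypothetical, and the abstract does not say whether the two codings come from human coders, separate model runs, or a mix.

```python
from typing import Dict, List, Tuple

# Hypothetical field names, chosen for illustration only.
FIELDS = ("match_formula", "vesting_schedule", "auto_enrollment")

# plan_id -> {field -> extracted value}
Coding = Dict[str, Dict[str, str]]


def compare_codings(
    first: Coding, second: Coding
) -> Tuple[Dict[str, float], List[Tuple[str, str]]]:
    """Return per-field agreement rates and the (plan_id, field) pairs that disagree.

    Only plans present in both codings are compared.
    """
    shared_plans = sorted(set(first) & set(second))
    agree_counts = {field: 0 for field in FIELDS}
    disagreements: List[Tuple[str, str]] = []

    for plan_id in shared_plans:
        for field in FIELDS:
            if first[plan_id].get(field) == second[plan_id].get(field):
                agree_counts[field] += 1
            else:
                # Disagreements would be routed to manual review.
                disagreements.append((plan_id, field))

    n = len(shared_plans)
    rates = {f: (agree_counts[f] / n if n else 0.0) for f in FIELDS}
    return rates, disagreements
```

With made-up inputs, a call such as `rates, flagged = compare_codings(coding_a, coding_b)` gives one agreement rate per field plus a list of plan-field pairs to re-examine, which is one simple way to decide where extracted values need a second look.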

Recommended citation: Taha Choukhmane, John M. Dedyo, Cormac O'Dea, Lawrence D.W. Schmidt. (2026). 'Using Large Language Models to Measure U.S. Retirement Plan Design at Scale: Methods and Evidence' Forthcoming.