Using Large Language Models to Measure U.S. Retirement Plan Design at Scale: Methods and Evidence
Forthcoming, 2026
Using an LLM-based pipeline, we produce a dataset covering nearly 150,000 distinct plans over 21 years (2003-2024), yielding over one million plan-year observations, more than twenty times the coverage of existing hand-collected data. This expansion enables the first comprehensive population-level analysis of retirement plan design.
Recommended citation: Taha Choukhmane, John M. Dedyo, Cormac O'Dea, Lawrence D.W. Schmidt. (2026). 'Using Large Language Models to Measure U.S. Retirement Plan Design at Scale: Methods and Evidence' Forthcoming.
