About the Tutorial
Welcome to An Introduction to Simulating Human Survey Responses with Large Language Models: Potentials and Pitfalls. This tutorial takes place at the 12th International Conference on Computational Social Science (IC²S²) in Burlington, Vermont. This tutorial's goal is to provide a hands-on introduction to simulating human survey responses with LLMs, with a focus on survey-centric use cases including survey pretesting, hybrid designs that combine human and simulated respondents, and missing-data imputation.
Overview
This tutorial provides a hands-on introduction to simulating human survey responses with Large Language Models (LLMs), focusing on the methodological rigor required to use "silicon samples" to complement or extend human data. While this approach offers promise for rapid pretesting, counterfactual analysis, and enhancing statistical power through mixed-subjects designs, it introduces new methodological choices, assumptions, and risks that require careful scrutiny.
To that end, this tutorial addresses analytic flexibility in silicon samples. Participants will learn to systematically explore how design decisions, such as persona construction and prompting strategies, meaningfully shift results, rather than treating LLM outputs as fixed. The tutorial introduces the QSTN framework, a tool designed to structure simulations and support transparent evaluation across design alternatives. Through guided Python exercises, participants will generate simulated responses for use cases like missing-data imputation and compare modelling choices using multiple evaluation metrics. The tutorial concludes with a critical discussion of methodological limitations, validation challenges, and ethical considerations surrounding autonomy and appropriate use cases for silicon samples. By the end of the tutorial, researchers will be equipped with a principled, transparent approach to integrating simulations into survey-centric social science workflows.
Learning Objectives
By the end of the session, participants will:
- Understand how to implement LLM-based survey simulations
- Learn the QSTN framework, a tool designed to structure LLM-based survey simulations and support transparent evaluation across design alternatives
- Generate simulated survey responses and compare modeling choices
- Evaluate outputs using multiple metrics
- Engage in critical discussion of methodological limitations, validation challenges, and ethical considerations
Data and Tools
We introduce QSTN, a Python framework developed to structure LLM-based survey simulations. Through guided hands-on exercises, participants will generate simulated survey responses, compare modeling choices, and evaluate outputs using multiple metrics.
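To make "compare modeling choices and evaluate outputs using multiple metrics" concrete, here is a minimal sketch in plain Python. It does not use QSTN and makes no claim about its API. The answer options, the two design names (`persona_prompt`, `no_persona_prompt`), and all response data are made-up toy values for illustration only. The sketch compares each simulated answer distribution against a human reference with two metrics: total variation distance and Jensen-Shannon divergence.

```python
import math
from collections import Counter


def distribution(responses, options):
    """Turn a list of answers into relative frequencies, ordered by `options`."""
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    return [counts[option] / len(responses) for option in options]


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def jensen_shannon(p, q):
    """Jensen-Shannon divergence with base-2 logarithm (0 to 1)."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def compare_designs(human, simulated_by_design, options):
    """Score every simulation design against the human reference distribution."""
    reference = distribution(human, options)
    scores = {}
    for design, responses in simulated_by_design.items():
        simulated = distribution(responses, options)
        scores[design] = {
            "tv": total_variation(reference, simulated),
            "js": jensen_shannon(reference, simulated),
        }
    return scores


# Toy data (invented for this sketch, not real survey results).
OPTIONS = ["agree", "neutral", "disagree"]
human = ["agree"] * 5 + ["neutral"] * 3 + ["disagree"] * 2
simulated_by_design = {
    "persona_prompt": ["agree"] * 6 + ["neutral"] * 2 + ["disagree"] * 2,
    "no_persona_prompt": ["agree"] * 9 + ["neutral"] * 1,
}

results = compare_designs(human, simulated_by_design, OPTIONS)
for design, metrics in results.items():
    print(f"{design}: TV={metrics['tv']:.3f}, JS={metrics['js']:.3f}")
```

Reporting more than one metric per design matters because metrics can rank designs differently on real data; a design choice that looks best under one metric is not necessarily best under another.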
The material for the tutorial can be retrieved by cloning this repository: Tutorial: Simulating Human Response Generation
Target Audience
The tutorial is intended for researchers and graduate students in computational social science, political science, sociology, communication, and related fields. No prior experience with LLMs is required, but basic familiarity with surveys and introductory Python (or willingness to follow along conceptually) is recommended.
Organizers
- Georg Ahnert, University of Mannheim
- Kristina Gligorić, Johns Hopkins University
- Indira Sen, University of Mannheim
- Maximilian Kreutner, University of Mannheim
- Jens Rupprecht, University of Mannheim
- Markus Strohmaier, University of Mannheim, Complexity Science Hub (Vienna), GESIS (Cologne)