An Introduction to Simulating Human Survey Responses with Large Language Models: Potentials and Pitfalls

A tutorial co-located with the 12th International Conference on Computational Social Science (IC²S²) in Burlington, Vermont.

About the Tutorial

Welcome to An Introduction to Simulating Human Survey Responses with Large Language Models: Potentials and Pitfalls, held at the 12th International Conference on Computational Social Science (IC²S²) in Burlington, Vermont. The tutorial's goal is to provide a hands-on introduction to simulating human survey responses with LLMs. It focuses on survey-centric use cases: survey pretesting, hybrid designs that combine human and simulated respondents, and missing-data imputation.

Overview

This tutorial provides a hands-on introduction to simulating human survey responses with Large Language Models (LLMs), focusing on the methodological rigor required to use “silicon samples” to complement or extend human data. While this approach offers promise for rapid pretesting, counterfactual analysis, and enhancing statistical power through mixed-subjects designs, it introduces new methodological choices, assumptions, and risks that require careful scrutiny.

To that end, this tutorial addresses analytic flexibility in silicon samples. Rather than treating LLM outputs as fixed, participants will learn to systematically explore how design decisions, such as persona construction and prompting strategies, meaningfully shift results. The tutorial introduces the QSTN framework, a tool designed to structure simulations and support transparent evaluation across design alternatives. Through guided Python exercises, participants will generate simulated responses for use cases like missing-data imputation and compare modeling choices using multiple evaluation metrics. The tutorial concludes with a critical discussion of methodological limitations, validation challenges, and ethical considerations surrounding autonomy and appropriate use cases for silicon samples. By the end of the tutorial, researchers will be equipped with a principled, transparent approach to integrating simulations into survey-centric social science workflows.
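To make "exploring design decisions" concrete, the sketch below enumerates every combination of two persona constructions and two prompting strategies and builds one prompt per combination. Each prompt can then be sent to a model and evaluated separately. This is a minimal illustration in plain Python, not the QSTN API. The respondent fields, persona templates, strategy wording, and the helper `build_prompt` are all hypothetical.

```python
from itertools import product

# Hypothetical respondent record; field names are illustrative only.
respondent = {"age": 34, "gender": "woman", "party": "independent"}
question = "How much do you trust the national government?"
options = ["A great deal", "Somewhat", "Not very much", "Not at all"]

# Two illustrative persona constructions (assumed, not QSTN built-ins).
PERSONAS = {
    "first_person": lambda r: f"I am a {r['age']}-year-old {r['gender']} who identifies as {r['party']}.",
    "third_person": lambda r: f"The respondent is a {r['age']}-year-old {r['gender']} who identifies as {r['party']}.",
}

# Two illustrative prompting strategies (assumed wording).
STRATEGIES = {
    "direct": "Answer with exactly one of the options.",
    "reasoning": "Think briefly about this person's views, then answer with exactly one of the options.",
}


def build_prompt(persona_key: str, strategy_key: str, record: dict) -> str:
    """Combine a persona, the question with numbered options, and an instruction."""
    persona = PERSONAS[persona_key](record)
    option_lines = "\n".join(f"{i + 1}. {o}" for i, o in enumerate(options))
    return f"{persona}\n\n{question}\n{option_lines}\n\n{STRATEGIES[strategy_key]}"


# One prompt per combination of design decisions, keyed by (persona, strategy).
design_grid = {
    (p, s): build_prompt(p, s, respondent)
    for p, s in product(PERSONAS, STRATEGIES)
}
```

Keeping the design grid explicit makes it easy to report results for every alternative rather than only the one that happened to look best.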

Learning Objectives

By the end of the session, participants will:

Data and Tools

We introduce QSTN, a Python framework developed to structure LLM-based survey simulations. Through guided hands-on exercises, participants will generate simulated survey responses, compare modeling choices, and evaluate outputs using multiple metrics.
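Using multiple metrics matters because different metrics can disagree. The sketch below compares simulated answers against held-out human answers in an imputation-style setting. It uses two common measures: per-respondent accuracy and total variation distance (TVD) between the marginal answer distributions. It uses only the Python standard library and is not QSTN's evaluation code. The answer lists and run names are invented for illustration.

```python
from collections import Counter


def accuracy(human, simulated):
    """Share of respondents whose simulated answer matches the held-out human answer."""
    if len(human) != len(simulated):
        raise ValueError("human and simulated must be aligned by respondent")
    return sum(h == s for h, s in zip(human, simulated)) / len(human)


def total_variation(human, simulated):
    """TVD between marginal answer distributions (0 = identical, 1 = disjoint)."""
    ph, ps = Counter(human), Counter(simulated)
    n_h, n_s = len(human), len(simulated)
    categories = set(ph) | set(ps)
    return 0.5 * sum(abs(ph[c] / n_h - ps[c] / n_s) for c in categories)


# Hypothetical held-out human answers and answers simulated under two design alternatives.
human = ["Somewhat", "Not at all", "Somewhat", "A great deal"]
runs = {
    "first_person/direct": ["Somewhat", "Somewhat", "Somewhat", "A great deal"],
    "third_person/reasoning": ["Not at all", "Not at all", "Somewhat", "Somewhat"],
}

for name, sim in runs.items():
    print(f"{name}: accuracy={accuracy(human, sim):.2f}, TVD={total_variation(human, sim):.2f}")
```

In this toy data both runs have the same TVD (0.25), but their accuracies differ (0.75 vs. 0.50). A distribution-level metric alone would not distinguish them, whereas individual-level imputation quality does.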

The material for the tutorial can be retrieved by cloning this repository: Tutorial: Simulating Human Response Generation

Target Audience

The tutorial is intended for researchers and graduate students in computational social science, political science, sociology, communication, and related fields. No prior experience with LLMs is required, but basic familiarity with surveys and introductory Python (or willingness to follow along conceptually) is recommended.

Organizers