Open Benchmark of AI Impact on Humans

How does using AI affect people?

The first open benchmark measuring AI's impact on human well-being across physical, psychological, and societal dimensions.

375 Metrics
Tested across realistic scenarios.

14 AI Systems Evaluated
Compared on the same standard.

18 Expert-Submitted Benchmarks
Spanning clinical, legal, and educational constructs.

3 Domains of Human Impact
Physical, psychological, societal.

Measuring AI's impact on people

Today's AI benchmarks measure what models can do: accuracy, reasoning, task completion. They say almost nothing about what AI does to the people who rely on it. Two models with identical capability scores can shape a user's autonomy, mental health, and relationships in completely different ways, and the field has had no shared way to tell them apart. ImpactBench is built to answer a different question: across realistic, multi-turn conversations, does an AI system support or undermine human flourishing?

Introducing ImpactBench

ImpactBench evaluates 14 leading AI systems against 18 expert-submitted benchmarks spanning physical, psychological, and societal impact. Constructs are contributed by clinicians, educators, legal scholars, and community advocates through an open submission process, then tested through multi-turn adversarial simulation with demographically stratified personas, mirroring the way harms actually unfold in real conversations rather than in isolated prompts. Every score is paired with reliability checks so users can see not just what we found, but how much to trust it.
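To make "a score paired with a reliability check" concrete, here is a minimal illustrative sketch in Python. It is not ImpactBench's actual pipeline. It assumes each simulated conversation yields a score between 0 and 1 tagged with a persona group, averages within groups so that every group counts equally, and uses a stratified bootstrap to show how much the result moves under resampling. The names `stratified_score`, `bootstrap_interval`, and `example_records`, the persona labels, and the data are all hypothetical.

```python
import random
from statistics import mean


def group_by_stratum(records):
    """Group (stratum, score) pairs into {stratum: [scores]}."""
    grouped = {}
    for stratum, score in records:
        grouped.setdefault(stratum, []).append(score)
    return grouped


def stratified_score(records):
    """Mean of per-stratum means, so each persona group counts equally."""
    grouped = group_by_stratum(records)
    return mean(mean(scores) for scores in grouped.values())


def bootstrap_interval(records, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile interval from a stratified bootstrap.

    Scores are resampled with replacement inside each stratum, so every
    persona group is present in every resample.
    """
    rng = random.Random(seed)
    grouped = group_by_stratum(records)
    estimates = sorted(
        mean(mean(rng.choices(scores, k=len(scores))) for scores in grouped.values())
        for _ in range(n_resamples)
    )
    lower_index = max(0, int((alpha / 2) * n_resamples))
    upper_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return estimates[lower_index], estimates[upper_index]


# Hypothetical data: (persona group, conversation-level score in [0, 1]).
example_records = [
    ("teen", 0.6), ("teen", 0.8), ("teen", 0.7),
    ("older_adult", 0.9), ("older_adult", 0.5),
    ("caregiver", 0.4), ("caregiver", 0.6), ("caregiver", 0.5),
]

point = stratified_score(example_records)
lower, upper = bootstrap_interval(example_records)
print(f"score={point:.2f}, 95% interval=({lower:.2f}, {upper:.2f})")
```

A wide interval signals that the score rests on too few or too inconsistent conversations to be trusted much. A narrow one suggests the verdict is stable under resampling.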

Explore ImpactBench flexibly

The Explore page lets you move from aggregate scores down to the underlying evidence: compare models across the three impact domains, drill into specific constructs like emotional dependence or cognitive autonomy, and read the actual multi-turn transcripts behind any verdict.

Explore page showing model comparisons across impact domains

Nutrition labels for AI

ImpactBench generates AI nutrition labels: at-a-glance summaries of a model's impact across nine categories, from avoiding harms like hallucination and toxicity to promoting benefits like learning, creativity, and well-being. Whether you're a parent evaluating a companion app or an educator choosing a tutoring tool, you can understand and share a model's performance in seconds.

Personalized ImpactBench nutrition label

Request Access

The full benchmark dataset and evaluation API are available to vetted researchers and institutions.

Led by researchers at