twobody.ai
Experiments in studying systems that change with every interaction.
The Problem
UX research has always assumed you're studying a stable system. You observe users, find patterns, make recommendations, ship changes. The product stays put long enough for your insights to matter.
With AI products, that assumption falls apart. Both sides of the equation are moving. The model gets updated. User expectations shift as they learn what AI can and can't do. By the time you've synthesized your findings, the thing you studied isn't quite the thing that's shipping.
Models update on schedules you don't control. Behavior drifts between versions in ways that aren't always documented. The product your user tested last Tuesday might respond differently today.
People's mental models of AI are calibrating in real time. Someone confused by ChatGPT six months ago might be a power user now. The "novice" and "expert" categories we rely on are less stable than they used to be.
This is a collection of experiments and thoughts on how UX research tools and practices might need to evolve.
The Problem Space
How the pieces connect:
CORE PROBLEM: Both the AI and user are moving targets.
APPROACH 1 (Scale the N): Compensate for instability with volume.
APPROACH 2 (Control AI variance): Hold the system constant during testing.
APPROACH 3 (Rethink what to measure): New metrics for both user and AI.
APPROACH 4 (Go upstream): Abstract each side, then compare.
H1: Synthetic users find the same issues as real users.
H2: Passive AI moderators improve some data.
H3: We can detect when research is invalidated.
H4: Trust calibration follows measurable patterns.
H5: AI analysis works for structured questions.
H6: Surfacing AI confidence improves user calibration.
RELATED RESEARCH: 📓 NotebookLM deep dive coming soon.
Hypotheses in Detail
H1 (Testing)
AI-simulated users can identify the same major usability issues as real users for most routine evaluations.
If true, this would let us run cheap, fast sanity checks before investing in real user research. The interesting question isn't "does it work" but "where does it break down." My guess is it fails on anything involving trust, emotional response, or domain expertise.
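If this hypothesis survives contact with data, the harness for it could be very small. Below is a minimal sketch of a synthetic-user sanity check, assuming an OpenAI-style chat API; the persona, task, model name, and FRICTION-flag convention are all illustrative choices, not a fixed protocol.

```python
# Illustrative sketch: a "synthetic user" walks through a task description
# and reports friction points. Persona, prompts, and model name are assumptions.
from openai import OpenAI

client = OpenAI()

PERSONA = (
    "You are a first-time user of a grocery delivery app. "
    "You are in a hurry and mildly skeptical of AI features."
)
TASK = "Find last week's order and reorder it with one substitution."

def run_synthetic_session(ui_description: str) -> str:
    """Ask the simulated user to narrate the task and flag confusion."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[
            {"role": "system", "content": PERSONA},
            {
                "role": "user",
                "content": (
                    f"The interface is described as:\n{ui_description}\n\n"
                    f"Attempt this task: {TASK}\n"
                    "Narrate each step. Prefix anything confusing with FRICTION:."
                ),
            },
        ],
    )
    return response.choices[0].message.content

def friction_points(transcript: str) -> list[str]:
    """Collect the lines the synthetic user flagged as confusing."""
    return [line for line in transcript.splitlines() if line.startswith("FRICTION:")]
```

Comparing these flagged lines against issues found by a small real-user study is one way to probe where the breakdown happens.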
H2 (Exploring)
AI moderators playing passive observer roles—and human moderator removal in specific contexts—can improve data quality.
Two related ideas here. First: some participants perform more naturally without a human researcher watching. Second: AI moderators can adopt passive observer roles that humans simply cannot—infinitely patient, never reacting, intervening only when things go off the rails. The question is whether this captures what matters while reducing observer effects. I suspect the answer depends heavily on task type and participant comfort with AI.
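As a sketch of what "intervening only when things go off the rails" might mean in practice, here is a minimal trigger policy; the silence threshold and help phrases are assumptions a pilot study would need to tune.

```python
# Sketch of a passive-moderator policy: stay silent unless a narrow trigger
# fires. The triggers below (long silence, explicit help request) are
# assumptions; the research question is whether intervening this rarely
# still yields usable data.
HELP_PHRASES = ("i'm stuck", "what do i do", "is this broken")
MAX_SILENCE_SECONDS = 90  # assumption

def should_intervene(last_utterance: str, seconds_since_activity: float) -> bool:
    if seconds_since_activity > MAX_SILENCE_SECONDS:
        return True
    return any(phrase in last_utterance.lower() for phrase in HELP_PHRASES)

def moderator_turn(last_utterance: str, seconds_since_activity: float) -> str | None:
    """Return a prompt only when a trigger fires; otherwise say nothing."""
    if not should_intervene(last_utterance, seconds_since_activity):
        return None
    return "No rush. Can you tell me what you're trying to do right now?"
```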
H3 (Queued)
We can detect when an AI product has changed enough that previous research findings no longer apply.
Right now, research invalidation is vibes-based. Someone notices the product feels different, maybe. What if we could instrument this—track behavioral signatures over time and flag when drift crosses a threshold? Not sure if it's possible, but worth exploring.
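A rough sketch of what instrumenting this could look like: run a fixed probe set against the product on a schedule, compute some behavioral signature, and test whether today's distribution still matches the baseline. The signature used here (response length) is only a stand-in, and the threshold is a guess.

```python
# Rough sketch of drift detection for an AI product: compare a behavioral
# signature measured today against a stored baseline, and flag when the
# difference is unlikely to be noise. Real signatures might be refusal rate,
# tone scores, or tool-use frequency rather than response length.
from scipy.stats import ks_2samp

DRIFT_P_VALUE = 0.01  # assumption: tune against known-stable periods

def signature(responses: list[str]) -> list[float]:
    """Placeholder behavioral signature: response length per probe."""
    return [float(len(r.split())) for r in responses]

def has_drifted(baseline_responses: list[str], current_responses: list[str]) -> bool:
    """Two-sample test: are today's responses drawn from the same distribution?"""
    stat, p_value = ks_2samp(signature(baseline_responses), signature(current_responses))
    return p_value < DRIFT_P_VALUE

# Usage: run the same probe prompts weekly; if has_drifted(...) returns True,
# flag findings collected against the old baseline for re-validation.
```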
H4 (Queued)
User trust calibration follows predictable patterns that can be measured longitudinally.
People start with some mental model of what AI can do. They use it, get surprised (positively or negatively), and adjust. Over time, their expectations stabilize—or don't. If there's a pattern here, we could design for trust calibration instead of just measuring satisfaction at a point in time. The tricky part is that people are often bad at introspecting on their own mental models, so we'd need both self-reported measures and behavioral proxies—what people say they expect vs. how they actually behave.
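A sketch of the behavioral-proxy side, assuming each session logs a self-reported expectation of AI success alongside the actual outcome; the Brier-style score below is one candidate calibration measure, not the only one.

```python
# Sketch: quantify trust calibration per participant over time.
# Each record pairs a self-reported expectation ("how likely is the AI to
# get this right?", 0.0-1.0) with the observed outcome (1 = it did).
from dataclasses import dataclass

@dataclass
class SessionRecord:
    expected_success: float  # self-report before the task, 0.0-1.0
    actual_success: int      # 1 if the AI output was acceptable, else 0

def calibration_error(records: list[SessionRecord]) -> float:
    """Mean squared gap between expectation and outcome (a Brier score).
    Lower means better calibrated; track this longitudinally per participant."""
    if not records:
        return float("nan")
    return sum((r.expected_success - r.actual_success) ** 2 for r in records) / len(records)

# H4 predicts this curve has a shape: high error early, falling as expectations
# adjust, and possibly jumping again after a model update.
```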
H5 (Exploring)
AI analysis of qualitative data approaches human-level quality for well-structured research questions.
"AI can analyze interviews" is too broad. The real question is: for which types of analysis, with what constraints, and how do you know when it's working? My hunch is it's good at finding patterns in explicit statements and bad at interpreting what people didn't say.
H6 (Queued)
Surfacing AI confidence signals to users improves their trust calibration.
This connects the "measure the AI" angle with user outcomes. If the AI can tell you when it's uncertain—and you actually expose that to users—does their mental model calibrate better? This is a design intervention that could be tested empirically. The interesting questions: what form should confidence signals take? Do users actually attend to them? Does it help, or just create anxiety?
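A sketch of one possible intervention, assuming the model API exposes token log-probabilities and that average token probability is a usable confidence proxy (itself a contested assumption); the thresholds and labels are placeholders to be tested, not recommendations.

```python
# Sketch: map a crude model-confidence proxy to a user-facing label.
# Whether average token probability tracks real reliability, and whether
# users attend to the label at all, is exactly what H6 would have to test.
import math
from openai import OpenAI

client = OpenAI()

def answer_with_confidence(question: str) -> tuple[str, str]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption
        messages=[{"role": "user", "content": question}],
        logprobs=True,
    )
    choice = response.choices[0]
    token_probs = [math.exp(t.logprob) for t in choice.logprobs.content]
    avg_prob = sum(token_probs) / len(token_probs)

    # Thresholds are placeholders; the research question is what users do with them.
    if avg_prob > 0.9:
        label = "High confidence"
    elif avg_prob > 0.6:
        label = "Moderate confidence: worth double-checking"
    else:
        label = "Low confidence: treat as a guess"
    return choice.message.content, label
```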
I'll share what I learn as I go. New experiments roughly every two weeks.