Definition
KTO (Kahneman-Tversky Optimization) is a method for aligning large language models that learns from a single binary label per response ("good" or "bad") rather than from paired comparisons. Introduced by Ethayarajh and colleagues in 2024, it draws on prospect theory, developed by psychologists Daniel Kahneman and Amos Tversky, which models how humans actually weigh gains and losses (notably, we are loss-averse). Instead of maximizing the log-likelihood of preferences, as paired methods do, KTO directly maximizes a Kahneman-Tversky utility of the model's generations.
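The utility idea in the definition can be made concrete for a single example. What follows is a minimal sketch, assuming the per-example loss form given in the KTO paper: the implied reward is the log-ratio between the policy and a frozen reference model, which is compared against a reference point (a KL estimate, clamped at zero) and passed through a sigmoid. The function name, the argument names, and the default values for beta, lambda_d, and lambda_u are illustrative assumptions, not an official API.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here as the value function's squashing curve."""
    return 1.0 / (1.0 + math.exp(-x))


def kto_loss(
    policy_logp: float,   # log pi_theta(y | x) under the model being trained
    ref_logp: float,      # log pi_ref(y | x) under the frozen reference model
    desirable: bool,      # the single binary label for this response
    kl_ref: float = 0.0,  # assumed: a KL estimate supplied by the caller
    beta: float = 0.1,    # assumed default; controls sensitivity to the reward
    lambda_d: float = 1.0,  # weight for desirable examples (assumed default)
    lambda_u: float = 1.0,  # weight for undesirable examples (assumed default)
) -> float:
    """Per-example KTO-style loss: lambda_y minus the value of the response."""
    reward = policy_logp - ref_logp  # implied reward as a log-ratio
    z0 = max(kl_ref, 0.0)            # reference point, clamped to be non-negative
    if desirable:
        # Gains: value rises as the reward exceeds the reference point.
        return lambda_d - lambda_d * sigmoid(beta * (reward - z0))
    # Losses: value rises as the reward falls below the reference point.
    return lambda_u - lambda_u * sigmoid(beta * (z0 - reward))
```

In this sketch, raising the policy's log-probability of a desirable response lowers the loss, while doing the same for an undesirable response raises it. The two lambda weights let desirable and undesirable examples be weighted differently, which is one way to handle an imbalanced mix of labels.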
Why binary labels matter
Most direct alignment methods need pairwise data: for one prompt, a chosen response and a rejected response. Collecting that is expensive and slow. KTO only needs to know whether a given output was desirable, which is far cheaper to gather and matches how feedback often arrives in the wild — a thumbs-up or thumbs-down. The original work reports KTO matching or beating preference-based methods at scales from 1B to 30B parameters.
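The difference between the two data shapes is easy to see side by side. The sketch below is illustrative only: the record types and the field names (prompt, chosen, rejected, completion, label) are assumptions chosen for readability, not a required schema. It shows that a paired record can always be split into two binary-labeled records, while a lone thumbs-up or thumbs-down becomes a binary record directly, with no partner response needed.

```python
from typing import TypedDict


class PairedExample(TypedDict):
    """Pairwise format: one prompt with a chosen and a rejected response."""
    prompt: str
    chosen: str
    rejected: str


class BinaryExample(TypedDict):
    """Binary format: one prompt, one response, one desirable/undesirable label."""
    prompt: str
    completion: str
    label: bool  # True = desirable, False = undesirable


def unpair(example: PairedExample) -> list[BinaryExample]:
    """Split one paired record into two independent binary-labeled records."""
    return [
        {"prompt": example["prompt"], "completion": example["chosen"], "label": True},
        {"prompt": example["prompt"], "completion": example["rejected"], "label": False},
    ]


def from_feedback(prompt: str, completion: str, thumbs_up: bool) -> BinaryExample:
    """Turn a single thumbs-up/thumbs-down event into a binary record."""
    return {"prompt": prompt, "completion": completion, "label": thumbs_up}
```

The conversion only runs one way: binary records cannot in general be re-paired, because the desirable and undesirable responses need not share a prompt.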
Why a sovereign builder cares
For anyone fine-tuning a model on self-hosted hardware, KTO lowers the data barrier to alignment. You can curate your own "keep / discard" signal from real usage without running a costly annotation pipeline, keeping the whole loop on machines you control. That fits the broader pattern of moving model training off rented cloud platforms and onto owned infrastructure.
KTO sits alongside other direct alignment methods in the post-training toolkit. For a method that trains on paired comparisons instead, see ORPO (Odds Ratio Preference Optimization), and for how labeled examples are organized see preference dataset.
