Definition
Reflexion is a framework for improving language agents without retraining their weights. Instead of gradient updates, the agent receives a feedback signal after a failed attempt, writes a short natural-language reflection on what went wrong, and stores that reflection in an episodic memory buffer. On the next trial the reflection is appended to the prompt, steering the agent toward better decisions. It was introduced by Shinn et al. at NeurIPS 2023.
How the loop works
A Reflexion agent runs three roles in a cycle. An actor produces actions; an evaluator scores the trajectory (a scalar reward, a unit-test pass/fail, or free-form critique); and a self-reflection model converts that signal into a concise lesson such as "I assumed the file existed without checking." Because the lesson is text rather than a weight change, it transfers cheaply and stays human-readable. The buffer is bounded, so only the most recent and relevant reflections carry forward.
Why it matters for sovereign builders
Reflexion lets a self-hosted agent learn from trial and error across coding, web-navigation, and decision tasks using only an inference endpoint, no fine-tuning pipeline. That keeps the whole loop runnable on hardware you control. The trade-off is that gains depend entirely on the quality of the evaluator: a noisy or gameable reward produces confident but wrong reflections, so a reliable check, often a separate verifier, is essential.
Reflexion pairs naturally with iterative self-correction in Self-Refine and with the deliberation budget described under test-time compute. For grading the trajectories that drive the reflections, see the verifier model.
In Simple Terms
Reflexion is a framework for improving language agents without retraining their weights. Instead of gradient updates, the agent receives a feedback signal after a failed…
