Definition
Self-Refine is a prompting strategy in which one language model generates an answer, then provides written feedback on that answer, then rewrites it using the feedback, repeating until a stopping condition is met. The same model fills all three roles: generator, critic, and reviser. It requires no supervised data, no reinforcement learning, and no separate reward model, which is why it is one of the simplest ways to raise output quality on a self-hosted stack. It was introduced by Madaan et al. and presented at NeurIPS 2023.
The generate-feedback-refine loop
The first pass produces a draft. The model is then prompted to act as a critic and list specific, actionable problems with that draft, for example "the function ignores empty input" rather than a vague "could be better." A third prompt feeds both the draft and the critique back in and asks for a corrected version. Iterations continue for a fixed number of rounds or until the critic reports no further issues. Across tasks such as code optimization, math reasoning, and dialogue response generation, the authors reported an improvement of roughly 20 percent absolute on average over single-shot generation.
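The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `llm` is a hypothetical callable that maps a prompt string to a completion string, and the prompt wording and the `NO ISSUES` stop marker are assumptions chosen for the example.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]

# Assumed convention: the critic is told to reply with this exact marker
# when it finds nothing left to fix.
STOP_MARKER = "NO ISSUES"


def self_refine(task: str, llm: LLM, max_rounds: int = 3) -> str:
    """Run generate -> feedback -> refine with one model in all three roles."""
    # Generator role: produce the first draft.
    draft = llm(f"Task: {task}\nWrite your best answer.")

    for _ in range(max_rounds):
        # Critic role: ask for concrete, actionable problems.
        feedback = llm(
            f"Task: {task}\nDraft:\n{draft}\n"
            "List specific, actionable problems with this draft. "
            f"If there are none, reply exactly: {STOP_MARKER}"
        )
        if feedback.strip().upper() == STOP_MARKER:
            break  # stopping condition: the critic reports no further issues

        # Reviser role: rewrite using both the draft and the critique.
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nFeedback:\n{feedback}\n"
            "Rewrite the draft so that every point in the feedback is fixed."
        )

    return draft
```

In practice `llm` would wrap whatever inference endpoint the stack exposes. Keeping it as a plain callable also makes the loop easy to exercise with a scripted stub before any model is attached.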
Strengths and limits
Self-Refine shines where the model can reliably spot its own mistakes, typically when errors are concrete and checkable. It is weaker on factual gaps the model cannot detect from the text alone, since a model that does not know a fact also cannot critique its absence. For those cases an external check or tool is needed.
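One way to picture the external check mentioned above is to let a non-model checker supply the feedback instead of the model's own critique. The sketch below is a variation on Self-Refine under stated assumptions, not part of the original method: `llm` is a hypothetical prompt-to-text callable, and `check` is a hypothetical function (a test runner, a retrieval lookup, a validator) that returns a problem description or `None` when the draft passes.

```python
from typing import Callable, Optional

# Hypothetical interfaces, introduced only for this illustration.
LLM = Callable[[str], str]
Checker = Callable[[str], Optional[str]]  # problem description, or None if OK


def refine_with_check(
    task: str, llm: LLM, check: Checker, max_rounds: int = 3
) -> str:
    """Refine a draft using feedback from an external checker."""
    draft = llm(f"Task: {task}\nWrite your best answer.")

    for _ in range(max_rounds):
        problem = check(draft)  # the tool, not the model, finds the gap
        if problem is None:
            break  # the draft passes the external check

        # The model still acts as reviser, but with grounded feedback.
        draft = llm(
            f"Task: {task}\nDraft:\n{draft}\nProblem found:\n{problem}\n"
            "Rewrite the draft so that this problem is fixed."
        )

    return draft
```

The model keeps the generator and reviser roles. Only the critic is swapped out, which is the role that fails when the model cannot see its own gap.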
Self-Refine differs from Reflexion in that refinement happens within a single task rather than across separate trials. Each extra round costs additional model calls, so the method trades test-time compute for accuracy.
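The compute cost can be made concrete by counting model calls. The arithmetic below assumes the loop structure described earlier (one call for the draft, then one critic call and one reviser call per round). The function names are illustrative, and real cost also depends on prompt and output length, which this count ignores.

```python
def worst_case_calls(max_rounds: int) -> int:
    """Model calls when the critic never reports that the draft is clean."""
    # 1 draft call + (1 critic call + 1 reviser call) per round
    return 1 + 2 * max_rounds


def calls_if_critic_stops_at(round_index: int) -> int:
    """Model calls when the critic reports no issues in round `round_index`."""
    # Earlier rounds cost 2 calls each; the final round needs only the critic.
    return 1 + 2 * (round_index - 1) + 1


print(worst_case_calls(3))          # 7 calls versus 1 for single-shot
print(calls_if_critic_stops_at(2))  # 4 calls
```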
