RealGen

Photorealistic Text-to-Image Generation via Detector-Guided Rewards

1 Shanghai AI Lab
2 Sun Yat-Sen University
3 Nanjing University
4 CUHK MMLab
5 Tsinghua University
6 Peking University
* Equal Contribution, † Corresponding Author

Abstract

With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable text-to-image consistency and world knowledge. However, these models still fall short in photorealistic image generation. Even on simple text-to-image (T2I) tasks, they tend to produce "fake" images with distinct AI artifacts, often characterized by "overly smooth skin" and "oily facial sheens". To recapture the original goal of "indistinguishable-from-reality" generation, we propose RealGen, a photorealistic text-to-image framework. RealGen integrates an LLM component for prompt optimization and a diffusion model for realistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism using both semantic-level and feature-level synthetic image detectors. We use this reward signal with the GRPO algorithm to optimize the entire generation pipeline, significantly enhancing image realism and detail. Furthermore, we propose RealBench, an automated evaluation benchmark that employs Detector-Scoring and Arena-Scoring. It enables human-free photorealism assessment, yielding results that are more accurate and better aligned with real user experience. Experiments demonstrate that RealGen significantly outperforms general models like GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models like FLUX-Krea, in terms of realism, detail, and aesthetics.
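As a rough illustration of how a detector-based reward could feed GRPO, the sketch below combines two detector outputs into a realism score and then standardizes the scores within one sampling group, which is the group-relative advantage GRPO is built on. This page does not specify RealGen's actual reward formulation, so the weighted average, the 0.5 weight, the function names, and the sample detector outputs are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def detector_reward(p_fake_semantic: float, p_fake_feature: float,
                    w_semantic: float = 0.5) -> float:
    """Turn two detectors' "probability of being synthetic" into a realism reward in [0, 1].

    Assumption: a simple weighted average; the real combination rule may differ.
    """
    for p in (p_fake_semantic, p_fake_feature):
        if not 0.0 <= p <= 1.0:
            raise ValueError("detector outputs must be probabilities in [0, 1]")
    p_fake = w_semantic * p_fake_semantic + (1.0 - w_semantic) * p_fake_feature
    # An image the detectors judge less likely to be synthetic earns a higher reward.
    return 1.0 - p_fake


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample in the group scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical detector outputs for four images sampled from one prompt:
# (semantic-level p_fake, feature-level p_fake)
group = [(0.9, 0.8), (0.4, 0.5), (0.2, 0.1), (0.6, 0.7)]
rewards = [detector_reward(s, f) for s, f in group]
advantages = group_relative_advantages(rewards)

for (s, f), r, a in zip(group, rewards, advantages):
    print(f"p_fake=({s:.1f}, {f:.1f})  reward={r:.2f}  advantage={a:+.2f}")
```

In this toy group, the third image is rated least likely to be synthetic and receives the largest positive advantage, while the first receives the most negative one. In a GRPO update, those advantages would weight the policy-gradient term for each sample.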

Comparison

Comparison with other methods

Gallery

Gallery images 1 to 40

Citation

@article{ye2025realgen,
  title={RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards},
  author={Ye, Junyan and Zhu, Leiqi and Guo, Yuncheng and Jiang, Dongzhi and Huang, Zilong and Zhang, Yifan and Yan, Zhiyuan and Fu, Haohuan and He, Conghui and Li, Weijia},
  journal={arXiv preprint arXiv:2512.00473},
  year={2025}
}