Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
- MaskGIT: Masked Generative Image Transformer [CVPR 2022]
- Muse: Text-To-Image Generation via Masked Generative Transformers [ICML 2023]
- [🌟]Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis [ICLR 2025]
- Bag of Design Choices for Inference of High-Resolution Masked Generative Transformer
- Di[𝙼]O: Distilling Masked Diffusion Models into One-step Generator [ICCV 2025]
- [🌟]Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model [ICLR 2026]
- DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer [ICCV 2025]
- MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
- Lavida-O: Elastic Large Masked Diffusion Models for Unified Multimodal Understanding and Generation
- [🌟]Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding
- Token Painter: Training-Free Text-Guided Image Inpainting via Mask Autoregressive Models
- TR2-D2: Tree Search Guided Trajectory-Aware Fine-Tuning for Discrete Diffusion
- OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
- Diffuse Everything: Multimodal Diffusion Models on Arbitrary State Spaces [ICML 2025]
- Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy [NeurIPS 2025]
- [🌟]From Masks to Worlds: A Hitchhiker's Guide to World Models
- Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
- Accelerating Inference of Masked Image Generators via Reinforcement Learning
- Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models
- Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
- MaskFocus: Focusing Policy Optimization on Critical Steps for Masked Image Generation
- Co-GRPO: Co-Optimized Group Relative Policy Optimization for Masked Diffusion Model

More papers are coming soon! See MeissonFlow Research (Organization Card) for more about our vision.
Meissonic is a non-autoregressive text-to-image synthesis model based on masked image modeling (MIM). It generates high-resolution images and is designed to run on consumer graphics cards.
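To make "non-autoregressive masked image modeling" a little more concrete: models in the MaskGIT family start from a fully masked token grid and reveal many tokens in parallel at each decoding step. The sketch below computes how many tokens remain masked after each step under a cosine schedule of the kind described in the MaskGIT paper. It is an illustration only. The function name, the 64x64 token grid, and the floor rounding are assumptions, and Meissonic's actual scheduler may differ.

```python
import math


def cosine_mask_schedule(num_tokens: int, num_steps: int) -> list[int]:
    """Return how many tokens are still masked after each decoding step."""
    schedule = []
    for step in range(1, num_steps + 1):
        # Fraction of tokens left masked: 1.0 at the start, 0.0 at the end.
        ratio = math.cos(math.pi / 2 * step / num_steps)
        schedule.append(max(0, math.floor(num_tokens * ratio)))
    return schedule


if __name__ == "__main__":
    # Hypothetical sizes: a 64x64 token grid decoded in 64 steps.
    remaining = cosine_mask_schedule(num_tokens=64 * 64, num_steps=64)
    print(remaining[:4], "...", remaining[-4:])
```

Because the cosine curve is flat near zero, only a few tokens are committed in the early steps and most are filled in toward the end, once more context is available.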
Key Features:
- 🖼️ High-resolution image generation (up to 1024x1024)
- 💻 Designed to run on consumer GPUs
- 🎨 Versatile applications: text-to-image, image-to-image
git clone https://github.com/viiika/Meissonic/
cd Meissonic

conda create --name meissonic python
conda activate meissonic
pip install -r requirements.txt

git clone https://github.com/huggingface/diffusers.git
cd diffusers
pip install -e .

python app.py

python inference.py --prompt "Your creative prompt here"

python inpaint.py --mode inpaint --input_image path/to/image.jpg
python inpaint.py --mode outpaint --input_image path/to/image.jpg

Optimize performance with FP8 quantization:
Requirements:
- CUDA 12.4
- PyTorch 2.4.1
- TorchAO
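Before running the FP8 scripts, it can help to confirm what is actually installed. The following is a minimal sketch, not part of the repository: the helper names are made up for illustration, and it assumes the packages are published under the PyPI names `torch` and `torchao`. It reports the installed versions and the CUDA version PyTorch was built against so they can be compared with the requirements above.

```python
from importlib import metadata

# PyPI package names assumed for the requirements listed above.
PACKAGES = ("torch", "torchao")


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_requirements() -> dict:
    """Collect package versions plus the CUDA version PyTorch was built with."""
    report = {name: installed_version(name) for name in PACKAGES}
    report["cuda"] = None
    try:
        import torch

        report["cuda"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass
    return report


if __name__ == "__main__":
    for name, version in check_requirements().items():
        print(f"{name}: {version or 'not found'}")
```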
Note: Windows users can install TorchAO using:
pip install --pre torchao --index-url https://download.pytorch.org/whl/nightly/cpu

Command-line inference:
python inference_fp8.py --quantization fp8

Gradio for FP8 (select the quantization method under Advanced settings):
python app_fp8.py

| Precision (Steps=64, Resolution=1024x1024) | Batch Size=1 (Avg. Time) | Memory Usage |
|---|---|---|
| FP32 | 13.32s | 12GB |
| FP16 | 12.35s | 9.5GB |
| FP8 | 12.93s | 8.7GB |
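To read the table as relative savings, the short sketch below recomputes the reductions against the FP32 baseline. The numbers are copied from the table; the dictionary layout and function name are only for illustration.

```python
# Figures copied from the table above (64 steps, 1024x1024, batch size 1).
BENCHMARKS = {
    "FP32": {"seconds": 13.32, "memory_gb": 12.0},
    "FP16": {"seconds": 12.35, "memory_gb": 9.5},
    "FP8": {"seconds": 12.93, "memory_gb": 8.7},
}


def percent_saved(baseline: float, value: float) -> float:
    """Percentage reduction of `value` relative to `baseline`."""
    return (baseline - value) / baseline * 100


if __name__ == "__main__":
    base = BENCHMARKS["FP32"]
    for name in ("FP16", "FP8"):
        row = BENCHMARKS[name]
        mem = percent_saved(base["memory_gb"], row["memory_gb"])
        time = percent_saved(base["seconds"], row["seconds"])
        print(f"{name}: {mem:.1f}% less memory, {time:.1f}% less time than FP32")
```

On these figures, FP8 mainly reduces memory (about 27.5% less than FP32) rather than latency: its average time of 12.93s is slightly higher than FP16's 12.35s.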
To train Meissonic, follow these steps:
- Install dependencies:
  cd train
  pip install -r requirements.txt
- Download the Meissonic base model from Hugging Face.
- Prepare your dataset:
  - Use the sample dataset: MeissonFlow/splash
  - Or prepare your own dataset and dataset class, following the format at line 100 of dataset_utils.py and lines 656-680 of train_meissonic.py
  - Modify train.sh with your dataset path
- Start training:
  bash train/train.sh
Note: For custom datasets, you'll likely need to implement your own dataset class.
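As a starting point for a custom dataset, the sketch below pairs each image file with a caption stored in a same-named `.txt` file. Everything in it is hypothetical: the class name, the folder layout, and the returned keys (`image_path`, `prompt`) are assumptions for illustration and do not reflect the format expected by dataset_utils.py, which remains the reference. It implements only the map-style `__len__`/`__getitem__` protocol and leaves image loading, resizing, and tokenization to the training code.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


class FolderCaptionDataset:
    """Hypothetical map-style dataset: <name>.jpg paired with <name>.txt."""

    def __init__(self, root: str):
        self.samples = []
        for image_path in sorted(Path(root).iterdir()):
            if image_path.suffix.lower() not in IMAGE_SUFFIXES:
                continue
            caption_path = image_path.with_suffix(".txt")
            if caption_path.exists():  # skip images that have no caption file
                self.samples.append((image_path, caption_path))

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> dict:
        image_path, caption_path = self.samples[index]
        prompt = caption_path.read_text(encoding="utf-8").strip()
        # Key names are assumptions; adapt them to what the training script reads.
        return {"image_path": str(image_path), "prompt": prompt}
```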
If you find this work helpful, please consider citing:
@article{bai2024meissonic,
title={Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis},
author={Bai, Jinbin and Ye, Tian and Chow, Wei and Song, Enxin and Chen, Qing-Guo and Li, Xiangtai and Dong, Zhen and Zhu, Lei and Yan, Shuicheng},
journal={arXiv preprint arXiv:2410.08261},
year={2024}
}

We thank the community and contributors for their invaluable support in developing Meissonic. We thank apolinario@multimodal.art for making the Meissonic demo. We thank @NewGenAI and @飛鷹しずか@自称文系プログラマの勉強 for making YouTube tutorials. We thank @pprp for implementing FP8 and INT4 quantization. We thank @camenduru for making the Jupyter tutorial. We thank @chenxwh for making the Replicate demo and API. We thank Collov Labs for reproducing Monetico. We thank Shitong et al. for identifying effective design choices for enhancing visual quality.

Made with ❤️ by MeissonFlow Research




