MGIE (MLLM-Guided Image Editing), introduced in the ICLR 2024 Spotlight paper "Guiding Instruction-based Image Editing via Multimodal Large Language Models", demonstrates how a multimodal LLM can parse natural-language editing instructions and drive image transformations accordingly. The project focuses on making edits explainable and controllable: the model interprets text guidance, reasons over image content, and derives expressive instructions that steer the edit toward user intent. This bridges the gap between free-form prompts and precise edits by letting users describe "what" and "where" in everyday language. The repo includes setup instructions, examples, and links that situate MGIE within Apple's broader line of multimodal research. For practitioners, MGIE offers a blueprint for text-to-edit systems that are more semantically grounded than prompt-only pipelines.
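
Below is a minimal sketch of an MGIE-style text-to-edit loop. It assumes a diffusers InstructPix2Pix pipeline as the editing backend, and the expand_instruction helper is hypothetical: in MGIE itself this step is a fine-tuned multimodal LLM that sees the image and rewrites the terse user request into an expressive edit plan, and the repo's actual interface differs.

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInstructPix2PixPipeline

    def expand_instruction(instruction: str) -> str:
        # Hypothetical stand-in for MGIE's MLLM step, which derives an
        # explicit, visually grounded instruction from a terse request.
        return instruction + ", keeping the rest of the scene unchanged"

    # Load a public instruction-following editor as the diffusion backend.
    pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
        "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
    ).to("cuda")

    image = Image.open("input.jpg").convert("RGB")
    guidance = expand_instruction("make the sky look like sunset")

    # The expressive instruction conditions the editor, so the output
    # tracks the "what" and "where" of the plain-language request.
    edited = pipe(guidance, image=image, num_inference_steps=30,
                  image_guidance_scale=1.5).images[0]
    edited.save("edited.jpg")

The design point this illustrates is the split of responsibilities: the language model owns interpretation and planning, while the diffusion editor owns the pixel changes, which is what makes the edits more controllable than a prompt-only pipeline.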

Features

  • Natural-language instruction parsing for image editing
  • Multimodal reasoning that ties text plans to visual changes
  • Examples and demos aligned with the research paper
  • Fine-grained, region-aware editing behavior
  • Open code for reproducibility and adaptation
  • Basis for controllable, explainable image-editing agents

License

MIT License

Additional Project Details

Programming Language

Python

Related Categories

Python Large Language Models (LLM)

Registered

2025-10-08