TL;DR:
Instruct-GS2GS enables instruction-based editing of Gaussian Splatting scenes via a 2D diffusion model.
Overview
We propose a method for editing 3D Gaussian Splatting (3DGS) scenes with text instructions. Given a 3DGS reconstruction of a scene and the
collection of images used to reconstruct it, our method uses an image-conditioned diffusion model (InstructPix2Pix) to
iteratively edit the input images while optimizing the underlying scene, resulting in an optimized 3D
scene that respects the edit instruction. We demonstrate that our method can edit large-scale,
real-world scenes and accomplish realistic, targeted edits.
How it works
Our method gradually updates a reconstructed 3DGS scene by iteratively editing the dataset images while
training the 3DGS (a code sketch of this loop follows the steps below):
Images are rendered from the scene at training viewpoints.
The renders are edited by InstructPix2Pix given a global text instruction.
The training dataset images are replaced with the edited images.
The 3DGS continues training as usual for 2.5k iterations.
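As a concrete illustration, here is a minimal Python sketch of this loop built on the diffusers InstructPix2Pix pipeline. The render_view and train_step functions and the scene/dataset objects are placeholders for the gsplat/Nerfstudio machinery rather than real APIs, and the guidance scales and number of edit rounds are illustrative assumptions, not the exact settings used in our method.

import torch
from diffusers import StableDiffusionInstructPix2PixPipeline

# Load the InstructPix2Pix editing model (standard diffusers usage).
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

def edit_and_train(scene, dataset, instruction, rounds=4, iters_per_round=2500):
    # Alternate between editing the full training dataset and optimizing the 3DGS.
    for _ in range(rounds):                                # number of rounds is illustrative
        # Step 1-3: render every training view, edit it, and replace the dataset image.
        for idx, cam in enumerate(dataset.cameras):
            render = render_view(scene, cam)               # placeholder: render a PIL image at this viewpoint
            edited = pipe(
                instruction,                               # global text instruction, e.g. "make it look like autumn"
                image=render,
                num_inference_steps=20,
                guidance_scale=7.5,                        # text guidance strength (illustrative)
                image_guidance_scale=1.5,                  # fidelity to the rendered view (illustrative)
            ).images[0]
            dataset.images[idx] = edited                   # replace the training image with the edited one
        # Step 4: continue standard 3DGS training on the edited dataset for 2.5k iterations.
        for _ in range(iters_per_round):
            train_step(scene, dataset)                     # placeholder: one gsplat optimization step

Editing the whole dataset at once before each training round keeps the edited views roughly consistent with each other, and the subsequent 3DGS optimization consolidates them into a single coherent edited scene.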
Results
Training Progression
Timelapse of training with Instruct-GS2GS and Instruct-NeRF2NeRF. We note that Instruct-GS2GS performs edits much faster.
Citation
If you use this work or find it helpful, please consider citing:
@misc{igs2gs,
  author = {Vachha, Cyrus and Haque, Ayaan},
  title = {Instruct-GS2GS: Editing 3D Gaussian Splats with Instructions},
  year = {2024},
  url = {https://instruct-gs2gs.github.io/}
}
Acknowledgments
We thank our instructors Alexei A. Efros and Angjoo Kanazawa for their support. This work was completed as a course project for CS180/280A. We also thank the Nerfstudio and gsplat teams for providing the 3D Gaussian Splatting implementation, and the authors of Instruct-NeRF2NeRF for their paper and website format.