Netflix Releases First Open-Source AI Model for Physics-Aware Video Editing

April 03, 2026 · 4 min read

Netflix has quietly entered the open-source AI arena with a remarkable debut. The streaming giant published VOID (Video Object and Interaction Deletion), its first-ever publicly available AI model, on Hugging Face — and the research community has taken notice. Unlike conventional video inpainting tools that simply erase objects and fill in backgrounds, VOID understands physical causality: remove a person holding a guitar, and the guitar falls to the ground. Remove someone carrying a mug, and the mug drops. It is a capability that no existing video object removal system has convincingly demonstrated.

The research paper, published on arXiv on April 2, 2026, was authored by a six-person team led by Saman Motamed, affiliated with both Netflix and INSAIT Sofia University. Co-authors include William Harvey, Benjamin Klein, Luc Van Gool, Zhuoning Yuan, and Ta-Ying Cheng. The work builds on CogVideoX-Fun-V1.5-5b-InP, a 3D transformer architecture with 5 billion parameters, and introduces a novel technique called "quadmask conditioning" — a four-value masking system that separately encodes the primary object to be removed, overlapping regions, affected areas where physics simulation is needed, and background regions to be preserved.
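To make the quadmask idea concrete, here is a minimal sketch of how a four-value mask might be assembled per frame. The label values, function name, and precedence rules below are illustrative assumptions, not details taken from the VOID paper:

```python
import numpy as np

# Illustrative quadmask encoding (label values are assumptions, not from
# the VOID paper): each pixel gets one of four classes the model
# conditions on.
BACKGROUND = 0  # preserve as-is
REMOVE = 1      # primary object to delete
OVERLAP = 2     # regions where the object overlaps other content
AFFECTED = 3    # areas where physics simulation must fill in motion

def build_quadmask(object_mask, overlap_mask, affected_mask):
    """Combine three boolean masks of shape (H, W) into one quadmask.

    Labels assigned later take precedence where the masks intersect.
    """
    quadmask = np.full(object_mask.shape, BACKGROUND, dtype=np.uint8)
    quadmask[affected_mask] = AFFECTED
    quadmask[overlap_mask] = OVERLAP
    quadmask[object_mask] = REMOVE
    return quadmask

# Toy 4x4 frame: a held object to remove (REMOVE), an occluded region
# (OVERLAP), and the fall path below it (AFFECTED).
obj = np.zeros((4, 4), dtype=bool); obj[0:2, 1] = True
ovl = np.zeros((4, 4), dtype=bool); ovl[1, 2] = True
aff = np.zeros((4, 4), dtype=bool); aff[2:4, 2] = True
quadmask = build_quadmask(obj, ovl, aff)
```

In a real pipeline this mask would be computed per frame and fed to the model alongside the video, but the exact conditioning interface is not specified in the article.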

The technical underpinnings of VOID reflect serious engineering investment. Training was conducted on eight NVIDIA A100 GPUs with 80GB of VRAM each, using DeepSpeed ZeRO Stage 2 optimization. The model was trained on synthetic data from two sources: HUMOTO, a dataset of human-object interactions rendered in Blender with full physics simulation, and Kubric, which captures object-to-object interactions using Google Scanned Objects. The resulting system processes video at 384×672 resolution for up to 197 frames, using BF16 precision with FP8 quantization. Running inference requires a GPU with at least 40GB of VRAM.
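The DeepSpeed ZeRO Stage 2 setup described above is normally driven by a configuration dictionary. A minimal sketch follows; the batch-size and accumulation values are placeholders, not figures from the paper, and only the Stage 2 and BF16 settings come from the article:

```python
# Minimal DeepSpeed ZeRO Stage 2 configuration of the kind the training
# setup describes. Batch sizes here are placeholders, not VOID's values.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # assumption: per-A100 batch
    "gradient_accumulation_steps": 8,     # assumption
    "bf16": {"enabled": True},            # BF16 precision, per the article
    "zero_optimization": {
        "stage": 2,  # partition optimizer state and gradients across GPUs
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}

# With the real library, this dict would be passed to
# deepspeed.initialize(model=model, config=ds_config, ...).
```

ZeRO Stage 2 shards optimizer state and gradients (but not parameters) across the eight GPUs, which is what makes fitting a 5B-parameter model's training state into 80GB per device practical.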

What sets VOID apart from established competitors like ProPainter, DiffuEraser, or commercial offerings from Runway and MiniMax is its physics-awareness. Traditional inpainting models treat object removal as a purely visual problem — paint over the object and hallucinate plausible background pixels. VOID treats it as a physical simulation problem, reasoning about what would happen in the scene once a supporting object or agent is removed. This distinction is critical for professional VFX workflows, where physically implausible results require expensive manual correction.

The model has been released under the Apache 2.0 license, making it fully available for both commercial and academic use. Within days of release, VOID had accumulated 172 likes on Hugging Face and 167 stars on its GitHub repository. An interactive demo is available online, and the reaction on social media has been overwhelmingly enthusiastic, with developers and content creators highlighting the model's physics simulation as a breakthrough for open-source video editing tools.

The strategic implications of this release extend well beyond the model itself. Netflix has long been recognized as one of the most sophisticated users of AI in the entertainment industry, deploying machine learning extensively for content recommendation, production optimization, and visual effects. However, the company has historically kept its AI research proprietary. Publishing VOID as a fully open model under a permissive license represents a meaningful shift in posture — one that positions Netflix as a contributor to, rather than merely a consumer of, the broader AI research ecosystem.

For the visual effects and post-production industry, VOID could prove transformative. Object removal is one of the most common and labor-intensive tasks in video editing, and a tool that handles physical consequences automatically could save significant time and cost across productions of all scales. As The Register noted, the model was preferred 64.8 percent of the time in human evaluations, with Runway a distant second at 18.4 percent. As the model is refined and the community builds on its open-source foundation, the gap between AI-assisted editing and manual VFX work may narrow considerably faster than many in the industry anticipated.

Sources & References

  1. VOID: Video Object and Interaction Deletion — arXiv
  2. VOID Project Page — void-model.github.io
  3. netflix/void-model — Hugging Face
  4. Netflix/void-model — GitHub
  5. Now even Netflix has its own video AI — The Register
  6. Netflix Releases VOID Video Inpainting Model — Let's Data Science
  7. VOID Interactive Demo — Hugging Face Spaces