How Pinecone Solved the Deletion Tax Problem

March 23, 2026 · 4 min read

Every team building systems with immutable storage eventually faces the same expensive reality: data you no longer need becomes a problem you can't stop paying for. At Pinecone, this manifested as a steady stream of files that never disappeared on their own, quietly becoming one of the largest line items in its infrastructure bill. The company calls this the deletion tax—the growing cost of data that can't yet be safely removed despite being technically obsolete.

Pinecone's data plane uses immutable blob storage, where every write produces a new file rather than modifying existing ones. This design keeps the write path clean but creates a fundamental coordination problem, because writing nodes and reading nodes operate separately: a writer cannot simply purge an old file when it's superseded, since reading nodes may still be actively serving live queries from that exact object. The result is constant waste that accumulates silently in storage.
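The coordination problem can be sketched with a toy append-only store. This is a minimal model, not Pinecone's actual API—`BlobStore`, `put`, and the manifest dictionary are illustrative names:

```python
import uuid

class BlobStore:
    """Append-only object store: writes never modify existing blobs."""
    def __init__(self):
        self.objects = {}   # object id -> bytes
        self.manifest = {}  # logical key -> current object id

    def put(self, key, data):
        # Every write creates a fresh object. The old object is left in
        # place: a reader that resolved the manifest before this write may
        # still be serving queries from it, so it can't be purged here.
        object_id = str(uuid.uuid4())
        self.objects[object_id] = data
        superseded = self.manifest.get(key)
        self.manifest[key] = object_id
        return superseded  # unreachable from the new manifest, but not deleted
```

After two writes to the same key, the first blob is garbage from the writer's point of view yet still physically present—exactly the waste the deletion tax describes.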

The solution emerged as Janitor, a system designed to identify unreachable objects, verify safety, and delete with full auditability. Janitor treats deletion as three distinct problems, each with its own cadence and failure profile. Normal mode handles everyday accumulation of files superseded by newer writes and no longer reachable from the manifest metadata structure. Orphan mode cleans up files the system doesn't know about at all, like blobs written by services that crashed before committing metadata. Customer deletion mode handles the most operationally sensitive scenario when customers delete indexes or namespaces.
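The three modes can be thought of as three different candidate-set computations over the same store. The sketch below is a hypothetical model—`Mode`, `candidate_set`, and the tenant bookkeeping are illustrative, not Pinecone's actual interfaces:

```python
from enum import Enum, auto

class Mode(Enum):
    NORMAL = auto()             # files superseded by newer writes
    ORPHAN = auto()             # blobs the metadata layer never learned about
    CUSTOMER_DELETION = auto()  # indexes/namespaces the customer removed

def candidate_set(mode, objects, reachable, tracked, deleted_tenants, owner):
    """Compute deletion candidates for one mode.

    objects:         every object id currently in blob storage
    reachable:       ids reachable from the current manifest metadata
    tracked:         ids the metadata layer knows about at all
    deleted_tenants: tenants whose indexes or namespaces were deleted
    owner:           object id -> tenant id
    """
    if mode is Mode.NORMAL:
        # Known to metadata, but no longer reachable from any manifest.
        return {o for o in objects if o in tracked and o not in reachable}
    if mode is Mode.ORPHAN:
        # Written by a service that crashed before committing metadata.
        return {o for o in objects if o not in tracked}
    # CUSTOMER_DELETION: everything owned by a deleted index or namespace.
    return {o for o in objects if owner.get(o) in deleted_tenants}
```

Separating the modes this way lets each run on its own cadence: normal mode frequently, orphan mode as a slower consistency check, customer deletion on demand.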

Janitor's workflow follows three phases regardless of mode: identify, verify, and execute. The system first computes a candidate set without deleting anything, treating this as just a hypothesis. It then rechecks this hypothesis against a fresh view of the world to handle propagation lag between different system components. Only after verification does it execute deletions, with every run built to be picked up cleanly if interrupted halfway through. This conservative approach ensures safety while managing scale.
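One pass through the three phases might look like the following sketch, assuming hypothetical callables `list_objects`, `reachable_now`, and `delete`. The checkpoint file is what lets an interrupted run resume cleanly, and the second `reachable_now()` call is the re-check against a fresh view:

```python
import json
import os

def run_cleanup_pass(list_objects, reachable_now, delete, checkpoint_path):
    """One cleanup pass: identify, verify, execute.

    The candidate set is only a hypothesis, so it is checkpointed to disk
    and re-checked against a fresh reachability view before anything is
    deleted. A run interrupted midway resumes from the checkpoint.
    """
    # Phase 1: identify -- or resume a previously interrupted run.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            candidates = set(json.load(f))
    else:
        candidates = set(list_objects()) - reachable_now()
        with open(checkpoint_path, "w") as f:
            json.dump(sorted(candidates), f)

    # Phase 2: verify the hypothesis against a *fresh* view of the world,
    # so propagation lag cannot turn a still-live object into a deletion.
    confirmed = candidates - reachable_now()

    # Phase 3: execute, recording each deletion for the audit trail.
    audit_log = []
    for obj in sorted(confirmed):
        delete(obj)
        audit_log.append(obj)
    os.remove(checkpoint_path)
    return audit_log
```

Note the conservative direction of the verify step: an object that became reachable between the two snapshots is dropped from the candidate set, never the reverse.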

Perhaps the most sophisticated aspect of Janitor's implementation involves its testing strategy. The system's hardest bugs don't appear in single runs but emerge from interactions of repeated writes, rebuilds, partial failures, customer deletes, and scheduled cleanups layered over time. Some scenarios play out over 30-day windows, but waiting a month to test isn't practical. Janitor uses property-based tests with a mock clock that collapses days and months into sub-second tests, generating hundreds of randomized sequences to compare real implementation against simplified reference models.
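A stripped-down version of this idea is sketched below, with a mock clock and a hand-rolled randomized sequence generator standing in for a full property-testing framework. All names, the 30-day retention model, and the reference oracle are illustrative assumptions, not Pinecone's code:

```python
import random

RETENTION = 30 * 86400  # 30-day recovery window, in seconds (illustrative)

class MockClock:
    """Collapses days of simulated time into instantaneous steps."""
    def __init__(self):
        self.now = 0
    def advance(self, days):
        self.now += days * 86400

def gc_sweep(clock, deleted_at, objects):
    """'Real' cleanup under test: drop objects deleted past the window."""
    for obj, t in list(deleted_at.items()):
        if clock.now - t >= RETENTION:
            objects.discard(obj)
            del deleted_at[obj]

def reference_live(events, now):
    """Simplified reference model: replay the whole event log and decide
    which objects should still exist at time `now`."""
    state = {}  # obj -> ("write" | "delete", timestamp); last event wins
    for t, op, obj in events:
        state[obj] = (op, t)
    return {obj for obj, (op, t) in state.items()
            if op == "write" or now - t < RETENTION}

def run_property_test(seed, steps=200):
    """One randomized sequence of writes and deletes spanning months of
    mock time; the implementation must agree with the reference model."""
    rng = random.Random(seed)
    clock, objects, deleted_at, events = MockClock(), set(), {}, []
    for _ in range(steps):
        clock.advance(rng.randint(0, 5))              # jump 0-5 days at a time
        obj = f"blob-{rng.randint(0, 9)}"
        if rng.random() < 0.6:                        # write (or rewrite)
            objects.add(obj)
            deleted_at.pop(obj, None)                 # a rewrite un-deletes
            events.append((clock.now, "write", obj))
        elif obj in objects and obj not in deleted_at:
            deleted_at[obj] = clock.now               # mark for deletion
            events.append((clock.now, "delete", obj))
        gc_sweep(clock, deleted_at, objects)          # scheduled cleanup
    assert objects == reference_live(events, clock.now), f"seed {seed}"
```

Two hundred steps at up to five mock days each covers roughly a year of interleaved writes, deletes, and sweeps, yet each seeded run finishes in well under a second.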

Before Janitor, deletion was a source of operational anxiety at Pinecone. Now the system runs continuously in the background, cleaning up reliably while generating audit trails that provide clear paths to understanding when unexpected events occur. For customers, this infrastructure remains invisible—queries don't fail because files were deleted too early, accidental index deletes remain recoverable within a 30-day window, and storage costs stay predictable instead of silently climbing. The system ensures teams don't have to think about garbage collection at all.

The approach demonstrates that safe deletion at scale requires layered defenses: second reachability checks before removal, durable checkpoints that make every run inspectable and restartable, and testing strategies that compress weeks of behavior into seconds. While the paper doesn't report quantitative results, it clearly indicates that Janitor has transformed deletion from a problematic cost center into a predictable operational component. The system represents a practical solution to a widespread problem in distributed systems using immutable storage architectures.

Limitations acknowledged in the paper include the fundamental tradeoff of immutable storage designs—while they reduce race conditions and simplify recovery, they inherently generate waste that requires sophisticated cleanup mechanisms. The system must constantly balance storage costs against reliability concerns, particularly given propagation lag where different system components update at different speeds. Janitor's conservative approach prioritizes safety over aggressive cleanup, meaning some waste may persist longer than theoretically necessary to ensure no active queries are disrupted.