AI Design's Core Puzzle: LLMs Hate Coordinates

March 25, 2026 · 4 min read

AI excels at generating code because web development has clear layout abstractions like Flexbox and CSS Grid, where relationships and relative sizes define layouts. Visual design lacks an equivalent standard, forcing AI tools to grapple with formats like the nearly 40-year-old PowerPoint's XML spec, which is packed with verbose, absolute XY coordinates. Large language models are notoriously poor at reasoning about these numerical coordinates, leading to designs that look generic and uninspired. This fundamental mismatch between how LLMs think and how traditional design software represents layouts creates a significant barrier to producing AI-generated designs that are both technically correct and visually appealing.

Moda, an AI-native design platform for non-designers, developed a solution by creating a context representation layer that gives its AI agents a cleaner, more compact view of the canvas. Instead of feeding raw coordinate-heavy data, the system provides layout abstractions that LLMs can reason about effectively, similar to the principles that make them good at web development. This approach reduces token costs and improves output quality by aligning the agent's input with its natural strengths in understanding structure and relationships. As co-founder Ravi Parikh notes, LLMs are not good at math, so relying on XY coordinates is a poor way for them to describe where elements should go.
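To make the idea concrete, here is a minimal sketch of what a context representation layer might do. It is purely illustrative (Moda's actual implementation is proprietary): it takes coordinate-heavy canvas elements and emits a compact, flexbox-like description of rows and relative widths that an LLM can reason about.

```python
# Hypothetical sketch of a context representation layer: instead of feeding
# the agent raw absolute coordinates, describe the canvas as layout
# relationships (rows, relative widths). All names are illustrative.

def to_layout_abstraction(elements):
    """Convert coordinate-heavy canvas elements into a compact,
    relationship-based description (a flexbox-like view)."""
    # Sort top-to-bottom, then left-to-right, and group elements into rows
    # when their vertical spans overlap an existing row member.
    rows = []
    for el in sorted(elements, key=lambda e: (e["y"], e["x"])):
        for row in rows:
            if any(abs(el["y"] - other["y"]) < other["h"] for other in row):
                row.append(el)
                break
        else:
            rows.append([el])

    canvas_w = max(e["x"] + e["w"] for e in elements)
    lines = []
    for i, row in enumerate(rows, 1):
        parts = [
            f'{e["type"]} ({round(e["w"] / canvas_w * 100)}% width)'
            for e in sorted(row, key=lambda e: e["x"])
        ]
        lines.append(f"row {i}: " + " | ".join(parts))
    return "\n".join(lines)


canvas = [
    {"type": "heading", "x": 40, "y": 40, "w": 880, "h": 80},
    {"type": "image", "x": 40, "y": 160, "w": 420, "h": 300},
    {"type": "body-text", "x": 500, "y": 160, "w": 420, "h": 300},
]
print(to_layout_abstraction(canvas))
```

The output is a handful of short relational lines ("row 2: image (46% width) | body-text (46% width)") rather than hundreds of raw coordinates, which is both cheaper in tokens and far closer to how the model already reasons about web layouts.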

The platform's architecture centers on a multi-agent system built with Deep Agents, using LangSmith for observability to enable rapid iteration. Every user request passes through a lightweight triage node that classifies the output format and pre-loads relevant skills, which are Markdown documents containing design best practices and guidelines. These skills are injected dynamically, with prompt caching breakpoints to maintain efficiency while allowing context-specific adjustments. The main agent operates with 12 to 15 core tools always in context, plus an additional 30 available on demand for specialized tasks, balancing token usage against flexibility.
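The triage pattern described above can be sketched as follows. This is not Moda's code; the format keywords, skill filenames, and tool names are all hypothetical, but the shape matches the description: classify the request, pre-load the matching Markdown skill, and assemble a core tool set plus format-specific on-demand tools.

```python
# Illustrative sketch of a lightweight triage node: classify the output
# format, load the matching skill document (a Markdown best-practices guide),
# and expose core tools plus format-specific on-demand tools.
# All names here are hypothetical.

from pathlib import Path

CORE_TOOLS = ["create_frame", "add_text", "add_image", "set_fill"]  # always in context
ON_DEMAND_TOOLS = {
    "slide_deck": ["add_slide", "reorder_slides"],
    "social_post": ["resize_canvas"],
}

FORMAT_KEYWORDS = {
    "slide_deck": ("deck", "slides", "presentation"),
    "social_post": ("instagram", "post", "banner"),
}


def triage(request: str, skills_dir: Path) -> dict:
    """Classify the request's output format and assemble the agent's context."""
    output_format = "generic"
    for fmt, keywords in FORMAT_KEYWORDS.items():
        if any(k in request.lower() for k in keywords):
            output_format = fmt
            break

    # Pre-load the skill document for this format, if one exists on disk.
    skill_path = skills_dir / f"{output_format}.md"
    skill_text = skill_path.read_text() if skill_path.exists() else ""

    return {
        "format": output_format,
        "skills": skill_text,  # injected after a prompt-cache breakpoint
        "tools": CORE_TOOLS + ON_DEMAND_TOOLS.get(output_format, []),
    }
```

Placing the dynamically injected skill text after a cache breakpoint keeps the large, stable prefix of the prompt cacheable while still allowing per-request adjustments.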

For complex projects like 20-slide decks, Moda dynamically manages context by providing the agent with a higher-level summary and letting it pull in details as needed through tooling. This keeps token usage bounded without sacrificing the agent's ability to make informed design decisions across multi-page layouts. LangSmith's cost tracking per node was instrumental in finding the right balance between context richness and efficiency, allowing the team to optimize performance based on real-time data. The system's design ensures that most requests avoid unnecessary tool activation, keeping operations lean and responsive.
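A minimal sketch of this bounded-context pattern, under assumed data structures (not Moda's actual ones): the agent's standing context holds one summary line per slide, and a tool call fetches a single slide's full contents only when needed.

```python
# Hypothetical sketch of bounded context for a large deck: the agent always
# sees a one-line-per-slide summary, and pulls full detail for one slide at
# a time via a tool call. Data shapes are illustrative.

deck = {
    i: {
        "title": f"Slide {i}",
        "elements": [{"type": "body-text", "content": f"Details for slide {i}"}],
    }
    for i in range(1, 21)  # a 20-slide deck
}


def deck_summary(deck: dict) -> str:
    """High-level view kept in the agent's context at all times."""
    return "\n".join(
        f'{i}: "{s["title"]}" ({len(s["elements"])} elements)'
        for i, s in deck.items()
    )


def get_slide_detail(deck: dict, slide_id: int) -> dict:
    """Tool the agent calls to pull one slide's full contents on demand."""
    return deck[slide_id]
```

The standing context then grows linearly with the number of slides (one summary line each) instead of with the total element count, which is what keeps token usage bounded on multi-page projects.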

A key product choice is the interaction model: rather than a generate-and-replace flow, Moda's AI works directly on a fully editable 2D vector canvas where every element is immediately selectable and modifiable by the user. This shifts the dynamic from a binary accept-or-reject scenario to a collaborative process where the AI provides a solid starting point and the user refines it. The always-present, context-aware sidebar supports iterative back-and-forth, reducing the intimidation of a blank canvas while maintaining user control over the final design. This approach has resonated particularly with B2B companies in enterprise sales who need polished, brand-accurate materials quickly.

Moda does not yet run formal evaluations, but LangSmith traces serve as the primary feedback loop for catching regressions and validating improvements during development. The team uses these traces extensively to iterate on prompts, tool sets, and context representations, accelerating their workflow. Early product-market fit has been strong, with users benefiting from a professional starting point that they can actively refine rather than a static output. Future plans include completing the Design Agent's migration to Deep Agents, expanding the brand context system for enterprise customers, and enabling memory features using primitives that are already in place.

The platform's reliance on Deep Agents and LangSmith highlights the importance of observability in building production-grade AI systems. By providing full visibility into every execution, these tools enable rapid debugging and optimization, which is critical for handling complex, multi-turn tasks in a design context. This infrastructure allows Moda to maintain high-quality outputs while scaling to meet the needs of diverse users, from marketers to small business owners. The approach demonstrates how targeted engineering can overcome inherent limitations in LLMs when applied to creative domains.

Limitations include the proprietary nature of the context representation layer (though the general principle is public) and the still-in-progress migration of the Design Agent from a custom LangGraph loop to Deep Agents. The team acknowledges that formal evals are on the roadmap, signaling a continued focus on iterative improvement grounded in trace data. These constraints reflect the realities of balancing innovation with practical deployment in a fast-evolving AI landscape, where user needs and technical capabilities must align to deliver reliable, high-quality design assistance.