Why LLMs Ignore Your Non-Functional Requirements (And How to Fix It)
AI coding agents are excellent at building what you ask for. They're terrible at making it fast, secure, accessible, and observable — because non-functional requirements are exactly the kind of cross-cutting, implicit constraint that LLMs handle worst.
*You asked the AI to build a search feature. It built a search feature. It's also slow, inaccessible, unobservable, and doesn't handle errors. You forgot to ask for those parts.*
---
> **Non-functional requirements (NFRs) in AI-generated code** are the quality attributes — performance, security, observability, accessibility, and reliability — that LLMs systematically ignore because they're implicit, cross-cutting, and absent from prompts. NFR coverage actually *decreases* as vibecoded projects grow: from 25% at the prototype stage to just 12% in production codebases, because the total NFR count grows faster than any human can track in prompts.
There are two kinds of requirements in software:
**Functional requirements** describe *what* the system does: "Users can search products by name." "The checkout flow processes payments." "Admins can export reports as CSV."
**Non-functional requirements (NFRs)** describe *how well* the system does it: "Search returns results in under 200ms." "Payment processing retries on transient failures." "Reports are accessible to screen readers." "All API calls are logged with correlation IDs."
LLMs are remarkably good at functional requirements. Describe what you want, and you'll get working code that does the thing.
LLMs are remarkably bad at non-functional requirements. And this isn't a temporary limitation waiting for the next model release. It's a structural mismatch between how NFRs work and how LLMs generate code.
## The Five NFR Categories LLMs Ignore
### 1. Performance
**What you need:** P95 response time under 200ms. Efficient database queries. Pagination on large collections. Connection pooling. Caching for repeated reads.
**What you get:** Functionally correct code that fetches all records, processes them in memory, makes a new database connection per request, and returns the full dataset without pagination.
**Why:** The LLM optimizes for *correctness*, not *efficiency*. `SELECT * FROM products` is correct. It also loads the entire table on every request, so it falls over once that table holds 100,000 rows. Performance depends on production data volume, which isn't in the prompt.
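As a rough illustration of the difference, here is a minimal sketch of a bounded, keyset-paginated query builder. It assumes a `products` table with numeric `id` and text `name` columns and PostgreSQL-style `$1` placeholders. The `PageRequest` and `PageQuery` shapes, the function name, and the page-size limits are all hypothetical, chosen only for the example.

```typescript
// Hypothetical shapes for illustration; not taken from any specific library.
interface PageRequest {
  limit?: number;   // page size requested by the client
  afterId?: number; // id of the last row the client has already seen
}

interface PageQuery {
  text: string;     // SQL with $1/$2 placeholders (PostgreSQL style, assumed)
  values: number[]; // values bound to the placeholders, in order
}

// Assumed limits; tune them to your own latency budget.
const DEFAULT_PAGE_SIZE = 20;
const MAX_PAGE_SIZE = 100;

function buildProductPageQuery(req: PageRequest): PageQuery {
  // Fall back to the default when the client sends nothing usable.
  const requested =
    typeof req.limit === "number" && Number.isFinite(req.limit)
      ? Math.trunc(req.limit)
      : DEFAULT_PAGE_SIZE;

  // Clamp so a client can never ask for the whole table in one call.
  const limit = Math.min(Math.max(requested, 1), MAX_PAGE_SIZE);

  const afterId =
    typeof req.afterId === "number" && Number.isFinite(req.afterId)
      ? Math.trunc(req.afterId)
      : 0;

  // Keyset pagination: resume after the last seen id instead of scanning an OFFSET.
  return {
    text: "SELECT id, name FROM products WHERE id > $1 ORDER BY id LIMIT $2",
    values: [afterId, limit],
  };
}
```

The resulting `text` and `values` pair can be handed to whichever database client you use. The point of the sketch is that the row cap lives in the code, so it holds regardless of what the caller or the prompt asked for.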
### 2. Security
**What you need:** Input validation. Parameterized queries. Auth on all endpoints. Rate limiting. Secret management. CORS restrictions.
**What you get:** Code that trusts all input, uses string interpolation for queries, and skips auth on the "obvious" endpoints. (See our full breakdown of [AI-generated code security vulnerabilities](/blog/posts/ai-generated-code-security).)
**Why:** Security is the absence of vulnerabilities, which makes it a negative requirement. LLMs generate what should be *present*, not what should *not* be possible. Secure defaults only show up when the constraints are stated explicitly in the prompt or the surrounding context.
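To make "trusts all input" concrete, here is a minimal sketch of the opposite approach for a product search: validate the term, then pass it as a bound parameter instead of interpolating it into the SQL string. It assumes PostgreSQL syntax (`ILIKE`, `$1` placeholders) and a `products` table with `id` and `name` columns. `SearchQuery`, `buildSearchQuery`, and the 100-character cap are hypothetical names and limits chosen for the example.

```typescript
// Hypothetical shape for illustration; not taken from any specific library.
interface SearchQuery {
  text: string;     // SQL with a $1 placeholder (PostgreSQL style, assumed)
  values: string[]; // user input travels here, never inside `text`
}

const MAX_TERM_LENGTH = 100; // assumed cap for the example

function buildSearchQuery(rawTerm: unknown): SearchQuery {
  // Input validation: reject anything that is not a reasonable search string.
  if (typeof rawTerm !== "string") {
    throw new TypeError("search term must be a string");
  }
  const term = rawTerm.trim();
  if (term.length === 0 || term.length > MAX_TERM_LENGTH) {
    throw new RangeError(`search term must be 1-${MAX_TERM_LENGTH} characters`);
  }

  // Escape LIKE wildcards so a user cannot widen the match with % or _.
  const escaped = term.replace(/[\\%_]/g, (ch) => "\\" + ch);

  // The SQL text is a constant; only the bound value changes per request.
  return {
    text: "SELECT id, name FROM products WHERE name ILIKE $1 ESCAPE '\\' ORDER BY name LIMIT 50",
    values: [`%${escaped}%`],
  };
}
```

With this shape, a term like `x' OR '1'='1` ends up in `values` as plain data, and the SQL text is identical for every request.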
### 3. Observability
**What you need:** Structured logging with request context. Error tracking with stack traces. Metrics for latency, error rates, and throughput. Distributed tracing for multi-service architectures.
**What you get:** `console.log('done')`. Maybe a try-catch that swallows the error silently. No correlation IDs. No metrics. No way to debug issues in production.
**Why:** Observability has zero impact on whether the feature "works" in development. The LLM generates code that passes the functional test. That the code is impossible to debug in production is not a concern the model has any reason to surface.
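For contrast with `console.log('done')`, here is a minimal sketch of structured logging with a correlation ID, using only built-in JavaScript APIs. The helper names (`formatLogLine`, `withRequestLogging`) and the field names are hypothetical. A real project would more likely use an established logging library, and the sketch assumes the correlation ID is generated or read from a request header somewhere upstream.

```typescript
type LogLevel = "info" | "error";

// Build one JSON log line. Core keys come last so extra fields cannot overwrite them.
function formatLogLine(
  level: LogLevel,
  message: string,
  correlationId: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    ...fields,
    timestamp: new Date().toISOString(),
    level,
    message,
    correlationId,
  });
}

// Wrap an operation so that success and failure are both logged with timing.
async function withRequestLogging<T>(
  operation: string,
  correlationId: string,
  work: () => Promise<T>,
): Promise<T> {
  const startedAt = Date.now();
  try {
    const result = await work();
    console.log(
      formatLogLine("info", `${operation} succeeded`, correlationId, {
        durationMs: Date.now() - startedAt,
      }),
    );
    return result;
  } catch (err) {
    const error = err instanceof Error ? err : new Error(String(err));
    console.error(
      formatLogLine("error", `${operation} failed`, correlationId, {
        durationMs: Date.now() - startedAt,
        errorName: error.name,
        errorMessage: error.message,
        stack: error.stack,
      }),
    );
    // Rethrow instead of swallowing, so callers and error trackers still see the failure.
    throw err;
  }
}
```

A handler could call something like `withRequestLogging("product search", correlationId, () => runSearch(term))`, where `runSearch` stands in for your own code. Every log line for that request then carries the same ID and can be filtered together in production.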
### 4. Accessibility
**What you need:** WCAG 2.1 AA compliance. Semantic HTML. Keyboard navigation. Screen reader compatibility. Color contrast ratios. Focus management.
**What you get:** A `