An AI model will scaffold a working login form in 90 seconds. It will pick a reputable library. It will hash passwords with the right algorithm. The form will render, accept credentials, set a cookie, and redirect somewhere sensible. To a non-specialist reading the diff, it will look done.
What it will not do, even with that reputable library, is survive a serious security pass. The list of things consistently missing is short, specific, and almost identical across every AI-built codebase we've audited in the last year. None of these failures are exotic. Each is one OWASP search away. Cumulatively they are days of work, and nobody schedules the work until something goes wrong.
1. Session invalidation on password change
When a user changes their password, every existing session for that user should die. Not just the current one: every one. This is the single most consistently missed control we see. The default in many frameworks rotates the current session's CSRF token and calls it a day. Sessions on other devices and other browsers, including the one on the stolen laptop that prompted the password change in the first place, keep working.
The fix is small and entirely mechanical. Store a session_token_version integer on the user record. Include it in the signed session cookie. Bump it on password change, on email change, on explicit "log out everywhere". Reject any session whose embedded version doesn't match the current one. That's it. It's a one-afternoon change that closes a class of attacks the prototype was wide open to.
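A minimal sketch of that version check, using only the standard library. The User class, SECRET_KEY, and the cookie layout are assumptions made for illustration; a real application would lean on its framework's signed-cookie support and load the user by the id embedded in the cookie.

```python
import base64
import hashlib
import hmac
import json
from dataclasses import dataclass

# Hypothetical secret; in practice this comes from configuration, not source.
SECRET_KEY = b"replace-with-a-real-secret"


@dataclass
class User:
    id: int
    session_token_version: int = 0


def _mac(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()


def sign_session(user: User) -> str:
    """Embed the user id and current version in an HMAC-signed cookie value."""
    payload = json.dumps({"uid": user.id, "ver": user.session_token_version})
    body = base64.urlsafe_b64encode(payload.encode()).decode()
    return f"{body}.{_mac(body)}"


def verify_session(cookie: str, user: User) -> bool:
    """Accept the cookie only if the signature and the embedded version match."""
    body, sep, mac = cookie.rpartition(".")
    if not sep:
        return False
    if not hmac.compare_digest(mac.encode(), _mac(body).encode()):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["uid"] == user.id and claims["ver"] == user.session_token_version


def invalidate_all_sessions(user: User) -> None:
    """Call on password change, email change, or 'log out everywhere'."""
    user.session_token_version += 1
```

After invalidate_all_sessions runs, every cookie issued earlier fails verify_session, including the ones on devices the user no longer controls. The current device simply gets a fresh cookie from sign_session.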
2. Rate limiting on login and reset, with the right defaults
A login endpoint with no rate limit is a credential-stuffing target the moment it appears in a list. A reset endpoint with no rate limit is a free email-flood weapon pointed at your transactional sender. AI scaffolding rarely includes either, and when it does the limits are arbitrary numbers picked without reference to what's behind the endpoint.
The interesting question is fail-open versus fail-closed. If the limiter itself is unavailable — Redis is down, the in-memory counter restarted — does the endpoint accept the request or refuse it? The right default depends on what's behind it. For a login endpoint backed by a hardened identity provider with its own brute-force protection, fail-open is reasonable. For a custom reset flow that sends emails through a paid sender, fail-closed is the only safe choice. Pick deliberately. Document the decision next to the limiter.
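One way to make that decision explicit is to require it as a constructor argument, so a limiter cannot be created without choosing. The sketch below is illustrative only: RateLimiter, InMemoryBackend, and LimiterUnavailable are hypothetical names, and the in-memory backend stands in for Redis or whatever store actually holds the counters.

```python
import time
from typing import Callable, Dict, List


class LimiterUnavailable(Exception):
    """Raised by a backend when the counter store cannot be reached."""


class InMemoryBackend:
    """Stand-in for a shared store: sliding-window hit counter per key."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._hits: Dict[str, List[float]] = {}
        self._clock = clock

    def hit(self, key: str, window_seconds: float) -> int:
        now = self._clock()
        recent = [t for t in self._hits.get(key, []) if now - t < window_seconds]
        recent.append(now)
        self._hits[key] = recent
        return len(recent)


class BrokenBackend:
    """Simulates an outage of the counter store."""

    def hit(self, key: str, window_seconds: float) -> int:
        raise LimiterUnavailable("counter store unreachable")


class RateLimiter:
    def __init__(self, backend, limit: int, window_seconds: float, fail_open: bool) -> None:
        self.backend = backend
        self.limit = limit
        self.window_seconds = window_seconds
        # No default value: the caller has to decide, and the decision
        # is visible at the place where the limiter is constructed.
        self.fail_open = fail_open

    def allow(self, key: str) -> bool:
        try:
            count = self.backend.hit(key, self.window_seconds)
        except LimiterUnavailable:
            return self.fail_open
        return count <= self.limit


# Login sits behind an identity provider with its own protection: fail open.
login_limiter = RateLimiter(InMemoryBackend(), limit=5, window_seconds=60, fail_open=True)
# Reset sends paid email: fail closed.
reset_limiter = RateLimiter(InMemoryBackend(), limit=3, window_seconds=3600, fail_open=False)
```

The limits shown are placeholders, not recommendations; the point is that the fail_open argument sits next to them and gets reviewed with them.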
3. Email enumeration prevention
The reset and signup flows must return the same response shape whether the email exists in the database or not. Same status code, same body, same timing within a reasonable jitter. Everyone says they do this. Almost nobody actually does.
The usual tells: signup says "email already registered", reset says "we sent you a link" only when the account exists and "email not found" when it doesn't, password-change confirmation flows surface different errors based on which step failed. Each of those is a free user-enumeration oracle. The fix isn't subtle — return the same neutral response in every case, send the email asynchronously so timing isn't a leak, and put the "did we send anything" branch entirely on the server side where the attacker can't observe it. The bug is that the engineer writing the form thinks a clearer error message is a UX win. It is, for the legitimate user. It also publishes your customer list.
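A rough sketch of that shape for the reset flow. KNOWN_EMAILS and reset_jobs are hypothetical stand-ins for the user table and a background job queue, and the status code and message are placeholders. What matters is that the request handler never looks the address up.

```python
import queue
from typing import Dict, Set, Tuple

# One response for every outcome: same status code, same body.
NEUTRAL_RESPONSE: Tuple[int, Dict[str, str]] = (
    202,
    {"message": "If an account exists for that address, we have sent a reset link."},
)

# Hypothetical stand-ins for the user table and a background job queue.
KNOWN_EMAILS: Set[str] = {"alice@example.com"}
reset_jobs: "queue.Queue[str]" = queue.Queue()


def request_password_reset(email: str) -> Tuple[int, Dict[str, str]]:
    """Request handler: does no lookup, so its timing cannot depend on one."""
    reset_jobs.put(email.strip().lower())
    return NEUTRAL_RESPONSE


def process_reset_job(email: str) -> bool:
    """Worker side: the 'does this account exist' branch lives here only.

    Returns True if a mail would be sent. The value is for server-side
    logging and never reaches the client.
    """
    if email not in KNOWN_EMAILS:
        return False
    # The actual mail send would happen here.
    return True
```

Signup can follow the same pattern: accept the form, return the neutral response, and let the worker send either a confirmation link or a "you already have an account" email to the address itself.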
4. CSRF tokens scoped to the right operations
The framework default is "require a CSRF token on every POST". That's too broad and too narrow at the same time. Too broad because plenty of POSTs are public, idempotent, or already protected by other means and a token check just adds friction. Too narrow because the operations that actually need protection — password change, email change, fund movement, role assignment, anything mutating state in a way an attacker would care about — also live on PATCH, PUT, and DELETE. The middleware defaults don't always cover those uniformly.
What tightens it: enumerate the state-changing operations explicitly, decorate each handler, and treat anything not in the list as a deliberate decision rather than an oversight. Combine token checks with SameSite=Lax on session cookies for the simple cases and SameSite=Strict for the sensitive ones. Audit any endpoint that takes a redirect_to parameter and make sure the CSRF check happens before the redirect, not after. This is unglamorous and tedious. It's also where most of the real CSRF vulnerabilities live.
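The "explicit list" idea can be sketched as a pair of decorators plus a check that fails when a handler has neither. Everything here is hypothetical scaffolding (CSRF_DECISIONS, csrf_protected, csrf_exempt, and the dict-shaped request); a real project would wrap its framework's own CSRF machinery rather than compare tokens by hand.

```python
import functools
import hmac
from typing import Callable, Dict, List


class CsrfError(Exception):
    """Raised when a protected handler is called without a valid token."""


# Handler name -> "protected", or the written reason it is exempt.
CSRF_DECISIONS: Dict[str, str] = {}


def csrf_protected(handler: Callable) -> Callable:
    """Reject the request unless the supplied token matches the session's."""

    @functools.wraps(handler)
    def wrapper(request: dict, *args, **kwargs):
        expected = request.get("session_csrf_token") or ""
        supplied = request.get("csrf_token") or ""
        if not expected or not hmac.compare_digest(expected.encode(), supplied.encode()):
            raise CsrfError(handler.__name__)
        return handler(request, *args, **kwargs)

    CSRF_DECISIONS[handler.__name__] = "protected"
    return wrapper


def csrf_exempt(reason: str) -> Callable:
    """Record a deliberate exemption together with its justification."""

    def decorate(handler: Callable) -> Callable:
        CSRF_DECISIONS[handler.__name__] = f"exempt: {reason}"
        return handler

    return decorate


def undecided(handlers: List[Callable]) -> List[str]:
    """Names of handlers nobody has classified; suitable for a CI check."""
    return [h.__name__ for h in handlers if h.__name__ not in CSRF_DECISIONS]


@csrf_protected
def change_password(request: dict) -> str:
    return "password changed"


@csrf_exempt("public search endpoint, no state change")
def search(request: dict) -> str:
    return "results"


def delete_account(request: dict) -> str:  # nobody decided: the check flags it
    return "deleted"
```

Running undecided over the route table in CI turns "not in the list" from an oversight into a failing build. Because the decorator attaches to the handler rather than the HTTP method, PATCH, PUT, and DELETE routes get the same treatment as POST.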
5. A permission model, once, consistently
The default pattern in an AI-scaffolded codebase is per-route ad-hoc checks: if user.is_admin or user.id == resource.owner_id, scattered across thirty view functions. It works on day one. By month six it has rotted: one view checks is_admin, another checks is_staff, a third checks groups.filter(name='admin').exists(), and the difference between them is undocumented because nobody remembers writing it that way.
The right shape is a single permission layer that every protected operation routes through. Whether you call it RBAC, ABAC, a policy module, or just a can(user, action, resource) function — pick one, write it down, and make per-route checks a violation reviewers reject in PRs. The cost is a week of consolidation up front. The benefit is that the next security audit finds the answer to "who can do X" by reading one file instead of grepping for thirty regex variations.
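As an illustration of the smallest version of this, here is a can(user, action, resource) function backed by one policy table. The action names, the roles field, and the rule helpers are invented for the example; the structure is what matters, not the specific rules.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class User:
    id: int
    roles: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class Resource:
    owner_id: int


Rule = Callable[[User, Resource], bool]


def is_admin(user: User, resource: Resource) -> bool:
    return "admin" in user.roles


def is_owner(user: User, resource: Resource) -> bool:
    return user.id == resource.owner_id


def any_of(*rules: Rule) -> Rule:
    """Combine rules: the action is allowed if at least one rule passes."""
    return lambda user, resource: any(rule(user, resource) for rule in rules)


# The one file an auditor reads to answer "who can do X".
POLICY: Dict[str, Rule] = {
    "invoice.read": any_of(is_admin, is_owner),
    "invoice.delete": is_admin,
}


def can(user: User, action: str, resource: Resource) -> bool:
    """Single entry point for authorization. Unknown actions are denied."""
    rule = POLICY.get(action)
    return rule is not None and rule(user, resource)
```

A view then calls can(request_user, "invoice.delete", invoice) and nothing else. Denying unknown actions by default means a typo in an action name fails closed instead of silently granting access.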
6. Multi-tenant isolation at the database level, not in application code
This is the one that ages worst. The prototype enforces tenant isolation in the ORM: every query filters on tenant_id. Code review catches it. Tests cover it. Then someone refactors, introduces a new query path through a Celery task, forgets the filter, and one tenant's invoices appear in another tenant's dashboard.
There are two defences worth having. Row-level security in Postgres pushes the enforcement into the database: the connection sets a session variable, policies on each table require that variable to match the row's tenant_id, and a query that forgets the filter sees only the current tenant's rows (or no rows at all, if the variable was never set) instead of the wrong tenant's data. The trade-off is operational: every connection needs the variable set correctly, connection pooling becomes more delicate, and debugging "why does this query return nothing" becomes a new failure mode. The alternative, scoped queries with an explicit tenant_id on every call, keeps the complexity in application code where it's easier to reason about, but relies on discipline that erodes the moment someone new joins the team. Pick one. We default to RLS on anything with more than two tenants, because the operational cost is bounded and the data-leak cost is not.
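A sketch of the RLS side, under several assumptions: a table named invoices with an integer tenant_id column, a custom setting named app.current_tenant, and a DB-API connection (psycopg-style %s placeholders) that is not in autocommit mode. None of these names come from a real schema. The SQL is held in a Python string so it can live in a migration next to the helper.

```python
from contextlib import contextmanager
from typing import Iterator

# Run once per tenant-scoped table, e.g. from a migration.
# FORCE makes the policy apply to the table owner as well; superusers and
# roles with BYPASSRLS still skip it, so the app should not connect as one.
# NULLIF turns an empty setting into NULL, which matches no row.
RLS_SETUP_SQL = """
ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
ALTER TABLE invoices FORCE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON invoices
    USING (tenant_id = NULLIF(current_setting('app.current_tenant', true), '')::int);
"""


@contextmanager
def tenant_transaction(conn, tenant_id: int) -> Iterator:
    """Yield a cursor whose queries are confined to one tenant.

    The third argument to set_config (is_local = true) scopes the setting
    to the current transaction, so a pooled connection does not carry one
    tenant's id into the next request.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT set_config('app.current_tenant', %s, true)",
            (str(tenant_id),),
        )
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Used as "with tenant_transaction(conn, tenant_id) as cur: cur.execute('SELECT * FROM invoices')", the query has no tenant filter of its own and still cannot see another tenant's rows. A background task that skips the helper entirely gets an empty result, which is the failure mode you want to be debugging.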
The work nobody schedules
Each of these is one OWASP search away. None require novel research. A diligent engineer with a checklist could work through them in a week or two. The reason they don't get done is that nobody schedules the work. The login form already exists. It already accepts logins. The product manager wants the next feature. The security review is theoretical until it isn't.
We do this work because we've been the team on call when the review stops being theoretical. There's a particular tone an engineering email takes on at 9pm on a Wednesday when an external researcher has found three of the six items above in the same afternoon, and it's not a tone anyone wants to learn the second time.
This is not glamorous engineering. It is the kind of work that takes a sprint to get right and never gets celebrated when it ships, because nothing visibly changes. It is also the floor below which "production-grade" doesn't really mean anything. If the auth surface hasn't survived a real OWASP pass, the rest of the system's reliability is decoration.


