deploymill://docs/troubleshooting

Troubleshooting & getting help

When something goes wrong, work through it in this order: read the structured error, read the logs, read the relevant guide, then escalate. This page covers the common failures and how to recover from each.

Start with the error code, not the message

Every tool returns structured output. Failures carry a machine-readable code you can branch on, so don't parse the human-readable message. That code arrives on one of two channels, so check both:

Error result (most tools). The call fails as an MCP error whose body is `{ "error": <message>, "code": <machine-code> }`. Branch on `code` (e.g. `not_authorized`, `not_found`, `upgrade_required`, `cap_reached`, `internal_error`).
In-band result (tools that model expected failure as data). The call succeeds at the transport level and returns a discriminated object: `{ ok: true, ... }` on success or `{ ok: false, errorCode, message, ... }` on an expected, recoverable failure. Branch on `ok` first, then on `errorCode`. Object storage, backups, database tools, previews, and `set_app_protection` use this shape. (One legacy tool, `attach_domain`, still spells the in-band fields `code`/`reason` instead of `errorCode`/`message`; read those fields on that tool specifically.)

Multi-step tools (`start_project`, `import_repo`) are in-band and return `{ ok: false, failedAt, partial }` on failure, so you can resume from the exact step that failed rather than starting over.
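One way to handle both channels in a single place is to normalize every failure into one shape before branching. The following is a minimal sketch, not DeployMill client code: `normalizeOutcome`, `asText`, the `Outcome` type, and the `missing_code` placeholder are hypothetical names, and it assumes the legacy `attach_domain` result still carries the `ok` discriminator alongside `code`/`reason`.

```typescript
// Normalized view of a tool call, whichever channel the failure arrived on.
type Outcome =
  | { ok: true }
  | { ok: false; code: string; message: string };

// Hypothetical placeholder for a failure that carries no code at all.
const MISSING_CODE = "missing_code";

// Returns the value if it is a string, otherwise the fallback.
function asText(value: unknown, fallback: string): string {
  return typeof value === "string" ? value : fallback;
}

// `body` is either the body of an MCP error ({ error, code }) or an in-band
// result ({ ok, ... }). `toolName` is only needed for the legacy field names.
function normalizeOutcome(toolName: string, body: Record<string, unknown>): Outcome {
  // In-band channel: a boolean `ok` discriminator is present.
  if (typeof body["ok"] === "boolean") {
    if (body["ok"] === true) return { ok: true };
    // Assumption: attach_domain keeps `ok` but spells the fields code/reason.
    const legacy = toolName === "attach_domain";
    const code = legacy ? body["code"] : body["errorCode"];
    const message = legacy ? body["reason"] : body["message"];
    return { ok: false, code: asText(code, MISSING_CODE), message: asText(message, "") };
  }
  // Error channel: { "error": <message>, "code": <machine-code> }.
  return {
    ok: false,
    code: asText(body["code"], MISSING_CODE),
    message: asText(body["error"], ""),
  };
}
```

Callers can then branch on `outcome.code` alone, without caring which channel or field spelling the tool used.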

A deploy failed (`status: "error"`)

The build or deploy didn't complete. Read the build log:

get_logs({ applicationId, source: "build" })

This is the output of building and shipping the image: compile errors, failed installs, a bad Dockerfile. Fix the problem in the repo, then call `push_files` with `deploy: { applicationId }` (or call `deploy`) to ship the fix. See deploymill://guides/logs for the full failed-deploy loop and log filtering (`tail` / `grep` / `level` / `since`).

The deploy succeeded but the app misbehaves

The image is running but the app is wrong at runtime. Read the container's own stdout/stderr:

get_logs({ applicationId, source: "runtime" })

(Runtime logs are read straight from the compute backend. If the app isn't running you get an empty result with a note instead of an error.)
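The choice between the two log sources can be sketched as a small helper. This is illustrative only: `logSourceFor` and `getLogsArgs` are hypothetical names, and the sketch assumes the deploy status is available as a plain string such as `"error"` or `"done"`.

```typescript
type LogSource = "build" | "runtime";

// A failed deploy points at the build log; anything else is a runtime question.
function logSourceFor(deployStatus: string): LogSource {
  return deployStatus === "error" ? "build" : "runtime";
}

// Builds the argument object in the shape this page passes to get_logs.
function getLogsArgs(
  applicationId: string,
  deployStatus: string,
): { applicationId: string; source: LogSource } {
  return { applicationId, source: logSourceFor(deployStatus) };
}
```

For example, `getLogsArgs("app_123", "error")` (with a made-up application ID) yields the arguments for a build-log read.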

The deploy says `done` but my new code isn't live (`rolledOver: false`)

A rebuild can produce a byte-identical image: when every build layer is a Docker cache hit, the new image has the same digest as the last one, so the platform never recreates the container and your latest commit never goes live. `deploy` detects this and returns `rolledOver: false` with a `rolloverNote`. A healthy edge probe here is the old container answering, so don't read `done` as proof the new code shipped.

To fix it, force a clean rebuild that ignores the build cache:

deploy({ applicationId, noCache: true })

This rebuilds every layer from scratch (slower), so a real source change always produces a new image. Confirm `rolledOver: true` (or a changed `imageDigest`) on the response. DeployMill restores fast cached builds automatically afterward. If it still comes back `rolledOver: false`, the source the platform built didn't actually change, so check that your commit was pushed to the deployed branch.
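The decision described above can be written down as a small pure function. This is a sketch under assumptions: the field names `status`, `rolledOver`, `rolloverNote`, and `imageDigest` come from this page, but that they all sit on one response object is assumed, and `nextDeployAction`, `previousDigest`, and the action labels are hypothetical.

```typescript
// Assumed shape of a deploy response, using the field names this page mentions.
type DeployResponse = {
  status: string;
  rolledOver?: boolean;
  rolloverNote?: string;
  imageDigest?: string;
};

type NextAction =
  | "read_build_log"       // deploy failed: see the build log
  | "shipped"              // new image is live
  | "retry_no_cache"       // redeploy with noCache: true
  | "check_pushed_branch"; // clean rebuild still produced the same image

function nextDeployAction(
  res: DeployResponse,
  previousDigest: string | undefined,
  usedNoCache: boolean,
): NextAction {
  if (res.status === "error") return "read_build_log";

  // Either signal counts as proof of a rollover: rolledOver true or a new digest.
  const digestChanged =
    previousDigest !== undefined &&
    res.imageDigest !== undefined &&
    res.imageDigest !== previousDigest;
  if (res.rolledOver === true || digestChanged) return "shipped";

  // No rollover: force a clean rebuild once, then suspect the pushed source.
  return usedNoCache ? "check_pushed_branch" : "retry_no_cache";
}
```

Keeping the decision separate from the tool call makes it easy to cap the retry at one clean rebuild instead of looping.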

The health gate is failing / auto-rollback fired

Deploy, rollback, and auto-rollback all key off one health endpoint (default `/healthz`) that must return 200 if and only if the app is fully healthy. Anything else (non-200, connection error, timeout) counts as unhealthy. If deploys keep rolling back, your health handler is the first suspect: make sure it actually returns 200 once the app is ready (DB reachable, migrations run). Full contract in deploymill://guides/health.
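The core of a health handler that honors this contract can be as small as the sketch below. It is an illustration, not the contract itself: `ReadinessChecks` and `healthStatus` are hypothetical names, the two checks are the examples given above, and 503 is an assumed choice (any non-200 response counts as unhealthy).

```typescript
// The readiness conditions this page gives as examples.
type ReadinessChecks = {
  dbReachable: boolean;
  migrationsRun: boolean;
};

// Returns the HTTP status the health endpoint should answer with:
// 200 only when every check passes, otherwise a non-200 (503 assumed here).
function healthStatus(checks: ReadinessChecks): 200 | 503 {
  return checks.dbReachable && checks.migrationsRun ? 200 : 503;
}
```

Wire the result into whatever framework serves `/healthz`. The property that matters is that a partially ready app never answers 200.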

Common error codes

| Code | Meaning | What to do |
| --- | --- | --- |
| `domain_verification_required` | A custom domain's ownership isn't proven yet | Publish the DNS TXT record the error carries (`txtRecord`), then retry |
| `dns_not_pointed` | A custom domain's DNS isn't pointed at the ingress yet | Create the CNAME/A record the error reports (`expected.value`), unproxied, then retry |
| `invalid_hostname` / `reserved_hostname` | The host is malformed or reserved | Choose a different host |
| `host_claimed` | Another org already verified that host | Use a host you control |
| `active_app_limit_reached` | Your org is at its app quota | Delete an unused app or check `get_account` |
| `preview_app_limit_reached` | Too many active previews at once (your preview allowance scales with your app limit) | Let previews expire or delete some |
| `storage_limit_reached` | A volume would exceed the storage quota | Lower the requested size or free space |

Call `get_account` for a read-only view of your quota headroom before starting a workflow.

DNS and custom domains

Attaching a domain you own takes two DNS records, and the tools hand you the exact value for each, so you don't have to look them up:

Prove you own it. The first attach returns a `domain_verification_required` error carrying a TXT record (`_deploymill-challenge.<host>` = a token). Publish it, then retry. This is a one-time step per host, and you can remove the record after the attach succeeds.
Point it at the ingress. Once ownership is proven, the attach checks that the host resolves to DeployMill. If it doesn't, a `dns_not_pointed` error reports the target. Point a subdomain at it with a CNAME record, or point an apex domain at its IP with an A record. The record must be DNS-only / unproxied (no Cloudflare orange cloud / CDN in front), or the automatic Let's Encrypt cert can't issue.
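The two steps above map onto the error codes a failed attach reports, so the next DNS action can be derived from the failure itself. This is a rough sketch: `nextDnsStep`, `DomainFailure`, and `DnsStep` are hypothetical names, and the inner shape of `txtRecord` (`{ name, value }`) is an assumption, since this page only says the error carries it.

```typescript
// Fields this page says the DNS errors carry. The { name, value } shape of
// `txtRecord` is an assumption made for this sketch.
type DomainFailure = {
  code: string;
  txtRecord?: { name: string; value: string };
  expected?: { value: string };
};

type DnsStep =
  | { action: "publish_txt"; name: string; value: string }
  | { action: "point_dns"; target: string; proxied: false }
  | { action: "choose_other_host" }
  | { action: "escalate"; code: string };

function nextDnsStep(failure: DomainFailure): DnsStep {
  switch (failure.code) {
    case "domain_verification_required":
      // Step 1: publish the ownership TXT record, then retry the attach.
      if (failure.txtRecord) {
        return {
          action: "publish_txt",
          name: failure.txtRecord.name,
          value: failure.txtRecord.value,
        };
      }
      break;
    case "dns_not_pointed":
      // Step 2: point the host at the reported target, DNS-only (unproxied).
      if (failure.expected) {
        return { action: "point_dns", target: failure.expected.value, proxied: false };
      }
      break;
    case "invalid_hostname":
    case "reserved_hostname":
    case "host_claimed":
      return { action: "choose_other_host" };
  }
  // Unknown code, or the expected payload was missing: hand off to a human.
  return { action: "escalate", code: failure.code };
}
```

After performing the returned step, retry the attach. A host that needs both records will normally pass through `publish_txt` first and `point_dns` second.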

DeployMill issues the TLS certificate for you, and you never supply one. Issuance isn't instant, so confirm `https://<host>` actually serves before assuming it's done. Declare the host under `domains.custom` in `.deploymill/project.json` and run `reconcile_project` (rather than a one-off `attach_domain`) so it survives future reconciles. Full playbook: deploymill://guides/domains.

When the docs run out

If you've read the structured error, the logs, and the relevant guide and you're still stuck, escalate. Use the support avenue surfaced in your account, and include the failing tool, the error code, and the `applicationId`. That context is what lets a request be answered quickly. Never paste secrets or connection strings into a support request.