Dev Log - March 2, 2026
Docker image state and expected package version
- Confirmed infrastructure/docker/Dockerfile is pinned to openclaw@2026.2.26 (RUN npm install -g openclaw@2026.2.26), so the intended container baseline is already set in source.
- Confirmed that recent infra commits touching Docker startup/runtime behavior were already present before this run:
  - 1f5f1c82 (infrastructure/docker/Dockerfile, infrastructure/docker/start.sh)
  - 2173e061 (infrastructure/docker/start.sh)
- Verified this incident is not caused by a missing version pin in the Dockerfile.
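
The pin check above can be automated. The following is a minimal sketch, assuming the Dockerfile installs the CLI with a single global npm install line as quoted in this log; the function name extractPinnedVersion and the sample Dockerfile text are illustrative and not part of the repo.

```typescript
// Extracts the version pinned by a global npm install of `pkg` from
// Dockerfile text. Returns null when no pinned install line is found.
function extractPinnedVersion(dockerfile: string, pkg: string): string | null {
  // Escape regex metacharacters so the package name is matched literally.
  const escaped = pkg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    `npm\\s+install\\s+(?:-g|--global)\\s+${escaped}@([0-9A-Za-z.+-]+)`
  );
  const match = pattern.exec(dockerfile);
  return match?.[1] ?? null;
}

// Sample input for illustration only; the base image line is an assumption.
const sampleDockerfile = [
  "FROM node:22",
  "RUN npm install -g openclaw@2026.2.26",
].join("\n");

const pinnedVersion = extractPinnedVersion(sampleDockerfile, "openclaw");
console.log(pinnedVersion); // 2026.2.26
```

A check like this only proves the source pin. It says nothing about which image a machine actually pulled, which is why the per-machine verification later in this log is still needed.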
Build and deploy attempts executed
- Ran ./infrastructure/fly/build-and-deploy.sh:
  - OpenClaw config contract tests passed.
  - Local Docker build completed.
  - Registry push repeatedly stalled in a retry loop on a single layer, preventing a clean rollout completion.
- Ran flyctl deploy --remote-only (from Docker context):
  - Remote build progressed and compiled openclaw@2026.2.26.
  - Export/publish failed with "unauthenticated: Invalid token" from the Fly/Depot path.
- Attempted explicit machine updates to registry.fly.io/hexly-sandboxes:latest; update calls succeeded, but the latest tag still resolved to an older runtime for at least one target machine.
Machine-level rollout results (requested IDs)
- 890490a6d27338:
  - Verified openclaw --version = 2026.2.26 (updated and healthy).
- 78460ddb445428:
  - Repeated install attempts were unstable due to interrupted/stalled global npm installation behavior.
  - After restarts and rechecks, final verified openclaw --version remained 2026.2.15.
  - Observed partial package states during retries (missing openclaw.mjs during interrupted install windows), which explains transient “cannot run commands” behavior.
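
The per-machine comparison above can be expressed as a small drift check. This is a sketch only: the MachineVersion type and findVersionDrift helper are hypothetical names, and the version strings are assumed to have been collected separately by running openclaw --version on each machine.

```typescript
interface MachineVersion {
  machineId: string;
  version: string; // output of `openclaw --version` on that machine
}

// Returns the machines whose reported version differs from the expected pin.
function findVersionDrift(
  expected: string,
  reported: MachineVersion[]
): MachineVersion[] {
  return reported.filter((m) => m.version.trim() !== expected);
}

// Values taken from the rollout results recorded in this section.
const reportedVersions: MachineVersion[] = [
  { machineId: "890490a6d27338", version: "2026.2.26" },
  { machineId: "78460ddb445428", version: "2026.2.15" },
];

const driftedMachines = findVersionDrift("2026.2.26", reportedVersions);
console.log(driftedMachines.map((m) => m.machineId)); // only 78460ddb445428 is behind
```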
Operational diagnosis recorded
- Root blocker is release-path reliability, not source config:
  - Local push path: layer retry loops while publishing to Fly registry.
  - Remote path: Depot/Fly auth token issue during export.
- Because new image publication is unreliable, machine updates from :latest can still pull an older working image, producing mixed OpenClaw versions across instances.
- Tool availability symptoms (“cannot fetch websites”, “cannot run commands”, env usage inconsistency) can appear when machines run mixed runtime states or partially-installed CLI binaries.
Managed LLM billing audit and hardening kickoff
- Performed an end-to-end audit of managed LLM call flow and credit deduction timing across:
  - app/api/llm/v1/chat/completions/route.ts
  - lib/flyio-machine-manager.ts
  - infrastructure/docker/start.sh
  - supabase/migrations/*managed_billing*, *credit_system_overhaul*, *zero_overdraft*
- Documented sequence diagrams and correctness findings in docs/diagrams/llm-credit-billing-audit-2026-02-25.md.
- Confirmed current behavior:
  - Cost is measured from provider usage after response (or stream completion).
  - Deduction happens post-response via proxy route.
  - Current DB policy is zero-overdraft (deductions that would go negative are rejected).
- Identified gaps for managed proxy billing:
  - Missing idempotency constraint for source IN ('proxy','proxy_stream').
  - Non-atomic usage insertion + deduction path can leave ledger/balance mismatch on failure.
- Implemented DB hardening migration supabase/migrations/20260302_proxy_billing_atomic_idempotency.sql:
  - Deduplicates historical proxy rows by (request_id, source).
  - Adds partial unique index for proxy idempotency.
  - Adds record_proxy_usage_and_deduct_credits(...) RPC to atomically deduct balance and insert usage row in one transaction.
  - Handles unique-violation races as idempotent duplicates.
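
To make the intended semantics concrete, here is an in-memory TypeScript sketch of the three properties listed above: zero-overdraft rejection, idempotency on (request_id, source), and an all-or-nothing balance plus usage write. It is an analogy rather than the migration's SQL, which is not reproduced in this log. The class, type, and result names are hypothetical, and credits are modeled as integers for simplicity.

```typescript
type ProxySource = "proxy" | "proxy_stream";

interface UsageRow {
  requestId: string;
  source: ProxySource;
  cost: number; // integer credits, measured from provider usage after the response
}

type RecordResult = "recorded" | "duplicate" | "insufficient_credits";

// Source comes first so the key stays unambiguous even if requestId contains ":".
const usageKey = (row: UsageRow): string => `${row.source}:${row.requestId}`;

// Keeps the first row per (request_id, source), like the historical dedup step.
function dedupeUsageRows(rows: UsageRow[]): UsageRow[] {
  const seen = new Set<string>();
  return rows.filter((row) => {
    const key = usageKey(row);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

class InMemoryLedger {
  // The Map key plays the role of the unique index on (request_id, source).
  private rows = new Map<string, UsageRow>();

  constructor(private balance: number) {}

  getBalance(): number {
    return this.balance;
  }

  rowCount(): number {
    return this.rows.size;
  }

  // All checks run before any write, so a rejected call changes nothing.
  recordUsageAndDeduct(row: UsageRow): RecordResult {
    const key = usageKey(row);
    if (this.rows.has(key)) return "duplicate"; // retry of an already billed request
    if (this.balance - row.cost < 0) return "insufficient_credits"; // zero-overdraft
    this.balance -= row.cost;
    this.rows.set(key, row);
    return "recorded";
  }
}

const ledger = new InMemoryLedger(10);
const firstCall = ledger.recordUsageAndDeduct({ requestId: "req-1", source: "proxy", cost: 4 });
const retryCall = ledger.recordUsageAndDeduct({ requestId: "req-1", source: "proxy", cost: 4 });
const overdraft = ledger.recordUsageAndDeduct({ requestId: "req-2", source: "proxy_stream", cost: 7 });
console.log(firstCall, retryCall, overdraft, ledger.getBalance());
// recorded duplicate insufficient_credits 6
```

In the database, concurrent callers can both pass the existence check, so the unique index itself has to be the final arbiter. That is presumably why the migration treats a unique-violation error as an idempotent duplicate instead of relying on a read-before-write check like the one in this single-threaded sketch.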
Decision Points
- Temporary BYOK-only UX policy: hide managed LLM controls in the Settings UI while backend hardening is in progress.
  - Implemented as a UI gate in app/settings/page.tsx (MANAGED_LLM_UI_ENABLED = false) so the code path remains in repo but is not accessible in Settings.
  - Scope intentionally limited to UI visibility (no backend deletion) to allow controlled re-enable after billing/idempotency verification is complete.
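
A gate of this kind can be as small as a constant checked where the Settings sections are assembled. The sketch below reuses the flag name from this log; the section names and the visibleSettingsSections helper are hypothetical placeholders, not the actual contents of app/settings/page.tsx.

```typescript
// Flag name as recorded in the log; while false, managed LLM controls stay hidden.
const MANAGED_LLM_UI_ENABLED = false;

// Hypothetical section identifiers, for illustration only.
type SettingsSection = "byok_keys" | "managed_llm";

// Builds the list of sections to render; managed controls are added only
// when the flag is on, so the backend code path is untouched either way.
function visibleSettingsSections(managedEnabled: boolean): SettingsSection[] {
  const sections: SettingsSection[] = ["byok_keys"];
  if (managedEnabled) sections.push("managed_llm");
  return sections;
}

console.log(visibleSettingsSections(MANAGED_LLM_UI_ENABLED)); // BYOK section only
```

Re-enabling is then a one-line change once billing and idempotency verification is complete.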
Stats
- 2 commits in infra Docker/Fly scope since 2026-03-01 (pre-existing in repo).
- ~101 insertions / ~16 deletions across 3 infra files from those commits.
- Key areas: Docker/OpenClaw pin verification, Fly build/deploy reliability, per-machine version validation.