Sasha Studio — TODO
Single authoritative task list. References other docs (e.g. QA session actions) for full context.
Desired End State
Chat pipeline — the load-bearing surface
The chat is the most-used surface and historically the most fragile. The end state we're building toward is anchored on five product-quality goals:
| # | Goal | What "shipped" looks like |
|---|---|---|
| 1 | Stable session | Reconnect, server restart, and compaction all preserve session state. No drops, no duplicates. |
| 2 | Don't lose messages we ought to surface | Anything Anthropic's CLI persists to JSONL is replayable to the client. CI gates regressions. |
| 3 | Reduce noisy messages | [Message format not recognized: ...] never reaches a user. Status events stay out of the transcript. Typed envelope schema with signal: 'message' | 'status' | 'silent'. |
| 4 | Stay consistent with stdout/stderr | Live-stream and replay produce semantically equivalent client state. Server-side normaliser, not three normalisers in three places. |
| 5 | Debuggable | Every chat-subscribe outcome greppable in server logs ([chat-subscribe] prefix, shipped). A1 debug pane for live wire-frame inspection. CI catches regressions before merge. |
When all five are met:
- REPLAY_BACKEND=jsonl is the default and only backend (B.1 ring deleted).
- The client's useChatReplaySubscribe hook is the only history-load path (the five legacy api.sessionMessages refetch sites are deleted).
- A typed envelope schema exists in server/services/normaliser/ and the client never sees an unclassified envelope.
- A vitest E2E suite spins up the server with REPLAY_BACKEND=jsonl and asserts the protocol; an A2 replay harness asserts the renderer matches expected DOM for known fixtures.
- A Playwright suite covers login + send-message + reload golden flows.
Operational health
- All deployed instances (CJK, sasha1/HireBest, control panel) audited for unused scheduled agents; nothing burning tokens silently.
- File tree handles thousands of files without freezing.
- MCP secret redaction in failure logs across all services.
Documentation hygiene
- docs-developer/ doesn't contain stale chat-pipeline docs that would mislead future agents/engineers.
- The CLAUDE.md session-management section reflects current architecture (no longer says "ring", no longer references B.1).
Chat Pipeline Program
Sequenced to end state. Phases are roughly ordered by risk and dependency.
Phase A — Y server-side replay (in flight)
Y replaces B.1's in-memory ring with a JSONL-tail backend. Server work is shipped behind REPLAY_BACKEND=jsonl.
A3. Soak Y as default for 3-7 days — Replaced by A2c synthetic stress test
Status: Reframed 2026-05-09. The original soak plan needed a non-prod environment with sustained traffic + compactions firing + concurrent clients — none of which exists in this stack. A multi-day passive soak with no traffic produces no signal. Replaced by the deterministic stress suite at claudecodeui/server/__tests__/stress/chat-subscribe-jsonl-stress.test.js which exercises the same failure modes in CI in ~600ms (concurrent subscribes, sustained append+reconnect, compaction race, large-file perf, partial-line safety). The "default jsonl" flip is now bundled with B.1 (the only thing that produces real chat-subscribe traffic anyway).
A4. Y Phase 3 — delete ring code — Priority: MEDIUM
Target: After B.1 ships (jsonl is the only path)
From: Y plan Task 9.
Delete event-ring*.js, related tests, messageStreamHandler.js ring tap (commit 6a5ea125), markCompleted hooks (commit 847d1352). Rename eventRingHandler.js → chatSubscribeHandler.js. Net: ~600 lines deleted.
Phase B — Client cutover (B.2)
The single biggest in-flight gap: server is ready, client never subscribes. Until B.2 ships, REPLAY_BACKEND=jsonl is dormant.
B1. Implement useChatReplaySubscribe hook — Priority: HIGH
Target: Next session — A is now stable (A2a fix shipped, stress test green at 72/72).
From: docs/superpowers/plans/2026-05-06-client-cutover-b2-redesign.md.
Per the redesigned B.2 plan: hook owns the cursor outside React state (chatReplayCursorStore), routes replay events through the existing message_streamed dispatcher (not a parallel renderer — the first attempt failed by going parallel), handles cursor-expired by triggering a single api.sessionMessages refetch.
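A minimal sketch of that hook shape, assuming a raw WebSocket handle and the chatReplayCursorStore module named in the plan; the frame types, field names, and import paths below are assumptions, not the shipped protocol:

```js
// Sketch only: frame types ('chat-subscribe', 'cursor-expired'), frame.cursor,
// and the import paths are assumptions for illustration.
import { useEffect } from 'react';
import { chatReplayCursorStore } from '../stores/chatReplayCursorStore'; // hypothetical path
import { api } from '../utils/api'; // hypothetical path

export function useChatReplaySubscribe(ws, sessionId, dispatchMessageStreamed) {
  useEffect(() => {
    if (!ws || !sessionId) return;

    // Cursor lives outside React state so re-renders never reset replay position.
    const cursor = chatReplayCursorStore.get(sessionId);
    ws.send(JSON.stringify({ type: 'chat-subscribe', sessionId, cursor }));

    const onMessage = (event) => {
      const frame = JSON.parse(;
      if (frame.type === 'cursor-expired') {
        // Single fallback refetch, then resubscribe from a fresh cursor.
        chatReplayCursorStore.clear(sessionId);
        api.sessionMessages(sessionId).then(() =>
          ws.send(JSON.stringify({ type: 'chat-subscribe', sessionId }))
        );
        return;
      }
      chatReplayCursorStore.set(sessionId, frame.cursor);
      // Replay events go through the SAME dispatcher as live events:
      // the first cutover attempt failed by building a parallel renderer.
      dispatchMessageStreamed(frame);
    };
    ws.addEventListener('message', onMessage);
    return () => ws.removeEventListener('message', onMessage);
  }, [ws, sessionId, dispatchMessageStreamed]);
}
```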
B2. Delete the five legacy refetch paths — Priority: HIGH
Target: Same PR as B1
From: B.2 plan.
Remove useProjectWebSocketV2.js:663-737, ChatInterface.jsx:5273-5444 and :5533-5547, App.jsx:1304-1356, plus the historyLoadedRef/__pendingHistoryReload state. Replace LOAD_SESSION_MESSAGES reducer's content-merge with REPLACE + UUID dedup.
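A sketch of the REPLACE + UUID dedup semantics for that reducer case (the { uuid } message shape is assumed):

```js
// Sketch of the new LOAD_SESSION_MESSAGES semantics: wholesale replace,
// deduped by UUID, instead of the old content-merge.
function loadSessionMessages(state, incoming) {
  const seen = new Set();
  const messages = incoming.filter((m) => {
    if (seen.has(m.uuid)) return false; // drop duplicate replays of the same event
    seen.add(m.uuid);
    return true;
  });
  return { ...state, messages }; // REPLACE, not merge: no stale-merge drift
}
```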
B3. Verify "no drops on reconnect" via Layer 3 + Layer 5 E2E — Priority: HIGH
Target: Same PR as B1/B2
Extend the WS E2E (A1) to cover the client-driven flow now that there's a client. Add a Playwright golden-flow test: send message → reload page → assert history intact.
Phase C — Noise classification (B.3)
The only piece that addresses goal 3. Not started yet; this is what produces the [Message format not recognized: type, rate_limit_info, uuid…] output the user is seeing.
C1. Server-side typed-envelope normaliser — Priority: HIGH
Target: After Phase B
From: B spec signal-classification section.
New server/services/normaliser/ that produces typed envelopes with explicit signal: 'message' | 'status' | 'silent'. Single source of truth — replaces messageNormalizer.js, the ChatInterface.jsx shape branches, and useProjectWebSocketV2.js's envelope handling. The client trusts the signal and renders accordingly. Schema is checked in.
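A sketch of what the typed envelope could look like; the classification rules below are illustrative guesses, not the spec's actual rules:

```js
// Sketch only: the real rules live in server/services/normaliser/.
/** @typedef {'message' | 'status' | 'silent'} Signal */

function classifyEnvelope(raw) {
  if (raw.rate_limit_info) return { signal: 'silent', raw };                 // never rendered
  if (raw.type === 'system' && raw.subtype) return { signal: 'status', raw }; // indicator only
  if (raw.type === 'assistant' || raw.type === 'user') {
    return { signal: 'message', raw };                                        // transcript
  }
  // Unclassified: log server-side, drop client-side. No placeholder text.
  console.warn('[normaliser] unclassified envelope', { type: raw.type });
  return { signal: 'silent', raw };
}
```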
C2. Processing-indicator UI (Surface 1 + Surface 2) — Priority: MEDIUM
Target: After C1
From: B spec.
Two surfaces:
- Surface 1 (inline): in-transcript marker for signal: 'status' events that are interesting (e.g., "thinking", "tool call started").
- Surface 2 (attached): persistent "Sasha is working…" indicator near the input, driven by aggregate status state.
Replaces ad-hoc tool-execution-tracking UI; gives users a stable mental model for "the system is doing something" without polluting the transcript.
C3. Drop the [Message format not recognized: ...] placeholder entirely — Priority: HIGH
Target: Same PR as C1
Once C1 ships, every envelope has a recognised signal. The fallback placeholder at messageNormalizer.js:170 becomes dead code; remove it. Any envelope that the typed normaliser can't classify is logged server-side and dropped client-side (silent).
Phase D — Replay test harness (A2) for renderer
Closes the rendering coverage gap. Most useful after B.2 ships because before then the client doesn't drive subscribe events.
D1. Build the replay-test harness — Priority: HIGH (after B)
From: docs/superpowers/specs/2026-05-02-chat-replay-testing-design.md.
Captures real ~/.claude/projects/**/*.jsonl session files, runs them through the server in replay mode, records WS frames, replays them through the React tree in jsdom, asserts: no drops (D1/D2/E completeness invariants), no out-of-order rendering (F invariant). Hand-written tests can't keep up with envelope-shape variance — this is the right tool for ChatInterface.jsx and the renderer.
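The invariant families reduce to simple assertions once frames and DOM are captured; a sketch, assuming captured frames carry { signal, seq, uuid } and rendered messages expose their UUID:

```js
// Sketch of the D1 invariant checks; the frame/DOM shapes are assumptions.
import { expect } from 'vitest';

function assertReplayInvariants(frames, renderedUuids) {
  // Completeness (D1/D2/E): every message-signal frame reaches the DOM, no drops.
  const messageUuids = frames
    .filter((f) => f.signal === 'message')
    .map((f) => f.uuid);
  for (const uuid of messageUuids) {
    expect(renderedUuids).toContain(uuid);
  }
  // Ordering (F): seq must be strictly monotonic across the whole replay.
  const seqs = => f.seq);
  for (let i = 1; i < seqs.length; i++) {
    expect(seqs[i]).toBeGreaterThan(seqs[i - 1]);
  }
}
```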
D2. Set per-area coverage thresholds in CI — Priority: MEDIUM
Target: After D1 raises baseline.
Add thresholds to vitest.config.ts for the chat-pipeline modules. CI blocks PRs that drop coverage in server/services/jsonl*.js, server/services/normaliser/*, src/hooks/useChatReplaySubscribe.js, src/utils/messageNormalizer.js (or its successor).
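Vitest's coverage.thresholds accepts per-glob keys, so the gate can be scoped to exactly these modules; a sketch with placeholder numbers:

```js
// Sketch for vitest.config.ts; threshold numbers are placeholders, not targets.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        'server/services/jsonl*.js': { lines: 90, branches: 85 },
        'server/services/normaliser/**': { lines: 90, branches: 85 },
        'src/hooks/useChatReplaySubscribe.js': { lines: 90, branches: 85 },
        'src/utils/messageNormalizer.js': { lines: 85, branches: 80 },
      },
    },
  },
});
```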
Phase E — Playwright golden flows
E1. Scaffold Playwright + login test — Priority: MEDIUM
Target: Anytime; cheap
From: PRD §8 Layer 5.
claudecodeui/e2e/ directory, Playwright config, single test that logs in and asserts the project list renders. Establishes the framework. No Y-specific path required yet.
E2. Send-and-reload golden flow — Priority: MEDIUM
Target: After B ships
From: PRD §8 Layer 5.
Login → open existing session → send a message → assert it streams in → reload page → assert history intact. Directly tests goal 1 (stable session, no drops) end-to-end.
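A sketch of that flow as a Playwright test; selectors, labels, and env-var names are assumptions to be replaced by the E1 scaffolding's real helpers:

```js
// Sketch of the E2 golden flow; all selectors are assumptions.
import { test, expect } from '@playwright/test';

test('send a message, reload, history intact', async ({ page }) => {
  await page.goto('/');
  await page.getByLabel('Username').fill(process.env.E2E_USER ?? 'test');
  await page.getByLabel('Password').fill(process.env.E2E_PASS ?? 'test');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await page.getByText('existing-session').click(); // open a seeded session
  const marker = `golden-${}`;     // unique, greppable message
  await page.getByRole('textbox').fill(marker);
  await page.keyboard.press('Enter');
  await expect(page.getByText(marker)).toBeVisible(); // streams in

  await page.reload();
  await expect(page.getByText(marker)).toBeVisible(); // history survives reload
});
```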
E3. Debug-pane assertion test — Priority: LOW
Target: After C ships
Use Playwright to read A1 debug-pane DOM, assert specific event types appear / don't appear. Catches noise-class regressions (goal 3) automatically.
Tracking
- Branch: feature/jsonl-tail-replay-y (Y Phase 0 + today's validation; not yet pushed/merged)
- Test strategy: docs/prd/16-testing-strategy.md
- Y spec: docs/superpowers/specs/2026-05-06-replay-via-jsonl-tail-design.md
- Y plan: docs/superpowers/plans/2026-05-06-jsonl-tail-replay-y.md
- B.2 plan: docs/superpowers/plans/2026-05-06-client-cutover-b2-redesign.md
- B spec (origin of B.3 signal classification): docs/superpowers/specs/2026-05-05-chat-event-replay-ring-design.md
Operations
ECS rolling deploys hit SQLite lock contention — Priority: MEDIUM
From: CJK deploy attempt 2026-05-10 — new task failed ALB health check, ECS auto-rolled back. Retry succeeded but took ~5 min (vs typical ~2 min) because startup blocked on locks.
What happens: ECS service is configured strategy: ROLLING, maximumPercent: 200, minimumHealthyPercent: 100. New task starts BEFORE old task stops — so for ~30-60s, two containers concurrently write to the same /app/data/sasha.db on EFS. SQLite locking over NFS is unreliable; the new container's startup retention sweeps + cloud-drive init issued write transactions that ended up in busy-waits, blocking the event loop for ~40s, missing ALB health checks.
Important context: WAL mode is intentionally disabled in database/db.js because it requires POSIX file locking guarantees that NFS/EFS doesn't provide — would risk DB corruption. busy_timeout = 10000 is already set. So the standard "enable WAL + busy_timeout" recipe is OFF the table here. Fixes have to keep DELETE journal mode and either defer or eliminate startup writes.
Fixes shipped 2026-05-10:
- services/{activityRetention,qualityRetention,fileRetention}.js defer their initial sweep by 60s on startup (configurable via *_INITIAL_DELAY_MS env vars). By the time the sweep fires, the old container has drained.
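A rough sketch of that deferral pattern (the env-var name and service internals are assumptions; the real services/*Retention.js code may differ):

```js
// Sketch only: ACTIVITY_RETENTION_INITIAL_DELAY_MS is an assumed env-var name.
const INITIAL_DELAY_MS = Number(process.env.ACTIVITY_RETENTION_INITIAL_DELAY_MS ?? 60_000);

function scheduleInitialSweep(sweepFn) {
  // Defer the first write-heavy sweep so a rolling deploy's old container
  // has drained before this one touches the shared SQLite file on EFS.
  setTimeout(() => {
    sweepFn().catch((err) => console.error('[retention] initial sweep failed', err));
  }, INITIAL_DELAY_MS);
}
```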
Remaining gaps:
- cloudDriveManager.initialize() (called inside the server.listen callback) also does DB writes during startup AND throws a CHECK constraint failed: health_status IN ('unmounted','mounting','mounted','degraded','remounting') error — separate bug, schema mismatch in the remote-config sync code path.
- Other startup writers we haven't audited: markStaleExecutions('meeting'), setupProjectsWatcher, scheduler init.
- Strategic fix: switch to minimumHealthyPercent: 0, maximumPercent: 100 deployment config so old stops before new starts. Trade-off: ~30s outage during deploys vs current intermittent failures. Worth doing once we audit all startup DB callers.
- Possibly lower busy_timeout from 10000 to 3000 so any residual contention bounds the event-loop pin at 3s/statement (currently 10s). With the retention defer fix this is belt-and-suspenders — only relevant if another startup DB-writer slips through.
Data Integrity
Investigate systemic conversation transcript loss — Priority: HIGH
From: CJK Associates production audit, 2026-05-09 — discovered after deploying the C0 + B.2 Phase 1 image and clicking conversations that returned empty.
Target: Once [SESSION-STUB] log frequency tells us how widespread this is.
The phenomenon. On CJK, multiple conversations show in the file tree (project metadata + summary intact) but their session JSONL transcript is missing from disk. When the user clicks them, chat shows "No messages in this session yet". Server logs show messageCount: 0, hasMessages: false from getSessionMessages, and [SERVER-GET-MESSAGES] No JSONL files found, returning empty array.
What we know from the on-disk audit:
- Conversations affected on CJK at the time of the audit: background, internal-understanding, setup, bromcom, bubble project, agentic-prompt-develop…, Help (~half the file tree).
- The audit-hook log (/home/sasha/all-project-files/audit/sessions/<sessionId>.jsonl) survives for these sessions and explicitly records transcript_path: /home/sasha/.claude/projects/-home-sasha-projects-<name>-conversation-001/<sessionId>.jsonl — i.e. Claude knew where it intended to write at the time. That path is missing today.
- /home/sasha/.claude/projects/ directory entries are mostly mtime April 10, 2026 — a sharp cluster suggesting a mass rebuild or migration on that date. The earliest surviving entry is Feb 19; conversations created Feb 25 onward (without a corresponding April 10 mtime) are gone.
- Sessions that DO work (e.g. test2/c43d8d7f) have intact JSONL at the canonical path. The split is binary — either the dir exists with content, or it's missing entirely.
Hypotheses (none confirmed):
- April 10 EFS sync/migration pruned older session dirs.
- Conversations created during a window when Claude Code's session-write was failing silently (e.g. auth issues at the time — local dev today still shows Failed to authenticate. API Error: 401 for old conversations).
- A cleanup job we don't know about is purging on age/size criteria.
What's already in place to quantify it (this same session):
- server/projects.js#detectStubState — emits structured [SESSION-STUB] warning logs whenever getSessionMessages returns empty for a session whose conversation metadata indicates it was attempted.
- GET /api/projects/:projectName/sessions/:sessionId/messages now returns { messages, stub }. The stub payload includes lastSessionId, sessionsRecorded, summary, auditLogPath, metadataLastUpdate — a useful diagnostic shape for forensics.
- ChatInterface.jsx renders a different empty state ("Transcript not found") when stub is non-null, so users see something honest instead of "Start a conversation" on top of a broken session.
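For reference, a sketch of the stub shape and log line described above (field names from this item; the emit logic is illustrative, not the shipped code):

```js
// Sketch only: illustrates the diagnostic shape, not server/projects.js itself.
function emitSessionStub(sessionId, meta) {
  const stub = {
    lastSessionId: meta.lastSessionId,
    sessionsRecorded: meta.sessionsRecorded,
    summary: meta.summary,
    auditLogPath: meta.auditLogPath,
    metadataLastUpdate: meta.metadataLastUpdate,
  };
  // Structured and greppable, so CloudWatch queries can count by project/date.
  console.warn(`[SESSION-STUB] sessionId=${sessionId}`, JSON.stringify(stub));
  return stub;
}
```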
Next steps (after baseline log frequency is in):
- Watch CloudWatch /ecs/sasha for [SESSION-STUB] lines for a week. Count: by project, by metadataLastUpdate window, by audit-log presence.
- Check AWS Backup / EFS lifecycle policies (IAM planb-ops lacks the read perms; needs admin or web console access). If EFS snapshots cover April 9, restore selectively.
- If patterns concentrate around a specific deploy or date range: bisect the deploy for a delete/migration step that pruned the canonical dirs.
- Add a "delete conversation" button to the stub UI once we understand the cause — letting users remove orphaned tree entries cleanly.
Test Infrastructure
Vitest parallel workers race on db.js SQLite probe file
From: A1 E2E suite work, 2026-05-09
Target: When the next person trips over a flaky CI suite import error
Priority: LOW
Multiple test suites that transitively import server/database/db.js (via conversationManager.js, sessionMetrics.js, etc.) write a probe file under data/.probe-<ts>.tmp at module-load time, then unlink it. Under vitest's default parallel pool, the unlink can ENOENT because a sibling worker's cleanup removed data/ first. Workaround: run the affected suites with --maxWorkers=1. Real fix: make db.js skip the writability probe when NODE_ENV === 'test' or when imported into a process that hasn't actually opened the database, OR use a per-worker tmp path for the probe.
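A sketch combining both fix options (VITEST_WORKER_ID is set by vitest's worker pool; the function name is hypothetical):

```js
// Sketch of the db.js probe fix: skip under test, or per-worker probe path.
import fs from 'fs';
import path from 'path';

function probeDataDirWritable(dataDir) {
  if (process.env.NODE_ENV === 'test') return true; // option 1: no probe at test import time
  // Option 2: per-worker name, so parallel vitest workers never collide.
  const worker = process.env.VITEST_WORKER_ID ?? process.pid;
  const probePath = path.join(dataDir, `.probe-${worker}-${}.tmp`);
  try {
    fs.writeFileSync(probePath, '');
    return true;
  } catch {
    return false;
  } finally {
    fs.rmSync(probePath, { force: true }); // force: true swallows ENOENT from sibling cleanup
  }
}
```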
Performance
File tree panel slow with large project trees
From: QA Session 1 §1.1 — qa/session-01-2026-04-17-actions.md
Target: Next sprint
Priority: HIGH
File tree reads from filesystem on every render. With scoring agent output producing thousands of files, UI becomes unusable. Profile rendering, add caching layer, consider lazy loading for deep directories.
Operations
Disable unused scheduled agents on deployed instances
From: QA Session 1 §2.1 — qa/session-01-2026-04-17-actions.md
Target: Immediate
Priority: HIGH
Demo/example scheduled agents still running on deployed instances, burning Anthropic tokens daily with no useful output. Audit all deployed instances (CJK Associates, sasha1/HireBest), disable anything not actively needed.
UI
Clean up dead NewProjectModal code after New Project buttons removal
From: UI request 2026-05-12 — New Folder button replaces New Project; New Project buttons removed in a commit on feat/new-folder-single-hit
Target: Follow-up once we're confident no other entry point opens NewProjectModal
Priority: LOW
The three "New Project" buttons (desktop toolbar, mobile toolbar, collapsed rail) were removed in favor of "New Folder" doing everything. NewProjectModal.jsx, the showNewProject state, the two <NewProjectModal> JSX instances in Sidebar.jsx, and the handleProjectCreated callback (lines ~2365-2402) are now orphaned. Delete them once any indirect entry points (e.g. folder context menu "Create project in folder" still uses window.prompt, not the modal — confirmed in the inventory) are confirmed gone for good. Watch for cross-references: addProjectPlaceholder (used by handleProjectCreated) is also used by NewFolderDialog's onBundleCreated — keep it.
Replace window.prompt in folder-menu "Create project in folder"
From: Brainstorm 2026-05-12 — folder-creation rationalisation (docs/superpowers/specs/2026-05-12-new-folder-single-hit-design.md)
Target: Follow-up after the single-hit folder feature ships
Priority: LOW
The sidebar folder context menu's "Create project in folder" action (Sidebar.jsx:2058-2080) still uses window.prompt(...) for the project name. After the single-hit work ships, the same UI primitives (small dialog component, consistent error handling) can replace this prompt. Out of scope for the single-hit feature because the intent there is folder-first, not project-in-existing-folder.
Auto-rename auto-created project from first-message title
From: Brainstorm 2026-05-12 — "single-hit folder + chat" design (docs/superpowers/specs/2026-05-12-new-folder-single-hit-design.md)
Target: Follow-up after the single-hit folder feature ships
Priority: LOW
When New Folder auto-creates a project named chat 1, the conversation underneath gets a meaningful auto-title from generateSessionTitle() on the first user message (server/projects.js:923-945), but the project keeps the placeholder name. This produces redundant nesting in the sidebar: folder ST LS GTM → project chat 1 → conversation Help me draft the GTM plan. Auto-rename the project at the same point the session title is generated by calling renameProject() (server/projects.js:1818). Note: renameProject moves the project's working directory, so this must fire after the first session is established but before significant filesystem state accumulates — handle the race with the in-flight session carefully.
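A sketch of the hook point (renameProject and the placeholder convention come from the file references above; the sanitizer and regex are assumptions):

```js
// Sketch only: fire right after generateSessionTitle() produces the first title.
function sanitizeName(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '').slice(0, 60);
}

async function maybeAutoRenameProject(project, sessionTitle, renameProject) {
  if (!/^chat \d+$/.test( return; // only replace the placeholder name
  // renameProject moves the project's working directory, so this must run after
  // the first session is established but before filesystem state accumulates.
  await renameProject(, sanitizeName(sessionTitle));
}
```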
Documentation
Tidy stale chat-pipeline tech docs as part of B rollout
From: B spec (docs/superpowers/specs/2026-05-05-chat-event-replay-ring-design.md) — chat architecture is being rewritten in 3 phases; many existing docs describe the old behaviour and will mislead future agents and engineers
Target: Sweep alongside each B phase ship — don't let the doc debt accumulate to the end
Priority: MEDIUM
The B work (event replay ring + cursor protocol + signal classification) invalidates a lot of docs-developer/ content. Auto-generated mirrors under docs-developer/html/ and docs-developer/html-static/ regenerate from source, so focus only on source docs.
Sweep approach — for each impacted doc, decide one of: KEEP (mark as historical with a one-line note at top), REWRITE (update to match new design), DELETE (move to docs-developer/archive/). Default to DELETE for anything older than ~6 months that documents implementation details rather than rationale.
Likely needs REWRITE (reflect new architecture):
- docs-developer/overview/chat-and-streaming-architecture.md
- docs-developer/overview/architecture-current.md
- docs-developer/overview/technical-architecture/message-handling-architecture.md
- docs-developer/overview/technical-architecture/unified-architecture-design.md
- docs-developer/features/session-technical/session-management-architecture.md (referenced from CLAUDE.md)
- docs-developer/features/session-technical/message-pipeline.md
- docs-developer/development/chat-websocket-events.md
- docs-developer/development/test-harness-chat.md
- CLAUDE.md Session Management section + System Prompt Injection Registry (verify still accurate)
Likely DELETE / archive (historical post-mortems and superseded plans, no longer load-bearing):
- docs-developer/decisions/chat-streaming-improvements.md
- docs-developer/decisions/streaming-optimization-architecture.md
- docs-developer/decisions/message-array-design.md
- docs-developer/decisions/lessons-learned-session-architecture.md
- docs-developer/decisions/session-management-lessons-learned.md
- docs-developer/decisions/interactive-prompt-streaming-order-438.md
- docs-developer/operations/debugging/chat-stream-duplicate-plan.md
- docs-developer/features/session-technical/sessions-ux-streaming-plan.md
- docs-developer/features/session-technical/message-pipeline-codex-review.md
- docs-developer/plans/2026-02-27-subagent-visibility-design.md
- docs-developer/plans/2026-02-27-subagent-visibility-plan.md
- docs-developer/plans/2026-03-04-reconciliation-indicator.md
- docs-developer/plans/2026-02-27-subagent-busy-bar-plan.md
- docs-developer/plans/2026-02-27-subagent-busy-bar-design.md
- docs-developer/features/ui-components/real-time-tool-execution-tracking.md (signal classification + processing indicator supersedes)
- docs-developer/operations/infrastructure/websocket-reconnect-handling.md (cursor protocol replaces)
Per-phase sweep cadence:
- After B.1 ships (event ring + cursor): rewrite the architecture overview docs; mark old session-management lessons as historical.
- After B.2 ships (delete client JSONL paths): delete superseded plans/decisions; update the CLAUDE.md session-management section.
- After B.3 ships (signal classification + indicator): rewrite the message-pipeline docs and the chat-websocket-events doc; document the typed envelope schema.
A full audit before B.1 is overkill — sweep alongside each phase to keep doc-state in sync with code.
Redact MCP secrets in runCommand failure logs
From: Exa MCP Task 3 code review (commit 1a4c9e8b)
Target: Before next deployment
Priority: MEDIUM
mcpUtils.runCommand logs the full argv on non-zero exit (server/utils/mcpUtils.js:41-44). Any --header "x-api-key: ..." or --env "TOKEN=..." value is written to the server log when the command fails. Affects Exa (hosted-HTTP key), Postmark (server token), and any future MCP using --header/--env for secrets. Fix: scan argv for --header/-H/--env/-e and replace the next arg's secret portion with *** before logging.
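A sketch of that redaction pass (the flag list comes from this item; the separator heuristic is an assumption):

```js
// Sketch: mask the value following any secret-bearing flag before logging argv.
const SECRET_FLAGS = new Set(['--header', '-H', '--env', '-e']);

function redactArgv(argv) {
  const out = [...argv];
  for (let i = 0; i < out.length - 1; i++) {
    if (!SECRET_FLAGS.has(out[i])) continue;
    // Keep the key name, mask everything after the "key: value" / "KEY=value" separator;
    // if there's no separator, mask the whole argument.
    out[i + 1] = /[:=]/.test(out[i + 1])
      ? out[i + 1].replace(/([:=]\s*).+$/, '$1***')
      : '***';
  }
  return out;
}

// e.g. redactArgv(['claude', 'mcp', 'add', '--header', 'x-api-key: sk-live-123'])
//   -> ['claude', 'mcp', 'add', '--header', 'x-api-key: ***']
```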
Better error messaging when claude mcp add fails after remove succeeds
From: Exa MCP Task 3 code review
Priority: LOW
registerServer removes any prior registration before adding the new one. If the add fails, the previous registration is gone — recoverable by clicking Register again, but the user gets a generic CLI error. Wrap the failure path with: "Registration failed; previous registration (if any) has been removed. Click Register again to retry." Pre-existing in Postmark too.
MCP service pattern hardening (cross-cutting)
From: Exa MCP Task 4 code review
Priority: LOW
Three pattern-level issues inherited by every MCP service from the Postmark template:
- getStatus() calls runCommand('claude', ['mcp', 'list']) directly, bypassing the 30s cache in getMcpRegistrationsCached. Each Settings page load spawns all registered MCPs to check health.
- No module-level mutex in mcpUtils.js, so a startup auto-register and a concurrent user-initiated Register click can interleave (briefly removes-then-adds a working registration).
- ensureRegistered's outer catch returns false for both transient registration failures and configuration corruption (e.g., decrypt failure on key rotation). The user can't distinguish "retry" from "fix your config".
Fix in mcpUtils.js so all services benefit at once.
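One way to do the module-level mutex: a promise-chain lock (a sketch, not the existing mcpUtils.js API):

```js
// Sketch: serialize all remove-then-add sequences through one module-level chain.
let chain = Promise.resolve();

function withMcpLock(fn) {
  const run = chain.then(fn, fn); // run regardless of the prior task's outcome
  chain = run.catch(() => {});    // keep the chain alive after failures
  return run;
}

// Usage: every registration mutation goes through the lock as one unit, so a
// startup auto-register can't interleave with a user-initiated Register click.
// withMcpLock(() => registerServer(name, config));
```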
Future Ideas / Backlog
Cleanup job for accumulated agent output files
From: QA Session 1 §1.2 — qa/session-01-2026-04-17-actions.md
Priority: LOW
Scoring agent runs generate thousands of files that persist indefinitely. Create a scheduled cleanup skill that prunes old output files with configurable retention.
Done
- Y Phase 0 — JSONL-tail replay backend (2026-05-08, branch feature/jsonl-tail-replay-y) — Tasks 1-7 of the Y plan. Added jsonlTailCursor, jsonlTailService, jsonlReplayMapper, resolveTranscriptPath, the feature-flag dispatcher in eventRingHandler.js, an integration test, and [chat-subscribe] diagnostic logs. 58/58 tests green. Backend proven end-to-end via Playwright + browser console — fresh subscribe + cursor-rewind replays 27 events with monotonic seq.
- Atomic JSON metadata writes (2026-05-08, commit b7ece196) — switched conversationManager.js and projects.js to write-temp-then-rename to dodge an EFS O_TRUNC race that could leave valid JSON followed by trailing garbage when two writers raced.
- Testing strategy PRD refresh (2026-05-09, commit 4e0975ea) — added the 5-goal framing, added Y to the inventory, added a layered E2E plan with Layer 3 as the recommended next-build piece, removed the stale "local login broken" caveat.
- A2c — Synthetic stress test for chat-subscribe protocol (2026-05-09) — replaces the multi-day A3 soak that wasn't actually feasible (no non-prod env with traffic + compactions). New suite at claudecodeui/server/__tests__/stress/chat-subscribe-jsonl-stress.test.js runs 5 scenarios in ~600ms: 100 concurrent subscribes (71ms), 200 sustained append+reconnect cycles (200ms), 50-cursor compaction race after in-place rewrite (66ms), 5MB-file prefix-hash perf (budget 1s, actual 45ms; 50× headroom), and 50× chunked partial-line safety (132ms). 72/72 tests across the full chat-pipeline suite. Catches what the soak would have (cursor-expired rate under load, perf regressions, partial-line edge cases) deterministically and without wall-time delay.
- A2a — Strengthen cursor compaction detection (2026-05-09) — added a prefixHash field to the cursor (sha256(bytes[0, byteOffset]) at issue time) and verify it on reconnect. Catches the in-place truncate-and-rewrite that the dev:ino-only check missed. Bumped JSONL_CURSOR_VERSION to 2 (any in-flight v1 cursor gets a one-time cursor-expired and re-subscribes fresh). Files: jsonlTailCursor.js (schema), jsonlTailService.js (added computePrefixHash helper, returned by freshCursor and tailFile), eventRingHandler.js (replaced sameFileIdentity with the prefix-hash check + truncation short-circuit). Verified end-to-end in Docker: same scenarios as A2 — rewind+replay, append+resume, and in-place compaction now correctly returns cursor-expired reason=prefix-hash-mismatch (was previously outcome=replay events=1 silent corruption). Test coverage: 67/67 across chat-subscribe-jsonl.test.js (scenario 9 un-skipped + new scenario 10 for truncation), eventRingHandler.test.js, jsonlTailCursor.test.js, jsonlTailService.test.js, jsonl-tail-replay-e2e.test.js. A3 (soak) is now unblocked.
- A2 — Y Phase 1 Docker validation (2026-05-09) — built sasha-local:dev (linux/amd64 emulated on Apple Silicon, sourcemaps off to fit 8GB Docker memory), ran with REPLAY_BACKEND=jsonl and the existing subscription token (mounted via data/sasha.db). Drove three WS chat-subscribe scenarios in-container via node ws-test.mjs: rewind-from-zero (3 events with monotonic seq), append-and-resume (1 new event), and in-place compaction. The in-place compaction case reproduced the Task 5 blind spot exactly — inode preserved (3936074→3936074), [chat-subscribe] outcome=replay events=1 instead of cursor-expired. The server returned post-compaction bytes as if they were a continuation. The A2a item above captures the fix design (prefixHash cursor field). A skipped regression test sits at chat-subscribe-jsonl.test.js ready to un-skip when A2a ships.
- A1 — Layer 3 WS-protocol E2E vitest suite (2026-05-09) — added claudecodeui/server/__tests__/e2e/chat-subscribe-jsonl.test.js, an in-process WS+Express harness that exercises handleChatSubscribe end-to-end with REPLAY_BACKEND=jsonl, a tmp CLAUDE_PROJECTS_PATH, and seeded JSONL fixtures. 8 scenarios (the 7 from this plan plus an unknown-sessionId-with-cursor negative path) all green in 463ms. The harness skips JWT auth deliberately — that layer is owned by verifyClient on the WSS in production and is independently covered. Implementation note: chose in-process over child-process spawn because server/index.js (7693 lines) has heavy import-time side effects (db migrations, scheduler init) that are unrelated to the protocol contract under test; the lighter harness gives the same coverage with ~250 lines.
- Suppress rate_limit_info in chat UI (2026-05-09, Phase C0) — extended isSystemJson() in claudecodeui/src/utils/messageNormalizer.js to match envelopes by the presence of a rate_limit_info field rather than by type/subtype. Investigation found the 2026-04-28 entry was inaccurate: no rate_limit_info code change was actually committed at that time, so the envelope continued falling through to the [Message format not recognized: ...] fallback. Added 10 unit tests in src/utils/__tests__/messageNormalizer.test.js covering the rate_limit envelope shape (with and without type: system), regression coverage for assistant + system-init, and concatenated assistant+rate_limit envelopes. C1 (server-side typed normaliser) remains the strategic fix.