Sasha Studio — TODO
Single authoritative task list. References other docs (e.g. QA session actions) for full context.
Desired End State
Chat pipeline — the load-bearing surface
The chat is the most-used surface and historically the most fragile. The end state we're building toward is anchored on five product-quality goals:
| # | Goal | What "shipped" looks like |
|---|---|---|
| 1 | Stable session | Reconnect, server restart, and compaction all preserve session state. No drops, no duplicates. |
| 2 | Don't lose messages we ought to surface | Anything Anthropic's CLI persists to JSONL is replayable to the client. CI gates regressions. |
| 3 | Reduce noisy messages | [Message format not recognized: ...] never reaches a user. Status events stay out of the transcript. Typed envelope schema with signal: 'message' | 'status' | 'silent'. |
| 4 | Stay consistent with stdout/stderr | Live-stream and replay produce semantically equivalent client state. Server-side normaliser, not three normalisers in three places. |
| 5 | Debuggable | Every chat-subscribe outcome greppable in server logs ([chat-subscribe] prefix, shipped). A1 debug pane for live wire-frame inspection. CI catches regressions before merge. |
When all five are met:
- REPLAY_BACKEND=jsonl is the default and only backend (B.1 ring deleted).
- The client's useChatReplaySubscribe hook is the only history-load path (the five legacy api.sessionMessages refetch sites are deleted).
- A typed envelope schema exists in server/services/normaliser/ and the client never sees an unclassified envelope.
- A vitest E2E suite spins up the server with REPLAY_BACKEND=jsonl and asserts the protocol; an A2 replay harness asserts the renderer matches expected DOM for known fixtures.
- A Playwright suite covers login + send-message + reload golden flows.
Operational health
- All deployed instances (CJK, sasha1/HireBest, control panel) audited for unused scheduled agents; nothing burning tokens silently.
- File tree handles thousands of files without freezing.
- MCP secret redaction in failure logs across all services.
Documentation hygiene
- docs-developer/ doesn't contain stale chat-pipeline docs that would mislead future agents/engineers.
- The CLAUDE.md session-management section reflects current architecture (no longer says "ring", no longer references B.1).
Chat Pipeline Program
Sequenced to end state. Phases are roughly ordered by risk and dependency.
Phase A — Y server-side replay (in flight)
Y replaces B.1's in-memory ring with a JSONL-tail backend. Server work is shipped behind REPLAY_BACKEND=jsonl.
A3. Soak Y as default for 3-7 days — Replaced by A2c synthetic stress test
Status: Reframed 2026-05-09. The original soak plan needed a non-prod environment with sustained traffic + compactions firing + concurrent clients — none of which exists in this stack. A multi-day passive soak with no traffic produces no signal. Replaced by the deterministic stress suite at claudecodeui/server/__tests__/stress/chat-subscribe-jsonl-stress.test.js which exercises the same failure modes in CI in ~600ms (concurrent subscribes, sustained append+reconnect, compaction race, large-file perf, partial-line safety). The "default jsonl" flip is now bundled with B.1 (the only thing that produces real chat-subscribe traffic anyway).
A4. Y Phase 3 — delete ring code — Priority: MEDIUM
Target: After B.1 ships (jsonl is the only path)
From: Y plan Task 9.
Delete event-ring*.js, related tests, messageStreamHandler.js ring tap (commit 6a5ea125), markCompleted hooks (commit 847d1352). Rename eventRingHandler.js → chatSubscribeHandler.js. Net: ~600 lines deleted.
Phase B — Client cutover (B.2)
The single biggest in-flight gap: server is ready, client never subscribes. Until B.2 ships, REPLAY_BACKEND=jsonl is dormant.
B1. Implement useChatReplaySubscribe hook — Priority: HIGH
Target: Next session — A is now stable (A2a fix shipped, stress test green at 72/72).
From: docs/superpowers/plans/2026-05-06-client-cutover-b2-redesign.md.
Per the redesigned B.2 plan: hook owns the cursor outside React state (chatReplayCursorStore), routes replay events through the existing message_streamed dispatcher (not a parallel renderer — the first attempt failed by going parallel), handles cursor-expired by triggering a single api.sessionMessages refetch.
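A minimal sketch of that hook shape, assuming a raw WebSocket handle and the chatReplayCursorStore module named in the plan; the frame types, field names, and import paths below are assumptions, not the shipped protocol:

```js
// Sketch only: frame types ('chat-subscribe', 'cursor-expired'), frame.cursor,
// and the import paths are assumptions for illustration.
import { useEffect } from 'react';
import { chatReplayCursorStore } from '../stores/chatReplayCursorStore'; // hypothetical path
import { api } from '../utils/api'; // hypothetical path

export function useChatReplaySubscribe(ws, sessionId, dispatchMessageStreamed) {
  useEffect(() => {
    if (!ws || !sessionId) return;

    // Cursor lives outside React state so re-renders never reset replay position.
    const cursor = chatReplayCursorStore.get(sessionId);
    ws.send(JSON.stringify({ type: 'chat-subscribe', sessionId, cursor }));

    const onMessage = (event) => {
      const frame = JSON.parse(;
      if (frame.type === 'cursor-expired') {
        // Single fallback refetch, then resubscribe from a fresh cursor.
        chatReplayCursorStore.clear(sessionId);
        api.sessionMessages(sessionId).then(() =>
          ws.send(JSON.stringify({ type: 'chat-subscribe', sessionId }))
        );
        return;
      }
      chatReplayCursorStore.set(sessionId, frame.cursor);
      // Replay events go through the SAME dispatcher as live events:
      // the first cutover attempt failed by building a parallel renderer.
      dispatchMessageStreamed(frame);
    };
    ws.addEventListener('message', onMessage);
    return () => ws.removeEventListener('message', onMessage);
  }, [ws, sessionId, dispatchMessageStreamed]);
}
```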
B2. Delete the five legacy refetch paths — Priority: HIGH
Target: Same PR as B1
From: B.2 plan.
Remove useProjectWebSocketV2.js:663-737, ChatInterface.jsx:5273-5444 and :5533-5547, App.jsx:1304-1356, plus the historyLoadedRef/__pendingHistoryReload state. Replace LOAD_SESSION_MESSAGES reducer's content-merge with REPLACE + UUID dedup.
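A sketch of the REPLACE + UUID dedup semantics for that reducer case (the { uuid } message shape is assumed):

```js
// Sketch of the new LOAD_SESSION_MESSAGES semantics: wholesale replace,
// deduped by UUID, instead of the old content-merge.
function loadSessionMessages(state, incoming) {
  const seen = new Set();
  const messages = incoming.filter((m) => {
    if (seen.has(m.uuid)) return false; // drop duplicate replays of the same event
    seen.add(m.uuid);
    return true;
  });
  return { ...state, messages }; // REPLACE, not merge: no stale-merge drift
}
```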
B3. Verify "no drops on reconnect" via Layer 3 + Layer 5 E2E — Priority: HIGH
Target: Same PR as B1/B2
Extend the WS E2E (A1) to cover the client-driven flow now that there's a client. Add a Playwright golden-flow test: send message → reload page → assert history intact.
Phase C — Noise classification (B.3)
The only piece that addresses goal 3. Not started yet; this is what produces the [Message format not recognized: type, rate_limit_info, uuid…] output the user is seeing.
C1. Server-side typed-envelope normaliser — Priority: HIGH
Target: After Phase B
From: B spec signal-classification section.
New server/services/normaliser/ that produces typed envelopes with explicit signal: 'message' | 'status' | 'silent'. Single source of truth — replaces messageNormalizer.js, the ChatInterface.jsx shape branches, and useProjectWebSocketV2.js's envelope handling. The client trusts the signal and renders accordingly. Schema is checked in.
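A sketch of what the typed envelope could look like; the classification rules below are illustrative guesses, not the spec's actual rules:

```js
// Sketch only: the real rules live in server/services/normaliser/.
/** @typedef {'message' | 'status' | 'silent'} Signal */

function classifyEnvelope(raw) {
  if (raw.rate_limit_info) return { signal: 'silent', raw };                 // never rendered
  if (raw.type === 'system' && raw.subtype) return { signal: 'status', raw }; // indicator only
  if (raw.type === 'assistant' || raw.type === 'user') {
    return { signal: 'message', raw };                                        // transcript
  }
  // Unclassified: log server-side, drop client-side. No placeholder text.
  console.warn('[normaliser] unclassified envelope', { type: raw.type });
  return { signal: 'silent', raw };
}
```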
C2. Processing-indicator UI (Surface 1 + Surface 2) — Priority: MEDIUM
Target: After C1
From: B spec.
Two surfaces:
- Surface 1 (inline): in-transcript marker for signal: 'status' events that are interesting (e.g., "thinking", "tool call started").
- Surface 2 (attached): persistent "Sasha is working…" indicator near the input, driven by aggregate status state.
Replaces ad-hoc tool-execution-tracking UI; gives users a stable mental model for "the system is doing something" without polluting the transcript.
C3. Drop the [Message format not recognized: ...] placeholder entirely — Priority: HIGH
Target: Same PR as C1
Once C1 ships, every envelope has a recognised signal. The fallback placeholder at messageNormalizer.js:170 becomes dead code; remove it. Any envelope that the typed normaliser can't classify is logged server-side and dropped client-side (silent).
Phase D — Replay test harness (A2) for renderer
Closes the rendering coverage gap. Most useful after B.2 ships because before then the client doesn't drive subscribe events.
D1. Build the replay-test harness — Priority: HIGH (after B)
From: docs/superpowers/specs/2026-05-02-chat-replay-testing-design.md.
Captures real ~/.claude/projects/**/*.jsonl session files, runs them through the server in replay mode, records WS frames, replays them through the React tree in jsdom, asserts: no drops (D1/D2/E completeness invariants), no out-of-order rendering (F invariant). Hand-written tests can't keep up with envelope-shape variance — this is the right tool for ChatInterface.jsx and the renderer.
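The invariant families reduce to simple assertions once frames and DOM are captured; a sketch, assuming captured frames carry { signal, seq, uuid } and rendered messages expose their UUID:

```js
// Sketch of the D1 invariant checks; the frame/DOM shapes are assumptions.
import { expect } from 'vitest';

function assertReplayInvariants(frames, renderedUuids) {
  // Completeness (D1/D2/E): every message-signal frame reaches the DOM, no drops.
  const messageUuids = frames
    .filter((f) => f.signal === 'message')
    .map((f) => f.uuid);
  for (const uuid of messageUuids) {
    expect(renderedUuids).toContain(uuid);
  }
  // Ordering (F): seq must be strictly monotonic across the whole replay.
  const seqs = => f.seq);
  for (let i = 1; i < seqs.length; i++) {
    expect(seqs[i]).toBeGreaterThan(seqs[i - 1]);
  }
}
```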
D2. Set per-area coverage thresholds in CI — Priority: MEDIUM
Target: After D1 raises baseline.
Add thresholds to vitest.config.ts for the chat-pipeline modules. CI blocks PRs that drop coverage in server/services/jsonl*.js, server/services/normaliser/*, src/hooks/useChatReplaySubscribe.js, src/utils/messageNormalizer.js (or its successor).
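Vitest's coverage.thresholds accepts per-glob keys, so the gate can be scoped to exactly these modules; a sketch with placeholder numbers:

```js
// Sketch for vitest.config.ts; threshold numbers are placeholders, not targets.
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      thresholds: {
        'server/services/jsonl*.js': { lines: 90, branches: 85 },
        'server/services/normaliser/**': { lines: 90, branches: 85 },
        'src/hooks/useChatReplaySubscribe.js': { lines: 90, branches: 85 },
        'src/utils/messageNormalizer.js': { lines: 85, branches: 80 },
      },
    },
  },
});
```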
Phase E — Playwright golden flows
E1. Scaffold Playwright + login test — Priority: MEDIUM
Target: Anytime; cheap
From: PRD §8 Layer 5.
claudecodeui/e2e/ directory, Playwright config, single test that logs in and asserts the project list renders. Establishes the framework. No Y-specific path required yet.
E2. Send-and-reload golden flow — Priority: MEDIUM
Target: After B ships
From: PRD §8 Layer 5.
Login → open existing session → send a message → assert it streams in → reload page → assert history intact. Directly tests goal 1 (stable session, no drops) end-to-end.
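A sketch of that flow as a Playwright test; selectors, labels, and env-var names are assumptions to be replaced by the E1 scaffolding's real helpers:

```js
// Sketch of the E2 golden flow; all selectors are assumptions.
import { test, expect } from '@playwright/test';

test('send a message, reload, history intact', async ({ page }) => {
  await page.goto('/');
  await page.getByLabel('Username').fill(process.env.E2E_USER ?? 'test');
  await page.getByLabel('Password').fill(process.env.E2E_PASS ?? 'test');
  await page.getByRole('button', { name: 'Sign in' }).click();

  await page.getByText('existing-session').click(); // open a seeded session
  const marker = `golden-${}`;     // unique, greppable message
  await page.getByRole('textbox').fill(marker);
  await page.keyboard.press('Enter');
  await expect(page.getByText(marker)).toBeVisible(); // streams in

  await page.reload();
  await expect(page.getByText(marker)).toBeVisible(); // history survives reload
});
```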
E3. Debug-pane assertion test — Priority: LOW
Target: After C ships
Use Playwright to read A1 debug-pane DOM, assert specific event types appear / don't appear. Catches noise-class regressions (goal 3) automatically.
Tracking
- Branch: feature/jsonl-tail-replay-y (Y Phase 0 + today's validation; not yet pushed/merged)
- Test strategy: docs/prd/16-testing-strategy.md
- Y spec: docs/superpowers/specs/2026-05-06-replay-via-jsonl-tail-design.md
- Y plan: docs/superpowers/plans/2026-05-06-jsonl-tail-replay-y.md
- B.2 plan: docs/superpowers/plans/2026-05-06-client-cutover-b2-redesign.md
- B spec (origin of B.3 signal classification): docs/superpowers/specs/2026-05-05-chat-event-replay-ring-design.md
Operations
ECS rolling deploys hit SQLite lock contention — Priority: MEDIUM
From: CJK deploy attempt 2026-05-10 — new task failed ALB health check, ECS auto-rolled back. Retry succeeded but took ~5 min (vs typical ~2 min) because startup blocked on locks.
What happens: ECS service is configured strategy: ROLLING, maximumPercent: 200, minimumHealthyPercent: 100. New task starts BEFORE old task stops — so for ~30-60s, two containers concurrently write to the same /app/data/sasha.db on EFS. SQLite locking over NFS is unreliable; the new container's startup retention sweeps + cloud-drive init issued write transactions that ended up in busy-waits, blocking the event loop for ~40s, missing ALB health checks.
Important context: WAL mode is intentionally disabled in database/db.js because it requires POSIX file locking guarantees that NFS/EFS doesn't provide — would risk DB corruption. busy_timeout = 10000 is already set. So the standard "enable WAL + busy_timeout" recipe is OFF the table here. Fixes have to keep DELETE journal mode and either defer or eliminate startup writes.
Fixes shipped 2026-05-10:
- services/{activityRetention,qualityRetention,fileRetention}.js defer their initial sweep by 60s on startup (configurable via *_INITIAL_DELAY_MS env vars). By the time the sweep fires, the old container has drained.
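A rough sketch of that deferral pattern (the env-var name and service internals are assumptions; the real services/*Retention.js code may differ):

```js
// Sketch only: ACTIVITY_RETENTION_INITIAL_DELAY_MS is an assumed env-var name.
const INITIAL_DELAY_MS = Number(process.env.ACTIVITY_RETENTION_INITIAL_DELAY_MS ?? 60_000);

function scheduleInitialSweep(sweepFn) {
  // Defer the first write-heavy sweep so a rolling deploy's old container
  // has drained before this one touches the shared SQLite file on EFS.
  setTimeout(() => {
    sweepFn().catch((err) => console.error('[retention] initial sweep failed', err));
  }, INITIAL_DELAY_MS);
}
```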
Remaining gaps:
- cloudDriveManager.initialize() (called inside the server.listen callback) also does DB writes during startup AND throws a CHECK constraint failed: health_status IN ('unmounted','mounting','mounted','degraded','remounting') error — separate bug, schema mismatch in the remote-config sync code path.
- Other startup writers we haven't audited: markStaleExecutions('meeting'), setupProjectsWatcher, scheduler init.
- Strategic fix: switch to minimumHealthyPercent: 0, maximumPercent: 100 deployment config so old stops before new starts. Trade-off: ~30s outage during deploys vs current intermittent failures. Worth doing once we audit all startup DB callers.
- Possibly lower busy_timeout from 10000 to 3000 so any residual contention bounds the event-loop pin at 3s/statement (currently 10s). With the retention defer fix this is belt-and-suspenders — only relevant if another startup DB-writer slips through.
Data Integrity
Investigate systemic conversation transcript loss — Priority: HIGH
From: CJK Associates production audit, 2026-05-09 — discovered after deploying the C0 + B.2 Phase 1 image and clicking conversations that returned empty.
Target: Once [SESSION-STUB] log frequency tells us how widespread this is.
The phenomenon. On CJK, multiple conversations show in the file tree (project metadata + summary intact) but their session JSONL transcript is missing from disk. When the user clicks them, chat shows "No messages in this session yet". Server logs show messageCount: 0, hasMessages: false from getSessionMessages, and [SERVER-GET-MESSAGES] No JSONL files found, returning empty array.
What we know from the on-disk audit:
- Conversations affected on CJK at the time of the audit: background, internal-understanding, setup, bromcom, bubble project, agentic-prompt-develop…, Help (~half the file tree).
- The audit-hook log (/home/sasha/all-project-files/audit/sessions/<sessionId>.jsonl) survives for these sessions and explicitly records transcript_path: /home/sasha/.claude/projects/-home-sasha-projects-<name>-conversation-001/<sessionId>.jsonl — i.e. Claude knew where it intended to write at the time. That path is missing today.
- /home/sasha/.claude/projects/ directory entries are mostly mtime April 10, 2026 — a sharp cluster suggesting a mass rebuild or migration on that date. The earliest surviving entry is Feb 19; conversations created Feb 25 onward (without a corresponding April 10 mtime) are gone.
- Sessions that DO work (e.g. test2/c43d8d7f) have intact JSONL at the canonical path. The split is binary — either the dir exists with content, or it's missing entirely.
Hypotheses (none confirmed):
- April 10 EFS sync/migration pruned older session dirs.
- Conversations created during a window when Claude Code's session-write was failing silently (e.g. auth issues at the time — local dev today still shows Failed to authenticate. API Error: 401 for old conversations).
- A cleanup job we don't know about is purging on age/size criteria.
What's already in place to quantify it (this same session):
- server/projects.js#detectStubState — emits structured [SESSION-STUB] warning logs whenever getSessionMessages returns empty for a session whose conversation metadata indicates it was attempted.
- GET /api/projects/:projectName/sessions/:sessionId/messages now returns { messages, stub }. The stub payload includes lastSessionId, sessionsRecorded, summary, auditLogPath, metadataLastUpdate — a useful diagnostic shape for forensics.
- ChatInterface.jsx renders a different empty state ("Transcript not found") when stub is non-null, so users see something honest instead of "Start a conversation" on top of a broken session.
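For reference, a sketch of the stub shape and log line described above (field names from this item; the emit logic is illustrative, not the shipped code):

```js
// Sketch only: illustrates the diagnostic shape, not server/projects.js itself.
function emitSessionStub(sessionId, meta) {
  const stub = {
    lastSessionId: meta.lastSessionId,
    sessionsRecorded: meta.sessionsRecorded,
    summary: meta.summary,
    auditLogPath: meta.auditLogPath,
    metadataLastUpdate: meta.metadataLastUpdate,
  };
  // Structured and greppable, so CloudWatch queries can count by project/date.
  console.warn(`[SESSION-STUB] sessionId=${sessionId}`, JSON.stringify(stub));
  return stub;
}
```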
Next steps (after baseline log frequency is in):
- Watch CloudWatch /ecs/sasha for [SESSION-STUB] lines for a week. Count: by project, by metadataLastUpdate window, by audit-log presence.
- Check AWS Backup / EFS lifecycle policies (IAM planb-ops lacks the read perms; needs admin or web console access). If EFS snapshots cover April 9, restore selectively.
- If patterns concentrate around a specific deploy or date range: bisect the deploy for a delete/migration step that pruned the canonical dirs.
- Add a "delete conversation" button to the stub UI once we understand the cause — letting users remove orphaned tree entries cleanly.
Test Infrastructure
Vitest parallel workers race on db.js SQLite probe file
From: A1 E2E suite work, 2026-05-09
Target: When the next person trips over a flaky CI suite import error
Priority: LOW
Multiple test suites that transitively import server/database/db.js (via conversationManager.js, sessionMetrics.js, etc.) write a probe file under data/.probe-<ts>.tmp at module-load time, then unlink it. Under vitest's default parallel pool, the unlink can ENOENT because a sibling worker's cleanup removed data/ first. Workaround: run the affected suites with --maxWorkers=1. Real fix: make db.js skip the writability probe when NODE_ENV === 'test' or when imported into a process that hasn't actually opened the database, OR use a per-worker tmp path for the probe.
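A sketch combining both fix options (VITEST_WORKER_ID is set by vitest's worker pool; the function name is hypothetical):

```js
// Sketch of the db.js probe fix: skip under test, or per-worker probe path.
import fs from 'fs';
import path from 'path';

function probeDataDirWritable(dataDir) {
  if (process.env.NODE_ENV === 'test') return true; // option 1: no probe at test import time
  // Option 2: per-worker name, so parallel vitest workers never collide.
  const worker = process.env.VITEST_WORKER_ID ?? process.pid;
  const probePath = path.join(dataDir, `.probe-${worker}-${}.tmp`);
  try {
    fs.writeFileSync(probePath, '');
    return true;
  } catch {
    return false;
  } finally {
    fs.rmSync(probePath, { force: true }); // force: true swallows ENOENT from sibling cleanup
  }
}
```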
Performance
File tree panel slow with large project trees
From: QA Session 1 §1.1 — qa/session-01-2026-04-17-actions.md
Target: Next sprint
Priority: HIGH
File tree reads from filesystem on every render. With scoring agent output producing thousands of files, UI becomes unusable. Profile rendering, add caching layer, consider lazy loading for deep directories.
Operations
Disable unused scheduled agents on deployed instances
From: QA Session 1 §2.1 — qa/session-01-2026-04-17-actions.md
Target: Immediate
Priority: HIGH
Demo/example scheduled agents still running on deployed instances, burning Anthropic tokens daily with no useful output. Audit all deployed instances (CJK Associates, sasha1/HireBest), disable anything not actively needed.
UI
Clean up dead NewProjectModal code after New Project buttons removal
From: UI request 2026-05-12 — New Folder button replaces New Project; New Project buttons removed in a commit on feat/new-folder-single-hit
Target: Follow-up once we're confident no other entry point opens NewProjectModal
Priority: LOW
The three "New Project" buttons (desktop toolbar, mobile toolbar, collapsed rail) were removed in favor of "New Folder" doing everything. NewProjectModal.jsx, the showNewProject state, the two <NewProjectModal> JSX instances in Sidebar.jsx, and the handleProjectCreated callback (lines ~2365-2402) are now orphaned. Delete them once any indirect entry points (e.g. folder context menu "Create project in folder" still uses window.prompt, not the modal — confirmed in the inventory) are confirmed gone for good. Watch for cross-references: addProjectPlaceholder (used by handleProjectCreated) is also used by NewFolderDialog's onBundleCreated — keep it.
Replace window.prompt in folder-menu "Create project in folder"
From: Brainstorm 2026-05-12 — folder-creation rationalisation (docs/superpowers/specs/2026-05-12-new-folder-single-hit-design.md)
Target: Follow-up after the single-hit folder feature ships
Priority: LOW
The sidebar folder context menu's "Create project in folder" action (Sidebar.jsx:2058-2080) still uses window.prompt(...) for the project name. After the single-hit work ships, the same UI primitives (small dialog component, consistent error handling) can replace this prompt. Out of scope for the single-hit feature because the intent there is folder-first, not project-in-existing-folder.
Auto-rename auto-created project from first-message title
From: Brainstorm 2026-05-12 — "single-hit folder + chat" design (docs/superpowers/specs/2026-05-12-new-folder-single-hit-design.md)
Target: Follow-up after the single-hit folder feature ships
Priority: LOW
When New Folder auto-creates a project named chat 1, the conversation underneath gets a meaningful auto-title from generateSessionTitle() on the first user message (server/projects.js:923-945), but the project keeps the placeholder name. This produces redundant nesting in the sidebar: folder ST LS GTM → project chat 1 → conversation Help me draft the GTM plan. Auto-rename the project at the same point the session title is generated by calling renameProject() (server/projects.js:1818). Note: renameProject moves the project's working directory, so this must fire after the first session is established but before significant filesystem state accumulates — handle the race with the in-flight session carefully.
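A sketch of the hook point (renameProject and the placeholder convention come from the file references above; the sanitizer and regex are assumptions):

```js
// Sketch only: fire right after generateSessionTitle() produces the first title.
function sanitizeName(title) {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, '-').replace(/^-|-$/g, '').slice(0, 60);
}

async function maybeAutoRenameProject(project, sessionTitle, renameProject) {
  if (!/^chat \d+$/.test( return; // only replace the placeholder name
  // renameProject moves the project's working directory, so this must run after
  // the first session is established but before filesystem state accumulates.
  await renameProject(, sanitizeName(sessionTitle));
}
```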
Documentation
Tidy stale chat-pipeline tech docs as part of B rollout
From: B spec (docs/superpowers/specs/2026-05-05-chat-event-replay-ring-design.md) — chat architecture is being rewritten in 3 phases; many existing docs describe the old behaviour and will mislead future agents and engineers
Target: Sweep alongside each B phase ship — don't let the doc debt accumulate to the end
Priority: MEDIUM
The B work (event replay ring + cursor protocol + signal classification) invalidates a lot of docs-developer/ content. Auto-generated mirrors under docs-developer/html/ and docs-developer/html-static/ regenerate from source, so focus only on source docs.
Sweep approach — for each impacted doc, decide one of: KEEP (mark as historical with a one-line note at top), REWRITE (update to match new design), DELETE (move to docs-developer/archive/). Default to DELETE for anything older than ~6 months that documents implementation details rather than rationale.
Likely needs REWRITE (reflect new architecture):
- docs-developer/overview/chat-and-streaming-architecture.md
- docs-developer/overview/architecture-current.md
- docs-developer/overview/technical-architecture/message-handling-architecture.md
- docs-developer/overview/technical-architecture/unified-architecture-design.md
- docs-developer/features/session-technical/session-management-architecture.md (referenced from CLAUDE.md)
- docs-developer/features/session-technical/message-pipeline.md
- docs-developer/development/chat-websocket-events.md
- docs-developer/development/test-harness-chat.md
- CLAUDE.md Session Management section + System Prompt Injection Registry (verify still accurate)
Likely DELETE / archive (historical post-mortems and superseded plans, no longer load-bearing):
- docs-developer/decisions/chat-streaming-improvements.md
- docs-developer/decisions/streaming-optimization-architecture.md
- docs-developer/decisions/message-array-design.md
- docs-developer/decisions/lessons-learned-session-architecture.md
- docs-developer/decisions/session-management-lessons-learned.md
- docs-developer/decisions/interactive-prompt-streaming-order-438.md
- docs-developer/operations/debugging/chat-stream-duplicate-plan.md
- docs-developer/features/session-technical/sessions-ux-streaming-plan.md
- docs-developer/features/session-technical/message-pipeline-codex-review.md
- docs-developer/plans/2026-02-27-subagent-visibility-design.md
- docs-developer/plans/2026-02-27-subagent-visibility-plan.md
- docs-developer/plans/2026-03-04-reconciliation-indicator.md
- docs-developer/plans/2026-02-27-subagent-busy-bar-plan.md
- docs-developer/plans/2026-02-27-subagent-busy-bar-design.md
- docs-developer/features/ui-components/real-time-tool-execution-tracking.md (signal classification + processing indicator supersedes)
- docs-developer/operations/infrastructure/websocket-reconnect-handling.md (cursor protocol replaces)
Per-phase sweep cadence:
- After B.1 ships (event ring + cursor): rewrite the architecture overview docs; mark old session-management lessons as historical.
- After B.2 ships (delete client JSONL paths): delete superseded plans/decisions; update the CLAUDE.md session-management section.
- After B.3 ships (signal classification + indicator): rewrite the message-pipeline docs and the chat-websocket-events doc; document the typed envelope schema.
A full audit before B.1 is overkill — sweep alongside each phase to keep doc-state in sync with code.
Redact MCP secrets in runCommand failure logs
From: Exa MCP Task 3 code review (commit 1a4c9e8b)
Target: Before next deployment
Priority: MEDIUM
mcpUtils.runCommand logs the full argv on non-zero exit (server/utils/mcpUtils.js:41-44). Any --header "x-api-key: ..." or --env "TOKEN=..." value is written to the server log when the command fails. Affects Exa (hosted-HTTP key), Postmark (server token), and any future MCP using --header/--env for secrets. Fix: scan argv for --header/-H/--env/-e and replace the next arg's secret portion with *** before logging.
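A sketch of that redaction pass (the flag list comes from this item; the separator heuristic is an assumption):

```js
// Sketch: mask the value following any secret-bearing flag before logging argv.
const SECRET_FLAGS = new Set(['--header', '-H', '--env', '-e']);

function redactArgv(argv) {
  const out = [...argv];
  for (let i = 0; i < out.length - 1; i++) {
    if (!SECRET_FLAGS.has(out[i])) continue;
    // Keep the key name, mask everything after the "key: value" / "KEY=value" separator;
    // if there's no separator, mask the whole argument.
    out[i + 1] = /[:=]/.test(out[i + 1])
      ? out[i + 1].replace(/([:=]\s*).+$/, '$1***')
      : '***';
  }
  return out;
}

// e.g. redactArgv(['claude', 'mcp', 'add', '--header', 'x-api-key: sk-live-123'])
//   -> ['claude', 'mcp', 'add', '--header', 'x-api-key: ***']
```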
Better error messaging when claude mcp add fails after remove succeeds
From: Exa MCP Task 3 code review
Priority: LOW
registerServer removes any prior registration before adding the new one. If the add fails, the previous registration is gone — recoverable by clicking Register again, but the user gets a generic CLI error. Wrap the failure path with: "Registration failed; previous registration (if any) has been removed. Click Register again to retry." Pre-existing in Postmark too.
MCP service pattern hardening (cross-cutting)
From: Exa MCP Task 4 code review
Priority: LOW
Three pattern-level issues inherited by every MCP service from the Postmark template:
- getStatus() calls runCommand('claude', ['mcp', 'list']) directly, bypassing the 30s cache in getMcpRegistrationsCached. Each Settings page load spawns all registered MCPs to check health.
- No module-level mutex in mcpUtils.js, so a startup auto-register and a concurrent user-initiated Register click can interleave (briefly removes-then-adds a working registration).
- ensureRegistered's outer catch returns false for both transient registration failures and configuration corruption (e.g., decrypt failure on key rotation). The user can't distinguish "retry" from "fix your config".
Fix in mcpUtils.js so all services benefit at once.
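One way to do the module-level mutex: a promise-chain lock (a sketch, not the existing mcpUtils.js API):

```js
// Sketch: serialize all remove-then-add sequences through one module-level chain.
let chain = Promise.resolve();

function withMcpLock(fn) {
  const run = chain.then(fn, fn); // run regardless of the prior task's outcome
  chain = run.catch(() => {});    // keep the chain alive after failures
  return run;
}

// Usage: every registration mutation goes through the lock as one unit, so a
// startup auto-register can't interleave with a user-initiated Register click.
// withMcpLock(() => registerServer(name, config));
```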
Future Ideas / Backlog
Cleanup job for accumulated agent output files
From: QA Session 1 §1.2 — qa/session-01-2026-04-17-actions.md
Priority: LOW
Scoring agent runs generate thousands of files that persist indefinitely. Create a scheduled cleanup skill that prunes old output files with configurable retention.
Done
- Y Phase 0 — JSONL-tail replay backend (2026-05-08, branch feature/jsonl-tail-replay-y) — Tasks 1-7 of the Y plan. Added jsonlTailCursor, jsonlTailService, jsonlReplayMapper, resolveTranscriptPath, the feature-flag dispatcher in eventRingHandler.js, an integration test, and [chat-subscribe] diagnostic logs. 58/58 tests green. Backend proven end-to-end via Playwright + browser console — fresh subscribe + cursor-rewind replays 27 events with monotonic seq.
- Atomic JSON metadata writes (2026-05-08, commit b7ece196) — switched conversationManager.js and projects.js to write-temp-then-rename to dodge an EFS O_TRUNC race that could leave valid JSON followed by trailing garbage when two writers raced.
- Testing strategy PRD refresh (2026-05-09, commit 4e0975ea) — added the 5-goal framing, added Y to the inventory, added a layered E2E plan with Layer 3 as the recommended next-build piece, removed the stale "local login broken" caveat.
- A2c — Synthetic stress test for chat-subscribe protocol (2026-05-09) — replaces the multi-day A3 soak that wasn't actually feasible (no non-prod env with traffic + compactions). New suite at claudecodeui/server/__tests__/stress/chat-subscribe-jsonl-stress.test.js runs 5 scenarios in ~600ms: 100 concurrent subscribes (71ms), 200 sustained append+reconnect cycles (200ms), 50-cursor compaction race after in-place rewrite (66ms), 5MB-file prefix-hash perf (budget 1s, actual 45ms; 50× headroom), and 50× chunked partial-line safety (132ms). 72/72 tests across the full chat-pipeline suite. Catches what the soak would have (cursor-expired rate under load, perf regressions, partial-line edge cases) deterministically and without wall-time delay.
- A2a — Strengthen cursor compaction detection (2026-05-09) — added a prefixHash field to the cursor (sha256(bytes[0, byteOffset]) at issue time) and verify it on reconnect. Catches the in-place truncate-and-rewrite that the dev:ino-only check missed. Bumped JSONL_CURSOR_VERSION to 2 (any in-flight v1 cursor gets a one-time cursor-expired and re-subscribes fresh). Files: jsonlTailCursor.js (schema), jsonlTailService.js (added computePrefixHash helper, returned by freshCursor and tailFile), eventRingHandler.js (replaced sameFileIdentity with the prefix-hash check + truncation short-circuit). Verified end-to-end in Docker: same scenarios as A2 — rewind+replay, append+resume, and in-place compaction now correctly returns cursor-expired reason=prefix-hash-mismatch (was previously outcome=replay events=1 silent corruption). Test coverage: 67/67 across chat-subscribe-jsonl.test.js (scenario 9 un-skipped + new scenario 10 for truncation), eventRingHandler.test.js, jsonlTailCursor.test.js, jsonlTailService.test.js, jsonl-tail-replay-e2e.test.js. A3 (soak) is now unblocked.
- A2 — Y Phase 1 Docker validation (2026-05-09) — built sasha-local:dev (linux/amd64 emulated on Apple Silicon, sourcemaps off to fit 8GB Docker memory), ran with REPLAY_BACKEND=jsonl and the existing subscription token (mounted via data/sasha.db). Drove three WS chat-subscribe scenarios in-container via node ws-test.mjs: rewind-from-zero (3 events with monotonic seq), append-and-resume (1 new event), and in-place compaction. The in-place compaction case reproduced the Task 5 blind spot exactly — inode preserved (3936074→3936074), [chat-subscribe] outcome=replay events=1 instead of cursor-expired. The server returned post-compaction bytes as if they were a continuation. The A2a item above captures the fix design (prefixHash cursor field). A skipped regression test sits at chat-subscribe-jsonl.test.js ready to un-skip when A2a ships.
- A1 — Layer 3 WS-protocol E2E vitest suite (2026-05-09) — added claudecodeui/server/__tests__/e2e/chat-subscribe-jsonl.test.js, an in-process WS+Express harness that exercises handleChatSubscribe end-to-end with REPLAY_BACKEND=jsonl, a tmp CLAUDE_PROJECTS_PATH, and seeded JSONL fixtures. 8 scenarios (the 7 from this plan plus an unknown-sessionId-with-cursor negative path) all green in 463ms. The harness skips JWT auth deliberately — that layer is owned by verifyClient on the WSS in production and is independently covered. Implementation note: chose in-process over child-process spawn because server/index.js (7693 lines) has heavy import-time side effects (db migrations, scheduler init) that are unrelated to the protocol contract under test; the lighter harness gives the same coverage with ~250 lines.
- Suppress rate_limit_info in chat UI (2026-05-09, Phase C0) — extended isSystemJson() in claudecodeui/src/utils/messageNormalizer.js to match envelopes by the presence of a rate_limit_info field rather than by type/subtype. Investigation found the 2026-04-28 entry was inaccurate: no rate_limit_info code change was actually committed at that time, so the envelope continued falling through to the [Message format not recognized: ...] fallback. Added 10 unit tests in src/utils/__tests__/messageNormalizer.test.js covering the rate_limit envelope shape (with and without type: system), regression coverage for assistant + system-init, and concatenated assistant+rate_limit envelopes. C1 (server-side typed normaliser) remains the strategic fix.