Sasha Studio Release Notes: v1.0.1025 to v1.0.1068
Release Period: January 2026
Version Range: 1.0.1025 → 1.0.1068
Total Commits: 111
Actual Code Changes: 177 files changed, 29,013 insertions, 4,733 deletions
TL;DR - Business Summary
What's New in Plain English:
This release focuses on memory management and system observability—making Sasha dramatically more reliable under heavy workloads:
Crash-Proof Stats Processing - Heavy analytics workloads that used to crash your server now run in an isolated process. If something goes wrong, your main Sasha instance keeps running.
Claude Process Visibility - See exactly how many AI processes are running, how much memory they're using, and which hosts are under pressure. The Control Panel now shows real-time memory monitoring across all your deployments.
Orphaned Process Cleanup - Scheduled jobs that time out no longer leave zombie processes consuming memory. They're automatically cleaned up.
Incremental Progress That Survives Crashes - Stats processing now saves checkpoints every 30 seconds. If interrupted, it continues from where it left off instead of starting over.
Memory Guards Everywhere - File uploads, stats refreshes, and background jobs all have configurable memory limits with graceful degradation when limits are approached.
Business Value:
- Uptime: Server crashes from memory exhaustion are now prevented by design
- Visibility: Real-time monitoring shows exactly where memory pressure exists
- Recovery: Interrupted jobs resume automatically instead of restarting from scratch
- Capacity Planning: Host activity dashboard enables proactive infrastructure decisions
Executive Summary
This release represents a major investment in production stability and operational visibility. The centerpiece is a completely redesigned stats refresh system that addresses a critical issue: the previous architecture could crash entire production servers when processing large event backlogs.
The Problem: Stats refresh operations ran in the main Node.js server process. When processing hundreds of thousands of events (like the 260,000 events in a production ESOP deployment), memory accumulation could trigger the Linux OOM killer, crashing the entire server. Since no checkpoint was saved until completion, the server would restart and immediately face the same backlog—creating an infinite crash loop.
The Solution: Stats processing now runs in an isolated child process with strict memory limits. The worker saves checkpoints every 30 seconds or every 5,000 events. If the worker crashes or times out, the main server continues running, and the next attempt continues from the last checkpoint instead of starting over. This architecture makes gradual progress even when facing massive backlogs.
Beyond stats processing, this release adds comprehensive Claude process visibility—the health endpoint now reports active Claude processes with per-process memory usage via /proc/[pid]/status. The Control Panel aggregates this data across all deployments, enabling capacity planning and identifying which infrastructure hosts have memory pressure from concurrent AI jobs.
Additional memory protections include: file upload size limits to prevent memory exhaustion during markdown conversion, scheduled job cleanup to eliminate orphaned Claude processes, and ESLint rules to catch code patterns that cause silent runtime failures.
Major Features & Improvements
Memory-Safe Stats Refresh Architecture
- Isolated Worker Process - Stats refresh now runs in a separate Node.js process with a `--max-old-space-size=1024` limit (see the sketch after this list)
- Checkpoint Persistence - Progress saved every 30 seconds or 5,000 events to survive crashes
- Time-Windowed Processing - Events processed in 1-hour chunks with memory cleared between windows
- IPC Progress Reporting - Real-time progress updates via inter-process communication
- Graceful Degradation - Worker can crash safely; main server continues serving requests
- Memory Threshold Checks - Worker monitors heap usage every 500 lines, stops at 800MB
- Explicit GC Between Files - Forces garbage collection between processing files when available
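In Node.js terms, the pattern looks roughly like the sketch below. It is a minimal illustration only: the worker entry point name and the IPC message shape are assumptions for the example, not the exact internal API.

```typescript
// Minimal sketch of the isolated-worker pattern (names are illustrative).
import { fork } from 'node:child_process';

const worker = fork('stats-refresh-worker.js', [], {
  // Cap the worker's heap and allow it to call global.gc() between files.
  execArgv: ['--max-old-space-size=1024', '--expose-gc'],
});

// Progress arrives over IPC, so the main server can report status without sharing memory.
worker.on('message', (msg: any) => {
  if (msg?.type === 'progress') {
    console.log(`stats refresh: ${msg.eventsProcessed} events processed`);
  }
});

// If the worker exhausts its memory budget and dies, only the worker is lost;
// the next run resumes from the last persisted checkpoint.
worker.on('exit', (code, signal) => {
  console.log(`stats worker exited (code=${code}, signal=${signal})`);
});
```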
Claude Process Observability
- Process Stats in Health Endpoint - `/api/health` now includes a `claudeProcesses` section with:
  - Active process count and list
  - Per-process memory usage from `/proc/[pid]/status` (sketched after this list)
  - Peak concurrent processes since server start
  - Total processes spawned count
- Host Activity Dashboard - New Control Panel page aggregating Claude process memory by infrastructure host
- Scheduled Job Markers - Jobs tagged with `isScheduled` and `projectName` for observability
- Process Lifecycle Logging - Start/end timestamps with duration and concurrent count
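For reference, per-process memory can be read from `/proc/[pid]/status` roughly as follows. This is a simplified sketch of the technique, not the exact code behind the health endpoint:

```typescript
// Sketch: read resident memory (VmRSS) for a pid from /proc/[pid]/status (Linux only).
import { readFileSync } from 'node:fs';

function rssMb(pid: number): number | null {
  try {
    const status = readFileSync(`/proc/${pid}/status`, 'utf8');
    const match = status.match(/^VmRSS:\s+(\d+)\s+kB/m);
    return match ? Math.round(Number(match[1]) / 1024) : null;
  } catch {
    return null; // process already exited, or not running on Linux
  }
}
```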
Memory Exhaustion Prevention
- Upload Size Limits - Configurable max conversion size (default 50MB) for file uploads
- Stats Refresh Guards - Multiple protection layers (see the sketch after this list):
  - 14-day maximum window for non-incremental refreshes
  - 512MB memory threshold with periodic checks
  - 100,000 event hard limit per refresh
  - Configurable `maxDays` parameter
- Orphaned Process Cleanup - Scheduled jobs that time out automatically abort their Claude sessions
- Safeguard Cleanup - Claude CLI close/error handlers ensure process map entries are deleted
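Conceptually, the layered guards amount to checks like the following before and during a refresh. The limits mirror the defaults listed above, but the function and constant names are illustrative, not the actual implementation:

```typescript
// Sketch of the layered guards: window cap, heap threshold, and hard event limit.
const MAX_DAYS = 14;                          // non-incremental refreshes never scan further back
const HEAP_LIMIT_BYTES = 512 * 1024 * 1024;   // periodic memory check threshold
const MAX_EVENTS = 100_000;                   // hard limit per refresh

function shouldStop(eventsProcessed: number): 'memoryLimited' | 'eventLimited' | null {
  if (process.memoryUsage().heapUsed > HEAP_LIMIT_BYTES) return 'memoryLimited';
  if (eventsProcessed >= MAX_EVENTS) return 'eventLimited';
  return null;
}

function clampWindow(requestedDays: number, maxDays: number = MAX_DAYS): number {
  // maxDays is configurable per refresh; the 14-day default applies otherwise.
  return Math.min(requestedDays, maxDays);
}
```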
Incremental Stats with Byte Offsets
- Byte-Level Resume - Processing continues from exact byte offset, not just timestamps
- File Truncation Detection - Automatically resets to full re-read if file shrinks
- 1KB Overlap Safety - Reads 1KB before last offset to handle partial line boundaries
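In practice, byte-level resume looks roughly like this. It is a simplified sketch that assumes a per-file checkpoint holding the last byte offset; names are illustrative:

```typescript
// Sketch: resume reading a log file from a saved byte offset, with the safeguards above.
import { createReadStream, statSync } from 'node:fs';

const OVERLAP_BYTES = 1024; // re-read 1KB before the offset to catch a partially written last line

function openFromCheckpoint(path: string, savedOffset: number) {
  const { size } = statSync(path);

  // File truncation detection: if the file shrank below the saved offset, reset to a full re-read.
  const start = size < savedOffset ? 0 : Math.max(0, savedOffset - OVERLAP_BYTES);

  // The caller is expected to skip lines inside the overlap window that were already processed.
  return createReadStream(path, { start, encoding: 'utf8' });
}
```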
Control Panel Enhancements
- Client Stats Page - Health metrics overview with uPlot sparkline visualizations
- Deployment Queue System - Bulk deploy management with pagination
- Secure Health Check Endpoint - Client monitoring via authenticated health API
- Last Rollup Timestamp Display - Shows when stats were last aggregated
Developer Experience
- ESLint with TDZ Detection - Catches temporal dead zone errors at build time
- Comprehensive Debug Logging - Enhanced diagnostics for streaming and session issues
- Debug-Sasha Skill - New skill for diagnosing deployed Sasha instances
Stability & Reliability
Stats Processing
- Checkpoint Saves During Processing - Survives timeouts and crashes mid-processing
- Worker Timeout Handling - 10-minute timeout kills hung workers with SIGKILL
- Error State Tracking - `memoryLimited`, `eventLimited`, and `timeLimited` flags in the response
- Stats Refresh Log - Tracks running/completed/failed refresh attempts
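On the supervisor side, the timeout handling is conceptually simple; the sketch below assumes a 10-minute budget and uses illustrative names rather than the exact internal code:

```typescript
// Sketch: kill a hung stats worker after 10 minutes; the next run resumes from the last checkpoint.
import type { ChildProcess } from 'node:child_process';

const WORKER_TIMEOUT_MS = 10 * 60 * 1000;

function superviseWorker(worker: ChildProcess) {
  const timer = setTimeout(() => {
    // SIGKILL because a truly hung worker may never respond to gentler signals.
    worker.kill('SIGKILL');
  }, WORKER_TIMEOUT_MS);

  worker.on('exit', () => clearTimeout(timer));
}
```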
Streaming & Sessions
- JSON.stringify for Object Content - Prevents `[object Object]` in streamed messages (sketched after this list)
- Haiku Thinking Error Handling - Immediate response to prevent stuck streaming
- Null SessionId Protection - Prevents message drops for new conversations
- Auto-Compact Socket Recovery - Prevents socket loss during Claude auto-compact
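The `[object Object]` fix boils down to serializing non-string content before it is streamed. A one-function sketch of the idea, not the exact streaming code:

```typescript
// Sketch: ensure streamed message content is always a string.
// Without a check like this, an object interpolated into a template string renders as "[object Object]".
function toStreamableText(content: unknown): string {
  return typeof content === 'string' ? content : JSON.stringify(content);
}
```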
File Operations
- PDF Icon Maps - Comprehensive icon mapping for PDF and DOCX conversion
- Phosphor Web Font - Consistent icon rendering without emoji replacements
- SPA Size Checks - Prevents index.md from overwriting SPA shell
- Sidebar Resize - Proper initialization after SPA DOM injection
Security
- JWT Error Logging - Improved logging for secret rotation scenarios
- js-yaml Update - CVE-2025-64718 fix for prototype pollution
- Dependency Updates - Security fixes across diff, better-sqlite3, undici, esbuild, vitest
Developer Experience & Docs
Documentation
- Stats Refresh Memory Protection - Architecture document explaining the worker process design
- Health Monitoring Architecture - Three-tier response system and security model
- Dynamic Prompt Memory - Design document for variant testing system
New Skills
- debug-sasha - Comprehensive debugging skill for deployed instances
- check-container-db - Query SQLite on running containers via SSH
Testing & Debugging
- htop in Docker - Container memory debugging support
- Console Message Capture - Network request logging via Playwright MCP
- Stuck Button Enhancement - Diagnostic logging and stream cleanup
Upgrade Notes
Memory Configuration
- Worker Memory Limit: Stats worker runs with 1024MB limit by default
- Upload Size Limit: Configure max conversion size in Settings → General → File Management
- No User Action Required: Memory protections are automatic
Stats Processing Changes
- Automatic Resume: If stats refresh was interrupted, it continues from checkpoint
- Gradual Progress: Large backlogs are processed incrementally across multiple runs
- Worker Mode Default: Use `{"useWorker": false}` only for debugging
Breaking Changes
- None: All changes are backward compatible
Changelog Summary (since v1.0.1025)
Features
- Isolated worker process for memory-safe stats refresh
- Claude process visibility in health endpoint and control panel
- Incremental stats refresh with byte offsets
- Deployment queue system for bulk deploys
- Client Stats page with sparkline visualizations
- ESLint with TDZ detection
- Comprehensive debug logging for issue #376
- Debug-sasha skill for instance troubleshooting
- Toast feedback for document download buttons
- History search with richer context display
- Postmark MCP: FilePath, CC, BCC, ReplyTo, metadata, headers support
Fixes
- Memory exhaustion prevention on large file uploads
- Orphaned Claude process cleanup on job timeout
- Checkpoint saves during stats processing to survive timeouts
- Streaming null sessionId message drops
- PDF/DOCX icon mapping and text density
- Haiku thinking error handling
- WebSocket loss during auto-compact
- Breadcrumb bar height and icon sizes
- Sidebar resize after SPA injection
- Dropdown z-index above BottomActionBar
- Queue message handling during streaming
Security
- JWT error logging improvements for secret rotation
- js-yaml CVE-2025-64718 fix
- Dependency updates with security fixes
Looking Ahead
- Memory Profiling Dashboard: Real-time memory visualization in Control Panel
- Stats Processing Queue: Background queue for large analytics jobs
- Alert Thresholds: Configurable alerts when memory/CPU exceed thresholds
- Process History: Historical view of Claude process spawning patterns
- Auto-Scaling Recommendations: Suggestions based on resource utilization trends
Jargon Buster - Technical Terms Explained
Isolated Worker Process
- A separate program that runs alongside the main server
- Like having an assistant handle heavy lifting in another room
- If the assistant has a problem, it doesn't affect your main office
OOM Killer (Out-of-Memory Killer)
- Linux kernel feature that terminates processes when memory runs out
- The operating system's emergency brake when things get out of control
- Our worker process takes the hit instead of the main server
Checkpoint Persistence
- Saving progress periodically so work isn't lost if interrupted
- Like auto-save in a word processor, but for data processing
- Enables resume from last checkpoint instead of starting over
IPC (Inter-Process Communication)
- How separate programs talk to each other
- Worker sends progress updates to main server via IPC
- Enables real-time progress reporting without sharing memory
Garbage Collection (GC)
- Automatic cleanup of memory that's no longer needed
- The worker explicitly triggers GC between files to reclaim memory
- The `--expose-gc` flag allows manual garbage collection calls
Byte Offset
- Exact position in a file measured in bytes
- Enables precise resume: "start reading from byte 10,485,760"
- More accurate than timestamp-based resume
Temporal Dead Zone (TDZ)
- JavaScript error when using variables before they're declared
- ESLint now catches these at build time instead of runtime
- Prevents mysterious crashes in production
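For the curious, a minimal illustration of this error class is shown below; the exact lint configuration isn't reproduced here, but catching the early reference at build time is the point of the new rule:

```typescript
// Classic temporal dead zone: printName() runs before `const config` has been evaluated,
// so the reference throws at runtime in plain JavaScript instead of failing at build time.
printName(); // ReferenceError: Cannot access 'config' before initialization

const config = { name: 'Sasha' };

function printName() {
  console.log(config.name);
}
```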
Heap Memory
- The memory space where JavaScript stores objects
- `--max-old-space-size=1024` limits the heap to 1GB
- Worker monitors heap usage to stop before hitting limits
Claude Process
- A running instance of Claude CLI handling a conversation
- Multiple processes can run concurrently for different users
- Health endpoint now tracks all active processes and their memory
Host Activity
- Aggregated view of resource usage across infrastructure hosts
- Shows which servers have memory pressure from concurrent AI jobs
- Enables capacity planning decisions
Thanks for upgrading. This release makes Sasha dramatically more reliable under heavy workloads—crashes from memory exhaustion are now prevented by design, and you have full visibility into where resources are being consumed.