Sasha Studio Release Notes: v1.0.1025 to v1.0.1068
Release Period: January 2026
Version Range: 1.0.1025 → 1.0.1068
Total Commits: 111
Actual Code Changes: 177 files changed, 29,013 insertions, 4,733 deletions
TL;DR - Business Summary
What's New in Plain English:
This release focuses on memory management and system observability—making Sasha dramatically more reliable under heavy workloads:
Crash-Proof Stats Processing - Heavy analytics workloads that used to crash your server now run in an isolated process. If something goes wrong, your main Sasha instance keeps running.
Claude Process Visibility - See exactly how many AI processes are running, how much memory they're using, and which hosts are under pressure. The Control Panel now shows real-time memory monitoring across all your deployments.
Orphaned Process Cleanup - Scheduled jobs that time out no longer leave zombie processes consuming memory. They're automatically cleaned up.
Incremental Progress That Survives Crashes - Stats processing now saves checkpoints every 30 seconds. If interrupted, it continues from where it left off instead of starting over.
Memory Guards Everywhere - File uploads, stats refreshes, and background jobs all have configurable memory limits with graceful degradation when limits are approached.
Business Value:
- Uptime: Server crashes from memory exhaustion are now prevented by design
- Visibility: Real-time monitoring shows exactly where memory pressure exists
- Recovery: Interrupted jobs resume automatically instead of restarting from scratch
- Capacity Planning: Host activity dashboard enables proactive infrastructure decisions
Executive Summary
This release represents a major investment in production stability and operational visibility. The centerpiece is a completely redesigned stats refresh system that addresses a critical issue: the previous architecture could crash entire production servers when processing large event backlogs.
The Problem: Stats refresh operations ran in the main Node.js server process. When processing hundreds of thousands of events (like the 260,000 events in a production ESOP deployment), memory accumulation could trigger the Linux OOM killer, crashing the entire server. Since no checkpoint was saved until completion, the server would restart and immediately face the same backlog—creating an infinite crash loop.
The Solution: Stats processing now runs in an isolated child process with strict memory limits. The worker saves checkpoints every 30 seconds or every 5,000 events. If the worker crashes or times out, the main server continues running, and the next attempt continues from the last checkpoint instead of starting over. This architecture makes gradual progress even when facing massive backlogs.
Beyond stats processing, this release adds comprehensive Claude process visibility—the health endpoint now reports active Claude processes with per-process memory usage via /proc/[pid]/status. The Control Panel aggregates this data across all deployments, enabling capacity planning and identifying which infrastructure hosts have memory pressure from concurrent AI jobs.
Additional memory protections include: file upload size limits to prevent memory exhaustion during markdown conversion, scheduled job cleanup to eliminate orphaned Claude processes, and ESLint rules to catch code patterns that cause silent runtime failures.
Major Features & Improvements
Memory-Safe Stats Refresh Architecture
- Isolated Worker Process - Stats refresh now runs in a separate Node.js process with a `--max-old-space-size=1024` limit (see the sketch after this list)
- Checkpoint Persistence - Progress saved every 30 seconds or 5,000 events to survive crashes
- Time-Windowed Processing - Events processed in 1-hour chunks with memory cleared between windows
- IPC Progress Reporting - Real-time progress updates via inter-process communication
- Graceful Degradation - Worker can crash safely; main server continues serving requests
- Memory Threshold Checks - Worker monitors heap usage every 500 lines, stops at 800MB
- Explicit GC Between Files - Forces garbage collection between processing files when available
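In Node.js terms, the pattern looks roughly like the sketch below. It is a minimal illustration only: the worker entry point name and the IPC message shape are assumptions for the example, not the exact internal API.

```typescript
// Minimal sketch of the isolated-worker pattern (names are illustrative).
import { fork } from 'node:child_process';

const worker = fork('stats-refresh-worker.js', [], {
  // Cap the worker's heap and allow it to call global.gc() between files.
  execArgv: ['--max-old-space-size=1024', '--expose-gc'],
});

// Progress arrives over IPC, so the main server can report status without sharing memory.
worker.on('message', (msg: any) => {
  if (msg?.type === 'progress') {
    console.log(`stats refresh: ${msg.eventsProcessed} events processed`);
  }
});

// If the worker exhausts its memory budget and dies, only the worker is lost;
// the next run resumes from the last persisted checkpoint.
worker.on('exit', (code, signal) => {
  console.log(`stats worker exited (code=${code}, signal=${signal})`);
});
```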
Claude Process Observability
- Process Stats in Health Endpoint - `/api/health` now includes a `claudeProcesses` section with:
  - Active process count and list
  - Per-process memory usage from `/proc/[pid]/status` (sketched after this list)
  - Peak concurrent processes since server start
  - Total processes spawned count
- Host Activity Dashboard - New Control Panel page aggregating Claude process memory by infrastructure host
- Scheduled Job Markers - Jobs tagged with `isScheduled` and `projectName` for observability
- Process Lifecycle Logging - Start/end timestamps with duration and concurrent count
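For reference, per-process memory can be read from `/proc/[pid]/status` roughly as follows. This is a simplified sketch of the technique, not the exact code behind the health endpoint:

```typescript
// Sketch: read resident memory (VmRSS) for a pid from /proc/[pid]/status (Linux only).
import { readFileSync } from 'node:fs';

function rssMb(pid: number): number | null {
  try {
    const status = readFileSync(`/proc/${pid}/status`, 'utf8');
    const match = status.match(/^VmRSS:\s+(\d+)\s+kB/m);
    return match ? Math.round(Number(match[1]) / 1024) : null;
  } catch {
    return null; // process already exited, or not running on Linux
  }
}
```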
Memory Exhaustion Prevention
- Upload Size Limits - Configurable max conversion size (default 50MB) for file uploads
- Stats Refresh Guards - Multiple protection layers (see the sketch after this list):
  - 14-day maximum window for non-incremental refreshes
  - 512MB memory threshold with periodic checks
  - 100,000 event hard limit per refresh
  - Configurable `maxDays` parameter
- Orphaned Process Cleanup - Scheduled jobs that time out automatically abort their Claude sessions
- Safeguard Cleanup - Claude CLI close/error handlers ensure process map entries are deleted
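Conceptually, the layered guards amount to checks like the following before and during a refresh. The limits mirror the defaults listed above, but the function and constant names are illustrative, not the actual implementation:

```typescript
// Sketch of the layered guards: window cap, heap threshold, and hard event limit.
const MAX_DAYS = 14;                          // non-incremental refreshes never scan further back
const HEAP_LIMIT_BYTES = 512 * 1024 * 1024;   // periodic memory check threshold
const MAX_EVENTS = 100_000;                   // hard limit per refresh

function shouldStop(eventsProcessed: number): 'memoryLimited' | 'eventLimited' | null {
  if (process.memoryUsage().heapUsed > HEAP_LIMIT_BYTES) return 'memoryLimited';
  if (eventsProcessed >= MAX_EVENTS) return 'eventLimited';
  return null;
}

function clampWindow(requestedDays: number, maxDays: number = MAX_DAYS): number {
  // maxDays is configurable per refresh; the 14-day default applies otherwise.
  return Math.min(requestedDays, maxDays);
}
```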
Incremental Stats with Byte Offsets
- Byte-Level Resume - Processing continues from exact byte offset, not just timestamps
- File Truncation Detection - Automatically resets to full re-read if file shrinks
- 1KB Overlap Safety - Reads 1KB before last offset to handle partial line boundaries
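In practice, byte-level resume looks roughly like this. It is a simplified sketch that assumes a per-file checkpoint holding the last byte offset; names are illustrative:

```typescript
// Sketch: resume reading a log file from a saved byte offset, with the safeguards above.
import { createReadStream, statSync } from 'node:fs';

const OVERLAP_BYTES = 1024; // re-read 1KB before the offset to catch a partially written last line

function openFromCheckpoint(path: string, savedOffset: number) {
  const { size } = statSync(path);

  // File truncation detection: if the file shrank below the saved offset, reset to a full re-read.
  const start = size < savedOffset ? 0 : Math.max(0, savedOffset - OVERLAP_BYTES);

  // The caller is expected to skip lines inside the overlap window that were already processed.
  return createReadStream(path, { start, encoding: 'utf8' });
}
```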
Control Panel Enhancements
- Client Stats Page - Health metrics overview with uPlot sparkline visualizations
- Deployment Queue System - Bulk deploy management with pagination
- Secure Health Check Endpoint - Client monitoring via authenticated health API
- Last Rollup Timestamp Display - Shows when stats were last aggregated
Developer Experience
- ESLint with TDZ Detection - Catches temporal dead zone errors at build time
- Comprehensive Debug Logging - Enhanced diagnostics for streaming and session issues
- Debug-Sasha Skill - New skill for diagnosing deployed Sasha instances
Stability & Reliability
Stats Processing
- Checkpoint Saves During Processing - Survives timeouts and crashes mid-processing
- Worker Timeout Handling - 10-minute timeout kills hung workers with SIGKILL
- Error State Tracking - `memoryLimited`, `eventLimited`, and `timeLimited` flags in the response
- Stats Refresh Log - Tracks running/completed/failed refresh attempts
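On the supervisor side, the timeout handling is conceptually simple; the sketch below assumes a 10-minute budget and uses illustrative names rather than the exact internal code:

```typescript
// Sketch: kill a hung stats worker after 10 minutes; the next run resumes from the last checkpoint.
import type { ChildProcess } from 'node:child_process';

const WORKER_TIMEOUT_MS = 10 * 60 * 1000;

function superviseWorker(worker: ChildProcess) {
  const timer = setTimeout(() => {
    // SIGKILL because a truly hung worker may never respond to gentler signals.
    worker.kill('SIGKILL');
  }, WORKER_TIMEOUT_MS);

  worker.on('exit', () => clearTimeout(timer));
}
```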
Streaming & Sessions
- JSON.stringify for Object Content - Prevents `[object Object]` in streamed messages (sketched after this list)
- Haiku Thinking Error Handling - Immediate response to prevent stuck streaming
- Null SessionId Protection - Prevents message drops for new conversations
- Auto-Compact Socket Recovery - Prevents socket loss during Claude auto-compact
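The `[object Object]` fix boils down to serializing non-string content before it is streamed. A one-function sketch of the idea, not the exact streaming code:

```typescript
// Sketch: ensure streamed message content is always a string.
// Without a check like this, an object interpolated into a template string renders as "[object Object]".
function toStreamableText(content: unknown): string {
  return typeof content === 'string' ? content : JSON.stringify(content);
}
```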
File Operations
- PDF Icon Maps - Comprehensive icon mapping for PDF and DOCX conversion
- Phosphor Web Font - Consistent icon rendering without emoji replacements
- SPA Size Checks - Prevents index.md from overwriting SPA shell
- Sidebar Resize - Proper initialization after SPA DOM injection
Security
- JWT Error Logging - Improved logging for secret rotation scenarios
- js-yaml Update - CVE-2025-64718 fix for prototype pollution
- Dependency Updates - Security fixes across diff, better-sqlite3, undici, esbuild, vitest
Developer Experience & Docs
Documentation
- Stats Refresh Memory Protection - Architecture document explaining the worker process design
- Health Monitoring Architecture - Three-tier response system and security model
- Dynamic Prompt Memory - Design document for variant testing system
New Skills
- debug-sasha - Comprehensive debugging skill for deployed instances
- check-container-db - Query SQLite on running containers via SSH
Testing & Debugging
- htop in Docker - Container memory debugging support
- Console Message Capture - Network request logging via Playwright MCP
- Stuck Button Enhancement - Diagnostic logging and stream cleanup
Upgrade Notes
Memory Configuration
- Worker Memory Limit: Stats worker runs with 1024MB limit by default
- Upload Size Limit: Configure max conversion size in Settings → General → File Management
- No User Action Required: Memory protections are automatic
Stats Processing Changes
- Automatic Resume: If stats refresh was interrupted, it continues from checkpoint
- Gradual Progress: Large backlogs are processed incrementally across multiple runs
- Worker Mode Default: Use `{"useWorker": false}` only for debugging
Breaking Changes
- None: All changes are backward compatible
Changelog Summary (since v1.0.1025)
Features
- Isolated worker process for memory-safe stats refresh
- Claude process visibility in health endpoint and control panel
- Incremental stats refresh with byte offsets
- Deployment queue system for bulk deploys
- Client Stats page with sparkline visualizations
- ESLint with TDZ detection
- Comprehensive debug logging for issue #376
- Debug-sasha skill for instance troubleshooting
- Toast feedback for document download buttons
- History search with richer context display
- Postmark MCP: FilePath, CC, BCC, ReplyTo, metadata, headers support
Fixes
- Memory exhaustion prevention on large file uploads
- Orphaned Claude process cleanup on job timeout
- Checkpoint saves during stats processing to survive timeouts
- Streaming null sessionId message drops
- PDF/DOCX icon mapping and text density
- Haiku thinking error handling
- WebSocket loss during auto-compact
- Breadcrumb bar height and icon sizes
- Sidebar resize after SPA injection
- Dropdown z-index above BottomActionBar
- Queue message handling during streaming
Security
- JWT error logging improvements for secret rotation
- js-yaml CVE-2025-64718 fix
- Dependency updates with security fixes
Looking Ahead
- Memory Profiling Dashboard: Real-time memory visualization in Control Panel
- Stats Processing Queue: Background queue for large analytics jobs
- Alert Thresholds: Configurable alerts when memory/CPU exceed thresholds
- Process History: Historical view of Claude process spawning patterns
- Auto-Scaling Recommendations: Suggestions based on resource utilization trends
Jargon Buster - Technical Terms Explained
Isolated Worker Process
- A separate program that runs alongside the main server
- Like having an assistant handle heavy lifting in another room
- If the assistant has a problem, it doesn't affect your main office
OOM Killer (Out-of-Memory Killer)
- Linux kernel feature that terminates processes when memory runs out
- The operating system's emergency brake when things get out of control
- Our worker process takes the hit instead of the main server
Checkpoint Persistence
- Saving progress periodically so work isn't lost if interrupted
- Like auto-save in a word processor, but for data processing
- Enables resume from last checkpoint instead of starting over
IPC (Inter-Process Communication)
- How separate programs talk to each other
- Worker sends progress updates to main server via IPC
- Enables real-time progress reporting without sharing memory
Garbage Collection (GC)
- Automatic cleanup of memory that's no longer needed
- The worker explicitly triggers GC between files to reclaim memory
- The `--expose-gc` flag allows manual garbage collection calls
Byte Offset
- Exact position in a file measured in bytes
- Enables precise resume: "start reading from byte 10,485,760"
- More accurate than timestamp-based resume
Temporal Dead Zone (TDZ)
- JavaScript error when using variables before they're declared
- ESLint now catches these at build time instead of runtime
- Prevents mysterious crashes in production
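For the curious, a minimal illustration of this error class is shown below; the exact lint configuration isn't reproduced here, but catching the early reference at build time is the point of the new rule:

```typescript
// Classic temporal dead zone: printName() runs before `const config` has been evaluated,
// so the reference throws at runtime in plain JavaScript instead of failing at build time.
printName(); // ReferenceError: Cannot access 'config' before initialization

const config = { name: 'Sasha' };

function printName() {
  console.log(config.name);
}
```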
Heap Memory
- The memory space where JavaScript stores objects
- `--max-old-space-size=1024` limits the heap to 1GB
- Worker monitors heap usage to stop before hitting limits
Claude Process
- A running instance of Claude CLI handling a conversation
- Multiple processes can run concurrently for different users
- Health endpoint now tracks all active processes and their memory
Host Activity
- Aggregated view of resource usage across infrastructure hosts
- Shows which servers have memory pressure from concurrent AI jobs
- Enables capacity planning decisions
Thanks for upgrading. This release makes Sasha dramatically more reliable under heavy workloads—crashes from memory exhaustion are now prevented by design, and you have full visibility into where resources are being consumed.