Last updated: Feb 1, 2026, 11:23 AM UTC

Sasha Studio Release Notes: v1.0.1025 to v1.0.1068

Release Period: January 2026
Version Range: 1.0.1025 → 1.0.1068
Total Commits: 111
Actual Code Changes: 177 files changed, 29,013 insertions, 4,733 deletions


TL;DR - Business Summary

What's New in Plain English:

This release focuses on memory management and system observability—making Sasha dramatically more reliable under heavy workloads:

  1. Crash-Proof Stats Processing - Heavy analytics workloads that used to crash your server now run in an isolated process. If something goes wrong, your main Sasha instance keeps running.

  2. Claude Process Visibility - See exactly how many AI processes are running, how much memory they're using, and which hosts are under pressure. The Control Panel now shows real-time memory monitoring across all your deployments.

  3. Orphaned Process Cleanup - Scheduled jobs that time out no longer leave zombie processes consuming memory. They're automatically cleaned up.

  4. Incremental Progress That Survives Crashes - Stats processing now saves checkpoints every 30 seconds. If interrupted, it continues from where it left off instead of starting over.

  5. Memory Guards Everywhere - File uploads, stats refreshes, and background jobs all have configurable memory limits with graceful degradation when limits are approached.

Business Value:

  • Uptime: Server crashes from memory exhaustion are now prevented by design
  • Visibility: Real-time monitoring shows exactly where memory pressure exists
  • Recovery: Interrupted jobs resume automatically instead of restarting from scratch
  • Capacity Planning: Host activity dashboard enables proactive infrastructure decisions

Executive Summary

This release represents a major investment in production stability and operational visibility. The centerpiece is a completely redesigned stats refresh system that addresses a critical issue: the previous architecture could crash entire production servers when processing large event backlogs.

The Problem: Stats refresh operations ran in the main Node.js server process. When processing hundreds of thousands of events (like the 260,000 events in a production ESOP deployment), memory accumulation could trigger the Linux OOM killer, crashing the entire server. Since no checkpoint was saved until completion, the server would restart and immediately face the same backlog—creating an infinite crash loop.

The Solution: Stats processing now runs in an isolated child process with strict memory limits. The worker saves checkpoints every 30 seconds or every 5,000 events. If the worker crashes or times out, the main server continues running, and the next attempt continues from the last checkpoint instead of starting over. This architecture makes gradual progress even when facing massive backlogs.
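
For readers who want the mechanics, the pattern looks roughly like the sketch below. This is a minimal illustration rather than the actual Sasha source: the worker file name, the message shapes, and the saveCheckpoint helper are assumptions.

    // Parent side: run the stats refresh in an isolated child process.
    const { fork } = require('node:child_process');

    function runStatsWorker(checkpoint, saveCheckpoint) {
      return new Promise((resolve, reject) => {
        const worker = fork('./stats-refresh-worker.js', [], {
          execArgv: ['--max-old-space-size=1024', '--expose-gc'], // hard heap cap for the worker only
        });

        worker.send({ type: 'start', checkpoint });               // resume from the last saved checkpoint

        worker.on('message', (msg) => {
          if (msg.type === 'progress') saveCheckpoint(msg.checkpoint); // persisted every ~30s / 5,000 events
          if (msg.type === 'done') resolve(msg.result);
        });

        // If the worker dies (OOM kill, crash, timeout), the main server keeps serving requests;
        // the next attempt simply starts again from the last persisted checkpoint.
        worker.on('exit', (code) => {
          if (code !== 0) reject(new Error(`stats worker exited with code ${code}`));
        });
      });
    }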

Beyond stats processing, this release adds comprehensive Claude process visibility—the health endpoint now reports active Claude processes with per-process memory usage via /proc/[pid]/status. The Control Panel aggregates this data across all deployments, enabling capacity planning and identifying which infrastructure hosts have memory pressure from concurrent AI jobs.
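
On Linux, per-process memory can be read roughly as sketched below. Parsing the VmRSS field is one common approach; the exact field and units Sasha reports are not spelled out in these notes.

    // Read resident memory for a PID from /proc/[pid]/status (Linux only).
    const { readFileSync } = require('node:fs');

    function getProcessMemoryMB(pid) {
      const status = readFileSync(`/proc/${pid}/status`, 'utf8');
      const match = status.match(/^VmRSS:\s+(\d+)\s+kB/m);        // the kernel reports this value in kB
      return match ? Math.round(Number(match[1]) / 1024) : null;  // null if VmRSS is not present
    }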

Additional memory protections include: file upload size limits to prevent memory exhaustion during markdown conversion, scheduled job cleanup to eliminate orphaned Claude processes, and ESLint rules to catch code patterns that cause silent runtime failures.


Major Features & Improvements

Memory-Safe Stats Refresh Architecture

  • Isolated Worker Process - Stats refresh now runs in a separate Node.js process with --max-old-space-size=1024 limit
  • Checkpoint Persistence - Progress saved every 30 seconds or 5,000 events to survive crashes
  • Time-Windowed Processing - Events processed in 1-hour chunks with memory cleared between windows
  • IPC Progress Reporting - Real-time progress updates via inter-process communication
  • Graceful Degradation - Worker can crash safely; main server continues serving requests
  • Memory Threshold Checks - Worker monitors heap usage every 500 lines and stops at 800MB (sketched after this list)
  • Explicit GC Between Files - Forces garbage collection between processing files when available
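
A minimal sketch of the in-worker guard, using the thresholds listed above (constant and function names are illustrative, not the actual implementation):

    // Inside the worker: check heap usage periodically and stop before hitting the hard cap.
    const HEAP_STOP_BYTES = 800 * 1024 * 1024;                    // stop threshold from the notes above

    function shouldStopForMemory(linesProcessed) {
      if (linesProcessed % 500 !== 0) return false;               // only check every 500 lines
      return process.memoryUsage().heapUsed > HEAP_STOP_BYTES;
    }

    function gcBetweenFiles() {
      if (typeof global.gc === 'function') global.gc();           // only available when run with --expose-gc
    }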

Claude Process Observability

  • Process Stats in Health Endpoint - /api/health now includes a claudeProcesses section (illustrative example after this list) with:
    • Active process count and list
    • Per-process memory usage from /proc/[pid]/status
    • Peak concurrent processes since server start
    • Total processes spawned count
  • Host Activity Dashboard - New Control Panel page aggregating Claude process memory by infrastructure host
  • Scheduled Job Markers - Jobs tagged with isScheduled and projectName for observability
  • Process Lifecycle Logging - Start/end timestamps with duration and concurrent count
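
As an illustration, the claudeProcesses section of an /api/health response might look roughly like the following. Field names and values here are illustrative, not the exact schema.

    {
      "claudeProcesses": {
        "activeCount": 3,
        "peakConcurrent": 7,
        "totalSpawned": 142,
        "processes": [
          {
            "pid": 48211,
            "memoryMB": 312,
            "isScheduled": true,
            "projectName": "example-project"
          }
        ]
      }
    }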

Memory Exhaustion Prevention

  • Upload Size Limits - Configurable max conversion size (default 50MB) for file uploads
  • Stats Refresh Guards - Multiple protection layers (sketched after this list):
    • 14-day maximum window for non-incremental refreshes
    • 512MB memory threshold with periodic checks
    • 100,000 event hard limit per refresh
    • Configurable maxDays parameter
  • Orphaned Process Cleanup - Scheduled jobs that time out automatically abort their Claude sessions
  • Safeguard Cleanup - Claude CLI close/error handlers ensure process map entries are deleted
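
A minimal sketch of how these guard layers fit together, using the limits listed above (names are illustrative, not the actual implementation):

    // Guard checks applied before and during a stats refresh.
    const MAX_WINDOW_DAYS = 14;                                   // cap for non-incremental refreshes
    const MEMORY_THRESHOLD_BYTES = 512 * 1024 * 1024;
    const MAX_EVENTS_PER_REFRESH = 100000;

    function clampRefreshWindow(requestedDays, maxDays = MAX_WINDOW_DAYS) {
      return Math.min(requestedDays, maxDays);                    // maxDays is the configurable parameter
    }

    function shouldAbortRefresh(eventsProcessed) {
      if (eventsProcessed >= MAX_EVENTS_PER_REFRESH) return 'eventLimited';
      if (process.memoryUsage().heapUsed > MEMORY_THRESHOLD_BYTES) return 'memoryLimited';
      return null;                                                // keep going
    }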

Incremental Stats with Byte Offsets

  • Byte-Level Resume - Processing continues from the exact byte offset, not just a timestamp (see the sketch after this list)
  • File Truncation Detection - Automatically resets to full re-read if file shrinks
  • 1KB Overlap Safety - Reads 1KB before last offset to handle partial line boundaries
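
In sketch form, byte-offset resume with truncation detection and the 1KB overlap looks roughly like this. It is simplified for clarity: a production implementation would typically stream the file in chunks rather than reading the whole remainder at once.

    // Resume reading an event log from a saved byte offset.
    const { statSync, openSync, readSync, closeSync } = require('node:fs');

    const OVERLAP_BYTES = 1024;                                   // re-read 1KB to catch a partially written line

    function readFromOffset(filePath, savedOffset) {
      const size = statSync(filePath).size;
      if (size < savedOffset) savedOffset = 0;                    // file shrank: truncation detected, re-read fully
      const start = Math.max(0, savedOffset - OVERLAP_BYTES);

      const fd = openSync(filePath, 'r');
      const buffer = Buffer.alloc(size - start);
      readSync(fd, buffer, 0, buffer.length, start);
      closeSync(fd);

      return { text: buffer.toString('utf8'), nextOffset: size };
    }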

Control Panel Enhancements

  • Client Stats Page - Health metrics overview with uPlot sparkline visualizations
  • Deployment Queue System - Bulk deploy management with pagination
  • Secure Health Check Endpoint - Client monitoring via authenticated health API
  • Last Rollup Timestamp Display - Shows when stats were last aggregated

Developer Experience

  • ESLint with TDZ Detection - Catches temporal dead zone errors at build time (example rule after this list)
  • Comprehensive Debug Logging - Enhanced diagnostics for streaming and session issues
  • Debug-Sasha Skill - New skill for diagnosing deployed Sasha instances
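
One way to catch TDZ-style use-before-declaration with ESLint is the built-in no-use-before-define rule. The snippet below is an illustrative flat-config entry, not necessarily the exact rule set this release ships with.

    // eslint.config.js (illustrative): flag variables referenced before their declaration runs
    module.exports = [
      {
        rules: {
          'no-use-before-define': ['error', { variables: true, functions: false, classes: true }],
        },
      },
    ];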

Stability & Reliability

Stats Processing

  • Checkpoint Saves During Processing - Survives timeouts and crashes mid-processing
  • Worker Timeout Handling - 10-minute timeout kills hung workers with SIGKILL (sketched after this list)
  • Error State Tracking - memoryLimited, eventLimited, timeLimited flags in response
  • Stats Refresh Log - Tracks running/completed/failed refresh attempts
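
The timeout handling amounts to something like the sketch below, assuming a worker forked as in the earlier Executive Summary sketch (names are illustrative):

    // Give a forked stats worker 10 minutes, then hard-kill it if it has not exited.
    const WORKER_TIMEOUT_MS = 10 * 60 * 1000;

    function armWorkerTimeout(worker) {
      const timer = setTimeout(() => {
        worker.kill('SIGKILL');                                   // hung worker dies; the next run resumes from the last checkpoint
      }, WORKER_TIMEOUT_MS);
      worker.on('exit', () => clearTimeout(timer));
    }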

Streaming & Sessions

  • JSON.stringify for Object Content - Prevents [object Object] in streamed messages (see the sketch after this list)
  • Haiku Thinking Error Handling - Immediate response to prevent stuck streaming
  • Null SessionId Protection - Prevents message drops for new conversations
  • Auto-Compact Socket Recovery - Prevents socket loss during Claude auto-compact
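
The object-content fix boils down to a guard along these lines (function name is illustrative):

    // Ensure streamed message content is always a string before it is sent to the client.
    function toStreamText(content) {
      return typeof content === 'string' ? content : JSON.stringify(content); // avoids "[object Object]"
    }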

File Operations

  • PDF Icon Maps - Comprehensive icon mapping for PDF and DOCX conversion
  • Phosphor Web Font - Consistent icon rendering without emoji replacements
  • SPA Size Checks - Prevents index.md from overwriting SPA shell
  • Sidebar Resize - Proper initialization after SPA DOM injection

Security

  • JWT Error Logging - Improved logging for secret rotation scenarios
  • js-yaml Update - CVE-2025-64718 fix for prototype pollution
  • Dependency Updates - Security fixes across diff, better-sqlite3, undici, esbuild, vitest

Developer Experience & Docs

Documentation

  • Stats Refresh Memory Protection - Architecture document explaining the worker process design
  • Health Monitoring Architecture - Three-tier response system and security model
  • Dynamic Prompt Memory - Design document for variant testing system

New Skills

  • debug-sasha - Comprehensive debugging skill for deployed instances
  • check-container-db - Query SQLite on running containers via SSH

Testing & Debugging

  • htop in Docker - Container memory debugging support
  • Console Message Capture - Network request logging via Playwright MCP
  • Stuck Button Enhancement - Diagnostic logging and stream cleanup

Upgrade Notes

Memory Configuration

  • Worker Memory Limit: Stats worker runs with 1024MB limit by default
  • Upload Size Limit: Configure max conversion size in Settings → General → File Management
  • No User Action Required: Memory protections are automatic

Stats Processing Changes

  • Automatic Resume: If stats refresh was interrupted, it continues from checkpoint
  • Gradual Progress: Large backlogs are processed incrementally across multiple runs
  • Worker Mode Is the Default: Pass {"useWorker": false} only for debugging (see the example below)
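
If you do need to force in-process execution for debugging, the option is passed as JSON when triggering a refresh. The endpoint path and headers below are illustrative, not the documented API; check your deployment's admin routes for the real one.

    // Debugging only: request an in-process stats refresh instead of the isolated worker.
    await fetch('https://your-sasha-host/api/stats/refresh', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },            // auth headers omitted for brevity
      body: JSON.stringify({ useWorker: false }),
    });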

Breaking Changes

  • None: All changes are backward compatible

Changelog Summary (since v1.0.1025)

Features

  • Isolated worker process for memory-safe stats refresh
  • Claude process visibility in health endpoint and control panel
  • Incremental stats refresh with byte offsets
  • Deployment queue system for bulk deploys
  • Client Stats page with sparkline visualizations
  • ESLint with TDZ detection
  • Comprehensive debug logging for issue #376
  • Debug-sasha skill for instance troubleshooting
  • Toast feedback for document download buttons
  • History search with richer context display
  • Postmark MCP: FilePath, CC, BCC, ReplyTo, metadata, headers support

Fixes

  • Memory exhaustion prevention on large file uploads
  • Orphaned Claude process cleanup on job timeout
  • Checkpoint saves during stats processing to survive timeouts
  • Streaming null sessionId message drops
  • PDF/DOCX icon mapping and text density
  • Haiku thinking error handling
  • WebSocket loss during auto-compact
  • Breadcrumb bar height and icon sizes
  • Sidebar resize after SPA injection
  • Dropdown z-index above BottomActionBar
  • Queue message handling during streaming

Security

  • JWT error logging improvements for secret rotation
  • js-yaml CVE-2025-64718 fix
  • Dependency updates with security fixes

Looking Ahead

  • Memory Profiling Dashboard: Real-time memory visualization in Control Panel
  • Stats Processing Queue: Background queue for large analytics jobs
  • Alert Thresholds: Configurable alerts when memory/CPU exceed thresholds
  • Process History: Historical view of Claude process spawning patterns
  • Auto-Scaling Recommendations: Suggestions based on resource utilization trends

Jargon Buster - Technical Terms Explained

Isolated Worker Process

  • A separate program that runs alongside the main server
  • Like having an assistant handle heavy lifting in another room
  • If the assistant has a problem, it doesn't affect your main office

OOM Killer (Out-of-Memory Killer)

  • Linux kernel feature that terminates processes when memory runs out
  • The operating system's emergency brake when things get out of control
  • Our worker process takes the hit instead of the main server

Checkpoint Persistence

  • Saving progress periodically so work isn't lost if interrupted
  • Like auto-save in a word processor, but for data processing
  • Enables resume from last checkpoint instead of starting over

IPC (Inter-Process Communication)

  • How separate programs talk to each other
  • Worker sends progress updates to main server via IPC
  • Enables real-time progress reporting without sharing memory

Garbage Collection (GC)

  • Automatic cleanup of memory that's no longer needed
  • The worker explicitly triggers GC between files to reclaim memory
  • --expose-gc flag allows manual garbage collection calls

Byte Offset

  • Exact position in a file measured in bytes
  • Enables precise resume: "start reading from byte 10,485,760"
  • More accurate than timestamp-based resume

Temporal Dead Zone (TDZ)

  • JavaScript error thrown when a let or const variable is used before its declaration has run
  • ESLint now catches these at build time instead of runtime
  • Prevents mysterious crashes in production

Heap Memory

  • The memory space where JavaScript stores objects
  • --max-old-space-size=1024 limits heap to 1GB
  • Worker monitors heap usage to stop before hitting limits

Claude Process

  • A running instance of Claude CLI handling a conversation
  • Multiple processes can run concurrently for different users
  • Health endpoint now tracks all active processes and their memory

Host Activity

  • Aggregated view of resource usage across infrastructure hosts
  • Shows which servers have memory pressure from concurrent AI jobs
  • Enables capacity planning decisions

Thanks for upgrading. This release makes Sasha dramatically more reliable under heavy workloads—crashes from memory exhaustion are now prevented by design, and you have full visibility into where resources are being consumed.