# 4. Data Model

## Overview

Sasha Studio uses **SQLite** via `better-sqlite3` (synchronous API). Single database file at `DB_PATH` (default: `/app/data/sasha.db` in Docker, `../data/sasha.db` locally).

**Schema source:** `claudecodeui/server/database/init.sql` + migrations in `db.js`.

**Database settings:**
- `PRAGMA foreign_keys = ON`
- `PRAGMA busy_timeout = 10000` (for NFS/EFS compatibility)
- `journal_mode = DELETE` (not WAL -- safer on network filesystems)
- `RESET_DATABASE=true` env var triggers full schema rebuild

## Entity Relationship Diagram

```
users (1) ──────── (N) company_profiles
  │                       │
  │                       ├── (N) onboarding_documents
  │                       └── (N) research_documents
  │
  ├── (N) sessions ──── (N) raw_exchanges ──── (N) usage_events
  │         │                    │
  │         │                    ├── (N) response_quality_signals
  │         │                    └── (N) insights
  │         │
  │         └── (N) instruction_failures
  │
  ├── (N) api_keys ──── (N) api_key_usage
  │
  ├── (N) schedules ──── (N) execution_history
  │
  ├── (N) executions (unified log)
  │
  ├── (N) hook_events
  │
  └── (N) feedback_events

cloud_providers (1) ── (N) cloud_connections (1) ── (1) cloud_credentials
                                │
                                └── (N) cloud_mounts ── (N) cloud_mount_events

bedrock_config (single row)
bedrock_model_patterns (cache)
bedrock_context_audit (log)

system_config (key-value)
named_secrets (encrypted)
prompt_versions (versioned prompts)
stats_refresh_log (reporting)
hook_ingest_state (checkpoint)
```

## Tables

### Core: users

| Column | Type | Constraints | Description |
|--------|------|------------|-------------|
| id | INTEGER | PK AUTOINCREMENT | |
| username | TEXT | UNIQUE NOT NULL | Login identifier |
| password_hash | TEXT | NOT NULL | bcrypt hash |
| created_at | DATETIME | DEFAULT CURRENT_TIMESTAMP | |
| last_login | DATETIME | | Updated on each login |
| is_active | BOOLEAN | DEFAULT 1 | Account enabled |
| is_admin | BOOLEAN | DEFAULT 0 | Admin privileges |

**Indexes:** `idx_users_username`, `idx_users_active`

**Notes:** First registered user automatically gets `is_admin = 1`. No self-registration after first user.

### Core: company_profiles

| Column | Type | Constraints | Description |
|--------|------|------------|-------------|
| id | INTEGER | PK AUTOINCREMENT | |
| user_id | INTEGER | FK users(id) NOT NULL | Owner |
| project_id | VARCHAR(36) | UNIQUE | UUID for multi-project support |
| workspace_path | TEXT | | Filesystem path for project |
| search_term | TEXT | | Company name or URL used for research |
| search_type | TEXT | | How company was identified |
| company_name | TEXT | | |
| company_url | TEXT | | |
| industry | TEXT | | |
| company_size | TEXT | | |
| founded_year | INTEGER | | |
| headquarters | TEXT | | |
| description | TEXT | | |
| key_people | JSON | | |
| key_customers | JSON | | |
| key_partners | JSON | | |
| competitors | JSON | | |
| onboarding_completed | BOOLEAN | DEFAULT 0 | |
| onboarding_method | TEXT | DEFAULT 'documents' | |
| research_status | TEXT | DEFAULT 'pending' | pending → researching → completed |
| created_at | DATETIME | DEFAULT CURRENT_TIMESTAMP | |
| updated_at | DATETIME | DEFAULT CURRENT_TIMESTAMP | |

### Core: system_config

Key-value store for application settings.

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | PK |
| config_key | TEXT | UNIQUE NOT NULL |
| config_value | TEXT | JSON or plain text |
| updated_at | DATETIME | |
| updated_by | INTEGER | FK users(id) |

**Known keys:** AI provider settings, upload limits, general settings.

### Auth: password_resets

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | PK |
| user_id | INTEGER | FK users(id) |
| token_hash | TEXT | SHA-256 hash of reset token |
| expires_at | DATETIME | Token expiration |
| used_at | DATETIME | NULL until used (single-use) |
| created_at | DATETIME | |
| created_ip | TEXT | IP that requested reset |
| user_agent | TEXT | Browser user agent |

### Auth: api_keys

| Column | Type | Description |
|--------|------|-------------|
| id | TEXT | PK, format: `sk_...` |
| key_hash | TEXT | bcrypt hash of full key |
| signing_secret | TEXT | HMAC secret for callback verification |
| name | TEXT | Human-readable label |
| user_id | INTEGER | FK users(id), owner |
| callback_url | TEXT | Default webhook callback URL |
| created_at | TEXT | |
| last_used_at | TEXT | Updated on each use |
| is_active | INTEGER | DEFAULT 1 |

### Auth: api_key_usage

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | PK |
| api_key_id | TEXT | FK api_keys(id) |
| meeting_id | TEXT | |
| event_type | TEXT | start, stop, callback_delivery, callback_failure |
| timestamp | TEXT | |
| metadata | TEXT | JSON |

### Activity: sessions

| Column | Type | Description |
|--------|------|-------------|
| id | TEXT | PK (UUID) |
| started_at | TEXT | ISO timestamp |
| ended_at | TEXT | |
| updated_at | TEXT | |
| user_id | INTEGER | FK users(id) |
| project_id | TEXT | |
| model | TEXT | Claude model used |
| system_prompt | TEXT | System prompt snapshot |
| claude_md_snapshot | TEXT | CLAUDE.md content at session start |
| meta_json | TEXT | Additional session metadata |
| output_style_name | TEXT | Applied output style |
| output_style_version_id | INTEGER | |

### Activity: raw_exchanges

| Column | Type | Description |
|--------|------|-------------|
| id | TEXT | PK (UUID) |
| ts | TEXT | ISO timestamp |
| session_id | TEXT | FK sessions(id) |
| user_id | INTEGER | FK users(id) |
| role | TEXT | 'user' or 'assistant' |
| content | TEXT | Message content |
| tool_names | TEXT | Comma-separated tool names used |
| is_correction | BOOLEAN | User corrected previous response |
| is_error | BOOLEAN | Response was an error |
| bash_exit_code | INTEGER | Exit code if bash command |
| files_touched | TEXT | Files modified in this exchange |
| tags | TEXT | |
| entities | TEXT | |

### Activity: usage_events

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | PK |
| user_id | INTEGER | FK users(id) |
| session_id | TEXT | FK sessions(id) |
| exchange_id | TEXT | FK raw_exchanges(id) |
| provider | TEXT | 'anthropic' or 'bedrock' |
| model_id | TEXT | Specific model identifier |
| input_tokens | INTEGER | |
| output_tokens | INTEGER | |
| thinking_tokens | INTEGER | Extended thinking tokens |
| cache_read_tokens | INTEGER | Prompt cache reads |
| cache_write_tokens | INTEGER | Prompt cache writes |
| cost_usd | REAL | Calculated cost |
| latency_ms | INTEGER | Response latency |
| created_at | TEXT | |

### Quality: response_quality_signals

Detailed per-response quality analysis.

| Column | Type | Description |
|--------|------|-------------|
| exchange_id | TEXT | FK raw_exchanges(id), UNIQUE |
| session_id | TEXT | FK sessions(id) |
| prompt_chars / response_chars | INTEGER | Character counts |
| prompt_tokens / response_tokens | INTEGER | Token counts |
| hedging_score | INTEGER | 0-100, how much hedging language |
| certainty_score | INTEGER | 0-100, confidence level |
| asks_clarification | INTEGER | 0/1 |
| truncation_detected | INTEGER | 0/1 |
| code_blocks_count | INTEGER | |
| hallucination_risk_score | INTEGER | 0-100 |
| response_primary_type | TEXT | Classification |
| quality_score | INTEGER | Overall 0-100 |
| output_style_name | TEXT | |
| user_prompt_version_id | INTEGER | FK prompt_versions |

### Quality: feedback_events

| Column | Type | Description |
|--------|------|-------------|
| id | TEXT | PK |
| response_id | TEXT | |
| session_id | TEXT | |
| feedback | TEXT | CHECK: 'up', 'down', 'neutral' |
| reason | TEXT | Free-text explanation |
| model_id | TEXT | |
| user_id | INTEGER | |
| created_at | TEXT | |

**Constraint:** `UNIQUE(response_id, user_id)` -- one feedback per user per response.

### Scheduling: schedules

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | PK |
| name | TEXT | UNIQUE, human-readable |
| prompt_name | TEXT | Skill/prompt to execute |
| cron_expression | TEXT | NULL = on-demand only |
| timezone | TEXT | DEFAULT 'UTC' |
| enabled | INTEGER | DEFAULT 1 |
| timeout_ms | INTEGER | DEFAULT 300000 (5 min) |
| last_run_at | TEXT | |
| last_status | TEXT | |
| created_by | INTEGER | FK users(id) |

### Scheduling: execution_history

| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | PK |
| execution_id | TEXT | UNIQUE, deterministic hash |
| schedule_id | INTEGER | FK schedules(id) |
| schedule_name | TEXT | Denormalized for display |
| status | TEXT | running, completed, failed, timeout, stale |
| trigger_type | TEXT | scheduled, manual, catchup |
| scheduled_for | TEXT | Time slot for this execution |
| started_at | TEXT | |
| completed_at | TEXT | |
| duration_ms | INTEGER | |
| output_file | TEXT | Path to output |
| result_json | TEXT | Structured result |
| input_tokens / output_tokens | INTEGER | |

**Idempotency:** `UNIQUE(schedule_id, scheduled_for)` prevents duplicate executions.

### Scheduling: executions (unified log)

Unified execution log for all execution types (schedule, meeting, API task).

| Column | Type | Description |
|--------|------|-------------|
| execution_type | TEXT | schedule, meeting, api_task |
| name | TEXT | Task/meeting name |
| status | TEXT | running, completed, failed, timeout, stale, cancelled |
| trigger_type | TEXT | scheduled, manual, catchup, api |
| api_key_id | TEXT | FK api_keys(id), for API-triggered |
| context_json | TEXT | Type-specific context |
| idempotency_key | TEXT | UNIQUE WHERE NOT NULL |

### Cloud: cloud_providers

| Column | Type | Description |
|--------|------|-------------|
| type | TEXT | CHECK: 'onedrive', 'sharepoint', 'gdrive' |
| display_name | TEXT | |
| client_id | TEXT | OAuth client ID |
| scopes | TEXT | OAuth scopes |

### Cloud: cloud_connections

| Column | Type | Description |
|--------|------|-------------|
| provider_id | INTEGER | FK cloud_providers(id) |
| status | TEXT | CHECK: 'pending', 'authorized', 'revoked' |
| scope_level | TEXT | CHECK: 'read', 'read_write' |
| integration_mode | TEXT | CHECK: 'mount', 'api' |
| created_by | INTEGER | FK users(id) |

### Cloud: cloud_credentials

| Column | Type | Description |
|--------|------|-------------|
| connection_id | INTEGER | PK, FK cloud_connections(id) |
| refresh_token_encrypted | TEXT | AES-256-GCM encrypted |
| access_token_encrypted | TEXT | AES-256-GCM encrypted |
| access_token_expires_at | DATETIME | |
| tenant_id | TEXT | Microsoft tenant ID |
| account_email | TEXT | |

### Cloud: cloud_mounts

| Column | Type | Description |
|--------|------|-------------|
| connection_id | INTEGER | FK cloud_connections(id) |
| mount_path | TEXT | Local filesystem mount point |
| remote_name | TEXT | rclone remote name |
| access_mode | TEXT | CHECK: 'read', 'read_write' |
| vfs_profile | TEXT | DEFAULT 'balanced' |
| health_status | TEXT | unmounted, mounting, mounted, degraded, remounting |

### AI Config: bedrock_config

Single-row table (enforced by `CHECK (id = 1)`).

| Column | Type | Description |
|--------|------|-------------|
| api_key_encrypted | TEXT | AES-256-GCM encrypted AWS key |
| region | TEXT | DEFAULT 'us-east-1' |
| active_model | TEXT | Current model ID |
| enable_1m_context | BOOLEAN | Extended context toggle |

### Versioning: prompt_versions

| Column | Type | Description |
|--------|------|-------------|
| scope | TEXT | CHECK: 'user', 'project', 'specialist', 'output_style' |
| scope_id | TEXT | Context-dependent identifier |
| file_path | TEXT | File being versioned |
| version | INTEGER | Incrementing version number |
| content_hash | TEXT | SHA-256 of content |
| content | TEXT | Full content snapshot |
| is_active | INTEGER | Current active version |

**Constraint:** `UNIQUE(scope, scope_id, file_path, version)`

### Hook Analytics

Six tables track Claude Code hook events for analytics:

- **hook_events** -- Raw event records (event_id PK, event type, session, tool)
- **hook_tool_execs** -- Tool execution details (duration, command, success)
- **hook_sessions** -- Session metadata (start/end, source, event count)
- **hook_subagents** -- Subagent tracking (agent type, transcript path)
- **hook_file_accesses** -- File access audit (op, path, size)
- **hook_usage_rollups** -- Pre-aggregated stats (bucket: day/hour/session/tool)
- **hook_command_rollups** -- Command usage patterns (chain analysis)
- **hook_ingest_state** -- Checkpoint for incremental JSONL processing

### Secrets: named_secrets

| Column | Type | Description |
|--------|------|-------------|
| name | TEXT | Display name |
| normalized_name | TEXT | UNIQUE, lowercased/trimmed |
| value_encrypted | TEXT | AES-256-GCM encrypted |
| metadata | JSON | Usage context, tags |
| last_accessed_at | DATETIME | |

### Monitoring: stats_refresh_log

Tracks analytics refresh operations.

| Column | Type | Description |
|--------|------|-------------|
| triggered_by | TEXT | ui_button, sasha_internal, api_external, scheduled |
| status | TEXT | running, completed, failed |
| hook_events_processed | INTEGER | |
| incremental | BOOLEAN | Incremental vs full refresh |

## Migration System

Migrations are applied in `db.js` at startup. They use `ALTER TABLE` and `CREATE TABLE IF NOT EXISTS` patterns. No down-migration support.

Key migrations:
1. Add `is_admin` column to users
2. Create `usage_events` table (replaces deprecated `bedrock_usage`)
3. Create `feedback_events` table
4. Expand `prompt_versions` scope to include 'output_style'
5. Add output style tracking columns and indexes
6. Make `schedules.cron_expression` nullable (on-demand tasks)
7. Add extended context columns for Bedrock

## Sensitive Fields

| Table | Field | Protection |
|-------|-------|-----------|
| users | password_hash | bcrypt |
| bedrock_config | api_key_encrypted | AES-256-GCM |
| cloud_credentials | refresh_token_encrypted | AES-256-GCM |
| cloud_credentials | access_token_encrypted | AES-256-GCM |
| named_secrets | value_encrypted | AES-256-GCM |
| api_keys | key_hash | bcrypt |
| api_keys | signing_secret | Plaintext (server-side only) |
| password_resets | token_hash | SHA-256 |
