4 Models Running

System Overview

System Health: Optimal (all services running normally)
Active Models: 4 (llama3:8b, mistral:7b, codellama:13b, phi3:mini)
Response Time: 1.2s average (last hour)
Memory Usage: 68% (43.5 GB / 64 GB total)
GPU Utilization: 84% (NVIDIA RTX 4090, 20.2 GB / 24 GB)
Cost Savings: 87% vs. cloud API costs (this month)
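The Active Models count above can also be queried programmatically. A minimal sketch, assuming the dashboard fronts a stock Ollama server on its default port 11434 (the base URL is an assumption; the logs below show the Sasha Studio API itself on port 80):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default Ollama port; the dashboard backend may proxy this

def list_running_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Return names of currently loaded models via Ollama's /api/ps endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        data = json.load(resp)
    return [m["name"] for m in data.get("models", [])]

def summarize(names: list[str]) -> str:
    """Render an 'Active Models' line like the one on the dashboard."""
    return f"{len(names)} models running: {', '.join(names)}"

if __name__ == "__main__":
    print(summarize(list_running_models()))
```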

Recent Alerts

⚠️ High GPU Memory Usage: GPU memory usage at 84% - consider unloading unused models (5 min ago)
ℹ️ Model Update Available: new version of llama3:8b available with improved performance (2 hours ago)

Model Management

Llama 3 8B (Running)
General purpose model - fast and balanced
Parameters: 8.0B | Size: 4.7 GB | Context: 8,192 tokens | Speed: Fast

Mistral 7B (Running)
Long context specialist - great for analysis
Parameters: 7.3B | Size: 4.1 GB | Context: 32,768 tokens | Speed: Medium

CodeLlama 13B (Running)
Code generation and debugging specialist
Parameters: 13.0B | Size: 7.3 GB | Context: 16,384 tokens | Speed: Medium

Llama 3 70B (Stopped)
High-quality responses - requires GPU
Parameters: 70.6B | Size: 39.9 GB | Context: 8,192 tokens | Speed: Slow

Phi-3 Mini (Running)
Ultra-fast responses for simple tasks
Parameters: 3.8B | Size: 2.2 GB | Context: 4,096 tokens | Speed: Very Fast
Add New Model
Download from Ollama library or import custom model

Resource Monitor

System Resources Over Time
[Chart: real-time resource monitoring]

Current Usage

CPU Usage: 23%
Memory Usage: 43.5 GB / 64 GB
GPU Memory: 20.2 GB / 24 GB
GPU Utilization: 84%
Disk Usage: 156 GB / 500 GB
Network I/O: 12.3 MB/s
Temperature: 67°C
Active Requests: 3
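The GPU figures in this panel (memory, utilization, temperature) can be sampled with nvidia-smi's query mode. A sketch; how the dashboard actually collects these numbers is an assumption:

```python
import subprocess

def parse_smi_line(line: str) -> dict:
    """Parse one CSV line of nvidia-smi query output:
    used MiB, total MiB, utilization %, temperature C."""
    used, total, util, temp = (int(x) for x in line.split(", "))
    return {"mem_used_mb": used, "mem_total_mb": total,
            "util_pct": util, "temp_c": temp}

def query_gpu() -> dict:
    """Sample the first GPU via nvidia-smi's machine-readable query interface."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total,utilization.gpu,temperature.gpu",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return parse_smi_line(out.strip().splitlines()[0])

if __name__ == "__main__":
    print(query_gpu())
```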

Configuration

Model Settings

Start essential models on boot

Resource Limits

80%
90%

Security & Access

Require API key for access
Log all API requests

Monitoring & Alerts

Send email alerts

Alerts & System Logs

Active Alerts

⚠️ High GPU Memory Usage: GPU memory at 84% (20.2 GB / 24 GB) - consider stopping unused models (5 minutes ago)
ℹ️ Model Update Available: llama3:8b v2.1 available with 15% performance improvement (2 hours ago)
ℹ️ Scheduled Maintenance: system maintenance window scheduled for tonight, 2:00 AM - 4:00 AM (1 day ago)

System Logs

2024-01-05 14:32:15 [INFO] Ollama service started successfully
2024-01-05 14:32:16 [INFO] Model llama3:8b loaded (4.7 GB)
2024-01-05 14:32:18 [INFO] Model mistral:7b loaded (4.1 GB)
2024-01-05 14:32:20 [INFO] Model codellama:13b loaded (7.3 GB)
2024-01-05 14:32:22 [INFO] Model phi3:mini loaded (2.2 GB)
2024-01-05 14:32:25 [INFO] Sasha Studio API server listening on port 80
2024-01-05 14:35:12 [INFO] Chat request processed (llama3:8b, 1.2s response time)
2024-01-05 14:36:45 [INFO] Chat request processed (mistral:7b, 0.8s response time)
2024-01-05 14:38:23 [WARN] GPU memory usage: 84% (20.2 GB / 24 GB)
2024-01-05 14:39:15 [INFO] Chat request processed (codellama:13b, 1.5s response time)
2024-01-05 14:41:08 [INFO] System metrics collected - all services healthy
2024-01-05 14:42:30 [INFO] Model update check completed - 1 update available
2024-01-05 14:43:15 [INFO] Chat request processed (phi3:mini, 0.3s response time)
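Per-model response times like the 1.2s figure on the overview can be recovered from these log lines. A sketch that parses the "Chat request processed" entries above:

```python
import re

# Sample entries copied from the system log above
LOG = """\
2024-01-05 14:35:12 [INFO] Chat request processed (llama3:8b, 1.2s response time)
2024-01-05 14:36:45 [INFO] Chat request processed (mistral:7b, 0.8s response time)
2024-01-05 14:39:15 [INFO] Chat request processed (codellama:13b, 1.5s response time)
2024-01-05 14:43:15 [INFO] Chat request processed (phi3:mini, 0.3s response time)
"""

# Matches: Chat request processed (model, N.Ns response time)
PATTERN = re.compile(r"Chat request processed \(([\w:.-]+), ([\d.]+)s response time\)")

def response_times(log: str) -> dict[str, float]:
    """Map each model name to its most recent response time in seconds."""
    times: dict[str, float] = {}
    for line in log.splitlines():
        m = PATTERN.search(line)
        if m:
            times[m.group(1)] = float(m.group(2))
    return times

def average_time(log: str) -> float:
    """Average response time across all parsed chat requests."""
    times = response_times(log)
    return sum(times.values()) / len(times)
```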