AI Costs: Complete Transparency
Understanding Your AI Investment with Sasha
Executive Summary
Zero Markup Pricing
Sasha operates on a direct cost pass-through model. You pay your AI provider (Anthropic or AWS Bedrock) directly at their published rates—no markup, no hidden fees. Your AI costs are completely transparent and under your control.
You provide your own API credentials → You pay your AI provider directly → You see every transaction
How Sasha's AI Pricing Works
API Token-Based Pricing
Unlike traditional software licensing, AI services charge per token—roughly 4 characters or 0.75 words. Every query and response consumes tokens, and you only pay for what you use.
Why Token-Based Pricing?
- Pay-per-use: No monthly minimums, no wasted capacity
- Scalable: Costs grow proportionally with actual usage
- Predictable: Clear pricing tables from AI providers
- Optimizable: Multiple cost-saving strategies available
Two Provider Options
Sasha supports two ways to access AI capabilities, both with direct billing:
Option 1: Direct Anthropic API
Best for: Organizations wanting simplest setup and latest models
- Setup: Provide your Anthropic API key in Sasha admin settings
- Billing: Monthly invoice from Anthropic based on token usage
- Monitoring: Full usage dashboard in Anthropic console
- Security: Direct API connection, encrypted token storage
- Access: Immediate access to newest Claude models
Option 2: AWS Bedrock
Best for: Organizations with existing AWS infrastructure and strict data residency requirements
- Setup: Configure AWS credentials with Bedrock permissions
- Billing: Included in your AWS monthly bill
- Monitoring: CloudWatch metrics and AWS Cost Explorer
- Security: Data processed within your AWS region, never leaves AWS
- Compliance: Regional deployment options for data sovereignty
Actual 2025 AI Pricing
Pricing Improves Over Time
AI models get better and cheaper every year. Unlike traditional software with fixed pricing, you benefit from:
- New models released regularly with better performance at lower cost
- Price reductions as AI providers scale infrastructure (prices have dropped 70% since 2023)
- Instant control to switch models in Sasha settings—no vendor lock-in
- Your choice to upgrade when ready or stick with proven models
You're in control: When a better model launches, simply update your settings and start using it immediately at the new (typically lower) price.
Direct Anthropic Pricing (January 2025)
| Model | Input Tokens | Output Tokens | Best For |
|---|---|---|---|
| Claude 3.5 Sonnet (Recommended) | $3.00 / 1M | $15.00 / 1M | Balanced performance and cost |
| Claude 3.5 Haiku (Economy) | $0.25 / 1M | $1.25 / 1M | High-volume queries, quick responses |
| Claude Opus 4.1 (Premium) | $15.00 / 1M | $75.00 / 1M | Complex reasoning, critical decisions |
Cost-Saving Features:
- Prompt Caching: Up to 90% savings on repeated context
- Batch Processing: 50% discount for non-urgent queries
- Long Context: 200K tokens standard, 1M available with premium pricing
AWS Bedrock Pricing (January 2025)
| Model | Input Tokens | Output Tokens | Batch Discount |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00 / 1M | $15.00 / 1M | $1.50 / $7.50 (50% off) |
| Claude 3.5 Haiku | $1.00 / 1M | $5.00 / 1M | $0.50 / $2.50 (50% off) |
| Claude Opus | $15.00 / 1M | $75.00 / 1M | $7.50 / $37.50 (50% off) |
Additional AWS Benefits:
- Consolidated Billing: AI costs on same bill as your other AWS services
- Volume Discounts: AWS Enterprise agreements may reduce costs further
- Regional Options: Deploy in specific AWS regions for compliance
- Provisioned Throughput: Dedicated capacity from $22-44/hour for guaranteed performance
Note: Prices vary slightly by AWS region. Check AWS Bedrock pricing page for your region.
Usage Scenarios: Real Cost Examples
Light Usage: 100-500 Queries/Month
Who: Small teams, occasional research, project-based knowledge access
Example: 300 queries per month, average 2,000 input tokens + 1,000 output tokens per query
Using Claude 3.5 Sonnet:
- Input: 300 × 2,000 = 600,000 tokens = 0.6M tokens × $3 = $1.80
- Output: 300 × 1,000 = 300,000 tokens = 0.3M tokens × $15 = $4.50
- Total Monthly Cost: ~$6-8
Medium Usage: 1,000-5,000 Queries/Month
Who: Regular team usage, daily knowledge queries, document analysis
Example: 2,500 queries per month, average 3,000 input + 1,500 output tokens
Using Claude 3.5 Sonnet:
- Input: 2,500 × 3,000 = 7.5M tokens × $3 = $22.50
- Output: 2,500 × 1,500 = 3.75M tokens × $15 = $56.25
- Total Monthly Cost: ~$75-100
With Prompt Caching (typical 50% cache hit rate):
- Cached input: 3.75M × $0.30 = $11.25 (90% savings)
- Fresh input: 3.75M × $3 = $11.25
- Output: $56.25
- Optimized Total: ~$75-80 (30% savings)
Heavy Usage: 10,000+ Queries/Month
Who: Enterprise-wide deployment, multiple teams, continuous AI assistance
Example: 15,000 queries per month, average 4,000 input + 2,000 output tokens
Using Claude 3.5 Sonnet:
- Input: 15,000 × 4,000 = 60M tokens × $3 = $180
- Output: 15,000 × 2,000 = 30M tokens × $15 = $450
- Total Monthly Cost: ~$600-650
With Prompt Caching + Batch Processing:
- Cached input (50%): 30M × $0.30 = $9
- Fresh input (50%): 30M × $3 = $90
- Batch output (40% of queries): 12M × $7.50 = $90
- Real-time output (60%): 18M × $15 = $270
- Optimized Total: ~$450-475 (30-35% savings)
What Influences AI Costs?
Query Complexity
Simple lookups use ~1,000 tokens. Complex document analysis can use 50,000+ tokens. More context = higher input costs.
Response Length
Short answers cost pennies. Detailed reports with summaries and analysis cost more. Output tokens are 5× more expensive than input.
Knowledge Base Size
Larger context windows (more documents) = more input tokens. Strategic document organization reduces costs.
Model Selection
Haiku for speed/cost, Sonnet for balance, Opus for critical tasks. Choosing the right model for each use case optimizes spending.
Practical Cost Optimization Strategies
Strategy 1: Smart Model Selection
Use Haiku ($0.25/$1.25 per 1M) for:
- Quick lookups and simple Q&A
- Document search and retrieval
- Status checks and brief summaries
Use Sonnet ($3/$15 per 1M) for:
- Complex analysis and reasoning
- Multi-document synthesis
- Strategic decision support
Potential Savings: 60-80% on routine queries
Strategy 2: Enable Prompt Caching
Sasha automatically caches:
- Your knowledge base context
- Frequently accessed documents
- Common organizational information
Potential Savings: 50-90% on input tokens
Strategy 3: Batch Non-Urgent Work
For reports, summaries, and scheduled analysis:
- Queue requests for batch processing
- Get 50% discount on all tokens
- Results delivered within hours instead of seconds
Potential Savings: 50% on background processing
Strategy 4: Optimize Knowledge Base
- Focus on most-accessed documents
- Remove redundant information
- Structure documents for efficient retrieval
- Use summaries instead of full documents where appropriate
Potential Savings: 20-40% on input tokens
Security & Compliance Impact on Costs
No Additional Security Costs
Unlike other AI solutions, Sasha's security features do not increase AI costs:
Encryption: Your API tokens are encrypted at rest (AES-256-GCM) - no AI cost impact
Private Deployment: AWS Bedrock keeps data in your region - same per-token pricing
Access Controls: Role-based permissions managed locally - zero AI cost
Audit Logging: All tracking happens in Sasha - no AI provider charges
The only AI costs you pay are for actual token consumption—security is free.
📞 Getting Started
Ready to See Your Costs?
30-Day Trial Estimates:
- Configure your API credentials
- Use Sasha normally for 30 days
- Review actual token usage in your provider console
- Make informed decisions based on real data
Most organizations discover:
- AI costs are 70-80% lower than expected
- Optimization features reduce costs by 40-60%
- Direct billing eliminates vendor markup concerns
- ROI is positive within first month of deployment