AI Costs: Complete Transparency

Understanding Your AI Investment with Sasha

Executive Summary

Zero Markup Pricing

Sasha operates on a direct cost pass-through model. You pay your AI provider (Anthropic or AWS Bedrock) directly at their published rates—no markup, no hidden fees. Your AI costs are completely transparent and under your control.

You provide your own API credentials → You pay your AI provider directly → You see every transaction

How Sasha's AI Pricing Works

API Token-Based Pricing

Unlike traditional software licensing, AI services charge per token—roughly 4 characters or 0.75 words. Every query and response consumes tokens, and you only pay for what you use.
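As a rough illustration of the 4-characters-per-token rule of thumb, the sketch below estimates a token count from text length. The function name `estimate_tokens` is hypothetical, and real tokenizers will produce different counts, so treat the result as a ballpark figure only.

```python
import math

# Rule of thumb from the text above: one token is roughly 4 characters.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate; an actual tokenizer will give different counts."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


query = "Summarize the key risks in our Q3 vendor contracts."
print(estimate_tokens(query))
```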

Why Token-Based Pricing?

Pay-per-use: No monthly minimums, no wasted capacity
Scalable: Costs grow proportionally with actual usage
Predictable: Clear pricing tables from AI providers
Optimizable: Multiple cost-saving strategies available

Two Provider Options

Sasha supports two ways to access AI capabilities, both with direct billing:

Option 1: Direct Anthropic API

Best for: Organizations wanting the simplest setup and the latest models

Setup: Provide your Anthropic API key in Sasha admin settings
Billing: Monthly invoice from Anthropic based on token usage
Monitoring: Full usage dashboard in Anthropic console
Security: Direct API connection, encrypted token storage
Access: Immediate access to newest Claude models

Option 2: AWS Bedrock

Best for: Organizations with existing AWS infrastructure and strict data residency requirements

Setup: Configure AWS credentials with Bedrock permissions
Billing: Included in your AWS monthly bill
Monitoring: CloudWatch metrics and AWS Cost Explorer
Security: Data processed within your AWS region, never leaves AWS
Compliance: Regional deployment options for data sovereignty

Actual 2025 AI Pricing

Pricing Improves Over Time

AI models get better and cheaper every year. Unlike traditional software with fixed pricing, you benefit from:

New models released regularly with better performance at lower cost
Price reductions as AI providers scale infrastructure (prices have dropped 70% since 2023)
Instant control to switch models in Sasha settings—no vendor lock-in
Your choice to upgrade when ready or stick with proven models

You're in control: When a better model launches, simply update your settings and start using it immediately at the new (typically lower) price.

Direct Anthropic Pricing (January 2025)

Model	Input Tokens	Output Tokens	Best For
Claude 3.5 Sonnet (Recommended)	$3.00 / 1M	$15.00 / 1M	Balanced performance and cost
Claude 3 Haiku (Economy)	$0.25 / 1M	$1.25 / 1M	High-volume queries, quick responses
Claude Opus 4.1 (Premium)	$15.00 / 1M	$75.00 / 1M	Complex reasoning, critical decisions

Cost-Saving Features:

Prompt Caching: Up to 90% savings on repeated context
Batch Processing: 50% discount for non-urgent queries
Long Context: 200K tokens standard, 1M available with premium pricing

AWS Bedrock Pricing (January 2025)

Model	Input Tokens	Output Tokens	Batch Discount
Claude 3.5 Sonnet	$3.00 / 1M	$15.00 / 1M	$1.50 / $7.50 (50% off)
Claude 3.5 Haiku	$1.00 / 1M	$5.00 / 1M	$0.50 / $2.50 (50% off)
Claude Opus	$15.00 / 1M	$75.00 / 1M	$7.50 / $37.50 (50% off)

Additional AWS Benefits:

Consolidated Billing: AI costs on same bill as your other AWS services
Volume Discounts: AWS Enterprise agreements may reduce costs further
Regional Options: Deploy in specific AWS regions for compliance
Provisioned Throughput: Dedicated capacity from $22-44/hour for guaranteed performance

Note: Prices vary slightly by AWS region. Check the AWS Bedrock pricing page for your region.

Usage Scenarios: Real Cost Examples

Light Usage: 100-500 Queries/Month

Who: Small teams, occasional research, project-based knowledge access

Example: 300 queries per month, average 2,000 input tokens + 1,000 output tokens per query

Using Claude 3.5 Sonnet:

Input: 300 × 2,000 = 600,000 tokens = 0.6M tokens × $3 = $1.80
Output: 300 × 1,000 = 300,000 tokens = 0.3M tokens × $15 = $4.50
Total Monthly Cost: $1.80 + $4.50 = $6.30
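The light-usage arithmetic above can be reproduced with a small helper. This is a minimal sketch: the function name `monthly_cost` and its parameters are illustrative assumptions, and the prices are the per-1M-token rates quoted in this document.

```python
def monthly_cost(queries, input_tokens, output_tokens, input_price, output_price):
    """Monthly cost in USD; prices are USD per 1M tokens."""
    input_millions = queries * input_tokens / 1_000_000
    output_millions = queries * output_tokens / 1_000_000
    return input_millions * input_price + output_millions * output_price


# Light scenario: 300 queries, 2,000 input + 1,000 output tokens each.
light = monthly_cost(300, 2_000, 1_000, input_price=3.00, output_price=15.00)
print(f"${light:.2f}")  # $6.30
```

Swapping in your own query volume and average token counts gives a first-pass estimate before any optimization is applied.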

Medium Usage: 1,000-5,000 Queries/Month

Who: Regular team usage, daily knowledge queries, document analysis

Example: 2,500 queries per month, average 3,000 input + 1,500 output tokens

Using Claude 3.5 Sonnet:

Input: 2,500 × 3,000 = 7.5M tokens × $3 = $22.50
Output: 2,500 × 1,500 = 3.75M tokens × $15 = $56.25
Total Monthly Cost: $22.50 + $56.25 = $78.75

With Prompt Caching (typical 50% cache hit rate):

Cached input: 3.75M × $0.30 = $1.13 (90% savings on cached tokens)
Fresh input: 3.75M × $3 = $11.25
Output: $56.25
Optimized Total: ~$68.63 (about 13% savings)
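One way to reason about caching is as a blended input price per 1M tokens. The sketch below assumes cached tokens are billed at 10% of the base input rate (the "up to 90% savings" figure) and ignores any surcharge a provider may apply when a cache entry is first written. The name `effective_input_price` and the `cache_read_multiplier` parameter are hypothetical.

```python
def effective_input_price(base_price, cache_hit_rate, cache_read_multiplier=0.10):
    """Blended USD price per 1M input tokens.

    cache_read_multiplier=0.10 models 90% savings on cached tokens.
    Cache-write surcharges are ignored (assumption).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached = cache_hit_rate * base_price * cache_read_multiplier
    fresh = (1.0 - cache_hit_rate) * base_price
    return cached + fresh


blended = effective_input_price(base_price=3.00, cache_hit_rate=0.5)
input_cost = 7.5 * blended  # 7.5M input tokens from the medium scenario
print(f"${blended:.2f} per 1M, ${input_cost:.2f} total input")
```

Because caching only affects input tokens, the overall saving depends on how much of the bill is input to begin with.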

Heavy Usage: 10,000+ Queries/Month

Who: Enterprise-wide deployment, multiple teams, continuous AI assistance

Example: 15,000 queries per month, average 4,000 input + 2,000 output tokens

Using Claude 3.5 Sonnet:

Input: 15,000 × 4,000 = 60M tokens × $3 = $180
Output: 15,000 × 2,000 = 30M tokens × $15 = $450
Total Monthly Cost: $180 + $450 = $630

With Prompt Caching + Batch Processing:

Cached input (50%): 30M × $0.30 = $9
Fresh input (50%): 30M × $3 = $90
Batch output (40% of queries): 12M × $7.50 = $90
Real-time output (60%): 18M × $15 = $270
Optimized Total: ~$459 (about 27% savings)
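The combined scenario can be sketched as a single estimator. Everything here is an illustrative assumption rather than Sasha's actual implementation: the `Scenario` class, the `optimized_cost` function, and the choice to apply the batch rate to output tokens only, which mirrors the line items above.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    input_millions: float     # total input tokens per month, in millions
    output_millions: float    # total output tokens per month, in millions
    input_price: float        # USD per 1M input tokens
    output_price: float       # USD per 1M output tokens
    cache_hit_rate: float = 0.0
    batch_share: float = 0.0  # fraction of output billed at the batch rate


def optimized_cost(s, cache_read_multiplier=0.10, batch_multiplier=0.50):
    """Monthly USD cost with caching on input and batching on output."""
    cached_in = s.input_millions * s.cache_hit_rate * s.input_price * cache_read_multiplier
    fresh_in = s.input_millions * (1 - s.cache_hit_rate) * s.input_price
    batch_out = s.output_millions * s.batch_share * s.output_price * batch_multiplier
    live_out = s.output_millions * (1 - s.batch_share) * s.output_price
    return cached_in + fresh_in + batch_out + live_out


heavy = Scenario(60, 30, 3.00, 15.00, cache_hit_rate=0.5, batch_share=0.4)
baseline = optimized_cost(Scenario(60, 30, 3.00, 15.00))
print(f"${optimized_cost(heavy):.2f} vs ${baseline:.2f}")
```

Varying `cache_hit_rate` and `batch_share` shows which lever matters more for a given workload. With output-heavy usage, batching tends to dominate.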

What Influences AI Costs?

Query Complexity

Simple lookups use ~1,000 tokens. Complex document analysis can use 50,000+ tokens. More context = higher input costs.

Response Length

Short answers cost pennies. Detailed reports with summaries and analysis cost more. For the models listed above, output tokens cost 5× as much as input tokens.

Knowledge Base Size

Larger context windows (more documents) = more input tokens. Strategic document organization reduces costs.

Model Selection

Haiku for speed/cost, Sonnet for balance, Opus for critical tasks. Choosing the right model for each use case optimizes spending.

Practical Cost Optimization Strategies

Strategy 1: Smart Model Selection

Use Haiku ($0.25/$1.25 per 1M) for:

Quick lookups and simple Q&A
Document search and retrieval
Status checks and brief summaries

Use Sonnet ($3/$15 per 1M) for:

Complex analysis and reasoning
Multi-document synthesis
Strategic decision support

Potential Savings: 60-80% on routine queries
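To see how routing affects the bill, the sketch below splits a month of queries between Haiku and Sonnet at the prices quoted in this document. The `routed_cost` function and the 70% routine share are hypothetical illustrations, not measured values, and the sketch assumes routine and complex queries use the same average token counts.

```python
PRICES = {  # USD per 1M tokens (input, output), as quoted in this document
    "haiku": (0.25, 1.25),
    "sonnet": (3.00, 15.00),
}


def routed_cost(queries, input_tokens, output_tokens, routine_share):
    """Monthly cost when `routine_share` of queries go to Haiku, the rest to Sonnet."""
    total = 0.0
    for model, share in (("haiku", routine_share), ("sonnet", 1 - routine_share)):
        in_price, out_price = PRICES[model]
        n = queries * share
        total += n * input_tokens / 1e6 * in_price
        total += n * output_tokens / 1e6 * out_price
    return total


all_sonnet = routed_cost(2_500, 3_000, 1_500, routine_share=0.0)
mixed = routed_cost(2_500, 3_000, 1_500, routine_share=0.7)
print(f"${all_sonnet:.2f} -> ${mixed:.2f}")
```

In practice routine queries are often shorter than complex ones, so a per-category token average would give a more realistic estimate.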

Strategy 2: Enable Prompt Caching

Sasha automatically caches:

Your knowledge base context
Frequently accessed documents
Common organizational information

Potential Savings: 50-90% on input tokens

Strategy 3: Batch Non-Urgent Work

For reports, summaries, and scheduled analysis:

Queue requests for batch processing
Get 50% discount on all tokens
Results delivered within hours instead of seconds

Potential Savings: 50% on background processing

Strategy 4: Optimize Knowledge Base

Focus on most-accessed documents
Remove redundant information
Structure documents for efficient retrieval
Use summaries instead of full documents where appropriate

Potential Savings: 20-40% on input tokens

Security & Compliance Impact on Costs

No Additional Security Costs

Unlike other AI solutions, Sasha's security features do not increase AI costs:

Encryption: Your API tokens are encrypted at rest (AES-256-GCM) - no AI cost impact
Private Deployment: AWS Bedrock keeps data in your region - same per-token pricing
Access Controls: Role-based permissions managed locally - zero AI cost
Audit Logging: All tracking happens in Sasha - no AI provider charges

The only AI costs you pay are for actual token consumption—security is free.

Getting Started

Ready to See Your Costs?

30-Day Trial Estimates:

Configure your API credentials
Use Sasha normally for 30 days
Review actual token usage in your provider console
Make informed decisions based on real data

Most organizations discover:

AI costs are 70-80% lower than expected
Optimization features reduce costs by 40-60%
Direct billing eliminates vendor markup concerns
ROI is positive within first month of deployment

support@context-is-everything.com | Schedule a cost analysis call

Sasha AI Knowledge Management - Complete transparency, zero markup, full control