Configuration
Offloads the KV cache to CPU RAM to reduce VRAM usage; this typically slows inference by 20-50%.
1 (recommended: match the number of concurrent users)
Number of users processed simultaneously. A higher batch size improves throughput but requires more VRAM.
Splits large prompts into chunks during prefill to avoid VRAM spikes. Essential for high batch sizes with large contexts.
Maximum tokens processed per chunk when chunked prefill is enabled. A lower limit reduces the VRAM spike at the cost of slightly slower prefill.
Estimated single-stream throughput in tokens per second (TPS) for this GPU configuration. Suggested: 300-600 for an RTX 6000 Blackwell.
60%
Time to recover the hardware cost through rental income. A 36-month payback period is standard.
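The chunked-prefill settings above interact in a simple way: the total tokens queued for prefill get split into fixed-size chunks. A minimal sketch of that relationship (function and parameter names are illustrative, not taken from the calculator):

```python
import math

def prefill_chunks(prompt_tokens: int, batch_size: int, chunk_tokens: int) -> int:
    """Number of prefill passes when large prompts are split into chunks."""
    total = prompt_tokens * batch_size  # tokens queued for prefill across all users
    return math.ceil(total / chunk_tokens)

# e.g. 8 concurrent users with 32k-token prompts, 4096-token chunks
print(prefill_chunks(32_768, 8, 4_096))  # -> 64
```

Each chunk caps the activation memory allocated at once, which is why a smaller chunk size trades prefill speed for a lower VRAM spike.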
Results
Hardware Cost: €0 (one-time purchase)
Monthly OpEx: €0 (electricity + maintenance)
Monthly Rental: €0 (36-month payback + 15% profit margin)
VRAM Requirements
Model Weights: 0 GB
KV Cache (Generation): 0 GB
Activations (Prefill): 0 GB
Prefill Spike (Naive): 0 GB
Total Required: 0 GB
Per-GPU VRAM: 0 GB
Chunked Prefill Savings: 0 GB
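The KV-cache figure above can be estimated from the model architecture with a standard back-of-envelope formula (assumed here, not extracted from the calculator; the example configuration is illustrative):

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, bytes_per_elem: int = 2) -> float:
    """KV cache size: 2 (K and V) x layers x KV heads x head dim x tokens x dtype bytes."""
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem
    return total_bytes / 1024**3

# e.g. a 70B-class config (80 layers, 8 KV heads via GQA, head_dim 128) at FP16,
# 32k context, batch size 8
print(kv_cache_gb(80, 8, 128, 32_768, 8))  # -> 80.0
```

Grouped-query attention (the small KV-head count) is what keeps this number manageable; with full multi-head attention the cache would be several times larger.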
Prefill Analysis
Total Tokens to Prefill: 0
Prefill Chunks Needed: 0
Est. Prefill Time (Chunked): 0
Est. Prefill Time (Naive): 0
Prefill VRAM Spike Risk: Low
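A plausible way the prefill-time estimates above are derived: divide the queued tokens by a prefill throughput figure. This is a sketch; the 10,000 tokens/s rate is an assumed placeholder, not a number from the calculator:

```python
def prefill_time_s(total_tokens: int, prefill_tps: float) -> float:
    """Seconds to prefill all queued tokens at a given prefill throughput."""
    return total_tokens / prefill_tps

# e.g. 262,144 queued tokens at an assumed 10,000 tokens/s prefill rate
print(round(prefill_time_s(262_144, 10_000), 1))  # -> 26.2
```

Chunked and naive prefill process the same token count, so their times differ mainly through per-chunk scheduling overhead versus the VRAM spike the naive path risks.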
Hardware Requirements
GPUs Needed: 0
Nodes Required: 0
System RAM: 0 GB
Storage: 0 GB
Total Power: 0 kW
Performance Metrics
Total VRAM Capacity: 0 GB
VRAM Utilization: 0%
Speed Impact: 0%
KV Cache Offloaded: 0 GB
Max Batch Size: 0
Throughput Analysis
Base TPS (Single Stream): 0
Batch Size: 0
Total System Throughput: 0
Per-User Speed: 0
PCIe Scaling Factor: 0
Effective Throughput: 0
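The throughput figures above compose roughly as follows. This sketch models batching efficiency and the PCIe scaling factor as simple multipliers, which is an assumption about the calculator's method, not a confirmed formula:

```python
def effective_tps(base_tps: float, batch_size: int,
                  batch_efficiency: float, pcie_scaling: float) -> float:
    """Total system throughput: single-stream TPS scaled by batch size,
    discounted for imperfect batching and PCIe-limited KV-cache offload."""
    return base_tps * batch_size * batch_efficiency * pcie_scaling

def per_user_tps(total_tps: float, batch_size: int) -> float:
    """Speed each concurrent user actually sees."""
    return total_tps / batch_size

# e.g. 400 TPS single-stream, batch 8, 75% batching efficiency,
# 50% PCIe penalty from heavy KV-cache offload (all values illustrative)
total = effective_tps(400, 8, 0.75, 0.5)
print(total, per_user_tps(total, 8))  # -> 1200.0 150.0
```

The takeaway the calculator illustrates: batching multiplies system throughput, but per-user speed drops below the single-stream figure once the discounts apply.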
PagedAttention Comparison
Without PagedAttention: 0 GPUs
With PagedAttention: 0 GPUs
GPUs Saved: 0
Cost Savings: €0
Performance Trade-off: None
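The GPU-count comparison above presumably follows from the KV-cache over-allocation that PagedAttention eliminates (the vLLM authors report that naive contiguous allocation can waste 60-80% of KV memory). A hedged sketch of that comparison; the waste factor and VRAM figures are illustrative:

```python
import math

def gpus_needed(weights_gb: float, kv_gb: float, gpu_vram_gb: float,
                kv_waste: float = 0.0) -> int:
    """GPUs required when the KV cache is over-allocated by `kv_waste`
    (0.0 ~ paged allocation as in PagedAttention; ~0.6 ~ naive contiguous)."""
    total = weights_gb + kv_gb * (1 + kv_waste)
    return math.ceil(total / gpu_vram_gb)

# e.g. 140 GB of weights, 80 GB of live KV cache, 80 GB GPUs
naive = gpus_needed(140, 80, 80, kv_waste=0.6)  # contiguous allocation
paged = gpus_needed(140, 80, 80, kv_waste=0.0)  # PagedAttention
print(naive, paged, naive - paged)  # -> 4 3 1
```

Since paging only changes how KV memory is allocated, the "Performance Trade-off: None" line above is consistent with this model.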
Cost Breakdown
GPU Cost: €0
Base System Cost: €0
Monthly Electricity: €0
Monthly Maintenance: €0
Rental Price Breakdown
Hardware Depreciation: €0
Electricity (with cooling): €0
Bandwidth/Network: €0
Support & Maintenance: €0
Profit Margin (15%): €0
Total Rental Price: €0
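The rental total above is presumably the sum of the four cost lines, with hardware depreciated straight-line over the payback period and the 15% margin applied on top. A sketch under those assumptions; all euro amounts in the example are placeholders:

```python
def monthly_rental(hw_cost_eur: float, payback_months: int, electricity_eur: float,
                   bandwidth_eur: float, support_eur: float, margin: float = 0.15) -> float:
    """Monthly rental price: straight-line hardware depreciation over the
    payback period, plus monthly OpEx, marked up by the profit margin."""
    depreciation = hw_cost_eur / payback_months
    base = depreciation + electricity_eur + bandwidth_eur + support_eur
    return base * (1 + margin)

# e.g. €36,000 of hardware over 36 months; €300 electricity, €50 bandwidth,
# €100 support per month (illustrative values)
print(round(monthly_rental(36_000, 36, 300, 50, 100), 2))  # -> 1667.5
```

Under this model, shortening the payback period raises the monthly price but recovers the hardware cost sooner; 36 months matches the standard noted in the configuration section.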