Pay-as-you-go pricing for LLM usage.
# Model Performance and Pricing
> **Note:** All prices are in USD per 1,000 tokens. MMLU scores indicate model performance on the Massive Multitask Language Understanding benchmark.
## High Performance Models
Model | Input | Output | MMLU | Details |
---|---|---|---|---|
llama3-swiss 🇨🇭 | $0.015 | $0.045 | 85.2% | Advanced, Efficient, Recommended |
gpt-4o | $0.045 | $0.09 | 92.3% | Leading Performance |
claude-sonnet | $0.005 | $0.02 | 88.7% | Multilingual, Writing, Coding |
## Balanced Models
Model | Input | Output | MMLU | Details |
---|---|---|---|---|
llama-swiss-medium 🇨🇭 | $0.005 | $0.01 | 79.2% | Strong for Size |
mixtral-swiss-big 🇨🇭 | $0.01 | $0.02 | N/A | Advanced Multilingual |
mistral-medium | $0.00375 | $0.01125 | 77.3% | Efficient, Multilingual |
mixtral-swiss-medium 🇨🇭 | $0.003 | $0.01 | 77.3% | Efficient, Multilingual |
gpt-4 | $0.045 | $0.09 | 86.5% | Consistent |
claude-opus | $0.022 | $0.10 | 86.8% | Strong Reasoning |
## Efficient Models
Model | Input | Output | MMLU | Details |
---|---|---|---|---|
gpt-3.5-turbo-1106 | $0.0015 | $0.003 | ~70% | Legacy |
mistral-tiny | $0.00042 | $0.00126 | 60.1% | Compact, Fast, Cost-Effective |
mistral-small | $0.0012 | $0.0036 | 70.6% | Balanced Speed/Quality |
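The per-request cost follows directly from the tables above: tokens divided by 1,000, multiplied by the per-1K price for input and output separately. A minimal sketch in Python, using prices copied from the tables (the `PRICES` dictionary and `estimate_cost` helper are illustrative, not part of any official SDK):

```python
# Per-1,000-token prices in USD, taken from the tables above.
PRICES = {
    "llama3-swiss":  {"input": 0.015,  "output": 0.045},
    "claude-sonnet": {"input": 0.005,  "output": 0.02},
    "mistral-small": {"input": 0.0012, "output": 0.0036},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request for the given token counts."""
    p = PRICES[model]
    return (input_tokens / 1000) * p["input"] + (output_tokens / 1000) * p["output"]

# Example: 2,000 input tokens and 500 output tokens on llama3-swiss
cost = estimate_cost("llama3-swiss", 2000, 500)
print(f"${cost:.4f}")  # 2 * 0.015 + 0.5 * 0.045 = $0.0525
```

Input and output tokens are priced independently, so long prompts with short completions cost far less than the reverse on models where output is ~3x the input rate.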
## Performance Notes
- MMLU scores marked with (1) indicate one-shot performance
- Scores marked with (5-shot) use five-shot (few-shot) learning
- N/A indicates benchmark data is pending
## MMLU Scores
While MMLU scores provide a useful metric for comparing language model capabilities, they represent only one dimension of performance. These scores primarily measure how well models handle a standardized set of tasks, but they do not fully capture broader skills such as multilingual comprehension, information retention, domain-specific usage, programming proficiency, or complex reasoning. In practice, different tasks place different demands on a model's underlying architecture and training data, so performance can vary considerably across these domains. As a result, MMLU should be treated as a helpful indicator rather than a definitive measure of a model's overall quality or suitability for a given application.

Source: LLM leaderboard
## Rate Limiting
All endpoints have a combined limit of CHF 50 per month. If you would like to increase it, please contact support with your estimated usage and use case.
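Since the cap is a spend limit rather than a request limit, it can help to track cumulative spend client-side and check headroom before sending requests. A hypothetical sketch, assuming a fixed USD-to-CHF rate (the `SpendTracker` class and the exchange rate are illustrative assumptions, not part of the service):

```python
# Hypothetical client-side spend tracker for the CHF 50/month combined cap.
# The USD->CHF exchange rate below is an assumed placeholder; use a current rate.
USD_TO_CHF = 0.90       # assumed exchange rate
MONTHLY_CAP_CHF = 50.0  # combined monthly limit across all endpoints

class SpendTracker:
    """Accumulates per-request USD costs and reports remaining CHF budget."""

    def __init__(self) -> None:
        self.spent_usd = 0.0

    def record(self, cost_usd: float) -> None:
        self.spent_usd += cost_usd

    def remaining_chf(self) -> float:
        return MONTHLY_CAP_CHF - self.spent_usd * USD_TO_CHF

tracker = SpendTracker()
tracker.record(12.5)  # e.g. $12.50 spent so far this month
print(round(tracker.remaining_chf(), 2))  # 50 - 12.5 * 0.90 = 38.75
```

A tracker like this only mirrors the server-side limit; the authoritative count is enforced by the service, so treat the local figure as an early warning rather than a guarantee.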