Leading Large Language Models Compared
Monthly performance analyses of leading language models – from OpenAI and Google to local open-source solutions.

The AI Strategy & Research Hub of TIMETOACT GROUP Austria is among the leading experts in applied research on generative AI for enterprises. Our research findings feed directly into product development – enabling us to set the highest standards in implementing AI-powered solutions for businesses.
LLM Benchmarks | March 2025
Highlights:
Gemini 2.5 Pro Preview
DeepSeek V3 0324
Llama 4 models
Google Gemma 3 models
Focus on RPA

The benchmark categories in detail
How well can the model work with large documents and knowledge bases?
How well does the model support working with product catalogs and marketplaces?
Can the model easily interact with external APIs, services and plugins?
How well can the model support marketing activities, e.g. brainstorming, idea generation and text generation?
How well can the model reason and draw conclusions in a given context?
Can the model generate code and help with programming?
The estimated cost of running the workload. For cloud-based models, we calculate the cost from the provider's published pricing. For on-premises models, we estimate the cost based on the GPU requirements of each model, GPU rental cost, model speed, and operational overhead.
The "Speed" column indicates the estimated speed of the model in requests per second (without batching). The higher the speed, the better.
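The exact cost formula for on-premises models is not published here. The following is a minimal sketch, assuming that hourly spend is the number of GPUs times the hourly rental price times an operational-overhead multiplier, and that throughput is measured as sequential requests per second (no batching), as in the Speed column. The class, the function names, the `send_request` callable and the 1.2 overhead default are all hypothetical and used only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class OnPremModel:
    """Hypothetical inputs for estimating the cost of a self-hosted model."""
    gpu_count: int                # GPUs needed to host the model
    gpu_hourly_rate: float        # rental price per GPU per hour
    requests_per_second: float    # sequential throughput, no batching
    overhead_factor: float = 1.2  # assumed multiplier for operational overhead


def measure_requests_per_second(send_request: Callable[[], object], n: int = 10) -> float:
    """Send n requests one after another (no batching) and return requests per second.

    send_request is a hypothetical placeholder for a real call to the model.
    """
    start = time.perf_counter()
    for _ in range(n):
        send_request()
    elapsed = time.perf_counter() - start
    return n / elapsed


def cost_per_request(m: OnPremModel) -> float:
    """Estimated cost of one request: hourly spend divided by requests served per hour."""
    if m.requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    hourly_cost = m.gpu_count * m.gpu_hourly_rate * m.overhead_factor
    requests_per_hour = m.requests_per_second * 3600
    return hourly_cost / requests_per_hour


def workload_cost(m: OnPremModel, num_requests: int) -> float:
    """Estimated cost of running a workload consisting of num_requests requests."""
    return cost_per_request(m) * num_requests


if __name__ == "__main__":
    # Illustrative numbers only: one GPU rented at 3.6 per hour, 0.5 requests/s.
    model = OnPremModel(gpu_count=1, gpu_hourly_rate=3.6,
                        requests_per_second=0.5, overhead_factor=1.0)
    print(f"cost per request: {cost_per_request(model):.4f}")
    print(f"cost for 1000 requests: {workload_cost(model, 1000):.2f}")
```

Because hourly spend is divided by the number of requests served per hour, a faster model on the same hardware comes out cheaper per request under this assumption, which is why cost and speed are best read side by side.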
Archive
Curious about how the scores have evolved? Here you can find links to all previously published leaderboards.
