LLM Benchmarks

Leading Large Language Models Compared

Monthly performance analyses of leading language models – from OpenAI and Google to local open-source solutions.

The AI Strategy & Research Hub of TIMETOACT GROUP Austria is among the leading experts in applied research on generative AI for enterprises. Our research findings feed directly into product development – enabling us to set the highest standards in implementing AI-powered solutions for businesses.

LLM Benchmarks | March 2025

Highlights:

Gemini-2.5 Pro Preview
DeepSeek V3 0324
Llama 4 models
Google Gemma 3 models
Focus on RPA

Explore the March LLM Benchmarks

The benchmark categories in detail

Here's exactly what we're looking at with the different categories of LLM Leaderboards

Docs

How well can the model work with large documents and knowledge bases?

CRM

How well does the model support work with product catalogs and marketplaces?

Integrate

Can the model easily interact with external APIs, services and plugins?

Marketing

How well can the model support marketing activities, e.g. brainstorming, idea generation and text generation?

Reason

How well can the model reason and draw conclusions in a given context?

Code

Can the model generate code and help with programming?

Cost

The estimated cost of running the workload. For cloud-based models, we calculate the cost according to the pricing. For on-premises models, we estimate the cost based on GPU requirements for each model, GPU rental cost, model speed, and operational overhead.

Speed

The "Speed" column indicates the estimated speed of the model in requests per second (without batching). The higher the speed, the better.

Discover our AI workshops for businesses

Whether it's AI fundamentals, Prompt Engineering training, or potential analysis – we offer tailored solutions for every need.

Explore our AI Workshops

Transform your digital projects with the best AI language models!

Discover the transformative power of the best Large Language Models and revolutionize your business with AI! Stay future-oriented, increase efficiency and secure a clear competitive advantage. We support you in taking your business value to the next level.

First Name *

Last Name *

Company *

E-Mail *

Phone number

Message *

* required

We use the data you send us only for contacting you in connection with your request. You can find all further information in our privacy policy.

Martin Warnung

Sales Consultant TIMETOACT GROUP Österreich GmbH +43 664 881 788 80

Contact

Wissen 8/30/24

LLM-Benchmarks August 2024

Instead of our general LLM benchmarks, we present the first benchmark of different AI architectures in August.

Blog 11/12/24

ChatGPT & Co: LLM Benchmarks for October

Find out which large language models outperformed in the October 2024 benchmarks. Stay informed on the latest AI developments and performance metrics.

Blog 12/4/24

ChatGPT & Co: LLM Benchmarks for November

Find out which large language models outperformed in the November 2024 benchmarks. Stay informed on the latest AI developments and performance metrics.

Blog 1/7/25

ChatGPT & Co: LLM Benchmarks for December

Find out which large language models outperformed in the December 2024 benchmarks. Stay informed on the latest AI developments and performance metrics.

Blog 10/1/24

ChatGPT & Co: LLM Benchmarks for September

Find out which large language models outperformed in the September 2024 benchmarks. Stay informed on the latest AI developments and performance metrics.

Blog 2/3/25

ChatGPT & Co: LLM Benchmarks for January

Find out which large language models outperformed in the January 2025 benchmarks. Stay informed on the latest AI developments and performance metrics.

Wissen 6/30/24

LLM-Benchmarks June 2024

This LLM Leaderboard from june 2024 helps to find the best Large Language Model for digital product development.

Wissen 5/30/24

LLM-Benchmarks May 2024

This LLM Leaderboard from may 2024 helps to find the best Large Language Model for digital product development.

Wissen 7/30/24

LLM-Benchmarks July 2024

This LLM Leaderboard from July 2024 helps to find the best Large Language Model for digital product development.

Wissen 4/30/24

LLM-Benchmarks April 2024

This LLM Leaderboard from april 2024 helps to find the best Large Language Model for digital product development.

Blog 6/22/23

Strategic Impact of Large Language Models

This blog discusses the rapid advancements in large language models, particularly highlighting the impact of OpenAI's GPT models.

Technologie Übersicht

Consulting for IBM products

IBM Software & Consulting with passion and experience – you can rely on us. Services relating to IBM Software Solutions have always been an integral part of our offer.

Blog 5/16/24

Common Mistakes in the Development of AI Assistants

We share how failures when implementing AI occurr and what can be learned from them for future projects: So that AI assistants can be implemented more successfully in the future!

Blog 5/17/24

8 tips for developing AI assistants

8 practical tips for implementing AI assistants

Blog 5/25/21

From the idea to the product: The genesis of Skwill

We strongly believe in the benefits of continuous learning at work; this has led us to developing products that we also enjoy using ourselves. Meet Skwill.

Wissen 5/2/24

Unlock the Potential of Data Culture in Your Organization

Are you ready to revolutionize your organization's potential by unleashing the power of data culture? Imagine a workplace where every decision is backed by insights, every strategy informed by data, and every employee equipped to navigate the digital landscape with confidence. This is the transformative impact of cultivating a robust data culture within your enterprise.

Blog 1/21/25

AI Contest - Enterprise RAG Challenge

TIMETOACT GROUP Austria demonstrates how RAG technologies can revolutionize processes with the Enterprise RAG Challenge.

Service

Fit for the digital ecosystem

Insurers are digitally networking with their ecosystem to gain critical capabilities in a division of labor. Personal data, object data are securely exchanged via common digital interfaces.

Branche

AI & Digitization for the Transportation and Logistics Indus

Digitalisierung und Transparenz der Prozesse sowie automatisierte Unterstützung bei der Optimierung können Logistikunternehmen helfen, den Spagat zwischen Kosten und Leistung besser zu bewältigen, um langfristig als wertvoller Partner der Wirtschaft zu agieren.

Blog 10/4/24

Open-sourcing 4 solutions from the Enterprise RAG Challenge

Our RAG competition is a friendly challenge different AI Assistants competed in answering questions based on the annual reports of public companies.

Leading Large Language Models Compared