1 |
1 | Ilia Ris | ▶ Dense Retrieval; Router; LLM reranking; o3-mini | 49 min | 🤝 | | 83.8 | 81.8 | 123.7 |
Ilia Ris
- Best experiment: Dense Retrieval; Router; LLM reranking; o3-mini
- Signature:
f1d79f
- Summary: Dense retrieval combined with LLM reranking and SO CoT.
Models used:
Architecture
Ilia Ris approached the challenge by making it easy to run numerous experiments before the competition had even started. He built an evaluation pipeline that let him quickly compare different architectural solutions. The best solution was also among the fastest ones.
The winning experiment had this configuration:
- PDF Analysis: Documents are processed with a heavily modified version of IBM's Docling library; the modifications were needed to preserve page references.
- Router Pattern: The first step in the question-answering flow picks the most suitable agent.
- Dense Retrieval: The system searches for relevant information by semantic similarity (FAISS with OpenAI vector embeddings).
- Parent Document Retrieval: Instead of returning only the matching chunk, the full page is loaded to preserve surrounding context.
- LLM Reranking: Retrieved information is re-evaluated and reordered by an LLM.
- Reasoning Patterns: LLM accuracy is improved within a single prompt by controlling the thinking process with a custom Chain-of-Thought and Structured Outputs.
- Final Answer Generation: The final answer is generated with o3-mini.
- Self-Consistency with Majority Vote: Multiple answer variations are generated, compared, and the most consistent one is selected.
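Taken together, these steps amount to a compact retrieve-then-rerank loop. The sketch below is a minimal, illustrative reconstruction of that flow (not the actual submission code), assuming pages have already been parsed into a list of strings and using FAISS with OpenAI embeddings; the chunking, prompts, and helper names are assumptions:

```python
import re
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()
pages = ["...page 1 text...", "...page 2 text..."]  # placeholder: parsed report pages

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data], dtype="float32")

# Chunk each page but remember its parent page (needed for parent document retrieval).
chunks = [{"text": page[i:i + 800], "page": p}
          for p, page in enumerate(pages)
          for i in range(0, len(page), 800)]

index = faiss.IndexFlatIP(1536)  # text-embedding-3-small returns 1536-dim unit vectors
index.add(embed([c["text"] for c in chunks]))

def answer(question: str) -> str:
    # 1. Dense retrieval: nearest chunks by embedding similarity.
    _, ids = index.search(embed([question]), 10)
    # 2. Parent document retrieval: replace matched chunks with their full pages.
    candidates = sorted({chunks[i]["page"] for i in ids[0] if i != -1})
    listing = "\n\n".join(f"[page {p}] {pages[p]}" for p in candidates)
    # 3. LLM reranking: a cheaper model re-orders the candidate pages.
    rerank = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"Question: {question}\n{listing}\n"
                   "Return the most relevant page numbers, comma-separated."}],
    ).choices[0].message.content
    keep = [int(n) for n in re.findall(r"\d+", rerank) if int(n) < len(pages)][:5]
    # 4. Final answer with o3-mini over the reranked pages (the SO CoT prompt is omitted).
    context = "\n\n".join(pages[p] for p in keep)
    return client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    ).choices[0].message.content
```

Self-consistency with majority vote would simply wrap the final call in several runs and keep the most frequent answer.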
R&D Experiments
Total experiments submitted: 11
Other approaches:
- Dense Retrieval; LLM Reranking; Router; SO CoT; o3-mini
- Dense Retrieval; Router; SO CoT; llama3.3-70b
- Dense Retrieval; Tables serialization; Router; LLM reranking; o3-mini
- Dense Retrieval; llama-3.3 70b
- Dense Retrieval; llama-3.1 8b
- Full Context; gemini-2.0 thinking
- Dense Retrieval; Router; LLM reranking; Self-Consistency; o3-mini
- Dense Retrieval; Router; LLM reranking; Self-Consistency; llama-3.3 70b
What didn't work?
- Using llama-3.1 8b for reranking
- Incorporating Full Context with gemini-2.0 thinking
Future experiments:
- Evaluating various local embedding models for fully offline solutions
Experiment journal:
- 16 min → R: 83.9, G: 72.8, Score: 114.8 ▲ - Dense Retrieval; LLM Reranking; Router; SO CoT; o3-mini
- 23 min → R: 81.4, G: 74.7, Score: 115.4 ▲ - Dense Retrieval; llama-3.3 70b
- 49 min → R: 83.8, G: 81.8, Score: 123.7 ▲ - Dense Retrieval; Router; LLM reranking; o3-mini
- 50 min → R: 81.1, G: 68.7, Score: 109.3 - Dense Retrieval; llama-3.1 8b
- 51 min → R: 75.5, G: 75.0, Score: 112.8 - Full Context; gemini-2.0 thinking
- 66 min → R: 83.0, G: 78.8, Score: 120.3 - Dense Retrieval; Tables serialization; Router; LLM reranking; o3-mini
- 22 hours → R: 83.5, G: 81.8, Score: 123.6 - Dense Retrieval; Router; LLM reranking; o3-mini
- 22 hours → R: 80.8, G: 75.7, Score: 116.1 - Dense Retrieval; llama-3.3 70b
- 33 hours → R: 83.4, G: 79.8, Score: 121.6 - Dense Retrieval; Router; LLM reranking; Self-Consistency; o3-mini
- 33 hours → R: 81.3, G: 79.7, Score: 120.3 - Dense Retrieval; Router; LLM reranking; Self-Consistency; llama-3.3 70b
|
2 | Emil Shagiev | ▶ LLM_Search | 55 min | 🤝 | | 86.3 | 78.5 | 121.6 |
Emil Shagiev
- Best experiment: LLM_Search
- Signature:
0a8782
- Summary: A multi-step process involving query expansion, efficient search, question answering, and answer finalization.
Models used:
- gpt-4o-mini-2024-07-18
- gpt-4o-2024-08-06
- o3-mini-2025-01-31
Architecture
The best solution didn't use vector embeddings; instead, it followed a structured approach:
- the input query is expanded to enhance search coverage and enable semantic search;
- relevant pages are retrieved using a cost-effective and rapid LLM;
- retrieved information is then passed to a more powerful LLM to generate answers;
- answers are refined and finalized for presentation.
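A minimal sketch of such an embedding-free search, assuming the report pages are already available as plain text; the prompts, scoring scale, and model choices are illustrative rather than taken from the submission:

```python
import re
from openai import OpenAI

client = OpenAI()
pages = ["...page 1 text...", "...page 2 text..."]  # placeholder: parsed report pages

def expand_query(question: str) -> str:
    # Query expansion: restate the question with synonyms and related financial terms.
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Rewrite this question with synonyms and related terms "
                   f"to broaden a document search:\n{question}"}],
    ).choices[0].message.content

def find_pages(question: str, top_k: int = 5) -> list[int]:
    expanded = expand_query(question)
    scores = []
    for i, page in enumerate(pages):
        # A cheap, fast model rates every page; no vector index is involved.
        verdict = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       f"Query: {expanded}\nPage:\n{page}\n"
                       "Rate the page's relevance from 0 to 10. Reply with one number."}],
        ).choices[0].message.content
        digits = re.findall(r"\d+", verdict)
        scores.append((int(digits[0]) if digits else 0, i))
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]

def answer(question: str) -> str:
    # A stronger model answers from the selected pages; a final pass could polish the output.
    context = "\n\n".join(pages[i] for i in find_pages(question))
    return client.chat.completions.create(
        model="o3-mini",
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    ).choices[0].message.content
```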
R&D Experiments
Total experiments submitted: 3
Other approaches:
- LLL_Search_2: Similar architecture with added capability for mathematical operations.
Experiment journal:
- 55 min → R: 86.3, G: 78.5, Score: 121.6 ▲ - LLM_Search
- 21 hours → R: 86.1, G: 77.5, Score: 120.5 - LLL_Search_2
|
3 | Dmitry Buykin | ▶ slow-run-and-bugs | 8 hours | 🤝 | | 81.4 | 76.8 | 117.5 |
Dmitry Buykin
- Best experiment: Dynamic Structured Output with SEC EDGAR Ontologies
- Signature:
6b0d78
- Summary: Dynamic structured output with query expansion and page-focused chunking.
Models used:
Architecture
The solution used an SO/CoT approach with ontologies to retrieve relevant information.
Key highlights:
- embeddings and vector databases were not used;
- a dynamic structured output approach was combined with SEC EDGAR ontologies for query expansion (SO CoT);
- CBOW similarity was used for majority selection across multiple runs, with attention to balancing pages versus tokens during chunking;
- significant effort was dedicated to evaluating PDF quality heuristics to optimize OCR input;
- synthetic tags were implemented to stabilize page detection and assess model quality.
|
4 | Sergey Nikonov | ▶ main v2 | 30 hours | 🤝 | | 85.1 | 73.9 | 116.4 |
Sergey Nikonov
- Best experiment: main v2
- Signature:
00c0e1
- Summary: For every question, all pages are processed using gpt-4o.
Models used:
Architecture
The solution feeds all pages of the provided documents into the gpt-4o model for each question. This simple but practical approach ensures comprehensive coverage of the content when extracting answers.
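A minimal sketch of this brute-force pattern, assuming pages are already extracted as text (the prompt wording is illustrative):

```python
from openai import OpenAI

client = OpenAI()

def answer(question: str, pages: list[str]) -> str:
    # Every page goes into a single prompt; the model reads everything for each question.
    context = "\n\n".join(f"[page {i}] {text}" for i, text in enumerate(pages))
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided pages and cite page numbers."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return resp.choices[0].message.content
```

Whether every page fits depends on the model's context window; very long filings would still need splitting or per-page passes.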
R&D Experiments
Total experiments submitted: 2
Other approaches:
- Finding the PDFs that correspond to each question, splitting them by page, loading each page directly into gpt-4o (through the Assistants API), scanning all pages for the answer, and combining the answers with simple logic.
What didn't work?
- Using the o3-mini model instead of o1-mini in the architecture.
Experiment journal:
- 5 hours → R: 85.3, G: 69.0, Score: 111.6 ▲ - Main
- 30 hours → R: 85.1, G: 73.9, Score: 116.4 ▲ - main v2
|
5 | ScrapeNinja.net | ▶ fixed multiple companies search | 23 hours | 🤝 | | 82.6 | 71.2 | 112.5 |
ScrapeNinja.net
- Best experiment: fixed multiple companies search
- Signature:
417bbf
- Summary: Node.js-based architecture utilizing pgvector for efficient data handling.
Models used:
- Gemini Flash 2.0
- Gemini Flash Lite 2.0
- Flash Thinking Exp
Architecture
The solution used Node.js for backend operations and pgvector for vector storage and similarity search. It focused on efficient handling of complex queries and data retrieval tasks.
R&D Experiments
Total experiments submitted: 2
Other approaches:
Experiment journal:
- 20 hours → R: 82.6, G: 64.2, Score: 105.5 ▲ - OCR and PG
- 23 hours → R: 82.6, G: 71.2, Score: 112.5 ▲ - fixed multiple companies search
|
6 | xsl777 | ▶ multi-query, gpt-4o | 16 hours | 🤝 | | 79.4 | 71.2 | 110.9 |
xsl777
- Best experiment: multi-query, gpt-4o
- Signature:
66ab5c
- Summary: Structured PDF parsing, metadata extraction, query expansion, hybrid search, reranking, and CoT.
Models used:
Architecture
The architecture integrates the following patterns:
- structured PDF parsing and chunking;
- metadata extraction;
- query expansion;
- hybrid search mechanisms;
- reranking strategies.
It synthesizes document metadata and chunks while utilizing Chain-of-Thought (CoT) reasoning to enhance response accuracy and relevance. gpt-4o and gpt-4o-mini help with high-quality language understanding and generation capabilities.
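A minimal sketch of the hybrid-search step, fusing BM25 keyword scores with dense-embedding scores before reranking; the packages (rank_bm25, sentence-transformers), weights, and embedding model are assumptions for illustration, not necessarily what this team used:

```python
import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

chunks = ["Revenue for 2022 was $4.2B ...", "The board approved a dividend ..."]  # placeholder

encoder = SentenceTransformer("all-MiniLM-L6-v2")
dense = encoder.encode(chunks, normalize_embeddings=True)
bm25 = BM25Okapi([c.lower().split() for c in chunks])

def hybrid_search(query: str, k: int = 5, alpha: float = 0.5) -> list[int]:
    # Lexical scores from BM25 and semantic scores from embeddings, min-max normalized and fused.
    lex = bm25.get_scores(query.lower().split())
    sem = dense @ encoder.encode([query], normalize_embeddings=True)[0]

    def norm(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    fused = alpha * norm(lex) + (1 - alpha) * norm(sem)
    return list(np.argsort(fused)[::-1][:k])  # indices of the best chunks, passed on to reranking
```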
R&D Experiments
Total experiments submitted: 2
Experiment journal:
- 16 hours → R: 79.4, G: 71.2, Score: 110.9 ▲ - multi-query, gpt-4o
- 3 days → R: 80.1, G: 70.7, Score: 110.7 - Open source, Advanced RAG
|
7 | nikolay_sheyko(grably.tech) | ▶ nikolay_sheyko(grably.tech)_with_o3_mini | 25 hours | 🤝 | | 81.1 | 69.8 | 110.4 |
nikolay_sheyko(grably.tech)
- Best experiment: nikolay_sheyko(grably.tech)_with_o3_mini
- Signature:
db8938
- Summary: Relevant pages are identified and processed to generate answers.
Models used:
Architecture
The solution employs a two-step process:
- first, it identifies relevant reports for a given question and evaluates the relevance of each page asynchronously using the gpt-4o-mini model;
- then, all relevant pages are compiled into a prompt, and the o3-mini model is used to generate the final answer.
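A minimal sketch of the asynchronous page-scoring step, assuming pages are already extracted as text; the prompt and the 0-10 scale are illustrative:

```python
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI()

async def score_page(question: str, page: str) -> float:
    # gpt-4o-mini rates one page; all pages are scored concurrently via asyncio.gather below.
    resp = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   f"Question: {question}\nPage:\n{page}\n"
                   "How relevant is this page on a 0-10 scale? Reply with a single number."}],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0

async def relevant_pages(question: str, pages: list[str], top_k: int = 8) -> list[int]:
    scores = await asyncio.gather(*(score_page(question, p) for p in pages))
    order = sorted(range(len(pages)), key=lambda i: scores[i], reverse=True)
    # The selected pages are then compiled into one prompt for o3-mini (step two).
    return order[:top_k]
```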
R&D Experiments
Total experiments submitted: 7
Other approaches:
- Dynamic data extraction with pydantic classes
- Binary checks per page
- Parallel question splitting
- Subquestion generation for multi-entity queries
- Single-page reference experiments
What didn't work?
- Binary checks per page
- Single-page reference experiments
Experiment journal:
- 55 min → R: 77.2, G: 51.2, Score: 89.9 ▲ - grably.tech/with_extra_reasoning_from_different_pages_hacked96160725
- 25 hours → R: 81.1, G: 69.8, Score: 110.4 ▲ - nikolay_sheyko(grably.tech)_with_o3_mini
- 25 hours → R: 79.7, G: 60.2, Score: 100.1 - nikolay_sheyko(grably.tech)_dummy
- 8 days → R: 80.5, G: 64.3, Score: 104.6 - o3-mini-no-restrictions
- 8 days → R: 80.5, G: 66.3, Score: 106.6 - o3-mini-no-restrictions-fixed-names
- 12 days → R: 81.2, G: 67.1, Score: 107.7 - o3-mini-no-restrictions-single-reference
- 12 days → R: 80.5, G: 67.3, Score: 107.6 - o3-mini-no-restrictions-fixed-names-and-boolean
|
8 | Felix-TAT | ▶ Gemini-4o Multiagent RAG | 7 days | 🤝 | | 80.2 | 69.3 | 109.4 |
Felix-TAT
- Best experiment: Gemini-4o Multiagent RAG
- Signature:
a2faff
- Summary: Multiagent, mixed-model approach with delegation and execution agents.
Models used:
- gemini-2.0-flash
- gpt-4o-2024-08-06
Architecture
The solution uses a multiagent architecture where a delegation manager (OpenAI) splits the user query into company-specific subqueries. These subqueries are processed by expert agents using Google's Gemini flash model, which has access to the entire company PDF in context. The responses are then aggregated and synthesized by an execution agent (OpenAI) to produce the final answer.
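A minimal sketch of the delegation / expert / execution split, assuming one full-text report per company is available in memory; the prompts and JSON contract are illustrative, not the team's actual agents:

```python
import json
import os
import google.generativeai as genai
from openai import OpenAI

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
openai_client = OpenAI()
company_docs = {"ACME Corp": "...full annual report text...",   # placeholder documents
                "Globex": "...full annual report text..."}

def delegate(question: str) -> dict[str, str]:
    # Delegation manager: split the question into one subquery per company, returned as JSON.
    resp = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   'Split the question into per-company subqueries. '
                   'Reply as JSON: {"company name": "subquery"}.\n'
                   f"Companies: {list(company_docs)}\nQuestion: {question}"}],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)

def expert(company: str, subquery: str) -> str:
    # Expert agent: Gemini Flash answers with the entire company report in context.
    model = genai.GenerativeModel("gemini-2.0-flash")
    return model.generate_content(f"{company_docs[company]}\n\nQuestion: {subquery}").text

def answer(question: str) -> str:
    findings = {c: expert(c, q) for c, q in delegate(question).items() if c in company_docs}
    # Execution agent: synthesize the per-company findings into the final answer.
    return openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content":
                   f"Question: {question}\nFindings: {json.dumps(findings)}\n"
                   "Produce the final answer."}],
    ).choices[0].message.content
```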
R&D Experiments
Total experiments submitted: 4
Other approaches:
- Gemini Naive
- IBM-4o-based Multiagent RAG
- OpenAI Multiagent RAG
What didn't work?
- Using a single model without multiagent delegation
- Relying solely on vector database retrieval without full PDF context
Experiment journal:
- 6 days → R: 79.0, G: 60.3, Score: 99.8 ▲ - Gemini Naive
- 7 days → R: 81.7, G: 47.3, Score: 88.2 - IBM-4o-based Multiagent RAG
- 7 days → R: 82.2, G: 66.0, Score: 107.1 ▲ - OpenAI Multiagent RAG
- 7 days → R: 80.2, G: 69.3, Score: 109.4 ▲ - Gemini-4o Multiagent RAG
|
9 | A.Rasskazov/V.Kalesnikau | ▶ multi_agent_ibm_openai | 30 hours | | | 84.0 | 67.2 | 109.3 |
A.Rasskazov/V.Kalesnikau
- Best experiment: multi_agent_ibm_openai
- Signature:
efabd4
- Summary: A multi-agent system leveraging LLMs for question answering using similarity-based retrieval.
Models used:
- meta-llama/llama-3-405b-instruct
- ibm/granite-embedding-107m-multilingual
- text-embedding-3-small
- gpt-4o-mini
Architecture
The solution employs a multi-agent architecture to address the challenge.
Initially, it generates a database for the Retrieval-Augmented Generation (RAG) model. Upon receiving a query, the system extracts key metrics such as company, industry, and currency. These metrics are then used to identify the most similar question in the database. The answer associated with this similar question is retrieved and refined using a Large Language Model (LLM). Finally, the system consolidates and presents the answer to the user.
R&D Experiments
Total experiments submitted: 2
Other approaches:
- pjatk_team_002: A system that preprocesses questions, retrieves relevant PDF pages using a vector database, and extracts answers with page references using LLMs.
What didn't work?
- Alternative embedding models for retrieval.
- Different strategies for key metric extraction.
Experiment journal:
- 30 hours → R: 84.0, G: 67.2, Score: 109.3 ▲ - multi_agent_ibm_openai
- 7 days → R: 82.5, G: 64.0, Score: 105.2 - pjatk_team_002
|
10 | Dany the creator | ▶ gpt-4o-mini + pgvector | 3 hours | 🤝 | | 82.8 | 67.0 | 108.4 |
Dany the creator
- Best experiment: gpt-4o-mini + pgvector
- Signature:
ee29ae
- Summary: Utilized a structured approach to parse and analyze text chunks, creating embeddings and generating questions.
Models used:
Architecture
The solution preprocesses text by chunking it, stores the chunk embeddings in Postgres with the pgvector extension, and formulates questions that each chunk could answer.
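A minimal sketch of the pgvector side of such a setup, assuming a `chunks` table with a vector column and query embeddings produced elsewhere; the table, column, and connection details are illustrative:

```python
import psycopg2

# Assumes: CREATE EXTENSION vector;
#          CREATE TABLE chunks (id serial PRIMARY KEY, text text, embedding vector(1536));
conn = psycopg2.connect("dbname=rag")

def top_chunks(query_embedding: list[float], k: int = 5) -> list[str]:
    with conn.cursor() as cur:
        # `<=>` is pgvector's cosine-distance operator; smaller means more similar.
        cur.execute(
            "SELECT text FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_embedding), k),
        )
        return [row[0] for row in cur.fetchall()]
```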
|
11 | SergC | ▶ submission_1 | 7 days | 🤝 | | 77.5 | 69.3 | 108.1 |
SergC
- Best experiment: submission_1
- Signature:
c0d776
- Summary: QE + SO + CoT
Models used:
Architecture
The solution uses a combination of:
- Query Expansion (QE)
- Structured Output (SO)
- Chain-of-Thought (CoT) reasoning to enhance the performance of the Gemini 2.0 model.
|
12 | Swisscom Innovation Lab | ▶ Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3 | 21 hours | | 🔒 | 83.3 | 66.2 | 107.8 |
Swisscom Innovation Lab
- Best experiment: Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
- Signature:
debcf6
- Summary: A multi-agent system leveraging LangGraph, LlamaIndex, MarkerPDF, and Llama 3.3 for accurate and contextual multi-company query processing.
Models used:
Architecture
This offline solution uses a multi-agent architecture with:
- LangGraph for workflow orchestration
- LlamaIndex for data indexing
- MarkerPDF for document parsing
- Llama 3.3 for natural language processing.
The solution supports multi-company queries by:
- extracting relevant entities
- validating inputs
- processing each entity individually
- retrieving and evaluating documents
- aggregating results for numeric-based comparisons.
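A minimal sketch of how such a flow can be wired in LangGraph, with placeholder node bodies standing in for the real entity extraction, retrieval over MarkerPDF output, and Llama 3.3 aggregation steps:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    entities: list[str]
    findings: dict[str, str]
    answer: str

def extract_entities(state: State) -> dict:
    # Placeholder: in the real system an LLM extracts and validates company names.
    return {"entities": [w for w in state["question"].split() if w.istitle()]}

def retrieve(state: State) -> dict:
    # Placeholder: per-entity retrieval over the parsed documents.
    return {"findings": {e: f"retrieved context for {e}" for e in state["entities"]}}

def aggregate(state: State) -> dict:
    # Placeholder: the aggregation agent would compare the per-entity numbers here.
    return {"answer": "; ".join(f"{e}: {v}" for e, v in state["findings"].items())}

graph = StateGraph(State)
graph.add_node("extract", extract_entities)
graph.add_node("retrieve", retrieve)
graph.add_node("aggregate", aggregate)
graph.set_entry_point("extract")
graph.add_edge("extract", "retrieve")
graph.add_edge("retrieve", "aggregate")
graph.add_edge("aggregate", END)
app = graph.compile()

print(app.invoke({"question": "Compare revenue of Acme and Globex"})["answer"])
```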
R&D Experiments
Total experiments submitted: 3
Other approaches:
- Iterative refinement of query processing pipeline
- Enhanced document retrieval mechanisms
What didn't work?
- Simplified single-agent architecture
- Direct query-to-response mapping without intermediate validation
Experiment journal:
- 80 min → R: 83.3, G: 65.2, Score: 106.8 ▲ - Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
- 21 hours → R: 83.3, G: 66.2, Score: 107.8 ▲ - Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
|
13 | fomih | ▶ gemini-flash CoT + so small fixes in question type detection | 10 days | 🤝 | | 83.0 | 65.9 | 107.4 |
fomih
- Best experiment: gemini-flash CoT with question type detection fixes
- Signature:
60bc28
- Summary: Enhanced question type detection for improved accuracy.
Models used:
Architecture
The solution utilized the gemini-flash 2.0 model, incorporating a refined approach to question type detection. This enhancement aimed to improve the accuracy and relevance of the responses generated by the system. The architecture involved preprocessing input documents into structured formats, creating knowledge bases tailored to specific question types, and leveraging these resources during the question-answering phase. The system identified the question type and relevant entities, retrieved pertinent knowledge base entries, and generated answers by combining the question with the retrieved data.
R&D Experiments
Total experiments submitted: 4
Other approaches:
- gemini-flash CoT with structured output
- gemini-flash CoT with structured output and small fixes
- gemini CoT with structured output final
What didn't work?
- Initial handling of 'n/a' cases
- Fallback processing without structured knowledge bases
Experiment journal:
- 10 days → R: 83.2, G: 59.9, Score: 101.5 ▲ - gemini-flash CoT + structured output
- 10 days → R: 82.9, G: 62.8, Score: 104.3 ▲ - gemini-flash CoT + structured output small n/a handling fixes
- 10 days → R: 83.0, G: 65.9, Score: 107.4 ▲ - gemini-flash CoT + so small fixes in question type detection
- 12 days → R: 83.3, G: 64.4, Score: 106.1 - gemini CoT + SO final
|
14 | Al Bo | ▶ albo | 12 days | | | 81.1 | 65.3 | 105.9 |
Al Bo
- Best experiment: albo
- Signature:
1e89b6
- Summary: Docling parsing, vector search, and an agent with a document search tool.
Models used:
Architecture
The solution utilized a sophisticated architecture combining document processing (Docling), vector-based representation, and an agent equipped with a search tool for document retrieval.
|
15 | NumericalArt | ▶ Vhck-R0-002 | 8 days | | | 70.0 | 70.3 | 105.3 |
NumericalArt
- Best experiment: Vhck-R0-002
- Signature:
32aae7
- Summary: Preprocessing questions, raw retrieval, filtering, retrieval, detailed page analysis, and answer generation.
Models used:
Architecture
The best experiment employs a structured approach to information retrieval and answer generation. The process begins with preprocessing the input questions to enhance clarity and relevance. This is followed by an initial raw retrieval phase to gather potential information sources. Subsequently, a filtering mechanism is applied to refine the retrieved data. The refined data undergoes a detailed page analysis to extract precise and contextually relevant information. Finally, the system generates answers based on the analyzed data, leveraging gpt-4o-mini, gpt-4o, and o3-mini.
R&D Experiments
Total experiments submitted: 2
Other approaches:
- Parsing text from PDFs only, separate VDB for each document, one chunk equals one page, extract four pages by entity value from question (excluding company name), detailed parsing of extracted pages, asking LLM question with detailed information in context.
Experiment journal:
- 7 days → R: 75.9, G: 63.3, Score: 101.3 ▲ - Vhck-R0
- 8 days → R: 70.0, G: 70.3, Score: 105.3 ▲ - Vhck-R0-002
|
16 | Pedro Ananias | ▶ rag-3w-cot-gpt-4o-mini | 4 hours | 🤝 | | 80.4 | 64.7 | 104.9 |
Pedro Ananias
- Best experiment: rag-3w-cot-gpt-4o-mini
- Signature:
d44b72
- Summary: A 3-way FAISS MMR Search & Stepped Chain Of Thought RAG
Models used:
Architecture
The solution uses a 3-way FAISS MMR Search mechanism combined with a Chain Of Thought (CoT) approach.
FAISS MMR Search involves query expansion, file selection based on exact matches and cosine similarity, and database searching using maximum marginal relevance.
The CoT pipeline consists of three sequential model calls with specific prompts for reasoning, formatting, and parsing. This architecture uses the openai/gpt-4o-mini model for processing.
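The maximum-marginal-relevance step that such a search relies on can be written in a few lines; this is a generic MMR sketch over pre-computed, normalized embeddings, not the team's code:

```python
import numpy as np

def mmr(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 5, lam: float = 0.5) -> list[int]:
    """Maximum marginal relevance: trade off relevance to the query against
    redundancy with already-selected documents. Vectors are assumed L2-normalized."""
    relevance = doc_vecs @ query_vec
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        if not selected:
            # First pick is simply the most relevant document.
            best = candidates[int(np.argmax(relevance[candidates]))]
        else:
            chosen = doc_vecs[selected]

            def score(i: int) -> float:
                # Penalize documents too similar to anything already selected.
                redundancy = float(np.max(chosen @ doc_vecs[i]))
                return lam * float(relevance[i]) - (1 - lam) * redundancy

            best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```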
R&D Experiments
Total experiments submitted: 5
Other approaches:
- rag-3w-cot-gpt-4o-mini-hi-res
- rag-3w-cot-deepseek-r1-distill-llama-8B-fast-fp16
- rag-3w-cot-deepseek-r1-distill-llama-8B-hi-res-fp16
- rag-3w-cot-microsoft-phi4-14B-hi-res-int8
What didn't work?
- Using lower resolution PDF extraction for certain tasks
- Employing fully local processing without cloud integration in some scenarios
Experiment journal:
- 4 hours → R: 80.4, G: 64.7, Score: 104.9 ▲ - rag-3w-cot-gpt-4o-mini
- 9 hours → R: 70.6, G: 56.0, Score: 91.3 - rag-3w-cot-deepseek-r1-distill-llama-8B-fast-fp16
- 9 hours → R: 77.0, G: 64.6, Score: 103.1 - rag-3w-cot-gpt-4o-mini-hi-res
- 11 hours → R: 72.3, G: 58.0, Score: 94.2 - rag-3w-cot-deepseek-r1-distill-llama-8B-hi-res-fp16
- 31 hours → R: 78.1, G: 59.7, Score: 98.7 - rag-3w-cot-microsoft-phi4-14B-hi-res-int8
|
17 | Daniyar | ▶ Fixed reference page indices | 3 days | | | 62.4 | 72.9 | 104.1 |
Daniyar
- Best experiment: Fixed reference page indices
- Signature:
8bb723
- Summary: The architecture utilizes fixed reference page indices for efficient information retrieval.
Models used:
Architecture
The solution uses a strategy of fixed reference page indices to enhance the accuracy and efficiency of document parsing and question answering.
This approach ensures that the model can quickly locate and utilize relevant information from the provided documents, leveraging the capabilities of the GPT-4o model.
R&D Experiments
Total experiments submitted: 2
Other approaches:
- Sliding window PDF page reading with checklists over questions addressed to files.
What didn't work?
- Alternative indexing methods or dynamic page referencing strategies.
Experiment journal:
- 3 days → R: 62.2, G: 72.9, Score: 104.0 ▲ - First draft
- 3 days → R: 62.4, G: 72.9, Score: 104.1 ▲ - Fixed reference page indices
|
18 | RubberduckLabs | ▶ RAG experiment | 2 days | | 🔒 | 74.5 | 66.0 | 103.3 |
RubberduckLabs
- Best experiment: RubberduckLabs - RAG experiment attempt 001
- Signature:
ee7519
- Summary: A multi-step LLM processing pipeline for document question-answering.
Models used:
- deepseek-r1-distill-llama-70b:bf16
- llama-3.1-70b-instruct:bf16
Architecture
The architecture preprocesses documents to generate detailed page-level summaries and extract structured metadata, with a particular focus on financial data.
The retrieval process employs a two-stage approach:
- document selection based on metadata matching;
- precise page identification using semantic relevance and explicit reasoning.
Answer generation utilizes 'Context-Guided Response Generation' combining retrieved contexts with structured reasoning to ensure factual accuracy and traceability. The system maintains explicit reasoning trails and incorporates robust error handling for production stability.
R&D Experiments
Total experiments submitted: 2
|
19 | Machine Learning Reply | ▶ ML Reply - Submission 1 | 28 hours | | | 74.5 | 66.0 | 103.2 |
Machine Learning Reply
- Best experiment: ML Reply - Submission 1
- Signature:
fa34f3
- Summary: Integration of Azure Document Intelligence and Azure AI Search.
Models used:
Architecture
This solution utilized a combination of Azure Document Intelligence for document processing and Azure AI Search for efficient information retrieval.
R&D Experiments
Total experiments submitted: 2
Other approaches:
Experiment journal:
- 28 hours → R: 74.5, G: 66.0, Score: 103.2 ▲ - ML Reply - Submission 1
- 29 hours → R: 74.0, G: 63.5, Score: 100.5 - ML Reply - Submission 2
|
20 | Aleksandr Podgaiko | ▶ smolagent_simple_v1 | 3 days | 🤝 | | 81.2 | 62.3 | 103.0 |
Aleksandr Podgaiko
- Best experiment: smolagent_simple_v1
- Signature:
6afedb
- Summary: Utilized smolagents library with basic PDF extraction and a coding agent.
Models used:
- openrouter/google/gemini-2.0-flash-001
Architecture
The solution employed the HuggingFace smolagents library for agent-based interactions, integrating basic PDF extraction using PyPDF2. The architecture featured a default coding agent equipped with two tools: pdf_search for keyword-based search with contextual display and pdf_content for full-page content retrieval upon request. Additionally, the final_answer tool was customized to adhere to the submission format.
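A minimal sketch of this agent setup with the two tools named above, assuming the smolagents and PyPDF2 packages; the tool bodies, file name, and prompt are illustrative:

```python
from PyPDF2 import PdfReader
from smolagents import CodeAgent, LiteLLMModel, tool

pages = [p.extract_text() or "" for p in PdfReader("report.pdf").pages]  # placeholder file

@tool
def pdf_search(keyword: str) -> str:
    """Search the report for a keyword and return surrounding context.

    Args:
        keyword: term to look for in the PDF text.
    """
    hits = []
    for i, text in enumerate(pages):
        pos = text.lower().find(keyword.lower())
        if pos != -1:
            hits.append(f"page {i}: ...{text[max(0, pos - 200):pos + 200]}...")
    return "\n".join(hits) or "no matches"

@tool
def pdf_content(page_number: int) -> str:
    """Return the full text of one page.

    Args:
        page_number: zero-based page index.
    """
    return pages[page_number]

agent = CodeAgent(
    tools=[pdf_search, pdf_content],
    model=LiteLLMModel(model_id="openrouter/google/gemini-2.0-flash-001"),
)
print(agent.run("What was the total revenue in 2022? Cite the page number."))
```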
|
21 | Vlad Drobotukhin (@mrvladd) | ▶ Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction + Router | 6 days | 🤝 | 🔒 | 68.3 | 68.2 | 102.3 |
Vlad Drobotukhin (@mrvladd)
- Best experiment: Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction + Router
- Signature:
fa77e2
- Summary: System combining LLM-based reasoning with optimized retrieval techniques.
Models used:
Architecture
This offline solution employs a multi-step process:
- the question is analyzed to determine its type and domain;
- multiple search queries are generated to maximize recall;
- relevant pages are retrieved using OpenSearch and processed with domain-specific LLM extractors to build structured knowledge;
- final answers are synthesized with reasoning and confidence scores.
R&D Experiments
Total experiments submitted: 10
Other approaches:
- Qwen2.5 72b + FTS (rephrase query) +SO + CheckList's
- Qwen2.5 72b + FTS +SO + CheckList's
- Qwen2.5 + FTS (rephrase query) + SO + CheckList's
- Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction
- Qwen 2.5-72b + Multi-Query BM25 (top 15 pages) + Domain-Specific Information Extraction + Router
- Qwen 2.5-72b + Multi-Query BM25+ Domain-Specific Information Extraction + Router
- Qwen 2.5-72b-4bit + BM25 + Domain-Specific Information Extraction + Router
- MagicQwen-4bit + BM25 + Domain-Specific Information Extraction + Router
- Qwen 72b-4bit + FTS + Domain-Specific Information Extraction 0803
What didn't work?
- Simplified query generation without diversification
- Lack of domain-specific term boosting
- Absence of structured output validation
Experiment journal:
- 3 days → R: 74.7, G: 59.2, Score: 96.5 ▲ - Qwen2.5 72b + FTS (rephrase query) +SO + CheckList's
- 3 days → R: 71.8, G: 62.3, Score: 98.2 ▲ - Qwen2.5 72b + FTS +SO + CheckList's
- 4 days → R: 74.7, G: 59.2, Score: 96.5 - Qwen2.5 + FTS (rephrase query) + SO + CheckList's
- 5 days → R: 69.1, G: 65.7, Score: 100.2 ▲ - Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction
- 6 days → R: 68.3, G: 68.2, Score: 102.3 ▲ - Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction + Router
- 7 days → R: 67.6, G: 67.4, Score: 101.2 - Qwen 2.5-72b + Multi-Query BM25 (top 15 pages) + Domain-Specific Information Extraction + Router
- 8 days → R: 64.6, G: 62.0, Score: 94.3 - Qwen 2.5-72b + Multi-Query BM25+ Domain-Specific Information Extraction + Router
- 9 days → R: 61.9, G: 63.0, Score: 93.9 - Qwen 2.5-72b-4bit + BM25 + Domain-Specific Information Extraction + Router
- 9 days → R: 69.2, G: 63.2, Score: 97.8 - MagicQwen-4bit + BM25 + Domain-Specific Information Extraction + Router
- 10 days → R: 78.4, G: 63.0, Score: 102.2 - Qwen 72b-4bit + FTS + Domain-Specific Information Extraction 0803
|
22 | Ivan R. | ▶ Round 2 submission | 71 min | 🤝 | | 79.9 | 62.0 | 101.9 |
Ivan R.
- Best experiment: Round 2 submission
- Signature:
b29973
- Summary: A multi-step approach leveraging LLMs for question decomposition, search, and validation.
Models used:
Architecture
The solution employs a structured pipeline:
- document loading using PyPDFDirectoryLoader from LangChain;
- question decomposition with GPT-4o;
- multiple OpenAI assistants, each dedicated to a specific company, perform targeted searches using GPT-4o-mini;
- results undergo answer validation with GPT-4o;
- a local FAISS vector store is used for similarity search to collect reference pages.
|
23 | PENZA_AI_CREW | ▶ gpt-4_claude3.5_unstructured | 7 days | 🤝 | | 72.5 | 65.0 | 101.3 |
PENZA_AI_CREW
- Best experiment: gpt-4_claude3.5_unstructured
- Signature:
67ee86
- Summary: A multi-step pipeline leveraging OCR, table/image analysis, and knowledge mapping for accurate question answering.
Models used:
- gpt-4o-mini
- Claude 3.5
- gpt-4o
Architecture
This RAG pipeline was composed of the following steps:
- PDF text is parsed using the Unstructured library with OCR;
- tables and images are analyzed using Claude 3.5;
- a knowledge map is constructed using gpt-4o-mini with Structured Outputs;
- questions are analyzed in conjunction with the knowledge map using gpt-4o-mini with a Pydantic schema;
- answers are generated by gpt-4o, employing chain-of-thought reasoning and a Pydantic schema (SO CoT).
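The SO CoT step (a Pydantic schema with an explicit reasoning field, parsed through OpenAI's structured-output API) might look roughly like this; the schema fields and prompts are illustrative, not the team's actual knowledge map:

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Answer(BaseModel):
    # The reasoning field comes first so the model "thinks" before committing to a value (SO CoT).
    reasoning: str
    value: str
    relevant_pages: list[int]

def ask(question: str, context: str) -> Answer:
    resp = client.beta.chat.completions.parse(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer strictly from the provided context."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
        response_format=Answer,   # the SDK converts the Pydantic model into a JSON schema
    )
    return resp.choices[0].message.parsed
```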
R&D Experiments
Total experiments submitted: 2
Other approaches:
- RAG_PNZ_PAYPLINE: OCR with Unstructured, table/image analysis with Claude 3.5, metadata extraction with gpt-4o-mini, and final reasoning with gpt-4o.
What didn't work?
- Alternative OCR methods not utilizing Unstructured.
- Direct question answering without intermediate knowledge mapping.
Experiment journal:
- 7 days → R: 12.2, G: 11.0, Score: 17.1 ▲ - RAG_PNZ_PAYPLINE
- 7 days → R: 72.5, G: 65.0, Score: 101.3 ▲ - gpt-4_claude3.5_unstructured
|
24 | Yolo leveling | ▶ Marker + Gemini | 25 hours | | | 82.2 | 59.9 | 101.0 |
Yolo leveling
- Best experiment: Marker + Gemini
- Signature:
31b473
- Summary: Convert PDFs to markdown, extract company names, and generate JSON representations.
Models used:
Architecture
The solution starts by converting each PDF document into markdown format using the Marker tool with OCR capabilities. The system then identifies the company name within the content. In cases where multiple companies are mentioned in the query, a hallucination-control mechanism determines the most relevant company. The markdown content is then incorporated into the LLM context, and the model extracts and generates a structured JSON representation of the required information.
R&D Experiments
Total experiments submitted: 2
Other approaches:
- Gemini 1M pdf "thinking" + 4o parser
What didn't work?
- Queries involving multiple companies were marked as N/A in alternative approaches.
Experiment journal:
- 25 hours → R: 76.0, G: 60.0, Score: 98.0 ▲ - Gemini 1M pdf "thinking" + 4o parser
- 25 hours → R: 82.2, G: 59.9, Score: 101.0 ▲ - Marker + Gemini
|
25 | ArtemNurm | ▶ brute_flash2.0&brute_flash2.0 | 7 days | 🤝 | | 77.8 | 61.0 | 99.9 |
ArtemNurm
- Best experiment: brute_flash2.0&brute_flash2.0
- Signature:
46e0e0
- Summary: PDF2MD with Flash, relevant data extraction with Flash, the data is sent to LLM with questions using SO (no CoT). All steps include generator-critic workflow.
Models used:
- Gemini Flash 2.0
- OpenAI o3-mini
Architecture
The best experiment employs a robust architecture leveraging the Gemini Flash 2.0 and OpenAI o3-mini models. The process involves converting PDF documents to Markdown format with Flash, extracting relevant data, and querying the LLM with specific questions using a straightforward structured-output approach without chain-of-thought reasoning.
A generator-critic workflow is integrated into all steps to ensure high-quality outputs.
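A minimal sketch of a generator-critic loop of this kind; it uses a single OpenAI model for both roles purely for brevity, whereas the submission paired Gemini Flash 2.0 with o3-mini, and the prompts are illustrative:

```python
from openai import OpenAI

client = OpenAI()

def generate_with_critic(task: str, max_rounds: int = 3) -> str:
    # Generator produces a first draft.
    draft = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content
    for _ in range(max_rounds):
        # Critic reviews the draft against the task; "OK" ends the loop.
        critique = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       f"Task:\n{task}\n\nDraft:\n{draft}\n"
                       "List concrete problems, or reply exactly OK if the draft is acceptable."}],
        ).choices[0].message.content
        if critique.strip() == "OK":
            break
        # Generator revises the draft using the critic's feedback.
        draft = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content":
                       f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n"
                       "Rewrite the draft addressing every point."}],
        ).choices[0].message.content
    return draft
```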
R&D Experiments
Total experiments submitted: 8
Other approaches:
- brute_flash2.0&CoT_flash2.0
- index_flash2.0&brute_flash2.0
- index_flash2.0&CoT_4o-2024-11-20
- index_flash2.0&CoT_flash2.0
- index_flash2.0&CoT_o3-mini-high
- index_flash2.0&CoT_o3-mini
- flash2.0_sees_all_content
What didn't work?
- Using chain-of-thought reasoning in 'brute_flash2.0&CoT_flash2.0' did not outperform the winning approach.
- Concatenating all Markdown files into a single string in 'flash2.0_sees_all_content' was less effective.
Experiment journal:
- 7 days → R: 77.8, G: 61.0, Score: 99.9 ▲ - brute_flash2.0&brute_flash2.0
- 7 days → R: 77.7, G: 61.0, Score: 99.8 - brute_flash2.0&CoT_flash2.0
- 7 days → R: 68.5, G: 57.6, Score: 91.8 - index_flash2.0&brute_flash2.0
- 7 days → R: 66.4, G: 56.8, Score: 90.0 - index_flash2.0&CoT_4o-2024-11-20
- 7 days → R: 66.3, G: 57.6, Score: 90.7 - index_flash2.0&CoT_flash2.0
- 7 days → R: 65.6, G: 58.8, Score: 91.6 - index_flash2.0&CoT_o3-mini-high
- 7 days → R: 65.9, G: 59.3, Score: 92.2 - index_flash2.0&CoT_o3-mini
- 7 days → R: 71.8, G: 55.6, Score: 91.4 - flash2.0_sees_all_content
|
26 | ndt by red_mad_robot | ▶ qwen32b+bge_m3 | 9 days | 🤝 | 🔒 | 72.9 | 63.2 | 99.7 |
ndt by red_mad_robot
- Best experiment: qwen32b+bge_m3
- Signature:
30f0d1
- Summary: PDFs were converted to markdown, vectorized using bge m3, and queried with Qwen 32B.
Models used:
Architecture
This offline solution processed PDF documents by converting them into markdown format with the PyMuPDF library. The markdown representations were then vectorized using the BGE-M3 embedding model.
The Qwen 32B Instruct model answered user queries, leveraging the vectorized data for relevant context retrieval.
R&D Experiments
Total experiments submitted: 5
Other approaches:
- full open-source + roter agent
- qwen7b-router-agent
What didn't work?
- Directly querying without vectorization
- Using alternative LLMs for vectorization
Experiment journal:
- 23 hours → R: 27.2, G: 54.0, Score: 67.6 ▲ - full open-source + roter agent
- 7 days → R: 73.2, G: 51.0, Score: 87.6 ▲ - qwen7b-router-agent
- 9 days → R: 73.2, G: 59.0, Score: 95.6 ▲ - ndt by red_mad_robot
- 9 days → R: 72.9, G: 63.2, Score: 99.7 ▲ - qwen32b+bge_m3
|
27 | Neoflex DreamTeam | ▶ Best run | 30 hours | 🤝 | 🔒 | 77.8 | 58.0 | 96.9 |
Neoflex DreamTeam
- Best experiment: Simple LLM Brute Force
- Signature:
34a266
- Summary: Utilized a straightforward LLM brute force approach for each page with predefined questions and example answers.
Models used:
Architecture
The solution used the Qwen 2.5 model to process each page individually, applying a brute-force methodology with a set of predefined questions and corresponding example answers to extract relevant information.
R&D Experiments
Total experiments submitted: 2
Other approaches:
What didn't work?
- Alternative configurations of the Checklist based RAG approach
Experiment journal:
- 30 hours → R: 77.8, G: 58.0, Score: 96.9 ▲ - Best run
- 7 days → R: 67.3, G: 51.7, Score: 85.4 - neon_team
|
28 | nightwalkers | ▶ nightwalkers-baseline | 6 hours | | 🔒 | 72.9 | 60.2 | 96.7 |
nightwalkers
- Best experiment: nightwalkers-baseline
- Signature:
356ef4
- Summary: Utilized a vector database for efficient document retrieval and LLM for response generation.
Models used:
- deepseek-r1-distill-llama-70b
Architecture
The team implemented vector database search using embeddings from all-MiniLM-L6-v2 and ibm/granite-embedding-107m-multilingual models. This facilitated the retrieval of the most relevant page and document based on the query. The retrieved information was then processed by the deepseek-r1-distill-llama-70b LLM to generate relevant answers.
|
29 | Gleb Kozhaev | ▶ Gleb Kozhaev | 32 hours | 🤝 | | 79.1 | 56.0 | 95.5 |
Gleb Kozhaev
- Best experiment: pymupdf4llm + Structured Output
- Signature:
1442cb
- Summary: Utilized pymupdf4llm with structured output and three distinct system prompts/roles.
Models used:
Architecture
The RAG solution employed the pymupdf4llm library, leveraging Structured Outputs to enhance data processing and comprehension.
Three distinct system prompts/roles were utilized to optimize the model's performance and ensure accurate and efficient results.
|
30 | AndreiKopysov | ▶ AndreiKopysov | 33 hours | 🤝 | | 76.2 | 57.2 | 95.3 |
AndreiKopysov
- Best experiment: Gemini2.0 and DeepSeek R1 Integration
- Signature:
574182
- Summary: The architecture processes PDF pages using Gemini2.0 and refines responses with DeepSeek R1.
Models used:
Architecture
This RAG solution used a two-step pipeline:
- each page of the PDF document is processed using the Gemini2.0 model to extract relevant information;
- extracted responses are refined and analyzed using the DeepSeek R1 model to ensure accuracy and relevance.
R&D Experiments
Total experiments submitted: 2
Other approaches:
- Reused the same architecture in different configurations.
Experiment journal:
- 33 hours → R: 76.2, G: 57.2, Score: 95.3 ▲ - AndreiKopysov
- 33 hours → R: 76.2, G: 57.2, Score: 95.3 - AndreyKopysov
|
31 | Serj Tarasenko | ▶ complicated second | 3 days | | | 82.0 | 54.0 | 95.0 |
Serj Tarasenko
- Best experiment: complicated second
- Signature:
a5cf25
- Summary: RAG pipeline with query enhancement and re-ranking.
Models used:
- gpt-4o-mini
- text-embedding-3-small
Architecture
The best solution implemented a Retrieval-Augmented Generation (RAG) pipeline. The process involved extracting content from PDFs, segmenting it into manageable chunks, and indexing these chunks with FAISS for efficient vector-based retrieval. Queries were enhanced with financial terms to improve relevance, followed by a retrieval step that included re-ranking to prioritize the most pertinent information. Finally, an LLM generated comprehensive answers based on the retrieved data. The source code for this implementation is publicly available.
|
32 | AAV | ▶ llm2-sim-preselected | 7 days | | | 62.9 | 62.5 | 93.9 |
AAV
- Best experiment: Agent+Router
- Signature:
5e0479
- Summary: The architecture employs an agent-based approach with a routing mechanism.
Models used:
Architecture
The solution uses the 'gpt-4o-mini' model in an architecture combining an agent with a router. This design enables efficient task delegation and processing, optimizing performance for the challenge requirements.
R&D Experiments
Total experiments submitted: 6
Other approaches:
- Agent
- Agent + sim search + tfidf
What didn't work?
- Using 'private model' instead of 'gpt-4o-mini'
- Excluding the router component
Experiment journal:
- 7 days → R: 60.7, G: 62.8, Score: 93.1 ▲ - llm1-sim-preselected
- 7 days → R: 62.9, G: 62.5, Score: 93.9 ▲ - llm2-sim-preselected
- 7 days → R: 62.7, G: 57.3, Score: 88.7 - llm2-sim-not-preselected
- 7 days → R: 61.0, G: 60.8, Score: 91.3 - llm1-sim-not-preselected
- 7 days → R: 25.1, G: 60.9, Score: 73.5 - llm1-sim-ifidf-not-preselected
- 7 days → R: 27.2, G: 62.8, Score: 76.4 - llm2-sim-tfidf-not-preselected
|
33 | AI Slop | ▶ AI Slop Cursor+Sonnet 3.7, No RAG, No OCR, gpt4o-mini all the way | 3 hours | 🤝 | | 80.9 | 53.0 | 93.5 |
AI Slop
- Best experiment: AI Slop Cursor+Sonnet 3.7
- Signature:
fc3dc9
- Summary: Utilized a streamlined approach leveraging LLMs for direct question answering.
Models used:
Architecture
The team employed the gpt-4o-mini model to process and answer questions directly from the provided PDF documents.
By utilizing metadata and targeted queries, they efficiently narrowed down relevant information, ensuring accurate and concise responses. The approach avoided complex retrieval-augmented generation (RAG) or OCR techniques, focusing on the inherent capabilities of the LLM.
|
34 | RAG challenge Orphist | ▶ Orphist | 63 min | | 🔒 | 78.8 | 53.0 | 92.4 |
RAG challenge Orphist
- Best experiment: Iterative LLM Prompting with BM25
- Signature:
e98c1b
- Summary: The solution employs BM25 for document retrieval and iterative LLM prompting for query expansion and summarization.
Models used:
Architecture
The solution utilized an architecture combining BM25plus for document retrieval and iterative prompting of the gemma-2-9b-it LLM.
The process involved chunking PDF documents for ingestion, storing them in a local in-memory store, and applying BM25plus for query matching with meta-filters.
Due to a last-minute issue with embedding models, the team opted for a non-hybrid pipeline. The iterative prompting expanded the initial query and used a scratchpad for summary collection, culminating in a final prompt to extract the requested information.
|
35 | Dennis S. | ▶ Deepseek naive questionfilter | 7 days | 🤝 | | 81.9 | 50.0 | 91.0 |
Dennis S.
- Best experiment: Deepseek naive questionfilter
- Signature:
53630f
- Summary: A question-centered approach leveraging document parsing and heuristic-based analysis.
Models used:
Architecture
The solution employs a question-centered methodology to efficiently extract relevant information from documents.
- Initially, PDFs are parsed using PyMuPDF and Tesseract for OCR when necessary.
- The system analyzes the provided metadata and questions to identify relevant companies and metrics, classifying questions into single_fact or aggregate types.
- It processes documents in parallel, extracting answers based on the question type, and aggregates results accordingly.
This approach prioritizes speed and cost-efficiency.
R&D Experiments
Total experiments submitted: 2
Other approaches:
- Deepseek v3 - bruteforce questionfilter
What didn't work?
- Using regex-based logic for question classification
- Dividing questions into first occurrence and aggregated types without clear pipeline integration
Experiment journal:
- 7 days → R: 79.8, G: 50.0, Score: 89.9 ▲ - Deepseek v3 - bruteforce questionfilter
- 7 days → R: 81.9, G: 50.0, Score: 91.0 ▲ - Deepseek naive questionfilter
|
36 | Slava RAG | ▶ Slava RAG | 7 hours | 🤝 | | 65.6 | 57.8 | 90.7 |
Slava RAG
- Best experiment: Slava RAG
- Signature:
282787
- Summary: Embedding: OpenAI text-embedding-3-small, LLM: GPT-4o, Vector Database: Pinecone, PDF Processing: PyMuPDF, Chunk Processing: Custom algorithm
Models used:
Architecture
This architecture combined:
- OpenAI's text-embedding-3-small for embedding generation;
- GPT-4o as the primary LLM;
- Pinecone for vector database management;
- PyMuPDF for efficient PDF processing;
- a custom algorithm for chunk processing.
|
37 | Alex_dao | ▶ Alex_Dao_v1_final | 95 min | | | 68.4 | 56.5 | 90.7 |
Alex_dao
- Best experiment: Alex_Dao_v1_final
- Signature:
93c0ef
- Summary: Utilized a kv-index architecture.
Models used:
Architecture
The best solution implemented a key-value index (kv-index) architecture, leveraging the GPT-4o model to efficiently retrieve and process information. This approach ensured high performance and accuracy in the challenge tasks.
|
38 | Mykyta Skrypchenko | ▶ Kyiv-bge1.5 | 31 hours | 🤝 | | 42.1 | 64.2 | 85.3 |
Mykyta Skrypchenko
- Best experiment: Kyiv-bge1.5
- Signature:
d5fb15
- Summary: Integration of advanced text retrieval and vector database with LLM for question answering.
Models used:
Architecture
The solution is a multi-component architecture:
- Fitz (PyMuPDF) for efficient text extraction
- BAAI/bge-base-en-v1.5 Sentence Transformer for embedding generation
- ChromaDB as the vector database for storage and retrieval
- OpenAI API for question answering
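A minimal sketch of this Fitz + BGE + ChromaDB pipeline; the file name, collection name, and page-level granularity are assumptions for illustration:

```python
import fitz                      # PyMuPDF
import chromadb
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("BAAI/bge-base-en-v1.5")
collection = chromadb.Client().create_collection("annual_reports")

# Extract one document per page with Fitz and store its embedding in ChromaDB.
doc = fitz.open("report.pdf")    # placeholder file
pages = [page.get_text() for page in doc]
collection.add(
    ids=[f"page-{i}" for i in range(len(pages))],
    documents=pages,
    embeddings=encoder.encode(pages).tolist(),
)

def retrieve(question: str, k: int = 5) -> list[str]:
    hits = collection.query(query_embeddings=encoder.encode([question]).tolist(), n_results=k)
    return hits["documents"][0]  # the retrieved pages then go into the OpenAI prompt
```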
|
39 | F-anonymous | ▶ F-anonymous. Fully local, own DeepThinking | 5 days | 🤝 | 🔒 | 73.6 | 47.0 | 83.8 |
F-anonymous
- Best experiment: Fully local, own DeepThinking
- Signature:
2a2a1b
- Summary: Fully local graphRAG with hybrid search and custom-tuned LLM.
Models used:
Architecture
The solution by F-anonymous is a fully local graph-based Retrieval-Augmented Generation (RAG) architecture.
They utilized their proprietary DeepThinking framework in conjunction with a custom-tuned Qwen2.5 14b model. The system integrated a hybrid search mechanism combining vector-based and BM25 methodologies to enhance retrieval accuracy and relevance.
|
40 | DataNXT | ▶ Prototype-RAG-Challenge | 5 days | | 🔒 | 54.2 | 55.5 | 82.6 |
DataNXT
- Best experiment: Prototype-RAG-Challenge
- Signature:
0e942a
- Summary: Pipeline with specialised prompted LLM Calls
Models used:
Architecture
The solution used a pipeline architecture with specialized prompted calls to OpenAI's gpt-4o-mini model. This approach allowed for efficient and accurate information retrieval and generation.
|
41 | AValiev | ▶ IBM-deepseek-agentic-rag | 4 hours | | 🔒 | 43.5 | 60.0 | 81.8 |
AValiev
- Best experiment: IBM-deepseek-agentic-rag
- Signature:
493744
- Summary: Agentic RAG with type validation, Pydantic typing, Qdrant vector store querying.
Models used:
- deepseek/deepseek-r1-distill-llama-70b
Architecture
The solution was based on an agentic Retrieval-Augmented Generation (RAG) architecture.
It utilized type validation and Pydantic typing for robust data handling, and Qdrant vector store querying for efficient information retrieval. PDF documents were processed using PyPDF and Docling for accurate text extraction.
R&D Experiments
Total experiments submitted: 5
Other approaches:
- openai-agentic-rag
- IBM-mixtral-agentic-rag
- granite-3-8b-instruct_rag_agentic
- deepseek/deepseek-r1-distill-llama-70b_sophisticated_chunking_rag_agentic
What didn't work?
- Alternative LLM models such as OpenAI-gpt-4o-mini and mistralai/mixtral-8x7b-instruct-v01 were explored but did not achieve the same performance as the winning model.
Experiment journal:
- 54 min → R: 43.5, G: 60.0, Score: 81.8 ▲ - openai-agentic-rag
- 3 hours → R: 43.5, G: 33.0, Score: 54.8 - IBM-mixtral-agentic-rag
- 4 hours → R: 43.5, G: 60.0, Score: 81.8 - IBM-deepseek-agentic-rag
- 4 hours → R: 43.5, G: 48.5, Score: 70.2 - granite-3-8b-instruct_rag_agentic
- 34 hours → R: 35.8, G: 53.0, Score: 70.9 - deepseek/deepseek-r1-distill-llama-70b_sophisticated_chunking_rag_agentic
|
42 | bimurat_mukhtar | ▶ bm_v1 | 32 hours | 🤝 | 🔒 | 36.2 | 31.3 | 49.4 |
bimurat_mukhtar
- Best experiment: bm_v1
- Signature:
c25e30
- Summary: Multi-agent architecture with specialized branches for diverse answer generation.
Models used:
Architecture
The solution is a multi-agent architecture inspired by Self RAG, where input PDFs are converted to text, preprocessed, and filtered to extract relevant information.
Different branches are utilized to handle specific types of queries, leveraging the strengths of the LLMs deepseek-r1 and gemini.
|
43 | ragtastic | ▶ ragtastic | 7 days | | | 4.8 | 3.0 | 5.4 |
ragtastic
- Best experiment: ragtastic
- Signature:
43d4fd
- Summary: The architecture leverages the Mistral-large model for its implementation.
Models used:
Architecture
The solution used the Mistral-large model to achieve its objectives. The architecture is designed to optimize performance and accuracy, ensuring robust results.
|