Crowd-Sourced AI Research
 
 
The Enterprise RAG Challenge is a worldwide innovation project that redefined the limits of RAG: a crowd-sourced AI research project with real-world relevance and maximum impact.
 

At its core was a real business use case: based on 100 annual reports (some of them thousands of pages long), we determined which RAG system answers best. Submitted approaches ranged from classic retrieval pipelines and multi-agent systems to fully local solutions, all with the goal of building effective RAG systems and learning from one another.

The Team Leaderboard

The team leaderboard aggregates all submitted entries, including those handed in after the ground truth was published. We therefore treat this ranking as an unofficial overview.

This complete team leaderboard shows the best enterprise RAG solution per team.
Only submissions that also included a completed architecture questionnaire were considered. We reviewed all questionnaires carefully, removed personal data, and summarized the key findings from the experiments.

You can explore the results directly in the interactive table: simply click on a row to open the detail view of the corresponding team.

What do the columns mean? All values at a glance
  • "R&D" – kennzeichnet Teams, die an Forschungs- und Entwicklungsaktivitäten in unseren Communities teilnehmen. Treten Sie unserem Discord-Kanal bei, um über neue Initiativen auf dem Laufenden zu bleiben!
  • "Time" – wie viel Zeit seit dem Zeitpunkt vergangen ist, als wir die Fragen für die Challenge generiert haben.
  • "R Score" – die Qualität des Retrieval-Teils von RAG. Er wurde ermittelt, indem die angegebenen Referenzen mit den Ground-Truth-Seitennummern verglichen wurden.
  • "G Score" – die Qualität des Generation-Teils von RAG. Er wird berechnet, indem die generierten Antworten mit dem Ground-Truth-Datensatz verglichen werden.
  • "Score" – die endgültige Punktzahl: R/3 + G. Das theoretische Maximum lag bei 133,3.
  • "Local" – zeigt an, ob sich die Lösung vollständig offline ausführen lässt.

# Team Experiment Time R&D Local R G Score
1 Ilia Ris
Dense Retrieval; Router; LLM reranking; o3-mini
49 min 🤝 83.8 81.8 123.7

Ilia Ris

  • Best experiment: Dense Retrieval; Router; LLM reranking; o3-mini
  • Signature: f1d79f
  • Summary: Dense retrieval combined with LLM reranking and SO CoT.

Models used:

  • o3-mini-2025-01-31

Architecture

Ilia Ris solved the problem by making it easy to run numerous experiments before the competition had even started. He created an evaluation pipeline that let him quickly compare different architectural options. The best solution was also among the fastest ones.

The winning experiment had this configuration:

  • PDF Analysis: Documents are processed using a highly modified Docling Library from IBM. Modifications were needed to preserve page references.
  • Router Pattern: First step in question answering flow picks the most suitable agent.
  • Dense Retrieval: The system searches for relevant information based on semantic similarity (FAISS library and OpenAI vector embeddings).
  • Parent Document Retrieval: Instead of retrieving only the chunk, the full page is loaded to preserve relevant context.
  • LLM Reranking: Retrieved information is re-evaluated and reordered by the LLM.
  • Reasoning Patterns: Improve LLM accuracy within a single prompt by controlling its thinking process with Custom Chain-of-Thought and Structured Outputs.
  • Final Answer generation: The optimized result is generated using o3-mini.
  • Self-Consistency with Majority Vote: Multiple answer variations are generated, compared, and the most consistent one is selected.
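
The dense retrieval, parent-page loading, and LLM reranking steps can be pictured with a short sketch. This is not Ilia Ris's code: the embedding model, prompts, and function names are assumptions, and the heavily modified Docling parsing step is omitted.

```python
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # OpenAI embeddings; the exact embedding model used by the team is not stated.
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data], dtype="float32")

# One entry per page of the parsed report ("parent document retrieval" returns whole pages).
pages = ["full text of page 1 ...", "full text of page 2 ..."]
vectors = embed(pages)
faiss.normalize_L2(vectors)                  # cosine similarity via inner product
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(vectors)

def retrieve(question, k=10):
    q = embed([question])
    faiss.normalize_L2(q)
    _, ids = index.search(q, k)
    return [pages[i] for i in ids[0]]

def llm_rerank(question, candidate_pages, top_n=3):
    # Ask the LLM to grade each retrieved page; keep the highest-scoring ones.
    scored = []
    for page in candidate_pages:
        prompt = (f"Question: {question}\n\nPage:\n{page}\n\n"
                  "Rate how useful this page is for answering, 0-10. Reply with the number only.")
        out = client.chat.completions.create(
            model="o3-mini", messages=[{"role": "user", "content": prompt}]
        )
        scored.append((float(out.choices[0].message.content.strip()), page))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [page for _, page in scored[:top_n]]

question = "What was the total revenue in 2022?"
answer_context = llm_rerank(question, retrieve(question))
```

In the actual pipeline, the reranked pages then feed the SO CoT answer prompt, and self-consistency repeats that last step several times before a majority vote.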

R&D Experiments

Total experiments submitted: 11

Other approaches:

  • Dense Retrieval; LLM Reranking; Router; SO CoT; o3-mini
  • Dense Retrieval; Router; SO CoT; llama3.3-70b
  • Dense Retrieval; Tables serialization; Router; LLM reranking; o3-mini
  • Dense Retrieval; llama-3.3 70b
  • Dense Retrieval; llama-3.1 8b
  • Full Context; gemini-2.0 thinking
  • Dense Retrieval; Router; LLM reranking; Self-Consistency; o3-mini
  • Dense Retrieval; Router; LLM reranking; Self-Consistency; llama-3.3 70b

What didn't work?

  • Using llama-3.1 8b for reranking
  • Incorporating Full Context with gemini-2.0 thinking

Future experiments:

  • Evaluating various local embedding models for fully offline solutions

Experiment journal:

  • 16 min → R: 83.9, G: 72.8, Score: 114.8 ▲ - Dense Retrieval; LLM Reranking; Router; SO CoT; o3-mini
  • 23 min → R: 81.4, G: 74.7, Score: 115.4 ▲ - Dense Retrieval; llama-3.3 70b
  • 49 min → R: 83.8, G: 81.8, Score: 123.7 ▲ - Dense Retrieval; Router; LLM reranking; o3-mini
  • 50 min → R: 81.1, G: 68.7, Score: 109.3 - Dense Retrieval; llama-3.1 8b
  • 51 min → R: 75.5, G: 75.0, Score: 112.8 - Full Context; gemini-2.0 thinking
  • 66 min → R: 83.0, G: 78.8, Score: 120.3 - Dense Retrieval; Tables serialization; Router; LLM reranking; o3-mini
  • 22 hours → R: 83.5, G: 81.8, Score: 123.6 - Dense Retrieval; Router; LLM reranking; o3-mini
  • 22 hours → R: 80.8, G: 75.7, Score: 116.1 - Dense Retrieval; llama-3.3 70b
  • 33 hours → R: 83.4, G: 79.8, Score: 121.6 - Dense Retrieval; Router; LLM reranking; Self-Consistency; o3-mini
  • 33 hours → R: 81.3, G: 79.7, Score: 120.3 - Dense Retrieval; Router; LLM reranking; Self-Consistency; llama-3.3 70b
2 Emil Shagiev
LLM_Search
55 min 🤝 86.3 78.5 121.6

Emil Shagiev

  • Best experiment: LLM_Search
  • Signature: 0a8782
  • Summary: A multi-step process involving query expansion, efficient search, question answering, and answer finalization.

Models used:

  • gpt-4o-mini-2024-07-18
  • gpt-4o-2024-08-06
  • o3-mini-2025-01-31

Architecture

The best solution didn't use vector embeddings; instead, it followed a structured approach:

  • the input query is expanded to enhance search coverage and enable semantic search;
  • relevant pages are retrieved using a cost-effective and rapid LLM;
  • retrieved information is then passed to a more powerful LLM to generate answers;
  • answers are refined and finalized for presentation.
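
A minimal sketch of such an embedding-free pipeline is shown below; prompts, page-truncation limits, and the exact split of work between the models are assumptions rather than the team's actual implementation.

```python
from openai import OpenAI

client = OpenAI()

def expand_query(question):
    # Cheap model rewrites the question into a few search-friendly variants.
    out = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        messages=[{"role": "user",
                   "content": f"Rewrite as 3 short search queries, one per line:\n{question}"}],
    )
    return [q for q in out.choices[0].message.content.splitlines() if q.strip()]

def page_is_relevant(query, page_text):
    # Cheap, fast model acts as the "retriever": YES/NO relevance check per page.
    out = client.chat.completions.create(
        model="gpt-4o-mini-2024-07-18",
        messages=[{"role": "user",
                   "content": f"Query: {query}\n\nPage:\n{page_text[:4000]}\n\n"
                              "Is this page relevant? Answer YES or NO."}],
    )
    return out.choices[0].message.content.strip().upper().startswith("YES")

def answer(question, pages):
    queries = expand_query(question)
    relevant = [p for p in pages if any(page_is_relevant(q, p) for q in queries)]
    context = "\n\n---\n\n".join(relevant)
    out = client.chat.completions.create(
        model="o3-mini-2025-01-31",          # stronger model produces the final answer
        messages=[{"role": "user",
                   "content": f"{context}\n\nQuestion: {question}\nAnswer concisely."}],
    )
    return out.choices[0].message.content
```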

R&D Experiments

Total experiments submitted: 3

Other approaches:

  • LLL_Search_2: Similar architecture with added capability for mathematical operations.

Experiment journal:

  • 55 min → R: 86.3, G: 78.5, Score: 121.6 ▲ - LLM_Search
  • 21 hours → R: 86.1, G: 77.5, Score: 120.5 - LLL_Search_2
3 Dmitry Buykin
slow-run-and-bugs
8 hours 🤝 81.4 76.8 117.5

Dmitry Buykin

  • Best experiment: Dynamic Structured Output with SEC EDGAR Ontologies
  • Signature: 6b0d78
  • Summary: Dynamic structured output with query expansion and page-focused chunking.

Models used:

  • gpt-4o-2024-08-06

Architecture

The solution used an SO/CoT approach with ontologies to retrieve relevant information.

Key highlights:

  • embeddings and vector databases were not used;
  • dynamic structured output approach combined with SEC EDGAR ontologies for query expansion (SO CoT);
  • utilized CBOW similarity for majority selection across multiple runs, focusing on balancing pages versus tokens during chunking
  • significant effort was dedicated to evaluating PDF quality heuristics to optimize OCR input
  • synthetic tags were implemented to stabilize page detection and assess model quality.
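
The "SO CoT" idea of forcing the model to reason before it answers can be expressed with OpenAI Structured Outputs and a Pydantic schema, as in the sketch below. The schema fields are illustrative; the team's actual dynamic schemas were derived from SEC EDGAR ontologies.

```python
from typing import Optional
from pydantic import BaseModel
from openai import OpenAI

class FinancialAnswer(BaseModel):
    reasoning_steps: list[str]        # the model must spell out its reasoning first
    value: Optional[float]            # extracted metric, None if not in the report
    currency: Optional[str]
    source_pages: list[int]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Answer strictly from the provided annual-report pages."},
        {"role": "user", "content": "What was the total revenue in FY2022?\n\n<relevant pages here>"},
    ],
    response_format=FinancialAnswer,  # Structured Outputs: the reply must match the schema
)
parsed = completion.choices[0].message.parsed
```
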
4 Sergey Nikonov
main v2
30 hours 🤝 85.1 73.9 116.4

Sergey Nikonov

  • Best experiment: main v2
  • Signature: 00c0e1
  • Summary: For every question, all pages are processed using gpt-4o.

Models used:

  • gpt-4o
  • o1-mini

Architecture

The solution involves feeding all pages of the provided documents into the gpt-4o model for each question. This simple but practical approach ensures comprehensive coverage of the content and helps extract accurate answers.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Finding the PDFs that correspond to questions, cutting the PDFs by page, running the question against each PDF page by loading the PDF directly into gpt-4o (through the assistant API), scanning all PDF pages for the answer, and combining the answers by simple logic.

What didn't work?

  • Using the o3-mini model instead of o1-mini in the architecture.

Experiment journal:

  • 5 hours → R: 85.3, G: 69.0, Score: 111.6 ▲ - Main
  • 30 hours → R: 85.1, G: 73.9, Score: 116.4 ▲ - main v2
5 ScrapeNinja.net
fixed multiple companies search
23 hours 🤝 82.6 71.2 112.5

ScrapeNinja.net

  • Best experiment: fixed multiple companies search
  • Signature: 417bbf
  • Summary: Node.js-based architecture utilizing pgvector for efficient data handling.

Models used:

  • Gemini Flash 2.0
  • Gemini Flash Lite 2.0
  • Flash Thinking Exp

Architecture

The solution used Node.js for backend operations and pgvector for vectorized data processing. It focused on efficient handling of complex queries and data retrieval tasks.

The team utilized:

  • Gemini Flash 2.0
  • Gemini Flash Lite 2.0
  • Flash Thinking Exp.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • OCR and PG

Experiment journal:

  • 20 hours → R: 82.6, G: 64.2, Score: 105.5 ▲ - OCR and PG
  • 23 hours → R: 82.6, G: 71.2, Score: 112.5 ▲ - fixed multiple companies search
6 xsl777
multi-query, gpt-4o
16 hours 🤝 79.4 71.2 110.9

xsl777

  • Best experiment: multi-query, gpt-4o
  • Signature: 66ab5c
  • Summary: Structured PDF parsing, metadata extraction, query expansion, hybrid search, reranking, and CoT.

Models used:

  • gpt-4o
  • gpt-4o-mini

Architecture

The architecture integrates following patterns:

  • structured PDF parsing and chunking;
  • metadata extraction;
  • query expansion;
  • hybrid search mechanisms;
  • reranking strategies.

It synthesizes document metadata and chunks while utilizing Chain-of-Thought (CoT) reasoning to enhance response accuracy and relevance. gpt-4o and gpt-4o-mini help with high-quality language understanding and generation capabilities.

R&D Experiments

Total experiments submitted: 2

Experiment journal:

  • 16 hours → R: 79.4, G: 71.2, Score: 110.9 ▲ - multi-query, gpt-4o
  • 3 days → R: 80.1, G: 70.7, Score: 110.7 - Open source, Advanced RAG
7 nikolay_sheyko(grably.tech)
nikolay_sheyko(grably.tech)_with_o3_mini
25 hours 🤝 81.1 69.8 110.4

nikolay_sheyko(grably.tech)

  • Best experiment: nikolay_sheyko(grably.tech)_with_o3_mini
  • Signature: db8938
  • Summary: Relevant pages are identified and processed to generate answers.

Models used:

  • gpt-4o-mini
  • o3-mini

Architecture

The solution employs a two-step process:

  • first, it identifies relevant reports for a given question and evaluates the relevance of each page asynchronously using the gpt-4o-mini model;
  • then, all relevant pages are compiled into a prompt, and the o3-mini model is utilized to generate the final answer.

R&D Experiments

Total experiments submitted: 7

Other approaches:

  • Dynamic data extraction with pydantic classes
  • Binary checks per page
  • Parallel question splitting
  • Subquestion generation for multi-entity queries
  • Single-page reference experiments

What didn't work?

  • Binary checks per page
  • Single-page reference experiments

Experiment journal:

  • 55 min → R: 77.2, G: 51.2, Score: 89.9 ▲ - grably.tech/with_extra_reasoning_from_different_pages_hacked96160725
  • 25 hours → R: 81.1, G: 69.8, Score: 110.4 ▲ - nikolay_sheyko(grably.tech)_with_o3_mini
  • 25 hours → R: 79.7, G: 60.2, Score: 100.1 - nikolay_sheyko(grably.tech)_dummy
  • 8 days → R: 80.5, G: 64.3, Score: 104.6 - o3-mini-no-restrictions
  • 8 days → R: 80.5, G: 66.3, Score: 106.6 - o3-mini-no-restrictions-fixed-names
  • 12 days → R: 81.2, G: 67.1, Score: 107.7 - o3-mini-no-restrictions-single-reference
  • 12 days → R: 80.5, G: 67.3, Score: 107.6 - o3-mini-no-restrictions-fixed-names-and-boolean
8 Felix-TAT
Gemini-4o Multiagent RAG
7 days 🤝 80.2 69.3 109.4

Felix-TAT

  • Best experiment: Gemini-4o Multiagent RAG
  • Signature: a2faff
  • Summary: Multiagent, mixed-model approach with delegation and execution agents.

Models used:

  • gemini-2.0-flash
  • gpt-4o-2024-08-06

Architecture

The solution uses a multiagent architecture where a delegation manager (OpenAI) splits the user query into company-specific subqueries. These subqueries are processed by expert agents using Google's Gemini flash model, which has access to the entire company PDF in context. The responses are then aggregated and synthesized by an execution agent (OpenAI) to produce the final answer.
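
A condensed sketch of this delegation pattern follows. For simplicity it uses OpenAI models for every role, whereas the original solution used gemini-2.0-flash for the per-company experts; prompts and helper names are assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

def split_by_company(question, companies):
    # Delegation manager: map each relevant company to its own subquestion.
    out = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        response_format={"type": "json_object"},
        messages=[{"role": "user",
                   "content": f"Companies: {companies}\nQuestion: {question}\n"
                              "Return a JSON object mapping each relevant company to a subquestion."}],
    )
    return json.loads(out.choices[0].message.content)

def expert_answer(subquestion, report_text):
    # Expert agent: in the original solution this call went to gemini-2.0-flash
    # with the complete company PDF in context.
    out = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user", "content": f"{report_text}\n\nQuestion: {subquestion}"}],
    )
    return out.choices[0].message.content

def answer(question, reports):            # reports: {company_name: full_report_text}
    subquestions = split_by_company(question, list(reports))
    findings = {c: expert_answer(q, reports[c]) for c, q in subquestions.items() if c in reports}
    out = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[{"role": "user",
                   "content": f"Question: {question}\nPer-company findings:\n{json.dumps(findings)}\n"
                              "Synthesize a single final answer."}],
    )
    return out.choices[0].message.content
```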

R&D Experiments

Total experiments submitted: 4

Other approaches:

  • Gemini Naive
  • IBM-4o-based Multiagent RAG
  • OpenAI Multiagent RAG

What didn't work?

  • Using a single model without multiagent delegation
  • Relying solely on vector database retrieval without full PDF context

Experiment journal:

  • 6 days → R: 79.0, G: 60.3, Score: 99.8 ▲ - Gemini Naive
  • 7 days → R: 81.7, G: 47.3, Score: 88.2 - IBM-4o-based Multiagent RAG
  • 7 days → R: 82.2, G: 66.0, Score: 107.1 ▲ - OpenAI Multiagent RAG
  • 7 days → R: 80.2, G: 69.3, Score: 109.4 ▲ - Gemini-4o Multiagent RAG
9 A.Rasskazov/V.Kalesnikau
multi_agent_ibm_openai
30 hours 84.0 67.2 109.3

A.Rasskazov/V.Kalesnikau

  • Best experiment: multi_agent_ibm_openai
  • Signature: efabd4
  • Summary: A multi-agent system leveraging LLMs for question answering using similarity-based retrieval.

Models used:

  • meta-llama/llama-3-405b-instruct
  • ibm/granite-embedding-107m-multilingual
  • text-embedding-3-small
  • gpt-4o-mini

Architecture

The solution employs a multi-agent architecture to address the challenge.

Initially, it generates a database for the Retrieval-Augmented Generation (RAG) model. Upon receiving a query, the system extracts key metrics such as company, industry, and currency. These metrics are then used to identify the most similar question in the database. The answer associated with this similar question is retrieved and refined using a Large Language Model (LLM). Finally, the system consolidates and presents the answer to the user.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • pjatk_team_002: A system that preprocesses questions, retrieves relevant PDF pages using a vector database, and extracts answers with page references using LLMs.

What didn't work?

  • Alternative embedding models for retrieval.
  • Different strategies for key metric extraction.

Experiment journal:

  • 30 hours → R: 84.0, G: 67.2, Score: 109.3 ▲ - multi_agent_ibm_openai
  • 7 days → R: 82.5, G: 64.0, Score: 105.2 - pjatk_team_002
10 Dany the creator
gpt-4o-mini + pgvector
3 hours 🤝 82.8 67.0 108.4

Dany the creator

  • Best experiment: gpt-4o-mini + pgvector
  • Signature: ee29ae
  • Summary: Utilized a structured approach to parse and analyze text chunks, creating embeddings and generating questions.

Models used:

  • gpt-4o-mini

Architecture

The solution preprocesses the text by chunking it, generating embeddings stored with pgvector, and formulating questions that could be answered by the respective chunks.
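
A minimal version of this ingest-and-retrieve loop with Postgres/pgvector might look like the following; the table layout, embedding model, and client library are assumptions.

```python
import numpy as np
import psycopg
from pgvector.psycopg import register_vector
from openai import OpenAI

client = OpenAI()

def embed(text):
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding, dtype=np.float32)

conn = psycopg.connect("dbname=rag", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)
conn.execute("CREATE TABLE IF NOT EXISTS chunks "
             "(id serial PRIMARY KEY, body text, embedding vector(1536))")

# ingest one chunk
chunk = "Revenue for FY2022 was USD 1.2bn ..."
conn.execute("INSERT INTO chunks (body, embedding) VALUES (%s, %s)", (chunk, embed(chunk)))

# retrieve the 5 chunks closest to the question (<=> is pgvector's cosine-distance operator)
rows = conn.execute("SELECT body FROM chunks ORDER BY embedding <=> %s LIMIT 5",
                    (embed("What was the revenue in FY2022?"),)).fetchall()
```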

11 SergC
submission_1
7 days 🤝 77.5 69.3 108.1

SergC

  • Best experiment: submission_1
  • Signature: c0d776
  • Summary: QE + SO + CoT

Models used:

  • gemini 2.0

Architecture

The solution uses a combination of:

  • Query Expansion (QE)
  • Semantic Optimization (SO)
  • Chain of Thought (CoT) reasoning to enhance the performance of the Gemini 2.0 model.
12 Swisscom Innovation Lab
Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
21 hours 🔒 83.3 66.2 107.8

Swisscom Innovation Lab

  • Best experiment: Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
  • Signature: debcf6
  • Summary: A multi-agent system leveraging LangGraph, LlamaIndex, MarkerPDF, and Llama 3.3 for accurate and contextual multi-company query processing.

Models used:

  • llama-3.3-70b-instruct

Architecture

This offline solution uses a multi-agent architecture with:

  • LangGraph for workflow orchestration
  • LlamaIndex for data indexing
  • MarkerPDF for document parsing
  • Llama 3.3 for natural language processing.

The solution supports multi-company queries by:

  • extracting relevant entities
  • validating inputs
  • processing each entity individually
  • retrieving and evaluating documents
  • aggregating results for numeric-based comparisons.
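
A skeleton of such a LangGraph workflow is sketched below. The state fields and node bodies are placeholders, not the team's implementation; in the real solution the nodes would call LlamaIndex retrieval over MarkerPDF output and Llama 3.3 for the language steps.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    question: str
    companies: list[str]
    findings: dict
    answer: str

def extract_entities(state: State) -> dict:
    # placeholder: the real pipeline uses Llama 3.3 to pull company names from the question
    return {"companies": ["Company A", "Company B"]}

def retrieve_and_answer(state: State) -> dict:
    # placeholder: LlamaIndex retrieval + per-company answering would happen here
    return {"findings": {c: f"stub finding for {c}" for c in state["companies"]}}

def aggregate(state: State) -> dict:
    # numeric comparisons / final synthesis across companies
    return {"answer": str(state["findings"])}

graph = StateGraph(State)
graph.add_node("extract_entities", extract_entities)
graph.add_node("retrieve_and_answer", retrieve_and_answer)
graph.add_node("aggregate", aggregate)
graph.add_edge(START, "extract_entities")
graph.add_edge("extract_entities", "retrieve_and_answer")
graph.add_edge("retrieve_and_answer", "aggregate")
graph.add_edge("aggregate", END)

app = graph.compile()
result = app.invoke({"question": "Which company had the higher operating margin?"})
```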

R&D Experiments

Total experiments submitted: 3

Other approaches:

  • Iterative refinement of query processing pipeline
  • Enhanced document retrieval mechanisms

What didn't work?

  • Simplified single-agent architecture
  • Direct query-to-response mapping without intermediate validation

Experiment journal:

  • 80 min → R: 83.3, G: 65.2, Score: 106.8 ▲ - Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
  • 21 hours → R: 83.3, G: 66.2, Score: 107.8 ▲ - Multi-Agent Langgraph-Llamaindex-MarkerPDF-Llama3.3
13 fomih
gemini-flash CoT + so small fixes in question type detection
10 days 🤝 83.0 65.9 107.4

fomih

  • Best experiment: gemini-flash CoT with question type detection fixes
  • Signature: 60bc28
  • Summary: Enhanced question type detection for improved accuracy.

Models used:

  • gemini-flash 2.0

Architecture

The solution utilized the gemini-flash 2.0 model, incorporating a refined approach to question type detection. This enhancement aimed to improve the accuracy and relevance of the responses generated by the system. The architecture involved preprocessing input documents into structured formats, creating knowledge bases tailored to specific question types, and leveraging these resources during the question-answering phase. The system identified the question type and relevant entities, retrieved pertinent knowledge base entries, and generated answers by combining the question with the retrieved data.

R&D Experiments

Total experiments submitted: 4

Other approaches:

  • gemini-flash CoT with structured output
  • gemini-flash CoT with structured output and small fixes
  • gemini CoT with structured output final

What didn't work?

  • Initial handling of 'n/a' cases
  • Fallback processing without structured knowledge bases

Experiment journal:

  • 10 days → R: 83.2, G: 59.9, Score: 101.5 ▲ - gemini-flash CoT + structured output
  • 10 days → R: 82.9, G: 62.8, Score: 104.3 ▲ - gemini-flash CoT + structured output small n/a handling fixex
  • 10 days → R: 83.0, G: 65.9, Score: 107.4 ▲ - gemini-flash CoT + so small fixes in question type detection
  • 12 days → R: 83.3, G: 64.4, Score: 106.1 - gemini CoT + SO final
14 Al Bo
albo
12 days 81.1 65.3 105.9

Al Bo

  • Best experiment: albo
  • Signature: 1e89b6
  • Summary: Docling, Vector, Agent with search tool into documents

Models used:

  • gpt-4o

Architecture

The solution utilized a sophisticated architecture combining document processing (Docling), vector-based representation, and an agent equipped with a search tool for document retrieval.

15 NumericalArt
Vhck-R0-002
8 days 70.0 70.3 105.3

NumericalArt

  • Best experiment: Vhck-R0-002
  • Signature: 32aae7
  • Summary: Preprocessing questions, raw retrieval, filtering, retrieval, detailed page analysis, and answer generation.

Models used:

  • 4o-mini
  • 4o
  • o3-mini

Architecture

The best experiment employs a structured approach to information retrieval and answer generation. The process begins with preprocessing the input questions to enhance clarity and relevance. This is followed by an initial raw retrieval phase to gather potential information sources. Subsequently, a filtering mechanism is applied to refine the retrieved data. The refined data undergoes a detailed page analysis to extract precise and contextually relevant information. Finally, the system generates answers based on the analyzed data, leveraging the 4o-mini, 4o, and o3-mini models.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Parsing text from PDFs only, separate VDB for each document, one chunk equals one page, extract four pages by entity value from question (excluding company name), detailed parsing of extracted pages, asking LLM question with detailed information in context.

Experiment journal:

  • 7 days → R: 75.9, G: 63.3, Score: 101.3 ▲ - Vhck-R0
  • 8 days → R: 70.0, G: 70.3, Score: 105.3 ▲ - Vhck-R0-002
16 Pedro Ananias
rag-3w-cot-gpt-4o-mini
4 hours 🤝 80.4 64.7 104.9

Pedro Ananias

  • Best experiment: rag-3w-cot-gpt-4o-mini
  • Signature: d44b72
  • Summary: A 3-way FAISS MMR Search & Stepped Chain Of Thought RAG

Models used:

  • openai/gpt-4o-mini

Architecture

The solution uses a 3-way FAISS MMR Search mechanism combined with a Chain Of Thought (CoT) approach.

FAISS MMR Search involves query expansion, file selection based on exact matches and cosine similarity, and database searching using maximum marginal relevance.

The CoT pipeline consists of three sequential model calls with specific prompts for reasoning, formatting, and parsing. This architecture leverages the openai/gpt-4o-mini model for processing.
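
The MMR retrieval step can be reproduced with LangChain's FAISS wrapper, as sketched below; the embedding model and parameter values are assumptions, and the three-call CoT chain is only hinted at in the comments.

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

pages = ["page 1 text ...", "page 2 text ..."]          # chunks from the selected PDF(s)
store = FAISS.from_texts(pages, OpenAIEmbeddings(model="text-embedding-3-small"),
                         metadatas=[{"page": i} for i in range(len(pages))])

# Maximum Marginal Relevance: fetch 30 similar chunks, keep 6 that balance relevance
# against redundancy. The result then feeds the first of the three sequential CoT
# calls (reasoning -> formatting -> parsing).
docs = store.max_marginal_relevance_search(
    "What was the company's net income in 2022?", k=6, fetch_k=30, lambda_mult=0.5
)
context = "\n\n".join(d.page_content for d in docs)
```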

R&D Experiments

Total experiments submitted: 5

Other approaches:

  • rag-3w-cot-gpt-4o-mini-hi-res
  • rag-3w-cot-deepseek-r1-distill-llama-8B-fast-fp16
  • rag-3w-cot-deepseek-r1-distill-llama-8B-hi-res-fp16
  • rag-3w-cot-microsoft-phi4-14B-hi-res-int8

What didn't work?

  • Using lower resolution PDF extraction for certain tasks
  • Employing fully local processing without cloud integration in some scenarios

Experiment journal:

  • 4 hours → R: 80.4, G: 64.7, Score: 104.9 ▲ - rag-3w-cot-gpt-4o-mini
  • 9 hours → R: 70.6, G: 56.0, Score: 91.3 - rag-3w-cot-deepseek-r1-distill-llama-8B-fast-fp16
  • 9 hours → R: 77.0, G: 64.6, Score: 103.1 - rag-3w-cot-gpt-4o-mini-hi-res
  • 11 hours → R: 72.3, G: 58.0, Score: 94.2 - rag-3w-cot-deepseek-r1-distill-llama-8B-hi-res-fp16
  • 31 hours → R: 78.1, G: 59.7, Score: 98.7 - rag-3w-cot-microsoft-phi4-14B-hi-res-int8
17 Daniyar
Fixed reference page indices
3 days 62.4 72.9 104.1

Daniyar

  • Best experiment: Fixed reference page indices
  • Signature: 8bb723
  • Summary: The architecture utilizes fixed reference page indices for efficient information retrieval.

Models used:

  • gpt-4o

Architecture

The solution uses a strategy of fixed reference page indices to enhance the accuracy and efficiency of document parsing and question answering.

This approach ensures that the model can quickly locate and utilize relevant information from the provided documents, leveraging the capabilities of the GPT-4o model.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Sliding window PDF page reading with checklists over questions addressed to files.

What didn't work?

  • Alternative indexing methods or dynamic page referencing strategies.

Experiment journal:

  • 3 days → R: 62.2, G: 72.9, Score: 104.0 ▲ - First draft
  • 3 days → R: 62.4, G: 72.9, Score: 104.1 ▲ - Fixed reference page indices
18 RubberduckLabs
RAG experiment
2 days 🔒 74.5 66.0 103.3

RubberduckLabs

  • Best experiment: RubberduckLabs - RAG experiment attempt 001
  • Signature: ee7519
  • Summary: A multi-step LLM processing pipeline for document question-answering.

Models used:

  • deepseek-r1-distill-llama-70b:bf16
  • llama-3.1-70b-instruct:bf16

Architecture

The architecture preprocesses documents to generate detailed page-level summaries and extract structured metadata, with a particular focus on financial data.

The retrieval process employs a two-stage approach:

  • document selection based on metadata matching;
  • precise page identification using semantic relevance and explicit reasoning.

Answer generation utilizes 'Context-Guided Response Generation' combining retrieved contexts with structured reasoning to ensure factual accuracy and traceability. The system maintains explicit reasoning trails and incorporates robust error handling for production stability.

R&D Experiments

Total experiments submitted: 2

19 Machine Learning Reply
ML Reply - Submission 1
28 hours 74.5 66.0 103.2

Machine Learning Reply

  • Best experiment: ML Reply - Submission 1
  • Signature: fa34f3
  • Summary: Integration of Azure Document Intelligence and Azure AI Search.

Models used:

  • GPT-4o

Architecture

This solution utilized a combination of Azure Document Intelligence for document processing and Azure AI Search for efficient information retrieval.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • ML Reply - Submission 2

Experiment journal:

  • 28 hours → R: 74.5, G: 66.0, Score: 103.2 ▲ - ML Reply - Submission 1
  • 29 hours → R: 74.0, G: 63.5, Score: 100.5 - ML Reply - Submission 2
20 Aleksandr Podgaiko
smolagent_simple_v1
3 days 🤝 81.2 62.3 103.0

Aleksandr Podgaiko

  • Best experiment: smolagent_simple_v1
  • Signature: 6afedb
  • Summary: Utilized smolagents library with basic PDF extraction and a coding agent.

Models used:

  • openrouter/google/gemini-2.0-flash-001

Architecture

The solution employed the HuggingFace smolagents library for agent-based interactions, integrating basic PDF extraction using PyPDF2. The architecture featured a default coding agent equipped with two tools: pdf_search for keyword-based search with contextual display and pdf_content for full-page content retrieval upon request. Additionally, the final_answer tool was customized to adhere to the submission format.
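
A stripped-down version of this setup, with only the pdf_search tool, might look as follows; the model wiring via LiteLLM and the keyword-matching details are assumptions, and the pdf_content and customized final_answer tools are omitted.

```python
from PyPDF2 import PdfReader
from smolagents import CodeAgent, LiteLLMModel, tool

# one string per page, extracted up front with PyPDF2
PAGES = [p.extract_text() or "" for p in PdfReader("annual_report.pdf").pages]

@tool
def pdf_search(keyword: str) -> str:
    """Search the extracted PDF pages for a keyword and return the matching pages.

    Args:
        keyword: term to look for in the extracted PDF text.
    """
    hits = [f"[page {i}] {text}" for i, text in enumerate(PAGES) if keyword.lower() in text.lower()]
    return "\n\n".join(hits[:3]) or "no match"

agent = CodeAgent(
    tools=[pdf_search],
    model=LiteLLMModel(model_id="openrouter/google/gemini-2.0-flash-001"),
)
print(agent.run("What was the company's total revenue in the last fiscal year?"))
```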

21 Vlad Drobotukhin (@mrvladd)
Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction + Router
6 days 🤝 🔒 68.3 68.2 102.3

Vlad Drobotukhin (@mrvladd)

  • Best experiment: Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction + Router
  • Signature: fa77e2
  • Summary: System combining LLM-based reasoning with optimized retrieval techniques.

Models used:

  • Qwen-2.5-72b-INT4

Architecture

This offline solution employs a multi-step process:

  • start with question analysis to determine the type and domain;
  • generate multiple search queries to maximize recall;
  • relevant pages are retrieved using OpenSearch and processed with domain-specific LLM extractors to build structured knowledge;
  • final answers are synthesized with reasoning and confidence scores.
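
The multi-query BM25 step can be illustrated with the rank_bm25 package standing in for the OpenSearch index used in the actual solution; the query variants and score aggregation below are assumptions.

```python
from rank_bm25 import BM25Okapi

pages = ["page 1 text ...", "page 2 text ..."]            # one entry per PDF page
bm25 = BM25Okapi([p.lower().split() for p in pages])

def multi_query_retrieve(query_variants, top_k=10):
    # Each query variant scores every page; scores are accumulated so pages that
    # match several phrasings of the question rise to the top.
    totals = [0.0] * len(pages)
    for q in query_variants:
        for i, s in enumerate(bm25.get_scores(q.lower().split())):
            totals[i] += s
    return sorted(range(len(pages)), key=lambda i: totals[i], reverse=True)[:top_k]

# the variants would come from the question-analysis step (metric synonyms, report wording)
top_pages = multi_query_retrieve(["total revenue 2022", "net sales fiscal year 2022"])
```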

R&D Experiments

Total experiments submitted: 10

Other approaches:

  • Qwen2.5 72b + FTS (rephrase query) +SO + CheckList's
  • Qwen2.5 72b + FTS +SO + CheckList's
  • Qwen2.5 + FTS (rephrase query) + SO + CheckList's
  • Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction
  • Qwen 2.5-72b + Multi-Query BM25 (top 15 pages) + Domain-Specific Information Extraction + Router
  • Qwen 2.5-72b + Multi-Query BM25+ Domain-Specific Information Extraction + Router
  • Qwen 2.5-72b-4bit + BM25 + Domain-Specific Information Extraction + Router
  • MagicQwen-4bit + BM25 + Domain-Specific Information Extraction + Router
  • Qwen 72b-4bit + FTS + Domain-Specific Information Extraction 0803

What didn't work?

  • Simplified query generation without diversification
  • Lack of domain-specific term boosting
  • Absence of structured output validation

Experiment journal:

  • 3 days → R: 74.7, G: 59.2, Score: 96.5 ▲ - Qwen2.5 72b + FTS (rephrase query) +SO + CheckList's
  • 3 days → R: 71.8, G: 62.3, Score: 98.2 ▲ - Qwen2.5 72b + FTS +SO + CheckList's
  • 4 days → R: 74.7, G: 59.2, Score: 96.5 - Qwen2.5 + FTS (rephrase query) + SO + CheckList's
  • 5 days → R: 69.1, G: 65.7, Score: 100.2 ▲ - Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction
  • 6 days → R: 68.3, G: 68.2, Score: 102.3 ▲ - Qwen 2.5-72b + Multi-Query BM25 + Domain-Specific Information Extraction + Router
  • 7 days → R: 67.6, G: 67.4, Score: 101.2 - Qwen 2.5-72b + Multi-Query BM25 (top 15 pages) + Domain-Specific Information Extraction + Router
  • 8 days → R: 64.6, G: 62.0, Score: 94.3 - Qwen 2.5-72b + Multi-Query BM25+ Domain-Specific Information Extraction + Router
  • 9 days → R: 61.9, G: 63.0, Score: 93.9 - Qwen 2.5-72b-4bit + BM25 + Domain-Specific Information Extraction + Router
  • 9 days → R: 69.2, G: 63.2, Score: 97.8 - MagicQwen-4bit + BM25 + Domain-Specific Information Extraction + Router
  • 10 days → R: 78.4, G: 63.0, Score: 102.2 - Qwen 72b-4bit + FTS + Domain-Specific Information Extraction 0803
22 Ivan R.
Round 2 submission
71 min 🤝 79.9 62.0 101.9

Ivan R.

  • Best experiment: Round 2 submission
  • Signature: b29973
  • Summary: A multi-step approach leveraging LLMs for question decomposition, search, and validation.

Models used:

  • gpt-4o
  • gpt-4o-mini

Architecture

The solution employs a structured pipeline:

  • document loading using PyPDFDirectoryLoader from LangChain;
  • question decomposition with GPT-4o;
  • multiple OpenAI assistants, each dedicated to a specific company, perform targeted searches using GPT-4o-mini;
  • results undergo answer validation with GPT-4o
  • local FAISS vector store is used for similarity search to collect reference pages.
23 PENZA_AI_CREW
gpt-4_claude3.5_unstructured
7 days 🤝 72.5 65.0 101.3

PENZA_AI_CREW

  • Best experiment: gpt-4_claude3.5_unstructured
  • Signature: 67ee86
  • Summary: A multi-step pipeline leveraging OCR, table/image analysis, and knowledge mapping for accurate question answering.

Models used:

  • gpt-4-mini
  • claude 3.5
  • gpt-4o

Architecture

This RAG pipeline was composed of the following steps:

  • PDF text is parsed using Unstructured library with OCR
  • Tables and images are analyzed using Claude 3.5
  • Knowledge map is constructed using gpt-4-mini, utilizing Structured Outputs.
  • Questions are analyzed in conjunction with the knowledge map using gpt-4-mini with Pydantic schema.
  • Answers are generated by gpt-4o, employing chain-of-thought reasoning and Pydantic schema (SO CoT).

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • RAG_PNZ_PAYPLINE: OCR with Unstructured, table/image analysis with Claude 3.5, metadata extraction with gpt-4-mini, and final reasoning with gpt-4o.

What didn't work?

  • Alternative OCR methods not utilizing Unstructured.
  • Direct question answering without intermediate knowledge mapping.

Experiment journal:

  • 7 days → R: 12.2, G: 11.0, Score: 17.1 ▲ - RAG_PNZ_PAYPLINE
  • 7 days → R: 72.5, G: 65.0, Score: 101.3 ▲ - gpt-4_claude3.5_unstructured
24 Yolo leveling
Marker + Gemini
25 hours 82.2 59.9 101.0

Yolo leveling

  • Best experiment: Marker + Gemini
  • Signature: 31b473
  • Summary: Convert PDFs to markdown, extract company names, and generate JSON representations.

Models used:

  • Surya (OCR)
  • Flash 2.0

Architecture

The solution starts by converting each PDF document into markdown format using the Marker tool with OCR capabilities. Afterward, the system identifies the company name within the content. In cases where multiple companies are mentioned in the query, the system employs a hallucination control mechanism to determine the most relevant company. The markdown content is then incorporated into the context for the LLM, which extracts and generates a structured JSON representation of the required information.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Gemini 1M pdf "thinking" + 4o parser

What didn't work?

  • Queries involving multiple companies were marked as N/A in alternative approaches.

Experiment journal:

  • 25 hours → R: 76.0, G: 60.0, Score: 98.0 ▲ - Gemini 1M pdf "thinking" + 4o parser
  • 25 hours → R: 82.2, G: 59.9, Score: 101.0 ▲ - Marker + Gemini
25 ArtemNurm
brute_flash2.0&brute_flash2.0
7 days 🤝 77.8 61.0 99.9

ArtemNurm

  • Best experiment: brute_flash2.0&brute_flash2.0
  • Signature: 46e0e0
  • Summary: PDF2MD with Flash, relevant data extraction with Flash, the data is sent to LLM with questions using SO (no CoT). All steps include generator-critic workflow.

Models used:

  • Gemini Flash 2.0
  • OpenAI o3-mini

Architecture

The team's best experiment employs a robust architecture leveraging the Gemini Flash 2.0 and OpenAI o3-mini models. The process involves converting PDF documents to Markdown format using Flash, extracting relevant data, and querying the LLM with specific questions using a straightforward approach without chain-of-thought reasoning.

A generator-critic workflow is integrated into all steps to ensure high-quality outputs.
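
A toy version of such a generator-critic loop is sketched below; models, prompts, and the stopping rule are assumptions, shown here with OpenAI-style calls (gpt-4o-mini stands in for the Gemini Flash 2.0 generator).

```python
from openai import OpenAI

client = OpenAI()

def generate_with_critic(task, max_rounds=3):
    draft, feedback = "", ""
    for _ in range(max_rounds):
        # generator drafts a result (stand-in for Gemini Flash 2.0 in the original setup)
        draft = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": f"{task}\n{feedback}"}],
        ).choices[0].message.content
        # critic approves the draft or returns concrete problems to fix
        verdict = client.chat.completions.create(
            model="o3-mini",
            messages=[{"role": "user",
                       "content": f"Task: {task}\nDraft:\n{draft}\n"
                                  "Reply APPROVED if correct, otherwise list the problems."}],
        ).choices[0].message.content
        if verdict.strip().upper().startswith("APPROVED"):
            break
        feedback = f"Revise the previous draft. Critic feedback: {verdict}"
    return draft
```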

R&D Experiments

Total experiments submitted: 8

Other approaches:

  • brute_flash2.0&CoT_flash2.0
  • index_flash2.0&brute_flash2.0
  • index_flash2.0&CoT_4o-2024-11-20
  • index_flash2.0&CoT_flash2.0
  • index_flash2.0&CoT_o3-mini-high
  • index_flash2.0&CoT_o3-mini
  • flash2.0_sees_all_content

What didn't work?

  • Using chain-of-thought reasoning in 'brute_flash2.0&CoT_flash2.0' did not outperform the winning approach.
  • Concatenating all Markdown files into a single string in 'flash2.0_sees_all_content' was less effective.

Experiment journal:

  • 7 days → R: 77.8, G: 61.0, Score: 99.9 ▲ - brute_flash2.0&brute_flash2.0
  • 7 days → R: 77.7, G: 61.0, Score: 99.8 - brute_flash2.0&CoT_flash2.0
  • 7 days → R: 68.5, G: 57.6, Score: 91.8 - index_flash2.0&brute_flash2.0
  • 7 days → R: 66.4, G: 56.8, Score: 90.0 - index_flash2.0&CoT_4o-2024-11-20
  • 7 days → R: 66.3, G: 57.6, Score: 90.7 - index_flash2.0&CoT_flash2.0
  • 7 days → R: 65.6, G: 58.8, Score: 91.6 - index_flash2.0&CoT_o3-mini-high
  • 7 days → R: 65.9, G: 59.3, Score: 92.2 - index_flash2.0&CoT_o3-mini
  • 7 days → R: 71.8, G: 55.6, Score: 91.4 - flash2.0_sees_all_content
26 ndt by red_mad_robot
qwen32b+bge_m3
9 days 🤝 🔒 72.9 63.2 99.7

ndt by red_mad_robot

  • Best experiment: qwen32b+bge_m3
  • Signature: 30f0d1
  • Summary: PDFs were converted to markdown, vectorized using bge m3, and queried with Qwen 32B.

Models used:

  • Qwen 32B instruct
  • BGE-M3

Architecture

This offline solution involved processing PDF documents by converting them into markdown format using the Pymupdf library. These markdown representations were then vectorized using the popular BGE-M3 model.

Qwen 32B instruct model was used to answer user queries by leveraging the vectorized data for relevant context retrieval.

R&D Experiments

Total experiments submitted: 5

Other approaches:

  • full open-source + roter agent
  • qwen7b-router-agent

What didn't work?

  • Directly querying without vectorization
  • Using alternative LLMs for vectorization

Experiment journal:

  • 23 hours → R: 27.2, G: 54.0, Score: 67.6 ▲ - full open-source + roter agent
  • 7 days → R: 73.2, G: 51.0, Score: 87.6 ▲ - qwen7b-router-agent
  • 9 days → R: 73.2, G: 59.0, Score: 95.6 ▲ - ndt by red_mad_robot
  • 9 days → R: 72.9, G: 63.2, Score: 99.7 ▲ - qwen32b+bge_m3
27 Neoflex DreamTeam
Best run
30 hours 🤝 🔒 77.8 58.0 96.9

Neoflex DreamTeam

  • Best experiment: Simple LLM Brute Force
  • Signature: 34a266
  • Summary: Utilized a straightforward LLM brute force approach for each page with predefined questions and example answers.

Models used:

  • Qwen 2.5

Architecture

The solution used the Qwen 2.5 model to process each page individually, applying a brute-force methodology with a set of predefined questions and corresponding example answers to extract relevant information effectively.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Checklist based RAG

What didn't work?

  • Alternative configurations of the Checklist based RAG approach

Experiment journal:

  • 30 hours → R: 77.8, G: 58.0, Score: 96.9 ▲ - Best run
  • 7 days → R: 67.3, G: 51.7, Score: 85.4 - neon_team
28 nightwalkers
nightwalkers-baseline
6 hours 🔒 72.9 60.2 96.7

nightwalkers

  • Best experiment: nightwalkers-baseline
  • Signature: 356ef4
  • Summary: Utilized a vector database for efficient document retrieval and LLM for response generation.

Models used:

  • deepseek-r1-distill-llama-70b

Architecture

The team implemented vector database search using embeddings from all-MiniLM-L6-v2 and ibm/granite-embedding-107m-multilingual models. This facilitated the retrieval of the most relevant page and document based on the query. The retrieved information was then processed by the deepseek-r1-distill-llama-70b LLM to generate relevant answers.

29 Gleb Kozhaev
Gleb Kozhaev
32 hours 🤝 79.1 56.0 95.5

Gleb Kozhaev

  • Best experiment: pymupdf4llm + Structured Output
  • Signature: 1442cb
  • Summary: Utilized pymupdf4llm with structured output and three distinct system prompts/roles.

Models used:

  • gpt-4o-mini

Architecture

The RAG solution employed the pymupdf4llm library, leveraging Structured Outputs to enhance data processing and comprehension.

Three distinct system prompts/roles were utilized to optimize the model's performance and ensure accurate and efficient results.

30 AndreiKopysov
AndreiKopysov
33 hours 🤝 76.2 57.2 95.3

AndreiKopysov

  • Best experiment: Gemini2.0 and DeepSeek R1 Integration
  • Signature: 574182
  • Summary: The architecture processes PDF pages using Gemini2.0 and refines responses with DeepSeek R1.

Models used:

  • Gemini2.0
  • DeepSeek R1

Architecture

This RAG solution used a two-step pipeline:

  • each page of the PDF document is processed using the Gemini2.0 model to extract relevant information;
  • extracted responses are refined and analyzed using the DeepSeek R1 model to ensure accuracy and relevance.

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Reused the same architecture in different configurations.

Experiment journal:

  • 33 hours → R: 76.2, G: 57.2, Score: 95.3 ▲ - AndreiKopysov
  • 33 hours → R: 76.2, G: 57.2, Score: 95.3 - AndreyKopysov
31 Serj Tarasenko
complicated second
3 days 82.0 54.0 95.0

Serj Tarasenko

  • Best experiment: complicated second
  • Signature: a5cf25
  • Summary: RAG pipeline with query enhancement and re-ranking.

Models used:

  • gpt-4o-mini
  • text-embedding-3-small

Architecture

The team's solution implemented a Retrieval-Augmented Generation (RAG) pipeline. The process involved extracting content from PDFs, segmenting it into manageable chunks, and indexing these chunks using FAISS for efficient vector-based retrieval. Queries were enhanced with financial terms to improve relevance, followed by a retrieval step that included re-ranking to prioritize the most pertinent information. Finally, an LLM was employed to generate comprehensive answers based on the retrieved data. The source code for this implementation is publicly available.

32 AAV
llm2-sim-preselected
7 days 62.9 62.5 93.9

AAV

  • Best experiment: Agent+Router
  • Signature: 5e0479
  • Summary: The architecture employs an agent-based approach with a routing mechanism.

Models used:

  • gpt-4o-mini

Architecture

The solution uses the 'gpt-4o-mini' model in an architecture combining an agent with a router. This design enables efficient task delegation and processing, optimizing performance for the challenge requirements.

R&D Experiments

Total experiments submitted: 6

Other approaches:

  • Agent
  • Agent + sim search + tfidf

What didn't work?

  • Using 'private model' instead of 'gpt-4o-mini'
  • Excluding the router component

Experiment journal:

  • 7 days → R: 60.7, G: 62.8, Score: 93.1 ▲ - llm1-sim-preselected
  • 7 days → R: 62.9, G: 62.5, Score: 93.9 ▲ - llm2-sim-preselected
  • 7 days → R: 62.7, G: 57.3, Score: 88.7 - llm2-sim-not-preselected
  • 7 days → R: 61.0, G: 60.8, Score: 91.3 - llm1-sim-not-preselected
  • 7 days → R: 25.1, G: 60.9, Score: 73.5 - llm1-sim-ifidf-not-preselected
  • 7 days → R: 27.2, G: 62.8, Score: 76.4 - llm2-sim-tfidf-not-preselected
33 AI Slop
AI Slop Cursor+Sonnet 3.7, No RAG, No OCR, gpt4o-mini all the way
3 hours 🤝 80.9 53.0 93.5

AI Slop

  • Best experiment: AI Slop Cursor+Sonnet 3.7
  • Signature: fc3dc9
  • Summary: Utilized a streamlined approach leveraging LLMs for direct question answering.

Models used:

  • gpt-4o-mini

Architecture

The team employed the gpt-4o-mini model to process and answer questions directly from the provided PDF documents.

By utilizing metadata and targeted queries, they efficiently narrowed down relevant information, ensuring accurate and concise responses. The approach avoided complex retrieval-augmented generation (RAG) or OCR techniques, focusing on the inherent capabilities of the LLM.

34 RAG challenge Orphist
Orphist
63 min 🔒 78.8 53.0 92.4

RAG challenge Orphist

  • Best experiment: Iterative LLM Prompting with BM25
  • Signature: e98c1b
  • Summary: The solution employs BM25 for document retrieval and iterative LLM prompting for query expansion and summarization.

Models used:

  • gemma-2-9b-it

Architecture

The solution utilized an architecture combining BM25plus for document retrieval and iterative prompting of the gemma-2-9b-it LLM.

The process involved chunking PDF documents for ingestion, storing them in an in-memory local storage, and applying BM25plus for query matching with meta-filters.

Due to a last-minute issue with embedding models, the team opted for a non-hybrid pipeline. The iterative prompting expanded the initial query and used a scratchpad for summary collection, culminating in a final prompt to extract the requested information.

35 Dennis S.
Deepseek naive questionfilter
7 days 🤝 81.9 50.0 91.0

Dennis S.

  • Best experiment: Deepseek naive questionfilter
  • Signature: 53630f
  • Summary: A question-centered approach leveraging document parsing and heuristic-based analysis.

Models used:

  • Deepseek V3

Architecture

The solution employs a question-centered methodology to efficiently extract relevant information from documents.

  • Initially, PDFs are parsed using PyMuPDF and Tesseract for OCR when necessary.
  • The system analyzes provided metadata and questions to identify relevant companies and metrics, classifying questions into single_fact or aggregate types.
  • It processes documents in parallel, extracting answers based on the question type, and aggregates results accordingly.

This approach prioritizes speed and cost-efficiency.
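
The parsing step described above, PyMuPDF with a Tesseract fallback for scanned pages, roughly corresponds to the following sketch; the DPI setting and the empty-text heuristic are assumptions.

```python
import io

import fitz                      # PyMuPDF
import pytesseract
from PIL import Image

def extract_pages(pdf_path):
    pages = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            text = page.get_text().strip()
            if not text:                               # no text layer -> probably scanned, run OCR
                pix = page.get_pixmap(dpi=200)
                text = pytesseract.image_to_string(Image.open(io.BytesIO(pix.tobytes("png"))))
            pages.append(text)
    return pages
```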

R&D Experiments

Total experiments submitted: 2

Other approaches:

  • Deepseek v3 - bruteforce questionfilter

What didn't work?

  • Using regex-based logic for question classification
  • Dividing questions into first occurrence and aggregated types without clear pipeline integration

Experiment journal:

  • 7 days → R: 79.8, G: 50.0, Score: 89.9 ▲ - Deepseek v3 - bruteforce questionfilter
  • 7 days → R: 81.9, G: 50.0, Score: 91.0 ▲ - Deepseek naive questionfilter
36 Slava RAG
Slava RAG
7 hours 🤝 65.6 57.8 90.7

Slava RAG

  • Best experiment: Slava RAG
  • Signature: 282787
  • Summary: Embedding: OpenAI text-embedding-3-small, LLM: GPT-4o, Vector Database: Pinecone, PDF Processing: PyMuPDF, Chunk Processing: Custom algorithm

Models used:

  • gpt-4o

Architecture

This architecture combined:

  • OpenAI's text-embedding-3-small for embedding generation;
  • GPT-4o as the primary LLM;
  • Pinecone for vector database management;
  • PyMuPDF for efficient PDF processing;
  • a custom algorithm for chunk processing.
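
Put together, the ingest and query path might look like this hedged sketch; the index name, metadata fields, and chunk handling are assumptions rather than the team's actual code.

```python
from openai import OpenAI
from pinecone import Pinecone

oai = OpenAI()
pc = Pinecone(api_key="...")
index = pc.Index("rag-challenge")     # pre-created index: dimension 1536, cosine metric

def embed(text):
    return oai.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

# ingest one chunk produced by the custom chunking step
index.upsert(vectors=[{
    "id": "acme-p12-c3",
    "values": embed("chunk text ..."),
    "metadata": {"company": "Acme", "page": 12, "text": "chunk text ..."},
}])

# retrieve the 5 closest chunks for a question, then hand them to GPT-4o
result = index.query(vector=embed("What was Acme's FY2022 revenue?"),
                     top_k=5, include_metadata=True)
context = "\n\n".join(m.metadata["text"] for m in result.matches)
```
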
37 Alex_dao
Alex_Dao_v1_final
95 min 68.4 56.5 90.7

Alex_dao

  • Best experiment: Alex_Dao_v1_final
  • Signature: 93c0ef
  • Summary: Utilized a kv-index architecture.

Models used:

  • gpt4o

Architecture

The team's solution implemented a key-value index (kv-index) architecture, leveraging the GPT-4o model to efficiently retrieve and process information. This approach ensured high performance and accuracy in the challenge tasks.

38 Mykyta Skrypchenko
Kyiv-bge1.5
31 hours 🤝 42.1 64.2 85.3

Mykyta Skrypchenko

  • Best experiment: Kyiv-bge1.5
  • Signature: d5fb15
  • Summary: Integration of advanced text retrieval and vector database with LLM for question answering.

Models used:

  • gpt-4o-2024-08-06

Architecture

The solution is a multi-component architecture:

  • Fitz for efficient text retrieval
  • BAAI/bge-base-en-v1.5 Sentence Transformer for embedding generation
  • ChromaDB as the vector database for storage and retrieval
  • OpenAI API for question answering
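
A compact sketch of this stack is shown below; the collection name and query wording are assumptions, and the Fitz extraction step is reduced to a placeholder list of page texts.

```python
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-base-en-v1.5")
client = chromadb.Client()
collection = client.create_collection("annual_reports")

pages = ["page 1 text ...", "page 2 text ..."]     # extracted with Fitz (PyMuPDF), one string per page

collection.add(
    ids=[f"page-{i}" for i in range(len(pages))],
    documents=pages,
    embeddings=model.encode(pages).tolist(),
)

hits = collection.query(
    query_embeddings=model.encode(["What was the total revenue in 2022?"]).tolist(),
    n_results=5,
)
relevant_pages = hits["documents"][0]              # passed to the OpenAI model for answering
```
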
39 F-anonymous
F-anonymous. Fully local, own DeepThinking
5 days 🤝 🔒 73.6 47.0 83.8

F-anonymous

  • Best experiment: Fully local, own DeepThinking
  • Signature: 2a2a1b
  • Summary: Fully local graphRAG with hybrid search and custom-tuned LLM.

Models used:

  • Qwen2.5 14b

Architecture

The solution by F-anonymous is a fully local graph-based Retrieval-Augmented Generation (RAG) architecture.

They utilized their proprietary DeepThinking framework in conjunction with a custom-tuned Qwen2.5 14b model. The system integrated a hybrid search mechanism combining vector-based and BM25 methodologies to enhance retrieval accuracy and relevance.

40 DataNXT
Prototype-RAG-Challenge
5 days 🔒 54.2 55.5 82.6

DataNXT

  • Best experiment: Prototype-RAG-Challenge
  • Signature: 0e942a
  • Summary: Pipeline with specialised prompted LLM Calls

Models used:

  • OpenAi-4o-mini

Architecture

The solution utilized a pipeline architecture with specialized prompted calls to the OpenAi-4o-mini model. This approach allowed for efficient and accurate information retrieval and generation.

41 AValiev
IBM-deepseek-agentic-rag
4 hours 🔒 43.5 60.0 81.8

AValiev

  • Best experiment: IBM-deepseek-agentic-rag
  • Signature: 493744
  • Summary: Agentic RAG with type validation, Pydantic typing, Qdrant vector store querying.

Models used:

  • deepseek/deepseek-r1-distill-llama-70b

Architecture

This RAG solution was based on an Agentic Retrieval-Augmented Generation (RAG) architecture.

It utilized type validation and Pydantic typing for robust data handling, and Qdrant vector store querying for efficient information retrieval. PDF documents were processed using PyPDF and Docling for accurate text extraction.

R&D Experiments

Total experiments submitted: 5

Other approaches:

  • openai-agentic-rag
  • IBM-mixtral-agentic-rag
  • granite-3-8b-instruct_rag_agentic
  • deepseek/deepseek-r1-distill-llama-70b_sophisticated_chunking_rag_agentic

What didn't work?

  • Alternative LLM models such as OpenAI-gpt-4o-mini and mistralai/mixtral-8x7b-instruct-v01 were explored but did not achieve the same performance as the winning model.

Experiment journal:

  • 54 min → R: 43.5, G: 60.0, Score: 81.8 ▲ - openai-agentic-rag
  • 3 hours → R: 43.5, G: 33.0, Score: 54.8 - IBM-mixtral-agentic-rag
  • 4 hours → R: 43.5, G: 60.0, Score: 81.8 - IBM-deepseek-agentic-rag
  • 4 hours → R: 43.5, G: 48.5, Score: 70.2 - granite-3-8b-instruct_rag_agentic
  • 34 hours → R: 35.8, G: 53.0, Score: 70.9 - deepseek/deepseek-r1-distill-llama-70b_sophisticated_chunking_rag_agentic
42 bimurat_mukhtar
bm_v1
32 hours 🤝 🔒 36.2 31.3 49.4

bimurat_mukhtar

  • Best experiment: bm_v1
  • Signature: c25e30
  • Summary: Multi-agent architecture with specialized branches for diverse answer generation.

Models used:

  • deepseek-r1
  • gemini

Architecture

The solution is a multi-agent architecture inspired by Self RAG, where input PDFs are converted to text, preprocessed, and filtered to extract relevant information.

Different branches are utilized to handle specific types of queries, leveraging the strengths of the LLMs deepseek-r1 and gemini.

43 ragtastic
ragtastic
7 days 4.8 3.0 5.4

ragtastic

  • Best experiment: ragtastic
  • Signature: 43d4fd
  • Summary: The architecture leverages the Mistral-large model for its implementation.

Models used:

  • mistral-large

Architecture

The solution used the Mistral-large model to achieve its objectives. The architecture is designed to optimize performance and accuracy, ensuring robust results.

Video: The Winner Announcement with Rinat Abdullin

Relive the most exciting minutes of the challenge! In this video, Rinat Abdullin (Head of AI and Innovation) announces the winning teams and gives an insight into the most compelling solutions.

Questions, or interested in working together?

TIMETOACT GROUP Österreich is among the leading experts in applied research on generative AI for enterprises. Our research findings feed directly into product development, which is how we set the highest standards when implementing AI-powered applications for businesses.

Would you like to unlock the full potential of AI for your business processes, too? Feel free to get in touch!

Niklas Thannhäuser looks forward to hearing from you!

Niklas Thannhäuser
TIMETOACT GROUP Österreich GmbH +43 664 750 187 82