Boosting speed of scikit-learn regression algorithms

Date

27.06.2023

A quick web search turns up numerous posts on techniques for reducing the training time of well-known machine learning algorithms. Surprisingly, far less information is available on the prediction phase, even though it is highly relevant in practice. Once a satisfactory regression algorithm has been trained, it is typically deployed in a real-world system, where the speed at which predictions are obtained becomes crucial. Faster predictions enable real-time or near-real-time decision-making, improve the user experience in interactive applications, and make it feasible to process large-scale datasets efficiently. Optimizing inference speed can therefore have significant implications across domains and applications.

The purpose of this blog post is to investigate the performance and prediction speed behavior of popular regression algorithms, i.e. models that predict numerical values based on a set of input variables. Considering that scikit-learn is the most extensively utilized machine learning framework [1], our focus will be on exploring methods to accelerate its algorithms' predictions.

Benchmarking regression algorithms

To assess the current state of popular regression algorithms, we selected four popular regression datasets from Kaggle [2], along with an internal dataset from our company. These datasets vary in sample size, number and type of features, capturing performance for different data structures.

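To ensure fair comparisons, we optimize hyperparameters before testing to unlock the models' full potential. The benchmark covers the regression algorithms discussed in the following sections, among them linear regression, k-nearest neighbors (k-NN), support vector regression (SVR), a multilayer perceptron (MLP), gradient boosted regression trees and random forest.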

The different versions of regularized linear regression, such as lasso, ridge, and elastic net, are not analyzed separately as they were comparable to pure linear regression in terms of prediction speed and accuracy in a pre-evaluation step.

Prediction speed vs. accuracy

The plot below displays the benchmarking results on our company's internal dataset. We can observe a sweet spot in the bottom left, where both the error - measured via root mean square error (RMSE) - and prediction times are low. Simple neural networks (Multilayer Perceptron, MLP) and gradient boosted regression trees demonstrate good performance in both dimensions. Random forest also shows decent accuracy but has the highest prediction time. Other algorithms exhibit reasonable prediction speed but relatively high errors.

However, it is crucial to try different algorithms on the specific dataset at hand. Accuracy and prediction time heavily depend on the number of features used, their transformations, as well as the model's parameters. Linear models, for example, may perform well with properly transformed features, while larger MLPs might exhibit longer prediction times. Nevertheless, algorithms like random forest and k-NN are by construction expected to be slower in inference speed.

How to speed up inference

Generally, scikit-learn models are already optimized through compiled Cython extensions or optimized computing libraries [3]. However, there are additional ways to reduce prediction latency, apart from using faster hardware. In this blog post, we’ll benchmark the following optimizations:

Data-based approaches:

Reduce the number of features by selecting relevant ones or applying dimensionality reduction (“Half features”)
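As a rough illustration of the “Half features” idea, the sketch below keeps the half of the features that score highest in a univariate F-test. The synthetic data, the choice of `SelectKBest` with `f_regression`, and the `Ridge` model are assumptions made for this example and not the setup used in the benchmark.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): 106 features, only the first 10 matter
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 106))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.1, size=500)

# Keep the 53 features with the strongest univariate relation to y,
# then fit the regressor on the reduced feature set
model = make_pipeline(SelectKBest(f_regression, k=X.shape[1] // 2), Ridge())
model.fit(X, y)

n_kept = int(model[0].get_support().sum())  # number of selected features
pred = model.predict(X[:5])                 # selection is applied automatically
```

Because the selector is part of the pipeline, the same feature subset is applied at prediction time. Whether the speed gain is worth the possible loss in accuracy has to be checked on the dataset at hand.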

Implementation-based approaches:

Apply bulk prediction instead of atomic prediction, enabling parallelization and speeding up the process (“Bulk 100/1000”)
Utilize the Intel extension for scikit-learn, which supports certain algorithms and can lead to significant speed improvements (“Intel extension”)
Disable scikit-learn's validation overhead, which checks the finiteness of the data (“No finite check”)
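The finiteness check can be switched off with scikit-learn's configuration API. The following is a minimal sketch on synthetic data (the data and the `LinearRegression` model are assumptions for illustration); it uses `sklearn.config_context`, which limits the setting to a block of code.

```python
import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 2.0, 3.0, 4.0, 5.0])
model = LinearRegression().fit(X, y)

# Inside this block scikit-learn skips the check for NaN and infinite values
with sklearn.config_context(assume_finite=True):
    pred_fast = model.predict(X[:1])

# Outside the block the check is active again
pred_checked = model.predict(X[:1])
```

`sklearn.set_config(assume_finite=True)` applies the same setting globally. In both cases the caller is responsible for making sure the input contains no NaN or infinite values.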

Furthermore, we want to mention the following optimization approaches, which we did not include in our benchmark, partly because they are problem specific:

Data-based approaches:

Efficiently represent input data, such as using sparse matrix data structures
Optimize feature extraction and transformation, including efficient database queries and preprocessing tasks
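To illustrate the sparse representation mentioned above, here is a small sketch that stores a mostly-zero feature matrix in SciPy's CSR format. The synthetic data and the `Ridge` model are assumptions; the approach only pays off when most entries are zero and the estimator accepts sparse input.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Ridge

# Synthetic stand-in data (assumption): roughly 95% of the entries are zero
rng = np.random.default_rng(0)
X_dense = rng.random((1000, 200))
X_dense[X_dense < 0.95] = 0.0
y = rng.random(1000)

# CSR stores only the non-zero values plus their positions
X_sparse = sparse.csr_matrix(X_dense)

model = Ridge().fit(X_sparse, y)
pred_sparse = model.predict(X_sparse[:10])
pred_dense = model.predict(X_dense[:10])   # same model, dense input
```

Both calls return the same predictions; the sparse matrix just needs far less memory for the values than the dense array.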

Model-based approaches:

Reduce the complexity of the model, such as reducing the size of a random forest or MLP architecture
Utilize model-specific accelerators

Implementation-based approaches:

Implement the prediction step with given weights independently, potentially in a faster programming language, to avoid unnecessary overhead
Use cloud services for prediction, such as Google ML, Amazon ML or MS Azure
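Implementing the prediction step independently is simplest for a linear model, where a prediction is just a dot product plus an intercept. The sketch below stays in Python with NumPy for brevity; `predict_linear` is a hypothetical helper and the synthetic data is an assumption. The same weights could be exported to any other language.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([0.5, -1.0, 2.0, 0.0, 3.0]) + 1.5
model = LinearRegression().fit(X, y)

# Extract the learned weights once
coef = model.coef_.copy()
intercept = float(model.intercept_)

def predict_linear(x):
    """Hypothetical helper: predict one sample without any input validation."""
    return float(x @ coef + intercept)

custom = predict_linear(X[0])
reference = float(model.predict(X[:1])[0])
```

This skips all of scikit-learn's input validation, so it should only be used when the input format is guaranteed by the surrounding system.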

As you can see, there are numerous ways to influence inference time, ranging from fundamental approaches to simple tricks. Changing the data structure or implementing algorithms from scratch with efficiency in mind may be more involved, while the simple tricks can easily be applied even to existing systems that use scikit-learn.

Note that none of the above approaches affects prediction quality, except for reducing the number of features and reducing model complexity. For these two approaches, it is important to evaluate the trade-off between prediction speed and quality.

In this blog post, we mostly benchmark approaches that do not affect prediction quality, and therefore focus on evaluating the speedup in the next section.

Evaluating some speedup tricks

Check out the technical appendix to see how the time measurement is performed.

Reducing the number of features by half (in our case from 106 to 53 features) only leads to small reductions in inference time for KNN and SVR, while it had a major influence on the MLP. Disabling scikit-learn's finiteness check, which takes just one line of code, improves prediction speed more significantly. As can be seen below, inference time can be reduced by up to 40% depending on the algorithm. Utilizing the Intel extension for scikit-learn, which also requires only one line of code, results in substantial speed improvements for random forest, SVR and the KNN regressor. For the latter two algorithms, a time reduction of more than 50% could be achieved, while for random forest, prediction time decreases by an impressive 98%. The plots below show no values for the other algorithms, as the Intel extension currently does not support them.
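For reference, enabling the Intel extension could look like the sketch below. It assumes the `scikit-learn-intelex` package is installed and falls back to stock scikit-learn otherwise; the synthetic data and the `KNeighborsRegressor` settings are assumptions for illustration.

```python
# Patch first: estimators imported afterwards use the accelerated versions
try:
    from sklearnex import patch_sklearn
    patch_sklearn()
    intel_patched = True
except ImportError:
    intel_patched = False  # package not installed, stock scikit-learn is used

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = X[:, 0] + rng.normal(scale=0.1, size=1000)

model = KNeighborsRegressor(n_neighbors=5).fit(X, y)
pred = model.predict(X[:10])
```

The rest of the code stays unchanged, which is what makes this option attractive for existing systems.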

As can be seen below, the greatest potential lies in bulk inference. By predicting several samples simultaneously (here: 100 or 1000 samples at once), the average prediction time per sample decreases significantly for most of the algorithms. Overall, bulk prediction can lead to speed increases of up to 200-fold in this test setting. This approach is particularly effective for the MLP as well as for linear and tree-based methods.
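The difference between atomic and bulk prediction can be sketched as follows. The synthetic data, the small `GradientBoostingRegressor`, and the helper names `atomic_predict` and `bulk_predict` are assumptions for illustration; the actual measurement procedure is described in the technical appendix.

```python
import timeit
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in data (assumption)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=2000)
model = GradientBoostingRegressor(n_estimators=50, random_state=0)
model.fit(X[:1000], y[:1000])
X_test = X[1000:1100]  # 100 test samples

def atomic_predict(model, X):
    """One predict() call per sample: the call overhead is paid every time."""
    return np.array([model.predict(X[i:i + 1])[0] for i in range(len(X))])

def bulk_predict(model, X):
    """A single predict() call for all samples at once."""
    return model.predict(X)

# Average time per sample, taking the best of three repetitions
t_atomic = min(timeit.repeat(lambda: atomic_predict(model, X_test),
                             repeat=3, number=1)) / len(X_test)
t_bulk = min(timeit.repeat(lambda: bulk_predict(model, X_test),
                           repeat=3, number=1)) / len(X_test)
print(f"atomic: {t_atomic:.2e} s/sample, bulk: {t_bulk:.2e} s/sample")
```

Both functions return the same predictions; only the number of `predict()` calls differs. In a live system, bulk prediction requires that requests can be collected into batches before they are scored.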

Summary

Fast predictions are crucial for various use cases, in particular when it comes to real-time predictions. Moreover, investing in efficiency always pays off by reducing energy consumption, thus saving money and at the same time lowering carbon emissions.

In this blog post we have explored multiple ways to achieve faster prediction times. Firstly, the dimensionality of the data and the algorithm chosen have a major influence on inference speed and scalability behaviour. Beyond that, there are various tricks to accelerate even existing scikit-learn code. Disabling scikit-learn's finite data validation or utilizing the Intel extension for supported algorithms can already yield considerable improvements, depending on the algorithm. The most substantial gains, however, can be achieved by addressing fundamental aspects, such as reducing the number of features (in particular for high-dimensional data), implementing bulk prediction or writing custom prediction methods. These strategies can result in speed increases by factors of several hundred.

In our small test setting, we could additionally show that a small neural network, gradient boosted regressor and random forest appear to be the most promising choices in terms of both accuracy and speed, when using the above-mentioned speedup tricks.

Sources

[1] https://storage.googleapis.com/kaggle-media/surveys/Kaggle%20State%20of%20Machine%20Learning%20and%20Data%20Science%202020.pdf

[2] House sales: House Sales in King County, USA; red wine quality: Red Wine Quality; avocado prices: Avocado Prices; medical insurance costs: Medical Cost Personal Datasets (all Kaggle datasets)

[3] 8. Computing with scikit-learn — scikit-learn 0.23.2 documentation

Technical Appendix

Speed tests were performed with all unnecessary background processes stopped.

Inference time measurement for one test sample (“atomic prediction”):

import timeit
import numpy as np

n = 500  # number of consecutive runs
r = 10  # number of repeats of the above

# X_test[0:1] keeps the single sample two-dimensional, as predict() expects
pred_times = timeit.repeat(stmt=lambda: model.predict(X_test[0:1]),
                           repeat=r, number=n)
pred_times = np.array(pred_times) / n  # average time per run
pred_time = np.min(pred_times)  # take the minimum over all repetitions

Inference time measurement for several samples at once (“bulk prediction”):

n = 50  # number of consecutive runs
r = 5  # number of repeats of the above

X_test_sample = X_test[0:1000]  # 100 or 1000 samples
pred_times = timeit.repeat(stmt=lambda: model.predict(X_test_sample),
                           repeat=r, number=n)
pred_times = np.array(pred_times) / n  # average time per run
pred_times = pred_times / len(X_test_sample)  # average time per sample
pred_time = np.min(pred_times)  # take the minimum over all repetitions

Here, “model” refers to the scikit-learn models mentioned above, which were trained on the first 10,000 observations of the “house sales” data using default model parameters.

Versions used:

Python: 3.9.7
Scikit-learn: 1.0.2
Scikit-learn-intelex: 2021.20210714.120553

Contact

Christoph Hasenzagl

TIMETOACT GROUP Österreich GmbH
