Discover Kedro, McKinsey’s first open source software tool

QuantumBlack, the advanced analytics company we acquired in 2015, has now launched Kedro, an open source tool created specifically for data scientists and engineers. This is a library of code that can be used to build a data and machine learning pipeline.

Many data scientists must routinely clean, process, and compile data, tasks that may not be their favorite pursuits but that make up a large part of their day-to-day work. Kedro makes it easier to build a data pipeline that automates this “heavy lifting” and reduces the time spent on such tasks.

“Kedro can change the way data scientists and engineers work,” explains Yetunde Dada, product manager at QuantumBlack, “making it easier to manage large workflows and ensuring consistent code quality throughout a project.”

A small step

Until now, McKinsey had never created a publicly available open source tool.

“This represents a big step for the company,” notes Jeremy Palmer, CEO of QuantumBlack, “as we continue to balance the value of proprietary assets with opportunities to engage as a member of the developer community and to accelerate and share our learning.”

Kedro

The name Kedro derives from the Greek word meaning center or core. The name hints at its purpose: open source software that provides crucial code for taking advanced analytics projects into “production.” It is essentially a development workflow framework.

Kedro has two major advantages, according to today’s announcement.

  1. It makes it easier for teams to collaborate by structuring analytical code in a uniform way so that it runs seamlessly through all stages of a project. This can include consolidating data sources, cleaning data, building features, and feeding data into machine learning models for explanatory or predictive analysis.
  2. Kedro also helps deliver “production ready” code, which makes it easy to integrate into a business process.
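
As a rough illustration of the first point, the stages listed above can be written as small functions that are wired together by the names of their inputs and outputs. The sketch below is plain Python and does not use Kedro’s actual API; the Node class, the run_pipeline function, and the sample data are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Node:
    """One pipeline step: a function plus the names of its inputs and output."""
    func: Callable
    inputs: List[str]
    output: str


def run_pipeline(nodes: List[Node], data: Dict[str, object]) -> Dict[str, object]:
    """Run nodes in dependency order, regardless of declaration order."""
    data = dict(data)  # work on a copy so the caller's dictionary is untouched
    pending = list(nodes)
    while pending:
        # A node is ready once every input it names is available.
        ready = [n for n in pending if all(name in data for name in n.inputs)]
        if not ready:
            missing = sorted({i for n in pending for i in n.inputs if i not in data})
            raise ValueError(f"Unresolvable inputs: {missing}")
        for n in ready:
            data[n.output] = n.func(*[data[name] for name in n.inputs])
            pending.remove(n)
    return data


def consolidate(source_a, source_b):
    """Consolidate two data sources into one list of rows."""
    return source_a + source_b


def clean(rows):
    """Drop rows with a missing amount."""
    return [r for r in rows if r["amount"] is not None]


def build_features(rows):
    """Add a simple derived feature to each row."""
    return [{"amount": r["amount"], "is_large": r["amount"] > 100} for r in rows]


def summarize(features):
    """Stand-in for a model: the share of large transactions."""
    return sum(f["is_large"] for f in features) / len(features)


# Nodes are deliberately declared out of order; names decide the run order.
PIPELINE = [
    Node(summarize, ["features"], "large_share"),
    Node(build_features, ["clean_rows"], "features"),
    Node(clean, ["all_rows"], "clean_rows"),
    Node(consolidate, ["source_a", "source_b"], "all_rows"),
]

# Hypothetical raw inputs, invented for this example.
RAW = {
    "source_a": [{"amount": 50}, {"amount": None}],
    "source_b": [{"amount": 150}, {"amount": 250}],
}

if __name__ == "__main__":
    results = run_pipeline(PIPELINE, RAW)
    print(results["large_share"])
```

Because each step only declares what it consumes and produces, steps can be tested in isolation and rearranged or replaced without touching the others, which is the kind of uniform structure the announcement describes.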

“Data scientists are trained in math, statistics, and modeling, not necessarily in the software engineering principles required to write production code,” says Dada. “Often, converting a pilot project to production code can add weeks to a schedule, which is a problem for customers. Now they can spend less time on code and more time focusing on applying analytics to solve customer problems.”

At the feature level, the open source tool can help teams create data pipelines that are “modular, tested, repeatable in any environment, and versioned, allowing users to access previous data states,” the announcement continues.

“More importantly, the same code can make the transition from a single developer’s laptop to a business project using cloud computing,” explains Ivan Danov, technical manager of Kedro. “And it’s agnostic, working across all industries, models, and data sources.”

Two years in the making: an age in the technology space

Two years in the making, Kedro is the work of two QuantumBlack engineers, Nikolaos Tsaousis and Aris Valtazanos, and QuantumBlack alumnus Peteris Erins, who created it to handle their many workflows. It started as a library of prototypes and was quickly adopted by different teams once they brought it to QuantumBlack Labs, the technical innovation group led by Michele Battelli.

“Client teams can rotate through our lab and have the resources to convert a single piece of software or database [such as Kedro] into a viable product that can be used in all industries and that will be continuously improved,” explains Michele. “It’s a powerful way to innovate; our technical teams can evolve faster and more efficiently, and make a lasting contribution.”

Tried and tested

McKinsey has used Kedro on over 50 projects to date. According to Tsaousis, customers particularly appreciate its pipeline visualization. He explains that Kedro makes conversations much easier, as customers can immediately see the different transformation stages and the types of models involved, and can trace results back to the source of the raw data.

“Kedro started out as a proprietary program, but when a project was completed, customers could no longer access the tool. We had created technical debt,” Tsaousis said. “By converting Kedro to an open source tool, customers can use it after leaving a project. It’s our way of giving back.”

“There is a lot of work ahead, but our hope and vision is that Kedro should help advance the standard on how data and modeling pipelines are built around the world, while also enabling continuous and accelerated learning. There are huge opportunities for organizations to improve their performance and data-driven decision making, but seizing these opportunities at scale and securely is extremely complex and requires intense collaboration,” says Palmer. “We are keenly interested to see what the community does with this and how we can work and learn more quickly together.”

Characteristics

The features provided by Kedro include:

• A standard, easy-to-use project template, allowing collaborators to spend less time understanding how you set up your analysis project.
• Data abstraction, managing how you load and save data so you don’t have to worry about the reproducibility of your code in different environments.
• Configuration management, helping you keep credentials out of your code base.
• Support for test-driven development and industry-standard code quality, reducing operational risk for businesses.
• Modularity, allowing you to break up large chunks of code into smaller self-contained and understandable logical units.
• Pipeline visualization, making it easy to see how your data pipeline is being built.
• Seamless packaging, allowing you to ship your projects to production, for example using Docker or Airflow.
• Versioning, tracking your machine learning data sets and models every time your pipeline runs.
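
To make the data abstraction and versioning items more concrete, here is a minimal sketch in plain Python. It is not Kedro’s API: the VersionedCatalog class, its method names, and the JSON-on-disk layout are assumptions made purely for illustration. Pipeline code asks for a data set by name, and every save writes a new numbered version so that earlier states remain readable.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, List, Optional


class VersionedCatalog:
    """Hypothetical catalog mapping data set names to versioned JSON files."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def versions(self, name: str) -> List[int]:
        """Return the existing version numbers for a data set, oldest first."""
        folder = self.root / name
        if not folder.is_dir():
            return []
        return sorted(int(p.stem) for p in folder.glob("*.json"))

    def save(self, name: str, data: Any) -> int:
        """Write data as a new version and return its version number."""
        existing = self.versions(name)
        version = existing[-1] + 1 if existing else 1
        folder = self.root / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{version}.json").write_text(json.dumps(data))
        return version

    def load(self, name: str, version: Optional[int] = None) -> Any:
        """Load a specific version, or the latest one if none is given."""
        existing = self.versions(name)
        if not existing:
            raise KeyError(f"No saved versions of data set '{name}'")
        chosen = existing[-1] if version is None else version
        if chosen not in existing:
            raise KeyError(f"Data set '{name}' has no version {chosen}")
        return json.loads((self.root / name / f"{chosen}.json").read_text())


if __name__ == "__main__":
    # The storage location is the only environment-specific detail.
    with tempfile.TemporaryDirectory() as tmp:
        catalog = VersionedCatalog(Path(tmp))
        catalog.save("model_input", [1, 2, 3])
        catalog.save("model_input", [1, 2, 3, 4])
        print(catalog.load("model_input"))             # latest version
        print(catalog.load("model_input", version=1))  # an earlier state
```

Since the calling code refers to data only by name, the storage location can differ between a laptop and a production environment without any change to the pipeline itself.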

