Dataiku: What is it? How to use it? Ultimate Guide 2023
Dataiku is an artificial intelligence platform created in France in 2013. It has since become one of the world’s reference platforms for data science and machine learning.
What is Dataiku?
Dataiku is a data science platform of French origin. It has historically stood out for being highly packaged and integrated, which puts it within reach of both experienced and novice data scientists. Thanks to its usability, you can create a model in a few clicks while the platform industrializes the entire processing chain in the background: data collection, data preparation, etc.
Dataiku was co-founded in Paris in 2013 by Florian Douetteau, its current CEO, and Clément Stenac (both ex-Exalead), alongside Thomas Cabrol and Marc Batty, and has grown rapidly since. In 2015, the company established itself in the United States. After raising $101 million in 2018, Dataiku closed a $400 million round in 2021 at a valuation of $4.6 billion. The company has more than 1,000 employees and more than 300 customers among the largest global groups, including the French companies Accor, BNP Paribas, Engie and SNCF.
Dataiku DSS, what is it?
Dataiku DSS (for Dataiku Data Science Studio) is the name of Dataiku’s AI platform.
What are the features of Dataiku?
The Dataiku platform has around 90 features that can be grouped into several major areas:
- Integration. The platform integrates with Hadoop and Spark, as well as with the AWS, Azure and Google Cloud services. In total, the platform is equipped with more than 25 connectors.
- Plugins. A gallery of more than 100 plugins gives access to third-party applications in many areas: translation, NLG, weather, recommendation engines, data import/export…
- Data preparation / data ops. A graphical console handles data preparation. Time series and geospatial data are supported. More than 90 prepackaged data transformers are available.
- Development. Dataiku supports Jupyter notebooks and the Python, R, Scala, SQL, Hive, Pig and Impala languages, as well as PySpark, SparkR and SparkSQL.
- Machine Learning. The platform includes a machine learning automation engine (auto ML), a visualization console for training deep neural networks, support for Scikit-learn and XGBoost, etc.
- Collaboration. Dataiku integrates project management, chat, wiki, versioning (via Git)…
- Governance. The platform offers a model monitoring and auditing console, as well as a feature store.
- MLOps. Dataiku handles model deployment. It supports Kubernetes architectures as well as the managed Kubernetes offerings from AWS, Azure and Google Cloud.
- Data Visualization. A statistical visualization interface is complemented by 25 data visualization charts to identify relationships and insights within datasets.
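To make the data preparation idea above more concrete, here is a minimal sketch of a chain of small, reusable transformers applied in order. It is plain Python and does not use the Dataiku API. The helper names (drop_missing, normalize_text, to_float, run_pipeline) and the sample rows are hypothetical and chosen only for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Step = Callable[[List[Row]], List[Row]]


def drop_missing(column: str) -> Step:
    """Build a step that removes rows where `column` is absent or empty."""
    def step(rows: List[Row]) -> List[Row]:
        return [dict(r) for r in rows if r.get(column) not in (None, "")]
    return step


def normalize_text(column: str) -> Step:
    """Build a step that trims whitespace and lowercases string values."""
    def step(rows: List[Row]) -> List[Row]:
        out = []
        for r in rows:
            value = r.get(column)
            new_row = dict(r)  # copy so the input rows are never mutated
            if isinstance(value, str):
                new_row[column] = value.strip().lower()
            out.append(new_row)
        return out
    return step


def to_float(column: str) -> Step:
    """Build a step that converts `column` to a float."""
    def step(rows: List[Row]) -> List[Row]:
        return [{**r, column: float(r[column])} for r in rows]
    return step


def run_pipeline(rows: List[Row], steps: List[Step]) -> List[Row]:
    """Apply each preparation step in order and return the prepared rows."""
    for step in steps:
        rows = step(rows)
    return rows


# Hypothetical sample data: one row has no temperature reading.
raw = [
    {"city": " Paris ", "temperature": "21.5"},
    {"city": "LYON", "temperature": ""},
    {"city": "Lille", "temperature": "18"},
]

steps = [drop_missing("temperature"), normalize_text("city"), to_float("temperature")]
prepared = run_pipeline(raw, steps)
print(prepared)
```

Each step is an independent function, so steps can be reordered or reused across datasets. That is roughly the idea behind a library of prepackaged transformers, although the real platform exposes them through its graphical console rather than through code like this.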
What is the price of Dataiku?
Dataiku offers a free edition of its platform that you install yourself. Called Dataiku Free, it is limited to three users but gives access to the majority of features. It is available for Windows, Linux, macOS, Amazon EC2, Google Cloud and Microsoft Azure.
To go further, Dataiku markets three editions whose prices are available on request: Dataiku Discover for small teams, Dataiku Business for medium-sized teams, and Dataiku Enterprise to deploy the platform at the scale of a large enterprise.
What is Dataiku Online?
Mainly designed for small organizations, Dataiku Online makes it possible to manage data science projects on a moderate scale. It is a SaaS (Software as a Service) offering. Its features are similar to those of the installed Dataiku platform, but setting up and launching the application is faster.
Dataiku Academy: Dataiku training and certification
The Dataiku Academy brings together a series of online training courses on the Dataiku platform. It offers a Quick Start program that allows you to start using the solution in a few hours, as well as Learning Paths to acquire more advanced skills. Each program leads to a Dataiku certification: Core Designer Certificate, ML Practitioner Certificate, Advanced Designer Certificate, Developer Certificate and MLOps Practitioner Certificate.
Dataiku vs. DataRobot
Created in 2012, the American company DataRobot can be considered the historical pure player of automated machine learning (auto ML), a field that Dataiku entered later. As they develop, the two platforms are becoming more and more comparable.
Compared to DataRobot, however, Dataiku stands out on the collaboration front. The vendor keeps adding features in this area: wiki, sharing of results dashboards, role management, an action traceability system, etc.
Dataiku vs. Alteryx
While Dataiku is primarily a machine learning-oriented data science platform, Alteryx is positioned as a business intelligence solution potentially targeting any business decision maker, well beyond data science teams.
The main added value of Alteryx is to automate the creation of analytics dashboards, which may include predictive indicators based on machine learning models. With this in mind, Alteryx integrates automated machine learning (auto ML) features that allow users to generate this type of indicator. This is its main point in common with Dataiku.
Dataiku vs. Databricks
Dataiku and Databricks are very different platforms. The first focuses on data science, that is, the design and deployment of machine learning models. The second is a universal data platform that covers data warehouse, BI and data lake use cases, as well as data streaming and distributed computing.
Still, Databricks is increasingly being enriched with machine learning-oriented features. The San Francisco company acquired the low-code / no-code data science environment 8080 Labs in October 2021, then the MLOps platform Cortex Labs in April 2022. It is in the process of integrating both technologies.
Dataiku Community: tutorials and documentation
Dataiku Community is a space for exchange and documentation to perfect your knowledge of Dataiku and its fields of application. After registration, it is possible to join the discussion forum.
ABOUT LONDON DATA CONSULTING (LDC)
We, at London Data Consulting (LDC), provide all sorts of Data Solutions. This includes Data Science (AI/ML/NLP), Data Engineering, Data Architecture, Data Analysis, CRM & Lead Generation, Business Intelligence and Cloud solutions (AWS/GCP/Azure).
For more information about our range of services, please visit: https://london-data-consulting.com/services
If you are interested in working for London Data Consulting, please visit our careers page at https://london-data-consulting.com/careers
More info on: https://london-data-consulting.com