Data Visualization — How to Pick the Right Tool ?

Photo by Firmbee.com on Unsplash

A good starting point for comparing the capabilities, advantages, and disadvantages of analytic platforms for data science is Gartner’s recently published Magic Quadrant report on data science and machine learning (DSML) platforms.

Gartner Magic Quadrant

Gartner evaluates vendors based on how well their data science platform supports a cohesive, complete, end-to-end pipeline. Commercial availability and customer adoption of the platforms were major factors in the vendor ranking. There is special emphasis on the availability of the platform in the public cloud, hybrid cloud, and on-premises data centers.

How does a Gartner Magic Quadrant work?

A Magic Quadrant provides a graphical competitive positioning of four types of technology providers, in markets where growth is high and provider differentiation is distinct:

1st. Leaders execute well against their current vision and are well-positioned for tomorrow.

2nd. Visionaries understand where the market is going or have a vision for changing market rules, but do not yet execute well.

3rd. Niche Players focus successfully on a small segment, or are unfocused and do not out-innovate or outperform others.

4th. Challengers execute well today or may dominate a large segment, but do not demonstrate an understanding of market direction.

Image from Gartner

Gartner Magic Quadrant for Data Science and Machine Learning Platforms

The magic quadrant report is one of the most credible, genuine, and authoritative pieces of research from Gartner. Since it influences the buying decision of enterprises, vendors strive to get a place in the report.

Gartner published its magic quadrant report on Data Science and Machine Learning (DSML) platforms in March 2021.

The evaluation criteria, based on the experience of expert data scientists and other professionals working in these roles, fall along two axes: each vendor’s ability to execute and its completeness of vision.
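To make the two evaluation axes concrete, here is a minimal Python sketch of how a pair of scores maps onto the four quadrants. Gartner does not publish numeric scores, so the 0-to-1 scale, the 0.5 midpoint, and the names VendorScore and quadrant are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorScore:
    """Hypothetical scores on the two Magic Quadrant axes (0.0 to 1.0)."""
    name: str
    ability_to_execute: float      # vertical axis (assumed scale)
    completeness_of_vision: float  # horizontal axis (assumed scale)


def quadrant(score: VendorScore, midpoint: float = 0.5) -> str:
    """Return the quadrant name for a vendor, given an assumed midpoint."""
    strong_execution = score.ability_to_execute >= midpoint
    strong_vision = score.completeness_of_vision >= midpoint

    if strong_execution and strong_vision:
        return "Leaders"        # executes well and sees where the market goes
    if strong_execution:
        return "Challengers"    # executes well, weaker on market direction
    if strong_vision:
        return "Visionaries"    # strong vision, does not yet execute well
    return "Niche Players"      # focused on a small segment


if __name__ == "__main__":
    # Made-up vendors, not real Gartner data.
    for vendor in (
        VendorScore("Vendor A", 0.8, 0.9),
        VendorScore("Vendor B", 0.8, 0.2),
        VendorScore("Vendor C", 0.2, 0.9),
        VendorScore("Vendor D", 0.2, 0.2),
    ):
        print(f"{vendor.name}: {quadrant(vendor)}")
```

The sketch only shows the geometry of the chart: a vendor’s position is relative to the other vendors in the same report, and the real placement comes from Gartner’s analysts, not from a fixed threshold.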

Image from Gartner

The Leadership Quadrant

According to Gartner, leaders have a strong presence and significant mindshare in the Data Science and Machine Learning (DSML) market. They demonstrate strength in depth and breadth across the full data exploration, model development, and operationalization process.

For 2020’s magic quadrant, six companies made it to the leadership quadrant. What these vendors have in common is a proven track record of delivering end-to-end data science platforms. SAS, TIBCO, and MathWorks have a heritage of building data and analytics-based platforms. Interestingly, some younger companies, such as Alteryx, Databricks, and Dataiku, have also found a place in the top-right quadrant.

All the vendors in the leadership quadrant offer commercially viable, platform-agnostic, mature data science platforms.

Dataiku DSS

Dataiku is the only all-in-one and centralized data platform that moves businesses along their data journey from analytics at scale to Enterprise AI, powering self-service analytics while also ensuring the operationalization of machine learning models in production.

Alteryx

This platform was conceived to set up governed domains with the centralized server environment of Alteryx Analytics Hub (AAH). It lets you control the publishing, cataloging, sharing, and governance of assets and analytic workflows, as well as easily scale, organize, search, and track assets. AAH includes built-in scheduling and multi-tenancy.

Databricks

The Data Science Workspace is a collaborative environment for practitioners to run all analytic processes in one place, and manage ML models across the full lifecycle.

Product Components:

Collaborative Notebooks

Machine Learning Runtime

Managed MLflow

TIBCO

TIBCO Data Science software helps organizations innovate and solve complex problems faster to ensure predictive findings quickly turn into optimal outcomes.

SAS

SAS Visual Analytics provides a single application for reporting, data exploration and analytics.

MathWorks

MATLAB makes data science easy with tools to access and preprocess data, build machine learning and predictive models, and deploy models to enterprise IT systems.

The Challengers Quadrant

Gartner describes challengers as vendors with an established presence, credibility, viability, and robust product capabilities that fall a few notches short of becoming leaders.

IBM is the lone vendor in the challengers’ quadrant. Its data science products, with Watson Studio as the ML PaaS and IBM Cloud Paks for on-premises deployment, make IBM a unique player in the market. But IBM’s frequent rebranding and renaming of its product portfolio hurts its ability to establish Watson as a top brand for ML and AI.

IBM

IBM Watson Studio provides tools to more easily work and collaborate with data to create and train models at scale. It gives you the flexibility to build models where your data resides and to deploy them anywhere in a hybrid environment, so you can get data science up and running faster.

The Niche Players Quadrant

According to Gartner, Niche Players demonstrate strength in a particular industry or approach, or pair well with a specific technology stack. They should be considered by buyers in their particular niche.

Altair and Anaconda share the bottom-left quadrant. Altair leapfrogged into this quadrant through the acquisition of Datawatch, which had in turn acquired Angoss. Both Datawatch and Angoss were part of the niche players’ quadrant in 2019.

Anaconda is a focused data science company that offers both open source and commercial platforms. A massive community combined with the simplification of Python and R-based libraries and packages makes Anaconda a niche player in the market.

Anaconda

Anaconda Enterprise supports your organization no matter the size, easily scaling from a single user on one laptop to thousands of machines with failover controls and security built in. No headaches, no IT nightmares.

Image from Anaconda

Altair

Altair has solutions designed for many different skill sets: experienced data scientists, IT/MLOps engineers, data engineers, business analysts, and executives. They connect to and transform almost any data source, from structured databases to real-time streams and cloud data sources. The company’s focus on an extensible data analytics platform means that Altair will not disrupt existing analytic investments.

The Visionaries Quadrant

Gartner describes visionaries as companies whose products have the potential to influence the market. They range from early-stage startups to well-established platform companies, but their product offerings may still be new and emerging.

In the 2020 DSML MQ report, this is the most crowded quadrant. With seven diverse players sharing the space, it is an interesting one to watch.

DataRobot, Domino, Google, H2O.ai, KNIME, Microsoft, and RapidMiner are the DSML visionaries for 2020. Given the marketing campaigns from Google and Microsoft, an average user might assume that they would be in the leaders’ quadrant. But by Gartner’s criteria, both Microsoft and Google lack a viable on-premises DSML platform, which works against their score.

Using the Gartner Magic Quadrant as a first step toward understanding the technology providers will help you and your team weigh the various investment opportunities.

Keep in mind that focusing on the leaders’ quadrant isn’t always the best course of action. There are good reasons to consider market challengers. And a niche player may support your needs better than a market leader. It all depends on how the provider aligns with your business goals.

DataRobot

DataRobot’s Automated Machine Learning product accelerates the productivity of your data science team while increasing your capacity for AI by empowering existing analysts to become citizen data scientists. This enables your organization to open the floodgates to innovation and start your intelligence revolution today.

Domino

Domino centralizes data science work and infrastructure across the enterprise for collaboratively building, training, deploying, and managing models — faster and more efficiently. With Domino, data scientists can innovate faster, teams reuse work and collaborate more, and IT teams can manage and govern infrastructure.

KNIME

KNIME Analytics Platform is the open-source software for creating data science. Intuitive, open, and continuously integrating new developments, KNIME makes understanding data and designing data science workflows and reusable components accessible to everyone.

Microsoft Azure

Microsoft provides a full spectrum of analytics resources for both cloud and on-premises platforms. They can be deployed to make the execution of your data science projects efficient and scalable. The Team Data Science Process (TDSP) provides guidance for teams implementing data science projects in a trackable, version-controlled, and collaborative way.

The analytics resources available to data science teams using the TDSP include:

Data Science Virtual Machines (both Windows and Linux CentOS)

HDInsight Spark Clusters

Azure Synapse Analytics

Azure Data Lake

HDInsight Hive Clusters

Azure File Storage

SQL Server 2019 R and Python Services

Azure Databricks

Google Cloud Platform

For those of you not familiar with Google Cloud, the Google Cloud Platform (GCP) is a suite of cloud-based computing services designed to support a range of common use cases, from hosting containerized applications, such as a social media app, to massive-scale data analytics platforms and the application of advanced machine learning and AI.

Google Cloud is one of three major cloud providers currently in the marketplace, the other two being Microsoft Azure and Amazon Web Services (AWS).

H2O.ai

H2O is a fully open-source, distributed in-memory machine learning platform with linear scalability. H2O supports the most widely used statistical & machine learning algorithms including gradient boosted machines, generalized linear models, deep learning and more. H2O also has an industry-leading AutoML functionality that automatically runs through all the algorithms and their hyperparameters to produce a leaderboard of the best models. The H2O platform is used by over 18,000 organizations globally and is extremely popular in both the R & Python communities.

RapidMiner

RapidMiner Studio is a comprehensive data science platform with visual workflow design and full automation. It has the following features:

Visual Workflow Designer

Connect to Any Data Source

Automated In-Database Processing

Data Visualization & Exploration

Data Prep & Blending

Visual & Automated Machine Learning

Model Validation

Explainable Models Not Black Boxes

Get More From R & Python Code

Flexible Scoring & Model Operations

Automation & Process Control

Open & Extensible

Gartner MQ for DSML platforms: Outliers.

Surprisingly, Gartner placed AWS in the honorable mentions along with SAP, Oracle, and Teradata; arguably, AWS deserved a place in the visionaries’ quadrant. Vendors such as Cloudera, FICO, and Iguazio also found a place in the honorable mentions.

This document has described the main elements of the Gartner Magic Quadrant and its evaluation criteria, and has covered each vendor within its corresponding quadrant.