Predictive Analytics – Automated vs Traditional

Maciek Wasiak

Exec Summary:

  • Traditional Predictive Analytics is too slow and inefficient for today’s business needs.
  • Automated Predictive Analytics is the future.
  • Marketing noise and false messaging are plaguing the Predictive Analytics market.

Disclosure: this is not an article designed with a hidden agenda to promote one approach over another. On the contrary, as the developers of Xpanse AI, we are very open about our firm belief in the automated way of delivering Predictive Analytics.

Traditional vs Automated – what’s the difference?

Despite the red-hot hype making it sound like something new – Predictive Analytics has been around for decades.

The number of tools you can use to deliver Predictive Models has grown enormously in the last few years, yet the amount of effort spent on delivering models has not changed much.

It’s important to understand why.

It’s because the software solutions have been focusing on the “sexy” Machine Learning (aka Modelling) stage of Predictive Analytics. Modelling has always been quick and takes only a small fraction of the project time, hence any improvements in that stage have a marginal impact on the entirety of project effort and timelines.

The bulk of the effort goes into data preparation, where Data Scientists transform disparate data sources into a cohesive, flat Modelling Dataset – and this is typically 80%-90% of project effort. More about that further on.
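To see why speeding up Modelling barely moves the overall timeline, here is a small back-of-the-envelope sketch in Python. The 85%/15% split is an assumption picked from within the 80%-90% range quoted above, and the 100x speed-up factor and the function name remaining_time are purely illustrative.

```python
def remaining_time(stage_share: float, stage_speedup: float) -> float:
    """Fraction of the original project time left after speeding up one stage.

    stage_share   -- share of total project time spent in that stage (0..1)
    stage_speedup -- how many times faster that stage becomes (> 0)
    """
    if not 0.0 <= stage_share <= 1.0:
        raise ValueError("stage_share must be between 0 and 1")
    if stage_speedup <= 0:
        raise ValueError("stage_speedup must be positive")
    # The untouched stages keep their share; the accelerated stage shrinks.
    return (1.0 - stage_share) + stage_share / stage_speedup


# Assumed split: 85% data preparation, 15% everything else (incl. Modelling).
faster_modelling = remaining_time(stage_share=0.15, stage_speedup=100)
faster_data_prep = remaining_time(stage_share=0.85, stage_speedup=100)

print(f"100x faster Modelling: {faster_modelling:.2%} of the original time")
print(f"100x faster data prep: {faster_data_prep:.2%} of the original time")
```

Under these assumed numbers, even a hundredfold faster Modelling stage leaves roughly 85% of the project time in place, while the same speed-up applied to data preparation leaves roughly 16%.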

The truth is that “Traditional” Predictive Analytics delivery typically takes enough time and effort to threaten the ROI of the whole initiative – and data preparation is the main cause of that.

We offer a paradigm shift that replaces the manual data preparation stage with automated processing, shortening the overall project time-frame from months to days.

One might say that we are not the first to claim that – e.g. H2o or DataRobot (and others) use similar messaging when advertising their products.

We will debunk that below, but first a crash course on Predictive Analytics.

The Way of our Fathers – Traditional Predictive Analytics

While CRISP-DM defines six stages in the analytical process – in practical delivery there are really three major development steps plus deployment.

Step 1: Target Definition

This stage involves defining the behaviour that you would like your model to predict, e.g. customer churn, product cross-sell or fraudulent activity.

This is very much the core starting point where the future benefits of the project may be won or lost.

Experienced veterans may spend days and sometimes weeks analysing historical usage patterns, brooding over human nature, triggers in our minds, false flags in the data and time-to-action.

It seems that the more experience we have, the more challenging the task appears.

Each Target usually converts into a separate Modelling project.
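As a rough illustration of what a Target definition can look like once it is written down, the sketch below labels churn in Python. The rule itself (no activity in the 30 days after a cutoff date), the function name churn_target and the sample customers are all hypothetical; a real definition comes out of exactly the kind of analysis described above.

```python
from datetime import date, timedelta


def churn_target(activity_dates, cutoff, window_days=30):
    """Return 1 if there is no activity in the window after the cutoff, else 0.

    The 30-day window is an assumed business rule, not a standard.
    """
    window_end = cutoff + timedelta(days=window_days)
    active = any(cutoff < d <= window_end for d in activity_dates)
    return 0 if active else 1


# Hypothetical activity history: customer id -> dates with any usage.
activity = {
    "A": [date(2024, 2, 10), date(2024, 3, 12)],  # comes back after the cutoff
    "B": [date(2024, 1, 20), date(2024, 2, 25)],  # goes silent after the cutoff
}

cutoff = date(2024, 3, 1)
targets = {cid: churn_target(dates, cutoff) for cid, dates in activity.items()}
print(targets)
```

Changing the window length or what counts as "activity" gives a different Target, and therefore usually a different model.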

Step 2: Feature Engineering

We define Feature Engineering as a process of transforming the source data – typically residing in a raw non-aggregated form – to the format of a Modelling Dataset, ready to be pushed through Machine Learning.

Typically, we start with a database full of diverse tables, none of which is digestible by Machine Learning. 

What we have to do is to transform this data to a format ready for Machine Learning by creating so-called Features (aka Variables, Input Columns, Factors) which roll the raw data-points up to the Target level.
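The roll-up described above can be sketched in a few lines of plain Python. This is a minimal illustration only: the transactions table, its column names and the four Features are hypothetical, and a real project would typically do this in SQL or a data-frame library across many tables.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw, non-aggregated source table: one row per transaction.
transactions = [
    {"customer_id": "A", "date": date(2024, 1, 5), "amount": 20.0},
    {"customer_id": "A", "date": date(2024, 2, 17), "amount": 35.0},
    {"customer_id": "B", "date": date(2024, 1, 20), "amount": 10.0},
]


def build_modelling_dataset(rows, cutoff):
    """Roll transactions up to one row per customer (the Target level)."""
    grouped = defaultdict(list)
    for row in rows:
        # Only history up to the cutoff may be used to build Features.
        if row["date"] <= cutoff:
            grouped[row["customer_id"]].append(row)

    dataset = []
    for customer_id, history in sorted(grouped.items()):
        amounts = [r["amount"] for r in history]
        last_date = max(r["date"] for r in history)
        dataset.append({
            "customer_id": customer_id,
            "txn_count": len(history),
            "total_amount": sum(amounts),
            "avg_amount": sum(amounts) / len(amounts),
            "days_since_last_txn": (cutoff - last_date).days,
        })
    return dataset


modelling_dataset = build_modelling_dataset(transactions, cutoff=date(2024, 3, 1))
for row in modelling_dataset:
    print(row)
```

Every Feature here is a hand-written aggregation, and each new idea (spend in the last 7 days, largest single purchase, and so on) means another one.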

This process consumes ridiculous amounts of effort. This is exactly why there is a consensus in the Data Science community that “Data preparation takes 80%-90% of the project time”.

It is manual, slow and error-prone, and it is also completely open-ended because there is no limit to the number of Features we can generate from the source data.

Each additional Feature can increase the future model’s performance – but with the project clock ticking, eventually we have to move on to the Machine Learning (aka Modelling) task, no matter how satisfied (or not) we are with the final Modelling Dataset.

Step 3: Machine Learning

The Machine Learning part, the “magic” in the Data Scientist’s arsenal of skills, actually takes very little time.

Why? Because it is automated. Always has been.

No one is coding up a Neural Network or a Random Forest manually from scratch. We let the pre-built algorithms do the job.

The Future – Automated Predictive Analytics

Ask yourself how much work is needed before you can use a tool, and then again how much after you have finished using it.

What we mean by “Automated” is the ability for the software to replace as much manual effort as possible, eliminating bottlenecks in the process.

DataRobot hails “Automated Predictive Modelling” as their tagline – which is quite accurate, considering their efforts to streamline the Modelling part of the project.

However, their further messaging that “DataRobot does in hours what used to take months” muddies the waters instantly. Predictive Modelling never takes months. In my first encounters with Predictive Modelling software in the nineties, we were already building models within minutes of processing.

The catch for the users of DataRobot is that they still need to prepare the Modelling Dataset outside their tool – and this, indeed, still takes weeks or months.

Another vendor, H2o, claims “Automated Feature Engineering”, which sounds exactly like what we need. But as long as they require a single Modelling Dataset as an input (which they do), we would brand this claim as seriously misleading: the users still have to integrate the data, create the Features and bundle them into a Modelling Dataset outside the H2o tool.

It seems that anytime you see the claim “Automated Predictive Analytics” it is key to look at your source data and consider how much work is needed before you can use the tool and also – how much work is needed after you generate your model.

This is why we developed Xpanse AI.

Xpanse AI (www.xpanse.ai) digests multiple non-aggregated data sources and, after a few clicks, runs automatically through the entire process, outputting insightful visuals and the final Model ready to be deployed on your Data Warehouse.

With weeks of manual labour taken out of the equation – Data Scientists are free to focus on finding relevant Targets and delivering all the models that their business needs in a matter of days.

And if you still like some of the functionality of other tools – you can always export the automatically built Modelling Dataset and use it with your existing tools of choice, be they R, Python, DataRobot, H2o or any other modelling tool.

Your data – your choice.

Maciek

CEO of Xpanse AI