Reality Check on the Data Science Hype in the Post-COVID-19 World

We took a look at how reality may change for the cosy world of data science and the Data Scientists working in it.

“Despite thousands of bodies throwing themselves at the problem – the results are not noticeable.”

The impact of Covid-19 will be massive across the globe, affecting every industry and every profession.

An uncomfortable starting point is the recent scramble of Data Scientists offering their tools and time to fight the coronavirus, which exposed how limited the profession can actually be.

Despite thousands of bodies throwing themselves at the problem and numerous tools offered for free to fight the virus – the results are not noticeable.

The smartest people on the planet with the sexiest job and cutting-edge tools did not achieve anything close to remarkable.

Most achieved nothing at all.

It’s not entirely unusual or surprising.

It is unsurprising because this is how the typical Data Science delivery looks behind the corporate walls.

Lots of buzzwords, big promises, PowerPoint decks touting huge potential but in reality… in reality business stakeholders increasingly started voicing their concerns about Data Scientists not delivering ROI.

After many years of admiration for the data wizards congregating in their ivory towers, some senior execs finally gather the courage to admit: “We don’t know what they are doing over there”.

And while they might have cut a cheque in the good times – will they still do it in the era of massive layoffs and the global economy contracting?

To understand what is going on, we need to look back at how it started in the first place.

Creation of Data Scientists

“Weeks of Data Engineering, 1-day training Machine Learning models. Welcome to Data Science.”

The key skill needed to create a Predictive Model is the ability to train Machine Learning (ML) models.

ML learns based on a large pool of examples and without them it’s useless.

And here is the crux of the issue – the real-life data is very rarely laid out as examples.

It’s more like a long-winded story recorded in the database: it introduces the characters, develops them through time, sometimes jumps over plot holes, and then ends at the least expected moment.

ML can’t take any of it.

Hence our ML specialists spend their lives single-handedly transforming the databases into an ML-friendly format.

They improvise, hacking their way through, desperately trying to reshape the data by any means available.
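To make that reshaping concrete, here is a minimal sketch of what it involves, assuming a made-up customer event log. The field names, the churn label, and the `build_examples` helper are all hypothetical and chosen purely for illustration; real pipelines are far messier.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: one row per thing that happened,
# not one row per training example.
events = [
    {"customer": "A", "day": date(2020, 1, 5), "type": "purchase", "amount": 40.0},
    {"customer": "A", "day": date(2020, 2, 11), "type": "complaint", "amount": 0.0},
    {"customer": "A", "day": date(2020, 3, 20), "type": "cancel", "amount": 0.0},
    {"customer": "B", "day": date(2020, 1, 9), "type": "purchase", "amount": 15.0},
    {"customer": "B", "day": date(2020, 2, 2), "type": "purchase", "amount": 25.0},
]


def build_examples(events, cutoff):
    """Collapse each customer's history into one feature row plus a label.

    Features use only events before the cutoff. The label looks at what
    happened on or after it, so the features never include the future.
    """
    rows = defaultdict(
        lambda: {"purchases": 0, "spend": 0.0, "complaints": 0, "churned": 0}
    )
    for e in events:
        row = rows[e["customer"]]
        if e["day"] < cutoff:
            if e["type"] == "purchase":
                row["purchases"] += 1
                row["spend"] += e["amount"]
            elif e["type"] == "complaint":
                row["complaints"] += 1
        elif e["type"] == "cancel":
            row["churned"] = 1
    return dict(rows)


examples = build_examples(events, cutoff=date(2020, 3, 1))
for customer, row in sorted(examples.items()):
    print(customer, row)
```

Each customer's "story" ends up as a single row of numbers, which is the shape an ML model can actually learn from. Almost all of the effort sits in deciding the cutoff, the features, and the label, not in the model training that follows.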

People on the outside are not even trying to understand what is going on.

Over time it may look like this person, now called a Data Scientist, is the only one able to start with the raw data and work their way to a Model.

A unicorn.

It seems, though, that everybody lost sight of the fact that there have long been plenty of professionals around whose job is reshaping data.

They are called Data Engineers.

Weeks of Data Engineering plus 1 day of training Machine Learning models.

Welcome to Data Science.

Data Scientists

Data Scientists are Machine Learning specialists forced to do ad-hoc Data Engineering.

It’s that simple.

And it’s bad for several reasons.

Machine Learning specialists are not trained in Data Engineering best practices.

As such, typically they are not very good at it.

They also don’t like it.

They resent that work and often feel it’s beneath them.

Reality Check

“The slowness of Data Scientists created the impression that the world needs a lot more Data Scientists. But it doesn’t.”

This is a good time to mention that over the last few years almost anybody tasked with independently delivering something, anything, of business value from messy data started calling themselves a Data Scientist, which has left the popular understanding of the role extremely muddied.

How does that link to the “fight Covid-19” fiasco?

Firstly – most of the Data Scientists declaring the “fight with the virus” did not fully realize that there is no data to work on.

It’s the same hubris that promises to change the business with AI and later delivers a logarithmic chart.

Secondly, those who got access to the real data, meaning patient-level data with a medical history, will spend weeks if not months engineering this mess into a format ready for analysis.

Because data engineering executed by Data Scientists is very, very slow.

This slowness created the impression that the world needs a lot more Data Scientists to do the data science.

But it doesn’t.

It’s been three decades – how do we fix this?

“Extracting insights from the data should be a part of the fabric in the organisation – a part of a process, not a project.”

This inefficient, expensive, over-promised, and misguided execution of analytics has been in place for over 3 decades.

In 2018/2019, though, there was a growing realisation that the ROI on slow analytics is not there.

Add the pandemic to the mix, and decision-makers wake up every day thinking: how will yesterday’s news impact us going forward? What changed in our business in the last few days? Are our months-old models any good right now?

So how do we fix this?

Our take via people-process-technology lenses:

People

Business Stakeholders

Business Stakeholders need to pull their heads out of the sand and take a hard look at the data they want to use in decision making.

If their data infrastructure needs radical improvement (it does) then they need to put the money where their mouth is and sponsor this.

Data Scientists

Data Scientists need to accept the reality that in 99% of the cases right now the “data” is not ready for “science”.

It actually is many miles away from it.

Get down from your ivory towers and start delivering FAST using automated data engineering platforms instead of competing on Kaggle, or your ground will be shaky soon.

Some of you got hit already.

Data Engineers

Data Engineers can be the second to get some cheese by quickly upskilling themselves in Machine Learning and jumping on the already slowing Data Science bandwagon.

It’s not that hard and you will kick ass with your data engineering experience.

Processes

Delivering Data Science has been traditionally project-based.

In the current day and age this design is obsolete.

The need for fresh insights and models just jumped up a notch – while the need for analysis older than 2 months evaporated.

In a rapidly changing economy, monitoring changes and extracting insights from the data should be part of the fabric of the organisation: a part of a process, not a project.

Technology

Amid the rampant progress of automation, talking about the world needing thousands of new Data Scientists is madness.

If the data side of the house is fixed and the right technologies are put in place then an average organisation only needs one Data Scientist for their data science efforts.

It’s a C-level role, where a senior exec is responsible for organising the infrastructure that allows Business Users to interact with the data without layers of people in between.

And this is what the new world will expect – being connected directly to the data.

In real-time.

With ease.

Author

  • Maciek Wasiak

    Maciek Wasiak is the CEO of Xpanse AI, has a Ph.D. in AI, and 15 years of experience in leading Data Science delivery across Telecom, Banking, Insurance, Airline, and Healthcare sectors.
