With all the hype, books, conferences, fashion and even the so-called sexiest job title, it feels as if Data Science has been here for an eternity. It continues to stir the aspirations of many, from young college graduates to IT leaders who wish to flaunt their Big Data and Data Science investments and speak gleefully about how their new-age Data practice has been successfully integrated with their core businesses.
But the reality is far from pleasant and, in all honesty, every IT leader would quietly accept behind closed doors that for all the investments made in the Data space, the results have been very disappointing. Why is that? Unlike other technology paradigms that quietly reach mainstream success, why have the bottom-line results of data science been more of a farce, and why do so many early adopters still struggle with data?
With all my experience of setting up Data Science teams from the ground up and trying to make them work across large enterprises, I can say the blame does not rest with the maturity of the technology or the promise of Data. Ours is increasingly a data-driven world, and almost everything we consume leaves behind data exhaust, which in turn feeds the cycle of meaningful data-driven products. So that leaves no question about the data-driven philosophy. Rather, it has everything to do with our mindset, organizational culture and the way we apply the tools.
Below is a list of what I consider the primary factors behind the Data Science struggle; it is in no way complete. There could be zillions of other reasons why projects, and not data science projects alone, fail.
1. Cart before the Horse. It is not about the tools or the talent; it has more to do with a clear purpose for what we want to do with Data. The ‘Science’ portion of the name may be new, but industries, and especially the scientific community, have relied on Data for their core decision making and intelligence for ages. One of the popular stories about the early use of Data that I personally like is how the acclaimed astronomer and mathematician Johannes Kepler used the data that his predecessor Tycho Brahe had collected to derive Kepler’s laws and, most importantly, the elliptical orbits of the planets.
Organizations that have been most successful with data are those that have put the Need and Culture before the Means of getting there. Any meaningful result from a Data Science investment comes from a clear purpose about WHY we do what we do, not from a stroke of luck or from the people alone.
2. Dump the Jargon. As Data Science rose to prominence, so did the use of dense vocabulary and ill-defined jargon. It is not just about using regression, Bayesian methods or supervised vs. unsupervised learning. Today we are a society spoilt for choice, and there is too much of everything. Like any other technology innovation passing through the hype cycle before hitting the mainstream, Data Science and its related technologies are crowded and confusing today. What makes it worse is the way jargon blurs the purpose of what needs to get done and leaves the ill-informed wandering around as if lost in a megastore.
One of the primary reasons for failure is that people with the fashionable job title try to show off, overwhelming the lesser mortals in the organization with jargon and complicated words to get simple tasks done. At the end of the day, technology is about simplifying life, never the contrary. You cannot force a combine harvester on someone who wants to pick cherries in his backyard. He could not care less and would not mind doing it by hand.
3. Garbage in, Garbage out. I would rank this as the No. 1 reason for the struggle. How long have we been taught this simple piece of common sense: the output is only as good as the input data? There is no miracle, and unfortunately there is a huge misconception that the new technologies have some spooky power to create analytic wonders.
In the absence of sound Data Management practices, most Big Data implementations turn into a trash can, with analysts struggling to stitch together disparate data elements and make sense of them. Unfortunately, legacy industries cannot wash away their bad data management sins with fancy technology spend. Because of this, a significant amount of time, and I would imagine as much as 70% of the total effort, goes into cleansing, integrating and defining the data elements, which many new data scientists find hard to accept. To be a successful data scientist, you really need to get your hands dirty with the legacy data quality grease. As the saying goes, Data may be the new Oil, but just like its fossil equivalent, Data needs to be refined before you can consume any of its by-products. Unlike in the oil industry, you can neither outsource the refinement nor leave it to a miracle tool. For organizations, it is not Data that is the new oil but refined data; otherwise it is just a gooey mess.
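To make the refinement work concrete, here is a minimal Python sketch of cleansing and integrating records from two sources. Everything in it is an assumption made for illustration: the two source lists, the field names `email` and `city`, and the rule that the first source seen wins are invented, not taken from any real system.

```python
# Hypothetical records from two legacy systems; all names and values are invented.
crm_rows = [
    {"email": " Alice@Example.com ", "city": "new york"},
    {"email": "bob@example.com", "city": "Boston"},
]
billing_rows = [
    {"email": "alice@example.com", "city": "New York"},
    {"email": "", "city": "Chicago"},  # unusable: no key to join on
]


def clean(row):
    """Normalize one record; return None when the join key is missing."""
    email = row["email"].strip().lower()
    if not email:
        return None
    return {"email": email, "city": row["city"].strip().title()}


def integrate(*sources):
    """Merge sources into one record per email; the first source seen wins."""
    merged = {}
    rejected = 0
    for source in sources:
        for row in source:
            cleaned = clean(row)
            if cleaned is None:
                rejected += 1
                continue
            # setdefault keeps the earlier record when the same email reappears
            merged.setdefault(cleaned["email"], cleaned)
    return list(merged.values()), rejected


customers, rejected = integrate(crm_rows, billing_rows)
print(customers)
print("rejected rows:", rejected)
```

Even in this toy case, most of the code deals with whitespace, casing, missing keys and duplicates rather than with any analysis, which is the point of the paragraph above.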
4. Access to Data. This would be in the Top 3 reasons for failure. For all the right reasons, many legacy organizations have strong Information Security controls, which show up as tough bureaucracies that hamper timely access to data. Especially with no common agenda for the Data Science mission across the organization, the data custodians in most cases just don’t care about data access. For the majority of organizations, it is a tightrope walk between tighter Information Security policies and meeting the ‘fancy’ needs of the Data Science teams. Yes, ‘fancy’: that is how Data Science is still perceived across many organizations, and it is a clear indication of how far the Data culture has spread. Not having access to Data is one of the primary reasons for frustration and attrition within the Data Science community.
5. Yes, it is Science! Most traditional organizations are used to building applications in one way, and bad habits die hard. Data Science is all about answering questions with data or finding insights that we never knew existed. Some questions, on Descriptive and to some extent Predictive insights, such as ‘Who are my frequent customers and what do they buy?’ or ‘Who has a higher probability of buying this product?’, can be asked upfront.
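A descriptive question of this kind can often be answered with very little machinery. The sketch below is only an illustration: the purchase log, the customer and product names, and the threshold of two purchases for "frequent" are all invented assumptions.

```python
from collections import Counter, defaultdict

# Invented purchase log of (customer, product) pairs.
purchases = [
    ("alice", "coffee"), ("bob", "tea"), ("alice", "coffee"),
    ("alice", "bagel"), ("carol", "tea"), ("bob", "tea"),
]

# How often each customer bought, and what each customer bought.
visits = Counter(customer for customer, _ in purchases)
baskets = defaultdict(Counter)
for customer, product in purchases:
    baskets[customer][product] += 1


def frequent_customers(min_purchases=2):
    """Customers with at least min_purchases purchases, most frequent first,
    each paired with their most-bought product."""
    return [
        (customer, count, baskets[customer].most_common(1)[0][0])
        for customer, count in visits.most_common()
        if count >= min_purchases
    ]


report = frequent_customers()
print(report)
```

The question is known in advance, so the code is a straightforward count. The experimental side of Data Science, discussed next, does not work that way.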
The other important facet of Data Science is that it really is a Science, which requires a different mindset: forming a hypothesis, collecting data and conducting experiments to either prove or disprove it. Some of these experiments take time, single-minded effort and patience, which unfortunately are in short supply in the majority of firms that judge their performance solely on a quarter-to-quarter basis.
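As a minimal sketch of that hypothesis-and-experiment loop, assume a hypothetical experiment in which one group of customers saw a new store layout (treatment) and another did not (control), and the hypothesis is that the new layout raises basket value. The numbers are invented, and the permutation test used here is just one reasonable choice of method. Strictly speaking, such a test does not prove anything; it only tells us how surprising the observed difference would be if the layout had no effect.

```python
import random
from statistics import mean


def permutation_p_value(control, treatment, n_permutations=10_000, seed=0):
    """One-sided permutation test: how often does random relabelling produce
    a difference in means at least as large as the observed one?"""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    observed = mean(treatment) - mean(control)
    pooled = list(control) + list(treatment)
    n_control = len(control)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_control:]) - mean(pooled[:n_control])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_permutations


# Invented basket values for the two groups.
control = [20, 22, 19, 24, 21, 23, 20, 22]
treatment = [25, 27, 24, 29, 26, 28, 25, 27]

observed, p_value = permutation_p_value(control, treatment)
print("observed difference in means:", observed)
print("share of shuffles at least as extreme:", p_value)
```

A small share means the difference is unlikely to be a fluke of who landed in which group. Collecting enough clean data to run even this simple check is where the time and patience go.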
6. Data Science is a multi-disciplinary talent. Many aspirants ask me what skills one should pick up to be good at Data Science. My answer has always been that Data Science is a multi-disciplinary skill, and that is one of the reasons why ‘real’ Data Scientists are hard to find. To be good at Data Science, you need to wear multiple hats: Data Munging, Programming, Communication, a clear understanding of the problem domain and, more importantly, a sense of Curiosity and a deep Passion to experiment with Persistence. It is really hard to find such talent in an industry that has predominantly gotten used to, and benefitted from, division of labor.
In the absence of such unique talent, Data Science is at best a team sport, with a team that at a minimum includes the Business Teams, the Data Analyst, the Data Scientist and the Data Engineer. Any team sport requires everyone to play their role in perfect synergy, starting with a clear vision and operating methods. No wonder it is a struggle in traditional organizations with distributed responsibilities and no collective data science vision.
7. Infatuation. What could be one of the primary reasons for a data-centric culture in many organizations? The leader happens to be an avid reader and has just finished reading a cool blog on Machine Learning. The genesis of a Data Science investment in many organizations is often a smart database vendor, a conference or a leader with good reading habits. As fashionable a job title as it is to hold, without a meaningful purpose the data science investment is just a shiny car left on the porch for the envy of your neighbors.
Bottom line: the goal is to achieve a data-driven culture and to look at the world objectively with all the tools and possibilities we are endowed with these days. Unfortunately, many organizations lack a clear Data Strategy, and in its absence no one likes the new 007 walking around with the latest gadget, sipping his martini and speaking a language that is foreign to the common man who still toils day and night at his greasy job to keep the business up and running.
In subsequent posts, I will write more about how to get a Data Science strategy right and avoid the above pitfalls.