Big Data and Data Science: Gold Rush or Bright Future?
If you haven’t noticed yet, big data has become big in the last few years, and data science has become the next big profession. That is, if you believe the Internet, where predictions are made that Chief Data Scientist roles could displace CIO positions.
But technology trends come and go. Some reach dazzling heights before they crash and disappear. Some (such as cloud computing) become ubiquitous to the point that their commonality makes them almost transparent. I have been following big data for quite a while now, trying to get some measure of its future path. To do that, I look at it from several aspects:
Technology has made the big data/data science domain feasible by providing tools that lower the cost of entry considerably. Tools such as Hadoop and MongoDB make large-scale data processing feasible for users who cannot afford the investment in mainframe computing power. Statistical processing and machine learning languages and libraries such as R, NLTK, NumPy, and scikit-learn provide access to advanced algorithms.
Then there is the cloud as a purveyor of big data and data science services. Almost all of the tools mentioned above are available through service providers in various shapes and forms. The growing focus is on making these rather complex tools more and more accessible. A good example here is Azure Machine Learning, which is a modelling studio combined with a deployment and production platform.
Accessibility of Big Data
Big data is becoming increasingly accessible. On an individual level, we are already used to the fact that before a job interview, a prospective employer will have Googled, LinkedIn’d and Facebooked our public profiles on the Internet. Now this is migrating to the next level, with all these social service providers selling our digital life histories to third parties. So when we apply for a loan or health insurance, we can expect that records of our past behavior, both as individuals and as members of a social group, will be purchased by the financial institution or insurer to shape the commercial terms it offers its (prospective) customers.
To put this into perspective, the number of APIs (the means by which these services are made possible) available online has been doubling every 18-24 months over the last five years according to ProgrammableWeb.
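To get a feel for what that doubling rate implies, here is a rough back-of-the-envelope sketch. It assumes steady exponential growth over the full five years, which is a simplification; the function name `growth_factor` is only illustrative and the result is a multiplier, not an actual API count.

```python
def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Multiplier implied by steady doubling every `doubling_months` months."""
    return 2 ** (elapsed_months / doubling_months)


ELAPSED_MONTHS = 5 * 12  # the five-year window mentioned in the text

# Bounds of the quoted doubling period: 24 months (slow) and 18 months (fast).
slow = growth_factor(ELAPSED_MONTHS, 24)
fast = growth_factor(ELAPSED_MONTHS, 18)

print(f"Doubling every 24 months: about {slow:.1f}x over five years")
print(f"Doubling every 18 months: about {fast:.1f}x over five years")
```

In other words, under this assumption the number of available APIs would have grown somewhere between roughly sixfold and tenfold in five years.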
So we have the supply side of this domain covered, but for whom do big data and data science actually create value? Here the picture is more mixed:
- There are businesses that traditionally trade on data, such as insurance companies and almost all financial services firms. For these companies, data has always been critical and will become more so. Their biggest issue is not big data, but the ability to understand the limits and value of the data at hand. What used to be the domain of a small community of quantitative analysts will grow into a larger community of specialized data scientists.
- Other companies will increasingly benefit from big data, and may even need to rely on it for survival. These are the companies that operate on slim profit margins in a competitive world, and for whom even small percentage increases in margins, customer retention, and efficiency are essential. A good example is mass-market retail, where we see an increasing focus on effective, targeted marketing using big data/data science.
- This will also indirectly benefit companies in fields such as marketing, market research, and risk management. Weather forecasting may have been the front-runner for a long time, but meteorologists are no longer alone on the leading edge of high-complexity data analysis.
- After that, it becomes increasingly difficult to see how many other companies will directly benefit from big data. Of course, there will be some genuinely new domains of application that we are not yet aware of, but these are likely to be few. After all, how many Googles are out there today?
Invest now or wait until the dust has settled?
If you are a service provider in this domain, then now would probably be the right time to get established, attract talent and develop a customer base. Data science is a highly technical specialty, and after the eventual shake-out those companies that have attracted the right talent and developed a reputation for delivering good value should expect a good long-term future.
If you are a company that needs to consume big data/data science services, the answer is definitely mixed. If you are in one of the domains mentioned above (financial services, insurance, or even retail), then you probably cannot avoid investing in these services. If you are big, you can develop these in-house; if you are mid-sized, you would probably do well to find a partner with the right skill sets.
If you are in neither of the two groups above, then it’s important to understand what your real needs are. You will have to be able to define the need and its value well before you invest. As big data is often about delivering incremental (or even delayed) benefits, you cannot afford to invest and then run out of budget halfway through. For instance, if you invest in log analysis for your network traffic as a means of intrusion detection, you may never see a direct benefit unless you experience an extensive hack attack (of which we have heard enough recently). Again, consider a partner or look for vendors that deliver tools that make use of big data analysis for your needs.
For individuals, this is indeed a gold rush situation. Data science meetups are getting increasingly crowded, as many companies are looking to hire and everyone wants to cash in on lucrative jobs. However, as already noted, data science is a demanding profession. It requires analytical ability as well as the capacity to think conceptually, understand the needs of the business, translate these into models and experiments, and turn those into programs whose results are backed by statistical measures of their reliability and significance. Such a generalist profile is not common, and unless you have these skills, you should expect to be among the casualties once the peak of the hype curve has passed and the crash into the trough of disillusionment happens.