
Predictive analytics is changing the face of business – creating a new era of competition. Driving this revolution is R, a powerful open-source programming language that is moving from the laboratory to the executive office.
We live in a world that is driven and defined by data. Every moment of every day, huge volumes of data are generated, captured and stored.
Wal-Mart, for example, conducts more than 1 million customer transactions every hour, sending a steady deluge of information to data warehouses that are already among the largest in the world.
At the same time, organizations all over the world are recognizing the competitive advantages that are created when data is properly organized, analyzed and managed. Companies across a broad spectrum of industries – including retail, service, manufacturing, pharmaceutical, finance and consumer packaged goods – are convinced that data represents a new form of capital.
Yet only a tiny fraction of this data is ever put to use. Why? Because most of the tools that were built to analyze large amounts of data are slow, expensive and old. Moreover, they were designed to be used almost exclusively by “quants,” who tend to be highly trained specialists with advanced degrees in statistical analysis.
Welcome to the World of R
The era of these legacy analytic tools is ending, and a new era is beginning – marked by analytic solutions that are faster, more cost-effective, user-friendly and extensible.
These modern analytic technologies can handle very large volumes of data, at very high speeds. Analytic processes that used to take days to perform can now be accomplished in minutes. Imagine the value of sifting through mountains of information and gleaning the knowledge you really need to make better decisions.
The newer, faster and more powerful technologies that make it possible to find needles of insight in haystacks of data are based on a powerful open-source programming language called R.
With more than two million users, R has already become the de facto standard platform for statistical analysis in the academic, scientific and analytic communities.
The adoption of R as the lingua franca of statistical analysis is creating a deep pool of fresh talent. Among students, scientists, programmers, and data managers, R is the accepted standard. In a very real sense, R represents both the present and the future of statistical analytics.
A “Perfect Storm” is Transforming the Industry
A “perfect storm” of events is now pushing R beyond its original core audience and transforming the analytics industry. Three forces are driving it:
The first driver is the aforementioned data deluge, and the consensus that the companies that will succeed in the competitive marketplace are those that can most effectively use predictive models to draw insight and predictions from the data they’ve collected.
The second driver is the fact that the application of predictive models to data is no longer a “secret art”; in universities and colleges worldwide, a new generation of data analysts has been trained not just in the necessity of data analysis in today’s business, but in the analytic methods that offer competitive advantage. And the training tool of choice for the vast majority of those students is the R language.
Finally, the economic opportunity is unmistakable: the market for data management and analytic technologies currently generates about $100 billion and is growing at a pace of 10 percent annually. Today’s market leaders in data analysis software are built on decades-old technology that cannot meet current demands for analyzing huge data sets through an easy-to-use interface.
Overcoming Obstacles to Adoption
The two primary obstacles facing many R users today involve capacity and performance.
For example, most R software cannot currently handle the kind of enormous data sets that are generated routinely by large retailers, consumer packaged goods marketers, pharmaceutical companies, global finance organizations or national government agencies.
The capacity of R-based solutions is limited by the requirement that all the data has to fit in memory in order to be processed. The algorithms simply won’t scale to accommodate “Big Data,” the phrase that describes exploding data sets that are, in traditional terms, too large to analyze.
This capacity limitation then forces analysts to use smaller samples of data, which can lead to inaccurate or sub-optimal results.
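To make the memory constraint concrete, here is a minimal sketch of the alternative: computing a statistic over a stream of data one chunk at a time, so the full data set never has to fit in memory and no sampling is needed. The idea does not depend on the language, and Python is used here only for illustration. The function names (`chunks`, `chunked_mean_variance`) and the generated data are hypothetical, and the sketch does not describe how any particular R product works.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunks(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block


def chunked_mean_variance(values: Iterable[float],
                          size: int = 1000) -> Tuple[float, float]:
    """Return (mean, sample variance), holding one chunk in memory at a time."""
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for block in chunks(values, size):
        for x in block:
            # Welford's online update: exact, no second pass over the data
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two values")
    return mean, m2 / (n - 1)


if __name__ == "__main__":
    # A generator stands in for a file too large to load (hypothetical data).
    stream = (float(i % 7) for i in range(100_000))
    print(chunked_mean_variance(stream, size=10_000))
```

Because every record contributes to the result, the answer matches what an in-memory calculation over the full data set would give, which a small sample cannot guarantee.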
The second issue involves the inability of many R applications to read data quickly from files or other sources. Speed is critical in all areas of modern life, and it seems unreasonable to wait weeks or months for a computer to crunch through larger sets of data.
Although some software packages claim to address these issues, what’s usually missing is an over-arching framework with a top-down approach for analyzing “Big Data” easily and efficiently. Typically, analysts find themselves struggling with a collection of software tools that can create more problems than they solve.
As most CIOs and IT executives recognize, open-source software development models offer many benefits – and pose many challenges. The benefits include faster development cycles and lower development costs; the challenges include lack of administration, management and support.
For many businesses, especially those operating in complex or highly regulated markets, open source software can be impractical or threatening.
The commercial potential of R, however, has led to a surge of interest in developing enhanced “Enterprise Grade” versions of R software. These newer applications address the key issues that have prevented R from realizing its full potential as a mainstream enterprise technology.
The New Normal
The R revolution is just beginning. As it spreads, it will transform business at every level. The idea of making critical decisions based on hunches or intuition will seem hopelessly antiquated. It will become common practice for business leaders to rely on knowledge generated through rigorous numerical analysis of large data sets. Fact-based decision making will become the norm instead of the exception.
The use of “Big Data” to guide business decisions – at every level of the enterprise – will become practical, affordable and commonplace.
At the same time, more organizations will depend more heavily on data analysis to generate competitive advantages. The intersection of these trends – user-friendly, cost-effective analytics and growing reliance on larger data sets to fuel decision-making processes – will have a profound impact on the economy and on the broader culture.