Data Science is now a buzzword across the world. It has gained prominence across many industries, not just information technology. However, as we all know, data and analysis have existed since the cradle of human evolution. Data, in simple terms, is a raw piece of input. When combined with relevant context, it becomes meaningful and is transformed into INFORMATION.
The cycle of receiving data, structuring it, transforming it into readable information and analysing it to identify patterns and possibilities is called Data Analysis (or Analytics).
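To make the four stages of that cycle concrete, here is a minimal Python sketch. It assumes a small, hypothetical list of sales records in "region,units" form; the function names `receive`, `structure`, `transform` and `analyse` are illustrative only and do not come from any standard library or established framework.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw input: each line is "region,units_sold".
RAW_DATA = [
    "north,120",
    "south,95",
    "north,130",
    "",          # a blank line, as raw data is rarely clean
    "east,80",
    "south,105",
]


def receive(raw_lines):
    """Stage 1: receive the raw data, discarding empty lines."""
    return [line.strip() for line in raw_lines if line.strip()]


def structure(lines):
    """Stage 2: structure each line into a (region, units) record."""
    records = []
    for line in lines:
        region, units = line.split(",")
        records.append((region, int(units)))
    return records


def transform(records):
    """Stage 3: transform records into readable information (units grouped by region)."""
    grouped = defaultdict(list)
    for region, units in records:
        grouped[region].append(units)
    return dict(grouped)


def analyse(grouped):
    """Stage 4: analyse the information to find a pattern (average units per region)."""
    averages = {region: mean(values) for region, values in grouped.items()}
    best = max(averages, key=averages.get)  # region with the highest average
    return averages, best


averages, best = analyse(transform(structure(receive(RAW_DATA))))
print(averages)
print("Highest average:", best)
```

Each stage takes the previous stage's output as its input, which mirrors the idea that raw data only becomes useful information once it has been structured and transformed.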
Over the years, many scientists, theorists and industry specialists have designed methodologies, tools and data sets to make analysis easier. Many of these efforts date back to the 17th century.
These methodologies, practices and founding principles began as a simple branch of mathematics and evolved into an independent scientific discipline known as STATISTICS.
Data Science, by contrast, goes beyond statistics: it also draws on knowledge of software architecture and multiple programming languages.
Data Science involves defining the problem, identifying the key sources of information, and designing the framework for collecting and screening the needed data.
Software typically handles the collection, processing and modelling of the data, applying the principles of Data Science and its related sub-fields and practices to gain deeper insight into the data assets under review. Even so, STATISTICS and its underlying principles form the foundation of Data Analytics and Data Science.
Over the last decade or so, however, we have seen large-scale, rapid growth in industrialisation, information technology and communication. This has dramatically increased the need to improve and adapt the way data is collected, analysed and summarised.
The birth of Data Science can be traced to the 1960s and 1970s, when noted mathematician John Tukey and computer science pioneer Peter Naur established the relationship of data analysis with statistics and computers. A notable development came in 1994, when BusinessWeek ran an article, "Database Marketing", describing how companies had started gathering huge amounts of customer information to run targeted marketing campaigns. With the advent of the World Wide Web (WWW), coupled with massive improvements in the telecom industry, communication has increased manifold.
In 2002, as Data Science found wider application across industries, the International Council for Science's Committee on Data for Science and Technology (CODATA) launched the Data Science Journal, which focused on issues such as the description of data systems, their publication on the internet, applications and legal issues.
Along the way, data handling evolved from simple databases, machines that helped collect data through manual input, to modern spreadsheets such as Lotus 1-2-3 and Microsoft Excel, and to advanced database systems such as Microsoft SQL Server and Oracle, to name a few.
Today, data arrives from countless sources in high volume, variety and velocity. This has created the need to run the processing cycle faster and with much greater accuracy. Modern technologies such as Business Intelligence (BI) and Enterprise Resource Planning (ERP) have emerged, supported by analytical tools such as SAS and R and programming languages such as Python. The large-scale demand for storage led to the emergence of virtualisation and the CLOUD.
In the coming articles, you will learn about the application, scope and career prospects of Data Science. Stay tuned!