Big data refers to datasets so large and complex that they are beyond the processing capacity of traditional data processing software. It is characterized by volume (the sheer amount of data), velocity (the speed at which data is generated and processed), and variety (the many forms that data can take, including text, images, and audio).
Big data analytics is the process of extracting meaningful insights and value from that data.
Enormous amounts of data are produced every day. As Eric Schmidt, Executive Chairman at Google, stated: “There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days.”
With this much data, analytics is what lets organizations exploit the growing volumes for a variety of business purposes: not only producing simple data-driven insights on operations, but also predicting future trends and events.
Below are the main types of analytics:
Descriptive analytics: Focuses on reporting what happened in the past. For example, financial reporting on month-over-month sales growth is a product of descriptive analytics.
Predictive analytics: Uses past data to predict future events. For example, a bank using predictive analytics can identify customers who are likely to subscribe to long-term savings.
Diagnostic analytics: Helps companies understand why a problem occurred, so that they can avoid the cost of the same errors happening again in the future.
Prescriptive analytics: Recommends a solution to a problem, relying on AI and machine learning to gather data and apply it, for instance to risk management.
Delen and Demirkan [DEL 13] noted that big data adds the ability to perform prescriptive analytics, which combines data from descriptive and predictive analytics and uses real-time external data to recommend an action that must be taken within a certain time to achieve a desired outcome.
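To make the difference between descriptive and predictive analytics concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the sales figures are made up, the function names month_over_month_growth and linear_trend_forecast are hypothetical, and a simple least-squares trend line stands in for whatever predictive model a real team would choose.

```python
def month_over_month_growth(sales):
    """Descriptive: percentage change from each month to the next.

    Assumes every value in `sales` is non-zero, since each month is
    divided by the month before it.
    """
    return [
        (curr - prev) / prev * 100.0
        for prev, curr in zip(sales, sales[1:])
    ]


def linear_trend_forecast(sales):
    """Predictive: fit y = intercept + slope * x by ordinary least squares,
    where x is the month index (0, 1, 2, ...), and extrapolate one month ahead.

    Assumes at least two data points.
    """
    n = len(sales)
    xs = list(range(n))
    mean_x = sum(xs) / n
    mean_y = sum(sales) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, sales))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    # The next month has index n.
    return intercept + slope * n


# Hypothetical monthly sales totals, invented for illustration only.
monthly_sales = [100.0, 110.0, 121.0, 130.0]

growth = month_over_month_growth(monthly_sales)
forecast = linear_trend_forecast(monthly_sales)

print("Month-over-month growth (%):", [round(g, 2) for g in growth])
print("Naive forecast for next month:", round(forecast, 2))
```

The first function only summarizes what has already happened, while the second uses the same history to make a statement about a month that has not occurred yet. Diagnostic and prescriptive analytics would build on top of results like these, asking why the numbers moved and what action to take.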
Big data has been harnessed largely thanks to the development of tools and technologies such as Apache Spark, Hadoop, distributed data stores such as Cassandra, streaming platforms such as Apache Kafka, and many more.