Summary  
This chapter introduces the concept of Big Data by defining its five core characteristics—volume, variety, velocity, veracity, and value—that guide the handling and processing of very large datasets.

General domain of usage  
Big Data analytics

To begin with, we'll focus on the most important aspects of Big Data.

# What is Big Data?

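First of all, let's clarify what we're working with. **Big Data** refers to datasets that are too large, too fast-moving, or too diverse to be stored and processed efficiently with traditional tools running on a single machine.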

# 5 Vs

When discussing **the properties of Big Data**, we should mention the concept of the **"5 Vs"**, which highlights its **key characteristics**: 
* **Volume**; 
* **Variety**; 
* **Velocity**; 
* **Veracity**; 
* **Value**.

Now, let's explore each of them in detail.


## Volume

It measures **the size of a dataset**, which can range from terabytes to petabytes and beyond.

The volume of data generated is **influenced by several factors**, including the proliferation of digital technologies, the increasing number of data-generating devices, and large-scale transactions.

In practical terms, volume is **a fundamental aspect of Big Data**: a dataset is typically treated as Big Data once its volume exceeds what a single machine can store or process efficiently.
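The jump from terabytes to petabytes is easier to grasp with concrete numbers. The helper below is a minimal sketch that formats byte counts in decimal (SI) units, where each step is a factor of 1,000. The function name `human_readable` and the workload figures in the usage lines are hypothetical values chosen only for illustration.

```python
# Decimal (SI) units: each step up is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes: int) -> str:
    """Format a non-negative byte count with one decimal place, e.g. '2.5 TB'."""
    if num_bytes < 0:
        raise ValueError("num_bytes must be non-negative")
    size = float(num_bytes)
    for unit in UNITS[:-1]:
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    # Sizes of 1,000 PB or more end up here, expressed in exabytes.
    return f"{size:.1f} {UNITS[-1]}"


# Hypothetical workload: 10 million events per day, about 2 KB each.
events_per_day = 10_000_000
bytes_per_event = 2_000
daily = events_per_day * bytes_per_event
print(human_readable(daily))        # 20.0 GB
print(human_readable(daily * 365))  # 7.3 TB
```

Even this modest, made-up event rate accumulates terabytes within a year. Note that some tools report sizes in binary units instead (1 KiB = 1,024 bytes), which gives slightly different figures.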

## Variety

It refers to the different types and formats of data.

In practice, data can be **structured** (e.g., tabular data), **semi-structured** (e.g., JSON, XML), or **unstructured** (e.g., free text, images, video, audio).

**Higher variety** of data leads to **higher complexity**, as it requires managing and storing multiple data types.
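Each of these categories calls for different handling. The sketch below uses only Python's standard library to parse one small, made-up sample of each kind; the sample data and variable names are illustrative assumptions, not a recommended pipeline.

```python
import csv
import io
import json

# Structured: rows that share a fixed schema (here, CSV with a header line).
csv_text = "user_id,country\n1,DE\n2,US\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON with nested fields; other records could have different keys.
json_text = '{"user_id": 1, "tags": ["sports", "news"], "profile": {"age": 34}}'
record = json.loads(json_text)

# Unstructured: free text with no predefined schema; any structure must be derived.
review = "Great service, fast delivery. Would order again!"
words = review.lower().split()

print(rows[1]["country"])        # US
print(record["profile"]["age"])  # 34
print(len(words))                # 7
```

The structured sample maps directly onto named columns (note that `csv` returns every value as a string), the JSON sample requires navigating nested keys, and the free text has to be tokenized before any analysis can start, which is exactly where the extra complexity comes from.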

## Velocity

It refers to **the speed** at which data is generated, collected, and processed. In the context of Big Data, this encompasses **real-time** or **near-real-time** data streams, such as data from social media platforms, financial transactions, and sensors.
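Because high-velocity data keeps arriving, stream processing typically computes results incrementally instead of waiting for a complete dataset. The following minimal sketch, not tied to any particular streaming framework, counts how many events fell within a sliding time window each time a new event arrives. It assumes events carry numeric timestamps in seconds and arrive in order; the function name and sample timestamps are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def sliding_window_counts(
    timestamps: Iterable[float], window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """Yield (timestamp, events seen in the last `window_seconds`) for each new event.

    Assumes timestamps (in seconds) arrive in non-decreasing order, as they
    would from a single ordered stream.
    """
    window: Deque[float] = deque()
    for ts in timestamps:
        window.append(ts)
        # Drop events outside the window (ts - window_seconds, ts].
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        yield ts, len(window)


# Hypothetical event arrival times, e.g. clicks or sensor pings.
stream = [0.0, 0.5, 1.2, 1.4, 3.0]
counts = [count for _, count in sliding_window_counts(stream, window_seconds=1.0)]
print(counts)  # [1, 2, 2, 3, 1]
```

Each event is appended and evicted at most once, so the work per event stays small no matter how long the stream runs. Dedicated stream processors provide their own windowing operators for this kind of computation.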

## Veracity

It refers to **the quality and trustworthiness** of data and involves **addressing uncertainties** and **inconsistencies** in data, such as missing values, errors, or biases.
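A common first step toward veracity is validating records before analysis. The sketch below flags missing values and implausible readings in a small, made-up batch of temperature data. The field names and the plausibility range of -50 to 60 degrees Celsius are arbitrary assumptions for illustration; real pipelines may also need to handle duplicates, inconsistent formats, and bias.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical raw sensor readings; None marks a missing value.
readings: List[Dict[str, Any]] = [
    {"sensor": "s1", "temp_c": 21.5},
    {"sensor": "s2", "temp_c": None},   # missing value
    {"sensor": "s3", "temp_c": 999.0},  # likely a device or transmission error
    {"sensor": "s4", "temp_c": 19.8},
]


def validate(
    rows: List[Dict[str, Any]], low: float = -50.0, high: float = 60.0
) -> Tuple[List[Dict[str, Any]], List[Tuple[str, str]]]:
    """Split rows into valid records and (sensor, reason) pairs for flagged ones."""
    valid: List[Dict[str, Any]] = []
    flagged: List[Tuple[str, str]] = []
    for row in rows:
        temp = row.get("temp_c")
        if temp is None:
            flagged.append((row["sensor"], "missing value"))
        elif not low <= temp <= high:
            flagged.append((row["sensor"], "out of range"))
        else:
            valid.append(row)
    return valid, flagged


valid, flagged = validate(readings)
print(len(valid))  # 2
print(flagged)     # [('s2', 'missing value'), ('s3', 'out of range')]
```

Keeping the reason next to each flagged record, rather than silently dropping it, makes it easier to trace data-quality problems back to their source.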

## Value

It's not only about having large volumes of data; it's about **extracting meaningful information** that can inform business decisions and drive innovation.

The objective is **to transform raw data into valuable insights** that can influence strategy and operations.
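As a small illustration of turning raw records into something decision-relevant, the sketch below aggregates a made-up list of transactions into revenue per category and ranks the categories. The data and the function name `revenue_by_category` are hypothetical; amounts are stored in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical raw transactions: (product category, amount in cents).
transactions: List[Tuple[str, int]] = [
    ("books", 1250),
    ("electronics", 19999),
    ("books", 750),
    ("grocery", 3000),
    ("electronics", 4999),
]


def revenue_by_category(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Sum amounts per category and rank categories by total revenue (highest first)."""
    totals: Dict[str, int] = defaultdict(int)
    for category, amount in rows:
        totals[category] += amount
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


ranking = revenue_by_category(transactions)
top_category, top_revenue = ranking[0]
share = top_revenue / sum(total for _, total in ranking)
print(ranking)  # [('electronics', 24998), ('grocery', 3000), ('books', 2000)]
print(f"{top_category}: {share:.0%} of revenue")  # electronics: 83% of revenue
```

The individual rows say little on their own; the aggregated view ("one category brings in most of the revenue") is the kind of insight that can feed into strategy.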


This course is intended for those who want to learn the basics of Big Data, including different approaches to distributed computing and programming paradigms such as MapReduce. The main part of the course is devoted to the Apache Spark framework and PySpark, its high-level API for the Python programming language.
