Why Big Data? | Big Data Basics
Introduction to Big Data with Apache Spark in Python

Why Big Data?

To begin with, we'll focus on the most important aspects of Big Data.

What is Big Data?

First of all, let's clarify what we're working with.

5 Vs

When discussing the properties of Big Data, we should mention the concept of the "5 Vs", which highlights its key characteristics:

  • Volume;
  • Variety;
  • Velocity;
  • Veracity;
  • Value.

Now, let's explore each of them in detail.

Volume

Volume measures the size of a dataset, which can range from terabytes to petabytes and beyond.

The volume of data generated is influenced by several factors, including the proliferation of digital technologies, the increasing number of data-generating devices, and large-scale transactions.

In practical terms, volume is the most basic criterion for Big Data: a dataset qualifies once it is too large to store or process comfortably on a single machine with conventional tools.
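To get a feel for these sizes, here is a rough back-of-the-envelope sketch. The 200 MB/s read rate, the 1000-machine cluster, and the `scan_hours` helper are illustrative assumptions, not figures from this course, and the calculation assumes perfect parallelism, which real clusters do not achieve.

```python
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes

# Assumed sequential read rate of one machine (200 MB/s); illustrative only.
DISK_RATE = 200 * 10**6


def scan_hours(size_bytes, bytes_per_second, workers=1):
    """Hours needed to read the data once, assuming perfect parallelism."""
    seconds = size_bytes / (bytes_per_second * workers)
    return seconds / 3600


single = scan_hours(PB, DISK_RATE)                  # one machine
cluster = scan_hours(PB, DISK_RATE, workers=1000)   # hypothetical cluster

print(f"1 machine: {single:.0f} h, 1000 machines: {cluster:.2f} h")
# prints: 1 machine: 1389 h, 1000 machines: 1.39 h
```

Under these assumptions, merely reading one petabyte once takes a single machine roughly two months, which is why large volumes are usually spread across many machines.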

Variety

Variety refers to the different types and formats of data.

In practice, data can be structured (e.g., tabular data), semi-structured (e.g., JSON, XML), or unstructured (e.g., text, images, videos, audio, etc.).

A higher variety of data leads to higher complexity, because multiple data types must be stored and managed, often with different tools for each.
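The three categories can be illustrated with a minimal sketch that uses only the Python standard library. The sample strings and field names (`user_id`, `amount`, `tags`) are made up for illustration.

```python
import csv
import io
import json

# Structured: tabular data with a fixed set of columns (hypothetical sample).
csv_text = "user_id,amount\n1,9.99\n2,24.50\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: JSON records can nest and can omit fields.
json_text = '[{"user_id": 1, "tags": ["new"]}, {"user_id": 2}]'
records = json.loads(json_text)

# Unstructured: free text has no schema, so structure must be derived.
review = "Great product, fast delivery, great price"
word_counts = {}
for word in review.lower().replace(",", "").split():
    word_counts[word] = word_counts.get(word, 0) + 1

print(rows[0]["amount"])            # every CSV value arrives as a string
print(records[1].get("tags", []))   # the second record has no "tags" field
print(word_counts["great"])         # occurrences of "great", case-insensitive
```

Each format needs its own parsing step and its own handling of missing or irregular fields, which is where the extra complexity comes from.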

Velocity

Velocity refers to the speed at which data is generated and needs to be processed. In the context of Big Data, this encompasses real-time or near-real-time data streams, such as data from social media platforms, financial transactions, and sensors.
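As a minimal sketch of what processing a stream means, the example below updates a rolling average each time a reading arrives instead of waiting for the full dataset. The `sensor_stream` generator is a hypothetical stand-in for a live feed, and the readings are made-up values.

```python
from collections import deque


def sensor_stream():
    """Hypothetical stand-in for a live feed; yields one reading at a time."""
    for value in [21.0, 21.5, 22.0, 30.0, 22.5]:
        yield value


def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each one arrives."""
    recent = deque(maxlen=window)  # older readings are dropped automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


means = list(rolling_mean(sensor_stream()))
print(means[:3])  # prints: [21.0, 21.25, 21.5]
```

The key design point is that only the last few readings are kept in memory, so the same logic works on a feed that never ends.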

Veracity

Veracity refers to the quality and trustworthiness of data. It involves addressing uncertainties and inconsistencies in data, such as missing values, errors, or biases.
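A simple quality check might look like the sketch below, which counts missing and implausible readings before any analysis. The records, the `-999.0` error value, and the valid range are assumptions chosen for illustration.

```python
# Made-up sensor readings with two typical quality problems.
records = [
    {"sensor": "a", "temp_c": 21.5},
    {"sensor": "b", "temp_c": None},     # missing value
    {"sensor": "c", "temp_c": -999.0},   # error reading
    {"sensor": "d", "temp_c": 22.1},
]

# Assumed plausible range for this hypothetical sensor, in degrees Celsius.
VALID_RANGE = (-50.0, 60.0)


def classify(record):
    """Label a record as 'missing', 'out_of_range', or 'ok'."""
    value = record["temp_c"]
    if value is None:
        return "missing"
    if not VALID_RANGE[0] <= value <= VALID_RANGE[1]:
        return "out_of_range"
    return "ok"


report = {}
for record in records:
    label = classify(record)
    report[label] = report.get(label, 0) + 1

clean = [record for record in records if classify(record) == "ok"]
print(report)  # prints: {'ok': 2, 'missing': 1, 'out_of_range': 1}
```

Whether flagged records are dropped, corrected, or imputed depends on the use case; this sketch only separates them.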

Value

Value is the usefulness of the data. Big Data is not only about having large volumes of data; it's about extracting meaningful information that can inform business decisions and drive innovation.

The objective is to transform raw data into valuable insights that can influence strategy and operations.

Which of the following "Vs" refers to data quality?
