Summary  
This chapter introduces a distributed computing framework optimized for in-memory processing and unified analytics, highlighting its core abstractions: RDD, DataFrame, and Dataset.

General domain of usage  
Large-scale data processing and analytics

Apache Spark is a fast, general-purpose engine for big data processing that handles both batch and real-time workloads.

Spark was developed to overcome the limitations of traditional MapReduce frameworks and offers advanced features like in-memory processing, support for complex analytics, and seamless integration with various data sources.

Spark provides high-level APIs in Java, Scala, Python, and R.

* **In-Memory Computing** - processes data in memory rather than on disk, which accelerates performance for iterative algorithms and interactive queries.


* **Unified Analytics Engine** - supports batch processing, interactive queries, streaming analytics, and machine learning through a single framework.

* **Scalability** - scales horizontally by adding more nodes to a cluster, making it capable of handling petabytes of data.
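
To make the in-memory point concrete, here is a minimal pure-Python sketch of why keeping data in memory helps iterative algorithms. It is a toy model, not Spark code: `ToyDataset`, `load_from_disk`, and `iterative_mean` are hypothetical names invented for this illustration, and the "disk read" is only simulated by a counter.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Toy stand-in for a dataset; NOT part of Spark's API (assumption)."""

    def __init__(self, loader: Callable[[], List[float]]):
        self._loader = loader                        # simulates an expensive read from disk
        self._cached: Optional[List[float]] = None   # in-memory copy, filled on first read
        self._use_cache = False
        self.disk_reads = 0                          # how many times the loader actually ran

    def cache(self) -> "ToyDataset":
        """Ask for the data to be kept in memory after the first read."""
        self._use_cache = True
        return self

    def collect(self) -> List[float]:
        """Return the data, re-reading from 'disk' unless a cached copy exists."""
        if self._use_cache and self._cached is not None:
            return self._cached
        self.disk_reads += 1
        data = self._loader()
        if self._use_cache:
            self._cached = data
        return data


def load_from_disk() -> List[float]:
    """Hypothetical loader; a real job would read files here."""
    return [1.0, 2.0, 3.0, 4.0]


def iterative_mean(ds: ToyDataset, passes: int) -> float:
    """Toy iterative algorithm: every pass scans the whole dataset."""
    estimate = 0.0
    for _ in range(passes):
        values = ds.collect()
        target = sum(values) / len(values)
        estimate += 0.5 * (target - estimate)  # move halfway toward the mean
    return estimate


uncached = ToyDataset(load_from_disk)
cached = ToyDataset(load_from_disk).cache()

iterative_mean(uncached, passes=5)
iterative_mean(cached, passes=5)

print(uncached.disk_reads)  # 5: one simulated disk read per pass
print(cached.disk_reads)    # 1: later passes reuse the in-memory copy
```

In PySpark, the comparable step is calling `cache()` on an RDD or DataFrame before reusing it across several computations. The sketch above only mimics that idea on a single machine.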

Spark's core data structures are:

* **Resilient Distributed Dataset** (**RDD**);
* **DataFrame**;
* **Dataset**.


We will discuss them in more detail soon.

This course is for those who want to learn the basics of Big Data, including the different types of distributed computing and the MapReduce programming paradigm. The main part of the course is devoted to the Apache Spark framework and to PySpark, its high-level API for the Python programming language.
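
For readers who have not met MapReduce before, the paradigm can be sketched in a few lines of plain Python. This is a single-machine illustration under simplifying assumptions, not a distributed implementation: the helper names `map_phase`, `shuffle`, and `reduce_phase` and the sample input are made up for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Hypothetical input; in a real cluster each line could live on a different node.
lines = ["spark is fast", "spark is general"]

mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())

print(counts)  # {'spark': 2, 'is': 2, 'fast': 1, 'general': 1}
```

In a real framework the map and reduce functions run in parallel on many machines, and the shuffle step moves data between them over the network.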

Why Apache Spark?

What is Apache Spark?

Key Features

Data Structures