Apache Hadoop Basics
What is MapReduce?
MapReduce is a programming model for processing large data sets in parallel across a cluster of machines. It was popularized by Google and has been widely adopted in big data processing frameworks, most notably Apache Hadoop.
Structure of MapReduce
MapReduce consists of three phases:
- Map Phase - involves dividing the input data into smaller chunks and processing each chunk independently. Each chunk is processed by a "mapper" function that applies a user-defined operation to generate intermediate key-value pairs.
- Shuffle and Sort Phase - after the map phase, intermediate key-value pairs are shuffled and sorted to group values by keys. This phase prepares the data for the reduce phase by organizing it so that all values for the same key are grouped together.
- Reduce Phase - involves processing the grouped key-value pairs produced by the shuffle and sort phase. Each reducer applies a user-defined reduce function to aggregate, summarize, or otherwise process the data for each key.
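The three phases can be illustrated with a small word-count sketch in plain Python. This is only a single-process approximation and does not use Hadoop or any of its APIs. The function names (`map_phase`, `shuffle_and_sort`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(pairs):
    """Group intermediate values by key and return the groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Reducer: aggregate all values for one key (here, by summing them)."""
    return key, sum(values)


def word_count(chunks):
    # Map: each chunk is processed independently of the others.
    intermediate = []
    for chunk in chunks:
        intermediate.extend(map_phase(chunk))
    # Shuffle and sort: all values for the same key end up together.
    grouped = shuffle_and_sort(intermediate)
    # Reduce: one call per key.
    return dict(reduce_phase(key, values) for key, values in grouped)


# Hypothetical input, already split into two chunks.
chunks = ["big data big ideas", "data moves fast"]
counts = word_count(chunks)
print(counts)
# {'big': 2, 'data': 2, 'fast': 1, 'ideas': 1, 'moves': 1}
```

In a real cluster the mappers and reducers would run on different machines and the framework would handle the shuffle over the network. Here everything runs in one process so that the data flow between the phases is easy to follow.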