Hadoop speed up data processing in two ways:
A. Moving Computation Program to Data Node
- Traditional Method: Moving Data to Computation Node
- Hadoop Method: Moving Computation Program to Data Node
The size of data is a lot bigger than the size of program. Instead of transferring data from data node, Hadoop distributes computation program to data node. This saves a lot of network bandwidth and data transferring time.
B. Parallel Processing</h2>
To enable parallel processing,
- Data is split into several fragments, and is distributed to several data nodes.
- Data processing can be run in parallel on each fragment.
- Result will be combined.