Hadoop speed up data processing in two ways:

A. Moving Computation Program to Data Node

  1. Traditional Method: Moving Data to Computation Node
traditional-data-transfer

Traditional Method: Moving Data to Computation Node

  1. Hadoop Method: Moving Computation Program to Data Node
hadoop-program-transfer

Hadoop Method: Moving Computation Program to Data Node

</ol>

The size of data is a lot bigger than the size of program. Instead of transferring data from data node, Hadoop distributes computation program to data node. This saves a lot of network bandwidth and data transferring time.

B. Parallel Processing</h2>

To enable parallel processing,

  1. Data is split into several fragments, and is distributed to several data nodes.
  2. Data processing can be run in parallel on each fragment.
  3. Result will be combined.

parallel-computation