In the MapReduce implementation, the reducers start only after all mappers have finished their jobs. Why don't we instead stream records from mappers to reducers as they are produced? I can think of two reasons: (i) with streaming, if a reducer fails, all the mappers would have to recompute their map output, because the results streamed to the failed reducer are lost; (ii) if we want the keys within each partition to be sorted, streaming has a higher time complexity than waiting for the mappers to finish and then sorting. Are these reasons correct, and are there other reasons why a streaming implementation wasn't used?
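For reason (ii), here is a minimal sketch of the cost difference I have in mind (plain Python, not Hadoop code; the partition data is made up for illustration). The batch approach sorts one reducer's partition once after all map output has arrived, while a naive streaming reducer would have to keep its partition ordered on every arriving record:

```python
import bisect
import random

# Hypothetical map output for a single reducer's partition:
# (key, value) pairs; the records here are generated just for illustration.
records = [(random.randint(0, 1000), i) for i in range(10_000)]

# Batch (what MapReduce does): wait for all map output, then sort once.
# A single O(n log n) sort over the whole partition.
def batch_sorted(recs):
    return sorted(recs)

# Naive streaming alternative: keep the partition sorted as records arrive.
# bisect.insort finds the slot in O(log n) but shifts the list in O(n),
# so n streamed records cost O(n^2) overall in this form.
def streaming_sorted(recs):
    ordered = []
    for rec in recs:
        bisect.insort(ordered, rec)
    return ordered

# Both produce the same sorted partition; only the cost profile differs.
assert batch_sorted(records) == streaming_sorted(records)
```

I realize a real streaming design could amortize this with a heap or a merge of sorted runs, but even then the reducer cannot emit any of its sorted output until the last mapper's records have arrived, so I'm not sure streaming would buy anything for the sorted case.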