In the MapReduce implementation, the reducers start only after all mappers have finished their jobs. Why don't we instead stream records from mappers to reducers as they are produced? I can think of two reasons: (i) with streaming, if a reducer fails, all the mappers would have to recompute their map output, because the results streamed to the failed reducer are lost; (ii) if we want the keys within each partition to be sorted, streaming has a higher time complexity than waiting for the mappers to finish and then sorting. Are these reasons correct, and are there other reasons why a streaming implementation wasn't used?
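For reason (ii), here is a minimal sketch of the cost difference I have in mind (plain Python, not Hadoop code; the partition data is made up for illustration). The batch approach sorts one reducer's partition once after all map output has arrived, while a naive streaming reducer would have to keep its partition ordered on every arriving record:

```python
import bisect
import random

# Hypothetical map output for a single reducer's partition:
# (key, value) pairs; the records here are generated just for illustration.
records = [(random.randint(0, 1000), i) for i in range(10_000)]

# Batch (what MapReduce does): wait for all map output, then sort once.
# A single O(n log n) sort over the whole partition.
def batch_sorted(recs):
    return sorted(recs)

# Naive streaming alternative: keep the partition sorted as records arrive.
# bisect.insort finds the slot in O(log n) but shifts the list in O(n),
# so n streamed records cost O(n^2) overall in this form.
def streaming_sorted(recs):
    ordered = []
    for rec in recs:
        bisect.insort(ordered, rec)
    return ordered

# Both produce the same sorted partition; only the cost profile differs.
assert batch_sorted(records) == streaming_sorted(records)
```

I realize a real streaming design could amortize this with a heap or a merge of sorted runs, but even then the reducer cannot emit any of its sorted output until the last mapper's records have arrived, so I'm not sure streaming would buy anything for the sorted case.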