
I'm very new to machine learning. I am doing a project for a subject called Parallel and Distributed Computing, in which we have to speed up a heavy computation using parallelism or distributed computing. My idea was to divide a dataset into equal parts and train a separate neural network on each subset, each on its own machine in the cloud. Once the models are trained, they would be sent back to me and somehow combined into a single model. I am aware of federated learning, but it doesn't quite fit my scenario, since I actually want to split the dataset up and send the parts out to the cloud. Does anyone know a feasible approach (maybe a variant of federated learning) for doing this?
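To make what I mean concrete, here is a rough sketch of the split / train / combine workflow I have in mind, assuming the models are combined by simply averaging their weights (the model, training loop, and data split are just placeholders):

```python
# Rough sketch: split the data, train one model per subset (each on a
# separate cloud machine), then average the weights into a single model.
import copy
import torch
import torch.nn as nn

def make_model():
    return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

def train_on_subset(model, subset):
    # ...ordinary training loop, run independently on one cloud machine...
    return model

def average_models(models):
    """Combine independently trained models by averaging their parameters."""
    combined = copy.deepcopy(models[0])
    state = combined.state_dict()
    for key in state:
        state[key] = torch.stack([m.state_dict()[key] for m in models]).mean(dim=0)
    combined.load_state_dict(state)
    return combined

# subsets = the dataset divided into equal parts, one per machine
# models = [train_on_subset(make_model(), s) for s in subsets]  # done in the cloud
# final_model = average_models(models)
```

Is something along these lines feasible, or is there a better way to combine the trained models?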

Brian Spiering
ptushev

1 Answer


There are many ways to parallelize machine learning. It is often better to distribute the model parameters, not the data.

Training a model on only a single subset of the data will result in worse parameter estimates than training it on random samples drawn from the whole dataset.

Additionally, moving data around is more expensive than moving parameters.
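For example, one common pattern that follows this advice is synchronous data parallelism: every worker keeps a full copy of the model, reads its own shard of the data, and only gradients are exchanged over the network each step, so all replicas end up with identical parameters trained on (effectively) the full dataset. A minimal sketch using PyTorch's DistributedDataParallel; the toy data, model, and hyperparameters are placeholders, and you would launch one process per machine with a launcher such as torchrun:

```python
# Minimal synchronous data-parallel training sketch (gradient averaging
# across processes via DistributedDataParallel, CPU + gloo backend).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # Rank, world size, and master address come from the launcher (e.g. torchrun).
    dist.init_process_group(backend="gloo")

    # Toy dataset and model -- replace with your own.
    X = torch.randn(1000, 20)
    y = torch.randint(0, 2, (1000,))
    dataset = TensorDataset(X, y)

    # Each process iterates over a different shard of the data every epoch.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = DDP(torch.nn.Sequential(
        torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
    ))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for epoch in range(5):
        sampler.set_epoch(epoch)          # reshuffle shards each epoch
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()               # gradients are averaged across all processes
            optimizer.step()              # every replica applies the same update

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Because only gradients (the size of the model) cross the network each step, the workers stay in sync and none of them is ever limited to learning from just its own subset, which is what a naive split-then-merge scheme loses.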

Brian Spiering