  • I have data on many cars over time (a few years of data per car).

  • I am planning on creating a model for all the cars combined (not one model per car).

  • Do I want to scale the data (Normalize / Standardize) per car or for all the cars combined?

e.g.

[Image: sample of the data, with observations for several cars at the same timestamps]

If my data looks like this, with multiple observations at the same time for different cars, do I want to scale engine temperature for all cars or per car?

The cars do have similar characteristics, but each will have a slightly different mean and max.

I worry that scaling per car means the same raw reading, e.g. 60 degrees, will map to a different scaled value for each car. This suggests it might be best to scale all the cars together.

On the other hand, I worry that the stable states (when the car is on but not moving, when the car is at maximum speed, etc.) vary by car, so it might be best to scale each car individually so that the cars are comparable?

Any ideas?

Joshua

1 Answer


Since you are building a model using all of the cars, the usual practice would be to scale based on all of the training data. One approach is to find the max and min of each feature across the whole training set (all cars pooled together) and use those values to normalise the training set.
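As a rough sketch of that approach, the snippet below computes one min/max per feature over every car's training rows and then reuses those same values to scale both the training rows and any later rows. The feature names `engine_temp` and `speed` and all the numbers are made up for illustration; the helper functions are hypothetical, not from any library.

```python
# Minimal sketch: min-max normalisation fitted on the pooled training data
# of all cars. Feature names and values are made up for illustration.

def fit_min_max(rows, features):
    """Return {feature: (min, max)} computed over every row, regardless of car."""
    stats = {}
    for f in features:
        values = [row[f] for row in rows]
        stats[f] = (min(values), max(values))
    return stats


def apply_min_max(rows, stats):
    """Scale each feature with the pooled training min/max.

    Training rows land in [0, 1]; later rows can fall outside that range.
    """
    scaled = []
    for row in rows:
        new_row = dict(row)  # keep non-feature columns such as "car"
        for f, (lo, hi) in stats.items():
            span = hi - lo
            # Guard against a constant feature (zero range).
            new_row[f] = (row[f] - lo) / span if span else 0.0
        scaled.append(new_row)
    return scaled


train = [
    {"car": "A", "engine_temp": 40.0, "speed": 0.0},
    {"car": "A", "engine_temp": 80.0, "speed": 100.0},
    {"car": "B", "engine_temp": 50.0, "speed": 20.0},
    {"car": "B", "engine_temp": 90.0, "speed": 120.0},
]
new_rows = [{"car": "B", "engine_temp": 60.0, "speed": 70.0}]

stats = fit_min_max(train, ["engine_temp", "speed"])
train_scaled = apply_min_max(train, stats)
# Reuse the training statistics for later data instead of refitting.
new_scaled = apply_min_max(new_rows, stats)
print(stats)
print(new_scaled)
```

scikit-learn's `MinMaxScaler` follows the same pattern (`fit` on the training data, then `transform`) if you would rather not hand-roll it.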

> But on the other hand I worry the stable states (when the car is on but not moving, when the car is at maximum speed etc) vary by car so it would be best to scale individually so that each car is comparable?

Normalising with statistics from all of the training data preserves comparability between cars: every car's readings are shifted and scaled by the same amounts, so the same raw value (e.g. 60 degrees) maps to the same normalised value whichever car it came from.
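To make the comparability point concrete, here is a small sketch that scales a single reading of 60 degrees for two hypothetical cars, once with pooled (all-car) min/max and once with each car's own min/max. The temperatures are invented purely for illustration.

```python
# Sketch: pooled (all-car) min-max scaling vs per-car min-max scaling.
# Engine temperatures are made-up values for two hypothetical cars.

temps = {
    "A": [40.0, 60.0, 80.0],
    "B": [50.0, 60.0, 90.0],
}


def min_max(value, lo, hi):
    """Map value to [0, 1] relative to the given min and max."""
    return (value - lo) / (hi - lo)


# Pooled statistics: one min/max over every car's readings.
all_temps = [t for readings in temps.values() for t in readings]
pooled_lo, pooled_hi = min(all_temps), max(all_temps)

# Per-car statistics: a separate min/max for each car.
per_car = {car: (min(r), max(r)) for car, r in temps.items()}

results = {}
for car in temps:
    pooled = min_max(60.0, pooled_lo, pooled_hi)
    own = min_max(60.0, *per_car[car])
    results[car] = (pooled, own)
    print(f"car {car}: 60 degrees -> pooled {pooled:.2f}, per-car {own:.2f}")
```

With these numbers, pooled scaling maps 60 degrees to 0.40 for both cars, whereas per-car scaling gives 0.50 for car A and 0.25 for car B, which is exactly the concern raised in the question.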

Since the max/min used for scaling are based on the entire training set, they represent a group statistic that you can use to scale all members of the group.