This question is closely related to "What is a good way to transform Cyclic Ordinal attributes?" and "Ways to deal with longitude/latitude feature".

Both received very clear answers on how to normalise an hour variable and latitude/longitude variables.

What this question adds is how to approach normalisation when you have both weekday and hour features. I ask because they describe the same information at different levels of granularity and are therefore closely related.

The objective is to run a k-means algorithm combining the normalised weekday and hour features with other traditional numeric features.

Seymour

1 Answer

The essential transformation in @AN6U5's answer is done in the following lines:

df['hourfloat'] = df.hour + df.minute / 60.0
df['x'] = np.sin(2. * np.pi * df.hourfloat / 24.)
df['y'] = np.cos(2. * np.pi * df.hourfloat / 24.)

In the first line he transforms minutes into hours by dividing them by 60, so for example 20 minutes are converted to 0.3333 hours.

After that, in lines 2 and 3, he treats this float as an angle (its fraction of the 24-hour cycle times 2π) and converts it from polar coordinates to Cartesian coordinates on the unit circle (https://en.wikipedia.org/wiki/Polar_coordinate_system).
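To illustrate why this mapping helps, here is a minimal sketch using only the standard library instead of pandas. The helper names `encode_hour` and `euclidean` are hypothetical and not part of @AN6U5's answer; the point is only that times just before and just after midnight end up close together on the circle.

```python
import math

def encode_hour(hour, minute=0):
    """Map a time of day to (x, y) on the unit circle (hypothetical helper)."""
    hourfloat = hour + minute / 60.0          # e.g. 20 minutes -> 0.3333 hours
    angle = 2.0 * math.pi * hourfloat / 24.0  # fraction of the day as an angle
    return math.sin(angle), math.cos(angle)

def euclidean(p, q):
    """Straight-line distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

late = encode_hour(23, 30)   # 23:30
early = encode_hour(0, 30)   # 00:30
noon = encode_hour(12, 0)    # 12:00

print(euclidean(late, early))  # small: the two times are one hour apart
print(euclidean(late, noon))   # large: almost half a day apart
```

With the raw hour values, 23.5 and 0.5 would look 23 hours apart; on the circle they are neighbours, which is what a distance-based method such as k-means needs.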

So to change this from hour to weekday, you just need to adapt the first line.
Imagine a clock where 00:00 is Monday, followed clockwise by Tuesday, and so on. You need to express the hour as a fraction of a day (for simplicity I assume weekday takes the values 0-6, with 0 being Monday). So first you divide your hour by 24, which transforms it into days, and then you add your weekday to it, which gives you a float in days since the start of the week. Then proceed with lines 2 and 3 just as given, except that you change the 24 to 7.

As a formula: weekday + hour/24 = weekfloat. Both terms are in days, which is why the period in lines 2 and 3 becomes 7. I haven't tried it out myself, but I think this should do it.
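A minimal sketch of the combined weekday-and-hour encoding follows, again with the standard library only. It assumes weekday runs from 0 (Monday) to 6 (Sunday) and that weekfloat is measured in days, i.e. weekday + hour/24, so that one full turn of the circle is 7 days. The names `encode_week` and `distance` are hypothetical.

```python
import math

def encode_week(weekday, hour, minute=0):
    """Map (weekday, time of day) to (x, y) on a weekly unit circle.

    Assumption: weekday is 0 = Monday ... 6 = Sunday.
    """
    # Days elapsed since Monday 00:00, including the fractional day.
    weekfloat = weekday + (hour + minute / 60.0) / 24.0
    angle = 2.0 * math.pi * weekfloat / 7.0   # one full turn per week
    return math.sin(angle), math.cos(angle)

def distance(p, q):
    """Straight-line distance between two encoded points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

sunday_late = encode_week(6, 23)     # Sunday 23:00
monday_early = encode_week(0, 1)     # Monday 01:00
thursday_noon = encode_week(3, 12)   # Thursday 12:00, half a week after Monday 00:00

print(distance(sunday_late, monday_early))   # small: two hours apart across the week boundary
print(distance(sunday_late, thursday_noon))  # large: almost half a week apart
```

Note that this single pair of coordinates encodes position within the week, so Monday 09:00 and Tuesday 09:00 are a full day apart on the circle. If similarity by time of day also matters for the clustering, the hour-of-day pair could be kept as additional features alongside it.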


Alternatively, when you have two cyclic features, you could transform weekday and hour into spherical coordinates. This would leave you with three coordinates (x, y and z) while still preserving the 'closeness' within each feature.

Sebastian