I have data for a metric grouped by date. I have plotted it as a boxplot; now, how do I remove the values outside the boxplot's whiskers (the outliers)?
All the ['AVG'] data is in a single column, and I need it for time series modelling.
Seaborn uses the interquartile range (IQR) to detect outliers: by default, any point more than 1.5 * IQR below the first quartile or above the third quartile is drawn as an outlier. To drop them, reproduce the same rule on the column you want to filter. This is easy to do in pandas.
Assuming your DataFrame is called df and the column you want to filter on is AVG:
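Q1 = df['AVG'].quantile(0.25)
Q3 = df['AVG'].quantile(0.75)
IQR = Q3 - Q1  # IQR is the interquartile range
# Keep rows inside the whiskers seaborn draws by default (1.5 * IQR beyond the quartiles)
mask = (df['AVG'] >= Q1 - 1.5 * IQR) & (df['AVG'] <= Q3 + 1.5 * IQR)
df_filtered = df.loc[mask]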
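The same rule can be wrapped in a small reusable function. This is a minimal sketch assuming pandas and a numeric column such as AVG; the function name drop_iqr_outliers, the k multiplier parameter and the sample values are hypothetical, chosen only for illustration.

```python
import pandas as pd


def drop_iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose `col` lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[col].quantile(0.25)
    q3 = df[col].quantile(0.75)
    iqr = q3 - q1
    # Series.between is inclusive on both ends by default, matching >= and <=
    mask = df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df.loc[mask]


# Hypothetical sample data with one obvious outlier
df = pd.DataFrame({"AVG": [10, 11, 12, 13, 14, 15, 100]})
clean = drop_iqr_outliers(df, "AVG")
print(clean["AVG"].tolist())  # [10, 11, 12, 13, 14, 15]
```

Here Q1 = 11.5 and Q3 = 14.5, so the bounds are 7 and 19 and only 100 is dropped. The original index is preserved, which matters if you later align the filtered rows with dates.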
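If you only need to hide the outliers in the plot, including when it is drawn from grouped data, pass showfliers=False in the boxplot call; the argument is inherited from matplotlib. Note that this only hides the points in the figure: the values stay in the DataFrame, so it does not help for the modelling step.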
df.boxplot(..., showfliers=False)
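If the boxplot is drawn per date (one box per group) and you want to drop, rather than hide, the points each box flags, the IQR rule has to be applied within each group. The sketch below is an assumption-laden extension, not part of the answer above: it assumes a grouping column named Date (hypothetical) alongside AVG, and the helper name and sample data are also hypothetical. It uses groupby(...).transform so each row is compared against its own group's quartiles.

```python
import pandas as pd


def drop_iqr_outliers_by_group(df, group_col, value_col, k=1.5):
    """Hypothetical helper: apply the 1.5 * IQR rule separately within each group."""
    grouped = df.groupby(group_col)[value_col]
    # transform() broadcasts each group's quantile back onto that group's rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    mask = (df[value_col] >= q1 - k * iqr) & (df[value_col] <= q3 + k * iqr)
    return df.loc[mask]


# Hypothetical data: two dates, the first containing one outlier (50)
df = pd.DataFrame({
    "Date": ["2020-01-01"] * 5 + ["2020-01-02"] * 5,
    "AVG": [1, 2, 3, 4, 50, 10, 11, 12, 13, 14],
})
clean = drop_iqr_outliers_by_group(df, "Date", "AVG")
print(clean["AVG"].tolist())  # [1, 2, 3, 4, 10, 11, 12, 13, 14]
```

Filtering per group matches what a per-date boxplot shows; filtering the whole column at once (as in the answer) uses a single global set of bounds and can flag different points.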