Predict actual result after model trained with MinMaxScaler LinearRegression

Question

I was doing the modeling on the House Pricing dataset. My goal is to get the MSE result and to predict the price from input variables.

I have done the modeling: I scaled the data with MinMaxScaler() and trained the model with LinearRegression(). After this I got the score, MSE, MAE, and RMSE results.

But when I try to predict with actual input values, the result does not come out as a price. How can I get the prediction as an actual price?

Dataset: https://www.kaggle.com/code/bsivavenu/house-price-calculation-methods-for-beginners/data

This is my script:

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split

train = pd.read_csv('train.csv')
column = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']
train = train[column]

# Convert Feature/Column with Scaler
scaler = MinMaxScaler()
train[column] = scaler.fit_transform(train[column])

X = train.drop('SalePrice', axis=1)
y = train['SalePrice']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)

# Calling LinearRegression
model = LinearRegression()
# Fit LinearRegression to the training data
model = model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Calculate MSE (lower is better)
mse = mean_squared_error(y_test, y_pred)
print("MSE of testing set:", mse)
# Calculate MAE
mae = mean_absolute_error(y_test, y_pred)
print("MAE of testing set:", mae)
# Calculate RMSE (lower is better)
rmse = np.sqrt(mse)
print("RMSE of testing set:", rmse)

# Predict the house price from input:
overal_qual = 6
grlivarea = 1217
garage_cars = 1
totalbsmtsf = 626
fullbath = 1
year_built = 1980
predicted_price = model.predict([[overal_qual, grlivarea, garage_cars, totalbsmtsf, fullbath, year_built]])
print("Predicted price:", predicted_price)
```

The result:

```
MSE of testing set: 0.0022340806066149734
MAE of testing set: 0.0334447655149599
RMSE of testing set: 0.04726606189027147
Predicted price: [811.51843959]
```

The price should instead be something like 208500, 181500, or 121600 (dollar amounts in the hundreds of thousands).

What step did I miss?

score 3 · Accepted Answer · answered Oct 01 '22 at 16:17

First, you can't use anything from the test set before training. This means the scaler must be fitted on the training set only; otherwise there's a risk of data leakage.
Then remember that scaling your features means the model learns to predict from scaled features, so the test set (and any new input you want a price for) must be scaled as well, using the same scaler that was fitted on the training set.
Finally, you could obtain the real price by "unscaling" the predictions with inverse_transform. Instead, I decided not to scale the target variable in the code below, because it's not needed (unless you really want evaluation scores on the scaled target). It's also simpler ;)
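If you keep the question's setup, where a single MinMaxScaler was fitted on all seven columns with SalePrice first, its inverse_transform expects seven columns. A minimal sketch, assuming made-up rows in place of train.csv and hypothetical scaled predictions: put the predictions in column 0 of a zero-padded array, unscale, and keep column 0. The same number follows from the learned minimum and maximum of SalePrice.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

column = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']

# Made-up rows standing in for train.csv (illustration only)
train = pd.DataFrame([
    [100000, 4, 900, 1, 600, 1, 1950],
    [200000, 6, 1500, 2, 900, 2, 1980],
    [300000, 8, 2100, 3, 1200, 2, 2005],
], columns=column)

# One scaler over all seven columns, as in the question
scaler = MinMaxScaler()
scaler.fit(train[column])

# Hypothetical scaled SalePrice predictions, e.g. from model.predict(X_test)
y_pred_scaled = np.array([0.25, 0.5])

# inverse_transform needs all seven columns: place the predictions in
# column 0 (SalePrice) and fill the feature columns with zeros
padded = np.zeros((len(y_pred_scaled), len(column)))
padded[:, 0] = y_pred_scaled
y_pred_dollars = scaler.inverse_transform(padded)[:, 0]

# Equivalent manual formula: scaled * (max - min) + min for SalePrice
manual = y_pred_scaled * (scaler.data_max_[0] - scaler.data_min_[0]) + scaler.data_min_[0]

print(y_pred_dollars)
print(manual)
```

With these rows SalePrice ranges from 100000 to 300000, so 0.25 and 0.5 map back to 150000 and 200000. In this setup a new house would also have to pass through the same seven-column scaler (with any placeholder in the SalePrice column, since MinMaxScaler scales each column independently) before model.predict, which is why keeping the target unscaled is simpler.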

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

full = pd.read_csv('train.csv')
column = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']
full = full[column]
X = full.drop('SalePrice', axis=1)
y = full['SalePrice']

# Always split between training and test set first
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)

# Then fit the scaling on the training set
# Convert Feature/Column with Scaler
scaler = MinMaxScaler()
# Note: the columns have already been selected
X_train_scaled = scaler.fit_transform(X_train)

# Calling LinearRegression
model = LinearRegression()
# Fit LinearRegression to the training data
model = model.fit(X_train_scaled, y_train)

# Now we need to scale the test set features
X_test_scaled = scaler.transform(X_test)
y_pred = model.predict(X_test_scaled)
# y has not been scaled, so nothing else to do

# Calculate MSE (lower is better)
mse = mean_squared_error(y_test, y_pred)
print("MSE of testing set:", mse)
# Calculate MAE
mae = mean_absolute_error(y_test, y_pred)
print("MAE of testing set:", mae)
# Calculate RMSE (lower is better)
rmse = np.sqrt(mse)
print("RMSE of testing set:", rmse)
# ... evaluation etc.
```
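To get a dollar price for the house in the question, the new input goes through the same fitted scaler before model.predict. The sketch below assumes a synthetic, noise-free dataset in place of train.csv so it runs on its own; the synthetic_price rule is hypothetical. With the real data you would reuse the scaler and model from the code above.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

features = ['OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']


def synthetic_price(df):
    # Hypothetical noise-free pricing rule, used only to generate example data
    return (20000 * df['OverallQual'] + 50 * df['GrLivArea'] + 10000 * df['GarageCars']
            + 30 * df['TotalBsmtSF'] + 5000 * df['FullBath'] + 300 * (df['YearBuilt'] - 1900))


# Synthetic stand-in for train.csv: 200 random houses
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    'OverallQual': rng.integers(1, 11, n),
    'GrLivArea': rng.integers(500, 3001, n),
    'GarageCars': rng.integers(0, 5, n),
    'TotalBsmtSF': rng.integers(0, 2001, n),
    'FullBath': rng.integers(1, 4, n),
    'YearBuilt': rng.integers(1900, 2011, n),
})
y = synthetic_price(X)  # the target stays in dollars

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=15)

# Fit the scaler on the training features only, then train the model
scaler = MinMaxScaler().fit(X_train)
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# The house from the question, with the same columns in the same order as X
new_house = pd.DataFrame([[6, 1217, 1, 626, 1, 1980]], columns=features)

# Scale the new input with the fitted scaler before predicting
predicted_price = model.predict(scaler.transform(new_house))
print("Predicted price:", predicted_price[0])
```

Because the synthetic prices follow an exact linear rule, the prediction matches the rule's value for this house (238630); with the real train.csv you get a dollar estimate with real error. Another option is make_pipeline(MinMaxScaler(), LinearRegression()) from sklearn.pipeline: the pipeline's predict applies the fitted scaler itself, so raw inputs can be passed directly.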
