Questions tagged [r-squared]

For questions regarding R-squared ($R^2$), a statistical measure that represents the proportion of the variance for a dependent variable that's explained by an independent variable or variables in a regression model.

R-squared ($R^2$, or the coefficient of determination) is a statistical measure in a regression model that gives the proportion of variance in the dependent variable that can be explained by the independent variable or variables. In other words, R-squared shows how well the data fit the regression model (the goodness of fit).

The formula for calculating R-squared is:

$R^2 = \frac{SS_{regression}}{SS_{total}}$

Where:

  • $SS_{regression}$ is the sum of squares due to regression (explained sum of squares)
  • $SS_{total}$ is the total sum of squares
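
As a minimal sketch of the formula above, the snippet below fits a straight line with NumPy and computes $R^2$ as $SS_{regression} / SS_{total}$. The function name `r_squared_from_sums` and the sample data are hypothetical, chosen only for illustration. It also returns $1 - SS_{residual} / SS_{total}$; the two values agree for ordinary least squares with an intercept, but they can differ for other models (for example a fit without a constant, or a non-linear model).

```python
import numpy as np

def r_squared_from_sums(x, y):
    """Fit y = slope * x + intercept by least squares and return
    (SS_regression / SS_total, 1 - SS_residual / SS_total)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Ordinary least squares fit of a straight line (includes an intercept).
    slope, intercept = np.polyfit(x, y, deg=1)
    y_hat = slope * x + intercept

    ss_regression = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    ss_total = np.sum((y - y.mean()) ** 2)           # total sum of squares
    ss_residual = np.sum((y - y_hat) ** 2)           # unexplained sum of squares

    return ss_regression / ss_total, 1.0 - ss_residual / ss_total

# Illustrative data only (not taken from any question on this page).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r2_explained, r2_residual = r_squared_from_sums(x, y)
print(r2_explained, r2_residual)
```

Both printed values should match up to floating-point error because the fitted line includes an intercept.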
18 questions
3
votes
3 answers

Is R2 score a reasonable regression measure on huge datasets?

I'm running a regression model on a pretty large data set and getting a fairly woeful $R^2$ score of ~0.2 (see plot below), despite the plot looking like the model is generally pointing in the right direction. My question is, when you have over a…
jshep
  • 403
  • 2
  • 6
3
votes
0 answers

Multi-dimensional Euclidean R^2 - reasonable?

I have a high-dimensional space, say $\mathbb{R}^{1000}$, and I have samples $y_1, \ldots , y_n \in \mathbb{R}^{1000}$ and $\hat{y}_1, \ldots , \hat{y}_n \in \mathbb{R}^{1000}$. Would $$ R^2 = 1 - \frac{\sum_i || y_i - \hat{y}_i||^2}{\sum_i || y_i -…
2
votes
1 answer

Why would the result change so much for a linear regression with or without a constant?

I was running a linear regression on the Wooldridge dataset GPA2, which is available in the Python library wooldridge. I tried two linear regressions. The first: results = smf.ols('colgpa ~ hsperc + sat', data=gpa).fit() And the second: results =…
dsbr__0
  • 191
  • 1
  • 5
2
votes
2 answers

SKlearn PolynomialFeatures R^2 score

I'm trying to create a linear regression model using PolynomialFeatures, but when I evaluate it, I get really strange scores. I know that R^2 can be applied to this model, and I think I've tried everything. I'd really appreciate some good advice.…
kosekk_g
  • 23
  • 5
1
vote
1 answer

Does statsmodels compute R2 and other metrics on a validation-/test- set?

Does statsmodels compute R2 and other metrics on a validation set? I am using OLS from statsmodels.api; when printing the summary, an r2 and an r2_adjusted are presented. I did not trust the 0.88 and computed my own adjusted R2 with scikit-learn…
1
vote
1 answer

When can a model with a low r-squared be a good predictive model?

Most discussions on model prediction say that you should focus on error metrics, like RMSE, MSE, MAE or MAPE. Some even argue that r-squared can be low in a good model. However, I can't think of a model that would have a low r-squared and "good"…
1
vote
1 answer

Why am I getting a negative R2 score with Random Forest Regressor?

I'm trying to predict some variables for MOFs (from a scientific paper) using the Random Forest model in Python, but the value of R2 is negative (unlike in the paper, where it was positive). I actually don't know if the problem is with my dataset…
1
vote
1 answer

Calculate RMSE based on R squared and vice versa

If for example I have the value of RMSE can I calculate the $R^2$? And vice versa if I have the value of $R^2$ can I calculate the value of RMSE? I have all predictions, dataset, training set, and test set.
1
vote
1 answer

Why does an unimportant feature have a big impact on R2 in XGBoost?

I am training an XGBoost model, xgbr, using xgb.XGBRegressor() with 13 features and one numeric target. The R2 on the test set is 0.935, which is good. I am checking the feature importance by for col,score in…
volkan g
  • 121
  • 2
1
vote
0 answers

Does the applicability of R-squared to non-linear models depend on how we calculate it?

Does the applicability of R-squared to non-linear models depend on how we calculate it? $R^2 = \frac{SS_{exp}}{SS_{tot}}$ is going to be an inadequate measure for non-linear models since an increase of $SS_{exp}$ doesn't necessarily mean that the…
1
vote
3 answers

Determining which model result is better

I am trying to determine which model result is better. Both results are trying to achieve the same objective, the only difference is the exact data that is being used. I used random forest, xgboost, and elastic net for regression. Here is one of the…
justanewb
  • 83
  • 6
0
votes
2 answers

Can someone explain the solution to the following problem?

Q) We want to learn a function f(x) of the form f(x) = ax + b which is parameterized by (a, b). Using squared error as the loss function, which of the following parameters would you use to model this function to get a solution with the minimum…
0
votes
0 answers

R2 is good, but RMSE, MSE, and MAE are not

In my models, R2 on the training and test sets is close, but for some models the RMSE, MSE, and MAE are very different. What is the reason? Is there a solution?
0
votes
1 answer

Stock Price Prediction Using Random Forests (R-squared problem)

#===========================Importing packages================================= import yfinance as yf import numpy as np import pandas as pd from sklearn.model_selection import train_test_split from sklearn.ensemble import…
Kemit4
  • 103
  • 2
0
votes
0 answers

What is reliability? How is it related to the correlation coefficient?

We can compute the real or population correlation (rho) as the square root of 1 minus R-squared. Is this a correct interpretation? Does population correlation mean a real correlation measured as the square root of 1 - R_squared? Here, reliability seems to reflect…
Subhash C. Davar
  • 661
  • 5
  • 20