At my office, I am stuck in a weird situation. I have been asked to fit a regression model to data in which the target variable is continuous, with values ranging from 0.6 to 0.9 and 8 digits of precision after the decimal point. Although I know and have applied many linear and non-linear regression algorithms in the past, this case is different. There is one variable which, according to my BU, should have a positive, linear correlation with the target variable. But when I computed Pearson's correlation, the variable turned out to be negatively correlated, and a scatter plot shows that the relationship is not linear at all.
What transformations can I perform on the variable so that it shows a positive correlation? I am fairly new to this problem, so I am hoping to get it solved here. Thanks very much in advance, everyone.
2 Answers
It sounds like there might be a complex (i.e., non-linear) relationship between the feature and the target. If the business unit (BU) requires a positive and linear relationship between the feature and the target, you can try a segmented regression. There might be a range of feature values that shows the relationship required by the business.
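As a rough illustration of the segmented-regression idea, here is a minimal sketch using NumPy only. The data is synthetic and made up for this example (a feature that rises until x = 3 and then falls, with the target staying roughly inside 0.6 to 0.9), and `fit_two_segments` is a hypothetical helper that grid-searches a single breakpoint. It is not the asker's data and not a definitive implementation.

```python
import numpy as np

def fit_two_segments(x, y, min_points=5):
    """Hypothetical helper: try every split point and fit one line per side,
    keeping the split with the lowest total squared error."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best = None
    for i in range(min_points, len(x) - min_points):
        left = np.polyfit(x[:i], y[:i], 1)    # [slope, intercept]
        right = np.polyfit(x[i:], y[i:], 1)
        sse = (np.sum((np.polyval(left, x[:i]) - y[:i]) ** 2)
               + np.sum((np.polyval(right, x[i:]) - y[i:]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, x[i], left[0], right[0])
    _, split_x, slope_left, slope_right = best
    return split_x, slope_left, slope_right

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = np.where(x < 3, 0.7 + 0.05 * x, 0.85 - 0.03 * (x - 3))
y = y + rng.normal(0, 0.005, 300)

overall_r = np.corrcoef(x, y)[0, 1]           # Pearson's r over the full range
split_x, slope_left, slope_right = fit_two_segments(x, y)

print(f"overall Pearson r: {overall_r:.3f}")
print(f"estimated breakpoint: {split_x:.2f}")
print(f"slope below breakpoint: {slope_left:.4f}")
print(f"slope above breakpoint: {slope_right:.4f}")
```

In this toy setup the overall correlation is negative even though the lower segment has a clearly positive slope, which is the kind of range the BU's expectation might refer to. On real data, the breakpoint and the number of segments should be checked against the scatter plot and the domain knowledge.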
What's a BU? Business Unit?
Can you tell us anything about the variable and the target variable? Can you include the scatter plot?
It is still not clear whether the problem lies in the domain knowledge or its assumptions.
A business example: profit depends on sales, and the two are generally assumed to be positively correlated. But beyond a threshold, the marginal cost of a sale exceeds the profit it brings in (acquiring new customers becomes very expensive), so we can still have a (local) negative correlation. If you only have data in the 'marginal' segment, you can draw incorrect conclusions.
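To make the sales/profit example concrete, here is a small sketch with made-up numbers. The quadratic profit curve, the noise level, and the cutoff of 80 for the 'marginal' segment are all assumptions chosen for illustration, not real business data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: profit grows with sales at first, then declines
# once acquiring extra customers costs more than they bring in.
sales = rng.uniform(0, 100, 500)
profit = 2.0 * sales - 0.015 * sales ** 2 + rng.normal(0, 1.0, 500)

# Pearson's r over the whole observed range.
r_all = np.corrcoef(sales, profit)[0, 1]

# Pearson's r when only the high-sales ('marginal') segment is observed.
marginal = sales > 80
r_marginal = np.corrcoef(sales[marginal], profit[marginal])[0, 1]

print(f"full range:       r = {r_all:.3f}")
print(f"marginal segment: r = {r_marginal:.3f}")
```

Under these assumptions the full-range correlation is positive while the marginal segment alone gives a negative one, so the sign you measure depends on which part of the curve your data covers.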