
I have ~7 million rows of customer data (~500 sparse attributes).

A million of them have opted in to a new service.

How can I use this signal to predict which of the remaining customers are likely to adopt the service? And how do I measure the model's effectiveness?

Problems faced so far:

  1. Unable to treat this as a supervised problem due to the lack of definite negative labels
  2. Unable to apply label propagation because there is only one labeled class

Apart from treating this as an anomaly detection problem (one-class SVM, etc.), I also tried a nearest-neighbors-based approach.

I am looking for other ways to solve the problem, in case there are go-to techniques I am missing.

I know there is an answer here, but it only covers one-class SVM, which I have already tried. I am also looking for ways to measure model effectiveness, along with any novel approaches.

1 Answer


The topic you are interested in is called "PU learning" or "positive and unlabeled learning".
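As a rough illustration, here is a minimal sketch of one classic PU method (Elkan & Noto, 2008). It assumes scikit-learn is available, and it assumes the opted-in customers are a random sample of all would-be adopters (the "selected completely at random" assumption, which may not hold for real opt-in data). A classifier g is trained to separate labeled customers (s = 1) from everyone else (s = 0). The constant c = P(s=1 | y=1) is then estimated as g's mean score on held-out labeled positives, and P(y=1 | x) is approximated by g(x) / c. The function names and the synthetic data are hypothetical stand-ins for the real customer matrix.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_pu_elkan_noto(X, s, holdout=0.2, random_state=0):
    """Hypothetical helper. s = 1 for opted-in (labeled) customers, 0 for unlabeled.

    Assumes SCAR: labeled positives are a random sample of all positives.
    """
    X_train, X_hold, s_train, s_hold = train_test_split(
        X, s, test_size=holdout, stratify=s, random_state=random_state
    )
    # Step 1: "non-traditional" classifier g(x) ~ P(s=1 | x), labeled vs. unlabeled.
    g = LogisticRegression(max_iter=1000)
    g.fit(X_train, s_train)
    # Step 2: estimate c = P(s=1 | y=1) as the mean of g over held-out labeled positives.
    pos_idx = np.flatnonzero(s_hold == 1)
    c = g.predict_proba(X_hold[pos_idx])[:, 1].mean()
    return g, c


def predict_adoption_proba(g, c, X):
    # Step 3: P(y=1 | x) = P(s=1 | x) / c, clipped because the ratio can exceed 1.
    return np.clip(g.predict_proba(X)[:, 1] / c, 0.0, 1.0)


# Synthetic stand-in for the customer matrix (hypothetical data).
rng = np.random.default_rng(0)
n = 4000
y = rng.random(n) < 0.4                      # hidden true adopters, unknown in practice
X = rng.normal(size=(n, 5)) + y[:, None] * 1.5
s = (y & (rng.random(n) < 0.5)).astype(int)  # only about half of adopters opted in

g, c = fit_pu_elkan_noto(X, s)
scores = predict_adoption_proba(g, c, X[s == 0])  # score the unlabeled customers
top_candidates = np.argsort(-scores)[:100]        # indices within the unlabeled subset
```

Because dividing by c is monotone, ranking the unlabeled customers by g alone gives the same order; c only matters if you need calibrated adoption probabilities. `LogisticRegression` also accepts `scipy.sparse` matrices, which may suit ~500 sparse attributes.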

You can start by looking into the survey literature on PU learning.
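On the effectiveness question, two checks used in the PU setting can be sketched as follows, assuming you already have a score per customer (for example from a PU model). First, before training, hide a random subset of the opted-in customers among the unlabeled ones, then measure how many of them the model recovers. Second, Lee & Liu (2003) proposed the criterion r² / P(ŷ = 1), where r is the recall on known positives. This equals precision × recall / P(y = 1), and since P(y = 1) does not depend on the model, it can compare models or thresholds without any negative labels, again assuming the known positives are representative. The function names and toy scores below are hypothetical.

```python
import numpy as np


def recall_on_hidden_positives(scores, hidden_pos_mask, threshold=0.5):
    """Share of held-back opted-in customers that the model flags as likely adopters."""
    return float(np.mean(scores[hidden_pos_mask] >= threshold))


def lee_liu_criterion(scores, labeled_pos_mask, threshold=0.5):
    """r^2 / P(yhat = 1), computable without any negative labels.

    r is the recall on known positives; P(yhat = 1) is the fraction of all
    customers predicted positive at this threshold.
    """
    pred = scores >= threshold
    r = pred[labeled_pos_mask].mean()
    p_pred = pred.mean()
    return float(r ** 2 / p_pred) if p_pred > 0 else 0.0


def best_threshold(scores, labeled_pos_mask, candidates):
    # Pick the candidate threshold that maximizes the Lee & Liu criterion.
    return max(candidates, key=lambda t: lee_liu_criterion(scores, labeled_pos_mask, t))


# Toy example with hypothetical scores for six customers.
toy_scores = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.6])
labeled_pos = np.array([True, True, False, False, False, True])   # known opt-ins
hidden_pos = np.array([False, False, True, False, False, False])  # opt-ins hidden before training

print(lee_liu_criterion(toy_scores, labeled_pos, 0.5))                   # about 1.5
print(best_threshold(toy_scores, labeled_pos, [0.15, 0.5, 0.65, 0.75]))  # 0.5
print(recall_on_hidden_positives(toy_scores, hidden_pos, 0.5))           # 1.0
```

The criterion is not bounded by 1, so it is only meaningful for comparing candidates on the same data, not as an absolute score.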

Graph4Me Consultant