Let's say I am working on handwritten digit recognition (0 to 9). I know, for instance, that if I use clustering then I need to look for 10 clusters. But once I have the 10 clusters, how do I automatically identify which cluster corresponds to which digit? More generally, say I have some other classification task where I don't have labels for the training data, but I know that there are, let's say, two classes. I want to build a model that can tell me which class each test instance belongs to. The same problem arises: even if I can group the training cases together, how do I know the actual label of a test instance in this unsupervised setting?

1 Answer

You can't, in a completely unsupervised setting. You need something more: e.g., a training data set where you have labels for at least some of the instances, or some other information that allows you to set a label for each cluster.

How is the computer supposed to know that a circle means zero as opposed to some other number, if you don't tell it that? It can't. In fact, in the Eastern Arabic numeral system, there's a zero-looking character that actually means the number 5, not the number 0. A computer can't possibly know which of those two meanings is intended unless you tell it somehow.
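To make the "labels for at least some of the instances" option concrete, here is a minimal sketch of one way it could work: give each cluster the most common label among its few labeled members. The function name `label_clusters`, the toy cluster assignments, and the labels are all made up for illustration, and the clustering step itself is assumed to have been done already by whatever algorithm you like.

```python
from collections import Counter, defaultdict


def label_clusters(cluster_ids, known_labels):
    """Map each cluster id to the majority label among its labeled members.

    cluster_ids:  list where cluster_ids[i] is the cluster of instance i.
    known_labels: dict {instance index: label} for the few labeled instances.
    """
    votes = defaultdict(Counter)
    for idx, label in known_labels.items():
        votes[cluster_ids[idx]][label] += 1
    # A cluster with no labeled member gets no entry: nothing tells us its name.
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}


# Toy data (hypothetical): 8 instances already grouped into clusters 0 and 1.
cluster_ids = [0, 0, 0, 1, 1, 1, 0, 1]
# Only four instances carry a human-provided label.
known_labels = {0: "five", 2: "five", 3: "zero", 6: "zero"}

mapping = label_clusters(cluster_ids, known_labels)
# Every instance inherits the label of its cluster (None if the cluster is unnamed).
predicted = [mapping.get(c) for c in cluster_ids]
```

Without `known_labels`, the mapping stays empty, which is the point above: the clusters by themselves carry no names. Majority voting is only one possible rule, and it assumes the clusters line up reasonably well with the true classes.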

D.W.