I am working on a problem where I need to generate identifiers for several devices. The REAL identifiers are usually unknown, so the identifiers are generated from certain features of the devices.
Let us call the set of the real identifiers X and the set of the generated identifiers Y.
Ideally, there would be a one-to-one correspondence between X and Y. However, due to the limitations of the identifier generation algorithm, this does not hold in practice. Sometimes many elements of X map to one element of Y, and sometimes one element of X is related to many elements of Y (as shown in the figure above).
Suppose I generate two sets of identifiers: Y1 and Y2. Let us assume that we know the set X (of REAL identifiers) in this case.
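To make the setup concrete, here is a minimal sketch of the two failure modes described above. It assumes the data is available as (real identifier, generated identifier) pairs; the function name and the toy data are hypothetical and only for illustration. The raw counts it produces are not meant as the metric I am asking about, they just show what "many-to-one" and "one-to-many" look like in the data.

```python
from collections import defaultdict

def count_merges_and_splits(pairs):
    """Count the two kinds of mismatch between real and generated identifiers.

    pairs: iterable of (real_id, generated_id) observations (assumed format).
    Returns (merges, splits), where
      merges = number of generated ids shared by more than one real id,
      splits = number of real ids spread over more than one generated id.
    """
    y_to_x = defaultdict(set)  # generated id -> real ids mapped to it
    x_to_y = defaultdict(set)  # real id -> generated ids assigned to it
    for real_id, generated_id in pairs:
        y_to_x[generated_id].add(real_id)
        x_to_y[real_id].add(generated_id)
    merges = sum(1 for xs in y_to_x.values() if len(xs) > 1)
    splits = sum(1 for ys in x_to_y.values() if len(ys) > 1)
    return merges, splits

# Hypothetical toy data: Y1 merges devices a and b; Y2 splits device c.
pairs_y1 = [("a", "p"), ("b", "p"), ("c", "q")]
pairs_y2 = [("a", "p"), ("b", "q"), ("c", "r"), ("c", "s")]

print(count_merges_and_splits(pairs_y1))
print(count_merges_and_splits(pairs_y2))
```

In this toy case Y1 has one merge and no splits, while Y2 has no merges and one split, so it is not obvious from the counts alone which set is better.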
My question is about defining an appropriate metric. I want to compute one or more metrics that will help me choose between the two sets, Y1 and Y2.
In other words: are there known metrics in the statistics or mathematics literature that could help decide which of Y1 and Y2 is the better set of generated identifiers? If not, what would be a logical way of defining such a metric?

