Statistical correspondence between two sets

Question

I am working on a problem in which I need to generate device identifiers for several devices. The REAL identifiers are usually unknown, so the identifiers are generated from certain features of the devices.

Let us call the set of the real identifiers X and the set of the generated identifiers Y.

Ideally, there would be a 1-1 correspondence between X and Y. However, due to the limitations of the identifier generation algorithm, this does not hold: sometimes many elements of X map to one element of Y, and sometimes one element of X is related to many elements of Y (as shown in the figure above).

Suppose I generate two sets of identifiers: Y1 and Y2. Let us assume that we know the set X (of REAL identifiers) in this case.

Now, the question is about defining an appropriate metric. I want to compute one or more metrics that will help me choose between the two sets Y1 and Y2.

In other words: are there known metrics in the statistics/mathematics literature that might help us decide which is the better set of generated identifiers, Y1 or Y2? If not, what might be a logical way of defining such a metric?

Usually, X and Y are sets of strings. We can't compute correlations etc. — PTDS, Aug 22 '16 at 17:38
But what you describe is the measurement/computation of correlations, a function of some spatial or temporal parameter (for example); surely there is a link between particular elements of X and of Y, the one which allows you to pair them. Otherwise, sorry if I don't understand — , Aug 22 '16 at 17:54
Your question is not exactly clear, but one possibility is that you are looking for things like Cramér's V, Chuprov's T and Theil's U — Henry, Aug 22 '16 at 18:35
See a recent question of mine https://math.stackexchange.com/q/3173596 involving the so-called "Jaccard index of correspondence" and the references therein. — Jean Marie, Apr 17 '19 at 16:57
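
As a rough illustration of the association measures mentioned in the comments (Cramér's V and Theil's U), here is a minimal Python sketch. It assumes the data are available as (real identifier, generated identifier) pairs, one per observed device record; that layout, the function name `association`, and the toy identifiers are hypothetical and not part of the question. Since X and Y are strings, everything is computed from the contingency table of co-occurrence counts rather than from numeric correlations.

```python
from collections import Counter
from math import log, sqrt


def entropy(counts, n):
    """Shannon entropy (natural log) of a list of counts summing to n."""
    return -sum(c / n * log(c / n) for c in counts if c)


def association(pairs):
    """Association measures between real ids (x) and generated ids (y).

    `pairs` is a list of (x, y) tuples, one per observation (assumed layout).
    """
    n = len(pairs)
    joint = Counter(pairs)
    cx = Counter(x for x, _ in pairs)
    cy = Counter(y for _, y in pairs)

    hx = entropy(cx.values(), n)
    hy = entropy(cy.values(), n)
    hxy = entropy(joint.values(), n)
    mi = hx + hy - hxy  # mutual information I(X;Y)

    # Theil's U: share of one variable's entropy explained by the other.
    u_x_given_y = mi / hx if hx > 0 else 1.0
    u_y_given_x = mi / hy if hy > 0 else 1.0

    # Pearson chi-squared over all cells of the table, including empty ones.
    chi2 = 0.0
    for x, nx in cx.items():
        for y, ny in cy.items():
            expected = nx * ny / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected

    # Cramér's V is undefined when either variable has a single value.
    k = min(len(cx), len(cy)) - 1
    v = sqrt(chi2 / (n * k)) if k > 0 else float("nan")

    return {
        "theil_u_x_given_y": u_x_given_y,
        "theil_u_y_given_x": u_y_given_x,
        "cramers_v": v,
    }


# Toy data: three real devices, two observations each (made up).
real = ["d1", "d1", "d2", "d2", "d3", "d3"]
y1 = ["a", "a", "a", "a", "b", "b"]  # merges d1 and d2 into one id
y2 = ["a", "a", "b", "b", "c", "d"]  # splits d3 into two ids

for name, generated in (("Y1", y1), ("Y2", y2)):
    print(name, association(list(zip(real, generated))))
```

Under this reading, U(X|Y) drops below 1 when several real identifiers are merged into one generated identifier, and U(Y|X) drops below 1 when one real identifier is split across several generated ones. Comparing the two values for Y1 and Y2 would therefore show not only which set is closer to a 1-1 correspondence, but also which kind of error each set makes.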
