This is a fascinating problem!
Two things make it especially challenging:
- How should we compare two point sets? Classical problems in machine learning have a fixed number of attributes, and these attributes are not interchangeable: For example, I might have data on different people with the attributes
age and height (in centimeters). Every sample has one entry for each attribute, and of course (age, height) = (22, 180) is not the same as (age, height) = (180, 22).
Neither is true in your problem. A point set has between 3 and 10 points, and the order in which we enter the points should not make a difference when comparing two point sets.
- How do we make a prediction? Say we have found a way to pick point sets from our training set that are similar to your point set above. We face the problem that our prediction must be one of the 7 points in your picture; but none of these points might be contained in the similar point sets.
Let me outline an algorithm that deals with both challenges. The prediction accuracy is not very good, but maybe you see a way to improve it. And at least it predicts something, right?
1. Simulating samples
To be able to test the algorithm, I wrote functions that generate samples and labels.
Generating samples:
Each sample contains between 3 and 10 points. The number of points is random, drawn from a uniform distribution. Each point is of the form (x_coordinate, y_coordinate). The coordinates are again random, drawn from a normal distribution.
import numpy as np
from random import randint

def create_samples(number_samples, min_points, max_points):

    def create_single_sample(min_points, max_points):
        # Draw a random number of points, each with standard-normal coordinates.
        n = randint(min_points, max_points)
        return np.array([np.random.normal(size=2) for _ in range(n)])

    # dtype=object because the samples have different numbers of points.
    return np.array([create_single_sample(min_points, max_points) for _ in range(number_samples)], dtype=object)
Generating labels: As a toy example, let us assume that the rule for choosing a point is: Always pick the point that is closest to (0, 0), where 'closest' should be understood in terms of the Euclidean norm.
def decision_function_minnorm(sample):
    # Pick the point with the smallest Euclidean norm, i.e. the point closest to (0, 0).
    norms = np.apply_along_axis(np.linalg.norm, axis=1, arr=sample)
    return sample[norms.argmin()]

def create_labels(samples, decision_function):
    return np.array([decision_function(sample) for sample in samples])
We can now create our train and test sets:
n_train, n_test = 1000, 100
dec_fun = decision_function_minnorm
X_train = create_samples(number_samples=n_train, min_points=3, max_points=10)
X_test = create_samples(number_samples=n_test, min_points=3, max_points=10)
y_train = create_labels(X_train, dec_fun)
y_test = create_labels(X_test, dec_fun)
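As a quick sanity check (not part of the pipeline itself), you can print one training sample and its label; the label should be the row of the sample that is closest to the origin:

print(X_train[0])  # a point set with between 3 and 10 rows of (x, y) coordinates
print(y_train[0])  # the row of X_train[0] that is closest to (0, 0)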
2. Comparing point sets via Hausdorff distance
Let us tackle the first problem: How should we compare different point sets?
The number of points in the point sets is different.
Also remember that the order in which we write down the points should not matter: Comparing to the point set [(0,0), (1,1), (2,2)] should yield the same result as comparing to the point set [(2,2), (0,0), (1,1)].
My approach is to compare point sets via their Hausdorff distance:
def hausdorff(A, B):

    def dist_point_to_set(x, A):
        # Distance from a single point x to the closest point of the set A.
        return min(np.linalg.norm(x - a) for a in A)

    def dist_set_to_set(A, B):
        # Largest distance from any point of A to the set B.
        return max(dist_point_to_set(a, B) for a in A)

    return max(dist_set_to_set(A, B), dist_set_to_set(B, A))
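As a small check that the order of the points really does not matter, compare a point set with a reordered copy of itself (the sets below are just an illustration, not part of the training data):

A = np.array([(0, 0), (1, 1), (2, 2)])
B = np.array([(2, 2), (0, 0), (1, 1)])
print(hausdorff(A, A))  # 0.0: identical sets
print(hausdorff(A, B))  # also 0.0: the same points in a different order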
3. Predicting via k-nearest neighbors and averaging
We now have a notion of distance between point sets.
This makes it possible to use k-nearest neighbors classification:
Given a test point set, we find the k point sets in our training sample that have the smallest Hausdorff distance relative to the test point set, and obtain their labels.
Now comes the second problem: How do we turn these k labels into a prediction for the test point set? I took the simplest approach: average the labels and predict the point in the test point set that is closest to the average.
def predict(x, num_neighbors):
    # Find the num_neighbors point sets in X_train with the smallest Hausdorff distance to x.
    distances_to_train = np.array([hausdorff(x, x_train) for x_train in X_train])
    neighbors_idx = np.argpartition(distances_to_train, num_neighbors)[:num_neighbors]

    # Get the labels of the neighbors and calculate their average.
    targets_neighbors = y_train[neighbors_idx]
    targets_mean = sum(targets_neighbors) / num_neighbors

    # Find the point in x that is closest to targets_mean and use it as the prediction.
    distances_to_mean = np.array([np.linalg.norm(p - targets_mean) for p in x])
    closest_point = x[distances_to_mean.argmin()]

    return closest_point
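For a single test sample this looks as follows (just a usage illustration; by construction the returned point is always one of the points of X_test[0]):

prediction = predict(X_test[0], num_neighbors=70)
print(prediction)  # one of the rows of X_test[0]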
4. Testing
Everything is in place to test the performance of our algorithm.
num_neighbors = 70
successes = 0
for i, x in enumerate(X_test):
    print('%d/%d' % (i + 1, n_test))
    prediction = predict(x, num_neighbors)
    successes += np.array_equal(prediction, y_test[i])

print('Prediction accuracy: %.1f%%' % (100 * successes / n_test))
For the given decision function and num_neighbors = 70, we get a prediction accuracy of 84%.
This is not terribly good, and it is of course specific to our decision function, which seems fairly easy to predict.
To see this, define a different decision function:
def decision_function_maxaverage(sample):
    # Pick the point with the largest average of its two coordinates.
    avgs = (sample[:, 0] + sample[:, 1]) / 2
    return sample[avgs.argmax()]
Using this function via dec_fun = decision_function_maxaverage brings down prediction accuracy to 45%.
This shows how important it is to think about the decision rules that generate your labels. If you have an idea why people choose certain points, this will help you find the best algorithm.
Some ways to improve this algorithm: (1) Use a different distance function instead of Hausdorff distance, (2) use something more sophisticated than k-nearest neighbors, (3) improve how the selected training labels are turned into a prediction.
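As one illustration of improvement (3), here is a sketch (my own variation, not tested against the numbers above) that weights each neighbor's label by the inverse of its Hausdorff distance before averaging, so that closer point sets influence the prediction more; the small constant eps is just there to avoid division by zero:

def predict_weighted(x, num_neighbors, eps=1e-12):
    # Same neighbor search as before, but the labels are averaged with
    # weights 1 / (distance + eps) instead of uniform weights.
    distances_to_train = np.array([hausdorff(x, x_train) for x_train in X_train])
    neighbors_idx = np.argpartition(distances_to_train, num_neighbors)[:num_neighbors]

    weights = 1 / (distances_to_train[neighbors_idx] + eps)
    targets_neighbors = y_train[neighbors_idx]
    targets_mean = (weights[:, None] * targets_neighbors).sum(axis=0) / weights.sum()

    # As before, predict the point of x that is closest to the (weighted) average label.
    distances_to_mean = np.array([np.linalg.norm(p - targets_mean) for p in x])
    return x[distances_to_mean.argmin()]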