
I would like to estimate the best eps value for the DBSCAN algorithm on this dataset by following this set of rules:

  1. Set a minPts: 10
  2. Compute the distance to the 10th nearest neighbour (the k-distance) for each data point.
  3. Sort these k-distances and plot them; the elbow of the curve gives the best eps value.
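The three steps above can be sketched as follows. This is only a minimal illustration on synthetic data, not the CURE dataset: the helper name `sorted_k_distances`, the brute-force distance computation, and the generated points are all assumptions made for the example.

```python
import numpy as np

def sorted_k_distances(X, k):
    """Distance from each point to its k-th nearest other point, sorted descending."""
    # Pairwise Euclidean distances (brute force, so only suitable for small data sets).
    diff = X[:, None, :] - X[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the k-th nearest *other* point.
    d.sort(axis=1)
    return np.sort(d[:, k])[::-1]

# Hypothetical data: two dense blobs plus a few scattered noise points.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(200, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(200, 2)),
    rng.uniform(low=-2.0, high=7.0, size=(20, 2)),
])

min_pts = 10
k_dist = sorted_k_distances(X, min_pts)
```

Plotting `k_dist` against its index gives the sorted k-distance curve. The noise points tend to sit on the steep left part and the cluster points on the flat tail.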

This is the first part of my code where I load the dataset:

import csv
import sys
import os
from os.path import join
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors
import matplotlib.pyplot as plt
import numpy as np

def load_data(file_path, file_name):
    with open(join(file_path, file_name)) as csv_file:
        data_file = csv.reader(csv_file, delimiter=',')
        # First row: number of samples and number of features
        temp1 = next(data_file)
        n_samples = int(temp1[0])
        print("n_samples=")
        print(n_samples)
        n_features = int(temp1[1])
        # Second row: feature names
        temp2 = next(data_file)
        feature_names = np.array(temp2[:n_features])

        # Remaining rows: the data points
        data_list = [row for row in data_file]

        data = np.asarray(data_list, dtype=np.float64)

    return (data, feature_names, n_samples, n_features)

# --- Main program ---

file_path = "Datasets/"
file_name3 = "CURE-complete.csv"
data3, feature_names3, n_samples3, n_features3 = load_data(file_path, file_name3)
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111)
fig.subplots_adjust(top=1)
ax.set_title('Dataset n. 3 of data points')
ax.set_xlabel(feature_names3[0])
ax.set_ylabel(feature_names3[1])
plt.plot(data3[:, 0], data3[:, 1], '.', markersize=1.2, markeredgecolor='blue')
plt.show()

This is where I compute the k-nearest-neighbour distances with ns (minPts) = 10:

ns = 10 #minpts
nbrs = NearestNeighbors(n_neighbors=ns).fit(data3)
distances, indices = nbrs.kneighbors(data3)
distanceDec = sorted(distances[:,ns-1], reverse=True)
plt.plot(list(range(1,len(distanceDec)+1)), distanceDec)
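One detail worth checking in the snippet above: when `kneighbors` is given the same array the model was fitted on, each point comes back as its own nearest neighbour at distance 0, so column `ns-1` holds the distance to the (ns-1)-th other point. Since scikit-learn's `DBSCAN` counts the point itself in `min_samples`, that may be exactly what is wanted, but it is easy to overlook. The sketch below illustrates the difference on random data (an assumption, not the CURE dataset); the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))  # hypothetical data, for illustration only
ns = 10

nbrs = NearestNeighbors(n_neighbors=ns).fit(X)

# Querying with the training data: column 0 is each point itself.
with_self, _ = nbrs.kneighbors(X)

# Querying with no argument: a point is not counted as its own neighbour.
without_self, _ = nbrs.kneighbors()

first_col_is_zero = np.allclose(with_self[:, 0], 0.0, atol=1e-6)
# The two results are the same distances shifted by one column.
shifted = np.allclose(with_self[:, 1:], without_self[:, :-1])
```

Both `first_col_is_zero` and `shifted` should come out `True` for data without duplicate points.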

This is the resulting diagram, which seems unusual considering the range of values:

[figure: plot of the sorted distances]

How can I improve my algorithm? As you can see on the following page, the plot (reversed in my case) should look different.

From what I understand, the best eps value should be at the corner just above 2.2, right?
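Reading the corner off the plot by eye can be cross-checked numerically. One common heuristic, used here as an assumption and not something prescribed by DBSCAN itself, is to take the point of the sorted curve that lies farthest from the straight line joining its two endpoints. The function name `elbow_index` and the sample curve are hypothetical.

```python
import numpy as np

def elbow_index(values):
    """Index of the point farthest from the chord joining the curve's endpoints.

    Assumes at least two values and a curve that is not constant.
    """
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y), dtype=float)
    # Normalise both axes to [0, 1] so the result does not depend on their scales.
    x_n = (x - x[0]) / (x[-1] - x[0])
    y_n = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance to the line from (0, y_n[0]) to (1, y_n[-1]).
    dx, dy = 1.0, y_n[-1] - y_n[0]
    dist = np.abs(dy * x_n - dx * (y_n - y_n[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))

# Made-up, descending curve with a sharp bend at index 2.
curve = [10, 5, 1, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
i = elbow_index(curve)
eps_candidate = curve[i]
```

Applied to the sorted distances from the question, this would be `distanceDec[elbow_index(distanceDec)]`. The result is only a starting point and should still be compared against the plot.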
