Let's say I have a dataset with inputs and expected outputs like this:

[
  {
    "input": "http://localhost/wordpress/wp-includes/blocks/navigation/view.min.js?ver=6.5.3",
    "output": ["WordPress 6.5.3"]
  },
  {
    "input": "<meta content=\"max-image-preview:large\" name=\"robots\"/>",
    "output": []
  },
  {
    "input": "https://cdnjs.cloudflare.com/ajax/libs/jquery/3.7.1/jquery.min.js?ver=3.7.1",
    "output": ["jQuery 3.7.1"]
  },
  {
    "input": "Server: Apache/2.4.56 (Win64) OpenSSL/1.1.1t PHP/8.2.4",
    "output": ["Apache 2.4.56", "OpenSSL 1.1.1t", "PHP 8.2.4"]
  },
  { "input": "X-Powered-By: PHP/7.4", "output": ["PHP 7.4"] },
  ...
]

I would like to create a program that extracts (or guesses) which technologies, ideally with their versions, appear in a given input.

I have read a bit about multi-label classification, named entity recognition, and fine-tuning an LLM. I'm still learning and not sure how best to solve this problem. Thanks for any advice!

Cube64

1 Answer

Noob question - which NLP/deep learning technique should I use

String matching + regex.
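
To make that concrete, here is a minimal sketch in Python using only the standard `re` module. The `RULES` table and the `detect` function are hypothetical names, and the patterns are hand-written guesses that cover only the five examples from the question; a real rule set would need many more entries and some care about false positives.

```python
import re

# Hypothetical hand-written rules: (technology name, regex with one version group).
# These patterns are assumptions tuned to the question's examples only.
VERSION = r"\d+(?:\.\d+)*"
RULES = [
    # WordPress core assets live under /wp-includes/ and carry ?ver=<core version>.
    ("WordPress", re.compile(r"/wp-includes/\S*?[?&]ver=(" + VERSION + r")")),
    # jQuery version embedded in the path, e.g. jquery/3.7.1/ or jquery-3.7.1.js.
    ("jQuery", re.compile(r"jquery[/-](" + VERSION + r")", re.IGNORECASE)),
    # Product/version tokens as found in Server and X-Powered-By headers.
    ("Apache", re.compile(r"\bApache/(" + VERSION + r")")),
    ("OpenSSL", re.compile(r"\bOpenSSL/(" + VERSION + r"[a-z]?)")),
    ("PHP", re.compile(r"\bPHP/(" + VERSION + r")")),
]


def detect(text):
    """Return a list like ["PHP 8.2.4"] for every rule that matches the text."""
    found = []
    for name, pattern in RULES:
        match = pattern.search(text)
        if match:
            found.append(f"{name} {match.group(1)}")
    return found


if __name__ == "__main__":
    print(detect("Server: Apache/2.4.56 (Win64) OpenSSL/1.1.1t PHP/8.2.4"))
    print(detect('<meta content="max-image-preview:large" name="robots"/>'))
```

The labelled dataset is still useful with this approach: run `detect` over every `input`, compare against `output`, and add or tighten a rule wherever they disagree.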

Franck Dernoncourt