I have found a number of parsers for automatically extracting institution names from text (e.g. this one). My task is, in a sense, the inverse: I want to automatically generate realistic institution names, with the ability to differentiate them by type (privately held, public, educational, etc.) and by branch.

Are there any algorithms, applications, or papers on this? Alternatively, is there a (freely accessible) database with such data?

Alex Konnen

2 Answers

Provided you have data, yes, you can. Look into Generative Adversarial Networks and/or Reinforcement Learning for text generation. This paper is a good starting point: https://openreview.net/forum?id=rJedV3R5tm.

Also, here's a tool that might help you. One option is to generate the institution names without differentiating by type, and then build a second model to classify them.
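To make the generate-then-classify idea concrete, here is a minimal sketch using only the Python standard library. It is not a GAN or an RL model: a character-level Markov chain stands in for the generator, and a keyword lookup stands in for the classifier. The seed names, the keyword table, and all function names are made up for illustration; with a real corpus you would swap in a proper generative model and a trained classifier.

```python
import random
from collections import defaultdict

# Hypothetical seed names; a real run would need a much larger corpus.
SEED_NAMES = [
    "Northfield University", "Harbor Technical College",
    "Lakeside Institute of Technology", "Redwood Holdings Ltd",
    "Summit Logistics Inc", "Bluepeak Software Ltd",
    "Ministry of Transport", "National Health Agency",
    "Regional Water Authority",
]

# Hypothetical keyword rules standing in for a trained classifier.
TYPE_KEYWORDS = {
    "educational": ("university", "college", "institute", "school"),
    "private": ("ltd", "inc", "holdings", "gmbh"),
    "public": ("ministry", "agency", "authority", "department"),
}

ORDER = 3              # number of characters of context
START, END = "^", "$"  # padding symbols that never occur in the names


def train(names, order=ORDER):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for name in names:
        padded = START * order + name + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, rng, order=ORDER, max_len=60):
    """Sample one name character by character (step 1: no type information)."""
    state = START * order
    out = []
    while len(out) < max_len:
        nxt = rng.choice(model[state])
        if nxt == END:
            break
        out.append(nxt)
        state = state[1:] + nxt
    return "".join(out)


def classify(name):
    """Assign a type by keyword match (step 2: classify after generating)."""
    tokens = name.lower().split()
    for label, keywords in TYPE_KEYWORDS.items():
        if any(keyword in tokens for keyword in keywords):
            return label
    return "unknown"


if __name__ == "__main__":
    rng = random.Random(0)
    model = train(SEED_NAMES)
    for _ in range(5):
        name = generate(model, rng)
        print(name, "->", classify(name))
```

With a seed list this small the chain mostly reproduces or splices its training names, so treat it as a skeleton for the two-step pipeline rather than a usable generator.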

Guillermo Mosse

If you want to build your own dataset, you could look at packages such as:

They both provide features to generate company/institution names based on certain locales as well.

If your goal is to generate training data for an NER task, this should be a good start. If it's simply to generate company names, this will already cover quite a bit.
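Independently of any particular package, the underlying idea (composing names from word lists and templates) can be sketched with the standard library, which also lets you condition on type and branch as the question asks. Everything below is an assumption made for illustration: the word lists, the templates, and the function names `make_name` and `make_labeled_dataset` are hypothetical and not taken from any library.

```python
import random

# Hypothetical building blocks; extend or replace with locale-specific lists.
STEMS = ["Northfield", "Harbor", "Redwood", "Summit", "Lakeside"]

BRANCH_WORDS = {
    "technology": ["Software", "Systems", "Data"],
    "health": ["Medical", "Care", "Biotech"],
    "transport": ["Logistics", "Rail", "Freight"],
}

TEMPLATES = {
    "private": ["{stem} {branch} Ltd", "{stem} {branch} Inc"],
    "public": ["National {branch} Agency", "{stem} {branch} Authority"],
    "educational": ["{stem} University", "{stem} Institute of {branch}"],
}


def make_name(inst_type, branch, rng):
    """Build one name for the given institution type and branch."""
    template = rng.choice(TEMPLATES[inst_type])
    return template.format(
        stem=rng.choice(STEMS),
        branch=rng.choice(BRANCH_WORDS[branch]),
    )


def make_labeled_dataset(n, rng):
    """Return n (name, type, branch) tuples, e.g. as seed data for NER."""
    rows = []
    for _ in range(n):
        inst_type = rng.choice(list(TEMPLATES))
        branch = rng.choice(list(BRANCH_WORDS))
        rows.append((make_name(inst_type, branch, rng), inst_type, branch))
    return rows


if __name__ == "__main__":
    rng = random.Random(42)
    for row in make_labeled_dataset(5, rng):
        print(row)
```

Because the type and branch are chosen before the name is built, every generated name comes with its labels for free, which is convenient if the names are meant as training data.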

Valentin Calomme