
I am working on a model that automatically generates questions from text.

My model will analyse a provided article and ask its authors questions that can help them improve the article.

How can we measure the accuracy of these ML-generated questions?

There is also the relevance aspect, since each question should point to an area of improvement in the article.

How can that be measured?

Any previous work on similar models would be a great help too.

Thanks

asmgx

1 Answer


You can check the Question Generation section of paperswithcode. There you can see, for each dataset, how performance is measured and how the different proposed approaches compare.

Usually, you check how similar the generated question is to the reference text. Commonly used measures are BLEU-1 (based on matching unigrams) and ROUGE-L (based on the longest common subsequence). This is "unsupervised testing" in the sense that you don't need labeled data. However, these scores may not correlate well with the actual quality of the questions (see Towards a better metric for evaluating question generation systems and Addressing Semantic Drift in Question Generation for Semi-Supervised Question Answering).
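As a quick sanity check, both metrics can be approximated in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names are mine, BLEU-1 here is clipped unigram precision with a brevity penalty, and ROUGE-L is reported as a plain F1 over the longest common subsequence (the official ROUGE-L uses a recall-weighted F-measure). For real comparisons, use an established library such as nltk or rouge-score.

    import math
    from collections import Counter

    def bleu1(candidate, reference):
        """Clipped unigram precision with a brevity penalty (BLEU-1)."""
        cand, ref = candidate.split(), reference.split()
        if not cand:
            return 0.0
        cand_counts, ref_counts = Counter(cand), Counter(ref)
        # Clip each candidate unigram count by its count in the reference.
        overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
        precision = overlap / len(cand)
        # Brevity penalty discourages very short candidates.
        bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
        return bp * precision

    def rouge_l_f1(candidate, reference):
        """F1 based on the longest common subsequence of tokens (ROUGE-L)."""
        cand, ref = candidate.split(), reference.split()
        # Dynamic-programming table for the LCS length.
        dp = [[0] * (len(ref) + 1) for _ in range(len(cand) + 1)]
        for i, cw in enumerate(cand, 1):
            for j, rw in enumerate(ref, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if cw == rw else max(dp[i - 1][j], dp[i][j - 1])
        lcs = dp[-1][-1]
        if lcs == 0:
            return 0.0
        precision, recall = lcs / len(cand), lcs / len(ref)
        return 2 * precision * recall / (precision + recall)

    generated = "what is the main contribution of the paper"
    reference = "what is the paper's main contribution"
    print(bleu1(generated, reference), rouge_l_f1(generated, reference))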

In other cases, QA-based Evaluation (QAE) is used, which measures how similar the generated QA pairs are to some ground-truth QA pairs. For this, you need a labeled reference QA dataset on which the model can be evaluated.
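To make the comparison concrete, here is a minimal sketch of scoring generated answers against ground-truth answers with SQuAD-style exact match and token-level F1. The input format (a list of ground-truth answer / generated answer pairs) is an assumption on my part; the actual QAE setups in the papers above also have to align generated questions with reference questions, which is omitted here.

    from collections import Counter

    def exact_match(gold, pred):
        # 1.0 if the normalized strings are identical, else 0.0.
        return float(gold.strip().lower() == pred.strip().lower())

    def token_f1(gold, pred):
        # Token-overlap F1 between the gold and predicted answer strings.
        gold_toks, pred_toks = gold.lower().split(), pred.lower().split()
        common = Counter(gold_toks) & Counter(pred_toks)
        overlap = sum(common.values())
        if overlap == 0:
            return 0.0
        precision = overlap / len(pred_toks)
        recall = overlap / len(gold_toks)
        return 2 * precision * recall / (precision + recall)

    def qae_scores(pairs):
        """Average EM and F1 over (ground-truth answer, generated answer) pairs."""
        em = sum(exact_match(g, p) for g, p in pairs) / len(pairs)
        f1 = sum(token_f1(g, p) for g, p in pairs) / len(pairs)
        return em, f1

    # Hypothetical example pairs, just to show the expected input shape.
    pairs = [("the transformer architecture", "transformer architecture"),
             ("2017", "2017")]
    print(qae_scores(pairs))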

noe