Given two sets $A,B$ of strings over alphabet $\Sigma$, can we compute the smallest deterministic finite-state automaton (DFA) $M$ such that $A \subseteq L(M)$ and $L(M) \subseteq \Sigma^*\setminus B$?

In other words, $A$ represents a set of positive examples. Every string in $A$ needs to be accepted by the DFA. $B$ represents a set of negative examples. No string in $B$ should be accepted by the DFA.
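To make the requirement concrete, here is a minimal sketch of the consistency check. The encoding is my own assumption, not anything standard: states are integers, state `0` is the start state, and `delta` is a total transition table. The names `accepts` and `separates` are hypothetical helpers.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed encoding: delta maps (state, symbol) -> state and is total
# over the alphabet; state 0 is the start state.
Delta = Dict[Tuple[int, str], int]


def accepts(delta: Delta, accepting: Set[int], word: str) -> bool:
    """Run the DFA on `word` and report whether it ends in an accepting state."""
    state = 0
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting


def separates(delta: Delta, accepting: Set[int],
              positives: Iterable[str], negatives: Iterable[str]) -> bool:
    """True iff every positive example is accepted and no negative example is."""
    return (all(accepts(delta, accepting, w) for w in positives)
            and not any(accepts(delta, accepting, w) for w in negatives))


# Example: a two-state DFA over {a, b} accepting words with an odd number of a's.
odd_a_delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
odd_a_accepting = {1}
```

With this DFA, `separates(odd_a_delta, odd_a_accepting, ["a", "ab", "aaa"], ["", "aa", "b"])` holds, so it is one candidate $M$ for those examples (not necessarily the smallest).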

Is there a way to solve this, perhaps using DFA minimization techniques? I could imagine creating a DFA-like automaton that has three kinds of states: accept states, reject states, and "don't-care" states (any input that ends in a "don't-care" state can be either accepted or rejected). But can we then find a way to minimize this to an ordinary DFA?
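One way to picture the three-kinds-of-states automaton, for finite $A$ and $B$, is a prefix tree whose nodes are labelled accept, reject, or don't-care. The sketch below is only an illustration under my own encoding (`True` / `False` / `None` for the three labels, and a hypothetical function name `build_prefix_tree`). It builds the tree; it does not minimize it.

```python
from typing import Dict, Iterable, Optional, Tuple


def build_prefix_tree(positives: Iterable[str], negatives: Iterable[str]):
    """Build a prefix tree over all examples.

    Returns (children, label) where children maps (node, symbol) -> node and
    label maps node -> True (accept), False (reject) or None (don't care).
    Node 0 is the root. Missing transitions are implicitly don't-care too.
    """
    children: Dict[Tuple[int, str], int] = {}
    label: Dict[int, Optional[bool]] = {0: None}

    def insert(word: str, value: bool) -> None:
        node = 0
        for symbol in word:
            key = (node, symbol)
            if key not in children:
                new_node = len(label)      # next unused node id
                children[key] = new_node
                label[new_node] = None     # don't-care until proven otherwise
            node = children[key]
        if label[node] is not None and label[node] != value:
            raise ValueError(f"{word!r} is both a positive and a negative example")
        label[node] = value

    for w in positives:
        insert(w, True)
    for w in negatives:
        insert(w, False)
    return children, label
```

For `build_prefix_tree(["a", "ab"], ["b"])` the root stays don't-care, the nodes for `a` and `ab` are accept states, and the node for `b` is a reject state. The hard part of the question is then how to merge such nodes into as few states as possible without merging an accept node with a reject node.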

You could think of this as the problem of learning a DFA, given positive and negative examples.

This is inspired by "Is regex golf NP-Complete?", which asks a similar question for regexps instead of DFAs.

D.W.

2 Answers

There is a lot of literature on learning DFAs from positive and negative samples. If $A$ and $B$ are finite, I don't see how the problem could be undecidable. If $A \cap B = \emptyset$, then the DFA that accepts exactly the strings in $A$ satisfies your requirement; its size is an upper bound, so one can simply enumerate all smaller DFAs and keep the smallest one that is consistent with the samples. If $A \cap B \neq \emptyset$, then clearly no such DFA exists.
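The enumeration argument can be sketched as a brute-force search. This is only an illustration for finite samples, with a hypothetical function name and my own encoding (integer states, start state `0`); it takes exponential time and is not meant as a practical algorithm. One shortcut it uses: a transition table is usable iff no state is reached by both a positive and a negative string, so the accepting set does not need to be enumerated separately.

```python
from itertools import product
from typing import Iterable


def smallest_separating_dfa(positives: Iterable[str],
                            negatives: Iterable[str],
                            alphabet: Iterable[str]):
    """Brute-force search for a smallest complete DFA consistent with the samples.

    Returns (n_states, delta, accepting), or None if the samples overlap.
    Assumes both sample sets are finite and only use symbols from `alphabet`.
    """
    positives, negatives = set(positives), set(negatives)
    if positives & negatives:
        return None  # some string would have to be accepted and rejected
    symbols = sorted(alphabet)
    n = 1
    while True:  # terminates: a DFA accepting exactly `positives` is consistent
        keys = [(q, s) for q in range(n) for s in symbols]
        for targets in product(range(n), repeat=len(keys)):
            delta = dict(zip(keys, targets))

            def run(word: str) -> int:
                state = 0
                for symbol in word:
                    state = delta[(state, symbol)]
                return state

            reached_by_pos = {run(w) for w in positives}
            reached_by_neg = {run(w) for w in negatives}
            if not (reached_by_pos & reached_by_neg):
                # States reached by positives accept; all others reject.
                return n, delta, reached_by_pos
        n += 1
```

For example, `smallest_separating_dfa(["a", "aaa"], ["", "aa"], "a")` finds a two-state DFA that flips state on every `a`.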

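Finding the minimum DFA consistent with a given set of strings is NP-complete; more precisely, this holds for the decision version, which asks whether a consistent DFA with at most $k$ states exists. This result appears as Theorem 1 in Angluin's paper "On the complexity of minimum inference of regular sets". So your problem, for finite $A$ and $B$, is NP-complete as well.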

For lots of good links and discussion on learning regular languages, check out the CSTheory blog post "On Learning Regular Languages".

alto

A DFA as you describe is called a separating DFA. There is some literature on this problem when $A$ and $B$ are regular languages, such as "Learning Minimal Separating DFA’s for Compositional Verification" by Yu-Fang Chen, Azadeh Farzan, Edmund M. Clarke, Yih-Kuen Tsay, and Bow-Yaw Wang.

Note that, as @reinierpost states, without any restrictions on $A$ and $B$ the problem may become undecidable.

Shaull