20

According to Immerman, the complexity class associated with SQL queries is exactly the class of safe queries in $\mathsf{Q(FO(COUNT))}$ (first-order queries plus counting operator): SQL captures safe queries. (In other words, all SQL queries have a complexity in $\mathsf{Q(FO(COUNT))}$, and all problems in $\mathsf{Q(FO(COUNT))}$ can be expressed as an SQL query.)

Based on this result, from a theoretical point of view, there are many interesting problems that can be solved efficiently but are not expressible in SQL. An extension of SQL that is still efficient therefore seems interesting. So here is my question:

Is there an extension of SQL (implemented and used in the industry) which captures $\mathsf{P}$ (i.e. can express all polynomial-time computable queries and no others)?

I want a database query language that satisfies all three conditions: it extends SQL, it is implemented and used in industry, and it captures $\mathsf{P}$. It is easy to define an extension of SQL that captures $\mathsf{P}$, but my question is whether such a language makes sense from a practical perspective, so I want a language that is actually used in practice. If there is no such language, then I would like to know whether there is a reason that makes such a language uninteresting from a practical viewpoint. For example, are the queries that arise in practice usually simple enough that there is no need for such a language?

Kaveh

3 Answers

5

As for your main question, I recommend this short survey by Martin Grohe.


Are the queries that are needed in practice usually simple enough that there is no need for a stronger language?

I'd say this holds most of the time, given the fair number of extensions already added to common query languages (transitive closure, arithmetic operators, counting, etc.). I say this as somebody who has done some work as a freelance developer of relatively simple web sites and other applications, so I'm not sure how SQL is really used in bigger industries or with larger databases.

In the rare cases where a more powerful language might be needed, my guess is that software developers handle them in the language in which they write the application (such as C++ or Java), not in the query language.
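As an illustration of that division of labour, here is a minimal sketch in Python using the standard `sqlite3` module. The table `edge(src, dst)` and the helper names `reachable_from` and `demo_connection` are assumptions made up for this example. The query itself is a plain, non-recursive `SELECT`, and the reachability computation (which such a query cannot express) is done in application code.

```python
import sqlite3
from collections import deque


def reachable_from(conn, start):
    """Return the set of nodes reachable from `start` (including `start`)."""
    # Plain, non-recursive SQL: only fetch the edge relation.
    edges = conn.execute("SELECT src, dst FROM edge").fetchall()

    # Build an adjacency list in the application language.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)

    # Breadth-first search does the part the query language cannot.
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def demo_connection():
    """Hypothetical in-memory database with a small edge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3), (4, 5)])
    return conn
```

The obvious cost of this approach is that the whole edge relation is shipped to the application, which may or may not be acceptable depending on the size of the data.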

Janoma
3

First, the expressive power of SQL is less clear-cut than it seems. The aggregate, grouping, and arithmetic features of SQL turn out to have quite subtle effects. A priori, it seems feasible that by some encoding of algebraic operators using these features, one could actually express reachability in SQL. It turns out this isn't actually the case for SQL-92, which is "local".

This means that an extension is required for SQL-92 to capture PTIME, and one that allows the resulting language to be "non-local".

However, if ordered structures are allowed and arithmetic is realistically limited, then proving that SQL-92 cannot express reachability would imply that uniform $\text{TC}^0 \subsetneq \text{NLOGSPACE}$, and is therefore likely to be quite difficult. (It could be argued that a natural linear ordering always exists on the data types in SQL-92, and that one could therefore assume that the underlying structures are ordered.)

Then the landscape changed again, since SQL:1999 (SQL3) included recursion. So SQL:1999 seems to be at least as expressive as fixed-point logic with counting (though I think the details might again be rather tricky, including the issue of order). Whether the new constructs made the language more expressive than is required to capture PTIME, I don't know, and some study would be required to establish this. In the meantime, further revisions were made in 2003, 2006, 2008 and 2011 (being ISO documents, only drafts are freely available). These revisions added a whole slew of new features, including allowing XQuery as "part" of SQL queries. My guess is that "SQL" is now more expressive than is required to capture PTIME, but that the encoding needed to show this might require large and rather unnatural queries that might not be supported in real systems.
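To make the effect of recursion concrete, here is a small sketch of reachability written as a recursive common table expression. It is run through Python's standard `sqlite3` module, so it uses SQLite's `WITH RECURSIVE` dialect rather than the text of the SQL:1999 standard. The table `edge(src, dst)` and the names `reach`, `REACHABILITY_QUERY` and `reachable_sql` are assumptions for the example.

```python
import sqlite3

# Recursive CTE: start from one node, then repeatedly follow edges.
# UNION (rather than UNION ALL) discards duplicates, so the recursion
# also terminates on cyclic graphs.
REACHABILITY_QUERY = """
WITH RECURSIVE reach(node) AS (
    SELECT ?
    UNION
    SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
)
SELECT node FROM reach ORDER BY node
"""


def reachable_sql(conn, start):
    """Return the sorted list of nodes reachable from `start`, computed in SQL."""
    return [row[0] for row in conn.execute(REACHABILITY_QUERY, (start,))]


def demo_connection():
    """Hypothetical in-memory database whose edge table contains a cycle."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany(
        "INSERT INTO edge VALUES (?, ?)", [(1, 2), (2, 3), (3, 1), (4, 5)]
    )
    return conn
```

This only shows that one particular non-local query becomes expressible. It says nothing about whether the language with recursion captures exactly PTIME.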

So I think there is evidence that there is no industrial extension of SQL that precisely captures PTIME, answering your question in a fuzzy way. In short, the industrial extensions are rather powerful and may already have overshot PTIME. If it is true that SQL:1999 is already powerful enough to capture at least PTIME, then it is also not clear what "SQL" really means in your question, since one would have to define "SQL" to mean a version predating SQL:1999.

Finally, Grohe's survey on the search for logics capturing PTIME (also mentioned by Janoma) indicates not only that capturing PTIME is tricky unless we have a linear order as part of the language, but that a proof that there could be no such logic would also imply $\text{P} \ne \text{NP}$.

András Salamon
-1

Your question does not make clear whether you want an extension that captures $P$ and only $P$, or one that captures $P$ and possibly things outside $P$. It looks like you are interested only in exactly the class $P$, since you want safe queries; otherwise, any Turing-complete extension of SQL would do.

We don't know whether $P = NP$. If $P = NP$, then an SQL extension that captures $P$ and just $P$ should be able to decide in polynomial time whether a Boolean formula is satisfiable, or to solve any other $NP$-complete problem in polynomial time.

But if $P \ne NP$, then your extended SQL language will not be able to decide whether a given Boolean formula is satisfiable.

So we have a question: does there exist an algorithm (or query, or whatever) written in such a language that can decide whether a Boolean formula is satisfiable? I can't answer this (and probably nobody here can, since answering it amounts to settling whether $P = NP$). This question makes me believe that it is very unlikely that such a language exists for real purposes.

Although it probably does not exist for real purposes, it surely exists and is constructible and implementable. You could define such a language with something capable of simulating a Turing machine for a number of steps given in unary, i.e., capable of solving a P-complete problem. However, if you construct such a thing, it is almost Turing-complete except for the "number of steps given in unary" restriction, which in an SQL-like language would be a very strange way to limit it to safe queries only. You could do this if the steps are records of some table, but this still does not look valuable for practical purposes.
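A rough sketch of the "steps are records of some table" idea, in Python with the standard `sqlite3` module: the number of rows in a table acts as the unary step budget for an otherwise arbitrary step function. The table name `steps`, the helpers `run_bounded` and `demo_connection`, and the toy `countdown` machine are all assumptions for illustration, not part of any real SQL extension.

```python
import sqlite3


def run_bounded(conn, step, state):
    """Apply `step` repeatedly, at most once per row of the `steps` table.

    `step` returns the next state, or None once the machine has halted.
    Returns (final_state, halted_within_budget).
    """
    # The step budget is given "in unary": one table row per allowed step.
    budget = conn.execute("SELECT COUNT(*) FROM steps").fetchone()[0]
    for _ in range(budget):
        nxt = step(state)
        if nxt is None:
            return state, True
        state = nxt
    return state, False


def countdown(n):
    """Toy machine: decrement until zero, then halt."""
    return None if n == 0 else n - 1


def demo_connection(num_steps):
    """Hypothetical in-memory database holding `num_steps` step records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE steps (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO steps VALUES (?)", [(i,) for i in range(num_steps)])
    return conn
```

Because the budget is tied to the size of the input relation, the total work stays polynomial in the input as long as each step is polynomial, which is the point of the restriction.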

Victor Stafusa