Given a sequence of amino acids, the protein folding problem is to find a geometric structure of the amino acids that minimizes energy. Since this problem is NP-hard, it seems that one should be able to do the following:
- Convert an instance of an arbitrary problem in NP to 3-SAT.
- Use the reduction from 3-SAT to protein folding to generate an instance of the protein folding problem.
- Synthesize the amino acid sequence in a lab and allow the protein structure to fold.
- Observe the structure of the resultant protein.
- Finally, use the observed structure to find a solution to the original problem utilizing the aforementioned reduction.
This is almost certainly not possible at the moment, as it would have some very serious implications. Here are my theories as to why:
- The proof that the protein folding problem is NP-hard relies on the Hydrophobic-polar protein folding model, a simplified model of protein structure. Perhaps the simplifications that allow us to study the problem abstractly prevent us from applying this result to the physical problem of protein folding.
- While the problem of protein structure prediction is NP-hard in the worst case, perhaps proteins that are found in nature are a special case on which the protein folding problem is in P. So, if you were to generate an amino acid sequence corresponding to an instance of a supposedly hard problem, then the protein would fail to reliably fold to a minimal-energy state. My understanding is that proteins occasionally misfold to local minima in nature; maybe some proteins are more prone to this behavior than others.
- Synthesizing a protein in a lab given its amino acid sequence is not feasible.
- Observing the physical structure of a protein at the required resolution is not possible with today’s technology.
Theory 1 does not seem entirely convincing, since the effects considered in the hydrophobic-polar protein folding model should still be present in real life. Similarly, theories 3 and 4 are likely incomplete explanations. If they are true, they are simply technological limitations as opposed to fundamental properties of the problem. So once technology progresses to the point where we can synthesize and observe proteins, this just raises more questions, such as “why can physical/chemical systems compute things that computers cannot?”
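To make the hydrophobic-polar model mentioned above concrete, here is a minimal sketch of it on a 2D square lattice, assuming the common convention that each pair of H residues that sit on adjacent lattice sites without being neighbours in the sequence contributes -1 to the energy. The function names `hp_energy` and `best_fold` are my own, and the brute-force search is only for illustration, not the construction used in the NP-hardness proof.

```python
from typing import List, Optional, Tuple

Coord = Tuple[int, int]
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # unit steps on the square lattice


def hp_energy(seq: str, coords: List[Coord]) -> int:
    """Energy of one fold: -1 per H-H lattice contact between non-consecutive residues."""
    index = {pos: i for i, pos in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if seq[i] != "H":
            continue
        # Look only right and up so that each lattice edge is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and seq[j] == "H" and abs(i - j) > 1:
                energy -= 1
    return energy


def best_fold(seq: str) -> Tuple[int, List[Coord]]:
    """Enumerate every self-avoiding walk of len(seq) sites and keep a lowest-energy one."""
    if not seq or set(seq) - {"H", "P"}:
        raise ValueError("sequence must be a non-empty string of 'H' and 'P'")

    best_energy = 0
    best_coords: Optional[List[Coord]] = None

    def extend(coords: List[Coord], occupied: set) -> None:
        nonlocal best_energy, best_coords
        if len(coords) == len(seq):
            e = hp_energy(seq, coords)
            if best_coords is None or e < best_energy:
                best_energy, best_coords = e, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # self-avoidance: one residue per site
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    assert best_coords is not None
    return best_energy, best_coords


if __name__ == "__main__":
    energy, coords = best_fold("HPPH")
    print(energy, coords)  # energy is -1: the two H residues end up adjacent
```

The search visits a number of walks that grows exponentially with the sequence length, so it is only usable for very short chains. That is the gap the question is about: a computer running this needs exponential time in the worst case, while the physical protein seemingly just folds.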