I randomly generate millions of triplets $\lbrace x_0, x_1, x_2 \rbrace$ within the range $(0,1)$, then calculate the corresponding coefficients of the polynomial $(x-x_0)(x-x_1)(x-x_2)$, which gives coefficient triplets normalized to the form $\left\lbrace \frac{x_0+x_1+x_2}{3},\ \sqrt{\frac{x_0 x_1 + x_1 x_2 + x_0 x_2}{3}},\ \sqrt[3]{x_0 x_1 x_2} \right\rbrace$, so that each component also lies in $(0,1)$.
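For concreteness, here is a minimal NumPy sketch of this generation step (variable names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Draw n triplets {x0, x1, x2} uniformly from (0, 1).
x = rng.uniform(0.0, 1.0, size=(n, 3))

# Elementary symmetric polynomials of the roots, i.e. the
# coefficients of (x - x0)(x - x1)(x - x2) up to sign.
e1 = x.sum(axis=1)                                     # x0 + x1 + x2
e2 = x[:, 0]*x[:, 1] + x[:, 1]*x[:, 2] + x[:, 0]*x[:, 2]
e3 = x.prod(axis=1)                                    # x0 * x1 * x2

# Normalize each coefficient back into (0, 1) as described above.
coeffs = np.stack([e1 / 3.0, np.sqrt(e2 / 3.0), np.cbrt(e3)], axis=1)
```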
After that, I feed the coefficient triplets into a 5-layer neural network $\lbrace 3,4,5,4,3 \rbrace$, in which every activation function is a sigmoid and the learning rate is set to 0.1.
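This is roughly the setup in PyTorch (the question doesn't name a framework; the SGD optimizer and MSE loss are my assumptions):

```python
import torch
import torch.nn as nn

# The {3, 4, 5, 4, 3} topology with sigmoid activations throughout.
model = nn.Sequential(
    nn.Linear(3, 4), nn.Sigmoid(),
    nn.Linear(4, 5), nn.Sigmoid(),
    nn.Linear(5, 4), nn.Sigmoid(),
    nn.Linear(4, 3), nn.Sigmoid(),
)

# Plain SGD with the stated learning rate; MSE loss is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

def train_step(coeffs: torch.Tensor, roots: torch.Tensor) -> float:
    """One gradient step mapping coefficient triplets to root triplets."""
    optimizer.zero_grad()
    loss = loss_fn(model(coeffs), roots)
    loss.backward()
    optimizer.step()
    return loss.item()
```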
- However, I only get very poor cross-validation accuracy, around 20%.
How can I fix this?
Background
My original problem is a dynamic inverse problem. In that problem, I have hundreds of thousands of observations $O$, and from these observations I need to recover several hundred parameters $P$. The forward simulation from $P$ to $O$ is easy and cheap to compute, but the inversion from $O$ to $P$ is highly nonlinear and practically intractable. My idea is to train a neural network that takes $O$ as input and produces $P$ as output. To check the feasibility of this idea, I use a third-order (cubic) polynomial as a test case.
Update, half a year later
With more nodes per layer, I have successfully trained a neural network. The topology is set to $\lbrace 3, 64, 64, 64 \rbrace$. The most important trick is to sort each generated triplet $\lbrace x_0, x_1, x_2 \rbrace$ so that $x_0 \le x_1 \le x_2$ always holds, as in the sketch below.
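Sorting matters because the coefficients are symmetric functions of the roots: all $3! = 6$ orderings of a root triplet map to the same coefficient triplet, so without sorting the network is asked to fit a one-to-many relation. A sketch, where the size-3 output layer and the hidden activations are my assumptions (the question gives only the layer sizes):

```python
import numpy as np
import torch.nn as nn

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(1_000_000, 3))

# Sort each triplet so x0 <= x1 <= x2, making the map from
# coefficients back to roots single-valued.
roots = np.sort(x, axis=1)

# The wider {3, 64, 64, 64} topology; the final size-3 output layer
# (one node per root) is an assumption.
model = nn.Sequential(
    nn.Linear(3, 64), nn.Sigmoid(),
    nn.Linear(64, 64), nn.Sigmoid(),
    nn.Linear(64, 64), nn.Sigmoid(),
    nn.Linear(64, 3),
)
```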