
If I have a learning problem with an inherent symmetry, is there a way to impose that symmetry as a constraint to enhance learning?

For example, if I am doing image recognition, I might want 2D rotational symmetry, meaning that a rotated version of an image should get the same result as the original.

Or if I am learning to play tic-tac-toe, then rotating the board by 90° should yield the same game play.

Has any research been done on this?

2 Answers


Following Emre's comment above, Section 4.4 of Group theoretical methods in machine learning by Risi Kondor gives detailed information and proofs about creating kernel methods that inherently have symmetries. I will summarize it in a hopefully intuitive way (I am a physicist, not a mathematician!).

Most ML algorithms involve a matrix multiplication of the form \begin{align} s_i &= \sum_j W_{ij}~x_j \\ &= \sum_j W_{ij}~(\vec{e}_j \cdot \vec{x}) \end{align} with $ \vec{x} $ being the input and $ W_{ij} $ being the weights we wish to train.

Kernel Method

Enter the realm of kernel methods and let the algorithm handle input via, \begin{align} s_i &= \sum_j W_{ij}~k(e_j,~x) \end{align} where now we generalize to $ x, e_j \in \mathcal{X} $.
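As a purely illustrative numpy sketch (the function names are my own, not from the paper), the kernelized layer below reduces to the ordinary matrix multiplication from the previous paragraph when $ k $ is the plain dot product:

```python
import numpy as np

def linear_kernel(a, b):
    """Plain dot-product kernel, k(a, b) = a . b."""
    return float(np.dot(a, b))

def kernelized_layer(W, x, k=linear_kernel):
    """Compute s_i = sum_j W_ij * k(e_j, x) for an arbitrary kernel k."""
    d = W.shape[1]
    basis = np.eye(d)                               # the basis vectors e_j
    feats = np.array([k(e_j, x) for e_j in basis])  # k(e_j, x) for each j
    return W @ feats

# With the dot-product kernel this reproduces the ordinary linear layer W @ x.
W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([0.5, -1.0])
print(kernelized_layer(W, x))   # [-1.5, -2.5]
print(W @ x)                    # same result
```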

Consider a group $ G $ that acts on $ \mathcal{X} $ via $ x \rightarrow T_g(x) $ for $ g \in G $. A simple way to make our algorithm invariant under this group is to build the symmetrized kernel, \begin{align} k^G(x, y) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_g(y)) \end{align} provided the base kernel satisfies $ k(x, y) = k(T_g(x), T_g(y)) $ for all $ g \in G $.

So, \begin{align} k^G(x, T_h(y)) &= \frac{1}{|G|} \sum_{g \in G} k(x, T_{gh}(y)) \\ &= \frac{1}{|G|} \sum_{g \in G} k(x, T_{g}(y)) \\ &= \frac{1}{|G|} \sum_{g \in G} k(T_{g}(x), y) \end{align} where the second equality holds because $ gh $ runs over all of $ G $ as $ g $ does, and the third uses the invariance of the base kernel, $ k(x, T_g(y)) = k(T_{g^{-1}}(x), y) $, together with a relabeling of the sum.
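A minimal numerical sketch of this construction, assuming the group of $ \pi/2 $ rotations and a squared-distance base kernel (the names are illustrative, not from the paper):

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix T_g for a rotation by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# The group here is the four rotations by multiples of pi/2 (C4).
GROUP = [rotation(n * np.pi / 2) for n in range(1, 5)]

def base_kernel(x, y):
    """Any kernel with k(x, y) = k(T_g x, T_g y); squared distance works."""
    return float(np.sum((x - y) ** 2))

def symmetrized_kernel(x, y, k=base_kernel, group=GROUP):
    """k^G(x, y) = (1/|G|) * sum_g k(x, T_g y)."""
    return np.mean([k(x, T @ y) for T in group])

# Invariance check: k^G(x, T_h y) equals k^G(x, y) for any h in G.
x = np.array([0.3, 1.2])
y = np.array([-0.7, 0.4])
T_h = GROUP[0]
print(symmetrized_kernel(x, y), symmetrized_kernel(x, T_h @ y))  # same value (up to float round-off)
```

The two printed values agree because applying $ T_h $ to $ y $ merely permutes the terms in the group average, exactly as in the derivation above.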

For the linear kernel $ k(x, y) = x \cdot y $, which satisfies this invariance for any unitary (or orthogonal) representation $ T_g $, we get

\begin{align} k^G(x, T_h(y)) &= \left[ \frac{1}{|G|} \sum_{g \in G} T_{g}(x) \right] \cdot y \end{align}

This gives a single averaged transformation matrix, $ \frac{1}{|G|} \sum_{g \in G} T_g $, that symmetrizes the input to the algorithm.
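As a small worked instance (my own choice of group, not from the paper), take the two-element reflection group $ G = \{ I, S \} $ with $ S = \mathrm{diag}(1, -1) $ acting on $ \mathbb{R}^2 $, \begin{align} \bar{T} &= \frac{1}{|G|} \sum_{g \in G} T_g = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \frac{1}{2} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \\ k^G(x, y) &= (\bar{T} x) \cdot y = x_1 y_1 \end{align} so $ \bar{T} $ projects onto the invariant subspace, and the symmetrized kernel only sees the reflection-invariant part of the input. Beware that for the four $ \pi/2 $ rotations used below, $ \frac{1}{4} \sum_{n=1}^4 R(n\pi/2) = 0 $, so the averaged dot-product kernel vanishes identically; that is one reason the worked example below uses the squared distance instead.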

SO(2) Example

For simplicity, we actually use just the finite subgroup of rotations by multiples of $ \frac{\pi}{2} $.

Let us run linear regression on data $ (\vec{x}_i, y_i) \in \mathbb{R}^2 \times \mathbb{R} $ where we expect a rotational symmetry.

Our optimization problem becomes, \begin{align} \min_{W_{j},~b} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\ \tilde{y}_i &= \sum_j W_{j}~k^G(e_j, x_i) + b \end{align}

The kernel $ k(x, y) = \| x - y \|^2 $ satisfies $ k(x, y) = k(T_g(x), T_g(y)) $, since rotations preserve distances. You could also use $ k(x, y) = x \cdot y $ or a variety of other invariant kernels.

Thus, \begin{align} k^G(e_j, x_i) &= \frac{1}{4} \sum_{n=1}^4 \| R(n\pi/2)~\vec{e}_j - \vec{x}_i \|^2 \\ &= \frac{1}{4} \sum_{n=1}^4 \left[ ( \cos(n\pi/2) - x_{i1} )^2 + ( \sin(n\pi/2) - x_{i2} )^2 \right] \\ &= \frac{1}{4} \left[ 2 x_{i1}^2 + 2 x_{i2}^2 + (1 - x_{i1} )^2 + (1 - x_{i2} )^2 + (1 + x_{i1} )^2 + (1 + x_{i2} )^2 \right] \\ &= x_{i1}^2 + x_{i2}^2 + 1 \end{align}
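A quick numerical check of this closed form (a throwaway sketch, not part of the original derivation):

```python
import numpy as np

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def k_G(e, x):
    """Average of ||R(n*pi/2) e - x||^2 over n = 1..4."""
    return np.mean([np.sum((rotation(n * np.pi / 2) @ e - x) ** 2)
                    for n in range(1, 5)])

e1 = np.array([1.0, 0.0])
x = np.array([0.8, -1.3])
print(k_G(e1, x))                    # numerical group average
print(x[0]**2 + x[1]**2 + 1.0)       # closed form x1^2 + x2^2 + 1, same value
```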

Note that we needn't sum over $ j $, because the symmetrized kernel is the same for both basis vectors (the two weights just add into a single effective weight $ W $). So our problem becomes, \begin{align} \min_{W,~b} &\sum_i \frac{1}{2} (y_i - \tilde{y}_i)^2 \\ \tilde{y}_i &= W \left[ x_{i1}^2 + x_{i2}^2 + 1 \right] + b \end{align}

The model now depends on $ \vec{x}_i $ only through its radius, which is exactly the expected rotational symmetry!
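To make the regression step concrete, here is a minimal sketch that fits $ W $ and $ b $ by ordinary least squares on synthetic, rotationally symmetric data (the data-generating choice $ y = 3 r^2 + 0.5 + \text{noise} $ is just an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic rotationally symmetric target: y = 3*r^2 + 0.5 + noise.
X = rng.normal(size=(200, 2))
y = 3.0 * np.sum(X**2, axis=1) + 0.5 + 0.1 * rng.normal(size=200)

# Single symmetrized feature k^G(e_j, x) = x1^2 + x2^2 + 1, plus an intercept column.
phi = np.sum(X**2, axis=1) + 1.0
A = np.column_stack([phi, np.ones_like(phi)])

# Ordinary least squares for W and b.
(W, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(W, b)   # W ~ 3.0, b ~ 0.5 - 3.0 (the feature already contains the +1)
```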

Tic-Tac-Toe

Example code can be seen here. It shows how to create a matrix that encodes the symmetry and use it. Note that the results are quite poor when I actually run it! I am working with other kernels at the moment.
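Since the link is not reproduced above, here is a minimal sketch of the same idea for tic-tac-toe (my own illustration, not the linked code), assuming boards encoded as $3 \times 3$ arrays with $+1$ for X, $-1$ for O, $0$ for empty, and averaging a dot-product kernel over the eight dihedral symmetries of the board:

```python
import numpy as np

def board_symmetries(board):
    """The 8 dihedral symmetries of a 3x3 board: 4 rotations, each optionally flipped."""
    syms = []
    for k in range(4):
        r = np.rot90(board, k)
        syms.append(r)
        syms.append(np.fliplr(r))
    return syms

def base_kernel(a, b):
    """Plain dot product of boards encoded as +1 (X), -1 (O), 0 (empty)."""
    return float(np.sum(a * b))

def symmetrized_kernel(a, b):
    """k^G(a, b) = average of k(a, T_g b) over the 8 board symmetries."""
    return np.mean([base_kernel(a, tb) for tb in board_symmetries(b)])

# Two boards that differ only by a 90-degree rotation get identical kernel values
# against any reference board.
ref = np.array([[1, 0, 0], [0, -1, 0], [0, 0, 1]])
b1  = np.array([[1, 0, 0], [0,  1, 0], [0, 0, -1]])
b2  = np.rot90(b1)
print(symmetrized_kernel(ref, b1), symmetrized_kernel(ref, b2))  # equal
```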


It turns out this is just the study of Invariant Theory applied to Machine Learning.