I have a matrix $\mathbf{A} \in \mathbb{R}^{2000 \times 2000}$, stored in memory as a $2000 \times 2000$ float32 array, and I also have $10$ matrices $\mathbf{E}^i \in \mathbb{R}^{2000 \times 2000}$, $i = 0, \dots, 9$.
I know that there is a separation $\mathbf{A} = \mathbf{F} + \mathbf{E}$, where $\mathbf{F}$ represents the foreground signal and $\mathbf{E}$ represents the background signal. I further know that $$\mathbf{E} = \sum_{i=0}^{9} \beta_i \mathbf{E}^i$$ is a good approximation of the background signal, and I would like to recover the coefficients $\beta_i$.
Unfortunately, I don't know much about the foreground $\mathbf{F}$, except that $\langle \mathbf{F}, \mathbf{E}^i \rangle \neq 0$. As a result, both $\beta_i = \langle \mathbf{E}^i, \mathbf{A} \rangle$ and the least-squares solution lead to very suboptimal results. I assume that $\mathbf{F}$ is zero or close to zero for at least 20% of the elements and that the texture of $\mathbf{E}$ is somehow inpainted into $\mathbf{A}$. To fit this texture, I try to estimate $\beta_i$ by minimizing
$$ \left\lVert \mathbf{A} - \sum_{i=0}^{9} \beta_i \mathbf{E}^i \right\rVert_{\mathrm{TV}},$$
hoping that this will remove the dynamic features of the texture and yield an uncorrupted foreground signal. I have tried to attack the problem using https://www.cvxpy.org/ with the ECOS solver, but it uses only a single processor and is very slow. I can define the problem symbolically, but I don't know what is happening under the hood. Do you know of fast algorithms for this problem and, e.g., fast implementations with Python wrappers?
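The objective above does not fix which discrete TV is meant. Assuming the anisotropic one, the sum of absolute horizontal and vertical forward differences, the difference operator $D$ is linear. Therefore
$$\left\lVert \mathbf{A} - \sum_{i=0}^{9} \beta_i \mathbf{E}^i \right\rVert_{\mathrm{TV}} = \left\lVert D\mathbf{A} - \sum_{i=0}^{9} \beta_i D\mathbf{E}^i \right\rVert_1 .$$
The problem is then an $\ell_1$ (least absolute deviations) regression with only 10 unknowns and $2 \cdot 2000 \cdot 1999 \approx 8 \times 10^6$ residuals, and the differences $D\mathbf{E}^i$ need to be computed only once. Below is a minimal NumPy sketch that evaluates the objective this way; the helper names are hypothetical.

```python
import numpy as np

def forward_diffs(X):
    """Horizontal and vertical forward differences over the last two axes, flattened.

    A 2-D array gives a 1-D vector; a (k, m, n) stack gives a (k, d) array,
    one row per slice, with d = m*(n-1) + (m-1)*n.
    """
    lead = X.shape[:-2]
    dx = np.diff(X, axis=-1).reshape(*lead, -1)  # horizontal differences
    dy = np.diff(X, axis=-2).reshape(*lead, -1)  # vertical differences
    return np.concatenate([dx, dy], axis=-1)

def tv_objective(beta, dA, dE):
    """Anisotropic TV of A - sum_i beta_i E^i, from precomputed differences.

    dA = forward_diffs(A), shape (d,); dE = forward_diffs(E_stack), shape (k, d).
    """
    # D is linear, so D(A - sum_i beta_i E^i) = dA - beta @ dE (an l1 regression residual)
    return np.abs(dA - beta @ dE).sum()

def tv_direct(X):
    """Anisotropic TV of a single 2-D array, computed directly."""
    return np.abs(np.diff(X, axis=1)).sum() + np.abs(np.diff(X, axis=0)).sum()

# Toy check: the precomputed route and the direct route agree
rng = np.random.default_rng(0)
E_stack = rng.standard_normal((10, 30, 30))
A = rng.standard_normal((30, 30))
beta = rng.standard_normal(10)
dA, dE = forward_diffs(A), forward_diffs(E_stack)
residual = A - np.tensordot(beta, E_stack, axes=1)
print(np.isclose(tv_objective(beta, dA, dE), tv_direct(residual)))
```

At full size, `dE` in float32 holds $10 \times 7{,}996{,}000$ values, about 320 MB. The sketch only evaluates the objective for a given $\beta$. Minimizing it over $\beta$ is the part this question is about.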