Answer to question 1.
- Is the above argument correct? Especially the last equality in \eqref{1}, which uses the fact that the product of a weakly and a strongly convergent sequence is weakly convergent.
Yes, but I'd choose the following route to prove the result, as it is more detailed, even if more complex. Let's proceed step by step.
The first thing to note is that $\{(f\circ u^\varepsilon)^\prime\}_{\varepsilon\in ]0, 1] } = \{(f^\prime\circ u^\varepsilon){u^\varepsilon}^\prime\}_{\varepsilon\in ]0, 1] }$ is a sequence of Radon measures. Indeed
$$
C_c(\Bbb R)\ni\varphi\mapsto \int\limits_{\Bbb R} \varphi ( f^\prime\circ u^\varepsilon ) {u^\varepsilon}^\prime \in{\Bbb R}
$$
is a continuous linear functional for every $\varepsilon\in\, ]0, 1]$, since
- $f^\prime\circ u^\varepsilon$ is a continuous function by construction, so $\varphi\mapsto\varphi\cdot( f^\prime\circ u^\varepsilon)$ is a continuous map from $C_c(\Bbb R)$ to itself, and
- ${u^\varepsilon}^\prime\in \mathcal{M}(\Bbb R)$, again by construction, so the conclusion follows from the Riesz theorem (see for example [1], §1.4, p. 26, theorem 1.54 and remark 1.57).
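Explicitly, assuming $f\in C^1(\Bbb R)$ and $\operatorname{supp}\varphi\subset K$ with $K$ compact, the continuity of this linear functional can be read off the elementary estimate below; both factors on the right-hand side are finite because $f^\prime\circ u^\varepsilon$ and ${u^\varepsilon}^\prime$ are continuous on $K$:
$$
\left|\int\limits_{\Bbb R} \varphi\,( f^\prime\circ u^\varepsilon )\, {u^\varepsilon}^\prime \,\mathrm{d}x\right| \le \lVert\varphi\rVert_{\infty}\,\sup_{x\in K}\left|f^\prime\big(u^\varepsilon(x)\big)\right|\int\limits_K \left|{u^\varepsilon}^\prime(x)\right|\mathrm{d}x .
$$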
Edit. At this point we still do not know whether the sequence converges. We only know that we have a sequence of Radon measures obtained from two weakly* convergent sequences of Radon measures: we have not used this latter fact and, as we'll see, we will not use it in the following step.
Second edit. I am not sure that the argument used for this part of the answer is entirely correct. In particular, looking at the bound \eqref{bd}, I doubt it is effective. I am not deleting the entire answer only because the second part is correct and also covers the case $f\in C^1$.
Now recall the definition of local weak* convergence and a lemma on the convergence of sequences of Radon measures.
- (see [1], §1.4, p. 27, definition 1.58) Let $\mu$ and the sequence $\{\mu_h\}_{h}$ be $\Bbb R^m$-valued ($m\ge 1$) Radon measures on the locally compact separable metric space $X$; we say that $\{\mu_h\}$ locally weakly* converges to $\mu$ if
$$\DeclareMathOperator{\Dd}{d\!}
\lim_{h\to\infty} \int\limits_X \varphi \Dd \mu_h =\int\limits_X \varphi \Dd \mu
$$
for every $\varphi \in C_c(X)$.
- A corollary of the classical de la Vallée Poussin compactness criterion for finite Radon measures (see [1], §1.4, p. 28, corollary 1.60).
If a sequence $\{\mu_h\}_{h}$ of Radon measures on a locally compact separable metric space $X$ is such that
$$
\sup\{|\mu_h|(K) \mid h\in\Bbb N\}<+\infty
$$
for every compact set $K\subset X$, where $\lvert\mu_h\rvert$ is the total variation of the measure $\mu_h$, then it has a locally weakly* converging subsequence.
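To see both notions at work on the simplest example, here is a small numerical sketch in Python with NumPy. The bump mollifier, the grid and the test function $\varphi$ are hypothetical choices made only for illustration. The rescaled mollifiers $\nu_\varepsilon(x)=\varepsilon^{-1}\nu(x/\varepsilon)$ all have total mass $1$, so the uniform bound of the corollary holds, and $\int\varphi\,\nu_\varepsilon\,\mathrm{d}x\to\varphi(0)$, i.e. $\nu_\varepsilon$ locally weakly* converges to the Dirac mass $\delta_0$.

```python
import numpy as np

# Assumptions (hypothetical, for illustration only): a standard bump mollifier
# normalised to unit mass, and one test function phi with phi(0) = 1.
x = np.linspace(-3.0, 3.0, 600_001)
dx = x[1] - x[0]


def bump(t):
    """Unnormalised bump exp(-1/(1 - t^2)) on (-1, 1), zero elsewhere."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out


Z = bump(x).sum() * dx  # normalising constant, so that nu = bump / Z has unit mass


def nu_eps(t, eps):
    """Rescaled mollifier nu_eps(t) = nu(t / eps) / eps."""
    return bump(t / eps) / (eps * Z)


# Continuous, compactly supported test function (support [-2, 2]), phi(0) = 1.
phi = np.exp(x) * np.maximum(0.0, 1.0 - x**2 / 4.0)

eps_values = [0.5, 0.2, 0.1, 0.05, 0.01]
masses, errors = [], []
for eps in eps_values:
    w = nu_eps(x, eps)
    masses.append(w.sum() * dx)                     # |nu_eps|(R), stays equal to 1
    errors.append(abs((phi * w).sum() * dx - 1.0))  # |int phi d(nu_eps) - phi(0)|

for eps, m, e in zip(eps_values, masses, errors):
    print(f"eps={eps:<5} mass={m:.6f} error={e:.2e}")
```

The error shrinks roughly like $\varepsilon^2$ here because $\nu$ is even and $\varphi$ is smooth near $0$; for a merely continuous $\varphi$ only convergence, without a rate, is guaranteed.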
But then we are done, as the family $\{(f\circ u^\varepsilon)^\prime\}_{\varepsilon\in ]0, 1] } = \{(f^\prime\circ u^\varepsilon){u^\varepsilon}^\prime\}_{\varepsilon\in ]0, 1] }$ satisfies exactly these requirements. Indeed, choosing $\varepsilon =\frac{1}{n}$, $n\in\Bbb N$ and using Hölder's inequality we have
$$
\begin{split}
\lvert u^\varepsilon(x)\rvert & = \left \lvert\int\limits_{\Bbb R} u(x-y)\nu_{1\over n}(y)\Dd y\right\rvert \\
& = \left \lvert\int\limits_{\Bbb R} u(x-y)n \nu(ny)\Dd y\right\rvert\\
& = \left \lvert\int\limits_{\Bbb R} u\left(x-\frac{z}{n}\right)\nu(z)\Dd z\right\rvert\\
& \le \int\limits_{\Bbb R} \left\lvert u\left(x-\frac{z}{n}\right)\nu(z)\right\rvert\Dd z \\
& \le \lVert u\rVert_{L^\infty}\lVert \nu\rVert_{L^1} < + \infty
\end{split}
$$
i.e. the mollified functions $u^\varepsilon$ are bounded and their upper bound does not depend on $\varepsilon$. Here Hölder's inequality has to be applied with $u\in L^\infty$, which holds since $u\in BV(\Bbb R)\subset L^\infty(\Bbb R)$: pairing $\lVert u\rVert_{L^1}$ with $\lVert \nu\rVert_\infty$ would instead give the $\varepsilon$-dependent bound $n\lVert u\rVert_{L^1}\lVert \nu\rVert_\infty$, because $\int_{\Bbb R}\lvert u(x-z/n)\rvert\Dd z=n\lVert u\rVert_{L^1}$. Thus, integrating by parts,
$$
\begin{split}
\sup_{\substack{\varphi\in C^1_c (K)\\ \|\varphi\|_{L^{\infty}} \leq 1}} \int\limits_K \varphi (f^\prime\circ u^\varepsilon){u^\varepsilon}^\prime \Dd x &= \sup_{\substack{\varphi\in C^1_c (K)\\ \|\varphi\|_{L^{\infty}} \leq 1}} \int\limits_K \varphi (f\circ u^\varepsilon)^\prime\Dd x\\
&= \sup_{\substack{\varphi\in C^1_c (K)\\ \|\varphi\|_{L^{\infty}} \leq 1}}\left(-\int\limits_K \varphi^\prime (f\circ u^\varepsilon)\Dd x\right) \\
&\le M_f \sup_{\substack{\varphi\in C^1_c (K)\\ \|\varphi\|_{L^{\infty}} \leq 1}} \int\limits_K \lvert\varphi^\prime\rvert \Dd x
\end{split}\label{bd}\tag{BD}
$$
where $M_f=\sup_{|t|\le \lVert u\rVert_{L^\infty}\lVert \nu\rVert_{L^1}}\lvert f(t)\rvert$. Note that the last supremum in \eqref{bd} equals $+\infty$, since $\varphi$ may oscillate arbitrarily fast while $\|\varphi\|_{L^\infty}\le 1$.
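For a concrete $u\in BV(\Bbb R)$ one can check numerically that the relevant quantities stay bounded as $\varepsilon\to0$. The sketch below assumes, purely for illustration, $u=\mathbf 1_{[0,1]}$, $f(t)=t^2$ and a bump mollifier normalised on the grid; it prints $\sup\lvert u^\varepsilon\rvert$ and the total variation $\lvert(f\circ u^\varepsilon)^\prime\rvert(\Bbb R)$, computed as the sum of the absolute increments of $f\circ u^\varepsilon$ on the grid, which is the quantity the compactness criterion recalled above asks to bound on compact sets.

```python
import numpy as np

# Hypothetical example: u = indicator of [0, 1] (a BV function), f(t) = t**2,
# and a standard bump mollifier normalised to unit mass on the grid.
x = np.linspace(-1.0, 2.0, 30_001)
dx = x[1] - x[0]                                # grid step, 1e-4
u = ((x >= 0.0) & (x <= 1.0)).astype(float)     # sup |u| = 1


def f(t):
    return t**2


def bump(t):
    """Unnormalised bump exp(-1/(1 - t^2)) on (-1, 1), zero elsewhere."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out


def mollify(values, eps):
    """Discrete convolution: u^eps(x_i) = sum_j u(x_i - y_j) nu_eps(y_j) dx."""
    m = int(round(eps / dx))
    y = np.arange(-m, m + 1) * dx               # symmetric, odd-length kernel grid
    w = bump(y / eps)
    w /= w.sum() * dx                           # discrete unit mass
    return np.convolve(values, w, mode="same") * dx


sups, tvs = [], []
for eps in (0.2, 0.1, 0.05, 0.01):
    u_eps = mollify(u, eps)
    sups.append(np.abs(u_eps).max())             # sup_x |u^eps(x)|
    tvs.append(np.abs(np.diff(f(u_eps))).sum())  # total variation of f o u^eps on the grid
    print(f"eps={eps:<5} sup|u^eps|={sups[-1]:.6f}  TV(f o u^eps)={tvs[-1]:.6f}")
```

For this $u$ both quantities are independent of $\varepsilon$ ($\sup\lvert u^\varepsilon\rvert=1$ and total variation $2$), because $u^\varepsilon$ rises monotonically from $0$ to $1$ near $x=0$ and falls back to $0$ near $x=1$.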
The formula
$$
\lim_{\varepsilon\to 0}(f\circ u^\varepsilon)^\prime = \lim_{\varepsilon\to 0} (f^\prime\circ u^\varepsilon){u^\varepsilon}^\prime\triangleq (f\circ u)^\prime \triangleq (f^\prime\circ u)u^\prime
$$
holds true and, as stated above, we do not need the fact that the sequence $\{(f^\prime\circ u^\varepsilon){u^\varepsilon}^\prime\}_{\varepsilon\in ]0, 1] }$ is the product of two locally weak* convergent sequences.
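A numerical sanity check of this formula on a simple example, again assuming (hypothetically) $u=\mathbf 1_{[0,1]}$ and $f(t)=t^2$: then $f\circ u=u$, so $(f\circ u)^\prime=\delta_0-\delta_1$ and $\int\varphi\,\mathrm{d}(f\circ u)^\prime=\varphi(0)-\varphi(1)$. The sketch computes $\int\varphi\,(f^\prime\circ u^\varepsilon){u^\varepsilon}^\prime\,\mathrm{d}x$, with ${u^\varepsilon}^\prime$ approximated by finite differences, and compares it with that value. The test function $\varphi$ is taken quadratic on the region where the integrand lives; it can be cut off outside that region to obtain a $C_c$ function without changing the integrals.

```python
import numpy as np

# Hypothetical example: u = indicator of [0, 1], f(t) = t**2, a bump mollifier,
# and phi(t) = t + t**2 / 2 on the grid region (only its values near [0, 1] matter).
x = np.linspace(-1.0, 2.0, 30_001)
dx = x[1] - x[0]
u = ((x >= 0.0) & (x <= 1.0)).astype(float)


def f_prime(t):
    return 2.0 * t


def phi(t):
    return t + 0.5 * t**2


def bump(t):
    """Unnormalised bump exp(-1/(1 - t^2)) on (-1, 1), zero elsewhere."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out


def mollify(values, eps):
    """Discrete convolution: u^eps(x_i) = sum_j u(x_i - y_j) nu_eps(y_j) dx."""
    m = int(round(eps / dx))
    y = np.arange(-m, m + 1) * dx
    w = bump(y / eps)
    w /= w.sum() * dx
    return np.convolve(values, w, mode="same") * dx


target = phi(0.0) - phi(1.0)  # int phi d(f o u)' = phi(0) - phi(1)
errors = []
for eps in (0.2, 0.1, 0.05, 0.01):
    u_eps = mollify(u, eps)
    du_eps = np.gradient(u_eps, dx)                          # (u^eps)' by central differences
    integral = np.sum(phi(x) * f_prime(u_eps) * du_eps) * dx  # int phi (f' o u^eps)(u^eps)'
    errors.append(abs(integral - target))
    print(f"eps={eps:<5} integral={integral:.6f} target={target:.6f}")
```

In this example the error decays like $O(\varepsilon)$.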
Answer to question 2.
- Can we justify the same if $f$ is only Lipschitz continuous, in particular $f(x)=|x|$ and $f'(x)=\mathrm{sgn}(x)$?
Yes. It may be possible to proceed as shown above; nevertheless, we can obtain the most general result by using the classical definition of variation for functions of one real variable. Precisely, assuming that $f$ is Lipschitz with constant $M>0$, i.e. $f\in C^{0,1}(\Bbb R)$ and $\lvert f(x)-f(y)\rvert \le M \lvert x-y \rvert$ for all $x,y \in\Bbb R$, we have that
$$
\begin{split}
V_a^b(f\circ u) & =\sup_{P \in \mathscr{P}} \sum_{i=0}^{n_{P}-1} | f\circ u(x_{i+1})-f\circ u(x_i)|\\
&\le M \sup_{P \in \mathscr{P}} \sum_{i=0}^{n_{P}-1} | u(x_{i+1})- u(x_i)| =M V_a^b(u)\quad \forall a, b\in \Bbb R,\ a<b.
\end{split}
$$
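The inequality above follows term by term from the Lipschitz bound, partition by partition. A quick numerical sketch (the function $u$, the interval $[a,b]=[-1,2]$ and the random partitions are hypothetical choices) evaluates both sums on the same partitions for $f=\lvert\cdot\rvert$ ($M=1$) and $f(t)=\sin 2t$ ($M=2$):

```python
import numpy as np


def variation_on_partition(g, points):
    """Sum over i of |g(x_{i+1}) - g(x_i)| for a partition a = x_0 < ... < x_n = b."""
    return np.abs(np.diff(g(points))).sum()


def u(t):
    # Hypothetical BV function on [a, b]: a jump at t = 0.3 plus a smooth oscillation.
    return np.where(t < 0.3, -1.0, 0.5) + 0.4 * np.sin(7.0 * t)


# Lipschitz functions f with a Lipschitz constant M (f = |.| is the case asked about).
lipschitz_fs = {
    "abs": (np.abs, 1.0),
    "sin(2t)": (lambda t: np.sin(2.0 * t), 2.0),
}

rng = np.random.default_rng(0)
a, b = -1.0, 2.0
checks = []
for name, (f, M) in lipschitz_fs.items():
    for n in (5, 50, 500, 5000):
        P = np.concatenate(([a], np.sort(rng.uniform(a, b, size=n - 1)), [b]))
        lhs = variation_on_partition(lambda t: f(u(t)), P)  # sum for f o u
        rhs = M * variation_on_partition(u, P)              # M times the sum for u
        checks.append((name, n, lhs, rhs))
        print(f"{name:8s} n={n:5d}  V_P(f o u)={lhs:9.4f}  M V_P(u)={rhs:9.4f}")
```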
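Here $\mathscr{P}$ denotes the set of partitions $P=\{a=x_0<x_1<\dots<x_{n_P}=b\}$ of $[a,b]$, and $u$ is identified with one of its good representatives. This implies that $f\circ u\in BV_\text{loc}(\Bbb R)$, thus its derivative is a Radon measure. Furthermore, since $f\in C^{0,1}(\Bbb R)$, $f$ is absolutely continuous and differentiable almost everywhere, with $\lvert f^\prime\rvert\le M$ almost everywhere, and we can express the first derivative of $f\circ u$ as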
$$
(f\circ u)^\prime = (f^\prime\circ u)u^\prime
$$
where the right-hand side is indeed a Radon measure, since for all $\varphi \in C_c(\Bbb R)$
$$
\begin{split}
\left|\int\limits_{\Bbb R} \varphi ( f\circ u )^\prime\right|
& = \left|\int\limits_{\Bbb R} \varphi ( f^\prime\circ u ) {u}^\prime\right|\\
& \le M \int\limits_{\Bbb R} \lvert\varphi\rvert \Dd\lvert u^\prime\rvert< +\infty
\end{split}
$$
as $u\in BV(\Bbb R)$.
Notes
For the sake of completeness, as briefly shown in this Q&A, a deeper result holds:
Theorem (Josephy [2], p. 355, theorem 4) For a given function $f:[0,1]\to[0,1]$, the composition $f\circ g$ is of bounded variation for all functions $g:[0,1]\to[0,1]$ of bounded variation if and only if $f$ satisfies a Lipschitz condition on $[0,1]$.
The fact that the domain and codomain of the functions in the statement of the theorem are $[0,1]$ does not reduce its generality: any finite interval will do.
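The "only if" direction can be seen on a standard example, sketched numerically below under hypothetical choices. Take $f(t)=\sqrt t$, which is not Lipschitz on $[0,1]$, and a piecewise linear $g:[0,1]\to[0,1]$ that on each subinterval $[1/(k+1),1/k]$, $k=1,\dots,N$, rises from $0$ to $a_k=1/k^2$ and falls back to $0$. Then $V(g)=2\sum_{k\le N} 1/k^2$ stays below $\pi^2/3$, while $V(f\circ g)=2\sum_{k\le N} 1/k$ is twice a harmonic sum and diverges as $N\to\infty$; letting $N\to\infty$ gives a $g$ of bounded variation for which $f\circ g$ is not of bounded variation.

```python
import math

import numpy as np


def node_values(N):
    """Breakpoint values of a hypothetical piecewise-linear g: [0, 1] -> [0, 1].

    On the k-th subinterval [1/(k+1), 1/k] (k = 1..N), g rises linearly from 0 to
    a_k = 1/k**2 and falls back to 0, so g is monotone between breakpoints.
    """
    vals = [0.0]
    for k in range(1, N + 1):
        vals.extend([1.0 / k**2, 0.0])
    return np.array(vals)


def variation(vals):
    # For a function that is monotone between consecutive breakpoints, the total
    # variation is the sum of the absolute increments between breakpoint values.
    return np.abs(np.diff(vals)).sum()


BOUND = math.pi**2 / 3  # 2 * sum_{k>=1} 1/k^2
results = []
for N in (10, 100, 1_000, 10_000):
    v = node_values(N)
    var_g = variation(v)                # 2 * sum_{k<=N} 1/k^2, always < BOUND
    var_sqrt_g = variation(np.sqrt(v))  # 2 * sum_{k<=N} 1/k, grows like 2 log N
    results.append((N, var_g, var_sqrt_g))
    print(f"N={N:6d}  V(g)={var_g:.6f} (< {BOUND:.6f})  V(sqrt o g)={var_sqrt_g:.4f}")
```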
In the same paper, theorem 3 states a necessary and sufficient condition on a function $g$ ensuring that $f\circ g$ is of bounded variation for every $f$ of bounded variation, and thereby identifies the class of such functions.
References
[1] Luigi Ambrosio, Nicola Fusco, Diego Pallara, Functions of bounded variation and free discontinuity problems, Oxford Mathematical Monographs, New York and Oxford: The Clarendon Press/Oxford University Press, pp. xviii+434 (2000), ISBN 0-19-850245-1, MR1857292, Zbl 0957.49001.
[2] Michael Josephy, "Composing Functions of Bounded Variation", Proceedings of the American Mathematical Society, Vol. 83, No. 2, pp. 354-356 (1981), MR624930, Zbl 0475.26005.