**Confidence intervals within the frequentist paradigm:** You are correct that these assertions (warning against interpreting the confidence interval as a probability interval for the parameter) come from the fact that confidence intervals arise in the classical frequentist paradigm, and in that context, the parameter is considered a fixed "unknown constant", not a random variable. There is a relevant probability statement pertaining to the confidence interval, which is:
$$\mathbb{P}(L(\mathbf{X}) \leqslant \mu \leqslant U(\mathbf{X}) \mid \mu) = 1-\alpha,$$
where $L(\mathbf{X})$ and $U(\mathbf{X})$ are bounds formed as functions of the sample data $\mathbf{X}$ (usually by rearranging a probability statement about a pivotal quantity). Importantly, the data vector $\mathbf{X}$ is the random variable in this probability statement, and the parameter $\mu$ is treated as a fixed "unknown constant". (I have indicated this by putting it as a conditioning variable, but within the frequentist paradigm you wouldn't even specify this; it would just be implicit.) The confidence interval is derived from this probability statement by substituting the observed sample data $\mathbf{x}$ to yield the fixed interval $\text{CI}(1-\alpha) = [ L(\mathbf{x}), U(\mathbf{x}) ]$.
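To make this pre-data coverage statement concrete, here is a minimal simulation sketch (my addition, not part of the original argument), assuming a normal model with known standard deviation and the usual $z$-interval: over repeated samples, the random interval $[L(\mathbf{X}), U(\mathbf{X})]$ contains the fixed $\mu$ in roughly a $1-\alpha$ proportion of repetitions.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, alpha = 5.0, 2.0, 30, 0.05   # assumed values for illustration
    z = 1.959963984540054                      # Phi^{-1}(1 - alpha/2) for alpha = 0.05

    n_reps = 100_000
    covered = 0
    for _ in range(n_reps):
        x = rng.normal(mu, sigma, n)           # a fresh random sample X each repetition
        half = z * sigma / np.sqrt(n)          # from the pivot (Xbar - mu)/(sigma/sqrt(n)) ~ N(0,1)
        covered += (x.mean() - half <= mu <= x.mean() + half)

    print(covered / n_reps)                    # approximately 0.95 = 1 - alpha

The probability statement is about the procedure: the interval endpoints change with each sample while $\mu$ stays fixed.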
The reason for the assertions you are reading is that once you replace the random sample data $\mathbf{X}$ with the observed sample data $\mathbf{x}$, you can no longer make a probability statement analogous to the one above. Since the data and the parameter are now both constants, you are left with the trivial statement:
$$\mathbb{P}(L(\mathbf{x}) \leqslant \mu \leqslant U(\mathbf{x})) = \begin{cases} 0 & \text{if } \mu \notin \text{CI}(1-\alpha), \\[6pt] 1 & \text{if } \mu \in \text{CI}(1-\alpha). \end{cases}$$
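The same point can be seen programmatically. In the assumed normal setting of the sketch above, once one sample is fixed the "coverage" of the resulting interval is just an indicator:

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma, n = 5.0, 2.0, 30                # same assumed values as before
    z = 1.959963984540054

    x_obs = rng.normal(mu, sigma, n)           # one fixed observed sample x
    half = z * sigma / np.sqrt(n)
    lo, hi = x_obs.mean() - half, x_obs.mean() + half
    print(lo <= mu <= hi)                      # True or False: the indicator is 1 or 0, nothing in between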
**Confidence intervals within the Bayesian paradigm:** If you would prefer to interpret the unknown parameter $\mu$ as a random variable, you are now undertaking a Bayesian treatment of the problem. Although the confidence interval is a procedure formulated within the classical paradigm, it is possible to interpret it within the context of Bayesian analysis.
However, even within the Bayesian context, it is still not valid to assert a posteriori that the CI contains the true parameter with the specified probability. In fact, this posterior probability depends on the prior distribution for the parameter. To see this, we observe that:
$$\mathbb{P}(L(\mathbf{x}) \leqslant \mu \leqslant U(\mathbf{x}) \mid \mathbf{x})
= \int \limits_{L(\mathbf{x})}^{U(\mathbf{x})} \pi(\mu \mid \mathbf{x}) \, d\mu = \frac{\int_{L(\mathbf{x})}^{U(\mathbf{x})} L_\mathbf{x}(\mu) \, \pi(\mu) \, d\mu}{\int L_\mathbf{x}(\mu) \, \pi(\mu) \, d\mu},$$
where $L_\mathbf{x}$ denotes the likelihood function and $\pi(\mu)$ the prior density for the parameter.
This posterior probability depends on the prior, and it is not generally equal to $1-\alpha$ (though it may be in some special cases). The initial probability statement used to form the confidence interval imposes a restriction on the sampling distribution, which constrains the likelihood function, but it still leaves us free to choose different priors, yielding different posterior probabilities that the interval contains the parameter.
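To illustrate, here is a sketch of this computation under an assumed conjugate setup (normal data with known $\sigma$ and a normal prior $\mu \sim \text{N}(m_0, s_0^2)$; these specifics are my choice, not part of the question). The same observed interval receives different posterior probabilities under different priors:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    sigma, n, alpha = 2.0, 30, 0.05
    x = rng.normal(5.0, sigma, n)              # one observed sample x
    xbar = x.mean()
    z = norm.ppf(1 - alpha / 2)
    L, U = xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n)

    for m0, s0 in [(5.0, 100.0), (0.0, 1.0), (10.0, 0.5)]:   # three different priors
        # Conjugate update: posterior precision = prior precision + data precision
        post_var = 1 / (1 / s0**2 + n / sigma**2)
        post_mean = post_var * (m0 / s0**2 + n * xbar / sigma**2)
        prob = (norm.cdf(U, post_mean, np.sqrt(post_var))
                - norm.cdf(L, post_mean, np.sqrt(post_var)))
        print(f"prior N({m0}, {s0}^2): P(L <= mu <= U | x) = {prob:.4f}")

With the near-flat prior the posterior probability is close to the nominal $0.95$, while the informative priors push it well away from that level, even though the interval itself is unchanged.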
(Note: It is easy to show that $\mathbb{P}(L(\mathbf{X}) \leqslant \mu \leqslant U(\mathbf{X})) = 1-\alpha$ using the law of total probability, but this is a prior probability, not a posterior probability, since it does not condition on the data. Thus, within the Bayesian paradigm, we may say a priori that the confidence interval will contain the parameter with the specified probability, but we cannot generally say this a posteriori.)
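As a final sketch (again under the assumed normal setup, not from the original answer), this a priori statement is easy to check by simulation: draw $\mu$ from any prior, then draw data given $\mu$, and the random interval covers $\mu$ in about a $1-\alpha$ proportion of repetitions, regardless of the prior.

    import numpy as np

    rng = np.random.default_rng(3)
    sigma, n, alpha = 2.0, 30, 0.05
    z = 1.959963984540054
    m0, s0 = 10.0, 0.5                         # a deliberately informative prior

    n_reps = 100_000
    covered = 0
    for _ in range(n_reps):
        mu = rng.normal(m0, s0)                # mu is now a random variable
        x = rng.normal(mu, sigma, n)
        half = z * sigma / np.sqrt(n)
        covered += (x.mean() - half <= mu <= x.mean() + half)

    print(covered / n_reps)                    # approximately 0.95, whatever the prior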