Suppose we observe one draw of the random variable $X$, which follows the normal distribution $\mathcal{N}(\mu,\sigma^2)$. The variance $\sigma^2$ is known; $\mu$ is not. We want to estimate $\mu$.
Suppose further that the prior distribution is the truncated normal distribution $\mathcal{N}(\mu_0,\sigma^2_0,t)$, i.e., with density $f(\mu)=(c/\sigma_0)\,\phi((\mu-\mu_0)/\sigma_0)$ if $\mu<t$, and $f(\mu)=0$ otherwise, where $t$ is a known truncation point and $c$ is a normalizing constant. (Interpretation: we get noisy signals about $\mu$, which are known to be normally distributed with known variance---this is the draw of $X$. But we have prior knowledge that values $\mu\ge t$ are not possible.)
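For concreteness, here is a minimal sketch of this prior density in Python/scipy (the helper name and signature are mine, just for illustration; the normalizing constant is $c = 1/\Phi((t-\mu_0)/\sigma_0)$, so the density is $c/\sigma_0$ times the standard normal pdf of the standardized value):

```python
import numpy as np
from scipy.stats import norm

def truncated_prior_pdf(mu, mu0, sigma0, t):
    """Density of N(mu0, sigma0^2) truncated above at t.

    Illustrative helper (name and signature are my own):
    c = 1 / Phi((t - mu0) / sigma0) is the normalizing constant,
    and the density is (c / sigma0) * phi((mu - mu0) / sigma0) for mu < t.
    """
    c = 1.0 / norm.cdf((t - mu0) / sigma0)
    return np.where(mu < t, (c / sigma0) * norm.pdf((mu - mu0) / sigma0), 0.0)
```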
In this setup, is the resulting posterior a truncated normal distribution (truncated at $t$ like the prior)? I tried to adapt the standard derivation of the posterior for the well-known conjugate normal pair, and it seems to work. Do you see any mistake in this derivation?
The likelihood function is given by $$f(x|\mu)=\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2} \right\}.$$ The prior density is ($\Phi(\cdot)$ is the cdf of the standard normal distribution) $$f(\mu)=\begin{cases} \frac{1}{\sigma_0\sqrt{2\pi}\,\Phi((t-\mu_0)/\sigma_0)} \exp\left\{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2} \right\} &\text{if } \mu< t, \\ 0 & \text{else}. \end{cases}$$ The prior density can be rewritten as $$f(\mu)=c\, \phi((\mu-\mu_0)/\sigma_0)\,\mathbf{1}\{\mu<t\},$$ where $c$ is the normalizing constant (independent of $\mu$, but dependent on $t$). Now, by Bayes' rule, $$\begin{aligned} f(\mu|x)&\propto f(x|\mu)\, f(\mu)\\ &\propto\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2} \right\} \exp\left\{-\frac{(\mu-\mu_0)^2}{2\sigma_0^2} \right\}\mathbf{1}\{\mu<t\} \\ &=\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2} -\frac{(\mu-\mu_0)^2}{2\sigma_0^2} \right\}\mathbf{1}\{\mu<t\}\\ &\propto \exp\left\{-\frac{1}{2\sigma^2\sigma_0^2/(\sigma^2+\sigma_0^2)} \left(\mu-\frac{\sigma^2\mu_0+\sigma_0^2 x}{\sigma^2+\sigma_0^2}\right)^2 \right\}\mathbf{1}\{\mu<t\}. \end{aligned}$$ The indicator does not depend on $\mu$ except through its support, so completing the square proceeds exactly as in the untruncated case. The result is the kernel of the normal distribution with the usual posterior mean and variance (as if we had done the derivation for an untruncated prior), truncated above at $t$. In other words, ignoring the truncation in the prior, applying the usual updating rule for the conjugate normal pair, and then truncating the result at $t$ gives the same posterior as the derivation above (assuming it is correct). Is it correct? All I do is carry along the indicator function (and adjust the normalizing constant); does that introduce problems anywhere?
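To double-check the claim numerically, here is a small sketch in Python/scipy (the parameter values are arbitrary placeholders I picked). It compares the grid-normalized product of likelihood and truncated prior against the truncated normal implied by the usual conjugate update; note that `scipy.stats.truncnorm` takes its bounds in standardized units:

```python
import numpy as np
from scipy.stats import norm, truncnorm
from scipy.integrate import trapezoid

# Arbitrary illustrative values (not from the question).
mu0, sigma0 = 1.0, 2.0   # prior mean and standard deviation
sigma = 1.5              # known standard deviation of the signal
t = 2.0                  # truncation point
x = 3.0                  # observed draw

# Numerical posterior: likelihood times (untruncated) prior kernel on a
# grid restricted to mu < t; the indicator and the constant c drop out
# after renormalizing.
grid = np.linspace(-10.0, t, 200_001)
unnorm = norm.pdf(x, loc=grid, scale=sigma) * norm.pdf(grid, loc=mu0, scale=sigma0)
post_numeric = unnorm / trapezoid(unnorm, grid)

# Claimed posterior: usual conjugate-normal update, then truncate above at t.
m = (sigma**2 * mu0 + sigma0**2 * x) / (sigma**2 + sigma0**2)
s = np.sqrt(sigma**2 * sigma0**2 / (sigma**2 + sigma0**2))
post_closed = truncnorm.pdf(grid, a=-np.inf, b=(t - m) / s, loc=m, scale=s)

print(np.max(np.abs(post_numeric - post_closed)))  # ~0 up to grid error
```

On my understanding, the two curves should agree up to discretization error, which is what the derivation above predicts.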