The difference between these definitions is that the first one states a property that a p-value must satisfy, while the latter two define the p-value as a function of a collection of hypothesis tests. These definitions are closely related, as I show below: a p-value obtained from a monotone collection of tests satisfies the first property, and every function with that property can be obtained from such a collection.
Note that the latter two definitions refer to varying levels of significance. To make these definitions operational, you must consider a collection of hypothesis tests. Formally, a hypothesis test $\phi: \mathcal{X} \rightarrow \{0,1\}$ is a function on the sample space that takes the value $1$ if $H_0$ is rejected and the value $0$ otherwise. Let $(\phi_{\alpha})_{\alpha \in (0,1)}$ be a collection of hypothesis tests such that each $\phi_{\alpha}$ has size $\alpha$ (that is, $\sup_{\theta \in H_0}\mathbb{P}_\theta(\phi_{\alpha}(X)=1) \leq \alpha$) and that is monotone: if $\alpha_1 \leq \alpha_2$, then $\phi_{\alpha_1}(x) \leq \phi_{\alpha_2}(x)$ for every $x$. The latter two definitions say that the p-value is the function $p: \mathcal{X} \rightarrow (0,1)$ given by $p(x)=\inf \{\alpha: \phi_{\alpha}(x)=1\}$. Observe that, for every $\theta \in H_0$,
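\begin{align*}
\mathbb{P}_\theta(p(X) \leq \alpha^*)
&= \mathbb{P}_\theta(\inf \{\alpha: \phi_{\alpha}(X)=1\} \leq \alpha^*) \\
&\leq \mathbb{P}_\theta(\phi_{\alpha}(X)=1)
& \text{monotonicity, for any } \alpha \in (\alpha^*,1) \\
&\leq \alpha
& \phi_{\alpha} \text{ has size } \alpha
\end{align*}
The second step holds because, if $\inf \{\alpha': \phi_{\alpha'}(x)=1\} \leq \alpha^* < \alpha$, then $\phi_{\alpha'}(x)=1$ for some $\alpha' < \alpha$, and monotonicity gives $\phi_{\alpha}(x)=1$. Since the bound holds for every $\alpha \in (\alpha^*,1)$, letting $\alpha \downarrow \alpha^*$ gives $\mathbb{P}_\theta(p(X) \leq \alpha^*) \leq \alpha^*$.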
This shows that the p-value, as defined by Rohatgi and Schervish, satisfies the property stated by Casella.
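A small numerical sketch can make this direction concrete. It assumes a setting of my own choosing (not taken from Casella, Rohatgi, or Schervish): $X_1, \dots, X_n$ i.i.d. $N(\theta, 1)$, $H_0: \theta \leq 0$, and the one-sided z-tests $\phi_\alpha(x) = \mathbb{I}(\sqrt{n}\,\bar{x} > z_{1-\alpha})$, which have size $\alpha$ and are monotone in $\alpha$. The helper names (`phi`, `p_value_inf`, `rejection_rate`) are hypothetical. The code computes $p(x) = \inf\{\alpha: \phi_\alpha(x)=1\}$ by bisection, compares it with the closed form $1 - \Phi(\sqrt{n}\,\bar{x})$, and estimates $\mathbb{P}_\theta(p(X) \leq \alpha^*)$ by simulation.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def z_stat(x):
    """Standardized sample mean sqrt(n) * mean(x), assuming unit variance."""
    return math.sqrt(len(x)) * (sum(x) / len(x))


def phi(alpha, x):
    """Hypothetical size-alpha test of H0: theta <= 0 for i.i.d. X_i ~ N(theta, 1).

    Returns 1 (reject H0) when the z statistic exceeds the (1 - alpha) quantile.
    A larger alpha gives a smaller cutoff, so the family is monotone in alpha.
    """
    return 1 if z_stat(x) > STD_NORMAL.inv_cdf(1 - alpha) else 0


def p_value_inf(x, tol=1e-10):
    """Approximate p(x) = inf{alpha in (0, 1): phi(alpha, x) = 1} by bisection.

    Bisection is valid because, by monotonicity, the rejecting alphas form
    an interval whose left endpoint is p(x).
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid, x):
            hi = mid  # mid rejects, so p(x) <= mid
        else:
            lo = mid  # mid does not reject, so p(x) >= mid
    return hi


def p_value_closed_form(x):
    """For this particular family the infimum equals 1 - Phi(z)."""
    return 1 - STD_NORMAL.cdf(z_stat(x))


def rejection_rate(theta, alpha_star, n=10, trials=20_000, seed=0):
    """Monte Carlo estimate of P_theta(p(X) <= alpha_star)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = [rng.gauss(theta, 1.0) for _ in range(n)]
        hits += p_value_closed_form(x) <= alpha_star
    return hits / trials


if __name__ == "__main__":
    x = [0.3, 1.1, -0.2, 0.8, 0.5]
    print(p_value_inf(x), p_value_closed_form(x))  # agree up to tol
    print(rejection_rate(theta=0.0, alpha_star=0.05))  # close to 0.05
    print(rejection_rate(theta=-0.5, alpha_star=0.05))  # well below 0.05
```

Because this test uses a strict inequality, the set of rejecting levels is the open interval $(p(x), 1)$, so the infimum is not attained; bisection still converges to it.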
Conversely, suppose that $p: \mathcal{X} \rightarrow (0,1)$ is a function such that, for every $\theta \in H_0$ and every $\alpha^*$, $\mathbb{P}_{\theta}(p(X) \leq \alpha^*) \leq \alpha^*$. In this case, you can define a collection of hypothesis tests by $\phi_{\alpha}(x)=\mathbb{I}(p(x) \leq \alpha)$. Each $\phi_{\alpha}$ has size $\alpha$, since the defining property of $p$ gives $\mathbb{P}_\theta(\phi_{\alpha}(X)=1)=\mathbb{P}_\theta(p(X) \leq \alpha) \leq \alpha$ for every $\theta \in H_0$. Also, by construction, if $\alpha_1 \leq \alpha_2$, then $\phi_{\alpha_1}(x) \leq \phi_{\alpha_2}(x)$ for every $x$. Finally, since $\{\alpha: \phi_{\alpha}(x)=1\} = [p(x), 1)$, it follows that $p(x) = \inf\{\alpha: \phi_{\alpha}(x)=1\}$. That is, you can construct a collection of hypothesis tests from Casella's definition, and applying Rohatgi's or Schervish's definition to this collection returns $p(x)$.
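The converse construction can also be checked on a toy example. This is a minimal sketch under assumptions of my own choosing: $X \sim \text{Binomial}(n, \theta)$, $H_0: \theta \leq 1/2$, and the upper-tail p-value $p(x) = \mathbb{P}_{1/2}(X \geq x)$, which satisfies Casella's property. The helper names are hypothetical. The code builds $\phi_\alpha(x) = \mathbb{I}(p(x) \leq \alpha)$, computes $\mathbb{P}_\theta(\phi_\alpha(X)=1)$ exactly for a few $\theta \in H_0$, and recovers $p(x)$ as $\inf\{\alpha: \phi_\alpha(x)=1\}$ by bisection.

```python
from math import comb


def binom_pmf(k, n, theta):
    """P_theta(X = k) for X ~ Binomial(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


def p_value(x, n):
    """Hypothetical upper-tail p-value for H0: theta <= 1/2, i.e. P_{1/2}(X >= x)."""
    return sum(binom_pmf(k, n, 0.5) for k in range(x, n + 1))


def phi(alpha, x, n):
    """Test built from the p-value: reject H0 exactly when p(x) <= alpha."""
    return 1 if p_value(x, n) <= alpha else 0


def p_from_tests(x, n, tol=1e-12):
    """Recover inf{alpha : phi(alpha, x, n) = 1} by bisection over alpha."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid, x, n):
            hi = mid  # mid rejects, so the infimum is <= mid
        else:
            lo = mid  # mid does not reject, so the infimum is > mid
    return hi


def rejection_prob(alpha, n, theta):
    """Exact P_theta(phi_alpha(X) = 1): sum the pmf over the rejection region."""
    return sum(binom_pmf(x, n, theta) for x in range(n + 1) if phi(alpha, x, n))


if __name__ == "__main__":
    n = 10
    for x in range(n + 1):
        print(x, p_value(x, n), p_from_tests(x, n))  # the two columns agree
    print(rejection_prob(0.05, n, 0.5))  # 0.0107421875 (= 11/1024), below 0.05
```

Here the rule uses $\leq$, so the set of rejecting levels is the closed interval $[p(x), 1)$ and the infimum is attained at $\alpha = p(x)$.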