Cantor's Naive Set Theory allows the construction of the set of all ordinals, which contains itself, and this triggers the Burali-Forti Paradox. ZFC, by contrast, does not allow a set of all ordinals, and it typically uses von Neumann's definition of ordinals. Under von Neumann's definition, a set $\alpha$ is an ordinal number iff
1. If $\beta$ is a member of $\alpha$, then $\beta$ is a proper subset of $\alpha$;
2. If $\beta$ and $\gamma$ are members of $\alpha$, then one of the following is true: $\beta=\gamma$, $\beta$ is a member of $\gamma$, or $\gamma$ is a member of $\beta$; and
3. If $\beta$ is a nonempty proper subset of $\alpha$, then there exists a member $\gamma$ of $\beta$ such that the intersection $\gamma \cap \beta$ is empty. (Definition from Wolfram MathWorld.)
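To make the three conditions concrete, here is a minimal Python sketch, assuming hereditarily finite sets are modeled as nested `frozenset`s. The helper names `von_neumann`, `subsets`, and `is_ordinal` are hypothetical, introduced only for this illustration. Rule 3 is checked in its standard well-foundedness form, where the witness $\gamma$ must be a member of $\beta$ itself. The sketch only handles finite sets, so it says nothing directly about infinite ordinals or the set of all ordinals.

```python
from collections.abc import Iterator
from itertools import combinations


def von_neumann(n: int) -> frozenset:
    """Build the finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    alpha: frozenset = frozenset()  # 0 is the empty set
    for _ in range(n):
        alpha = alpha | frozenset([alpha])  # successor: a ∪ {a}
    return alpha


def subsets(s: frozenset) -> Iterator[frozenset]:
    """Yield every subset of a finite set (exponential; fine for small sets)."""
    items = list(s)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_ordinal(alpha: frozenset) -> bool:
    """Check the three conditions on a hereditarily finite set (nested frozensets)."""
    # Rule 1: every member of alpha is a proper subset of alpha.
    rule1 = all(beta < alpha for beta in alpha)
    # Rule 2: any two members are equal or one is a member of the other.
    rule2 = all(b == g or b in g or g in b for b in alpha for g in alpha)
    # Rule 3 (standard form): every nonempty proper subset beta has a
    # member gamma of beta with gamma ∩ beta empty.
    rule3 = all(
        any(not (gamma & beta) for gamma in beta)
        for beta in subsets(alpha)
        if beta and beta != alpha
    )
    return rule1 and rule2 and rule3


if __name__ == "__main__":
    print([is_ordinal(von_neumann(n)) for n in range(5)])  # all True
    one = von_neumann(1)
    print(is_ordinal(frozenset([one])))  # {1}: fails rule 1
    print(is_ordinal(frozenset([von_neumann(0), one, frozenset([one])])))  # fails rule 2
```

Because a `frozenset` must exist before it can become an element of another one, Python cannot build a frozenset that contains itself, so rule 3 never fails on these inputs; the rejections come from rules 1 and 2. For example, $\{1\}$ fails rule 1 (its member $1 = \{0\}$ is not a subset of $\{1\}$), while $\{0, 1, \{1\}\}$ passes rule 1 but fails rule 2, since $0$ and $\{1\}$ are not comparable under $\in$.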
The first rule implies that no ordinal is an element of itself: if $\alpha \in \alpha$, then $\alpha$ would have to be a proper subset of itself, which is impossible. Hence, even without the axiom of foundation and the axiom of replacement, the set of all ordinals could not be an ordinal, since as a member of itself it would violate the first rule, and the Burali-Forti Paradox would therefore still be resolved.

So it seems there must be a significant difference between Cantor's definition of ordinals and von Neumann's, since Cantor's Naive Set Theory still allows the set of all ordinals to be an ordinal: Cantor's ordinals can be elements of themselves. Why can Cantor's ordinals be elements of themselves? What is it about his definition of ordinals that allows this?