One way to approach this is to start with $\in $ and build up a "universe" $U$ using a standard naive set-theoretic construction. $U$ is defined so that all the usual operations of set theory, applied to elements of $U$, always produce elements of $U$.
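For concreteness, here is one common way to spell out those closure conditions. This is the formulation given by Mac Lane in "Categories for the Working Mathematician", which I am assuming is close to what is intended here; other texts use slightly different lists.

$$
\begin{aligned}
&\text{(i)}\quad x\in u\in U \;\Rightarrow\; x\in U,\\
&\text{(ii)}\quad u,v\in U \;\Rightarrow\; \{u,v\},\ (u,v),\ u\times v\in U,\\
&\text{(iii)}\quad x\in U \;\Rightarrow\; \mathcal P(x)\in U \text{ and } \textstyle\bigcup x\in U,\\
&\text{(iv)}\quad \omega\in U, \text{ where } \omega=\{0,1,2,\dots\} \text{ is the set of finite ordinals},\\
&\text{(v)}\quad \text{if } f:a\to b \text{ is surjective with } a\in U \text{ and } b\subset U, \text{ then } b\in U.
\end{aligned}
$$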
Now we say that a set $u$ is $\textit {small}$ if $u\in U$; otherwise $u$ is said to be $\textit {large}$. Note that there are large sets: $U$ itself is large, because of course $U\notin U$.
Once $U$ is defined, you get most of the properties you want in order to do "ordinary" mathematics; for example, any function $f:u\to v$ is small whenever $u$ and $v$ are small.
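To see why a function between small sets is small, here is a short sketch. It assumes that $f$ is identified with its graph (a set of ordered pairs), and that $U$ is closed under products and power sets and satisfies $x\in u\in U\Rightarrow x\in U$:

$$
\begin{aligned}
u,v\in U &\;\Rightarrow\; u\times v\in U &&\text{(closure under products)}\\
&\;\Rightarrow\; \mathcal P(u\times v)\in U &&\text{(closure under power sets)}\\
f\subset u\times v &\;\Rightarrow\; f\in\mathcal P(u\times v)\in U &&\text{($f$ is a set of pairs)}\\
&\;\Rightarrow\; f\in U &&\text{(elements of elements of $U$ lie in $U$)}
\end{aligned}
$$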
A $\textit {class}$ is defined to be any subset $S\subset U$. Since, by construction of $U$, $x\in u\in U\Rightarrow x\in U$, every element of $U$ is a subset of $U$, so every small set is a class. The classes that do not belong to $U$ are called $\textit {proper classes}$; $U$ itself is an example of a proper class.
From here, one defines Cat, the category of small categories, Cat', the category of large categories, etc.
This approach has some disadvantages. For example, if you define Cls to be the category of all classes, then the set of objects of Cls is $\mathcal P(U)$, which is not a class: a class is a subset of $U$ and so has cardinality at most that of $U$, whereas by Cantor's theorem the cardinality of $\mathcal P(U)$ is strictly greater than that of $U$.
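The cardinality claim is Cantor's theorem. A sketch of the usual diagonal argument, where $g$, $D$ and $d$ are names introduced only for this illustration:

$$
\begin{aligned}
&\text{Let } g:U\to\mathcal P(U) \text{ be any function and put } D=\{x\in U : x\notin g(x)\}\in\mathcal P(U).\\
&\text{If } D=g(d) \text{ for some } d\in U, \text{ then } d\in D \iff d\notin g(d)=D, \text{ a contradiction.}\\
&\text{So no such } g \text{ is surjective, and hence } |U|<|\mathcal P(U)|.
\end{aligned}
$$

If $\mathcal P(U)$ were a class we would have $\mathcal P(U)\subset U$ and so $|\mathcal P(U)|\le |U|$, which the argument above rules out. One can also see this without counting: $U\in\mathcal P(U)$, so $\mathcal P(U)\subset U$ would force $U\in U$.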
There are more sophisticated approaches to defining categories, but I am not expert enough to elaborate on them.