When faced with an abstraction that feels unmotivated, I try to (1) look to the historical context and (2) look for clues that it's somehow natural.
Set the stage with the collapse of the Italian school of algebraic geometry. The field had outgrown its foundations. Could algebraic geometry be put on solid footing while salvaging the school's techniques, like generic points? One idea was to use commutative algebra as a basis. Then there was the search for a Weil cohomology theory to prove the Weil conjectures and, more generally, for a way to apply algebraic topology to seemingly discrete objects from algebra and number theory, like finite fields. After all, topological techniques were already useful for proving fundamentally algebraic results, like the fundamental theorem of algebra, in settings where we already knew how to apply them.
Schemes fit these goals exactly. They are built from commutative algebra. They mirror the construction of manifolds. They have generic points. They're not only geometric enough to give you a cohomology theory, but they even give you a notion of derivatives via nilpotents.
What makes the seemingly-strange construction natural? I'm still green, but here's what I have so far. Look back on classical algebraic geometry, the study of solution sets of polynomials, with modern algebra. We have the ring
$$
\mathbb{C}[x,y],
$$
and we want to study systems of equations, $$p_1(x,y) = 0,\ \dots,\ p_n(x,y)=0.$$ But we don't really care about the specific polynomials. We want all their consequences. We want to zero-out the whole ideal
$$
(p_1,\dots, p_n)\mapsto 0.
$$
Any set of generators of this ideal will do fine.
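As a quick sanity check (a toy example of my own choosing), the systems $\{x+y=0,\ x-y=0\}$ and $\{x=0,\ y=0\}$ generate the same ideal, since each pair of generators is a combination of the other:
$$
x = \tfrac12\big((x+y)+(x-y)\big),\qquad y = \tfrac12\big((x+y)-(x-y)\big),\qquad\text{so}\qquad (x+y,\ x-y) = (x,\ y).
$$
Both systems cut out the origin, and the ideal doesn't care which presentation we started from.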
Every point in $\mathbb{C}^2$ embeds as a system of polynomials—erm, maximal ideal:
$$
(a,b) \mapsto \left(x-a, y-b\right),
$$
so we don't need to treat evaluating polynomials as a separate thing. It's just a special case of modding out by an ideal. A particularly useful consequence of this embedding is that ring homomorphisms on the polynomials $\mathbb{C}[x,y]$ will carry the base space along with them.
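To make that concrete, here is a small worked case, with a polynomial and a point I picked purely for illustration. Evaluation at $(a,b)$ is a surjective ring map whose kernel is $(x-a,\ y-b)$, so it is the quotient map in disguise:
$$
\mathrm{ev}_{(a,b)}:\mathbb{C}[x,y]\to\mathbb{C},\quad p\mapsto p(a,b),\qquad \mathbb{C}[x,y]/(x-a,\ y-b)\cong\mathbb{C}.
$$
Take $p = x^2+y$ and $(a,b)=(1,2)$:
$$
x^2+y = (x+1)(x-1) + (y-2) + 3 \equiv 3 \pmod{(x-1,\ y-2)},
$$
which matches $p(1,2)=3$.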
There's one problem, though. Ring homomorphisms do induce a pullback map on ideals, but it may not preserve maximality. It does, however, preserve primality. I find this easiest to see by thinking in terms of quotients.
Suppose we have a ring homomorphism
$f: A \to B$, and $B$ has an ideal $I$ with associated quotient map $q: B \to B/I$. Compose them:
$$A \xrightarrow{f} B \xrightarrow{q} B/I.$$ The kernel of $q\circ f: A \to B/I$ is exactly the preimage ideal $f^{-1}(I)$, so $q\circ f$ factors through the quotient by $f^{-1}(I)$, and the induced map
$$A/f^{-1}(I) \hookrightarrow B/I$$ is injective. If $I$ is prime, then $B/I$ has no zero divisors, hence neither does any subring. Conclude $f^{-1}(I)$ is prime too. On the other hand, if $I$ is maximal, then $B/I$ is a field. But not every subring of a field is a field; most aren't!
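The smallest example I know of where maximality is lost (my choice of illustration, not taken from Vakil) is the inclusion of the integers into the rationals:
$$
f:\mathbb{Z}\hookrightarrow\mathbb{Q},\qquad I=(0)\subset\mathbb{Q},\qquad f^{-1}(I)=(0)\subset\mathbb{Z}.
$$
Here $I$ is maximal because $\mathbb{Q}/(0)=\mathbb{Q}$ is a field, but its preimage is only prime: the injection $A/f^{-1}(I)\hookrightarrow B/I$ is just
$$
\mathbb{Z}/(0)=\mathbb{Z}\hookrightarrow\mathbb{Q}=\mathbb{Q}/(0),
$$
and $\mathbb{Z}$ is a domain that is not a field.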
The base space doesn't want to be all maximal ideals, it wants to be all prime ideals. So we have to ask, what are the nonmaximal prime ideals? In $\mathbb{C}[x,y]$, they correspond to the irreducible subvarieties bigger than a single point: the irreducible curves, plus the whole plane for the zero ideal. What do we make of this? You already know the punchline from Vakil, but they're the generic points.
Besides functoriality, why is this "natural"? Vakil's Exercise 3.2.1(a) shows it in action. There's an inclusion
$$
i:\mathbb{Q}[x,y] \hookrightarrow \mathbb{R}[x,y]
$$
which induces a function in the opposite direction on their base spaces:
$$
i^*: \text{Spec}(\mathbb{R}[x,y]) \to \text{Spec}(\mathbb{Q}[x,y])
$$
This can't be a map
$$
\mathbb{R}^2 \to \mathbb{Q}^2
$$
because we need to send $(\pi,\pi^2)$ somewhere. But because $\pi$ is transcendental, the best we can do is send it to the generic point on $y=x^2$. Algebraically, this is sensible. The rationals can recognize $(\pi,\pi^2)$ lies on the parabola, so it would be a waste to throw that away. But they can't get any more specific. It's as if you said $(t,t^2)$, as far as $\mathbb Q$ is concerned. Generic points fit naturally into this picture.
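Here is the computation behind that, as I understand it (the names $\mathfrak m$, $q$, $r$ are mine). The point $(\pi,\pi^2)$ is the maximal ideal $\mathfrak m=(x-\pi,\ y-\pi^2)$ of $\mathbb{R}[x,y]$, and its image is
$$
i^*(\mathfrak m)=i^{-1}(\mathfrak m)=\{\,p\in\mathbb{Q}[x,y] : p(\pi,\pi^2)=0\,\}.
$$
Divide any $p\in\mathbb{Q}[x,y]$ by $y-x^2$, which is monic in $y$:
$$
p(x,y)=(y-x^2)\,q(x,y)+r(x),\quad q\in\mathbb{Q}[x,y],\ r\in\mathbb{Q}[x],\qquad\text{so}\qquad p(\pi,\pi^2)=r(\pi).
$$
Since $\pi$ is transcendental over $\mathbb{Q}$, $r(\pi)=0$ forces $r=0$. Hence
$$
i^{-1}(\mathfrak m)=(y-x^2),\qquad \mathbb{Q}[x,y]/(y-x^2)\cong\mathbb{Q}[x],
$$
which is prime but not maximal, because $\mathbb{Q}[x]$ is a domain and not a field. By contrast, a point with algebraic coordinates such as $(\sqrt2,2)$ lands on the maximal ideal $(x^2-2,\ y-2)$, a closed point of $\text{Spec}(\mathbb{Q}[x,y])$ even though it isn't in $\mathbb{Q}^2$.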
The Zariski topology, too, is natural. It's the topology induced by polynomials, in the sense that it's the coarsest topology in which the preimage of any single point under any polynomial function is closed. Its apparent pathologies accurately reflect the algebra: there's only so much polynomials can see, and they can only grow so fast. Even if you look only at closed points in $\text{Spec}\,\mathbb C [x,y]$, Hausdorffness fails. But it has to! Nonzero polynomials can only make giant open sets because their zeros are rare.
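A sketch of why, using the standard notation $D(f)$ for basic open sets (notation I haven't introduced above). On closed points,
$$
D(f)=\{(a,b)\in\mathbb{C}^2 : f(a,b)\neq 0\},\qquad D(f)\cap D(g)=D(fg).
$$
If $D(f)$ and $D(g)$ are both nonempty, then $f\neq 0$ and $g\neq 0$, so $fg\neq 0$ because $\mathbb{C}[x,y]$ is an integral domain. A nonzero polynomial can't vanish at every point of $\mathbb{C}^2$, so
$$
D(fg)\neq\emptyset.
$$
Every nonempty open set contains a nonempty basic one, so any two nonempty open sets meet, and no two points can be separated by disjoint neighborhoods.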
For the structure sheaf, it's sort of forced on you if you want to study rational (or meromorphic) functions. Without cutoff functions, which are non-analytic, you can't edit out the singularities, so you have to shrink the domain. And the shrinking had better commute (restricting in two steps should agree with restricting in one), so you get a presheaf. But we also want consistency for a sheaf's sections as you shrink or grow the domain, hence the identity and gluing axioms. We want $$(x,y)\mapsto \frac{1}{x^2+y^2}$$
to be "the same thing" no matter which open set we are looking at.
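Here is how I understand that in the affine case. This is a sketch using the standard description of the structure sheaf on basic open sets $D(f)$ (the locus where $f$ doesn't vanish), which I haven't set up above. Write $A=\mathbb{C}[x,y]$ and $f=x^2+y^2$. Then
$$
\mathcal{O}(D(f))=A_f=\left\{\frac{a}{f^n} : a\in A,\ n\ge 0\right\},\qquad \frac{1}{x^2+y^2}\in A_f,
$$
and restricting to a smaller set $D(fg)\subseteq D(f)$ is the map
$$
A_f\to A_{fg},\qquad \frac{a}{f^n}\mapsto\frac{a\,g^n}{(fg)^n}.
$$
Because $A$ is an integral domain, every $A_f$ sits inside the fraction field $\mathbb{C}(x,y)$ and these restriction maps are just inclusions, so the section really is the same rational function on every open set where it is defined.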
One sheaf that helped me is the sheaf of inverses of a holomorphic function, for example $\exp$. You can define $\log$ by taking a branch cut, but this choice is arbitrary. Even worse, it introduces a discontinuity. Instead, you can take all possible $\log$s together as a sheaf. And, via analytic continuation, the whole thing is determined by its germ at one point.
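Concretely, and with my own labeling of the branches by an integer $k$, the sections over the slit plane $\mathbb{C}\setminus(-\infty,0]$ are
$$
\log_k\!\left(re^{i\theta}\right)=\ln r+i(\theta+2\pi k),\qquad r>0,\ \theta\in(-\pi,\pi),\ k\in\mathbb{Z},
$$
and each satisfies $\exp(\log_k z)=z$. Any two differ by a constant in $2\pi i\,\mathbb{Z}$. Continuing a germ once counterclockwise around the origin gives
$$
\log_k\ \longmapsto\ \log_{k+1}=\log_k+2\pi i,
$$
so there is no single-valued section on all of $\mathbb{C}\setminus\{0\}$, yet every branch is reachable from any one germ.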