Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of real i.i.d. random variables with mean $\mu$. Let $S_n$ be the average of the first $n$ elements of this sequence, $$S_n = \frac1n\sum_{i=1}^n X_i.$$ Then the weak law of large numbers states that for any $\epsilon > 0$ $$\lim_{n\rightarrow \infty}P(|S_n - \mu| > \epsilon) = 0 \tag{1}$$ and the strong law of large numbers states that $$P(\lim_{n \rightarrow \infty}|S_n - \mu| = 0) = 1 \tag{2}$$
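To have a concrete picture of the quantity $S_n$, here is a small simulation sketch I have in mind; the Bernoulli distribution (so $\mu = 0.5$) and the sample size are just arbitrary choices for illustration:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Illustrative choice: X_i ~ Bernoulli(0.5), so mu = 0.5.
mu = 0.5
n_max = 100_000

# Track the running average S_n = (X_1 + ... + X_n) / n.
total = 0
averages = []
for n in range(1, n_max + 1):
    total += random.random() < mu  # one Bernoulli(0.5) draw
    averages.append(total / n)

print(averages[99], averages[-1])  # S_100 vs. S_100000
```

In runs like this the running average drifts toward $0.5$, and both laws describe that behavior, but in different senses, which is where my confusion lies.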
I have read a number of questions regarding these two laws, but I'm still having some trouble seeing the subtle differences between them and what they're saying on a less technical level. I suspect part of my confusion could be related to a few questions I have on convergence in probability vs convergence a.s., which I've posted as a separate question.
My current intuition is as follows. For fixed $\epsilon$, the weak law tells us that we are increasingly likely to be within $\epsilon$ of $\mu$ as $n \rightarrow \infty$. In other words, if we fix a large $n$ then it is likely that $S_n \in (\mu - \epsilon, \mu + \epsilon)$. However, since the probability is only $0$ in the limit (1), it can be nonzero at this fixed $n$, and so there is a chance that $S_n$ might fall outside of this interval. As $n$ increases, the likelihood of $S_n$ falling outside this range decreases. On Wikipedia they state:
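To convince myself that this probability can be small but nonzero at a fixed $n$, I tried a rough Monte Carlo sketch (the Bernoulli distribution, $n = 100$, and $\epsilon = 0.1$ are arbitrary choices of mine):

```python
import random

random.seed(1)  # reproducible sketch

# Arbitrary illustrative setup: X_i ~ Bernoulli(0.5), mu = 0.5.
mu, eps, n, trials = 0.5, 0.1, 100, 10_000

# Estimate P(|S_n - mu| > eps) at this fixed n by repeated sampling.
exceed = 0
for _ in range(trials):
    s_n = sum(random.random() < mu for _ in range(n)) / n
    if abs(s_n - mu) > eps:
        exceed += 1

print(exceed / trials)  # small, but not zero, at this fixed n
```

So at a fixed $n$ the "bad" event $|S_n - \mu| > \epsilon$ really does have positive probability, which matches my reading of the weak law.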
The weak law states that for a specified large $n$, the average $\overline{X}_n$ is likely to be near $\mu$. Thus, it leaves open the possibility that $|\overline{X}_n - \mu| > \epsilon$ happens an infinite number of times, although at infrequent intervals.
How did they conclude that $S_n$ might fall outside the range an infinite number of times?
What I am less sure about is what the strong law is exactly saying. For example, if we again choose large $n$, wouldn't we again have nonzero probability that $S_n$ will fall outside of some neighborhood around $\mu$, since (2) only holds in the limit? However, on Wikipedia they say
The strong law shows that this (by "this" they are referring to the passage I quoted above) almost surely will not occur. It does not imply that with probability 1, we have that for any $\epsilon > 0$ the inequality $|\overline{X}_n - \mu| < \epsilon$ holds for large enough $n$, since the convergence is not necessarily uniform on the set where it holds.
But I am not sure how this gives any more information than the weak law already does. How does it rule out the "bad" events occurring infinitely often, which the weak law leaves open?
As another example, in Ross's book on probability he says:
The weak law of large numbers states that, for any specified large value $n^*$, $(X_1 + \cdots + X_{n^*})/n^*$ is likely to be near $\mu$. However, it does not say that $(X_1 + \cdots + X_n)/n$ is bound to stay near $\mu$ for all $n$ larger than $n^*$. Thus, it leaves open the possibility that large values of $|(X_1 + \cdots + X_n)/n - \mu|$ can occur infinitely often (though at infrequent intervals). The strong law shows that this cannot occur. In particular, it implies that, with probability 1, for any positive value $\epsilon$, $$\Big| \sum_{i=1}^n \frac{X_i}n - \mu \Big|$$ will be greater than $\epsilon$ only a finite number of times.
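If I try to write out Ross's final claim formally, I believe (assuming I am reading him correctly) it amounts to the equivalence
$$P\Big(\lim_{n \rightarrow \infty} S_n = \mu\Big) = 1 \quad\Longleftrightarrow\quad P\big(|S_n - \mu| > \epsilon \ \text{for infinitely many } n\big) = 0 \ \text{ for every } \epsilon > 0.$$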
But again, I do not see how his last claim follows from convergence a.s. as in (2).