The definition of the Möbius function mirrors the alternating adding and subtracting of inclusion-exclusion:
$$
\mu(n) = (-1)^{\omega(n)} [ n \text{ squarefree}]
$$
Here $\omega(n)$ counts the distinct prime factors of $n$, and the bracket $[\,\cdot\,]$ is $1$ when the condition holds and $0$ otherwise. Inclusion-exclusion is useful for counting and for evaluating sums. Here's the simplest example I can think of:
Euler's totient $\varphi(n)$ counts the positive integers up to $n$ that are coprime to $n$.
To compute it, we count the integers in $1, \dots, n$ that share no prime factor with $n$, using inclusion-exclusion:
Start with all $n$ integers. For each prime divisor $p$ of $n$, exactly $n / p$ of them are divisible by $p$, so exclude those. The $n / (pq)$ integers divisible by two distinct primes $p, q$ have now been excluded twice, so add those back in, and so on.
$$\varphi(n) = n - \sum_{p | n} \frac{n}{p} + \sum_{\substack{p < q \\ p, q | n }} \frac n {p q} - \cdots $$
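As a sanity check, the alternating sum can be evaluated directly by looping over subsets of the prime divisors of $n$. This is only a minimal sketch: the helper names `prime_divisors`, `phi_inclusion_exclusion` and `phi_brute_force` are made up for illustration, and trial division is assumed to be fast enough for the small $n$ involved.

```python
from itertools import combinations
from math import gcd, prod


def prime_divisors(n):
    """Distinct prime divisors of n, found by trial division."""
    primes = []
    p = 2
    while p * p <= n:
        if n % p == 0:
            primes.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:  # whatever is left over is itself a prime factor
        primes.append(n)
    return primes


def phi_inclusion_exclusion(n):
    """Evaluate n - sum n/p + sum n/(pq) - ... over subsets of prime divisors."""
    primes = prime_divisors(n)
    total = 0
    for k in range(len(primes) + 1):
        for subset in combinations(primes, k):
            # A subset of k primes contributes with sign (-1)^k;
            # the empty subset has product 1 and contributes the leading n.
            total += (-1) ** k * (n // prod(subset))
    return total


def phi_brute_force(n):
    """Count the integers in 1..n that are coprime to n, straight from the definition."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)
```

For $n = 12$ the prime divisors are $2$ and $3$, so the sum is $12 - 6 - 4 + 2 = 4$, which matches the four integers $1, 5, 7, 11$ found by the brute-force count.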
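The trick is that we can combine all these sums into a single sum over the divisors $d$ of $n$. Every divisor is either square-free, in which case it is a product of $\omega(d)$ distinct primes and appears in exactly one of the sums above with sign $(-1)^{\omega(d)}$, or it has a square factor and appears in none of them. This gives us the nice identity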
$$\varphi(n) = \sum_{d | n} \mu(d) \frac n d$$
where $\mu(d)$ encodes the sign from inclusion-exclusion, and divisors $d$ with a square factor are ignored because $\mu(d) = 0$. In the notation of Dirichlet convolution, $\varphi = \mu * \operatorname{Id}$. We can derive similar identities for other gcd-related functions such as $G(n) = \sum_{a < b \le n} [\gcd(a,b) = 1]$: since $[\gcd(a,b) = 1] = \sum_{d \mid \gcd(a,b)} \mu(d)$, summing over every possible $d \le n$ gives $G(n) = \sum_{d \le n} \mu(d) \binom{\lfloor n/d \rfloor}{2}$. Sums like this one, which involve $\lfloor n / d \rfloor$, can even be calculated in sub-linear time with the Dirichlet hyperbola method.
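Both identities are easy to check numerically. The sketch below sieves $\mu$ up to a limit, evaluates $\varphi(n) = \sum_{d \mid n} \mu(d)\, n/d$, and evaluates $G(n)$ as $\sum_{d \le n} \mu(d) \binom{\lfloor n/d \rfloor}{2}$, which follows from counting the pairs $a < b \le n$ that are both divisible by $d$. The function names are hypothetical, and the loops are plain $O(n)$ sums over $d$, not the sub-linear hyperbola method.

```python
from math import gcd


def mobius_sieve(limit):
    """Return a list mu with mu[k] = μ(k) for 1 <= k <= limit (mu[0] is a placeholder)."""
    mu = [1] * (limit + 1)
    mu[0] = 0
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if is_prime[p]:
            for m in range(p, limit + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]  # one more distinct prime factor flips the sign
            for m in range(p * p, limit + 1, p * p):
                mu[m] = 0  # divisible by p^2, so not squarefree
    return mu


def phi_via_mobius(n, mu):
    """φ(n) as the divisor sum of μ(d) * n/d; mu must cover indices up to n."""
    return sum(mu[d] * (n // d) for d in range(1, n + 1) if n % d == 0)


def coprime_pairs(n, mu):
    """G(n): number of pairs a < b <= n with gcd(a, b) = 1, via the Möbius sum."""
    total = 0
    for d in range(1, n + 1):
        m = n // d  # how many multiples of d lie in 1..n
        total += mu[d] * (m * (m - 1) // 2)  # pairs a < b that are both multiples of d
    return total


def coprime_pairs_brute_force(n):
    """Same count straight from the definition, for checking small cases."""
    return sum(
        1 for b in range(1, n + 1) for a in range(1, b) if gcd(a, b) == 1
    )
```

With `mu = mobius_sieve(100)`, `phi_via_mobius(12, mu)` gives `4` and `coprime_pairs(4, mu)` gives `5`: of the six pairs $a < b \le 4$, only $(2, 4)$ is not coprime.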