
I've stared at and worked with the definition of continuity of a real-valued function at a point for many (like $3$) years, but there are some things that have always bothered me about it.

First, here is the definition I'm talking about:

Definition. If $f: \Bbb R \to \Bbb R$ is a function, and $x \in \Bbb R$, we say $f$ is continuous at $x$ if $\forall \epsilon > 0$, $\exists \delta > 0$ such that $\forall y \in \Bbb R$, $|x - y| < \delta \implies |f(x) - f(y)| < \epsilon$.
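
For a concrete instance of how the quantifiers play out, take $f(x) = x^2$ at the point $x = 3$: given $\epsilon > 0$, the choice $\delta = \min(1, \epsilon/7)$ works, since $|3 - y| < \delta$ forces $|3 + y| < 7$, and therefore $$|f(3) - f(y)| = |3 - y|\,|3 + y| < 7\delta \le \epsilon.$$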

I understand that this means: given any interval, or ball, around our output $f(x)$, we can always find an interval, or ball, around $x$ that is being mapped into the ball around $f(x)$.
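
In symbols, this picture says: for every $\epsilon > 0$ there is a $\delta > 0$ with $$f\big((x - \delta, x + \delta)\big) \subseteq \big(f(x) - \epsilon, f(x) + \epsilon\big).$$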

I also understand the following two points, which I think are really important, even though they may seem obvious:

  1. Given an $\epsilon > 0$, when we find some $\delta$ that satisfies this condition, any smaller positive number will also work, so in effect we find infinitely many valid $\delta$'s.

  2. Given an $\epsilon > 0$, if we can find a $\delta$ that satisfies this condition, then this same $\delta$ also works for every larger $\epsilon$.
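
Both points come straight from the implication in the definition: if $\delta$ works for a given $\epsilon$ and $0 < \delta' \le \delta$, then $$|x - y| < \delta' \implies |x - y| < \delta \implies |f(x) - f(y)| < \epsilon,$$ so $\delta'$ works too; and if $\epsilon' \ge \epsilon$, then $|f(x) - f(y)| < \epsilon \le \epsilon'$, so the same $\delta$ works for $\epsilon'$.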

So here are my questions:

  1. We often say that intuitively, a function is continuous at a point if "small changes in the input lead to small changes in the output". But what does "small" mean? It can't mean the same thing for the input and the output, because changes in the input are measured against $\delta$ while changes in the output are measured against $\epsilon$... who is to say these two numbers are the same kind of "small"?

  2. What is the purpose of starting the definition by looking at changes in the output (i.e., saying $\forall \epsilon > 0$ first)? If we are saying small changes in the input lead to small changes in the output, shouldn't we start with a small change in the input and check that it leads to a small change in the output?

  3. Why do many people say, "as we shrink $\epsilon$, $\delta$ will shrink"? I don't really see how that follows from the definition. And, for example, a constant function is continuous but does not satisfy this shrinking property.

layman
  • 20,819

2 Answers

2

1/2. Our intuition of "small changes in input result in small changes in output" is made more precise by saying "arbitrarily small changes in output can be obtained by taking sufficiently small changes in input". That is, if I want to ensure that my function doesn't change by more than any particular small amount, I should be able to choose a (perhaps small) interval around the input that guarantees this. The precise correspondence between these two views of the situation is somewhat subtle; in particular, nonstandard analysis really does adopt the first view, and can be proven equivalent to standard analysis, but the equivalence is somewhat delicate.
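
To see why the input and output scales need not match, consider $f(x) = 1000x$: an input change of $0.01$ produces an output change of $10$, which hardly looks "small", yet $f$ is continuous everywhere, because for any output tolerance $\varepsilon > 0$ input changes of less than $\delta = \varepsilon/1000$ keep the output within $\varepsilon$ of $f(x)$. "Small" for the input just means "small enough for whatever output tolerance was specified".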

  3. This is a generic statement, and you're right that in certain trivial cases it doesn't hold. It still typically holds, essentially because $f((x-\delta,x+\delta))$ will typically be a bigger set if $\delta$ is bigger (since there are more points available to map from).
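
Concretely, for $f(x) = x^2$ at $x = 1$ and $0 < \delta < 1$, the image $f((1-\delta, 1+\delta)) = ((1-\delta)^2, (1+\delta)^2)$ is an interval of length $4\delta$, and keeping $|f(y) - f(1)| < \varepsilon$ on it requires $2\delta + \delta^2 \le \varepsilon$, so shrinking $\varepsilon$ really does force $\delta$ to shrink; for a constant function, by contrast, any $\delta$ at all works for every $\varepsilon$, which is exactly the trivial case mentioned above.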
Ian
  • 104,572
  • I'm not really understanding your restatement of "small changes in input result in small changes in output". How is what you said equivalent (intuitively) to the original statement? – layman Dec 12 '14 at 02:09
  • If $f$ is to be continuous at $x$ then there should be a region around $x$, so that everywhere in that region, $f(y)$ is within $0.1$ of $f(x)$. There should be another region for $0.01$, another for $0.001$, etc. If there is some $\varepsilon$ such that there is no such region, then there will be a point where shrinking the change in the input further fails to decrease the change in the output. – Ian Dec 12 '14 at 02:10
  • As we said, this region around $x$ isn't necessarily "small", whatever that means. So why do we say "small changes in the input result in small changes in the output"? – layman Dec 12 '14 at 02:11
  • If there is a large region then there is necessarily a small region (just take a small subset of the large region), so it is enough to talk about the existence of a small region. – Ian Dec 12 '14 at 02:12
  • Good point. It's still a bit unclear to me why we start our definition with the change in the output even though we say intuitively "small changes in the input lead to small changes in the output". If we were to follow the intuitive definition of continuity, we should take a sufficiently small change in the input and make sure it results in a small change in the output, right? Why do we start backwards in the formal definition? I hope my question makes sense, and sorry if you already answered it. – layman Dec 12 '14 at 02:15
  • It's as though we are saying "let's check that small changes in the output have all small changes in the input that result in this", which kind of sounds backwards... – layman Dec 12 '14 at 02:16
  • Another way of saying it is to describe how small changes in input will affect changes in output. This is through a function called the modulus of continuity, which can be defined as $\omega(x,\delta) = \sup_{y \in (x-\delta,x+\delta)} |f(x)-f(y)|$. Continuity at $x$ is equivalent to $\lim_{\delta \to 0^+} \omega(x,\delta) = 0$. Nicer properties than just continuity (Lipschitz continuity or Hölder continuity, for example) can be defined through this function. Does that make things any more concrete? – Ian Dec 12 '14 at 02:17
  • Here's another way of thinking about it: we want to deal with arbitrarily small changes in output and sufficiently small changes in input. "Sufficiently small" only makes sense in the context of "sufficiently small for something", which for continuity is "the change in output being no more than the specified tolerance". – Ian Dec 12 '14 at 02:21
  • So at the end of the day, what we are saying is this: it is accurate to say the formal definition of continuity at a point means "small changes in the input result in small changes in the output" because if we look at small changes in the output, we want to make sure that small changes in the input are responsible for this. It might be that large changes result in this, but as you said, if large changes result in this, small changes result in it too by restricting the large changes. Is that right? – layman Dec 12 '14 at 02:23
  • That much is right; the only gap is that it's not clear what "small changes in output" means. The changes in the output and the changes in the input come in at different places. We can control the changes in the input in response to a specified tolerance on the change in the output. We can't do the reverse in any way that respects our visual intuition about continuity. – Ian Dec 12 '14 at 02:25
  • I guess what I meant is that the change we allow in the output (i.e., the $\epsilon$ we decide to observe) is what we consider "small" for the output. And we want to make sure that there is some level of change in the input such that all changes to the input within that level result in changes in the output smaller than our specified $\epsilon$... – layman Dec 12 '14 at 02:27
  • Right; the point is that the definition of "small" changes in output has to be specified externally, while the definition of "small" changes in input gets defined relative to the scale for the change in output (i.e. relative to $\varepsilon$). It's not very productive, but to prove a point we could define something like "$\delta$ is $\varepsilon$-small for $f$ at $x$ if ..." – Ian Dec 12 '14 at 02:28
  • Thank you for taking the time to read through the annoying details in my posts. :) You've been very helpful. – layman Dec 12 '14 at 02:30
2

Here's a point I didn't see mentioned in Ian's answer:

The only counterexamples to (3) are constant. To be more specific:

Claim: Let $f$ be a function continuous at $x_0$, and suppose that to each $\varepsilon > 0$ we assign some $\delta(\varepsilon) > 0$ such that $$|x-x_0| < \delta(\varepsilon) \implies |f(x)-f(x_0)| < \varepsilon$$ (such an assignment is possible precisely because $f$ is continuous at $x_0$). If $f$ is not constant on any neighborhood of $x_0$, then $$\lim_{\varepsilon \to 0} \delta(\varepsilon) = 0.$$

Proof: We prove the contrapositive; suppose it is not the case that $\lim_{\varepsilon \to 0} \delta(\varepsilon) = 0$. Then there is some sequence $\varepsilon_k \to 0$ and some positive real $c$ such that $\delta(\varepsilon_k) > c$ for all $k$. (Check this: it just comes out of the definition of convergence.) Then for every $x$ with $|x-x_0| < c$ we have $|x-x_0| < \delta(\varepsilon_k)$, so $|f(x)-f(x_0)| < \varepsilon_k$ for all $k$; since $\varepsilon_k \to 0$ and $c$ does not depend on $k$, we see that $|x-x_0| < c \implies |f(x)-f(x_0)| = 0$; that is, if $|x-x_0| < c$, then $f(x)=f(x_0)$. So $f$ is constant on a neighborhood of $x_0$.
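
For instance, with $f(x) = x$ at any $x_0$, an admissible assignment must satisfy $\delta(\varepsilon) \le \varepsilon$ (any larger $\delta$ would allow some $x$ with $|x - x_0| < \delta(\varepsilon)$ but $|f(x) - f(x_0)| = |x - x_0| \ge \varepsilon$), so indeed $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.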

  • Awesome, thanks for that! Now I know that only constant functions betray the "as $\epsilon$ shrinks, $\delta$ shrinks" intuition. – layman Dec 12 '14 at 02:47
  • Well, constant near the point you're checking your $\varepsilon$-$\delta$'s at; no need to be constant everywhere! –  Dec 12 '14 at 02:50
  • Yeah, good point! – layman Dec 12 '14 at 02:51