35

I understand that the "structure" of data depends entirely on Boolean algebra, but:

Why is data considered to be a discrete mathematical entity rather than a continuous one?

Related to this:

What are the drawbacks, or invariants, that are violated in structuring data as a continuous entity in $r$ dimensions?

I am not an expert in the field as I am an undergrad math student, so I'd really appreciate it if someone would explain this to me like I'm five.

Yuval Filmus
evil_potato

11 Answers

45


Why is data considered to be a discrete mathematical entity rather than a continuous one?

This was not a choice; it is theoretically and practically impossible to represent truly continuous values in a digital computer, or indeed in any kind of calculation.

Note that "discrete" does not mean "integer" or something like that. "discrete" is the opposite of "continuous". This means, to have a computer that is truly able to store non-discrete things, you would need to be able to store two numbers a and b where abs(a-b) < ε for any arbitrarily small value of ε. Sure, you can go as deep as you want (by using more and more storage space), but every (physical) computer always has an upper bound. No matter what you do, you can never make a (physical) computer that stores arbitrarily finely resolved numbers.

Even if you are able to represent numbers by mathematical constructs (for example $\pi$), this does not change anything: if you store a graph or some other structure that represents a mathematical formula, that structure is just as discrete as anything else.
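To make that concrete (a sketch; the tuple encoding of the expression tree is made up for illustration): even an "exact" symbolic representation of $2\pi$ is just a finite, discrete data structure.

```python
# A symbolic formula is stored as a finite expression tree: a discrete
# object, no matter how "continuous" the value it denotes.
two_pi = ("mul", ("int", 2), ("const", "pi"))   # encodes 2*pi symbolically

def count_nodes(expr):
    """The tree has finitely many nodes, so it is discrete data."""
    return 1 + sum(count_nodes(c) for c in expr[1:] if isinstance(c, tuple))

print(count_nodes(two_pi))  # 3: the whole formula fits in three discrete nodes
```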

Addendum

The rest is just a little perspective from beyond the field of computer science. As the comments have shown, the physics here is not undisputed, and I have deliberately formulated the next paragraph in a way that stays uncommitted as to whether it is true. Take it as motivation that the concept of a "continuum" is not a trivial one. The answer given above does not depend on whether space is discrete or not.

Note that all of this is not so much a problem of computers as a problem with the meaning of "continuous". For example, not everyone agrees, or has agreed in the past, that the Universe is continuous (e.g., "Does the Planck scale imply that spacetime is discrete?"). For some things (e.g., energy states of electrons and many other features of quantum mechanics) we even know that the Universe is not continuous; for others (e.g., position) the jury is still out, at least regarding the interpretation of research results. (Notwithstanding the problem that even if it is continuous, we could not measure it to arbitrary precision; see Heisenberg's uncertainty principle.)

In mathematics, studying the continuum (i.e., the reals) opens up a lot of fascinating fields, like measure theory, which make it clear how utterly impossible it is to actually store a "continuous" kind of number or datum.

Ugnes
AnoE
30

Computers represent a piece of data as a finite number of bits (zeros and ones) and the set of all finite bit strings is discrete. You can only work with, say, real numbers if you find some finite representation for them. For example, you can say "this data corresponds to the number $\pi$", but you cannot store all digits of $\pi$ in a computer. Hence, computer programs that work with real numbers actually only work on a discrete subset of $\mathbb{R}$.
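That discreteness can be inspected directly (a small sketch; `math.nextafter` and `math.ulp` require Python 3.9 or later):

```python
import math

# Floats are a discrete subset of the reals: every float has a definite
# "next" representable neighbour, with nothing storable in between.
x = 1.0
nxt = math.nextafter(x, math.inf)  # the very next float after 1.0
print(nxt - x)                     # 2.220446049250313e-16: the gap at 1.0
print(math.ulp(1e100))             # ~1.94e84: the gaps grow with magnitude
```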

Christian Matt
9

It's all in the implementation.

If you think about it, computers really are continuous devices. This is easily shown by the fact that all the electromagnetic equations governing how they work are continuous. What is discrete is the models we use to decide how to use these computing devices: the abstract machines we use to describe computation are all discrete.

The huge practical advantage of this is independence from a lot of quality-control challenges. If our models of computers exploited the full continuous nature of their transistors and capacitors, we would have to care to a tremendous degree about how well every transistor was built. We can see this in the audio world: among audiophiles, it is considered reasonable to spend \$2000 on an amplifier containing perhaps 10 very carefully chosen and matched transistors that do exactly the continuous thing they want. Contrast this with the 1,400,000,000 transistors in a Core i7 CPU at the mighty cost of \$400.

Because our computational models are discrete, we can model all of the signals we see in a computer as a discrete signal plus some continuous error term. We can then filter out the errors simply by observing that they aren't the right shape to be part of the discrete signal.
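As a toy illustration of that filtering (a sketch; the voltage samples are made up): digital logic recovers the intended bit simply by thresholding the noisy analog value.

```python
# Noisy analog voltages sampled from a wire; nominal levels are
# 0.0 V for logic 0 and 1.0 V for logic 1 (values invented for illustration).
samples = [0.07, 0.93, 1.04, -0.02, 0.88, 0.11]

# The continuous error term is discarded by snapping to the nearest level:
bits = [1 if v > 0.5 else 0 for v in samples]
print(bits)  # [0, 1, 1, 0, 1, 0] -- the discrete signal, noise filtered away
```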

A major part of this is the removal of time terms in our abstract models. Many of our models don't measure time against some physical process, but against some "logical" signal known as a clock. If you interrupt a clock, the system stops moving, but doesn't break down. It just finishes clearing out any analog errors it might have, and sits waiting for the next discrete pulse of the clock. Removing continuous time terms drastically simplifies computation and proofs about computation. Instead, our concepts of time are measured discretely, as seen in P and NP categorizations of algorithms.

Cort Ammon
9

To add to all of these great answers, it is worth noting that Alan Turing, when defining his machines, argued that the number of symbols must be finite (even if arbitrarily large), since a computer (meaning: a human) could not otherwise distinguish all the symbols.

Here are some excerpts from his 1936 paper "On Computable Numbers, with an Application to the Entscheidungsproblem":

[Image: excerpt from the paper]

And then in Section 9:

[Images: excerpts from Section 9 of the paper]

Guido
9

Because:

  • Digital computers cannot store arbitrary real numbers.

  • Analog computers are plagued by thermal noise (if electronic), friction (if mechanical or hydraulic), disturbances, sensitivity to temperature variations, inevitable imperfections and ageing. Dealing with such difficulties is what (experimental) physicists and engineers do. Most computer science simply abstracts the physics away.

There is also a body of research literature on real computation, as well as on analog computation.

5

The term "computer" in modern parlance means "digital computer"; the essence of a digital computer is that it has a finite number of discrete states. One could have an interesting debate about whether the reasons that digital computers won favour over analogue computers were primarily about engineering practicality, or were primarily due to better underpinning from theoretical computer science. But whatever the reasons, digital computers are what we ended up with, and any useful mathematical model of a digital computer (and therefore its data) is going to be discrete rather than continuous.

Michael Kay
2

The word data is the plural of the Latin word datum, which means "something given". Over time the plural form has changed usage, and "data" is now commonly used as both singular and plural. It has also come to be associated specifically with information.

Note that there is a difference between an item of information (a datum) and its representation.

Information theory deals with (amongst other things) discrete pieces of information represented by variables. These are countable entities. For example, velocity, location, mass, and so on are all continuous quantities, but they are discrete from each other: there is no transformation between mass and location. When these quantities are represented numerically, their data items, however they are represented, are also discrete from each other.

On the other hand, the vast majority of our current computers use some form of electrical charge to represent information. The charge is either present or it is not; there is current in the circuit or there is not. This is also discrete, but it need not be! It is simply because of the way our technology has developed that we use a binary representation. It is possible that developments in quantum computing will change this in the near future. It is also not inconceivable that analogue computers will make a resurgence and our notion that numbers have to be represented in binary will be washed away!

To summarise: data are composed of discrete items of information, each of which is a datum; an individual datum, however, need not be represented using discrete mathematics. That it currently is, is simply a consequence of how our technology happens to have developed.

Evil Dog Pie
2

I want to challenge your fundamental premise:

Why is data considered to be a discrete mathematical entity rather than a continuous one?

It isn't.

For example, the study of Algorithms is an important subfield of Computer Science, and there are many algorithms that work with continuous data. You are probably familiar with Euclid's Algorithm for computing the greatest common divisor of two natural numbers, but did you know that Euclid also had a geometrical version of that same algorithm which computes the longest common measure of two commensurable lines? That is an example of an algorithm (and thus an object of study of computer science) over real numbers, i.e. continuous data, even though Euclid didn't think about it this way.
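Both variants follow the same repeated-subtraction pattern (a sketch; the geometric version is run here on floating-point lengths, with a tolerance standing in for exact commensurability, since floats only approximate real lengths):

```python
def gcd(a: int, b: int) -> int:
    """Numerical Euclid: greatest common divisor of two natural numbers."""
    while b:
        a, b = b, a % b
    return a

def common_measure(a: float, b: float, tol: float = 1e-9) -> float:
    """Geometric Euclid: longest common measure of two commensurable lengths."""
    while b > tol:
        a, b = b, a % b
    return a

print(gcd(252, 105))             # 21
print(common_measure(2.5, 1.0))  # 0.5: both lines are multiples of a 0.5 unit
```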

There are many different ways to classify algorithms, but one of them is to classify them by their "continuousness":

  • Digital Algorithms (discrete-event algorithms over digital data):
    • the numerical variant of Euclid's algorithm
    • long-hand division, multiplication, etc. as taught in school
    • any computer program, λ-calculus program, Turing Machine
  • Non-digital data, discrete-event algorithms (algorithms over continuous data, which however still have a notion of "step", i.e. continuous data but discrete time):
    • the geometrical variant of Euclid's algorithm
    • algorithms on real numbers (e.g. Gauss' Elimination Procedure)
    • algorithms on continuous functions (e.g. the bisection algorithm)
  • Analog Algorithms (continuous time, continuous data):
    • electrical circuits
    • mechanical gyroscopes
  • Hybrid Algorithms (any combination of the above)
    • robots

Other answers have already mentioned Real Computation in Computability Theory, another important subfield of Computer Science.

What are the drawbacks, or invariants, that are violated in structuring data as a continuous entity in $r$ dimensions?

The only real (pun very much intended) drawback is that such data cannot be represented with common digital computers. You can think about algorithms over continuous data, but you cannot run them on the standard machines we usually use to run algorithms.

That's the main reason why continuous data is not as "visible" as digital data.

However, an implementation of an analog algorithm doesn't actually need to be complicated to imagine or even to build. For example, this is an implementation of an analog algorithm:

[Image: a Triumph Cycle bicycle. By Andrew Dressel, own work, CC BY-SA 3.0]

Now, you might say "Wait, that's not a computer, that's a bicycle", but you can in fact use it as an analog computer: it computes the multiplication of a real number $r$ by a fixed rational number $q$. Turn the crankshaft $r$ times and the rear wheel will turn $q \times r$ times. You can use this to scale any real number; e.g., turn the crankshaft $\pi$ times and the rear wheel will turn $q \times \pi$ times. This is something you cannot do with a digital computer.

Jörg W Mittag
1

To take a more abstract tack, any possible computation that will eventually give a result, whether on a computer or in your head, can only deal with a finite amount of data. This means that the data can be represented by a string of symbols. This string could be the digits of a number ("42") or the text of a program that creates the data ("4*atan(1)" for $\pi$). The string has to be finite or the entire thing could never be read to run the program.

Now, the set of all possible finite data can be put in lexicographic order, meaning the set is countable. But, the set of continuous real numbers is uncountable, so there are always numbers in the continuum that cannot be stored by a given computation system. From this, we can conclude that the storage of an arbitrary real number requires infinite resources.
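That enumeration is easy to make concrete (a sketch): listing all finite bit strings in shortlex order gives an explicit counting, whereas Cantor's diagonal argument shows no such listing can exist for the reals.

```python
from itertools import count, product

def all_bit_strings():
    """Yield every finite bit string in shortlex order (by length, then
    lexicographically): an explicit bijection with the natural numbers."""
    yield ""
    for length in count(1):
        for bits in product("01", repeat=length):
            yield "".join(bits)

gen = all_bit_strings()
print([next(gen) for _ in range(7)])  # ['', '0', '1', '00', '01', '10', '11']
```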

Mark H
0

Data is not always considered discrete. Scientific programming often involves floating-point arithmetic. The programmer usually pretends that the variables involved are continuous, while keeping in mind the issue of numerical stability, which stems from the fact that data is stored only to finite precision.
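A classic way this finite precision leaks through (a minimal sketch): decimal fractions are not exact in binary, and differences below the local float spacing vanish entirely.

```python
# Floats pretend to be reals until the precision runs out.
print(0.1 + 0.2 == 0.3)   # False: none of these decimals is exact in binary

# At 1e16 the spacing between adjacent doubles is 2.0, so adding 1.0
# is rounded away entirely:
big = 1e16
print((big + 1.0) - big)  # 0.0, not 1.0
```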

Yuval Filmus
-2
  • For a computer to work with data, the data must exist within the computer's accessible memory.
  • A computer's accessible memory is finite.
  • Therefore, only finite data can exist within a computer's accessible memory.
  • Representing a non-discrete (continuous) range of values exactly would require infinite information.

Hence, data in computer science is considered to be discrete.