
I’m not even a CS student, so this might be a stupid question, but please bear with me...

In the pre-computer era, we could only implement an array data structure with something like an array of drawers. Since one has to locate the drawer with the corresponding index before extracting the value from it, the time complexity of array lookup is $O(\log n)$, assuming binary search.

However, the invention of computers made a big difference. Modern computers can read from their RAM so fast that we now consider the time complexity of array lookup to be $O(1)$ (even though that's technically not the case, because, for example, signals take more time to travel a greater physical distance, etc.).

Another example is Python dictionaries. While one might get a dictionary access complexity of $O(n)$ with an ill-written overridden __hash__ magic method (or ridiculously bad luck, i.e. keys with lots of hash collisions), it's usually presumed to be $O(1)$. In this case, the time complexity depends on both the hash table implementation of Python dictionaries and the keys' implementation of the hash function.
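
For example, a deliberately bad __hash__ like the sketch below (the class name BadKey is just something I made up for illustration) forces every key into the same bucket, so each lookup degenerates into a linear scan over the colliding keys:

```python
# Minimal sketch: a degenerate __hash__ turns dict lookups into linear scans.
class BadKey:
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        # Every key hashes to the same value, so all entries collide
        # and lookups become O(n) instead of the usual O(1).
        return 42

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

d = {BadKey(i): i for i in range(1000)}  # building this is already quadratic
print(d[BadKey(999)])                    # each lookup scans the colliding keys
```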

Does this imply that hardware/implementation can affect the time complexity of algorithms? (While both examples are about data structures instead of algorithms, the latter are built on the former, and I've never heard of time complexity of data structures, so I'm using the term "algorithms" here)

To me, algorithms are abstract and conceptual, whose properties like time/space complexity shouldn’t be affected by whether they’re implemented in a specific way, but are they?

nalzok

3 Answers


Sure. Here's how to reconcile your discomfort.

When we analyze the running time of algorithms, we do it with respect to a particular model of computation. The model of computation specifies things like the time it takes to perform each basic operation (is an array lookup $O(\log n)$ time or $O(1)$ time?). The running time of the algorithm might depend on the model of computation.
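
As a toy illustration (my own, with made-up cost functions, not part of any standard library), here is how the very same loop gets a different bound under two cost models: summing an $n$-element array is $O(n)$ if a lookup is charged $O(1)$, but $O(n \log n)$ if a lookup is charged $O(\log n)$ as in the drawer analogy.

```python
# Toy sketch: the same algorithm, two cost models.
import math

def total_cost(n, lookup_cost):
    # Summing n array elements performs n lookups; charge each one
    # according to the chosen cost model.
    return n * lookup_cost(n)

unit_cost = lambda n: 1               # RAM model: a lookup costs O(1)
drawer_cost = lambda n: math.log2(n)  # "drawer" model: a lookup costs O(log n)

for n in (1_000, 1_000_000):
    print(n, total_cost(n, unit_cost), round(total_cost(n, drawer_cost)))
# Same code, same algorithm: O(n) under one model, O(n log n) under the other.
```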

Once you've picked a model of computation, the analysis of the algorithm is a purely abstract, conceptual, mathematical exercise that no longer depends on hardware.

However, in practice we usually want to pick a model of computation that reflects the reality of our hardware -- at least to a reasonable degree. So, if hardware changes, we might decide to analyze our algorithms under a different model of computation that is more appropriate to the new hardware. That is how the hardware can affect the running time.

The reason this is non-obvious is because, in introductory classes, we often don't talk about the model of computation. We just implicitly make some assumptions, without ever making them explicit. That's reasonable, for pedagogical purposes, but it has a cost -- it hides away this aspect of the analysis. Now you know.

D.W.

I think there's a fundamental misunderstanding in the question. You compare a person finding an object in a sorted list (e.g., a specific page in a book, given its number) with a computer looking up an item from an array.

The reason that the former takes time $O(\log n)$ and the latter takes time $O(1)$ is not that the computer is so fast that it can do the binary search in the blink of an eye. Rather, it's because the computer doesn't use binary search at all. The computer has a mechanism to retrieve items from the array directly, without searching. To retrieve the contents of an array cell, the computer just tells the memory controller the analog of "Give me page seventeen": the memory controller sets the voltages on the address wires to the binary representation of seventeen, and the data comes back.
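
To make the contrast concrete, here's a rough Python sketch of the two access strategies (the 8-byte element size and the addresses are illustrative assumptions, not how any particular machine works):

```python
def direct_address(base, index, element_size=8):
    # What the hardware effectively does: one multiply and one add,
    # regardless of how large the array is -- O(1).
    return base + index * element_size

def binary_search(sorted_drawers, target):
    # What a person rummaging through labelled drawers might do -- O(log n).
    lo, hi = 0, len(sorted_drawers) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_drawers[mid] == target:
            return mid
        if sorted_drawers[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

print(direct_address(base=0x1000, index=17))  # "give me page seventeen"
print(binary_search(list(range(100)), 17))
```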

So, yes, the hardware (i.e., the model of computation) does affect the running time of algorithms, as D.W. explains, but that's not what your array access example seems to be based on.

David Richerby

No, the hardware doesn't affect the complexity of algorithms.

But, it does affect the choice of algorithm, and it can affect the usefulness of complexity analysis to a point where the analysis becomes pretty much meaningless (or merely of academic interest).

Finding the right drawer (like accessing an array element) uses the "open the Nth element directly by index" algorithm, not the "search linearly" or "do a binary search" algorithm. The algorithms themselves are not changed; the choice between them is.

On the other hand, complexity analysis itself, or rather its meaningfulness, is greatly affected by hardware.

Many algorithms that look stellar by their complexity analysis perform poorly or are even useless in practice, because the supposedly insignificant constant factor is not insignificant at all, but dominating.

Or because assumptions that were once true (or mostly true) no longer hold, such as the assumption that every operation costs roughly the same (with only small constant differences that don't matter), or that it doesn't matter which memory locations you access in which order. By complexity analysis, you may conclude that some algorithm is vastly superior because it needs only so-and-so many operations. In practice, you may find that each operation causes a guaranteed cache miss (or, worse yet, a page fault), which introduces a constant factor so huge that it is no longer insignificant but dominates everything.
If algorithm A takes 500 operations to process a dataset of a given size and algorithm B takes only 5, but B causes 5 faults that burn twenty million cycles each, then despite what the analysis or common sense may tell you, A is better.
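
Plugging in those illustrative numbers (one cycle per ordinary operation, twenty million cycles per fault; both are assumptions for the sake of the example, not measurements):

```python
# Back-of-the-envelope arithmetic for the example above.
CYCLES_PER_OP = 1
CYCLES_PER_FAULT = 20_000_000

cost_A = 500 * CYCLES_PER_OP                       # 500 cycles
cost_B = 5 * CYCLES_PER_OP + 5 * CYCLES_PER_FAULT  # 100,000,005 cycles

print(cost_B / cost_A)  # ~200,000x: the "worse" algorithm A wins easily
```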

This has led to funny surprises, such as with cuckoo hashing a few years ago, which was hailed as vastly superior because of [a long list of benefits]. After the hype cooled, it turned out to be vastly inferior because it guaranteed two cache misses (or page faults, for larger data sets) on every access.

Something similar has happened with identifying and processing subsets of data. Often, the correct solution nowadays is "just do it all": instead of figuring out exactly what you need to process and doing only that, process the complete dataset linearly, even if you only need half of it. Because, believe it or not, that's faster thanks to no branch mispredictions, no cache misses, and no page faults.
Need to read the first 8 kB and the last 3 kB of a 3 MB file? Well, read the complete file and throw away what you don't want, because seeking in between will be ten times slower than just reading the complete thing.
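
A hedged sketch of that "just read it all" approach (the function name and default sizes are mine, mirroring the 8 kB / 3 kB example above):

```python
def head_and_tail(path, head_bytes=8 * 1024, tail_bytes=3 * 1024):
    # Read the whole file in one sequential pass, then slice out the
    # pieces we actually want and discard the rest.
    with open(path, "rb") as f:
        data = f.read()
    return data[:head_bytes], data[-tail_bytes:]
```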

Use a map because it has logarithmic complexity? Or a hash table, which has constant access time? Constant sounds awesome. Well, for anything with fewer than a thousand or so things (depending on hardware, data size, and access pattern), a linear search may be just as good or better. Surprise.
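
If you want to poke at the small-n claim yourself, here's a rough benchmark sketch; note that in CPython the interpreter overhead blurs the cache effects I'm talking about, so treat the numbers as a curiosity rather than proof either way:

```python
# Rough benchmark: linear scan over a small list vs. dict lookup.
# Results vary wildly with hardware, Python version, n, and access pattern.
from timeit import timeit

n = 32
keys = list(range(n))
table = {k: k for k in keys}

linear = timeit(lambda: keys.index(n - 1), number=200_000)
hashed = timeit(lambda: table[n - 1], number=200_000)

print(f"linear scan: {linear:.4f}s  dict lookup: {hashed:.4f}s")
```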

So, it's not the algorithms per se that are affected, but their usefulness, and choice.

Damon