30

I was assigned an exercise at my university. I took it home and tried to program an algorithm to solve it; it was something related to graphs, finding connected components, I think.

I implemented the most trivial thing that came to mind and showed it to my lecturer. After a brief look, he observed that the runtime complexity of my solution was unworkable and showed me something more efficient. There is a long tradition of programmers who have no idea what computational complexity is (I was one of them), so: is it a problem if a programmer has no idea what computational complexity is?

Gilles 'SO- stop being evil'
Red Banana

8 Answers

42

Yes, I would say knowing something about computational complexity is a must for any serious programmer. So long as you are not dealing with huge data sets you will be fine not knowing complexity, but if you want to write a program that tackles serious problems you need it.

In your specific case, your example of finding connected components might have worked for graphs of up to, say, $100$ nodes. However, if you tried a graph with $100{,}000$ nodes, then your lecturer's algorithm would probably have managed it in 1 second, while your algorithm would (depending on how bad the complexity was) have taken 1 hour, 1 day, or maybe even an eternity.
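
For reference, finding connected components is a textbook linear-time job. Here is a minimal sketch in Python of the standard breadth-first-search approach, which runs in $O(V + E)$ (the question doesn't say which algorithm the lecturer actually showed, so this is only an illustration):

    from collections import deque

    def connected_components(adj):
        # adj: dict mapping each node to a list of its neighbours
        # (assumed symmetric, i.e. an undirected graph)
        seen = set()
        components = []
        for start in adj:
            if start in seen:
                continue
            seen.add(start)
            queue = deque([start])
            component = []
            while queue:
                node = queue.popleft()
                component.append(node)
                for neighbour in adj[node]:
                    if neighbour not in seen:
                        seen.add(neighbour)
                        queue.append(neighbour)
            components.append(component)
        return components

    print(connected_components({1: [2], 2: [1], 3: [4], 4: [3], 5: []}))
    # [[1, 2], [3, 4], [5]]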

A somewhat common mistake students make in our algorithms course is to iterate over an array like this:

while array not empty
    examine first element of array
    remove first element from array

This might not be the most beautiful code, but in a complicated program something like it might show up without the programmer being aware of it. Now, what is the problem with this program?

Suppose we run it on a data set of $100{,}000$ elements. Compared to the following program, the former will run $50{,}000$ times slower.

while array not empty
    examine last element of array
    remove last element from array

I hope you agree that having the knowledge to make your program run $50{,}000$ times faster is probably an important thing for a programmer. Understanding the difference between the two programs requires some basic knowledge about complexity theory and some knowledge about the particulars of the language you are programming in.

In my pseudocode language, "removing an element from an array" shifts all the elements to the right of the removed element one position to the left. This makes removing the last element an $O(1)$ operation, since we only need to touch that one element. Removing the first element is $O(n)$, since we also need to shift the other $n-1$ elements one position to the left.

A very basic exercise in complexity is to prove that the first program performs roughly $\frac{1}{2}n^2$ operations while the second uses only $n$ operations. If you plug in $n = 100{,}000$ you will see that one program is drastically more efficient than the other.
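
You can watch this play out in, say, Python, where list.pop(0) shifts the remaining elements exactly like the pseudocode's "remove first element" (a minimal sketch; absolute timings will of course vary by machine):

    import time

    def drain_from_front(items):
        # pop(0) shifts every remaining element left: O(n) per call, ~n^2/2 total
        while items:
            _ = items[0]
            items.pop(0)

    def drain_from_back(items):
        # pop() removes the last element in place: O(1) per call, n total
        while items:
            _ = items[-1]
            items.pop()

    for drain in (drain_from_front, drain_from_back):
        data = list(range(100_000))
        start = time.perf_counter()
        drain(data)
        print(drain.__name__, time.perf_counter() - start)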

This is just a toy example, but it already requires a basic understanding of complexity to tell the difference between the two programs. If you are actually trying to debug/optimize a more complicated program that has this mistake, it takes an even greater understanding to find out where the bug is, because a mistake like removing an element from an array in this fashion can be hidden very well by abstractions in the code.

Having a good understanding of complexity also helps when comparing two approaches to solve a problem. Suppose you had come up with two different approaches to solving the connected components problem on your own: in order to decide between them it would be very useful if you could (quickly) estimate their complexity and pick the better one.

Tom van der Zanden
27

This is a rebuttal of Tom van der Zanden's answer, which states that knowing computational complexity is a must.

The thing is, most of the time, "50,000 times slower" is not relevant (unless you work at Google, of course).

If the operation you do takes a microsecond, or if your N never rises above a certain threshold (which covers a large portion of the coding done nowadays), it will NEVER matter. In those cases, thinking about computational complexity will only make you waste time (and most likely money).

Computational complexity is a tool for understanding why something might be slow or scale badly, and how to improve it, but most of the time it is complete overkill.

I've been a professional programmer for more than five years now, and I've never found the need to think about computational complexity when writing a loop inside a loop (O(M * N)), because the operation is always really fast, or M and N are so small.

There are far more important, more generally useful, and harder things to understand for anyone doing programming work (threading and profiling are good examples in the performance area).

Of course, there are some things that you will never be able to do without understanding computational complexity (for example, finding anagrams in a dictionary), but most of the time you don't need it.
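
To make the anagram example concrete: comparing every pair of words is quadratic, while grouping words by their sorted letters does the job in a single pass. A minimal sketch (the tiny word list is a stand-in for a real dictionary):

    from collections import defaultdict

    def anagram_groups(words):
        # Two words are anagrams exactly when their sorted letters match,
        # so one pass with a hash map replaces comparing every pair.
        groups = defaultdict(list)
        for word in words:
            groups["".join(sorted(word))].append(word)
        return [group for group in groups.values() if len(group) > 1]

    print(anagram_groups(["listen", "silent", "enlist", "google", "banana"]))
    # [['listen', 'silent', 'enlist']]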

Gilles 'SO- stop being evil'
claudio495h
14

I've been developing software for about thirty years, working both as a contractor and employee, and I've been pretty successful at it. My first language was BASIC, but I quickly taught myself machine language to get decent speed out of my underpowered box. I have spent a lot of time in profilers over the years and have learned a lot about producing fast, memory efficient optimized code.

Needless to say, I'm self-taught. I never encountered big-O notation until I started interviewing a few years ago. It has never come up in my professional work EXCEPT during interviews. So I've had to learn the basics just to handle that question in interviews.

I feel like the jazz musician who can't read sheet music. I can still play just fine. I know about hashtables (heck, I invented hashtables before I learned that they had already been invented) and other important data structures, and I might even know some tricks that they don't teach in school. But I think the truth is that if you want to succeed in this profession, you will either need to go indie or learn the answers to the questions that they will ask during interviews.

Incidentally, I most recently interviewed for a front end web developer role. They asked me a question where the answer required both a knowledge of computational complexity and logarithms. I managed to remember enough math from twenty years ago to answer it more or less correctly, but it was a bit jarring. I've never had to use logarithms in any front end development.

Good luck to you!

Scott Schafer
9

The question is quite subjective, so I think the answer is it depends.

It doesn't matter that much if you work with small amounts of data. In these cases, it is usually fine to use whatever e.g. the standard library of your language offers.

However, when you deal with large amounts of data, or for some other reason you insist that your program is fast, then you must understand computational complexity. If you don't, how do you know how a problem should be solved, or how quickly it is even possible to solve it? But understanding just theory is not enough to be a really good programmer. To produce extremely fast code, I believe, you also have to understand how e.g. your machine works (caches, memory layout, the instruction set), and what your compiler does (compilers do their best, but are not perfect).
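
As a rough illustration of the machine-level side (a sketch using NumPy; the exact numbers depend entirely on your hardware), two passes with the same operation count can differ noticeably just because of cache behaviour:

    import time
    import numpy as np

    a = np.random.rand(5000, 5000)  # row-major (C order) by default

    start = time.perf_counter()
    row_sums = [a[i, :].sum() for i in range(a.shape[0])]  # contiguous reads
    row_time = time.perf_counter() - start

    start = time.perf_counter()
    col_sums = [a[:, j].sum() for j in range(a.shape[1])]  # strided reads
    col_time = time.perf_counter() - start

    # Same number of additions either way; the row-wise pass is typically
    # faster because it walks memory contiguously and stays cache-friendly.
    print(f"rows: {row_time:.3f}s  cols: {col_time:.3f}s")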

In short, I think understanding complexity clearly makes you a better programmer.

Juho
4

It depends: not on the amount of data you're working with, but on the kind of work you do and the programs you develop.

Let's call a programmer who doesn't know about computational complexity a noobish programmer.

The noobish programmer can do:

  • develop big-data databases - he doesn't have to know how they work inside; all he has to know are the rules of database design. He knows things like what should be indexed, where it is better to introduce redundancy in the data, and where it is not...
  • make games - he just has to study how some game engine works and follow its paradigms. Games and computer graphics are quite big-data problems: consider that 1920 * 1080 * 32 bit is about 7.9 MB for a single picture/frame, which at 60 FPS is at least 475 MB/s, so just one unnecessary copy of a full-screen picture would waste around 500 MB of memory throughput per second. But he doesn't need to care about that, because he only uses the engine! (This arithmetic is checked in the sketch after the second list.)

The noobish programmer shouldn't do:

  • develop very frequently used complex programs, no matter the size of the data they work with. For example, small data won't make the impact of an improper solution noticeable during development, because each run finishes faster than a compile. So 0.5 s for one simple program doesn't seem like much from the noobish programmer's perspective. Well, consider a server that runs this program twenty times per second: it would require 10 cores to sustain that load! (See the sketch after this list.)
  • develop programs for embedded devices. Embedded devices work with small data, but they need to be as efficient as possible, because redundant operations cause unnecessary power consumption.
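
Both back-of-the-envelope figures above check out (a quick sanity-check sketch):

    # Frame bandwidth: 1920 x 1080 pixels at 32 bits (4 bytes) per pixel
    frame_bytes = 1920 * 1080 * 4
    print(frame_bytes / 2**20)         # ~7.9 MB per frame
    print(frame_bytes * 60 / 2**20)    # ~475 MB/s at 60 FPS

    # Server load: a 0.5 s program invoked twenty times per second
    print(0.5 * 20)                    # 10 core-seconds of work per second -> 10 cores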

So, a noobish programmer is fine when you just want to use existing technologies. When it comes to developing new solutions, custom technologies, and so on, it's better to hire a non-noobish programmer.

However, if a company doesn't develop new technologies and just uses already-made ones, it would be a waste of talent to hire a skilled and talented programmer. The same applies to you: if you don't want to work on new technologies and you're fine turning customers' ideas into designs and programs using ready-made frameworks, then it's a waste of your time to learn something you won't ever need - unless it's your hobby and you like logical challenges.

kravemir
4

It is certainly a problem if someone who is developing significant algorithms does not understand algorithm complexity. Users of an algorithm generally rely on a good-quality implementation with good performance characteristics. While complexity is not the only contributor to the performance characteristics of an algorithm, it is a significant one. Someone who does not understand algorithm complexity is less likely to develop algorithms with useful performance characteristics.

It is less of a problem for users of an algorithm, assuming the algorithms available are of good quality. This is true for developers who use languages that have a significant, well-specified standard library - they just need to know how to pick an algorithm that meets their needs. The problem comes in where there are multiple algorithms of some type (say, sorting) available within a library, because complexity is often one of the criteria for picking between them. A developer who does not understand complexity then cannot understand the basis for picking an effective algorithm for the task at hand.
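
A small illustration of that kind of library choice (a sketch in Python: heapq.nsmallest does selection in O(n log k), whereas a full sort costs O(n log n), so for small k on a large input the heap-based routine is the better pick):

    import heapq
    import random

    data = [random.random() for _ in range(1_000_000)]

    # Full sort, then slice: O(n log n)
    smallest_by_sort = sorted(data)[:10]

    # Heap-based selection: O(n log k), the better choice when k << n
    smallest_by_heap = heapq.nsmallest(10, data)

    assert smallest_by_sort == smallest_by_heap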

Then there are developers who focus on (for want of a better description) non-algorithmic concerns. For example, they may focus on developing intuitive user interfaces. Such developers will often not need to worry about algorithm complexity although, again, they may rely on libraries or other code being developed to a high quality.

Rob
3

I'm somewhat hesitant to write an answer here, but since I found myself nitpicking on several others' answers [some of my comments got moved to chat], here's how I see it...

There are levels/degrees of knowledge to a lot of things in computing (and by this term I mean roughly the union of computer science and information technology). Computational complexity surely is a vast field (do you know what OptP is? Or what the Abiteboul-Vianu theorem says?) and also admits a lot of depth: most people with a CS degree can't produce the expert proofs that go into research publications in computational complexity.

The level of knowledge and skill/competence required in such matters depends a lot on what one works on. Completely clueless $O(n^2)$ sorting is sometimes said to be a major cause of slow programs [citation needed], but a 2003 SIGCSE paper noted that "insertion sort is used to sort small (sub) arrays in standard Java and C++ libraries." On the flip side, premature optimization coming from someone who doesn't understand what "asymptotic" means (computational complexity being such a measure) is sometimes a problem in programming practice. Knowing at least when computational complexity matters is exactly why you need some clue about it, at least at an undergraduate level.
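
To make the SIGCSE remark concrete, here is a minimal sketch of the general idea (not the actual library code, and the cutoff value is made up): a hybrid sort hands small subarrays to insertion sort, whose tiny constant factors beat the asymptotically better recursion below some threshold.

    import random

    CUTOFF = 16  # illustrative threshold; real libraries tune this value

    def insertion_sort(a, lo, hi):
        # O(k^2) on k elements, but with very small constants for small k
        for i in range(lo + 1, hi):
            x, j = a[i], i
            while j > lo and a[j - 1] > x:
                a[j] = a[j - 1]
                j -= 1
            a[j] = x

    def hybrid_sort(a, lo=0, hi=None):
        # Quicksort overall (O(n log n) expected), switching to
        # insertion sort once a subarray is small enough
        if hi is None:
            hi = len(a)
        if hi - lo <= CUTOFF:
            insertion_sort(a, lo, hi)
            return
        pivot = a[(lo + hi) // 2]
        i, j = lo, hi - 1
        while i <= j:
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
        hybrid_sort(a, lo, j + 1)
        hybrid_sort(a, i, hi)

    data = [random.randint(0, 999) for _ in range(1000)]
    hybrid_sort(data)
    assert data == sorted(data)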

I would honestly compare knowing when to apply computational complexity concepts (and knowing when you can safely ignore them) with the somewhat common practice (outside the Java world) of implementing performance-sensitive code in C and the performance-insensitive stuff in Python etc. (As an aside, a Julia talk called this the "standard compromise".) Knowing when you don't have to think about performance saves programming time, which is a fairly valuable commodity too.

And one more point: knowing computational complexity won't automatically make you good at optimizing programs; you also need to understand more architecture-related things like cache locality, [sometimes] pipelining, and nowadays parallel/multi-core programming. The latter has both its own complexity theory and practical considerations; for a taste of the practical side, a 2013 SOSP paper observed: "Every locking scheme has its fifteen minutes of fame. None of the nine locking schemes we consider consistently outperforms any other one, on all target architectures or workloads. Strictly speaking, to seek optimality, a lock algorithm should thus be selected based on the hardware platform and the expected workload."

-1

If you don't know big-O you should learn it. It's not hard, and it's really useful. Start with searching and sorting.
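
Searching is a good first contrast: a linear scan is O(n), while binary search on sorted data is O(log n) - on a million elements, that's the difference between up to a million comparisons and about twenty. A minimal sketch:

    import bisect

    def linear_search(items, target):
        # O(n): inspect elements one by one
        for i, item in enumerate(items):
            if item == target:
                return i
        return -1

    def binary_search(sorted_items, target):
        # O(log n): halve the search range at each step (via bisect)
        i = bisect.bisect_left(sorted_items, target)
        if i < len(sorted_items) and sorted_items[i] == target:
            return i
        return -1

    data = list(range(0, 1_000_000, 2))   # sorted even numbers
    print(linear_search(data, 123456))    # 61728
    print(binary_search(data, 123456))    # 61728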

I do notice that a lot of answers and comments recommend profiling, and they almost always mean use a profiling tool.

The trouble is, profiling tools are all over the map in terms of how effective they are at finding what you need to speed up. Here I've listed and explained the misconceptions that profilers suffer from.

The result is that programs, if they are larger than an academic exercise, can contain sleeping giants that even the best automatic profiler cannot expose. This post shows a few examples of how performance problems can hide from profilers.

But they cannot hide from this technique.

Mike Dunlavey