Software Failures

Ovid on 2004-10-06T20:52:59

I write bugs. You write bugs. The customers generate bugs and we all look bad. For the most part, though, it's not the programmer's fault. CNN has an article about software disasters and it makes it pretty clear where the bulk of the problem is: bad management.

Alice the programmer creates a system that implements the spec handed to her by her boss. Because she didn't understand one of the lines of the spec, her software deletes cancelled products rather than marking them as "discontinued." Thus, the system starts to fail repeatedly as queries try to join on non-existent products, and someone's failure to add foreign key constraints to the order_items table ensures that the problem is not caught right away. So Alice screwed up big time, right?
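To make the failure concrete, here is a minimal sketch in Python using the standard sqlite3 module. Only the order_items table name comes from the scenario above; the products table, its columns, and the helper functions are hypothetical, and turning SQLite's foreign key enforcement off stands in for the constraint that nobody added. With enforcement off, the delete succeeds quietly and leaves orphaned order rows; with it on, the same delete is rejected on the spot.

```python
import sqlite3


def build_db(enforce_fk):
    """Build a tiny in-memory schema (hypothetical, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when asked; OFF mimics the missing constraint.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fk else 'OFF'}")
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'active')"
    )
    conn.execute(
        "CREATE TABLE order_items ("
        " id INTEGER PRIMARY KEY,"
        " product_id INTEGER NOT NULL REFERENCES products(id),"
        " quantity INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO products (id, name) VALUES (1, 'widget')")
    conn.execute("INSERT INTO order_items (product_id, quantity) VALUES (1, 3)")
    conn.commit()
    return conn


def cancel_by_delete(conn, product_id):
    """What Alice's code did: remove the product row outright."""
    conn.execute("DELETE FROM products WHERE id = ?", (product_id,))


def cancel_by_discontinue(conn, product_id):
    """What the spec wanted: keep the row and flag it."""
    conn.execute(
        "UPDATE products SET status = 'discontinued' WHERE id = ?",
        (product_id,),
    )


def orphaned_items(conn):
    """Count order items whose product no longer exists."""
    row = conn.execute(
        "SELECT COUNT(*) FROM order_items oi"
        " LEFT JOIN products p ON p.id = oi.product_id"
        " WHERE p.id IS NULL"
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    unchecked = build_db(enforce_fk=False)
    cancel_by_delete(unchecked, 1)
    print("orphans without constraint:", orphaned_items(unchecked))

    checked = build_db(enforce_fk=True)
    try:
        cancel_by_delete(checked, 1)
    except sqlite3.IntegrityError as exc:
        print("delete rejected:", exc)
    cancel_by_discontinue(checked, 1)
    print("orphans with constraint:", orphaned_items(checked))
```

The constraint does not make the spec any clearer, but it turns a slow-burning data problem into an error that shows up the first time the bad code path runs.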

Well, possibly. Maybe Alice has been a dedicated employee for 10 years and rarely, if ever, produces buggy software. Sure, maybe a major product has a serious flaw that's not been caught, but the reality is, Bob the manager has a responsibility to ensure that programmers understand his specs. Bob has to make sure that QA has drawn up an acceptable test plan and fully implemented it. Bob has to know that a hard-drive failure on flight-control software doesn't kill everyone on the plane. Bob has to have confidence that customers will know how to use the software and that the software does what is planned.

Customers, of course, don't care, but managers need to understand that they are the ones responsible for the product. The programmers are only responsible for the piece they are working on. And if Alice keeps turning in bad code? It may not be Bob's fault, but it's Bob's responsibility. He has to figure out why Alice is turning in bad code. Is she sloppy? Are the specs clear? Is she not writing tests? Does she need more training? Is QA just missing things? (Side note: QA is Quality Assurance, not Quality Assessment! That's a common misperception which causes all sorts of problems.)

Software is a process. Sometimes the process is all rolled into one individual, but if you have a management team, they are there to manage the process. Sometimes programmers are bad and have to go, but management ultimately has to realize that sometimes programmers are great and the software is still rotten. When that happens, it's time to start figuring out what's wrong with the process.


Management *and* Programmer failures

drhyde on 2004-10-07T08:52:19

Alice the programmer creates a system that implements the spec handed to her by her boss. Because she didn't understand one of the lines of the spec, her software deletes cancelled products rather than marking them as "discontinued."

This is, to me, at least partially Alice's fault. If the spec is even the least bit unclear then the programmer has a duty to seek clarification. A good way of doing this is for the programmers to take the customer's spec and write their own more detailed version, and get that approved before starting work. Many ambiguities are caught by this.

Bob the manager has a responsibility to ensure that programmers understand his specs. Bob has to make sure that QA has drawn up an acceptable test plan and fully implemented it.

Managers have to take certain things on trust. If the QA guys show him what looks like a good test plan, claim to have implemented it - but are actually lazy or incompetent or just misinterpreted the spec - then you can't put all the blame on the manager. It is unrealistic to expect the same level of technical competence from a manager as from his underlings. In fact, the best managers I've had were technically INcompetent, but had sufficient people-skills to let techs get on with their jobs while still being able to tell when they were trying to pull a fast one.

It's *really* hard to write anything which is not ambiguous in some way. The onus is on both the reader *and* the writer to ensure that both are reading it the same way.

Re:Management *and* Programmer failures

Ovid on 2004-10-07T13:05:57

Again, while it may not be the manager's fault, it's certainly the manager's responsibility. If issues like this happen once or twice, no big deal. If they happen repeatedly, the manager has to deal with this. Any manager who just points to others whose work he or she has control of doesn't understand the term "manager." When I first started managing, about 15 years ago, I almost got fired for this. After a while, I realized that I had to assume that responsibility lest there be no accountability. Of course, sometimes managers are given responsibility without commensurate authority and this is just as disastrous, but that just means that management higher up the chain is at fault.

Gaps in responsibility are a major factor

Phred on 2004-10-11T19:14:17

What I have found is that bugs most often occur where there are gaps in knowledge, whether it be technical, product, or managerial knowledge.

'Because she didn't understand one of the lines of the spec, her software deletes cancelled products rather than marking them as "discontinued."'

This is an example of a gap in product knowledge. Alice certainly is technically capable of marking products discontinued. Not enough attention was put into managing the product. This is different than project management, which focuses on ensuring the project is proceeding according to schedule and has adequate resources.

There are three distinct managerial roles here. The first is technical, which involves ensuring that the product is technically sound. The second is product, which ensures the pragmatic goals being touted to the client are going to be met. The third is administrative: making sure the project is on time and has enough resources, and handling the balance between the multifaceted goals of the project.

Each role requires different skills, and most people who are good at one of the three niches are that way because they are focused on that niche. In smaller companies, if necessary, the programmer can fill the technical managerial role, a sales/marketing/QA person can fill the product role, and the operations person can fill the administrative role. But one thing I have found common to projects which succeed on all three of these metrics is good communication between the three different roles. That is the key to delivering a solid solution which is low on defects and meets the deadline.