How readable is your program?

brian_d_foy on 2003-11-20T18:20:26

I read a book about a bunch of kids who beat Vegas at blackjack with an elaborate card-counting system involving several people and lots of statistical tables. Somehow, out of that, I wondered how dense Perl programs look---that is, when we view them in an editor, just as characters.

So, I started to write a tiny program to compare the amount of visual whitespace (e.g. tabs count more than spaces) to the number of characters. That is pretty useless though, except as a rough measure of overall density. Programs are hard to read because they have islands of high density, so the overall approach doesn't work.
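
A minimal sketch of that ratio idea, assuming a tab counts for four spaces' worth of visual whitespace (the weights are a guess, not anything from the real script), might look something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Rough overall density: weighted visual whitespace versus everything else.
    # The weights are an assumption -- a tab counts for four spaces here.
    my %weight = ( ' ' => 1, "\t" => 4 );

    my( $whitespace, $characters ) = ( 0, 0 );

    while( my $line = <> ) {
        chomp $line;
        foreach my $char ( split //, $line ) {
            if( exists $weight{$char} ) { $whitespace += $weight{$char} }
            else                        { $characters++ }
        }
    }

    printf "whitespace %d, characters %d, ratio %.2f\n",
        $whitespace, $characters,
        $characters ? $whitespace / $characters : 0;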

Next, I wrote what I call my "minesweeper" program. I would show it here, but the program is on my laptop and I am using a restricted community computer. Basically, I go through a text file and look at all of the positions around each position. Each character can have eight characters around it. For each non-whitespace character around a position, I add 1 to that position. Whitespace and edges (including the parts of lines that extend past their neighbors) count 0. The output from that is not very illuminating because it is denser than the program itself---it is just a big matrix. Mathematically it works, but visually it is worse.
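
Since the actual script is stuck on the laptop, here is a rough sketch of what that minesweeper pass does:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Score every position by the number of non-whitespace characters among
    # its eight neighbors. Neighbors that are whitespace, or that fall off an
    # edge (including past the end of a shorter neighboring line), add nothing.
    my @grid = map { chomp; [ split // ] } <>;

    my @density;
    for my $row ( 0 .. $#grid ) {
        $density[$row] = [];
        for my $col ( 0 .. $#{ $grid[$row] } ) {
            my $count = 0;
            for my $dr ( -1, 0, 1 ) {
                for my $dc ( -1, 0, 1 ) {
                    next if $dr == 0 and $dc == 0;          # skip the position itself
                    my( $r, $c ) = ( $row + $dr, $col + $dc );
                    next if $r < 0 or $r > $#grid;          # off the top or bottom
                    next if $c < 0 or $c > $#{ $grid[$r] }; # off the end of that line
                    $count++ if $grid[$r][$c] =~ /\S/;
                }
            }
            $density[$row][$col] = $count;
        }
    }

    # The big, not very illuminating matrix.
    print join( ' ', @$_ ), "\n" for @density;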

So, from there, I created a density plot using GD::Graph. I create a canvas with the same number of rows and columns as the script, then color the pixels. Positions with higher densities show up darker. Positions with zero density show up white. Right now it is grayscale, but I would like to use colors at some point.
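
The plotting end, picking up the @density matrix from the sketch above, could be as simple as this. I reach straight for the GD module here rather than GD::Graph, and the nine-shade grayscale mapping is just a guess:

    use GD;

    # One pixel per character position, darker for more crowded neighborhoods.
    my $rows = scalar @density;
    my $cols = 0;
    for my $row ( @density ) {
        $cols = scalar @$row if @$row > $cols;
    }

    my $image = GD::Image->new( $cols, $rows );

    # Nine shades: zero neighbors is white, eight neighbors is black.
    # The first allocated color becomes the background, so white goes first.
    my @shade = map {
        my $level = int( 255 - $_ * 255 / 8 );
        $image->colorAllocate( $level, $level, $level );
    } 0 .. 8;

    for my $row ( 0 .. $#density ) {
        for my $col ( 0 .. $#{ $density[$row] } ) {
            $image->setPixel( $col, $row, $shade[ $density[$row][$col] ] );
        }
    }

    open my $fh, '>', 'density.png' or die "Could not write density.png: $!";
    binmode $fh;
    print {$fh} $image->png;
    close $fh;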

For one run, I used the program's own source as input. The results were surprising. The islands of high density are where I would expect them (where all the typing is, silly), but their contours really show where I am putting a lot of characters close together, which, I contend, makes the program harder to read there, just like it is harder to read proportional fonts (at least I think so).

Some programs are long (Shocked! Shocked, I say!), so I break the program up into several images. Putting a bunch of small images on a single piece of paper can represent the entire program quickly. Next, I want to make several images of the same script from different versions, then create a movie out of it---let's see how the density changes as we code. It probably varies from coder to coder, but I think for my stuff I would see a lot of random stuff, then points of gravity pulling code towards them, then a big bang where bits of code travel long distances as they get relegated to subroutines at the end of the script---maybe a text version of the Oregon Trail.

For some people, the gravity centers will keep attracting more and more characters, so I am also thinking about adding long-distance effects. A character two positions away counts partially, although I have not decided if it should be a second- or third-power effect. If I really want to waste a lot of time, I can figure out how to calculate a program's Big-G gravitational constant, or shoehorn in special relativity (some piece of code must bend the code around it somehow).
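
A weighted version of the neighbor count is easy enough to bolt onto the earlier sketch. The radius and the distance measure here are guesses, and $power is the undecided 2 or 3:

    # Distance-weighted density for one position in the @grid of the earlier
    # sketch: every non-whitespace character within $radius contributes
    # 1 / distance**$power. Radius and distance measure are assumptions.
    sub weighted_density {
        my( $grid, $row, $col, $power, $radius ) = @_;

        my $sum = 0;
        for my $dr ( -$radius .. $radius ) {
            for my $dc ( -$radius .. $radius ) {
                next if $dr == 0 and $dc == 0;
                my( $r, $c ) = ( $row + $dr, $col + $dc );
                next if $r < 0 or $r > $#$grid;
                next if $c < 0 or $c > $#{ $grid->[$r] };
                next unless $grid->[$r][$c] =~ /\S/;

                my $distance = abs($dr) > abs($dc) ? abs($dr) : abs($dc);
                $sum += 1 / $distance ** $power;
            }
        }

        return $sum;
    }

    # e.g. weighted_density( \@grid, $row, $col, 2, 3 );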

Some people may have density islands that seem to pulse as they add or subtract code. Who knows?

I still have a lot of small technical details to decide. Do I keep the POD in or out? Or do I color it differently?

Oh well. Six minutes left on this computer.


Visualizing code

dws on 2003-11-21T06:59:26

A grayscale "minesweeper" bitmap of code sounds like a nifty visualization, and a lot more straightforward than trying to extract history from CVS to "age" code, painting lines in different colors depending on how recently they've been touched.

Re:Visualizing code

brian_d_foy on 2003-11-21T14:21:29

Oh no, I'm not trying to age code, just show the migration of clumps of characters. I am curious how different the clumps look from start to finish. I might even be able to identify distinct coding styles.

Re:Visualizing code

dws on 2003-11-21T16:33:16

I'm trying to age code, but progress is slow. Extracting the right info from CVS and collating it is messy.

Re:Visualizing code

brian_d_foy on 2003-11-21T17:28:50

Have you looked at the things that viewcvs and cvsweb do? That might help you identify chunks.

Re:Visualizing code

dws on 2003-11-21T23:56:21

Getting coarse-grained chunks is relatively easy. Doing finer grain, say by using Algorithm::Diff within chunks, is messier. I started this thinking I could age each character. That's proven to be very difficult.
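
At the line level, the Algorithm::Diff route might look something like this sketch: lines that survive a revision keep their old tag, everything else gets stamped with the current one. Per-character aging would need another diff pass inside each changed line, which is where it gets messy.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Algorithm::Diff qw(sdiff);

    # Line-level aging between two revisions of a chunk. $old_ages holds one
    # tag per line of @$old_lines; the return value holds one per @$new_lines.
    sub age_lines {
        my( $old_lines, $old_ages, $new_lines, $revision ) = @_;
        my @ages = @$old_ages;                  # work on a copy

        my @new_ages;
        foreach my $hunk ( sdiff( $old_lines, $new_lines ) ) {
            my( $flag, $old, $new ) = @$hunk;

            if( $flag eq 'u' ) {                # unchanged: keep its old tag
                push @new_ages, shift @ages;
            }
            elsif( $flag eq '-' ) {             # removed: throw its tag away
                shift @ages;
            }
            else {                              # '+' or 'c': new at this revision
                shift @ages if $flag eq 'c';
                push @new_ages, $revision;
            }
        }

        return \@new_ages;
    }

    # e.g. $ages = age_lines( \@chunk_at_r1, $ages_at_r1, \@chunk_at_r2, 'r2' );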

Re:Visualizing code

brian_d_foy on 2003-11-22T04:27:21

What we need is radioactive labeling.

I was thinking last night that a journaling editor would make this easier because you could see the file keystroke by keystroke.

Re:Visualizing code

dws on 2003-11-22T17:01:42

Even with radioactive labeling (or the equivalent), there are some interesting edge cases. How do you score (or relabel) XY becoming YX, especially when X and Y are substantial blocks of code?

Re:Visualizing code

rob_au on 2003-11-22T01:41:35

After I read this, I had a thought: whilst it may be interesting to display the age of code visually, it may be more worthwhile to display the variability of the code in visual form, that is, to plot the number of revisions and changes made to the code against its age. My rationale is that while code age alone would provide an incidental overview of code stability, variability assessed by revisions and changes, potentially as a secondary measure to age, could provide a more interesting overview of development trends within larger codebases.

Sounds interesting ...

rob_au on 2003-11-21T10:52:43

The results were surprising. The islands of high density are where I would expect them (where all the typing is, silly), but their contours really show where I am putting a lot of characters close together, which, I contend, makes the program harder to read there, just like it is harder to read proportional fonts (at least I think so).

This sounds very similar to some image analysis which I employed for a research thesis.

The topic of my research was investigating differences in the vasculature of benign tumours (specifically, uterine leiomyoma), surrounding tissue, and control samples through the use of immunohistochemistry. This technique relies upon antibody-specific reactions to generate insoluble immune complexes, which can subsequently be demonstrated by enzymatic staining; the result is the staining and highlighting of specific areas of the fixed tissue sample. In the case of my research, which employed vascular-specific antigens (CD31, CD34, factor VIII related antigen, and Ulex), the vasculature of the tissue was highlighted.

Analysis of these stained sections involved the capture of images using a high-resolution digital camera attached to the viewing microscope. These images, with the tissue vasculature accentuated by the immunohistochemical staining, were then processed for evidence of statistically significant differences between the tissue groups. The criteria used in analysis included vascular density, proportional vascular density, vascular luminal diameter and, most interestingly, vascular distribution variability.

It was the analysis of vascular distribution variability that presented the greatest challenge, because of the clustered nature of the stained image components, in a manner similar to what you describe for the text components of a program source. Types of analyses that may be worth investigating for this project include the average distance between text components, the average proportional size of text components, and/or scatterplot distributions. Unfortunately, the final solution to this statistical problem eluded me, as time constraints required the finalisation of the research work and submission of the thesis.

I'll be interested to see how you get on with this project ...

Re:Sounds interesting ...

brian_d_foy on 2003-11-21T14:24:12

Very interesting---I hadn't thought to look at things like distances and areas.

Now it's looking like a GIS problem. I bet they have all sorts of nifty software to analyse this sort of stuff.

I should be able to post some stuff next week.

Interesting...

Adrian on 2003-11-21T15:04:37

Looking forward to the source :-)

I'm interested in quick ways to identify problem areas of code, often being tasked with ploughing through tons of "legacy" (spelt sh*t) code.

Have you come across Ward Cunningham's Signature Survey method? It's a completely different approach, but useful.
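
For anyone who hasn't come across it, the rough idea is to boil each source file down to a string of its structural punctuation so you can eyeball its shape and size at a glance. A quick sketch, where the choice of which characters to keep is my own guess:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # A rough take on a signature survey: reduce each file to a string of
    # structural punctuation. Keeping braces, semicolons, and quotes is an
    # assumption -- adjust the character class to taste.
    foreach my $file ( @ARGV ) {
        open my $fh, '<', $file or do { warn "Could not open $file: $!"; next };

        my $signature = '';
        while( my $line = <$fh> ) {
            $signature .= join '', $line =~ /([{};"'])/g;
        }

        printf "%-30s %s\n", $file, $signature;
    }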