Justifications for code generation

ziggy on 2002-11-04T18:06:51

Shawn Wildermuth is thinking about code generators again:

So this is my stake in the ground. Using Code Generators to create code that is based on metadata to help make a performant, but flexible system. Anyone else use code generation in this way? If so, please let me know. I want to confirm my suspicion.

I've used code generators before, but I haven't written one or used one in a few years. Why? Because the code generators that (I think) Shawn is talking about are an inelegant hack around a bad programming language. That is, when writing code in a statically typed language, it is often necessary to abstract out the metadata for a system and write an external program to generate code. The benefit is that the system is somewhat easier to maintain. The drawback is that the system is defined in three places -- the handwritten portion, the metadata, and the code generator. Often, these are three different types of systems, requiring three totally different skills and mindsets to understand.

The approach I've been using for the last few years (ever since I became comfortable using Perl) is to use dynamically generated code. There's still a benefit of defining the system in terms of metadata, but the handwritten code, the code generator and the metadata are all maintained in the same environment. That is, the code generator is a bit of Perl code that gets run at compile time (after the program is loaded, but before it starts execution), and the data is a set of Perl data structures (or something else, like YAML or XML if you absolutely prefer).
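To make that concrete, here's a minimal sketch of the style (the class name and fields are invented for illustration): the metadata is a plain Perl hash, and a small loop installs an accessor for each field as the module is loaded -- no external generator, no separate metadata file to keep in sync.

    package Record;
    use strict;
    use warnings;

    # The metadata: a plain Perl data structure, maintained right
    # next to the code it describes.
    my %fields = ( name => '', count => 0 );

    sub new {
        my ($class, %args) = @_;
        return bless { %fields, %args }, $class;
    }

    # The "code generator": install one accessor per field into the
    # symbol table when the module is loaded, before the main
    # program starts running.
    {
        no strict 'refs';
        for my $field (keys %fields) {
            *{ __PACKAGE__ . "::$field" } = sub {
                my $self = shift;
                $self->{$field} = shift if @_;
                return $self->{$field};
            };
        }
    }

    1;

A caller just writes Record->new->count(42); add a field to %fields and its accessor appears with no other change.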

Add it all up, and dynamically generated code is much easier to understand, use, and implement. Based on my limited experience, it's also easier to debug, but I could be wrong...


debugging and maintaining

merlyn on 2002-11-04T18:11:47

When you consider that a "compiler" is just a "code generator" for machine language, all the same rules apply.

When we first got compilers, people complained that they couldn't debug, because the debuggers showed only the translated version. And there were times when the compilers generated sub-optimal code, and there was no way to tune or fix the output without breaking the paradigm.

But now, high-level languages are commonplace. High-level debuggers are commonplace. The resulting machine code is optimized and trusted.

For a good meta-code generator, you'll need to get to the point where you have good debugging hooks, trusted code, and optimal code. Prior to that, it'll be a maintenance nightmare. {grin}

Re:debugging and maintaining

ziggy on 2002-11-04T21:17:48

When you consider that a "compiler" is just a "code generator" for machine language, all the same rules apply.

When we first got compilers, people complained that they couldn't debug, because the debuggers showed only the translated version. And there were times when the compilers generated sub-optimal code, and there was no way to tune or fix the output without breaking the paradigm.

I don't think it's the same thing here. In an abstract sense, it is all about the maturity of the tools. But in a practical sense, the real issue is about over-abstracting a problem to produce a solution. That is, who is to blame when something goes wrong? The immature tool, or the programmer who crafted the inappropriate solution?

I'll buy that there comes a time when the tools are inadequate and immature. But reducing the argument to say that the tools are always inadequate in the end doesn't fly.

For a good meta-code generator, you'll need to get to the point where you have good debugging hooks, trusted code, and optimal code. Prior to that, it'll be a maintenance nightmare.

With or without mature meta-code tools, the simplest meta-code to write is the meta-code that you don't have to write in the first place. Same as other code. ;-)

State Machines, math transforms, etc.

barries on 2002-11-04T19:27:41

Lately, we've been using code generators to build C (for gcc and for quirky C compilers for embedded processors) in two areas:
  1. State machines. Using something I call StateML to define events, actions, states and arcs, along with snippets of documentation and snippets of source code for zero, one, or more languages side by side in the same file, to generate any and often all of:
    • State diagrams (via GraphViz) in .png or .ps format.
    • HTML text + graphics docs, the HTML generated using a TT2 template and the graphics from the previous bullet, and then,
    • via html2ps and ps2pdf and some creative cat(1)ing, .pdf text+graphics docs (this toolchain was built by a coworker).
    • Working code in various languages simultaneously using TT2 templates.
    The idea is that all the things you need are defined in one .stml file -- an XML-formatted, literate, declarative definition that can be "tangled" via TT2 templates into C, C++, Perl, Java, etc. source code, or "woven" via GraphViz plus a TT2 HTML template into copious reference documentation. (A tiny sketch of the tangling step follows this list.) We don't generate user / tutorial docs in true literate programming style, but I think we could if need be (state machines are so branchy that they're not conducive to the kind of narrative description I normally think of when I think of literate programming).
  2. Math transforms. When dealing with a very limited embedded chip, we need to do complex, nonlinear transforms, so I use Perl to
    1. build the tables,
    2. perform analytics to help design appropriate approximation parameters,
    3. emit the C arrays and matched lookup routines via TT2 that implement the transform, and
    4. build test suites using real floating point that are run on the device or a microprocessor simulator to make sure the integer math all comes out within acceptable error limits (we've used a GPLed PIC microprocessor simulator for this, for example).
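Here's a rough sketch of the "tangling" half of #1. The arc table and the template are invented for illustration (this is not StateML's actual schema); in practice the table would be parsed out of the .stml file. The point is the shape of the thing: a declarative list of arcs drives a TT2 template that emits a C transition function.

    #!/usr/bin/perl
    # Hypothetical sketch -- not StateML's real format. A declarative
    # arc table (in practice parsed from the .stml file) drives a TT2
    # template that emits a C transition function.
    use strict;
    use warnings;
    use Template;

    my $machine = {
        name => 'door',
        arcs => [
            { from => 'ST_CLOSED', event => 'EV_OPEN',  to => 'ST_OPEN'   },
            { from => 'ST_OPEN',   event => 'EV_CLOSE', to => 'ST_CLOSED' },
        ],
    };

    my $template = <<~'TT2';
        /* Generated from the [% name %] machine definition -- do not edit. */
        int [% name %]_step(int state, int event) {
        [% FOREACH arc IN arcs -%]
            if (state == [% arc.from %] && event == [% arc.event %])
                return [% arc.to %];
        [% END -%]
            return state;  /* no arc for this event: stay put */
        }
        TT2

    my $tt = Template->new;
    $tt->process( \$template, $machine ) or die $tt->error;

The same data structure can feed a GraphViz template just as easily, which is what keeps the diagrams and the code from drifting apart.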

The advantages of #1 are that we generate docs and prototype gcc code while still in the speccing and analysis stage, so that we can get customer feedback and make sure the state machine works, and then generate the real code. Then, as we go through the implementation and revision / maintenance phases, we update the .stml file and get working code and updated docs all at once. No more stale / misleading docs.

The advantage of #2 is a reduced compile/edit/debug cycle, with easily parameterized transforms and no need to recode the C all the time. We hard-code a few sanity tests at the corners and at one or two typical points on the transform's input range, and let the Perl script verify that all the other data points match Perl's floating point calcs.
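And a condensed sketch of the #2 pipeline (the transform and table size are made up for illustration): Perl computes the table, checks every entry against its own floating point math, and only then emits the C array plus the matched lookup routine.

    #!/usr/bin/perl
    # Hypothetical sketch: a fixed-point lookup table for a nonlinear
    # transform (here, sqrt(x/255) in 8.8 fixed point), verified
    # against floating point before any C is emitted.
    use strict;
    use warnings;

    my @table = map { int( sqrt($_ / 255) * 256 + 0.5 ) } 0 .. 255;

    # Refuse to emit if any entry strays more than 1 LSB from the
    # exact floating point value.
    for my $i (0 .. 255) {
        my $exact = sqrt($i / 255) * 256;
        abs($table[$i] - $exact) <= 1
            or die "entry $i off by more than 1 LSB\n";
    }

    print "/* Generated -- do not edit. Approximates sqrt(x/255) in 8.8 fixed point. */\n";
    print "static const unsigned short sqrt_tab[256] = {\n";
    for (my $i = 0; $i < 256; $i += 8) {
        print "    ", join(", ", @table[ $i .. $i + 7 ]), ",\n";
    }
    print "};\n";
    print "unsigned short fixed_sqrt(unsigned char x) { return sqrt_tab[x]; }\n";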

The overall advantage of this approach is that bugs are usually systematic (instead of lurking only at corner conditions) and show up far sooner and more pervasively (since they're usually bugs in the TT2 templates), and that we can make minor design changes (add a state, move a transform's endpoint, etc.) easily, with less chance of introducing bugs (like forgetting to break out of a switch case), and get new documentation and code that's guaranteed to be in sync.

phew.

The disadvantage is that we spend more time on tools development, and there's a temptation to use a tool like StateML where it doesn't belong.

I've finally understood why both flowcharts and state charts (even the newer incarnations) are so damn frustrating: they're all too abstract and limited. They're good for modelling simple things, or homogeneous problem or solution spaces, but large flowcharts and state charts really inspire MEGO (My Eyes Glaze Over): flowcharts capture state in an awkward fashion, while state diagrams and state machines are too limited to work for many hairy problems.

I've been looking at newer modelling techniques with an eye towards generating code and documentation from a single spec, and I've found that many of them lack the expressive power of a hand-drawn diagram. And tools like dia, which lets you get pretty expressive (assuming you can find the symbols you want, or can take the time to do some C++ hacking to make your own), aren't really flexible enough yet to serve as diagramming front ends with enough rigor / extensibility to allow code generation. dia's getting close, though.

Note that I have little use for state diagrams and flow charts as pure modelling artifacts; that part is trivial. The art in all of this is being able to build a model that's expressive enough to communicate with clients (with a lot of technical details not yet filled in, or suppressed) while being complete and rigorous enough by the time you need to generate code that you don't have to throw the model away.

I've looked at a UML CASE tool or two, and the failing I see there is that they take UML, which is designed to be usable piecewise, and build all-inclusive, prescriptive OO modelling / code-generating frameworks around it. Few real-world apps I've worked on could be cost-effectively modelled that way; often the GUI components are more feasible to build with drag-n-drop IDEs (IOW, proprietary codegens that address the language issues you mention while facilitating graphic layout), while the internals are piecewise and ad hoc: a state machine here, an IDL-based RPC thingy there, an XML parser over yonder.

Ok, ramble done. Suffice it to say that I'm suffused with the itch to build a better diagramming tool that scales between expressivity (so my firm can communicate designs clearly, both internally and externally) and rigorousness / completeness (so we can reap the rewards of literate programming) while not being so exclusive and prescriptive that it's all or nothing. A pipe dream, but a useful one to pursue, I hope. All in my copious spare time.

- Barrie

Re:State Machines, math transforms, etc.

ziggy on 2002-11-04T21:04:32

Wow.

Suffice it to say that I'm suffused with the itch to build a better diagramming tool that scales between expressivity (so my firm can communicate designs clearly, both internally and externally) and rigorousness / completeness (so we can reap the rewards of literate programming) while not being so exclusive and prescriptive that it's all or nothing. A pipe dream, but a useful one to pursue, I hope. All in my copious spare time.

Sounds like you're attacking an interesting, yet different problem from the one I had in mind.

My rant against Shawn's technique is primarily aimed at building up all of the complexity you describe just to build one system, once, in one programming language. When you get into building one abstraction that can generate multiple implementations (including nasty, easy-to-get-wrong versions in C), along with the corresponding docs, it certainly makes sense to abstract the work out into a code generator.

It's the old cost/benefit analysis. If you're willing to put a lot of work into a tool to get a lot in return (and ultimately save work in the process), then a code generator is certainly justified. On the other hand, building up all of those abstractions and tools that just get in the way to build one single application seems like wasted effort to me. Especially if that "code generation" can be replaced by dynamically generated code or, at worst, better use of structured data.

Static Typing Considered Busywork

chromatic on 2002-11-04T22:11:08

Are you suggesting that the popularity of templates in C++ indicates that something's horribly wrong with its typing system? I could be persuaded to agree, especially with the push for genericity in Java. :)

It's interesting to hear some of the more recent XP ideas about code generators. I think it was Alistair Cockburn's Agile Software Development book that suggested that master programmers should write code generators and the rest of us should customize the output as appropriate. (That sounds like a horrible misquote; I can find the correct reference if you'd like.)