Pitfalls in Converting Base Classes to Roles

Ovid on 2009-12-04T13:32:44

I should have responded to this a while ago, but didn't. Basically, nothingmuch has a blog post responding to some of my comments about roles and it's worth reading. However, there's one problematic bit:

If you take a working multiple inheritance based design and change every base class into a role, it will still work. Roles will produce errors for ambiguities, but if the design makes sense there shouldn't be many of those to begin with. The fundamental structure of the code hasn't actually changed with the migration to roles.

For the majority of code bases I see, this is doubtless true. However, I tend to get jobs where I work on large code bases. For example, just looking at our lib/ directory:

$ find lib/ -name '*.pm' | wc -l
652

That's not huge, but it's not particularly small, either. And it doesn't count our various scripts, tests, schemas, etc. So let's say I want to add a boolean attribute to an object and said attribute must be persisted in the database and must be readable/writeable via our API. That's a single attribute. Just one. I need to:

Write the database migration level
Update our RelaxNG schemas
Update our data extractors (extracts data from XML docs)
Add the column to our domain objects
Add the column to our resultsource classes
Possibly add the column to resultset classes
Update our XML builders
Write tests
And possibly more, depending on why we're adding the attribute

That's a lot of work, but adding a single boolean attribute can take me an hour or two if I'm not interrupted. And, believe it or not, this is easier than it used to be. And, believe it or not, much of that isn't the silly grunt work that it appears to be because there's a lot of stuff that is hard to automate away safely (my unreleased Bermuda project was an attempt to do this). Some can and we've managed some of it, but not all.

In short, we have a complicated code base that has decades of programmer hours in it. We used to rely a lot on multiple inheritance, despite the well-known issues with MI. Now let's revisit the key sentence in nothingmuch's post: "If you take a working multiple inheritance based design and change every base class into a role, it will still work."

To nothingmuch's credit, he does point out that ambiguities will cause errors, but there's a lot more to it than that. As projects grow organically, things often get put into incorrect spots. We had all sorts of code shoved into base classes that our derived classes shouldn't inherit from, but did. Since tests tend to be written against expectations, it's easy to not write tests for things you don't expect.

The reason classes sometimes have inappropriate behavior is because abusing inheritance tends to mean that you're having classes manage both responsibilities and cross-cutting concerns. I've found that base classes often had to be broken up into two or more roles (nothingmuch has a great example of Test::Builder doing the same thing and that's a small code base). That can get really annoying when you find that each role requires the same private method and you're trying to figure out where to put it. As you juggle things like this, behaviors will change and you had better have a good test suite to catch them.

There's another, more subtle problem. Imagine that you have the following (it gets much worse when MI is involved):

AbstractParent
     ^
     |
AbstractChild
     ^
     |
ConcreteClass

In ConcreteClass we have this:

sub clean_the_kitchen {
    my $self = shift;
    $self->SUPER::clean_the_kitchen;
    $self->mop_the_floor;
    return $self;
}

What happens with that SUPER:: call? If it's only in AbstractParent, you're fine, but if it's also in AbstractChild, converting AbstractChild into a role will break your code. Further, it might break your code in very subtle ways that are hard to detect. This might seem uncommon, but we actually had this happen quite a bit when we were converting over to roles. This means that our concrete class methods needed to be written as:

after 'clean_the_kitchen' => sub {
    my $self = shift;
    $self->mop_the_floor;
    return $self;
};

That's easy to get wrong and, frankly, method modifiers such as before, after and around generally make me feel uncomfortable. Like role method aliasing and excluding, they're code smells suggesting your design needs work. However, as we were trying for "the simplest thing that could possibly work", we went with this with the intent of revisiting it later. We still haven't revisited it. We have work to do.

In short, converting bases classes to roles and resolving composition errors is often enough for small code bases, but as code bases get large and complex, it's much harder. There are plenty of pitfalls and they're not always obvious. Plus, despite having excellent code coverage for tests, you probably don't have great path coverage (few do). This means that switching from base classes to roles can easily introduce behavioral changes that you haven't been expecting.

See also Tips For Converting Base Classes Into Roles. (gosh, tags would be useful)

Agreed

nothingmuch on 2009-12-04T14:40:37

Though that wasn't the main point of the post, which was that if you just go about converting base classes to roles where you have recurrence of god objects, you will get god objects implemented with roles ;-)

As you showed, in real life lots of stuff that happens to be working probably shouldn't be. Many a time i find myself reading through old code and saying to myself something along the lines of "OH F*ET!$%*Y HOW THE H*U$!(&Y DID THIS PIECE OF &$!(&!Y% EVER WORK TO BEGIN WITH?!!". Something like that would not be easy to transform and my statement doesn't really apply. Unfortunately real code has parts like this and that's just life.

The real thing I was getting at was that roles, though more reusable and generally safer, are still not a fundamental departure from a class like code reuse hierarchy. They provide saner defaults, but you can still make design mistakes that roles can't catch.

Personally I think delegates are a better solution for many of the things people use roles on a single object for. What I like about roles in this context is that they are much easier to move around and consume in different classes than base classes, so refactoring a single object into a delegate hierarchy is much easier and safer when you use them.

Re:Agreed

Ovid on 2009-12-04T16:52:09

I certainly hope, though, that it didn't sound like I was being too critical of your reasoning. You had a great post.
I'm concerned about the God Object comments, though. I think it's easy to create God Objects with roles and to fundamentally misunderstand OO, but God Objects are less of a concern (I think) with roles. For example, with Test::Builder generating TAP and providing the ability to share the TAP generator, you suggest that these are behaviors which can be separated by delegating to a TAP generator. That's really interesting because, from the point of view of the consumer, you present the same interface. So does T::B do too much? For the programmers in the "let's have a class for every possible thing we can ever think of doing" camp, then yes, it does too much. However, it's really easy to use and the convenience of conflating these behaviors seems to outweigh the drawbacks of "doing too much".
A more concrete example would be the ill-fated Pixie project:
use Pixie; use Customer; my $customer = Customer->new({ name => 'Bob', salesman => $some_salesman, }); my $pixie = Pixie->new->connect( 'dbi:mysql:dbname=test', user => $user, pass => $pass ); # Save an object my $cookie = $pixie->insert($some_object);

Why am I creating two objects and writing this boilerplate code just to persist something? Sure, we get separation of concerns, but what I want to do is this:
use Customer; my $customer = Customer->new({ name => 'Bob', salesman => $some_salesman, }); $customer->save;

Sure, I'm putting a cross-cutting concern on the Customer object, but the convenience is worth it. But how many cross-cutting concerns should I bake into an object (via delegation or otherwise?) before the convenience is overwhelmed by the complexity? Roles, at least, can mitigate this, while still making objects easier to use and maintain. When you drive a car, it's not a car object with four wheel objects and an engine object and a steering wheel object. It's just a car.
What I would like to see, and what I don't think anyone has ever done, is define God Objects in anything even remotely close to a quantifiable manner. What does "does too much" mean? That's where much of the "God Object" debate fails for me :)

Re:Agreed

Ovid on 2009-12-04T16:56:34

And to illustrate the point, consider the opposite of the god object: ravioli code. It can be equally unmaintainable.

Re:Agreed

perigrin on 2009-12-04T20:20:26

Attempting a definition of a "God Object"
Any object that implements a model for more than one concept simultaneously.
To take your "car" analogy for a second, you don't drive just a car. The interface presented to you delegates the actions you make to other components you don't have to think about yes, but at no point are the Engine Block, Steering Assembly, Chassis or Wheels wholly consumed by the "car". There are even those of us who prefer cars that make you *more* aware of this (manual transmission for example) rather than less.
Roles aren't really re-usable components, they're more like an alloying material. Adding carbon to iron provides you with a new stronger material steel. Steel is a new basic block to build components out of. Carbon and iron are useful in other alloys (and individually ... the analogy and Perl5/Moose's object model break down here), and can be reused ... but they're not the same as steel.
So yes Roles make building God objects easier. I can't see that they fundamentally affect the problems with God objects. Those problems stem from the fact that your single Class/Object is trying to model more than one concept. I think nothingmuch's point is delegation provides a cleaner way to compose two different concepts than either Roles or Inheritance.
Finally using this definition the problem with Ravioli Code is that the concepts are modeled across more than one class, making it difficult to understand how the model is composed at all.

Re:Agreed

nothingmuch on 2009-12-04T19:24:14

Because you can always implement $customer->save in terms of a well factored interface, but going the other way is much harder.
The interface that T::B would present is just one facet (albeit a very important one). It used to be that the tradeoff was too much java style boilerplate or a god object, but Moose's handles feature lets you have a nice interface on the outside and flexibility without needing combinatorial explosion on the inside.
The point is that for 99% of the usage you can still use T::B because that "too much" is handy, but when you want to output a differfent test format or output TAP without T::B.
But note that I'm also not saying that the remaining 1% of the use cases justify another API. That's just an added bonus. The real benefit is that a clean separation is forced that makes such code generally last much longer, which would mean that developing a better T::B would be a much smaller effort than developing a full stack.
Isn't that pretty much the same reason for TAP::Parser's design?

Re:Agreed

chromatic on 2009-12-04T18:19:55

Personally I think delegates are a better solution for many of the things people use roles on a single object for.

Don't forget that one of the primary purposes of roles is to encourage type-safe delegation.