TorgoX has handed down this little hack for a problem I was having. It's a hash of the HTML tags that implicitly act as whitespace, that is, the tags that break up the text flow even when there is no literal space around them:
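use HTML::Tagset;

# Start with every tag HTML::Tagset knows about...
my %Breaker_elements = map {; $_ => 1 } keys %HTML::Tagset::isKnown;
# ...then drop the phrase-level (inline) ones, which don't break the text.
delete @Breaker_elements{ keys %HTML::Tagset::isPhraseMarkup };
# These must count as breakers regardless of the above.
$Breaker_elements{'br'} = 1;
$Breaker_elements{'hr'} = 1;
$Breaker_elements{'title'} = 1; # a hack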
Now I can parse nicely!
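
For the curious, here is a rough sketch of how a hash like this might be put to use when flattening a parsed document to plain text. This is my own illustration, not TorgoX's code: it assumes HTML::TreeBuilder is installed, and the helper name text_with_breaks is made up for the example. The idea is simply to put a space on either side of every breaker element and nothing around phrase-level ones.

```perl
use strict;
use warnings;
use HTML::Tagset;
use HTML::TreeBuilder;

# The same hash as above: every known tag that is not phrase-level
# markup, plus the exceptions added back by hand.
my %Breaker_elements = map {; $_ => 1 } keys %HTML::Tagset::isKnown;
delete @Breaker_elements{ keys %HTML::Tagset::isPhraseMarkup };
$Breaker_elements{$_} = 1 for qw(br hr title);

# Hypothetical helper (not part of any module): gather the text under
# $node, padding each breaker element with a space on both sides.
sub text_with_breaks {
    my ($node) = @_;
    return $node unless ref $node;    # plain text segment, return as-is
    my $text = join '', map { text_with_breaks($_) } $node->content_list;
    return $Breaker_elements{ $node->tag } ? " $text " : $text;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<html><head><title>Title</title></head>'
  . '<body><p>Un<b>bro</b>ken</p><p>Two<br>Three</p></body></html>'
);
my $text = text_with_breaks($tree);
$tree->delete;

# Collapse the padding into single spaces and trim the ends.
$text =~ s/\s+/ /g;
$text =~ s/^ | $//g;
print "$text\n";
```

The <b> inside "Unbroken" is phrase markup, so the word stays in one piece, while the <br> and the paragraph boundaries keep "Two" and "Three" from running together.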