Oneupmanship

davorg on 2003-09-01T10:27:57

I get a guilty pleasure out of conversations that go like this:

Me: Don't use regexes to parse HTML, use HTML::Parser instead.
Him: But how do you think HTML::Parser parses HTML then?
Me: Actually it uses this C code here (pointing) and that looks to me like a state machine that implements an LR parser. Not regular expressions.
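
For anyone who wants to see the first line of that exchange in action, here is a minimal sketch. It uses Python's standard-library `html.parser` as a stand-in for HTML::Parser (it is not the Perl module, and says nothing about how that module is implemented). The sample markup, the regex, and the names `links_by_regex`, `LinkCollector` and `links_by_parser` are all made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample: one real link, one link inside a comment,
# and one link written with upper-case names and single quotes.
SAMPLE = (
    '<p><a href="/one">first</a> '
    '<!-- <a href="/fake">old</a> --> '
    "<A HREF='/two'>second</A></p>"
)

def links_by_regex(html):
    # A typical quick-and-dirty pattern: lower-case tag, double quotes only.
    return re.findall(r'<a[^>]*href="([^"]*)"', html)

class LinkCollector(HTMLParser):
    """Collects href values from start tags the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; comments never get here.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

def links_by_parser(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links

print(links_by_regex(SAMPLE))   # picks up the commented-out link, misses the last one
print(links_by_parser(SAMPLE))  # skips the comment, finds both real links
```

The regex could of course be patched for each of these cases, but every patch is one more special case that a real tokenizer already handles.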


Oneupmanship

2shortplanks on 2003-09-01T11:32:38

Either HTML::Parser solved the halting problem, or that wasn't a state machine implementing an LR parser.

A finite state machine can only recognise what a regular expression can (and vice versa); an LR parser needs a stack as well. But you knew that, right?
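
A rough illustration of where that limit bites, assuming "state machine" means a finite one and "regular expression" means the textbook kind with no recursion. The pattern `TWO_LEVELS` and both function names are hypothetical. The regex has its nesting depth baked in, while a few lines of code with a counter (a degenerate stack) handle any depth.

```python
import re

# Hypothetical pattern: accepts <b> elements nested at most two deep.
# Supporting a third level means writing a longer pattern, and so on.
TWO_LEVELS = re.compile(r'(?:<b>(?:<b></b>)*</b>)*')

def regex_balanced(s):
    return TWO_LEVELS.fullmatch(s) is not None

def counter_balanced(s):
    # Unbounded memory (the depth counter) is what a finite machine lacks.
    depth = 0
    i = 0
    while i < len(s):
        if s.startswith("<b>", i):
            depth += 1
            i += 3
        elif s.startswith("</b>", i):
            depth -= 1
            if depth < 0:
                return False  # close tag with nothing open
            i += 4
        else:
            return False  # anything other than <b> or </b>
    return depth == 0

two = "<b><b></b></b>"
three = "<b><b><b></b></b></b>"
print(regex_balanced(two), counter_balanced(two))
print(regex_balanced(three), counter_balanced(three))
```

Both agree on the two-level input; only the counter version accepts the three-level one.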

Re:Oneupmanship

davorg on 2003-09-01T11:46:50

I only said that's what it looked like to me. And I never claimed to be any kind of parsing expert.

But I know that HTML::Parser doesn't use regular expressions :)

Re:Oneupmanship

bart on 2003-09-01T13:41:38

"But I know that HTML::Parser doesn't use regular expressions :)"
You really should take a look at HTML::Parser version 2.25, then.

Re:Oneupmanship

davorg on 2003-09-01T13:54:51

OK, make that "any reasonably modern version of HTML::Parser" :)

broken html parsing

gav on 2003-09-01T15:20:06

I've made a pretty successful effort to stop people from hacking at HTML with regexps here. Unfortunately we then hired a bunch of C# programmers who like to hack at HTML with regexps.

Arrgh.

Re:broken html parsing

babbage on 2003-09-02T14:45:48

To misuse the JWZ quote, "now you have two problems."

Three, if you count C# :-)

Re:broken html parsing

jjohn on 2003-09-05T14:23:49

JWZ is right, except when he isn't. I hope being a club owner is everything he dreamed of.