Line Noise

Robrt on 2006-05-03T04:37:13

I wrote this regular expression yesterday: /\._.*/

My thought at that moment? Yes, perl is line noise.

Anyone want to guess what it's for?


Death to .*

Enoch on 2006-05-03T05:05:19

Death to Dot Star!

Re:Death to .*

Aristotle on 2006-05-03T05:30:44

How, pray tell, is this particular regex going to backtrack needlessly? Or are you just cargo culting the “death to dot star” line?

(That said, the .* in Robrt’s pattern is superfluous, as the pattern will match the exact same things with or without it.)
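A quick way to check that claim is to run both forms over a few strings. This is only a sketch: the sample names and the helper name same_verdict are made up for illustration, not taken from Robrt's code.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the pattern with the trailing .* and the
# pattern without it agree on whether the string matches.
sub same_verdict {
    my ($string) = @_;
    my $with    = $string =~ /\._.*/ ? 1 : 0;
    my $without = $string =~ /\._/   ? 1 : 0;
    return $with == $without;
}

# Made-up sample names, chosen only to cover matching and non-matching cases.
for my $name ('._foo.txt', 'dir/._bar', 'plain.txt', 'a._', '_.x') {
    printf "%-10s %s\n", $name, same_verdict($name) ? 'same' : 'DIFFERENT';
}
```

Since .* is happy to match nothing at all, the two patterns can only differ in how much text the match covers, never in whether it succeeds.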

Re:Death to .*

Enoch on 2006-05-03T17:31:18

Yes, I was invoking it in the "Cargo Cult" context. The .* at the end was superfluous. He could have put them at both ends: /.*\._.*/.

So, yes, I was cargo culting it. I was trying to draw attention to the fact that .* is almost never what you want to use.

Re:Death to .*

Aristotle on 2006-05-04T00:18:07

“Death to dot star” is about backtracking. At the end of the pattern, the .* won’t backtrack. If you put another one at the front, though, it will. What is your point?
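To make the difference visible, here is a small sketch that prints the text each form ends up matching. The sample string and the helper name matched_text are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: return the substring the regex matched, or undef.
# Uses the @- and @+ offset arrays instead of $& to locate the match.
sub matched_text {
    my ($string, $regex) = @_;
    return $string =~ $regex
        ? substr($string, $-[0], $+[0] - $-[0])
        : undef;
}

my $s = 'a._b._c';

# Trailing .*: starts at the first "._" and simply runs to the end.
print matched_text($s, qr/\._.*/), "\n";   # ._b._c

# Leading .*: grabs the whole string first, then gives characters back
# until "._" can match, so it ends at the LAST "._".
print matched_text($s, qr/.*\._/), "\n";   # a._b._
```

The second result is where the backtracking shows: the leading .* has to retreat from the end of the string before the rest of the pattern can match.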

You would have made your case much better if you just said “the .* there is a noop” instead of throwing in something entirely unrelated that happens to be about dot star.

“I was trying to draw attention to the fact that .* is almost never what you want to use.”

But you’re wrong. It is exactly what you want to use in many cases. You have to understand it, rather than using it blindly, sure, but that’s different from “it’s almost never what you want to use”. How would you write this?

$filename ~= m/ .+ \. (.*) /x; # get extension

Re:Death to .*

Aristotle on 2006-05-04T00:18:58

Err, that is supposed to be a =~ there of course.
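With the operator corrected, the extension pattern can be tried out as below. This is a sketch under assumptions: the helper name extension and the sample file names are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return whatever follows the last dot, or undef when
# there is no dot with at least one character in front of it.
sub extension {
    my ($filename) = @_;
    return $filename =~ m/ .+ \. (.*) /x ? $1 : undef;
}

print extension('archive.tar.gz'), "\n";   # gz
print extension('photo.jpeg'), "\n";       # jpeg
print defined extension('.bashrc') ? "has ext\n" : "no ext\n";   # no ext
```

The greedy .+ is what makes this pick the last dot rather than the first: it takes as much as it can and only backs off far enough for \. to match.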

I think I know

Aristotle on 2006-05-03T05:26:07

Let me guess: it has to do with removing those resource fork files that you get on OS X in some circumstances?

Re:I think I know

Robrt on 2006-05-03T06:20:12

Very close.

Re:I think I know

pudge on 2006-05-18T05:46:51

So, what is it?

Re:I think I know

Robrt on 2006-05-22T03:36:36

It had something to do with allowing Mac OS X clients to use a particular WebDAV service. (OS X insists on transferring all the dot-underscore files even to non-local filesystems.)

Re: Line Noise

Aristotle on 2006-05-03T05:34:25

It’s not Perl that’s line noise, it’s the regex syntax. Last night I wrote this: s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g

Re: Line Noise

Dom2 on 2006-05-03T07:24:40

I love the x modifier. :-)
s{
    /           # Leading slash.
    (?!\.\.)    # No parent directories of root.
    [^/]+       # Pick out the directory name in root.
    /+\.\.      # Match any number of slashes, then parent directory.
    (?=/|\z)    # Assert that there is a following slash (or end of string).
}{}gx
So it cleans up pathnames, reducing /foo/../bar/../baz to just /baz, if I understand that right.

Re: Line Noise

Aristotle on 2006-05-03T07:48:43

Even /x only really helps because of your profuse comments, though.

You guessed correctly: I needed to normalize HTTP URIs to compare them, and URI’s canonical method doesn’t finish the job. To be precise, this runs in a 1 while s/// loop, which is necessary to handle paths like foo/bar/baz/quux/../../../bar.
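For anyone who wants to see why the loop is needed, here is a minimal sketch using the pattern as first posted. The wrapper name normalize_path is an assumption for illustration; the real code operates on URIs, which is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the thread.
# s///g returns the number of replacements made, so "1 while" keeps
# re-running it until a pass changes nothing.
sub normalize_path {
    my ($path) = @_;
    1 while $path =~ s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

print normalize_path('/foo/../bar/../baz'), "\n";             # /baz
print normalize_path('foo/bar/baz/quux/../../../bar'), "\n";  # foo/bar
```

A single pass over the second path removes only /quux/.., because the remaining /.. segments are not preceded by a directory name until the string has been rewritten. Each later pass peels off one more pair.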

Re: Line Noise

Dom2 on 2006-05-03T08:22:31

It's not just the comments. I also find that breaking the regex into its component groups makes it easier to understand. It's what I do mentally anyway, so it just means that it's done already for me.

-Dom

Re: Line Noise

Aristotle on 2006-05-03T08:09:36

Hmm, actually, that has a bug. The negative look-ahead must contain a trailing slash, otherwise the pattern will erroneously fail to match something like foo/..fooledya/../bar.

At first I thought I needed a more complex assertion than just including a trailing slash in there, so I started rewriting the regex extensively, and after I realised that it’s not that complex, I noticed that my comments actually have a noticeably different focus from yours, so I decided to keep the result for comparison:

s{
    /             # Leading dir separator
    (?! \.\./ )   # Make sure this is not an updir
    [^/]+ /+      # Dir name, then any number of separators
    \.\. (?=/|\z) # Updir, and make sure it is one
}{}gx
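The difference between the two versions can be checked on the problem path mentioned above. This is a sketch only; the helper names collapse_old and collapse_new are invented for the comparison.

```perl
use strict;
use warnings;

# Hypothetical helper: the pattern as first posted. The look-ahead rejects
# ANY directory name that starts with two dots.
sub collapse_old {
    my ($path) = @_;
    1 while $path =~ s{/(?!\.\.)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

# Hypothetical helper: the corrected pattern. The look-ahead rejects only
# a real updir, that is, two dots followed by a slash.
sub collapse_new {
    my ($path) = @_;
    1 while $path =~ s{/(?!\.\./)[^/]+/+\.\.(?=/|\z)}{}g;
    return $path;
}

my $path = 'foo/..fooledya/../bar';
print 'old: ', collapse_old($path), "\n";   # old: foo/..fooledya/../bar
print 'new: ', collapse_new($path), "\n";   # new: foo/bar
```

Both versions still agree on ordinary paths such as /foo/../bar/../baz; they part ways only when a directory name begins with two dots.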