Perl catches a pedophile on Myspace.

pemungkah on 2006-10-17T00:10:55

Wired Magazine's Kevin Poulsen has shown that at least part of the problem of policing MySpace is a SMOP.

My road to this New York police unit began in Perl.

In May, I began an automated search of MySpace's membership rolls for 385,932 registered sex offenders in 46 states, mined from the Department of Justice's National Sex Offender Registry website -- a gateway to the state-run Megan's Law websites around the country. I searched on first and last names, limiting results to a five mile radius of the offender's registered ZIP code.

Wired News will publish the code under an open-source license later this week.

The code swept in a vast number of false or unverifiable matches. Working part time for several months, I sifted the data and manually compared photographs, ages and other data, until enhanced privacy features MySpace launched in June began frustrating the analysis.

Excluding a handful of obvious fakes, I confirmed 744 sex offenders with MySpace profiles, after an examination of about a third of the data. Of those, 497 are registered for sex crimes against children. In this group, six of them are listed as repeat offenders, though Lubrano's previous convictions were not in the registry, so this number may be low. At least 243 of the 497 have convictions in 2000 or later.
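
The search Poulsen describes (match on first and last name, keep only hits within five miles of the offender's registered ZIP code) can be sketched roughly as below. This is not the Wired News code, which is not shown here, and it is written in Python rather than Perl. The record layout, the `zip_centroids` lookup table, the function names and all of the sample records are invented for illustration; a real run would need actual ZIP code coordinates and, as the article notes, a lot of manual checking afterwards.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def candidate_matches(registry, profiles, zip_centroids, radius_miles=5.0):
    """Pair registry entries with same-named profiles inside the radius.

    Every result is only a candidate: a name-plus-distance match says
    nothing certain about identity.
    """
    # Index profiles by lowercased (first, last) name for quick lookup.
    by_name = {}
    for profile in profiles:
        key = (profile["first"].lower(), profile["last"].lower())
        by_name.setdefault(key, []).append(profile)

    matches = []
    for entry in registry:
        origin = zip_centroids.get(entry["zip"])
        if origin is None:
            continue  # no coordinates for this ZIP code, cannot filter
        key = (entry["first"].lower(), entry["last"].lower())
        for profile in by_name.get(key, []):
            where = zip_centroids.get(profile["zip"])
            if where is not None and haversine_miles(origin, where) <= radius_miles:
                matches.append((entry, profile))
    return matches


# Invented placeholder data: the ZIP codes and coordinates are not real.
zip_centroids = {
    "00001": (40.00, -75.00),
    "00002": (40.03, -75.00),  # roughly two miles north of 00001
    "00003": (41.00, -75.00),  # roughly 69 miles north of 00001
}
registry = [{"first": "Pat", "last": "Example", "zip": "00001"}]
profiles = [
    {"first": "Pat", "last": "Example", "zip": "00002"},  # same name, nearby
    {"first": "pat", "last": "example", "zip": "00003"},  # same name, too far
    {"first": "Sam", "last": "Sample", "zip": "00001"},   # different name
]

matches = candidate_matches(registry, profiles, zip_centroids)
for entry, profile in matches:
    print(entry["last"], entry["zip"], "->", profile["zip"])
```

A name-and-radius filter like this is cheap, which is presumably why it sweeps in so many false or unverifiable matches: common names collide, and nothing in it looks at photographs or ages.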


Excellent; now what about the flipside?

Aristotle on 2006-10-17T11:11:20

That’s good; but think about the other side of the coin too. The Semantic Web and The Quilombo makes this point…

You have got to be kidding!?!?!

Stevan on 2006-10-17T12:27:11

To start with, the person doing this was using a public list of known criminals, and searching and cross-referencing that list through the membership of a site which is a known hot spot for sexual predators. You only need to do a quick search of the news from the past few months to see that this is a growing problem, and something needs to be done about it. And while you could say it is the same thing as IBM collating the German census to identify the Jewish populations (search population $x for members which meet criteria $y), I think maybe you are reaching on that one a little. He is trying to protect children; this should be commended. As for the rights of the sexual predators, they lost those already when they committed their crime. Really, think a little before you make such associations.

- Stevan

Re:You have got to be kidding!?!?!

Aristotle on 2006-10-17T12:57:24

I didn’t criticise him for identifying sex offenders on Myspace, did I?

I am worried about the trend of putting more and more ever more easily minable personal records online. In this case it was used for good, but what might happen when someone else with less noble intentions tries the same trick?

Re:You have got to be kidding!?!?!

Stevan on 2006-10-17T13:41:33

I didn’t criticise him for identifying sex offenders on Myspace, did I?

No, you equated it with IBM sorting out Jews in the German census data. Of course you did it indirectly, but you still did it. This was my issue.

I am worried about the trend of putting more and more ever more easily minable personal records online. In this case it was used for good, but what might happen when someone else with less noble intentions tries the same trick?

I think it is safe to say that everything has the potential for being used for good or for evil. Personally I am less concerned about easily minable public records than I am about easily minable records which are privately held by various corporate interests (credit card info, airline reservations, phone records, etc.). At least publicly available information has the benefit of being public; over the privately held data about me I have no control at all.

- Stevan

Re:You have got to be kidding!?!?!

Aristotle on 2006-10-17T14:07:33

You don’t have much control over your public data either. Just try to cancel your account and remove all of its traces with most online services. Or try to get everything you have put online out of Google.

The only difference is that with public data you know what’s on your record. Granted, that’s of major importance, and in Germany it is even protected by law (which is, of course, under siege in the wake of the worldwide terror craze).

Clifford Stoll mentions in The Cuckoo’s Egg that there should really be finer distinctions than just public and confidential. A lot of public data is itself innocuous, but by correlating enough of it you may be able to infer facts that were not intended for public consumption.

Zed Shaw’s article is stretching it in my opinion also, but I think on the whole, we suffer from too little concern about these issues, not too much of it.

Re:You have got to be kidding!?!?!

Aristotle on 2006-10-17T14:27:08

Some more points (sorry):

you equated it with IBM sorting out Jews in the German census data

Well, of course. Both cases are, in fact, exactly the same act. But you failed to notice that I made a distinction about the intent. I did not say it was a bad thing to do here and did not object to it being done.

It is the tool I am worried about. Tools can be morally neutral; they do not necessarily carry an inherent value. It is their use that has moral value. Some, such as the Bomb, do have inherent value because the act they are used for dictates their very existence and design. But data mining is no such loaded tool; it is neutral. It’s the basis of science, for one.

Even if a tool of unprecedented might is neutral, though, should someone doing great good with it preclude us from worrying about what unprecedented evil someone else might wield the very same tool for?

Pandora’s box is a one-way street.