Funny Little Images

pudge on 2002-04-17T18:35:03

We have funny little images on use.perl.org for posting comments, making new users, and requesting passwords. They are intended to make sure you are a human and not an automated bot. They are here for testing, so if you hate them, don't freak out; we will only use them on a permanent basis if we have a problem with bots, which I don't anticipate. We are just testing some Slash code, so feel free to let me know here about your experiences with it.

Update: 04/19 12:41 GMT by P: The test is over, and the images are gone now. Thanks for the feedback.

Funny Little Images - trivial to work around

jcwren on 2002-04-17T20:10:45

Personally, I think the images are a waste of time. merlyn (Randal Schwartz) did a column in Web Techniques for the same basic thing.

Never one to resist a pointless challenge, before the article hit print, I wrote a "cracker" for it. The write-up is here, for those that may be interested.

You're going to have to get a lot more tricky than 3 letters with a consistent font to stop a 'bot. Most of the time is invested in creating the font table, but once you've got that, the pattern matching is trivial.
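
A minimal Python sketch of the font-table matching idea, not the actual cracker: it assumes the image has already been reduced to rows of '#' and '.', and that every glyph has the same width and position. The names decode and font_table, and the tiny 3x3 glyphs, are made up for illustration.

```python
def decode(ascii_rows, glyph_width, font_table):
    """Cut the bitmap into fixed-width cells and look each one up.

    ascii_rows  -- list of equal-length strings of '#' and '.'
    glyph_width -- assumed constant width of one character cell
    font_table  -- dict mapping a tuple of cell rows to a character
    """
    chars = []
    for x in range(0, len(ascii_rows[0]), glyph_width):
        # One glyph is the same column slice taken from every row.
        glyph = tuple(row[x:x + glyph_width] for row in ascii_rows)
        chars.append(font_table.get(glyph, "?"))  # '?' marks an unknown glyph
    return "".join(chars)


# Hypothetical two-entry font table built by hand from sample images.
font_table = {
    ("#.#", ".#.", "#.#"): "X",
    (".#.", "###", ".#."): "+",
}
rows = ["#.#.#.", ".#.###", "#.#.#."]
print(decode(rows, 3, font_table))
```

Exact lookup like this only works while the font and positioning stay consistent; any rotation or jitter would need fuzzier matching.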

--jcwren

Re:Funny Little Images - trivial to work around

pudge on 2002-04-17T22:05:49
Did you try it on the images here?

Re:Funny Little Images - trivial to work around

jcwren on 2002-04-17T22:11:51

No, mostly because I'd have to build the font maps. But in loading the images several times, the fonts all appear consistent, along with their positioning. The slight color in the background is easily worked around.

The downside to the images is that it makes posting with lynx pretty darn impossible. And considering that a great many Perl users are *nix users, that doesn't seem like a nice thing to do. Even if lynx *does* represent a small share of viewers.

--jcwren

Re:Funny Little Images - trivial to work around

pudge on 2002-04-18T01:07:36
As to whether it is not nice for the users, that's not relevant to anything in particular that we're doing right now. Sites don't have to use this. As I said, we are testing it. I don't know of any site that we are working on that will turn it on for posting comments on a regular basis.

And I do doubt how "easily" you could work around things. What if every letter were a different color with a different background, with dithering all throughout? As Jamie notes, it's trivial to add things like that, and it could be different for each image, so you never know what you're going to get. Figuring it all out to break it is decidedly difficult. I know that my professional OCR software couldn't pick out the letters in even this simple graphic (though that might mean the software sucks, I dunno, I don't use it that much :-).

Re:Funny Little Images - trivial to work around

dito on 2002-04-18T03:45:38
Could somebody shoot Tim Berners-Lee so he can turn in his grave!

Am I missing something or is this a big two fingers to blind users? Maybe you could put the letters in the ALT tag ;-)

Helping put this in slashcode is just as bad as Adobe allowing publishers to disable "Read Aloud" on their e-books. The argument that sites/publishers don't actually have to use it is no more a defense for slashcode than it is for Adobe.

Re:Funny Little Images - trivial to work around

jamiemccarthy on 2002-04-18T04:10:24
We're not happy about its effect on blind users, but images are not the only possibility for this kind of verification. We hope in future to offer alternate methods -- I'm thinking audio snippets, in particular. We can also set up a method for admins to exempt particular accounts. Etc.
The nature of the internet is that it's trivial to DDoS any site that allows anonymous or semi-anonymous postings. Some Slash sites are actively targeted by hostile users for scripted attacks, and those sites need defenses.

Re:Funny Little Images - trivial to work around

dito on 2002-04-18T06:34:16
I wrote a long (constructive) response to this but the combination of IE and the absolutely shit bag of a Chinese internet cafe I'm in just ate it on me!

Basically it amounted to doing a few checks on the recent history of the IP address an account is being registered from or a check on the history of a doubtful account (all new accounts being doubtful until they prove themselves). If the checks fail, then get them to pass a humanity test.

This should mean that a blind person would have to be unlucky when joining up (IP-wise) or doing script-like things when posting to actually get hit with an image test.
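
A rough Python sketch of that kind of check, under assumptions of my own: the class name RegistrationMonitor, the limit of 3 signups, and the one-hour window are all invented, and a real site would keep this history in its database rather than in memory.

```python
import time
from collections import defaultdict, deque


class RegistrationMonitor:
    """Track recent signups per IP and flag the doubtful ones."""

    def __init__(self, max_signups=3, window_seconds=3600):
        self.max_signups = max_signups
        self.window_seconds = window_seconds
        self._signups = defaultdict(deque)  # ip -> timestamps of recent signups

    def needs_humanity_test(self, ip, now=None):
        """Record a signup attempt; return True if this IP looks script-like."""
        now = time.time() if now is None else now
        recent = self._signups[ip]
        # Forget signups that have fallen out of the window.
        while recent and now - recent[0] > self.window_seconds:
            recent.popleft()
        suspicious = len(recent) >= self.max_signups
        recent.append(now)
        return suspicious
```

Only signups that trip the check would be sent on to the image test; everyone else registers as usual.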

Also, the images as they stand look fairly easy to clean up and then OCR. The more difficult you make them to clean, the closer you might come to knocking out the colour-blind as well, a much larger section of the population.

As for audio being an alternative: even short WAVs get pretty big. Plus you're going to have to add interference to that too, and not all blind people have good hearing.

I think humanity tests are not your best front line defence.

When I get myself a PC over here I may have a pop at implementing something else.

Re:Funny Little Images - trivial to work around

2shortplanks on 2002-04-18T09:06:23
It's like the ability for an administrator to remove comments. It's probably a bad idea to use in most implementations, but some possible uses of Slash might need it.
From what I understand about Slash (having no more experience than reading the book) the code base isn't intended to enforce any policy on the admins of Slash systems. That policy is up to the admins - the code gives them the freedom to make their own decisions.

Re:Funny Little Images - trivial to work around

dito on 2002-04-18T10:20:49
I just think it's a bad solution to the problem. Jcwren has already very quickly cracked it and making it harder to crack just excludes an even larger section of sight-impaired users.

And when a DOSer does finally crack the latest version, you're goosed again until you can find some other way of obscuring the letters and thus exclude even more people!

Monitoring account creation activity and the posting activity of accounts that have yet to prove themselves would be a much sturdier way of doing things and doesn't alienate anyone. It's just not much fun to implement. I'll give it a crack if I get a PC in the next while.

Re:Funny Little Images - trivial to work around

Fletch on 2002-04-18T02:42:33

Don't forget links (which I've come to like better than lynx; handles tables better) and w3m.

As a suggestion, maybe have an option / configuration value / something that it gets turned off after you get x karma. That way you get the benefit of suppressing automation from new accounts, but long time users aren't inconvenienced.
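
Sketched in Python, and only as an illustration: the function name, the karma_cutoff default of 10, and the exempt_accounts set (the admin exemption mentioned elsewhere in the thread) are all assumptions, not anything Slash provides.

```python
def needs_image_test(uid, karma, exempt_accounts, karma_cutoff=10):
    """Return True if this user should still be shown the image test."""
    if uid in exempt_accounts:
        # An admin has switched the test off for this account.
        return False
    # Established users (karma at or above the cutoff) skip the test.
    return karma < karma_cutoff
```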

Re:Funny Little Images - trivial to work around

2shortplanks on 2002-04-18T08:52:27
Maybe Slash could have an option to display the image as a text based image so that lynx can render it - run it through aalib or something.
Doesn't really help blind users though.

Re:Funny Little Images - trivial to work around

jamiemccarthy on 2002-04-17T22:14:09
You've probably noticed there is noise in the background of the Slash images. Any thoughts on how difficult that makes the problem?
If misregistration, dithering, etc. would make things harder to crack, the Slash team can do those things too. In this case, the arms race advantage goes to the server side. Tweaking text to make it less computer-readable is easy; recoding OCR algorithms is comparatively extremely difficult. The Slash code doesn't have such things yet but it would be a matter of minutes to add them.

Re:Funny Little Images - trivial to work around

jcwren on 2002-04-18T06:24:38

Perhaps it's time, then. I wrote a small utility to take the images and extract the characters. Out of the 24 images or so I pulled, I was able to decode them 100%.

Mind you, all this program does is take the image, convert it to a bitmap, run a simple threshold comparison, and if the RGB value is less than a certain value, it's black, otherwise it's white. I output this as an ASCII image comprised of '#' and '.' in the 24 x 19 array.
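
The thresholding step is simple enough to sketch in a few lines of Python. This is not the utility itself: it assumes the image has already been loaded into rows of (r, g, b) tuples, and the cutoff of 128 and the use of the channel average are guesses.

```python
def to_ascii(pixels, threshold=128):
    """Render rows of (r, g, b) tuples as '#' (dark) and '.' (light)."""
    lines = []
    for row in pixels:
        line = "".join(
            # Average the three channels; anything below the cutoff is "black".
            "#" if (r + g + b) / 3 < threshold else "."
            for (r, g, b) in row
        )
        lines.append(line)
    return "\n".join(lines)


# Tiny 2 x 2 example: dark pixels on the diagonal, light ones elsewhere.
sample = [
    [(0, 0, 0), (255, 255, 255)],
    [(200, 200, 200), (10, 20, 30)],
]
print(to_ascii(sample))
```

A faint background tint averages out well above the cutoff, so it simply disappears into the '.' cells.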

All the images I tested were perfectly legible, which means they can be OCR'ed at this point. I tried to add a sample of the output, but the SlashCode decided it was smarter than me, and I didn't really need that extra white space. In spite of enclosing it in <code> tags, no less.

If you're really going to pursue this idea, I would recommend looking at AltaVista's method. They use different fonts, rotate the images, etc. Harder to match, but I'm thinking some neural network software might be smart enough to extract the images.

Oh, and in addition to precluding the lynx/links users, you're also going to be cutting out the people who use PDAs with less than VGA resolution. Maybe not many now, but it's a growing userbase.

--jcwren

Re:Funny Little Images - trivial to work around

jamiemccarthy on 2002-04-19T15:34:28

"All the images I tested were perfectly legible, which means they can be OCR'ed at this point."

I agree with part 1 and part 2 of the above statement, but not the "which means" part that bridges them -- there exist some legible images which can't be OCR'd.
Example: http://www.captcha.net/cgi-bin/ez-gimpy
The Slash plugin could move to using something like this, if there's a need for it, without too much trouble. The current model is really just infrastructure to allow things like that to happen, plus one sample chunk of code. The guts of making image/text pairs is pretty easy to change, it's all perl and there are comments in the source.
If a site decides that they want the hassle of installing Gtk, storing thousands of 25K images instead of 1K images, and taking up 30x as much screen space for images which are barely more legible than the default Slash examples, they are welcome to use the ez-gimpy example above. Those are tradeoffs a site can make; the advantage is that the result is not OCR'able at all. Apparently the source code for ez-gimpy can be downloaded. If the license is compatible with Slash's, I'd love to distribute it as an alternative method :)

choke?

TeeJay on 2002-04-18T08:34:10

why not just choke ?

If an address or user seems to be abusive then drop the connection on the floor.

the image thing is more annoying than the noise that would accumulate without it.

A.

Use Perl vs. The Blind

sdague on 2002-04-18T12:08:08

I know that the subject is very inflammatory, but I have a good reason for it. In my last job I worked on some fairly large web projects, and one of the very important things to do (from both an "it's the right thing" and a "don't want to get sued" perspective) was to make those websites accessible.

This means that the website should be readable in a screen reader, that all images have an appropriate ALT tag, and that the colors used on the pages contrast properly for those that are color blind (a full 10% of the male population). I spent a number of days browsing the internet with a screen reader and my monitor turned off to get a feel for what it is like. Honestly, for the most part, the internet sounds better than it looks.

It just annoys me any time I see anyone deliberately go out of their way to make sites not work in text-only environments because they are being "clever about bots". Any workaround which is programmatically generated, and still has to be used by a user, can be programmatically cracked.

I am hoping that this change was out of ignorance of the fact there are people without 20 / 20 vision using the web, and not that it was taken into account and ignored. Otherwise you should change the text at the bottom of the page to:

To confirm you're not a script, or using a standards based text or voice browser (in which case shuffle off, because we don't like your kind around here) please type the text shown in this image.

Re:Use Perl vs. The Blind

pudge on 2002-04-18T15:07:33
I know that the subject is very inflammatory, but I have a good reason for it.

No, you don't, because I have already said that this feature will not remain active on use Perl. It is only for testing the code, as this site is a test site for the Slash development work that I do for my day job. As such, this feature has nothing to do with use Perl itself, and your subject is unreasonable.

Re:Use Perl vs. The Blind

jaffray on 2002-04-18T17:24:49
"Slashcode vs. The Blind", however, would be quite reasonable.

If you have no choice but to implement this feature, I do hope that you include a BIG LOUD WARNING in the documentation that the feature will make the site unusable for people who are blind or using a text browser, should be disabled in virtually all cases, and cannot be legally enabled on any site maintained by a US government agency or a company whose Slash site is in any way related to a government contract or grant.

Thanks.

Re:Use Perl vs. The Blind

pudge on 2002-04-18T20:15:04
No warning is necessary. Anyone setting up a US government site, or contracted site, should know these things. If they don't, they are unqualified to be using my tax dollars and should mail them back to me immediately.

Re:Use Perl vs. The Blind

jaffray on 2002-04-19T15:47:53
1) A warning would still be appropriate for Slash admins who might enable the feature without realizing that they are hosing disabled users or users of text-based browsers, whether or not they are in any way connected to the government. Does it really hurt to add a two-line comment saying "WARNING: Enabling this will hose blind people and users of text browsers, and should be considered a last resort"?

2) As for the legal aspects, no, I don't think it's reasonable to expect some random grad student setting up a Slash site for his/her research lab that receives government grant money to both know about Section 508 and realize that enabling this option would violate it. Nor do I believe that even experienced professionals who are being paid primarily for web work generally have a clue about accessibility issues.

Re:Use Perl vs. The Blind

TeeJay on 2002-04-18T16:39:26

Regardless of where you are going to use it - it's still the wrong approach - wouldn't it be far better for both the Slash engine and your employer to use a choker that locks out users based on overly-heavy use (a sign of bots and abuse or addiction) or mis-use or karma or a combination of both?

I can't think of any site or situation where it would be more help than hindrance.
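
One common way to build such a choker is a token bucket, sketched here in Python. This is a generic technique, not what Slash does: the class name, the capacity of 5 posts, and the refill rate are invented for illustration, and a real site would keep one bucket per user or per address.

```python
class TokenBucket:
    """Allow short bursts of posts, then throttle to a steady rate."""

    def __init__(self, capacity=5, refill_per_second=0.1):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = None               # time of the previous request

    def allow(self, now):
        """Return True if a request at time `now` (in seconds) may proceed."""
        if self.last is not None:
            elapsed = now - self.last
            # Top the bucket up for the time that has passed, up to capacity.
            self.tokens = min(self.capacity,
                              self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: drop or delay the request
```

A human posting a few comments never notices it; a script hammering the form runs out of tokens almost immediately.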

Re:Use Perl vs. The Blind

jamiemccarthy on 2002-04-18T22:13:17

"wouldn't it be far better for both the Slash engine and your employer to use a choker that locks out users based on overly-heavy use (a sign of bots and abuse or addiction) or mis-use or karma or a combination of both."

This is already done in a number of ways. Sometimes it is not enough.

"I can't think of any site or situation where it would be more help than hindrance."

That's OK, as long as you recognize that your inability to imagine such situations does not mean none exist.
The reason you can't think of any is because you don't run a Slash site that is subject to frequent attack from vandals. Those who do will be interested in tools to help them ward off such attacks.

Re:Use Perl vs. The Blind

jcwren on 2002-04-19T02:37:51

I don't run a Slash site, but I *have* given this a lot of thought.

Obviously, it's a very difficult issue that has to balance usability (I am NOT going to enter a 44 digit key every time I post) vs reliability (putting the security image contents in an ALT tag would be braindead).

It seems that the first thing that's required is lack of anonymity, at least as far as the site is concerned. You may post as Anonymous Coward, but it should still require you to be a registered user. It does sort of limit the number of people that inflammatory comments could be made by (you know it has to be 1 of 'n' people), but with a large enough user base, it's as good as being really anonymous.

It seems to me that anything that's easy enough for a person to use is easy enough for a 'bot to work around. Certain types of authentication are impractical. Audio clips limit the site to those with soundcards. I've never gotten sound working right on my ThinkPad 1411i. So I shouldn't be able to post because I have a disadvantaged machine? Not to mention I would bet someone's paycheck that there are more hearing impaired web users than there are visually impaired ones.

Security requiring visual cues obviously has a problem for the visually impaired, (again) not to mention those running on less capable machines. If I have a monochrome display, can I read it? What if I'm using a text browser? And if the security image accommodates a text browser somehow, that's only going to make the 'bot writing easier, because we're living in an ASCII world. So seemingly, that method is out.

Olfactory? Not bloody likely. Facial recognition? Nope. Biometrics? Not likely again.

The only thing I can see is making the registration process sufficiently robust to say that you have an accountable human at the far end. Since notes can only be posted by registered users, we know that someone, be it a person or person with a 'bot, is accountable. Now the question becomes "How do we certify it's a person in the registration process?"

This allows us a little more leeway, since it's a one time deal. Obviously, if we take too long to return a registration (like waiting 24 hours), people will lose interest in joining, since many people join to provide feedback immediately on a point that interests them. After 24 hours, *I'm* not going to bother commenting on a topic, especially if it's time sensitive, like humor.

I don't know what the answer is for a reliable registration process that's timely and accurate and accommodating to disabilities and various hardware. We do know that Tim Berners-Lee's vision of the net was accessibility, and disabilities and hardware limitations (at least to a point) shouldn't limit that accessibility. Maybe you have to play a full game of 'Punch the Monkey', and get a score of 7 or higher. On the other hand, a friend and I are messing around with the idea of AI and GA to play Defender, and if that leads to anything, that might be too easy to defeat.

At any rate, this is where I would focus my energies. If a non-validated person can't post, that means posts can only come from a user that's registered, and therefore, there's a chain of accountability.

--jcwren

BTW, how about making the dang text entry box for comments larger. You get verbose, and it's a pain in the arse to edit in.

Re:Use Perl vs. The Blind

jaffray on 2002-04-19T15:51:57
If a tool can't be used because of unacceptable side effects, it's useless.
I would certainly hope that any site large and popular enough to face such attacks would not consider "hoses blind people and other users of text browsers" an acceptable side effect.

why use.perl and some suggestions...

hossman on 2002-04-19T07:24:49

The images seem to be gone now (i saw it on the comments form before, I just didn't actually post a comment) but I'd like to go ahead and toss out a few things that have been on my mind while reading this thread so far...

why use.perl.org? Wouldn't slashcode.com have been a better place to beta test this, and seek people's opinions about it? (maybe it was beta'ed there as well, but i didn't notice it, or a discussion about it)
It's not that big of an issue, I'm just curious more than anything.
why are people so worked up about this? Does it prevent access from text based browsers, yes. Does it make slash hard to use for the blind, yes. Is it a narrow minded feature, yes. Do i think it's a very effective deterrent, no. Is it something that all slash sites will have to use, no. Is it something slashcode will come with defaulting to on, i hope not. Is it something use.perl is going to use permanently, pudge said no at the same time he announced them.
This is an optional feature folks, and I can't believe how many people from a programming community are up in arms about the creation of a feature to a piece of software. If you don't think the feature should be used fine, say so -- but come on people, there's no reason to bite pudge's head off just 'cause the code exists.
I hope the slash team considers the comments so far with an open mind. If you sift through the negativity, there's some decent suggestions (i particularly liked the idea of admins being able to disable it on a per user basis if a particular trusted user has special needs)
Lastly, I'd like to suggest to the slash team that there are alternative ways of doing this that don't require images. With the "antispamification" of email addresses, you've already created a masterpiece of text based obfuscation. I'm constantly amazed at the multitudes of methods used to scramble up an address, such that i can still tell what it should be, but a script wouldn't (unless it was highly specialized, and then it wouldn't work on the next 99 addresses). Why not base your validation on the same type of ascii-obfuscation? Printing out a random (fake) scrambled email address and asking the user to descramble it is just one of lots of "ascii games" that can be played -- not to mention other simple questions that can be generated automatically, but would be semi-complicated for a computer to parse (especially if it didn't know what type of question to expect) ... "what is ten plus three?", "hoW maNy UpercaSE VoweLs in THiS sEnTence?", etc...
Sure, people could write scripts to crack all of these brute force ... but they could write scripts to brute force OCR as well. (And i know which one i would consider more entertaining and browser compatible)
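
To make the "ascii games" idea concrete, here is a small Python sketch of two such question generators. Everything in it is hypothetical (the function names, the word list, the 50/50 case scrambling), and it only shows the shape of the idea, not how hard it would be to crack.

```python
import random

NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def addition_riddle(rng):
    """Return (question, answer) for a spelled-out sum like 'ten plus three'."""
    a = rng.randint(0, 10)
    b = rng.randint(0, 10)
    question = f"what is {NUMBER_WORDS[a]} plus {NUMBER_WORDS[b]}?"
    return question, str(a + b)


def vowel_riddle(rng, sentence="how many uppercase vowels in this sentence?"):
    """Randomly flip letter case, then ask for the count of uppercase vowels."""
    scrambled = "".join(
        ch.upper() if rng.random() < 0.5 else ch.lower() for ch in sentence
    )
    answer = sum(1 for ch in scrambled if ch in "AEIOU")
    return scrambled, str(answer)


def check(expected, given):
    """Compare the user's reply with the stored answer, ignoring stray spaces."""
    return given.strip() == expected


rng = random.Random()
for make in (addition_riddle, vowel_riddle):
    question, answer = make(rng)
    print(question, "->", answer)
```

Both questions are plain text, so they work in lynx, links, w3m and a screen reader alike.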

Re:why use.perl and some suggestions...

jamiemccarthy on 2002-04-19T15:48:47

"Is it something slashcode will come with defaulting to on, i hope not."

I can upgrade that answer to "no" -- the default Slash theme will only ship with the minimal plugins that are required for its proper operation, and I can't see HumanConf ever being one of them.
"I hope the slash team considers the comments so far with an open mind. If you sift through the negativity, there's some decent suggestions (i particularly liked the idea of admins being able to disable it on a per user basis if a particular trusted user has special needs)"

Hehehe, I think that was one of my suggestions (I'm a member of the Slash team :)
"with the 'antispamification' of email addresses, you've already created a masterpiece of text based obfuscation"

Cool, thanks, glad you like it -- that was mostly me too :)
Writing the spamarmoring was a lot of fun; since it's done with code (typically 1 to 10 lines of perl), and since that code is stored in the DB, for security reasons we run it in a Safe. If you're interested scroll down to getArmoredEmail() in Slash::Utility::Data.
I'd love to share with you the little 1-to-10-line perl codelets that actually do the armoring on Slashdot, but if spammers had that code, reverse-engineering would be much easier. The shipping defaults are available, though.
It's fun to make up something new every week and put it up on Slashdot to tweak a few thousand people's emails, it's kind of like writing JAPH scripts and getting paid for it. I love my job.

Re:why use.perl and some suggestions...

jaffray on 2002-04-19T16:37:10
Why is it a "big issue"?

Well, if a problem exists, and the provided solution hoses blind people, then blind people will be hosed. If you believe blind people shouldn't be hosed, then perhaps developers should provide different solutions.

If there's going to be a better solution that doesn't have such side effects, then why waste time on the broken solution at all?

I was also going to suggest the "riddle" idea - if you're willing to keep the riddle generators secret, it should provide decent human verification.

I dislike the "special needs exception" idea, because it's not equal access, and because it seems silly to make someone apply for special access to post from Lynx or a cellphone. Besides, all the abusers will just immediately apply for said exception, and you're back to where you started.