The distribution of tags

acme on 2005-08-22T23:02:08

The Distribution of tags is a difficult matter,
It isn't just one of your holiday games;
You may think at first I'm as mad as a hatter
When I tell you, a tag may be distributed in many different ways...

Ever since releasing the first version of HTML::TagCloud, I've been wondering about the distribution of tags. I mean, I had a hunch and square-rooted the tag count and it looked fairly pretty. But then I remembered Clay Shirky talking about power laws and the whole long tail hype. How are tags distributed? How might my module represent tag clouds in a better way?

There's only so much wild-assed guessing I can do, so I hacked up a quick script using Flickr::API to pick 42 random users from Flickr's recent photos, crawl their entire photo collections and find their tag distributions. What do I mean? Well, in my photo collection "london" is used a lot more than, say, "bamboo". In this case I'm not interested in the tags themselves but in the distribution of their counts. I produced a chart - each colour is a different Flickr user, with the tags on the left being their popularity by that user: the maximum count being 30 and the minimum (and most common) clearly being 1. See that slope! Looks awfully power law to me. So I changed my sqrt() to a log() and everything looks much prettier.
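To make the sqrt() versus log() difference concrete, here is a minimal sketch in Python rather than the module's own Perl. It is not HTML::TagCloud's actual code: the function name `size_levels`, the ten size levels and the sample counts are all assumptions made for illustration.

```python
import math

def size_levels(counts, transform, levels=10):
    """Map raw tag counts to integer size levels 0..levels-1.

    `transform` is the squashing function (math.sqrt or math.log here).
    This is an illustrative helper, not part of HTML::TagCloud.
    """
    values = {tag: transform(n) for tag, n in counts.items()}
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # Every tag has the same count: put them all in the middle.
        return {tag: levels // 2 for tag in counts}
    return {
        tag: round((v - lo) / (hi - lo) * (levels - 1))
        for tag, v in values.items()
    }

# Made-up counts shaped like the chart: one big tag, a long tail of small ones.
counts = {"london": 30, "perl": 12, "recipe": 5, "cat": 2, "bamboo": 1}

sqrt_levels = size_levels(counts, math.sqrt)
log_levels = size_levels(counts, math.log)

for tag in counts:
    print(f"{tag:8} sqrt={sqrt_levels[tag]} log={log_levels[tag]}")
```

With counts like these, the log version lifts the mid-range tags further off the floor than the square root does, which is presumably why the cloud looks more balanced.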

While I was at it I incorporated a patch from Dean Wilson and stole an idea from O'Reilly Radar to bunch the tags closer together. In time-honoured tradition, I present before and after screenshots of my recipe tags. Also, I hadn't really considered that people would use HTML::TagCloud with only a few tags - now it copes better with that case. Get HTML::TagCloud 0.32 from CPAN now!

obra and I have been wondering how we could represent more information with tag clouds. I reckon we can easily represent three dimensions: the font size, the text colour and the tag order. (Even though I strongly feel that tags, and menus, should be sorted alphabetically.) Any other ideas? How much further can we push this boat?


Use the temporal axis

grantm on 2005-08-23T00:13:57

> Any other ideas? How much further can we push this boat?

You could represent the 'freshness' of a tag (how recently it's been assigned to new objects) by blinking the tag name more or less quickly.

I'm not suggesting you should. Merely that you could :-)

huh

jesse on 2005-08-23T04:48:08

Interestingly, I find the "before" more pleasant to look at. I think I _like_ the whitespace, but yay power law. With luck, I'll have more time to play with the tri-axis tagcloud this week.

Re:huh

acme on 2005-08-23T09:03:08

Ah well, the nice thing about the CSS is that you can remove the squishing together with one easy step ;-)

light - medium - dark

jhi on 2005-08-23T05:02:56

> the font size, the text colour and the tag order

How about splitting the colour into colour and intensity? Makes it a bit harder to read, but then again, you can't use all the possible text colours, either, since there has to be some contrast.

hmm.

2shortplanks on 2005-08-23T09:04:48

So, if we take the example of Flickr, we could have:

a) Newness. Represented by the order of the tags
b) Quantity of photos tagged with the tag. Represented by Size
c) Interestingness, represented by intensity of the tag. Tags associated with photos that are on average more interesting are darker (or lighter, depending on your background colour)
d) Popularity. Tags that are associated with photos that are viewed more than others are "warmer".

So, for example, my tag argh would be somewhere near the start of the list (as there's a relatively new photo there), small (since it refers to only one photo), red (since it's been viewed by 140+ people) and bright, since flickr thinks that photo is 'interesting'. In contrast, my léonbrocard tag would be bigger, but further down the list (since I haven't uploaded a photo of you in ages [bbq pictures to be uploaded tomorrow!]), more blueish (hell, there's a photo of you in there that no-one else has even looked at) and less intense because flickr thinks that you're boring (or at least my photos of you are).
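The four-way mapping above could be sketched roughly like this, in Python. Everything here is an assumption for illustration: the `TagStats` fields, the 0..1 interestingness score, the pixel sizes and the choice of HSL colours are invented, and none of it comes from Flickr's API or from HTML::TagCloud. Apart from the single photo and the 140 views mentioned above, the sample numbers are made up too.

```python
from dataclasses import dataclass

@dataclass
class TagStats:
    name: str
    newest_photo: int       # hypothetical: timestamp of the newest tagged photo
    photo_count: int        # how many photos carry the tag
    interestingness: float  # hypothetical score between 0 and 1
    views: int              # total views of the tagged photos

def normalise(value, lo, hi):
    """Scale value into 0..1; fall back to 0.5 when there is no spread."""
    return 0.5 if hi == lo else (value - lo) / (hi - lo)

def style_tags(tags):
    """Return (name, inline CSS) pairs, newest tag first."""
    counts = [t.photo_count for t in tags]
    views = [t.views for t in tags]
    styled = []
    # a) newness decides the order
    for t in sorted(tags, key=lambda t: t.newest_photo, reverse=True):
        # b) quantity decides the size (10px to 30px)
        size = 10 + round(20 * normalise(t.photo_count, min(counts), max(counts)))
        # d) popularity decides the warmth: hue 240 is blue, hue 0 is red
        hue = round(240 * (1 - normalise(t.views, min(views), max(views))))
        # c) interestingness decides the intensity: darker on a light background
        lightness = round(70 - 40 * t.interestingness)
        css = f"font-size: {size}px; color: hsl({hue}, 80%, {lightness}%)"
        styled.append((t.name, css))
    return styled

tags = [
    TagStats("léonbrocard", newest_photo=100, photo_count=12,
             interestingness=0.2, views=3),
    TagStats("argh", newest_photo=200, photo_count=1,
             interestingness=0.9, views=140),
]
styled = style_tags(tags)
for name, css in styled:
    print(f'<span style="{css}">{name}</span>')
```

Flip the lightness calculation if the page has a dark background, as noted in (c).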

CSS

2shortplanks on 2005-08-23T09:11:20

The new layout, while looking better, is a little more confusing about which tag you're pointing at with your mouse. For example, the t and p of potato and pasta as seen here. Some sweet sweet mouseover action (which could be implemented in pure CSS) would be nice.

themes/groups/categories

tannie on 2005-08-23T16:17:54

You could (alphabetically) order tags by themes in sub-clouds or something. Say you have photos (or whatever) with tags like London, Paris, Amsterdam; you could 'add' them to a category or theme. These are all cities, so you could make a subcloud of cities, and a subcloud of something like 'Friends' and 'Family'. You'd need to tell it which tags belong with which groups, and have the option of placing one tag in multiple groups, and then you'd get really interesting clouds, I think.

Using colour and intensity seems to suit 'older/newer' and 'more interesting / less interesting'. Though that may conflict with the viewer's point of view...
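The sub-cloud idea could look something like the following Python sketch. The function name `build_subclouds`, the "ungrouped" bucket and the sample tags and counts are all made up for illustration; nothing here is an existing HTML::TagCloud feature.

```python
from collections import defaultdict

def build_subclouds(tag_counts, tag_groups):
    """Split one flat tag cloud into alphabetically sorted sub-clouds.

    tag_counts maps tag -> count; tag_groups maps tag -> list of group
    names. A tag may sit in several groups; tags with no group land in
    a hypothetical "ungrouped" cloud.
    """
    subclouds = defaultdict(dict)
    for tag, count in tag_counts.items():
        for group in tag_groups.get(tag, ["ungrouped"]):
            subclouds[group][tag] = count
    return {
        group: dict(sorted(tags.items(), key=lambda kv: kv[0].lower()))
        for group, tags in sorted(subclouds.items())
    }

tag_counts = {"London": 30, "Paris": 4, "Amsterdam": 7, "mum": 5, "bamboo": 1}
tag_groups = {
    "London": ["cities", "family"],  # one tag in two groups
    "Paris": ["cities"],
    "Amsterdam": ["cities"],
    "mum": ["family"],
}

subclouds = build_subclouds(tag_counts, tag_groups)
for group, tags in subclouds.items():
    print(group, tags)
```

Each sub-cloud could then be sized on its own, so a tag that is small in the overall cloud can still stand out within its theme.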

Zipf's Law

brian_d_foy on 2005-08-24T07:40:15

You might want to check out Zipf's Law, which describes such rankings. :)
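Zipf's Law says that frequency is roughly inversely proportional to rank, so a plot of log(count) against log(rank) should be close to a straight line with a slope near -1. A quick way to eyeball that for one user's tag counts is a least-squares fit, sketched here in Python. The function name `zipf_slope` and the sample counts are invented for illustration, and this is only a rough check, not a rigorous power-law test.

```python
import math

def zipf_slope(counts):
    """Least-squares slope of log(count) against log(rank).

    A slope near -1 is what Zipf's Law predicts. Zero counts are dropped
    because log(0) is undefined.
    """
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two non-zero counts")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance

# Made-up counts that follow count = 60 / rank exactly.
ideal = [60, 30, 20, 15, 12]
slope = zipf_slope(ideal)
print(f"slope = {slope:.3f}")
```

Real tag counts will be noisier, especially in the tail where many tags share a count of 1.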