If for some strange reason you've looked on my User Info page, you might've noticed that I'm studying (Mandarin) Chinese. I started taking classes at Ecole-club Migros last August, so I've been at it for about a year. I'm still nowhere near being able to speak Chinese yet[1], but it's going pretty well I think.
So my point in bringing this up is... a long shot. I'm hoping that someone familiar with Chinese characters will have some information.
As I've been studying writing the characters, I've of course noticed there are a lot of patterns. Each character is basically composed of one or more "components". For example, if you look at 能 neng, which means "to be able to" or "can", it's made of four parts. The top-left looks like the bottom part of 云, "cloud". The bottom-left, 月, means "moon" or "month". Both top and bottom of the right side are the same; I'm not sure, but it might be 七, which means "seven".[2]
When you look up Chinese characters in a dictionary, you look them up by "radical", which is the main "component" of the character. I don't think there's a hard-and-fast rule for determining the radical of any given character; I don't even know which component of 能 above is the radical. Beginner dictionaries try to make it easier by listing a character under several of its components, not just the radical.
There are programs to train yourself in Chinese characters, notably Hanzi Master [3]. I also got a lot of data on Chinese characters from the same guy's web site.
In addition to a relatively limited number of components (a few hundred total, I think) being shared between all the characters (a few tens of thousands), the strokes of each character are written in a specific order (top to bottom, left to right, several other rules).
Here, finally, is my problem. I can't find a database with the stroke-order or component data. I found data on Chinese<->English definitions, on the radicals of the characters (this is used in Hanzi Master), the characters' stroke count, etc. I'd really like to find a database with the stroke order. I want to program a little flashcard application that teaches you how to write the characters. I want to find data on the character components more out of curiosity, though I could also use that in a flashcard kind of app.
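To make it concrete, here is a rough sketch in Python of the kind of record such a flashcard app would need. Everything in it is an assumption for illustration: the class names, the stroke names, and especially the coordinates, which are made up and not taken from any real dataset. It only shows the shape of the data (an ordered list of strokes per character), not a definitive format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    name: str      # descriptive label, e.g. "horizontal" (hypothetical naming)
    points: tuple  # polyline in a 0..1 unit square, listed in drawing order


@dataclass(frozen=True)
class Character:
    glyph: str
    pinyin: str
    strokes: tuple  # Stroke objects, in writing order


# Hypothetical record for 七 "seven"; the coordinates are invented.
QI = Character(
    glyph="七",
    pinyin="qi",
    strokes=(
        Stroke("horizontal", ((0.1, 0.45), (0.9, 0.35))),
        Stroke("vertical-bend-hook",
               ((0.45, 0.1), (0.45, 0.8), (0.9, 0.8), (0.9, 0.65))),
    ),
)


def reveal(character, n):
    """Return the first n strokes, the way a flashcard would show them one by one."""
    return character.strokes[:n]


def check_order(character, attempt):
    """True if the attempted stroke names match the stored writing order."""
    return [s.name for s in character.strokes] == list(attempt)
```

A flashcard loop would call reveal() with an increasing n to animate the character, and check_order() to grade an attempt.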
I know this kind of data must exist because I've seen applications that show how to write characters. But I don't know if the data is publicly usable. If anyone has any information, I'd appreciate it. (I just wonder if it's findable - if you know Chinese and know where to look for it. :)
--
[1] cf. French, where I could get by after one year of classes (and I guess watching the news, reading papers, and hanging out on #perlfr helped a lot...)
[2] It also appears on the right side of 北 bei, which you're familiar with from the name 北京 Beijing, which literally means "north capital". Looking at the "jing" character, the top is a lid, the middle box means "mouth", and the bottom (小 xiao) means "small". So you can see how these components are put together to form characters.
[3] "hanzi", 汉字, means "Chinese character(s)"
Re:seeking from my memory
slanning on 2006-08-29T08:25:30
Thanks, the website does look useful. I liked reading your experience learning how to write Chinese.
Re:character analysis
slanning on 2006-08-29T09:10:02
Thanks, the csulb.edu link looks close to what I'm looking for. I hadn't come across that one in searching, apparently. I know there are many sites online that help you learn Chinese. For example, from that site you linked to, there is a page to learn to write Chinese, where if you click on a character, an animated GIF shows the order in which you draw the character. But that's the problem: it's an animated GIF, not a database! Same with this Chinese Writing Master application: I'd like to find the "raw data". What I want, as a programmer, is to be able to write my own application so that I can learn how I want to, not from a webpage online or from some Windows application.
Ideally I'd like a database listing all of the components of the characters, not just the radicals, and some way of identifying which strokes those components are. Maybe it's just too hard. Like how would I manually draw the characters given the description of their strokes? Probably what I have in mind isn't feasible.
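Just to sketch what I mean by a component database, here is a minimal Python version. The breakdowns of 能 and 京 are the ones mentioned in this thread; 明 (日 + 月) is added only so that one component is shared between two characters. The exact glyphs chosen for the components (厶, 亠) and all the names in the code are my assumptions, and this is a toy table, not real data.

```python
from collections import defaultdict

# Toy decomposition table; a real database would cover thousands of characters.
COMPONENTS = {
    "能": ["厶", "月", "匕", "匕"],  # top-left, bottom-left, right side twice
    "京": ["亠", "口", "小"],        # lid, mouth, small
    "明": ["日", "月"],              # added for illustration
}


def build_index(table):
    """Invert the table: map each component to the set of characters using it."""
    index = defaultdict(set)
    for character, parts in table.items():
        for part in parts:
            index[part].add(character)
    return index


def characters_with(index, component):
    """Return the characters that contain the given component, sorted."""
    return sorted(index.get(component, set()))


INDEX = build_index(COMPONENTS)
```

With this, characters_with(INDEX, "月") lists every character in the table that contains 月, which is the kind of lookup a beginner dictionary does by hand.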
About the book "Reading and Writing Chinese", I actually have it. It's excellent. :) It shows the characters in an order so that you see how some characters are made of simpler ones. In fact, last night I looked up in that book my explanation of 能 neng from my original post, and I found that I'd made a couple of mistakes. The top-left component is the "cocoon" radical (or was it "private"... or "coil"...? ack.. :), while the ones on the right aren't 七 seven but rather 匕 ladle.
Re:character recognition
mr_bean on 2006-08-30T22:20:52
Most of the effects of the programs looked at here could be achieved with a drawing program and an image manipulation program.
A more interesting area to look at for help would be hand-written text to character recognition. I imagine such programs exist, but don't know anything about them.
They might analyze characters into strokes.
On the other hand, a demo of a Chinese speech-to-text program I saw was quite impressive. And Chinese text-to-speech programs exist too.
The example of 七 and 匕 indicates character recognition would be quite difficult. The first stroke of 七 is drawn left to right, and that of 匕 is right to left. (Both the Chinese and the Japanese dictionaries I have disagree with the McNaughton and Li stroke order for 匕. Perhaps 4 primitives are not enough.)
天 and 夭 are different in the same way.
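If stroke data recorded the points of each stroke in drawing order, the difference described above could be read straight from the first stroke. A small Python sketch of that idea follows; the coordinates are invented to match the directions claimed above and do not come from any dictionary or dataset, and the function name is hypothetical.

```python
def horizontal_direction(stroke):
    """Classify a stroke by comparing the x of its first and last points."""
    (x_start, _), (x_end, _) = stroke[0], stroke[-1]
    if x_end > x_start:
        return "left-to-right"
    if x_end < x_start:
        return "right-to-left"
    return "none"


# Invented first strokes, as (x, y) points in a 0..1 unit square.
FIRST_STROKE = {
    "七": ((0.1, 0.5), (0.9, 0.4)),   # drawn left to right
    "匕": ((0.7, 0.3), (0.3, 0.45)),  # drawn right to left
}

for glyph, stroke in FIRST_STROKE.items():
    print(glyph, horizontal_direction(stroke))
```

A finished image of the character loses this information, which is one reason recognition from a picture alone would be harder than recognition from recorded pen movement.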