chinese character components

slanning on 2006-08-28T15:27:04

If for some strange reason you've looked on my User Info page, you might've noticed that I'm studying (Mandarin) Chinese. I started taking classes at Ecole-club Migros last August, so I've been at it for about a year. I'm still nowhere near being able to speak Chinese yet[1], but it's going pretty well I think.

So my point in bringing this up is.. a long shot. I'm hoping that someone familiar with Chinese characters will have some information.

As I've been studying writing the characters, I've of course noticed there are a lot of patterns. Each character is basically composed of one or more "components". For example, if you look at 能 neng, which means "to be able to" or "can", it's made of four parts. The top-left looks like the bottom part of 云, "cloud". The bottom-left, 月, means "moon" or "month". Both top and bottom of the right side are the same; I'm not sure, but it might be 七, which means "seven".[2]

When you look up Chinese characters in a dictionary, you look them up by "radical", which is the main "component" of the character. I don't think there's a hard-and-fast rule for determining the radical of any given character; I don't even know which component of 能 above is the radical. Beginner dictionaries try to make it easier by listing a character under several of its components, not just the radical.

There are programs to train yourself in Chinese characters, notably Hanzi Master [3]. I also got a lot of data on Chinese characters from the same guy's web site.

In addition to a relatively limited number of components (a few hundred total, I think) being shared between all the characters (a few tens of thousands), the strokes of each character are written in a specific order (top to bottom, left to right, several other rules).

Here, finally, is my problem. I can't find a database with the stroke-order or component data. I found data on Chinese<->English definitions, on the radicals of the characters (this is used in Hanzi Master), the characters' stroke count, etc. I'd really like to find a database with the stroke order. I want to program a little flashcard application that teaches you how to write the characters. I want to find data on the character components more out of curiosity, though I could also use that in a flashcard kind of app.
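To make the idea concrete, here is a rough sketch of what one record in such a stroke-order database might look like, and how a flashcard app could check an attempt against it. The record layout, the stroke names, and the function name are all assumptions made up for illustration; they don't come from any existing database.

```python
from dataclasses import dataclass


# Hypothetical record format: the field names and stroke labels are
# illustrative only, not taken from any published data set.
@dataclass(frozen=True)
class CharacterEntry:
    char: str
    pinyin: str
    meaning: str
    strokes: tuple  # stroke names in writing order


ENTRIES = {
    "十": CharacterEntry("十", "shi", "ten", ("horizontal", "vertical")),
    "人": CharacterEntry("人", "ren", "person", ("left-falling", "right-falling")),
}


def first_wrong_stroke(entry, attempt):
    """Return the 0-based index of the first stroke that differs from the
    stored order, or None if the attempt matches exactly."""
    for i, (expected, got) in enumerate(zip(entry.strokes, attempt)):
        if expected != got:
            return i
    if len(attempt) != len(entry.strokes):
        # One sequence is a prefix of the other: the mismatch is at the
        # position where the shorter one ends.
        return min(len(attempt), len(entry.strokes))
    return None
```

A flashcard loop would then only need to collect the strokes the learner drew and call first_wrong_stroke to decide what feedback to show.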

I know this kind of data must exist because I've seen applications that show how to write characters. But I don't know if the data is publicly useable. If anyone has any information, I'd appreciate it. (I just wonder if it's findable - if you know Chinese and know where to look for it. :)

--

[1] cf. French, where I could get by after one year of classes (and I guess watching the news, reading papers, and hanging out on #perlfr helped a lot...)

[2] It also appears on the right side of 北 bei, which you're familiar with from the name 北京 Beijing, which literally means "north capital". Looking at the "jing" character, the top is a lid, the middle box means "mouth", and the bottom (小 xiao) means "small". So you can see how these components are put together to form characters.

[3] "hanzi", 汉字, means "Chinese character(s)"


seeking from my memory

Qiang on 2006-08-28T17:31:07

i don't use chinese that much now ( esp for writing ). seeking from my memory..

i do not remember learning stroke order for hanzi in school. many chinese characters are composed of simpler characters, such as 文 inside 蚊. i think the parts are also called 偏旁部首 ( one part may represent the sound and the other the meaning; in 蚊 the left part 虫 carries the meaning and the right part 文 the sound ). some of the simple characters have their own meaning too.

one thing i like about chinese characters is that the shape/structure of a character can usually tell you something about it. such as 苗 , here 田 means field, the part above means grass. therefore the combination has the meaning of grass in the field. another one is 車: if you look at it horizontally, it is two wheels with a container in the middle. Sadly, chinese characters have been simplified by the chinese gov and 車 is written as 车 which has lost its meaning. arg. i digress....

anyway.. iirc, we started learning chinese from the basic/simple characters first ( like 人,火,云,山 etc ) , gradually we learned something a bit more complicated which used the simple characters we had learned.

i still remember trying to memorize characters by writing each one a hundred times as homework in elementary school :)

this website seems to be useful http://zhongwen.com/

incidentally, today a friend just told me that she has enrolled in a university program to learn mandarin. good luck! :)

Re:seeking from my memory

slanning on 2006-08-29T08:25:30

Thanks, the website does look useful. I liked reading your experience learning how to write Chinese.

character analysis

mr_bean on 2006-08-29T03:46:28

I googled on Chinese character stroke order and found this page, http://www.csulb.edu/user/txie/character.htm
where he has his own character drawing program and mentions this company (www.eon.com.hk)'s program that will draw arbitrary characters for you. But I guess it has to be a Unicode character which the program has already analyzed, not your own made-up character.

There are some general rules, like left to right, top to bottom, all based on ease/smoothness of movement of the brush.

Sometimes there are different views about the order. Take the second stroke in the character for 'king', for example. Is it across, or down? In one Chinese dictionary I have, it is across. In a Japanese dictionary, it is down. The order may also depend on whether the writing is cursive or careful, perhaps.

I wonder how far it is possible to analyze the calligraphic construction of characters. Radicals are a start but are only part of the character. However, most characters recycle components seen in lots of other characters.

McNaughton and Li's Reading and Writing Chinese has a stroke-order index decomposing all characters into a sequence of just 4 primitive strokes.
http://books.google.com/books?id=G8pX7AFFyPgC&pg=PA328&lpg=PA326&vq=stroke+order+index&dq=%22McNaughton%22+%22Reading+and+Writing+Chinese%22+&sig=9PCu8Rw-LezGcavGbpuuoGFwG2g

Re:character analysis

slanning on 2006-08-29T09:10:02

Thanks, the csulb.edu link looks close to what I'm looking for. I hadn't come across that one in searching, apparently. I know there are many sites online that help you learn Chinese. For example, from that site you linked to, there is a page to learn to write Chinese, where if you click on a character, an animated gif shows the order in which you draw the character. But that's the problem: it's an animated gif, not a database! Same with this Chinese Writing Master application; I'd like to find the "raw data". What I want, as a programmer, is to be able to write my own application so that I can learn how I want to, not from a webpage online or from some Windows application.

Ideally I'd like a database listing all of the components of the characters, not just the radicals, and some way of identifying which strokes those components are. Maybe it's just too hard. Like how would I manually draw the characters given the description of their strokes? Probably what I have in mind isn't feasible.

About the book "Reading and Writing Chinese", I actually have it. It's excellent. :) It shows the characters in an order so that you see how some characters are made of simpler ones. In fact, last night I looked up in that book my explanation of 能 neng from my original post, and I found that I'd made a couple of mistakes. The top-left character is the "cocoon" radical (or was it "private"... or "coil"...? ack.. :), while the ones on the right aren't 七 seven but rather 匕 ladle.
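As a rough idea of the component database I have in mind, the corrected breakdown of 能 can be written down as a simple table and then inverted, so that you can ask which characters share a given component. This is only a sketch: the entries are typed in by hand for illustration (明 is added so that 月 shows up twice), and names like COMPONENTS and characters_with are made up.

```python
from collections import defaultdict

# Hand-entered decompositions, for illustration only; a real data set
# would need to come from a proper source.
COMPONENTS = {
    "能": ["厶", "月", "匕", "匕"],
    "明": ["日", "月"],
    "蚊": ["虫", "文"],
    "苗": ["艹", "田"],
}


def build_index(components):
    """Invert the table: map each component to the characters using it."""
    index = defaultdict(set)
    for char, parts in components.items():
        for part in parts:
            index[part].add(char)
    return index


INDEX = build_index(COMPONENTS)


def characters_with(component):
    """Return the set of known characters that contain the component."""
    return set(INDEX.get(component, set()))
```

With this, characters_with("月") gives back both 能 and 明, which is the kind of lookup a beginner dictionary does when it lists a character under several of its components.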

Re:character recognition

mr_bean on 2006-08-30T22:20:52

Most of the effects of the programs looked at here could be achieved with a drawing program and an image manipulation program.

A more interesting area to look at for help would be hand-written text to character recognition. I imagine such programs exist, but don't know anything about them.

They might analyze characters into strokes.

On the other hand, a demo of a Chinese speech-to-text program I saw was quite impressive. And Chinese text-to-speech programs exist too.

The example of 七 and 匕 indicates character recognition would be quite difficult. The first stroke of 七 is drawn left to right, and that of 匕 is right to left. (Both the Chinese and Japanese dictionaries I have disagree with the McNaughton and Li stroke order for 匕. Perhaps 4 primitives are not enough.)

天 and 夭 are different in the same way.
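One way to capture that difference in data might be to store each stroke with a start point and an end point, so the direction of travel is part of the record. The sketch below assumes a unit square with x growing to the right and y growing downward; the coordinates are invented for illustration, not measured from any font or dictionary.

```python
from typing import NamedTuple


class Stroke(NamedTuple):
    x0: float  # start point
    y0: float
    x1: float  # end point
    y1: float


def horizontal_direction(stroke):
    """Classify a stroke by which way the pen travels along the x axis."""
    if stroke.x1 > stroke.x0:
        return "left-to-right"
    if stroke.x1 < stroke.x0:
        return "right-to-left"
    return "vertical"


# Made-up coordinates for the first stroke of each character.
FIRST_STROKE = {
    "七": Stroke(0.1, 0.5, 0.9, 0.4),
    "匕": Stroke(0.8, 0.3, 0.2, 0.5),
    "天": Stroke(0.2, 0.2, 0.8, 0.2),
    "夭": Stroke(0.8, 0.15, 0.2, 0.25),
}

for char, stroke in FIRST_STROKE.items():
    print(char, horizontal_direction(stroke))
```

A scheme that records only the finished shape of a stroke would treat each pair as nearly identical; keeping the start and end points is what tells them apart here.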