Thoughts on the Semantic Web

ziggy on 2003-10-18T21:20:32

Here's an example of why the semantic web is a hard problem to crack. The following is a conversation from earlier today at a coffee shop:

I heard they found a way to handle the death of that comedian.

How?

Well, they're going to kill off the character in the show, like they said they would. Then they're going to bring in Jim Rockford as Peg Bundy's dad, and Rockford is going to help raise his granddaughters.

Oh, that makes sense. He's well liked now.

Why? He hasn't been on TV for 30 years.

But he was in that movie about the sisterhood. And he was in that space movie with Al Gore's roommate, Dirty Harry, and that guy from The Dirty Dozen and Invasion of the Body Snatchers.

Yeah, I guess he's pretty popular these days.

Note that this conversation was in no way confusing to the two people discussing John Ritter, even though the subjects being discussed are rather ambiguous, and none of the subjects are referred to directly.


problem

inkdroid on 2003-10-18T22:56:36

None of them have URIs or namespaces either :)

Re:problem

ziggy on 2003-10-19T16:23:22

URIs and namespaces don't solve the problem. I could have expressed those same statements using a set of namespaces completely foreign to you (or your SemWeb agent), and these statements would have made as much sense as sucking a few megabytes from /dev/zero.

Re:problem

inkdroid on 2003-10-19T17:21:32

Yah, if they are foreign they're not going to be of much help. Isn't that what "foreign" means?

Uhhh...

jordan on 2003-10-18T23:07:51

I thought that documents containing information like this conversation are exactly what motivates the Semantic Web.

I'm certainly not an expert in this area, but I thought the idea was to add annotations and structure to documents on the web to provide sufficient context to categorize data. A few notes added to that conversation would allow you to zero right in on the fact that it's about John Ritter and the show 8 Simple Rules for Dating My Teenage Daughter.

What that conversation demonstrates is how difficult it is to gather information WITHOUT additional contextual information.

Tim Berners-Lee says it better than me:


A Semantic Web is not Artificial Intelligence

  • The concept of machine-understandable documents does not imply some magical artificial intelligence which allows machines to comprehend human mumblings. It only indicates a machine's ability to solve a well-defined problem by performing well-defined operations on existing well-defined data. Instead of asking machines to understand people's language, it involves asking people to make the extra effort


Of course, if a conversation like that one were put in a document on the web without any additional notations, then it would not add to the Semantic Web, but it wouldn't detract from it either. Such a document would simply be outside of the Semantic Web.

Re:Uhhh...

ziggy on 2003-10-19T16:41:26

I thought that documents containing information like this conversation are exactly what motivates the Semantic Web.
In a sense, yes. The web cannot be automated if we're dealing with bits of HTML like «<i>Programming Perl, 3ed</i>».

At the same time, it's a hard problem because simply couching everything in RDF triples doesn't solve the problem in one fell swoop. First, everyone has to agree on the namespaces and semantic meanings of the subjects, objects and predicates we are sharing. That's a hard problem in and of itself, mostly social in nature. Even if we get that social agreement, there's a bigger issue that's not being solved.
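To make the agreement problem concrete, here is a minimal sketch in Python rather than real RDF tooling. Everything in it is an assumption made up for illustration: the two predicate URIs, the `ex:` identifiers, and the `interpret` helper. It shows an agent that knows one vocabulary reading two triples that a person would say mean the same thing.

```python
# Hypothetical vocabulary the agent was built to understand.
# Both predicate URIs below are invented for illustration only.
KNOWN_PREDICATES = {
    "http://example.org/tv#starredIn": "starred in",
}

# Two triples that say the same thing to a human reader,
# but the second one uses a predicate from a foreign namespace.
triples = [
    ("ex:JohnRitter", "http://example.org/tv#starredIn", "ex:8SimpleRules"),
    ("ex:JohnRitter", "http://elsewhere.example/vocab#wasCastIn", "ex:8SimpleRules"),
]


def interpret(triple, known=KNOWN_PREDICATES):
    """Return a readable statement, or None if the predicate is foreign."""
    subject, predicate, obj = triple
    label = known.get(predicate)
    if label is None:
        # The triple is well-formed, but the agent has no meaning for it.
        return None
    return f"{subject} {label} {obj}"


understood = [interpret(t) for t in triples]
```

The second triple is perfectly valid data, yet the agent gets nothing from it. Teaching the agent that the two predicates are equivalent is exactly the social agreement described above.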

In the conversation fragment above, everything was perfectly clear to the people in the conversation, but the subjects being discussed were referred to with at least one level of indirection. For example, what does Peg Bundy mean in this context? It could be the character in a comedy that's thankfully off the air, or it could mean the actress Katey Sagal who played that character.

People can infer the hidden context, but semweb agents cannot.

Similarly, the indirections were all over the map. This conversation would be equally meaningful if it dealt with Leela and K instead of Peg Bundy and Al Gore's roommate. Presumably, each of these four assertions is stored in a totally different area of the semweb, using totally different URIs (with totally different meanings). Making these kinds of jumps easy for an agent is a hard problem to solve.
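As a rough illustration of those jumps, here is a toy sketch in Python. The four dictionaries are hypothetical stand-ins for four unrelated corners of the semweb, and the names `resolve` and `same_person` are invented for this example. It is deliberately the easy case: the agent is handed every source up front, and all of them happen to use the same plain-string keys.

```python
# Each dict stands in for a separate, hypothetical data source with its
# own vocabulary. None of these correspond to real services or URIs.
SITCOM_DB = {"Peg Bundy": "Katey Sagal"}              # character -> actress
CARTOON_DB = {"Leela": "Katey Sagal"}                 # character -> voice actress
BIOGRAPHY_DB = {"Al Gore's roommate": "Tommy Lee Jones"}
MOVIE_DB = {"K": "Tommy Lee Jones"}                   # character -> actor

SOURCES = [SITCOM_DB, CARTOON_DB, BIOGRAPHY_DB, MOVIE_DB]


def resolve(reference, sources=SOURCES):
    """Return every person the indirect reference could point to."""
    return {src[reference] for src in sources if reference in src}


def same_person(ref_a, ref_b):
    """True if both indirect references share at least one resolved person."""
    return bool(resolve(ref_a) & resolve(ref_b))


# A reference nobody has published an assertion for resolves to nothing.
unknown = resolve("Jim Rockford")
```

With these tables, `same_person("Peg Bundy", "Leela")` holds and `same_person("Peg Bundy", "K")` does not. The lookup itself is trivial; the hard part is everything the sketch assumes away, namely discovering the four sources, trusting them, and mapping their different vocabularies onto one another.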

semantic beauty

inkdroid on 2003-10-19T14:28:43

Computers are exceptionally good at doing mind-numbingly boring things, mind-numbingly fast. So having to say the unsaid, "John Ritter" in this case, isn't necessarily a problem. Well, it is a problem for the programmer who has to create the web-service (or what have you), but there are lots of programmers :-) and these problems can be solved one at a time. The semantic web is just about programmers working together to build applications that can talk to each other, in the same way that they wrote programs to build the infrastructure we call the WWW.

Your post was interesting and sparked some other random thoughts: I was in the car driving to a meeting this weekend, and noticed a huge tree by the side of the road swaying in the breeze. I was thinking how beautiful it looked, and wondered why I thought it looked beautiful: the invisible breeze that is made visible by the tree, the autumn colors, all the individual movements that came together to make one fluid, undulating movement. This is data that my "hardware" is built to receive, and my "software" found interesting. Perhaps computers, as they begin to share data, will start to perceive a beauty that we cannot even comprehend, because of its intricateness and alienness. Which reminds me of how Kasparov felt like Deep Blue was an "alien intelligence", and of some of Vernor Vinge's work. Perhaps I also just drank too much coffee :-)