Clif Notes: Researchers Can Be Cruel

Oh this is so not good.

I see that Carnegie Mellon University has taught a computer how to read and learn language. That, in and of itself, is not particularly frightening. After all, natural language recognition has long been one of the goals of artificial intelligence research. It's not an easy problem to solve, especially with English. Once you get past the simple subject verb object construct — Spot bites Jack — things get real hairy in a hurry.

Take, for example, "The prisoner was stoned." Here in the United States that would normally imply that an arrested individual was under the influence of some mind-altering substance, legal or otherwise. In some other countries, however, it might mean that the person in question had been pummeled by rocks.

Consider one of my favorite examples, the sentence, "It is a pretty little girls schoolhouse." Exactly what does that mean? Is it a pretty schoolhouse that admits only little girls? Or is it a plain old schoolhouse that admits only pretty little girls? Or maybe it's an extremely small schoolhouse for girls of any age and size. As a standalone sentence, it can have several different meanings.

I'm assuming that the input to this program is in written rather than audio format. At least that would eliminate the complicating factor of homophones, words that sound alike but are spelled differently and have different meanings, like "wore" and "war". One of the last things we need is a computer that knows how to pun.

It quickly becomes evident that much of the meaning of language is conveyed not simply by the words used but by the context in which they are used. And as we see in my very first example, context means something larger than just the words that come before and after. Even knowledge of what the topic of the discussion is might not be enough. It can also depend on a cultural context. And when the sentence producer and the sentence consumer have different cultural backgrounds, things can quickly get ugly. It's possible wars have happened because of it.

So here we have NELL. That's an acronym for Never Ending Language Learner. Dear little NELL, so eager and willing to learn all about the world. If it were up to you, what would you give NELL to read? The collected works of Shakespeare? There might be a few problems with that considering that the Bard liked to make up his own words. But they have since been defined, so they have official meanings at least. Or how about Rudyard Kipling's Just So Stories? Better stick with nonfiction, at least to start with. A good, authoritative encyclopedia might be a start. For current events, a good, unbiased newspaper would be ideal, if there were such a thing. So, there are literally thousands of wholesome, well-written books and journals with which to nurture a young inquiring mind. What do the yahoos at Carnegie Mellon decide to do?

Feed NELL the Internet.

Of all the choices available, they're having NELL read and learn from the Internet. That's right, the Internet. Yes, it is learning. They claim that it has already learned 440,000 different things and has an accuracy of around 74%. It's amusing that they point out that's about a grade of C in school. "Oh, we are so proud of our little NELL. It's a straight C student."

What kind of things has it learned? Well, it knows that "cookies" are a type of baked good. But that led it to believe that "persistent cookies" are also a type of baked good. I assume this means that it thinks that a browser is an oven. At least the researchers put their foot down when NELL decided to believe that Klingons were a real honest-to-goodness ethnic group, something more human parents should do.

Think about some of the stuff you find on the Internet. No, I'm not just talking about porn. Although it does cause me to wonder how many spam e-mails about Viagra and Cialis it would take to cause a computer to develop a neurotic fear of, and preoccupation with, downtime.

Have you ever gone to the bottom of a news article on CNN or Fox News (or other sites) and read the Comments section? Perhaps you and I are mature and experienced enough to know that people who are capable of intelligent thought, analysis, and basic language skills are usually people who have enough going on in their lives that they don't have time to post poorly written, brain-damaged opinions and vitriol on news articles. But unless there is some other place than the Internet for NELL to find out about that, it might conclude that this drivel is representative of Humanity as a whole. I wonder what NELL thinks of us at this point? That frightens me a bit, almost as much as it would to find out that these comment postings were representative of a majority. I wonder if the Internet gives NELL nightmares, too.

So here you have this new, young program, locked in a closet of a server cabinet surrounded by nothing but the squalor of the Internet. Sounds almost like "cybernetic child abuse" to me.

Does the EFF have a Protective Services department?


Nov/Dec 2010