Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.

Tasks and limitations

In theory, natural language processing is a very attractive method of human-computer interaction. Early systems such as SHRDLU, working in restricted "blocks worlds" with restricted vocabularies, worked extremely well. This led researchers to excessive optimism, which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.

Natural language understanding is sometimes referred to as an AI-complete problem, because natural language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of "understanding" is one of the major problems in natural language processing.

Concrete problems

Some examples of the problems faced by natural language understanding systems:

The sentences "We gave the monkeys the bananas because they were hungry" and "We gave the monkeys the bananas because they were over-ripe" have the same surface grammatical structure. However, in one of them the word "they" refers to the monkeys, and in the other it refers to the bananas: the sentence cannot be understood properly without knowledge of the properties and behaviour of monkeys and bananas.

A string of words may be interpreted in myriad ways. For example, the string "Time flies like an arrow" may be interpreted in a variety of ways:
  time moves quickly, just like an arrow does;
  measure the speed of flying insects like you would measure that of an arrow, i.e. "(You should) time flies like you would an arrow";
  measure the speed of flying insects like an arrow would, i.e. "Time flies in the same way that an arrow would (time them)";
  measure the speed of flying insects that are like arrows, i.e. "Time those flies that are like arrows";
  a type of flying insect, "time-flies", enjoy arrows (compare "Fruit flies like a banana").
English is particularly challenging in this regard because it has little inflectional morphology to distinguish between parts of speech.

English and several other languages don't specify which word an adjective applies to. For example, the string "pretty little girls' school" leaves several questions open:
  Does the school look little?
  Do the girls look little?
  Do the girls look pretty?
  Does the school look pretty?

Subproblems

Speech segmentation
In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the analog signal to discrete characters can be a very difficult process. Also, in natural speech there are hardly any pauses between successive words; locating those boundaries usually requires taking into account grammatical and semantic constraints, as well as the context.

Text segmentation
Some written languages, such as Chinese, Japanese and Thai, do not mark boundaries between words either, so any significant text parsing usually requires the identification of word boundaries, which is often a non-trivial task.

Word sense disambiguation
Many words have more than one meaning; we have to select the meaning which makes the most sense in context.

Syntactic ambiguity
The grammar for natural languages is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the most appropriate one usually requires semantic and contextual information. Specific problem components of syntactic ambiguity include sentence boundary disambiguation.

Imperfect or irregular input
Foreign or regional accents and vocal impediments in speech; typing or grammatical errors and OCR errors in texts.

Speech acts and plans
Sentences often don't mean what they literally say. For instance, a good answer to "Can you pass the salt" is to pass the salt; in most contexts "Yes" is not a good answer, although "No" is better and "I'm afraid that I can't see it" is better yet. Or again, if a class was not offered last year, "The class was not offered last year" is a better answer to the question "How many students failed the class last year?" than "None" is.

Statistical NLP

Statistical natural language processing uses stochastic, probabilistic and statistical methods to resolve some of the difficulties discussed above, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. The technology for statistical NLP comes mainly from machine learning and data mining, both of which are fields of artificial intelligence that involve learning from data.
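
To make the Markov-model approach mentioned under Statistical NLP more concrete, the following is a minimal sketch of a hidden Markov model that picks the most probable part-of-speech reading of "Time flies like an arrow" using the Viterbi algorithm. The tag set and every probability in it are hypothetical values chosen by hand for illustration; a real system would estimate them from a tagged corpus. The names `START`, `TRANS`, `EMIT`, `viterbi` and `brute_force` are likewise illustrative and not taken from any particular library.

```python
from itertools import product

# Hypothetical toy model: all numbers below are hand-picked assumptions,
# not estimates from a real corpus.
TAGS = ["NOUN", "VERB", "PREP", "DET"]

# P(first tag)
START = {"NOUN": 0.6, "VERB": 0.2, "PREP": 0.1, "DET": 0.1}

# P(next tag | previous tag)
TRANS = {
    "NOUN": {"NOUN": 0.30, "VERB": 0.50, "PREP": 0.15, "DET": 0.05},
    "VERB": {"NOUN": 0.20, "VERB": 0.05, "PREP": 0.35, "DET": 0.40},
    "PREP": {"NOUN": 0.30, "VERB": 0.05, "PREP": 0.05, "DET": 0.60},
    "DET":  {"NOUN": 0.90, "VERB": 0.04, "PREP": 0.03, "DET": 0.03},
}

# P(word | tag); words missing from a tag's table get probability 0.
EMIT = {
    "NOUN": {"time": 0.4, "flies": 0.3, "arrow": 0.3},
    "VERB": {"time": 0.2, "flies": 0.4, "like": 0.4},
    "PREP": {"like": 1.0},
    "DET":  {"an": 1.0},
}


def path_probability(words, tags):
    """Joint probability of a word sequence and one candidate tag sequence."""
    p = START[tags[0]] * EMIT[tags[0]].get(words[0], 0.0)
    for i in range(1, len(words)):
        p *= TRANS[tags[i - 1]][tags[i]] * EMIT[tags[i]].get(words[i], 0.0)
    return p


def viterbi(words):
    """Return (best tag sequence, its probability) by dynamic programming."""
    # best[t] holds (probability, path) of the best analysis ending in tag t.
    best = {t: (START[t] * EMIT[t].get(words[0], 0.0), [t]) for t in TAGS}
    for word in words[1:]:
        new_best = {}
        for t in TAGS:
            # Choose the best previous tag for reaching tag t.
            prob, path = max(
                ((best[prev][0] * TRANS[prev][t], best[prev][1]) for prev in TAGS),
                key=lambda candidate: candidate[0],
            )
            new_best[t] = (prob * EMIT[t].get(word, 0.0), path + [t])
        best = new_best
    prob, path = max(best.values(), key=lambda candidate: candidate[0])
    return path, prob


def brute_force(words):
    """Score every possible tag sequence; exponential, shown for comparison."""
    return list(max(product(TAGS, repeat=len(words)),
                    key=lambda tags: path_probability(words, tags)))


words = "time flies like an arrow".split()
tags, prob = viterbi(words)
print(tags)  # ['NOUN', 'VERB', 'PREP', 'DET', 'NOUN']
```

Under these assumed probabilities the model prefers the reading "time moves quickly, just like an arrow does" over alternatives such as the "time-flies enjoy arrows" reading (NOUN NOUN VERB DET NOUN). The brute-force function enumerates all 4^5 tag sequences and reaches the same answer; Viterbi avoids that enumeration, which matters when longer sentences yield thousands or millions of analyses.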