Natural language processing

Natural language processing (NLP) is a subfield of artificial intelligence and linguistics. It studies the problems of automated generation and understanding of natural human languages. Natural language generation systems convert information from computer databases into normal-sounding human language, and natural language understanding systems convert samples of human language into more formal representations that are easier for computer programs to manipulate.

Tasks and limitations

In theory natural language processing is a very attractive method of human-computer interaction. Early systems such as SHRDLU, working in restricted "blocks worlds" with restricted vocabularies, worked extremely well, leading researchers to excessive optimism which was soon lost when the systems were extended to more realistic situations with real-world ambiguity and complexity.

Natural language understanding is sometimes referred to as an AI-complete problem, because natural language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it. The definition of "understanding" is one of the major problems in natural language processing.

Concrete problems

Some examples of the problems faced by natural language understanding systems:

The sentences We gave the monkeys the bananas because they were hungry and We gave the monkeys the bananas because they were over-ripe have the same surface grammatical structure. However, in one of them the word they refers to the monkeys, in the other it refers to the bananas: the sentence cannot be understood properly without knowledge of the properties and behaviour of monkeys and bananas.

A string of words may be interpreted in myriad ways. For example, the string Time flies like an arrow may be interpreted in a variety of ways:
time moves quickly just like an arrow does;
measure the speed of flying insects like you would measure that of an arrow - i.e. (You should) time flies like you would an arrow;
measure the speed of flying insects like an arrow would - i.e. Time flies in the same way that an arrow would (time them);
measure the speed of flying insects that are like arrows - i.e. Time those flies that are like arrows;
a type of flying insect, "time-flies," enjoy arrows (compare Fruit flies like a banana.)
English is particularly challenging in this regard because it has little inflectional morphology to distinguish between parts of speech.

English and several other languages don't specify which word an adjective applies to. For example, in the string "pretty little girls' school".
Does the school look little?
Do the girls look little?
Do the girls look pretty?
Does the school look pretty?

Subproblems

Speech segmentation
In most spoken languages, the sounds representing successive letters blend into each other, so the conversion of the analog signal to discrete characters can be a very difficult process. Also, in natural speech there are hardly any pauses between successive words; the location of those boundaries usually must take into account grammatical and semantical constraints, as well as the context.

Text segmentation
Some written languages like Chinese, Japanese and Thai do not have single word boundaries either, so any significant text parsing usually requires the identification of word boundaries, which is often a non-trivial task.

Word sense disambiguation
Many words have more than one meaning; we have to select the meaning which makes the most sense in context.

Syntactic ambiguity
The grammar for natural languages is ambiguous, i.e. there are often multiple possible parse trees for a given sentence. Choosing the most appropriate one usually requires semantic and contextual information. Specific problem components of syntactic ambiguity include sentence boundary disambiguation.

Imperfect or irregular input
Foreign or regional accents and vocal impediments in speech; typing or grammatical errors, OCR errors in texts.

Speech acts and plans
Sentences often don't mean what they literally say; for instance a good answer to "Can you pass the salt" is to pass the salt; in most contexts "Yes" is not a good answer, although "No" is better and "I'm afraid that I can't see it" is better yet. Or again, if a class was not offered last year, "The class was not offered last year" is a better answer to the question "How many students failed the class last year?" than "None" is.

Statistical NLP

Statistical natural language processing uses stochastic, probabilistic and statistical methods to resolve some of the difficulties discussed above, especially those which arise because longer sentences are highly ambiguous when processed with realistic grammars, yielding thousands or millions of possible analyses. Methods for disambiguation often involve the use of corpora and Markov models. The technology for statistical NLP comes mainly from machine learning and data mining, both of which are fields of artificial intelligence that involve learning from data.
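
As a rough illustration of how a corpus and a Markov model can be used to rank competing analyses, the following minimal sketch trains a first-order (bigram) Markov model on a toy corpus and picks the more probable of several candidate word sequences. The corpus, the function names train_bigrams and log_prob, and the use of add-one smoothing are all assumptions made for this example; real systems use far larger corpora and more careful smoothing.

```python
import math
from collections import Counter

# Toy corpus, invented purely for illustration.
CORPUS = [
    "time flies like an arrow",
    "fruit flies like a banana",
    "the arrow flies fast",
    "time passes fast",
]

def train_bigrams(sentences):
    """Count context words and word pairs, padding sentences with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    # Words that can follow a context: all corpus words plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return unigrams, bigrams, vocab

def log_prob(sentence, unigrams, bigrams, vocab):
    """Log-probability of a sentence under the bigram model (add-one smoothing)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen pairs from getting zero probability.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab)))
    return total

if __name__ == "__main__":
    unigrams, bigrams, vocab = train_bigrams(CORPUS)
    candidates = ["time flies like an arrow", "arrow an like flies time"]
    best = max(candidates, key=lambda s: log_prob(s, unigrams, bigrams, vocab))
    print(best)
```

The model only sees adjacent word pairs, so it can prefer word orders that resemble the corpus but cannot by itself decide, for example, whether they refers to the monkeys or the bananas; that kind of ambiguity needs richer models or world knowledge.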