Shallow parsing
Shallow parsing (also called chunking or light parsing) is an analysis of a sentence that first identifies its constituent parts (nouns, verbs, adjectives, etc.) and then links them into higher-order units with discrete grammatical meanings (noun groups or phrases, verb groups, etc.). The most elementary chunking algorithms link constituent parts using simple search patterns (e.g., as specified by regular expressions). Approaches that use machine learning techniques (classifiers, topic modeling, etc.) can take contextual information into account and thus compose chunks that better reflect the semantic relations between the basic constituents.[1] These more advanced methods thereby address the problem that the same combination of elementary constituents can have different higher-level meanings depending on the context of the sentence.
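The elementary, pattern-based approach can be illustrated with a minimal Python sketch. It assumes the sentence has already been part-of-speech tagged with Penn Treebank-style tags. The one-letter tag codes, the noun-phrase pattern, and the function name chunk_noun_phrases are illustrative choices, not part of any particular library.

```python
import re

# Hypothetical one-letter codes for a few Penn Treebank-style tags.
# Any tag not listed here is encoded as "O" (outside a noun phrase).
TAG_CODES = {"DT": "D", "JJ": "J", "NN": "N", "NNS": "N", "NNP": "N", "NNPS": "N"}

# Illustrative noun-phrase pattern:
# optional determiner, any number of adjectives, one or more nouns.
NP_PATTERN = re.compile(r"D?J*N+")


def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into noun-phrase chunks (lists of words)."""
    # One character per token, so regex match offsets are token indices.
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)
    return [
        [word for word, _ in tagged[m.start():m.end()]]
        for m in NP_PATTERN.finditer(codes)
    ]


sentence = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumped", "VBD"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
chunks = chunk_noun_phrases(sentence)
print(chunks)
# → [['The', 'quick', 'brown', 'fox'], ['the', 'lazy', 'dog']]
```

Because the pattern looks only at the tag sequence, it cannot use the wider sentence context, which is the limitation that the machine learning approaches are meant to address.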
Shallow parsing is widely used in natural language processing and is similar in concept to lexical analysis for computer languages. Under the name "shallow structure hypothesis", the idea is also used as an explanation for why second language learners often fail to parse complex sentences correctly.[2]
References
Citations
Sources
- "NP Chunking (State of the art)". Association for Computational Linguistics. Retrieved 2016-01-30.
- Abney, Steven (1991). "Parsing By Chunks" (PDF). Principle-Based Parsing. www.vinartus.net. pp. 257–278.
External links
- Apache OpenNLP: includes a chunker.
- GATE (General Architecture for Text Engineering): includes a chunker.
- NLTK chunking
- Illinois Shallow Parser: shallow parser demo
See also
- Parser
- Semantic role labeling
- Named entity recognition