Module 3: Analyzing text content with natural language processing

AI-aided content analysis of sustainability communication

Lesson 3.2: Part-of-speech and named entity recognition

lecture text

Token Relationships and Knowledge Graphs

Token relationships, the grammatical links between the words of a sentence, are fundamental in NLP. Aspect-based sentiment analysis builds on them to tie a sentiment to the specific aspect it describes rather than to the text as a whole. Knowledge graphs extend the idea from tokens to entities: they map the relationships between entities, which makes connections and context within the data explicit and searchable.

Units of NLP Analysis

Natural language processing operates at multiple levels of granularity, from complete texts and paragraphs to sentences, words, and individual tokens. Each level supports different kinds of analysis. Tokens are the smallest units a pipeline works with and form the foundation for more complex linguistic analysis.
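The levels of granularity can be seen by splitting one short text into sentences and tokens. This is a minimal sketch that uses a blank English pipeline with spaCy's rule-based sentencizer, so no trained model is required; the example text is invented.

```python
import spacy

# Blank English pipeline: tokenizer only, no model download needed.
nlp = spacy.blank("en")
# Rule-based sentence splitter that breaks on sentence-final punctuation.
nlp.add_pipe("sentencizer")

text = (
    "We cut emissions by ten percent. "
    "Our suppliers now report their energy use."
)
doc = nlp(text)  # unit 1: the whole text

# Unit 2: sentences.
sentences = [sent.text for sent in doc.sents]

# Unit 3: tokens, grouped by the sentence they belong to.
tokens_per_sentence = [[token.text for token in sent] for sent in doc.sents]

print(sentences)
print(tokens_per_sentence)
```

Note that punctuation marks are tokens in their own right, which is why token counts are usually higher than word counts.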

Applying spaCy NLP Models to Dataframes

spaCy's NLP models can be applied directly to the text column of a text-level or sentence-level dataframe, so that all rows are processed in one batch. The result is structured output, such as token attributes and syntactic dependencies, that can be written back into a dataframe and used in later processing steps.
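A minimal sketch of batch processing a dataframe column is shown below. The dataframe texts_df, its column names, and the example texts are hypothetical. A blank English pipeline keeps the example runnable without a model download; a trained pipeline would be passed through nlp.pipe in the same way.

```python
import pandas as pd
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical text-level dataframe: one row per text.
texts_df = pd.DataFrame(
    {
        "text_id": [1, 2],
        "text": [
            "We cut emissions. Our suppliers report energy use.",
            "The new plant runs on solar power.",
        ],
    }
)

# nlp.pipe processes the texts as a batch and yields one Doc per text,
# in the same order as the rows.
docs = list(nlp.pipe(texts_df["text"]))

# Text-level summaries written back to the dataframe.
texts_df["n_tokens"] = [len(doc) for doc in docs]
texts_df["n_sentences"] = [len(list(doc.sents)) for doc in docs]

# Token-level table: one row per token, keyed by text_id.
tokens_df = pd.DataFrame(
    [
        {
            "text_id": text_id,
            "token_i": token.i,
            "token": token.text,
            "is_punct": token.is_punct,
        }
        for text_id, doc in zip(texts_df["text_id"], docs)
        for token in doc
    ]
)

print(texts_df[["text_id", "n_tokens", "n_sentences"]])
print(tokens_df.head())
```

Keeping the Doc objects in a separate list and storing only derived values in the dataframe is one design choice among several; it keeps the dataframe easy to save and inspect.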

Iterating Over Sentence Dataframes and spaCy Documents

Iterating over a sentence-level dataframe together with the corresponding spaCy Doc objects pairs each row with its linguistic annotations. This makes it possible to extract sentence-specific attributes, analyze syntactic structures, and identify patterns in the text row by row.
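The paired iteration can be sketched as follows. The dataframe sentences_df, its columns, and the two derived features (a word count and a question flag) are illustrative assumptions, not features prescribed by the lesson.

```python
import pandas as pd
import spacy

nlp = spacy.blank("en")

# Hypothetical sentence-level dataframe: one row per sentence.
sentences_df = pd.DataFrame(
    {
        "sentence_id": [0, 1, 2],
        "sentence": [
            "We cut emissions by ten percent.",
            "Our suppliers report their energy use.",
            "Is the new plant powered by solar?",
        ],
    }
)

# One Doc per row, in row order.
docs = list(nlp.pipe(sentences_df["sentence"]))

records = []
# itertuples() yields one named tuple per row; zip pairs it with its Doc.
for row, doc in zip(sentences_df.itertuples(index=False), docs):
    words = [token.text for token in doc if not token.is_punct]
    records.append(
        {
            "sentence_id": row.sentence_id,
            "n_words": len(words),
            "is_question": doc[-1].text == "?",
        }
    )

features_df = pd.DataFrame(records)
print(features_df)
```

Because nlp.pipe preserves input order, zip is enough to keep rows and documents aligned as long as the dataframe is not reordered in between.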

Text Normalization and Token Attributes

Text normalization standardizes tokens so that variants of the same word are treated consistently. It relies on token attributes such as the original token text and the lemma. Lemmatization, in particular, reduces words to their base forms (for example, "reduced" and "reducing" both map to "reduce"), which helps search and indexing and can improve the performance of downstream NLP models.

Text Inference: Named Entity Recognition (NER)

Named entity recognition (NER) identifies spans of text that refer to entities and classifies them by type, such as persons, organizations, and dates. Applications include automated content categorization, customer sentiment analysis, and improved information retrieval systems.

Text Inference: Part-of-Speech Tagging (POS)

Part-of-speech tagging assigns each token a grammatical category, such as noun, verb, or adjective, and so provides insight into the structure of a text. Applications include syntactic parsing, language generation, and machine translation, where capturing grammatical nuances improves the results.