Module 3: Analyzing text content with natural language processing
AI-aided content analysis of sustainability communication
Lesson 3.2: Part-of-speech and named entity recognition
Token Relationships and Knowledge Graphs
Understanding how tokens relate to one another is fundamental in NLP. Aspect-based sentiment analysis, for example, ties sentiment to specific aspects mentioned in a text rather than to the text as a whole. Knowledge graphs extend this idea by mapping relationships between entities, which makes connections and context within the data explicit.
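As a minimal sketch of the knowledge-graph idea, relationships between entities can be stored as (subject, relation, object) triples and indexed by subject. The entity names and relations below are hypothetical illustrations, not data from the course, and only the Python standard library is used.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples, invented for illustration.
triples = [
    ("Acme Corp", "publishes", "Sustainability Report"),
    ("Sustainability Report", "mentions", "carbon emissions"),
    ("Acme Corp", "targets", "net zero"),
]

# Index the triples by subject so outgoing relationships are easy to look up.
graph = defaultdict(list)
for subj, rel, obj in triples:
    graph[subj].append((rel, obj))


def neighbors(entity):
    """Return the entities directly connected to `entity` by an outgoing edge."""
    return [obj for _, obj in graph.get(entity, [])]


print(neighbors("Acme Corp"))
```

A dictionary of adjacency lists is enough for small examples; a dedicated graph library would be a reasonable choice once path queries or visualisation are needed.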
Units of NLP Analysis
Natural language processing operates at several levels of granularity: complete texts, paragraphs, sentences, words, and individual tokens. Each unit supports different kinds of analysis. Tokens are the smallest units and form the foundation for more complex linguistic analysis.
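The levels of granularity can be illustrated with a deliberately naive sketch that splits a text into paragraphs, sentences, and tokens using regular expressions. The splitting rules here are simplifying assumptions; a real pipeline would rely on a proper tokenizer and sentence segmenter.

```python
import re

# A short made-up text with two paragraphs.
text = (
    "We cut emissions by switching to renewable energy. Our suppliers follow.\n\n"
    "Water use also fell last year."
)

# Paragraphs: separated by a blank line.
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

# Sentences: naive split after ., ! or ? followed by whitespace.
sentences = [
    s for p in paragraphs for s in re.split(r"(?<=[.!?])\s+", p) if s
]

# Tokens: runs of word characters, or single punctuation marks.
tokens = [re.findall(r"\w+|[^\w\s]", s) for s in sentences]

print(len(paragraphs), len(sentences), sum(len(t) for t in tokens))
```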
Applying spaCy NLP Models to Dataframes
spaCy's NLP models can be applied to text and sentence dataframes, which allows textual data to be processed in batches. The result is structured output, such as token attributes and syntactic dependencies, that can be used directly in further processing.
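One way to do this, sketched under a few assumptions, is to pass a text column through `nlp.pipe` and derive a sentence-level dataframe from the resulting documents. The column names and example texts are hypothetical, and a blank English pipeline with a rule-based `sentencizer` stands in for a trained model so that nothing has to be downloaded.

```python
import pandas as pd
import spacy

# Assumption: a blank pipeline plus rule-based sentence splitting;
# a trained pipeline would be loaded with spacy.load(...) instead.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# Hypothetical text-level dataframe.
text_df = pd.DataFrame({
    "text_id": [1, 2],
    "text": [
        "We reduced waste. We also saved water.",
        "Our new packaging is recyclable.",
    ],
})

# Process all texts in one batch; docs is parallel to the rows of text_df.
docs = list(nlp.pipe(text_df["text"]))
text_df["n_tokens"] = [len(doc) for doc in docs]
text_df["n_sentences"] = [len(list(doc.sents)) for doc in docs]

# Build one row per sentence, keeping a link back to the source text.
sentence_df = pd.DataFrame([
    {"text_id": text_id, "sentence_id": i, "sentence": sent.text}
    for text_id, doc in zip(text_df["text_id"], docs)
    for i, sent in enumerate(doc.sents)
])

print(sentence_df)
```

Keeping the documents in a list parallel to the dataframe, rather than in a column, keeps the dataframe limited to plain values that are easy to save and inspect.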
Iterating Over Sentence Dataframes and spaCy Documents
Iterating over a sentence-level dataframe together with the corresponding spaCy document objects allows linguistic features to be explored sentence by sentence. Typical tasks include extracting sentence-specific attributes, analyzing syntactic structures, and identifying patterns within the text.
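A minimal sketch of this pattern pairs each dataframe row with its document using `zip`. The dataframe contents, column names, and the two features computed here are hypothetical; a blank English pipeline is assumed, so only tokenizer-level attributes are available.

```python
import pandas as pd
import spacy

nlp = spacy.blank("en")  # assumption: tokenizer only, no trained components

# Hypothetical sentence-level dataframe.
sentence_df = pd.DataFrame({
    "sentence_id": [0, 1],
    "sentence": ["We reduced waste.", "Our new packaging is recyclable."],
})

# nlp.pipe yields one Doc per sentence, in the same order as the rows.
docs = nlp.pipe(sentence_df["sentence"])

records = []
for row, doc in zip(sentence_df.itertuples(index=False), docs):
    words = [token.text for token in doc if token.is_alpha]
    records.append({
        "sentence_id": row.sentence_id,
        "n_words": len(words),
        "longest_word": max(words, key=len, default=""),
    })

features_df = pd.DataFrame(records)
print(features_df)
```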
Text Normalization and Token Attributes
Text normalization standardizes tokens so that different surface forms of the same word are treated consistently. It draws on token attributes such as the token text and the lemma. Lemmatization in particular reduces words to their base forms, which helps search, indexing, and downstream NLP models.
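The token attributes mentioned above can be read from a spaCy document as sketched here. To keep the example runnable without downloading a trained model, the lemmas are written by hand and attached through the `Doc` constructor; this is an assumption for illustration, and in practice a trained pipeline would predict them.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Companies", "are", "reducing", "emissions", "."]
# Hand-written lemmas (an assumption for this sketch, not model output).
lemmas = ["company", "be", "reduce", "emission", "."]
doc = Doc(nlp.vocab, words=words, lemmas=lemmas)

# Collect a few attributes per token.
token_rows = [
    {
        "text": token.text,
        "lower": token.lower_,
        "lemma": token.lemma_,
        "is_punct": token.is_punct,
    }
    for token in doc
]

# A simple normalized form: lemmas with punctuation removed.
normalized = [token.lemma_ for token in doc if not token.is_punct]

print(normalized)
```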
Text Inference: Named Entity Recognition (NER)
Named entity recognition (NER) identifies spans of text that refer to entities such as people, organizations, and dates, and assigns each span a category. Applications include automated content categorization, customer sentiment analysis, and improved information retrieval.
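The shape of NER output can be sketched with spaCy's rule-based `entity_ruler`, which is used here as a stand-in for a trained statistical recognizer so that no model download is needed. The company name, the patterns, and the sentence are hypothetical.

```python
import spacy

nlp = spacy.blank("en")

# Assumption: rule-based patterns replace a trained NER component.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Acme Corp"},
    {"label": "DATE", "pattern": "2040"},
])

doc = nlp("Acme Corp pledged to reach net zero by 2040.")

# Each entity is a span with a text and a category label.
entities = [(ent.text, ent.label_) for ent in doc.ents]

print(entities)
```

With a trained pipeline the access pattern stays the same: entities are read from `doc.ents`, and only the component that produces them changes.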
Text Inference: Part-of-Speech Tagging (POS)
Part-of-speech tagging assigns a grammatical category, such as noun, verb, or adjective, to each token and so exposes the grammatical structure of a text. Applications include syntactic parsing, language generation, and machine translation, where capturing grammatical nuances improves results.
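Reading part-of-speech tags from a spaCy document can be sketched as follows. The tags are supplied by hand through the `Doc` constructor so that the example runs without a trained model; this is an assumption for illustration, and a trained pipeline would normally predict them. The sentence is made up.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")

words = ["Acme", "publishes", "an", "annual", "sustainability", "report", "."]
# Hand-assigned Universal POS tags (an assumption, not model output).
pos = ["PROPN", "VERB", "DET", "ADJ", "NOUN", "NOUN", "PUNCT"]
doc = Doc(nlp.vocab, words=words, pos=pos)

# Pair each token with its coarse-grained tag.
tagged = [(token.text, token.pos_) for token in doc]

# Count tags and pull out noun-like tokens.
pos_counts = Counter(token.pos_ for token in doc)
nouns = [token.text for token in doc if token.pos_ in {"NOUN", "PROPN"}]

print(tagged)
print(nouns)
```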