N-gram Examples and Implementations
Example 1: Generating N-grams in Python
Let’s generate N-grams using Python’s NLTK library.
Code Implementation:
import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

nltk.download('punkt')  # tokenizer models; newer NLTK releases may need 'punkt_tab'

text = "I love natural language processing"
tokens = word_tokenize(text.lower())

# Generate bigrams (pairs of consecutive tokens)
bigrams = list(ngrams(tokens, 2))
print(bigrams)
Output:
[('i', 'love'), ('love', 'natural'), ('natural', 'language'), ('language', 'processing')]
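Continuing the same example, changing the second argument of ngrams produces longer sequences, e.g. trigrams:
trigrams = list(ngrams(tokens, 3))
print(trigrams)
Output:
[('i', 'love', 'natural'), ('love', 'natural', 'language'), ('natural', 'language', 'processing')]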
Example 2: N-gram Frequency Analysis
N-grams are often used to determine the most common word pairs in a dataset.
Code Implementation:
from collections import Counter
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

text = "I love NLP. NLP is fun. NLP helps in text analysis."
tokens = word_tokenize(text.lower())

# Generate bigrams
bigrams = list(ngrams(tokens, 2))

# Count how often each bigram occurs
bigram_freq = Counter(bigrams)
print(bigram_freq.most_common(2))  # Top 2 bigrams
Output:
[(('.', 'nlp'), 2), (('i', 'love'), 1)]
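word_tokenize keeps punctuation as separate tokens, which is why a period shows up in the most common bigram above. If only word pairs are wanted, one small variation (an added filtering step, not part of the original example) is to drop non-alphabetic tokens first:
word_tokens = [t for t in tokens if t.isalpha()]
word_bigram_freq = Counter(ngrams(word_tokens, 2))
print(word_bigram_freq.most_common(2))
Output:
[(('i', 'love'), 1), (('love', 'nlp'), 1)]
Every word pair occurs exactly once in this short text, so the top bigrams are simply the first two encountered.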
Example 3: N-gram Language Modeling
N-grams can be used to predict the next word in a sequence.
Code Implementation:
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Training data (already tokenized sentences)
text_data = [['i', 'love', 'nlp'], ['nlp', 'is', 'amazing']]
n = 2  # Bigrams

# Prepare padded training n-grams and the vocabulary
train_data, vocab = padded_everygram_pipeline(n, text_data)

# Train the model
model = MLE(n)
model.fit(train_data, vocab)

# Probability of 'nlp' given the preceding word 'love'
# (a bigram model conditions on only one word of context)
print(model.score("nlp", ["love"]))  # 1.0 for this training data
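As a follow-up, the trained model can also sample a continuation with its generate method (the seed below and the resulting words assume the tiny training set above):
# Generate two words following the seed word 'i'
print(model.generate(2, text_seed=['i'], random_seed=42))
Output:
['love', 'nlp']
With such a small corpus there is only one possible continuation at each step, so the sample is deterministic.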
Example 4: N-gram for Text Prediction (Autocomplete)
N-grams help in predicting the next word in applications like search engines; a small code sketch follows the example below.
Example:
- Input: "machine"
- Prediction using bigrams: "learning", "translation", "vision"
- Prediction using trigrams: "learning algorithms", "translation techniques", "vision models"
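A minimal sketch of this autocomplete idea, built from bigram counts over a tiny corpus (the corpus, and therefore the exact suggestions, are assumptions for illustration):
from collections import Counter, defaultdict
from nltk.tokenize import word_tokenize
from nltk.util import ngrams

# Tiny illustrative corpus (assumed for this sketch)
corpus = [
    "machine learning is popular",
    "machine translation is useful",
    "machine learning models need data",
]

# Map each word to a Counter of the words that follow it
next_word = defaultdict(Counter)
for sentence in corpus:
    tokens = word_tokenize(sentence.lower())
    for w1, w2 in ngrams(tokens, 2):
        next_word[w1][w2] += 1

# Suggest completions for "machine", most frequent first
print([w for w, _ in next_word["machine"].most_common(3)])
Output:
['learning', 'translation']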
Example 5: N-gram for Sentiment Analysis
Sentiment classification can be improved using N-grams, as they capture word context.
Code Implementation:
from sklearn.feature_extraction.text import CountVectorizer
text_data = ["I love NLP", "NLP is difficult", "Machine learning is fun"]
# Create bigram model
vectorizer = CountVectorizer(ngram_range=(2,2))
X = vectorizer.fit_transform(text_data)
print(vectorizer.get_feature_names_out())
Output:
['is difficult', 'is fun', 'learning is', 'love nlp', 'machine learning', 'nlp is']
CountVectorizer's default tokenizer drops single-character tokens, so "I" never appears in a bigram, and get_feature_names_out returns the features in alphabetical order.
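These bigram counts can then be fed to any standard classifier. A hedged sketch of that step, with toy labels that are purely illustrative assumptions (1 = positive, 0 = negative):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data (assumed labels for illustration)
texts = ["I love NLP", "NLP is fun", "NLP is difficult", "I hate bugs"]
labels = [1, 1, 0, 0]

# Unigrams + bigrams so very short texts still produce features
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
clf.fit(texts, labels)

print(clf.predict(["NLP is fun"]))  # expected [1] on this toy data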