Natural Language Processing
Fundamental Concepts
- Tokenization
- Stemming
- Lemmatization
- POS Tagging
- Named Entity Recognition
- Stopword Removal
- Syntax
- Dependency Parsing
- Parsing
- Chunking
Text Processing & Cleaning
- Text Normalization
- Bag of Words
- TF-IDF
- N-grams
- Word Embeddings
- Sentence Embeddings
- Document Similarity
- Cosine Similarity
- Text Vectorization
- Noise Removal
Tools, Libraries & APIs
- NLTK
- spaCy
- TextBlob
- Hugging Face Transformers
- Gensim
- OpenAI
- CoreNLP
- FastText
- Flair NLP
- ElasticSearch + NLP
Programs
- Build a Chatbot Using NLP
- Extracting Meaning from Text Using NLP in Python
- Extracting Email Addresses Using NLP in Python
- Extracting Names of People, Cities, and Countries Using NLP
- Format Email Messages Using NLP
- N-gram program
- Resume Skill Extraction Using NLP
- Sentiment Analysis in NLP
- Optimizing Travel Routes Using NLP & TSP Algorithm in Python
N-gram programs
N-gram Examples and Implementations
Example 1: Generating N-grams in Python
Let’s generate N-grams using Python’s NLTK library.
Code Implementation:
import nltk
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

# nltk.download('punkt')  # uncomment on first run if the tokenizer data is missing

text = "I love natural language processing"
tokens = word_tokenize(text.lower())

# Generate bigrams
bigrams = list(ngrams(tokens, 2))
print(bigrams)
Output:
[('i', 'love'), ('love', 'natural'), ('natural', 'language'), ('language', 'processing')]
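The ngrams helper works for any n. As a quick follow-up (reusing the tokens variable from the snippet above), trigrams are generated the same way:
# Generate trigrams from the same token list
trigrams = list(ngrams(tokens, 3))
print(trigrams)
# [('i', 'love', 'natural'), ('love', 'natural', 'language'), ('natural', 'language', 'processing')]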
Example 2: N-gram Frequency Analysis
N-grams are often used to determine the most common word pairs in a dataset.
Code Implementation:
from collections import Counter
from nltk.util import ngrams
from nltk.tokenize import word_tokenize

text = "I love NLP. NLP is fun. NLP helps in text analysis."
tokens = word_tokenize(text.lower())

# Generate bigrams
bigrams = list(ngrams(tokens, 2))

# Count frequency
bigram_freq = Counter(bigrams)
print(bigram_freq.most_common(2))  # Top 2 bigrams
Output:
[(('.', 'nlp'), 2), (('i', 'love'), 1)]
The ('.', 'nlp') pair is counted twice because word_tokenize keeps punctuation marks as tokens; among the pairs that occur only once, most_common returns them in the order they were first encountered.
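If punctuation pairs such as ('.', 'nlp') are not wanted in the counts, a common variation (a small sketch reusing the imports above) is to keep only alphabetic tokens before building the bigrams:
# Keep only alphabetic tokens so punctuation never enters the counts
tokens = [t for t in word_tokenize(text.lower()) if t.isalpha()]
bigram_freq = Counter(ngrams(tokens, 2))
print(bigram_freq.most_common(2))  # on this toy text, every word-only bigram occurs exactly once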
Example 3: N-gram Language Modeling
N-grams can be used to predict the next word in a sequence.
Code Implementation:
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

# Training data (already tokenized sentences)
text_data = [['i', 'love', 'nlp'], ['nlp', 'is', 'amazing']]
n = 2  # Bigrams

# Prepare padded n-grams and the vocabulary
train_data, vocab = padded_everygram_pipeline(n, text_data)

# Train the model
model = MLE(n)
model.fit(train_data, vocab)

# Predict the probability of the next word; a bigram model conditions on one preceding word
print(model.score("nlp", ["love"]))  # P('nlp' | 'love')
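As a quick check of what the fitted model has learned (a small usage sketch on the toy corpus above): an MLE model assigns relative-frequency probabilities to bigrams seen in training and, because it applies no smoothing, probability 0 to unseen ones.
print(model.score("amazing", ["is"]))    # 1.0: 'amazing' is the only word seen after 'is'
print(model.score("amazing", ["love"]))  # 0.0: the bigram ('love', 'amazing') never occurs in training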
Example 4: N-gram for Text Prediction (Autocomplete)
N-grams help in predicting the next word in applications like search engines, as in the example below and the short code sketch that follows it.
Example:
- Input: "machine"
- Prediction using bigrams: "learning", "translation", "vision"
- Prediction using trigrams: "learning algorithms", "translation techniques", "vision models"
Example 5: N-gram for Sentiment Analysis
Sentiment classification can be improved using N-grams, as they capture word context.
Code Implementation:
from sklearn.feature_extraction.text import CountVectorizer
text_data = ["I love NLP", "NLP is difficult", "Machine learning is fun"]
# Create a bigram vectorizer
vectorizer = CountVectorizer(ngram_range=(2, 2))
X = vectorizer.fit_transform(text_data)
print(vectorizer.get_feature_names_out())
Output:
['is difficult' 'is fun' 'learning is' 'love nlp' 'machine learning' 'nlp is']
Note that get_feature_names_out returns the features in sorted order, and CountVectorizer's default tokenizer drops single-character tokens, so "i" (and therefore the bigram "i love") does not appear.
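To go from bigram features to an actual sentiment classifier, a minimal sketch could pair the vectorizer with a linear model; the labels below are hypothetical and the three sentences are far too few for a realistic model:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labels for the example sentences: 1 = positive, 0 = negative
train_texts = ["I love NLP", "NLP is difficult", "Machine learning is fun"]
train_labels = [1, 0, 1]

# Unigrams plus bigrams usually work better than bigrams alone on tiny data
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

print(clf.predict(["I love machine learning"]))  # expected to lean positive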