Natural Language Processing
Teaching machines to understand, interpret, and generate human language.
The NLP Challenge
Natural language is ambiguous, context-dependent, and infinitely variable. Unlike structured data, text requires specialized techniques to convert words into numerical representations that neural networks can process.
Word Representations
One-Hot Encoding
Represents each word as a binary vector with a 1 at that word's index and 0s everywhere else. Simple, but inefficient (each vector is the size of the vocabulary) and captures no semantic relationships.
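As a quick illustration, here is a minimal one-hot encoder for a toy vocabulary (the vocabulary and function name are invented for this sketch):

```python
# Minimal one-hot encoding sketch for a tiny, made-up vocabulary.
import numpy as np

vocab = ["cat", "dog", "sat", "on", "the", "mat"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Return a vocabulary-sized binary vector with a 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("cat"))                           # [1. 0. 0. 0. 0. 0.]
print(np.dot(one_hot("cat"), one_hot("dog")))   # 0.0 -- no similarity captured
```

Note that every pair of distinct words is equally dissimilar under this scheme, which is exactly the limitation that embeddings address.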
Word Embeddings
Dense vector representations (typically 100-300 dimensions) that capture semantic relationships: similar words end up with similar vectors. A revolutionary breakthrough for NLP.
- Word2Vec (2013): Skip-gram and CBOW models; learns from local word co-occurrence (sketched below)
- GloVe (2014): Global vectors. Uses matrix factorization of co-occurrence statistics
- FastText (2016): Subword embeddings. Handles out-of-vocabulary words
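Below is a minimal sketch of training skip-gram embeddings with gensim's Word2Vec (assuming gensim 4.x; the toy corpus is invented for illustration):

```python
# Hedged sketch: training skip-gram word embeddings with gensim (4.x API).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # dimensionality of the dense embeddings
    window=5,          # context window used for co-occurrence
    min_count=1,       # keep every word in this tiny corpus
    sg=1,              # 1 = skip-gram, 0 = CBOW
)

print(model.wv["cat"].shape)          # (100,)
print(model.wv.most_similar("cat"))   # nearest neighbours by cosine similarity
```

With a realistic corpus, words that appear in similar contexts (e.g. "cat" and "dog") end up close together in the embedding space.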
Contextual Embeddings
Modern embeddings (ELMo, BERT) generate different vectors for the same word based on context. "Bank" in "river bank" vs "savings bank" gets different representations.
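A hedged sketch of this effect using Hugging Face's transformers library with bert-base-uncased (the sentences and helper name are invented; "bank" is a single wordpiece in this vocabulary):

```python
# Sketch: the same word gets different contextual vectors from BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence: str) -> torch.Tensor:
    """Return BERT's hidden state for the token 'bank' in the given sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return hidden[tokens.index("bank")]

v_river = bank_vector("she sat on the river bank")
v_money = bank_vector("he opened a savings bank account")
print(torch.nn.functional.cosine_similarity(v_river, v_money, dim=0).item())
# noticeably below 1.0: same surface word, different context-dependent vectors
```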
Sequence-to-Sequence Models
Many NLP tasks involve transforming one sequence into another: translation, summarization, question answering. Seq2seq models use an encoder-decoder architecture.
Architecture
- Encoder: Processes input sequence, creates context vector (RNN/LSTM)
- Context Vector: Fixed-size representation of input
- Decoder: Generates output sequence using context vector
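A minimal sketch of this encoder-decoder setup, without attention, assuming PyTorch; the vocabulary size, dimensions, and class layout are illustrative rather than a production recipe:

```python
# Sketch of an RNN encoder-decoder: the encoder squeezes the whole source
# sequence into one fixed-size hidden state, which conditions the decoder.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) of token ids
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden           # hidden acts as the context vector

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tgt, hidden):
        # tgt: (batch, 1) -- one target token per step, conditioned on the context
        output, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(output), hidden

# Toy forward pass for one decoding step (token id 0 stands in for a start token).
enc, dec = Encoder(vocab_size=1000), Decoder(vocab_size=1000)
src = torch.randint(0, 1000, (2, 7))                 # batch of 2 length-7 sources
_, context = enc(src)
logits, _ = dec(torch.zeros(2, 1, dtype=torch.long), context)
print(logits.shape)                                  # torch.Size([2, 1, 1000])
```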
Problem: the fixed-size context vector becomes an information bottleneck for long sequences. Solution: the attention mechanism.
Attention Mechanism
Attention allows the decoder to "look at" different parts of the input at each step. Instead of compressing everything into one context vector, the model learns where to focus.
How It Works
- Compute attention scores between decoder state and all encoder states
- Apply softmax to get attention weights (probabilities)
- Compute weighted sum of encoder states (context vector)
- Use context vector for prediction
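In code, one decoder step of simple dot-product attention looks roughly like the sketch below (shapes and tensor names are invented; real models typically learn a scoring function rather than using a raw dot product):

```python
# Sketch of dot-product attention for a single decoder step.
import torch
import torch.nn.functional as F

encoder_states = torch.randn(7, 128)   # one 128-dim state per source token
decoder_state = torch.randn(128)       # current decoder hidden state

scores = encoder_states @ decoder_state   # 1. attention scores, shape (7,)
weights = F.softmax(scores, dim=0)        # 2. attention weights sum to 1
context = weights @ encoder_states        # 3. weighted sum of encoder states
# 4. `context` (shape (128,)) is combined with the decoder state
#    (e.g. concatenated and passed through a layer) to predict the next token.
print(weights.sum().item(), context.shape)   # 1.0  torch.Size([128])
```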
This breakthrough enabled transformers and revolutionized NLP. "Attention Is All You Need" (2017) showed attention alone, without RNNs, could achieve state-of-the-art results.
Common NLP Tasks
Text Classification
Sentiment analysis, topic categorization, spam detection
Named Entity Recognition
Identify and classify entities (people, organizations, locations)
Machine Translation
Translate text between languages
Question Answering
Extract or generate answers from context
Summarization
Generate concise summaries of longer texts
Text Generation
Create coherent text, dialogue, stories
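As a concrete taste of one of these tasks, a pretrained sentiment-analysis pipeline from the transformers library can be used off the shelf (the default model it downloads and the exact scores depend on the installed version):

```python
# Hedged example: text classification with a pretrained pipeline.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier("The movie was surprisingly good."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```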
Key Takeaways
- Word embeddings capture semantic relationships in dense vectors
- Contextual embeddings provide word representations based on context
- Attention mechanisms allow models to focus on relevant input parts
- Transformers replaced RNNs as the dominant NLP architecture