
πŸ’¬ NLP Projects


A collection of Natural Language Processing projects demonstrating expertise in Text Generation, Sequence Modeling, and Language Understanding using TensorFlow, NLTK, and modern NLP techniques.


πŸ“‹ Table of Contents

  • Projects Overview
  • Technologies Used
  • Installation
  • Project Details
  • Key NLP Concepts Demonstrated
  • Results
  • Learning Outcomes
  • Contact
  • License
  • Acknowledgments

πŸš€ Projects Overview

| # | Project | Task | Notebook | Technique |
|---|---------|------|----------|-----------|
| 1 | Text Generator | Language Modeling | 01_text_generator.ipynb | RNN/LSTM Sequence Generation |
| 2 | NLP Final Project | Comprehensive NLP | 02_nlp_final_project.ipynb | Multiple NLP Tasks |

πŸ› οΈ Technologies Used

Core NLP Libraries

  • TensorFlow/Keras - Deep learning for NLP
  • NLTK - Natural Language Toolkit
  • spaCy - Industrial-strength NLP
  • Hugging Face Transformers - State-of-the-art pre-trained models (optional)

Text Processing

  • Tokenization - Word and sentence splitting
  • Lemmatization & Stemming - Word normalization
  • Stop Words Removal - Text cleaning
  • Word Embeddings - Word2Vec, GloVe

Deep Learning for NLP

  • RNNs - Recurrent Neural Networks
  • LSTMs - Long Short-Term Memory
  • GRUs - Gated Recurrent Units
  • Attention Mechanisms - Focus on relevant parts

πŸ“¦ Installation

Prerequisites

  • Python 3.8 or higher

Setup Instructions

  1. Clone the repository

    git clone https://github.com/uzi-gpu/nlp-projects.git
    cd nlp-projects
  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -r requirements.txt
  4. Download NLTK data (if needed)

    import nltk
    nltk.download('punkt')
    nltk.download('stopwords')
    nltk.download('wordnet')
  5. Launch Jupyter Notebook

    jupyter notebook

πŸ“Š Project Details

1. πŸ“ Text Generator

File: 01_text_generator.ipynb

Objective: Build a character-level or word-level text generator using Recurrent Neural Networks

Task: Language Modeling & Text Generation

Architecture:

  • Input: Sequences of characters/words
  • Model: LSTM/GRU layers
  • Output: Next character/word prediction

Implementation:

1. Data Preprocessing:

  • βœ… Text corpus loading
  • βœ… Tokenization (character or word-level)
  • βœ… Sequence creation
  • βœ… Vocabulary building
  • βœ… One-hot encoding or embeddings
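
A minimal sketch of these preprocessing steps for the character-level case (the corpus filename and window length are illustrative assumptions, not taken from the notebook):

```python
# Character-level preprocessing sketch; "corpus.txt" and seq_len are
# illustrative assumptions.
text = open("corpus.txt", encoding="utf-8").read().lower()

# Vocabulary building: map each unique character to an integer id.
chars = sorted(set(text))
char2idx = {c: i for i, c in enumerate(chars)}

# Sequence creation: sliding windows of seq_len characters, each
# predicting the character that follows.
seq_len = 40
X, y = [], []
for i in range(len(text) - seq_len):
    X.append([char2idx[c] for c in text[i:i + seq_len]])
    y.append(char2idx[text[i + seq_len]])
```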

2. Model Architecture:

Model: Sequential
β”œβ”€β”€ Embedding Layer (word-level) OR Input Layer (char-level)
β”œβ”€β”€ LSTM/GRU Layers (stacked)
β”œβ”€β”€ Dropout (regularization)
β”œβ”€β”€ Dense Layer
└── Softmax (probability distribution)
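
One way to realize this stack in Keras (layer sizes are assumptions for illustration, not the notebook's exact configuration):

```python
import tensorflow as tf

vocab_size = len(chars)  # from the preprocessing step above

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),          # learned embeddings
    tf.keras.layers.LSTM(128, return_sequences=True),   # stacked LSTMs
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dropout(0.2),                       # regularization
    tf.keras.layers.Dense(vocab_size, activation="softmax"),
])
```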

3. Training:

  • βœ… Teacher forcing
  • βœ… Cross-entropy loss
  • βœ… Adam optimizer
  • βœ… Perplexity tracking
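
A compile-and-fit sketch matching the list above (batch size and epoch count are assumptions). Because the input windows are always ground-truth text rather than the model's own predictions, this next-token setup is teacher forcing; perplexity is the exponential of the cross-entropy loss:

```python
import numpy as np

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
history = model.fit(np.array(X), np.array(y), batch_size=128, epochs=20)

# Perplexity tracking: perplexity = exp(cross-entropy).
print("perplexity:", np.exp(history.history["loss"][-1]))
```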

4. Text Generation:

  • βœ… Seed text input
  • βœ… Sampling strategies (greedy, temperature, top-k)
  • βœ… Beam search (optional)
  • βœ… Diverse output generation
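
Temperature sampling, the central strategy here, fits in a few lines (greedy decoding is the low-temperature limit; top-k would zero out all but the k largest probabilities first):

```python
import numpy as np

def sample_with_temperature(probs, temperature=1.0):
    """Sample an index from a model's softmax output (1-D array).
    Low temperature -> near-greedy; high temperature -> more diverse."""
    logits = np.log(probs + 1e-9) / temperature
    scaled = np.exp(logits - logits.max())   # numerically stable softmax
    scaled /= scaled.sum()
    return np.random.choice(len(scaled), p=scaled)
```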

Key Features:

  • Character-level generation for creative text
  • Word-level generation for coherent sentences
  • Temperature-controlled creativity
  • Sequence padding and batching

Applications:

  • Creative writing assistance
  • Code generation
  • Poetry/story generation
  • Chatbot responses

2. πŸŽ“ NLP Final Project

File: 02_nlp_final_project.ipynb

Objective: Comprehensive NLP project covering multiple language processing tasks

Tasks Covered:

1. Text Preprocessing Pipeline:

  • βœ… Tokenization
  • βœ… Lowercasing
  • βœ… Stop words removal
  • βœ… Punctuation handling
  • βœ… Lemmatization/Stemming
  • βœ… Text normalization
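
A compact NLTK version of this pipeline (the sample sentence is made up; requires the punkt, stopwords, and wordnet downloads from the installation steps):

```python
import string
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

text = "The striped bats are hanging on their feet!"
tokens = nltk.word_tokenize(text.lower())                    # tokenize + lowercase
tokens = [t for t in tokens if t not in string.punctuation]  # punctuation handling
stops = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stops]               # stop-word removal
lemmatizer = WordNetLemmatizer()
print([lemmatizer.lemmatize(t) for t in tokens])             # lemmatization
# ['striped', 'bat', 'hanging', 'foot']
```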

2. Feature Extraction:

  • βœ… Bag of Words (BoW)
  • βœ… TF-IDF (Term Frequency-Inverse Document Frequency)
  • βœ… N-grams
  • βœ… Word embeddings (Word2Vec, GloVe)
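
For reference, BoW, n-grams, and TF-IDF in scikit-learn (assuming scikit-learn is installed; the toy corpus is illustrative):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = ["the cat sat on the mat", "the dog sat on the log"]

bow = CountVectorizer(ngram_range=(1, 2))   # unigram + bigram counts
X_bow = bow.fit_transform(docs)

tfidf = TfidfVectorizer()                   # counts reweighted by rarity
X_tfidf = tfidf.fit_transform(docs)

print(X_bow.shape, X_tfidf.shape)
```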

3. NLP Tasks:

  • Text Classification
  • Sentiment Analysis
  • Named Entity Recognition (NER)
  • Part-of-Speech (POS) Tagging
  • Text Summarization
  • Language Translation (if applicable)
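
As one concrete example from this list, sentiment analysis can be run with NLTK's VADER analyzer (requires the vader_lexicon download; the sentence is made up):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon")
sia = SentimentIntensityAnalyzer()
print(sia.polarity_scores("This movie was surprisingly good!"))
# {'neg': 0.0, 'neu': ..., 'pos': ..., 'compound': ...}
```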

4. Advanced Techniques:

  • βœ… Sequence-to-Sequence models
  • βœ… Attention mechanisms
  • βœ… Transfer learning with pre-trained models
  • βœ… Fine-tuning BERT/GPT (optional)

Pipeline:

Raw Text β†’ Preprocessing β†’ Feature Extraction β†’ Model Training β†’ Evaluation β†’ Deployment

Evaluation Metrics:

  • Classification: Accuracy, Precision, Recall, F1-Score
  • Generation: BLEU, ROUGE, Perplexity
  • NER: Entity-level F1
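
For instance, sentence-level BLEU with NLTK (toy reference/hypothesis pair; smoothing keeps short sentences from scoring zero):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sat", "on", "the", "mat"]]
hypothesis = ["the", "cat", "is", "on", "the", "mat"]

score = sentence_bleu(reference, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU = {score:.3f}")
```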

πŸ“š Key NLP Concepts Demonstrated

Text Preprocessing

  1. Tokenization - Breaking text into words/sentences
  2. Normalization - Lowercasing, stemming, lemmatization
  3. Stop Words - Removing common words
  4. Special Characters - Cleaning punctuation

Feature Engineering

  1. Bag of Words - Simple word frequency
  2. TF-IDF - Term importance weighting
  3. Word Embeddings - Dense vector representations
  4. Contextual Embeddings - BERT, ELMo

Sequence Modeling

  1. RNNs - Recurrent architectures
  2. LSTMs - Long-term dependencies
  3. GRUs - Gated mechanisms
  4. Bidirectional RNNs - Context from both directions
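
The bidirectional idea in one Keras call (shapes are illustrative): the wrapper runs the LSTM forward and backward and concatenates both hidden states, so each timestep sees context from both sides:

```python
import tensorflow as tf

x = tf.random.normal((1, 10, 32))  # (batch, timesteps, features)
bi = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(64, return_sequences=True)
)
print(bi(x).shape)  # (1, 10, 128): forward and backward states concatenated
```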

Advanced NLP

  1. Attention Mechanisms - Focus on relevant parts
  2. Transformer Architecture - Self-attention
  3. Transfer Learning - Pre-trained models
  4. Fine-tuning - Task-specific adaptation
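
At the core of items 1-2 is scaled dot-product attention, which also fits in a few lines (shapes are illustrative):

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)
    return tf.matmul(tf.nn.softmax(scores, axis=-1), v)

q = k = v = tf.random.normal((1, 5, 16))
print(scaled_dot_product_attention(q, k, v).shape)  # (1, 5, 16)
```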

πŸ† Results

Text Generator

  • Perplexity: Achieves low perplexity, indicating effective language modeling
  • Coherence: Generated text exhibits grammatical structure
  • Creativity: The temperature parameter controls output diversity
  • Quality: Generated sequences maintain context over longer spans

NLP Final Project

  • Classification Accuracy: High performance on text classification tasks
  • Feature Engineering: TF-IDF features outperform plain Bag-of-Words counts
  • Model Comparison: Deep learning models excel on complex tasks
  • Pipeline: End-to-end NLP workflow successfully implemented

πŸŽ“ Learning Outcomes

Through these projects, I have demonstrated proficiency in:

  1. NLP Fundamentals

    • Text preprocessing and cleaning
    • Tokenization strategies
    • Feature extraction techniques
    • Vocabulary management
  2. Deep Learning for NLP

    • Recurrent architectures (RNN, LSTM, GRU)
    • Sequence-to-sequence models
    • Attention mechanisms
    • Loss functions for language tasks
  3. Practical NLP

    • Data pipeline creation
    • Model training and evaluation
    • Text generation strategies
    • Real-world application development
  4. Advanced Topics

    • Transfer learning in NLP
    • Word embeddings
    • Language modeling
    • Evaluation metrics (BLEU, perplexity)

πŸ“§ Contact

Uzair Mubasher - BSAI Graduate

LinkedIn Email GitHub


πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • NLTK and spaCy communities
  • TensorFlow/Keras documentation
  • NLP course instructors and resources

⭐ If you found this repository helpful, please consider giving it a star!
