Shakespearean Text Generation with Transformer


A comprehensive, from-scratch implementation of the Transformer architecture for generating Shakespearean text, built with TensorFlow and designed for deep learning research and creative AI.


🚀 Project Overview

This project implements a complete Transformer model from scratch for word-level language modeling and text generation in the style of William Shakespeare. It is designed as both an educational resource and a showcase of advanced deep learning engineering, following the "Attention Is All You Need" paper (Vaswani et al., 2017).

  • Full Transformer architecture (encoder-decoder, multi-head attention, positional encoding, etc.)
  • Word-level modeling for rich, context-aware text generation
  • Custom training pipeline with data augmentation, label smoothing, and advanced learning rate scheduling
  • Extensive documentation and code comments for learning and reproducibility
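The data augmentation mentioned above (word dropout and random swaps) can be sketched in a few lines. The function name and probabilities here are illustrative, not the notebook's exact code:

```python
import random

def augment(words, dropout_p=0.1, n_swaps=1, rng=None):
    """Word dropout: drop each word with probability dropout_p;
    random swap: exchange the positions of n_swaps word pairs."""
    rng = rng if rng is not None else random.Random()
    # Keep at least one word so the sequence never collapses to nothing.
    kept = [w for w in words if rng.random() > dropout_p] or list(words[:1])
    for _ in range(n_swaps):
        if len(kept) < 2:
            break
        i, j = rng.randrange(len(kept)), rng.randrange(len(kept))
        kept[i], kept[j] = kept[j], kept[i]
    return kept
```

Both perturbations preserve the vocabulary of the input while forcing the model to tolerate slightly corrupted context.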

📂 Directory Structure

β”œβ”€β”€ transformer.ipynb      # Main Jupyter notebook (full implementation & experiments)
β”œβ”€β”€ requirements.txt       # Project dependencies
β”œβ”€β”€ README.md              # Project documentation
β”œβ”€β”€ .gitignore             # Git ignore rules
β”œβ”€β”€ data/                  # Processed data, vocabularies, and numpy arrays (auto-generated)
β”œβ”€β”€ checkpoints/           # Model checkpoints (auto-generated)
β”œβ”€β”€ saved_model/           # Saved trained models (auto-generated)
└── LICENSE                # MIT License

🧑‍💻 Quick Start

1. Clone the Repository

git clone https://github.com/EvanGks/shakespeare-text-generation-transformer.git
cd shakespeare-text-generation-transformer

2. Set Up the Environment

It is recommended to use a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

3. Run the Notebook

Open transformer.ipynb in Jupyter and run all cells sequentially:

jupyter notebook transformer.ipynb

📊 Features

  • Complete Transformer implementation (no high-level Keras API for core logic)
  • Word-level language modeling for richer context
  • Custom learning rate scheduler with warmup
  • Advanced text generation (temperature, top-k sampling)
  • Data augmentation (word dropout, random swaps)
  • Label smoothing for better generalization
  • Early stopping & checkpointing
  • Comprehensive evaluation (accuracy, perplexity, qualitative samples)
  • Visualization of training metrics
  • Well-documented, modular code
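The custom warmup schedule listed above follows the formula from "Attention Is All You Need". A minimal sketch using this project's embedding dimension (the warmup_steps value of 4000 is the paper's default and an assumption here — check the notebook for the actual setting):

```python
def transformer_lr(step, d_model=192, warmup_steps=4000):
    """lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5):
    linear warmup for warmup_steps steps, then inverse-sqrt decay."""
    step = max(step, 1)  # guard against division by zero at step 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)
```

At step == warmup_steps the two terms meet, which is where the learning rate peaks before decaying.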

πŸ—οΈ Model Architecture

  • Encoder-Decoder Transformer
  • Multi-head self-attention
  • Position-wise feed-forward networks
  • Sinusoidal positional encoding
  • Dropout and label smoothing
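Label smoothing replaces each one-hot target with a softened distribution so the model is not pushed toward overconfident predictions. A small sketch (the smoothing value of 0.1 is illustrative, not necessarily the notebook's setting):

```python
import numpy as np

def smooth_labels(target_idx, vocab_size, smoothing=0.1):
    """Soften a one-hot target: the true word gets 1 - smoothing,
    and the remaining mass is spread evenly over the other words."""
    dist = np.full(vocab_size, smoothing / (vocab_size - 1))
    dist[target_idx] = 1.0 - smoothing
    return dist
```

Training against this softened target with cross-entropy typically improves generalization and calibration at a small cost in raw training accuracy.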

Key Hyperparameters:

  • Layers: 3
  • Embedding dim: 192
  • Attention heads: 6
  • Feed-forward dim: 768
  • Dropout: 0.3

Architecture Diagram:

+-------------------------------------------------------------+
|                    TRANSFORMER ARCHITECTURE                 |
|-------------------------------------------------------------|
|                                                             |
|  Input Sequence (word indices)                              |
|         |                                                   |
|         v                                                   |
|  +-------------------+                                      |
|  | Embedding Layer   |  (dim: 192)                          |
|  +-------------------+                                      |
|         |                                                   |
|  +---------------------------+                              |
|  | Positional Encoding       |  (sinusoidal, max len: 5000) |
|  +---------------------------+                              |
|         |                                                   |
|         v                                                   |
|  +-------------------------------------------------------+  |
|  |                   ENCODER STACK (3 layers)            |  |
|  |  - Multi-Head Self-Attention (6 heads, dim: 192)      |  |
|  |  - Feed-Forward Network (dim: 768)                    |  |
|  |  - Dropout: 0.3, LayerNorm, Residual                  |  |
|  +-------------------------------------------------------+  |
|         |                                                   |
|         v                                                   |
|  +-------------------------------------------------------+  |
|  |                   DECODER STACK (3 layers)            |  |
|  |  - Masked Multi-Head Self-Attention (6 heads, 192)    |  |
|  |  - Encoder-Decoder Attention (6 heads, 192)           |  |
|  |  - Feed-Forward Network (dim: 768)                    |  |
|  |  - Dropout: 0.3, LayerNorm, Residual                  |  |
|  +-------------------------------------------------------+  |
|         |                                                   |
|         v                                                   |
|  +-------------------+                                      |
|  | Linear Projection |  (to vocab size)                     |
|  +-------------------+                                      |
|         |                                                   |
|         v                                                   |
|  Output Sequence (predicted word indices)                   |
+-------------------------------------------------------------+

A high-level overview of the Transformer model used in this project.
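The sinusoidal positional encoding in the diagram can be computed directly from the formulas in Vaswani et al. (2017); a NumPy sketch matching the dimensions above:

```python
import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))"""
    pos = np.arange(max_len)[:, np.newaxis]   # (max_len, 1)
    i = np.arange(d_model)[np.newaxis, :]     # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions
    return pe
```

Because the encoding is a fixed function of position, it adds no trainable parameters and extrapolates to any position up to max_len.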


📈 Results

  • Training accuracy: ~92%
  • Validation accuracy: ~86%
  • Coherent Shakespearean text generation
  • Captures character dialogue and style

Example Outputs:

HAMLET: To be, or not to be, that is the question
ROMEO: But soft, what light through yonder window breaks
MACBETH: Is this a dagger which I see before me

📥 Dataset

  • Source: Tiny Shakespeare (by Andrej Karpathy)
  • ~1 million characters from Shakespeare's plays and sonnets
  • Includes dialogue, stage directions, and scene descriptions

πŸ› οΈ Usage Example

Generate text after training:

from transformer import generate_text  # If you modularize the code

sample = generate_text(
    model=word_transformer,
    start_string="HAMLET:",
    word2idx=word2idx,
    idx2word=idx2word,
    generation_length=30,
    temperature=0.7
)
print(sample)
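Temperature and top-k sampling, listed under Features, reshape the model's next-word distribution before a word is drawn. A minimal NumPy sketch of the sampling step (function and argument names are illustrative, not the notebook's exact API):

```python
import numpy as np

def sample_next_word(logits, temperature=0.7, top_k=40, rng=None):
    """Scale logits by 1/temperature, keep only the top_k most likely
    words, renormalize, and draw one word index at random."""
    rng = rng if rng is not None else np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    top = np.argsort(logits)[-top_k:]                # indices of the k largest logits
    probs = np.exp(logits[top] - logits[top].max())  # numerically stable softmax
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))
```

Lower temperatures sharpen the distribution toward the most likely words (more conservative text); higher temperatures flatten it (more surprising text); top-k zeroes out the long tail of unlikely words either way.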

🧩 Contributing

Contributions, issues, and feature requests are welcome! Please open an issue or submit a pull request.

  • Fork the repository
  • Create a feature branch (git checkout -b feature/your-feature)
  • Commit your changes (git commit -m 'feat: add new feature')
  • Push to the branch (git push origin feature/your-feature)
  • Open a Pull Request

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.


πŸ™ Acknowledgments


🌟 Why This Project?

This project demonstrates advanced deep learning engineering, from data preprocessing to custom model implementation and evaluation. It is designed to showcase:

  • Mastery of modern NLP architectures
  • Ability to build complex models from scratch
  • Best practices in code, documentation, and reproducibility
  • A passion for both research and creative AI

📈 Live Results

You can view the notebook with all outputs and results on Kaggle: https://www.kaggle.com/code/evangelosgakias/transformer-nlp-tensorflow


📬 Contact

For questions or feedback, please open an issue on the GitHub repository.


Happy Coding! 🚀
