A dynamic N-gram Markov Chain text generator with efficient, parallelized NLP preprocessing.
This project demonstrates core concepts in Natural Language Processing (NLP) and probabilistic modeling:
- Parallelized reading and cleaning of multiple text files (see the preprocessing sketch below)
- Tokenization that preserves sentence boundaries
- Dynamic Unigram, Bigram, and Trigram model construction
- Weighted word prediction based on real transition probabilities
- Side-by-side comparison of the different N-gram models
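The preprocessing stage can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the function names, the tokenizing regex, and the `data/book*.txt` paths are all placeholders.

```python
import re
from multiprocessing import Pool
from pathlib import Path

def read_and_tokenize(path):
    """Read one file, lowercase it, and tokenize, keeping sentence-ending
    punctuation (. ! ?) as separate tokens so sentence boundaries survive."""
    text = Path(path).read_text(encoding="utf-8").lower()
    return re.findall(r"[a-z']+|[.!?]", text)

def preprocess_corpus(paths, workers=4):
    """Clean and tokenize many files in parallel, one task per file."""
    with Pool(processes=workers) as pool:
        return pool.map(read_and_tokenize, paths)

if __name__ == "__main__":
    # Placeholder file names; point this at your own corpus.
    token_lists = preprocess_corpus(["data/book1.txt", "data/book2.txt"])
```

One task per file keeps the workers independent, so cleaning a large corpus scales with the number of cores.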
A Markov Chain predicts the next step based only on the current state, not the full history.
In text modeling:
- Unigram: the next word depends on the previous 1 word.
- Bigram: the next word depends on the previous 2 words.
- Trigram: the next word depends on the previous 3 words.
Using more context makes the generated text more coherent, at the cost of needing more data to observe each state. This project uses weighted chains, counting how often each transition occurs in the corpus, so generation reflects real-world transition probabilities; the sketch below shows the idea.
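Concretely, the weighted chain is just a mapping from each state (a tuple of context words) to a counter of the words seen to follow it. A minimal sketch, assuming a flat token list from preprocessing; `build_chain` and its `context` parameter are illustrative names, not the project's API:

```python
from collections import Counter, defaultdict

def build_chain(tokens, context=2):
    """Map each state (a tuple of `context` consecutive words) to a
    Counter of the words observed to follow it; the counts become the
    sampling weights. context=1, 2, 3 correspond to the Unigram,
    Bigram, and Trigram chains described above."""
    chain = defaultdict(Counter)
    for i in range(len(tokens) - context):
        state = tuple(tokens[i:i + context])
        chain[state][tokens[i + context]] += 1
    return chain

# "the cat sat on the mat" with context=2 yields, for example:
# ('the', 'cat') -> Counter({'sat': 1}), ('cat', 'sat') -> Counter({'on': 1})
```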
```
markov-text-generator/
├── src/              # Code
├── data/             # Input texts
├── examples/         # Sample outputs
├── .gitignore
├── requirements.txt
├── LICENSE
└── README.md
```
```bash
git clone https://github.com/yourusername/markov-text-generator.git
cd markov-text-generator
pip install -r requirements.txt
cd src
python markov_text_generator.py
```

Starting word: "the"
- Unigram: "the forest was deep and the sun rose above the old hills..."
- Bigram: "the forest was silent as the sun rose over the eastern hills..."
- Trigram: "the forest was silent as dawn broke over the misty hills..."
Notice how the text becomes more natural from Unigram to Trigram as each step conditions on more context; the generation loop sketched below shows how such samples are drawn.
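For intuition about where these samples come from, here is a minimal generation loop over such a chain, reusing the hypothetical `build_chain` from earlier; `random.choices` performs the frequency-weighted draw:

```python
import random

def generate(chain, start_state, length=50):
    """Walk the chain, drawing each next word in proportion to how often
    it followed the current state in the training corpus."""
    state = tuple(start_state)
    words = list(state)
    for _ in range(length):
        counter = chain.get(state)
        if not counter:          # state never seen in training: stop
            break
        choices = list(counter.keys())
        weights = list(counter.values())
        next_word = random.choices(choices, weights=weights, k=1)[0]
        words.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(words)

# e.g. generate(build_chain(tokens, context=2), ("the", "forest"))
```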
- Text preprocessing and tokenization
- Markov Chain modeling
- Weighted sampling
- Parallel file processing
- Clean, modular Python code
- Character-level Markov generation
- Temperature control for creative text (sketched below)
- Save/load trained models
- Interactive Streamlit app
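Temperature control is not implemented yet; one common way to add it would be to raise each transition count to the power 1/T before sampling, as in this sketch (names are illustrative):

```python
import random

def sample_with_temperature(counter, temperature=1.0):
    """Sharpen (T < 1) or flatten (T > 1) the transition distribution
    before drawing the next word."""
    words = list(counter.keys())
    weights = [c ** (1.0 / temperature) for c in counter.values()]
    return random.choices(words, weights=weights, k=1)[0]
```

T = 1 reproduces plain weighted sampling; lower values favor common transitions, higher values make the output more adventurous.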
Through this project, I:
- Gained intuition about Markov Chains and N-gram context.
- Practiced weighted sampling for realistic text generation.
- Learned parallel processing for faster data handling.
- Designed scalable, modular code for NLP tasks.
Licensed under the MIT License.