Learning English can be done in many ways, and not everyone wants to rely solely on language learning apps. Some prefer to access vocabulary datasets and integrate them into their own tools, such as Anki.
To support those looking for a structured TOEIC 600 words dataset and to enhance my data scraping skills, I created this project.
A huge thank you to the TFLAT team for their dedication in creating high-quality English learning content and applications. This project uses their resources from the TFLAT Blog to compile and structure TOEIC vocabulary data.
- Programming Language: Python
- Libraries: requests, beautifulsoup4, re, numpy, pandas
- Vocabulary dataset available in Excel and CSV formats.
- Images categorized by topic.
- Audio files grouped by topic for better accessibility.
- Total words: 615
- Total topics: 50
- Min-Max words per topic: 12-13
- Noun (n.): 41.67%
- Verb (v.): 37.42%
- Adjective (adj.): 13.24%
- Adverb (adv.): 6.37%
- Noun, Verb (n, v.): 0.33%
- Preposition (prep.): 0.16%
- Verb, Noun (v, n.): 0.16%
- Phrasal Verb (phr.v.): 0.16%
- Noun Phrase (n.ph.): 0.16%
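
A breakdown like the one above can be recomputed from the vocabulary rows with a few lines of Python. The following is a minimal sketch, not the project's actual code: the field names `word` and `pos` are assumptions (check the real column headers in the CSV), and the sample rows are illustrative stand-ins rather than an excerpt of the dataset.

```python
from collections import Counter


def pos_distribution(rows, pos_key="pos"):
    """Return {tag: percentage}, rounded to 2 decimals, most frequent first."""
    counts = Counter(row[pos_key].strip() for row in rows)
    total = sum(counts.values())
    return {tag: round(100 * n / total, 2) for tag, n in counts.most_common()}


# Illustrative rows only; "word" and "pos" are hypothetical field names.
sample = [
    {"word": "agreement", "pos": "n."},
    {"word": "assurance", "pos": "n."},
    {"word": "cancel", "pos": "v."},
    {"word": "specific", "pos": "adj."},
]

distribution = pos_distribution(sample)
print(distribution)
```

To run this on the real file, the rows could be loaded with `csv.DictReader` from the standard library and passed to `pos_distribution` with the matching column name.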
This dataset is perfect for learners who want to:
- Build their own study materials using tools like Anki.
- Explore TOEIC vocabulary in an organized and structured way.
- Access images and audio for better memorization.
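
For the Anki use case, one possible approach is to convert the rows into a tab-separated text file, which Anki's text import accepts. This is a rough sketch under assumptions: the field names `word`, `pos`, and `meaning` are hypothetical and should be replaced with the dataset's real column names, and the sample rows are made up for illustration.

```python
import csv
import io


def to_anki_tsv(rows, fields=("word", "pos", "meaning")):
    """Serialize rows as tab-separated text, one note per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[field] for field in fields])
    return buffer.getvalue()


# Illustrative rows only; the field names are assumptions.
sample = [
    {"word": "agreement", "pos": "n.", "meaning": "a mutual arrangement"},
    {"word": "cancel", "pos": "v.", "meaning": "to call off"},
]

anki_text = to_anki_tsv(sample)
print(anki_text)
```

Writing `anki_text` to a `.txt` file with UTF-8 encoding gives a file that can be mapped to note fields in Anki's import dialog. Images and audio would still need to be copied into the Anki media folder separately.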
If this project helps you, feel free to share it with others who might benefit!
