A backend automation script that authenticates with Gmail using OAuth 2.0, fetches unread emails, extracts relevant information, and appends the data to a Google Sheet with idempotent execution and persistent state management.
MailParser/
├── credentials/
│   └── credentials.json            # OAuth client credentials (not in repo)
├── logs/
│   └── mailparser_2026-01-14.log   # Application logs (auto-generated)
├── src/
│   ├── store/
│   │   ├── __init__.py
│   │   ├── email_store.py          # Email deduplication logic
│   │   └── token_store.py          # OAuth token persistence
│   ├── __init__.py
│   ├── email_parser.py             # Email content extraction
│   ├── gmail_service.py            # Gmail API interactions
│   ├── logger.py                   # Logging configuration
│   ├── main.py                     # Entry point & orchestration
│   ├── oauth.py                    # OAuth 2.0 flow handler
│   ├── sheets_service.py           # Google Sheets API wrapper
│   ├── test.py                     # Unit tests
│   └── utils.py                    # Utility functions
├── venv1/                          # Virtual environment
├── .env                            # Environment variables (not in repo)
├── .gitignore
├── config.py                       # Configuration loader
├── README.md
├── requirements.txt                # Python dependencies
└── state.db                        # SQLite database (generated at runtime)
Flow:
- main.py orchestrates the entire process
- oauth.py handles authentication and stores tokens via token_store.py
- gmail_service.py fetches unread emails from the Gmail API and marks them as read after processing
- email_parser.py extracts sender, subject, body, and timestamp
- email_store.py checks whether an email was already processed (by message ID)
- sheets_service.py appends new emails to the Google Sheet
- logger.py manages application logging to the logs/ directory
- All state persists in the state.db SQLite database
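
To make the extraction step concrete, here is a minimal sketch of how sender, subject, body, and timestamp could be pulled out of a message returned by the Gmail API. It assumes the message was fetched in the API's full format (a `payload` with `headers`, a base64url-encoded `body.data`, and `internalDate` in milliseconds). The function names `parse_message` and `_decode` are hypothetical and are not taken from email_parser.py.

```python
import base64
from datetime import datetime, timezone


def _decode(data: str) -> str:
    # Gmail uses URL-safe base64; restore any stripped padding before decoding.
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def parse_message(message: dict) -> dict:
    """Extract sender, subject, body, and timestamp from a Gmail API message dict."""
    payload = message.get("payload", {})
    # Headers arrive as a list of {"name": ..., "value": ...} pairs.
    headers = {h["name"].lower(): h["value"] for h in payload.get("headers", [])}

    # Single-part messages carry the text in payload.body.data.
    # For multipart messages this sketch only looks at the first text/plain part.
    data = payload.get("body", {}).get("data")
    if data is None:
        for part in payload.get("parts", []):
            if part.get("mimeType") == "text/plain":
                data = part.get("body", {}).get("data")
                break

    # internalDate is epoch time in milliseconds, serialized as a string.
    received = datetime.fromtimestamp(int(message["internalDate"]) / 1000, tz=timezone.utc)

    return {
        "message_id": message["id"],
        "sender": headers.get("from", ""),
        "subject": headers.get("subject", ""),
        "body": _decode(data) if data else "",
        "timestamp": received.isoformat(),
    }
```

Nested multipart structures and HTML-only emails would need extra handling that this sketch leaves out.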
- Google account
- Google Cloud Platform project
git clone <repository-url>
cd MailParser
python -m venv venv1
source venv1/bin/activate # On Windows: venv1\Scripts\activate
pip install -r requirements.txt

- Create a Google Cloud Project:
  - Go to Google Cloud Console
  - Create a new project
- Enable Required APIs:
  - Navigate to "APIs & Services" > "Library"
  - Enable Gmail API
  - Enable Google Sheets API
- Configure OAuth Consent Screen:
  - Go to "APIs & Services" > "OAuth consent screen"
  - User type: External
  - Add your email as a Test User
  - Add scopes:
    - https://www.googleapis.com/auth/gmail.modify
    - https://www.googleapis.com/auth/spreadsheets
- Create OAuth Client ID:
  - Go to "APIs & Services" > "Credentials"
  - Click "Create Credentials" > "OAuth client ID"
  - Application type: Desktop app
  - Download the JSON file
- Save Credentials:
  - Rename the downloaded file to credentials.json
  - Place it in the credentials/ folder
- Create a new Google Sheet
- Add headers in the first row:
Sender | Subject | Body | Timestamp
- Copy the Spreadsheet ID from the URL:
https://docs.google.com/spreadsheets/d/{SPREADSHEET_ID}/edit
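
If copying the ID by hand is error-prone, a small helper can pull it out of the URL. This is only an illustrative sketch: `extract_spreadsheet_id` is a hypothetical name, and the pattern assumes the URL layout shown above.

```python
import re

# Matches the ID segment that follows /spreadsheets/d/ in a Google Sheets URL.
_SHEET_URL = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def extract_spreadsheet_id(url: str) -> str:
    """Return the spreadsheet ID embedded in a Google Sheets URL."""
    match = _SHEET_URL.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in: {url!r}")
    return match.group(1)
```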
Create a .env file in the project root:
SPREADSHEET_ID=your_spreadsheet_id_here

Run the script:
python -m src.main

On first run:
- A browser window will open for OAuth consent
- Grant the requested permissions
- Tokens will be stored in state.db for future runs
This project uses OAuth 2.0 Desktop Application Flow (formerly called "Installed Application Flow").
- Initial Authentication:
  - User runs the script for the first time
  - Script opens Google's OAuth consent page in the browser
  - User grants permissions
  - Authorization code is exchanged for an access token and a refresh token
- Token Storage:
  - Both tokens are stored in the SQLite database (state.db)
  - Tokens are stored unencrypted; standard SQLite has no built-in encryption (see Security Considerations)
- Token Refresh:
  - Access tokens expire after about 1 hour
  - When the access token expires, the refresh token is used to request a new one automatically
  - No user interaction required after initial setup
- Security Considerations:
  - Tokens are stored locally (acceptable for single-user automation)
  - credentials.json and .env are excluded from version control
  - For production use, consider encrypting the database file
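
The refresh decision can be sketched in a few lines. In practice the google-auth library tracks expiry on its credentials object and performs the refresh itself, so the function below only illustrates the logic. `needs_refresh` and the five-minute safety margin are assumptions made for this example, not code from oauth.py.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def needs_refresh(
    expiry: datetime,
    now: Optional[datetime] = None,
    skew: timedelta = timedelta(minutes=5),
) -> bool:
    """Return True when the access token has expired or will expire within `skew`."""
    now = now or datetime.now(timezone.utc)
    # Refreshing slightly early avoids sending a token that expires mid-request.
    return now >= expiry - skew
```

With a one-hour token lifetime, a check made 30 minutes after issuance returns False, while a check made at 56 minutes already returns True because of the margin.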
Why this approach?
- Recommended by Google for CLI/backend tools
- No web server required
- Seamless token refresh without re-authentication
Running the script multiple times should not create duplicate entries in the Google Sheet.
Each Gmail message has a unique, immutable message ID assigned by Google.
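- Database Table:

CREATE TABLE processed_emails (
    message_id TEXT PRIMARY KEY,
    processed_at TIMESTAMP
);

- Check Before Processing:

import sqlite3
from contextlib import closing

def is_processed(message_id: str) -> bool:
    # Minimal sketch (actual email_store.py may differ): True if message_id is already stored
    with closing(sqlite3.connect("state.db")) as conn:
        return conn.execute("SELECT 1 FROM processed_emails WHERE message_id = ?", (message_id,)).fetchone() is not None

- Store After Processing:

def mark_as_processed(message_id: str) -> None:
    # Insert message_id with the current timestamp; the inner `conn` context commits on success
    with closing(sqlite3.connect("state.db")) as conn, conn:
        conn.execute("INSERT OR IGNORE INTO processed_emails (message_id, processed_at) VALUES (?, CURRENT_TIMESTAMP)", (message_id,))

- Workflow:
  - Fetch all unread emails
  - For each email, check if message_id exists in database
  - If exists → skip
  - If not → process and store message_id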
- Idempotent execution: Running the script 10 times processes each email only once
- Reliable: Message IDs never change
- Efficient: Database lookups are fast even with thousands of entries
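
The idempotency claim can be checked with a small, self-contained simulation. The sketch below uses an in-memory SQLite database and a plain list in place of the Google Sheet. `init_db`, `process_unread`, and the `append_row` callback are hypothetical names chosen for the example.

```python
import sqlite3


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_emails ("
        "message_id TEXT PRIMARY KEY, "
        "processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )


def process_unread(conn: sqlite3.Connection, unread_ids, append_row) -> list:
    """Handle each unseen message ID once and return the IDs handled in this run."""
    handled = []
    for message_id in unread_ids:
        seen = conn.execute(
            "SELECT 1 FROM processed_emails WHERE message_id = ?", (message_id,)
        ).fetchone()
        if seen:
            continue  # already processed in an earlier run
        append_row(message_id)  # stand-in for appending a row to the sheet
        with conn:  # commits the insert, or rolls back on error
            conn.execute(
                "INSERT INTO processed_emails (message_id) VALUES (?)", (message_id,)
            )
        handled.append(message_id)
    return handled


conn = sqlite3.connect(":memory:")
init_db(conn)
sheet = []  # plays the role of the Google Sheet

first = process_unread(conn, ["m1", "m2"], sheet.append)
second = process_unread(conn, ["m1", "m2", "m3"], sheet.append)
```

After both runs, `sheet` holds each ID exactly once. Note that in this ordering a crash between appending the row and recording the ID could still duplicate that one row on the next run.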
The project uses SQLite (state.db) for all persistent storage.
-- OAuth tokens table
CREATE TABLE oauth_tokens (
id INTEGER PRIMARY KEY,
token_data TEXT NOT NULL,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Processed emails table
CREATE TABLE processed_emails (
message_id TEXT PRIMARY KEY,
processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

| Feature | Benefit |
|---|---|
| Zero Configuration | No server setup required |
| ACID Compliance | Guarantees data integrity |
| Atomic Operations | Prevents corruption during concurrent access |
| File-Based | Single file, easy to backup |
| Fast Lookups | Indexed queries for duplicate checking |
| Better than JSON | No race conditions or partial writes |
- JSON files were considered but rejected due to:
  - Risk of file corruption during concurrent writes
  - No built-in indexing for fast lookups
  - Manual handling of race conditions
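
To illustrate the transactional write that a JSON file would not provide, here is a sketch of token persistence against the oauth_tokens table defined above. It assumes a single token row with id = 1 and that token_data holds JSON text. `init_token_table`, `save_token`, and `load_token` are hypothetical names, not the actual token_store.py API.

```python
import json
import sqlite3
from typing import Optional


def init_token_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS oauth_tokens ("
        "id INTEGER PRIMARY KEY, "
        "token_data TEXT NOT NULL, "
        "updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP)"
    )


def save_token(conn: sqlite3.Connection, token: dict) -> None:
    """Replace the single token row (id = 1) inside one transaction."""
    with conn:  # either the whole write is committed or none of it is
        conn.execute(
            "INSERT OR REPLACE INTO oauth_tokens (id, token_data, updated_at) "
            "VALUES (1, ?, CURRENT_TIMESTAMP)",
            (json.dumps(token),),
        )


def load_token(conn: sqlite3.Connection) -> Optional[dict]:
    """Return the stored token dict, or None if nothing has been saved yet."""
    row = conn.execute("SELECT token_data FROM oauth_tokens WHERE id = 1").fetchone()
    return json.loads(row[0]) if row else None
```

Because the write runs in a transaction, a crash mid-save leaves the previous token intact rather than a half-written file.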
See requirements.txt for the complete list. Key dependencies:
- google-auth-oauthlib - OAuth 2.0 authentication
- google-api-python-client - Gmail & Sheets API clients
- python-dotenv - Environment variable management
Run unit tests:
python -m src.test

This project was created as part of a technical assignment.