Google Scholar Library & API

Google Scholar scraping with Python library and REST API with Redis caching.

What's Included

Python Library (google_scholar_lib) - Direct Python interface
REST API (src/api) - FastAPI server with Redis caching

Features

Publication search, author profiles, citations (BibTeX, etc.)
Automatic fallback for rate limiting
Redis caching with configurable TTL
Interactive API documentation at /docs
ARM64 compatible (Jetson, Raspberry Pi)

Requirements

Python 3.8+
Chrome or Chromium browser
ChromeDriver
Redis (for API caching - optional for library use)

Installation

Library Only

pip install .

Library + API

pip install -r requirements.txt
# or
pip install -e .

For ARM64 systems (Jetson, Raspberry Pi):

sudo apt install chromium-browser chromium-chromedriver redis-server

🐍 Python Library Usage

Quick Start

from google_scholar_lib import GoogleScholar

# Initialize
api = GoogleScholar()

# Search for publications
results = api.search_scholar("Deep Learning", num=5)
for paper in results.organic_results:
    print(f"{paper.title} - {paper.link}")

Usage Examples

1. Search Publications

from google_scholar_lib import GoogleScholar

api = GoogleScholar()
results = api.search_scholar("Machine Learning", num=10)

for paper in results.organic_results:
    print(f"Title: {paper.title}")
    print(f"Authors: {', '.join([a.name for a in paper.authors])}")
    if paper.inline_links and paper.inline_links.cited_by:
        print(f"Citations: {paper.inline_links.cited_by['total']}")
    print()

2. Get Author Profile (by ID)

If you know the author's Google Scholar ID:

api = GoogleScholar()
results = api.search_author(author_id="JicYPdAAAAAJ")  # Geoffrey Hinton

print(f"Name: {results.author.name}")
print(f"Affiliation: {results.author.affiliations}")
print(f"Interests: {', '.join([i.title for i in results.author.interests])}")

print("\nTop Publications:")
for article in results.articles[:5]:
    print(f"  - {article.title}")

Common Author IDs:

JicYPdAAAAAJ - Geoffrey Hinton
WLN3QrAAAAAJ - Yann LeCun
kukA0LcAAAAJ - Yoshua Bengio

3. Find Author Profile (by Name)

Search for an author by name. Uses automatic fallback if direct search is blocked:

api = GoogleScholar()
results = api.search(engine="google_scholar_profiles", q="Andrew Ng")

for profile in results.profiles:
    print(f"Name: {profile.name}")
    print(f"ID: {profile.author_id}")
    print(f"Affiliation: {profile.affiliations}")

4. Get Citations

api = GoogleScholar()
results = api.search_cite(data_cid="PAPER_CID")

# Available formats
for link in results.links:
    print(f"{link['title']}: {link['link']}")

# Citation text
for citation in results.citations:
    print(f"{citation['title']}")
    print(f"{citation['snippet']}\n")

REST API Usage

Quick Start

# 1. Start Redis (if not already running)
sudo systemctl start redis-server

# 2. Start API
cd src/api
uvicorn main:app --host 0.0.0.0 --port 8765

# Or with reload for development
uvicorn main:app --host 0.0.0.0 --port 8765 --reload

Access the API at http://localhost:8765/docs

Example Requests

# Search publications
curl -X POST http://localhost:8765/api/v1/search/scholar \
  -H "Content-Type: application/json" \
  -d '{"q": "machine learning", "num": 10}'

# Get author profile
curl http://localhost:8765/api/v1/author/JicYPdAAAAAJ

# Check health & cache stats
curl http://localhost:8765/health

Python:

import requests
response = requests.post("http://localhost:8765/api/v1/search/scholar",
                        json={"q": "quantum computing", "num": 10})
print(response.headers.get("X-Cache-Status"))  # HIT or MISS

Full documentation: src/api/README.md or visit /docs when running

Interactive Demo

Test the API interactively:

python demo.py

The demo lets you:

Choose between different search engines
Enter custom search parameters
View formatted results with metadata

Search Engines

The API supports four search engines:

Engine	Purpose	Returns
`google_scholar`	Search publications	Papers, authors, citations
`google_scholar_author`	Get author by ID	Profile, publications, interests
`google_scholar_profiles`	Find author by name	Profile matches
`google_scholar_cite`	Get citation formats	BibTeX, EndNote, etc.

Advanced Features

Headless Mode

Runs Chrome in headless mode by default (no visible browser window):

# Headless by default
api = GoogleScholar()

# Or explicitly set headless mode
from google_scholar_lib.backends.selenium_backend import SeleniumBackend
backend = SeleniumBackend(headless=True)

Caching (API Only)

The REST API automatically caches responses with configurable TTL:

Scholar searches: 24 hours
Author profiles: 7 days
Profile searches: 12 hours
Citations: 30 days

Cache status is indicated in the X-Cache-Status response header.

☁️ Cloud Deployment (Google Cloud Free Tier)

This project runs on Google Cloud's Always Free e2-micro (1GB RAM). A 2GB swap file is required to prevent Chrome crashes.

Setup (e2-micro, Debian 12)

# 1. Create 2GB Swap (Required!)
sudo fallocate -l 2G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab

# 2. Install Dependencies
sudo apt update
sudo apt install -y git python3-venv python3-pip redis-server chromium chromium-driver
sudo systemctl enable redis-server && sudo systemctl start redis-server

# 3. Clone & Install
git clone https://github.com/YOUR_USERNAME/google-scholar-api.git
cd google-scholar-api
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e .

# 4. Configure
cp .env.example .env
nano .env

Run as Service

Create /etc/systemd/system/scholar.service:

[Unit]
Description=Google Scholar API
After=network.target redis-server.service

[Service]
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/google-scholar-api
Environment="PATH=/home/YOUR_USERNAME/google-scholar-api/venv/bin"
ExecStart=/home/YOUR_USERNAME/google-scholar-api/venv/bin/uvicorn src.api.main:app --host 0.0.0.0 --port 8765
Restart=always

[Install]
WantedBy=multi-user.target

sudo systemctl daemon-reload
sudo systemctl enable scholar
sudo systemctl start scholar

Public Access (Cloudflare Tunnel)

# Install
curl -L --output cloudflared.deb https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64.deb
sudo dpkg -i cloudflared.deb

# Quick tunnel (temporary URL)
cloudflared tunnel --url http://localhost:8765

# Or run as service for persistent tunnel
sudo cloudflared service install

Note: Use Standard Persistent Disk (30GB) in us-central1, us-west1, or us-east1 to stay within free tier limits.

Legal

This tool is for educational and research purposes only. Please respect Google's Terms of Service and robots.txt.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Google Scholar Library & API

What's Included

Features

Requirements

Installation

Library Only

Library + API

🐍 Python Library Usage

Quick Start

Usage Examples

1. Search Publications

2. Get Author Profile (by ID)

3. Find Author Profile (by Name)

4. Get Citations

REST API Usage

Quick Start

Example Requests

Interactive Demo

Search Engines

Advanced Features

Headless Mode

Caching (API Only)

☁️ Cloud Deployment (Google Cloud Free Tier)

Setup (e2-micro, Debian 12)

Run as Service

Public Access (Cloudflare Tunnel)

Legal

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
src		src
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
demo.py		demo.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Google Scholar Library & API

What's Included

Features

Requirements

Installation

Library Only

Library + API

🐍 Python Library Usage

Quick Start

Usage Examples

1. Search Publications

2. Get Author Profile (by ID)

3. Find Author Profile (by Name)

4. Get Citations

REST API Usage

Quick Start

Example Requests

Interactive Demo

Search Engines

Advanced Features

Headless Mode

Caching (API Only)

☁️ Cloud Deployment (Google Cloud Free Tier)

Setup (e2-micro, Debian 12)

Run as Service

Public Access (Cloudflare Tunnel)

Legal

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages