🕷️ Web Scraping with Scrapy

A web scraping project built with Scrapy, a Python framework for crawling and scraping at scale. It demonstrates how to use XPath and CSS selectors to extract data precisely across multiple pages.

📌 What This Project Does

Crawls multi-page websites using Scrapy spiders
Extracts structured data using XPath and CSS selectors
Handles pagination and link following automatically
Exports data to CSV, JSON, or XML via Scrapy's built-in feed exports
Demonstrates Scrapy's item pipeline architecture

🧰 Tech Stack

Tool	Purpose
`Python 3.8+`	Core language
`Scrapy 2.x`	Scraping framework
`XPath`	Element targeting
`CSS Selectors`	Alternative element targeting
`Scrapy Item Pipelines`	Data processing & export
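
To make the item pipeline row concrete, here is a small sketch of what a pipeline class can look like. The class name and its cleaning rule are hypothetical and are not taken from this project's pipelines.py; the only part Scrapy itself prescribes is the `process_item` method.

```python
class CleanTextPipeline:
    """Hypothetical pipeline that trims whitespace from every string field."""

    def process_item(self, item, spider):
        # Scrapy calls process_item once per scraped item; the returned
        # item is handed on to the next enabled pipeline.
        for key, value in list(item.items()):
            if isinstance(value, str):
                item[key] = value.strip()
        return item
```

A pipeline only runs once it is listed in the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.CleanTextPipeline": 300}`, where `myproject` is a placeholder module path and lower numbers run first.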

🚀 Getting Started

1. Run a Spider

scrapy crawl <spider_name>

2. Export Data

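# Export to CSV (-o appends to an existing file; use -O to overwrite it)
scrapy crawl <spider_name> -o output.csv

# Export to JSON (appending to an existing .json file yields invalid JSON; use -O when re-running)
scrapy crawl <spider_name> -o output.json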
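
The same exports can also be declared once in settings.py through the `FEEDS` setting, so the output flag is not needed on every run. The sketch below reuses the file names from the commands above; the chosen options are illustrative and not taken from this project's settings.

```python
# settings.py (sketch): each key is an output path, each value its feed options.
FEEDS = {
    "output.csv": {"format": "csv"},
    "output.json": {
        "format": "json",
        "encoding": "utf8",
        "overwrite": True,  # replace the file on each run instead of appending
    },
}
```

With this in place, a plain `scrapy crawl <spider_name>` writes both files.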

🔍 XPath vs CSS Selectors — Quick Reference

# XPath examples
response.xpath('//h1/text()').get()
response.xpath('//a/@href').getall()

# CSS selector examples
response.css('h1::text').get()
response.css('a::attr(href)').getall()

📦 Requirements

Scrapy>=2.8.0

⚠️ Notes

Configure DOWNLOAD_DELAY in settings.py to avoid overwhelming servers
Use ROBOTSTXT_OBEY = True (set by default in the settings.py that scrapy startproject generates) to respect robots.txt
For large crawls, consider enabling Scrapy's AutoThrottle extension
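
Taken together, the notes above might translate into a settings.py fragment like the following. The setting names are Scrapy's own; the numeric values are illustrative assumptions and should be tuned to the target site.

```python
# settings.py (sketch): values are illustrative, not taken from this project.
ROBOTSTXT_OBEY = True  # honor the target site's robots.txt rules

DOWNLOAD_DELAY = 1.0  # seconds to wait between requests to the same site

# AutoThrottle adjusts the delay dynamically based on observed response latency.
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0  # initial delay in seconds
AUTOTHROTTLE_MAX_DELAY = 30.0  # upper bound on the delay when the server is slow
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests per remote site
```

When AutoThrottle is enabled, it does not lower the delay below `DOWNLOAD_DELAY`, so the fixed delay acts as a floor.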

🙋 Author

Priyanka Rajput — GitHub
