🌐 Python Web Surfing

This Python project scrapes web search results through the Google Custom Search JSON API (using a Google API key and a Custom Search Engine ID), extracts the useful fields from each result, and runs a basic analysis of them with the Gemini API. It is designed to be reliable, modular, and easy to run from the command line.

✅ Functionalities Implemented

Extracting Titles, URLs, and Snippets
- Scrapes and saves the title, URL, and snippet/description from search results.
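Because the results come from an HTTP API rather than a rendered page, extraction amounts to reading JSON fields. The sketch below uses only the standard library and assumes the script calls the Custom Search JSON API endpoint directly. `fetch_page` and `extract_results` are hypothetical names, and the API key and engine ID are values you supply. Each hit in the response's `items` list carries `title`, `link`, and `snippet` fields.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Google Custom Search JSON API.
API_URL = "https://www.googleapis.com/customsearch/v1"


# Hypothetical helper names; scraper.py may structure this differently.
def fetch_page(api_key: str, cse_id: str, query: str, start: int = 1, num: int = 10) -> dict:
    """Make one API request and return the decoded JSON response."""
    params = urllib.parse.urlencode(
        {"key": api_key, "cx": cse_id, "q": query, "start": start, "num": num}
    )
    with urllib.request.urlopen(f"{API_URL}?{params}", timeout=10) as resp:
        return json.load(resp)


def extract_results(payload: dict) -> list[dict]:
    """Keep only the title, URL, and snippet of each search hit."""
    return [
        {
            "title": item.get("title", ""),
            "url": item.get("link", ""),  # the API calls the URL field "link"
            "snippet": item.get("snippet", ""),
        }
        # "items" is absent from the response when nothing matched
        for item in payload.get("items", [])
    ]
```

For example, `extract_results(fetch_page(API_KEY, CSE_ID, "AI in healthcare"))` would return a list of dicts with `title`, `url`, and `snippet` keys.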
Taking Dynamic Input (Query from Command Line)
- Run the scraper with any search query directly from the command line:
```
python scraper.py <your query>
```
For example:
```
python scraper.py "AI in healthcare"
```
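The example quotes the query so the shell passes it as a single argument. Without quotes, the shell splits `AI in healthcare` into three separate arguments. The sketch below (hypothetical helper `query_from_argv`) joins everything after the script name, so both forms yield the same query; whether scraper.py handles arguments exactly this way is an assumption.

```python
# Hypothetical helper; scraper.py may parse its arguments differently.
def query_from_argv(argv: list[str]) -> str:
    """Join all command-line words after the script name into one query string."""
    words = argv[1:]
    if not words:
        # Exit with a usage message when no query was given.
        raise SystemExit("usage: python scraper.py <your query>")
    return " ".join(words)
```

At startup the script would call `query_from_argv(sys.argv)` after `import sys`.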
Saving Results to CSV File
- Results are saved in a separate CSV file for each query.
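A minimal sketch of per-query CSV output with the standard `csv` module. The naming scheme (a lowercase slug of the query) and the `data/` output folder are assumptions for illustration, and `csv_path_for` and `save_results` are hypothetical names. Each row is expected to be a dict with `title`, `url`, and `snippet` keys.

```python
import csv
import re
from pathlib import Path

# Column order of the CSV file (assumed).
FIELDS = ["title", "url", "snippet"]


# Hypothetical helpers; the real file names and folder may differ.
def csv_path_for(query: str, out_dir: str = "data") -> Path:
    """Build a filesystem-safe file name from the query, e.g. 'ai_in_healthcare.csv'."""
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_") or "query"
    return Path(out_dir) / f"{slug}.csv"


def save_results(rows: list[dict], query: str, out_dir: str = "data") -> Path:
    """Write one CSV file per query: a header row, then one row per result."""
    path = csv_path_for(query, out_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

Slugging the query keeps characters such as quotes, slashes, and spaces out of the file name, so any query maps to a valid path.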
Running in Headless Mode (No Browser Needed)
- No browser is launched at all. Results come from the Google Custom Search API over HTTP (using the Custom Search Engine ID), so the scraper runs fully headless.
Crawling Multiple Pages
- The scraper can crawl multiple pages of search results. The Google Custom Search API returns at most 10 results per request, so each additional page is a separate request.
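Paging works by advancing the 1-based `start` parameter while keeping `num` at 10 or fewer. The helper below (hypothetical name `page_requests`) is a sketch that only plans the `(start, num)` pairs for a requested number of results; each pair would then be sent as its own API request. The API also limits how far into a result set you can page, so keep the total modest.

```python
# Hypothetical helper; it plans requests but does not send them.
def page_requests(total_results: int, page_size: int = 10) -> list[tuple[int, int]]:
    """Return (start, num) pairs that together cover the first `total_results` hits."""
    if not 1 <= page_size <= 10:
        raise ValueError("page_size must be between 1 and 10")
    pairs = []
    start = 1  # the API's result index is 1-based
    while start <= total_results:
        num = min(page_size, total_results - start + 1)  # last page may be shorter
        pairs.append((start, num))
        start += num
    return pairs
```

For example, asking for 25 results plans three requests: `[(1, 10), (11, 10), (21, 5)]`.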
Adding Logs
- Logs are stored in data/logs/.
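One way to write logs into data/logs/ is with the standard `logging` module, sketched below. The function name `setup_logging`, the log file name, and the format string are assumptions, not necessarily what scraper.py uses.

```python
import logging
from pathlib import Path


# Hypothetical helper; the script's actual logging setup may differ.
def setup_logging(log_dir: str = "data/logs", name: str = "scraper") -> logging.Logger:
    """Send log records both to <log_dir>/<name>.log and to the console."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        file_handler = logging.FileHandler(Path(log_dir) / f"{name}.log", encoding="utf-8")
        console = logging.StreamHandler()
        for handler in (file_handler, console):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```

Usage: `log = setup_logging()` once at startup, then calls such as `log.info("Fetched %d results for %r", count, query)`.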
Data Summarizer
- Summarizes all the fetched results and stores the summaries in the data_analysis folder.
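A sketch of the summarizing step, assuming it sends all fetched results to Gemini in a single prompt. To keep the example runnable without credentials, the model call is passed in as `generate`, a function from prompt text to summary text; in practice it would be a thin wrapper around the Gemini SDK's `generate_content` call. `build_prompt`, `summarize_results`, the prompt wording, and the `<slug>_summary.txt` file name are all hypothetical.

```python
import re
from pathlib import Path
from typing import Callable


# Hypothetical helpers; the real prompt and output format may differ.
def build_prompt(query: str, rows: list[dict]) -> str:
    """Turn the scraped results into one plain-text prompt for the model."""
    lines = [f"Summarize these web search results for the query {query!r}.", ""]
    for i, row in enumerate(rows, start=1):
        lines.append(f"{i}. {row['title']} ({row['url']})\n   {row['snippet']}")
    return "\n".join(lines)


def summarize_results(query: str, rows: list[dict], generate: Callable[[str], str],
                      out_dir: str = "data_analysis") -> Path:
    """Ask the model for a summary and save it as <out_dir>/<slug>_summary.txt."""
    summary = generate(build_prompt(query, rows))
    slug = re.sub(r"[^a-z0-9]+", "_", query.lower()).strip("_") or "query"
    path = Path(out_dir) / f"{slug}_summary.txt"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(summary, encoding="utf-8")
    return path
```

Injecting `generate` keeps the file handling testable with a stub such as `lambda prompt: "..."` and isolates the API-specific code in one small function.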

⚡ How to Run

Install dependencies:

```
pip install -r requirements.txt
```

Run the scraper:

```
python scraper.py <your query>
```

💡 Notes

Make sure your Google API key, Google Custom Search Engine ID, and Gemini API key are set in the script before running it.
Log files are created automatically in data/logs/ for debugging and for tracking scraping activity.
