Extreme Uber Eats Scraping

Folder description

/archive contains data and scripts from earlier stages of the project, kept for reference only
/assets contains image assets for README
/notebooks contains all the scripts used for scraping in the form of Jupyter Notebooks
/scraped-data contains all the scraped data city-wise
- /scraped-data/restaurant-categories contains the stored category (cuisine) info for Uber Eats
- /scraped-data/resturant-urls contains the files corresponding to each city which store the URLs of all restaurants in that city
- /scraped-data/restaurant-details contains the files corresponding to each city which store the complete data of all restaurants in that city
- /scraped-data/UE-cities.csv contains the list of all cities and corresponding URLs in which Uber Eats is operational
/1.5M-Uber-Eats-Restaurants.zip contains a zipped CSV file that combines all the data from /scraped-data/restaurant-details into a single CSV

What data are we extracting?

Here is a screenshot of one of the Uber Eats restaurants.

What does the scraped data look like?

Good question! I had to sacrifice the head of one of my DataFrames (pun intended) for this demonstration, but here you go!

Step #1 - Enlist all Uber Eats cities

cities.ipynb contains the code to fetch the name and URL of every city in the USA where Uber Eats is operational. As of July 2020, that was a dozen shy of 33k cities. This first step is crucial because the later steps make calls to every city's page.

Step #2 - Fetch and shortlist cuisines

sort-categories.ipynb contains the code to shortlist the most popular cuisines out of the total (200+) tags on Uber Eats. To collect the tags, go to https://www.ubereats.com/location and copy the categories (Ctrl+A, then Ctrl+C) into a text file. Then read the file using Python, replace spaces with - and convert every letter to lowercase.
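The tag clean-up described above could look roughly like the sketch below. It assumes the copied text file holds one tag per line; the function names and the file layout are hypothetical and not taken from the notebook. Note that it also collapses repeated whitespace into a single hyphen, which goes slightly beyond a plain space-to-hyphen replacement.

```python
from typing import List


def to_slug(tag: str) -> str:
    """Lowercase a category tag and join its words with hyphens."""
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator.
    return "-".join(tag.lower().split())


def load_category_slugs(path: str) -> List[str]:
    """Read one tag per line (an assumption) and skip blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [to_slug(line) for line in fh if line.strip()]
```

For example, `to_slug("Ice Cream")` gives `ice-cream`, which is the form used at the end of a category URL.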

To shortlist the most popular categories I fetched all the restaurants for NYC (using steps #1 to #5), and stored the number of restaurants for each category in a dict. Then simply take the top 50. Done!
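A minimal sketch of that shortlisting step, assuming the NYC results are already available as a mapping from category slug to the restaurant URLs found under it. The mapping shape and the name `top_categories` are illustrative assumptions, not the notebook's actual code.

```python
from collections import Counter
from typing import Dict, Iterable, List


def top_categories(urls_by_category: Dict[str, Iterable[str]], n: int = 50) -> List[str]:
    """Return the n category slugs with the most distinct restaurants."""
    # Count distinct URLs so a restaurant listed twice under the same
    # category is not counted twice.
    counts = Counter({cat: len(set(urls)) for cat, urls in urls_by_category.items()})
    return [cat for cat, _ in counts.most_common(n)]
```

Any personal favourites can then be appended to the returned list by hand.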

Yes, I had to assume that other cities would follow suit. And yes, I added a few of my favourite cuisines too :)

Step #3 - Scrape restaurant URLs for each city

restaurant-urls.ipynb contains the code to fetch the URLs of all the restaurants in a city. This step is a bit tricky and time-consuming. Uber Eats does not serve you the list of all restaurants on a platter. However, while surfing through their pages, I started observing patterns. So to get an exhaustive list of all restaurants (quickly), I came up with this:

- To navigate to a city's page, simply add the city's name at the end of https://ubereats.com/location/. So NYC's URL would look like https://ubereats.com/location/new-york. However, you would see only the featured restaurants here.
- If you instead add the city's name followed by a category name at the end of https://ubereats.com/category/, you get the complete list of all the restaurants in the city (but only those with that category tag). For instance, https://ubereats.com/category/new-york/indian.
- Fetch the complete list of restaurants for each category.
- Keep adding their URLs to a set. This way, no restaurant is stored twice. However, you would have to repeat the process for 200+ categories.

Who has that much time? Notice that hardly any restaurant has only a single tag.

So the top 30 cuisines alone would already cover about 90% of the restaurants.
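The URL patterns and the set-based de-duplication from this step can be sketched as follows. The two URL builders follow the patterns quoted above; `fetch_listing` is a hypothetical callable that you would supply yourself (it takes a category page URL and returns the restaurant URLs found on it), since how the page is actually downloaded and parsed is not shown here.

```python
from typing import Callable, Iterable, Set

BASE_URL = "https://ubereats.com"


def city_url(city_slug: str) -> str:
    """City landing page (shows only featured restaurants)."""
    return f"{BASE_URL}/location/{city_slug}"


def category_url(city_slug: str, category_slug: str) -> str:
    """Page listing every restaurant in a city with the given category tag."""
    return f"{BASE_URL}/category/{city_slug}/{category_slug}"


def collect_restaurant_urls(
    city_slug: str,
    category_slugs: Iterable[str],
    fetch_listing: Callable[[str], Iterable[str]],  # hypothetical, caller-supplied
) -> Set[str]:
    """Union the restaurant URLs of all shortlisted categories for one city."""
    seen: Set[str] = set()
    for category in category_slugs:
        # A restaurant tagged with several cuisines shows up on several
        # category pages; the set keeps only one copy of its URL.
        seen.update(fetch_listing(category_url(city_slug, category)))
    return seen
```

Because most restaurants carry more than one tag, passing only the shortlisted categories to `collect_restaurant_urls` still reaches the bulk of a city's restaurants.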

Step #4 - Fetch details for each restaurant URL

restaurant-details.ipynb contains the code to fetch details of restaurants. After doing all the hard work, this should be nice and easy! However, some cities can have more than 3000 restaurants. Make sure to parallelize the loop while processing restaurants in a city.

Word of caution: nesting threads inside threads is not a good idea! Once you have parallelized the restaurants, do not also spawn threads for the cities.
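One way to follow this advice is a thread pool over the restaurants of a single city, wrapped in an ordinary sequential loop over cities. The sketch below uses Python's standard `concurrent.futures`; `fetch_details` is a hypothetical caller-supplied function that scrapes one restaurant URL, and the worker count of 16 is an arbitrary assumption, not a value from the notebook.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List


def fetch_city(
    restaurant_urls: Iterable[str],
    fetch_details: Callable[[str], dict],  # hypothetical, caller-supplied
    max_workers: int = 16,                 # arbitrary assumption
) -> List[dict]:
    """Fetch all restaurants of one city in parallel, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_details, restaurant_urls))


def fetch_all_cities(
    urls_by_city: Dict[str, Iterable[str]],
    fetch_details: Callable[[str], dict],
    max_workers: int = 16,
) -> Dict[str, List[dict]]:
    """Process cities one after another; only the inner loop is threaded."""
    results: Dict[str, List[dict]] = {}
    for city, urls in urls_by_city.items():  # plain loop: no threads around threads
        results[city] = fetch_city(urls, fetch_details, max_workers)
    return results
```

Keeping the outer loop sequential means at most `max_workers` requests are in flight at any time, which is easier to reason about than a pool per city running concurrently.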

Step #5 - Finally, combine and process the data

post-processing.ipynb contains the code to combine the restaurant-details CSVs of all cities and then process the combined data.
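As a rough illustration of the combining part only, the sketch below concatenates every per-city CSV in a folder into one file using the standard `csv` module. It assumes all city files share the same columns and that the output file is written outside the input folder; the function name and paths are hypothetical, and the notebook itself may well do this differently (for example with DataFrames).

```python
import csv
from pathlib import Path
from typing import Union


def combine_city_csvs(details_dir: Union[str, Path], output_csv: Union[str, Path]) -> int:
    """Append the rows of every *.csv in details_dir to output_csv.

    Returns the number of data rows written. The header is taken from the
    first file; later files are assumed to have the same columns.
    """
    rows_written = 0
    writer = None
    with open(output_csv, "w", newline="", encoding="utf-8") as out:
        for path in sorted(Path(details_dir).glob("*.csv")):
            with open(path, newline="", encoding="utf-8") as fh:
                reader = csv.DictReader(fh)
                if reader.fieldnames is None:  # empty file, nothing to copy
                    continue
                if writer is None:
                    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
                    writer.writeheader()
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

A call such as `combine_city_csvs("scraped-data/restaurant-details", "combined.csv")` would then produce the single CSV that further processing can start from.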
