This Python script demonstrates how to perform web scraping using BeautifulSoup to extract data from a webpage. In this example, we scrape data from the RigPix website, specifically the Icom radio models and their corresponding details.
Before running the script, ensure you have the required libraries installed. You can install them using pip:
```
pip install pandas requests beautifulsoup4
```
Import the necessary libraries:
```python
import pandas as pd
import requests
from bs4 import BeautifulSoup
```
Specify the URL of the webpage you want to scrape:
```python
url = "http://www.rigpix.com/icom/icomselect.htm"
```
Send an HTTP GET request to the URL and parse the HTML content using BeautifulSoup:
```python
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
```
Identify the HTML elements that contain the data you want to extract. In this script, we find all `h3` headers:

```python
headers = soup.find_all('h3')
```
Extract the headers into a list:

```python
headers_list = []
for h in headers:
    headers_list.append(h.text)
```
Create a dictionary to store the scraped data:

```python
data = {}
```
Use Pandas to parse the HTML tables on the webpage and store them in a list:

```python
df_list = pd.read_html(r.content)
```
Populate the `data` dictionary with the scraped data, using the headers as keys (the first three tables on the page are skipped):

```python
for i, l in enumerate(df_list[3:]):
    data[headers_list[i]] = l
```
Print or further process the extracted data. In this script, we print the `data` dictionary:

```python
print(data)
```
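Since the full script depends on the live RigPix page, the header-to-table pairing technique can be tried offline on a small inline HTML sample with the same layout (model names in `h3` headers, each followed by a spec table). The model names and values below are illustrative, not real scraped data, and unlike the real page there are no layout tables to skip with `df_list[3:]`:

```python
import pandas as pd
from bs4 import BeautifulSoup
from io import StringIO

# Inline HTML standing in for the RigPix page layout (hypothetical data).
html = """
<h3>IC-7300</h3>
<table>
  <tr><th>Spec</th><th>Value</th></tr>
  <tr><td>Power</td><td>100 W</td></tr>
</table>
<h3>IC-9700</h3>
<table>
  <tr><th>Spec</th><th>Value</th></tr>
  <tr><td>Power</td><td>100 W</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
headers_list = [h.text for h in soup.find_all("h3")]

# pandas returns one DataFrame per <table> element in the document
df_list = pd.read_html(StringIO(html))

# Pair each header with its table, as in the script above
data = {header: table for header, table in zip(headers_list, df_list)}
print(sorted(data))
```

The `zip` pairing only works because the page interleaves one table per `h3` header in document order; if the counts ever drift apart, the keys and tables will be misaligned.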
The script retrieves information about Icom radio models and stores it in a dictionary, making it easy to work with the data programmatically.
This script is provided under the MIT License.