
Commit 3d7fee5: First release

1 parent cb50924

4 files changed

Lines changed: 79025 additions & 1 deletion

README.md

Lines changed: 86 additions & 1 deletion

# UserAgent-Parser

This repository holds data for the Proxy_Bypass vulnerability research tool: the `user-agents.json` file, which is generated by the `User-Agent-Parser.py` script in this repository. The `ua-stats.py` script draws various statistics from the `user-agents.json` file.

## Overview

The User Agent Data Scraper is designed to:

🎯 **Primary:**
- Scrape user agent data from [useragentstring.com](https://www.useragentstring.com/pages/All/).
- Create a dictionary of user agents for use by the Proxy_Bypass vulnerability research tool.
- Organize the data into dictionaries.

🚀 **Secondary:**
- Identify user agent groups based on specified conditions.
- Display statistics about general and mobile user agents.
- Provide options for data visualization using pie charts, word clouds, and more.

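A minimal sketch of how a consumer such as the Proxy_Bypass tool might read the generated data and pick a user agent. It assumes the file is the JSON list written by `User-Agent-Parser.py` (which saves it as `user_agents.json` in the working directory), where each record has the `title`, `group`, `id`, `user-agent`, and `Host` keys. `load_user_agents` and `pick_user_agent` are hypothetical helpers, not part of this repository.

```python
import json
import random

# Hypothetical helpers (not part of this repository). They assume the JSON list
# written by User-Agent-Parser.py, where every record has the keys
# "title", "group", "id", "user-agent" and "Host" ("General" or "Mobile").


def load_user_agents(path="user_agents.json"):
    """Load the list of user-agent records from a JSON file."""
    with open(path, encoding="utf-8") as json_file:
        return json.load(json_file)


def pick_user_agent(records, host=None, rng=random):
    """Return a random user-agent string, optionally restricted to one Host value."""
    pool = [r["user-agent"] for r in records if host is None or r["Host"] == host]
    if not pool:
        raise ValueError(f"no user agents with Host={host!r}")
    return rng.choice(pool)
```

For example, `{"User-Agent": pick_user_agent(load_user_agents(), host="Mobile")}` builds a headers dictionary that an HTTP client such as `requests` can send. Passing a seeded `random.Random` instance as `rng` makes the choice reproducible.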
## Features

🌟 **Scrapes User Agent Data:**
Scrapes user agent data from a URL and stores it in dictionaries.

🔍 **Filters and Organizes:**
Removes duplicate user agent/title pairs and organizes user agents by group, title, and host type (General or Mobile).

📊 **Provides Statistics:**
Provides statistics about general and mobile user agents.

📈 **Data Visualization:**
Offers options for data visualization, including:
- Pie charts for user agent groups.
- Word clouds for user agent group names.

**Cool Statistics:**
- ![General User Agents > 500](Charts/General%20User-agents%20greater%20than%20500.png)
- ![Highest Mobile User Agents](Charts/Highest%20Mobile%20User-agents.png)
- ![General User Agents < 50](Charts/General%20User-agents%20less%20than%2050.png)
- ![Mobile User Agents < 10](Charts/Mobile%20User-agents%20less%20than%2010.png)
- ![General User Agents < 500](Charts/General%20User-agents%20less%20than%20500.png)
- ![Mobile User Agents < 500](Charts/Mobile%20User-agents%20less%20than%20500.png)
- ![Highest General User Agents](Charts/Highest%20General%20User-agents.png)

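The chart titles above suggest threshold filters such as "greater than 500" or "less than 50". `ua-stats.py` is not part of this commit, so the following is only a sketch of how such subsets could be computed from the generated JSON records. It assumes counts are taken per `title` (browser name) within a single `Host` value; the grouping key is a parameter because that choice is an assumption, and `count_by` and `select` are hypothetical names.

```python
from collections import Counter

# Hypothetical helpers; the default grouping key ("title") is an assumption,
# since ua-stats.py is not included in this commit.


def count_by(records, host, key="title"):
    """Count records with the given Host ("General" or "Mobile"), grouped by `key`."""
    return Counter(r[key] for r in records if r["Host"] == host)


def select(counts, above=None, below=None):
    """Keep names whose count is strictly above `above` and/or strictly below `below`."""
    return {
        name: n
        for name, n in counts.items()
        if (above is None or n > above) and (below is None or n < below)
    }
```

Under these assumptions, `select(count_by(records, "General"), above=500)` corresponds to a "greater than 500" chart, `select(count_by(records, "Mobile"), below=10)` to a "less than 10" chart, and `count_by(records, "General").most_common(10)` to a "highest" ranking. A resulting dictionary `d` could then be plotted with matplotlib's `plt.pie(list(d.values()), labels=list(d))`.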

💾 **Save Charts and Data:**
Allows users to save generated charts and data to a local directory.

🎈 **Easy-to-Use Interface:**
An interactive command-line interface that asks yes/no questions before printing or saving data.

## How to Use

[![Watch the video](video_thumbnail.png)](video_link)

1. **Installation:**

   Clone this repository to your local machine.

2. **Setup:**

   Install the required libraries using the following command:

   ```bash
   pip3 install -r requirements.txt
   ```

3. **Run the Script:**

   Open a terminal, navigate to the project directory, and run the script:

   ```bash
   python3 User-Agent-Parser.py
   ```

4. **Follow the Prompts:**

   The script asks whether to print the scraped data on the screen and whether to update the JSON file, then reports the number of general, mobile, and total user agents.

5. **Data Visualization:**

   Charts and visualizations of the user agent data, such as pie charts and word clouds, are generated with the `ua-stats.py` script.

6. **Saving Charts:**

   Choose to save generated charts and data to a local directory.

## Sample Output

## Video Tutorial

Watch the full tutorial on YouTube.

## Note

The script requires an internet connection to retrieve data from the specified URL.

If you encounter any issues or have questions, feel free to open an issue in this repository.

## License

This project is licensed under the MIT License - see the LICENSE file for details.

User-Agent-Parser.py

Lines changed: 141 additions & 0 deletions

```python
import json
import os

import requests
from bs4 import BeautifulSoup

# Color Codes for Printing
GREEN = "\033[32m"
YELLOW = "\033[33m"
BLUE = "\033[34m"
RED = "\033[31m"
RESET = "\033[0m"

url = "https://www.useragentstring.com/pages/useragentstring.php?name=All"

# User agent that is excluded from the output
SKIPPED_UA = "Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.14912Mod.By.www.9jamusic.cz.cc/22.387; U; en)"

try:
    response = requests.get(url, timeout=30)  # Timeout so the request cannot hang forever
    response.raise_for_status()  # Raise an exception if there's an HTTP error
    soup = BeautifulSoup(response.content, 'html.parser')

    user_agents = []  # List to store user agent dictionaries
    group = None  # Initialize group value
    sequence = 0  # Initialize sequence value
    seen_combinations = set()  # Set to store seen combinations of user-agent and title
    mobile_detected = False  # Flag to indicate if mobile user agents have been reached

    for tag in soup.find_all(True):
        if tag.name == 'h3':
            group = tag.get_text(strip=True)  # The <h3> text is the group name

        elif tag.name == 'h4':
            title = tag.get_text(strip=True)  # Get the text from <h4> as the title

            ul_tag = tag.find_next('ul')  # Find the <ul> under <h4>
            a_tags = ul_tag.find_all('a')  # Find all <a> tags within the <ul>

            for a_tag in a_tags:
                user_agent = a_tag.text.replace("-->>", "")  # Remove "-->>" if present

                if SKIPPED_UA in user_agent:
                    continue  # Skip the excluded user agent

                # Entries listed after "Xenu Link Sleuth/1.3.7" are labelled Mobile
                host = "Mobile" if mobile_detected else "General"

                # Skip combinations of user-agent and title that were already seen
                if (user_agent, title) in seen_combinations:
                    continue

                sequence += 1  # Increment the sequence value

                ua_dict = {
                    "title": title if title != SKIPPED_UA else "",
                    "group": group if group != SKIPPED_UA else "",
                    "id": f"ua-{sequence}",
                    "user-agent": user_agent,
                    "Host": host,
                }

                # Check if "Xenu Link Sleuth/1.3.7" is in the user agent
                if "Xenu Link Sleuth/1.3.7" in user_agent:
                    mobile_detected = True

                user_agents.append(ua_dict)
                seen_combinations.add((user_agent, title))  # Add the combination to the set

    # Prompt the user whether to print the data on the screen
    while True:
        print_data = input("Do you want to print the data on the screen? (yes/no): ").lower()
        if print_data in ["yes", "no"]:
            break
        print(f"{RED}[ERROR]{RESET} Please enter 'yes' or 'no'.")

    if print_data == "yes":
        # Print the user agent dictionaries in a pretty format
        print(json.dumps(user_agents, indent=4))
    else:
        print("Data was not printed on the screen.")

    # Prompt the user whether to update the JSON file
    while True:
        update_json = input("Do you want to update the JSON file? (yes/no): ").lower()
        if update_json in ["yes", "no"]:
            break
        print(f"{RED}[ERROR]{RESET} Please enter 'yes' or 'no'.")

    # Calculate statistics for user agents per "Mobile" and "General" host values
    mobile_user_agents = sum(1 for ua in user_agents if ua["Host"] == "Mobile")
    general_user_agents = sum(1 for ua in user_agents if ua["Host"] == "General")

    print(GREEN + "[+]" + RESET + " General User Agents: " + BLUE + str(general_user_agents) + RESET)
    print(GREEN + "[+]" + RESET + " Mobile User Agents: " + BLUE + str(mobile_user_agents) + RESET)
    print(GREEN + "[+]" + RESET + " Total User Agents: " + BLUE + str(len(user_agents)) + RESET)

    if update_json == "yes":
        json_file_path = "user_agents.json"  # Saved in the current working directory

        try:
            # Read the previous file, if any, so user agents new in this run can be reported
            previous = set()
            if os.path.exists(json_file_path):
                with open(json_file_path, encoding="utf-8") as json_file:
                    previous = {(ua["user-agent"], ua["title"]) for ua in json.load(json_file)}

            # Save the user agent dictionaries to the JSON file
            with open(json_file_path, "w", encoding="utf-8") as json_file:
                json.dump(user_agents, json_file, indent=4)
            print(GREEN + "[+]" + RESET + " JSON file updated successfully.")

            # Check if new user agents were identified
            new_user_agents = [ua for ua in user_agents
                               if (ua["user-agent"], ua["title"]) not in previous]
            if new_user_agents:
                print(GREEN + "[+]" + RESET + " New User Agents Identified: " + BLUE + str(len(new_user_agents)) + RESET)
                for ua in new_user_agents:
                    print(json.dumps(ua, indent=4))

                input("Press Enter to acknowledge...")
            else:
                print(f"{YELLOW}[!]{RESET} No new user-agents found.")

        except (OSError, json.JSONDecodeError, KeyError, TypeError) as e:
            print(f"{RED}[ERROR]{RESET} Error while updating JSON file:", e)
    else:
        print(RED + "[x]" + RESET + " JSON file was not updated.")

except requests.exceptions.RequestException as e:
    print(f"{RED}[ERROR]{RESET} Unable to retrieve data from the URL:", e)
except (AttributeError, ValueError) as e:
    # For example, an <h4> with no following <ul> makes ul_tag None (AttributeError)
    print(f"{RED}[ERROR]{RESET} Error while parsing HTML:", e)
except KeyboardInterrupt:
    print("\nProgram interrupted by user.")
```
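The record-building rules in the loop above (drop the excluded Opera Mini string, skip repeated (user-agent, title) pairs, number the kept entries `ua-1`, `ua-2`, and so on, and label everything after the `Xenu Link Sleuth/1.3.7` entry as `Mobile`) can be separated from the HTML parsing so they can be tested without network access. This is a sketch: `build_records` is a hypothetical function that takes already-extracted `(group, title, user_agent)` tuples in page order, and it leaves out the script's checks that blank a `title` or `group` equal to the excluded string.

```python
# Values copied from User-Agent-Parser.py
SKIPPED_UA = "Opera/9.80 (J2ME/MIDP; Opera Mini/4.2.14912Mod.By.www.9jamusic.cz.cc/22.387; U; en)"
LAST_GENERAL_UA = "Xenu Link Sleuth/1.3.7"


def build_records(entries):
    """Build user-agent dictionaries from (group, title, user_agent) tuples.

    Hypothetical helper mirroring the script's loop: entries are processed in
    page order, so everything after LAST_GENERAL_UA is labelled "Mobile".
    """
    records = []
    seen = set()
    mobile_detected = False
    for group, title, user_agent in entries:
        user_agent = user_agent.replace("-->>", "")  # Same clean-up as the script
        if SKIPPED_UA in user_agent or (user_agent, title) in seen:
            continue  # Excluded string or duplicate (user-agent, title) pair
        records.append({
            "title": title,
            "group": group,
            "id": f"ua-{len(records) + 1}",  # Sequence counts kept entries only
            "user-agent": user_agent,
            "Host": "Mobile" if mobile_detected else "General",
        })
        seen.add((user_agent, title))
        if LAST_GENERAL_UA in user_agent:
            mobile_detected = True  # Later entries are treated as mobile
    return records
```

For example, `build_records([("BROWSERS", "Chrome", "UA1"), ("BROWSERS", "Chrome", "UA1")])` returns a single record with id `ua-1`, because the second tuple repeats the same (user-agent, title) pair.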

requirements.txt

Lines changed: 4 additions & 0 deletions

```text
beautifulsoup4==4.10.0
requests==2.26.0
matplotlib==3.4.3
wordcloud==1.8.1
```
