
Commit 49348a4

Author: pgaref
Simple readme file to get started
1 parent ef90e9a commit 49348a4

1 file changed

Lines changed: 26 additions & 2 deletions

File tree

README.md

# HTTP Request Randomizer in Python

A convenient way to implement HTTP requests is to use Python's **requests** library.
One of requests' most popular features is its simple proxying support.
HTTP as a protocol has very well-defined semantics for dealing with proxies, and this has contributed to the widespread deployment of HTTP proxies.

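For example, routing a single request through a proxy with requests looks roughly like this (the proxy address below is a placeholder, not a working server):

```python
import requests

# Route the request through an HTTP proxy. The address below is only a
# placeholder -- substitute a real proxy host and port.
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:3128',
}

response = requests.get('http://example.com', proxies=proxies, timeout=30)
print(response.status_code)
```
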
Proxying is very useful when conducting intensive web crawling/scraping, or when you simply want to hide your identity (anonymization).

In this project I am using public proxies to randomize over a number of IP addresses, and a variety of known user agents to generate requests that appear to be produced by different applications and operating systems.

## Proxies
Proxies are a way to tell server P (the middleman) to contact server A and then route the response back to you. In more nefarious circles, they are a prime way to make your presence unknown and to pose as many clients to a website instead of just one.
Oftentimes websites will block IPs that make too many requests, and proxies are a way to get around this. But even if you are only simulating an attack, you should know how it's done.

## User Agent
Surprisingly, the only thing that tells a server which application triggered the request (a particular browser type, or a script) is a header called the "user agent", which is included in the HTTP request.

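As a quick illustration, overriding this header with requests only takes a custom headers dictionary (the agent string below is an arbitrary example):

```python
import requests

# Override the default "python-requests/x.y.z" user agent with a
# browser-like string (an arbitrary example value).
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

# httpbin echoes back the user agent the server received.
response = requests.get('http://httpbin.org/user-agent', headers=headers)
print(response.text)
```
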
## The source code
The project code in this repository crawls two different public proxy websites: http://proxyfor.eu/geo.php and http://free-proxy-list.net.
After collecting the proxy data and filtering out the slowest proxies, it randomly selects one of them to query the target URL.
The request timeout is configured at 30 seconds, and if a proxy fails to return a response it is deleted from the application's proxy list.
I should also mention that a different agent header is used for each request. These headers are stored in the /data/user_agents.txt file, which contains around 900 different agents.

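Putting these pieces together, the request loop described above can be sketched roughly as follows (the function and variable names here are illustrative, not the project's actual API):

```python
import random
import requests

# Illustrative stand-in for the proxy list collected from the two sites above.
proxies = ['10.10.1.10:3128', '10.10.1.11:1080']

# Load the pool of user agent strings shipped with the project.
with open('data/user_agents.txt') as f:
    user_agents = [line.strip() for line in f if line.strip()]

def random_request(url):
    """Query url through a random proxy, with a random user agent."""
    proxy = random.choice(proxies)
    headers = {'User-Agent': random.choice(user_agents)}
    try:
        return requests.get(url,
                            proxies={'http': 'http://' + proxy},
                            headers=headers,
                            timeout=30)
    except requests.exceptions.RequestException:
        # The proxy failed to return a response in time: drop it.
        proxies.remove(proxy)
        return None
```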
