You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A convenient way to implement HTTP requests is using Pythons' **requests** library.
4
+
One of requests’ most popular features is simple proxying support.
5
+
HTTP as a protocol has very well-defined semantics for dealing with proxies, and this contributed to the widespread deployment of HTTP proxies
6
+
7
+
Proxying is very useful when conducting intensive web crawling/scrapping or when you just want to hide your identity (anomization).
8
+
9
+
In this project I am using public proxies to randomise over a number of IP addresses and a variety of known user agents to generate requests that look to be produced by different applications and operating systems.
10
+
11
+
12
+
## Proxies
13
+
14
+
Proxies are a way to tell server P (the middleman) to contact server A and then route the response back to you. In more nefarious circles, it's a prime way to make your presence unknown and pose as many clients to a website instead of just one client.
15
+
Often times websites will block IPs that make too many requests, and proxies is a way to get around this. But even for simulating an attack, you should know how it's done.
16
+
17
+
18
+
## User Agent
19
+
20
+
Surprisingly, the only thing that tells a server the application triggered the request (like browser type or from a script) is a header called a "user agent" which is included in the HTTP request.
21
+
22
+
## The source code
23
+
24
+
The project code in this repository is crawling two different public proxy websites http://proxyfor.eu/geo.php and http://free-proxy-list.net.
25
+
After collecting the proxy data and filtering the slowest ones it is randomly selecting one of them to query the target url.
26
+
The request timeout is configured at 30 seconds and if the proxy fails to return a response it is deleted from the application proxy list.
27
+
I have to mention that for each request a different agent header is used. This headers are strong in the /data/user_agents.txt file which contains around 900 different agents.
0 commit comments