HTTP Request Randomizer in Python |Build Status| |Coverage Status| |PyPI version|
=================================================================================

A convenient way to implement HTTP requests is using Python's
**requests** library. One of requests’ most popular features is simple
proxying support. HTTP as a protocol has very well-defined semantics for
dealing with proxies, and this contributed to the widespread deployment
of HTTP proxies.
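
With **requests**, proxying is configured per call through the ``proxies``
argument, a mapping from URL scheme to proxy URL. The sketch below is
illustrative only: the helper name ``build_proxied_kwargs`` and the address
``203.0.113.7:8080`` (a documentation-reserved IP) are hypothetical and are not
part of this project's API. The helper only builds the keyword arguments that
could be passed to ``requests.get``, so it runs without a network.

.. code:: python

    def build_proxied_kwargs(proxy_host, proxy_port, timeout=30):
        """Return keyword arguments for requests.get() that route both
        http:// and https:// URLs through a single HTTP proxy.

        Hypothetical helper for illustration; not part of this project.
        """
        proxy_url = "http://{0}:{1}".format(proxy_host, proxy_port)
        return {
            # requests picks the proxy whose key matches the target URL's scheme
            "proxies": {"http": proxy_url, "https": proxy_url},
            # seconds to wait before giving up on a slow proxy
            "timeout": timeout,
        }


    if __name__ == "__main__":
        kwargs = build_proxied_kwargs("203.0.113.7", 8080)
        print(kwargs["proxies"]["http"])  # http://203.0.113.7:8080
        # With the requests package installed, the call would be:
        #   import requests
        #   response = requests.get("http://ipv4.icanhazip.com", **kwargs)

Mapping both schemes to the same HTTP proxy is a common setup. A different
proxy per scheme works the same way.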

Proxying is very useful when conducting intensive web crawling/scraping,
or when you just want to hide your identity (anonymization).

In this project I am using public proxies to randomise HTTP requests
over a number of IP addresses. By also using a variety of known user agent
headers, these requests appear to have been produced by different
applications and operating systems.

Proxies
-------

Proxies provide a way to use server P (the middleman) to contact server
A and then route the response back to you. In more nefarious circles,
it's a prime way to make your presence unknown and pose as many clients
to a website instead of just one client. Websites often block IPs that
make too many requests, and proxies are a way to get around this. Even
if you are only simulating an attack, you should know how it's done.
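
To see why rotating proxies helps against per-IP blocking, consider a site
that blocks any client IP exceeding a fixed number of requests. The sketch
below is a toy model and not part of this project. The limit of 100 requests,
the round-robin rotation and the ``198.51.100.x`` addresses are all
hypothetical. It counts how many requests the site would see from each IP.

.. code:: python

    from collections import Counter
    from itertools import cycle

    BLOCK_THRESHOLD = 100  # hypothetical per-IP request limit enforced by a site


    def requests_seen_per_ip(total_requests, proxy_ips):
        """Count requests per source IP when traffic is rotated round-robin
        across proxy_ips; an empty list means every request comes from us."""
        if not proxy_ips:
            return Counter({"our-own-ip": total_requests})
        rotation = cycle(proxy_ips)
        return Counter(next(rotation) for _ in range(total_requests))


    def blocked_ips(counts, threshold=BLOCK_THRESHOLD):
        """IPs the site would block for exceeding the threshold."""
        return sorted(ip for ip, n in counts.items() if n > threshold)


    if __name__ == "__main__":
        pool = ["198.51.100.{0}".format(i) for i in range(1, 6)]  # 5 proxies
        print(blocked_ips(requests_seen_per_ip(300, [])))    # ['our-own-ip']
        print(blocked_ips(requests_seen_per_ip(300, pool)))  # []

Spreading 300 requests over five proxies gives each IP 60 requests, which
stays under the assumed limit. Sent directly, all 300 come from one IP.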

User Agent
----------

Surprisingly, the only thing that tells a server which application
triggered the request (for example, a browser type or a script) is a
header called the "user agent", which is included in the HTTP request.
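
Because the user agent is an ordinary request header, a client can claim to
be any application simply by setting it. As a sketch, the snippet below uses
Python's standard ``urllib.request`` module rather than **requests**, so it
needs no third-party packages. It builds (but does not send) a request
carrying a desktop-browser user agent. The user agent string and the helper
name ``request_as`` are example values, not taken from this project.

.. code:: python

    import urllib.request

    # Example user agent string (illustrative; the server accepts any string).
    FIREFOX_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) "
                  "Gecko/20100101 Firefox/115.0")


    def request_as(url, user_agent):
        """Build (but do not send) a GET request that announces user_agent."""
        return urllib.request.Request(url, headers={"User-Agent": user_agent})


    if __name__ == "__main__":
        req = request_as("http://ipv4.icanhazip.com", FIREFOX_UA)
        # urllib normalizes header names with str.capitalize(): "User-agent"
        print(req.get_header("User-agent"))
        # Sending it would be: urllib.request.urlopen(req, timeout=30)

From the server's point of view, nothing else in this request distinguishes
the script from the browser it names.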

The source code
---------------

The project code in this repository crawls **four** different public
proxy websites:

* http://proxyfor.eu/geo.php
* http://free-proxy-list.net
* http://rebro.weebly.com/proxy-list.html
* http://www.samair.ru/proxy/time-01.htm

After collecting the proxy data and filtering out the slowest proxies, it
randomly selects one of them to query the target URL. The request
timeout is set to 30 seconds, and if the proxy fails to return a
response, it is deleted from the application's proxy list. Note that
a different user agent header is used for each request. The headers
are stored in the **/data/user\_agents.txt** file, which contains
around 900 different agents.
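
The selection loop described above can be modeled in a few lines. This is
not the project's implementation (``RequestProxy`` is). It is a minimal sketch
under assumptions: proxies arrive as ``(address, latency_seconds)`` pairs,
"slowest" means a latency above a hypothetical ``max_latency`` cutoff, and the
network call is an injected ``fetch(url, proxy, user_agent, timeout)`` function
that raises on failure. The names ``ProxyPool`` and ``load_user_agents`` are
hypothetical. ``load_user_agents`` could be fed the lines of a file such as
data/user\_agents.txt.

.. code:: python

    import random


    class ProxyPool(object):
        """Toy model of the rotation described above (hypothetical class)."""

        def __init__(self, proxies, user_agents, max_latency=5.0, timeout=30):
            # proxies: (address, latency_seconds) pairs; drop the slow ones.
            self.proxies = [addr for addr, latency in proxies
                            if latency <= max_latency]
            self.user_agents = list(user_agents)
            self.timeout = timeout  # seconds, matching the 30 s described above

        def request(self, url, fetch):
            """Send one request through a randomly chosen proxy.

            fetch(url, proxy, user_agent, timeout) is injected so the logic can
            run without a network; it must raise an exception on failure.
            Returns fetch()'s result, or None if the pool is empty or the
            chosen proxy failed (that proxy is then removed from the pool).
            """
            if not self.proxies:
                return None
            proxy = random.choice(self.proxies)
            user_agent = random.choice(self.user_agents)  # fresh UA per request
            try:
                return fetch(url, proxy, user_agent, self.timeout)
            except Exception:
                self.proxies.remove(proxy)
                return None


    def load_user_agents(lines):
        """Parse one user agent per line (e.g. from a text file), skipping blanks."""
        return [line.strip() for line in lines if line.strip()]


    if __name__ == "__main__":
        agents = load_user_agents(["Mozilla/5.0 (X11; Linux x86_64)\n", "\n",
                                   "curl/7.58.0\n"])
        pool = ProxyPool([("198.51.100.1:80", 0.8), ("198.51.100.2:3128", 9.5),
                          ("198.51.100.3:8080", 1.2)], agents)
        print(pool.proxies)  # ['198.51.100.1:80', '198.51.100.3:8080']

        def fake_fetch(url, proxy, user_agent, timeout):
            # Stand-in for a real HTTP call in which one proxy is dead.
            if proxy == "198.51.100.3:8080":
                raise IOError("no response within {0} s".format(timeout))
            return "OK via {0} as {1!r}".format(proxy, user_agent)

        for _ in range(4):
            print(pool.request("http://ipv4.icanhazip.com", fake_fetch))

Injecting ``fetch`` keeps the rotation logic testable. A real ``fetch`` could
wrap ``requests.get(url, proxies=..., headers={'User-Agent': user_agent},
timeout=timeout)`` followed by ``raise_for_status()``.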

How to use
----------

The project is now distributed as a PyPI package! To run an example,
simply include **http-request-randomizer==1.0.1** in your
requirements.txt file. Then run the code below:

.. code:: python

    import time
    from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

    if __name__ == '__main__':

        # Building the proxy list crawls the public proxy sites, so time it.
        start = time.time()
        req_proxy = RequestProxy()
        print("Initialization took: {0} sec".format(time.time() - start))
        print("Size: {0}".format(len(req_proxy.get_proxy_list())))
        print("ALL = {0}".format(req_proxy.get_proxy_list()))

        # This service echoes back the IP address it sees the request from.
        test_url = 'http://ipv4.icanhazip.com'

        while True:
            start = time.time()
            # Returns a response, or None if the chosen proxy failed.
            request = req_proxy.generate_proxied_request(test_url)
            print("Proxied Request Took: {0} sec => Status: {1}".format(
                time.time() - start, request))
            if request is not None:
                print("\t Response: ip={0}".format(request.text.strip()))
            # Failed proxies are dropped, so the list can shrink over time.
            print("Proxy List Size: {0}".format(len(req_proxy.get_proxy_list())))

            print("-> Going to sleep..")
            time.sleep(10)

Documentation
-------------

`http-request-randomizer
documentation <http://pythonhosted.org/http-request-randomizer>`__

Contributing
------------

Contributions are always welcome! Feel free to send a pull request.

Faced an issue?
---------------

Open an issue
`here <https://github.com/pgaref/HTTP_Request_Randomizer/issues>`__, and
be as detailed as possible :)

License
-------

This project is licensed under the terms of the MIT license.

.. |Build Status| image:: https://travis-ci.org/pgaref/HTTP_Request_Randomizer.svg?branch=master
   :target: https://travis-ci.org/pgaref/HTTP_Request_Randomizer
.. |Coverage Status| image:: https://coveralls.io/repos/github/pgaref/HTTP_Request_Randomizer/badge.svg?branch=master
   :target: https://coveralls.io/github/pgaref/HTTP_Request_Randomizer?branch=master
.. |PyPI version| image:: https://badge.fury.io/py/http-request-randomizer.svg
   :target: https://badge.fury.io/py/http-request-randomizer