
Commit bd930ad

Author: pgaref
Merge branch 'hotfix/1.0.2' into develop
2 parents 9df1bd1 + 8db643b commit bd930ad

2 files changed

Lines changed: 117 additions & 11 deletions

File tree

README.rst

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
HTTP Request Randomizer in Python |Build Status| |Coverage Status| |PyPI version|
=================================================================================

A convenient way to implement HTTP requests is using Python's
**requests** library. One of requests' most popular features is simple
proxying support. HTTP as a protocol has very well-defined semantics for
dealing with proxies, and this contributed to the widespread deployment
of HTTP proxies.
Proxying is very useful when conducting intensive web crawling/scraping
or when you just want to hide your identity (anonymization).

This project uses public proxies to randomize HTTP requests over a
number of IP addresses, and a variety of known user-agent headers makes
these requests look as if they were produced by different applications
and operating systems.
Proxies
-------

Proxies provide a way to use server P (the middleman) to contact server
A and then route the response back to you. In more nefarious circles,
it's a prime way to make your presence unknown and pose as many clients
to a website instead of just one. Websites will often block IPs that
make too many requests, and proxies are a way to get around this. But
even for simulating an attack, you should know how it's done.

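That routing can be tried directly with the **requests** library. A minimal sketch; the proxy address below is a placeholder, not a working endpoint:

```python
import requests

PROXY = "http://203.0.113.10:8080"  # placeholder middleman (server P)

session = requests.Session()
session.proxies.update({"http": PROXY, "https": PROXY})

# Any request made through this session is now routed via the proxy,
# so the target server sees the proxy's IP address instead of yours:
#   session.get("http://ipv4.icanhazip.com", timeout=30)
```

Swapping the value of ``PROXY`` between requests is exactly the rotation this project automates.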
User Agent
----------

Surprisingly, the only thing that tells a server which application
triggered the request (a browser, a script, and so on) is a header
called the "User-Agent", which is included in the HTTP request.

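Overriding that header is a one-liner with **requests**. A small sketch; the agent string is illustrative, not taken from the project's list:

```python
import requests

# Make a scripted request present itself as a desktop browser.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}

req = requests.Request("GET", "http://ipv4.icanhazip.com", headers=headers)
prepared = req.prepare()  # build the request without sending it
print(prepared.headers["User-Agent"])
```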
The source code
---------------

The code in this repository crawls **four** different public proxy
websites:

* http://proxyfor.eu/geo.php
* http://free-proxy-list.net
* http://rebro.weebly.com/proxy-list.html
* http://www.samair.ru/proxy/time-01.htm

After collecting the proxy data and filtering out the slowest proxies,
it randomly selects one of them to query the target URL. The request
timeout is configured at 30 seconds, and if a proxy fails to return a
response it is deleted from the application's proxy list. Note that a
different agent header is used for each request. The headers are stored
in the **/data/user\_agents.txt** file, which contains around 900
different agents.

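The select-and-evict behaviour described above can be sketched in a few lines. This is an illustrative stand-in, not the project's actual API; ``fetch`` represents whatever performs the HTTP call:

```python
import random

def proxied_get(url, proxy_list, fetch, timeout=30):
    """Pick a random proxy from proxy_list and query url through it.

    If the proxy raises instead of returning a response, it is removed
    from proxy_list, mirroring the eviction behaviour described above.
    """
    proxy = random.choice(proxy_list)
    try:
        return fetch(url, proxy, timeout)
    except Exception:
        proxy_list.remove(proxy)  # evict the unresponsive proxy
        return None
```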
How to use
----------

The project is now distributed as a PyPI package! To run an example,
simply include **http-request-randomizer==1.0.1** in your
requirements.txt file. Then run the code below:

.. code:: python

    import time
    from http_request_randomizer.requests.proxy.requestProxy import RequestProxy

    if __name__ == '__main__':

        start = time.time()
        req_proxy = RequestProxy()
        print "Initialization took: {0} sec".format((time.time() - start))
        print "Size : ", len(req_proxy.get_proxy_list())
        print " ALL = ", req_proxy.get_proxy_list()

        test_url = 'http://ipv4.icanhazip.com'

        while True:
            start = time.time()
            request = req_proxy.generate_proxied_request(test_url)
            print "Proxied Request Took: {0} sec => Status: {1}".format((time.time() - start), request.__str__())
            if request is not None:
                print "\t Response: ip={0}".format(u''.join(request.text).encode('utf-8'))
            print "Proxy List Size: ", len(req_proxy.get_proxy_list())

            print "-> Going to sleep.."
            time.sleep(10)

Documentation
-------------

`http-request-randomizer
documentation <http://pythonhosted.org/http-request-randomizer>`__

Contributing
------------

Contributions are always welcome! Feel free to send a pull request.

Faced an issue?
---------------

Open an issue
`here <https://github.com/pgaref/HTTP_Request_Randomizer/issues>`__, and
be as detailed as possible :)

License
-------

This project is licensed under the terms of the MIT license.

.. |Build Status| image:: https://travis-ci.org/pgaref/HTTP_Request_Randomizer.svg?branch=master
   :target: https://travis-ci.org/pgaref/HTTP_Request_Randomizer
.. |Coverage Status| image:: https://coveralls.io/repos/github/pgaref/HTTP_Request_Randomizer/badge.svg?branch=master
   :target: https://coveralls.io/github/pgaref/HTTP_Request_Randomizer?branch=master
.. |PyPI version| image:: https://badge.fury.io/py/http-request-randomizer.svg
   :target: https://badge.fury.io/py/http-request-randomizer

setup.py

Lines changed: 5 additions & 11 deletions
@@ -5,22 +5,16 @@
 import sys
 import os
 HERE = os.path.abspath(os.path.dirname(__file__))
-from setuptools import setup
-try:
-    from pypandoc import convert
-    read_md = lambda f: convert(f, 'rst')
-except ImportError:
-    print("warning: pypandoc module not found, could not convert Markdown to RST")
-    read_md = lambda f: open(f, 'r').read()
-


 def read(*parts):
-    '''Return multiple read calls to different readable objects as a single
-    string.'''
+    """Return multiple read calls to different readable objects as a single
+    string."""
     # intentionally *not* adding an encoding option to open
     return codecs.open(os.path.join(HERE, *parts), 'r').read()

+LONG_DESCRIPTION = read('README.rst')
+

 class Tox(TestCommand):
     def finalize_options(self):

@@ -59,7 +53,7 @@ def run_tests(self):
     author='Panagiotis Garefalakis',
     author_email='pangaref@gmail.com',
     description='A package using public proxies to randomise http requests.',
-    long_description=read_md('README.md'),
+    long_description=LONG_DESCRIPTION,
     packages=find_packages(exclude=['tests']),
     include_package_data=True,
     platforms='any',
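The change above drops the pypandoc Markdown-to-RST conversion in favour of reading the new README.rst directly. A minimal standalone sketch of that pattern; the explicit directory argument is added here for illustration, while the real helper uses the module's own path:

```python
import codecs
import os

def read(directory, *parts):
    # Return a file's contents as a single string, as setup.py's read()
    # does; codecs.open is used without an explicit encoding, matching
    # the original helper.
    return codecs.open(os.path.join(directory, *parts), 'r').read()

# setup() would then receive the text via long_description=read(..., 'README.rst')
```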
