
Commit 7fd0886

[WIP] start of work to refactor checker (#31)
* start of work to refactor checker
* version bump and update to changelog
* adding util to create temporary directory with test (used by clone)
* adding test for save_results
* cleanly separate github clone testing functions
* better naming files for testing, adding more tests for urlproc and whitelisting
* use pattern for running tests

Signed-off-by: vsoch <vsochat@stanford.edu>
1 parent ea1e13e commit 7fd0886

21 files changed

Lines changed: 1020 additions & 539 deletions

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ before_script:

 # command to run tests
 script:
-  - pytest -vs -x --cov=./ tests/test_check.py tests/test_fileproc.py tests/test_files tests/test_urlproc.py
+  - pytest -vs -x --cov=./ tests/test_*.py

 after_success:
   - codecov
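
The switch from an explicit file list to `tests/test_*.py` changes which paths pytest receives. A minimal sketch of the effect, using Python's `fnmatch` as a rough stand-in for shell globbing (an assumption: in the real build the shell expands the pattern, and the candidate list below is only the set of paths named on the old command line):

```python
from fnmatch import fnmatch

# Paths named on the old command line (illustrative candidates only).
candidates = [
    "tests/test_check.py",
    "tests/test_fileproc.py",
    "tests/test_files",
    "tests/test_urlproc.py",
]

pattern = "tests/test_*.py"

# Keep only the candidates the new pattern would select.
selected = [path for path in candidates if fnmatch(path, pattern)]
print(selected)
```

Under this sketch the `tests/test_files` directory is no longer passed explicitly, since it does not end in `.py`, while any new `tests/test_*.py` module would be picked up without editing the config.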

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@ and **Merged pull requests**. Critical items to know are:
 Referenced versions in headers are tagged on Github, in parentheses are for pypi.

 ## [vxx](https://github.com/urlstechie/urlschecker-python/tree/master) (master)
+- refactor check.py to be UrlChecker class, save with filename (0.0.18)
 - default for files needs to be empty string (not None) (0.0.17)
 - bug with incorrect return code on fail, add files flag (0.0.16)
 - reverting back to working client (0.0.15)

README.md

Lines changed: 203 additions & 8 deletions
@@ -3,7 +3,6 @@
 [![Build Status](https://travis-ci.com/urlstechie/urlchecker-python.svg?branch=master)](https://travis-ci.com/urlstechie/urlchecker-python) [![Documentation Status](https://readthedocs.org/projects/urlchecker-python/badge/?version=latest)](https://urlchecker-python.readthedocs.io/en/latest/?badge=latest) [![codecov](https://codecov.io/gh/urlstechie/urlchecker-python/branch/master/graph/badge.svg)](https://codecov.io/gh/urlstechie/urlchecker-python) [![Python](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue)](https://www.python.org/doc/versions/) [![CodeFactor](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python/badge)](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python) ![PyPI](https://img.shields.io/pypi/v/urlchecker) [![Downloads](https://pepy.tech/badge/urlchecker)](https://pepy.tech/project/urlchecker) [![License](https://img.shields.io/badge/license-MIT-brightgreen)](https://github.com/urlstechie/urlchecker-python/blob/master/LICENSE)


-
 # urlchecker-python

 This is a python module to collect urls over static files (code and documentation)
@@ -273,19 +272,215 @@ https://github.com/SuperKogito/URLs-checker/issues/4,failed

 ### Usage from Python

+#### Checking a Path
+
 If you want to check a list of urls outside of the provided client, this is fairly easy to do!
-Let's say we have a list of urls, `urls`:
+Let's say we have a path, our present working directory, and we want to check
+.py and .md files (the default)

 ```python
-from urlchecker.core.urlproc import check_urls
+from urlchecker.core.check import UrlChecker
+import os
+
+path = os.getcwd()
+checker = UrlChecker(path)
+# UrlChecker:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python
+```
+
+And of course you can provide more substantial arguments to derive the original
+file list:

-# We will update a dictionary with failed and passed
-check_results = {"failed": [], "passed": []}
-check_urls(urls=urls, retry_count=3, timeout=5, check_results=check_results)
+```python
+checker = UrlChecker(
+    path=path,
+    file_types=[".md", ".py", ".rst"],
+    include_patterns=[],
+    white_listed_files=["README.md", "LICENSE"],
+    print_all=True,
+)
+```
+I can then run the checker like this:
+
+```python
+checker.run()
+```
+
+Or with more customization of white listing urls:
+
+```python
+checker.run(
+    white_listed_urls=white_listed_urls,
+    white_listed_patterns=white_listed_patterns,
+    retry_count=3,
+    timeout=5,
+)
+```
+
+You'll get the results object returned, which is also available at `checker.results`,
+a simple dictionary with "passed" and "failed" keys to show passes and fails across
+all files.
+
+```python
+{'passed': ['https://github.com/SuperKogito/spafe/issues/4',
+  'http://shachi.org/resources',
+  'https://superkogito.github.io/blog/SpectralLeakageWindowing.html',
+  'https://superkogito.github.io/figures/fig4.html',
+  'https://github.com/urlstechie/urlchecker-test-repo',
+  'https://www.google.com/',
+  ...
+  'https://github.com/SuperKogito',
+  'https://img.shields.io/',
+  'https://www.google.com/',
+  'https://docs.python.org/2'],
+ 'failed': ['https://github.com/urlstechie/urlschecker-python/tree/master',
+  'https://github.com/SuperKogito/Voice-based-gender-recognition,passed',
+  'https://github.com/SuperKogito/URLs-checker/README.md',
+  ...
+  'https://superkogito.github.io/tables',
+  'https://github.com/SuperKogito/URLs-checker/issues/2',
+  'https://github.com/SuperKogito/URLs-checker/README.md',
+  'https://github.com/SuperKogito/URLs-checker/issues/4',
+  'https://github.com/SuperKogito/URLs-checker/issues/3',
+  'https://github.com/SuperKogito/URLs-checker/issues/1',
+  'https://none.html']}
 ```
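
The sample output above lists some URLs more than once (for example `https://www.google.com/`), because results are collected across all files. A minimal sketch of summarizing such a dictionary by unique URL follows; `summarize` is a hypothetical helper, not part of urlchecker, and the `results` value is a shortened stand-in for `checker.results`:

```python
def summarize(results):
    """Count unique passed and failed URLs in a results dictionary."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    passed = list(dict.fromkeys(results.get("passed", [])))
    failed = list(dict.fromkeys(results.get("failed", [])))
    return {
        "passed": len(passed),
        "failed": len(failed),
        "total": len(passed) + len(failed),
    }


# Stand-in for checker.results, shortened from the sample output.
results = {
    "passed": [
        "https://www.google.com/",
        "https://img.shields.io/",
        "https://www.google.com/",
    ],
    "failed": ["https://none.html"],
}

summary = summarize(results)
print(summary)
```

This only assumes the "passed" and "failed" keys described in the README text, so it should work on any dictionary of that shape.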

-Using the above, you're check_results will be updated to include those in your
-list that passed, and those that failed.
+You can look at `checker.checks`, which is a dictionary of result objects,
+organized by the filename:
+
+```python
+for file_name, result in checker.checks.items():
+    print()
+    print(result)
+    print("Total Results: %s " % result.count)
+    print("Total Failed: %s" % len(result.failed))
+    print("Total Passed: %s" % len(result.passed))
+
+...
+
+UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/tests/test_files/sample_test_file.md
+Total Results: 26
+Total Failed: 6
+Total Passed: 20
+
+UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.pytest_cache/README.md
+Total Results: 1
+Total Failed: 0
+Total Passed: 1
+
+UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.eggs/pytest_runner-5.2-py3.7.egg/ptr.py
+Total Results: 0
+Total Failed: 0
+Total Passed: 0
+
+UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/docs/source/conf.py
+Total Results: 3
+Total Failed: 0
+Total Passed: 3
+```
+
+For any result object, you can print the list of passed, failed, white listed,
+or all the urls.
+
+```python
+result.all
+['https://www.sphinx-doc.org/en/master/usage/configuration.html',
+ 'https://docs.python.org/3',
+ 'https://docs.python.org/2']
+
+result.failed
+[]
+
+result.white_listed
+[]
+
+result.passed
+['https://www.sphinx-doc.org/en/master/usage/configuration.html',
+ 'https://docs.python.org/3',
+ 'https://docs.python.org/2']
+
+result.count
+3
+```
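
Since `checker.checks` maps each file name to a result object with `passed` and `failed` lists, it is straightforward to narrow the report to files that actually have broken links. A minimal sketch, assuming only those two attributes; `FakeResult` and `files_with_failures` are hypothetical names used here so the example runs without urlchecker installed:

```python
from dataclasses import dataclass, field


@dataclass
class FakeResult:
    """Hypothetical stand-in for a per-file result object."""

    passed: list = field(default_factory=list)
    failed: list = field(default_factory=list)


def files_with_failures(checks):
    """Return {file_name: failed_urls} for files with at least one failure."""
    return {name: result.failed for name, result in checks.items() if result.failed}


# Stand-in for checker.checks, keyed by file name.
checks = {
    "docs/source/conf.py": FakeResult(passed=["https://docs.python.org/3"]),
    "README.md": FakeResult(failed=["https://none.html"]),
}

failing = files_with_failures(checks)
print(failing)
```

With a real checker, passing `checker.checks` in place of the stand-in dictionary should behave the same way, provided the result objects expose `failed` as shown in the README.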
+
+
+#### Checking a List of URLs
+
+If you start with a list of urls you want to check, you can do that too!
+
+```python
+from urlchecker.core.urlproc import UrlCheckResult
+
+urls = ['https://www.github.com', "https://github.com", "https://banana-pudding-doesnt-exist.com"]
+
+# Instantiate an empty checker to extract urls
+checker = UrlCheckResult()
+File name None is undefined or does not exist, skipping extraction.
+```
+
+If you provided a file name, the urls would be extracted for you.
+
+```python
+checker = UrlCheckResult(
+    file_name=file_name,
+    white_listed_patterns=white_listed_patterns,
+    white_listed_urls=white_listed_urls,
+    print_all=True,
+)
+```
+
+or you can provide all the parameters without the filename:
+
+```python
+checker = UrlCheckResult(
+    white_listed_patterns=white_listed_patterns,
+    white_listed_urls=white_listed_urls,
+    print_all=True,
+)
+```
+
+If you don't provide the file_name to check urls, you can give the urls
+you defined previously directly to the `check_urls` function:
+
+
+```python
+checker.check_urls(urls)
+
+https://www.github.com
+https://github.com
+HTTPSConnectionPool(host='banana-pudding-doesnt-exist.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f989abdfa10>: Failed to establish a new connection: [Errno -2] Name or service not known'))
+https://banana-pudding-doesnt-exist.com
+```
+
+And of course you can specify a timeout and retry:
+
+```python
+checker.check_urls(urls, retry_count=retry_count, timeout=timeout)
+```
+
+After you run the checker you can get all the urls, the passed,
+and failed sets:
+
+```python
+checker.failed
+['https://banana-pudding-doesnt-exist.com']
+
+checker.passed
+['https://www.github.com', 'https://github.com']
+
+checker.all
+['https://www.github.com',
+ 'https://github.com',
+ 'https://banana-pudding-doesnt-exist.com']
+
+checker.all
+['https://www.github.com',
+ 'https://github.com',
+ 'https://banana-pudding-doesnt-exist.com']
+
+checker.count
+3
+```
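
The `passed` and `failed` lists can be written out as `url,result` rows, the shape visible in the hunk header above (`.../issues/4,failed`). A minimal sketch using only the standard library; `results_to_csv` is a hypothetical helper rather than the package's own save function, and the `URL,RESULT` header row is an assumption:

```python
import csv
import io


def results_to_csv(passed, failed):
    """Render passed and failed URLs as CSV text, one URL per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["URL", "RESULT"])  # header row is an assumption
    for url in passed:
        writer.writerow([url, "passed"])
    for url in failed:
        writer.writerow([url, "failed"])
    return buffer.getvalue()


# Values copied from the checker.passed / checker.failed example.
text = results_to_csv(
    passed=["https://www.github.com", "https://github.com"],
    failed=["https://banana-pudding-doesnt-exist.com"],
)
print(text)
```

Writing to an in-memory buffer keeps the sketch free of file system side effects; the returned text can be saved to whatever file name the caller chooses.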

 If you have any questions, please don't hesitate to [open an issue](https://github.com/urlstechie/urlchecker-python).
