[](https://travis-ci.com/urlstechie/urlchecker-python) [](https://urlchecker-python.readthedocs.io/en/latest/?badge=latest) [](https://codecov.io/gh/urlstechie/urlchecker-python) [](https://www.python.org/doc/versions/) [](https://www.codefactor.io/repository/github/urlstechie/urlchecker-python) [](https://pepy.tech/project/urlchecker) [](https://github.com/urlstechie/urlchecker-python/blob/master/LICENSE)

# urlchecker-python

This is a python module to collect urls over static files (code and documentation)

### Usage from Python

#### Checking a Path

If you want to run checks outside of the provided client, this is fairly easy to do!
Let's say we have a path, our present working directory, and we want to check
.py and .md files (the default):

```python
from urlchecker.core.check import UrlChecker
import os

path = os.getcwd()
checker = UrlChecker(path)
# UrlChecker:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python
```

And of course you can provide more substantial arguments to derive the original
file list:

```python
checker = UrlChecker(
    path=path,
    file_types=[".md", ".py", ".rst"],
    include_patterns=[],
    white_listed_files=["README.md", "LICENSE"],
    print_all=True,
)
```

You can then run the checker like this:

```python
checker.run()
```

Or with more customization of white listed urls and patterns:

```python
checker.run(
    white_listed_urls=white_listed_urls,
    white_listed_patterns=white_listed_patterns,
    retry_count=3,
    timeout=5,
)
```

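Conceptually, white listing by pattern just filters the url list before any checking happens. The sketch below shows the assumed semantics (a simple regex search per url), not the library's exact matching code:

```python
import re

# Illustrative only: drop urls matching any white listed pattern
white_listed_patterns = ["github.com/urlstechie"]
urls = ["https://github.com/urlstechie/urlchecker-python", "https://www.google.com/"]

kept = [
    url for url in urls
    if not any(re.search(pattern, url) for pattern in white_listed_patterns)
]
print(kept)  # only the google url survives
```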
You'll get the results object returned, which is also available at `checker.results`,
a simple dictionary with "passed" and "failed" keys to show passes and fails across
all files.

```python
{'passed': ['https://github.com/SuperKogito/spafe/issues/4',
  'http://shachi.org/resources',
  'https://superkogito.github.io/blog/SpectralLeakageWindowing.html',
  'https://superkogito.github.io/figures/fig4.html',
  'https://github.com/urlstechie/urlchecker-test-repo',
  'https://www.google.com/',
  ...
  'https://github.com/SuperKogito',
  'https://img.shields.io/',
  'https://www.google.com/',
  'https://docs.python.org/2'],
 'failed': ['https://github.com/urlstechie/urlschecker-python/tree/master',
  'https://github.com/SuperKogito/Voice-based-gender-recognition,passed',
  'https://github.com/SuperKogito/URLs-checker/README.md',
  ...
  'https://superkogito.github.io/tables',
  'https://github.com/SuperKogito/URLs-checker/issues/2',
  'https://github.com/SuperKogito/URLs-checker/README.md',
  'https://github.com/SuperKogito/URLs-checker/issues/4',
  'https://github.com/SuperKogito/URLs-checker/issues/3',
  'https://github.com/SuperKogito/URLs-checker/issues/1',
  'https://none.html']}
```

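Since this is a plain dictionary, post-processing is trivial. For example, a CI script might summarize counts and derive an exit code (a sketch over a hand-made `results` dictionary shaped like `checker.results`, not the actual output above):

```python
# Hand-made dictionary shaped like checker.results (illustrative only)
results = {
    "passed": ["https://www.google.com/", "https://docs.python.org/2"],
    "failed": ["https://none.html"],
}

total = len(results["passed"]) + len(results["failed"])
print("%s of %s urls failed" % (len(results["failed"]), total))

# A CI job might exit non-zero when any url failed
exit_code = 1 if results["failed"] else 0
```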
You can look at `checker.checks`, which is a dictionary of result objects,
organized by the filename:

```python
for file_name, result in checker.checks.items():
    print()
    print(result)
    print("Total Results: %s " % result.count)
    print("Total Failed: %s" % len(result.failed))
    print("Total Passed: %s" % len(result.passed))

...

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/tests/test_files/sample_test_file.md
Total Results: 26
Total Failed: 6
Total Passed: 20

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.pytest_cache/README.md
Total Results: 1
Total Failed: 0
Total Passed: 1

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/.eggs/pytest_runner-5.2-py3.7.egg/ptr.py
Total Results: 0
Total Failed: 0
Total Passed: 0

UrlCheck:/home/vanessa/Desktop/Code/urlstechie/urlchecker-python/docs/source/conf.py
Total Results: 3
Total Failed: 0
Total Passed: 3
```

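If you want overall totals rather than per-file prints, you can aggregate across the dictionary yourself. A minimal sketch, using `SimpleNamespace` stand-ins with hand-made data in place of the real result objects:

```python
from types import SimpleNamespace

# Stand-ins for the per-file result objects (illustrative only)
checks = {
    "README.md": SimpleNamespace(passed=["https://github.com"], failed=[]),
    "docs/conf.py": SimpleNamespace(
        passed=["https://docs.python.org/3"], failed=["https://none.html"]
    ),
}

total_passed = sum(len(result.passed) for result in checks.values())
total_failed = sum(len(result.failed) for result in checks.values())
print("Total Passed: %s, Total Failed: %s" % (total_passed, total_failed))
```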
For any result object, you can print the list of passed, failed, white listed,
or all the urls.

```python
result.all
['https://www.sphinx-doc.org/en/master/usage/configuration.html',
 'https://docs.python.org/3',
 'https://docs.python.org/2']

result.failed
[]

result.white_listed
[]

result.passed
['https://www.sphinx-doc.org/en/master/usage/configuration.html',
 'https://docs.python.org/3',
 'https://docs.python.org/2']

result.count
3
```

#### Checking a List of URLs

If you start with a list of urls you want to check, you can do that too!

```python
from urlchecker.core.urlproc import UrlCheckResult

urls = ["https://www.github.com", "https://github.com", "https://banana-pudding-doesnt-exist.com"]

# Instantiate an empty checker to extract urls
checker = UrlCheckResult()
File name None is undefined or does not exist, skipping extraction.
```

If you provide a file name, the urls will be extracted for you:

```python
checker = UrlCheckResult(
    file_name=file_name,
    white_listed_patterns=white_listed_patterns,
    white_listed_urls=white_listed_urls,
    print_all=True,
)
```

Or you can provide all the parameters without the file name:

```python
checker = UrlCheckResult(
    white_listed_patterns=white_listed_patterns,
    white_listed_urls=white_listed_urls,
    print_all=True,
)
```

If you don't provide the file_name to check urls, you can give the urls
you defined previously directly to the `check_urls` function:

```python
checker.check_urls(urls)

https://www.github.com
https://github.com
HTTPSConnectionPool(host='banana-pudding-doesnt-exist.com', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f989abdfa10>: Failed to establish a new connection: [Errno -2] Name or service not known'))
https://banana-pudding-doesnt-exist.com
```

And of course you can specify a retry count and timeout:

```python
checker.check_urls(urls, retry_count=retry_count, timeout=timeout)
```

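Under the hood, a retry with timeout boils down to a small loop. The following is only a standard-library sketch of the idea, not the library's actual implementation (which also handles status codes, printing, and result bookkeeping):

```python
import urllib.error
import urllib.request

def check_url(url, retry_count=3, timeout=5):
    """Return True if the url responds, retrying up to retry_count times.

    Minimal sketch only: the real urlchecker does considerably more.
    """
    for _ in range(retry_count):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return response.status < 400
        except (urllib.error.URLError, ValueError, OSError):
            continue
    return False
```

With this sketch, a url that never connects simply exhausts its retries and comes back `False`.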
After you run the checker, you can get all the urls along with the passed
and failed sets:

```python
checker.failed
['https://banana-pudding-doesnt-exist.com']

checker.passed
['https://www.github.com', 'https://github.com']

checker.all
['https://www.github.com',
 'https://github.com',
 'https://banana-pudding-doesnt-exist.com']

checker.count
3
```

If you have any questions, please don't hesitate to [open an issue](https://github.com/urlstechie/urlchecker-python).