
uk-third-sector-database/ch_adv_scraper


Companies House Advanced Search API Scraper


A scraper for the Companies House Advanced Search API, written specifically to collect data on Third Sector Organisations. The open data are sourced from public records made available by Companies House and are licensed under the Open Government Licence.

It should run without any special installations or requirements; tqdm and numpy (np) are mostly luxuries that improve quality of life. To install them, run pip install -r requirements.txt. Several things could be done to improve the scraper, including but not limited to:

  • Logging.
  • Better error handling of unknown status codes.
  • De-duplication and compression of the output once the script completes.
  • Dynamic scaling of the scrape window, up or down, depending on whether the previous period came close to the 10k threshold.
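
The last item above could be approached along the following lines. This is only a minimal sketch, not code from this repository: it assumes the 10k threshold is a cap on the number of results returned for one query window, and the names next_window_days, plan_windows and counts_fn, as well as the 80% and 30% cut-offs, are hypothetical choices made for illustration.

```python
from datetime import date, timedelta

RESULT_CAP = 10_000  # assumed meaning of the "10k threshold": max results per window


def next_window_days(current_days, n_results, cap=RESULT_CAP,
                     high=0.8, low=0.3, min_days=1, max_days=365):
    """Return the window length (in days) to use for the next period.

    The window is halved when the previous period came close to the cap,
    doubled when it was far below it, and left unchanged otherwise.
    """
    if n_results >= high * cap:
        return max(min_days, current_days // 2)
    if n_results <= low * cap:
        return min(max_days, current_days * 2)
    return current_days


def plan_windows(start, end, counts_fn, initial_days=30):
    """Yield (window_start, window_end, n_results) covering start..end inclusive.

    counts_fn(window_start, window_end) is a hypothetical callable that runs
    the search for that date range and returns the number of results found.
    """
    days = initial_days
    cursor = start
    while cursor <= end:
        window_end = min(cursor + timedelta(days=days - 1), end)
        n_results = counts_fn(cursor, window_end)
        yield cursor, window_end, n_results
        # Adapt the next window based on how full the previous one was.
        days = next_window_days(days, n_results)
        cursor = window_end + timedelta(days=1)


if __name__ == "__main__":
    # Stand-in for a real query: pretend every day yields 120 results.
    def fake_counts(window_start, window_end):
        return ((window_end - window_start).days + 1) * 120

    for w_start, w_end, n in plan_windows(date(2020, 1, 1), date(2020, 3, 31), fake_counts):
        print(w_start, w_end, n)
```

Note that the sketch only adjusts the size of the following window. A window that actually reaches the cap may have returned truncated results, so a fuller version would probably need to re-run that period with a smaller window rather than simply move on.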

License

This work is free. You can redistribute it and/or modify it under the terms of the GNU GPL 3.0 license.

Acknowledgements

We are grateful for funding from the ESRC (project reference: ES/X000524/1).
