In the 'projects' table, different 'id' values sometimes map to the same url and therefore point to the same GitHub project. In other words, the table contains duplicate projects. As of the Nov 2016 version, we found about 10k duplicates for C++ projects, 27k for Java projects, and 22k for Python projects.
For example,
id=102747 and id=424108 have the same url https://api.github.com/repos/zmeadows/cybernetic-banana
id=43736 and id=1307727 have the same url https://api.github.com/repos/indutny/defer-tick
id=1881530 and id=2502246 have the same url https://api.github.com/repos/persistentsnail/AOI
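
One way to look for such duplicates is to group the ids by url and keep only the urls that have more than one id. The snippet below is a minimal sketch of that idea, assuming the (id, url) pairs have already been exported from the 'projects' table into Python. The function name `find_duplicate_urls` and the last row of the sample data are hypothetical and only there for illustration; the other rows are the examples above.

```python
from collections import defaultdict


def find_duplicate_urls(rows):
    """Group project ids by url; keep only urls shared by more than one id.

    `rows` is an iterable of (id, url) pairs, assumed to come from the
    'projects' table. This helper is hypothetical, not part of any tool.
    """
    ids_by_url = defaultdict(list)
    for project_id, url in rows:
        ids_by_url[url].append(project_id)
    return {url: sorted(ids) for url, ids in ids_by_url.items() if len(ids) > 1}


rows = [
    (102747, "https://api.github.com/repos/zmeadows/cybernetic-banana"),
    (424108, "https://api.github.com/repos/zmeadows/cybernetic-banana"),
    (43736, "https://api.github.com/repos/indutny/defer-tick"),
    (1307727, "https://api.github.com/repos/indutny/defer-tick"),
    (1881530, "https://api.github.com/repos/persistentsnail/AOI"),
    (2502246, "https://api.github.com/repos/persistentsnail/AOI"),
    # Hypothetical non-duplicate row, added only to show it is filtered out.
    (1, "https://api.github.com/repos/example/unique-project"),
]

duplicates = find_duplicate_urls(rows)
for url, ids in duplicates.items():
    print(url, ids)
```

The same check can probably be done directly in the database with a query along the lines of SELECT url, COUNT(*) FROM projects GROUP BY url HAVING COUNT(*) > 1, assuming the column is named 'url' as in the examples.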