Add Sofascore referee scraping methods #85
Adds three methods to `Sofascore` that wrap Sofascore's existing referee API endpoints. There is no breaking change — all additions are new methods alongside existing ones.
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 13 |
| Duplication | 0 |
Code Review Summary
| Analyzer | Status | Updated (UTC) | Details |
|---|---|---|---|
| Python | | May 10, 2026 11:35 a.m. | Review ↗ |
| Code coverage | | May 10, 2026 11:35 a.m. | Review ↗ |
Important
AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.
        return data

    # ==============================================================================================
    def get_match_referee(self, match_id: str | int) -> dict | None:
Is this function necessary if it's just accessing a key returned by get_match_dict()?
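To make the question concrete, here is a minimal sketch of what the wrapper amounts to, assuming `get_match_dict()` returns a plain dict that may or may not contain a `"referee"` key. The helper name and the sample dicts are hypothetical and are not taken from the PR.

```python
from __future__ import annotations


def referee_from_match_dict(match_dict: dict) -> dict | None:
    """Hypothetical helper: return the referee entry, or None if it is absent."""
    # dict.get returns None for a missing key, which matches the
    # `x["referee"] if "referee" in x else None` expression in the diff.
    return match_dict.get("referee")


# Made-up sample data standing in for a get_match_dict() result.
with_ref = {"id": 1, "referee": {"id": 42, "name": "Example Referee"}}
without_ref = {"id": 2}

print(referee_from_match_dict(with_ref))
print(referee_from_match_dict(without_ref))
```

If callers are comfortable writing `match_dict.get("referee")` themselves, the dedicated method mostly adds discoverability rather than new behavior.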
        return match_dict["referee"] if "referee" in match_dict else None

    # ==============================================================================================
    def get_referee(self, referee_id: str | int) -> dict:
Do you think there's any value in creating a SofascoreReferee object similar to what I've done for SofascorePlayer? Class attributes would be things like id and name, plus the matches being scraped by the next function in your PR. This function would then return an instance of the SofascoreReferee object.
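A rough sketch of what such an object could look like, purely as an illustration. The field names, the `from_dict` constructor, and the assumption that the referee payload carries `"id"` and `"name"` keys are all hypothetical; the actual `SofascorePlayer` design is not shown in this thread and may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SofascoreReferee:
    """Hypothetical container for referee data (not the library's actual API)."""

    id: int
    name: str
    # Match dicts (or IDs) could be attached later by the matches scraper.
    matches: list[dict] = field(default_factory=list)

    @classmethod
    def from_dict(cls, referee: dict) -> SofascoreReferee:
        # Assumes the payload has "id" and "name"; other keys are ignored here.
        return cls(id=int(referee["id"]), name=referee["name"])


# Made-up payload for illustration only.
ref = SofascoreReferee.from_dict({"id": "7", "name": "Example Referee", "games": 10})
print(ref)
```

With this shape, `get_referee` could return the object directly and `get_referee_matches` could populate `matches` on demand.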
        return response["referee"]

    # ==============================================================================================
    def get_referee_matches(self, referee_id: str | int, max_pages: int=10) -> list[dict]:
I've never really used referee data before so a few questions:
- Is there any reason to default to 10 pages and not just scrape all of the matches?
- Is there any concern with returning a list of match dicts being "too much"? Either too much data/RAM or too much info? Thoughts on just returning a list of match IDs?
Again, I've never used ref data so I'm not familiar with the use case and what info is valuable or not valuable.
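Both options raised above can be sketched in one place. This is a sketch under stated assumptions, not the PR's implementation: `fetch_page` is a hypothetical stand-in for the request to `/referee/{id}/events/last/{page}`, pages are assumed to start at 0, and each event dict is assumed to carry an `"id"` key. Passing `max_pages=None` scrapes until the first empty page.

```python
from __future__ import annotations

from typing import Callable


def collect_match_ids(
    fetch_page: Callable[[int], list[dict]],
    max_pages: int | None = None,
) -> list[int]:
    """Walk pages until one is empty (or max_pages is hit) and keep only IDs."""
    ids: list[int] = []
    page = 0  # assumed starting index
    while max_pages is None or page < max_pages:
        events = fetch_page(page)
        if not events:
            break  # an empty page marks the end of the history
        ids.extend(event["id"] for event in events)
        page += 1
    return ids


# Fake paginated data replacing real network calls.
fake_pages = [[{"id": 1}, {"id": 2}], [{"id": 3}], []]


def fake_fetch(page: int) -> list[dict]:
    return fake_pages[page] if page < len(fake_pages) else []


print(collect_match_ids(fake_fetch))
print(collect_match_ids(fake_fetch, max_pages=1))
```

Returning only IDs keeps memory use small and lets callers fetch full match dicts selectively, at the cost of extra requests when the full dicts are needed anyway.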
What's added
- `get_match_referee(match_id)`: extracts the referee dict from a match's event data (uses the existing `/event/{id}` call already issued by `get_match_dict`, so no extra request).
- `get_referee(referee_id)`: fetches a referee profile plus career aggregates (games, yellow/red cards) from `/referee/{id}`.
- `get_referee_matches(referee_id, max_pages=10)`: paginates `/referee/{id}/events/last/{page}` and returns a flat list of recent event dicts.
Why
Referee identity and history are commonly used features in football modelling (cards, fouls, stoppage time, penalty rate). Sofascore exposes this data through its existing API, but ScraperFC didn't surface it. These three methods close that gap with no new dependencies.
Tested
Smoke-tested end-to-end on a 2024/25 Premier League match: all three methods return non-empty data. Pagination on `get_referee_matches` correctly stops at empty pages.
get_match_referee(match_id: str | int) → dict | None
Get the referee dict for a single match.
Parameters:
Returns:
Return type:
get_referee(referee_id: str | int) → dict
Get a referee dict from a referee ID.
Parameters:
Raises:
Return type:
get_referee_matches(referee_id: str | int, max_pages: int = 10) → list[dict]
Get recent match dicts for a referee.
Parameters:
Raises:
Returns:
Return type: