Add Sofascore referee scraping methods#85

Open
AlexMaster23 wants to merge 1 commit into
oseymour:mainfrom
AlexMaster23:main

Conversation

@AlexMaster23
Contributor

Adds three methods to Sofascore that wrap Sofascore's existing referee API endpoints. There is no breaking change — all additions are new methods alongside existing ones.

What's added

  • get_match_referee(match_id) — extracts the referee dict from a match's event data (uses the existing /event/{id} call already issued by get_match_dict, so no extra request).
  • get_referee(referee_id) — fetches a referee profile + career aggregates (games, yellow/red cards) from /referee/{id}.
  • get_referee_matches(referee_id, max_pages=10) — paginates /referee/{id}/events/last/{page} and returns a flat list of recent event dicts.
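
The pagination described for get_referee_matches could look roughly like the sketch below. It is not the PR's code: `fetch_page` is a hypothetical stand-in for the HTTP request to /referee/{id}/events/last/{page}, and both the `events` key in the response and the zero-based page numbering are assumptions made for illustration.

```python
from typing import Callable


def collect_referee_events(
    fetch_page: Callable[[int], dict],
    max_pages: int = 10,
) -> list[dict]:
    """Collect event dicts page by page, stopping early at the first empty page."""
    if not isinstance(max_pages, int):
        raise TypeError("max_pages must be an int")

    events: list[dict] = []
    for page in range(max_pages):
        payload = fetch_page(page)
        # Assumption: each page's payload holds its matches under "events".
        page_events = payload.get("events", [])
        if not page_events:
            break  # an empty page means there is nothing older to fetch
        events.extend(page_events)
    return events


def fake_fetch_page(page: int) -> dict:
    """Hypothetical fetcher returning canned data instead of calling the API."""
    pages = {0: [{"id": 1}, {"id": 2}], 1: [{"id": 3}]}
    return {"events": pages.get(page, [])}


recent = collect_referee_events(fake_fetch_page, max_pages=10)
```

Passing the fetcher in as an argument keeps the loop testable without network access; the real method would issue the request itself.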

Why

Referee identity and history are commonly used features in football modelling (cards, fouls, stoppage time, penalty rate). Sofascore exposes this data through its existing API, but ScraperFC didn't surface it. These three methods close that gap with no new dependencies.

Tested

Smoke-tested end-to-end on a 2024/25 Premier League match: all three methods return non-empty data. Pagination on get_referee_matches correctly stops at empty pages.


get_match_referee(match_id: str | int) → dict | None

Get the referee dict for a single match.

Parameters:

  • match_id (str | int) – Sofascore match URL or match ID

Returns:

  • Referee dict for the match, or None if the match does not have a referee field.

Return type:

  • dict | None

get_referee(referee_id: str | int) → dict

Get a referee dict from a referee ID.

Parameters:

  • referee_id (str | int) – Sofascore referee ID

Raises:

  • TypeError – If referee_id is not a string or int.

Return type:

  • dict

get_referee_matches(referee_id: str | int, max_pages: int = 10) → list[dict]

Get recent match dicts for a referee.

Parameters:

  • referee_id (str | int) – Sofascore referee ID
  • max_pages (int) – Maximum number of pages to request. Defaults to 10.

Raises:

  • TypeError – If referee_id is not a string or int.
  • TypeError – If max_pages is not an int.

Returns:

  • Flat list of event dicts for the referee's recent matches.

Return type:

  • list[dict]

@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues


🟢 Metrics 13 complexity · 0 duplication

Metric Results
Complexity 13
Duplication 0


@deepsource-io
Contributor

deepsource-io Bot commented May 10, 2026

DeepSource Code Review

We reviewed changes in 50f5df9...98d6030 on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade · Security · Reliability · Complexity · Hygiene

Code Review Summary

Analyzer Status Updated (UTC) Details
Python May 10, 2026 11:35a.m. Review ↗
Code coverage May 10, 2026 11:35a.m. Review ↗


return data

# ==============================================================================================
def get_match_referee(self, match_id: str | int) -> dict | None:
Owner

Is this function necessary if it's just accessing a key returned by get_match_dict()?
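
For context on the question above, a small sketch of what the one-line body amounts to: the membership check is equivalent to a plain `dict.get` on the match dict. The sample dicts are made up for illustration and are not real Sofascore payloads.

```python
from __future__ import annotations


def referee_via_membership(match_dict: dict) -> dict | None:
    # Same expression as the return statement under review.
    return match_dict["referee"] if "referee" in match_dict else None


def referee_via_get(match_dict: dict) -> dict | None:
    # dict.get returns None by default when the key is absent.
    return match_dict.get("referee")


# Hypothetical match dicts, one with and one without a referee field.
with_ref = {"id": 1, "referee": {"id": 42, "name": "Example Referee"}}
without_ref = {"id": 2}
```

A caller who already holds the result of get_match_dict() could therefore write `match_dict.get("referee")` directly; the dedicated method mainly saves the lookup of which key to use.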

return match_dict["referee"] if "referee" in match_dict else None

# ==============================================================================================
def get_referee(self, referee_id: str | int) -> dict:
Owner

Do you think there's any value in creating a SofascoreReferee object similar to what I've done for SofascorePlayer? Class attributes would be things like id and name, plus the matches scraped by the next function in your PR. This function would then return an instance of the SofascoreReferee object.
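
One possible shape for such an object, as a minimal sketch only. The class below is hypothetical and is not modelled on the actual SofascorePlayer implementation; it assumes the referee dict carries `id` and `name` keys, and it keeps the full dict so that career aggregates are not lost.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SofascoreReferee:
    """Hypothetical container for a referee profile and scraped matches."""

    id: int
    name: str
    # Full API dict, kept so fields not modelled here stay accessible.
    raw: dict = field(default_factory=dict, repr=False)
    # Filled in later by whatever scrapes the referee's matches.
    matches: list[dict] = field(default_factory=list, repr=False)

    @classmethod
    def from_dict(cls, referee_dict: dict) -> "SofascoreReferee":
        # Assumption: "id" and "name" are present in the referee dict.
        return cls(
            id=int(referee_dict["id"]),
            name=str(referee_dict["name"]),
            raw=referee_dict,
        )


example = SofascoreReferee.from_dict({"id": 42, "name": "Example Referee"})
```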

return response["referee"]

# ==============================================================================================
def get_referee_matches(self, referee_id: str | int, max_pages: int=10) -> list[dict]:
Owner

I've never really used referee data before so a few questions:

  • Is there any reason to default to 10 pages and not just scrape all of the matches?
  • Is there any concern with returning a list of match dicts being "too much"? Either too much data/RAM or too much info? Thoughts on just returning a list of match IDs?

Again, I've never used ref data so I'm not familiar with the use case and what info is valuable or not valuable.
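
Both alternatives raised above (no page cap, and IDs instead of full dicts) can be sketched with a generator. This is an illustration under assumptions, not a proposal for the final API: `fetch_page` is a hypothetical stand-in for the request, and the `events` key, the `id` field on each event, and zero-based pages are assumed.

```python
from __future__ import annotations

from typing import Callable, Iterator


def iter_referee_match_ids(
    fetch_page: Callable[[int], dict],
    max_pages: int | None = None,
) -> Iterator[int]:
    """Yield match IDs lazily; max_pages=None keeps going until an empty page."""
    page = 0
    while max_pages is None or page < max_pages:
        # Assumption: each page's payload holds its matches under "events".
        page_events = fetch_page(page).get("events", [])
        if not page_events:
            return
        for event in page_events:
            yield event["id"]
        page += 1


def fake_fetch_page(page: int) -> dict:
    """Hypothetical fetcher returning canned data instead of calling the API."""
    pages = {0: [{"id": 1}, {"id": 2}], 1: [{"id": 3}]}
    return {"events": pages.get(page, [])}


all_ids = list(iter_referee_match_ids(fake_fetch_page))
first_page_ids = list(iter_referee_match_ids(fake_fetch_page, max_pages=1))
```

Yielding IDs keeps memory use small and lets the caller decide which matches to fetch in full; the cost is one extra request per match when the full dict is needed.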

2 participants