Update profiles parser and related staged changes #156
AJayaV06 wants to merge 1 commit into UTDNebula:develop from
Conversation
Pull request overview
This PR modernizes the UTD Profiles pipeline by switching the profiles scraper to use the official Profiles API (instead of ChromeDP HTML scraping) and adding a new parser path that consumes the raw API output while preserving a fallback to the legacy profiles.json format.
Changes:
- Replaced ChromeDP-based profile scraping with `net/http` calls to the Profiles API, writing `profiles_raw.json` (plus index output) to disk.
- Added optional `PROFILE_SCHOOLS` environment variable to scope profile detail requests by school code.
- Added a new `parser.LoadProfiles` implementation for `profiles_raw.json`, updated loader fallback behavior, and introduced unit tests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `scrapers/profiles.go` | Reimplemented scraping via Profiles API; adds batching, school-scoped fetching, and JSON file outputs. |
| `scrapers/profiles_test.go` | Adds unit coverage for delimiter parsing, request URL building, and dedupe logic. |
| `parser/profiles.go` | New parser/loader for `profiles_raw.json` into `schema.Professor` records with heuristics for titles/URLs/location. |
| `parser/profiles_test.go` | Adds unit tests for "best" selection heuristics and fallbacks. |
| `parser/profileLoader.go` | Attempts `profiles_raw.json` load first, falls back to legacy `profiles.json`. |
| `README.md` | Documents `PROFILE_SCHOOLS` usage and behavior. |
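The dedupe logic exercised by the new scraper tests is not shown in this review; a plausible order-preserving, first-occurrence-wins implementation would look like the sketch below (an assumption, not the PR's actual helper):

```go
package main

import "fmt"

// dedupeStrings: a sketch of an order-preserving dedupe where the first
// occurrence of each string wins. The PR's real implementation may differ.
func dedupeStrings(items []string) []string {
	seen := make(map[string]struct{}, len(items))
	out := make([]string, 0, len(items))
	for _, s := range items {
		if _, ok := seen[s]; ok {
			continue
		}
		seen[s] = struct{}{}
		out = append(out, s)
	}
	return out
}

func main() {
	fmt.Println(dedupeStrings([]string{"a", "b", "a", "c", "b"})) // [a b c]
}
```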
```go
	apiPrimaryLocationRegex  = regexp.MustCompile(`^(\w+)\s+(\d+\.\d{3}[A-z]?)$`)
	apiFallbackLocationRegex = regexp.MustCompile(`^([A-z]+)(\d+)\.?([\d]{3}[A-z]?)$`)
)
```
The location regexes use [A-z]/[A-z]+, which also matches non-letter ASCII characters (e.g., [, \, ], ^, _, `). This can lead to incorrect matches for building codes / room suffixes. Use [A-Za-z] (or a more specific allowed set) instead.
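To illustrate the pitfall: in Go's regexp syntax the range `[A-z]` covers ASCII 65–122, which includes the six punctuation characters between `Z` and `a`. A minimal demonstration (names here are illustrative, not from the PR):

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	looseRe  = regexp.MustCompile("^[A-z]+$")    // buggy: range also covers [ \ ] ^ _ `
	strictRe = regexp.MustCompile("^[A-Za-z]+$") // letters only
)

func main() {
	// '_' (0x5F) sits between 'Z' (0x5A) and 'a' (0x61), so [A-z] accepts it.
	fmt.Println(looseRe.MatchString("EC_S"))  // true
	fmt.Println(strictRe.MatchString("EC_S")) // false
}
```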
```go
type profileDetailsResponse struct {
	Information []profileInformation `json:"information"`
	Areas       []profileArea        `json:"areas"`
}
```
profileDetailsResponse is declared but never used in this package. Consider removing it (or using it) to avoid accumulating dead/unused types and reduce maintenance overhead.
```go
func firstInformationData(items []profileInformation) profileInformationData {
	if len(items) == 0 {
		return profileInformationData{}
	}
	return items[0].Data
}
```
firstInformationData is never called. If it’s not needed, removing it will keep the parser surface area smaller and make future refactors less error-prone.
```go
func parseLocation(text string) schema.Location {
	var building string
	var room string

	submatches := primaryLocationRegex.FindStringSubmatch(text)
	if submatches == nil {
		submatches = fallbackLocationRegex.FindStringSubmatch(text)
		if submatches == nil {
			return schema.Location{}
		} else {
			building = submatches[1]
			room = fmt.Sprintf("%s.%s", submatches[2], submatches[3])
		}
	} else {
		building = submatches[1]
		room = submatches[2]
	}

	return schema.Location{
		Building: building,
		Room:     room,
		Map_uri:  fmt.Sprintf("https://locator.utdallas.edu/%s_%s", building, room),
	}
}
```

```go
	if len(indexResponse.Profile) == 0 {
		log.Print("Profile API returned no profiles.")
		return
	}

	schools := parseProfileSchoolsFromEnv()

	slugs := make([]string, 0, len(indexResponse.Profile))
	for _, row := range indexResponse.Profile {
		slug := strings.TrimSpace(row.Slug)
		if slug == "" {
			continue
		}
		slugs = append(slugs, slug)
	}
	slugs = dedupeStrings(slugs)

	if len(slugs) == 0 {
		log.Print("Profile API index contained no valid slugs.")
```
When PROFILE_SCHOOLS is set, the scraper still builds slugs from the index and returns early if the index is empty / contains no valid slugs. That prevents the school-scoped path from running even though it doesn’t require slugs. Consider branching on schools earlier so the school-based scrape can proceed even if the index has no/invalid slugs (and only enforce the slug checks in the slug-batch path).
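The suggested reorder can be sketched as a small decision function. Everything here is hypothetical scaffolding (the type `profileRow` and function names are stand-ins for the PR's real code); only the ordering of the checks is the point:

```go
package main

import (
	"fmt"
	"strings"
)

// profileRow is a hypothetical stand-in for one entry of the Profiles API index.
type profileRow struct{ Slug string }

func collectSlugs(rows []profileRow) []string {
	slugs := make([]string, 0, len(rows))
	for _, row := range rows {
		if slug := strings.TrimSpace(row.Slug); slug != "" {
			slugs = append(slugs, slug)
		}
	}
	return slugs
}

// choosePath encodes the suggested ordering: the school-scoped path is taken
// before any slug checks, so an empty or slug-less index no longer blocks it.
func choosePath(schools []string, rows []profileRow) string {
	if len(schools) > 0 {
		return "schools" // this path doesn't need index slugs
	}
	if len(rows) == 0 {
		return "abort: no profiles"
	}
	if len(collectSlugs(rows)) == 0 {
		return "abort: no valid slugs"
	}
	return "slugs"
}

func main() {
	// PROFILE_SCHOOLS set, index empty: the school path still proceeds.
	fmt.Println(choosePath([]string{"ecs"}, nil)) // schools
	fmt.Println(choosePath(nil, nil))             // abort: no profiles
}
```

With this shape, the slug checks are enforced only on the slug-batch path, as the comment recommends.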
No description provided.