Commit 3b0df8a

Merge branch 'develop' into feat/174-searchLogAndUiImprove
2 parents c01c0f0 + c377757 commit 3b0df8a

14 files changed

Lines changed: 268 additions & 11 deletions

.github/workflows/crawl_recent_tj.yml

Lines changed: 1 addition & 1 deletion
@@ -33,6 +33,6 @@ jobs:
           echo "SUPABASE_URL=${{ secrets.SUPABASE_URL }}" >> .env
           echo "SUPABASE_KEY=${{ secrets.SUPABASE_KEY }}" >> .env

-      - name: run crawl script
+      - name: run crawl script - crawlRecentTJ.ts
         working-directory: packages/crawling
         run: pnpm run recent-tj

.github/workflows/tagging_song.yml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+name: Tagging Songs
+
+on:
+  schedule:
+    - cron: "0 14 * * *" # runs at 23:00 KST (UTC+9, i.e. 14:00 UTC)
+  workflow_dispatch:
+
+permissions:
+  contents: write # required for push permission
+
+jobs:
+  run-npm-task:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout branch
+        uses: actions/checkout@v4
+
+      - name: Use Node.js 20
+        uses: actions/setup-node@v4
+        with:
+          node-version: "20"
+
+      - name: Install pnpm
+        uses: pnpm/action-setup@v2
+        with:
+          version: 9
+          run_install: false
+
+      - name: Install dependencies
+        working-directory: packages/crawling
+        run: pnpm install
+
+      - name: Create .env file
+        working-directory: packages/crawling
+        run: |
+          echo "SUPABASE_URL=${{ secrets.SUPABASE_URL }}" >> .env
+          echo "SUPABASE_KEY=${{ secrets.SUPABASE_KEY }}" >> .env
+          echo "OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> .env
+
+      - name: run tagging script - taggingSongs.ts
+        working-directory: packages/crawling
+        run: pnpm run tag-songs
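
The schedule comment in `tagging_song.yml` relies on a UTC-to-KST conversion: GitHub Actions evaluates `schedule` cron expressions in UTC, and Korea Standard Time is UTC+9 with no daylight saving. A minimal sketch of that arithmetic follows; the helper names `cronHour` and `utcHourToKst` are hypothetical and not part of the repository.

```typescript
// Hypothetical helpers (not from the repository) that check the
// "14:00 UTC is 23:00 KST" claim in the workflow comment.
const KST_OFFSET_HOURS = 9; // Korea Standard Time is UTC+9, no DST

// Convert an hour of day in UTC (0-23) to the hour of day in KST.
function utcHourToKst(utcHour: number): number {
  if (!Number.isInteger(utcHour) || utcHour < 0 || utcHour > 23) {
    throw new RangeError(`hour out of range: ${utcHour}`);
  }
  return (utcHour + KST_OFFSET_HOURS) % 24;
}

// Read the hour field of a five-field cron expression such as "0 14 * * *".
// Only a plain numeric hour is supported in this sketch.
function cronHour(expr: string): number {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`expected 5 cron fields, got ${fields.length}`);
  }
  return Number(fields[1]);
}

console.log(utcHourToKst(cronHour("0 14 * * *"))); // 23
```

Hours that cross midnight wrap around, so a job scheduled at 20:00 UTC would run at 05:00 KST on the following day.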

.github/workflows/update_ky_youtube.yml

Lines changed: 1 addition & 1 deletion
@@ -38,6 +38,6 @@ jobs:
           echo "SUPABASE_KEY=${{ secrets.SUPABASE_KEY }}" >> .env
           echo "OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> .env

-      - name: run update script - packages/crawling/crawlYoutube.ts
+      - name: run update script - crawlYoutube.ts
         working-directory: packages/crawling
         run: pnpm run ky-youtube

.github/workflows/verify_ky_youtube.yml

Lines changed: 1 addition & 1 deletion
@@ -38,6 +38,6 @@ jobs:
           echo "SUPABASE_KEY=${{ secrets.SUPABASE_KEY }}" >> .env
           echo "OPENAI_API_KEY=${{ secrets.OPENAI_API_KEY }}" >> .env

-      - name: run verify script - packages/crawling
+      - name: run verify script - crawlYoutubeVerify.ts
         working-directory: packages/crawling
         run: pnpm run ky-verify

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ packages/
   eslint-config/ - Shared ESLint config (@repo/eslint-config)
   format-config/ - Shared Prettier config (@repo/format-config)
   typescript-config/ - Shared tsconfig bases
-  crawling/ - One-off data crawling scripts (not a published package)
+  crawling/ - Data crawling & tagging scripts (see packages/crawling/CLAUDE.md)
 ```

 ## Web App Architecture

packages/crawling/CLAUDE.md

Lines changed: 27 additions & 0 deletions
@@ -13,6 +13,8 @@ pnpm ky-open # Collect KY numbers via the Open API (Kumyoung)
 pnpm ky-youtube # Collect KY numbers by crawling YouTube, with AI validation
 pnpm ky-verify # Re-verify that existing KY numbers actually exist (checkpoint support)
 pnpm ky-update # Run ky-youtube and ky-verify in parallel
+pnpm recent-tj # Crawl the latest TJ songs
+pnpm tag-songs # AI-based automatic song tagging
 pnpm test # Run vitest
 pnpm lint # ESLint
 ```
@@ -94,8 +96,33 @@ findKYByOpen.ts
 | ------------------ | --------------------------------------- |
 | `songs`            | Main song data (includes TJ/KY numbers) |
 | `invalid_ky_songs` | Songs whose KY number lookup failed     |
+| `tags`             | Tag master (id, name, category)         |
+| `song_tags`        | Song-to-tag mapping (song_id, tag_id)   |
+| `verify_ky_songs`  | Songs whose KY number has been verified |

 ### AI utilities

 - `utils/validateSongMatch.ts`: uses `gpt-4o-mini` to decide whether two (title, artist) pairs refer to the same song. `temperature: 0`, `max_tokens: 20`; the API call is skipped on an exact match.
 - `utils/transChatGPT.ts`: uses `gpt-4-turbo` to translate Japanese to Korean.
+- `utils/getSongTag.ts`: uses `gpt-4o-mini` to assign suitable tag IDs to a song automatically. The tag list from the DB `tags` table is cached and included in the prompt.
+
+### Song tagging pipeline
+
+```
+taggingSongs.ts
+  └─ getSongsAllDB() # fetch all songs
+  └─ getSongTagSongIdsDB() # load the Set of already-tagged song IDs (these are skipped)
+  └─ autoTagSong(title, artist) # extract tag IDs with AI (1 to 4 tags)
+  └─ postSongTagsDB(songId, tagIds) # insert into the song_tags table
+```
+
+### GitHub Actions workflows
+
+| Workflow file           | Schedule (UTC) | Script            |
+| ----------------------- | -------------- | ----------------- |
+| `crawl_recent_tj.yml`   | Daily 14:00    | `pnpm recent-tj`  |
+| `tagging_song.yml`      | Daily 14:00    | `pnpm tag-songs`  |
+| `update_ky_youtube.yml` | Manual         | `pnpm ky-youtube` |
+| `verify_ky_youtube.yml` | Manual         | `pnpm ky-verify`  |
+
+All workflows can also be triggered manually via `workflow_dispatch`.
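
The tagging pipeline documented in `packages/crawling/CLAUDE.md` (`taggingSongs.ts` calling `getSongsAllDB`, `getSongTagSongIdsDB`, `autoTagSong`, and `postSongTagsDB`) could look roughly like the sketch below. The four function names come from the diff; their signatures, the `Song` shape, the ID types, and the helpers `selectUntagged` and `tagUntaggedSongs` are assumptions made for illustration, and the real `taggingSongs.ts` is not shown in this commit view.

```typescript
// Assumed row shape; the real Song type lives in '@/types' and may differ.
interface Song {
  id: string;
  title: string;
  artist: string;
}

// The four pipeline steps named in CLAUDE.md, injected so the sketch runs
// without Supabase or OpenAI. All signatures here are assumptions.
interface TaggingDeps {
  getSongsAllDB: () => Promise<Song[]>;
  getSongTagSongIdsDB: () => Promise<Set<string>>;
  autoTagSong: (title: string, artist: string) => Promise<number[]>;
  postSongTagsDB: (songId: string, tagIds: number[]) => Promise<void>;
}

const MAX_TAGS_PER_SONG = 4; // CLAUDE.md states 1 to 4 tags per song

// Hypothetical helper: keep only songs that have no song_tags rows yet.
function selectUntagged(songs: Song[], taggedIds: Set<string>): Song[] {
  return songs.filter((song) => !taggedIds.has(song.id));
}

// Hypothetical driver: returns how many songs received tags in this run.
async function tagUntaggedSongs(deps: TaggingDeps): Promise<number> {
  const [songs, taggedIds] = await Promise.all([
    deps.getSongsAllDB(),
    deps.getSongTagSongIdsDB(),
  ]);

  let taggedCount = 0;
  for (const song of selectUntagged(songs, taggedIds)) {
    const raw = await deps.autoTagSong(song.title, song.artist);
    // De-duplicate and cap the model output before writing it.
    const tagIds = Array.from(new Set(raw)).slice(0, MAX_TAGS_PER_SONG);
    if (tagIds.length === 0) continue; // nothing usable came back; retry next run
    await deps.postSongTagsDB(song.id, tagIds);
    taggedCount++;
  }
  return taggedCount;
}
```

Loading the already-tagged IDs into a `Set` once keeps the skip check O(1) per song, so a daily run only spends API calls on songs added since the previous run.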

packages/crawling/package.json

Lines changed: 4 additions & 3 deletions
@@ -8,10 +8,11 @@
   },
   "scripts": {
     "ky-open": "tsx src/findKYByOpen.ts",
-    "ky-youtube": "tsx src/crawling/crawlYoutube.ts",
-    "ky-verify": "tsx src/crawling/crawlYoutubeVerify.ts",
+    "ky-youtube": "tsx src/cron/crawlYoutube.ts",
+    "ky-verify": "tsx src/cron/crawlYoutubeVerify.ts",
     "ky-update": "pnpm run ky-youtube & pnpm run ky-verify",
-    "recent-tj": "tsx src/crawling/crawlRecentTJ.ts",
+    "recent-tj": "tsx src/cron/crawlRecentTJ.ts",
+    "tag-songs": "tsx src/cron/taggingSongs.ts",
     "lint": "eslint .",
     "test": "vitest run",
     "format": "prettier --write \"**/*.{ts,tsx,md}\""

File renamed without changes.

packages/crawling/src/crawling/crawlYoutube.ts renamed to packages/crawling/src/cron/crawlYoutube.ts

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ import { postInvalidKYSongsDB } from '@/supabase/postDB';
 import { updateSongsKyDB } from '@/supabase/updateDB';
 import { Song } from '@/types';

-import { isValidKYExistNumber } from './isValidKYExistNumber';
+import { isValidKYExistNumber } from '../crawling/isValidKYExistNumber';

 // --- Constants ---
 const BASE_YOUTUBE_SEARCH_URL = 'https://www.youtube.com/@KARAOKEKY/search';

packages/crawling/src/crawling/crawlYoutubeVerify.ts renamed to packages/crawling/src/cron/crawlYoutubeVerify.ts

Lines changed: 2 additions & 3 deletions
@@ -4,7 +4,7 @@ import { getSongsKyNotNullDB, getVerifyKySongsDB } from '@/supabase/getDB';
 import { postVerifyKySongsDB } from '@/supabase/postDB';
 import { updateSongsKyDB } from '@/supabase/updateDB';

-import { isValidKYExistNumber } from './isValidKYExistNumber';
+import { isValidKYExistNumber } from '../crawling/isValidKYExistNumber';

 // Verify that already-registered KY karaoke numbers actually match KY's catalog
 // Valid songs are inserted into the verify_ky_songs table
@@ -44,9 +44,8 @@ for (const song of data) {
   }

   index++;
-  console.log('crawlYoutubeVerify : ', index);

-  if (index >= 2000) break;
+  if (index >= 5000) break;
 }

 browser.close();
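
The hunk above raises the per-run cap of `crawlYoutubeVerify.ts` from 2000 to 5000 songs, and `CLAUDE.md` describes `ky-verify` as checkpoint-aware via the `verify_ky_songs` table. The body of the loop is not visible in this diff, so the following is only a sketch of a capped, resumable verification pass under stated assumptions: `KySong`, `VerifyResult`, and `verifyBatch` are hypothetical names, the validity check is injected rather than calling the real `isValidKYExistNumber`, and whether the original counts skipped songs toward the cap is not known.

```typescript
// Hypothetical row shape; field names are assumptions, not the real schema.
interface KySong {
  id: string;
  title: string;
  numKy: string;
}

interface VerifyResult {
  checked: number;
  validIds: string[];
  invalidIds: string[];
}

// Check at most `limit` songs that are not already in the verified set.
// In this sketch, songs skipped via the checkpoint do not count toward the cap.
async function verifyBatch(
  songs: KySong[],
  alreadyVerified: Set<string>,
  isValid: (song: KySong) => Promise<boolean>,
  limit = 5000,
): Promise<VerifyResult> {
  const result: VerifyResult = { checked: 0, validIds: [], invalidIds: [] };

  for (const song of songs) {
    if (result.checked >= limit) break; // per-run cap, like `index >= 5000`
    if (alreadyVerified.has(song.id)) continue; // checkpoint: done in an earlier run

    const ok = await isValid(song);
    (ok ? result.validIds : result.invalidIds).push(song.id);
    result.checked++;
  }
  return result;
}
```

A caller would persist `validIds` to `verify_ky_songs` after each run so that the next scheduled or manual run resumes where this one stopped.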
