CC6 Worker faf6beb536 M12.6 SF: sport-aware enrichment + federation map (HBS, HKS, HRS, HOS, HVS, HPS, HBS bocanje…)
- data/sport_federations.json: 24 Croatian sport federations + aliases +
  PGŽ local media (Novi list, Glas Istre, Rijeka.danas).
- enrich_router._sport_fed/_normalize_sport/_load_sport_feds: cached
  loader that picks up file changes via mtime.
- _research_links() now sport-aware: when row.sport maps to a known fed,
  the dynamic links list shows that fed (national + PGŽ regional) plus the
  three PGŽ local-media search URLs in place of the static HNS Semafor +
  transfermarkt fallback.
- scrape_sport_federation(sport, ime, prezime): generic profile-page
  scraper (slug pattern OR search-results crawl) → returns
  {profile_url, slika_url, datum_rodenja, mjesto_rodenja, klub_naziv}.
- _propose_for_sportas() now routes through the federation scraper before
  HNS Semafor; HNS path is gated to nogomet or rows already linked.
- _load_row(sportas) JOINs klubovi to fall back to klub.sport when
  c.sport is empty.
- Tested on 1024 Marijan Alkić (boćanje): proposed profile_url +
  datum_rodenja from hrvatski-bocarski-savez.hr; /apply persisted them.
- Tested on 3335 Toni Jelenković (košarka) and 3379 Niko Miknić
  (plivanje): research_links surface HKS/KS PGŽ and HPS respectively.

Worker:
- _pick_sportas now selects on coverage<70 across ALL sports (sport
  set OR known external linkage), not just hns_*.
- _SOURCE_WEIGHTS extended with 16 federation hosts at 0.88-0.92.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 01:30:16 +02:00
S
Description
PGZ Sport Intelligence Platform
3 GiB
Languages
HTML 89.3%
Python 9.5%
Stata 0.8%
Ruby 0.2%