Commit Graph

2 Commits

Author SHA1 Message Date
CC6 Worker ece556de11 M12.4: real HNS Semafor scraper for sportas + 24/7 enrichment worker
Critical bug fix: /v2/enrich/sportas/{id} returned proposed:{} for athletes
because the v3 pipeline still relied on Wikipedia-only evidence and never
fetched semafor.hns.family/igraci/.

- enrich_router._propose_for_sportas now:
  • Resolves a HNS Semafor URL from profile_url, source_url, hns_igrac_id,
    vanjski_id JSONB ('hns_comet'+'hns_slug'), or source='hns_semafor'+source_id.
  • Fetches and parses the player page (BeautifulSoup, with a regex fallback) and proposes
    profile_url, source_url, slika_url, hns_igrac_id, datum_rodenja,
    mjesto_rodenja, broj_dresa, biografija (DeepSeek synthesis from HNS+Wiki).
- _load_row(sportas) widened to read every relevant column + vanjski_id.
- _TABLE_MAP['sportas'] writeback whitelist expanded to 12 fields.
- workers/enrichment_worker.py: 24/7 daemon that selects under-enriched
  clanovi/klubovi/savezi rows via SQL every 5 min and calls /apply for each.
- systemd unit pgz-sport-enricher.service installed + enabled.
- Tested end-to-end: id=2222 (Abdija) and id=449 (Zec) now have
  profile_url, slika_url, hns_igrac_id, biografija persisted.
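
The URL resolution order listed for enrich_router._propose_for_sportas can be sketched roughly as below. This is an illustration only, not the committed code: the function name resolve_hns_url, the https scheme, and the way an id or comet/slug pair is appended to semafor.hns.family/igraci/ are all assumptions, since the commit message gives only the candidate fields and their order.

```python
from typing import Any, Dict, Optional

# Host and path prefix come from the commit message; the https scheme is assumed.
HNS_BASE = "https://semafor.hns.family/igraci/"


def _is_hns_url(value: Any) -> bool:
    """True if value looks like an already stored HNS Semafor player URL."""
    return isinstance(value, str) and "semafor.hns.family/igraci/" in value


def resolve_hns_url(row: Dict[str, Any]) -> Optional[str]:
    """Hypothetical resolver: try each candidate field in the listed order."""
    # 1-2. A stored HNS URL in profile_url or source_url wins.
    for key in ("profile_url", "source_url"):
        if _is_hns_url(row.get(key)):
            return row[key]
    # 3. Explicit player id column (URL layout is an assumption).
    if row.get("hns_igrac_id"):
        return f"{HNS_BASE}{row['hns_igrac_id']}/"
    # 4. vanjski_id JSONB carrying both 'hns_comet' and 'hns_slug'.
    ext = row.get("vanjski_id") or {}
    if isinstance(ext, dict) and ext.get("hns_comet") and ext.get("hns_slug"):
        return f"{HNS_BASE}{ext['hns_comet']}/{ext['hns_slug']}/"
    # 5. Rows imported with source='hns_semafor' and a source_id.
    if row.get("source") == "hns_semafor" and row.get("source_id"):
        return f"{HNS_BASE}{row['source_id']}/"
    # Nothing usable: the caller would fall back to other evidence.
    return None
```

Returning None when no field matches keeps the caller free to fall back to Wikipedia-only evidence instead of fetching a guessed URL.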

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 00:36:57 +02:00
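
The polling behaviour of the 24/7 enrichment worker described in the commit above might look roughly like this. It is a minimal sketch under stated assumptions: the real SQL query and the HTTP call to /apply are not shown in the commit message, so they are represented here by two injected, hypothetical callables (pick and apply), and the names run_once and run_forever are invented for illustration.

```python
import time
from typing import Callable, Iterable, Tuple

POLL_SECONDS = 5 * 60  # "every 5 min", per the commit message
TABLES = ("clanovi", "klubovi", "savezi")  # tables named in the commit message


def run_once(
    pick: Callable[[str], Iterable[int]],
    apply: Callable[[str, int], bool],
) -> Tuple[int, int]:
    """Run one enrichment pass and return (succeeded, failed) counts.

    pick(table) stands in for the SQL that selects under-enriched row ids;
    apply(table, row_id) stands in for the call to the /apply endpoint.
    Both are hypothetical placeholders.
    """
    ok = failed = 0
    for table in TABLES:
        for row_id in pick(table):
            try:
                if apply(table, row_id):
                    ok += 1
                else:
                    failed += 1
            except Exception:
                # One bad row must not stop a long-running daemon.
                failed += 1
    return ok, failed


def run_forever(pick, apply, sleep: Callable[[float], None] = time.sleep) -> None:
    """Loop indefinitely, pausing POLL_SECONDS between passes."""
    while True:
        run_once(pick, apply)
        sleep(POLL_SECONDS)
```

Catching exceptions per row means a single failing record is counted and skipped, so the pass continues and the systemd unit is not restarted for an isolated error.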
Damir Radulić a7ec0a86be PGŽ Sport Platform — Round 1+2 baseline (sport2.html + API) 2026-05-04 23:39:08 +02:00