pgz-sport/scripts/sport_harvesters/hrs_handball.py
damir 9fb512932a HNS+UI: 4 new endpoints + multi-sport schema (M2M categories + player_stats)
Endpoints:
- GET /api/v2/enrich-sources: sport→source mapping
- GET /api/v2/klubovi/priority-sort: funded and yearbook clubs first
- GET /api/v2/clan/{id}/kategorije: many-to-many categories
- GET /api/v2/clan/{id}/full: complete picture (profile + categories + seasons + matches + stats)
- POST /api/v2/export/klubovi: XLSX export of the selected clubs
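
As a rough illustration of how a client might address the two per-member GET endpoints listed above, here is a minimal sketch. The base URL and the helper name `clan_url` are assumptions made for this example; the commit does not say where the API is served or what the responses look like.

```python
from urllib.parse import urljoin

# Assumption: placeholder host, not taken from the commit.
BASE_URL = "http://localhost:8000"


def clan_url(clan_id: int, view: str = "full") -> str:
    """Build the URL for /api/v2/clan/{id}/full or /api/v2/clan/{id}/kategorije."""
    if view not in ("full", "kategorije"):
        raise ValueError(f"unknown view: {view!r}")
    return urljoin(BASE_URL, f"/api/v2/clan/{clan_id}/{view}")


print(clan_url(7))
print(clan_url(7, "kategorije"))
```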

Schema:
- pgz_sport.clan_kategorije (M2M: a player can be in both a junior and a senior category)
- pgz_sport.player_stats (multi-sport: football/basketball/handball/volleyball/water polo)
- pgz_sport.klub_roster (multi-source)
- pgz_sport.enrichment_sources (sport→source)
- View: v_pgz_priority_klubovi (financiran || u_godisnjaku)
- View: v_klubovi_priority_sort (priority sort)
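
The priority rule named in the views above (`financiran || u_godisnjaku`) could be mirrored in Python roughly as follows. This is a sketch, not the view definition: the dictionary shape and the alphabetical tie-break on `naziv` are assumptions, and the sample rows are made up.

```python
def is_priority(klub: dict) -> bool:
    """A club is a priority club if it is funded or listed in the yearbook."""
    return bool(klub.get("financiran") or klub.get("u_godisnjaku"))


def priority_sort(klubovi: list) -> list:
    """Priority clubs first; within each group, order by name (assumed tie-break)."""
    # False sorts before True, so negate the flag to put priority clubs on top.
    return sorted(klubovi, key=lambda k: (not is_priority(k), k["naziv"]))


# Made-up sample rows for illustration only.
sample = [
    {"naziv": "B", "financiran": False, "u_godisnjaku": False},
    {"naziv": "C", "financiran": True, "u_godisnjaku": False},
    {"naziv": "A", "financiran": False, "u_godisnjaku": True},
]
print([k["naziv"] for k in priority_sort(sample)])
```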

Sport harvesters scaffold:
- scripts/sport_harvesters/__base.py (SportHarvester class)
- hks_basketball.py, hrs_handball.py, hos_volleyball.py, hvs_waterpolo.py
2026-05-05 10:42:49 +02:00

28 lines
994 B
Python
Executable File

#!/usr/bin/env python3
"""HRS handball harvester."""
import sys
from urllib.parse import quote_plus

# Make the shared base module importable when this file is run as a script.
sys.path.insert(0, '/opt/pgz-sport/scripts/sport_harvesters')
from __base import SportHarvester


class HRSHarvester(SportHarvester):
    SPORT = 'rukomet'
    SOURCE = 'hrs'

    def scrape_klub(self, page, klub):
        # quote_plus also escapes '&', '#' and non-ASCII letters in club names,
        # which a plain replace(' ', '+') would leave unescaped.
        url = f"https://hrs.hr/?s={quote_plus(klub['naziv'])}"
        self.log(f"  🤾 Klub {klub['id']} {klub['naziv']}")
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=20000)
            # Look for a competition ("natjecanje") or club ("klub") link
            # among the first five hrs.hr links on the search results page.
            links = page.locator('a[href*="hrs.hr"]').all()
            for a in links[:5]:
                href = a.get_attribute('href') or ''
                if 'natjecanje' in href or 'klub' in href:
                    self.log(f"    Found: {href}")
                    break
        except Exception as e:
            self.log(f"    Error scraping klub {klub['id']}: {e}")


if __name__ == '__main__':
    HRSHarvester().run(limit=int(sys.argv[1]) if len(sys.argv) > 1 else 50)
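
The link-matching step inside `scrape_klub` can be tried without a browser by pulling it into a pure function. The sketch below is one possible refactoring, not part of the committed file: the helper name `pick_link`, its parameters, and the sample URLs are hypothetical.

```python
def pick_link(hrefs, keywords=("natjecanje", "klub"), max_links=5):
    """Return the first href among the first `max_links` that contains a keyword."""
    for href in hrefs[:max_links]:
        # get_attribute() may return None, so skip empty values.
        if href and any(keyword in href for keyword in keywords):
            return href
    return None


# Made-up hrefs for illustration only.
hrefs = ["https://hrs.hr/", None, "https://hrs.hr/klub/example"]
print(pick_link(hrefs))
```

Inside the harvester, the list could be built with `[a.get_attribute('href') for a in links]` and passed to this helper, which keeps the Playwright calls and the matching rule separate.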