5.4 KiB
Subagent D — Skipped constraints, violator samples
Two candidate constraints were SKIPPED at apply-time because pre-existing rows would have been rejected. They are documented here so Damir can decide whether to clean the data and re-attempt the constraint, or accept the current state.
The trigger clanovi_normalize_trigger already enforces both rules for new
inserts and for name-changing updates, so future data ingest cannot
reintroduce these patterns. Only retroactive enforcement on existing rows is
deferred.
C3 — CHECK (length(ime)>=2 AND length(prezime)>=2) — SKIPPED
Violator count: 22 rows.
Two clusters:
-
Single-letter
prezime—id=1160andid=1165, both klub_id=848:('Boris Mičetić','B')— note the embedded space inime; the surname appears truncated to a single initial.('Boris Mičetić','J')— same pattern.- Decision suggestion: probably real-name parse errors. Resolve manually in
clanovi.
-
Placeholder
ime='-'(single dash) — 20 rows, klub_id mostly NULL plus one with klub_id=3896:
| id | klub_id | ime | prezime |
|---|---|---|---|
| 1852 | NULL | - | Grabovac |
| 1853 | NULL | - | Pilepić |
| 1854 | NULL | - | Maslak |
| 1855 | NULL | - | Jugo |
| 1856 | NULL | - | Miličević |
| 1857 | NULL | - | Marjanović |
| 1858 | NULL | - | Poljak |
| 1859 | NULL | - | Kurelić |
| 2021 | 3896 | - | Mohorić |
| 2125 | NULL | - | Mittrovich (braća) |
| 2130 | NULL | - | Loich |
| 2131 | NULL | - | Paulinich |
| 2132 | NULL | - | Zidarich |
| 2133 | NULL | - | Bertok |
| 2134 | NULL | - | Marincich |
| 2135 | NULL | - | Tiblias |
| 2138 | NULL | - | Veselica |
| 2139 | NULL | - | Naumović |
| 2140 | NULL | - | Osojnak |
| 2141 | NULL | - | Medle |
These look like historical / surname-only roster entries (note napomena
on id=1852 mentions "POVIJESNI: KK Kvarner najtrofejnija generacija …" so the
cluster is intentional historical data with unknown given name).
Decision suggestion: replace ime='-' with ime='?' is also rejected;
either backfill the given names from a source, mark them inactive/historical
in another column, or accept the data and never enable C3.
C6 — UNIQUE (klub_id, lower(ime), lower(prezime), COALESCE(datum_rodenja,'0001-01-01')) — SKIPPED
Conflict groups: 68 (each group has 2+ rows that would collide).
Most are concentrated on klub_id=2362 (HNK Rijeka roster) where the same
player appears twice — once with datum_rodenja IS NULL and once also with
NULL DOB but a different scrape source / older id. Sample:
| klub_id | l_ime | l_prez | dob | dups | ids |
|---|---|---|---|---|---|
| 2362 | amer | gojak | NULL | 2 | {3402, 4214} |
| 2362 | leon | šerifi | NULL | 2 | {3334, 4238} |
| 2362 | ante | oreč | NULL | 2 | {1581, 4230} |
| 2362 | branko | pavić | NULL | 2 | {2715, 4231} |
| 2362 | lovro | kitin | NULL | 2 | {3481, 4220} |
| 2362 | ante | majstorović | NULL | 2 | {3456, 4224} |
| 2362 | dejan | petrovič | NULL | 2 | {3399, 4232} |
| 2362 | duje | čop | NULL | 2 | {1579, 4211} |
| 2362 | fran | škalamera | NULL | 2 | {3480, 4239} |
| 2362 | gabriel | rukavina | NULL | 2 | {3404, 4234} |
| 2362 | bruno | bogojević | NULL | 2 | {3437, 4208} |
| 2362 | aleksa | todorović | NULL | 2 | {3455, 4202} |
| 2362 | cherno | saho | NULL | 2 | {3403, 4235} |
| 2362 | luka | menalo | NULL | 2 | {3454, 4226} |
| 2362 | martin | zlomislić | NULL | 2 | {3440, 4203} |
| 2362 | mladen | devetak | NULL | 2 | {3400, 4212} |
| 2362 | niko | janković | NULL | 2 | {3607, 4218} |
| 2362 | noel | bodetić | NULL | 2 | {3705, 4207} |
| 2362 | silvio | ilinković | NULL | 2 | {3412, 4217} |
| 2362 | šimun | butić | NULL | 2 | {3401, 4209} |
| 2362 | stjepan | radeljić | NULL | 2 | {3448, 4233} |
| 2362 | toni | fruk | 2001-03-09 | 2 | {3438, 4135} |
| 2362 | vito | kovač | NULL | 2 | {3298, 4201} |
| 2362 | jovan | manev | NULL | 2 | {3439, 4225} |
| 2585 | ivo | butrica | NULL | 2 | {2282, 4163} |
| 2585 | luko | ledinić | NULL | 2 | {2283, 4164} |
| 2586 | siniša | saftić | NULL | 2 | {2298, 4165} |
| 2587 | damir | poslek | NULL | 2 | {2310, 4167} |
| 2589 | matej | viduka | NULL | 2 | {2340, 4174} |
| 2589 | čedo | vukelić | NULL | 2 | {2339, 4175} |
(38 more groups not shown — query reproduction below.)
Cause: the dedup-fold key collapses on COALESCE(NULL, '0001-01-01'), so
two records of the same name+klub with missing DOB look identical even when
they are distinct profiles (different profile_url, source_id, hns_igrac_id).
Today's working composite key is the existing uq_clanovi_klub_profile (klub_id, profile_url) which is already enforced.
Decision suggestion: do NOT enable C6 as-is. Either (a) restrict the
uniqueness to WHERE datum_rodenja IS NOT NULL, or (b) merge true dupes via a
follow-up subagent that promotes one row and back-fills hns_igrac_id /
profile_url. Until then, ingestion is still protected by
uq_clanovi_klub_profile and (for HNS-keyed players) clanovi_hns_uniq.
Reproduce full list
SELECT klub_id, lower(ime) AS l_ime, lower(prezime) AS l_prez,
COALESCE(datum_rodenja, '0001-01-01'::date) AS dob,
count(*) AS dups,
array_agg(id ORDER BY id) AS ids
FROM pgz_sport.clanovi
GROUP BY klub_id, lower(ime), lower(prezime), COALESCE(datum_rodenja, '0001-01-01'::date)
HAVING count(*) > 1
ORDER BY dups DESC, klub_id;