# Subagent D — Skipped constraints, violator samples Two candidate constraints were SKIPPED at apply-time because pre-existing rows would have been rejected. They are documented here so Damir can decide whether to clean the data and re-attempt the constraint, or accept the current state. The trigger `clanovi_normalize_trigger` already enforces both rules **for new inserts and for name-changing updates**, so future data ingest cannot reintroduce these patterns. Only retroactive enforcement on existing rows is deferred. --- ## C3 — `CHECK (length(ime)>=2 AND length(prezime)>=2)` — SKIPPED Violator count: **22** rows. Two clusters: 1. **Single-letter `prezime`** — `id=1160` and `id=1165`, both klub_id=848: - `('Boris Mičetić','B')` — note the embedded space in `ime`; the surname appears truncated to a single initial. - `('Boris Mičetić','J')` — same pattern. - **Decision suggestion**: probably real-name parse errors. Resolve manually in `clanovi`. 2. **Placeholder `ime='-'` (single dash)** — 20 rows, klub_id mostly NULL plus one with klub_id=3896: | id | klub_id | ime | prezime | |----|---------|-----|---------| | 1852 | NULL | - | Grabovac | | 1853 | NULL | - | Pilepić | | 1854 | NULL | - | Maslak | | 1855 | NULL | - | Jugo | | 1856 | NULL | - | Miličević | | 1857 | NULL | - | Marjanović | | 1858 | NULL | - | Poljak | | 1859 | NULL | - | Kurelić | | 2021 | 3896 | - | Mohorić | | 2125 | NULL | - | Mittrovich (braća) | | 2130 | NULL | - | Loich | | 2131 | NULL | - | Paulinich | | 2132 | NULL | - | Zidarich | | 2133 | NULL | - | Bertok | | 2134 | NULL | - | Marincich | | 2135 | NULL | - | Tiblias | | 2138 | NULL | - | Veselica | | 2139 | NULL | - | Naumović | | 2140 | NULL | - | Osojnak | | 2141 | NULL | - | Medle | These look like **historical / surname-only roster entries** (note `napomena` on id=1852 mentions "POVIJESNI: KK Kvarner najtrofejnija generacija …" so the cluster is intentional historical data with unknown given name). **Decision suggestion**: replace `ime='-'` with `ime='?'` is also rejected; either backfill the given names from a source, mark them inactive/historical in another column, or accept the data and never enable C3. --- ## C6 — `UNIQUE (klub_id, lower(ime), lower(prezime), COALESCE(datum_rodenja,'0001-01-01'))` — SKIPPED Conflict groups: **68** (each group has 2+ rows that would collide). Most are concentrated on **klub_id=2362 (HNK Rijeka roster)** where the same player appears twice — once with `datum_rodenja IS NULL` and once also with NULL DOB but a different scrape source / older `id`. Sample: | klub_id | l_ime | l_prez | dob | dups | ids | |---------|-------|--------|-----|------|-----| | 2362 | amer | gojak | NULL | 2 | {3402, 4214} | | 2362 | leon | šerifi | NULL | 2 | {3334, 4238} | | 2362 | ante | oreč | NULL | 2 | {1581, 4230} | | 2362 | branko | pavić | NULL | 2 | {2715, 4231} | | 2362 | lovro | kitin | NULL | 2 | {3481, 4220} | | 2362 | ante | majstorović | NULL | 2 | {3456, 4224} | | 2362 | dejan | petrovič | NULL | 2 | {3399, 4232} | | 2362 | duje | čop | NULL | 2 | {1579, 4211} | | 2362 | fran | škalamera | NULL | 2 | {3480, 4239} | | 2362 | gabriel | rukavina | NULL | 2 | {3404, 4234} | | 2362 | bruno | bogojević | NULL | 2 | {3437, 4208} | | 2362 | aleksa | todorović | NULL | 2 | {3455, 4202} | | 2362 | cherno | saho | NULL | 2 | {3403, 4235} | | 2362 | luka | menalo | NULL | 2 | {3454, 4226} | | 2362 | martin | zlomislić | NULL | 2 | {3440, 4203} | | 2362 | mladen | devetak | NULL | 2 | {3400, 4212} | | 2362 | niko | janković | NULL | 2 | {3607, 4218} | | 2362 | noel | bodetić | NULL | 2 | {3705, 4207} | | 2362 | silvio | ilinković | NULL | 2 | {3412, 4217} | | 2362 | šimun | butić | NULL | 2 | {3401, 4209} | | 2362 | stjepan | radeljić | NULL | 2 | {3448, 4233} | | 2362 | toni | fruk | 2001-03-09 | 2 | {3438, 4135} | | 2362 | vito | kovač | NULL | 2 | {3298, 4201} | | 2362 | jovan | manev | NULL | 2 | {3439, 4225} | | 2585 | ivo | butrica | NULL | 2 | {2282, 4163} | | 2585 | luko | ledinić | NULL | 2 | {2283, 4164} | | 2586 | siniša | saftić | NULL | 2 | {2298, 4165} | | 2587 | damir | poslek | NULL | 2 | {2310, 4167} | | 2589 | matej | viduka | NULL | 2 | {2340, 4174} | | 2589 | čedo | vukelić | NULL | 2 | {2339, 4175} | (38 more groups not shown — query reproduction below.) **Cause**: the dedup-fold key collapses on `COALESCE(NULL, '0001-01-01')`, so two records of the same name+klub with missing DOB look identical even when they are distinct profiles (different `profile_url`, `source_id`, `hns_igrac_id`). Today's working composite key is the existing `uq_clanovi_klub_profile (klub_id, profile_url)` which is already enforced. **Decision suggestion**: do NOT enable C6 as-is. Either (a) restrict the uniqueness to `WHERE datum_rodenja IS NOT NULL`, or (b) merge true dupes via a follow-up subagent that promotes one row and back-fills `hns_igrac_id` / `profile_url`. Until then, ingestion is still protected by `uq_clanovi_klub_profile` and (for HNS-keyed players) `clanovi_hns_uniq`. ### Reproduce full list ```sql SELECT klub_id, lower(ime) AS l_ime, lower(prezime) AS l_prez, COALESCE(datum_rodenja, '0001-01-01'::date) AS dob, count(*) AS dups, array_agg(id ORDER BY id) AS ids FROM pgz_sport.clanovi GROUP BY klub_id, lower(ime), lower(prezime), COALESCE(datum_rodenja, '0001-01-01'::date) HAVING count(*) > 1 ORDER BY dups DESC, klub_id; ```