The test suite (test_instruments.py) loads all 11 JSON instrument definitions and runs 16 test groups. Each test is a discrete assertion that either passes or fails with a specific diagnostic message. The tests are designed to catch both structural errors (malformed JSON, missing fields, broken references) and clinical logic errors (incorrect scoring, wrong phenotype classification, failed trigger chains).

LOCEA's single reverse-scored item is locea_25 ("My appetite feels more stable when I eat whole, balanced meals"). The short-form sentinel set uses the exact 8 items specified in the source (1, 3, 5, 9, 14, 19, 23, 30). MI rewordings are composed. Note the response scale uses "Almost always" as the top anchor (per source) rather than "Very often" — this is faithful to the Coaching Guidelines text. One fix added the reverse_scored:true marker on locea_25 (scoring was already correct; the marker was missing for structural consistency). Certain items carry escalation:true — these are eating disorder screening flags that route to clinician review. All item text is transcribed verbatim. A second fix changed scoring_method from "mean" to "trigger_mean" to clearly distinguish trigger_load from the 6 core symptom domains in the scoring engine.

CANLA uses instrument_type: "knowledge" with scoring_method: "knowledge" — fundamentally different from the Likert instruments. Every item has a correct_answer field. It is scored as the sum of correct answers (/60) with three-tier thresholds (Low 0–20, Moderate 21–40, High 41–60). web_form_enabled: true and conversational_enabled: false per the decision that CANLA is web-form only. There is no short form and there are no phenotype rules (this is a knowledge test, not a clinical profiler). All item text and answer options are transcribed verbatim. Correct answers are derived — the source did not provide an explicit answer key, but every item has one unambiguously correct option.

FSMCA marks its reverse-scored items both inline on the item objects (reverse_scored: true) and in the scoring block. The FSMCA scoring direction is inverted compared to the psychology instruments: higher scores = greater competence (strength), lower scores = impairment.
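The reverse-scoring mechanics referenced above reduce to a standard flip on the 0–4 scale. This is a minimal sketch; the function name and data shapes are assumptions, not the engine's actual API:

```python
def apply_reverse_scoring(raw_responses, reverse_scored_items, scale_max=4):
    """Flip reverse-keyed items on a 0..scale_max Likert scale.

    raw_responses: dict mapping item id -> raw response value (0-4).
    reverse_scored_items: ids listed in the scoring block, e.g. ["locea_25"].
    """
    return {
        item_id: (scale_max - value) if item_id in reverse_scored_items else value
        for item_id, value in raw_responses.items()
    }
```

Under this transform, a raw response of 4 ("Almost always") on a reverse-scored item such as locea_25 contributes 0 to its domain mean, while forward-keyed items pass through unchanged.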
The severity thresholds reflect this (3.0–4.0 = "Strong", 0.0–0.9 = "Severe impairment"). The short form uses the 12-item sentinel set specified in the source (items 1, 3, 7, 10, 13, 20, 25, 30, 33, 36, 42, 47) — not 8 items, because the source specified 12 for this instrument. MI rewordings are composed. All item text is transcribed verbatim.

The core sections use the standard likert_0_4 frequency scale, while Section F (Sleep Apnea Risk Screen) uses a differentiated apnea_severity_0_4 scale (No / Unsure-occasional / Mild / Frequent / Severe) as specified in the source. The global score excludes the apnea section (mean of items 1–30 only). Two clinical flags cover apnea escalation: any single item ≥ 3 = high suspicion; domain mean ≥ 2.0 = screen further (STOP-BANG/sleep study). There are no reverse-scored items. The short form is composed (the source specifies short-form use at onboarding/follow-up but not specific items) — 6 sentinel items, one per domain. MI rewordings are composed.

MCAA's single reverse-scored item is mcaa_05 ("I avoid activity even when I could do it") within Section A. The scoring_direction field on each section makes this explicit for the scoring engine. The short form is composed — 6 sentinel items, one per domain. MI rewordings are composed.

The coaching_rule field captures the key clinical constraint: "Never give action-heavy coaching to a low-readiness, high-ambivalence patient." Each phenotype includes a coaching_stance field mapping to MI/CBT/ACT modality selection. The short form is composed — 6 items, one per domain plus a second recovery item. MI rewordings are composed. The timeframe is "Current state" (not "Past 4 weeks") per the source.

The first fix added the reverse_scored:true marker to the locea_25 item object. The scoring block already listed locea_25 in reverse_scored_items and the scoring engine was applying the reverse correctly; the marker was missing only as an inline annotation for structural consistency with how FSMCA and CRSEM mark their reverse-scored items.

The second fix changed scoring_method from "mean" to "trigger_mean" for the Trigger Food Mapping section.
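Under that separation, trigger_load is computed alongside, but never folded into, the core domain means. A minimal sketch, with all function and field names hypothetical (the real logic lives in assessment_engine.py):

```python
def score_domains(responses, domain_items, trigger_items):
    """Compute core domain means and a separate trigger load score.

    responses: dict mapping item id -> numeric response.
    domain_items: dict mapping core domain name -> non-empty list of item ids.
    trigger_items: non-empty list of item ids in the Trigger Food Mapping section.
    """
    domain_means = {
        domain: sum(responses[i] for i in items) / len(items)
        for domain, items in domain_items.items()
    }
    # The Core Total Mean averages only the core symptom domains.
    core_total_mean = sum(domain_means.values()) / len(domain_means)
    # trigger_load is scored on its own scale and feeds the Trigger Load Score,
    # never the Core Total Mean.
    trigger_load = sum(responses[i] for i in trigger_items) / len(trigger_items)
    return domain_means, core_total_mean, trigger_load
```

Keeping trigger_load out of the domain_items map is what lets a domain-alignment check treat the core domains and the trigger section as disjoint sets.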
The trigger_load domain is intentionally scored separately from the 6 core symptom domains (it uses a different scale and feeds into the Trigger Load Score, not the Core Total Mean). The original "mean" value caused the domain alignment test to flag trigger_load as a missing core domain; the fix makes the separation explicit in the schema.

All 11 definitions are ready for integration with the scoring engine (assessment_engine.py) and the web-form rendering system (assessment.php). Upon receipt of the 8 pending source Word documents, the remaining definitions will be built to the same standard and subjected to the same test suite.
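A structural check of the kind the suite performs can be sketched as follows; the required-field set and JSON layout here are illustrative assumptions, not the actual test code:

```python
# Assumed minimal required fields; the real suite checks many more.
REQUIRED_FIELDS = {"instrument_type", "scoring_method", "items"}

def check_instrument(definition):
    """Return a list of diagnostic messages for one instrument definition.

    definition: a parsed JSON instrument (a dict). An empty result means the
    definition passed this structural check.
    """
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - definition.keys()]
    item_ids = {item["id"] for item in definition.get("items", [])}
    # Broken-reference check: every id in reverse_scored_items must name a real item.
    for ref in definition.get("scoring", {}).get("reverse_scored_items", []):
        if ref not in item_ids:
            errors.append(f"reverse_scored_items references unknown item: {ref}")
    return errors
```

Each returned message maps onto the suite's pass/fail-with-diagnostic pattern: a test group asserts the list is empty and reports any entries verbatim.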