How GrailWatch scores a listing
Every published score on GrailWatch is a deterministic function of public inputs. This page walks through what the engine does, in the order it does it. Anyone who reproduces the same comp set gets the same band; there is no hidden weight.
Step 1 — Build a fair-value band from comparable sales
For each reference, we keep a panel of recent verified sales (eBay sold/completed listings, auction archives, and similar sources). The engine reads the panel as of the listing's date, discards anything older than 12 months, and normalizes each comp to "excellent condition / full set" by dividing out the condition and box-papers factors for the comp's brand tier and era.
From the normalized panel, the engine computes the 20th, 50th, and 80th percentiles, with each comp weighted by authenticity confidence (verified comps get full weight, probable comps get partial weight, unverified comps get a small fraction). The result is a floor / median / ceiling: the fair-value band.
If the panel has fewer than 5 verified comps in the recency window, confidence is reported as low and the deal tier is not published. Honest absence beats false precision.
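A minimal Python sketch of Step 1 follows. It assumes a simple comp record and illustrative authenticity weights of 1.0 / 0.5 / 0.1 for verified / probable / unverified. The page does not publish the real values. `Comp`, `AUTH_WEIGHT`, `weighted_percentile`, and `fair_value_band` are hypothetical names. The weighted percentile uses one common definition (the smallest value whose cumulative weight reaches the target share), and the engine may use a different one.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Illustrative authenticity weights (assumed; the page gives no numbers).
AUTH_WEIGHT = {"verified": 1.0, "probable": 0.5, "unverified": 0.1}
MIN_VERIFIED = 5      # from the text: fewer than 5 verified comps -> no tier
RECENCY_DAYS = 365    # "older than 12 months", approximated as 365 days


@dataclass
class Comp:
    price_usd: float
    sold_on: date
    auth: str                 # "verified" | "probable" | "unverified"
    condition_factor: float   # comp's condition factor relative to "excellent"
    box_papers_factor: float  # comp's box-papers factor relative to "full set"


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight reaches share q of the total."""
    pairs = sorted(zip(values, weights))
    target = q * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]


def fair_value_band(comps, as_of):
    """Return (floor, median, ceiling), or None when too few verified comps."""
    # Read the panel as of the listing date and drop comps outside the window.
    recent = [c for c in comps
              if timedelta(0) <= as_of - c.sold_on <= timedelta(days=RECENCY_DAYS)]
    if sum(c.auth == "verified" for c in recent) < MIN_VERIFIED:
        return None  # low confidence: the deal tier is not published
    # Normalize each comp to "excellent / full set" by dividing out its factors.
    values = [c.price_usd / (c.condition_factor * c.box_papers_factor) for c in recent]
    weights = [AUTH_WEIGHT[c.auth] for c in recent]
    return tuple(weighted_percentile(values, weights, q) for q in (0.2, 0.5, 0.8))


today = date(2024, 6, 1)
panel = [Comp(p, today - timedelta(days=30), "verified", 1.0, 1.0)
         for p in (100, 200, 300, 400, 500)]
print(fair_value_band(panel, today))  # (100.0, 300.0, 400.0)
```

The function returns `None` instead of guessing a band. This mirrors the rule that a thin panel publishes nothing.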
Step 2 — Scale the band to the listing's condition + box-papers
The "excellent / full-set" band is then scaled by the listing's actual condition (unworn, excellent, very-good, good, fair, for-parts) and box-papers state (full-set, papers-only, box-only, watch-only). The factor tables account for brand tier and era: vintage top-luxury "watch-only" is hit much harder than modern enthusiast "watch-only", because that's how the market behaves.
Service state, if disclosed, applies one more small factor: +3% for recent-service, −8% for service-needed, −3% for unserviced-vintage, neutral for undisclosed.
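The scaling in Step 2 can be sketched as a product of multiplicative factors. Treating the service adjustment as a multiplier is itself an assumption. Only the service-state percentages come from this page. The condition and box-papers numbers are placeholders, and for brevity only the box-papers table is keyed by era and brand tier. All names are hypothetical.

```python
# Placeholder condition factors relative to "excellent" (not GrailWatch's values).
CONDITION_FACTOR = {
    "unworn": 1.10, "excellent": 1.00, "very-good": 0.92,
    "good": 0.82, "fair": 0.70, "for-parts": 0.35,
}

# Placeholder box-papers factors keyed by (era, brand tier). Vintage top-luxury
# watch-only is penalized harder than modern enthusiast watch-only.
BOX_PAPERS_FACTOR = {
    ("modern", "enthusiast"): {"full-set": 1.00, "papers-only": 0.96,
                               "box-only": 0.94, "watch-only": 0.90},
    ("vintage", "top-luxury"): {"full-set": 1.00, "papers-only": 0.85,
                                "box-only": 0.80, "watch-only": 0.65},
}

# Service-state factors taken from the text: +3%, -8%, -3%, neutral.
SERVICE_FACTOR = {
    "recent-service": 1.03, "service-needed": 0.92,
    "unserviced-vintage": 0.97, "undisclosed": 1.00,
}


def scale_band(band, condition, box_papers, service, era, tier):
    """Scale an "excellent / full-set" band to the listing's actual state."""
    factor = (CONDITION_FACTOR[condition]
              * BOX_PAPERS_FACTOR[(era, tier)][box_papers]
              * SERVICE_FACTOR[service])
    return tuple(round(x * factor, 2) for x in band)


print(scale_band((1000.0, 2000.0, 3000.0), "excellent", "full-set",
                 "recent-service", "modern", "enthusiast"))
# (1030.0, 2060.0, 3090.0)
```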
Step 3 — Compute landed cost
The listing's ad price is converted to USD if needed, then we add:
- Import duty (per the duty table for the buyer's country)
- Brokerage / customs fee (for international purchases not shipped DDP, i.e. delivered duty paid)
- VAT / sales tax (per the buyer's country)
- Buyer-paid third-party verification fee, if any
The resulting number is what you'd actually pay for the watch in hand. We score against landed cost — never the ad price — because that's the only honest comparison.
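Here is a sketch of the landed-cost arithmetic under stated assumptions:

- Duty applies only to international purchases.
- VAT / sales tax is charged on price plus duty. This is a common import rule, but jurisdictions differ.
- Brokerage is a flat fee.

`BuyerCountry` and `landed_cost` are hypothetical names, and the rates in the usage line are made up.

```python
from dataclasses import dataclass


@dataclass
class BuyerCountry:
    duty_rate: float          # assumed import-duty rate from a duty table
    vat_rate: float           # assumed VAT / sales-tax rate
    brokerage_fee_usd: float  # assumed flat brokerage / customs fee


def landed_cost(ad_price, fx_to_usd, country, international, ddp,
                verification_fee_usd=0.0):
    """Ad price converted to USD, plus duty, brokerage, tax, and verification."""
    price_usd = ad_price * fx_to_usd
    # Assumption: import duty only applies to cross-border purchases.
    duty = price_usd * country.duty_rate if international else 0.0
    # Brokerage applies when the purchase is international and not DDP.
    brokerage = country.brokerage_fee_usd if international and not ddp else 0.0
    # Assumption: tax is levied on price plus duty; real rules vary by country.
    tax = (price_usd + duty) * country.vat_rate
    return round(price_usd + duty + brokerage + tax + verification_fee_usd, 2)


example_country = BuyerCountry(duty_rate=0.05, vat_rate=0.20, brokerage_fee_usd=50.0)
print(landed_cost(1000.0, 1.0, example_country, international=True, ddp=False,
                  verification_fee_usd=100.0))  # 1410.0
```

In the example, 1000 price + 50 duty + 50 brokerage + 210 tax + 100 verification gives 1410.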
Step 4 — Score landed cost vs. the band
Pct-vs-median = (landed_cost − median) / median. The result maps to a deal tier:
- ≤ −10% from median → Great deal
- −10% to −3% → Good deal
- −3% to +7% → Fair price
- > +7% → Overpriced
The score is paired with a confidence label (low/medium/high) derived from comp count and band spread. A high-confidence "great deal" backed by 20 tightly clustered comps means more than a low-confidence "great deal" backed by a thin, widely spread panel; the labels are honest about which is which.
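The following sketches Step 4's arithmetic. The text leaves the edges at −3% and +7% open. This version assumes each range includes its upper edge, which is consistent with the stated "≤ −10%" and "> +7%" edges. The thresholds in `confidence` are invented for illustration. The page says only that the label depends on comp count and band spread.

```python
def pct_vs_median(landed_cost, median):
    """(landed_cost - median) / median, as defined in Step 4."""
    return (landed_cost - median) / median


def deal_tier(pct):
    # Assumed edge handling: each range includes its upper edge.
    if pct <= -0.10:
        return "Great deal"
    if pct <= -0.03:
        return "Good deal"
    if pct <= 0.07:
        return "Fair price"
    return "Overpriced"


def confidence(n_comps, floor, median, ceiling):
    # Hypothetical thresholds: more comps and a tighter band -> higher confidence.
    spread = (ceiling - floor) / median
    if n_comps >= 15 and spread <= 0.25:
        return "high"
    if n_comps >= 8 and spread <= 0.50:
        return "medium"
    return "low"


pct = pct_vs_median(landed_cost=880.0, median=1000.0)
print(round(pct, 2), deal_tier(pct), confidence(20, 900.0, 1000.0, 1100.0))
# -0.12 Great deal high
```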
Step 5 — Three-layer trust
Trust is computed in three independent layers:
- Platform — how protective the marketplace is (authentication, escrow, buyer protection, dispute resolution).
- Listing — integrity signals about the ad itself (stock-photo detection, reference consistency, serial format, stolen-registry status).
- Seller — account age, sales count, feedback percentage, identity cross-check status, off-platform-payment requests.
Each layer is on a 0–100 scale. The composite is a weighted median, but the composite NEVER overrides a fundamentally failing layer — if listing trust falls below 30, the engine flags the listing regardless of how strong the seller history is.
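One way to read Step 5 in code uses illustrative layer weights of 3 / 4 / 3 for platform / listing / seller. The real weights are not published. The composite is a weighted median. The listing-trust floor of 30 from the text is checked separately, so a strong composite cannot hide it. The names are hypothetical.

```python
# Hypothetical layer weights; the page does not publish them.
LAYER_WEIGHTS = {"platform": 3, "listing": 4, "seller": 3}
LISTING_FLOOR = 30  # from the text: listing trust below 30 is always flagged


def weighted_median(values, weights):
    """Value at which cumulative weight first reaches half of the total."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    cumulative = 0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    raise ValueError("weights must be positive")


def composite_trust(platform, listing, seller):
    """Return (composite 0-100, flagged) for the three trust layers."""
    scores = {"platform": platform, "listing": listing, "seller": seller}
    composite = weighted_median([scores[k] for k in LAYER_WEIGHTS],
                                list(LAYER_WEIGHTS.values()))
    # The floor is checked on its own, so a strong composite cannot mask it.
    flagged = listing < LISTING_FLOOR
    return composite, flagged


print(composite_trust(platform=90, listing=20, seller=95))  # (90, True)
```

In the example the composite looks healthy (90), but the failing listing layer still raises the flag.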
Step 6 — Graded risk
The risk read is built from named, factual signals: deep discounts vs band, new accounts, off-platform-payment requests, stolen-registry hits, stock-photo detection, reference inconsistencies. Each signal carries a weight; the sum maps to a risk level:
- Clear — no significant signals
- Caution — one or two signals worth noticing before you commit
- Elevated — multiple weak signals or one strong signal; the listing is shown but withheld from the merit-ranked feed
- High — multiple strong signals; the listing is shown only to a buyer who searches for it directly
Risk signals are factual observations ("the seller's account was created 12 days ago"), not accusations. GrailWatch never labels anyone a fraud or scammer; the code blocks that wording from every published string.
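This sketch covers the graded-risk sum plus a wording guard. The signal names follow the list above. The weights (1 for weak, 3 for strong), the split between weak and strong signals, and the level cut-offs are all assumptions. They were chosen so that one strong signal lands in Elevated and two strong signals land in High, as described. The banned-term check is a naive substring match, not GrailWatch's actual filter.

```python
# Assumed weights: 1 for weak signals, 3 for strong ones. The signal names
# follow the text; the weak/strong split and the cut-offs are illustrative.
SIGNAL_WEIGHTS = {
    "deep_discount_vs_band": 1,
    "new_account": 1,
    "stock_photo_detected": 1,
    "reference_inconsistency": 1,
    "off_platform_payment_request": 3,
    "stolen_registry_hit": 3,
}


def risk_level(signals):
    """Sum the weights of the observed signals and map the total to a level."""
    score = sum(SIGNAL_WEIGHTS[s] for s in signals)
    if score == 0:
        return "Clear"
    if score <= 2:
        return "Caution"   # one or two weak signals
    if score <= 5:
        return "Elevated"  # e.g. three weak signals, or one strong signal
    return "High"          # e.g. two strong signals


# Naive guard: substring match, so "scammer" and "fraudulent" are caught too.
BANNED_TERMS = ("fraud", "scam")


def check_publishable(text):
    lowered = text.lower()
    for term in BANNED_TERMS:
        if term in lowered:
            raise ValueError(f"banned wording in published string: {term!r}")
    return text


print(risk_level(["new_account", "stolen_registry_hit"]))  # Elevated
```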
Step 7 — Calibration + likely-condition pass
Sellers describe condition inconsistently. The engine learns each seller's bias from their past sales and shows the likely condition alongside the stated one. If a seller's history shows they routinely under-describe, a listing they call "good" is re-scored at "very-good", and the higher valuation is shown as "value if likely".
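The calibration pass might look like the sketch below. It assumes each past sale yields a (stated, verified) grade pair. The page does not say how the engine learns true condition, so that data source is hypothetical. So are `seller_bias`, `likely_condition`, and `value_if_likely`.

```python
GRADES = ["for-parts", "fair", "good", "very-good", "excellent", "unworn"]


def seller_bias(history):
    """Average grade-step gap between verified and stated condition.

    history: (stated, verified) grade pairs from the seller's past sales.
    A positive result means the seller tends to under-describe.
    """
    if not history:
        return 0
    offsets = [GRADES.index(verified) - GRADES.index(stated)
               for stated, verified in history]
    # round() uses round-half-to-even, so a mean of 0.5 rounds to 0.
    return round(sum(offsets) / len(offsets))


def likely_condition(stated, history):
    """Shift the stated grade by the seller's bias, clamped to the scale."""
    idx = GRADES.index(stated) + seller_bias(history)
    return GRADES[max(0, min(len(GRADES) - 1, idx))]


def value_if_likely(stated, history, value_at):
    """Return (likely grade, value) only when the likely grade is better."""
    likely = likely_condition(stated, history)
    if GRADES.index(likely) > GRADES.index(stated):
        return likely, value_at(likely)
    return None


past = [("good", "very-good"), ("good", "very-good"), ("fair", "good")]
print(likely_condition("good", past))  # very-good
```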
Step 8 — Audit trail
Every published score writes an immutable record: the SHA-256 hash of the inputs' canonical JSON, the engine version, the timestamp, and the published outputs. The trail serves as legal-defense documentation and survives erasure requests under a documented legitimate-interest override (see the Privacy Policy).
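Here is a sketch of the audit record using Python's standard `json` and `hashlib` modules. It approximates "canonical JSON" with sorted keys and compact separators. A full canonicalization scheme such as RFC 8785 (JCS) also pins number and string serialization, which this sketch does not. `AuditRecord` and the helper names are hypothetical. `frozen=True` only prevents in-process mutation; durable immutability would come from append-only storage.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def canonical_json(obj):
    # Approximation: sorted keys, no insignificant whitespace, UTF-8 text.
    # RFC 8785 (JCS) additionally pins number and string serialization.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)


def inputs_hash(inputs):
    """SHA-256 hex digest of the canonical JSON form of the inputs."""
    return hashlib.sha256(canonical_json(inputs).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditRecord:
    inputs_sha256: str
    engine_version: str
    timestamp: str
    outputs_json: str


def make_audit_record(inputs, outputs, engine_version):
    return AuditRecord(
        inputs_sha256=inputs_hash(inputs),
        engine_version=engine_version,
        timestamp=datetime.now(timezone.utc).isoformat(),
        outputs_json=canonical_json(outputs),
    )


# Key order does not change the hash.
print(inputs_hash({"a": 1, "b": 2}) == inputs_hash({"b": 2, "a": 1}))  # True
```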
What we don't do
- Take payment from sellers to rank them higher. Ever.
- Republish copyrighted dealer photos or descriptions.
- Call anyone a fraud or scammer — risk is graded, signals are factual, language is code-enforced.
- Score a listing we can't justify. If the comp panel is too thin or the data sources don't agree, the score isn't published — the listing appears with a "not enough data to score" note.
Where to go next
- Try the Deal Calculator — paste a listing and get the full readout
- Browse the reference catalog
- Glossary of every label
- More on who's behind GrailWatch