Methodology — how Tripozi sources and verifies every page

The four-tier stack

Tier 1 — Hand-curated visa matrix. Zero AI, 100% manual review, every entry linked back to the issuing government.
Tier 2 — OSM-grounded accessibility and dietary. Venue lists pulled from OpenStreetMap contributor tags, enriched with Wikipedia and named-editor narrative.
Tier 3 — Guarded AI for destination content. Gemini Flash Lite, Redis-cached 30-60 days, Zod-validated, advance-booking warnings injected where stakes warrant.
Tier 4 — Deindexed thin combinatorials. ~3,132 pages that exist for navigation but are marked noindex, follow so Google doesn’t treat them as ranking signals.

Tier 1 — Hand-curated visa matrix

The visa vertical is the strictest layer on Tripozi because the cost of a wrong answer is a missed flight or detention at the border. The whole tier operates on three rules:

Government source per entry. Every one of the 72 passport × destination combinations links to the issuing country’s ministry of foreign affairs, immigration authority, or consular portal — never to third-party aggregators as a primary source. Passport Index and Sherpa are used only as secondary cross-checks during review.
Last-verified date visible. Every page shows the last human review date so readers know the data’s freshness. Anything older than a quarter is flagged for re-check.
Editorial uncertainty disclosure. When a rule is ambiguous or in transition (e.g., Korea’s K-ETA waiver periods, Mexico’s 2022-onwards tightening of the 180-day FMM grant), the page says so and points the reader to the primary source rather than fabricating confidence.

Maintenance cadence: quarterly full matrix review; out-of-cycle updates when a reader or the news cycle flags a regulatory change. The URL audit is automated (we check that every officialSource.url and applicationUrl returns a live HTTP response).
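As a rough illustration of the freshness flag and the audit's input list, here is a minimal TypeScript sketch. Only officialSource.url and applicationUrl are named above; the VisaEntry shape, the lastVerified field name, the optional applicationUrl, and the 92-day approximation of "a quarter" are assumptions made for the example.

```typescript
// Hypothetical entry shape. Only officialSource.url and applicationUrl come
// from the text; lastVerified is an assumed field name.
interface VisaEntry {
  officialSource: { url: string };
  applicationUrl?: string; // assumed optional: not every entry has an online application
  lastVerified: string; // ISO date of the last human review
}

// Assumption: "older than a quarter" is approximated as more than 92 days.
const QUARTER_DAYS = 92;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True when the entry's last human review is more than a quarter old.
function isStale(entry: VisaEntry, today: Date): boolean {
  const ageMs = today.getTime() - new Date(entry.lastVerified).getTime();
  return ageMs > QUARTER_DAYS * MS_PER_DAY;
}

// Collects the URLs the automated audit would probe for one entry,
// skipping the application URL when it duplicates the official source.
function urlsToAudit(entry: VisaEntry): string[] {
  const urls = [entry.officialSource.url];
  if (
    entry.applicationUrl !== undefined &&
    entry.applicationUrl !== entry.officialSource.url
  ) {
    urls.push(entry.applicationUrl);
  }
  return urls;
}
```

Passing today in as a parameter, rather than reading the clock inside the function, keeps the check reproducible in a review script.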

Tier 2 — OSM-grounded accessibility + dietary

The accessibility and dietary verticals cover 10 European cities where OpenStreetMap has enough contributor-tagged venues to ground the content honestly. The rules we apply here:

Real venues, not AI-generated. Every restaurant / museum / hotel shown is an actual business with an OSM entry, tagged by a human contributor. We query the OSM Overpass API for wheelchair=yes|limited|designated (accessibility) or for the diet:halal, diet:kosher, diet:vegan, and diet:gluten_free keys (dietary), then filter by neighbourhood, venue type, and recency.
Honest density labelling. Sparse cities such as Prague, Valencia, Tallinn, and Porto, which have fewer wheelchair-tagged venues than Barcelona, render with honest venue counts and “this is what we found” framing, with no padding from generic phrases or re-ordered lists to fake depth.
Local-dish compatibility ratings. Dietary pages include a four-tier classification for traditional local dishes (always / usually / ask / never dietary-compatible) so readers know when to expect friction at local restaurants rather than getting a list of “halal-friendly” places that serve pork alongside.
Wikipedia-grounded neighbourhood context. Vertical pages draw neighbourhood context from Wikipedia rather than AI-generated narrative. Wikipedia isn’t perfect but it has editorial oversight that raw AI output doesn’t — real encyclopedia-style sourcing beats hallucinated paragraphs.
Human editorial review. Every vertical page carries an “edited by the Tripozi editorial team” byline and a visible last-verified date. “Editorial team” today means a very small group — Tripozi is an indie project, not a newsroom. We’re explicit about this on the about page so readers can calibrate trust accordingly.
Weekly refresh cron. OSM + Wikipedia caches expire weekly (Sunday 04:00 UTC) so readers get venue data no more than 7 days stale.
Reader contributions loop. Each page has a “suggest a venue” form that writes to our moderation queue. Approved suggestions are tagged back to OpenStreetMap so the whole open-data ecosystem benefits — not just Tripozi.
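The venue lookup described above could be expressed as an Overpass QL query built per city. The sketch below is illustrative rather than the production query: the function name, the area lookup by name and boundary=administrative, the timeout, and the accepted diet values (yes or only) are all assumptions, and the city string must match the name OSM itself uses for the area.

```typescript
type Vertical = "accessibility" | "dietary";

// Tag values and keys quoted in the methodology text.
const WHEELCHAIR_VALUES = ["yes", "limited", "designated"];
const DIET_KEYS = ["halal", "kosher", "vegan", "gluten_free"];

// Builds an Overpass QL query for nodes, ways, and relations (nwr)
// inside a named administrative area.
function buildOverpassQuery(city: string, vertical: Vertical): string {
  // Escape quotes and backslashes so the name stays a single QL string.
  const safeCity = city.replace(/["\\]/g, "\\$&");
  const filters =
    vertical === "accessibility"
      ? [`nwr["wheelchair"~"^(${WHEELCHAIR_VALUES.join("|")})$"](area.city);`]
      : // Assumption: diet:* tags count when set to "yes" or "only".
        DIET_KEYS.map((key) => `nwr["diet:${key}"~"^(yes|only)$"](area.city);`);
  return [
    "[out:json][timeout:60];",
    `area["name"="${safeCity}"]["boundary"="administrative"]->.city;`,
    "(",
    ...filters.map((line) => "  " + line),
    ");",
    "out center;", // "center" gives ways and relations a single coordinate
  ].join("\n");
}
```

The neighbourhood, venue-type, and recency filters mentioned above would then run on the returned elements; they are left out here to keep the query itself readable.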

Tier 3 — Guarded AI for destination content

The main destination grid (172 cities, per-destination tools, country hubs, itinerary modifiers) uses Google Gemini Flash Lite as the text generator. This tier earns the most scepticism from readers — fair, given the AI-content avalanche of the early 2020s — so we run it with explicit guardrails:

Prompts grounded in curated data. Gemini doesn’t just receive the destination name; it receives real GPS coordinates, the actual bestMonths array, live Open-Meteo weather, and the curated “popular for” interest tags. Hallucination surface is smaller when the prompt is concrete.
Cached per destination × modifier combo. The same “Tokyo 3-day itinerary” query returns the same itinerary for 30-60 days, reducing per-query variance. No “every user sees different facts”.
Zod schema validation + enum-drift fallbacks. Every Gemini response is parsed through a strict schema. If the model returns an unexpected enum value (e.g., a dietary compatibility rating outside the four allowed), we either normalise it or reject the response and retry with a different prompt.
Book-ahead warnings. Attractions known to sell out 1-3 months in advance (teamLab Planets, Louvre, Colosseum, Alhambra) carry explicit “reserve X weeks ahead” badges in the itinerary, regardless of what the LLM wrote.
Generous time ranges over false precision. The copy prefers “open most days 10-18” to “opens at 10:17”. Opening hours drift; the LLM doesn’t know that. Readers are reminded to verify before booking.
No per-keyword long-tail page generation. We don’t generate a unique AI page per query like “best bakeries in Tokyo”. Our modifier grid is finite (37 modifiers × 172 destinations, with 6 modifiers deindexed as duplicates — see tier 4), which means the AI surface is bounded and reviewable.
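To make the enum-drift guardrail concrete, here is a dependency-free sketch of the normalise-or-reject step for the four-value dietary compatibility rating. In the pipeline described above this logic sits behind a Zod schema; the plain-TypeScript version below only illustrates the decision, and the alias table and function name are hypothetical.

```typescript
// The four allowed ratings from the dietary vertical.
const RATINGS = ["always", "usually", "ask", "never"] as const;
type Rating = (typeof RATINGS)[number];

// Hypothetical alias table: drifted values we choose to normalise
// instead of rejecting. The real mapping is not documented here.
const ALIASES = new Map<string, Rating>([
  ["sometimes", "ask"],
  ["often", "usually"],
  ["yes", "always"],
  ["no", "never"],
]);

type Outcome =
  | { ok: true; rating: Rating }
  | { ok: false; reason: string };

// Accepts a raw model value, normalises case and whitespace, maps known
// aliases, and rejects anything else so the caller can retry.
function normaliseRating(raw: unknown): Outcome {
  if (typeof raw !== "string") {
    return { ok: false, reason: "rating is not a string" };
  }
  const key = raw.trim().toLowerCase();
  for (const allowed of RATINGS) {
    if (allowed === key) return { ok: true, rating: allowed };
  }
  const alias = ALIASES.get(key);
  if (alias !== undefined) return { ok: true, rating: alias };
  return { ok: false, reason: `unknown rating "${raw}"` };
}
```

A rejected outcome is the signal to retry with a different prompt; an accepted one can be cached with the rest of the response.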

Tier 4 — Deindexed thin combinatorials

About 3,132 pages on Tripozi exist for site navigation but carry a noindex, follow meta tag so they don’t compete in Google’s ranking index. This is deliberate — a deindex budget we’re willing to spend to keep the indexable surface substantive:

2,088 month pages (/[destination]/in/[month]) — thin combinatorial aggregates that would otherwise flood the index. Kept accessible for users who land via deep search but blocked from ranking.
1,044 non-canonical modifier duplicates — we marked six modifiers as intent-duplicates of cleaner canonical variants (with-kids → family, romantic → couples, honeymoon → couples + luxury cascade, etc.). The canonical stays indexable; the variant is noindex but still usable if someone lands there.
Unknown-slug soft-404 mitigation. All seven dynamic route types (including destinations, modifiers, tools, verticals, and hubs) emit noindex, follow from generateMetadata when the requested entity is absent, rather than relying on Next.js’s default 200 response. Google eventually reclassifies these as soft 404s without us needing to emit hard 404 status codes.
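The robots logic for canonical, duplicate, and unknown slugs can be sketched as one pure helper that a route's generateMetadata would call. The PageEntity and PageMetadata types and the buildMetadata name are hypothetical; the robots object mirrors the index / follow shape that Next.js metadata accepts, but nothing here imports Next.js.

```typescript
// Hypothetical record for a resolved page; "canonical" is false for
// month pages and for the six duplicate modifier variants.
interface PageEntity {
  title: string;
  canonical: boolean;
}

// Structural stand-in for the subset of page metadata used here.
interface PageMetadata {
  title: string;
  robots: { index: boolean; follow: boolean };
}

// Unknown slug: noindex, follow. Known but non-canonical: noindex, follow.
// Known and canonical: indexable.
function buildMetadata(entity: PageEntity | undefined): PageMetadata {
  if (entity === undefined) {
    return { title: "Not found", robots: { index: false, follow: true } };
  }
  return {
    title: entity.title,
    robots: { index: entity.canonical, follow: true },
  };
}
```

Keeping the decision in one function means every dynamic route type applies the same rule, instead of each route reimplementing it.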

Why bother? Google’s Helpful Content Update can penalise a whole domain when it has a high ratio of thin pages, not just the thin pages themselves. Spending the deindex budget protects our stronger pages from being tarred with the same brush.
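The page counts quoted for this tier can be sanity-checked with a few lines of arithmetic. The sketch assumes one month page per calendar month per destination. Under that assumption both figures imply 174 destination slugs, slightly more than the 172 cities quoted for the destination grid; the source of that difference is not stated here.

```typescript
// Figures quoted in the tier 4 description.
const MONTH_PAGES = 2088;
const DUPLICATE_MODIFIER_PAGES = 1044;
const DUPLICATE_MODIFIERS = 6;

// Assumption: one page per calendar month per destination.
const MONTHS_PER_YEAR = 12;

// Total deindexed combinatorial pages: 2,088 + 1,044.
const totalDeindexed = MONTH_PAGES + DUPLICATE_MODIFIER_PAGES;

// Destination counts implied by each figure.
const destinationsFromMonths = MONTH_PAGES / MONTHS_PER_YEAR;
const destinationsFromDuplicates = DUPLICATE_MODIFIER_PAGES / DUPLICATE_MODIFIERS;
```

Here totalDeindexed works out to 3,132, matching the headline figure, and both implied destination counts come to 174.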

Where we’re less confident — and how we handle it

Honesty where it matters: we have known weak spots and we tell you rather than hide them.

Thin OSM cities. Prague, Valencia, Tallinn, and Porto render accessibility pages with fewer venues (and lower page weight) than Barcelona or Kraków because OSM contributor density is lower there. We don’t pad — we show what’s actually tagged and let readers cross-check.
Currency drift in visa fees. Costs are quoted in USD for comparability. Local-currency amounts can drift 10-20% over quarterly review windows. When in doubt, the official source URL is one click away.
Weather data latency. We use Open-Meteo for live weather but the cache is 1 hour. In rapidly-changing conditions (tropical storms, heat waves) the cache might lag reality. We’d rather serve slightly stale weather from cache than run into API rate limits and be able to serve only a tiny user base.
Bot-detected government URLs. A few ministry pages (Japan MOFA, some Korean embassy portals) block datacenter-IP HEAD requests even from browser-like clients. Our URL audit scripts flag these as false-positive dead links; the URLs work for real browsers.
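One way to keep bot-blocked ministry pages from polluting the dead-link report is to route them to a manual check instead of marking them dead. The sketch below is an assumption about how such a rule could look: the host list, the choice of 403 and 429 as "probably blocked" statuses, and the treatment of 2xx and 3xx as live are all illustrative.

```typescript
type Verdict = "live" | "dead" | "manual-check";

// Hypothetical list of hosts known to reject datacenter-IP requests.
const BOT_BLOCKING_HOSTS = ["mofa.go.jp"];

// Extracts the lowercase hostname from an http(s) URL, or "" if none.
function hostOf(url: string): string {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  return (match?.[1] ?? "").toLowerCase();
}

// Classifies one audit result from the URL and the HTTP status observed.
function classifyAudit(url: string, status: number): Verdict {
  // Assumption: any 2xx or 3xx counts as a live response.
  if (status >= 200 && status < 400) return "live";
  const host = hostOf(url);
  const knownBlocker = BOT_BLOCKING_HOSTS.some(
    (blocked) => host === blocked || host.endsWith("." + blocked)
  );
  // A refusal from a known blocker is checked in a real browser
  // before the link is reported as dead.
  if (knownBlocker && (status === 403 || status === 429)) return "manual-check";
  return "dead";
}
```

For example, a 403 from https://www.mofa.go.jp/ yields "manual-check", while a 404 from the same host is still reported as "dead".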

Corrections & feedback

Every layer has a correction channel:

Visa: email [email protected] with the official government URL showing the correct rule. Weekly review, quick fix.
Accessibility / dietary venues: in-page “suggest a venue” form. Moderated, then written back to OpenStreetMap.
Itinerary / tool content: same email — we can regenerate specific Gemini pages on demand once the underlying prompt is improved.

We publish this methodology page so the commitments above are a public record, not a marketing claim. If we drift from them, we want you to call it out.
