quidproquo Blog Improvement Roadmap: Content, Technical Debt, RAG Design, and Harness Infrastructure

Apr 18, 2026 1 min
TL;DR Using my own 30+ RAG/Agent posts to audit the blog itself, I identified a prioritized improvement list spanning content quality, site tech, RAG design fixes, harness infrastructure, and AI agent applications — no phases, just priorities.

I used my own articles on RAG, Agents, Context Engineering, and Harness Engineering to audit the blog itself — and found that I wrote “Intelligence without infrastructure is just a demo,” yet the platform was missing exactly that infrastructure. This post compiles a complete, priority-ordered action list across five dimensions: content quality, site technology, RAG design fixes, harness infrastructure, and AI agent applications.

Current Snapshot

  • Content: 227 posts (AI 121, Tech 98, Product 7, Education 1)
  • Broken links: 49 internal broken links (36% of all internal links)
  • Missing fields: 213 posts missing type (94%), 199 missing series (88%), 7 missing tldr
  • Tag inconsistency: ai-agent vs ai-agents across 35 posts
  • Drafts: 17 (mostly Claude Code deep-dive skeletons)
  • Infrastructure: Vectorize, D1, and Workers AI are all bound but unused — Embedding Pipeline, Chat API, and Agent nodes are entirely unimplemented
  • Harness: No root-level CLAUDE.md, no progress.txt, no pre-commit hook, no Post Evaluator

Priority Overview

The entire plan is organized by “fix cost vs. impact scope” into four levels:

  • P0 (Immediate): Low cost, high impact — users are currently seeing broken states
  • P1 (Short-term): Medium cost — addresses systemic risks or long-term technical debt
  • P2 (Mid-term): Higher cost but improves user experience or developer efficiency
  • P3 (Long-term): Experimental or depends on upstream prerequisites

P0 — Immediate (Low Cost, High Impact)

Content Fixes

Fix 49 broken internal links

  • Worst offender: ai/2026-04-01-agent-cli-guidelines.md has 10 broken links
  • Interim fix: replace with plain text labeled “coming soon” to avoid sending readers to 404s

Standardize tag naming

  • ai-agent (21 posts) + ai-agents (14 posts) → unify to ai-agent
  • A single sed batch can handle this
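For posts whose tags are handled in a build script rather than with sed, the same unification can be sketched as a small normalizer. The alias map and the function name `normalizeTags` are illustrative assumptions, not existing code in the repo.

```typescript
// Hypothetical alias map: variant spelling -> canonical tag.
const TAG_ALIASES = new Map<string, string>([["ai-agents", "ai-agent"]]);

// Map every tag to its canonical form, then drop duplicates created by the merge
// (a post tagged with both variants should end up with one tag).
function normalizeTags(tags: string[]): string[] {
  const canonical = tags.map((tag) => TAG_ALIASES.get(tag) ?? tag);
  return Array.from(new Set(canonical));
}

// Example: a post that carried both variants.
const cleaned = normalizeTags(["ai-agents", "rag", "ai-agent"]);
```

Running this over frontmatter in CI would also stop the two spellings from drifting apart again.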

RAG design parameter fixes (just change numbers)

  • semantic_cache_threshold: 0.92 → 0.95 (my semantic-caching.md article explicitly notes that 0.90–0.94 means “related but different”)
  • chunk_size: switch to token-based calculation or drop to 1500 chars (in Chinese, 2000 chars ≈ 800–1000 tokens, exceeding the recommended range)
  • Add reranker_min_keep: 3 (explicitly recommended as a safety net in my cross-encoder-reranking.md)
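A minimal sketch of what the reranker_min_keep safety net could look like, assuming the reranker otherwise drops chunks below a score threshold. The `ScoredChunk` shape, the threshold parameter, and the function name are assumptions for illustration.

```typescript
interface ScoredChunk {
  id: string;
  score: number; // reranker relevance score, higher is better
}

// Keep every chunk at or above the threshold, but never return fewer than
// minKeep chunks when that many candidates exist: if too few pass, fall back
// to the top-scoring minKeep.
function applyRerankerFloor(
  chunks: ScoredChunk[],
  threshold: number,
  minKeep = 3,
): ScoredChunk[] {
  const ranked = chunks.slice().sort((a, b) => b.score - a.score);
  const passing = ranked.filter((chunk) => chunk.score >= threshold);
  return passing.length >= minKeep ? passing : ranked.slice(0, minKeep);
}
```

Without the floor, a strict threshold on a hard query can leave the Writer with one chunk or none.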

Create a root-level CLAUDE.md

  • Core principle from my Harness articles: “Repository as Single Source of Truth”
  • Contents: tech stack, directory structure, dev workflow, naming conventions, decision rationale
  • Without this file, every new agent session has to rediscover the entire project from scratch

Site Technology

Create a custom 404 page

  • Currently no custom 404 — users hitting broken links see a raw error screen
  • Should include a search box and popular post recommendations

Add check-post-references.mjs to CI

  • The script already exists but never runs — which is how 49 broken links shipped to production
  • A direct violation of the “Linter is law, prompt is suggestion” principle I wrote about

Pre-commit hook (lint + reference check)

  • Use husky or simple-git-hooks
  • Block broken links and lint errors from entering the repo

P1 — Short-Term (1–2 Weeks)

Content Improvements

Fill in type fields for 213 posts

  • Use a script to batch-infer type from the file path (e.g., deep-dive appearing in the directory or filename)
  • Or run all markdown through an LLM for automatic classification
  • Type fields must be populated before category-page filtering becomes meaningful

Fill in tldr for 7 posts

  • Especially important for deep-dive articles
  • Can be batch-generated by Claude, then human-reviewed

Make type required in frontmatter

  • Update src/content.config.ts schema
  • Enforce via schema constraint, not via prompt — echoing the Harness article principle

Site Technology

English search page

  • /src/pages/search.astro currently hardcodes lang="zh-TW"
  • Create /src/pages/en/search.astro or add dynamic locale detection

Add an “about” section to the English homepage

  • The Chinese homepage has a full site-philosophy intro; the English version doesn’t
  • English readers have no idea what “Quid Pro Quo” means

Font loading optimization (LCP impact)

  • Add <link rel="preload" as="font">
  • Set font-display: swap
  • @fontsource/noto-sans-tc is installed but not loading correctly

Image sizing and lazy loading (CLS impact)

  • Enable astro:assets image optimization
  • Add loading="lazy" with explicit width/height to all markdown images

Basic accessibility

  • Add a “Skip to main content” link (Skip Navigation)
  • Add :focus-visible keyboard navigation styles
  • Verify whether --text-muted: #999 meets WCAG AA contrast requirements
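The contrast check can be done with the WCAG 2.x relative-luminance formula. The sketch below assumes the muted text sits on a white background (the actual background token is not listed here); under that assumption #999 comes out at roughly 2.85:1, below the 4.5:1 that AA requires for normal-size text.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rgb" or "#rrggbb" color.
function luminance(hex: string): number {
  let h = hex.replace("#", "");
  if (h.length === 3) {
    h = h.split("").map((ch) => ch + ch).join("");
  }
  const r = parseInt(h.slice(0, 2), 16);
  const g = parseInt(h.slice(2, 4), 16);
  const b = parseInt(h.slice(4, 6), 16);
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio as defined by WCAG: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(foreground: string, background: string): number {
  const a = luminance(foreground);
  const b = luminance(background);
  return (Math.max(a, b) + 0.05) / (Math.min(a, b) + 0.05);
}

// Assumption: white page background.
const mutedRatio = contrastRatio("#999", "#fff");
const passesAA = mutedRatio >= 4.5; // AA threshold for normal-size text
```

If the real background is darker than white, rerun the check with that value before picking a replacement color.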

Harness Infrastructure

Establish a progress.txt mechanism

  • The design I praised most in anthropic-harness-design.md: “the lowest-cost episodic memory implementation — no vector database needed, just a text file”
  • Not using it myself is the most ironic gap in this whole audit

Session-start hook

  • Auto-runs pnpm lint and reads progress.txt
  • Mirrors the “startup ritual” design from Anthropic’s harness guidance

Add an Evaluator node to the Post skill

  • Core point from my articles: “An agent that acts as both athlete and judge will tend to grade itself leniently”
  • The Post skill currently only has a Generator — no independent Evaluator
  • Evaluator should check: frontmatter completeness, internal link validity, tag consistency, heading structure
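The four checks above are all deterministic, so a first Evaluator does not need an LLM at all. A rough sketch, assuming the post has already been parsed into the hypothetical `ParsedPost` shape below; the required field names are likewise assumptions.

```typescript
interface ParsedPost {
  frontmatter: Record<string, unknown>;
  tags: string[];
  internalLinks: string[]; // slugs this post links to
  headingLevels: number[]; // e.g. [2, 3, 3, 2] for ##, ###, ###, ##
}

// Assumed set of required frontmatter fields.
const REQUIRED_FIELDS = ["title", "type", "tldr"];

// Returns a list of human-readable issues; an empty list means the post passes.
function evaluatePost(
  post: ParsedPost,
  knownSlugs: Set<string>,
  canonicalTags: Set<string>,
): string[] {
  const issues: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = post.frontmatter[field];
    if (value === undefined || value === "") {
      issues.push(`missing frontmatter field: ${field}`);
    }
  }
  for (const link of post.internalLinks) {
    if (!knownSlugs.has(link)) issues.push(`broken internal link: ${link}`);
  }
  for (const tag of post.tags) {
    if (!canonicalTags.has(tag)) issues.push(`non-canonical tag: ${tag}`);
  }
  for (let i = 1; i < post.headingLevels.length; i++) {
    const previous = post.headingLevels[i - 1];
    const current = post.headingLevels[i];
    // Going deeper by more than one level skips a heading tier.
    if (current - previous > 1) {
      issues.push(`heading level jumps from h${previous} to h${current}`);
    }
  }
  return issues;
}
```

Because the Generator never sees this code path, it cannot grade itself leniently on these points.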

RAG Design Fixes

Deterministic Validation Node (Stripe Blueprint pattern)

  • Drawn from my internal-ai-coding-agents.md on Stripe Minions’ core architecture
  • Insert deterministic validation between Writer and Critic: Markdown syntax, source URL existence, Mermaid syntax
  • Don’t rely on AI to always get it right — use deterministic checkpoints to catch errors

Tool description quality standards

  • search_blog_posts vs search_abstract_index vs search_docs — usage boundaries are unclear
  • Each tool needs: when to use it, when not to use it, expected return format
  • search_abstract_index should become an internal strategy of the Research node, not exposed as a standalone tool

Critic fallback / degradation strategy

  • Stripe’s design: “If the LLM can’t fix it in two tries, flag for human review — a third attempt just burns tokens”
  • If 2 retries still fall below the threshold → annotate the response with “⚠️ This answer may be incomplete; reading the source articles directly is recommended”
  • Do not call the LLM again
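The retry cap and the degraded response can be sketched as a bounded loop. The `score` and `revise` callbacks stand in for the Critic and the Writer and are synchronous here only to keep the example short; in the real pipeline they would be async LLM calls. All names are illustrative.

```typescript
const MAX_RETRIES = 2;
const INCOMPLETE_NOTICE =
  "⚠️ This answer may be incomplete; reading the source articles directly is recommended";

interface CriticOutcome {
  answer: string;
  degraded: boolean;
  retries: number;
}

// Revise at most MAX_RETRIES times. If the answer still scores below the
// threshold, annotate it and stop instead of calling the LLM a third time.
function answerWithFallback(
  draft: string,
  score: (answer: string) => number,
  revise: (answer: string) => string,
  threshold: number,
): CriticOutcome {
  let answer = draft;
  let retries = 0;
  let passed = score(answer) >= threshold;
  while (!passed && retries < MAX_RETRIES) {
    answer = revise(answer);
    retries += 1;
    passed = score(answer) >= threshold;
  }
  if (!passed) {
    return { answer: `${answer}\n\n${INCOMPLETE_NOTICE}`, degraded: true, retries };
  }
  return { answer, degraded: false, retries };
}
```

The `degraded` flag gives the UI something to key off if the warning should be rendered differently from the answer text.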

Rewrite prompts in “describe the end state” style

  • Spotify found that overly prescriptive step-by-step instructions cause agents to get stuck on complex tasks
  • Agent prompts should describe “what a successful answer looks like” rather than “follow these steps”

P2 — Mid-Term (1–2 Months)

AI Agent Feature Implementation

Embedding Pipeline + semantic search

  • Embed 197 posts into Vectorize (already bound, never used)
  • Implement Hybrid Search: Vectorize semantic + D1 FTS5 BM25 + RRF fusion
  • Add BGE-Reranker for reranking
  • This is the foundation for all subsequent AI features
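The RRF fusion step needs only the two ranked lists of post IDs, one from Vectorize and one from the FTS5 BM25 query. A small sketch of Reciprocal Rank Fusion follows; k = 60 is the commonly used default and is an assumption here, as is the function name.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) per document,
// with rank starting at 1. Documents are returned in descending fused score.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical result lists from the semantic and BM25 searches.
const semanticHits = ["a", "b", "c"];
const bm25Hits = ["b", "d", "a"];
const fused = rrfFuse([semanticHits, bm25Hits]);
```

RRF works on ranks rather than raw scores, so cosine similarities and BM25 scores never have to be put on a common scale.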

AI-powered related post recommendations

  • Replace the current pure tag-matching in relatedPosts.ts
  • Weighted scoring: 40% tag overlap + 30% category + 20% recency + 10% same series
  • Single-tag posts get a fallback: fill with category matches
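The weights above translate directly into a scoring function. How each component is measured is not fixed by the plan, so this sketch assumes Jaccard overlap for tags, exact match for category and series, and a linear recency decay over one year; the `PostMeta` shape is also hypothetical.

```typescript
interface PostMeta {
  slug: string;
  tags: string[];
  category: string;
  series?: string;
  publishedAt: Date;
}

// Jaccard overlap of two tag lists (assumes no duplicate tags within a list).
function tagOverlap(a: string[], b: string[]): number {
  const setB = new Set(b);
  const shared = a.filter((tag) => setB.has(tag)).length;
  const union = new Set(a.concat(b)).size;
  return union === 0 ? 0 : shared / union;
}

// 40% tag overlap + 30% category + 20% recency + 10% same series.
function relatedScore(current: PostMeta, candidate: PostMeta, now: Date): number {
  const ageDays = (now.getTime() - candidate.publishedAt.getTime()) / 86400000;
  const recency = Math.max(0, 1 - ageDays / 365); // assumed: linear decay over a year
  const sameCategory = current.category === candidate.category ? 1 : 0;
  const sameSeries =
    current.series !== undefined && current.series === candidate.series ? 1 : 0;
  return (
    0.4 * tagOverlap(current.tags, candidate.tags) +
    0.3 * sameCategory +
    0.2 * recency +
    0.1 * sameSeries
  );
}
```

The category term also covers the single-tag fallback: a candidate with no shared tags still scores up to 0.3 for being in the same category.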

Automatic TL;DR and description generation

  • Applies the “Compress” strategy from Context Engineering
  • Solves the 7-post tldr gap and adds three-level summaries for long articles

Conversational blog assistant (RAG Chat Phase 1)

  • LangGraph pipeline: Planner → Research → Normalize → Writer → Critic → Related Posts
  • SSE streaming responses
  • Visitor IP rate limit: 5/day; no limit for the site owner
  • But worth re-evaluating whether LangGraph is actually needed — my langgraph-agent-orchestration.md also warns: “If all you need is simple retries, LangGraph is overkill”

RAG Design Enhancements

MMR diversity reranking

  • Described in detail in my mmr-diversity-reranking.md, λ = 0.7
  • Insert after the reranker, before the Writer
  • Prevents multiple retrieved posts from covering the exact same ground
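A compact sketch of MMR selection with λ = 0.7: each pick maximizes λ × relevance minus (1 − λ) × the highest similarity to anything already selected. The `Candidate` shape is assumed, with `relevance` taken to be the reranker score and `vector` the chunk embedding.

```typescript
interface Candidate {
  id: string;
  relevance: number; // e.g. reranker score in [0, 1]
  vector: number[];  // embedding used to measure redundancy
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy Maximal Marginal Relevance: pick k candidates, trading relevance
// against similarity to the items already selected.
function mmrSelect(candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
  const remaining = candidates.slice();
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      const maxSimilarity = selected.reduce(
        (max, chosen) => Math.max(max, cosine(candidate.vector, chosen.vector)),
        0,
      );
      const score = lambda * candidate.relevance - (1 - lambda) * maxSimilarity;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(remaining.splice(bestIndex, 1)[0]);
  }
  return selected;
}
```

With two near-duplicate chunks at the top of the list, the second pick goes to a less relevant but different chunk, which is the behavior described above.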

Adaptive RAG queryType routing

  • Based on the 6-type classification in my query-classification-adaptive-routing.md
  • Planner outputs complexity: 'simple' | 'medium' | 'complex'
  • Simple queries skip HyDE/Multi-query; general-knowledge queries skip retrieval entirely

CRAG progressive filter relaxation

  • Core strategy from my corrective-rag-crag.md: when zero results are returned, progressively relax secondary filters while keeping core filters
  • Order: relax filters and retry → still low-scoring → then fall back to web search
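The relaxation order can be sketched as a loop that drops one secondary filter at a time while always passing the core filters. The `search` callback, the filter names, and the relax order are placeholders; scoring and the web-search fallback are left to the caller.

```typescript
type Filters = Record<string, string>;

interface RelaxedSearch<T> {
  results: T[];
  dropped: string[]; // secondary filters removed, in the order they were relaxed
}

// Retry the search, removing one secondary filter per attempt until something
// comes back. Core filters are never relaxed.
function searchWithRelaxation<T>(
  search: (filters: Filters) => T[],
  coreFilters: Filters,
  secondaryFilters: Filters,
  relaxOrder: string[],
): RelaxedSearch<T> {
  const active: Filters = { ...secondaryFilters };
  const dropped: string[] = [];
  let results = search({ ...coreFilters, ...active });
  for (const key of relaxOrder) {
    if (results.length > 0) break;
    delete active[key];
    dropped.push(key);
    results = search({ ...coreFilters, ...active });
  }
  return { results, dropped };
}
```

If `results` is still empty or low-scoring after every secondary filter has been dropped, that is the point at which the caller would fall back to web search.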

Add answer-relevance check to Critic

  • Check not just grounding (claims have sources) but also answer relevance (does it actually answer the question?)
  • Corresponds to the Answer Relevancy metric in RAGAS

Add drift detection to Critic

  • Core insight from my phil-schmid-agent-harness.md
  • Detect whether the Research phase has drifted from the original query intent
  • Not just verifying grounding

Harness Infrastructure

Architectural Decision Records (ADRs)

  • Why BGE-large? Why chunk at 2000? Why cache at 0.95?
  • Applies the Agent-Readable Code principle: make tacit knowledge explicit

Feature flags for every RAG pipeline technique

  • Applies the Bitter Lesson: allow any “smart” component to be turned off at any time
  • HyDE, Multi-query, Reranker, Critic should all be individually toggleable
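One way to make every technique individually toggleable is a flat flag object that decides which stages run. The flag names, stage names, and defaults below are assumptions for illustration.

```typescript
interface RagFlags {
  hyde: boolean;
  multiQuery: boolean;
  reranker: boolean;
  critic: boolean;
}

// Assumed default: everything on; any override turns a single technique off.
const DEFAULT_FLAGS: RagFlags = {
  hyde: true,
  multiQuery: true,
  reranker: true,
  critic: true,
};

// Build the ordered list of stages for one request. Retrieval and writing
// always run; every "smart" stage is optional.
function planStages(overrides: Partial<RagFlags> = {}): string[] {
  const flags: RagFlags = { ...DEFAULT_FLAGS, ...overrides };
  const stages: string[] = [];
  if (flags.hyde) stages.push("hyde");
  if (flags.multiQuery) stages.push("multi-query");
  stages.push("retrieve");
  if (flags.reranker) stages.push("rerank");
  stages.push("write");
  if (flags.critic) stages.push("critic");
  return stages;
}
```

The same flag object can be logged with each response, which makes a later on/off comparison of any single technique straightforward.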

Shadow mode A/B comparison mechanism

  • Originally planned for Phase 3 — worth moving earlier
  • Compare RAGAS scores with a technique on vs. off
  • Otherwise there’s no way to tell which “optimizations” actually do anything

Context Checkpoint system

  • Applies the Context Durability concept
  • Dynamic compression threshold: threshold = model_context_window * 0.7 (reserving 30% for generation)
  • Rather than hardcoding 8000 tokens
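The dynamic threshold is a one-line calculation. The sketch below follows the formula above; the function names are illustrative and the context-window sizes in the comments are example values only.

```typescript
// threshold = model_context_window * 0.7, leaving 30% for generation.
function compressionThreshold(modelContextWindow: number, ratio = 0.7): number {
  return Math.round(modelContextWindow * ratio);
}

// True once the accumulated context should be compressed or checkpointed.
function shouldCompress(usedTokens: number, modelContextWindow: number): boolean {
  return usedTokens >= compressionThreshold(modelContextWindow);
}

// With an example 200,000-token window the threshold is 140,000 tokens,
// so a context of 8,000 tokens is nowhere near needing compression.
const exampleThreshold = compressionThreshold(200000);
```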

Site Technology

RSS Feed author information

  • Add <author> tags
  • Improves how feeds render in external readers

Series organization

  • 88% of posts have no series field
  • Formally organize the RAG series, Claude Code series, and AI Agent series

Multilingual translation pipeline (multi-agent)

  • Translator → Cultural Reviewer → Native Checker
  • For rapidly expanding English-language content

P3 — Long-Term (3+ Months)

Advanced AI Features

Site-owner episodic memory

  • The memory type emphasized in my ai-agents-context-cognition-action.md
  • Reference the user profile dialectic pattern from Hermes Agent
  • Remembers writing preferences and commonly used templates

Judge sampling at 30%

  • Recommendation from my rag-cost-optimization.md
  • Skip the Critic for simple queries; only run it for complex ones
  • Expected to save 20–30% in cost

BM25 short-circuit logic

  • When BM25 returns ≥ 5 results, skip the vector search
  • Especially effective for exact-noun queries (e.g., “What is LangGraph?”)
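The short-circuit itself is a single branch in front of the vector search. In this sketch both searches are injected as callbacks returning post IDs, which is an assumption about how the retrieval layer would be structured.

```typescript
interface RetrievalResult {
  hits: string[];
  usedVector: boolean;
}

// Run BM25 first; only pay for the vector search when BM25 finds too little.
function retrieveWithShortCircuit(
  query: string,
  bm25Search: (query: string) => string[],
  vectorSearch: (query: string) => string[],
  minBm25Hits = 5,
): RetrievalResult {
  const lexical = bm25Search(query);
  if (lexical.length >= minBm25Hits) {
    return { hits: lexical, usedVector: false };
  }
  // Merge both result sets, keeping first occurrence order and dropping duplicates.
  const merged = Array.from(new Set(lexical.concat(vectorSearch(query))));
  return { hits: merged, usedVector: true };
}
```

Returning `usedVector` makes it easy to measure how often the short-circuit fires before committing to the threshold of 5.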

RAGAS evaluation pipeline + Golden Dataset

  • 50–100 ground-truth test cases
  • Faithfulness, Answer Relevance, Context Precision, Context Recall
  • Continuously track how each change affects quality

GraphRAG (entity relationship graph)

  • Extract entities and relationships from posts
  • Suited for cross-post queries (e.g., “all tools related to Claude Code”)

Custom document upload

  • Three input sources: PDF / Markdown / URL
  • Limited value for visitors; primarily useful for the site owner

Complete Fix Checklist

| # | Item | Priority | Source Article / Design Doc |
|---|------|----------|-----------------------------|
| 1 | Fix 49 broken links | P0 | Content audit |
| 2 | Unify tag to ai-agent | P0 | Content audit |
| 3 | Cache threshold 0.92 → 0.95 | P0 | semantic-caching.md |
| 4 | Chunk size → token-based | P0 | chunking-strategies.md |
| 5 | Add reranker_min_keep: 3 | P0 | cross-encoder-reranking.md |
| 6 | Create root CLAUDE.md | P0 | harness-engineering-evolution.md |
| 7 | Create 404 page | P0 | Site audit |
| 8 | Add check-post-references to CI | P0 | Harness principles |
| 9 | Pre-commit hook | P0 | Harness principles |
| 10 | Fill type for 213 posts | P1 | Content audit |
| 11 | Fill tldr for 7 posts | P1 | Content audit |
| 12 | Make type required | P1 | Harness principles |
| 13 | English search page | P1 | Site audit |
| 14 | English homepage about section | P1 | Site audit |
| 15 | Font loading optimization | P1 | Core Web Vitals |
| 16 | Image sizing + lazy loading | P1 | Core Web Vitals |
| 17 | Basic accessibility | P1 | WCAG |
| 18 | Create progress.txt | P1 | anthropic-harness-design.md |
| 19 | Session-start hook | P1 | anthropic-harness-design.md |
| 20 | Post skill Evaluator | P1 | google-multi-agent-patterns.md |
| 21 | Deterministic Validation Node | P1 | internal-ai-coding-agents.md |
| 22 | Tool description quality standards | P1 | context-engineering-guide.md |
| 23 | Critic degradation strategy | P1 | internal-ai-coding-agents.md |
| 24 | Rewrite prompts as end-state descriptions | P1 | internal-ai-coding-agents.md |
| 25 | Embedding Pipeline | P2 | RAG design |
| 26 | AI related post recommendations | P2 | context-engineering-guide.md |
| 27 | Auto TL;DR generation | P2 | context-engineering-guide.md |
| 28 | RAG Chat Phase 1 | P2 | RAG design |
| 29 | MMR diversity reranking | P2 | mmr-diversity-reranking.md |
| 30 | Adaptive RAG routing | P2 | query-classification-adaptive-routing.md |
| 31 | CRAG filter relaxation | P2 | corrective-rag-crag.md |
| 32 | Critic answer-relevance check | P2 | rag-evaluation-frameworks.md |
| 33 | Critic drift detection | P2 | phil-schmid-agent-harness.md |
| 34 | Architectural Decision Records | P2 | harness-engineering-evolution.md |
| 35 | RAG feature flags | P2 | phil-schmid-agent-harness.md |
| 36 | Shadow A/B comparison | P2 | phil-schmid-agent-harness.md |
| 37 | Context Checkpoint system | P2 | phil-schmid-agent-harness.md |
| 38 | RSS author info | P2 | Site audit |
| 39 | Series organization | P2 | Content audit |
| 40 | Translation pipeline | P2 | google-multi-agent-patterns.md |
| 41 | Site-owner episodic memory | P3 | ai-agents-context-cognition-action.md |
| 42 | Judge sampling 30% | P3 | rag-cost-optimization.md |
| 43 | BM25 short-circuit | P3 | rag-cost-optimization.md |
| 44 | RAGAS evaluation pipeline | P3 | rag-evaluation-frameworks.md |
| 45 | GraphRAG | P3 | RAG design |
| 46 | Custom document upload | P3 | RAG design |

The Bigger Picture

The core logic of this roadmap isn’t a waterfall “Phase 1 / 2 / 3” — it’s fix the handle before building the house:

  • P0 is the handle: Users are already seeing broken states (404s, broken links, incorrect cache results). Not fixing these means the brand keeps taking damage.
  • P1 is the foundation: Harness infrastructure, enforced content schema, basic accessibility — without these, every future feature will have something to trip over.
  • P2 is the house: AI features, RAG chat, advanced retrieval — these belong on a solid foundation.
  • P3 is the decoration: Experimental, long-term bets — can be adjusted as models improve.

The most ironic finding: I wrote 30+ posts teaching people how to build RAG agents, yet what the blog itself is missing isn’t AI features — it’s the infrastructure I keep emphasizing in those very posts: CLAUDE.md, progress.txt, pre-commit hooks, Evaluator, Linter as Law.

“Intelligence without infrastructure is just a demo.” — Phil Schmid’s words, turned back on myself.

References