quidproquo Blog Improvement Roadmap: Content, Technical Debt, RAG Design, and Harness Infrastructure

TL;DR Using my own 30+ RAG/Agent posts to audit the blog itself, I identified a prioritized improvement list spanning content quality, site tech, RAG design fixes, harness infrastructure, and AI agent applications — no phases, just priorities.

#quidproquo #rag #ai-agent #harness-engineering #context-engineering #blog #product-design


I used my own articles on RAG, Agents, Context Engineering, and Harness Engineering to audit the blog itself — and found that I wrote “Intelligence without infrastructure is just a demo,” yet the platform was missing exactly that infrastructure. This post compiles a complete, priority-ordered action list across five dimensions: content quality, site technology, RAG design fixes, harness infrastructure, and AI agent applications.

Current Snapshot

Content: 227 posts (AI 121, Tech 98, Product 7, Education 1)
Broken links: 49 internal broken links (36% of all internal links)
Missing fields: 213 posts missing type (94%), 199 missing series (88%), 7 missing tldr
Tag inconsistency: ai-agent vs ai-agents across 35 posts
Drafts: 17 (mostly Claude Code deep-dive skeletons)
Infrastructure: Vectorize, D1, and Workers AI are all bound but unused — Embedding Pipeline, Chat API, and Agent nodes are entirely unimplemented
Harness: No root-level CLAUDE.md, no progress.txt, no pre-commit hook, no Post Evaluator

Priority Overview

The entire plan is organized by “fix cost vs. impact scope” into four levels:

P0 (Immediate): Low cost, high impact — users are currently seeing broken states
P1 (Short-term): Medium cost — addresses systemic risks or long-term technical debt
P2 (Mid-term): Higher cost but improves user experience or developer efficiency
P3 (Long-term): Experimental or depends on upstream prerequisites

P0 — Immediate (Low Cost, High Impact)

Content Fixes

Fix 49 broken internal links

Worst offender: ai/2026-04-01-agent-cli-guidelines.md has 10 broken links
Interim fix: replace with plain text labeled “coming soon” to avoid sending readers to 404s

Standardize tag naming

ai-agent (21 posts) + ai-agents (14 posts) → unify to ai-agent
A single sed batch can handle this
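If a blanket sed pass feels too blunt (it would also rewrite "ai-agents" inside titles and body text), a small script can restrict the rename to the frontmatter tags. The following is a minimal sketch, assuming LF line endings and a top-level `tags:` key; `TAG_ALIASES` and `normalizeTags` are hypothetical names, not existing code in the repo.

```typescript
// Deprecated tag -> canonical tag. Only the ai-agents entry comes from the audit.
const TAG_ALIASES = new Map<string, string>([["ai-agents", "ai-agent"]]);

// Rewrites tags inside the leading frontmatter block only, so titles and body
// text that mention "ai-agents" are left untouched.
// Assumes LF line endings and a top-level `tags:` key (inline or block list).
function normalizeTags(markdown: string): string {
  const match = /^---\n([\s\S]*?)\n---/.exec(markdown);
  if (!match) return markdown; // no frontmatter, nothing to do
  const frontmatter = match[1] ?? "";
  let inTags = false;
  const fixed = frontmatter
    .split("\n")
    .map((line) => {
      if (/^tags:/.test(line)) inTags = true; // entering the tags key
      else if (/^[A-Za-z_]/.test(line)) inTags = false; // another top-level key
      if (!inTags) return line;
      // Replace whole tags only, never a substring of a longer tag.
      return line.replace(/[\w-]+/g, (tag) => TAG_ALIASES.get(tag) ?? tag);
    })
    .join("\n");
  return "---\n" + fixed + markdown.slice(4 + frontmatter.length);
}
```

Running this over every post and writing back only the files whose output differs keeps the resulting diff limited to the 14 affected posts.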

RAG design parameter fixes (just change numbers)

semantic_cache_threshold: 0.92 → 0.95 (my semantic-caching.md article explicitly notes that 0.90–0.94 means “related but different”)
chunk_size: switch to token-based calculation or drop to 1500 chars (in Chinese, 2000 chars ≈ 800–1000 tokens, exceeding the recommended range)
Add reranker_min_keep: 3 (explicitly recommended as a safety net in my cross-encoder-reranking.md)
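To make the cache threshold and the reranker safety net concrete, here is a minimal sketch. The `ragConfig` shape, the `rerankerMinScore` value of 0.5, and the function names are assumptions for illustration; only the 0.95 threshold and the minimum of 3 kept results come from the list above.

```typescript
interface ScoredChunk {
  id: string;
  score: number;
}

// Hypothetical config object. rerankerMinScore is an assumed value.
const ragConfig = {
  semanticCacheThreshold: 0.95,
  rerankerMinScore: 0.5,
  rerankerMinKeep: 3,
};

// A cached answer is reused only when the query similarity clears the threshold.
function isCacheHit(similarity: number): boolean {
  return similarity >= ragConfig.semanticCacheThreshold;
}

// Keeps results above the reranker score cutoff, but never fewer than minKeep
// (as long as that many candidates exist at all).
function filterReranked(results: ScoredChunk[], minScore: number, minKeep: number): ScoredChunk[] {
  const sorted = [...results].sort((a, b) => b.score - a.score);
  const passing = sorted.filter((r) => r.score >= minScore);
  // Safety net: if the cutoff is too strict, fall back to the top minKeep results.
  return passing.length >= minKeep ? passing : sorted.slice(0, minKeep);
}
```

With this shape, a query at 0.93 similarity, which sat above the old 0.92 threshold, is no longer treated as a cache hit.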

Create a root-level CLAUDE.md

Core principle from my Harness articles: “Repository as Single Source of Truth”
Contents: tech stack, directory structure, dev workflow, naming conventions, decision rationale
Without this file, every new agent session has to rediscover the entire project from scratch

Site Technology

Create a custom 404 page

Currently no custom 404 — users hitting broken links see a raw error screen
Should include a search box and popular post recommendations

Add check-post-references.mjs to CI

The script already exists but never runs — which is how 49 broken links shipped to production
A direct violation of the “Linter is law, prompt is suggestion” principle I wrote about
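For readers without access to the repo, the kind of check such a script performs can be sketched as follows. This is not the actual check-post-references.mjs; the link format (Markdown links whose targets start with "/") and the names `findBrokenLinks` and `knownRoutes` are assumptions.

```typescript
interface BrokenLink {
  file: string;
  target: string;
}

// Finds internal Markdown links (targets starting with "/") that do not
// resolve to a known route. Image links (![alt](...)) and external URLs are skipped.
function findBrokenLinks(posts: Map<string, string>, knownRoutes: Set<string>): BrokenLink[] {
  const broken: BrokenLink[] = [];
  posts.forEach((body, file) => {
    // Group 2 captures the path without any #fragment.
    const linkPattern = /(^|[^!])\[[^\]]*\]\((\/[^)\s#]*)/g;
    let m: RegExpExecArray | null;
    while ((m = linkPattern.exec(body)) !== null) {
      const target = m[2] ?? "";
      if (!knownRoutes.has(target)) broken.push({ file, target });
    }
  });
  return broken;
}
```

In CI, the job would exit with a non-zero status whenever the returned list is non-empty, which is what turns the check from a suggestion into a gate.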

Pre-commit hook (lint + reference check)

Use husky or simple-git-hooks
Block broken links and lint errors from entering the repo

P1 — Short-Term (1–2 Weeks)

Content Improvements

Fill in type fields for 213 posts

Use a script to batch-infer type from the file path (e.g., deep-dive appearing in a directory or file name)
Or run all markdown through an LLM for automatic classification
Type fields must be populated before category-page filtering becomes meaningful
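The path-based inference could look like the sketch below. Only the deep-dive rule comes from the example above; the second rule and the type names are hypothetical placeholders that would need to match the blog's real taxonomy.

```typescript
// Ordered rules: the first matching pattern wins.
const TYPE_RULES: Array<{ pattern: RegExp; type: string }> = [
  { pattern: /deep-dive/, type: "deep-dive" }, // the roadmap's own example
  { pattern: /guidelines?/, type: "guide" }, // hypothetical rule, for illustration only
];

// Returns the inferred type, or null when no rule matches.
function inferType(path: string): string | null {
  const normalized = path.toLowerCase();
  for (const rule of TYPE_RULES) {
    if (rule.pattern.test(normalized)) return rule.type;
  }
  return null; // leave for LLM classification or manual review
}
```

Posts that come back as null are the ones worth sending to the LLM classifier, which keeps that slower and less predictable path to a minimum.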

Fill in tldr for 7 posts

Especially important for deep-dive articles
Can be batch-generated by Claude, then human-reviewed

Make type required in frontmatter

Update src/content.config.ts schema
Enforce via schema constraint, not via prompt — echoing the Harness article principle

Site Technology

English search page

/src/pages/search.astro currently hardcodes lang="zh-TW"
Create /src/pages/en/search.astro or add dynamic locale detection

Add an “about” section to the English homepage

The Chinese homepage has a full site-philosophy intro; the English version doesn’t
English readers have no idea what “Quid Pro Quo” means

Font loading optimization (LCP impact)

Add <link rel="preload" as="font">
Set font-display: swap
@fontsource/noto-sans-tc is installed but not loading correctly

Image sizing and lazy loading (CLS impact)

Enable astro:assets image optimization
Add loading="lazy" with explicit width/height to all markdown images

Basic accessibility

Add a “Skip to main content” link (Skip Navigation)
Add :focus-visible keyboard navigation styles
Verify whether --text-muted: #999 meets WCAG AA contrast requirements
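The contrast question can be settled with the WCAG relative-luminance formula. The sketch below assumes the muted text sits on a pure white background, which the audit does not state; against white, #999 works out to roughly 2.85:1, below the 4.5:1 that AA requires for normal-size text.

```typescript
// Converts an 8-bit sRGB channel to linear light. WCAG's published formula
// uses 0.03928 as the cutoff, which gives the same result for 8-bit values.
function toLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a #rgb or #rrggbb color.
function relativeLuminance(hex: string): number {
  let h = hex.replace("#", "");
  if (h.length === 3) h = h.split("").map((ch) => ch + ch).join(""); // 999 -> 999999
  const r = parseInt(h.slice(0, 2), 16);
  const g = parseInt(h.slice(2, 4), 16);
  const b = parseInt(h.slice(4, 6), 16);
  return 0.2126 * toLinear(r) + 0.7152 * toLinear(g) + 0.0722 * toLinear(b);
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(foreground: string, background: string): number {
  const a = relativeLuminance(foreground);
  const b = relativeLuminance(background);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA requires 4.5:1 for normal text and 3:1 for large text.
function meetsAA(foreground: string, background: string, largeText = false): boolean {
  return contrastRatio(foreground, background) >= (largeText ? 3 : 4.5);
}
```

If the real background is not white, rerun the check with the actual value before picking a darker gray.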

Harness Infrastructure

Establish a progress.txt mechanism

The design I praised most in anthropic-harness-design.md: “the lowest-cost episodic memory implementation — no vector database needed, just a text file”
Not using it myself is the most ironic gap in this whole audit

Session-start hook

Auto-runs pnpm lint and reads progress.txt
Mirrors the “startup ritual” design from Anthropic’s harness guidance

Add an Evaluator node to the Post skill

Core point from my articles: “An agent that acts as both athlete and judge will tend to grade itself leniently”
The Post skill currently only has a Generator — no independent Evaluator
Evaluator should check: frontmatter completeness, internal link validity, tag consistency, heading structure
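Most of those Evaluator checks are deterministic and need no model call. A minimal sketch under stated assumptions: the `REQUIRED_FIELDS` list, the `Post` shape, and the "no skipped heading levels" rule are hypothetical, and internal link validity is left to a separate reference check.

```typescript
interface Post {
  frontmatter: Record<string, unknown>;
  body: string;
}

const REQUIRED_FIELDS = ["title", "type", "tldr", "tags"]; // assumed list
const DEPRECATED_TAGS = new Set<string>(["ai-agents"]);

// Returns a list of human-readable issues; an empty list means the post passes.
function evaluatePost(post: Post): string[] {
  const issues: string[] = [];
  for (const field of REQUIRED_FIELDS) {
    const value = post.frontmatter[field];
    if (value === undefined || value === null || value === "") {
      issues.push(`missing frontmatter field: ${field}`);
    }
  }
  const rawTags = post.frontmatter["tags"];
  const tags: unknown[] = Array.isArray(rawTags) ? rawTags : [];
  for (const tag of tags) {
    if (typeof tag === "string" && DEPRECATED_TAGS.has(tag)) issues.push(`deprecated tag: ${tag}`);
  }
  // Heading structure: a heading may go at most one level deeper than the
  // previous one. The title is treated as the h1. Lines inside fenced code
  // blocks are not excluded in this sketch.
  let previousLevel = 1;
  for (const line of post.body.split("\n")) {
    const m = /^(#{1,6})\s/.exec(line);
    if (!m) continue;
    const level = (m[1] ?? "").length;
    if (level > previousLevel + 1) issues.push(`heading jumps from h${previousLevel} to h${level}`);
    previousLevel = level;
  }
  return issues;
}
```

Because the Generator never sees or influences this function, it cannot grade itself leniently on these points.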

RAG Design Fixes

Deterministic Validation Node (Stripe Blueprint pattern)

Drawn from my internal-ai-coding-agents.md on Stripe Minions’ core architecture
Insert deterministic validation between Writer and Critic: Markdown syntax, source URL existence, Mermaid syntax
Don’t rely on AI to always get it right — use deterministic checkpoints to catch errors

Tool description quality standards

search_blog_posts vs search_abstract_index vs search_docs — usage boundaries are unclear
Each tool needs: when to use it, when not to use it, expected return format
search_abstract_index should become an internal strategy of the Research node, not exposed as a standalone tool

Critic fallback / degradation strategy

Stripe’s design: “If the LLM can’t fix it in two tries, flag for human review — a third attempt just burns tokens”
If 2 retries still fall below the threshold → annotate the response with “⚠️ This answer may be incomplete; reading the source articles directly is recommended”
Do not call the LLM again
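The retry budget and the degradation step can be expressed as a small decision function. This is a sketch, not the planned Critic node: `generate` and `critique` are hypothetical stand-ins for the Writer and Critic calls, the threshold value is left to the caller, and real LLM calls would be asynchronous.

```typescript
type CriticAction = "accept" | "retry" | "degrade";

const MAX_RETRIES = 2;
const INCOMPLETE_WARNING =
  "⚠️ This answer may be incomplete; reading the source articles directly is recommended";

// Decides what to do after one Critic pass.
function nextCriticAction(score: number, retriesUsed: number, threshold: number): CriticAction {
  if (score >= threshold) return "accept";
  return retriesUsed < MAX_RETRIES ? "retry" : "degrade";
}

// Synchronous driver for illustration: one initial attempt plus at most two retries.
function answerWithFallback(
  generate: (attempt: number) => string,
  critique: (answer: string) => number,
  threshold: number,
): string {
  let retriesUsed = 0;
  let answer = generate(retriesUsed);
  for (;;) {
    const action = nextCriticAction(critique(answer), retriesUsed, threshold);
    if (action === "accept") return answer;
    if (action === "degrade") return answer + "\n\n" + INCOMPLETE_WARNING; // no further LLM call
    retriesUsed += 1;
    answer = generate(retriesUsed);
  }
}
```

In the worst case this makes three generation calls in total, and the degraded answer is returned with the warning instead of triggering a fourth.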

Rewrite prompts in “describe the end state” style

Spotify found that overly prescriptive step-by-step instructions cause agents to get stuck on complex tasks
Agent prompts should describe “what a successful answer looks like” rather than “follow these steps”

P2 — Mid-Term (1–2 Months)

AI Agent Feature Implementation

Embedding Pipeline + semantic search

Embed 197 posts into Vectorize (already bound, never used)
Implement Hybrid Search: Vectorize semantic + D1 FTS5 BM25 + RRF fusion
Add BGE-Reranker for reranking
This is the foundation for all subsequent AI features
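The RRF fusion step is small enough to sketch directly. It merges the ranked ID lists from the semantic and BM25 searches by summing 1 / (k + rank) per document. The constant k = 60 is a commonly used default and an assumption here, not a value from the design doc; the function name is hypothetical.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) to a
// document's score, where rank is 1-based. Documents missing from a ranking
// simply receive no contribution from it.
function reciprocalRankFusion(rankings: string[][], k = 60): FusedResult[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort((a, b) => b.score - a.score);
}
```

Because RRF uses only ranks, the Vectorize similarity scores and the FTS5 BM25 scores never need to be normalized onto a common scale.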

AI-powered related post recommendations

Replace the current pure tag-matching in relatedPosts.ts
Weighted scoring: 40% tag overlap + 30% category + 20% recency + 10% same series
Single-tag posts get a fallback: fill with category matches
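A sketch of that weighted score, with several assumptions labeled: tag overlap is measured as Jaccard similarity, recency decays exponentially with a 180-day half-life, and the `PostMeta` shape is hypothetical. Only the four weights come from the plan above.

```typescript
interface PostMeta {
  slug: string;
  tags: string[];
  category: string;
  series?: string;
  publishedAt: Date;
}

const WEIGHTS = { tags: 0.4, category: 0.3, recency: 0.2, series: 0.1 };
const RECENCY_HALF_LIFE_DAYS = 180; // assumption
const MS_PER_DAY = 86400000;

// Jaccard similarity: shared tags divided by all distinct tags.
function tagOverlap(a: string[], b: string[]): number {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  setA.forEach((tag) => {
    if (setB.has(tag)) shared += 1;
  });
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Score in [0, 1] for how related `candidate` is to `current`.
function relatedScore(current: PostMeta, candidate: PostMeta, now: Date): number {
  const ageDays = Math.max(0, (now.getTime() - candidate.publishedAt.getTime()) / MS_PER_DAY);
  const recency = Math.pow(0.5, ageDays / RECENCY_HALF_LIFE_DAYS);
  const sameCategory = current.category === candidate.category ? 1 : 0;
  const sameSeries = current.series !== undefined && current.series === candidate.series ? 1 : 0;
  return (
    WEIGHTS.tags * tagOverlap(current.tags, candidate.tags) +
    WEIGHTS.category * sameCategory +
    WEIGHTS.recency * recency +
    WEIGHTS.series * sameSeries
  );
}
```

Under this scoring, a candidate with no shared tags but the same category still earns 0.3, which is one way the category fallback for single-tag posts could emerge without a separate code path.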

Automatic TL;DR and description generation

Applies the “Compress” strategy from Context Engineering
Solves the 7-post tldr gap and adds three-level summaries for long articles

Conversational blog assistant (RAG Chat Phase 1)

LangGraph pipeline: Planner → Research → Normalize → Writer → Critic → Related Posts
SSE streaming responses
Visitor IP rate limit: 5/day; no limit for the site owner
But worth re-evaluating whether LangGraph is actually needed — my langgraph-agent-orchestration.md also warns: “If all you need is simple retries, LangGraph is overkill”

RAG Design Enhancements

MMR diversity reranking

Described in detail in my mmr-diversity-reranking.md, λ = 0.7
Insert after the reranker, before the Writer
Prevents multiple retrieved posts from covering the exact same ground
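MMR picks results greedily, trading relevance against similarity to what has already been chosen. The sketch below uses λ = 0.7 as stated; the `Candidate` shape and the use of cosine similarity between embeddings as the redundancy measure are assumptions.

```typescript
interface Candidate {
  id: string;
  relevance: number; // e.g. the reranker score
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: score = lambda * relevance - (1 - lambda) * max similarity to
// any already-selected item. With nothing selected yet, the penalty is 0.
function mmrSelect(candidates: Candidate[], k: number, lambda = 0.7): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < k && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      const maxSimilarity = selected.reduce(
        (max, chosen) => Math.max(max, cosine(candidate.embedding, chosen.embedding)),
        0,
      );
      const score = lambda * candidate.relevance - (1 - lambda) * maxSimilarity;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(...remaining.splice(bestIndex, 1));
  }
  return selected;
}
```

For example, given two near-duplicate chunks with relevance 0.9 and 0.85 and an unrelated chunk at 0.6, selecting two results keeps the 0.9 chunk and the unrelated one, since the duplicate's penalty outweighs its relevance lead.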

Adaptive RAG queryType routing

Based on the 6-type classification in my query-classification-adaptive-routing.md
Planner outputs complexity: 'simple' | 'medium' | 'complex'
Simple queries skip HyDE/Multi-query; general-knowledge queries skip retrieval entirely

CRAG progressive filter relaxation

Core strategy from my corrective-rag-crag.md: when zero results are returned, progressively relax secondary filters while keeping core filters
Order: relax filters and retry → still low-scoring → then fall back to web search
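The relaxation order can be sketched as a loop that drops one secondary filter per retry and only signals a web fallback once every relaxed attempt has come back empty or low-scoring. The `search` callback, the filter names, and the `minScore` cutoff are hypothetical; the actual retrieval call is not specified in the plan.

```typescript
type Filters = Record<string, string>;

interface Hit {
  id: string;
  score: number;
}

interface RetrievalOutcome {
  hits: Hit[];
  filters: Filters; // the filter set that produced the hits
  webFallback: boolean;
}

// secondary is ordered by importance: the last entry is relaxed first.
// Core filters are never relaxed.
function retrieveWithRelaxation(
  search: (filters: Filters) => Hit[],
  core: Filters,
  secondary: Array<[string, string]>,
  minScore: number,
): RetrievalOutcome {
  for (let keep = secondary.length; keep >= 0; keep--) {
    const filters: Filters = { ...core };
    for (const [key, value] of secondary.slice(0, keep)) filters[key] = value;
    const good = search(filters).filter((hit) => hit.score >= minScore);
    if (good.length > 0) return { hits: good, filters, webFallback: false };
  }
  // Every relaxed attempt was empty or low-scoring: hand over to web search.
  return { hits: [], filters: { ...core }, webFallback: true };
}
```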

Add answer-relevance check to Critic

Check not just grounding (claims have sources) but also answer relevance (does it actually answer the question?)
Corresponds to the Answer Relevancy metric in RAGAS

Add drift detection to Critic

Core insight from my phil-schmid-agent-harness.md
Detect whether the Research phase has drifted from the original query intent
Not just verifying grounding

Harness Infrastructure

Architectural Decision Records (ADRs)

Why BGE-large? Why chunk at 2000? Why cache at 0.95?
Applies the Agent-Readable Code principle: make tacit knowledge explicit

Feature flags for every RAG pipeline technique

Applies the Bitter Lesson: allow any “smart” component to be turned off at any time
HyDE, Multi-query, Reranker, Critic should all be individually toggleable

Shadow mode A/B comparison mechanism

Originally planned for Phase 3 — worth moving earlier
Compare RAGAS scores with a technique on vs. off
Otherwise there’s no way to tell which “optimizations” actually do anything

Context Checkpoint system

Applies the Context Durability concept
Dynamic compression threshold: threshold = model_context_window * 0.7 (reserving 30% for generation)
Rather than hardcoding 8000 tokens
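A minimal sketch of the dynamic threshold, assuming token counts are tracked as integers; the function names are hypothetical. The 70/30 split is written in integer percent so the result does not depend on floating-point rounding of 0.7.

```typescript
const RESERVED_FOR_GENERATION_PERCENT = 30;

// Token count at which compression should kick in for a given model.
function compressionThreshold(modelContextWindow: number): number {
  return Math.floor((modelContextWindow * (100 - RESERVED_FOR_GENERATION_PERCENT)) / 100);
}

function shouldCompress(usedTokens: number, modelContextWindow: number): boolean {
  return usedTokens >= compressionThreshold(modelContextWindow);
}
```

With a 200,000-token window the threshold is 140,000 tokens, whereas a hardcoded 8,000 would trigger compression long before the window is under any pressure.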

Site Technology

RSS Feed author information

Add <author> tags
Improves how feeds render in external readers

Series organization

88% of posts have no series field
Formally organize the RAG series, Claude Code series, and AI Agent series

Multilingual translation pipeline (multi-agent)

Translator → Cultural Reviewer → Native Checker
For rapidly expanding English-language content

P3 — Long-Term (3+ Months)

Advanced AI Features

Site-owner episodic memory

The memory type emphasized in my ai-agents-context-cognition-action.md
Reference the user profile dialectic pattern from Hermes Agent
Remembers writing preferences and commonly used templates

Judge sampling at 30%

Recommendation from my rag-cost-optimization.md
Skip the Critic for simple queries; only run it for complex ones
Expected to save 20–30% in cost

BM25 short-circuit logic

When BM25 returns ≥ 5 results, skip the vector search
Especially effective for exact-noun queries (e.g., “What is LangGraph?”)

RAGAS evaluation pipeline + Golden Dataset

50–100 ground-truth test cases
Faithfulness, Answer Relevance, Context Precision, Context Recall
Continuously track how each change affects quality

GraphRAG (entity relationship graph)

Extract entities and relationships from posts
Suited for cross-post queries (e.g., “all tools related to Claude Code”)

Custom document upload

Three input sources: PDF / Markdown / URL
Limited value for visitors; primarily useful for the site owner

Complete Fix Checklist

#	Item	Priority	Source Article / Design Doc
1	Fix 49 broken links	P0	Content audit
2	Unify tag to ai-agent	P0	Content audit
3	Cache threshold 0.92 → 0.95	P0	semantic-caching.md
4	Chunk size → token-based	P0	chunking-strategies.md
5	Add reranker_min_keep: 3	P0	cross-encoder-reranking.md
6	Create root CLAUDE.md	P0	harness-engineering-evolution.md
7	Create 404 page	P0	Site audit
8	Add check-post-references to CI	P0	Harness principles
9	Pre-commit hook	P0	Harness principles
10	Fill type for 213 posts	P1	Content audit
11	Fill tldr for 7 posts	P1	Content audit
12	Make type required	P1	Harness principles
13	English search page	P1	Site audit
14	English homepage about section	P1	Site audit
15	Font loading optimization	P1	Core Web Vitals
16	Image sizing + lazy loading	P1	Core Web Vitals
17	Basic accessibility	P1	WCAG
18	Create progress.txt	P1	anthropic-harness-design.md
19	Session-start hook	P1	anthropic-harness-design.md
20	Post skill Evaluator	P1	google-multi-agent-patterns.md
21	Deterministic Validation Node	P1	internal-ai-coding-agents.md
22	Tool description quality standards	P1	context-engineering-guide.md
23	Critic degradation strategy	P1	internal-ai-coding-agents.md
24	Rewrite prompts as end-state descriptions	P1	internal-ai-coding-agents.md
25	Embedding Pipeline	P2	RAG design
26	AI related post recommendations	P2	context-engineering-guide.md
27	Auto TL;DR generation	P2	context-engineering-guide.md
28	RAG Chat Phase 1	P2	RAG design
29	MMR diversity reranking	P2	mmr-diversity-reranking.md
30	Adaptive RAG routing	P2	query-classification-adaptive-routing.md
31	CRAG filter relaxation	P2	corrective-rag-crag.md
32	Critic answer-relevance check	P2	rag-evaluation-frameworks.md
33	Critic drift detection	P2	phil-schmid-agent-harness.md
34	Architectural Decision Records	P2	harness-engineering-evolution.md
35	RAG feature flags	P2	phil-schmid-agent-harness.md
36	Shadow A/B comparison	P2	phil-schmid-agent-harness.md
37	Context Checkpoint system	P2	phil-schmid-agent-harness.md
38	RSS author info	P2	Site audit
39	Series organization	P2	Content audit
40	Translation pipeline	P2	google-multi-agent-patterns.md
41	Site-owner episodic memory	P3	ai-agents-context-cognition-action.md
42	Judge sampling 30%	P3	rag-cost-optimization.md
43	BM25 short-circuit	P3	rag-cost-optimization.md
44	RAGAS evaluation pipeline	P3	rag-evaluation-frameworks.md
45	GraphRAG	P3	RAG design
46	Custom document upload	P3	RAG design

The Bigger Picture

The core logic of this roadmap isn’t a waterfall of Phase 1, 2, and 3. It is “fix the handle before building the house”:

P0 is the handle: Users are already seeing broken states (404s, broken links), and the RAG design carries parameters that my own articles say are wrong. Not fixing these means the brand keeps taking damage.
P1 is the foundation: Harness infrastructure, enforced content schema, basic accessibility — without these, every future feature will have something to trip over.
P2 is the house: AI features, RAG chat, advanced retrieval — these belong on a solid foundation.
P3 is the decoration: Experimental, long-term bets — can be adjusted as models improve.

The most ironic finding: I wrote 30+ posts teaching people how to build RAG agents, yet what the blog itself is missing isn’t AI features — it’s the infrastructure I keep emphasizing in those very posts: CLAUDE.md, progress.txt, pre-commit hooks, Evaluator, Linter as Law.

“Intelligence without infrastructure is just a demo.” — Phil Schmid’s words, turned back on myself.
