I used my own articles on RAG, Agents, Context Engineering, and Harness Engineering to audit the blog itself — and found that I wrote “Intelligence without infrastructure is just a demo,” yet the platform was missing exactly that infrastructure. This post compiles a complete, priority-ordered action list across five dimensions: content quality, site technology, RAG design fixes, harness infrastructure, and AI agent applications.
Current Snapshot
- Content: 227 posts (AI 121, Tech 98, Product 7, Education 1)
- Broken links: 49 internal broken links (36% of all internal links)
- Missing fields: 213 posts missing `type` (94%), 199 missing `series` (88%), 7 missing `tldr`
- Tag inconsistency: `ai-agent` vs `ai-agents` across 35 posts
- Drafts: 17 (mostly Claude Code deep-dive skeletons)
- Infrastructure: Vectorize, D1, and Workers AI are all bound but unused — Embedding Pipeline, Chat API, and Agent nodes are entirely unimplemented
- Harness: No root-level CLAUDE.md, no progress.txt, no pre-commit hook, no Post Evaluator
Priority Overview
The entire plan is organized by “fix cost vs. impact scope” into four levels:
- P0 (Immediate): Low cost, high impact — users are currently seeing broken states
- P1 (Short-term): Medium cost — addresses systemic risks or long-term technical debt
- P2 (Mid-term): Higher cost but improves user experience or developer efficiency
- P3 (Long-term): Experimental or depends on upstream prerequisites
P0 — Immediate (Low Cost, High Impact)
Content Fixes
Fix 49 broken internal links
- Worst offender: `ai/2026-04-01-agent-cli-guidelines.md` has 10 broken links
- Interim fix: replace with plain text labeled “coming soon” to avoid sending readers to 404s
Standardize tag naming
- `ai-agent` (21 posts) + `ai-agents` (14 posts) → unify to `ai-agent`
- A single `sed` batch can handle this
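The same unification can also be expressed as a small function, for example inside a frontmatter migration script. This is a minimal sketch: the `TAG_ALIASES` map and the `normalizeTags` name are illustrative assumptions, not code that exists in the repo.

```typescript
// Hypothetical alias map: each key is a tag variant, each value its canonical form.
const TAG_ALIASES: Record<string, string> = {
  "ai-agents": "ai-agent",
};

// Map every tag to its canonical form and drop duplicates, preserving order.
function normalizeTags(tags: string[]): string[] {
  const seen = new Set<string>();
  const result: string[] = [];
  for (const tag of tags) {
    const canonical = TAG_ALIASES[tag] ?? tag;
    if (!seen.has(canonical)) {
      seen.add(canonical);
      result.push(canonical);
    }
  }
  return result;
}
```

Keeping the aliases in one map means a later variant (a typo, a plural) is a one-line addition rather than another ad hoc `sed` pass.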
RAG design parameter fixes (just change numbers)
- `semantic_cache_threshold`: `0.92` → `0.95` (my `semantic-caching.md` article explicitly notes that 0.90–0.94 means “related but different”)
- `chunk_size`: switch to token-based calculation or drop to 1500 chars (in Chinese, 2000 chars ≈ 800–1000 tokens, exceeding the recommended range)
- Add `reranker_min_keep: 3` (explicitly recommended as a safety net in my `cross-encoder-reranking.md`)
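To make the `reranker_min_keep` safety net concrete: a score threshold alone can leave the Writer with zero or one passage when the reranker is pessimistic. The sketch below shows one way the floor could work. The `Scored` shape, the function name, and the threshold value in any call are assumptions for illustration.

```typescript
interface Scored {
  id: string;
  score: number;
}

// Keep candidates at or above the threshold, but never fewer than minKeep
// (or all candidates, when fewer than minKeep exist in total).
function filterReranked(candidates: Scored[], threshold: number, minKeep = 3): Scored[] {
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  const passing = sorted.filter((c) => c.score >= threshold);
  return passing.length >= minKeep ? passing : sorted.slice(0, minKeep);
}
```

When the threshold is the binding constraint, the function behaves like a plain filter; the floor only kicks in when too few candidates pass.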
Create a root-level CLAUDE.md
- Core principle from my Harness articles: “Repository as Single Source of Truth”
- Contents: tech stack, directory structure, dev workflow, naming conventions, decision rationale
- Without this file, every new agent session has to rediscover the entire project from scratch
Site Technology
Create a custom 404 page
- Currently no custom 404 — users hitting broken links see a raw error screen
- Should include a search box and popular post recommendations
Add check-post-references.mjs to CI
- The script already exists but never runs — which is how 49 broken links shipped to production
- A direct violation of the “Linter is law, prompt is suggestion” principle I wrote about
Pre-commit hook (lint + reference check)
- Use husky or simple-git-hooks
- Block broken links and lint errors from entering the repo
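For readers who want a picture of what a reference check does, here is a minimal sketch. It is not the contents of `check-post-references.mjs`. The regex, the `findBrokenLinks` name, and the idea of passing in a set of known paths are all assumptions.

```typescript
// Matches Markdown links of the form [text](/path). Only site-internal targets
// (starting with "/") are captured; any #anchor is ignored. Image links such as
// ![alt](/img.png) match as well.
const INTERNAL_LINK = /\[[^\]]*\]\((\/[^)\s#]*)[^)]*\)/g;

// Return every internal link target that is not in the set of known paths.
function findBrokenLinks(markdown: string, knownPaths: Set<string>): string[] {
  const broken: string[] = [];
  for (const match of markdown.matchAll(INTERNAL_LINK)) {
    const target = match[1];
    if (target !== undefined && !knownPaths.has(target)) {
      broken.push(target);
    }
  }
  return broken;
}
```

In a hook or CI job, a non-empty result would translate into a non-zero exit code, which is what turns the check from a suggestion into a gate.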
P1 — Short-Term (1–2 Weeks)
Content Improvements
Fill in type fields for 213 posts
- Use a script to batch-infer `type` from directory path (e.g., `deep-dive/` in the filename)
- Or run all markdown through an LLM for automatic classification
- Type fields must be populated before category-page filtering becomes meaningful
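A path-based inference pass might look like the sketch below. Only `deep-dive` comes from the example above. The `tutorial` rule and the function name are hypothetical placeholders, and the real set of `type` values is whatever the content schema defines.

```typescript
// Hypothetical mapping from path fragments to `type` values.
const TYPE_RULES: Array<[string, string]> = [
  ["deep-dive", "deep-dive"],
  ["tutorial", "tutorial"], // placeholder rule, not taken from the audit
];

// Return the first matching type, or null so the post can be queued for
// LLM classification or manual review instead of being guessed.
function inferType(filePath: string): string | null {
  const normalized = filePath.toLowerCase();
  for (const [fragment, type] of TYPE_RULES) {
    if (normalized.includes(fragment)) return type;
  }
  return null;
}
```

Returning `null` rather than a default keeps the two strategies composable: the script handles the obvious cases and hands the remainder to the LLM pass.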
Fill in tldr for 7 posts
- Especially important for deep-dive articles
- Can be batch-generated by Claude, then human-reviewed
Make type required in frontmatter
- Update the `src/content.config.ts` schema
- Enforce via schema constraint, not via prompt, echoing the Harness article principle
Site Technology
English search page
- `/src/pages/search.astro` currently hardcodes `lang="zh-TW"`
- Create `/src/pages/en/search.astro` or add dynamic locale detection
Add an “about” section to the English homepage
- The Chinese homepage has a full site-philosophy intro; the English version doesn’t
- English readers have no idea what “Quid Pro Quo” means
Font loading optimization (LCP impact)
- Add `<link rel="preload" as="font">`
- Set `font-display: swap`
- `@fontsource/noto-sans-tc` is installed but not loading correctly
Image sizing and lazy loading (CLS impact)
- Enable `astro:assets` image optimization
- Add `loading="lazy"` with explicit width/height to all markdown images
Basic accessibility
- Add a “Skip to main content” link (Skip Navigation)
- Add `:focus-visible` keyboard navigation styles
- Verify whether `--text-muted: #999` meets WCAG AA contrast requirements
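The contrast question can be settled with the WCAG 2.x relative-luminance formula. The sketch below assumes a white (`#fff`) background, which is an assumption; the site's actual background color isn't stated here. Under that assumption `#999` comes out at roughly 2.85:1, below the 4.5:1 AA minimum for normal-size text.

```typescript
// Relative luminance of an sRGB color given as "#rgb" or "#rrggbb" (WCAG 2.x formula).
function relativeLuminance(hex: string): number {
  let h = hex.replace("#", "");
  if (h.length === 3) {
    h = h.split("").map((c) => c + c).join("");
  }
  const channels = [0, 2, 4].map((i) => {
    const c = parseInt(h.slice(i, i + 2), 16) / 255;
    return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
  });
  const [r, g, b] = channels as [number, number, number];
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors: (lighter + 0.05) / (darker + 0.05).
function contrastRatio(fg: string, bg: string): number {
  const l1 = relativeLuminance(fg);
  const l2 = relativeLuminance(bg);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

// Assumed background: white.
const mutedOnWhite = contrastRatio("#999", "#fff");
console.log(mutedOnWhite.toFixed(2), mutedOnWhite >= 4.5 ? "passes AA" : "fails AA for normal text");
```

If the muted text sits on a darker surface in dark mode, the same function can be rerun with that background to check each theme separately.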
Harness Infrastructure
Establish a progress.txt mechanism
- The design I praised most in `anthropic-harness-design.md`: “the lowest-cost episodic memory implementation: no vector database needed, just a text file”
- Not using it myself is the most ironic gap in this whole audit
Session-start hook
- Auto-runs `pnpm lint` and reads `progress.txt`
- Mirrors the “startup ritual” design from Anthropic’s harness guidance
Add an Evaluator node to the Post skill
- Core point from my articles: “An agent that acts as both athlete and judge will tend to grade itself leniently”
- The Post skill currently only has a Generator — no independent Evaluator
- Evaluator should check: frontmatter completeness, internal link validity, tag consistency, heading structure
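Two of those checks (frontmatter completeness and heading structure) are fully deterministic and could be sketched as follows. The `Post` shape, the `REQUIRED_FIELDS` list (including `title`), and the "no skipped heading levels" rule are assumptions for illustration. Link and tag checks would plug in the same way.

```typescript
interface Post {
  frontmatter: Record<string, unknown>;
  body: string;
}

// Assumed required fields, based on the gaps listed in this audit.
const REQUIRED_FIELDS = ["title", "type", "tldr"];

// Return a list of human-readable issues; an empty list means the post passes.
function evaluatePost(post: Post): string[] {
  const issues: string[] = [];

  for (const field of REQUIRED_FIELDS) {
    const value = post.frontmatter[field];
    if (value === undefined || value === null || value === "") {
      issues.push(`missing frontmatter field: ${field}`);
    }
  }

  // Heading structure: a heading may not skip a level (e.g. ## followed by ####).
  // Assumes the page title is the H1. Lines inside code fences are not excluded here.
  let previousLevel = 1;
  for (const line of post.body.split("\n")) {
    const match = /^(#{1,6})\s/.exec(line);
    if (!match) continue;
    const level = (match[1] ?? "").length;
    if (level > previousLevel + 1) {
      issues.push(`heading level jumps from ${previousLevel} to ${level}: ${line.trim()}`);
    }
    previousLevel = level;
  }

  return issues;
}
```

Because the Evaluator returns issues instead of a pass/fail boolean, the Generator can be handed the list verbatim for a fix-up pass.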
RAG Design Fixes
Deterministic Validation Node (Stripe Blueprint pattern)
- Drawn from my `internal-ai-coding-agents.md` on Stripe Minions’ core architecture
- Insert deterministic validation between Writer and Critic: Markdown syntax, source URL existence, Mermaid syntax
- Don’t rely on AI to always get it right — use deterministic checkpoints to catch errors
Tool description quality standards
- `search_blog_posts` vs `search_abstract_index` vs `search_docs`: usage boundaries are unclear
- Each tool needs: when to use it, when not to use it, expected return format
- `search_abstract_index` should become an internal strategy of the Research node, not exposed as a standalone tool
Critic fallback / degradation strategy
- Stripe’s design: “If the LLM can’t fix it in two tries, flag for human review — a third attempt just burns tokens”
- If 2 retries still fall below the threshold → annotate the response with “⚠️ This answer may be incomplete; reading the source articles directly is recommended”
- Do not call the LLM again
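The degradation rule can be sketched as a bounded loop. To keep the example self-contained, `generate` and `critique` are synchronous stand-ins for what would be asynchronous LLM calls in practice. Their signatures, the `threshold` parameter, and the returned shape are assumptions.

```typescript
const WARNING =
  "⚠️ This answer may be incomplete; reading the source articles directly is recommended";

interface CriticResult {
  answer: string;
  degraded: boolean;
  attempts: number; // number of generations performed
}

// One initial generation plus at most maxRetries regenerations. If the last
// attempt still scores below the threshold, annotate it and stop calling the LLM.
function answerWithCritic(
  generate: (attempt: number) => string,
  critique: (answer: string) => number,
  threshold: number,
  maxRetries = 2,
): CriticResult {
  let answer = generate(0);
  for (let attempt = 0; ; attempt++) {
    if (critique(answer) >= threshold) {
      return { answer, degraded: false, attempts: attempt + 1 };
    }
    if (attempt === maxRetries) break;
    answer = generate(attempt + 1);
  }
  return { answer: `${answer}\n\n${WARNING}`, degraded: true, attempts: maxRetries + 1 };
}
```

The `degraded` flag gives the UI and the logs a way to tell annotated answers apart, which also makes the fallback rate measurable.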
Rewrite prompts in “describe the end state” style
- Spotify found that overly prescriptive step-by-step instructions cause agents to get stuck on complex tasks
- Agent prompts should describe “what a successful answer looks like” rather than “follow these steps”
P2 — Mid-Term (1–2 Months)
AI Agent Feature Implementation
Embedding Pipeline + semantic search
- Embed 197 posts into Vectorize (already bound, never used)
- Implement Hybrid Search: Vectorize semantic + D1 FTS5 BM25 + RRF fusion
- Add BGE-Reranker for reranking
- This is the foundation for all subsequent AI features
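The RRF fusion step is small enough to show in full. This sketch takes the ranked ID lists from the two retrievers and uses `k = 60`, the constant commonly used for Reciprocal Rank Fusion. The function name and input shape are assumptions, and the Vectorize and D1 queries that would produce the lists are not shown.

```typescript
// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank starts at 1 and documents absent from a list contribute nothing.
function rrfFuse(rankings: string[][], k = 60): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}
```

Because RRF only looks at ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.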
AI-powered related post recommendations
- Replace the current pure tag-matching in `relatedPosts.ts`
- Weighted scoring: 40% tag overlap + 30% category + 20% recency + 10% same series
- Single-tag posts get a fallback: fill with category matches
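One possible reading of that weighting is sketched below. The weights are the ones listed above. The `PostMeta` shape, the use of Jaccard similarity for tag overlap, and the linear one-year recency decay are assumptions chosen only to make each component a number between 0 and 1.

```typescript
interface PostMeta {
  slug: string;
  tags: string[];
  category: string;
  series?: string;
  publishedAt: Date;
}

// Weighted relatedness: 40% tag overlap, 30% category, 20% recency, 10% same series.
function relatedScore(current: PostMeta, candidate: PostMeta, now: Date = new Date()): number {
  // Tag overlap as Jaccard similarity (assumption).
  const a = new Set(current.tags);
  const b = new Set(candidate.tags);
  const shared = [...a].filter((tag) => b.has(tag)).length;
  const union = new Set([...a, ...b]).size;
  const tagOverlap = union === 0 ? 0 : shared / union;

  const sameCategory = current.category === candidate.category ? 1 : 0;

  // Recency decays linearly to 0 over 365 days (assumption).
  const ageDays = (now.getTime() - candidate.publishedAt.getTime()) / 86_400_000;
  const recency = Math.max(0, 1 - Math.max(0, ageDays) / 365);

  const sameSeries =
    current.series !== undefined && current.series === candidate.series ? 1 : 0;

  return 0.4 * tagOverlap + 0.3 * sameCategory + 0.2 * recency + 0.1 * sameSeries;
}
```

With this shape, the single-tag fallback falls out naturally: a candidate with no shared tags still earns the category and recency terms, so same-category posts fill the list.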
Automatic TL;DR and description generation
- Applies the “Compress” strategy from Context Engineering
- Solves the 7-post tldr gap and adds three-level summaries for long articles
Conversational blog assistant (RAG Chat Phase 1)
- LangGraph pipeline: Planner → Research → Normalize → Writer → Critic → Related Posts
- SSE streaming responses
- Visitor IP rate limit: 5/day; no limit for the site owner
- But worth re-evaluating whether LangGraph is actually needed: my `langgraph-agent-orchestration.md` also warns, “If all you need is simple retries, LangGraph is overkill”
RAG Design Enhancements
MMR diversity reranking
- Described in detail in my `mmr-diversity-reranking.md`, λ = 0.7
- Insert after the reranker, before the Writer
- Prevents multiple retrieved posts from covering the exact same ground
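As a reference point, greedy MMR with λ = 0.7 can be sketched in a few lines. The `Candidate` shape (a reranker relevance score plus an embedding vector) and the use of cosine similarity between candidates are assumptions about how the pipeline would feed it.

```typescript
interface Candidate {
  id: string;
  relevance: number; // e.g. the reranker score
  vector: number[];  // embedding used to measure redundancy
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy MMR: at each step pick the candidate maximizing
// lambda * relevance - (1 - lambda) * (max similarity to anything already selected).
function mmr(candidates: Candidate[], topK: number, lambda = 0.7): Candidate[] {
  const remaining = [...candidates];
  const selected: Candidate[] = [];
  while (selected.length < topK && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      const maxSimilarity = selected.reduce(
        (max, chosen) => Math.max(max, cosine(candidate.vector, chosen.vector)),
        0,
      );
      const score = lambda * candidate.relevance - (1 - lambda) * maxSimilarity;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    selected.push(...remaining.splice(bestIndex, 1));
  }
  return selected;
}
```

With two near-duplicate posts scoring 0.9 and 0.85 and a distinct post scoring 0.6, the second pick is the distinct post: its penalty-free 0.42 beats the duplicate's 0.595 − 0.3 = 0.295.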
Adaptive RAG queryType routing
- Based on the 6-type classification in my `query-classification-adaptive-routing.md`
- Planner outputs `complexity: 'simple' | 'medium' | 'complex'`
- Simple queries skip HyDE/Multi-query; general-knowledge queries skip retrieval entirely
CRAG progressive filter relaxation
- Core strategy from my `corrective-rag-crag.md`: when zero results are returned, progressively relax secondary filters while keeping core filters
- Order: relax filters and retry → still low-scoring → then fall back to web search
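The relaxation order can be sketched as a loop that drops one secondary filter per retry. This simplified version triggers on zero results only, so a score-based "still low-scoring" check would need an extra condition. The `search` callback, the filter shape, and `relaxOrder` are assumptions.

```typescript
type Filters = Record<string, string>;

interface RetrievalResult {
  docs: string[];
  relaxed: string[];       // secondary filters that had to be dropped
  needsWebSearch: boolean; // true when even core-only filters return nothing
}

// Core filters are never dropped. Secondary filters are removed one at a time,
// in relaxOrder (least important first), until the search returns something.
function retrieveWithRelaxation(
  search: (filters: Filters) => string[],
  coreFilters: Filters,
  secondaryFilters: Filters,
  relaxOrder: string[],
): RetrievalResult {
  const active: Filters = { ...secondaryFilters };
  const relaxed: string[] = [];
  const pending = [...relaxOrder];
  while (true) {
    const docs = search({ ...coreFilters, ...active });
    if (docs.length > 0) return { docs, relaxed, needsWebSearch: false };
    const key = pending.shift();
    if (key === undefined) return { docs: [], relaxed, needsWebSearch: true };
    delete active[key];
    relaxed.push(key);
  }
}
```

Returning the list of relaxed filters lets the Writer tell the reader that the answer comes from a broader match than the one they asked for.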
Add answer-relevance check to Critic
- Check not just grounding (claims have sources) but also answer relevance (does it actually answer the question?)
- Corresponds to the Answer Relevancy metric in RAGAS
Add drift detection to Critic
- Core insight from my `phil-schmid-agent-harness.md`
- Detect whether the Research phase has drifted from the original query intent
- Not just verifying grounding
Harness Infrastructure
Architectural Decision Records (ADRs)
- Why BGE-large? Why chunk at 2000? Why cache at 0.95?
- Applies the Agent-Readable Code principle: make tacit knowledge explicit
Feature flags for every RAG pipeline technique
- Applies the Bitter Lesson: allow any “smart” component to be turned off at any time
- HyDE, Multi-query, Reranker, Critic should all be individually toggleable
Shadow mode A/B comparison mechanism
- Originally planned for Phase 3 — worth moving earlier
- Compare RAGAS scores with a technique on vs. off
- Otherwise there’s no way to tell which “optimizations” actually do anything
Context Checkpoint system
- Applies the Context Durability concept
- Dynamic compression threshold: `threshold = model_context_window * 0.7` (reserving 30% for generation)
- Rather than hardcoding 8000 tokens
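In code, the dynamic threshold is a one-liner. The sketch below uses integer percentages to avoid floating-point rounding at the boundary. The function names and the 128,000-token window in the usage note are assumptions.

```typescript
// Tokens of context that may be used before compression kicks in:
// 70% of the model's window by default, leaving 30% for generation.
function compressionThreshold(modelContextWindow: number, usablePercent = 70): number {
  return Math.floor((modelContextWindow * usablePercent) / 100);
}

// True once the accumulated context has reached the threshold.
function shouldCompress(currentTokens: number, modelContextWindow: number): boolean {
  return currentTokens >= compressionThreshold(modelContextWindow);
}
```

For a hypothetical 128,000-token window the threshold is 89,600 tokens, more than ten times the hardcoded 8,000. That gap is the point: the fixed value wastes most of a large window.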
Site Technology
RSS Feed author information
- Add `<author>` tags
- Improves how feeds render in external readers
Series organization
- 88% of posts have no `series` field
- Formally organize the RAG series, Claude Code series, and AI Agent series
Multilingual translation pipeline (multi-agent)
- Translator → Cultural Reviewer → Native Checker
- For rapidly expanding English-language content
P3 — Long-Term (3+ Months)
Advanced AI Features
Site-owner episodic memory
- The memory type emphasized in my `ai-agents-context-cognition-action.md`
- Reference the `user profile dialectic` pattern from Hermes Agent
- Remembers writing preferences and commonly used templates
Judge sampling at 30%
- Recommendation from my `rag-cost-optimization.md`
- Skip the Critic for simple queries; only run it for complex ones
- Expected to save 20–30% in cost
BM25 short-circuit logic
- When BM25 returns ≥ 5 results, skip the vector search
- Especially effective for exact-noun queries (e.g., “What is LangGraph?”)
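The short-circuit can be sketched as a guard in front of the vector call. The two search callbacks are stand-ins for the real BM25 and Vectorize queries, and the simple concatenate-and-deduplicate merge is an assumption; a real pipeline would more likely fuse the lists with RRF.

```typescript
// Run BM25 first; only pay for the vector search when lexical recall is thin.
function hybridRetrieve(
  query: string,
  bm25Search: (q: string) => string[],
  vectorSearch: (q: string) => string[],
  minBm25Hits = 5,
): { ids: string[]; usedVector: boolean } {
  const lexical = bm25Search(query);
  if (lexical.length >= minBm25Hits) {
    return { ids: lexical, usedVector: false };
  }
  const semantic = vectorSearch(query);
  // Merge with lexical hits first, dropping duplicates.
  const ids = [...new Set([...lexical, ...semantic])];
  return { ids, usedVector: true };
}
```

Tracking `usedVector` makes it possible to measure how often the short-circuit fires before trusting the expected savings.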
RAGAS evaluation pipeline + Golden Dataset
- 50–100 ground-truth test cases
- Faithfulness, Answer Relevance, Context Precision, Context Recall
- Continuously track how each change affects quality
GraphRAG (entity relationship graph)
- Extract entities and relationships from posts
- Suited for cross-post queries (e.g., “all tools related to Claude Code”)
Custom document upload
- Three input sources: PDF / Markdown / URL
- Limited value for visitors; primarily useful for the site owner
Complete Fix Checklist
| # | Item | Priority | Source Article / Design Doc |
|---|---|---|---|
| 1 | Fix 49 broken links | P0 | Content audit |
| 2 | Unify tag to ai-agent | P0 | Content audit |
| 3 | Cache threshold 0.92 → 0.95 | P0 | semantic-caching.md |
| 4 | Chunk size → token-based | P0 | chunking-strategies.md |
| 5 | Add reranker_min_keep: 3 | P0 | cross-encoder-reranking.md |
| 6 | Create root CLAUDE.md | P0 | harness-engineering-evolution.md |
| 7 | Create 404 page | P0 | Site audit |
| 8 | Add check-post-references to CI | P0 | Harness principles |
| 9 | Pre-commit hook | P0 | Harness principles |
| 10 | Fill type for 213 posts | P1 | Content audit |
| 11 | Fill tldr for 7 posts | P1 | Content audit |
| 12 | Make type required | P1 | Harness principles |
| 13 | English search page | P1 | Site audit |
| 14 | English homepage about section | P1 | Site audit |
| 15 | Font loading optimization | P1 | Core Web Vitals |
| 16 | Image sizing + lazy loading | P1 | Core Web Vitals |
| 17 | Basic accessibility | P1 | WCAG |
| 18 | Create progress.txt | P1 | anthropic-harness-design.md |
| 19 | Session-start hook | P1 | anthropic-harness-design.md |
| 20 | Post skill Evaluator | P1 | google-multi-agent-patterns.md |
| 21 | Deterministic Validation Node | P1 | internal-ai-coding-agents.md |
| 22 | Tool description quality standards | P1 | context-engineering-guide.md |
| 23 | Critic degradation strategy | P1 | internal-ai-coding-agents.md |
| 24 | Rewrite prompts as end-state descriptions | P1 | internal-ai-coding-agents.md |
| 25 | Embedding Pipeline | P2 | RAG design |
| 26 | AI related post recommendations | P2 | context-engineering-guide.md |
| 27 | Auto TL;DR generation | P2 | context-engineering-guide.md |
| 28 | RAG Chat Phase 1 | P2 | RAG design |
| 29 | MMR diversity reranking | P2 | mmr-diversity-reranking.md |
| 30 | Adaptive RAG routing | P2 | query-classification-adaptive-routing.md |
| 31 | CRAG filter relaxation | P2 | corrective-rag-crag.md |
| 32 | Critic answer-relevance check | P2 | rag-evaluation-frameworks.md |
| 33 | Critic drift detection | P2 | phil-schmid-agent-harness.md |
| 34 | Architectural Decision Records | P2 | harness-engineering-evolution.md |
| 35 | RAG feature flags | P2 | phil-schmid-agent-harness.md |
| 36 | Shadow A/B comparison | P2 | phil-schmid-agent-harness.md |
| 37 | Context Checkpoint system | P2 | phil-schmid-agent-harness.md |
| 38 | RSS author info | P2 | Site audit |
| 39 | Series organization | P2 | Content audit |
| 40 | Translation pipeline | P2 | google-multi-agent-patterns.md |
| 41 | Site-owner episodic memory | P3 | ai-agents-context-cognition-action.md |
| 42 | Judge sampling 30% | P3 | rag-cost-optimization.md |
| 43 | BM25 short-circuit | P3 | rag-cost-optimization.md |
| 44 | RAGAS evaluation pipeline | P3 | rag-evaluation-frameworks.md |
| 45 | GraphRAG | P3 | RAG design |
| 46 | Custom document upload | P3 | RAG design |
The Bigger Picture
The core logic of this roadmap isn’t a waterfall “Phase 1 / 2 / 3”; it’s “fix the handle before building the house”:
- P0 is the handle: Users are already seeing broken states (404s, broken links, incorrect cache results). Not fixing these means the brand keeps taking damage.
- P1 is the foundation: Harness infrastructure, enforced content schema, basic accessibility — without these, every future feature will have something to trip over.
- P2 is the house: AI features, RAG chat, advanced retrieval — these belong on a solid foundation.
- P3 is the decoration: Experimental, long-term bets — can be adjusted as models improve.
The most ironic finding: I wrote 30+ posts teaching people how to build RAG agents, yet what the blog itself is missing isn’t AI features — it’s the infrastructure I keep emphasizing in those very posts: CLAUDE.md, progress.txt, pre-commit hooks, Evaluator, Linter as Law.
“Intelligence without infrastructure is just a demo.” — Phil Schmid’s words, turned back on myself.
References
- Multi-Agent RAG Patterns
- Context Engineering Guide
- LangGraph Agent Orchestration
- AI Agents: Context, Cognition, Action
- Anthropic Harness Design
- Phil Schmid: Agent Harness
- Harness Engineering Evolution
- Google Multi-Agent Patterns
- Internal AI Coding Agents (Stripe / Spotify / Coinbase)
- Chunking Strategies
- BGE-M3 Embedding Model Selection
- Hybrid Search BM25 Vector RRF
- Cross-Encoder Reranking
- MMR Diversity Reranking
- Semantic Caching
- Corrective RAG (CRAG)
- Query Classification & Adaptive Routing
- RAG Cost Optimization
- RAG Evaluation Frameworks