Transcript Hygiene (Provider Fixups)
This document describes provider-specific fixes applied to transcripts before a run (building model context). These are in-memory adjustments used to satisfy strict provider requirements. These hygiene steps do not rewrite the stored JSONL transcript on disk; however, a separate session-file repair pass may rewrite malformed JSONL files by dropping invalid lines before the session is loaded. When a repair occurs, the original file is backed up alongside the session file.
Scope includes:
- Tool call id sanitization
- Tool call input validation
- Tool result pairing repair
- Turn validation / ordering
- Thought signature cleanup
- Image payload sanitization
Where this runs
All transcript hygiene is centralized in the embedded runner:
- Policy selection: src/agents/transcript-policy.ts
- Sanitization/repair application: sanitizeSessionHistory in src/agents/pi-embedded-runner/google.ts
The policy uses provider, modelApi, and modelId to decide what to apply.
Separate from transcript hygiene, session files are repaired (if needed) before load:
- repairSessionFileIfNeeded in src/agents/session-file-repair.ts
- Called from run/attempt.ts and compact.ts (embedded runner)
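The repair pass described above (drop lines that are not valid JSON, keep the rest) can be illustrated with a minimal sketch. The function name repairJsonl and its result shape are hypothetical and are not taken from session-file-repair.ts; file I/O and the backup step are left out so the logic stays self-contained.

```typescript
// Hypothetical helper for illustration; not the real repairSessionFileIfNeeded.
interface JsonlRepairResult {
  repaired: string; // content with unparseable lines removed
  droppedLines: number; // number of lines that failed JSON.parse
}

function repairJsonl(raw: string): JsonlRepairResult {
  const kept: string[] = [];
  let droppedLines = 0;
  for (const line of raw.split(/\r?\n/)) {
    if (line.trim() === "") continue; // blank lines carry no record
    try {
      JSON.parse(line); // any valid JSON value counts as a valid line in this sketch
      kept.push(line);
    } catch {
      droppedLines += 1; // malformed record, for example a truncated write
    }
  }
  const repaired = kept.length > 0 ? kept.join("\n") + "\n" : "";
  return { repaired, droppedLines };
}
```

A caller would presumably back up and rewrite the file only when droppedLines is greater than zero, so that well-formed session files are never touched.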
Global rule: image sanitization
Image payloads are always sanitized to prevent provider-side rejection due to size limits (downscale/recompress oversized base64 images).
Implementation:
- sanitizeSessionMessagesImages in src/agents/pi-embedded-helpers/images.ts
- sanitizeContentBlocksImages in src/agents/tool-images.ts
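As a rough illustration of the size check that would precede any downscale or recompress step, the sketch below computes the decoded size of a base64 payload without decoding it. MAX_IMAGE_BYTES, base64DecodedBytes, and needsDownscale are hypothetical names, and the 5 MiB limit is an assumed value; this document does not state the actual threshold or how the real helpers measure size.

```typescript
// Assumed limit for illustration only; the real threshold is not documented here.
const MAX_IMAGE_BYTES = 5 * 1024 * 1024;

// Decoded byte length of a base64 string, computed from its length and padding.
function base64DecodedBytes(b64: string): number {
  const clean = b64.replace(/\s+/g, ""); // ignore line breaks inside the payload
  if (clean.length === 0) return 0;
  const padding = clean.endsWith("==") ? 2 : clean.endsWith("=") ? 1 : 0;
  return Math.floor((clean.length * 3) / 4) - padding;
}

// True when the decoded image would exceed the given byte limit.
function needsDownscale(b64: string, maxBytes: number = MAX_IMAGE_BYTES): boolean {
  return base64DecodedBytes(b64) > maxBytes;
}
```

Checking the encoded length first avoids decoding every image in the history just to find the few that are oversized.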
Global rule: malformed tool calls
Assistant tool-call blocks that are missing both input and arguments are dropped
before model context is built. This prevents provider rejections from partially
persisted tool calls (for example, after a rate limit failure).
Implementation:
- sanitizeToolCallInputs in src/agents/session-transcript-repair.ts
- Applied in sanitizeSessionHistory in src/agents/pi-embedded-runner/google.ts
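The rule above can be sketched as a filter over an assistant message's content blocks. The block and message shapes here are hypothetical stand-ins, and dropMalformedToolCalls is an illustrative name rather than the real sanitizeToolCallInputs. Treating null the same as a missing field is also an assumption.

```typescript
// Hypothetical message shapes; the real transcript types may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "toolCall"; id: string; name: string; input?: unknown; arguments?: unknown };

interface AssistantMessage {
  role: "assistant";
  content: ContentBlock[];
}

// Returns a copy of the message without tool calls that lack both input and arguments.
function dropMalformedToolCalls(message: AssistantMessage): AssistantMessage {
  const content = message.content.filter((block) => {
    if (block.type !== "toolCall") return true; // non-tool blocks pass through
    // Keep the call if either field is present; drop only when both are missing.
    return block.input != null || block.arguments != null;
  });
  return { ...message, content };
}
```

The function returns a new message and leaves its argument untouched, which matches the in-memory, non-persistent nature of the hygiene steps described in this document.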
Provider matrix (current behavior)
OpenAI / OpenAI Codex
- Image sanitization only.
- On model switch into OpenAI Responses/Codex, drop orphaned reasoning signatures (standalone reasoning items without a following content block).
- No tool call id sanitization.
- No tool result pairing repair.
- No turn validation or reordering.
- No synthetic tool results.
- No thought signature stripping.
- Tool call id sanitization: strict alphanumeric.
- Tool result pairing repair and synthetic tool results.
- Turn validation (Gemini-style turn alternation).
- Google turn ordering fixup (prepend a tiny user bootstrap if history starts with assistant).
- Antigravity Claude: normalize thinking signatures; drop unsigned thinking blocks.
- Tool result pairing repair and synthetic tool results.
- Turn validation (merge consecutive user turns to satisfy strict alternation).
- Tool call id sanitization: strict9 (alphanumeric length 9).
- Thought signature cleanup: strip non-base64 thought_signature values (keep base64).
- Image sanitization only.
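Several of the rules in the matrix above are small, mechanical transformations. The sketch below illustrates four of them: tool call id sanitization (strict and strict9), merging consecutive user turns, prepending a user bootstrap turn, and the base64 check for thought signatures. All names and shapes are hypothetical. The "0" padding for short strict9 ids, the bootstrap text, and the choice of the standard padded base64 alphabet are assumptions that this document does not specify.

```typescript
// Hypothetical names and shapes; only the rules themselves come from the matrix.
type IdMode = "strict" | "strict9";

// strict: keep only [A-Za-z0-9]. strict9: same, then force the length to 9.
function sanitizeToolCallId(id: string, mode: IdMode): string {
  const alnum = id.replace(/[^A-Za-z0-9]/g, "");
  if (mode === "strict") return alnum;
  // Assumption: short ids are right-padded with "0"; the real scheme is not documented.
  return alnum.slice(0, 9).padEnd(9, "0");
}

interface Turn {
  role: "user" | "assistant";
  text: string;
}

// Merge runs of consecutive user turns so that roles alternate strictly.
function mergeConsecutiveUserTurns(turns: Turn[]): Turn[] {
  const merged: Turn[] = [];
  for (const turn of turns) {
    const prev = merged[merged.length - 1];
    if (prev && prev.role === "user" && turn.role === "user") {
      merged[merged.length - 1] = { role: "user", text: prev.text + "\n\n" + turn.text };
    } else {
      merged.push({ ...turn });
    }
  }
  return merged;
}

// Prepend a small user turn when history starts with an assistant turn.
// The default bootstrap text is a placeholder, not the real value.
function ensureUserFirst(turns: Turn[], bootstrapText: string = "(session start)"): Turn[] {
  const first = turns[0];
  if (first && first.role === "assistant") {
    return [{ role: "user", text: bootstrapText }, ...turns];
  }
  return turns;
}

// True when the value looks like standard, padded base64 (assumed alphabet).
function isBase64(value: string): boolean {
  return value.length > 0 && value.length % 4 === 0 && /^[A-Za-z0-9+/]+={0,2}$/.test(value);
}

// Keep base64 signatures, strip everything else.
function cleanThoughtSignature(signature: string | undefined): string | undefined {
  return signature !== undefined && isBase64(signature) ? signature : undefined;
}
```

Whatever id scheme is used, the same mapping has to be applied to a tool call and to its matching tool result, otherwise sanitization would itself break the pairing. Truncating to nine characters can also make two distinct ids collide, which a real implementation would need to guard against.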
Historical behavior (pre-2026.1.22)
Before the 2026.1.22 release, OpenClaw applied multiple layers of transcript hygiene:
- A transcript-sanitize extension ran on every context build and could:
  - Repair tool use/result pairing.
  - Sanitize tool call ids (including a non-strict mode that preserved _/-).
- The runner also performed provider-specific sanitization, which duplicated work.
- Additional mutations occurred outside the provider policy, including:
  - Stripping <final> tags from assistant text before persistence.
  - Dropping empty assistant error turns.
  - Trimming assistant content after tool calls.
- Stripping
openai-responses
call_id|fc_id pairing).
The 2026.1.22 cleanup removed the extension, centralized logic in the runner, and made OpenAI no-touch beyond image sanitization.