Taming LLM Responses with Instaparse
It started with a simple goal: integrate an LLM. Little did I know this would lead us down a rabbit hole of parsing challenges that would fundamentally change how we handle LLM outputs.
The Promise and the Pain
Like many developers, we started with a straightforward vision: use LLMs to generate UI operations for our no-code platform. The plan seemed simple: have the model return JSON structures describing UI components, their properties, and how they should be manipulated.
Our initial schema looked promising:
[{
  "type": "append:node",
  "context": {
    "engine": "string",
    "workspace": "string"
  },
  "data": {
    "source": [{
      "id": "string",
      "componentName": "string",
      "props": {
        "data-content-editable": "content",
        "class": "string",
        "content": "string"
      }
    }],
    "target": {
      "id": "string",
      "componentName": "string",
      "props": {}
    }
  }
}]
We wrote comprehensive prompts, carefully explained our component hierarchy, and felt confident about our approach. Then reality struck.
The Pain Points
Our testing phase revealed several critical issues:
- JSON formatting significantly increased response latency
- Not all models supported JSON mode consistently
- Even with JSON mode enabled, LLMs sometimes responded with incomplete JSON
- The performance impact was unacceptable for real-time applications
The Regex Temptation
I'll admit it - my first instinct was to reach for regex. After all, how hard could it be to match some curly braces and square brackets?
;; I actually wrote this. I'm not proud of it.
(re-find #"\{[^}]+\}" llm-response)
I can feel you laughing right now. If you've ever tried to parse JSON with regex, you know exactly where this is going - a path of madness, unmaintainable code, and edge cases that haunted my dreams.
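To make the failure concrete: `[^}]+` stops at the first `}` it meets, so any nested object is cut short, and because `+` needs at least one character, an empty `{}` is not found at all. The reply string below is a made-up example, not real model output.

```clojure
(ns regex-pitfall-demo)

;; The pattern from above: `{`, one or more non-`}` characters, then `}`.
(def naive-object-re #"\{[^}]+\}")

;; A made-up LLM reply containing one nested object.
(def llm-response
  "Here is the op: {\"type\": \"append:node\", \"context\": {\"engine\": \"e\"}} Done.")

;; [^}]+ stops at the inner object's `}`, so the outer object loses its closing brace.
(re-find naive-object-re llm-response)
;; => "{\"type\": \"append:node\", \"context\": {\"engine\": \"e\"}"

;; `+` requires at least one character between the braces, so `{}` never matches.
(re-find naive-object-re "{}")
;; => nil
```

Patching this with more regex (counting depth, skipping quoted braces) is exactly where the madness starts, because regular expressions cannot track arbitrary nesting.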
Instaparse - The Game Changer
Instead of fighting with regex, I decided to write a proper grammar for JSON-like structures embedded in text, using Instaparse, a Clojure library that builds parsers from EBNF-style grammars.
Here's the complete solution I developed:
1. The Grammar Definition
First, I defined a grammar that could handle JSON embedded within normal text:
(ns json-extractor.core
  (:require [instaparse.core :as insta]
            [clojure.edn :as edn]))
(def json-parser
  (insta/parser
   "text = (not-json | json)*
    <not-json> = #'[^{\\[]+|[{\\[](?![\"\\s\\[{])'
    json = object | array
    <value> = object | array | string | number | boolean | null
    object = <'{'> <ws> (pair (<','> <ws> pair)*)? <ws> <'}'>
    array = <'['> <ws> (value (<','> <ws> value)*)? <ws> <']'>
    pair = string <ws> <':'> <ws> value
    string = <'\"'> #'[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*' <'\"'>
    number = #'-?(?:0|[1-9]\\d*)(?:\\.\\d+)?(?:[eE][+-]?\\d+)?'
    boolean = 'true' | 'false'
    null = 'null'
    ws = #'\\s*'"))
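The subtle part of the grammar is the `not-json` rule. Its regex claims any run of text that contains no `{` or `[`, plus a lone `{` or `[` only when it is not followed by a quote, whitespace, `[` or `{`, so prose cannot swallow the opening delimiter of something that looks like JSON. The sketch below runs the same pattern with plain `java.util.regex`, anchored at the start of the input with `Matcher.lookingAt`; `not-json-prefix` is a hypothetical helper, and the anchoring only approximates how a parser tries a terminal at its current position.

```clojure
(ns not-json-demo)

;; Same pattern as the grammar's not-json rule, written as a regex literal
;; (the doubled backslashes inside the grammar string collapse to single ones).
(def not-json-re #"[^{\[]+|[{\[](?![\"\s\[{])")

(defn not-json-prefix
  "Hypothetical helper: the text the not-json pattern claims when anchored
  at the start of s, or nil when it claims nothing there."
  [^String s]
  (let [m (re-matcher not-json-re s)]
    (when (.lookingAt m)
      (.group m))))

;; Prose up to the first `{` is claimed; the `{` itself is left for `json`.
(not-json-prefix "Sure, here it is: {\"a\": 1}")
;; => "Sure, here it is: "

;; `{` followed by a quote is never claimed as prose.
(not-json-prefix "{\"a\": 1}")
;; => nil

;; A brace followed by an ordinary letter is treated as prose.
(not-json-prefix "{see below}")
;; => "{"
```

Note that the lookahead lists only a quote, whitespace, `[` and `{`, so a `[` followed directly by a digit (as in `[1, 2]`) is still eligible to be read as prose by this rule.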
2. Validation Layer
Once the text was parsed, I needed to make sure the results were valid JSON structures:
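(defn valid-json-structure?
  "True for a map, or for a sequential collection whose items are all
  scalars (number, string, boolean, nil) or valid structures themselves."
  [x]
  (or (map? x)
      (and (sequential? x)
           (every? (fn [item]
                     (or (number? item)
                         (string? item)
                         (boolean? item)
                         (nil? item)
                         (valid-json-structure? item)))
                   x))))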
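`valid-json-structure?` only checks that a result is JSON-shaped; it says nothing about whether it is an operation the platform can apply. A domain check layered on top might look like the sketch below, assuming JSON keys have been turned into keywords (as the transform step does) and that every field in the schema above is required. `append-node-op?` is a hypothetical helper, not part of the original code.

```clojure
(ns op-shape-demo)

(defn append-node-op?
  "Hypothetical domain check: true when op (a parsed map with keyword keys)
  has the append:node shape from the schema. Assumes all fields are required."
  [op]
  (and (map? op)
       (= "append:node" (:type op))
       (string? (get-in op [:context :engine]))
       (string? (get-in op [:context :workspace]))
       (let [{:keys [source target]} (:data op)]
         (and (sequential? source)
              ;; every source node needs an id and a component name
              (every? #(and (string? (:id %)) (string? (:componentName %))) source)
              (string? (:id target))
              (string? (:componentName target))))))

(def sample-op
  {:type    "append:node"
   :context {:engine "e" :workspace "w"}
   :data    {:source [{:id "n1" :componentName "Text" :props {}}]
             :target {:id "root" :componentName "Box" :props {}}}})

(append-node-op? sample-op)
;; => true
```

Keeping this check separate from the structural one means the parser stays generic while each operation type gets its own, easily tested predicate.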
3. Transform Rules
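(def transform-map
  {:string  identity
   ;; the grammar only admits JSON numbers, so edn/read-string should succeed;
   ;; fall back to the raw text just in case
   :number  (fn [n]
              (try
                (edn/read-string n)
                (catch Exception _
                  n)))
   :boolean #(= % "true")
   :null    (constantly nil)
   :pair    vector
   ;; JSON keys become keywords: "componentName" -> :componentName
   :object  (fn [& pairs]
              (reduce (fn [acc [k v]]
                        (assoc acc (keyword k) v))
                      {}
                      pairs))
   ;; keep JSON nulls as nil instead of silently dropping them
   :array   (fn [& items]
              (vec items))
   :json    identity
   ;; prose segments are hidden by the grammar; keep only valid structures
   :text    (fn [& items]
              (->> items
                   (remove nil?)
                   (filter valid-json-structure?)))})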
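`insta/transform` walks the parse tree bottom-up: each `[:tag & children]` node is replaced by calling the function registered for `:tag` on its already-transformed children, and nodes whose tag has no entry are kept. The hypothetical `mini-transform` below mimics that for hiccup trees only (the real function also supports Instaparse's enlive output format). The sample tree is hand-written in the shape we assume the grammar produces for `{"id": 1, "tags": ["a", null]}`, with hidden tokens such as braces and commas absent.

```clojure
(ns transform-demo
  (:require [clojure.edn :as edn]))

(defn mini-transform
  "Hypothetical, simplified stand-in for insta/transform on hiccup trees:
  transforms the children first, then calls the function registered for the
  node's tag with them. Nodes whose tag has no function keep their tag."
  [tmap node]
  (if (and (vector? node) (keyword? (first node)))
    (let [[tag & children] node
          children' (map #(mini-transform tmap %) children)]
      (if-let [f (get tmap tag)]
        (apply f children')
        (into [tag] children')))
    node))

;; A cut-down transform map in the spirit of the one above.
(def demo-transform-map
  {:string  identity
   :number  edn/read-string
   :boolean #(= % "true")
   :null    (constantly nil)
   :pair    vector
   :object  (fn [& pairs] (into {} (map (fn [[k v]] [(keyword k) v])) pairs))
   :array   (fn [& items] (vec items))
   :json    identity
   :text    (fn [& items] (vec items))})

;; Hand-written tree in the assumed shape of the grammar's output.
(def sample-tree
  [:text
   [:json
    [:object
     [:pair [:string "id"] [:number "1"]]
     [:pair [:string "tags"] [:array [:string "a"] [:null "null"]]]]]])

(mini-transform demo-transform-map sample-tree)
;; => [{:id 1, :tags ["a" nil]}]
```

Because children are transformed first, the object rule receives ready-made `[key value]` vectors, which is why `:pair vector` is all the pair rule needs.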
4. JSON String Detection
Before parsing, we need to find potential JSON strings in the text:
(defn find-all-json-like-strings
  "Find potential JSON objects/arrays in text using balanced delimiter matching.
  Braces inside string literals are not special-cased."
  [text]
  (loop [i 0
         stack []
         start -1
         results []]
    (if (< i (count text))
      (let [c (nth text i)
            ;; push on any opening delimiter, pop when it is matched
            stack' (cond
                     (#{\{ \[} c) (conj stack c)
                     (and (= (peek stack) \{) (= c \})) (pop stack)
                     (and (= (peek stack) \[) (= c \])) (pop stack)
                     :else stack)
            ;; an opening delimiter at depth 0 starts a new candidate
            start' (if (and (empty? stack) (#{\{ \[} c)) i start)]
        (if (and (seq stack) (empty? stack'))
          ;; the outermost delimiter just closed: emit the candidate
          (recur (inc i) stack' -1 (conj results (subs text start' (inc i))))
          (recur (inc i) stack' start' results)))
      ;; keep an unterminated candidate (e.g. a truncated response)
      (cond-> results
        (> start -1) (conj (subs text start))))))
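`find-all-json-like-strings` counts every brace, including ones inside string values, so a `content` prop such as `"a } inside"` ends the candidate at the wrong place. Below is a minimal sketch of a string-aware scanner, assuming JSON-style double-quoted strings with backslash escapes; `find-json-candidates` is a hypothetical name, not part of the code above.

```clojure
(ns candidate-scanner-demo)

(defn find-json-candidates
  "Hypothetical string-aware scanner: returns substrings of text that start at
  a top-level { or [ and end at the matching close. Inside a candidate,
  characters between double quotes (honouring backslash escapes) do not
  change the nesting depth."
  [^String text]
  (let [closer {\{ \}, \[ \]}]
    (loop [i 0, stack [], start -1, in-str? false, esc? false, out []]
      (if (>= i (count text))
        ;; keep an unterminated candidate, e.g. from a truncated reply
        (cond-> out (>= start 0) (conj (subs text start)))
        (let [c (.charAt text i)]
          (cond
            ;; inside a string literal: only escapes and the closing quote matter
            in-str?
            (cond
              esc?     (recur (inc i) stack start true false out)
              (= c \\) (recur (inc i) stack start true true out)
              (= c \") (recur (inc i) stack start false false out)
              :else    (recur (inc i) stack start true false out))
            ;; quotes only open a string inside a candidate, not in prose
            (and (= c \") (seq stack))
            (recur (inc i) stack start true false out)
            ;; an opening delimiter; at depth 0 it starts a new candidate
            (contains? closer c)
            (recur (inc i) (conj stack c) (if (empty? stack) i start) false false out)
            ;; the delimiter that closes the innermost open one
            (and (seq stack) (= c (closer (peek stack))))
            (let [stack' (pop stack)]
              (if (empty? stack')
                (recur (inc i) stack' -1 false false (conj out (subs text start (inc i))))
                (recur (inc i) stack' start false false out)))
            :else
            (recur (inc i) stack start false false out)))))))

(find-json-candidates "Use {\"content\": \"a } inside\"} then [1, [2]] end")
;; => ["{\"content\": \"a } inside\"}" "[1, [2]]"]
```

Quotes outside a candidate are deliberately ignored, since the prose around the JSON often contains quotation marks of its own.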
5. Putting It All Together
Finally, I combined everything into two main functions:
(defn parse-single-json
  "Parse a single candidate string, returning its first valid structure or nil"
  [text]
  (try
    (let [result (json-parser text)]
      (when-not (insta/failure? result)
        (let [transformed (insta/transform transform-map result)
              ;; :text yields a seq of structures; keep the first one
              transformed (if (sequential? transformed)
                            (first transformed)
                            transformed)]
          (when (valid-json-structure? transformed)
            transformed))))
    (catch Exception e
      ;; report the failure to tap> listeners instead of crashing the caller
      (tap> {:exception e :text text})
      nil)))
(defn extract-json
  "Extract all valid JSON structures from text"
  [text]
  (->> (find-all-json-like-strings text)
       (map parse-single-json)
       (filterv some?)))
Learn from my mistakes
- Write tests from the start
- Don't modify the grammar without thorough testing
- Don't assume all LLM responses will contain valid JSON
- Don't skip the validation step, even if parsing succeeds
- Don't try to parse extremely large JSON structures in one go
When dealing with LLMs, robust parsing isn't just nice to have; it's essential for building reliable AI systems.
See It In Action
Our Auxtool Agent now streams UI operations in real time, applying them as they arrive from the LLM. This creates a fluid, interactive experience where you can watch your UI being built as the model generates its response.
Demo: Vade Auxtool building a landing page