Replies: 1 comment
-
|
Hey @IMRANEMU excellent questions, and I'm looking forward to seeing your results with fine tuning these SLMs. One of the "privileges" we have is Parlant's extensive scenario-based test suite which measures the behavioral accuracy and consistency of all parts of the engine, both independently and working together in integration. We have hundreds of tests under the tests/ directory. Have a look there! Speaking of this topic, you might be interested to know that we're also very minded toward optimizing the cost issue for everybody (it's a common request) and are currently in the process of training a fine-tuned SLM based service which is optimized for Parlant, and we're scheduled to offer it publicly quite soon. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
As Parlant relies heavily on reasoning tokens to produce accurate and structured responses, selecting the right model is crucial. The model must not only have strong domain knowledge but also excel in structured JSON output generation, which is essential for conversational agents.
Currently, I’m building a Conversational Agent in the Education Domain, and I’ve experimented with several open and closed-source models:
🧠 Models Tested
Open Models: Qwen 7B, Qwen 72B, Gemma 3B, Gemma 27B
Closed Models: GPT-4o, GPT-4o Mini, Gemini Flash Lite 2.0
⚙️ Results
Top Performers: GPT-4o and GPT-4o Mini delivered the best results in reasoning quality and structured JSON outputs.
Runner-up: Qwen 72B performed decently but not pixel-perfect.
Underperformers: Smaller models like Qwen 7B and Gemma 3B struggled with both domain comprehension and JSON consistency.
🎯 Next Steps
I plan to fine-tune Qwen 7B or Gemma 4B to improve their domain-specific reasoning and ensure precise JSON responses, making them more cost-efficient alternatives for production.
Question for you:
How do you approach model selection for Parlant?
What criteria or evaluation methods do you use to balance reasoning performance, cost, and structured output quality?
Beta Was this translation helpful? Give feedback.
All reactions