As AI adoption rises across industries, one question matters more than the rest: not what models can do, but how we make them behave the way we need.
Teams are deploying large language models into real-world products, but quickly discovering that performance, cost, and output aren’t always predictable. Behind every AI response lies a set of tradeoffs: speed vs. scale, price vs. quality, volume vs. latency. And for most developers, those tradeoffs are hardcoded by someone else.
To better understand what’s missing from the current AI stack, Vancouver Tech Journal spoke with Medi Naseri, CEO and co-founder of LōD Technologies, a company building solutions for complex challenges, including energy strategies for data centres and improving the programmability and control of AI.
AI has reached a turning point
Every major technological revolution (steam, electricity, the internet) began with acceleration. AI is following the same curve, only faster. But now, the trust phase, where teams, regulators, and users expect reliability and accountability, is arriving earlier than anyone expected.
“People don’t just want to build with AI,” Naseri says. “They want to understand its behaviour and shape it without exceeding their budget or wasting resources.”
He’s not alone. Developers, startups, and enterprise teams alike are realizing that “calling an API” isn’t the same as controlling the outcome. Whether you’re building a chatbot or powering a financial engine, the ability to customize how AI performs, per request, is becoming essential.
From access to architecture
For the past few years, AI progress has been framed around access: who gets to use which models. But that story is evolving.
“Access isn’t the bottleneck anymore,” Naseri notes. “What matters is how well you can control the infrastructure under the hood.”
It’s a conversation more teams are starting to have. Some need responses under 200 milliseconds. Others need to control token usage or avoid unpredictable costs. Still others need built-in checks for sensitive content. The common thread? They need choices.
Naseri calls this “programmable inference,” the idea that every model call should give you options:
“What model do I want? What matters more right now: cost, speed, or efficiency? Do I need guardrails on this request, or not? That’s the layer we’re missing.”
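To make that concrete, here is a minimal sketch of what per-request options could look like in application code. It is illustrative only: the InferenceRequest structure and its field names are assumptions made for the example, not CLōD’s documented API.

```python
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    """Illustrative per-request options for 'programmable inference'.

    Hypothetical field names for the sake of the example; not CLōD's documented API.
    """
    prompt: str
    model: str = "auto"            # pick a specific model, or let a router decide
    optimize_for: str = "cost"     # what matters right now: "cost", "latency", or "quality"
    max_output_tokens: int = 512   # cap token usage to keep spend predictable
    guardrails: bool = False       # opt in to safety checks only when the request needs them


# A latency-sensitive call for an interactive chatbot...
chat_call = InferenceRequest(
    prompt="Summarize this support ticket in two sentences.",
    optimize_for="latency",
    max_output_tokens=120,
)

# ...and a cost-sensitive call that handles sensitive content, so guardrails are on.
batch_call = InferenceRequest(
    prompt="Classify the sentiment of these customer complaints.",
    optimize_for="cost",
    guardrails=True,
)

print(chat_call)
print(batch_call)
```

The point of the sketch is simply that the tradeoffs live with the developer, per request, rather than being fixed upstream.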
A peek under the hood
It was this need for control that led Naseri and his team to build CLōD, an inference platform designed to give developers more strategic options with every request.
He points to token-rate control as a good example. “Let’s say you’re running a high-volume app. You don’t want to just pick the cheapest model; you want to make sure your output tokens are efficient. That’s a real business lever.”
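To see why that lever matters, consider a back-of-the-envelope calculation. The price and volume figures below are placeholders chosen for illustration, not CLōD’s or any provider’s actual rates.

```python
# Back-of-the-envelope cost of output tokens for a high-volume app.
# All numbers are placeholders for illustration, not real pricing.

PRICE_PER_1K_OUTPUT_TOKENS = 0.002   # hypothetical rate in dollars
REQUESTS_PER_DAY = 1_000_000


def daily_output_cost(avg_output_tokens: int) -> float:
    """Daily spend on output tokens at the assumed rate and volume."""
    return REQUESTS_PER_DAY * avg_output_tokens / 1000 * PRICE_PER_1K_OUTPUT_TOKENS


# Trimming verbose responses from 400 to 150 tokens changes the daily bill:
for avg_tokens in (400, 150):
    print(f"{avg_tokens} avg output tokens -> ${daily_output_cost(avg_tokens):,.2f}/day")
```

At these assumed numbers, tighter output budgets cut the daily bill from $800 to $300, which is the kind of lever Naseri is describing.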
At the 2025 All In conference in Montréal, the team introduced another control layer: governance as an add-on. Developers can enable things like audit logs, filters, and policy checks when needed, and disable them when they’re not.
“Not every use case needs compliance built-in,” Naseri says. “But the moment it does, you shouldn’t have to re-architect your whole stack to add it.”
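Here is a rough sketch of how such opt-in governance might look from the developer’s side. The GovernanceOptions structure and its toggles mirror the audit-log, filter, and policy-check examples above, but they are hypothetical, not CLōD’s actual interface.

```python
from dataclasses import dataclass


@dataclass
class GovernanceOptions:
    """Hypothetical per-request governance toggles (not CLōD's actual interface)."""
    audit_log: bool = False       # record the request and response for later review
    content_filter: bool = False  # screen inputs and outputs for sensitive material
    policy_checks: bool = False   # enforce organization-specific usage policies


def governance_for(use_case: str) -> GovernanceOptions:
    """Enable compliance features only where the use case demands them."""
    if use_case == "internal_prototype":
        return GovernanceOptions()  # nothing enabled; keep iteration fast and cheap
    if use_case == "customer_facing_finance":
        return GovernanceOptions(audit_log=True, content_filter=True, policy_checks=True)
    return GovernanceOptions(content_filter=True)  # a middle-ground default


print(governance_for("internal_prototype"))
print(governance_for("customer_facing_finance"))
```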
The real meaning of trust
By the end of our interview, one thing was clear: for all the talk of AI safety and regulation, trust won’t come from external rules alone. It will come from giving teams the tools to make smarter decisions, one model call at a time.
“Inference is where it all happens,” Naseri says. “If you can control that layer, what models run, how they run, what matters per request, you’re not just using AI. You’re owning it.”
As AI shapes the future of many industries, that level of control is no longer just desirable; it is a necessity.
To learn more or join CLōD, visit clod.io
