What's next for HAPP: media intelligence & AI
HAPP AI Team
Product
· 9 min read
If January was about strengthening infrastructure—Messenger, CRMs, behavior settings, and system tokens—the next stage for HAPP is expanding cognitive capabilities. Two directions we're working on: media intelligence and dynamic AI model selection.
Media intelligence means the assistant is not limited to text: it can perceive and interpret images, screenshots, voice context, and multimodal inputs within a single conversation. Dynamic model selection is the ability to choose the right model for the task—where speed and cost matter in one place, reasoning quality or multilingualism in another. Together this enables smarter and more cost-effective automation.
From infrastructure to the cognitive layer
In January 2026 we focused on making the platform stable across environments: integrations with four CRMs, async tools, clear escalation rules. That is the base. The next step is to have the AI not only run scenarios but better understand the customer's context and choose the best way to handle each request.
In support and in e-commerce, customers increasingly send product photos, payment error screenshots, or voice messages. A classic text bot does not see these inputs. A multimodal model can recognise image content, link it to the catalogue or ticket, and suggest an action—without handing off to a human. That reduces delay and load on operators.
Media intelligence lets the assistant handle images, screenshots, and voice within the conversation—without human handoff. That cuts delay and operator load in support and e-commerce.
What we mean by media intelligence
By media intelligence we mean not only image recognition but the link "media — dialogue context — action". For example: the customer sends a photo of a damaged product. The system should identify the type of defect, match it to the order, suggest return or replacement, and if needed create a ticket with the attachment. All of this can be automated only when the model understands text, visuals, and business rules together.
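As a rough sketch of that "media — dialogue context — action" chain, the flow might look like the following. All names and function bodies here are hypothetical stubs standing in for real model and CRM calls; this is not HAPP's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class MediaAction:
    """Outcome of handling a customer-submitted photo."""
    defect_type: str
    order_id: Optional[str]
    next_step: str

def classify_defect(image: bytes) -> str:
    # Stand-in for a multimodal model call; a real system
    # would return a defect label with a confidence score.
    return "cracked_screen"

def match_order(customer_id: str, defect_type: str) -> Optional[str]:
    # Stand-in for a CRM lookup of the customer's recent orders.
    recent_orders = {"cust-42": "order-1001"}
    return recent_orders.get(customer_id)

def handle_damage_photo(image: bytes, customer_id: str) -> MediaAction:
    defect = classify_defect(image)
    order = match_order(customer_id, defect)
    if order is None:
        # No matching order: create a ticket and hand off with the attachment.
        return MediaAction(defect, None, "escalate_with_attachment")
    # Business rule: physical damage qualifies for a replacement offer.
    return MediaAction(defect, order, "offer_replacement")

action = handle_damage_photo(b"<jpeg bytes>", "cust-42")
print(action.next_step)  # offer_replacement
```

The point of the sketch is the shape of the decision, not the stubs: the image result only becomes an action once it is joined with order data and a business rule.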
Another scenario is voice in messenger. The customer sends a voice message; the assistant transcribes it, detects intent, and replies with text or an action. What matters here is not only transcription accuracy but also consistency with earlier messages in the thread. Media intelligence, in our view, is exactly this kind of context-dependent processing.
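The voice path can be sketched the same way: transcription feeds intent detection, and the prior turns of the thread are passed along so the reply stays consistent. Again, every function here is a toy stub for illustration, not a real speech or LLM API.

```python
def transcribe(audio: bytes) -> str:
    # Stand-in for a speech-to-text model call.
    return "my parcel never arrived"

def detect_intent(utterance: str, history: list) -> str:
    # Toy keyword rules; a real system would prompt a model with the
    # transcript plus prior turns so context from the thread is used.
    context = " ".join(history + [utterance])
    if "arrived" in context or "delivery" in context:
        return "delivery_issue"
    return "general_question"

def handle_voice_message(audio: bytes, history: list) -> str:
    text = transcribe(audio)
    return detect_intent(text, history)

print(handle_voice_message(b"<ogg bytes>", ["I ordered a lamp last week"]))
# delivery_issue
```

Passing `history` into intent detection is the key design choice: the same utterance can mean different things depending on what was said earlier in the thread.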
We are rolling out these capabilities in stages: first for selected clients and scenarios, with clear measurement of quality and impact on operational metrics. The goal is not to "add pictures" but to increase the share of requests closed without escalation and to shorten time to first response.
Dynamic AI model selection
The second vector is dynamic model selection. Not every task needs the most powerful or longest-context model. Simple intent classification, order status checks, or standard answers from the knowledge base can be handled by faster, cheaper models. Complex complaints, multilingual dialogues, or media-heavy flows may require a different model.
Dynamic selection means routing the request to the right model by rules: channel type, request complexity, presence of media, language, latency or budget constraints. That way we keep quality where it is critical and reduce cost and latency where a simpler model is enough.
Dynamic model selection: simple tasks go to faster, cheaper models; complex complaints and multimedia go to a more capable model. Quality is preserved, cost and latency are optimized.
In 2025–2026, LLM providers offer a growing range of models that differ in size, context window, and price. Tying the whole product to a single model means losing that flexibility. Our aim is an architecture where the orchestration layer chooses the model by rules and the client gets predictable quality and cost.
Why this matters for support and e-commerce
In support and e-commerce, first response and resolution speed directly affect conversion and NPS. If the customer sends a photo and only gets "please describe the issue"—that is an extra step and a risk of losing the order. If the assistant immediately recognises the problem and suggests a solution—ticket lifetime and human load go down.
Dynamic model selection allows scaling such scenarios without proportional API cost growth: simple traffic is handled economically, complex cases get the right level of quality. For companies already using HAPP for voice calls or messengers, this is the next step toward a full cognitive layer—not just infrastructure but understanding content and adaptive choice of tools.
What's next
January was the stabilisation and integrations phase. The next steps are media intelligence and dynamic AI model selection. We are expanding the platform's cognitive capabilities so the assistant better understands multimodal context and uses available models more efficiently. If you want to discuss how this could apply to your support or e-commerce—get in touch.
Need a consultation?
We’ll show how HAPP fits your business.