A new, enterprise-specific AI speech model is here: Jargonic from aiOla claims to best rivals at your business’s lingo


Speech recognition models have become increasingly accurate in recent years. However, they are often built and benchmarked under ideal conditions: quiet rooms, clear audio and general-purpose vocabulary. For enterprises, real-world audio is far messier.

That’s the challenge aiOla aims to address with the launch of Jargonic, its new automatic speech recognition (ASR) model built specifically for enterprise use. The Israeli startup is unveiling Jargonic today.

Jargonic is a new speech-to-text model designed to handle specialized jargon, background noise and diverse accents without extensive retraining or fine-tuning.

“Our model focuses on three key challenges in speech recognition: jargon, background noise and accents,” said Gill Hetz, aiOla vice president of AI. “We built a model that understands specific industry jargon in a zero-shot manner, handles noisy environments and supports a wide range of accents.”

Available now via API on aiOla’s enterprise platform, Jargonic is positioned as a production-ready ASR solution for businesses in industries such as manufacturing, logistics, financial services and healthcare.

aiOla team. Credit: aiOla

From product-first to AI-first

The launch of Jargonic represents a shift in focus for aiOla itself. According to company leadership, the team redefined its approach to prioritize AI research and deployment.

“When I arrived here, I saw an amazing product company that had invested heavily in advanced AI capabilities, but was mostly known for helping people fill out forms,” said Assaf Asbag, aiOla’s Chief Technology and Product Officer. “We shifted the perspective and became an AI company with a great product, instead of a product company with AI capabilities.”

“We decided to open our capabilities to the world,” Asbag added. “Instead of serving our model only to enterprises within our product, we developed an API and are now launching it to make our enterprise-grade, bulletproof model available to everyone.”

Jargon recognition, zero-shot adaptation

One of Jargonic’s distinguishing features is its approach to specialized vocabulary. Speech recognition systems typically struggle when confronted with domain-specific jargon that does not appear in standard training data. Jargonic addresses this challenge with a proprietary keyword spotting system that allows for zero-shot adaptation—enterprises can simply provide a list of terms without additional retraining.
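To make the idea of "provide a list of terms without retraining" concrete, here is a minimal Python sketch. It is not aiOla's method: Jargonic's keyword spotting is proprietary and, per the company, built into the transcription process itself. This stand-in uses a different and much simpler technique, fuzzy post-correction of an already finished transcript with the standard library's `difflib`. The function name `snap_to_keywords` and the `cutoff` value are assumptions made for illustration.

```python
import difflib


def snap_to_keywords(transcript: str, keywords: list[str], cutoff: float = 0.8) -> str:
    """Replace near-miss words in a transcript with terms from a supplied list.

    Hypothetical stand-in for keyword adaptation: no model is retrained,
    the caller only passes in the terms that matter.
    """
    # Map lowercase forms back to the spelling the caller supplied.
    lookup = {term.lower(): term for term in keywords}
    corrected = []
    for word in transcript.split():
        # Closest single-word keyword whose similarity ratio is >= cutoff.
        match = difflib.get_close_matches(word.lower(), list(lookup), n=1, cutoff=cutoff)
        corrected.append(lookup[match[0]] if match else word)
    return " ".join(corrected)


if __name__ == "__main__":
    print(snap_to_keywords("quarterly ebita rose", ["EBITDA"]))
```

The sketch only handles single-word terms and ignores punctuation attached to words. A system that spots keywords during decoding, as the article describes, can draw on the audio itself rather than only on the text output.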

In benchmark tests, Jargonic demonstrated a 5.91% average word error rate (WER) across four leading English academic datasets, outperforming competitors such as ElevenLabs, AssemblyAI, OpenAI’s Whisper and Deepgram Nova-3.

However, the company has not yet disclosed performance comparisons against newer multimodal transcription models such as OpenAI’s GPT-4o-transcribe, which was released nine days ago and boasts top benchmark results, including a WER of only 2.46% in English. aiOla claims its model is still better at picking out specific business jargon.
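For readers unfamiliar with the metric, word error rate is conventionally the number of word-level substitutions, deletions and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below shows that standard calculation. The function name and the simple lowercase-and-split normalization are assumptions for illustration; the article does not describe the normalization or scoring pipeline behind the figures it cites.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate(
        "check the torque wrench calibration",
        "check the tork wrench calibration",
    ))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.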

Jargonic also achieved an 89.3% recall rate on specialized financial terms and consistently outperformed others in multilingual jargon recognition, reaching over 95% accuracy across five languages.
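Recall on specialized terms is a different measurement from WER: it asks how many of the jargon terms actually spoken made it into the transcript. A rough Python sketch of one way to compute it follows. The function names, the sample terms and the whole-phrase matching rule are assumptions for illustration; the article does not say how aiOla scored its financial-terms benchmark.

```python
import re


def _count_occurrences(term: str, text: str) -> int:
    """Count case-insensitive whole-phrase occurrences of term in text."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def keyword_recall(keywords: list[str], reference: str, hypothesis: str) -> float:
    """Share of keyword occurrences in the reference that the hypothesis reproduces."""
    expected = 0
    found = 0
    for term in keywords:
        in_reference = _count_occurrences(term, reference)
        in_hypothesis = _count_occurrences(term, hypothesis)
        expected += in_reference
        # Credit at most as many hits as the reference actually contains.
        found += min(in_reference, in_hypothesis)
    if expected == 0:
        raise ValueError("no keyword occurs in the reference transcript")
    return found / expected


if __name__ == "__main__":
    terms = ["EBITDA", "basis points", "collateralized loan obligation"]
    reference = "EBITDA rose forty basis points while the collateralized loan obligation book shrank"
    hypothesis = "ebitda rose forty basis points while the collateral loan obligation book shrank"
    # Two of the three expected terms survive in the hypothesis.
    print(keyword_recall(terms, reference, hypothesis))
```

A transcript can score well on overall WER and still score poorly here, since a handful of misheard domain terms barely moves the word-level average.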

“Once you have heavy jargon, recognition accuracy typically drops by 20%,” Asbag explained. “But with our zero-shot approach, where you just list important keywords, accuracy jumps back up to 95%. That’s unique to us.”

This capability is designed to eliminate the time-consuming, resource-intensive retraining process typically required to adapt ASR systems for specific industries.

Optimized for the enterprise environment

Jargonic’s development was informed by years of experience building solutions for enterprise clients. The model was trained on over one million hours of transcribed speech, including significant data from industrial and business environments, ensuring robustness in noisy, real-life settings.

“What differentiates us is that we’ve spent years solving real-world enterprise problems,” Hetz said. “We optimized for speed, accuracy, and the ability to handle complex environments—not just podcasts or videos, but noisy, messy, real-life workplaces.”

The model’s architecture integrates keyword spotting directly into the transcription process, allowing Jargonic to maintain accuracy even in unpredictable audio conditions.

The voice-first future

For aiOla’s leadership, Jargonic is a step toward a broader shift in how people interact with technology. The company sees speech recognition not only as a business tool, but as an essential interface for the future of human-computer interaction.

“Our vision is that every machine interface will soon be voice-first,” Hetz said. “You’ll be able to talk to your refrigerator, your vacuum cleaner, any machine—and it will act and do whatever you want. That’s the future we’re building toward.”

Asbag echoed that sentiment, adding, “Conversational AI is going to become the new web browser. Machines are starting to understand us, and now we have a reason to interact with them naturally.”

For now, aiOla’s focus remains on the enterprise. Jargonic is available immediately to enterprise customers via API, allowing them to integrate the model’s speech recognition capabilities into their own workflows, applications, or customer-facing services.


