Ethically trained AI startup Pleias releases new small reasoning models optimized for RAG with built-in citations


French AI startup Pleias made waves late last year with the launch of its ethically trained Pleias 1.0 family of small language models — among the first and only to date to be built entirely on scraping “open” data, that is, data explicitly labeled as public domain, open source, or unlicensed and not copyrighted.

Now the company has announced the release of two open source small-scale reasoning models designed specifically for retrieval-augmented generation (RAG), citation synthesis, and structured multilingual output.

The launch includes two core models — Pleias-RAG-350M and Pleias-RAG-1B — each also available in CPU-optimized GGUF format, making a total of four deployment-ready variants.

All are based on Pleias 1.0 and can be used independently or alongside other LLMs an organization has already deployed or plans to deploy. All appear to be available under the permissive Apache 2.0 open source license, meaning organizations are free to take, modify, and deploy them for commercial use cases.

RAG, as you’ll recall, is the widely used technique by which enterprises and organizations connect an AI large language model (LLM) such as OpenAI’s GPT-4o, Google’s Gemini 2.5 Flash, Anthropic’s Claude 3.7 Sonnet, or Cohere’s Command A, or open source alternatives like Llama 4 and DeepSeek V3, to external knowledge bases such as enterprise documents and cloud storage.

This is often necessary for enterprises that want to build chatbots and other AI applications that reference their internal policies or product catalogs. (The alternative, prompting a long-context LLM with all the necessary information up front, may not suit enterprise use cases where security and per-token transmission costs are concerns.)
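To make the pattern concrete, here is a minimal, hypothetical sketch of a RAG loop in Python. The embedding model, documents, and prompt format are illustrative placeholders, not Pleias’s implementation:

```python
# Minimal RAG sketch (all names and data are illustrative placeholders):
# embed the knowledge base once, retrieve the passages most similar to the
# query, and pass only those passages to the LLM instead of everything.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model

documents = [
    "Refunds are processed within 14 days of the return being received.",
    "Enterprise plans include SSO and a 99.9% uptime SLA.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, doc_embeddings)[0]
    return [documents[i] for i in scores.topk(k).indices.tolist()]

query = "How long do refunds take?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only these sources:\n{context}\n\nQuestion: {query}"
# `prompt` is then sent to whichever LLM the organization uses.
```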

The Pleias-RAG model family is the latest effort to bridge the gap between accuracy and efficiency in small language models.

These models are aimed at enterprises, developers, and researchers looking for cost-effective alternatives to large-scale language models without compromising traceability, multilingual capabilities, or structured reasoning workflows.

The target user base is actually Pleias’s home continent of Europe, as co-founder Alexander Doria told VentureBeat via direct message on the social network X:

“A primary motivation has been the difficulty of scaling RAG applications in Europe. Most private organizations have few GPUs (it may have changed, but not long ago less than 2% of all [Nvidia] H100 [GPUs] were in Europe). And yet, simultaneously, there are strong incentives to self-host for regulatory reasons, including GDPR.

“SLMs have progressed significantly over the past year, yet they are too often conceived as ‘mini-chatbots,’ and we have observed a significant drop in performance in non-English languages, both in terms of source understanding and quality of text generation. So we have been satisfied to hit most of our objectives:

  • An actual alternative to 7-8B models for RAG, even on CPU and other constrained infrastructure.
  • Fully verifiable models coming with citation support.
  • Preservation of European language performance.”

Of course, since the models are open source under the Apache 2.0 license, anyone can take and use them freely anywhere in the world.

Focused on grounding, citations, and facts

A key feature of the new Pleias-RAG models is their native support for source citation with literal quotes, fully integrated into the model’s inference process.

Unlike post-hoc citation methods or external chunking pipelines, the Pleias-RAG models generate citations directly, using a syntax inspired by Wikipedia’s reference format.

This approach allows for shorter, more readable citation snippets while maintaining verifiability.
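As an illustration of what that built-in verifiability can enable, the sketch below checks that Wikipedia-style inline references wrap quotes that actually appear verbatim in the named source. The <ref> tag syntax here is an assumption for illustration; the model card defines the real format:

```python
import re

# Illustrative check that every inline reference wraps a quote appearing
# verbatim in the named source. The <ref> syntax is assumed, not official.
ANSWER = ('Refunds are issued within two weeks'
          '<ref name="source_1">"Refunds are processed within 14 days of '
          'the return being received."</ref>.')

SOURCES = {
    "source_1": "Refunds are processed within 14 days of the return "
                "being received.",
}

def verify_citations(answer: str, sources: dict[str, str]) -> bool:
    """True if every cited quote is found verbatim in its named source."""
    refs = re.findall(r'<ref name="([^"]+)">"([^"]*)"</ref>', answer)
    return all(quote in sources.get(name, "") for name, quote in refs)

print(verify_citations(ANSWER, SOURCES))  # True
```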

Citation grounding plays a functional role in regulated settings.

For sectors like healthcare, legal, and finance — where decision-making must be documented and traceable — these built-in references offer a direct path to auditability. Pleias positions this design choice as an ethical imperative, aligning with increasing regulatory demands for explainable AI.

Proto-agentic?

Pleias-RAG models are described as “proto-agentic” — they can autonomously assess whether a query is understandable, determine if it is trivial or complex, and decide whether to answer, reformulate, or refuse based on source adequacy.

Their structured output includes language detection, query and source analysis reports, and a reasoned answer.

Despite their relatively small size (Pleias-RAG-350M has just 350 million parameters), the models exhibit behavior traditionally associated with larger, agentic systems.
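The decision logic below is a toy, hand-written stand-in for that behavior. In the real models the routing is learned during training, not rule-based, and the coverage score and thresholds here are purely hypothetical; the point is to illustrate the answer/reformulate/refuse dispatch the company describes:

```python
from enum import Enum

# Toy stand-in for the proto-agentic dispatch described above. In the real
# models this routing is learned, not rule-based; the coverage score and
# thresholds here are purely hypothetical.
class Action(Enum):
    ANSWER = "answer"
    REFORMULATE = "reformulate"
    REFUSE = "refuse"

def dispatch(query: str, source_coverage: float) -> Action:
    """Decide what to do with a query given how well sources cover it."""
    if not query.strip():
        return Action.REFUSE          # query is not understandable
    if source_coverage < 0.2:
        return Action.REFUSE          # sources are inadequate
    if source_coverage < 0.6:
        return Action.REFORMULATE     # partial coverage: rewrite the query
    return Action.ANSWER              # trivial or well-covered: answer it
```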

According to Pleias, these capabilities stem from a specialized mid-training pipeline that blends synthetic data generation with iterative reasoning prompts.

Pleias-RAG-350M is explicitly designed for constrained environments. It performs well on standard CPUs, including mobile-class infrastructure.

According to internal benchmarks, the unquantized GGUF version produces complete reasoning outputs in roughly 20 seconds on 8GB RAM setups. Its small footprint places it in a niche with very few competitors, such as Qwen-0.5 and SmolLM, but with a much stronger emphasis on structured source synthesis.
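For readers who want to try the CPU path, a minimal sketch using the open source llama-cpp-python bindings follows. The local filename, prompt format, and generation settings are assumptions; substitute whichever GGUF checkpoint Pleias publishes:

```python
from llama_cpp import Llama

# Hedged sketch of CPU-only inference with a GGUF variant via the open
# source llama-cpp-python bindings. Filename and settings are assumptions.
llm = Llama(
    model_path="pleias-rag-350m.gguf",  # hypothetical local filename
    n_ctx=4096,                         # context window
    n_threads=8,                        # CPU threads; tune to the machine
)

output = llm(
    "Question: How long do refunds take?\nSources: ...\nAnswer:",
    max_tokens=512,
    temperature=0.0,  # deterministic decoding suits citation tasks
)
print(output["choices"][0]["text"])
```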

Competitive performance across tasks and languages

In benchmark evaluations, Pleias-RAG-350M and Pleias-RAG-1B outperform most open-weight models under 4 billion parameters and remain competitive with much larger ones, including Llama-3.1-8B and Qwen-2.5-7B, on tasks such as HotPotQA, 2WikiMultiHopQA, and MuSiQue.

These multi-hop RAG benchmarks test the model’s ability to reason across multiple documents and identify distractors — common requirements in enterprise-grade knowledge systems.

The models’ strength extends to multilingual scenarios. On translated benchmark sets across French, German, Spanish, and Italian, the Pleias models show negligible degradation in performance.

This sets them apart from other SLMs, which typically experience a 10–35% performance loss when handling non-English queries.

The multilingual support stems from careful tokenizer design and synthetic adversarial training that includes language-switching exercises. The models not only detect the language of a user query but aim to respond in the same language—an important feature for global deployments.

In addition, Doria highlighted how the models could be used to augment the performance of other existing models an enterprise may already be using:

“We envision the models being used in an orchestration setting, especially since their compute cost is low. A very interesting result on the evaluation side: even the 350M model turned out to be good on entirely different questions than the ones [Meta] Llama and [Alibaba] Qwen were performing well on. So there’s a real complementarity we attribute to our reasoning pipeline, which goes beyond cost-effectiveness…”
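A minimal sketch of such an orchestration appears below. It tries the cheap small model first and escalates to a larger LLM only when the small model refuses or emits no citations; the convention of returning None on refusal and the "<ref" check are both assumptions for illustration, not Pleias’s API:

```python
from typing import Callable, Optional

# Hypothetical orchestration sketch of the complementarity Doria describes:
# run the cheap small model first, escalating to a larger LLM only when it
# refuses (returns None here) or emits no citations. Both conventions are
# assumptions for illustration.
def answer_with_fallback(
    query: str,
    small_model: Callable[[str], Optional[str]],
    large_model: Callable[[str], str],
) -> str:
    draft = small_model(query)           # cheap first pass, runs on CPU
    if draft is not None and "<ref" in draft:
        return draft                     # grounded answer; stop here
    return large_model(query)            # escalate only the hard cases
```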

Open access and licensing

According to Doria and a technical paper detailing the training of the Pleias-RAG family, the models were trained on: “Common Corpus to create the RAG training set (all 3 million examples came from it). We used [Google] Gemma on top for generation of synthetic reasoning traces since the license allowed for reuse/retraining.”

Both models are released under the Apache 2.0 license, allowing for commercial reuse and integration into larger systems.

Pleias emphasizes the models’ suitability for integration into search-augmented assistants, educational tools, and user support systems. The company also provides an API library to simplify structured input-output formatting for developers.
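Pleias’s own formatting library is not reproduced here; the hand-rolled stand-in below merely illustrates the kind of structured query-plus-sources input the article describes, with numbered sources the model can cite by name. The tag names are illustrative assumptions:

```python
# Hand-rolled stand-in for structured input formatting: number each source
# so the model can cite it by name. Tag names are illustrative assumptions.
def format_rag_prompt(query: str, sources: list[str]) -> str:
    numbered = "\n".join(
        f'<source id="source_{i}">{text}</source>'
        for i, text in enumerate(sources, start=1)
    )
    return f"{numbered}\n\n<query>{query}</query>"

print(format_rag_prompt(
    "How long do refunds take?",
    ["Refunds are processed within 14 days of the return being received."],
))
```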

The models’ release is part of a broader push by Pleias to reposition small LLMs as tools for structured reasoning, rather than as general-purpose conversational bots.

By leveraging an external memory architecture and systematic citation methods, the Pleias-RAG series offers a transparent, auditable alternative to more opaque frontier models.

Future outlook

Looking ahead, Pleias plans to expand the models’ capabilities through longer context handling, tighter search integration, and personality tuning for more consistent identity presentation.

Reinforcement learning is also being explored, particularly in domains like citation accuracy, where quote verification can be measured algorithmically.

The team is also actively collaborating with partners such as the Wikimedia Foundation to support targeted search integrations using trusted sources.

Ultimately, the current usage of RAG-specific implementations, models and workflows may fall away as more advanced AI models are trained and deployed, ones that incorporate RAG and agentic tool usage natively. As Doria told VentureBeat via DM:

“Long term, my conviction is that both classic RAG pipelines and long-context models are going to be disrupted by search agents. We have started to move in this direction: that’s why the model already comes equipped with many features that are currently externalized in RAG applications (query reformulation, reranking, etc.). We obviously aim to go further and integrate search capacities and source processing capacities directly in the model itself. My conviction is that RAG will disappear in a way as it gets automated by agentic models able to direct their own workflows.”

With Pleias-RAG-350M and 1B, the company is betting that small models—when paired with strong reasoning scaffolding and verifiable outputs—can compete with much larger counterparts, especially in multilingual and infrastructure-limited deployments.


