Trusted AI

Large Language Model as a Service (LLMaaS) - Catalogue of available models

Our Large Language Model as a Service (LLMaaS) offering gives you access to cutting-edge language models. Inference runs in France on SecNumCloud-qualified infrastructure that is HDS-certified for healthcare data hosting, which keeps processing sovereign. Benefit from high performance and optimal security for your AI applications. Your data remains strictly confidential and is neither exploited nor stored after processing.

Simple, transparent pricing

1.8 € per million input tokens

8 € per million output tokens

8 € per million reasoning tokens

4 € per million reranking tokens

0.9 € per million batch input tokens

4 € per million batch output tokens

0.01 € per minute of transcribed audio *

Computed on infrastructure based in France, SecNumCloud-qualified and HDS-certified.

Note on the "Reasoning" price: This price applies only to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability), and only to the tokens generated while reasoning is active.

* Every started minute is billed in full.
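To make the price list concrete, here is a minimal sketch of a cost estimator. The prices come from the list above; the function name `estimate_cost`, the usage keys and the idea of passing audio duration in seconds are illustrative assumptions, not part of the service's API.

```python
import math

# Prices in euros per million tokens, copied from the price list above.
PRICE_PER_MTOKEN = {
    "input": 1.8,
    "output": 8.0,
    "reasoning": 8.0,
    "reranking": 4.0,
    "batch_input": 0.9,
    "batch_output": 4.0,
}
PRICE_PER_AUDIO_MINUTE = 0.01


def estimate_cost(tokens: dict[str, int], audio_seconds: float = 0.0) -> float:
    """Return an estimated cost in euros for the given usage (hypothetical helper)."""
    cost = 0.0
    for kind, count in tokens.items():
        # A KeyError here means the usage type is not in the price list.
        cost += PRICE_PER_MTOKEN[kind] * count / 1_000_000
    # Every started minute of audio is billed in full, hence the ceiling.
    cost += math.ceil(audio_seconds / 60) * PRICE_PER_AUDIO_MINUTE
    return round(cost, 6)


example = estimate_cost(
    {"input": 2_000_000, "output": 500_000, "reasoning": 250_000},
    audio_seconds=61,
)
```

In this example, 2M input tokens (3.60 €), 0.5M output tokens (4.00 €), 0.25M reasoning tokens (2.00 €) and 61 seconds of audio billed as two minutes (0.02 €) add up to 9.62 €.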

Chat & Reasoning

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

Significant improvements in following instructions, reasoning, reading comprehension, mathematics, coding and tool use. Its context of 1M tokens enables the analysis of entire documents without truncation.

Parameters :

27B

Context Size :

1 000 000

Licence :

Apache 2.0

Energy efficiency :

2.78 kWh/Mtoken

CO₂ equivalent:

63.94 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Background

Multilingual

Vision

Reasoning

Mixture-of-Experts model with 120 billion parameters offering configurable reasoning and full access to the chain of thought. Ideal for scenarios requiring a permissive licence (Apache 2.0).

Parameters :

120B

Context Size :

120 000

Licence :

Apache 2.0

Energy efficiency :

2.37 kWh/Mtoken

CO₂ equivalent:

54.51 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

MoE

Agent

Reasoning

Open-Source

Very Large

Supports English, French, German, Spanish, Italian, Portuguese, Hindi and Thai. Its 132k-token window enables analysis of complex documents and long conversations.

Parameters :

70B

Context Size :

132 000

Licence :

LLAMA 3.3 Community Licence

Energy efficiency :

13.33 kWh/Mtoken

CO₂ equivalent:

306.59 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Dialogue

Multilingual

Ideal for agentic workflows, long-context reasoning, high-volume automation (support tickets, mass analyses), tool use and RAG.

Parameters :

120B

Context Size :

1 000 000

Licence :

NVIDIA Community License

Energy efficiency :

1.93 kWh/Mtoken

CO₂ equivalent:

44.39 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Reasoning

Background

Ultra-sparse Mixture-of-Experts architecture combining the power of a very large model with the efficiency of a smaller model.

Parameters :

235B

Context Size :

200 000

Licence :

Apache 2.0

Energy efficiency :

3.97 kWh/Mtoken

CO₂ equivalent:

91.31 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

MoE

Agent

Reasoning

Very Large

Large version of the Mistral Small family. Combines power, speed and reliability with an extended context. Native security filters.

Parameters :

119B

Context Size :

262 144

Licence :

Apache 2.0

Energy efficiency :

2 kWh/Mtoken

CO₂ equivalent:

46 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Vision

Agent

Security

Background

Fast

"Thinking" version with enhanced reasoning capability. Combines compactness, speed and advanced reasoning.

Parameters :

4B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

2.42 kWh/Mtoken

CO₂ equivalent:

55.66 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Reasoning

Background

Compact

Fast

Programming & Agents

Our programming and agent models are specially optimised for agentic software engineering, large-scale code generation and development workflow automation.

Takes in entire code repositories thanks to its 1M-token context. Supports multi-step reasoning and vision (screenshots, diagrams). Optimised for IDEs and CI/CD pipelines.

Parameters :

35B

Context Size :

1 000 000

Licence :

Apache 2.0

Energy efficiency :

2.07 kWh/Mtoken

CO₂ equivalent:

47.61 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Programming

Background

MoE

Vision

Reasoning

Excels at large-scale code generation and analysis. Designed for advanced software engineering tasks.

Parameters :

80B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

2.29 kWh/Mtoken

CO₂ equivalent:

52.67 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Programming

MoE

Background

Context of 250K tokens with support for function calling and guided decoding.

Parameters :

80B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

2.09 kWh/Mtoken

CO₂ equivalent:

48.07 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Background

MoE

Optimised for codebase exploration, multi-file editing and tool use. Native vision support. Context of 200K tokens.

Parameters :

24B

Context Size :

200 000

Licence :

Apache 2.0

Energy efficiency :

4.23 kWh/Mtoken

CO₂ equivalent:

97.29 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Programming

Vision

Open-Source

Background

Fast

Ultra-compact, optimised for identifying and formatting function calls quickly.

Parameters :

270M

Context Size :

32 768

Licence :

Google Gemma Terms of Use

Energy efficiency :

0.97 kWh/Mtoken

CO₂ equivalent:

22.31 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Compact

Efficient

Function Calling

Vision & Multimodal

Our Vision & Multimodal models can analyse images, videos and visual documents. They excel in OCR, object detection, structured extraction and spatio-temporal reasoning.

Excels in complex document analysis, multilingual OCR, 3D spatial reasoning and video understanding.

Parameters :

235B

Context Size :

200 000

Licence :

Apache 2.0

Energy efficiency :

5.56 kWh/Mtoken

CO₂ equivalent:

127.88 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Reasoning

Background

Vision

Incorporates innovations in image and video analysis. Excels in complex OCR, graphics and structured extraction (JSON).

Parameters :

30B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

3.39 kWh/Mtoken

CO₂ equivalent:

77.97 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Vision

Agent

Background

Multimodal

Video

OCR

Excellent compromise between performance and footprint. Supports structured extraction and visual reasoning.

Parameters :

4B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

2.34 kWh/Mtoken

CO₂ equivalent:

53.82 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Vision

Compact

Multimodal

Efficient

Video

OCR

Google's most powerful open-source model. Native function calling, advanced visual understanding (OCR, graphics, documents, UI). Multilingual (35+ languages).

Parameters :

31B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

3.77 kWh/Mtoken

CO₂ equivalent:

86.71 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Agent

Background

Vision

Reasoning

Multilingual

Open-Source

12B variant of the Gemma 4 family, offering a good balance between multimodal capabilities and footprint. Advanced reasoning, visual understanding (OCR, graphics, documents, UI) and multilingual support (35+ languages).

Parameters :

12B

Context Size :

250 000

Licence :

Apache 2.0

Energy efficiency :

3.31 kWh/Mtoken

CO₂ equivalent:

76.13 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Vision

Agent

Background

Multimodal

Reasoning

Multilingual

Embedding

Our embedding models transform text into vector representations for semantic search, clustering and RAG (Retrieval-Augmented Generation) pipelines.
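As an illustration of how vector representations support semantic search, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors are toy stand-ins for real embeddings, and the helper names `cosine_similarity` and `top_k` are hypothetical; cosine similarity is one common choice of metric, not one prescribed by this catalogue.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k corpus vectors most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine_similarity(query, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for embeddings returned by an embedding model.
corpus = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
best = top_k([1.0, 0.0, 0.0], corpus)
```

In a RAG pipeline, the ids returned by `top_k` would typically be passed to a reranker before the selected passages are sent to a chat model.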

Context of 8192 tokens with three complementary search methods.

Parameters :

567M

Context Size :

8 192

Licence :

MIT

Energy efficiency :

0.36 kWh/Mtoken

CO₂ equivalent:

8.28 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Embedding

Multilingual

Efficient

Ideal for processing large documents in RAG pipelines.

Parameters :

4B

Context Size :

40 000

Licence :

Apache 2.0

Energy efficiency :

0.57 kWh/Mtoken

CO₂ equivalent:

13.11 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Embedding

Background

Efficient

The most powerful version of the Qwen3 embedding family. Ideal for tasks requiring contextual understanding.

Parameters :

8B

Context Size :

40 000

Licence :

Apache 2.0

Energy efficiency :

0.57 kWh/Mtoken

CO₂ equivalent:

13.11 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Embedding

Background

High Performance

Excellent compromise between semantic performance and speed of execution.

Parameters :

0.6B

Context Size :

32 768

Licence :

Apache 2.0

Energy efficiency :

0.57 kWh/Mtoken

CO₂ equivalent:

13.11 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Embedding

Compact

Efficient

The fastest embedding model in the catalogue. Ideal for clustering and high-frequency searching.

Parameters :

278M

Context Size :

512

Licence :

Apache 2.0

Energy efficiency :

0.31 kWh/Mtoken

CO₂ equivalent:

7.13 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Embedding

Compact

Efficient

Produces vector representations of text for classification, clustering and similarity search.

Parameters :

300M

Context Size :

2 048

Licence :

Google Gemma Terms of Use

Energy efficiency :

0.35 kWh/Mtoken

CO₂ equivalent:

8.05 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Embedding

Compact

Semantics

Efficient

Multilingual

Reranking

Our reranking models reorder search results by relevance to refine the quality of RAG pipelines. Compatible with the Cohere API.

Cohere v1/v2 SDK compatible. The relevance score is a raw logit (relative order is guaranteed). Ideal as a complement to the RAG stack (embedding + retrieval + rerank).

Parameters :

1B

Context Size :

4 096

Licence :

NVIDIA Open Model License

Energy efficiency :

N.C.

CO₂ equivalent:

N.C.

Tools (Functions) :

Vision (Images) :

Rerank

RAG

Compact
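Because the relevance score of the Cohere-compatible reranker above is described as a raw logit with only the relative order guaranteed, client code should sort on the raw value and treat any rescaling as cosmetic. The sketch below assumes response items shaped like Cohere rerank results (`index` and `relevance_score`); the sample values are invented, and applying a sigmoid to obtain a 0 to 1 score is an assumption made for display purposes, not a documented calibration.

```python
import math


def sigmoid(logit: float) -> float:
    """Map a raw logit to (0, 1); monotonic, so the ranking is unchanged."""
    # Numerically stable form that avoids overflow for large negative inputs.
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def rank_by_logit(results: list[dict]) -> list[dict]:
    """Sort results from most to least relevant and attach a 0-1 display score."""
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [{**r, "probability": sigmoid(r["relevance_score"])} for r in ordered]


# Hypothetical response items, for illustration only.
sample = [
    {"index": 0, "relevance_score": -1.2},
    {"index": 1, "relevance_score": 3.4},
    {"index": 2, "relevance_score": 0.0},
]
ranked = rank_by_logit(sample)
```

Since logits are not bounded, fixed cut-offs such as "keep everything above 0.5" only make sense after a transformation like this one, and should be validated on your own data.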

Excellent reranking quality thanks to its 4B parameters. Ideal for demanding RAG pipelines.

Parameters :

4B

Context Size :

4 096

Licence :

Apache 2.0

Energy efficiency :

N.C.

CO₂ equivalent:

N.C.

Tools (Functions) :

Vision (Images) :

Reranker

Performance

Lightweight version for use cases requiring low reranking latency.

Parameters :

0.6B

Context Size :

4 096

Licence :

Apache 2.0

Energy efficiency :

N.C.

CO₂ equivalent:

N.C.

Tools (Functions) :

Vision (Images) :

Reranker

Compact

Efficient

Complementary to the BGE-M3 embedding model for complete RAG pipelines.

Parameters :

335M

Context Size :

512

Licence :

MIT

Energy efficiency :

N.C.

CO₂ equivalent:

N.C.

Tools (Functions) :

Vision (Images) :

Reranker

High Performance

Security

Our security models specialise in detecting problematic content, preventing jailbreaks and ensuring regulatory compliance (GDPR, HDS). They can be used as pre-filters or post-filters in your workflows.
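The pre-filter and post-filter pattern can be sketched as a thin wrapper around the main model call. Everything named here is hypothetical: `guarded_generate`, `fake_guardian` and `fake_chat` are offline stand-ins, and a real integration would replace them with calls to a security model and a chat model.

```python
from typing import Callable


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_flagged: Callable[[str], bool],
    refusal: str = "Request blocked by content policy.",
) -> str:
    """Run a safety check before and after the main model call."""
    # Pre-filter: stop before spending tokens on the main model.
    if is_flagged(prompt):
        return refusal
    answer = generate(prompt)
    # Post-filter: check the generated text before returning it.
    if is_flagged(answer):
        return refusal
    return answer


# Stand-ins so the sketch runs offline.
def fake_guardian(text: str) -> bool:
    return "forbidden" in text.lower()


def fake_chat(prompt: str) -> str:
    return f"Echo: {prompt}"


ok = guarded_generate("hello", fake_chat, fake_guardian)
blocked = guarded_generate("a FORBIDDEN request", fake_chat, fake_guardian)
```

Running the check on both sides costs two extra calls per request, which is one reason a smaller guard model may be preferable in high-frequency workflows.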

Version 4.1 (April 2026). Designed to filter sensitive content and ensure GDPR/HDS compliance. Can be used as a pre-filter or post-filter in your workflows. Hybrid thinking (reasoning) enabled.

Parameters :

8B

Context Size :

8 192

Licence :

Apache 2.0

Energy efficiency :

3.09 kWh/Mtoken

CO₂ equivalent:

71.07 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Security

Guardrails

Compliance

Filtering

Same filtering capabilities as the 8B version, but with a smaller footprint. Ideal for high-frequency workflows. Hybrid thinking (reasoning) enabled.

Parameters :

2B

Context Size :

8 192

Licence :

Apache 2.0

Energy efficiency :

0.65 kWh/Mtoken

CO₂ equivalent:

14.95 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Security

Guardrails

Compact

Efficient

Translation

Our translation models offer high fidelity in 55 languages, respecting the grammar, cultural nuances and technical specificities of the documents.

Captures literary and cultural nuances with exceptional fidelity.

Parameters :

27B

Context Size :

120 000

Licence :

Gemma Terms of Use

Energy efficiency :

7.84 kWh/Mtoken

CO₂ equivalent:

180.32 CO₂e/Mtoken

Tools (Functions) :

Vision (Images) :

Translation

Multilingual

Specialised

High Performance

Audio & Image

Our Audio & Image models enable real-time voice transcription (streaming ASR) and image generation from text descriptions, compatible with the OpenAI API.

Operates in Realtime mode via the /v1/realtime endpoint (WebSocket). Transcribes streaming audio.

Parameters :

4B

Context Size :

32 768

Licence :

Apache 2.0

Energy efficiency :

N.C.

CO₂ equivalent:

N.C.

Tools (Functions) :

Vision (Images) :

ASR

Audio

Realtime

WebSocket

Supports setting the image size and the number of images to generate. Compatible with the OpenAI ecosystem.

Parameters :

16B

Context Size :

N.C.

Licence :

Open Weights

Energy efficiency :

N.C.

CO₂ equivalent:

N.C.

Tools (Functions) :

Vision (Images) :

Image Generation

Creative

Multimodal
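Since the image model above is described as compatible with the OpenAI ecosystem and as supporting image size and image count, a request body would plausibly follow the OpenAI Images API shape. The sketch below only builds that JSON body and sends nothing. That the service accepts the `n` and `size` field names, the `WIDTHxHEIGHT` size format and the model identifier `z-image:16b` (taken from the comparison table) in this position are all assumptions to check against the service documentation.

```python
import json


def build_image_request(
    prompt: str,
    width: int = 1024,
    height: int = 1024,
    n: int = 1,
    model: str = "z-image:16b",
) -> str:
    """Build a JSON body in the shape of the OpenAI Images API (hypothetical helper)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    payload = {
        "model": model,
        "prompt": prompt,
        "n": n,                       # number of images requested
        "size": f"{width}x{height}",  # OpenAI-style "WIDTHxHEIGHT" string
    }
    return json.dumps(payload)


body = build_image_request("a lighthouse at dusk", width=512, height=512, n=2)
```

Which sizes and how many images per request are actually accepted is not stated in this catalogue, so the validation above is deliberately minimal.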

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Table comparing the characteristics and performance of the different AI models available, grouped by category.
Model	Publisher	Parameters	Context (tokens)	Energy efficiency *
Chat & Reasoning
qwen3.6:27b	Qwen Team	27B	1 000 000
gpt-oss:120b	OpenAI	120B	120 000
llama3.3:70b	Meta	70B	132 000
nemotron-3-super:120b	NVIDIA	120B	1 000 000
qwen3-2507:235b	Qwen Team	235B	200 000
mistral-small4:119b	Mistral AI	119B	262 144
qwen3-2507-think:4b	Qwen Team	4B	250 000
Programming & Agents
qwen3.6:35b	Qwen Team	35B	1 000 000
qwen-coder-next:80b	Qwen Team	80B	250 000
qwen3-next:80b	Qwen Team	80B	250 000
devstral-small-2:24b	Mistral AI & All Hands AI	24B	200 000
functiongemma:270m	Google	270M	32 768
Vision & Multimodal
qwen3-vl:235b	Qwen Team	235B	200 000
qwen3-vl:30b	Qwen Team	30B	250 000
qwen3-vl:4b	Qwen Team	4B	250 000
gemma4:31b	Google	31B	250 000
gemma4:12b-it-qat	Google	12B	250 000
Embedding
bge-m3:567m	BAAI	567M	8 192
qwen3-embedding:4b	Qwen Team	4B	40 000
qwen3-embedding:8b	Qwen Team	8B	40 000
qwen3-embedding:0.6b	Qwen Team	0.6B	32 768
granite-embedding:278m	IBM	278M	512
embeddinggemma:300m	Google	300M	2 048
Reranking
nvidia/llama-nemotron-rerank-vl-1b-v2	NVIDIA	1B	4 096	N.C.
qwen3-reranker:4b	Qwen Team	4B	4 096	N.C.
qwen3-reranker:0.6b	Qwen Team	0.6B	4 096	N.C.
bge-reranker-large	BAAI	335M	512	N.C.
Security
granite3-guardian:8b	IBM	8B	8 192
granite3-guardian:2b	IBM	2B	8 192
Translation
translategemma:27b	Google	27B	120 000
Audio & Image
voxtral	Mistral AI	4B	32 768	N.C.
z-image:16b	Community	16B	N.C.	N.C.

Legend and explanation

Functionality or capability supported by the model

Functionality or capability not supported by the model

* Energy efficiency: indicates particularly low energy consumption (< 2.0 kWh/Mtoken)

* Fast: model capable of generating more than 50 tokens per second

Note on performance measures

The speed values (tokens/s) represent performance targets in real-life conditions. Energy consumption (kWh/Mtoken) is calculated by dividing the estimated power of the inference server (in Watts) by the measured speed of the model (in tokens/second), then converted into kilowatt-hours per million tokens (division by 3.6). This method offers a practical comparison of the energy efficiency of different models, to be used as a relative indicator rather than an absolute measure of power consumption.
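The calculation described above can be written out as a short sketch. Power in watts divided by speed in tokens per second gives joules per token; multiplying by one million tokens and dividing by 3.6 million joules per kWh leaves a division by 3.6. The function names and the 1000 W / 100 tokens per second figures are illustrative assumptions, not measurements from this catalogue.

```python
def kwh_per_mtoken(server_power_watts: float, tokens_per_second: float) -> float:
    """Energy per million tokens, following the method described above."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    # W / (tokens/s) = J/token.
    joules_per_token = server_power_watts / tokens_per_second
    # x 1e6 tokens, / 3.6e6 J per kWh  ->  divide by 3.6.
    return joules_per_token / 3.6


def is_energy_efficient(kwh: float, threshold: float = 2.0) -> bool:
    """Legend criterion: strictly below 2.0 kWh/Mtoken."""
    return kwh < threshold


# Hypothetical server: 1000 W estimated power, 100 tokens/s measured speed.
example_kwh = kwh_per_mtoken(1000.0, 100.0)
```

With these hypothetical inputs the result is about 2.78 kWh/Mtoken, which would not meet the legend's 2.0 kWh/Mtoken threshold. Doubling the measured speed at the same power would halve the figure.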

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants able to communicate in several languages with automatic detection and context maintenance

Recommended models

nemotron-3-super:120b
qwen3.6:27b
gpt-oss:120b

Analysis of long documents

Processing of large documents (>100 pages) with extraction of key information, summaries and answers to questions

Recommended models

nemotron-3-super:120b
qwen3.6:27b
qwen3-2507:235b

Programming and development

Code generation, optimisation and debugging in multiple languages, refactoring and test creation

Recommended models

qwen3.6:35b
qwen-coder-next:80b
devstral-small-2:24b
nemotron-3-super:120b

Visual analysis

Image and visual document processing, OCR, interpretation of graphs and tables

Recommended models

qwen3-vl:235b
gemma4:31b
qwen3-vl:30b

Safety and compliance

Sensitive content filtering, jailbreak detection, GDPR/HDS compliance

Recommended models

granite4.1-guardian:8b
granite3-guardian:8b
granite3-guardian:2b
mistral-small4:119b

Light deployments

Applications requiring a minimal footprint, low latency and low power consumption

RAG (Retrieval-Augmented Generation)

Complete pipelines for semantic search, reranking and retrieval-augmented generation

Recommended models

bge-m3:567m
nvidia/llama-nemotron-rerank-vl-1b-v2
qwen3.6:27b

Follow the development of the LLMaaS offering

Discover all our AI research papers

