Compute
High-performance, scalable computing resources for your critical workloads. Orchestrate your cloud-native applications with our modern container solutions.
Discover the Compute offer
Virtual machines
VM Instances
An on-demand, flexible and secure virtual machine solution on a shared infrastructure.
Dedicated servers
OpenSource IaaS
Open source virtualised infrastructure in a trusted SecNumCloud-qualified cloud environment for complete technological sovereignty.
VMware IaaS
Your VMware virtual machines in a trusted SecNumCloud-qualified and HDS-certified cloud environment.
Bare Metal
Dedicated, fully customisable servers for total autonomy over your sovereign infrastructure.
Containers
Openshift PaaS
The unified platform for creating, modernising and deploying your large-scale applications in a sovereign cloud.
Managed Kubernetes
Managed container orchestration solution offering security, resilience and advanced automation on sovereign infrastructure.
Storage
Adaptable, high-performance storage solutions for all your needs. Optimise your data with our highly available block and object solutions.
Discover our Storage offer
Storage
Block storage
The adaptable block storage solution for optimum storage performance in a sovereign cloud.
Object storage
The scalable, cost-effective storage solution for your unstructured data in a sovereign cloud.
Backup
Backup solutions
Differentiated backup solutions tailored to your challenges and environments
Network
Advanced network solutions to connect and secure your infrastructures. Deploy your private networks automatically and securely.
Discover the Network offer
Network
Virtual Private Cloud
Deploy and manage your private networks 100% automatically and securely.
Private Backbone
Take full control of your network with extended Layer 2 connectivity, designed for hybrid architectures and bespoke configurations.
Firewall
Managed Firewall
Advanced security solutions for complete isolation and enhanced protection
Hosting
Housing - Dedicated space
Secure hosting for your equipment in a dedicated or shared environment, depending on your needs.
Security
Advanced security solutions to protect your critical infrastructures. Control access and defend against online threats.
Discover the Security offer
Security
Anti DDoS
The shield against online attacks
Bastion host
Transparent, centralised access control for robust protection of your infrastructure
Managed KMS
Sovereign cryptographic key management, with HSM hardware root of trust, to protect your most sensitive data on SecNumCloud infrastructure.
Managed SIEM
A centralised platform for collecting and correlating security logs, combining AI-based automation and advanced detection rules (MITRE ATT&CK).
AI
Artificial intelligence solutions to transform your data into insights and accelerate your business processes.
Discover the AI offer
AI
LLMaaS
Access cutting-edge language models on a sovereign, SecNumCloud-qualified and HDS-certified infrastructure for high-performance, secure AI applications.
GPU
NVIDIA GPU instances to accelerate your artificial intelligence and high-performance computing in a sovereign cloud.
Data
Data solutions to manage, analyse and exploit your critical data.
Discover the Data offer
Databases
Managed MariaDB
A fully managed MariaDB relational database with PITR (point-in-time recovery) backup on SecNumCloud sovereign infrastructure.
Managed PostgreSQL
The fully managed relational database solution on SecNumCloud sovereign infrastructure
Big Data
Managed Kafka
The open-source distributed platform for streaming data in real time
Managed File System
A managed, sovereign, high-availability distributed file system, accessible via NFS and SMB on the SecNumCloud infrastructure.
Management & Governance
Coaching and support services to help you with your cloud transformation.
Find out about our support services
Support
Support levels
Discover the 3 levels of support available to help you meet your challenges.
Professional services
From design to optimisation, Cloud Temple is with you every step of the way.
Governance
Console - API - Terraform Provider
A single interface for viewing and managing your products and services
Observability
Infrastructure metrics exposed in market-standard formats

Our Large Language Model as a Service (LLMaaS) offering gives you access to cutting-edge language models, with inference running on SecNumCloud-qualified infrastructure that is HDS-certified for healthcare data hosting and located in France, and therefore sovereign. Benefit from high performance and optimal security for your AI applications. Your data remains strictly confidential: it is neither exploited nor stored after processing.

Simple, transparent pricing
1.8 €
per million input tokens
8 €
per million output tokens
8 €
per million reasoning tokens
0.01 €
per minute of transcribed audio *
Inference runs on infrastructure based in France, SecNumCloud-qualified and HDS-certified.
Note on the "Reasoning" price: this price applies specifically to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability activated), when reasoning is active, and only to the tokens produced by this activity.
* Any minute started is counted in full.
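The pricing above can be sketched as a simple per-token calculation. This is an illustrative estimate only: the function and variable names are examples, not part of the actual API, and reasoning tokens apply only to "reasoner"/"hybrid" models with the Reasoning capability active.

```python
# Illustrative cost estimate based on the published per-million-token prices.
PRICE_INPUT_EUR = 1.8      # € per million input tokens
PRICE_OUTPUT_EUR = 8.0     # € per million output tokens
PRICE_REASONING_EUR = 8.0  # € per million reasoning tokens

def request_cost_eur(input_tokens: int, output_tokens: int,
                     reasoning_tokens: int = 0) -> float:
    """Estimate the cost in euros of a single request."""
    return (input_tokens * PRICE_INPUT_EUR
            + output_tokens * PRICE_OUTPUT_EUR
            + reasoning_tokens * PRICE_REASONING_EUR) / 1_000_000

# Example: 12,000 input tokens and 1,500 output tokens, no reasoning.
cost = request_cost_eur(12_000, 1_500)
print(f"{cost:.4f} €")  # 0.0336 €
```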

Large models

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

Specialised models

Our specialised models are optimised for specific tasks such as code generation, image analysis or structured data processing. They offer an excellent performance/cost ratio for targeted use cases.

22 tokens/second

ministral-3:3b

Mistral AI's cutting-edge compact model, designed for efficiency in local and edge deployments.
Despite its small size, this model offers surprising performance for conversational tasks and simple reasoning. Ideal for mobile devices.
40 tokens/second

ministral-3:8b

Mid-sized model in the Ministral family, offering an optimal balance between performance and resources.
Version 8B is more robust, capable of handling longer contexts and more complex reasoning, while remaining very fast.
40 tokens/second

functiongemma:270m

Gemma micro-model specialising in function calling and detection of tool call intentions.
FunctionGemma 270M is an ultra-compact model optimised for identifying and formatting function calls. Ideal as a router or pre-filter in a multi-model agentic architecture.
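The router/pre-filter pattern described above can be sketched as follows: a compact function-calling model emits a structured call, which the application parses and dispatches to the matching tool. The JSON shape and tool names here are illustrative assumptions, not FunctionGemma's actual output format.

```python
# Hypothetical dispatch of a model-emitted function call in an agentic setup.
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # hypothetical tool
    "get_time": lambda tz: f"12:00 in {tz}",         # hypothetical tool
}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call and invoke the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
# Sunny in Paris
```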
49 tokens/second

granite3.2-vision:2b

IBM Granite compact multimodal model, specialising in the analysis of visual documents.
Granite 3.2 Vision 2B is a lightweight yet powerful model for OCR, data extraction from scanned documents and image analysis. Ideal for low-latency vision tasks.

qwen3-embedding:0.6b

Ultra-light Qwen3 embedding model, optimised for speed and efficiency on resource-limited infrastructures.
Offers an excellent compromise between semantic performance and speed of execution.
196.3 tokens/second

granite-embedding:278m

Ultra-compact IBM Granite embedding model, designed for maximum efficiency.
Ideal for semantic search tasks requiring minimal latency.

qwen3-embedding:4b

High-performance Qwen3-4B embedding model, offering deep semantic understanding and an extended context window.
Context of 40,000 tokens for processing large documents.
171 tokens/second

bge-m3:567m

State-of-the-art multilingual embedding model (BGE-M3), offering exceptional semantic search capabilities in over 100 languages.
Context of 8192 tokens. Supports dense, sparse and multi-vector search methods.
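Dense semantic search with an embedding model such as bge-m3 works by embedding documents and the query into vectors, then ranking documents by cosine similarity. A minimal sketch, using toy stand-in vectors rather than real model-produced embeddings:

```python
# Toy dense retrieval: rank documents by cosine similarity to a query vector.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_vectors = {
    "doc_fr": [0.9, 0.1, 0.0],  # toy embedding, not a real model output
    "doc_en": [0.1, 0.9, 0.1],  # toy embedding, not a real model output
}
query = [0.85, 0.15, 0.05]

best = max(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]))
print(best)  # doc_fr
```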
175 tokens/second

embeddinggemma:300m

Google's state-of-the-art embedding model, optimised for its size, ideal for search and semantic retrieval tasks.
Built on Gemma 3, this model produces vector representations of text for classification, clustering and similarity search. Trained on over 100 languages, its small size makes it perfect for resource-constrained environments.
57 tokens/second

gpt-oss:20b

OpenAI's open-weight language model, optimised for efficiency and deployment on consumer hardware.
A Mixture-of-Experts (MoE) model with 21 billion parameters and 3.6 billion active parameters. It offers configurable reasoning effort and agent capabilities.
55 tokens/second

qwen3-2507-think:4b

Qwen3-4B model optimised for reasoning, with improved performance on logic, maths, science and code tasks, and extended context to 250K tokens.
This 'Thinking' version has an increased thought length, making it ideal for highly complex reasoning tasks. It also offers general improvements in following instructions, using tools and generating text.
22 tokens/second

rnj-1:8b

An 8B open-weight model specialising in coding, mathematics and science (STEM).
RNJ-1 is a dense model with 8.3B parameters trained on 8.4T tokens. It uses global attention and YaRN to provide a context of 32k tokens. It excels at code generation (83.5% HumanEval+) and mathematical reasoning, often outperforming much larger models.
64 tokens/second

qwen3-vl:2b

Ultra-compact multimodal Qwen3-VL model, bringing advanced vision capabilities to edge devices.
Despite its small size, this model incorporates Qwen3-VL technologies (MRoPE, DeepStack) to deliver impressive image and video analysis. Ideal for mobile or embedded applications requiring OCR, object detection or rapid visual understanding.
49 tokens/second

qwen3-vl:4b

Balanced Qwen3-VL multimodal model, offering robust vision performance with a small footprint.
Excellent compromise between performance and resources. Capable of analysing complex documents, graphics and videos with high accuracy. Supports structured extraction and visual reasoning.
16 tokens/second

qwen3.5:0.8b

Ultra-light Qwen3.5 model with 0.8 billion parameters, offering an exceptional native context of 250K tokens - a remarkable capacity for a model of this size.
Context configured to 250,000 tokens (native max context 262,144). Ideal for fast conversational tasks requiring a very long history or analysis of large documents with a small memory footprint.
37 tokens/second

qwen3.5:4b

Compact Qwen3.5 model with 4 billion parameters, offering a good compromise between performance and efficiency.
Context of 250k tokens. Good candidate for local assistants and light reasoning tasks.
32 tokens/second

qwen3.5:9b

Qwen3.5 model of intermediate size, offering solid reasoning capabilities with an extended context.
Context of 250k tokens. Offers a good balance between generation quality and inference speed.
46 tokens/second

qwen3:0.6b

Ultra-light Qwen3 model with 0.6 billion parameters, offering exceptional inference speed for fast, simple tasks.
Ideal for deployment on lightweight servers or as the first level of processing for complex workflows. Configured with a context of 40,000 tokens.
39 tokens/second

qwen3-vl:8b

Qwen3-VL multimodal model (8B), offering advanced vision performance with a reasonable footprint.
Version 8B of the Qwen3-VL model. Excellent compromise between performance and resources. Capable of analysing complex documents, graphics and video with high accuracy.
33 tokens/second

devstral-small-2:24b

Second iteration of Devstral (Small 2), a state-of-the-art agentic model for software engineering.
Optimised for codebase exploration, multi-file editing and tool use. Offers performance approaching that of 100B+ models on code (SWE-bench Verified: 68%). Native vision support. Context of 200k tokens.
84 tokens/second

deepseek-ocr

DeepSeek's specialist OCR model, designed for high-precision text extraction with formatting preservation.
Two-stage OCR system (visual encoder + MoE 3B decoder) optimised for converting documents into structured Markdown (tables, formulas). Requires specific pre-processing (Logits Processor) for optimum performance.
28 tokens/second

mistral-small3.2:24b

Minor update to Mistral Small 3.1, improving instruction following, making function calling more robust and reducing repetition errors.
This version 3.2 retains the strengths of its predecessor while making targeted improvements. It is better able to follow precise instructions, produces fewer infinite generations or repetitive responses, and its function calling template is more robust.
27 tokens/second

translategemma:12b

State-of-the-art open translation model based on Gemma 3, covering 55 languages.
TranslateGemma 12B offers high-fidelity translation capabilities while respecting grammar and cultural nuances. Context of 128k tokens.
37 tokens/second

translategemma:4b

Compact version of the TranslateGemma translation model, optimised for speed.
TranslateGemma 4B offers fast and efficient translation capabilities for 55 languages. Context of 128k tokens.
16 tokens/second

translategemma:27b

High-performance translation model based on Gemma 3 27B.
TranslateGemma 27B offers superior translation quality for complex and technical content.

voxtral

Mistral AI's real-time ASR (Automatic Speech Recognition) model, capable of transcribing streaming audio via WebSocket.
Voxtral Mini 4B operates in Realtime mode via the /v1/realtime endpoint (WebSocket). It transcribes continuous audio with token extraction and ASR time tracking.

z-image:16b

Model for generating images from text prompts, compatible with the OpenAI /v1/images/generations API.
Z-Image Turbo is an image generation model compatible with the OpenAI Images API. It supports parameters for the size and number of images.
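Since z-image exposes an OpenAI-compatible `/v1/images/generations` endpoint, a request can be sketched as below. The base URL and API key are placeholders (check the Cloud Temple LLMaaS documentation for the actual values); only the payload shape (prompt, size, number of images) comes from the description above.

```python
# Sketch of a request to an OpenAI-compatible image generation endpoint.
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder base URL
API_KEY = "YOUR_API_KEY"              # placeholder credential

payload = {
    "model": "z-image:16b",
    "prompt": "A lighthouse at dawn, watercolour style",
    "size": "1024x1024",  # image dimensions
    "n": 1,               # number of images to generate
}

request = urllib.request.Request(
    f"{API_BASE}/v1/images/generations",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(request)  # not executed in this sketch
```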

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Comparative table of the characteristics and performance of the various AI models available, grouped by category (large models and specialised models).
Model Publisher Parameters Context (tokens) Vision Agent Reasoning Security Quick * Energy efficiency *
Large models
Specialised models
ministral-3:3b Mistral AI 3B 250000
ministral-3:8b Mistral AI 8B 250000
functiongemma:270m Google 270M 32768
granite3.2-vision:2b IBM 2B 16384
qwen3-embedding:0.6b Qwen Team 0.6B 32768
granite-embedding:278m IBM 278M 512
qwen3-embedding:4b Qwen Team 4B 40000
bge-m3:567m BAAI 567M 8192
embeddinggemma:300m Google 300M 2048
gpt-oss:20b OpenAI 20B 120000
qwen3-2507-think:4b Qwen Team 4B 250000
rnj-1:8b Essential AI 8B 32000
qwen3-vl:2b Qwen Team 2B 250000
qwen3-vl:4b Qwen Team 4B 250000
qwen3.5:0.8b Qwen Team 0.8B 250000
qwen3.5:4b Qwen Team 4B 250000
qwen3.5:9b Qwen Team 9B 250000
qwen3:0.6b Qwen Team 0.6B 40000
qwen3-vl:8b Qwen Team 8B 250000
devstral-small-2:24b Mistral AI & All Hands AI 24B 200000
deepseek-ocr DeepSeek AI 3B 8192
mistral-small3.2:24b Mistral AI 24B 128000
translategemma:12b Google 12B 128000
translategemma:4b Google 4B 128000
translategemma:27b Google 27B 120000
voxtral Mistral AI 4B 32768 N.C.
z-image:16b Community 16B N.C.
Legend and explanation
Functionality or capability supported by the model
Functionality or capability not supported by the model
* Energy efficiency: indicates particularly low energy consumption (< 2.0 kWh/Mtoken)
* Quick: model capable of generating more than 50 tokens per second
Note on performance measures
The speed values (tokens/s) represent performance targets in real-life conditions. Energy consumption (kWh/Mtoken) is calculated by dividing the estimated power of the inference server (in Watts) by the measured speed of the model (in tokens/second), then converted into kilowatt-hours per million tokens (division by 3.6). This method offers a practical comparison of the energy efficiency of different models, to be used as a relative indicator rather than an absolute measure of power consumption.
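The energy metric described above reduces to a single division: power (W) divided by speed (tokens/s) gives joules per token; scaling to a million tokens and converting joules to kilowatt-hours (3.6 × 10⁶ J/kWh) leaves a division by 3.6. A sketch, using a hypothetical server power figure:

```python
# kWh per million tokens: W / (tokens/s) = J/token; × 1e6 tokens / 3.6e6 J/kWh
# simplifies to a division by 3.6.
def kwh_per_mtoken(server_power_w: float, speed_tok_s: float) -> float:
    """Energy per million tokens, in kWh/Mtoken."""
    return server_power_w / speed_tok_s / 3.6

# Example: a hypothetical 500 W inference server running at 84 tokens/second
# lands under the 2.0 kWh/Mtoken "energy efficient" threshold.
print(round(kwh_per_mtoken(500, 84), 2))  # 1.65
```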

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants capable of communicating in several languages, with automatic detection, context maintenance throughout the conversation and understanding of linguistic specificities.
Recommended models
  • nemotron-3-super:120b
  • qwen3.5:27b
  • nemotron3-nano:30b
  • gpt-oss:120b

Analysis of long documents

Processing of large documents (>100 pages): maintaining context throughout the text, extracting key information, generating relevant summaries and answering specific questions about the content.
Recommended models
  • nemotron-3-super:120b
  • qwen3.5:27b
  • qwen3-2507:235b

Programming and development

Generating and optimising code in multiple languages, debugging, refactoring, developing complete features, understanding complex algorithmic implementations and creating unit tests.
Recommended models
  • qwen3.5:27b
  • qwen3-2507:235b
  • qwen-coder-next:80b
  • nemotron-3-super:120b

Visual analysis

Direct processing of images and visual documents without OCR pre-processing, interpretation of technical diagrams, graphs, tables, drawings and photos with generation of detailed textual explanations of the visual content
Recommended models
  • qwen3.5:27b
  • deepseek-ocr
  • qwen3.5:35b

Security and compliance

Applications requiring specific security capabilities: filtering of sensitive content, traceability of reasoning, GDPR/HDS compliance checks, risk minimisation, vulnerability analysis and compliance with sector-specific regulations.
Recommended models
  • granite3-guardian:8b
  • qwen3.5:27b
  • granite3-guardian:2b

Lightweight and edge deployments

Applications requiring a minimal resource footprint: deployment on capacity-constrained devices, real-time inference on standard CPUs and integration into embedded or IoT systems.
Recommended models
  • qwen3.5:0.8b
  • qwen3-vl:2b
  • ministral-3:3b
Contact us
Cookie policy

We use cookies to give you the best possible experience on our site, but we do not collect any personal data.

Audience measurement services, which are necessary for the operation and improvement of our site, do not allow you to be identified personally. However, you have the option of objecting to their use.

For more information, see our privacy policy.