Our Large Language Model as a Service (LLMaaS) offering gives you access to state-of-the-art language models, with inference running on a sovereign infrastructure located in France, SecNumCloud-qualified and HDS-certified for hosting health data. Benefit from high performance and optimum security for your AI applications. Your data remains strictly confidential and is neither used nor stored after processing.

Simple, transparent pricing
€0.90
per million input tokens
€4
per million output tokens
€21
per million reasoning tokens
Inference runs on infrastructure based in France, SecNumCloud-qualified and HDS-certified.
Note on the "Reasoning" price: this rate applies specifically to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability enabled) when reasoning is active, and only to the tokens generated by that activity.
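To see how these three rates combine on a single request, here is a minimal sketch of the billing arithmetic in Python (the token counts are hypothetical):

```python
# Published rates, in euros per million tokens.
INPUT_RATE = 0.9       # input tokens
OUTPUT_RATE = 4.0      # output tokens
REASONING_RATE = 21.0  # reasoning tokens (reasoner/hybrid models, reasoning active)

def request_cost(input_tokens: int, output_tokens: int,
                 reasoning_tokens: int = 0) -> float:
    """Cost in euros of one request, given its token counts."""
    return (input_tokens * INPUT_RATE
            + output_tokens * OUTPUT_RATE
            + reasoning_tokens * REASONING_RATE) / 1_000_000

# Hypothetical request: 12,000 input tokens, 1,500 output tokens and
# 3,000 reasoning tokens on a model with reasoning enabled.
print(f"€{request_cost(12_000, 1_500, 3_000):.4f}")  # €0.0798
```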

Large models

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

28 tokens/second

Llama 3.3 70B

State-of-the-art multilingual model developed by Meta, designed to excel at natural dialogue, complex reasoning and nuanced understanding of instructions.
Combining remarkable efficiency with reduced computational requirements, this model offers extensive multilingual capabilities covering eight major languages (English, French, German, Spanish, Italian, Portuguese, Hindi and Thai). Its 60,000-token context window enables in-depth analysis of complex documents and long conversations while maintaining exceptional overall consistency. Optimised to minimise bias and problematic responses.
67 tokens/second

Gemma 3 27B

Google's revolutionary model offers an optimum balance between power and efficiency, with an exceptional performance/cost ratio for demanding professional applications.
With unrivalled hardware efficiency, this model incorporates native multimodal capabilities and delivers strong multilingual performance across more than 140 languages. Its impressive 120,000-token context window makes it the ideal choice for analysing very large documents, document retrieval and any application requiring understanding of extended contexts. Its optimised architecture allows flexible deployment without compromising the quality of results.
15 tokens/second

DeepSeek-R1 70B

DeepSeek AI's specialised model designed to excel at tasks requiring rigorous reasoning, algorithmic problem solving and high-quality code generation.
This model stands out for its superior reasoning performance, enabling it to tackle complex intellectual challenges with method and precision. Its increased operational efficiency optimises computing resources while maintaining exceptional results. Its versatility means it can be applied to a variety of practical fields, from science to business to engineering. Particularly noteworthy are its advanced mathematical capabilities, ideal for scientific and engineering applications requiring rigorous quantitative processing.
81.12 tokens/second

Qwen3 30B-A3B FP8

Next-generation MoE FP8 model (3B active parameters), with hybrid thinking modes and advanced agentic capabilities.
FP8 version of the MoE Qwen3 30B-A3B model. Includes a "Thinking" mode for complex reasoning and a fast "Non-Thinking" mode. Enhanced reasoning, code, maths and agent (tools/MCP) capabilities. Supports over 100 languages. Ideal for an optimal performance/cost balance.
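As an illustration of this hybrid operation, Qwen3 documents "/think" and "/no_think" soft switches that can be appended to a user message to toggle the mode per turn. A minimal sketch of two request payloads follows; the model identifier is hypothetical, and whether the switches are honoured depends on the serving configuration, which this page does not specify:

```python
# Hypothetical chat payloads toggling Qwen3's hybrid modes per request.
reasoning_request = {
    "model": "qwen3-30b-a3b-fp8",  # hypothetical identifier
    "messages": [{
        "role": "user",
        # "/think" engages the step-by-step Thinking mode
        # (those tokens are billed at the reasoning rate).
        "content": "Plan a three-step migration to IPv6. /think",
    }],
}
fast_request = {
    "model": "qwen3-30b-a3b-fp8",
    "messages": [{
        "role": "user",
        # "/no_think" forces the fast Non-Thinking mode.
        "content": "Summarise this paragraph in one sentence. /no_think",
    }],
}
```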

Specialised models

Our specialised models are optimised for specific tasks such as code generation, image analysis or structured data processing. They offer an excellent performance/cost ratio for targeted use cases.

74 tokens/second

Qwen3 14B

New-generation dense Qwen3 (14B) model, offering equivalent performance to Qwen2.5 32B with improved efficiency.
Part of the Qwen3 series, trained on ~36T tokens. Enhanced reasoning, coding, maths and agent (tools/MCP) capabilities. Supports over 100 languages and hybrid thinking modes.
76 tokens/second

Gemma 3 12B

An intermediate version of the Gemma 3 model offering an excellent balance between performance and efficiency.
This mid-sized model combines high-quality performance with operational efficiency, offering many of the capabilities of its larger 27B-parameter sibling in a lighter format. Ideal for deployments requiring quality and speed without the computational resources of larger models.
58 tokens/second

Gemma 3 4B

Google's compact model offering excellent performance in a lightweight, cost-effective format.
This compact version of the Gemma 3 is optimised for resource-constrained deployments while maintaining outstanding performance for its size. Its efficient architecture enables rapid inference on standard hardware, ideal for applications requiring responsiveness and large-scale deployment. Despite its reduced size, it maintains multimodal capabilities for processing both text and images.
43 tokens/second

Gemma 3 1B

Ultra-lightweight micro-model designed for deployment on very low-resource devices.
This ultra-compact model represents the epitome of efficiency, enabling deployments in extremely resource-constrained environments. Despite its minimal size, it handles simple to moderately complex text tasks surprisingly well, with exceptional inference speed. It also supports integration with external tools via function calling.
41 tokens/second

Lucie-7B-Instruct

Open-source multilingual causal model (7B), fine-tuned from Lucie-7B. Optimised for French.
Fine-tuned on synthetic instructions (generated with ChatGPT and Gemma) and custom prompts. Not optimised for code or maths. Trained with a 4k context window but retains the base model's 32k capacity. Model under active development.
22 tokens/second

Mistral Small 3.1

Mistral AI's compact and responsive model, specially designed to provide fluid and relevant conversational assistance with optimum response speed.
Despite its moderate size, this model delivers remarkable performance that rivals that of many much larger proprietary models. Its ingeniously optimised architecture makes it easy to deploy locally on a variety of infrastructures. With native multimodal capabilities, it can process both text and images without the need for external systems. Its Apache 2.0 licence offers maximum flexibility for commercial deployments and customisations, making it an ideal choice for businesses looking to balance performance and legal constraints.
69 tokens/second

DeepCoder

Open source AI model (14B) by Together AI & Agentica, a credible alternative to proprietary models for code generation.
Outstanding performance in code generation and algorithmic reasoning (60.6% LiveCodeBench Pass@1, 1936 Codeforces, 92.6% HumanEval+). Trained via RL (GRPO+) with progressive context extension (32k -> 64k). Transparent project (open code, dataset, logs). Allows integration of advanced code generation capabilities without relying on proprietary solutions.
56 tokens/second

Granite 3.2 Vision

IBM's revolutionary compact computer vision model, capable of directly analysing and understanding visual documents without the need for intermediate OCR technologies.
This compact model achieves the remarkable feat of matching the performance of much larger models across a wide range of visual comprehension tasks. Its ability to directly interpret the visual content of documents - text, tables, graphs and diagrams - without going through a traditional OCR stage represents a significant advance in terms of efficiency and accuracy. This integrated approach significantly reduces recognition errors and provides a more contextual and nuanced understanding of visual content.
28 tokens/second

Granite 3.3 8B

Granite 8B model fine-tuned by IBM for improved reasoning and instruction following, with a 128k-token context.
This 8B version of the Granite 3.3 model offers significant gains on generic benchmarks (AlpacaEval-2.0, Arena-Hard) and improvements in mathematics, coding and instruction following. It supports 12 languages, Fill-in-the-Middle (FIM) for code, a Thinking mode for structured reflection, and function calling. Apache 2.0 licence. Ideal for general tasks and integration into AI assistants.
57 tokens/second

Granite 3.3 2B

Granite 2B model fine-tuned by IBM, optimised for reasoning and instruction following, with a 128k-token context.
Compact version of Granite 3.3 (2B parameters) offering the same improvements in reasoning, instruction-following, mathematics and coding as version 8B. Supports 12 languages, Fill-in-the-Middle (FIM), Thinking mode, and function calling. Apache 2.0 licence. Excellent choice for lightweight deployments requiring extensive contextual and reasoning capabilities.
71 tokens/second

Granite 3.1 MoE

Innovative IBM model using the Mixture-of-Experts (MoE) architecture to deliver exceptional performance while drastically optimising the use of computational resources.
The MoE (Mixture-of-Experts) architecture of this model represents a significant advance in the optimisation of language models, enabling performance comparable to that of much larger models to be achieved while maintaining a considerably smaller memory footprint. This innovative approach dynamically activates only the relevant parts of the network for each specific task, ensuring remarkable energy and computational efficiency without compromising on the quality of results.
67 tokens/second

Cogito 14B

Deep Cogito model specifically designed to excel at deep reasoning and nuanced contextual understanding tasks, ideal for sophisticated analytical applications.
With excellent logical reasoning capabilities and deep semantic understanding, this model stands out for its ability to grasp the subtleties and implications of complex texts. Its design emphasises coherent reasoning and analytical precision, making it particularly well-suited to applications requiring careful, contextual analysis of information. Its moderate size allows flexible deployment while maintaining high quality performance across a wide range of demanding analytical tasks.
36 tokens/second

Cogito 32B

Advanced version of the Cogito model, offering considerably enhanced reasoning and analysis capabilities, designed for the most demanding applications in terms of analytical artificial intelligence.
This extended version of the Cogito model takes reasoning and comprehension capabilities even further, offering unrivalled depth of analysis for the most complex applications. Its sophisticated architectural design enables it to tackle multi-step reasoning with rigour and precision, while maintaining remarkable overall consistency. Ideal for mission-critical applications requiring artificial intelligence capable of nuanced reasoning and deep contextual understanding comparable to the analyses of human experts in specialist fields.
38 tokens/second

QwQ-32B

32-billion-parameter model enhanced by reinforcement learning (RL) to excel at reasoning, coding, maths and agent tasks.
This model uses an innovative RL approach with outcome-based rewards (accuracy checkers for maths, code execution for coding) and multi-step training to improve general abilities without degrading specialised performance. It includes agent capabilities for using tools and adapting reasoning. Apache 2.0 licence.
67 tokens/second

DeepSeek-R1 14B

A compact, efficient version of the DeepSeek-R1 model, offering an excellent compromise between performance and light weight for deployments requiring flexibility and responsiveness.
Representing an optimal balance between performance and efficiency, this compact version of the DeepSeek-R1 retains the key reasoning and analysis qualities of its larger counterpart, while enabling lighter and more flexible deployment. Its carefully optimised design ensures quality results across a wide range of tasks, while minimising computational resource requirements. This combination makes it the ideal choice for applications requiring agile deployment without major compromise on core capabilities.
37 tokens/second

DeepSeek-R1 32B

An intermediate version of the DeepSeek-R1 model, offering a strategic balance between the advanced capabilities of the 70B version and the efficiency of the 14B version, for optimum versatility and performance.
This mid-range version of the DeepSeek-R1 model intelligently combines power and efficiency, delivering significantly improved performance over the 14B version while maintaining a lighter footprint than the 70B version. This strategic position in the range makes it a particularly attractive option for deployments requiring advanced reasoning capabilities without the hardware requirements of larger models. Its versatility enables it to excel at a wide range of tasks, from text analysis to structured content generation.
63 tokens/second

Cogito 3B

Compact version of the Cogito model, optimised for reasoning on devices with limited resources.
Offers the reasoning capabilities of the Cogito family in a very lightweight format (3 billion parameters), ideal for embedded deployments or CPU environments.

Granite Embedding

IBM's ultra-light embedding model for semantic search and classification.
Designed to generate dense vector representations of text, this model is optimised for efficiency and performance in semantic similarity, clustering and classification tasks. Its small size makes it ideal for large-scale deployments.
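As an illustration of how an embedding model is typically consumed, the sketch below scores semantic closeness with cosine similarity; the four-dimensional vectors are hypothetical stand-ins for the dense representations the model would return:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings; real models return vectors with hundreds of dimensions.
documents = {
    "contract_clause": np.array([0.12, -0.53, 0.88, 0.07]),
    "weather_report":  np.array([-0.70, 0.31, -0.05, 0.64]),
}
query = np.array([0.11, -0.50, 0.90, 0.05])

# Rank documents by semantic closeness to the query.
ranked = sorted(documents,
                key=lambda name: cosine_similarity(query, documents[name]),
                reverse=True)
print(ranked[0])  # contract_clause
```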

Granite 3 Guardian 2B

IBM's compact model specialises in security and compliance, detecting risks and inappropriate content.
Lightweight version of the Guardian family, trained to identify and filter harmful content, bias and security risks in text interactions. Offers robust protection with a small computational footprint. Context limited to 8k tokens.

Granite 3 Guardian 8B

IBM model specialising in security and compliance, offering advanced risk detection capabilities.
Mid-sized model in the Guardian family, providing more in-depth security analysis than version 2B. Ideal for applications requiring rigorous content monitoring and strict compliance.
53 tokens/second

Qwen 2.5 0.5B

Ultra-lightweight micro-model from the Qwen 2.5 family, designed for maximum efficiency on constrained equipment.
The smallest model in the Qwen 2.5 series, offering basic language processing capabilities with a minimal footprint. Ideal for very simple tasks on IoT or mobile devices.
107 tokens/second

Qwen 2.5 1.5B

Very compact model from the Qwen 2.5 family, offering a good performance/size balance for light deployments.
Slightly larger model than version 0.5B, offering enhanced capabilities while remaining highly efficient. Suitable for mobile or embedded applications requiring a little more power.
68 tokens/second

Qwen 2.5 14B

Versatile, medium-sized model from the Qwen 2.5 family, good balance between performance and resources.
Offers strong multilingual capabilities and general understanding in a 14B format. Suitable for a wide range of applications requiring a reliable model without the requirements of very large models.
36 tokens/second

Qwen 2.5 32B

Powerful model from the Qwen 2.5 family, offering advanced understanding and generation capabilities.
Version 32B of Qwen 2.5, providing improved performance over version 14B, particularly in reasoning and following complex instructions, while remaining lighter than the 72B model.
57 tokens/second

Qwen 2.5 3B

Compact, efficient model from the Qwen 2.5 family, suitable for general tasks with limited resources.
Offers a good compromise between the capabilities of the 1.5B and 14B models. Ideal for applications requiring a good general understanding in a light, fast format.
58 tokens/second

Qwen3 0.6B

Compact, efficient model from the Qwen3 family, suitable for general-purpose tasks with limited resources.
Offers a good compromise between the capabilities of ultra-compact models and larger models. Ideal for applications requiring good general understanding in a light, fast format.
84 tokens/second

Qwen3 1.7B

A very compact model in the Qwen3 family, offering a good balance between performance and size for light deployments.
Slightly larger model than version 0.6B, offering enhanced capabilities while remaining highly efficient. Suitable for mobile or embedded applications requiring a little more power.
50 tokens/second

Qwen3 4B

Compact model in the Qwen3 family, offering excellent performance in a lightweight, cost-effective format.
This compact version of the Qwen3 model is optimised for resource-constrained deployments while maintaining outstanding performance for its size. Its efficient architecture enables rapid inference on standard hardware.
34 tokens/second

Qwen3 8B

Qwen3 8B model offering a good balance between performance and efficiency for general tasks.
The 8B version of Qwen3, offering enhanced reasoning, coding, maths and agent capabilities. Supports over 100 languages and hybrid thinking modes.
24 tokens/second

Foundation-Sec-8B

Specialised language model for cybersecurity, optimised for efficiency.
Foundation-Sec-8B model (Llama-3.1-FoundationAI-SecurityLLM-base-8B) based on Llama-3.1-8B, pre-trained on a cybersecurity corpus. Designed for threat detection, vulnerability assessment, security automation, etc. Optimised for local deployment. Context of 16k tokens.

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Model | Publisher | Parameters | Context (tokens)

Large models
Llama 3.3 70B | Meta | 70B | 60,000
Gemma 3 27B | Google | 27B | 120,000
DeepSeek-R1 70B | DeepSeek AI | 70B | 60,000
Qwen3 30B-A3B FP8 | Qwen Team | 30B-A3B | 60,000

Specialised models
Qwen3 14B | Qwen Team | 14B | 60,000
Gemma 3 12B | Google | 12B | 120,000
Gemma 3 4B | Google | 4B | 120,000
Gemma 3 1B | Google | 1B | 32,000
Lucie-7B-Instruct | OpenLLM-France | 7B | 32,000
Mistral Small 3.1 | Mistral AI | 24B | 60,000
DeepCoder | Agentica x Together AI | 14B | 32,000
Granite 3.2 Vision | IBM | 2B | 16,384
Granite 3.3 8B | IBM | 8B | 60,000
Granite 3.3 2B | IBM | 2B | 120,000
Granite 3.1 MoE | IBM | 3B | 32,000
Cogito 14B | Deep Cogito | 14B | 32,000
Cogito 32B | Deep Cogito | 32B | 32,000
QwQ-32B | Qwen Team | 32B | 32,000
DeepSeek-R1 14B | DeepSeek AI | 14B | 32,000
DeepSeek-R1 32B | DeepSeek AI | 32B | 32,000
Cogito 3B | Deep Cogito | 3B | 32,000
Granite Embedding | IBM | 278M | 32,000
Granite 3 Guardian 2B | IBM | 2B | 8,192
Granite 3 Guardian 8B | IBM | 8B | 32,000
Qwen 2.5 0.5B | Qwen Team | 0.5B | 32,000
Qwen 2.5 1.5B | Qwen Team | 1.5B | 32,000
Qwen 2.5 14B | Qwen Team | 14B | 32,000
Qwen 2.5 32B | Qwen Team | 32B | 32,000
Qwen 2.5 3B | Qwen Team | 3B | 32,000
Qwen3 0.6B | Qwen Team | 0.6B | 32,000
Qwen3 1.7B | Qwen Team | 1.7B | 32,000
Qwen3 4B | Qwen Team | 4B | 32,000
Qwen3 8B | Qwen Team | 8B | 60,000
Foundation-Sec-8B | Foundation AI - Cisco | 8B | 16,000
Legend and explanation
* Energy efficiency: indicates particularly low energy consumption (< 2.0 kWh/Mtoken)
* Fast: model capable of generating more than 50 tokens per second
Note on performance measurements
Speed values (tokens/s) are performance targets under real-world conditions. Energy consumption (kWh/Mtoken) is calculated by dividing the estimated power draw of the inference server (in watts) by the measured speed of the model (in tokens/second), then converting the result to kilowatt-hours per million tokens (division by 3.6). This method provides a practical comparison of the energy efficiency of the various models and should be used as a relative indicator rather than an absolute measure of electricity consumption.
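As a worked illustration of that conversion, here is a minimal sketch in Python (the 1,400 W server figure is hypothetical, not a published specification):

```python
def kwh_per_mtoken(server_power_watts: float, tokens_per_second: float) -> float:
    """W / (tokens/s) gives joules per token; over a million tokens that
    is the same number in megajoules, and 3.6 MJ = 1 kWh, hence the /3.6."""
    return server_power_watts / tokens_per_second / 3.6

# Hypothetical example: a 1,400 W inference server running a model at 67 tokens/s.
print(f"{kwh_per_mtoken(1400, 67):.2f} kWh/Mtoken")  # ~5.80 kWh/Mtoken
```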

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants capable of communicating in several languages, with automatic detection, context maintenance throughout the conversation and understanding of linguistic specificities.
Recommended models
  • Llama 3.3
  • Mistral Small 3.1
  • Qwen 2.5
  • Granite 3.3

Analysis of long documents

Processing of large documents (>100 pages), maintaining context throughout the text, extracting key information, generating relevant summaries and answering specific questions about the content.
Recommended models
  • Gemma 3
  • DeepSeek-R1
  • Granite 3.3

Programming and development

Generating and optimising code in multiple languages, debugging, refactoring, developing complete features, understanding complex algorithmic implementations and creating unit tests.
Recommended models
  • DeepCoder
  • QwQ
  • DeepSeek-R1
  • Granite 3.3

Visual analysis

Direct processing of images and visual documents without OCR pre-processing, interpretation of technical diagrams, graphs, tables, drawings and photos, with generation of detailed textual explanations of the visual content.
Recommended models
  • Granite 3.2 Vision
  • Mistral Small 3.1
  • Gemma 3

Security and compliance

Applications with strict security requirements, traceability of reasoning, GDPR/HDS/SecNumCloud verification, risk minimisation, vulnerability analysis and compliance with sector-specific regulations.
Recommended models
  • Granite Guardian
  • Granite 3.3
  • Lucie
  • Mistral Small 3.1

Lightweight and embedded deployments

Applications requiring a minimal resource footprint, deployment on capacity-constrained devices, real-time inference on standard CPUs, and integration into embedded or IoT systems.
Recommended models
  • Gemma 3
  • Granite 3.1 MoE
  • Granite Guardian
  • Granite 3.3