Our Large Language Model as a Service (LLMaaS) offering gives you access to cutting-edge language models, served from SecNumCloud-qualified, HDS-certified (healthcare data hosting) infrastructure operated in France, and therefore sovereign. Benefit from high performance and optimal security for your AI applications. Your data remains strictly confidential: it is neither exploited nor stored after processing.

Simple, transparent pricing
  • 0.90 € per million input tokens
  • 4 € per million output tokens
  • 21 € per million reasoning tokens
  • 0.01 € per minute of transcribed audio *
Computed on infrastructure based in France, SecNumCloud-qualified and HDS-certified.
Note on the "Reasoning" price: this rate applies specifically to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability enabled), when reasoning is active and only on the tokens generated by that activity.
* Every minute started is billed in full.
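To make these rates concrete, here is a minimal sketch (a hypothetical helper, not an official SDK) that estimates the cost of a single request under the prices above, including the reasoning-token rate and the per-started-minute rule for audio:

```python
# Hypothetical cost estimator for the prices listed above.
# Function and variable names are illustrative, not an official API.
import math

PRICE_INPUT_PER_MTOK = 0.90       # EUR per million input tokens
PRICE_OUTPUT_PER_MTOK = 4.00      # EUR per million output tokens
PRICE_REASONING_PER_MTOK = 21.00  # EUR per million reasoning tokens
PRICE_AUDIO_PER_MIN = 0.01        # EUR per started minute of transcribed audio

def estimate_cost(input_tokens=0, output_tokens=0, reasoning_tokens=0, audio_seconds=0.0):
    """Estimate the cost in EUR of one request under the rates above."""
    return (
        input_tokens / 1e6 * PRICE_INPUT_PER_MTOK
        + output_tokens / 1e6 * PRICE_OUTPUT_PER_MTOK
        # Reasoning tokens are billed only when reasoning is active
        + reasoning_tokens / 1e6 * PRICE_REASONING_PER_MTOK
        # Any minute started is billed in full, hence the ceiling
        + math.ceil(audio_seconds / 60) * PRICE_AUDIO_PER_MIN
    )

# Example: 12k input tokens, 2k output tokens, 5k reasoning tokens, 90 s of audio
print(f"{estimate_cost(12_000, 2_000, 5_000, 90):.4f} EUR")  # 0.1438 EUR
```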

Large models

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

38 tokens/second

gpt-oss:120b

OpenAI's state-of-the-art open-weight language model, offering solid performance with a flexible Apache 2.0 licence.
A Mixture-of-Experts (MoE) model with 120 billion parameters and around 5.1 billion active parameters. It offers a configurable reasoning effort and full access to the chain of thought.
15 tokens/second

llama3.3:70b

State-of-the-art multilingual model developed by Meta, designed to excel at natural dialogue, complex reasoning and nuanced understanding of instructions.
Combining remarkable efficiency with reduced computational requirements, this model offers extensive multilingual capabilities covering 8 major languages (English, French, German, Spanish, Italian, Portuguese, Hindi and Thai). Its 120,000-token context window enables in-depth analysis of complex documents and long conversations while maintaining exceptional overall consistency. It is also optimised to minimise bias and problematic responses.
17 tokens/second

qwen3:235b

High-volume model from the new Qwen3 generation, offering extended capabilities for the most complex tasks.
Part of the Qwen3 series. This 235-billion-parameter model is designed to excel at deep reasoning, complex code generation, and nuanced understanding across broad contexts. Supports over 100 languages and hybrid modes of thinking.
12 tokens/second

deepseek-r1:671b

Extremely large DeepSeek AI model, designed for the ultimate in reasoning and generation.
DeepSeek-R1 671B is one of the largest open models, designed for the most demanding reasoning tasks and for generating text of exceptional quality.
20 tokens/second

gemma3:27b

Google's revolutionary model offers an optimum balance between power and efficiency, with an exceptional performance/cost ratio for demanding professional applications.
With unrivalled hardware efficiency, this model incorporates native multimodal capabilities and excels in multilingual performance across more than 140 languages. Its impressive 120,000-token context window makes it the ideal choice for analysing very large documents, document research and any application requiring understanding of extended contexts. Its optimised architecture allows flexible deployment without compromising the quality of results.
80 tokens/second

qwen3-coder:30b

MoE model optimised for software engineering tasks with a very long context.
Advanced agentic capabilities for software engineering tasks, native support for a 250K token context, pre-trained on 7.5T tokens with a high code ratio, and optimised by reinforcement learning to improve code execution rates.
80 tokens/second

qwen3-2507-think:30b-a3b

Advanced model from the Qwen3 family, optimised for deep reasoning and extended contexts.
The Qwen3-30B-A3B-Thinking-2507 model offers significantly improved performance on reasoning tasks, following instructions and using tools. It natively supports a 256k token context window.
90 tokens/second

qwen3-2507:30b-a3b

Enhanced version of Qwen3-30B's non-thinking mode, with improved general capabilities, knowledge coverage and user alignment.
Significant improvements in following instructions, reasoning, reading comprehension, mathematics, coding and tool use. Native context of 250k tokens.
50 tokens/second

qwen3:30b-a3b

The latest generation of Qwen models, offering significant improvements in terms of training data, architecture and optimisation.
Pre-trained on 36T tokens in 119 languages. MoE (Mixture-of-Experts) model with 128 experts, of which 8 are activated per token.
21 tokens/second

deepseek-r1:70b

DeepSeek AI's 70B model.
DeepSeek-R1 70B is designed for complex reasoning and generation tasks.
18 tokens/second

qwen2.5vl:32b

A powerful version of the Qwen2.5-VL series, offering cutting-edge visual understanding and agentic capabilities.
This 32-billion-parameter vision-language model is designed for the most demanding tasks, combining deep visual understanding with advanced reasoning capabilities to interact with graphical interfaces and analyse complex documents.
15 tokens/second

qwen2.5vl:72b

The most powerful version of the Qwen2.5-VL series, offering state-of-the-art visual understanding and agentic capabilities for the most demanding tasks.
This 72-billion-parameter vision-language model is designed for the most demanding tasks, combining deep visual understanding with advanced reasoning capabilities to interact with graphical interfaces and analyse complex documents.

Specialised models

Our specialised models are optimised for specific tasks such as code generation, image analysis or structured data processing. They offer an excellent performance/cost ratio for targeted use cases.

embeddinggemma:300m

Google's state-of-the-art embedding model, optimised for its size, ideal for search and semantic retrieval tasks.
Built on Gemma 3, this model produces vector representations of text for classification, clustering and similarity search. Trained on over 100 languages, its small size makes it perfect for resource-constrained environments.
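To illustrate the retrieval use case, here is a minimal sketch of cosine-similarity ranking over embedding vectors. The 768-dimensional vectors are random placeholders standing in for embeddinggemma output (the dimensionality and the client call that would produce real vectors are assumptions, not part of this catalogue):

```python
# Minimal sketch: rank documents by cosine similarity of their embeddings.
# The vectors below are random placeholders standing in for real model output.
import numpy as np

rng = np.random.default_rng(0)
query_vec = rng.normal(size=768)       # embedding of the query (placeholder)
doc_vecs = rng.normal(size=(3, 768))   # embeddings of three documents (placeholders)

def cosine_similarity(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Cosine similarity between one vector and each row of a matrix."""
    return (docs @ query) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))

scores = cosine_similarity(query_vec, doc_vecs)
best = int(np.argmax(scores))
print(f"Best match: document {best} (score {scores[best]:.3f})")
```

In a real pipeline, the same ranking code applies unchanged; only the placeholder vectors are replaced by the model's actual embeddings.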
57 tokens/second

gpt-oss:20b

OpenAI's open-weight language model, optimised for efficiency and deployment on consumer hardware.
A Mixture-of-Experts (MoE) model with 21 billion parameters and 3.6 billion active parameters. It offers configurable reasoning effort and agent capabilities.
40 tokens/second

qwen3:14b

New-generation dense Qwen3 (14B) model, offering equivalent performance to Qwen2.5 32B with improved efficiency.
Part of the Qwen3 series, trained on ~36T tokens. Enhanced reasoning, coding, maths and agent (tools/MCP) capabilities. Supports over 100 languages and hybrid thinking modes.
56 tokens/second

gemma3:12b

An intermediate version of the Gemma 3 model offering an excellent balance between performance and efficiency.
This mid-sized model combines high-quality performance with operational efficiency, offering many of the capabilities of its larger 27B-parameter sibling in a lighter format. Ideal for deployments requiring quality and speed without the computational resources of larger models.
57 tokens/second

gemma3:4b

Google's compact model offering excellent performance in a lightweight, cost-effective format.
This compact version of the Gemma 3 is optimised for resource-constrained deployments while maintaining outstanding performance for its size. Its efficient architecture enables rapid inference on standard hardware, ideal for applications requiring responsiveness and large-scale deployment. Despite its reduced size, it maintains multimodal capabilities for processing both text and images.
112 tokens/second

gemma3:1b

Ultra-lightweight micro-model designed for deployment on very low-resource devices.
This ultra-compact model represents the epitome of efficiency, enabling deployments in extremely resource-constrained environments. Despite its minimal size, it is surprisingly capable on simple to moderate text tasks, with exceptional inference speed. It also supports integration with external tools via function calling.
4 tokens/second

lucie-instruct:7b

Open-source multilingual causal model (7B), fine-tuned from Lucie-7B. Optimised for French.
Fine-tuned on synthetic instructions (ChatGPT, Gemma) and custom prompts. Not optimised for code/maths. Trained on a 4k context but retains the capacity of the base model for 32k. Model under development.
35 tokens/second

mistral-small3.1:24b

Mistral AI's compact and responsive model, specially designed to provide fluid and relevant conversational assistance with optimum response speed.
Despite its moderate size, this model delivers remarkable performance that rivals that of many much larger proprietary models. Its ingeniously optimised architecture makes it easy to deploy locally on a variety of infrastructures. With native multimodal capabilities, it can process both text and images without the need for external systems. Its Apache 2.0 licence offers maximum flexibility for commercial deployments and customisations, making it an ideal choice for businesses looking to balance performance and legal constraints.
35 tokens/second

mistral-small3.2:24b

Minor update to Mistral Small 3.1, improving instruction following, strengthening function-calling robustness and reducing repetition errors.
This version 3.2 retains the strengths of its predecessor while making targeted improvements. It follows precise instructions more faithfully, produces fewer infinite generations or repetitive responses, and its function calling is more robust. In other respects, its performance is equivalent to or slightly better than version 3.1.
64 tokens/second

deepcoder:14b

Open source AI model (14B) by Together AI & Agentica, a credible alternative to proprietary models for code generation.
Outstanding performance in code generation and algorithmic reasoning (60.6% LiveCodeBench Pass@1, 1936 Codeforces, 92.6% HumanEval+). Trained via RL (GRPO+) with progressive context extension (32k -> 64k). Transparent project (open code, dataset, logs). Allows integration of advanced code generation capabilities without relying on proprietary solutions.
48 tokens/second

granite3.2-vision:2b

IBM's revolutionary compact computer vision model, capable of directly analysing and understanding visual documents without the need for intermediate OCR technologies.
This compact model achieves the remarkable feat of matching the performance of much larger models across a wide range of visual comprehension tasks. Its ability to directly interpret the visual content of documents - text, tables, graphs and diagrams - without going through a traditional OCR stage represents a significant advance in terms of efficiency and accuracy. This integrated approach significantly reduces recognition errors and provides a more contextual and nuanced understanding of visual content.
30 tokens/second

granite3.3:8b

Granite 8B model fine-tuned by IBM for improved reasoning and instruction following, with a 128k-token context.
This 8B version of the Granite 3.3 model offers significant gains on generic benchmarks (AlpacaEval-2.0, Arena-Hard) and improvements in mathematics, coding and instruction following. It supports 12 languages, Fill-in-the-Middle (FIM) for code, a Thinking mode for structured reflection, and function calling. Apache 2.0 licence. Ideal for general tasks and integration into AI assistants.
45 tokens/second

granite3.3:2b

Granite 2B model fine-tuned by IBM, optimised for reasoning and instruction following, with a 128k-token context.
Compact version of Granite 3.3 (2B parameters) offering the same improvements in reasoning, instruction-following, mathematics and coding as version 8B. Supports 12 languages, Fill-in-the-Middle (FIM), Thinking mode, and function calling. Apache 2.0 licence. Excellent choice for lightweight deployments requiring extensive contextual and reasoning capabilities.
25 tokens/second

magistral:24b

Mistral AI's first reasoning model, excelling at transparent, multilingual, domain-specific reasoning.
Ideal for general use requiring longer thought processing and greater accuracy. Useful for legal research, financial forecasting, software development and creative storytelling. Solves multi-step challenges where transparency and accuracy are essential.
74 tokens/second

granite3.1-moe:3b

Innovative IBM model using the Mixture-of-Experts (MoE) architecture to deliver exceptional performance while drastically optimising the use of computational resources.
The MoE (Mixture-of-Experts) architecture of this model represents a significant advance in the optimisation of language models, enabling performance comparable to that of much larger models to be achieved while maintaining a considerably smaller memory footprint. This innovative approach dynamically activates only the relevant parts of the network for each specific task, ensuring remarkable energy and computational efficiency without compromising on the quality of results.
60 tokens/second

cogito:14b

Deep Cogito model specifically designed to excel at deep reasoning and nuanced contextual understanding tasks, ideal for sophisticated analytical applications.
With excellent logical reasoning capabilities and deep semantic understanding, this model stands out for its ability to grasp the subtleties and implications of complex texts. Its design emphasises coherent reasoning and analytical precision, making it particularly well-suited to applications requiring careful, contextual analysis of information. Its moderate size allows flexible deployment while maintaining high quality performance across a wide range of demanding analytical tasks.
32 tokens/second

cogito:32b

Advanced version of the Cogito model, offering considerably enhanced reasoning and analysis capabilities, designed for the most demanding applications in terms of analytical artificial intelligence.
This extended version of the Cogito model takes reasoning and comprehension capabilities even further, offering unrivalled depth of analysis for the most complex applications. Its sophisticated architectural design enables it to tackle multi-step reasoning with rigour and precision, while maintaining remarkable overall consistency. Ideal for mission-critical applications requiring artificial intelligence capable of nuanced reasoning and deep contextual understanding comparable to the analyses of human experts in specialist fields.
18 tokens/second

qwen3:32b

Powerful next-generation Qwen3 model, offering advanced reasoning, code and agentic capabilities, with extended context.
Part of the Qwen3 series, trained on a vast corpus of data. This 32-billion-parameter model is designed to excel at complex tasks, support over 100 languages and incorporate hybrid modes of thinking for improved performance.
35 tokens/second

qwq:32b

32-billion-parameter model enhanced by reinforcement learning (RL) to excel at reasoning, coding, maths and agent tasks.
This model uses an innovative RL approach with outcome-based rewards (accuracy checkers for maths, code execution for coding) and multi-step training to improve general abilities without degrading specialised performance. It includes agent capabilities for using tools and adapting reasoning. Apache 2.0 licence.
62 tokens/second

deepseek-r1:14b

A compact, efficient version of the DeepSeek-R1 model, offering an excellent compromise between performance and light weight for deployments requiring flexibility and responsiveness.
Representing an optimal balance between performance and efficiency, this compact version of the DeepSeek-R1 retains the key reasoning and analysis qualities of its larger counterpart, while enabling lighter and more flexible deployment. Its carefully optimised design ensures quality results across a wide range of tasks, while minimising computational resource requirements. This combination makes it the ideal choice for applications requiring agile deployment without major compromise on core capabilities.
33 tokens/second

deepseek-r1:32b

An intermediate version of the DeepSeek-R1 model, offering a strategic balance between the advanced capabilities of the 70B version and the efficiency of the 14B version, for optimum versatility and performance.
This mid-range version of the DeepSeek-R1 model intelligently combines power and efficiency, delivering significantly improved performance over the 14B version while maintaining a lighter footprint than the 70B version. This strategic position in the range makes it a particularly attractive option for deployments requiring advanced reasoning capabilities without the hardware requirements of larger models. Its versatility enables it to excel at a wide range of tasks, from text analysis to structured content generation.
55 tokens/second

cogito:3b

Compact version of the Cogito model, optimised for reasoning on devices with limited resources.
Offers the reasoning capabilities of the Cogito family in a very lightweight format (3 billion parameters), ideal for embedded deployments or CPU environments.

granite-embedding:278m

IBM's ultra-light embedding model for semantic search and classification.
Designed to generate dense vector representations of text, this model is optimised for efficiency and performance in semantic similarity, clustering and classification tasks. Its small size makes it ideal for large-scale deployments.

granite3-guardian:2b

IBM's compact model specialises in security and compliance, detecting risks and inappropriate content.
Lightweight version of the Guardian family, trained to identify and filter harmful content, bias and security risks in text interactions. Offers robust protection with a small computational footprint. Context limited to 8k tokens.

granite3-guardian:8b

IBM model specialising in security and compliance, offering advanced risk detection capabilities.
Mid-sized model in the Guardian family, providing more in-depth security analysis than version 2B. Ideal for applications requiring rigorous content monitoring and strict compliance.
162 tokens/second

qwen2.5:0.5b

Ultra-lightweight micro-model from the Qwen 2.5 family, designed for maximum efficiency on constrained equipment.
The smallest model in the Qwen 2.5 series, offering basic language processing capabilities with a minimal footprint. Ideal for very simple tasks on IoT or mobile devices.
102 tokens/second

qwen2.5:1.5b

Very compact model from the Qwen 2.5 family, offering a good performance/size balance for light deployments.
Slightly larger model than version 0.5B, offering enhanced capabilities while remaining highly efficient. Suitable for mobile or embedded applications requiring a little more power.
61 tokens/second

qwen2.5:14b

Versatile, medium-sized model from the Qwen 2.5 family, good balance between performance and resources.
Offers strong multilingual capabilities and general understanding in a 14B format. Suitable for a wide range of applications requiring a reliable model without the requirements of very large models.
31 tokens/second

qwen2.5:32b

Powerful model from the Qwen 2.5 family, offering advanced understanding and generation capabilities.
Version 32B of Qwen 2.5, providing improved performance over version 14B, particularly in reasoning and following complex instructions, while remaining lighter than the 72B model.
64 tokens/second

qwen2.5:3b

Compact, efficient model from the Qwen 2.5 family, suitable for general tasks with limited resources.
Offers a good compromise between the capabilities of the 1.5B and 14B models. Ideal for applications requiring a good general understanding in a light, fast format.
112 tokens/second

qwen3:0.6b

Compact, efficient model from the Qwen3 family, suitable for general-purpose tasks with limited resources.
Offers a good compromise between the capabilities of ultra-compact models and larger models. Ideal for applications requiring good general understanding in a light, fast format.
88 tokens/second

qwen3:1.7b

A very compact model in the Qwen3 family, offering a good balance between performance and size for light deployments.
Slightly larger model than version 0.6B, offering enhanced capabilities while remaining highly efficient. Suitable for mobile or embedded applications requiring a little more power.
49 tokens/second

qwen3:4b

Compact model in the Qwen3 family, offering excellent performance in a lightweight, cost-effective format.
This compact version of the Qwen3 model is optimised for resource-constrained deployments while maintaining outstanding performance for its size. Its efficient architecture enables rapid inference on standard hardware.
70 tokens/second

qwen3-2507-think:4b

Qwen3-4B model optimised for reasoning, with improved performance on logic, maths, science and code tasks, and extended context to 250K tokens.
This 'Thinking' version has an increased thought length, making it ideal for highly complex reasoning tasks. It also offers general improvements in following instructions, using tools and generating text.
70 tokens/second

qwen3-2507:4b

Updated version of Qwen3-4B's non-thinking mode, with significant improvements in overall capabilities, extended knowledge coverage and better alignment with user preferences.
Significant improvements in following instructions, logical reasoning, reading comprehension, mathematics, coding and tool use. Native context of 250k tokens.
33 tokens/second

qwen3:8b

Qwen3 8B model offering a good balance between performance and efficiency for general tasks.
Version 8B of Qwen3, offering enhanced reasoning, coding, maths and agent capabilities. Supports over 100 languages and hybrid thinking modes.
65 tokens/second

qwen2.5vl:3b

Compact Vision-Language model, a high-performance solution for edge AI.
Qwen2.5-VL is Qwen's new flagship vision-language model, marking a significant advance on Qwen2-VL. Key features: visual understanding (common objects, text, graphics, icons, layouts); visual agent capabilities (reasoning, dynamic tool use for computer and phone control); precise visual localisation (bounding boxes, points, stable JSON output); and structured output generation (invoices, forms, tables). Qwen2.5-VL-3B even outperforms the 7B version of Qwen2-VL.
35 tokens/second

qwen2.5vl:7b

High-performance Vision-Language model, outperforming GPT-4o-mini on certain tasks.
Qwen2.5-VL is Qwen's new flagship vision-language model, marking a significant advance on Qwen2-VL. Key features: visual understanding (common objects, text, graphics, icons, layouts); visual agent capabilities (reasoning, dynamic tool use for computer and phone control); precise visual localisation (bounding boxes, points, stable JSON output); and structured output generation (invoices, forms, tables). Qwen2.5-VL-7B-Instruct outperforms GPT-4o-mini on many tasks, and is particularly strong at understanding documents and diagrams.
21 tokens/second

hf.co/roadus/Foundation-Sec-8B-Q4_K_M-GGUF:Q4_K_M

Specialised language model for cybersecurity, optimised for efficiency.
Foundation-Sec-8B model (Llama-3.1-FoundationAI-SecurityLLM-base-8B) based on Llama-3.1-8B, pre-trained on a cybersecurity corpus. Designed for threat detection, vulnerability assessment, security automation, etc. Optimised for local deployment. Context of 16k tokens.
45 tokens/second

devstral:24b

Devstral is an agentic LLM for software engineering tasks.
It excels at using tools to explore codebases, modifying multiple files and powering software engineering agents. It is fine-tuned from Mistral Small 3.1 and features a long context window of up to 128k tokens.
30 tokens/second

cogito:8b

An intermediate-sized model in the Cogito family, offering a good balance between reasoning capabilities and efficiency.
This version 8B is positioned between compact and larger models, offering robust reasoning capabilities for a wide range of analytical applications without requiring the resources of larger models.
31 tokens/second

llama3.1:8b

The basic model in the Llama 3.1 family, offering solid performance for its size.
Based on the Llama 3.1 architecture, this 8B model is an excellent starting point for general tasks, offering good quality generation and comprehension in an efficient format.
71 tokens/second

phi4-reasoning:14b

Part of Microsoft's Phi family, specialising in complex reasoning and mathematics.
This model is specifically trained to excel at tasks that require multi-step logical reasoning, making it particularly good at maths, logic and coding problems.

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Comparative table of the characteristics and performance of the various AI models available, grouped by category (large models and specialised models).
Model Publisher Parameters Context (tokens) Vision Agent Reasoning Security Quick * Energy efficiency *
Large models
gpt-oss:120b OpenAI 120B 120000
llama3.3:70b Meta 70B 120000
qwen3:235b Qwen Team 235B 60000
deepseek-r1:671b DeepSeek AI 671B 16000
gemma3:27b Google 27B 120000
qwen3-coder:30b Qwen Team 30B 250000
qwen3-2507-think:30b-a3b Qwen Team 30B 120000
qwen3-2507:30b-a3b Qwen Team 30B 250000
qwen3:30b-a3b Qwen Team 30B 32000
deepseek-r1:70b DeepSeek AI 70B 32000
qwen2.5vl:32b Qwen Team 32B 120000
qwen2.5vl:72b Qwen Team 72B 128000
Specialised models
embeddinggemma:300m Google 300M 2048 N.C.
gpt-oss:20b OpenAI 20B 120000
qwen3:14b Qwen Team 14B 32000
gemma3:12b Google 12B 120000
gemma3:4b Google 4B 120000
gemma3:1b Google 1B 32000
lucie-instruct:7b OpenLLM-France 7B 32000
mistral-small3.1:24b Mistral AI 24B 120000
mistral-small3.2:24b Mistral AI 24B 120000
deepcoder:14b Agentica x Together AI 14B 32000
granite3.2-vision:2b IBM 2B 16384
granite3.3:8b IBM 8B 60000
granite3.3:2b IBM 2B 120000
magistral:24b Mistral AI 24B 40000
granite3.1-moe:3b IBM 3B 32000
cogito:14b Deep Cogito 14B 32000
cogito:32b Deep Cogito 32B 32000
qwen3:32b Qwen Team 32B 40000
qwq:32b Qwen Team 32B 32000
deepseek-r1:14b DeepSeek AI 14B 32000
deepseek-r1:32b DeepSeek AI 32B 32000
cogito:3b Deep Cogito 3B 32000
granite-embedding:278m IBM 278M 512 N.C.
granite3-guardian:2b IBM 2B 8192 N.C.
granite3-guardian:8b IBM 8B 32000 N.C.
qwen2.5:0.5b Qwen Team 0.5B 32000
qwen2.5:1.5b Qwen Team 1.5B 32000
qwen2.5:14b Qwen Team 14B 32000
qwen2.5:32b Qwen Team 32B 32000
qwen2.5:3b Qwen Team 3B 32000
qwen3:0.6b Qwen Team 0.6B 32000
qwen3:1.7b Qwen Team 1.7B 32000
qwen3:4b Qwen Team 4B 32000
qwen3-2507-think:4b Qwen Team 4B 250000
qwen3-2507:4b Qwen Team 4B 250000
qwen3:8b Qwen Team 8B 32000
qwen2.5vl:3b Qwen Team 3.8B 128000
qwen2.5vl:7b Qwen Team 7B (8.3B) 128000
hf.co/roadus/Foundation-Sec-8B-Q4_K_M-GGUF:Q4_K_M Foundation AI - Cisco 8B 16384
devstral:24b Mistral AI & All Hands AI 24B 120000
cogito:8b Deep Cogito 8B 32000
llama3.1:8b Meta 8B 32000
phi4-reasoning:14b Microsoft 14B 32000
Legend and explanation
✓ Functionality or capability supported by the model
✗ Functionality or capability not supported by the model
* Energy efficiency: indicates particularly low energy consumption (< 2.0 kWh/Mtoken)
* Quick: model capable of generating more than 50 tokens per second
Note on performance measures
The speed values (tokens/s) represent performance targets under real-world conditions. Energy consumption (kWh/Mtoken) is estimated by dividing the inference server's estimated power draw (in watts) by the model's measured speed (in tokens/second), which gives joules per token; scaling to a million tokens and converting joules to kilowatt-hours then amounts to dividing that figure by 3.6. This method offers a practical comparison of the energy efficiency of different models, and should be read as a relative indicator rather than an absolute measure of power consumption.
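As a worked example of this method, assume a hypothetical inference server drawing 2,000 W while serving a model at 80 tokens/second (both figures are illustrative, not measurements from this catalogue):

```python
# Worked example of the energy-efficiency estimate described above.
# The 2,000 W server power and 80 tok/s speed are hypothetical figures.
server_power_w = 2_000.0   # estimated power draw of the inference server (W)
speed_tok_s = 80.0         # measured speed of the model (tokens/second)

joules_per_token = server_power_w / speed_tok_s   # W / (tok/s) = J per token
kwh_per_mtoken = joules_per_token * 1e6 / 3.6e6   # 1e6 tokens; 3.6e6 J per kWh

# Equivalent shortcut used in the note above: divide J/token by 3.6
assert abs(kwh_per_mtoken - joules_per_token / 3.6) < 1e-9

print(f"{kwh_per_mtoken:.2f} kWh per million tokens")  # 6.94 kWh/Mtoken
```

At 6.94 kWh/Mtoken, such a model would sit well above the < 2.0 kWh/Mtoken threshold used for the energy-efficiency marker in the table above.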

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants capable of communicating in several languages, with automatic detection, context maintenance throughout the conversation and understanding of linguistic specificities.
Recommended models
  • Llama 3.3
  • Mistral Small 3.2
  • Qwen 3
  • Granite 3.3

Analysis of long documents

Processing of large documents (>100 pages), maintaining context throughout the text, extracting key information, generating relevant summaries and answering specific content questions
Recommended models
  • Gemma 3
  • Qwen3
  • Granite 3.3

Programming and development

Generating and optimising code in multiple languages, debugging, refactoring, developing complete functionalities, understanding complex algorithmic implementations and creating unit tests
Recommended models
  • DeepCoder
  • QwQ
  • Qwen3 Coder
  • Granite 3.3
  • Devstral

Visual analysis

Direct processing of images and visual documents without OCR pre-processing, interpretation of technical diagrams, graphs, tables, drawings and photos with generation of detailed textual explanations of the visual content
Recommended models
  • Granite 3.2 Vision
  • Mistral Small 3.2
  • Gemma 3
  • Qwen2.5-VL

Security and compliance

Applications requiring specific security capabilities: filtering of sensitive content, traceability of reasoning, GDPR/HDS compliance checks, risk minimisation, vulnerability analysis and compliance with sectoral regulations
Recommended models
  • Granite Guardian
  • Granite 3.3
  • Devstral
  • Mistral Small 3.1
  • Magistral 24b
  • Foundation-Sec-8B

Lightweight and embedded deployments

Applications requiring a minimal resource footprint, deployment on capacity-constrained devices, real-time inference on standard CPUs and integration into embedded or IoT systems
Recommended models
  • Gemma 3
  • Granite 3.1 MoE
  • Granite Guardian
  • Granite 3.3