Compute
High-performance, scalable computing resources for your critical workloads. Orchestrate your cloud-native applications with our modern container solutions.
Discover the Compute offer
Virtual machines
VM Instances
An on-demand, flexible and secure virtual machine solution on a shared infrastructure.
Dedicated servers
OpenSource IaaS
Open source virtualised infrastructure in a trusted SecNumCloud-qualified cloud environment for complete technological sovereignty.
VMware IaaS
Your VMware virtual machines in a trusted SecNumCloud-qualified and HDS-certified cloud environment.
Bare Metal
Dedicated, fully customisable servers for total autonomy over your sovereign infrastructure.
Containers
Openshift PaaS
The unified platform for creating, modernising and deploying your large-scale applications in a sovereign cloud.
Managed Kubernetes
Managed container orchestration solution offering security, resilience and advanced automation on sovereign infrastructure.
Storage
Adaptable, high-performance storage solutions for all your needs. Optimise your data with our highly available block and object solutions.
Discover our Storage offer
Storage
Block storage
The adaptable block storage solution for optimum storage performance in a sovereign cloud.
Object storage
The scalable, cost-effective storage solution for your unstructured data in a sovereign cloud.
Backup
Backup solutions
Differentiated backup solutions tailored to your challenges and environments
Network
Advanced network solutions to connect and secure your infrastructures. Deploy your private networks automatically and securely.
Discover the Network offer
Network
Virtual Private Cloud
Deploy and manage your private networks 100% automatically and securely.
Private Backbone
Take full control of your network with extended Layer 2 connectivity, designed for hybrid architectures and bespoke configurations.
Firewall
Managed Firewall
Advanced security solutions for complete isolation and enhanced protection
Hosting
Housing - Dedicated space
Secure hosting for your equipment in a dedicated or shared environment, depending on your needs.
Security
Advanced security solutions to protect your critical infrastructures. Control access and defend against online threats.
Discover the Security offer
Security
Anti DDoS
The shield against online attacks
Bastion host
Transparent, centralised access control for robust protection of your infrastructure
Managed KMS
Sovereign cryptographic key management, with HSM hardware root of trust, to protect your most sensitive data on SecNumCloud infrastructure.
Managed SIEM
A centralised platform for collecting and correlating security logs, combining AI-based automation and advanced detection rules (MITRE ATT&CK).
AI
Artificial intelligence solutions to transform your data into insights and accelerate your business processes.
Discover the AI offer
AI
LLMaaS
Access cutting-edge language models on a sovereign, SecNumCloud-qualified and HDS-certified infrastructure for high-performance, secure AI applications.
GPU
NVIDIA GPU instances to accelerate your artificial intelligence and high-performance computing in a sovereign cloud.
Data
Data solutions to manage, analyse and exploit your critical data.
Discover the Data offer
Databases
Managed MariaDB
A fully managed MariaDB relational database with PITR (point-in-time recovery) backup on SecNumCloud sovereign infrastructure.
Managed PostgreSQL
The fully managed relational database solution on SecNumCloud sovereign infrastructure
Big Data
Managed Kafka
The open-source distributed platform for streaming data in real time
Managed File System
A managed, sovereign, high-availability distributed file system, accessible via NFS and SMB on the SecNumCloud infrastructure.
Management & Governance
Coaching and support services to help you with your cloud transformation.
Find out about our support services
Support
Support levels
Discover the 3 levels of support available to help you meet your challenges.
Professional services
From design to optimisation, Cloud Temple is with you every step of the way.
Governance
Console - API - Terraform Provider
A single interface for viewing and managing your products and services
Observability
Infrastructure metrics exposed in market-standard formats

Our Large Language Model as a Service (LLMaaS) offering gives you access to cutting-edge language models, with inference running on SecNumCloud-qualified infrastructure that is HDS-certified for healthcare data hosting and located in France, and therefore sovereign. Benefit from high performance and optimal security for your AI applications. Your data remains strictly confidential: it is neither exploited nor stored after processing.

Simple, transparent pricing
1.8 €
per million input tokens
8 €
per million output tokens
8 €
per million reasoning tokens
0.01 €
per minute of transcribed audio *
Inference runs on infrastructure based in France, SecNumCloud-qualified and HDS-certified.
Note on the "Reasoning" price: this price applies specifically to models classified as "reasoners" or "hybrids" (models with the "Reasoning" capability activated), when reasoning is active, and only to the tokens produced by this activity.
* Any minute started is counted in full.
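The pricing above can be sketched as a simple per-token calculation. This is an illustrative estimate only: the function and variable names are examples, not part of the actual API, and reasoning tokens apply only to "reasoner"/"hybrid" models with the Reasoning capability active.

```python
# Illustrative cost estimate based on the published per-million-token prices.
PRICE_INPUT_EUR = 1.8      # € per million input tokens
PRICE_OUTPUT_EUR = 8.0     # € per million output tokens
PRICE_REASONING_EUR = 8.0  # € per million reasoning tokens

def request_cost_eur(input_tokens: int, output_tokens: int,
                     reasoning_tokens: int = 0) -> float:
    """Estimate the cost in euros of a single request."""
    return (input_tokens * PRICE_INPUT_EUR
            + output_tokens * PRICE_OUTPUT_EUR
            + reasoning_tokens * PRICE_REASONING_EUR) / 1_000_000

# Example: 12,000 input tokens and 1,500 output tokens, no reasoning.
cost = request_cost_eur(12_000, 1_500)
print(f"{cost:.4f} €")  # 0.0336 €
```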

Large models

Our large models offer state-of-the-art performance for the most demanding tasks. They are particularly well-suited to applications requiring a deep understanding of language, complex reasoning or the processing of long documents.

Specialised models

Our specialised models are optimised for specific tasks such as code generation, image analysis or structured data processing. They offer an excellent performance/cost ratio for targeted use cases.

22 tokens/second

ministral-3:3b

Mistral AI's cutting-edge compact model, designed for efficiency in local and edge deployments.
Despite its small size, this model offers surprising performance for conversational tasks and simple reasoning. Ideal for mobile devices.
40 tokens/second

ministral-3:8b

Mid-sized model in the Ministral family, offering an optimal balance between performance and resources.
Version 8B is more robust, capable of handling longer contexts and more complex reasoning, while remaining very fast.
40 tokens/second

functiongemma:270m

Gemma micro-model specialising in function calling and detection of tool call intentions.
FunctionGemma 270M is an ultra-compact model optimised for identifying and formatting function calls. Ideal as a router or pre-filter in a multi-model agentic architecture.
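The router/pre-filter pattern described above can be sketched as follows: a compact function-calling model emits a structured call, which the application parses and dispatches to the matching tool. The JSON shape and tool names here are illustrative assumptions, not FunctionGemma's actual output format.

```python
# Hypothetical dispatch of a model-emitted function call in an agentic setup.
import json

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # hypothetical tool
    "get_time": lambda tz: f"12:00 in {tz}",         # hypothetical tool
}

def dispatch(model_output: str) -> str:
    """Parse a JSON function call and invoke the matching tool."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
# Sunny in Paris
```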
49 tokens/second

granite3.2-vision:2b

IBM Granite compact multimodal model, specialising in the analysis of visual documents.
Granite 3.2 Vision 2B is a lightweight yet powerful model for OCR, data extraction from scanned documents and image analysis. Ideal for low-latency vision tasks.

qwen3-embedding:0.6b

Ultra-light Qwen3 embedding model, optimised for speed and efficiency on resource-limited infrastructures.
Offers an excellent compromise between semantic performance and speed of execution.
196.3 tokens/second

granite-embedding:278m

Ultra-compact IBM Granite embedding model, designed for maximum efficiency.
Ideal for semantic search tasks requiring minimal latency.

qwen3-embedding:4b

High-performance Qwen3-4B embedding model, offering deep semantic understanding and an extended context window.
Context of 40,000 tokens for processing large documents.
171 tokens/second

bge-m3:567m

State-of-the-art multilingual embedding model (BGE-M3), offering exceptional semantic search capabilities in over 100 languages.
Context of 8192 tokens. Supports dense, sparse and multi-vector search methods.
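Dense semantic search with an embedding model such as bge-m3 works by embedding documents and the query into vectors, then ranking documents by cosine similarity. A minimal sketch, using toy stand-in vectors rather than real model-produced embeddings:

```python
# Toy dense retrieval: rank documents by cosine similarity to a query vector.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc_vectors = {
    "doc_fr": [0.9, 0.1, 0.0],  # toy embedding, not a real model output
    "doc_en": [0.1, 0.9, 0.1],  # toy embedding, not a real model output
}
query = [0.85, 0.15, 0.05]

best = max(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]))
print(best)  # doc_fr
```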
175 tokens/second

embeddinggemma:300m

Google's state-of-the-art embedding model, optimised for its size, ideal for search and semantic retrieval tasks.
Built on Gemma 3, this model produces vector representations of text for classification, clustering and similarity search. Trained on over 100 languages, its small size makes it perfect for resource-constrained environments.
57 tokens/second

gpt-oss:20b

OpenAI's open-weight language model, optimised for efficiency and deployment on consumer hardware.
A Mixture-of-Experts (MoE) model with 21 billion parameters and 3.6 billion active parameters. It offers configurable reasoning effort and agent capabilities.
55 tokens/second

qwen3-2507-think:4b

Qwen3-4B model optimised for reasoning, with improved performance on logic, maths, science and code tasks, and extended context to 250K tokens.
This 'Thinking' version has an increased thought length, making it ideal for highly complex reasoning tasks. It also offers general improvements in following instructions, using tools and generating text.
22 tokens/second

rnj-1:8b

An 8B open-weight model specialising in coding, mathematics and science (STEM).
RNJ-1 is a dense model with 8.3B parameters trained on 8.4T tokens. It uses global attention and YaRN to provide a context of 32k tokens. It excels at code generation (83.5% HumanEval+) and mathematical reasoning, often outperforming much larger models.
64 tokens/second

qwen3-vl:2b

Ultra-compact multimodal Qwen3-VL model, bringing advanced vision capabilities to edge devices.
Despite its small size, this model incorporates Qwen3-VL technologies (MRoPE, DeepStack) to deliver impressive image and video analysis. Ideal for mobile or embedded applications requiring OCR, object detection or rapid visual understanding.
49 tokens/second

qwen3-vl:4b

Balanced Qwen3-VL multimodal model, offering robust vision performance with a small footprint.
Excellent compromise between performance and resources. Capable of analysing complex documents, graphics and videos with high accuracy. Supports structured extraction and visual reasoning.
16 tokens/second

qwen3.5:0.8b

Ultra-light Qwen3.5 model with 0.8 billion parameters, offering an exceptional native context of 250K tokens - a remarkable capacity for a model of this size.
Context configured to 250,000 tokens (native max context 262,144). Ideal for fast conversational tasks requiring a very long history or analysis of large documents with a small memory footprint.
37 tokens/second

qwen3.5:4b

Compact Qwen3.5 model with 4 billion parameters, offering a good compromise between performance and efficiency.
Context of 250k tokens. Good candidate for local assistants and light reasoning tasks.
32 tokens/second

qwen3.5:9b

Qwen3.5 model of intermediate size, offering solid reasoning capabilities with an extended context.
Context of 250k tokens. Offers a good balance between generation quality and inference speed.
46 tokens/second

qwen3:0.6b

Ultra-light Qwen3 model with 0.6 billion parameters, offering exceptional inference speed for fast, simple tasks.
Ideal for deployment on lightweight servers or as the first level of processing for complex workflows. Configured with a context of 40,000 tokens.
39 tokens/second

qwen3-vl:8b

Qwen3-VL multimodal model (8B), offering advanced vision performance with a reasonable footprint.
Version 8B of the Qwen3-VL model. Excellent compromise between performance and resources. Capable of analysing complex documents, graphics and video with high accuracy.
33 tokens/second

devstral-small-2:24b

Second iteration of Devstral (Small 2), a state-of-the-art agentic model for software engineering.
Optimised for codebase exploration, multi-file editing and tool use. Offers performance approaching that of 100B+ models on code (SWE-bench Verified: 68%). Native vision support. Context of 200k tokens.
84 tokens/second

deepseek-ocr

DeepSeek's specialist OCR model, designed for high-precision text extraction with formatting preservation.
Two-stage OCR system (visual encoder + MoE 3B decoder) optimised for converting documents into structured Markdown (tables, formulas). Requires specific pre-processing (Logits Processor) for optimum performance.
28 tokens/second

mistral-small3.2:24b

Minor update to Mistral Small 3.1, improving instruction following, making function calling more robust and reducing repetition errors.
This version 3.2 retains the strengths of its predecessor while making targeted improvements. It is better able to follow precise instructions, produces fewer infinite generations or repetitive responses, and its function calling template is more robust.
27 tokens/second

translategemma:12b

State-of-the-art open translation model based on Gemma 3, covering 55 languages.
TranslateGemma 12B offers high-fidelity translation capabilities while respecting grammar and cultural nuances. Context of 128k tokens.
37 tokens/second

translategemma:4b

Compact version of the TranslateGemma translation model, optimised for speed.
TranslateGemma 4B offers fast and efficient translation capabilities for 55 languages. Context of 128k tokens.
16 tokens/second

translategemma:27b

High-performance translation model based on Gemma 3 27B.
TranslateGemma 27B offers superior translation quality for complex and technical content.

voxtral

Mistral AI's real-time ASR (Automatic Speech Recognition) model, capable of transcribing streaming audio via WebSocket.
Voxtral Mini 4B operates in Realtime mode via the /v1/realtime endpoint (WebSocket). It transcribes continuous audio with token extraction and ASR time tracking.

z-image:16b

Model for generating images from text prompts, compatible with the OpenAI /v1/images/generations API.
Z-Image Turbo is an image generation model compatible with the OpenAI Images API. It supports parameters for the size and number of images.
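Since z-image exposes an OpenAI-compatible `/v1/images/generations` endpoint, a request can be sketched as below. The base URL and API key are placeholders (check the Cloud Temple LLMaaS documentation for the actual values); only the payload shape (prompt, size, number of images) comes from the description above.

```python
# Sketch of a request to an OpenAI-compatible image generation endpoint.
import json
import urllib.request

API_BASE = "https://api.example.com"  # placeholder base URL
API_KEY = "YOUR_API_KEY"              # placeholder credential

payload = {
    "model": "z-image:16b",
    "prompt": "A lighthouse at dawn, watercolour style",
    "size": "1024x1024",  # image dimensions
    "n": 1,               # number of images to generate
}

request = urllib.request.Request(
    f"{API_BASE}/v1/images/generations",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# response = urllib.request.urlopen(request)  # not executed in this sketch
```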

Model comparison

This comparison table will help you choose the model best suited to your needs, based on various criteria such as context size, performance and specific use cases.

Comparative table of the characteristics and performance of the various AI models available, grouped by category (large models and specialised models).
Model Publisher Parameters Context (tokens) Vision Agent Reasoning Security Quick * Energy efficiency *
Large models
Specialised models
ministral-3:3b Mistral AI 3B 250000
ministral-3:8b Mistral AI 8B 250000
functiongemma:270m Google 270M 32768
granite3.2-vision:2b IBM 2B 16384
qwen3-embedding:0.6b Qwen Team 0.6B 32768
granite-embedding:278m IBM 278M 512
qwen3-embedding:4b Qwen Team 4B 40000
bge-m3:567m BAAI 567M 8192
embeddinggemma:300m Google 300M 2048
gpt-oss:20b OpenAI 20B 120000
qwen3-2507-think:4b Qwen Team 4B 250000
rnj-1:8b Essential AI 8B 32000
qwen3-vl:2b Qwen Team 2B 250000
qwen3-vl:4b Qwen Team 4B 250000
qwen3.5:0.8b Qwen Team 0.8B 250000
qwen3.5:4b Qwen Team 4B 250000
qwen3.5:9b Qwen Team 9B 250000
qwen3:0.6b Qwen Team 0.6B 40000
qwen3-vl:8b Qwen Team 8B 250000
devstral-small-2:24b Mistral AI & All Hands AI 24B 200000
deepseek-ocr DeepSeek AI 3B 8192
mistral-small3.2:24b Mistral AI 24B 128000
translategemma:12b Google 12B 128000
translategemma:4b Google 4B 128000
translategemma:27b Google 27B 120000
voxtral Mistral AI 4B 32768 N.C.
z-image:16b Community 16B N.C.
Legend and explanation
Functionality or capability supported by the model
Functionality or capability not supported by the model
* Energy efficiency: indicates particularly low energy consumption (< 2.0 kWh/Mtoken)
* Quick: model capable of generating more than 50 tokens per second
Note on performance measures
The speed values (tokens/s) represent performance targets in real-life conditions. Energy consumption (kWh/Mtoken) is calculated by dividing the estimated power of the inference server (in Watts) by the measured speed of the model (in tokens/second), then converted into kilowatt-hours per million tokens (division by 3.6). This method offers a practical comparison of the energy efficiency of different models, to be used as a relative indicator rather than an absolute measure of power consumption.
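The energy metric described above reduces to a single division: power (W) divided by speed (tokens/s) gives joules per token; scaling to a million tokens and converting joules to kilowatt-hours (3.6 × 10⁶ J/kWh) leaves a division by 3.6. A sketch, using a hypothetical server power figure:

```python
# kWh per million tokens: W / (tokens/s) = J/token; × 1e6 tokens / 3.6e6 J/kWh
# simplifies to a division by 3.6.
def kwh_per_mtoken(server_power_w: float, speed_tok_s: float) -> float:
    """Energy per million tokens, in kWh/Mtoken."""
    return server_power_w / speed_tok_s / 3.6

# Example: a hypothetical 500 W inference server running at 84 tokens/second
# lands under the 2.0 kWh/Mtoken "energy efficient" threshold.
print(round(kwh_per_mtoken(500, 84), 2))  # 1.65
```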

Recommended use cases

Here are some common use cases and the most suitable models for each. These recommendations are based on the specific performance and capabilities of each model.

Multilingual dialogue

Chatbots and assistants capable of communicating in several languages, with automatic detection, context maintenance throughout the conversation and understanding of linguistic specificities.
Recommended models
  • nemotron-3-super:120b
  • qwen3.5:27b
  • nemotron3-nano:30b
  • gpt-oss:120b

Analysis of long documents

Processing of large documents (>100 pages): maintaining context throughout the text, extracting key information, generating relevant summaries and answering specific questions about the content.
Recommended models
  • nemotron-3-super:120b
  • qwen3.5:27b
  • qwen3-2507:235b

Programming and development

Generating and optimising code in multiple languages, debugging, refactoring, developing complete features, understanding complex algorithmic implementations and creating unit tests.
Recommended models
  • qwen3.5:27b
  • qwen3-2507:235b
  • qwen-coder-next:80b
  • nemotron-3-super:120b

Visual analysis

Direct processing of images and visual documents without OCR pre-processing, interpretation of technical diagrams, graphs, tables, drawings and photos with generation of detailed textual explanations of the visual content
Recommended models
  • qwen3.5:27b
  • deepseek-ocr
  • qwen3.5:35b

Security and compliance

Applications requiring specific security capabilities: filtering of sensitive content, traceability of reasoning, GDPR/HDS compliance checks, risk minimisation, vulnerability analysis and compliance with sector-specific regulations.
Recommended models
  • granite3-guardian:8b
  • qwen3.5:27b
  • granite3-guardian:2b

Lightweight and edge deployments

Applications requiring a minimal resource footprint: deployment on capacity-constrained devices, real-time inference on standard CPUs and integration into embedded or IoT systems.
Recommended models
  • qwen3.5:0.8b
  • qwen3-vl:2b
  • ministral-3:3b
Contact us
Cookie policy

We use cookies to give you the best possible experience on our site, but we do not collect any personal data.

Audience measurement services, which are necessary for the operation and improvement of our site, do not allow you to be identified personally. However, you have the option of objecting to their use.

For more information, see our privacy policy.