Overview
AtlasCloud provides serverless computing for AI inference, model training, general compute, and API services. You pay by the second for the compute you actually use, and the platform scales automatically with request volume.
You can deploy workloads in two ways:
- Endpoint: Bring your own custom image for AI inference, model training, and other tasks
- Quick Deploy: Use pre-built images to quickly create vLLM or Stable Diffusion (SD) inference services (see the sketch below)
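Because vLLM exposes an OpenAI-compatible HTTP API, a Quick Deploy vLLM service can typically be called like any OpenAI-style endpoint. The following is a minimal sketch, not the definitive AtlasCloud client: the endpoint URL, API key, and model name are placeholders, so substitute the values shown for your own deployment.

```python
# Sketch: calling a Quick Deploy vLLM service via its OpenAI-compatible API.
# ENDPOINT_URL, API_KEY, and the model name are hypothetical placeholders;
# use the values from your endpoint's detail page.
import requests

ENDPOINT_URL = "https://YOUR-ENDPOINT.example.com/v1/chat/completions"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # whichever model you deployed
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```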
Why AtlasCloud Serverless?
AtlasCloud Serverless is a good fit for the following reasons:
- Cost-Effective: Pay only for the compute time you actually use, billed by the second
- High Performance: Access to the latest NVIDIA GPUs, including the A100, H100, and L4
- Auto Scaling: Automatically scale from 1 to 100 workers based on demand
- Container Support: Run both public and private Docker images
- Fast Cold Start: Optimized cold starts of 2-3 seconds for most models
- Monitoring & Logs: Real-time metrics for GPU, CPU, and memory usage, plus comprehensive logging
- Storage Integration: Mount network storage to workers so data persists across scaling events (see the sketch after this list)
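To illustrate why the mounted volume matters under auto scaling: anything a worker writes to its local disk disappears when that worker is scaled down, while writes to the network volume survive and are visible to other workers. Below is a minimal sketch under assumed conventions; the mount path `/mnt/storage` is hypothetical, so use whatever path you configure when attaching storage to your endpoint.

```python
# Sketch: persisting data across scaling events via a mounted network volume.
# "/mnt/storage" is a hypothetical mount path, not a documented default.
import json
from pathlib import Path

MOUNT = Path("/mnt/storage")   # shared network volume (hypothetical path)
RESULTS = MOUNT / "results"    # survives individual workers being scaled down

def save_result(job_id: str, payload: dict) -> None:
    """Write a job result to the network volume so it outlives this worker."""
    RESULTS.mkdir(parents=True, exist_ok=True)
    (RESULTS / f"{job_id}.json").write_text(json.dumps(payload))

def load_result(job_id: str) -> dict | None:
    """Read a result written by any worker, past or present."""
    path = RESULTS / f"{job_id}.json"
    return json.loads(path.read_text()) if path.exists() else None
```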