Overview
AtlasCloud provides serverless computing for AI inference, model training, general compute, and API services. You pay by the second for the compute you actually use, and the platform scales automatically with request volume.
You can deploy workloads in two ways:
- Endpoint: Bring your own custom image for AI inference, model training, and other tasks
- Quick Deploy: Use pre-built images to quickly create vLLM or Stable Diffusion (SD) inference services (see the sketch below)
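Because vLLM exposes an OpenAI-compatible HTTP API, a Quick Deploy vLLM service can typically be called like any OpenAI-style endpoint. The following is a minimal sketch, not the definitive AtlasCloud client: the endpoint URL, API key, and model name are placeholders, so substitute the values shown for your own deployment.

```python
# Sketch: calling a Quick Deploy vLLM service via its OpenAI-compatible API.
# ENDPOINT_URL, API_KEY, and the model name are hypothetical placeholders;
# use the values from your endpoint's detail page.
import requests

ENDPOINT_URL = "https://YOUR-ENDPOINT.example.com/v1/chat/completions"  # placeholder
API_KEY = "YOUR_API_KEY"  # placeholder

response = requests.post(
    ENDPOINT_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # whichever model you deployed
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```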
Why AtlasCloud Serverless?
AtlasCloud Serverless is a good fit for the following reasons:
- Cost-Effective: Pay only for the compute time you actually use, billed by the second
- High Performance: Access to the latest NVIDIA GPUs, including the A100, H100, and L4
- Auto Scaling: Automatically scale from 1 to 100 workers based on demand
- Container Support: Run both public and private Docker images
- Fast Cold Start: Optimized cold starts of 2-3 seconds for most models
- Monitoring & Logs: Real-time metrics for GPU, CPU, and memory usage, plus comprehensive logging
- Storage Integration: Mount network storage to workers so data persists across scaling events (see the sketch after this list)
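To illustrate why the mounted volume matters under auto scaling: anything a worker writes to its local disk disappears when that worker is scaled down, while writes to the network volume survive and are visible to other workers. Below is a minimal sketch under assumed conventions; the mount path `/mnt/storage` is hypothetical, so use whatever path you configure when attaching storage to your endpoint.

```python
# Sketch: persisting data across scaling events via a mounted network volume.
# "/mnt/storage" is a hypothetical mount path, not a documented default.
import json
from pathlib import Path

MOUNT = Path("/mnt/storage")   # shared network volume (hypothetical path)
RESULTS = MOUNT / "results"    # survives individual workers being scaled down

def save_result(job_id: str, payload: dict) -> None:
    """Write a job result to the network volume so it outlives this worker."""
    RESULTS.mkdir(parents=True, exist_ok=True)
    (RESULTS / f"{job_id}.json").write_text(json.dumps(payload))

def load_result(job_id: str) -> dict | None:
    """Read a result written by any worker, past or present."""
    path = RESULTS / f"{job_id}.json"
    return json.loads(path.read_text()) if path.exists() else None
```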