Services are configured with a runtime driver and a runtime configuration, which together define how the Service is deployed.
Currently, the only supported driver is the Kubernetes driver.
When a Service is deployed with the Kubernetes driver, a Kubernetes deployment is created.
A Kubernetes deployment tells Kubernetes how many replicas of the Service should be running at any point in time, and how many resources should be allocated to each of them.
The amount of resources each replica has is determined by the instance type of the Service.
Whether the Service has a GPU, and which GPU type it uses, is also determined by the instance type.
The Agent is a pod that runs Dataloop code which loads the Module, listens to a RabbitMQ queue and, for each message, invokes the relevant function in the Module with inputs parsed from the message. The Agent also manages status updates, monitors executions, and updates metrics, the service overview, CPU status, and so on.
The Agent is terminated when the Service is down.
Each time an execution is created, manually or through a trigger, a message is sent to the relevant RabbitMQ queue; one of the Agents consumes it and invokes the relevant function.
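The Agent's consume-and-dispatch cycle can be sketched in plain Python. This is a simplified model, not Dataloop's Agent code: a standard queue.Queue stands in for the RabbitMQ queue, and the JSON message fields (execution_id, function_name, input) and the Module class are hypothetical.

```python
import json
import queue
from typing import Any, Dict, Tuple


class Module:
    """Hypothetical Module exposing one function that Agents can invoke."""

    def run(self, item_id: str) -> str:
        return f"processed {item_id}"


def agent_loop(messages: queue.Queue, module: Any) -> Dict[str, Tuple[str, Any]]:
    """Consume messages until a None sentinel arrives, invoking the named function for each."""
    statuses: Dict[str, Tuple[str, Any]] = {}
    while True:
        raw = messages.get()
        if raw is None:  # sentinel: stop consuming new messages
            break
        msg = json.loads(raw)  # parse the function name and inputs from the message
        try:
            func = getattr(module, msg["function_name"])
            statuses[msg["execution_id"]] = ("success", func(**msg["input"]))
        except Exception as exc:  # record the failure and keep consuming
            statuses[msg["execution_id"]] = ("failed", str(exc))
    return statuses


if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    q.put(json.dumps({"execution_id": "e1", "function_name": "run",
                      "input": {"item_id": "abc"}}))
    q.put(None)
    print(agent_loop(q, Module()))  # {'e1': ('success', 'processed abc')}
```

A failing execution is recorded rather than raised, so one bad message does not stop the Agent from serving the rest of the queue.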
To provide zero downtime during Service updates, updates are implemented with a rolling update mechanism.
When a Service update is invoked, our backend picks a few Agents and tells them to stop consuming new messages from RabbitMQ, wait for their in-flight executions to finish, and then exit.
When an Agent exits, a new Agent configured with the updated parameters is created in its place.
If an Agent does not terminate within a timeout after being ordered to stop, it is forcibly stopped, regardless of whether it has finished handling all the executions it fetched.
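The batching behind the rolling update can be modelled as follows. This is a hypothetical sketch, not the Dataloop backend: the Agent class and the batch_size parameter are assumptions, and waiting for in-flight executions and the forced stop are not modelled. It shows why the update causes no downtime: only one batch stops consuming at a time, so the remaining Agents keep serving the queue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    version: str
    consuming: bool = True  # whether the Agent is pulling new messages from the queue


def rolling_update(agents: List[Agent], new_version: str, batch_size: int) -> List[int]:
    """Replace Agents batch by batch; return how many were still consuming during each batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    consuming_during_batch: List[int] = []
    for start in range(0, len(agents), batch_size):
        batch = range(start, min(start + batch_size, len(agents)))
        # Step 1: the picked Agents stop consuming; all other Agents keep serving the queue.
        for i in batch:
            agents[i].consuming = False
        consuming_during_batch.append(sum(a.consuming for a in agents))
        # Not modelled: waiting for in-flight executions, or force-stopping after a timeout.
        # Step 2: each exited Agent is replaced by a new one with the updated parameters.
        for i in batch:
            agents[i] = Agent(version=new_version)
    return consuming_during_batch


if __name__ == "__main__":
    fleet = [Agent("v1") for _ in range(5)]
    print(rolling_update(fleet, "v2", batch_size=2))  # [3, 3, 4]
    print({a.version for a in fleet})  # {'v2'}
```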
Concurrency is the number of executions that can run in parallel. Each execution runs in a new thread or in a new process, depending on the service's run_execution_as_process flag.
When deploying a service, we can set its concurrency (the default is 10):
service = package.deploy( service_name='my-service', runtime=dl.KubernetesRuntime(concurrency=32) )
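To illustrate what the concurrency setting controls, here is a sketch using Python's concurrent.futures: a pool of at most concurrency workers runs executions either in threads or in processes, depending on a run_execution_as_process flag. This mirrors the behavior described above but is an assumption about the mechanism, not the Agent's actual implementation; run_executions and square are hypothetical names.

```python
import concurrent.futures as cf
from typing import Callable, Iterable, List


def run_executions(func: Callable[[int], int], inputs: Iterable[int],
                   concurrency: int = 10,
                   run_execution_as_process: bool = False) -> List[int]:
    """Run at most `concurrency` executions at a time, each in a thread or a process."""
    executor_cls = cf.ProcessPoolExecutor if run_execution_as_process else cf.ThreadPoolExecutor
    with executor_cls(max_workers=concurrency) as pool:
        # map() returns results in input order, even though executions run concurrently.
        return list(pool.map(func, inputs))


def square(x: int) -> int:
    """Hypothetical execution body."""
    return x * x


if __name__ == "__main__":
    print(run_executions(square, range(5), concurrency=2))  # [0, 1, 4, 9, 16]
```

Process mode gives each execution its own interpreter, which helps with CPU-bound or non-thread-safe code, but it requires the function and its inputs to be picklable.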
The pod type param defines the instance on which the service will run. Dataloop available instances are:
| DL instance type | Runner CPU | Runner RAM |
To set the pod type (if no pod type is specified, the default pod type, regular-s, is used):
service = package.deploy( service_name='my-service', runtime=dl.KubernetesRuntime(pod_type=dl.InstanceCatalog.HIGHMEM_L) )
When you have a heavy workload, you can run several replicas of the same service:
service = package.deploy( service_name='my-service', runtime=dl.KubernetesRuntime(num_replicas=2) )
When the workload varies over time, we want the number of replicas to scale up when many executions are coming in, and to scale down otherwise. To do so, we need to create an autoscaler:
- 'cooldown_period' - How long to wait (in seconds) after the queue becomes empty before scaling down (reducing the number of replicas), so that running executions have time to complete.
- 'polling_interval' - How often (in seconds) the autoscaler polls the service queue to check its length.
service = package.deploy(
    service_name='my-service',
    runtime=dl.KubernetesRuntime(
        autoscaler=dl.KubernetesRabbitmqAutoscaler(
            # If min_replicas is 0, the service won't run until it has something in the queue (it is suspended)
            # The following parameter values are also the defaults
            min_replicas=0,
            max_replicas=1,
            queue_length=10,
            cooldown_period=300,
            polling_interval=10)))
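The autoscaler parameters can be related to each other with a simplified scaling rule. This sketch assumes roughly one replica per queue_length queued executions, immediate scale-up, and scale-down to min_replicas only after the queue has been empty for cooldown_period seconds. The real autoscaler's rule may differ; desired_replicas and simulate are hypothetical helpers.

```python
import math
from typing import Iterable, List


def desired_replicas(queue_size: int, current: int, empty_for: float, *,
                     min_replicas: int = 0, max_replicas: int = 1,
                     queue_length: int = 10, cooldown_period: float = 300) -> int:
    """Hypothetical scaling rule evaluated on every poll of the queue."""
    if queue_size > 0:
        # Work is waiting: aim for one replica per `queue_length` messages; never scale down here.
        target = math.ceil(queue_size / queue_length)
        return min(max_replicas, max(min_replicas, current, target))
    if empty_for >= cooldown_period:
        # Queue has stayed empty for the whole cooldown: scale down to the minimum.
        return min_replicas
    # Still cooling down: keep the replicas so running executions can complete.
    return current


def simulate(queue_sizes: Iterable[int], polling_interval: float = 10, **params) -> List[int]:
    """Replica count after each poll, given the queue size observed at that poll."""
    replicas = params.get("min_replicas", 0)
    empty_for = 0.0
    history: List[int] = []
    for size in queue_sizes:
        # Approximation: an empty reading counts as a full polling interval of emptiness.
        empty_for = empty_for + polling_interval if size == 0 else 0.0
        replicas = desired_replicas(size, replicas, empty_for, **params)
        history.append(replicas)
    return history


if __name__ == "__main__":
    print(simulate([25, 25, 0, 0, 0], polling_interval=10,
                   max_replicas=3, cooldown_period=20))  # [3, 3, 3, 0, 0]
```

With min_replicas=0, the replica count stays at 0 until a message arrives, which matches the suspended state described in the code comment above.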
Updating the SDK Version for a Service
The SDK version for a service can be updated from the Dataloop platform user interface and from the SDK.
From the UI
Applications (FaaS) → Application Hub → Installed → click on the service you wish to edit → Edit → select the SDK version you want
From the SDK
service.versions['runner'] = '1.23.23.latest'
service.versions['dtlpy'] = '1.23.23'
service = service.update()
Preemptible instances are designed for short-term usage. They behave the same as regular compute instances, but can be reclaimed at any time when needed elsewhere, and can run for a maximum of 24 hours before being preempted.
If your workloads are fault-tolerant and can withstand interruptions or failures, preemptible instances can be scheduled for your application. For example, use preemptible VMs for tests that can be stopped and resumed later, or for any short-term or non-critical application.
Preemptible instances function like normal instances but have the following limitations:
- Compute Engine might stop preemptible instances at any time due to system needs. The probability that Compute Engine stops a preemptible instance for a system need is generally low, but might vary from day to day.
- Compute Engine always stops preemptible instances after they run for 24 hours.
- Preemptible instances are finite Compute Engine resources, so they might not always be available.
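A common way to make a workload tolerate preemption is to checkpoint progress and resume from it after a restart. The sketch below is a generic pattern, not a Dataloop feature; process_with_checkpoint is a hypothetical helper, and the checkpoint is a local JSON file only for illustration. On a real preemptible instance, the checkpoint must live in storage that survives the instance, such as a remote bucket or database.

```python
import json
import os
from typing import Callable, List


def process_with_checkpoint(items: List[str], handle: Callable[[str], None],
                            checkpoint_path: str) -> int:
    """Process items in order, resuming after the last checkpoint; return items handled in this run."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(items)):
        handle(items[i])
        # Record progress after every item, so an interruption loses at most the item in flight.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        # os.replace is atomic on POSIX, so a crash never leaves a half-written checkpoint.
        os.replace(tmp_path, checkpoint_path)
    return len(items) - start
```

If the instance is preempted midway, rerunning the same call skips the items that were already completed.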
Setting Preemptible Instances
When you create a new application, its compute settings typically make the application installation (service) use preemptible instances by default.
- To update existing applications to start using preemptible VMs, do either of the following:
  - Update the App package settings:
    Applications (FaaS) > Application Hub > Applications Library tab > “Edit App package” > “App config” settings > enable the “Preemptible” option > update the existing App installations with the new package version.
  - Update the App installation directly:
    Applications (FaaS) > Application Hub > Installed Applications tab > “Edit settings” > “Advanced” settings > enable the “Preemptible” option.
- To disable preemptible VMs when creating a new application:
  - Create application dialog > “App config” settings > disable the “Preemptible” option.