FaaS Concepts
  • 12 May 2023

Package

A Package is static code together with a schema that describes all of its modules and functions and the codebase they are taken from. It also stores its revision history, so services can be deployed from previous package revisions.

A Package is split into Modules (not to be confused with Python modules). For each Module, the Package specifies where the Module's entry point is within the Codebase, how to load it, the configuration with which it should be loaded, and how the class's functions should be run.

Only project members with the role "Developer" or "Project Owner" can push/delete a package.

Codebase

The package codebase is the code you import to the platform containing all of the modules and functions.
When you upload the code to the platform, either from your computer or from GitHub, it is saved on the platform as an item (in a zip file).
If you wish to download the codebase, use the following script:

import dtlpy as dl

# Get the service
service = dl.services.get(service_id='my_service_id')
# Get the item object of the codebase
code_base = dl.items.get(item_id=service.package.codebase_id)
# Download the code to a local path
code_base.download(local_path='where_to_download_the_codebase')

Versioning

Packages have a versioning mechanism.
This mechanism allows you to deploy a Service from an old revision of the Package.
Every time you modify the package attributes or codebase, either by pushing a package or by performing package.update(), a new version of the package is created. The package revisions field contains a list of all previous states of the Package.
If the codebase has not changed, you can use package.update(); otherwise, you need to push a new package.
When you deploy a Service, you can specify the revision from which to deploy it. The default is the latest revision.

# the package latest version
package.version
# List of all of the package versions
package.revisions

Modules

Not to be confused with Python modules

A Module is a reference to a Python file containing a Python class (ServiceRunner by default) and the functions defined in it.

The idea behind separating a package into modules is to allow users to develop many different services that share some common code using one codebase. This means you can map the codebase, or a subset of it, to a service without needing a new codebase.

Using the module you can load a model on the service deployment and then run any function that references it (see the init method for reference), or proceed to the next step in the pipeline.
A Module contains 2 fields that determine which code within the Codebase to load, initialize and run.
The fields are "entryPoint" and "className".

Entry Point

The entry point is the name of the Python file containing the class and functions of the package code. It should end with .py. The default entry point is main.py.
The entry point specifies the relative path from the CodeBase root to the main file that should be loaded when loading the Module.

Class Name

The "className" specifies the name of the Python class within the file that should be loaded in the Service. The class referenced by "className" must extend dl.BaseServiceRunner in order to work.
The Python class is named ServiceRunner by default, but you can name it however you like; just change the class_name attribute of the Package's Module. The class is loaded when the service is deployed; see the init method for reference.

Module JSON

A module does not have its own JSON file. The basic schema looks like this:

{
    "name": "default_module",  # module name
    "className": "ServiceRunner",  # optional
    "entryPoint": "main.py",  # the module entry point which includes its main class and methods
    "initInputs": [],  # expected init params at deployment time
    # list of module functions that can be executed from remote
    "functions": [
        {
            "name": "run",  # function name - must be the same as the actual method name in signature
            "description": "",  # optional -  function description
            # expected function params
            "input": [
                {
                    # input name - identical to input param name in signature
                    "name": "item",
                    "type": "Item",  # Item / Dataset / Annotation / Json
                }
            ],
            "output": [],  # not implemented - keep blank
        }
    ],
}
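
The naming rules in the schema above (the function name must equal the method name, and each input name must equal a parameter name) can be checked locally before pushing. The following is a minimal sketch using only the Python standard library. `check_module` is a hypothetical helper, not part of the Dataloop SDK, and the plain `ServiceRunner` class stands in for a class that would extend dl.BaseServiceRunner.

```python
import inspect


class ServiceRunner:
    """Stand-in for a class that would extend dl.BaseServiceRunner."""

    def run(self, item):
        return item


module = {
    "name": "default_module",
    "className": "ServiceRunner",
    "entryPoint": "main.py",
    "initInputs": [],
    "functions": [
        {"name": "run", "input": [{"name": "item", "type": "Item"}], "output": []}
    ],
}


def check_module(module_def, runner_cls):
    """Return a list of mismatches between a module definition and a class."""
    problems = []
    if module_def.get("className", "ServiceRunner") != runner_cls.__name__:
        problems.append("className does not match the class name")
    for func in module_def["functions"]:
        method = getattr(runner_cls, func["name"], None)
        if method is None:
            problems.append(f"missing method: {func['name']}")
            continue
        # Parameter names of the method, without "self"
        params = [p for p in inspect.signature(method).parameters if p != "self"]
        for inp in func["input"]:
            if inp["name"] not in params:
                problems.append(f"{func['name']}: no parameter named {inp['name']}")
    return problems


problems = check_module(module, ServiceRunner)
print(problems)  # [] when the definition and the class agree
```

An empty list means the definition and the class agree; each string in a non-empty list names one mismatch.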

Class Functions

This is the basic running unit of the FaaS. You can define the functions on the class and when the service is deployed, you can run each of them.

Class Functions Input types

Each of the function's inputs has a name and a type.
Input types can be grouped into 2 categories:

  • JSON - the function can receive any input, and the runner performs no preprocessing on it.
  • Dataloop types ("item", "dataset", etc.) - when invoking the function, the ID of the resource should be passed as the input. See the types referenced in the repositories.

Package JSON

{
  "name": "default_package",
  "modules": [
    {
      "name": "default_module",
      "className": "ServiceRunner",
      "entryPoint": "main.py",
      "initInputs": [],
      "functions": [
        {
          "name": "run",
          "description": "this description for your service",
          "input": [
            {
              "name": "item",
              "type": "Item"
            }
          ],
          "output": []
        }
      ]
    }
  ]
}

Service

A Service is a running deployment of Modules within a Package. Running a service requires having a bot user.

  • The Service entity specifies the Module it deploys, the arguments passed to the Module's exported class' init method, and the infrastructural configuration for it, such as the number of replicas (machine instances) and concurrent executions.

  • Once the service is ready, its functions can be invoked with the required input parameters and entities.

  • Multiple services can be deployed from the same module.

Init Method

The "init" method is invoked immediately after loading the Module and is used for initializing the replica of the Service.

  • The Module's "init" method input schema is defined in the Module's "initInputs" field.
  • The actual value passed to the "init" method at loading time is taken from the "initParams" field of the service.

For example:

  • For an "Image bounding box classifier" service, the "init" method downloads the model and loads it into memory.
  • All subsequent function invocations in a Service will use the loaded model to calculate the bounding box for the image.
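
The load-once pattern described above can be sketched as follows. This is only an illustration: the class is a plain Python stand-in for one that would extend dl.BaseServiceRunner, and `model_name`, `_load_model`, and the `loads` counter are hypothetical names introduced for the example.

```python
class ServiceRunner:
    """Stand-in for a class that would extend dl.BaseServiceRunner."""

    def __init__(self, model_name="default-model"):
        # Runs once per replica, right after the Module is loaded.
        self.loads = 0
        self.model = self._load_model(model_name)

    def _load_model(self, model_name):
        # Placeholder for downloading and loading real model weights.
        self.loads += 1
        return {"name": model_name}

    def run(self, item):
        # Every invocation reuses the model loaded in __init__.
        return f"{self.model['name']} processed {item}"


init_params = {"model_name": "bbox-classifier"}  # plays the role of "initParams"
runner = ServiceRunner(**init_params)
results = [runner.run(item) for item in ("item-1", "item-2")]
print(results)
print(runner.loads)  # the model was loaded once, not once per invocation
```

Keeping the expensive load in the init method means each invocation pays only for the work on its own input.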

Service Status

A service has an operational status assigned to it. Status can be one of the following values:

  • Active: an active service has running replicas or an autoscaler configured to create replicas automatically when executions are created for the service.
  • Inactive: when a service is paused, all replicas go down. Any trigger or UI slot set for the service will become inactive as well.
  • Initializing: initializing one or more of the service replicas. This status is usually displayed after activating a service, or when autoscaler is in action.
  • Error: when the service fails to deploy. The service replicas have active errors (can be found under "Error info").

Versioning

A deployed service gets a version that represents its state. Every time a service is updated its version number increments and its previous value is saved in the service revision. Services can be deployed using earlier versions.

To view the service revisions:

# List of service revisions created by past 'service.update()' calls
service.revisions
# Package revision currently used by the service (services can use old package versions)
service.package_revision

Service JSON

{
  "name": "service-name",           # service name
  "packageName": "default_package", # package name
  "packageRevision": "latest",      # which package version to run
  "runtime": {
    "gpu": false,                   # whether the service requires a GPU
    "replicas": 1,                  # how many replicas the service should create
    "concurrency": 6,               # how many executions can run simultaneously
    "runnerImage": ""               # optional: your own docker image for the service to run on
  },
  "triggers": [],                   # list of triggers that invoke the service
  "initParams": {},                 # inputs for the init method, if it expects any
  "moduleName": "default_module"    # which module to deploy
}

Execution

An execution is a single function invocation with input data. As a Dataloop entity, an execution has a unique ID, a status, and a log, which allow you to monitor it from invocation to completion.

Execution Input

The execution input must match the input the function requires; it is provided to the method that the execution invokes.

For example:

# Function inputs as declared in the package
inputs = [
    dl.FunctionIO(name='model_filename', type=dl.PackageInputType.JSON),
    dl.FunctionIO(name='dataset', type=dl.PackageInputType.DATASET),
    dl.FunctionIO(name='item', type=dl.PackageInputType.ITEM),
]
# When executing:
execution = service.execute(execution_input=dl.FunctionIO(name='item', value='item-id', type=dl.PackageInputType.ITEM),
                            project_id='project-id', function_name='function-name')

An input of a Dataloop type (item, dataset, annotation, etc.) should be passed as the ID of the corresponding entity.
An input of type Json can hold any JSON-serializable value, which is provided as is to the method.
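
The difference between the two input kinds can be modeled with a short sketch. It is not the runner's actual implementation, which this page does not describe. `resolve_inputs` and the in-memory `entities` lookup are hypothetical and only illustrate that Dataloop-typed inputs arrive as IDs while Json inputs pass through unchanged.

```python
def resolve_inputs(input_schema, raw_inputs, lookup):
    """Map raw execution inputs to the values the method would receive."""
    resolved = {}
    for spec in input_schema:
        name, kind = spec["name"], spec["type"]
        value = raw_inputs[name]
        if kind == "Json":
            resolved[name] = value  # passed through unchanged
        else:
            resolved[name] = lookup[kind][value]  # value is an entity ID
    return resolved


schema = [
    {"name": "model_filename", "type": "Json"},
    {"name": "item", "type": "Item"},
]
# Hypothetical in-memory store standing in for the platform's entities
entities = {"Item": {"item-id": {"id": "item-id", "name": "cat.jpg"}}}
resolved = resolve_inputs(
    schema, {"model_filename": "model.pb", "item": "item-id"}, entities
)
print(resolved["model_filename"])  # model.pb
print(resolved["item"]["name"])    # cat.jpg
```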

Execution Status

Every execution has a status.

  • execution.statusLog holds an array of all status updates of the execution
  • execution.latest_status is the latest status update of the execution
  • The execution status can be updated by using the Progress object
  • Progress updates are API calls and take effect only in increments of 5

For example: after 'progress.update(progress=4)', calling 'progress.update(progress=7)' has no effect, because 7 is fewer than 5 points above 4. Calling 'progress.update(progress=9)' does take effect.

import dtlpy as dl


class ServiceRunner(dl.BaseServiceRunner):
    def detect(self, item: dl.Item, progress: dl.Progress):
        progress.update(status='inProgress', progress=0,
                        message='execution started')
        ###############
        ### DO WORK ###
        ###############
        progress.update(status='inProgress', progress=30)
        ####################
        ### DO MORE WORK ###
        ####################
        progress.update(status='inProgress', progress=80)
        ####################
        ### DO MORE WORK ###
        ####################
        progress.update(status='inProgress', progress=100,
                        message='execution completed')
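
One reading of the increments-of-5 rule that is consistent with the 4, 7, 9 example is that an update takes effect only when it is at least 5 points above the last value that took effect. The sketch below models that reading only. `ProgressThrottle` is hypothetical and is not the SDK's Progress object, and the assumption that the first update always takes effect is ours.

```python
class ProgressThrottle:
    """Hypothetical model of the increments-of-5 rule."""

    def __init__(self, step=5):
        self.step = step
        self.reported = None  # last value that took effect

    def update(self, progress):
        """Return True if the update would take effect."""
        if self.reported is None or progress - self.reported >= self.step:
            self.reported = progress
            return True
        return False


throttle = ProgressThrottle()
effects = [throttle.update(p) for p in (4, 7, 9)]
print(effects)  # [True, False, True]
```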

Retry Mechanism

Executions may fail to complete for various reasons.

By setting the "max_attempts" attribute, services can be configured to automatically retry failed executions.
The number of attempts made on a single execution can be read from its "attempts" attribute.
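
The relationship between the maximum number of attempts and the attempts counter can be illustrated with a generic retry loop. This is a sketch of the general technique, not the platform's retry implementation. `run_with_retries` and `flaky` are hypothetical names.

```python
def run_with_retries(func, max_attempts):
    """Call func until it succeeds or max_attempts is reached."""
    attempts = 0
    last_error = None
    while attempts < max_attempts:
        attempts += 1
        try:
            return {"status": "success", "attempts": attempts, "output": func()}
        except Exception as err:
            last_error = str(err)  # remember the failure and try again
    return {"status": "failed", "attempts": attempts, "error": last_error}


calls = {"count": 0}


def flaky():
    """Fail on the first two calls, then succeed."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "done"


result = run_with_retries(flaky, max_attempts=3)
print(result["status"], result["attempts"])  # success 3
```

With max_attempts=2 the same function would end as failed after two attempts.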

Waiting for execution

A process can be paused until an execution finishes with any final status (success, failed, aborted, etc.) by using the "execution.wait()" method.

Execution cURL

To export the execution in a cURL format:

curl = service.executions.create(return_curl_only=True)

Rerun Execution

To rerun failed executions:

execution = dl.executions.get('execution_id')
execution.rerun()

Execution JSON

{
  "id": "string",
  "url": "string",
  "createdAt": "2020-11-18T17:41:25.740Z",
  "updatedAt": "2020-11-18T17:41:25.740Z",
  "creator": "string",
  "attempts": 0,
  "maxAttempts": 0,
  "toTerminate": true,
  "input": {},
  "output": {},
  "feedbackQueue": {
    "exchange": "string",
    "routing": "string"
  },
  "status": [
    {
      "timestamp": "2020-11-18T17:41:25.740Z",
      "status": "created",
      "message": "string",
      "percentComplete": 0,
      "error": {},
      "output": {}
    }
  ],
  "latestStatus": {
    "timestamp": "2020-11-18T17:41:25.740Z",
    "status": "created",
    "message": "string",
    "percentComplete": 0,
    "error": {},
    "output": {}
  },
  "duration": 0,
  "projectId": "string",
  "functionName": "string",
  "serviceId": "string",
  "triggerId": "string",
  "serviceName": "string",
  "packageId": "string",
  "packageName": "string",
  "packageRevision": 0,
  "serviceVersion": 0,
  "syncReplyTo": {
    "exchange": "string",
    "route": "string"
  }
}

BOT

A bot is a dummy project user with 'Developer' role permissions that is used for running services.
A deployed service uses the associated bot credentials to log-in and perform operations on the platform - if a service is not created by a bot, then the platform will create one automatically. If a service function creates an annotation as part of its operation, the bot will be recorded as the annotation-creator.

  • A project can have unlimited bots.
  • A service is limited to exactly 1 bot; however, a service's bot can be changed.
  • A bot's role is "Developer", and it functions as a regular user with all of that role's permissions.
  • Bots can be created by users with a "Developer" role or higher.
  • Deleting a bot, removing it from the project-contributors list, or demoting it to a project role lower than "Developer" is likely to prevent the associated services from running and, consequently, to cause failed executions.

Use Service in Other Projects

In order for a service to run in other projects within the same organization with full access to the projects' entities (datasets, items, tasks, etc.), the bot user associated with the service needs to be added as a contributor (developer permissions) to the required projects. By doing so, you are providing the bot with developer access to your projects and ensuring that it won't encounter any blocks. If you need to copy the service's bot, please refer to the following guide.

Copy Service's Bot

To copy the bot user email of an existing service:
Go to "Application Hub" > "Installed" tab > Click on "3 dots" button of the required service > "Copy service bot"

Create/Set a Bot

Create Bot

project.bots.create(name='botman-name')

Set bot for a service

The bot being assigned to a service must be a contributor in the Project and a member of the Organization owning the project.

service = package.deploy(
    service_name='my-service',
    bot='botmanemai.dataloop.ai'
)

Trigger

The Dataloop FaaS enables you to register Functions to events in the system.
Use the Triggers entity for the registration.
A Trigger contains a project on which it monitors events, a resource type, the action that happened to the resource, a DQL (The Data Query Engine) filter that checks whether or not to invoke the operation based on the resource JSON, and an operation.

Currently, the only supported operation is the creation of an execution.

Trigger Events - Resource

Which entities' events should the trigger listen to?
Resources are entities in the Dataloop system: Item, Annotation, Dataset, Task, etc. Each resource is associated with an action to define a specific event for the trigger.
Once a trigger event occurs, the resource object is delivered as an input to the execution that is created.

Note that you can only have one resource per trigger.
If you wish to have more than one, use a manual execution from a new function.

dl Enum                        | API Value  | Dataloop Entity            | Available Actions (*)
dl.TriggerResource.ITEM        | Item       | Platform item              | Created, Updated, Deleted, Clone
dl.TriggerResource.DATASET     | Dataset    | Platform dataset           | Created, Updated, Deleted
dl.TriggerResource.ANNOTATION  | Annotation | Platform item's annotation | Created, Updated, Deleted
dl.TriggerResource.ITEM_STATUS | ItemStatus | Platform item's status     | Updated (**), taskAssigned
dl.TriggerResource.TASK        | Task       | Platform task              | Created, Updated, Deleted, statusChanged
dl.TriggerResource.ASSIGNMENT  | Assignment | Platform assignment        | Created, Updated, Deleted, statusChanged
dl.TriggerResource.MODEL       | Model      | Platform model             | Created, Updated, Deleted

(*) Your function can be triggered by the following actions when taken over a resource (a specific resource must be set!):

  1. Created: run your function over any newly created resource.
  2. Updated: run your function over any updated resource. See (**) for more details regarding "itemStatus.Updated".
  3. Deleted: run your function over any deleted resource.
  4. Clone: run your function over every cloned item.
  5. taskAssigned: once an item is assigned to a task, run the function over this item. To use this event, set the following trigger: "ItemStatus.TaskAssigned".
  6. statusChanged: once a task/assignment status is changed, run the function over this task/assignment (task 'completed', assignment 'done').

(**) Use the trigger "itemStatus.Updated" to run your function over items once their status is changed within a task context ('approved', 'discard', etc.).

Trigger Events - Action

What event action types are available?

Actions are the events that trigger the function.
For further explanation of each action, see the list marked (*) above.

dl Enum                         | API Value     | Available Modes
dl.TriggerAction.CREATED        | Created       | Once
dl.TriggerAction.UPDATED        | Updated       | Once, Always (excluding Task & Assignment)
dl.TriggerAction.DELETED        | Deleted       | Once
dl.TriggerAction.CLONE          | Clone         | Once
dl.TriggerAction.STATUS_CHANGED | statusChanged | Once, Always (excluding Task & Assignment)
dl.TriggerAction.TASK_ASSIGNED  | taskAssigned  | Once, Always

Trigger Execution Mode

Some events, such as item updates, can happen more than once on the same entity. The trigger execution mode defines whether such repeating events trigger the service every time they happen or only the first time they happen.

  • Once - the function runs only once per entity. For instance, for an "item" resource and an "Updated" action, the function runs only the first time a given item is updated.
  • Always - the function runs every time the event occurs. For instance, for an "item" resource and an "Updated" action, the function runs on every update of an item.

dl Enum                         | API Value
dl.TriggerExecutionMode.ONCE    | Once
dl.TriggerExecutionMode.ALWAYS  | Always
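
The difference between the two modes can be modeled with a small event dispatcher. This is only an illustration of the behavior described in this section, not how the platform implements it. `dispatch` and the event tuples are hypothetical.

```python
def dispatch(events, execution_mode):
    """Return the events that would create an execution.

    events: list of (entity_id, action) tuples in arrival order.
    """
    seen = set()
    executed = []
    for entity_id, action in events:
        key = (entity_id, action)
        if execution_mode == "Once" and key in seen:
            continue  # this entity already triggered the function for this action
        seen.add(key)
        executed.append(key)
    return executed


events = [("item-1", "Updated"), ("item-1", "Updated"), ("item-2", "Updated")]
print(len(dispatch(events, "Once")))    # 2
print(len(dispatch(events, "Always")))  # 3
```

In "Once" mode the second update of item-1 is skipped, while item-2 still triggers the function because it is a different entity.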

Be careful not to create a trigger loop. Known examples:

  • Creating a function that adds items to a folder, while an "item.created" event trigger is set for this function.
  • Creating a function that updates items, while an "item.updated" event trigger with execution_mode='always' is set for this function.

In both cases, the function is triggered again and again.

Trigger JSON

{
  "name": "service-name",
  "packageName": "default_package",
  "packageRevision": "latest",
  "runtime": {
    "gpu": false,
    "replicas": 1,
    "concurrency": 6,
    "runnerImage": ""
  },
  "triggers": [
    {
      "name": "trigger-name",
      "filter": {
        "$and": [{"dir": "/train"}, {"hidden": false}, {"type": "file"}]
      },
      "resource": "Item",
      "actions": [
        "Created"
      ],
      "active": true,
      "function": "run",
      "executionMode": "Once"
    }
  ],
  "initParams": {},
  "moduleName": "default_module"
}
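
To make the role of the trigger filter concrete, the sketch below evaluates the "$and" filter from the JSON above against a resource JSON. It is a simplified model that supports only equality checks and "$and"; the real query engine supports more than this, and `matches` is a hypothetical helper.

```python
def matches(resource, query):
    """Evaluate a filter made only of equality checks and "$and"."""
    for key, expected in query.items():
        if key == "$and":
            # Every clause in the list must match
            if not all(matches(resource, clause) for clause in expected):
                return False
        elif resource.get(key) != expected:
            return False
    return True


query = {"$and": [{"dir": "/train"}, {"hidden": False}, {"type": "file"}]}
item = {"dir": "/train", "hidden": False, "type": "file", "name": "cat.jpg"}
print(matches(item, query))                      # True
print(matches({**item, "dir": "/test"}, query))  # False
```

Only resources for which the filter evaluates to true would lead to an execution being created.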

Trigger at Specific Time Patterns

Dataloop FaaS enables you to run functions at specified time patterns with constant input using the Cron syntax.

In the Cron trigger specification, you specify when you want the trigger to start, when you want it to end, the Cron spec specifying when it should run, and the input that should be sent to the action.

import datetime
import dtlpy as dl

# start_at: ISO format date string at which the cron trigger starts being active
# end_at: ISO format date string at which the cron trigger stops being active
# inputs: dictionary "name":"val" of inputs to the function
up_cron_trigger = service.triggers.create(function_name='my_function',
                                          trigger_type=dl.TriggerType.CRON,
                                          name='cron-trigger-name',
                                          start_at=datetime.datetime(2020, 8, 23).isoformat(),
                                          end_at=datetime.datetime(2024, 8, 23).isoformat(),
                                          cron="0 5 * * *")
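
In standard five-field cron syntax (minute, hour, day of month, month, day of week), the spec "0 5 * * *" means minute 0 of hour 5 on every day. The sketch below checks a datetime against such a spec. It is a deliberately reduced matcher that accepts only "*" or a single integer per field, and `cron_matches` is a hypothetical helper, not part of the SDK. Which time zone the platform uses to evaluate the spec is not stated here.

```python
import datetime


def cron_matches(spec, moment):
    """Check a datetime against a five-field cron spec ('*' or integers only)."""
    fields = spec.split()
    actual = (
        moment.minute,
        moment.hour,
        moment.day,
        moment.month,
        (moment.weekday() + 1) % 7,  # cron counts Sunday as 0
    )
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


print(cron_matches("0 5 * * *", datetime.datetime(2023, 5, 12, 5, 0)))  # True
print(cron_matches("0 5 * * *", datetime.datetime(2023, 5, 12, 6, 0)))  # False
```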

Triggers Examples

Trigger for Items Uploaded to Directory /Input

filters = dl.Filters(field='dir', values='/input')
trigger = service.triggers.create(function_name='run',
                                  resource=dl.TriggerResource.ITEM,
                                  actions=dl.TriggerAction.CREATED,
                                  name='items-created-trigger',
                                  filters=filters)

Trigger for Items Updated in Project

trigger = service.triggers.create(function_name='run',
                                  resource=dl.TriggerResource.ITEM,
                                  actions=dl.TriggerAction.UPDATED,
                                  name='items-updated-trigger')

Trigger for Annotated Video Items that are Being Updated

filters = dl.Filters(field='annotated', values=True)
filters.add(field='metadata.system.mimetype', values='video/*')
trigger = service.triggers.create(function_name='run',
                                  resource=dl.TriggerResource.ITEM,
                                  actions=dl.TriggerAction.UPDATED,
                                  name='items-updated-trigger',
                                  filters=filters)

Trigger for Datasets that are Being Updated

trigger = service.triggers.create(function_name='run',
                                  resource=dl.TriggerResource.DATASET,
                                  actions=dl.TriggerAction.UPDATED,
                                  name='dataset-updated-trigger')

Trigger for Annotations Created from Box Type

filters = dl.Filters(field='type', values='box', resource=dl.FiltersResource.ANNOTATION)
trigger = service.triggers.create(function_name='run',
                                  resource=dl.TriggerResource.ANNOTATION,
                                  actions=dl.TriggerAction.CREATED,
                                  name='annotation-created-trigger',
                                  filters=filters)

Trigger for Annotations Updates of Label "DOG"

filters = dl.Filters(field='label', values='DOG', resource=dl.FiltersResource.ANNOTATION)
trigger = service.triggers.create(function_name='run',
                                  resource=dl.TriggerResource.ANNOTATION,
                                  actions=dl.TriggerAction.UPDATED,
                                  name='annotation-updated-trigger',
                                  filters=filters)

Trigger for Item Status Change in a Task

trigger = service.triggers.create(function_name='run',
                                  execution_mode=dl.TriggerExecutionMode.ALWAYS,
                                  resource=dl.TriggerResource.ITEM_STATUS,
                                  actions=[dl.TriggerAction.UPDATED],
                                  name='item-updated-status')