

<html lang="en">

RFC-0003: Resource allocation by auction

Table 1. Metadata

RFC           0003
Title         Resource allocation by auction
Sponsor       R Tyler Croy
Status        Draft 💬
Type          Standards
Created       2019-12-13
Depends on    RFC-0005

Abstract

Fundamental to any task orchestration engine, such as Otto, is the allocation of resources for the execution of tasks. Matching tasks to resources is simple in small environments where one might have a single orchestrator and only a few resources, e.g. virtual machines, capable of executing tasks. More complex task workloads require increasingly complex approaches for efficient allocation and utilization of the resources available to the task orchestration engine. This document describes the approach taken by Otto, wherein tasks are "auctioned" to resources or orchestrators that can maximize utilization (saturation of the resource) while minimizing cost (time to execute the task, operational expenditure of the resource).

Specification

Auction-based Resource Allocation requires at minimum three system components to operate properly: the Eventbus, the Auctioneer, and an Orchestrator. The Auctioneer manages the bidding on tasks and is ultimately responsible for evaluating which bid has the lowest cost to execute the task. Resource cost combines the underlying operational cost of the resource, e.g. compute/hour, with the Orchestrator's estimated time to execute the task.
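As a minimal sketch of the cost comparison, assume a bid's cost is the resource's hourly operational rate multiplied by the Orchestrator's estimated execution time. The `Bid` fields and the `bid_cost` and `cheapest` helpers are hypothetical; this RFC does not define a bid message or an exact cost formula.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid shape; this RFC does not define a bid message format."""
    client_id: str            # the bidding Orchestrator, used to address inbox.<clientId>
    cost_per_hour: float      # operational cost of the resource, e.g. compute/hour
    estimated_seconds: float  # the Orchestrator's estimated time to execute the task


def bid_cost(bid: Bid) -> float:
    """Assumed cost model: hourly rate multiplied by estimated hours."""
    return bid.cost_per_hour * bid.estimated_seconds / 3600.0


def cheapest(bids: list[Bid]) -> Bid | None:
    """Return the lowest-cost bid, or None when no bids were received."""
    return min(bids, key=bid_cost, default=None)
```

Under this assumed model, an EC2 instance at 0.40 per hour estimating 10 minutes (cost of roughly 0.067) loses to a desk machine at 0.05 per hour estimating one hour (cost 0.05): a slower resource can still win if it is cheap enough.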

This specification will not describe the actual execution of the task, which may be carried out on an Agent, at the behest of an Orchestrator, or executed by the Orchestrator itself.

Tasks in the system are also not generated by the Auctioneer, but are instead expected to be published to the Eventbus by another service.

Auctioneer

Task auctions are meant to be fast, lightweight, and not 100% perfect. The Auctioneer runs a quick auction for each task that it receives, using a configured auction duration to determine how long each auction stays open.

Note

The Auctioneer does not generate the tasks in the system; that responsibility is outside the purview of this document.

For example, task A arrives on tasks.for_auction. The Auctioneer processes the task and creates its internal representation of the auction before announcing the task auction on tasks.auction. The Orchestrators consuming from tasks.auction then inspect the auction message to determine whether they can or should bid, and submit any bids onto tasks.bids.

The Auctioneer listens on tasks.bids for bids on all open auctions; for the example task A, it may receive zero or more bids. Once the configured auction duration elapses, the Auctioneer chooses the most cost-effective bid and writes an "auction won" message into the inbox (inbox.<clientId>) of the Orchestrator whose bid won.
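The lifecycle described above can be sketched in memory, independent of any Eventbus. The `Auctioneer` class, its methods, and the use of plain float timestamps are hypothetical; the sketch assumes "most cost-effective" means "lowest cost" and that an auction with zero bids simply ends with no winner, which this RFC leaves unspecified.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Auction:
    """Internal representation of one open auction (hypothetical)."""
    task_id: str
    ends_at: float  # closing time in seconds, derived from the configured duration
    bids: dict[str, float] = field(default_factory=dict)  # client_id -> bid cost


class Auctioneer:
    """In-memory sketch of the auction lifecycle; not Otto's actual API."""

    def __init__(self, duration: float) -> None:
        self.duration = duration  # configured auction duration, in seconds
        self.open_auctions: dict[str, Auction] = {}  # task_id -> open auction

    def open_auction(self, task_id: str, now: float) -> Auction:
        # Called for each task read from tasks.for_auction; the caller would
        # then announce the auction on tasks.auction.
        auction = Auction(task_id, ends_at=now + self.duration)
        self.open_auctions[task_id] = auction
        return auction

    def receive_bid(self, task_id: str, client_id: str, cost: float, now: float) -> bool:
        # Called for each bid read from tasks.bids; late or unknown bids are ignored.
        auction = self.open_auctions.get(task_id)
        if auction is None or now >= auction.ends_at:
            return False
        auction.bids[client_id] = cost  # a repeated bid replaces the earlier one
        return True

    def close_expired(self, now: float) -> list[tuple[str, str | None]]:
        """Close elapsed auctions, returning (task_id, winner's inbox or None)."""
        results: list[tuple[str, str | None]] = []
        expired = [t for t, a in self.open_auctions.items() if now >= a.ends_at]
        for task_id in expired:
            auction = self.open_auctions.pop(task_id)
            if auction.bids:
                winner = min(auction.bids, key=auction.bids.__getitem__)
                results.append((task_id, f"inbox.{winner}"))
            else:
                results.append((task_id, None))  # zero bids: nobody won
        return results
```

Ties go to whichever bid was recorded first, since `min` returns the first minimal key. The sketch does not answer the reliability concern below about winners that never start the task.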

The Auctioneer maintains a list of currently "open" auctions for tasks, and reports via a web interface on the status of these auctions.

Caution

Reliability concern: What happens if an Orchestrator wins a bid, but then is unable to actually start working on the task? Should it be cancelled? How would the Auctioneer handle this?

Task Auction Format

The format of the message announcing the task auction is described as follows:

{
  "task" : {
    "raw" : { },             (1)
    "capabilities" : { }     (2)
  },
  "auction" : {
    "starts" : "1970-01-01T00:00:00Z",   (3)
    "ends"   : "1970-01-01T00:00:00Z"    (4)
  }
}
  1. The full task definition; its format is not the subject of this specification.

  2. Key-value listing of task capabilities requested for execution of the task.

  3. The ISO-8601 formatted timestamp of when the auction was opened.

  4. The ISO-8601 formatted timestamp of when the auction will close.
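A minimal sketch of how an Auctioneer might serialize this announcement. The `announce` function and its parameters are hypothetical; it assumes the task definition is already a JSON-compatible dictionary and that timestamps are expressed in UTC.

```python
from __future__ import annotations

import json
from datetime import datetime, timedelta, timezone


def announce(raw_task: dict, capabilities: dict, duration: timedelta,
             now: datetime | None = None) -> str:
    """Serialize a task auction announcement for tasks.auction (sketch)."""
    starts = now if now is not None else datetime.now(timezone.utc)
    ends = starts + duration  # the configured auction duration
    message = {
        "task": {
            "raw": raw_task,               # full task definition, passed through untouched
            "capabilities": capabilities,  # requested capabilities, e.g. {"docker_run": True}
        },
        "auction": {
            "starts": starts.isoformat(),  # ISO-8601 timestamp of when the auction opened
            "ends": ends.isoformat(),      # ISO-8601 timestamp of when it will close
        },
    }
    return json.dumps(message)
```

With `now` set to the epoch in UTC and a 30-second duration, `starts` serializes as `1970-01-01T00:00:00+00:00` and `ends` as `1970-01-01T00:00:30+00:00`; Python writes the UTC offset as `+00:00` rather than `Z`, and both forms are valid ISO-8601.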

Eventbus

The implementation and specifics of the Eventbus are not described in this document. For our purposes, however, it is important to describe the channels which are required for the resource auction to operate:

Table 2. Channels

Channel name        Stateful   Purpose
tasks.for_auction              Tasks which have not yet been auctioned, primarily used by the Auctioneer.
tasks.auction                  Tasks which are available to be bid upon by Orchestrators.
tasks.bids                     Task bids by the various Orchestrators.
inbox.<clientId>               The private inbox of a given client. Winning bids are dispatched to this channel.
tasks.started       x          Informational channel for tasks which are being executed.
tasks.finished      x          Informational channel for tasks which have finished executing.
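Since the Eventbus itself is out of scope, the following toy in-process bus only illustrates how the channels in Table 2 relate. `MemoryBus` and `inbox` are hypothetical names, and the sketch assumes that "stateful" means a channel retains published messages so that late subscribers can replay them; the RFC does not define the term.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

# Channels marked as stateful in Table 2. Treating "stateful" as
# "retains published messages for replay" is an assumption of this sketch.
STATEFUL = {"tasks.started", "tasks.finished"}


def inbox(client_id: str) -> str:
    """Name of a client's private inbox channel, inbox.<clientId>."""
    return f"inbox.{client_id}"


class MemoryBus:
    """Toy in-process Eventbus for illustration only."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
        self.history: dict[str, list[dict]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self.subscribers[channel].append(handler)
        # Late subscribers to a stateful channel receive its retained messages.
        for message in self.history[channel]:
            handler(message)

    def publish(self, channel: str, message: dict) -> None:
        if channel in STATEFUL:
            self.history[channel].append(message)
        for handler in self.subscribers[channel]:
            handler(message)
```

Messages published on a non-stateful channel such as tasks.auction reach only the subscribers present at publish time.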

Orchestrator

The role of "Orchestrator" in the auction process can be served by a service whose sole responsibility is to bid on tasks and provision Agents, or by an Agent itself. A standalone Orchestrator might take the form of an "EC2 Orchestrator" which can dynamically provision resources in AWS EC2. An Agent-Orchestrator, an Agent which acts as an Orchestrator, would in contrast be a long-lived resource, like the proverbial build machine under somebody's desk.

Both forms of Orchestrators are responsible for determining their capabilities. These capabilities help the Orchestrator determine whether or not it should bid on a given task which is up for auction. For example, resources which are capable of running Docker containers would be able to bid on tasks which require containers. A resource which cannot provide sudo access or admin privileges would, in contrast, avoid bidding on tasks which require elevated privileges for execution.
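A minimal sketch of that bidding decision, comparing the capabilities requested in an auction against those an Orchestrator offers. The `can_satisfy` function and its matching rules are assumptions, not part of this RFC: boolean capabilities must be offered when requested, numeric ones must be offered in at least the requested amount, anything else must match exactly, and an unknown capability means "do not bid". The `sudo` key in the usage below is a hypothetical capability name.

```python
def can_satisfy(requested: dict, offered: dict) -> bool:
    """Return True if an Orchestrator offering `offered` should bid (sketch)."""
    for name, want in requested.items():
        if want is False:
            continue  # the task does not need this capability
        if name not in offered:
            return False  # unknown or missing capability: do not bid
        have = offered[name]
        if isinstance(want, bool):  # checked before int, since bool subclasses int
            if have is not True:
                return False
        elif isinstance(want, (int, float)):
            # Numeric capabilities such as cores must be offered in sufficient amount.
            if isinstance(have, bool) or not isinstance(have, (int, float)) or have < want:
                return False
        elif have != want:
            return False
    return True
```

For instance, a machine offering `{"cores": 4, "docker_run": True}` would bid on a task requesting `{"cores": 2, "docker_run": True}` but not on one requesting `{"sudo": True}`.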

Both forms of Orchestrators should listen to the tasks.auction channel in addition to their "personal" inbox channel.

Table 3. Suggested Capabilities

Capability     Values                            Notes
cores          integer > 0                       Number of cores necessary to run the task.
memory         strings formatted like 50M or 1G  Memory necessary to run the task.
docker_run     true / false                      The resource can run a Docker container.
docker_build   true / false                      The resource has a DOCKER_SOCK which can be used for running docker build.
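The memory capability is a formatted string, so comparing requested and offered values needs a parser. A minimal sketch follows; `parse_memory` and `memory_fits` are hypothetical helpers, and the accepted suffixes (K, M, G, T) and their binary (1024-based) interpretation are assumptions, since the table only gives 50M and 1G as examples.

```python
import re

# Assumed suffixes and binary multiples; the RFC does not fix the units.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}
_MEMORY = re.compile(r"(\d+)([KMGT])")


def parse_memory(value: str) -> int:
    """Convert a memory capability such as "50M" or "1G" to bytes."""
    match = _MEMORY.fullmatch(value.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized memory value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit]


def memory_fits(requested: str, offered: str) -> bool:
    """True when the offered memory covers the requested amount."""
    return parse_memory(offered) >= parse_memory(requested)
```

Fractional values such as 1.5G are rejected by this sketch; a resource would need to express them as 1536M.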

Motivation

Tip

Explain why the existing code base or process is inadequate to address the problem that the RFC solves. This section may also contain any historical context, such as how things were done before this proposal.

  • Do not discuss design choices or alternative designs that were rejected; those belong in the Reasoning section.

Reasoning

Tip

Explain why particular design decisions were made. Describe alternate designs that were considered and related work, e.g. how the feature is supported in other systems. Provide evidence of consensus within the community and discuss important objections or concerns raised during discussion.

  • Use sub-headings to organize this section for ease of readability.

  • Do not talk about history or why this needs to be done; that belongs in the Motivation section.

Backwards Compatibility

Tip

Describe any incompatibilities and their severity. Describe how the RFC proposes to deal with these incompatibilities.

If there are no backwards compatibility concerns, this section may simply say: There are no backwards compatibility concerns related to this proposal.

Security

Tip

Describe the security impact of this proposal. Outline what was done to identify and evaluate security issues, discuss potential security issues and how they are mitigated or prevented, and explain how the RFC interacts with existing permissions, authentication, authorization, etc.

If this proposal will have no impact on security, this section may simply say: There are no security risks related to this proposal.

Testing

Tip

If the RFC involves any kind of behavioral change to code, give a summary of how its correctness (and, if applicable, compatibility, security, etc.) can be tested.

In the preferred case that automated tests can be developed to cover all significant changes, simply give a short summary of the nature of these tests.

If some or all of the changes will require human interaction to verify, explain why automated tests are considered impractical. Then summarize what kinds of test cases might be required: user scenarios with action steps and expected outcomes. Might behavior vary by platform (operating system, servlet container, web browser, etc.)? Are there foreseeable interactions between different permissible versions of components? Are any special tools, proprietary software, or online service accounts required to exercise a related code path (Active Directory server, GitHub login, etc.)? When will testing take place relative to merging code changes, and might retesting be required if other changes are made to this area in the future?

If this proposal requires no testing, this section may simply say: There are no testing issues related to this proposal.

Prototype Implementation

Tip

Link to any open source reference implementation of code changes for this proposal. The implementation need not be completed before the RFC is accepted but must be completed before the RFC is given "final" status.

RFCs which will not include code changes may omit this section.

References

Tip

Provide links to any related documents. This will include links to discussions on the mailing list, pull requests, and meeting notes.

</html>