RFC-0003: Resource allocation by auction

Table 1. Metadata
RFC	0003
Title	Resource allocation by auction
Sponsor	R Tyler Croy
Status	Draft 💬
Type	Standards
Created	2019-12-13
Depends on	RFC-0005

Table of Contents

Abstract
Specification
Motivation
Reasoning
Backwards Compatibility
Security
Testing
Prototype Implementation
References

Abstract

Fundamental to any task orchestration engine, such as Otto, is the allocation of resources for the execution of tasks. Matching tasks to resources is simple in small environments where one might have a single orchestrator and only a few resources, e.g. virtual machines, capable of executing tasks. More complex task workloads require increasingly complex approaches for efficient allocation and utilization of resources available to the task orchestration engine. This document describes the approach taken by Otto, wherein tasks are "auctioned" to resources or orchestrators which can maximize utilization (saturation of resource) while minimizing cost (time to execute task, operational expenditure of resource).

Specification

Auction-based Resource Allocation requires at minimum three system components in order to operate properly: the Eventbus, the Auctioneer, and an Orchestrator. The Auctioneer manages the bidding on tasks and is ultimately responsible for evaluating which bid has the lowest cost to execute the task. Resource cost is a coupling of the underlying operational cost of the resource, e.g. Compute/Hour, and the Orchestrator’s estimated time to execute the task.
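As an illustration of how operational cost and estimated execution time might be coupled into one comparable number, here is a minimal sketch. The `Bid` shape, its field names, and the multiplication itself are assumptions made for this example; this document does not define a bid format or a cost formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    """Hypothetical bid shape; this RFC does not define one."""
    orchestrator_id: str
    hourly_rate: float        # operational cost of the resource, e.g. compute per hour
    estimated_seconds: float  # the Orchestrator's own estimate of execution time


def bid_cost(bid: Bid) -> float:
    """Assumed cost model: hourly rate multiplied by the estimated hours of work."""
    return bid.hourly_rate * (bid.estimated_seconds / 3600.0)


# Two hypothetical bidders: an expensive resource that finishes quickly,
# and a cheap resource that takes longer.
fast_but_pricey = Bid("ec2-large", hourly_rate=0.40, estimated_seconds=300)
slow_but_cheap = Bid("desk-machine", hourly_rate=0.05, estimated_seconds=1800)

cheapest = min([fast_but_pricey, slow_but_cheap], key=bid_cost)
print(cheapest.orchestrator_id)
```

Under this assumed model the slower resource is the cheaper one, which shows that a purely monetary cost may need an additional weighting if time to execute should count for more.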

This specification will not describe the actual execution of the task, which may be carried out on an Agent, at the behest of an Orchestrator, or executed by the Orchestrator itself.

Tasks in the system are also not generated by the Auctioneer, but are instead expected to be published to the Eventbus by another service.

Auctioneer

Task auctions are meant to be fast and lightweight, and are not expected to be 100% perfect. The Auctioneer operates a quick auction for each task that it receives. The Auctioneer uses a configured auction duration to determine how long each auction remains open.

Note	The Auctioneer does not generate the tasks in the system; that responsibility is outside the purview of this document.

For example, a task A arrives in tasks.for_auction. The Auctioneer processes the task and creates its internal representation for the auction before announcing the task auction on tasks.auction. The various Orchestrators consuming from tasks.auction may then consider the contents of the auction message in order to determine whether they can/should create a bid, which they submit onto tasks.bids.

The Auctioneer listens on tasks.bids for bids on all open auctions. For the example task A, it would see zero or more bids for A. Once the configured auction duration elapses, the Auctioneer chooses the most cost-effective bid and then writes an "auction won" message into the inbox (inbox.<clientId>) for the Orchestrator whose bid won.
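The lifecycle described above (open an auction, collect bids until the duration elapses, award the lowest-cost bid) can be sketched in memory as follows. This is not the Otto implementation: the class and method names, the `(client_id, cost)` bid tuple, and the shape of the "auction won" payload are all assumptions for illustration, and the Eventbus is replaced by direct method calls with an explicit `now` value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Auction:
    task_id: str
    ends_at: float  # seconds, on whatever clock the caller supplies
    bids: List[Tuple[str, float]] = field(default_factory=list)  # (client_id, cost)


class Auctioneer:
    """In-memory stand-in; a real Auctioneer would consume from the Eventbus."""

    def __init__(self, duration_seconds: float) -> None:
        self.duration_seconds = duration_seconds
        self.open_auctions: Dict[str, Auction] = {}

    def open(self, task_id: str, now: float) -> Auction:
        # Corresponds to announcing the task on tasks.auction.
        auction = Auction(task_id, ends_at=now + self.duration_seconds)
        self.open_auctions[task_id] = auction
        return auction

    def receive_bid(self, task_id: str, client_id: str, cost: float, now: float) -> bool:
        # Corresponds to a message on tasks.bids; late or unknown bids are dropped.
        auction = self.open_auctions.get(task_id)
        if auction is None or now >= auction.ends_at:
            return False
        auction.bids.append((client_id, cost))
        return True

    def close(self, task_id: str, now: float) -> Optional[Tuple[str, dict]]:
        """Return (inbox channel, message) for the winner.

        Returns None if the auction is unknown, still running, or drew no bids.
        """
        auction = self.open_auctions.get(task_id)
        if auction is None or now < auction.ends_at:
            return None
        del self.open_auctions[task_id]
        if not auction.bids:
            return None
        # Lowest cost wins; on a tie, min() keeps the earliest bid received.
        winner, cost = min(auction.bids, key=lambda bid: bid[1])
        return f"inbox.{winner}", {"type": "auction_won", "task": task_id, "cost": cost}
```

Passing `now` explicitly keeps the sketch deterministic; a running service would more likely read a monotonic clock and close auctions from a timer.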

The Auctioneer maintains a list of currently "open" auctions for tasks, and reports via a web interface on the status of these auctions.

Caution

Reliability concern: What happens if an Orchestrator wins a bid, but then is unable to actually start working on the task? Should it be cancelled? How would the Auctioneer handle this?

Task Auction Format

The format of the message announcing the task auction is described as follows:

{
  "task" : {
    "raw" : [full task definition], (1)
    "capabilities" : { (2)
    }
  },
  "auction" : {
    "starts" : "1970-01-01", (3)
    "ends"   : "1970-01-01" (4)
  }
}

(1) The full format of a task definition is not the subject of this specification.
(2) Key-value listing of the capabilities requested for execution of the task.
(3) The ISO-8601 formatted timestamp of when the auction was opened.
(4) The ISO-8601 formatted timestamp of when the auction will close.
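A short sketch of how an Auctioneer might assemble this message follows. The function name and its parameters are hypothetical, the task definition is a placeholder, and the choice of full UTC date-time values (rather than the date-only values in the example above) is an assumption; both forms are valid ISO-8601.

```python
import json
from datetime import datetime, timedelta, timezone


def build_auction_message(raw_task, capabilities, opened_at, duration_seconds):
    """Assemble the task auction announcement as a plain dict."""
    closes_at = opened_at + timedelta(seconds=duration_seconds)
    return {
        "task": {
            "raw": raw_task,               # full task definition, opaque here
            "capabilities": capabilities,  # key-value capability requests
        },
        "auction": {
            "starts": opened_at.isoformat(),
            "ends": closes_at.isoformat(),
        },
    }


opened = datetime(2019, 12, 13, 12, 0, 0, tzinfo=timezone.utc)
message = build_auction_message(
    raw_task={"name": "example"},  # placeholder, not a real task format
    capabilities={"cores": 2, "docker_run": True},
    opened_at=opened,
    duration_seconds=5,
)
print(json.dumps(message, indent=2))
```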

Eventbus

The implementation and specifics of the Eventbus are not described in this document. For our purposes however it is important to describe the channels which are required for the resource auction to operate:

Table 2. Channels
Channel name	Stateful	Purpose
`tasks.for_auction`	✓	Tasks which have not yet been auctioned, primarily used by the Auctioneer.
`tasks.auction`	✓	Tasks which are available to be bid upon by Orchestrators.
`tasks.bids`	✓	Task bids by the various Orchestrators.
`inbox.<clientId>`	✓	Channel representing the private inbox of a given client. This channel is where winning bids will be dispatched.
`tasks.started`	x	Informational channel for tasks which are being executed.
`tasks.finished`	x	Informational channel for tasks which are finished executing.
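For services that need to refer to these channels, the table could be captured as constants along these lines. The constant and function names are hypothetical; only the channel name strings and the stateful flags come from the table above.

```python
TASKS_FOR_AUCTION = "tasks.for_auction"
TASKS_AUCTION = "tasks.auction"
TASKS_BIDS = "tasks.bids"
TASKS_STARTED = "tasks.started"
TASKS_FINISHED = "tasks.finished"

INBOX_PREFIX = "inbox."

# Channels marked as stateful in the table; inboxes are handled by prefix.
STATEFUL_CHANNELS = frozenset({TASKS_FOR_AUCTION, TASKS_AUCTION, TASKS_BIDS})


def inbox_channel(client_id: str) -> str:
    """Build the private inbox channel name for a client."""
    if not client_id:
        raise ValueError("client_id must be a non-empty string")
    return INBOX_PREFIX + client_id


def is_stateful(channel: str) -> bool:
    """Report whether the table marks the channel as stateful."""
    return channel in STATEFUL_CHANNELS or channel.startswith(INBOX_PREFIX)
```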

Orchestrator

The role of "Orchestrator" in the auction process can be served by a service whose sole responsibility is to bid and provision agents, or it could be served by an Agent itself. Standalone Orchestrators might take the form of an "EC2 Orchestrator" which can dynamically provision resources in AWS EC2. An Agent-Orchestrator, an Agent which acts as an Orchestrator, in contrast would be a long-lived resource, like the proverbial build machine under somebody’s desk.

Both forms of Orchestrators are responsible for determining their capabilities. These capabilities will help the Orchestrator determine whether or not it should bid for a certain task which is up for auction. For example, resources which are capable of running Docker containers would be able to bid on tasks which require containers. A resource which cannot provide sudo access or admin privileges would in contrast avoid bidding on tasks which require escalated privileges for execution.

Both forms of Orchestrators should listen to the tasks.auction channel in addition to their "personal" inbox channel.

Table 3. Suggested Capabilities
Capability	Values	Notes
`cores`	`integer > 0`	Number of cores necessary to run the task
`memory`	`50M` or `1G` formatted strings	Memory necessary to run the task
`docker_run`	`true` / `false`	The resource can run a Docker container.
`docker_build`	`true` / `false`	The resource has a `DOCKER_SOCK` which can be used for running `docker build`.
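One way an Orchestrator might compare its own capabilities against those requested in an auction is sketched below. The function names are hypothetical, and several rules are assumptions rather than part of this specification: memory suffixes are read as binary units, a requested capability the resource does not declare is treated as unmet, and a boolean capability requested as `false` places no requirement on the resource.

```python
import re

# Assumption: "M" and "G" are binary units (MiB and GiB).
_MEMORY_UNITS = {"M": 1024 ** 2, "G": 1024 ** 3}


def parse_memory(value: str) -> int:
    """Convert strings such as '50M' or '1G' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([MG])", value.strip().upper())
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _MEMORY_UNITS[unit]


def can_bid(offered: dict, requested: dict) -> bool:
    """Return True if every requested capability is met by the resource."""
    for name, wanted in requested.items():
        if name not in offered:
            return False  # assumption: undeclared capabilities are unmet
        have = offered[name]
        if name == "memory":
            if parse_memory(have) < parse_memory(wanted):
                return False
        elif isinstance(wanted, bool):
            # Checked before the numeric case because bool is a subclass of int.
            if wanted and not have:
                return False
        elif have < wanted:
            return False  # numeric capabilities such as cores
    return True


resource = {"cores": 4, "memory": "1G", "docker_run": True, "docker_build": False}
print(can_bid(resource, {"cores": 2, "memory": "50M", "docker_run": True}))
print(can_bid(resource, {"docker_build": True}))
```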

Motivation

Tip

Explain why the existing code base or process is inadequate to address the problem that the RFC solves. This section may also contain any historical context such as how things were done before this proposal.

Do not discuss design choices or alternative designs that were rejected; those belong in the Reasoning section.

Reasoning

Tip

Explain why particular design decisions were made. Describe alternate designs that were considered and related work, e.g. how the feature is supported in other systems. Provide evidence of consensus within the community and discuss important objections or concerns raised during discussion.

Use sub-headings to organize this section for ease of readability.
Do not talk about history or why this needs to be done; that is part of the Motivation section.

Backwards Compatibility

Tip	Describe any incompatibilities and their severity. Describe how the RFC proposes to deal with these incompatibilities. If there are no backwards compatibility concerns, this section may simply say: There are no backwards compatibility concerns related to this proposal.

Security

Tip

Describe the security impact of this proposal. Outline what was done to identify and evaluate security issues, discuss potential security issues and how they are mitigated or prevented, and explain how the RFC interacts with existing permissions, authentication, authorization, etc.

If this proposal will have no impact on security, this section may simply say: There are no security risks related to this proposal.

Testing

Tip

If the RFC involves any kind of behavioral change to code, give a summary of how its correctness (and, if applicable, compatibility, security, etc.) can be tested.

In the preferred case that automated tests can be developed to cover all significant changes, simply give a short summary of the nature of these tests.

If some or all of the changes will require human interaction to verify, explain why automated tests are considered impractical. Then summarize what kinds of test cases might be required: user scenarios with action steps and expected outcomes. Might behavior vary by platform (operating system, servlet container, web browser, etc.)? Are there foreseeable interactions between different permissible versions of components? Are any special tools, proprietary software, or online service accounts required to exercise a related code path (Active Directory server, GitHub login, etc.)? When will testing take place relative to merging code changes, and might retesting be required if other changes are made to this area in the future?

If this proposal requires no testing, this section may simply say: There are no testing issues related to this proposal.

Prototype Implementation

Tip	Link to any open source reference implementation of code changes for this proposal. The implementation need not be completed before the RFC is accepted but must be completed before the RFC is given "final" status. RFCs which will not include code changes may omit this section.

References

Tip	Provide links to any related documents. This will include links to discussions on the mailing list, pull requests, and meeting notes.

Last updated 2024-05-15 08:15:24 UTC
