Go to file
R Tyler Croy f4f8493da3
Specify a bit more what federation should look like
2021-02-28 16:20:39 -08:00
.gitignore initial commit 2021-02-28 14:12:26 -08:00
README.adoc Specify a bit more what federation should look like 2021-02-28 16:20:39 -08:00

README.adoc

<html lang="en"> <head> </head>

Open Distribution Initiative

The Open Distribution Initiative is a concept for developing more scalable and federated means of distribution free and open source software artifacts.

Concept

At its core the current concept for ODI (Open Distribution Initiative) is that of central coordinating directory servers and relays which ultimately service user-traffic. The directory servers hold the shared inventory of what artifacts are available, checksums, and distribution statistics. Relays on the other hand are intended to cache and forward some subset of the catalog to end clients. The relays are intended to be owned and operated by a large heterogeneous mix of users, whereas directory servers are more likely to operated on high throughput academic or corporate networks. In either case, the goal is to limit centralization to the extent possible.

ODI operates over traditional HTTP with the custom software for directories and relays, but clients should never require customized software to consume artifacts.

Note

ODI is inspired somewhat by Tor and its network of relays and bridges.

An example ODI network topology
everest.example.com
                                     odi-r1.osuosl.org
 ODI Directory
+---------------+                     ODI Relay
|               |                    +-------------+
|  +---------+  |                    | +---------+ |
|  | Catalog +-------------------------> Catalog | |
|  +---------+  |                    | +---------+ |
|  +---------+  |                    +-------------+
|  | Catalog +--------------+
|  +---------+  |           |        odi.sonic.com
|  +---------+  |           |
|  | Catalog +-----+-----+  |         ODI Relay
|  +---------+  |  |     |  |        +-------------+
|               |  |     |  |        | +---------+ |
+---------------+  |     |  +----------> Catalog | |
                   |     |           | +---------+ |
shasta.example.com |     |           +-------------+
                   |     |
 ODI Directory     |     |           odi.ocf.berkeley.edu
+---------------+  |     |
|               |  |     |            ODI Relay
|  +---------+  |  |     |           +-------------+
|  | Catalog <-----+     |           | +---------+ |
|  +---------+  |        +-------------> Catalog | |
|  +---------+  |                    | +---------+ |
|  | Catalog +-----------+           | +---------+ |
|  +---------+  |        +-------------> Catalog | |
|  +---------+  |                    | +---------+ |
|  | Catalog +-----------+           | +---------+ |
|  +---------+  |        +-------------> Catalog | |
|               |                    | +---------+ |
+---------------+                    +-------------+

In the topology above there are two directory servers that have been deployed. An example request for an artifact from everest.example.com might follow the following path:

Client request flow
       Client                                    Directory              Relay
         +                                           +                    +
         |  Requests /r/artifact-1.jar               |                    |
         +-----------------------------------------> |                    |
         |                                           |                    |
         |                                    Lookup relay table          |
         |                                    for distribution            |
         |                                           |                    |
         |                302 Found on Relay         |                    |
         | <-----------------------------------------+                    |
         |                                           |                    |
         |  Requests /r/artifact-1.jar               |                    |
         +--------------------------------------------------------------> |
         |                                           |                    |
         |                                           |    200 Ok w/ bytes |
         | <--------------------------------------------------------------+
         |                                           |                    |

Relays

Relays are just simple HTTP servers with a valid domain name and TLS certificate. Additionally, they must run the odi-relayd in order to ensure they are keeping the proper synchronization with the configured ODI Directories.

Seeding

In order for an ODI Relay to work properly, it must have a catalog downloaded and ready to serve. While not yet defined, this is expected to be handled through a combination of relay-specified configuration such as:

  • How much disk space can be used.

  • How much bandwidth may be utilized.

  • Which ODI Directories the operator wishes to interact with.

  • What catalogs or catalog tags is the operator interested in mirroring.

A relay would then register with configured directories, providing some of the configuration information and a pre-shared key the relay generated for authenticating future inbound requests from the directories.

Once the relay has passed self-test by the directory, the directory would assign some portion of the requested catalog(s) to the relay and notify the relay to begin downloading the artifacts.

Upon completion, the relay would inform the directory that it is ready to begin operation. Once the directory confirms the relay is properly running, it would begin directing traffic for the artifacts assigned to that relay.

Clean up

Relays must specify how much space they are willing to host and are responsible for performing some level of "garbage collection" on older lesser-used artifacts to ensure that space for "newer" artifacts in the catalog are made.

Note

The ODI Directory may request some artifacts be expunged which have known severe security defects or have otherwise been corrupted.

Directories

The ODI Directory is the most complex part of the equation and is responsible for both maintaining relationships with active relays but also other directories. The directory-to-directory federation helps ensure that no single directory may end up as a single point of failure.

Fundamentally a Directory must have a large amount of disk space as its intended to provide the "source of truth" for all artifacts it expects its relays to serve. In the future directories may provide a mode of operation where only relays and origin servers have copies of the artifact.

Statistics need to be kept to identify "hot" artifacts which require more capacity. The directory may request more relays to host artifacts which require more capacity. The directory is also responsible for notifying relays of new artifacts in their respective catalogs.

Federation

ODI Directories are intended to be federated whenever possible to provide resiliency. The two forms of federation supported by directories are references and links. Taking inspiration from Peertube, these two modes of federation allow for some directories to reference others, providing pointers to artifacts or even allowing a "directory of directories" to exist. Alternatively, when two directories are linked they will both effectively be serving the same catalog(s) with its artifacts and metadata.

A directory which only contains references to other directories can be thought of as a meta-directory and can act as a single reference point for clients. The meta-directory would receive requests for artifacts, determine which directory owns the catalog containing the artifact, and redirect the request off to that directory server where the typical directory/relay flow would occur. The meta-directory may organize directories it references which contain identical catalogs into geographical or other groupings for faster distribution to clients.

An example of a global distribution network via the Open Distribution Initiative
                                      eu.freebsd.org          mirrors.xmission.com
dist.freebsd.org
                                       ODI Directory           ODI Relay
 ODI Meta-directory                   +---------------+       +---------------+
+--------------------+                |               |       | +-----------+ |
|                    |                | +-----------+ |    +----> Catalog A | |
| Georeference (EU)  +------------------> Catalog A <---+  |  | +-----------+ |
|                    |                | +-----------+ | |  |  | +-----------+ |
|                    |                |               | |  |  | | Catalog B | |
| Georeference* (NA) +-------+        +---------------+ |  |  | +-----------+ |
|                    |       |                          |  |  +---------------+
|                    |       |        na.freebsd.org    |  |
| Georeference (CN)  +-----+ |                          |  |
|                    |     | |         ODI Directory    |  |
|                    |     | |        +---------------+ |  |
+--------------------+     | |        |               | |  |
                           | |        | +-----------+ | |  |
                           | +----------> Catalog A +------+
                           |          | +-----------+ | |
                           |          |               | |
                           |          +---------------+ |
                           |                            |
                           |          cn.freebsd.org    |
                           |                            link
                           |           ODI Directory    |
                           |          +---------------+ |
                           |          |               | |
                           |          | +-----------+ | |
                           +------------> Catalog A <---+
                                      | +-----------+ |
                                      |               |
                                      +---------------+

In the above example na.freebsd.org is considered the canonical directory for Catalog A to which both eu. and cn. are linked. The "meta-directory" running at dist.freebsd.org is configured with geo-references for the different global regions.

Important

At this time federation of artifact statistics is not planned. Aside from novel graphs its not clear what value aggregation of statistics might provide.

Catalogs

Note

The exact size and shape of ODI catalogs has yet to be defined

Open Questions

  • How would a catalog on a directory be updated? When a project pushes a release, ODI could act similar to an origin-pull CDN model wherein a projects catalog is configured to pull from a lower bandwidth origin server and then effectively disseminate that through the ODI network. Another option would be to simply rely on "triggering" but that may require some sort of active user management/API tier, whereas origin-pull could operate via static configuration managed by pull requests.

  • Should catalogs be organized based around tags? Ecosystem (e.g. Python)? What level of granularity is useful here? The "Group" in an rpmspec might be a useful pattern to emulate here.

Distribution Challenges

Distribution of artifacts for free and open source projects faces a number of challenges, not the least of which is financial. Many major projects rely on corporate funding for CDN or other hosting services to distribute key artifacts to their downstream developers and end-users..For smaller projects, corporate or academic support for their software distribution is not an option leaving many to rely heavily on proprietary services like GitHub (Releases/Packages) or platform-specific artifact repositories (such as Rubygems.org, Python Package Index, etc).

Some "first generation projects" (those that predate or are concurrent with the SourceForge era) may rely on mirror networks for artifact distribution. The patchwork of mirrors powering the Apache Software Foundation, Debian, or openSUSE helps them distribute many terabytes of data per month, but typically relies on a handful of volunteers in order to remain viable. Additionally, mirroring relationships are typically formed between individuals with significant systems administration experience, leading to a very clear skew towards operating systems and infrastructure tools being distributed through these mirroring networks.

Corporate Funding

Theres nothing wrong with corporate funding for infrastructure. Solely relying on corporate generosity can and does present challenges for a number of projects seeking to maintain funding continuity in their budgets.

Some projects which rely heavily on corporate generosity for their distribution are:

  • Maven Central which is owned and operated by Sonatype, Inc.

  • NPM which is owned and operated by Microsoft.

  • GitHub releases, which is owned and operated by Microsoft.

Mixed Funding

  • Python Package Index which is supported by the Python Software Foundation, with infrastructure sponsorship from AWS, Google, Fastly.

  • Rubygems.org which is supported by Ruby Together and Ruby Central, with infrastructure sponsorship from Fastly.

  • Jenkins which is supported by the Continuous Delivery Foundation, a corporate trade organization, with a non-trivial part of distribution served via a volunteer-managed mirror network.

  • openSUSE which is sponsored by SUSE GmbH in addition to other companies, with a non-trivial part of distribution served via a volunteer-managed mirror network.

</html>