mirror of https://github.com/jenkins-infra/iep
Merge pull request #6 from jenkins-infra/github-events-005
IEP-5 Systematic collection of project events
commit
e546402653
|
@ -0,0 +1,256 @@
|
|||
ifdef::env-github[]
|
||||
:tip-caption: :bulb:
|
||||
:note-caption: :information_source:
|
||||
:important-caption: :heavy_exclamation_mark:
|
||||
:caution-caption: :fire:
|
||||
:warning-caption: :warning:
|
||||
endif::[]
|
||||
|
||||
= IEP-5: Systematic collection of project events
|
||||
|
||||
:toc:
|
||||
|
||||
.Metadata
|
||||
[cols="2"]
|
||||
|===
|
||||
|
||||
| IEP
|
||||
| 5
|
||||
|
||||
| Title
|
||||
| Systematic collection of project events
|
||||
|
||||
| Author
|
||||
| link:https://github.com/rtyler[R. Tyler Croy]
|
||||
|
||||
| Status
|
||||
| :speech_balloon: In-process
|
||||
|
||||
| Type
|
||||
| Service
|
||||
|
||||
| Created
|
||||
| 2016-12-28
|
||||
|===
|
||||
|
||||
|
||||
|
||||
== Abstract
|
||||
|
||||
Considering the size and velocity of the Jenkins project, it can be difficult
to systematically determine its overall health through manual checks, scraping
of various APIs, or other non-automated means. Additionally, without any
project-owned corpus of data, it is burdensome to answer questions such as:
|
||||
|
||||
* What's the level of development activity in core and in class A/B/C plugins? How is it changing over time?
|
||||
* Who are the seasoned contributors?
|
||||
* Who are the new contributors that we can reach out to and help?
|
||||
* What's the typical journey of a plugin developer?
|
||||
|
||||
== Specification
|
||||
|
||||
The primary source of project events is currently GitHub, which provides
|
||||
organization-wide webhooks. This specification focuses primarily on these
|
||||
events but can be readily extended when new sources of information arise.
|
||||
|
||||
=== Technologies Introduced
|
||||
|
||||
This specification introduces a few new technologies which are currently not
|
||||
part of the Jenkins project infrastructure:
|
||||
|
||||
* link:https://azure.microsoft.com/en-us/services/functions/[Azure Functions]
|
||||
* link:https://azure.microsoft.com/en-us/services/event-hubs/[Azure EventHub]
|
||||
* link:https://azure.microsoft.com/en-us/services/documentdb/[Azure DocumentDB]
|
||||
|
||||
|
||||
The motivations for selecting each of these technologies are discussed further
in the <<Rationale>> section.
|
||||
|
||||
|
||||
|
||||
.Component Diagram
|
||||
[source]
|
||||
----

+--------------------+                  +----------------------------+            +--------------+
| GitHub (jenkinsci) +----webhook------>| github-event-queue (App)   +--enqueue-->|   EventHub   |
+--------------------+           +----->+----------------+-----------+            | (all events) |
+------------------------+       |                       |                        +--------------+
| GitHub (jenkins-infra) +-------+                       |
+------------------------+                               |
                                          +----append----+
                                          |
                           +--------------v--------+
                           |  DocumentDB (github)  |
                           +-----------------------+
----
|
||||
|
||||
|
||||
=== Data Format
|
||||
|
||||
The following describes the JSON format to be expected by applications querying
|
||||
either DocumentDB or consuming from an EventHub.
|
||||
|
||||
.JSON Event/Document Format
|
||||
[source,json]
|
||||
----
{
    "source" : "github", // <1>
    "type" : "pull_request", // <2>
    "event" : {}, // <3>
    "received" : "2017-01-05T21:56:04.522Z" // <4>
}
----
|
||||
<1> Original source of the event; `"github"` indicates the event originated as a GitHub webhook.
<2> Type of event; for the `github` source, this will be one of the link:https://developer.github.com/webhooks/#events[webhook event types].
<3> The actual JSON payload of the event.
<4> The ISO-8601 timestamp indicating when the event was received by the Azure Functions app.
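
As an illustration of the envelope described above, the following is a minimal Python sketch of how a receiver might wrap an incoming payload. The function name `wrap_event` and its parameters are hypothetical and are not taken from the reference implementation; the only assumption carried over from the format is that `received` is a UTC ISO-8601 timestamp with millisecond precision.

```python
from datetime import datetime, timezone


def wrap_event(source, event_type, payload, received=None):
    """Wrap a raw payload in the envelope format described above.

    `wrap_event` is a hypothetical helper used for illustration only.
    """
    if received is None:
        received = datetime.now(timezone.utc)
    # Normalize to UTC and render ISO-8601 with millisecond precision.
    stamp = received.astimezone(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "source": source,
        "type": event_type,
        "event": payload,
        # Use the "Z" suffix shown in the example document.
        "received": stamp.replace("+00:00", "Z"),
    }


# Reproduce the example document with a fixed timestamp.
example = wrap_event(
    "github",
    "pull_request",
    {},
    received=datetime(2017, 1, 5, 21, 56, 4, 522000, tzinfo=timezone.utc),
)
```

For the `github` source, a receiver would most likely take the `type` value from the `X-GitHub-Event` header that GitHub sends with each webhook delivery.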
|
||||
|
||||
This table should be updated when new sources and types are added:
|
||||
|
||||
.Event Sources
|
||||
|===
|
||||
| Identifier | Description | Types
|
||||
|
||||
| `github`
|
||||
| A GitHub webhook payload
|
||||
| One of the link:https://developer.github.com/webhooks/#events[webhook event types].
|
||||
|
||||
|===
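
A consumer reading from DocumentDB or EventHub may want to check that a document matches the format and the sources table above before processing it. The sketch below is one possible check, not part of the specification; the names `REQUIRED_FIELDS`, `KNOWN_SOURCES`, and `validation_errors` are hypothetical.

```python
REQUIRED_FIELDS = ("source", "type", "event", "received")
KNOWN_SOURCES = {"github"}  # mirrors the Event Sources table; extend as sources are added


def validation_errors(document):
    """Return a list of problems found in an event document (empty if none)."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in document:
            errors.append("missing field: " + field)
    source = document.get("source")
    if source is not None and source not in KNOWN_SOURCES:
        errors.append("unknown source: " + str(source))
    # The payload is expected to be a JSON object, i.e. a dict once parsed.
    if "event" in document and not isinstance(document["event"], dict):
        errors.append("event must be a JSON object")
    return errors
```

Returning a list of problems rather than raising on the first one lets a consumer log everything that is wrong with a document in a single pass.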
|
||||
|
||||
|
||||
=== Storage Capacity
|
||||
|
||||
Per the <<Costs>> section, Azure DocumentDB storage is billed only for what is
_actually_ consumed rather than what is provisioned. Therefore each DocumentDB
provisioned for events should be at minimum *25GB*.
|
||||
|
||||
Time-to-live:: Azure DocumentDB supports a document-level or collection-level
time-to-live (TTL), which can automatically purge documents after the TTL
expires. Until we better understand the amount of event data the project
wishes to store, this will remain *off*.
Consistency:: Until there is a compelling reason to change the consistency
level for the DocumentDB resources, the default of *session* consistency
should be used.
|
||||
|
||||
|
||||
=== Monitoring
|
||||
|
||||
The monitoring facilities
|
||||
link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-monitoring[built into Azure Functions]
|
||||
don't integrate with any of the existing monitoring tools in use by the Jenkins
|
||||
project. Azure Functions can, however, output diagnostic logs and web server
|
||||
logs into an Azure storage account. This is not scoped in this document because
|
||||
it is not yet clear whether these logs are necessary and worth the added cost
|
||||
of storing them.
|
||||
|
||||
Azure DocumentDB does not have a supported integration with DataDog.
|
||||
|
||||
|
||||
Until there is more support in DataDog or Azure for better monitoring, these
|
||||
services will *not* be automatically monitored.
|
||||
|
||||
|
||||
|
||||
== Motivation
|
||||
|
||||
By provisioning relatively simple webhook receivers which not only archive
|
||||
events data into a live-queryable datastore (DocumentDB), but also publish those
|
||||
events on an Azure EventHub, the Jenkins project will have an easy-to-access
|
||||
data store of project events. Additionally, the events enqueued into EventHub
|
||||
can act as an input for future services (for example, a project health
|
||||
dashboard) without requiring additional infrastructure to be provisioned.
|
||||
|
||||
== Rationale
|
||||
|
||||
=== Technologies Introduced
|
||||
|
||||
An underlying rationale for using all of the Azure-specific technologies
|
||||
referenced below is that it is cheaper, easier, and faster to use
|
||||
platform-level services provided by Azure rather than implementing and hosting
|
||||
the features of each technology itself.
|
||||
|
||||
==== Azure Functions
|
||||
|
||||
Implementing a GitHub webhook receiver in Azure Functions is sufficiently
trivial that it is one of
link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-a-web-hook-or-api-function[their "Get Started" examples],
and as such Azure Functions has explicit support and enhancements for receiving
webhook payloads from GitHub.
|
||||
|
||||
Additionally, Azure Functions represent a "slice" of computation which is
|
||||
suitable for the purpose of receiving a JSON payload, processing it, and
|
||||
storing it for later. This is as opposed to implementing a new web application
|
||||
specifically for this purpose which would need either its own virtual machine
|
||||
or container infrastructure in order to execute.
|
||||
|
||||
|
||||
==== Azure EventHub
|
||||
|
||||
The use of Azure EventHub in the architecture described above is more
|
||||
future-proofing than a strong requirement to solve the problem at hand. The
|
||||
assumption is that additional services in the future will wish to consume
|
||||
some or all of the events received by the deployed Azure Functions app.
|
||||
|
||||
The most practical means of providing this service internally is through a
|
||||
pub/sub mechanism, which EventHubs provide in Azure. EventHubs also provide the
|
||||
added benefit of automatically expiring old messages along with many other
|
||||
valuable queueing features such as consumer groups and partitions.
|
||||
|
||||
|
||||
==== Azure DocumentDB
|
||||
|
||||
GitHub webhook event payloads are constructed as JSON, and it is expected that
|
||||
any subsequent events will be consumed as JSON. As such, a document-oriented
|
||||
database ("NoSQL") is preferred in order to avoid time-consuming schema
|
||||
updates.
|
||||
|
||||
|
||||
== Costs
|
||||
|
||||
The pricing
|
||||
link:https://azure.microsoft.com/en-us/pricing/details/functions/[for Azure Functions]
|
||||
by itself is already confusing, and without an existing Functions app
|
||||
consuming the `jenkinsci` events it's difficult to evaluate what the runtime
|
||||
cost would be. That said, 1 million monthly executions are provided for free by
|
||||
Azure, meaning the Function app itself will cost nothing or very little.
|
||||
|
||||
|
||||
The noticeable cost of this proposal will come from the
|
||||
link:https://azure.microsoft.com/en-us/pricing/details/event-hubs/[EventHub]
|
||||
and
|
||||
link:https://azure.microsoft.com/en-us/pricing/details/documentdb/[DocumentDB]
|
||||
storage and transit rates.
|
||||
|
||||
|
||||
=== DocumentDB
|
||||
|
||||
The storage rate, in East US, is *$0.25 per GB / month*. The throughput rate, in East US, is *$0.008/hr* per hundred
|
||||
link:https://docs.microsoft.com/en-us/azure/documentdb/documentdb-manage#request-units-and-database-operations[Request Units per second].
|
||||
|
||||
Assuming each DocumentDB instance is provisioned with 5GB of storage, the
|
||||
annual storage cost will be roughly *$15*, though this is likely to go up as
|
||||
more data is stored over time.
|
||||
|
||||
The throughput rate's annual cost is difficult to ascertain without real-world
|
||||
usage, but is expected to remain under *$100* barring dramatic shifts in
|
||||
expected usage.
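
The arithmetic behind these DocumentDB figures can be sketched as follows. The rates are the East US rates quoted above; the function names, the 365-day year, and the 100 Request Units per second used in the check are assumptions made for illustration, not measured usage.

```python
STORAGE_RATE_PER_GB_MONTH = 0.25        # USD, East US, as quoted above
THROUGHPUT_RATE_PER_100RU_HOUR = 0.008  # USD, East US, as quoted above


def annual_storage_cost(gigabytes):
    """Yearly storage cost in USD for a constant amount of stored data."""
    return gigabytes * STORAGE_RATE_PER_GB_MONTH * 12


def annual_throughput_cost(request_units_per_second, hours_per_year=24 * 365):
    """Yearly throughput cost in USD; assumes a 365-day year by default."""
    hundreds = request_units_per_second / 100
    return hundreds * THROUGHPUT_RATE_PER_100RU_HOUR * hours_per_year


print(round(annual_storage_cost(5), 2))       # 5GB stored all year
print(round(annual_throughput_cost(100), 2))  # hypothetical 100 RU/s all year
```

With 5GB stored, the storage formula gives the $15 quoted above. At a hypothetical 100 Request Units per second, the throughput formula gives about $70 per year, which is in line with the expectation of staying under $100.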
|
||||
|
||||
=== EventHub
|
||||
|
||||
The cost per million ingress events in East US is a paltry $0.028, so it is not worth
|
||||
discussing.
|
||||
|
||||
The throughput unit cost (1MB ingress, 2MB egress) comes to an annual cost of
|
||||
*$133.92*.
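
To show how the two EventHub rates combine, here is a small sketch using only the figures quoted above. The function name and the event volume of ten million per year are hypothetical and chosen only to show how little the ingress charge contributes.

```python
INGRESS_RATE_PER_MILLION = 0.028  # USD per million ingress events, as quoted above
THROUGHPUT_UNIT_ANNUAL = 133.92   # USD per throughput unit per year, as quoted above


def annual_eventhub_cost(events_per_year, throughput_units=1):
    """Yearly EventHub cost in USD: throughput units plus ingress events."""
    ingress = events_per_year / 1_000_000 * INGRESS_RATE_PER_MILLION
    return throughput_units * THROUGHPUT_UNIT_ANNUAL + ingress


# Hypothetical volume: ten million events per year on one throughput unit.
print(round(annual_eventhub_cost(10_000_000), 2))
```

Under that assumption, ingress adds $0.28 to the annual throughput unit cost, so the throughput unit dominates the total.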
|
||||
|
||||
== Reference implementation
|
||||
|
||||
The reference implementation of the Azure Functions app can be found in the
|
||||
link:https://github.com/jenkins-infra/analytics-functions[jenkins-infra/analytics-functions]
|
||||
repository.
|
||||
|
||||
The Terraform configuration for provisioning the Azure Functions app in Azure can be
|
||||
found in
|
||||
link:https://github.com/jenkins-infra/azure/pull/12[this pull request].
|
||||
|