Merge pull request #6 from jenkins-infra/github-events-005

IEP-5 Systematic collection of project events
This commit is contained in:
R. Tyler Croy 2017-01-24 09:29:52 -08:00 committed by GitHub
commit e546402653
1 changed files with 256 additions and 0 deletions

256
iep-005/README.adoc Normal file
View File

@ -0,0 +1,256 @@
ifdef::env-github[]
:tip-caption: :bulb:
:note-caption: :information_source:
:important-caption: :heavy_exclamation_mark:
:caution-caption: :fire:
:warning-caption: :warning:
endif::[]
= IEP-5: Systematic collection of project events
:toc:
.Metadata
[cols="2"]
|===
| IEP
| 5
| Title
| Systematic collection of project events
| Author
| link:https://github.com/rtyler[R. Tyler Croy]
| Status
| :speech_balloon: In-process
| Type
| Service
| Created
| 2016-12-28
|===
== Abstract
Considering the size and velocity of the Jenkins project, it can be difficult
to systematically determine the overall health through manual checks, scraping
of various APIs, or through other non-automated means. Additionally, without
any project-owned corpus of data around it is burdensome to answer questions
such as:
* What's the level of development activities in core, class A/B/C plugins? How are they changing over the time?
* Who are the seasoned contributors?
* Who are the new contributors that we can reach out to and help?
* What's the typical journey of a plugin developer?
== Specification
The primary source of project events is currently GitHub, which provides
organization-wide webhooks. This specification focuses primarily on these
events but can be readily extended when new sources of information arise.
=== Technologies Introduced
This specification introduces a few new technologies which are currently not
part of the Jenkins project infrastructure:
* link:https://azure.microsoft.com/en-us/services/functions/[Azure Functions]
* link:https://azure.microsoft.com/en-us/services/event-hubs/[Azure EventHub]
* link:https://azure.microsoft.com/en-us/services/documentdb/[Azure DocumentDB]
The motivations for selecting each of these pieces of technology is discussed
more in the <<Rationale>> section.
.Component Diagram
[source]
----
+--------------------+ +----------------------------+ +--------------+
| GitHub (jenkinsci) +----webhook------>| github-event-queue (App) +--enqueue-->| EventHub |
+--------------------+ +----->+----------------+-----------+ | (all events) |
+------------------------+ | | +--------------+
| GitHub (jenkins-infra) +-------+ |
+------------------------+ |
+----append----+
|
+--------------v--------+
| DocumentDB (github) |
+-----------------------+
----
=== Data Format
The following describes the JSON format to be expected by applications querying
either DocumentDB or consuming from an EventHub.
.JSON Event/Document Format
[source,json]
----
{
"source" : "github", // <1>
"type" : "pull_request", // <2>
"event" : {}, // <3>
"received" : "2017-01-05T21:56:04.522Z" // <4>
}
----
<1> Original source of the event, `"github"` indicates the event originated as a GitHub webhook.
<2> Type of event, in the case of the `github` source, this will be one of the link:https://developer.github.com/webhooks/#events[webhook event types].
<3> The actual JSON payload of the event.
<4> The ISO-8601 timestamp indicating when the event was received by the Azure Functions app
This table should be updated when new sources and types are added:
.Event Sources
|===
| Identifier | Description | Types
| `github`
| A GitHub webhook payload
| One of the link:https://developer.github.com/webhooks/#events[webhook event types].
|===
=== Storage Capacity
The <<Costs>> for Azure DocumentDB storage is only based on what is _actually_
consumed rather than what is provisioned. Therefore each DocumentDB provisioned
for events should be at minimum *25GB*.
Time-to-live:: Azure DocumentDB supports a document or collection-level time-to-live (TTL), which
can automatically purge documents after the TTL expires. Until we better
understand the amount of events data the project wishes to store, this will
remain *off*.
Consistency:: Until a compelling reason to change the consistency level for the
DocumentDB resources, the default of *session* consistency should be used.
=== Monitoring
The monitoring facilities
link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-monitoring[built into Azure Functions]
don't integrate with any of the existing monitoring tools in use by the Jenkins
project. Azure Functions can however, output diagnostic logs and web server
logs into an Azure storage account. This is not scoped in this document because
it is not yet clear whether these logs are necessary and worth the added cost
of storing them.
zure DocumentDB does not have a supported integration with DataDog.
Until there is more support in DataDog or Azure for better monitoring, these
services will *not* be automatically monitored.
== Motivation
By provisioning relatively simple webhook receivers which not only archive
events data into a live-queryable datastore (DocumentDB), but also publish those
events on an Azure EventHub, the Jenkins project will have an easy-to-access
data store of project events. Additionally, the events enqueued into EventHub
can act as an input for future services (for example, a project health
dashboard) without requiring additional infrastructure to be provisioned.
== Rationale
=== Technologies Introduced
An underlying rationale for using all of the Azure-specific technologies
referenced below is that it is cheaper, easier, and faster to use
platform-level services provided by Azure rather than implementing and hosting
the features of each technology itself.
==== Azure Functions
Implement a GitHub webhook receiver in Azure Functions is sufficiently trivial
that it is one of
link:https://docs.microsoft.com/en-us/azure/azure-functions/functions-create-a-web-hook-or-api-function[their "Get Started" examples],
and as such Azure Functions has explicit support and ehancements for receiving
webhook payloads from GitHub.
Additionally, Azure Functions represent a "slice" of computation which is
suitable for the purpose of receiving a JSON payload, processing it, and
storing it for later. This as opposed to implementing a new web application
specifically for this purpose which would need either its own virtual machine
or container infrastructure in order to execute.
==== Azure EventHub
The use of Azure EventHub in the architecture described above is more
future-proofing than a strong requirement to solve the problem at hand. The
assumption being that additional services in the future will wish to consume
some or all of the events received by the deployed Azure Functions app.
The most practical means of providing this service internally is through a
pub/sub mechanism which EventHubs provide in Azure. EventHubs also provide the
added benefit of automatically expiring old messages along with many other
valuable queueing features such as consumer groups and partitions.
==== Azure DocumentDB
GitHub webhook event payloads are constructed as JSON, and it is expected that
any subsequent events will be consumed as JSON. As such, a document-oriented
database ("NoSQL") is preferred in order to avoid time-consuming schema
updates.
== Costs
The pricing
link:https://azure.microsoft.com/en-us/pricing/details/functions/[for Azure Functions]
by itself is already confusing and without an existing Functions app
consuming the `jenkinsci` events it's difficult to evaluate what the runtime
cost would be. That said, 1 million monthly executions are provided for free by
Azure, meaning the Function app itself will cost nothing or very little.
The noticeable cost of this proposal will come from the
link:https://azure.microsoft.com/en-us/pricing/details/event-hubs/[EventHub]
and
link:https://azure.microsoft.com/en-us/pricing/details/documentdb/[DocumentDB]
storage and transit rates.
=== DocumentDB
The storage rate, in East US, is *$0.25 per GB / month*. The throughput rate, in East US, is *$0.008/hr* per hundred
link:https://docs.microsoft.com/en-us/azure/documentdb/documentdb-manage#request-units-and-database-operations[Request Units per second].
Assuming each DocumentDB instance is provisioned with 5GB of storage, the
annual storage cost will be roughly *$15*. Though this is likely to go up as
more data is stored over time.
The throughput rate's annual cost is difficult to ascertain without real-world
usage, but is expected to remain under *$100* barring dramatic shifts in
expected usage.
=== EventHub
The cost per million ingress events in East US is a paltry $0.028, so not worth
discussing.
The throughput unit cost (1MB ingress, 2MB egress) comes to an annual cost of
*$133.92*.
== Reference implementation
The reference implementation of the Azure Functions app can be found in the
link:https://github.com/jenkins-infra/analytics-functions[jenkins-infra/analytics-functions]
repository.
The Terraform for actually provisioning the Azure Functions app in Azure can be
found in
link:https://github.com/jenkins-infra/azure/pull/12[this pull request].