Add a blog post about hotdog
parent 54740111dd
commit c9613c1332

---
layout: post
title: Ingesting production logs with Rust
tags:
- rust
- syslog
- hotdog
- featured
team: Core Platform
author: rtyler
---

When we set our goals at the beginning of the year, "deploy Rust to production"
was not among them, yet here we are. Our pattern of deploying small services in
containers allowed us to easily bring Rust into production, replacing a
difficult-to-manage service in the process. In January, the Core Platform team
started working on a project called "View Analytics". The effort was primarily
to replace an aging system which was literally referred to as "old analytics."
As part of the View Analytics design we needed to provide an entry point for
[Fastly](https://fastly.com) to relay access logs as syslog-formatted messages
which could then be written into [Apache Kafka](https://kafka.apache.org),
driving the entire View Analytics data pipeline. Our initial rollout shipped an
[rsyslog](https://rsyslog.com)-based solution for the `rsyslog-kafka` service.
Using rsyslogd worked fairly well, but had a couple of significant downsides.
Last month, we deployed its replacement, a custom open source daemon written in
Rust: [hotdog](https://github.com/reiseburo/hotdog) 🌭.

(**Note:** _This specific use-case was well suited to Rust. That does not mean
that anything else we do at Scribd should or will necessarily be written in
Rust._)

## Problems with rsyslog

rsyslog is one of those tools that seems to have existed since the dawn of
time. It is incredibly common to find in logging infrastructure since it routes
just about any log from any thing, to any where. Our first iteration of the
aforementioned `rsyslog-kafka` service relied on it because of its ubiquity. We
had a problem that looked like routing logs from one thing (Fastly) to another
thing (Kafka), and that's basically what `rsyslogd` does!

This flexibility comes at a price. When explaining to colleagues what rsyslog
_really_ is, I would describe it as "an old C-based scripting engine that just
happens to forward logs." If they didn't believe me, I would send them the
documentation for
[Rainerscript](https://rsyslog.readthedocs.io/en/latest/rainerscript/), named
after [Rainer Gerhards](https://en.wikipedia.org/wiki/Rainer_Gerhards), the
author of `rsyslog`. Accepting this "zen of rsyslog" makes it slightly
easier to understand and work with the rsyslog configuration syntax.
That said, I still find it incredibly difficult to work with, and even harder
to test.

In our pipeline, we needed to take JSON-formatted messages from Fastly and
route them to the appropriate topics, using the approximate format of:

```json
{
  "$schema" : "some/jsonschema/def.yml",
  "$topic" : "logs-fastly",
  "meta" : {
  },
  "url" : "etcetc",
  "timestamp" : "iso8601"
}
```

JSON parsing in rsyslog is feasible, but not easy. To make matters worse, there
was no way to handle JSON keys which use the dollar-sign `$`, because the
scripting interpreter treats `$` characters as variable references. The
original version of our rsyslog-kafka gateway that went into production ended
up using regular expressions to fish out the topic we needed for routing!

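To make "fishing out the topic" concrete, here is a minimal, dependency-free sketch in Rust. It is neither our rsyslog configuration nor hotdog's code: it swaps the regular expression for plain string scanning, the function name `fish_out_string` is hypothetical, and it assumes the value is a simple string with no escaped quotes. A real JSON parser would not be bothered by `$` in a key name, and would also cope with escaping and nesting.

```rust
/// Naively pull the string value that follows `"key" :` out of raw JSON text.
/// Assumption: the value is a plain string without escaped quotes.
fn fish_out_string(raw: &str, key: &str) -> Option<String> {
    // Look for the quoted key, e.g. "$topic"
    let needle = format!("\"{}\"", key);
    let after_key = &raw[raw.find(&needle)? + needle.len()..];
    // Skip optional whitespace, the colon, and more optional whitespace
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    // The value must open with a quote; read up to the next quote
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(value[..end].to_string())
}

fn main() {
    let raw = r#"{"$schema" : "some/jsonschema/def.yml", "$topic" : "logs-fastly", "url" : "etcetc"}"#;
    println!("{:?}", fish_out_string(raw, "$topic"));
}
```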
Unfortunately, the daemon also does not emit metrics or statistics natively in
a format we could easily get into Datadog. The only way to get the statistics
we needed out of the daemon would have been to have a sidecar container ingest
the statistics that rsyslogd writes out to a file and report those into
Datadog. This would have required building a custom daemon to parse the
rsyslogd stats output, which seemed like a lot of work without a lot of
benefit.

This all left us with very little understanding of how a service which was
difficult to configure and test would actually run in production.

## Makin' hotdogs

Bored one weekend with nothing to do, I asked myself “how hard could getting
syslog into Kafka be?” As it turned out: _not that hard_.

I continued to improve the [daemon](https://github.com/reiseburo/hotdog) over a
number of weeks until I had feature parity with our rsyslogd use-case, and then
some!

* RFC 5424/3164 syslog-formatted message parsing
* Custom routing based on regular expression or [JMESPath](https://jmespath.org/) rules
* syslog over TCP, or TCP with TLS encryption
* Native statsd support for a myriad of operational metrics we care about
* Inline message modification based on simple Handlebars templates

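For a feel of what the first item in that list involves: both RFC 3164 and RFC 5424 messages start with a `<PRI>` header whose value is `facility * 8 + severity`. The sketch below decodes only that header using the standard library. It is an illustration under those assumptions, not hotdog's parser, and `parse_pri` is a hypothetical name.

```rust
/// Split the leading `<PRI>` of a syslog message into
/// (facility, severity, remainder of the message).
fn parse_pri(msg: &str) -> Option<(u8, u8, &str)> {
    let body = msg.strip_prefix('<')?;
    let end = body.find('>')?;
    let pri: u8 = body[..end].parse().ok()?;
    // The largest valid value is facility 23, severity 7: 23 * 8 + 7 = 191
    if pri > 191 {
        return None;
    }
    Some((pri / 8, pri % 8, &body[end + 1..]))
}

fn main() {
    // <34> decodes to facility 4 (auth) and severity 2 (critical)
    let msg = "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - 'su root' failed";
    println!("{:?}", parse_pri(msg));
}
```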
Since the rsyslog-kafka service is deployed in a Docker container, we deployed
a new build of the container with 🌭 inside to our development environment and
started testing. After testing looked to be going well, we deployed to
production at the end of May.

Overall the process went well!

## What was learned

The biggest take-away from this effort has been the power of small services
packaged into Docker containers. The entire inside of the container changed,
but because the external contracts were not changed the service could be
significantly modified without issue.

The original implementation was ~2x slower than rsyslog and required a doubling
of the number of containers running in ECS. The poor performance came down
almost entirely to laziness in the original Rust implementation: repeated
parsing of JSON strings, reallocations, and unnecessary polling.

The performance issues were easily identified and fixed with the help of
`perf` on Linux (`perf record --call-graph dwarf` is wonderful!). That said, I
am still quite impressed that a completely unoptimized Rust daemon, built on
[async-std](https://async.rs), was performing reasonably close to a
finely-tuned system like `rsyslogd`. While I haven't done a conclusive
comparison, now that hotdog has been optimized I would guesstimate that it is
within +/-10% of performance parity with `rsyslogd`.

![Hotdog and Datadog](/post-images/2020-06-hotdog/hotdog-metrics.png)

Having full control over the syslog entrypoint proved valuable almost
immediately. During a pairing session with my colleague Hamilton, he expressed
the desire for an additional metric: per-topic message submission counters. In
`rsyslogd` the metric doesn't exist in any form, but because hotdog was built to
support statsd out of the box, we made a one-line change adding the new metric
and our visibility went up almost immediately!

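Part of why a change like that can be so small is that a statsd counter is just a line of text, `<name>:<value>|c`, usually sent over UDP to port 8125. The sketch below is not the actual hotdog change: the metric prefix `hotdog.kafka.submitted` and the helper `counter_line` are made up for illustration, and it assumes a statsd-compatible agent listening on localhost.

```rust
use std::net::UdpSocket;

/// Format a statsd counter line for a per-topic metric.
/// The prefix used in `main` is hypothetical, not hotdog's real metric name.
fn counter_line(prefix: &str, topic: &str, count: u64) -> String {
    format!("{}.{}:{}|c", prefix, topic, count)
}

fn main() -> std::io::Result<()> {
    // Bind to any free local port, then fire and forget one datagram
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let line = counter_line("hotdog.kafka.submitted", "logs-fastly", 1);
    // Assumes a statsd-compatible agent on the conventional port 8125
    socket.send_to(line.as_bytes(), "127.0.0.1:8125")?;
    Ok(())
}
```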
---

Scribd has a number of services deployed in production using Ruby, Golang,
Python, Java, and now a little bit of Rust too. As far as weekend hacks go,
[hotdog](https://github.com/reiseburo/hotdog) worked out quite well. If you
have thousands of log entries per second that you need to get into Kafka, give
it a try!