A prototype of a monitoring system
R. Tyler Croy a9cd1c47ad
Upgrade to the latest version of jruby/gradle plugin and rework the jar
2015-10-09 13:52:08 -07:00

Blick

Blick is currently a concept, purely an experimental approach to a monitoring system. While conceptually similar to Sensu, the idea behind Blick is to use ZeroMQ to create event-driven agents which stream events directly to a server.

Contributing

The project is still in its infancy, but you can chat with us in the #blick channel on the Freenode IRC network.

Design

                              +--------> +--------------+       +--------------+
+---------------+             |          | Blick Server |       | Carbon Cache |
|  Blick Agent  |             |          +--------------+       +--------------+
+---------------+---> +-------+-----+        ^     ^                     ^
                      | Blick Relay |        |     |                     |
+---------------+---> +-------------+        |     +---+--------------+--+
|  Blick Agent  |                            |         | Blick Statsd |
+---------------+                            |         +--------------+
                                             |                   ^
                                   +---------+-----+             |
                                   |  Blick Agent  |       +-----+-------+
                                   +---------------+       | Application |
                                                           +-------------+

Server

The Server is the main destination for all Blick events. The Server is responsible for aggregating event data, presenting data, and issuing alerts based on that data.

The Server should also receive some node information from Agents, which can be used to pull a node classification from a Puppet master or Chef server. Ideally, the Server would be able to present automatic checks and alerts based on what is presented in the node's classification. For example, if a service { 'httpd': ensure => running, } is defined in the node's [Puppet] resource graph, then the Server should automatically alert if the httpd process is not running.
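
The httpd example above could be sketched roughly as follows. This is only an illustration, not Blick code: the Resource struct, the expected_services and service_alerts helpers, and the shape of the data an Agent reports are all assumptions, and nothing here talks to a real Puppet master or Chef server.

```ruby
# Hypothetical resource record, loosely modelled on an entry in a node's
# resource graph. The field names are assumptions made for this sketch.
Resource = Struct.new(:type, :title, :params)

# Names of the services that the classification says must be running.
def expected_services(resources)
  resources
    .select { |r| r.type == 'Service' && r.params['ensure'].to_s == 'running' }
    .map(&:title)
end

# Compare the expected services with the process names an Agent last
# reported, returning one alert message per missing process.
def service_alerts(node, resources, running_processes)
  missing = expected_services(resources) - running_processes
  missing.map { |name| "#{node}: #{name} is not running" }
end

resources = [
  Resource.new('Service', 'httpd', { 'ensure' => 'running' }),
  Resource.new('Service', 'ntpd',  { 'ensure' => 'stopped' }),
  Resource.new('Package', 'httpd', { 'ensure' => 'installed' })
]

puts service_alerts('web01', resources, ['sshd', 'crond'])
```

Only the Service resource with ensure => running produces a check, so the single alert here is for httpd; no check definition has to be written by hand.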

Agent

The Agent's sole responsibility is to publish events via a ZeroMQ socket to the Server or to a Relay.

The Agent should be primarily event-driven, allowing multiple sources of events, e.g.:

  • System-level: Provided by dbus or Kernel uevents (TBD)
  • Process-level: Provided by systemd to dbus

For non-evented data (/proc related events, non-standard process events) a polling loop should exist in the Agent, but this should not be the default mechanism for event acquisition.

It's currently unclear where application/process-level data, such as JMX, should be gathered from. It might make sense for this to live in the Agent or in the Relay.

Agent Design

                      +--------+                                                              
                      | ZeroMQ |                                                              
                      +--------+                                                              
                       ^                                                                      
                       |                                                                      
                       |                                                                      
+---------+     +------+----+           +-----------------+                                   
|Main loop|     | Publisher |<----------+ Process Monitor |                                   
+---------+     +-----------+           |   (listener)    |<--------- systemd/dbus            
                     ^ ^ ^ ^            +-----------------+                                   
                     | | | |                                                                  
                     | | | |           +--------------------+                                 
                     | | | +-----------+ Filesystem Monitor |<------- inotify/kqueue          
                     | | |             |    (listener)      |                                 
    +------------+   | | |             +--------------------+                                 
    | Heartbeat  +---+ | |                                                                    
    | (observer) |     | |                +---------------+                                   
    +------------+     | |                | MySQL Slow    |                                   
                       | +----------------+ Query Monitor |<--------- inotify/kqueue file-read
  +--------------+     |                  |  (listener)   |                                   
  | Disk Monitor +-----+                  +---------------+                                   
  |  (observer)  |                                                                            
  +--------------+                                                                            

Listeners

Listeners are evented entities within the Agent. To act as a listener, a monitor must receive events from some external source on the system being monitored.

Unless otherwise required, all monitors should be listeners by default.

Observers

Observers are polling or interval-based monitors, which the Agent runs in a separate thread.
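
To make the listener/observer split concrete, here is a minimal sketch using only the Ruby standard library. The Publisher, ProcessListener and Observer classes are hypothetical names for this example; the in-memory Queue stands in for the ZeroMQ socket, and on_event would be driven by a real systemd/dbus callback in practice.

```ruby
require 'thread'

# Stand-in for the Publisher: a thread-safe queue that a real Agent would
# drain onto its ZeroMQ socket.
class Publisher
  def initialize
    @queue = Queue.new
  end

  def publish(source, payload)
    @queue << { source: source, payload: payload, at: Time.now.to_f }
  end

  # Remove and return everything queued so far.
  def drain
    events = []
    events << @queue.pop(true) until @queue.empty?
    events
  end
end

# Listener: only acts when an external source hands it an event.
class ProcessListener
  def initialize(publisher)
    @publisher = publisher
  end

  # Would be invoked from a (hypothetical) systemd/dbus callback.
  def on_event(unit, state)
    @publisher.publish(:process, { unit: unit, state: state })
  end
end

# Observer: samples a value on a fixed interval in its own thread.
class Observer
  def initialize(publisher, source, interval, &sample)
    @publisher = publisher
    @source = source
    @interval = interval
    @sample = sample
  end

  def start
    @thread = Thread.new do
      loop do
        @publisher.publish(@source, @sample.call)
        sleep @interval
      end
    end
  end

  def stop
    @thread.kill
    @thread.join
  end
end

publisher = Publisher.new
heartbeat = Observer.new(publisher, :heartbeat, 0.01) { { alive: true } }
heartbeat.start
ProcessListener.new(publisher).on_event('httpd.service', 'failed')
sleep 0.05
heartbeat.stop

events = publisher.drain
puts events.map { |e| e[:source] }.uniq.sort.inspect
```

Both kinds of monitor feed the same Publisher, so the rest of the Agent does not need to know whether an event was pushed to it or polled for.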

Statsd

The intended purpose of the Statsd daemon is to provide application-based monitoring and alerting. Blick should not replace Graphite, but by using the Blick Statsd server as the destination of Graphite events, Blick can get a side-channel of these events and provide alerts based on their values.
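
A rough sketch of that side-channel, assuming the usual statsd line format (name:value|type) and Carbon's plaintext format (path value timestamp). The THRESHOLDS table, the metric name and the helper names are invented for this example. A real statsd daemon aggregates values over a flush interval before writing to Carbon; this sketch passes each value straight through to keep things short.

```ruby
# Parse one statsd line of the form "name:value|type".
def parse_statsd(line)
  name, rest = line.strip.split(':', 2)
  value, type = rest.split('|')
  { name: name, value: Float(value), type: type }
end

# Render a metric in Carbon's plaintext format: "path value timestamp".
def to_carbon(metric, timestamp)
  "#{metric[:name]} #{metric[:value]} #{timestamp}\n"
end

# Hypothetical alert rules: metric name => maximum acceptable value.
THRESHOLDS = { 'app.request.latency' => 500.0 }.freeze

# Return an alert message when the metric exceeds its limit, otherwise nil.
def alert_for(metric)
  limit = THRESHOLDS[metric[:name]]
  return nil if limit.nil? || metric[:value] <= limit
  "#{metric[:name]} is #{metric[:value]}, above the limit of #{limit}"
end

metric = parse_statsd('app.request.latency:750|ms')
print to_carbon(metric, 1_444_424_000)  # the line that would go to Carbon Cache
puts alert_for(metric)                  # the side-channel alert
```

The metric still reaches Carbon unchanged; the alert is an extra result computed from the same value on its way through.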

Relay

The Relay is more of a planned addition to help Blick scale. A Relay sitting at the top of a rack, siphoning events from Agents as well as SNMP providers into the Server, would provide a more scalable means of data aggregation.
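
One way a Relay might reduce load on the Server is by batching. This is an assumption of the sketch, not something the design above settles, and RelayBuffer is a hypothetical name. The block passed to the constructor stands in for whatever would actually send a batch upstream.

```ruby
# Hypothetical Relay buffer: collects events from many Agents and hands
# them upstream in batches, so the Server sees fewer, larger messages.
# Not thread-safe; a real Relay would need a lock or a single receive loop.
class RelayBuffer
  def initialize(batch_size, &forward)
    @batch_size = batch_size
    @forward = forward # called with an Array of events
    @pending = []
  end

  def receive(event)
    @pending << event
    flush if @pending.size >= @batch_size
  end

  # Forward whatever is pending, e.g. on a timer or at shutdown.
  def flush
    return if @pending.empty?
    batch = @pending
    @pending = []
    @forward.call(batch)
  end
end

sent = []
relay = RelayBuffer.new(3) { |batch| sent << batch }
5.times { |i| relay.receive({ agent: "agent-#{i % 2}", seq: i }) }
relay.flush
puts sent.map(&:size).inspect # prints [3, 2]
```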