
Send Nagios metrics to Graphite via Logstash

Posted October 3, 2014

Two methods to send Nagios metrics to Graphite are currently described on blogs:

  • The first method is using Graphios, a dedicated tool for this problem.
  • The second method, which I recently discovered here, is using Logstash and grok to parse check messages and extract data.

Personally, I find Graphios great: it’s a Python daemon that monitors Nagios’ files every 15 seconds and sends new metrics to Graphite under a name set in each Nagios service configuration. The second method requires too much work to write grok patterns, in my humble opinion.

In this blog post, I would like to introduce a new way to send Nagios performance data to Graphite using Logstash.

Let’s imagine we already have a complete monitoring infrastructure based on Nagios and now want to use Graphite to graph our metrics. Let’s also say that we can’t afford the cost of a migration from Nagios to Shinken, and that we want to be as unobtrusive as possible (perhaps because we want to avoid conflicts with Centreon).

Requirements

For this tutorial, we will need:

  • A working Nagios poller with a ZMQ broker;
  • A working instance of Graphite, optionally with statsd;
  • A new or existing Logstash instance.

Warning: This method currently requires Logstash >= 1.5.0-dev, because in the current and previous stable versions of Logstash, a bug prevents split events from being refiltered. This bug is present in the 1.2.x, 1.3.x and 1.4.x branches (see pull requests #793 and #1545).

Nagios input

We configure Logstash to subscribe to the ZMQ Nagios broker, which publishes each event in JSON format, as explained in one of my previous posts.

input {
  zeromq {
    topology => "pubsub"
    address  => "tcp://127.0.0.1:6666"
    codec    => json {}
    mode     => "client"
  }
}

Parsing of Nagios metrics

The main problem is parsing the perfdata (something like time=0.001513s;;;0.000000 size=454B;;;0, following the Nagios format label=value[UOM];warn;crit;min;max) into an easily manipulable format. Fortunately, I wrote a blog post on it.

We will use a modified Perl script to convert the perfdata string into line-feed-delimited JSON. We will later use the line feeds to split an event carrying several metrics into multiple events carrying a single metric each.

#!/usr/bin/env perl
use strict;
use warnings;
use Nagios::Plugin::Performance use_die => 1;
use JSON;

# The perfdata string is passed as the first command line argument.
my $perfstring = $ARGV[0];

die "perfstring is required" if not defined $perfstring;

# Parse the perfdata string into a list of performance objects.
my @perf = Nagios::Plugin::Performance->parse_perfstring(
    $perfstring
)
or die "Failed to parse perfstring";

my @metrics = ();

for my $p (@perf) {
  my %metric_hash = (
    'value'       => $p->value,
    'uom'         => $p->uom, # not used
    'clean_label' => $p->clean_label
  );
  push(@metrics, \%metric_hash);
}

# One JSON document per line, so Logstash can split on line feeds.
my @encoded = map { encode_json($_) } @metrics;
print join("\n", @encoded);
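To check that the script behaves as expected, we can run it by hand on the perfdata string from the introduction. A sketch of the expected output, assuming the script is saved as perfdatatojson.pl (JSON key order and number formatting may vary):

$ perl perfdatatojson.pl "time=0.001513s;;;0.000000 size=454B;;;0"
{"value":0.001513,"uom":"s","clean_label":"time"}
{"value":454,"uom":"B","clean_label":"size"}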

In order to run this Perl script on each event, we use the ruby filter and shell out with backticks.

We will use the code option of the ruby filter to put the output of our Perl script in a new field called metrics_json.

filter {
  ruby {
    code         => "event['metrics_json'] = `perl perfdatatojson.pl \"#{event['payload']['performance']}\"`"
  }

Before splitting the event, we remove spaces from the service description so we can use the name as part of the metric name. We will use the pattern [hostname].[cleaned_service_description].[cleaned_metric_name] to generate the names of our metrics. For example, the users metric of the CurrentUsers service on localhost becomes localhost.CurrentUsers.users.

  mutate {
    gsub => [
      "payload[service]", " ", ""
    ]
  }

We use the split filter to split the event. If we have only one metric, nothing happens. If we have two, split creates two events each containing a single metric, and so on.

  split {
    field => "metrics_json" # multiple metrics to singles
  }

Now that our metrics_json field contains only one metric, we use the json filter to decode the metric data. At the end, we have a metric field containing the metric's values.

  json {
    source       => "metrics_json" # single metric json
    target       => "metric" # decoded metric object
    remove_field => "metrics_json"
  }
}

Output our metrics to Graphite

In my case, I send my metrics to my existing statsd instance. It is also possible to send the metrics directly to Graphite with the graphite output.

The name of the metric is up to you. I chose a simple format.

output {
  statsd {
    host => '10.0.2.2'
    gauge => [ "%{payload[hostname]}.%{payload[service]}.%{metric[clean_label]}", "%{metric[value]}" ]
  }
}
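For reference, here is a minimal sketch of the equivalent graphite output, assuming Graphite's carbon listener is reachable on its default port 2003 (the metric naming mirrors the statsd example above):

output {
  graphite {
    host    => "10.0.2.2"
    port    => 2003
    metrics => [ "%{payload[hostname]}.%{payload[service]}.%{metric[clean_label]}", "%{metric[value]}" ]
  }
}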

Run a Logstash agent with this configuration and the Perl script in the same folder. New metrics will quickly start appearing in Graphite's console.
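For example, assuming the configuration is saved as nagios-graphite.conf and we run from the Logstash directory:

bin/logstash agent -f nagios-graphite.conf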

Example of event with parsed metric

{
            "id" => "0ade1398-aea1-45f0-b5db-74a7fbc08994",
       "context" => "SERVICECHECK",
        "source" => "NAGIOS",
     "timestamp" => "1412258190",
       "payload" => {
        "current_attempt" => "1",
           "max_attempts" => "4",
             "state_type" => "1",
                  "state" => "0",
              "timestamp" => "1412258190",
         "execution_time" => "0.011145",
               "hostname" => "localhost",
                "service" => "CurrentUsers",
                 "output" => "USERS OK - 1 users currently logged in",
            "performance" => "users=1;20;50;0"
    },
      "@version" => "1",
    "@timestamp" => "2014-10-02T13:56:30.172Z",
          "host" => "packer-virtualbox-iso",
        "metric" => {
                "uom" => "",
              "value" => 1,
        "clean_label" => "users",
    }
}

Things to improve

  • Improve Logstash’s configuration; it doesn’t support host check perfdata yet;
  • Drop events that don’t have perfdata (see the sketch below);
  • Use tags and clone the event so it can be sent to another system like Riemann.
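For the second point, a minimal sketch using a Logstash conditional and the drop filter, assuming checks without perfdata produce an empty performance field:

filter {
  if [payload][performance] == "" {
    drop {}
  }
}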

Written by Philippe Lewin, French Software Engineer.