StatsD [![Build Status](https://secure.travis-ci.org/etsy/statsd.png)](http://travis-ci.org/etsy/statsd)
======

A network daemon that runs on the [Node.js][node] platform and
listens for statistics, like counters and timers, sent over [UDP][udp]
and sends aggregates to one or more pluggable backend services (e.g.,
[Graphite][graphite]).

We ([Etsy][etsy]) [blogged][blog post] about how it works and why we created it.

Concepts
--------
* *buckets*
  Each stat is in its own "bucket". Buckets are not predefined anywhere. A bucket can be named anything that will translate to Graphite (periods make folders, etc.).

* *values*
  Each stat has a value. How it is interpreted depends on modifiers. In
  general, values should be integers.

* *flush*
  After the flush interval timeout (default 10 seconds), stats are
  aggregated and sent to an upstream backend service.

Counting
--------

    gorets:1|c

This is a simple counter. Add 1 to the "gorets" bucket. It stays in memory until the flush interval (`config.flushInterval`) elapses.
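
As a rough sketch of what a client does, the snippet below formats a counter line and sends it as a single UDP datagram using Node's `dgram` module. The helper names `formatCounter` and `sendCounter`, the host, and port 8125 are assumptions for illustration; use whatever address your daemon is configured to listen on.

```javascript
var dgram = require('dgram');

// Build the wire format shown above: "<bucket>:<value>|c".
function formatCounter(bucket, value) {
  return bucket + ':' + value + '|c';
}

// Send one counter update as a single UDP datagram (fire and forget).
// host and port are placeholders; 8125 is an assumed listening port.
function sendCounter(bucket, value, host, port) {
  var message = Buffer.from(formatCounter(bucket, value));
  var socket = dgram.createSocket('udp4');
  socket.send(message, 0, message.length, port, host, function () {
    socket.close();
  });
}

// Example call (not executed here):
// sendCounter('gorets', 1, '127.0.0.1', 8125);
```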

Timing
------

    glork:320|ms

The glork took 320ms to complete this time. StatsD calculates the 90th percentile,
average (mean), and lower and upper bounds for the flush interval. The percentile
threshold can be tweaked with `config.percentThreshold`.

The percentile threshold can be a single value or a list of values, and will
generate the following list of stats for each threshold:

    stats.timers.$KEY.mean_$PCT
    stats.timers.$KEY.upper_$PCT

Where `$KEY` is the stats key you specify when sending to StatsD, and
`$PCT` is the percentile threshold.
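
To make the per-threshold stats concrete, here is a minimal sketch of one way to compute `mean_$PCT` and `upper_$PCT` from the timer values collected during a flush interval. It keeps the lowest `$PCT` percent of the sorted values and reports their mean and maximum. The function name `timerStats` and the rounding rule are assumptions; StatsD's own implementation may differ in detail.

```javascript
// Compute the mean and upper bound over the lowest `pct` percent of samples.
function timerStats(values, pct) {
  if (values.length === 0) { return null; }
  // Sort numerically on a copy so the caller's array is left untouched.
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  // Number of samples that fall within the threshold (always at least one).
  var keep = Math.max(1, Math.round(sorted.length * pct / 100));
  var within = sorted.slice(0, keep);
  var sum = within.reduce(function (acc, v) { return acc + v; }, 0);
  return {
    mean: sum / keep,        // would be reported as mean_$PCT
    upper: within[keep - 1]  // would be reported as upper_$PCT
  };
}

// Ten samples with one outlier: the 90% threshold drops the 1000ms value.
var sample = [320, 100, 250, 410, 180, 90, 300, 220, 150, 1000];
var stats90 = timerStats(sample, 90);
// stats90.upper is 410
```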

Sampling
--------

    gorets:1|c|@0.1

Tells StatsD that this counter is being sampled and is sent only 1/10th of the time.
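
The sketch below shows both halves of sampling under stated assumptions: a client that sends only about one in ten updates, and the scaling a receiver can apply so that the aggregate still estimates the true count. `sampledCounterLine` and `scaledIncrement` are hypothetical helper names, and scaling by `1 / rate` is the conventional correction rather than something this README spells out.

```javascript
// Client side: return a sampled counter line, or null when this update is skipped.
// `random` defaults to Math.random and is injectable so the function can be tested.
function sampledCounterLine(bucket, value, rate, random) {
  var roll = (random || Math.random)();
  if (roll >= rate) { return null; }
  return bucket + ':' + value + '|c|@' + rate;
}

// Receiver side: each sampled update stands in for 1 / rate real events.
function scaledIncrement(value, rate) {
  return value * (1 / rate);
}
```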

Gauges
------

StatsD now also supports gauges: arbitrary values that can be recorded.

    gaugor:333|g

Debugging
---------

There are additional config variables available for debugging:

* `debug` - log exceptions and periodically print out information on counters and timers
* `debugInterval` - interval for printing out information on counters and timers
* `dumpMessages` - print debug info on incoming messages

For more information, check the `exampleConfig.js`.
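
A configuration that turns on the debugging options might look like the sketch below. The values are illustrative assumptions (in particular, `debugInterval` is assumed to be in milliseconds), and the object is assigned to a hypothetical variable `debugConfig` only so that the snippet is valid standalone JavaScript; follow `exampleConfig.js` for the exact file format.

```javascript
var debugConfig = {
  debug: true,           // log exceptions and periodically print counters and timers
  debugInterval: 10000,  // assumed to be milliseconds between printouts
  dumpMessages: true     // print debug info on each incoming message
};
```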

Supported Backends
------------------

StatsD supports multiple pluggable backend modules that can publish
statistics from the local StatsD daemon to a backend service or data
store. Backend services can retain statistics for
longer durations in a time series data store, visualize statistics in
graphs or tables, or generate alerts based on defined thresholds. A
backend can also correlate statistics sent from StatsD daemons running
across multiple hosts in an infrastructure.

StatsD includes the following backends:

* [Graphite][graphite] (`graphite`): Graphite is an open-source
  time-series data store that provides visualization through a
  web-browser interface.
* Console (`console`): The console backend outputs the received
  metrics to stdout (e.g. for seeing what's going on during development).

By default, the `graphite` backend will be loaded automatically. To
select which backends are loaded, set the `backends` configuration
variable to the list of backend modules to load.

Backends are just npm modules which implement the interface described in
section *Backend Interface*. In order to be able to load the backend, add the
module name into the `backends` variable in your config. As the name is also
used in the `require` directive, you can load one of the provided backends by
giving the relative path (e.g. `./backends/graphite`).
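
As an illustration, a `backends` setting that loads both bundled backends could look like the sketch below. The path `./backends/console` is assumed by analogy with `./backends/graphite`, and the variable name `backendsConfig` exists only to make the snippet valid standalone JavaScript.

```javascript
var backendsConfig = {
  // Each entry is passed to `require`, so bundled backends use relative paths.
  backends: ['./backends/graphite', './backends/console']
};
```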

Graphite Schema
---------------

Graphite uses "schemas" to define the different round robin datasets it houses (analogous to RRAs in rrdtool). Here's what Etsy is using for the stats databases:

    [stats]
    priority = 110
    pattern = ^stats\..*
    retentions = 10:2160,60:10080,600:262974

That translates to:

* 6 hours of 10 second data (what we consider "near-realtime")
* 1 week of 1 minute data
* 5 years of 10 minute data

This has been a good tradeoff so far between size-of-file (round robin databases are fixed size) and data we care about. Each "stats" database is about 3.2 megs with these retentions.
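
The arithmetic behind that translation can be checked with a short sketch. It reads each retention pair as seconds-per-point followed by number-of-points, which is the interpretation that matches the list above; the function name `retentionSpans` is made up for this example.

```javascript
// Parse "secondsPerPoint:points" pairs and compute how much time each archive covers.
function retentionSpans(retentions) {
  return retentions.split(',').map(function (pair) {
    var parts = pair.split(':');
    var secondsPerPoint = parseInt(parts[0], 10);
    var points = parseInt(parts[1], 10);
    return {
      secondsPerPoint: secondsPerPoint,
      points: points,
      totalSeconds: secondsPerPoint * points
    };
  });
}

var spans = retentionSpans('10:2160,60:10080,600:262974');
// spans[0].totalSeconds / 3600           is 6 (hours)
// spans[1].totalSeconds / 86400          is 7 (days)
// spans[2].totalSeconds / 86400 / 365.25 is roughly 5 (years)
```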

TCP Stats Interface
-------------------

A really simple TCP management interface is available by default on port 8126; the port can be overridden in the configuration file. Inspired by the memcache stats approach, this can be used to monitor a live statsd server. You can interact with the management server by telnetting to port 8126. The following commands are available:

* stats - some stats about the running server
* counters - a dump of all the current counters
* timers - a dump of the current timers

The stats output currently will give you:

* uptime: the number of seconds elapsed since statsd started
* messages.last_msg_seen: the number of elapsed seconds since statsd received a message
* messages.bad_lines_seen: the number of bad lines seen since startup

Each backend will also publish a set of statistics, prefixed by its
module name.

Graphite:

* graphite.last_flush: the number of seconds elapsed since the last successful flush to graphite
* graphite.last_exception: the number of seconds elapsed since the last exception thrown whilst flushing to graphite

A simple nagios check can be found in the utils/ directory that can be used to check metric thresholds, for example the number of seconds since the last successful flush to graphite.
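
Instead of telnet, the management port can also be queried from a script. The sketch below uses Node's `net` module to send one command and collect whatever comes back. `queryManagement` and `parseStatLines` are hypothetical helpers, and the `name: value` line format that the parser expects is an assumption based on the stat names listed above, not a documented guarantee.

```javascript
var net = require('net');

// Send one command (e.g. "stats") to the management port and collect the reply.
// The connection is closed after `timeoutMs` of silence, because this sketch
// does not assume any particular end-of-response marker.
function queryManagement(host, port, command, timeoutMs, callback) {
  var chunks = [];
  var done = false;
  function finish(err) {
    if (done) { return; }
    done = true;
    socket.destroy();
    callback(err || null, chunks.join(''));
  }
  var socket = net.connect(port, host, function () {
    socket.write(command + '\n');
  });
  socket.setEncoding('utf8');
  socket.setTimeout(timeoutMs, function () { finish(null); });
  socket.on('data', function (chunk) { chunks.push(chunk); });
  socket.on('error', finish);
  socket.on('end', function () { finish(null); });
}

// Turn "name: value" lines into an object; lines without ": " are ignored.
function parseStatLines(text) {
  var stats = {};
  text.split('\n').forEach(function (line) {
    var idx = line.indexOf(': ');
    if (idx > 0) { stats[line.slice(0, idx)] = line.slice(idx + 2).trim(); }
  });
  return stats;
}

// Example call (not executed here):
// queryManagement('127.0.0.1', 8126, 'stats', 1000, function (err, text) {
//   if (!err) { console.log(parseStatLines(text)); }
// });
```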

Installation and Configuration
------------------------------

* Install node.js
* Clone the project
* Create a config file from exampleConfig.js and put it somewhere
* Start the Daemon:

      node stats.js /path/to/config

Tests
-----

A test framework has been added using node-unit and some custom code to start and manipulate statsd. Please add tests under test/ for any new features or bug fixes encountered. Testing a live server can be tricky; attempts were made to eliminate race conditions, but it may still be possible to encounter a stuck state. If doing dev work, a `killall node` will kill any stray test servers in the background (don't do this on a production machine!).

Tests can be executed with `./run_tests.sh`.

Backend Interface
-----------------

Backend modules are Node.js [modules][nodemods] that listen for a
number of events emitted from StatsD. Each backend module should
export the following initialization function:

* `init(startup_time, config, events)`: This method is invoked from StatsD to
  initialize the backend module. It accepts three parameters:
  `startup_time` is the startup time of StatsD in epoch seconds,
  `config` is the parsed config file hash, and `events` is the event
  emitter that backends can use to listen for events.

  The backend module should return `true` from init() to indicate
  success. A return of `false` indicates a failure to load the module
  (missing configuration?) and will cause StatsD to exit.

Backends can listen for the following events emitted by StatsD from
the `events` object:

* Event: **'flush'**

  Parameters: `(time_stamp, metrics)`

  Emitted on each flush interval so that backends can push aggregate
  metrics to their respective backend services. The event is passed
  two parameters: `time_stamp` is the current time in epoch seconds
  and `metrics` is a hash representing the StatsD statistics:

  ```
  metrics: {
      counters: counters,
      gauges: gauges,
      timers: timers,
      pctThreshold: pctThreshold
  }
  ```

  Each backend module is passed the same set of statistics, so a
  backend module should treat the metrics as immutable
  structures. StatsD will reset timers and counters after each
  listener has handled the event.
* Event: **'status'**

  Parameters: `(writeCb)`

  Emitted when a user invokes a *stats* command on the management
  server port. It allows each backend module to dump backend-specific
  status statistics to the management port.

  The `writeCb` callback function has a signature of
  `f(error, backend_name, stat_name, stat_value)`. The backend module should
  invoke this method with each stat_name and stat_value that should be
  sent to the management port. StatsD will prefix each stat name with
  the `backend_name`. The backend should set `error` to *null*, or, in
  the case of a failure, an appropriate error.
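
Putting the pieces of this interface together, a minimal backend module might look like the sketch below. It logs each counter on flush and reports two status values. The backend name `example`, the stat names, and the internal helper names are invented for illustration; only `init`, the `'flush'` and `'status'` events, and the `writeCb` signature come from the description above.

```javascript
// A minimal backend sketch: logs counters on each flush and tracks flush status.
var lastFlush = 0;
var flushCount = 0;

function onFlush(timeStamp, metrics) {
  // Treat `metrics` as read-only: every backend receives the same object.
  Object.keys(metrics.counters).forEach(function (name) {
    console.log(timeStamp + ' ' + name + ' ' + metrics.counters[name]);
  });
  lastFlush = timeStamp;
  flushCount += 1;
}

function onStatus(writeCb) {
  // One call per stat: (error, backend_name, stat_name, stat_value).
  writeCb(null, 'example', 'last_flush', lastFlush);
  writeCb(null, 'example', 'flush_count', flushCount);
}

function init(startupTime, config, events) {
  lastFlush = startupTime;
  events.on('flush', onFlush);
  events.on('status', onStatus);
  return true; // returning false signals a failed load and makes StatsD exit
}

exports.init = init;
```

Saved as a module file, a backend like this would be loaded by adding its path to the `backends` list in the config.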

Inspiration
-----------

StatsD was inspired (heavily) by the project (of the same name) at Flickr. Here's a post where Cal Henderson described it in depth:
[Counting and timing](http://code.flickr.com/blog/2008/10/27/counting-timing/). Cal re-released the code recently: [Perl StatsD](https://github.com/iamcal/Flickr-StatsD)

Contribute
----------

You're interested in contributing to StatsD? *AWESOME*. Here are the basic steps:

Fork StatsD from here: http://github.com/etsy/statsd

1. Clone your fork
2. Hack away
3. If you are adding new functionality, document it in the README
4. If necessary, rebase your commits into logical chunks, without errors
5. Push the branch up to GitHub
6. Send a pull request to the etsy/statsd project.

We'll do our best to get your changes in!

[graphite]: http://graphite.wikidot.com
[etsy]: http://www.etsy.com
[blog post]: http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
[node]: http://nodejs.org
[nodemods]: http://nodejs.org/api/modules.html
[udp]: http://en.wikipedia.org/wiki/User_Datagram_Protocol

Contributors
------------

In lieu of a list of contributors, check out the commit history for the project:
http://github.com/etsy/statsd/commits/master