Prometheus reporting

When compiled with optional support for mirage/prometheus, liquidsoap can export prometheus metrics.

The basic settings to enable exports are:

# Prometheus settings
settings.prometheus.server.set(true)
settings.prometheus.server.port.set(9090)

Common metrics, namely gauge, counter and summary are provided via the script language, as well as a specialized operator to track source’s latencies. A fully-featured implementation can be found at mbugeia/srt2hls

Basic operators

The 3 basic operators are:

prometheus.counter
prometheus.gauge
prometheus.summary

They share a similar type and API, which is as follows:

(help : string,
 ?namespace : string,
 ?subsystem : string,
 labels : [string],
 string) ->
   (label_values : [string]) -> 
     (float) -> unit

This type can be a little confusing. Here’s how it works: 1. First, one has to create a metric factory of a given type. For instance:

is_playing_metric = prometheus.gauge(labels=["source"],"liquidsoap_is_playing")

Then, the metric factory can be used to instantiate speific metrics by passing the label’s values:

playlist = playlist(id="playlist", ...)
set_playlist_is_playing = is_playing_metric(label_values=["radio"])

The returned function is a setter for this metric, i.e.

For gauge metrics, it sets the gauge value
For counter metrics, it increases the counter value
For summary metrics, it registers an observation

Finally, the programmer can now use that callback to set the metric as desired. For instance here:

def check_if_ready(set_is_ready, source) =
  def callback() =
    if source.is_ready(source) then
      set_is_ready(1.)
    else
      set_is_ready(0.)
    end
    0.1
  end
  callback
end
thread.run.recurrent(delay=0.,check_if_ready(set_playlist_is_playing, playlist))

`prometheus.latency`

The prometheus.latency operator provides prometheus metrics describing the internal latency of a given source. It is fairly easy to use:

s = (...)
prometheus.latency(s)

The metrics are computed over a sliding window that can be defined as a parameter of the operator. Exported metrics are:

# Input metrics:
liquidsoap_input_latency{...} <value>
liquidsoap_input_max_latency{...} <value>
liquidsoap_input_peak_latency{...} <value>

# Output metrics:
liquidsoap_outputput_latency{...} <value>
liquidsoap_output_max_latency{...} <value>
liquidsoap_output_peak_latency{...} <value>

# Overall metrics:
liquidsoap_overall_latency{...} <value>
liquidsoap_overall_max_latency{...} <value>
liquidsoap_overall_peak_latency{...} <value>

The 3 different groups of values are:

input: metrics related to the time it takes to generate audio data
output: metrics related to the time it takes to output (encode and send) audio data
overall: the sum of all previous two groups

Each group of metrics is divided into 3 subsets:

Mean latency value over the sliding window
Max latency value over the sliding window
Peak latency since start

Latencies are reported over a frame’s duration, which is typically around 0.04 seconds. Thus, in a situation where liquidsoap does not observe latency catch-ups, the overall mean latency liquidsoap_overall_latency should always be near that value.

These metrics can be used to report and track the source of latencies and catch-ups while streaming. Typically, if a source starts taking too much time to generate its audio data, this should be reflects in the input latencies. Likewise for encoding and network output.

Keep in mind, however, that enabling these metrics can have a CPU cost. It is rather small with a couple of sources but can increase with the number of sources being tracked. The user of these metrics is advised to keep track of CPU usage while ramping up on using them.

OCaml specific metrics

The prometheus binding used by liquidsoap also exports default OCaml-related metrics. They are as follows:

ocaml_gc_allocated_bytes <value>
ocaml_gc_compactions <value>
ocaml_gc_heap_words <value>
ocaml_gc_major_collections <value>
ocaml_gc_major_words <value>
ocaml_gc_minor_collections <value>
ocaml_gc_top_heap_words <value>
process_cpu_seconds_total <value>

These metrics can be useful when debugging issues with liquidsoap, in particular to track is an observed increase in memory usage is related to OCaml memory allocation or not. More than often, if the increase is not related to OCaml, it can be safely assumed that the issue might come from an external library used by liquisoap.