Clojure InfluxDB client - part 2

I’ve released a small Clojure InfluxDB client library since my last post on the subject, and as I mentioned there, I wanted to explore ways to leverage the /write endpoint.

Though that didn’t require much:

(defn write
  [conn db data query-params]
  (http-client/post
   (str (:url conn) "/write")
   {:content-type :x-www-form-urlencoded
    :body data
    :query-params (prep-query-params conn query-params {"db" db})}))

Just like the SELECT statements for the /query endpoint, the above assumes that data is already in the correct format. This however doesn’t really add much value for me as a programmer… but then, what does?

Going down that rabbit hole gave me a prime example of how Clojure makes my life an absolute pleasure. Exactly why that is the case, and a few thoughts about my design decisions, is what I want to share in the remainder of this blog post.

Into the rabbit hole

The Line Protocol describes the format of the data that the write function takes, and here is the syntax:

<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
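A concrete point with two tags, a single field and a timestamp would, for example, look like this on the wire:

```
cpu,host=serverA,region=us_west value=0.64 1434067467000000000
```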

I needed a way to represent a “measurement point” in code and, without jumping through hoops, transform that to the Line Protocol. I guess they came to the same conclusion for the Java client, which also has a point representation (org.influxdb.dto.Point).

Among Clojure’s many strong points are representing stuff as data and manipulating that data. The following are all valid representations of a point using a hash-map:

;; minimal data required by the Line Protocol
{:measurement "cpu"
 :fields {:value 0.64}}

;; now also including a few tags
{:measurement "cpu"
 :tags {:host "serverA" :region "us_west"}
 :fields {:value 0.64}}

;; now with multiple fields and different data types along with a timestamp
{:measurement "cpu"
 :fields {:value 0.64 :verified true :count 4}
 :time 1434067467000000000}

To convert the above point representation to the Line Protocol, I set out to address the following challenges:

- the optional parts (tags and timestamp)
- the different field data types
- the time precision

The optional data was fairly easy to solve and you’ll see the implementation in point->line (point to line):

(defn point->line
  "Takes a point (hash-map) and returns a string in the Line Protocol syntax."
  [{:keys [measurement fields tags time] :as point}]
  (str (str/join "," (conj (key-val->str tags) measurement))
       " "
       (str/join "," (key-val->str fields))
       (when time
         (str " " time))))

The (when time ...) for an optional timestamp is straightforward. The reversed order of measurement and the optional tags might look a bit odd though, if you aren’t familiar with conj. Use this as an excuse to play with how conj behavior differs depending on whether it works on a list or a vector (key-val->str always returns a list).
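To illustrate the conj difference, here is a quick REPL session. The key-val->str implementation isn’t shown in this post, so the definition below is my own guess at its shape (the real version presumably runs values through val->str):

```clojure
;; conj prepends on a list but appends on a vector
(conj '("b" "c") "a") ; => ("a" "b" "c")
(conj ["b" "c"] "a")  ; => ["b" "c" "a"]

;; hypothetical sketch of key-val->str: {:host "serverA"} -> ("host=serverA")
(defn key-val->str
  [m]
  (map (fn [[k v]] (str (name k) "=" v)) m))

(key-val->str {:host "serverA" :region "us_west"})
;; => ("host=serverA" "region=us_west")
```

Because map returns a (lazy) sequence, conj-ing the measurement onto it puts the measurement first, exactly as the Line Protocol wants.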

Solving the different data types was also straightforward, because Clojure has corresponding data types:

(defn val->str
  "Takes a field value and returns it formatted for the Line Protocol."
  [v]
  (cond
    (float? v) v
    (boolean? v) (if v "t" "f")
    (int? v) (str v "i")
    :else (str "\"" v "\"")))

I did however spend some time exploring the different integer types in Clojure (Integer, Long and BigInt) before I felt comfortable with the above solution.
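A quick REPL sanity check of the type dispatch (repeating val->str from above so the snippet is self-contained; note the i suffix on integers required by the Line Protocol):

```clojure
(defn val->str
  [v]
  (cond
    (float? v) v
    (boolean? v) (if v "t" "f")
    (int? v) (str v "i")
    :else (str "\"" v "\"")))

(val->str 0.64)   ; => 0.64
(val->str true)   ; => "t"
(val->str 4)      ; => "4i"
(val->str "busy") ; => "\"busy\""
```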

Two down, one to go. The final version of point->line ended up using a slightly different implementation to handle time precision:

(defn point->line
  "Takes a point (hash-map) and a precision and returns a string in the Line
  Protocol syntax."
  [{:keys [measurement fields tags time] :as point} precision]
  (str (str/join "," (conj (key-val->str tags) measurement))
       " "
       (str/join "," (key-val->str fields))
       (when time
         (str " " (adjust-precision time precision)))))

The reason why adjust-precision is so important is the following recommendation in the official InfluxDB documentation:

We recommend using the least precise precision possible as this can result in significant improvements in compression.

With this in mind, I wanted to make it possible for developers to use alternatives to integers for representing an “instant” (like the Java 8 Date and Time API or Joda-Time). I chose to solve it using a multimethod.

(def ratios
  {::ns 1
   ::u  1000
   ::ms 1000000
   ::s  1000000000})

(defmulti ->nano
  "Takes an instant and returns it as nanoseconds since epoch."
  (fn [inst] (type inst)))

(defmethod ->nano :default
  [inst]
  (identity inst))

(defn adjust-precision
  "Takes an instant representation and returns the adjusted instant according to
  the precision. Use nil as precision to leave the instant as-is i.e. when
  already represented in the correct precision."
  [inst precision]
  (if-let [ratio (get ratios precision)]
    (long (/ (->nano inst) ratio))
    inst))
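For example, down-sampling a nanosecond timestamp to second precision (a self-contained sketch where the default identity ->nano is inlined, so the raw value is used directly):

```clojure
(def ratios
  {::ns 1
   ::u  1000
   ::ms 1000000
   ::s  1000000000})

(defn adjust-precision
  [inst precision]
  (if-let [ratio (get ratios precision)]
    (long (/ inst ratio))
    inst))

(adjust-precision 1434067467000000000 ::s)  ; => 1434067467
(adjust-precision 1434067467000000000 nil)  ; => 1434067467000000000 (as-is)
```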

Using a multimethod implementation eliminates the necessity of adding date library dependencies to the InfluxDB client library, while still making it easy to extend.

Here is how it would look for the Java 8 Date Time API (java.time.Instant):

(defmethod ->nano java.time.Instant
  [^java.time.Instant inst]
  (+ (* (.getEpochSecond inst) 1000000000) (.getNano inst)))

Something similar could be implemented for Joda-Time.
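A hypothetical sketch of what that could look like, assuming Joda-Time is on the classpath (org.joda.time.DateTime only carries millisecond resolution, so the millis are simply scaled up):

```clojure
(defmethod ->nano org.joda.time.DateTime
  [^org.joda.time.DateTime inst]
  ;; DateTime has millisecond resolution; scale millis to nanos
  (* (.getMillis inst) 1000000))
```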

Now what?

My current implementation for creating Line Protocol data from a point doesn’t handle special characters. There are also some InfluxDB API endpoints that the Clojure client still doesn’t support. Since I’m not very familiar with those endpoints, I would like to get more experience with them first. Better designs always emerge when I have better insight, better context and a better understanding of how these endpoints help make my life easier.

I would also like to explore performance-related topics, e.g. whether the multimethod dispatch on different “instant” representations is a performance killer. Last but not least, the Java client’s asynchronous writes have me intrigued… does something similar belong in the Clojure InfluxDB client library?

Creating a Clojure client library for InfluxDB

I’m working on a project where a time series database makes sense, and the choice fell on InfluxDB. I found mnuessler/influxdb-clojure, an aged Clojure library wrapping an older version of the Java InfluxDB client. Being all excited about diving into this new area, I thought it would be best to leverage the existing efforts put into making InfluxDB accessible from Clojure. It took me a while to realize that I wasn’t comfortable with all the layers put between me and the InfluxDB server.

First I dropped the existing Clojure library, which hadn’t been updated for almost 3 years; it seemed unlikely that the project would miraculously be revived. Instead I opted for depending on the newest Java client (2.14) directly, which pulled in the following dependencies:

Retrieving org/influxdb/influxdb-java/2.14/influxdb-java-2.14.jar from central
Retrieving com/squareup/retrofit2/retrofit/2.4.0/retrofit-2.4.0.jar from central
Retrieving com/squareup/moshi/moshi/1.5.0/moshi-1.5.0.jar from central
Retrieving org/msgpack/msgpack-core/0.8.16/msgpack-core-0.8.16.jar from central
Retrieving com/squareup/retrofit2/converter-moshi/2.4.0/converter-moshi-2.4.0.jar from central
Retrieving com/squareup/okhttp3/okhttp/3.11.0/okhttp-3.11.0.jar from central
Retrieving com/squareup/okio/okio/1.14.0/okio-1.14.0.jar from central
Retrieving com/squareup/okhttp3/logging-interceptor/3.11.0/logging-interceptor-3.11.0.jar from central

But digging deeper I realized that the only thing the Java client was doing was sending requests directly to the InfluxDB HTTP API. It just rubs me the wrong way having 8 extra dependencies (around 1 MB worth of jar files), dealing with transforming back and forth between Java objects and Clojure data structures, and having to browse 8k lines of Java code across 85 files when I want to know a bit more about what is under the hood. After all, since we are talking about doing 2-3 different HTTP requests to InfluxDB, it seemed overly complicated.

I got this nagging feeling that, for my use case, I would be better off just doing it myself. How hard can it be?…

Well, to be honest, it was a bit harder than I had expected. I started down a path where I would make a data structure to represent a SELECT query. It quickly became complex due to the desire to deliver the same use cases as when using a string, i.e. basic arithmetic. Also, I couldn’t decide on the best way to deal with the fact that fields and tags can have the same name, in which case you are obligated to specify exactly which type you are referring to. Not being convinced that representing the select query as a Clojure data structure was actually a good idea, I took a step back and started my implementation assuming that I already had the “select statement” string.

I also wanted the code to interact nicely with application state handled with Mount. For that I decided to have the InfluxDB connection be represented by a map:

{:url "http://localhost:8086"
 :username "root"
 :password "root"}

where :username and :password are optional.

This is what I ended up with for supporting the /query HTTP endpoint:

(ns dk.emcken.influxdb-client
  (:require [cheshire.core :as json]
            [clj-http.client :as http-client]
            [clojure.string :as str]))

(defn prep-query-params
  "Convenience middleware to populate username and password from the connection
  if wanted."
  [{:keys [username password] :as conn} influx & additionals]
  (let [auth-params (when (and username password) {"u" username "p" password})]
    (apply merge (conj additionals influx auth-params))))

(def available-methods
  {::read http-client/get
   ::manage http-client/post})

(defn query
  "The query argument q can be either a string or a list/vector of strings. For
  valid influx-params, see the InfluxDB HTTP API documentation."
  ([conn method q]
   (query conn method q {}))
  ([conn method q influx-params]
   (let [request-fn (or (method available-methods)
                        (throw (ex-info "Unknown query method." {:method method})))]
     (request-fn
      (str (:url conn) "/query")
      {:query-params (prep-query-params conn influx-params
                                        {"q" (if (string? q) q (str/join ";" q))})}))))

(defn unwrap
  "Takes a http response from the API endpoint and converts it to a Clojure data
  structure for convenience."
  [response]
  (-> response
      :body
      (json/parse-string)
      (get "results")))

The above would allow for things like:

user> (require '[dk.emcken.influxdb-client :as client :refer [unwrap query]])

user> (def conn {:url "http://localhost:8086"})

user> (unwrap (query conn ::client/read "SHOW DATABASES"))
[{"series" [{"values" [["_internal"]], "columns" ["name"], "name" "databases"}], "statement_id" 0}]

user> (unwrap (query conn ::client/manage ["CREATE DATABASE mydb1" "CREATE DATABASE mydb2"]))
[{"statement_id" 0} {"statement_id" 1}]

I’m pretty happy with my 40-ish lines of Clojure code to do queries, and I might even end up releasing it if my “write” implementation feels solid. The Java client does have one really cool thing going for it, and that is asynchronous writes. I assume this feature would be VERY handy under heavy data loads. But with a difference of almost 8k lines of code and 84 files, I think it would be possible to do something similarly awesome in Clojure.

What can I say… I love working with Clojure.

Recent findings on Clojure testing

These days I’m spending time looking into testing in Clojure. I’ve been writing my share of test cases in PHP at work, where we have several tests “freezing” time or mocking services. Both practices are common and worth learning for Clojure too.

Freeze time

Personally I often find it easier to reason about/test code when providing a reference date to the function. The “new” Date and Time API introduced in Java 8, as well as the Clojure wrappings, are very well crafted. Even though I’m not a big fan of “freezing” time, it has its uses, and I was delighted to find how easy it is to achieve in Clojure:

(ns my-ns.core
  (:require [java-time :as time]))

(defn >10-secs-ago
  [my-instant]
  (time/after? my-instant (time/plus (time/instant) (time/seconds 10))))

(ns my-ns.core-test
  (:require [clojure.test :as t]
            [java-time :as time]
            [my-ns.core :as sut]))

(t/deftest >10-secs-ago
  (t/testing "More than 10 seconds ago check on instant that is"
    ;; The clock is mocked using milliseconds
    ;; 1549411200000 = 2019-02-06T00:00:00.000Z
    (time/with-clock (time/mock-clock 1549411200000)
      (t/testing "less than 10 seconds ago"
        (t/is (false? (sut/>10-secs-ago (time/instant 1549411200999)))))
      (t/testing "exactly 10 seconds ago"
        (t/is (false? (sut/>10-secs-ago (time/instant 1549411210000)))))
      (t/testing "more than 10 seconds ago"
        (t/is (true? (sut/>10-secs-ago (time/instant 1549411210001))))))))

Mount mock

Mount is a framework for managing life-cycle and dependencies for an application with runtime state. So far I’m really impressed with it, and I prefer it over other frameworks trying to solve the same thing.

Testing code that relies on application state had me puzzled for a little while. Some examples I found suggested using start-with, but even though I was able to swap part of the application state, the entire application was mounted. Since I’m only interested in testing parts of the application, I wanted to avoid starting or mocking things untouched by the test.

This is how I did it:

(ns my-ns.core
  (:require [mount.core :as mount]))

(mount/defstate sessions
  :start (atom {}))

(defn open-session []
  (let [id (gensym)]
    (swap! sessions assoc id ::open)
    id))

(defn close-session
  [id]
  (swap! sessions dissoc id))

(ns my-ns.core-test
  (:require [clojure.test :as t]
            [mount.core :as mount]
            [my-ns.core :as sut]))

(t/deftest open-session
  (t/testing "that session is being registered upon open"
    (let [sessions (atom {})]
      (-> (mount/only #{#'my-ns.core/sessions})
          (mount/swap {#'my-ns.core/sessions sessions})
          (mount/start))
      (let [session-id (sut/open-session)]
        (t/is (contains? @sessions session-id)))
      (mount/stop))))

(t/deftest close-session
  (t/testing "that session is being deregistered upon close"
    (let [session-id (gensym)
          sessions (atom {session-id ::sut/open})]
      (-> (mount/only #{#'my-ns.core/sessions})
          (mount/swap {#'my-ns.core/sessions sessions})
          (mount/start))
      (sut/close-session session-id)
      (t/is (not (contains? @sessions session-id)))
      (mount/stop))))
I’m sure the only, swap, start and stop combination can be shortened with a helper function or macro, but that will be for another time. I’m wondering if Mount has a smarter way of doing the above that I just haven’t found yet?
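For the record, a first stab at such a helper could be a macro along these lines. This is a hypothetical, untested sketch built on the same mount/only, mount/swap, mount/start and mount/stop calls used in the tests above; with-states is my own name, not part of Mount:

```clojure
(defmacro with-states
  "Hypothetical helper: mounts only the substituted states around body and
  stops them again afterwards."
  [substitutions & body]
  `(let [subs# ~substitutions]
     (-> (mount/only (set (keys subs#)))
         (mount/swap subs#)
         (mount/start))
     (try
       ~@body
       (finally
         (mount/stop)))))

;; which would shorten a test to something like:
;; (with-states {#'my-ns.core/sessions (atom {})}
;;   (sut/open-session))
```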

And if you (like a recent version of me) don’t know what SUT means, Wikipedia is our friend. Happy testing.

Sources of inspiration

I find work-related inspiration in many places, among others from colleagues, meetups and conferences. But most inspiration comes while purposelessly roaming the internet. I’ve always been interested in more than just writing the code for the software. Whether it be participating in meetings and discussions that help me better understand the business, gathering data to predict how a UI should be formed for an optimal user experience, or learning how to avoid making poor software design decisions that would get the nice people in operations in trouble. But also the softer things, like how culture and circumstances affect us as developers and individuals.

Prior to getting the responsibility for a team, all of the above would just be archived mentally as “work/software development”. These days, stuff related to work culture and people has earned its own archive label. Suddenly these things have a slightly different meaning to me.

I used to laugh when seeing/reading things like these (and I still do):

Now failing to do my job properly… the joke will be on me.

To get to the core of the matter: I would like to share the following links because they either gave me food for thought or just solidified my view on matters.

About cutting corners

It takes time to understand a problem, because before you can fully understand it, you need to also understand its context. As a code base grows, depending on how well the code is structured, the context in which you need to understand the problem can (and often will) grow.

When a problem or its context is misunderstood, there is a tendency to oversimplify the problem. I’m all for finding a simple and elegant solution, but not at the cost of correctness. Oversimplified solutions will not be able to handle all special cases, and special cases lead to crashes or data corruption… or both. This in turn leads to maintenance and clean-up by the programmer, whom I would much rather have coding on the next big thing.

The available time to implement a solution is not only challenged by a push for quick deliveries but also by a lack of respect for the fact that creating software is actually hard. I’ve experienced this lack of respect from all involved parties (including the programmer).

Even though I strive to “do things right”, I also think it is okay, and often necessary, not to. But it should be a conscious decision.

For example, making sure to handle two corner cases will increase development time, while those cases may be very unlikely to happen, or very easy to correct manually if they do. If someone makes a conscious decision that these cases should be handled outside the software, then that is okay.

It is NOT okay (and very short-sighted) when the pressure doesn’t allow collecting the necessary information to make that decision.

So by all means cut corners - but only do so together with proper risk assessment.