I have been involved in three large Clojure projects, and configuration is always a problem that eventually has to be solved. It may or may not surprise you to know that each of these applications has handled configuration differently. I have mixed emotions about that, because I wish we could have built on someone else’s work, but I also know that one size does not necessarily fit all. Two of those projects ended up open sourcing their configuration libraries, so perhaps we’re a little better off.

Why does one size not necessarily fit all? Configuration has several dimensions, and each config library sits at a different point in that space. Configuration values can be compiled into your code, or they can be pulled at run-time from some external source. That external source can be the environment, a file on the file system, a resource on the classpath, JVM system properties, or even some external configuration service. Most applications will read config once at startup, assuming it does not change, but some may want config that can change dynamically over the life of the process.

Keep these dimensions in mind as we examine five methods for defining config: three that work only for Clojure, one that works only for ClojureScript, and one that works for both. The Clojure-only methods are weavejester/environ, sonian/carica, and outpace/config. The ClojureScript method is the built-in goog-define macro. The Clojure/ClojureScript option is adzerk-oss/env.

weavejester/environ

environ uses environment variables for configuration, an approach popularized by The Twelve-Factor App, and it has advantages:

  1. Environment variables can be changed independently from code.
  2. Since they are independent from code, sensitive config is unlikely to be accidentally checked in.
  3. They are programming language and operating system agnostic.

However, there are other considerations that may or may not be disadvantages depending on your use case:

  1. All environment values are strings.
  2. Environment values cannot be tweaked to change configuration for a running application.
  3. Environment variables form a flat, un-namespaced map.
  4. If a config value is required, application code has to ensure it is available, because environ does not.
  5. environ does not have a feature for providing a default value.
  6. Assuming a value is available, it also falls to application code to validate it, because environ will not.

The last three considerations result from using a pull model instead of a push model. environ is not so much “configuring” your application as it is providing a map of values from which your application can configure itself. I think this approach is problematic for a few different reasons that I will talk about when I get to outpace/config, but note that this is a problem with the pull model, not with using the environment for configuration.
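To make the pull model concrete, here is a minimal sketch of the work that lands in application code. It is not environ's implementation: `env` below is a plain map standing in for the one environ exposes as `environ.core/env`, and the helper names `required` and `port` are my own inventions for illustration.

```clojure
(require '[clojure.string :as str])

;; Stand-in for the map a pull-style library hands you. A literal map keeps
;; the sketch free of dependencies. Note that every value is a string.
(def env
  {:database-url "jdbc:postgresql://localhost/my_db"
   :port         "8080"})

(defn required
  "Returns the value for k, or throws if it is missing or blank."
  [m k]
  (let [v (get m k)]
    (if (str/blank? v)
      (throw (ex-info (str "Missing required config " k) {:key k}))
      v)))

(defn port
  "Reads :port with a default, parses the string, and validates the range."
  [m]
  (let [n (Integer/parseInt (get m :port "3000"))]
    (when-not (< 0 n 65536)
      (throw (ex-info "Port out of range" {:port n})))
    n))
```

Every namespace that reads config ends up carrying some version of these helpers, which is the part of the pull model I find hard to keep track of.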

If you like the trade-offs, using the environment as a source for configuration can be very useful, and environ is a great way to do that. You should also consider adzerk-oss/env, which takes configuration from the environment as well but uses a push model.

sonian/carica

Full disclosure: While at Sonian I was involved in writing the code that eventually became carica.

The environment is not the only source for configuration values; files on the filesystem and resources on the classpath can also be sources of configuration. carica will pull configuration values from JSON, EDN, or Clojure files and resources. It will also merge multiple files on the classpath, which can be useful, but is also finicky as you will need to closely control the order of JARs and directories on the classpath. carica has some other features like middleware, but I think those are more nice-to-haves, and I want to focus more on the fundamentals.
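As a rough illustration of layering config files, the sketch below reads two EDN documents and merges them so that later values override earlier ones. This is not carica's code: the file contents, the `db.internal` host, and the `deep-merge` helper are all assumptions made for the example, and I use strings in place of real files to keep it self-contained. Whether a given library merges deeply or shallowly is worth checking before you rely on it.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical contents of a config file baked into the JAR ...
(def default-config-text
  "{:database {:url \"jdbc:postgresql://localhost/my_db\" :pool-size 5}
    :features #{:signup}}")

;; ... and of an override file that only exists in production.
(def production-config-text
  "{:database {:url \"jdbc:postgresql://db.internal/my_db\"}}")

(defn deep-merge
  "Merges maps recursively; for non-map values the rightmost one wins."
  [& ms]
  (if (every? map? ms)
    (apply merge-with deep-merge ms)
    (last ms)))

;; Order matters: the production values are merged over the defaults.
(def config
  (deep-merge (edn/read-string default-config-text)
              (edn/read-string production-config-text)))
```

Notice that the values keep their types (a number, a set, a nested map), which is something environment variables cannot give you.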

Let’s revisit the advantages of environment variables, but with config files in mind. Config files can also be changed independently from code. In some cases carica expects you to check in your config files and build them into your JAR to make them available as resources. This can be great for some config, and dangerous for sensitive config. You can always have carica use a filesystem file or a resource on the classpath that is only available in production, but you might have to be more careful about what you put in config files. Finally, if you use JSON your config files can be somewhat programming language and operating system agnostic.

I don’t think we lose much with carica versus environment variables, and we gain some advantages. Config files can have a richer set of values, and that is better than having only strings as values. carica would also allow us to change config values for a running application. By default carica uses a caching middleware which will not read changed values, but you can override that so that it will re-read the config files every time you ask for a config. There is a performance penalty to this approach, but you could probably use carica for two different sets of configs: those that do not change after loading the application and those that do. I’ll talk more about changing config for a running application when I wrap up this whole discussion.

However, we still have these considerations:

  1. Config files form a flat, un-namespaced map.
  2. If a config value is required, application code has to ensure it is available, because carica does not.
  3. carica does not have a feature for providing a default value. Presumably you would provide a default by baking a config file into your JAR, which you would override in production with a deployed config file on the filesystem or earlier on the classpath.
  4. Assuming a value is available, it also falls to application code to validate it, because carica will not.

Again, these considerations stem from the pull model of configuration.

outpace/config

Full disclosure: While at Outpace I was involved in maintaining the config project.

This is my favorite configuration library for Clojure, so I will try not to gush too much, and you will have to take what I say with a grain of salt.

While I was at Outpace, Alex Taggart wrote the config library, and the moment I saw it I thought it brilliant. While most other configuration libraries use a pull model, config uses a push model. It pushes root values to vars that are marked as configuration values using the defconfig macro:

(defconfig database-url)

You can define a required config var:

(defconfig! database-url)

You can provide a default value:

(defconfig database-url "jdbc:postgresql://localhost/my_db")

You can also provide validators:

(defconfig
  ^{:validate [string? "Must be a string."
               super-cool? "Must be super cool!"]}
  database-url
  "jdbc:postgresql://localhost/my_db")
  

Since these are just vars, they can be marked :dynamic and/or :private, given type hints, bound, redefined, and anything else a var can do, and the Clojure compiler will helpfully point out if you are trying to reference a configuration var that does not exist. The only thing that is special about them is they receive a root value at load-time. config pulls in all the configuration values that it needs at load-time, so it is not possible to define config values that can change at run-time. Since they receive their value at load-time, not compile-time, AOT compiling will not trigger any I/O.
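If the push model feels abstract, a toy version may help. The sketch below is not how outpace/config is implemented; `defsetting` and `source` are hypothetical names, and the source is just a map keyed by fully qualified var name. It only shows the direction of flow: values are pushed into vars when the namespace loads, and the rest of the code refers to ordinary vars.

```clojure
(ns example.app)

;; Hypothetical source of values, keyed by fully qualified var name. A real
;; library would build this from a file, the environment, and so on.
(def source
  {'example.app/database-url "jdbc:postgresql://db.internal/my_db"})

(defmacro defsetting
  "Defines a var and pushes its root value in from `source` at load-time,
  falling back to `default` when `source` has no entry for it."
  [sym default]
  (let [qualified (symbol (str (ns-name *ns*)) (str sym))]
    `(def ~sym (get source '~qualified ~default))))

;; The source has an entry for this one, so the default is ignored.
(defsetting database-url "jdbc:postgresql://localhost/my_db")

;; The source has no entry for this one, so it keeps its default.
(defsetting pool-size 5)
```

Because `database-url` and `pool-size` are plain vars, a typo such as `databse-url` fails at compile time instead of quietly returning nil from a map lookup.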

But wait, there’s more! config will use a file on the filesystem to define the root values of your configuration vars. Using reader tags in the EDN file, config can pull values from the environment, a system property, a file on the filesystem, a resource on the classpath, or any other place for which you configure a reader tag. The configuration file is not really a configuration file; it is a layer of abstraction that binds a source for a configuration value with a destination for it.
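Reader tags are plain EDN, so the idea can be sketched with nothing but clojure.edn. The tag name `#demo/env`, the `fake-env` map, and the var names are assumptions for illustration; they are not the tags outpace/config ships with.

```clojure
(require '[clojure.edn :as edn])

;; Stand-in for the process environment so the sketch is deterministic;
;; a real reader function would call (System/getenv var-name) instead.
(def fake-env {"DATABASE_URL" "jdbc:postgresql://db.internal/my_db"})

;; Maps a hypothetical tag to the function that resolves it.
(def readers
  {'demo/env (fn [var-name] (get fake-env var-name))})

;; Each entry binds a destination (a namespaced var name) to a source:
;; either a tagged lookup or a literal value.
(def config-text
  "{example.app/database-url #demo/env \"DATABASE_URL\"
    example.app/pool-size    10}")

(def bindings (edn/read-string {:readers readers} config-text))
```

Adding a new source is then a matter of adding one more entry to the `readers` map.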

One interesting consequence of this is that it is safe to use config in a library, because with defconfig you only mark a var as a configuration value; you do not make any assumptions about where it is coming from. And since each config var is namespaced, there is no risk of collision over configuration names used by different libraries.

Finally, I think the most useful thing about config is its tool to generate a config.edn file. Using this tool you can generate a config.edn file that lists all of the configuration vars in your application (and the libraries it uses) and their default values if any. If a configuration var is required, it will also appear in the file waiting for you to give it a value. You can also run this tool with an existing config.edn file in place, and it will tell you if you are trying to configure vars that do not exist. The biggest drawback of the pull model for configuration is that you have no traceability between the config values that exist and the places in your code that use configuration, and in my experience config becomes a mess of vestigial values that no one dares remove. config’s generate tool will give you accurate information about your application and its complicated relationship with its config.

config addresses all of the considerations of the previous libraries[1] except that it does not allow values to be tweaked for running applications. The trade-off is that we no longer have programming language or operating system agnosticism.

goog-define

As of ClojureScript 1.7.107 there is a goog-define macro that can be used to define a value at compile-time. I think Martin Klepsch does a great job of describing how to use goog-define. To that I will add these considerations:

  1. The config values are pulled from compiler options, not the environment. This is very much not programming language agnostic.
  2. Each value must be a string, a number, or a boolean.
  3. Config values are baked into the application. In advanced optimization mode they are propagated as constants.
  4. You must specify a default value.
  5. Assuming a value is available, it falls to application code to validate it, because goog-define will not, aside from basic type validation.

adzerk-oss/env

This is another Twelve-Factor-style library. However, there are some differences from environ:

  1. env will let you declare that a config is required, and an exception will be thrown if it is not provided.
  2. You can specify a default value to be used if an environment variable is not available.
  3. env is also a push model, so there is more traceability about which config values are used where.
  4. env will pull from system properties in addition to the environment.
  5. If you set the root value of a config var, that new value will get pushed into a JVM system property and subsequent attempts to read the environment will see the new value.
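Here is a rough sketch of that lookup behaviour, using only JVM interop. This is not env's source: the function name `lookup`, the `:required` marker, and the choice to check system properties before the environment are assumptions on my part.

```clojure
(defn lookup
  "Returns the system property named k, else the environment variable named k,
  else default. Throws when nothing is found and default is :required."
  [k default]
  (let [v (or (System/getProperty k) (System/getenv k))]
    (cond
      (some? v)             v
      (= :required default) (throw (ex-info (str "Missing required config " k)
                                            {:name k}))
      :else                 default)))
```

Something like `(lookup "DATABASE_URL" :required)` fails fast at startup when the value is absent, while `(lookup "LOG_LEVEL" "info")` quietly falls back to its default.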

We still have the following considerations:

  1. All environment values are strings.
  2. Assuming a value is available, it falls to application code to validate it, because env will not.

Additionally, remember that env works for both Clojure and ClojureScript, so that is a huge advantage.

Conclusion

We have taken a look at weavejester/environ, sonian/carica, outpace/config, goog-define, and adzerk-oss/env. We have considered each along several dimensions: compile-time vs. run-time, configuration source (the environment, a file on the file system, a resource on the classpath, JVM system properties), push model vs. pull model, and config loaded once vs. config that can change.

I believe the push model to be far superior to the pull model, because you have better traceability over which configs are used and where they are used. With a push model you can also enforce required configs, default values, and validation. Additionally, a push model can allow safe use of configuration in libraries (as we saw with outpace/config).

We did see a couple of libraries that allow you to change config values at run-time. If you are building a 99.9999% available service, then you will need to change the configuration of a running application. However, be reasonable. If you’re not paying 100 full-time engineers to work on your service, then don’t try to be Google. I would strongly caution you to be explicit about which configs are set once and which can change. In your code you must carefully handle configs whose value may change out from under you, otherwise you will open yourself up to weird concurrency-related bugs. Specifically, if you are going to read a config value more than once during the course of a single task, you should read it once at the beginning of the task and use that value instead of re-reading the config. Or you could just take the 5 minutes of downtime. :)
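Reading a config once per task can be as simple as dereferencing it into a local. A minimal sketch, assuming the changeable config lives in an atom (the `config` atom, its keys, and `process-task` are hypothetical):

```clojure
;; Hypothetical mutable config; some other thread may reset! it at any time.
(def config (atom {:batch-size 100 :retries 3}))

(defn process-task
  "Dereferences the config once and uses that snapshot for the whole task."
  [items]
  (let [{:keys [batch-size retries]} @config]
    ;; Everything below sees the same batch-size and retries, even if the
    ;; atom changes while the task is still running.
    {:batches (count (partition-all batch-size items))
     :retries retries}))
```

The alternative, calling `@config` at each point of use, is where a task can end up seeing two different values for the same setting.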

In the end, which configuration library you choose will depend on your use case and philosophy. You may choose a library that I didn’t even talk about, or you may choose to use more than one library for different classes of configs, and that’s all right, because there is no one-size-fits-all config library.

Footnotes:

  1. ahahahaha! outpace/config defines the standards to which the other libraries must adhere! Classic Marketing Maneuver™