The author seems to be using "markup language" as a concept basically synonymous with a configuration language, or something that is not a programming language. A markup language is a language used to "mark up" text with formatting and structure. This may sound like a terminology nitpick, but I would argue the reason why for example XML is not a great configuration language is that it was designed to be something else – a markup language.
XML is a fantastic configuration language. Its primary purpose is text markup, right, but it can really shine as a configuration language too. It is meant to be used as a language, that is to describe structures of unbounded complexity while keeping them syntactically constrained.
The best configuration language is the one that is custom made for a specific application. A domain-specific language. But DSLs require a parser, and adding one is usually too cumbersome just to parse a configuration. This is where XML comes in.
I think the author is probably aligning with "is" rather than "ought". So a language made this list if it IS being used as a configuration format, regardless of whether it OUGHT to be used as one.
Which is a way of deciding that makes sense given that I think the purpose of this article is "use my language instead". Getting lost in the weeds about each language's original design intent would bloat the article without meaningfully contributing to their thesis.
Given that it's an essential/definitional issue, I would have preferred that the author at least showed awareness of the differences between languages intended for marking up text and languages intended to represent data in a structured way.
Back when XML was first being developed, I was really anticipating having a standardized, easier-to-implement successor to SGML (which was hampered by its complexity and by the cost of the ISO standard) in the text markup space. It was disappointing that it ended up filling the vacuum in the space of serialized representations for structured data, then getting rejected when it wasn't quite as suitable for that as alternatives such as JSON.
Me too, I still think it's cool. But while indeed easier-to-implement compared to SGML, it could have been yet a bit simpler. :) I once attempted to write a conforming parser, I remember it being a _lot_ of work to even determine well-formedness.
Ah, I was perhaps a bit unclear – I didn't mean to comment on whether certain languages made the list or not, I enjoyed the thorough walkthrough of languages being used for configuration.
I was more commenting on comments such as this one under Pkl:
> This is not a markup language. This is a full-blown programming language.
And under Nickel:
> Nice programming language. Not a markup language.
It's like he's saying "we should use markup languages for configuration, not programming languages", which I don't think he means.
I’ve often wondered the same. The op doesn’t go far beyond:
> The era of XML is in the past.
Which is about as deep as it seems to get.
My suspicion is that the reason people don’t like it is because it’s a bit of a pain to work with in JavaScript
XML has a DOM API. And most people use JavaScript in a browser to manipulate the DOM. So I would say that, among all the languages, it should be the JavaScript devs who are the most proficient at the DOM API.
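For anyone who hasn't touched the DOM API outside a browser, here is a minimal sketch of reading an XML config through it, using Python's standard `xml.dom.minidom`. The document layout and the `read_config` helper are made up for illustration; they are not from the article.

```python
from xml.dom.minidom import parseString

# Hypothetical config document; element and attribute names are invented.
CONFIG_XML = """<config>
  <server host="example.org" port="8080"/>
  <feature name="cache" enabled="true"/>
  <feature name="tracing" enabled="false"/>
</config>"""


def read_config(xml_text):
    """Walk the DOM with the same calls a browser exposes."""
    doc = parseString(xml_text)
    server = doc.getElementsByTagName("server")[0]
    features = {
        el.getAttribute("name"): el.getAttribute("enabled") == "true"
        for el in doc.getElementsByTagName("feature")
    }
    return {
        "host": server.getAttribute("host"),
        # Attributes are always strings in XML, so typing is the caller's job.
        "port": int(server.getAttribute("port")),
        "features": features,
    }


config = read_config(CONFIG_XML)
print(config)
```

Note that every value comes back as a string, so the conversion to `int` and `bool` happens in application code.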
More generally XML is so thoroughly out of fashion that it is perceived as driving resume-driven architecture backwards in time, into an unfamiliar "dark age" of deliberate engineering decisions.
In the world of human-readable data formats (ie not programming languages), the best one I ever used was Jane Street's sexplib[1] s-expression format.
It was concise and expressive. There was a direct way to describe variants (types with multiple constructors), which is always awkward in JSON, but the format was still surprisingly low-noise for reading and editing by hand. I remember you could even use it as a lightweight markup format:
(here is some text with (em formatting) information)
(The format leaves the interpretation of things like (em ...) up to you; you could use it as a slightly more verbose Markdown, but you could also use it to structure readable text with other sorts of metadata instead.)
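To show how little machinery the basic structure needs, here is a rough sketch of a reader for that example. This is not sexplib (which is an OCaml library with quoting, comments and type-directed conversion); it is a toy Python parser that only handles bare atoms and nested lists, and the `render` function is just one hypothetical interpretation of `(em ...)`.

```python
def tokenize(text):
    # Pad parentheses with spaces so split() separates them from atoms.
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Parse one expression from the front of the token list (consumed in place)."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise ValueError("unexpected )")
    return token


def render(node):
    """Interpret (em ...) as emphasis; everything else is plain text."""
    if isinstance(node, str):
        return node
    if node and node[0] == "em":
        return "*" + " ".join(render(n) for n in node[1:]) + "*"
    return " ".join(render(n) for n in node)


tree = parse(tokenize("(here is some text with (em formatting) information)"))
print(tree)
print(render(tree))
```

The parsed tree is just nested lists of strings, so swapping `render` for something that treats `(em ...)` as metadata instead of formatting is a local change.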
And, unlike certain other formats I won't name, it has comments!
It also helps that Emacs with Paredit makes editing s-expressions flow. The tool doesn't need to know anything about the sexplib format specifically; just relying on basic s-expression structure gives us fluid but simple structural editing.
I am constantly sad that nobody else uses this sort of format, and I have to deal with a mixture of JSON, YAML, TOML and other ad-hoc formats instead.
Agree, sexp is quite nice. That was my favorite before json came around. Not that I like json particularly, it just ate the world so it's easier to go along with it.
No mention of The Configuration Complexity Clock [1] which I always think deserves a link, but credit to the author for actually keeping it slim and readable over (IMHO) most of the newer additions to the landscape which either declare themselves as 'obvious' and aren't or just add features such as Pkl's extends keyword, moving us further round the complexity clock.
Of course just adding multiline strings is the start of the rabbit hole, now you need to think about leading line breaks, trailing line breaks, intermediate line breaks, whitespace chomping, and- oh heavens I've reinvented YAML, I think I need to lie down.
Nice article. I've definitely moved the clock to 6.
And I remember toying with the idea of 9 but I think for a different purpose, I don't remember what. Never did get that far though, I'm acutely aware that tooling is also important, but it's still oh-so-tempting.
I wonder where the company is at now. I don't think they would have moved the clock ahead but they may have moved it backwards to 12.
I don't regret my little rules engine though. It held up very well. It was intended for non-devs to be able to use, but the article is quite right, a complex set up was not easy to understand! I often got questions when things looked wrong but after much thinking the rules were usually calculating things correctly.
The alternative would have been 100 hardcodings because each client needed something different.
One kinda-exception I'd like to raise: Cases where you'd like to use the regular proper programming language from the very beginning, but there are trust issues, and there's no good/reliable sandboxing option.
For example, B2B stuff where every customer has their own idiosyncratic sets of rules for if-this-then-that, which change at a different cadence than your releases.
In those cases, it's less that configuration slowly becomes too complex and evolves into code, and more that code is wanted from the get-go but configuration is the compromise.
- in the spec, key order clearly exists at the syntactic level
- it's just that the spec says nothing about the semantics of JSON (besides "A conforming processor of JSON texts should not accept any inputs that are not conforming JSON texts.")
quote: "The JSON syntax [...] does not assign any significance to the ordering of name/value pairs"
As a result, the operation of "parsing then serializing again" is not even guaranteed to be idempotent across different environments.
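A small sketch of that point, using Python's `json` module as a stand-in for "different environments": the same input text round-trips to two different outputs depending on how the serializer treats key order, while the parsed values still compare equal.

```python
import json

text = '{"b": 1, "a": 2}'

# Python dicts preserve insertion order, so a plain round trip keeps "b" first.
as_written = json.dumps(json.loads(text))

# A serializer that sorts keys (or an implementation backed by an unordered
# map) emits a different text for the same data.
sorted_keys = json.dumps(json.loads(text), sort_keys=True)

print(as_written)
print(sorted_keys)
print(json.loads(as_written) == json.loads(sorted_keys))
```

Both outputs are valid serializations of the same object, which is exactly why byte-for-byte comparison of re-serialized JSON is unreliable.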
FWIW, JSON with `#` comments and redundant comma before `}` and `]` is already supported by many programming languages, e.g., Perl's JSON module (the 'relaxed' option). The advantage is that correct JSON also works, i.e., that JSON module just makes it less error prone for humans, but otherwise sticks to the existing format.
> This is why I based my MAML on top of JSON. I fixed things that were a little bit annoying for me inside JSON.
And now it's incompatible with JSON, which is the worst part of the idea.
Instead, maybe propose a patch to your favorite programming language's JSON module to accept the same extensions for humans that the Perl module already supports, and maybe someone uses it for their JSON config files. This moves us together instead of further apart.
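For a sense of how small such an extension is, here is a rough sketch in Python. It is not how the Perl module does it and not an existing option of Python's `json` module; `strip_relaxed` and `loads_relaxed` are hypothetical helpers that pre-process the text and then hand it to the strict parser, so ordinary JSON passes through unchanged.

```python
import json


def strip_relaxed(text):
    """Remove '#' comments and trailing commas that sit outside string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch == "#":
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        elif ch in "}]":
            # Drop a comma followed only by whitespace before the closer.
            j = len(out) - 1
            while j >= 0 and out[j].isspace():
                j -= 1
            if j >= 0 and out[j] == ",":
                del out[j]
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def loads_relaxed(text):
    return json.loads(strip_relaxed(text))


relaxed = '''{
  # listen address
  "host": "example.org",   # trailing comment
  "tags": ["a#b", "c,",],
}'''
config = loads_relaxed(relaxed)
print(config)
```

The string-tracking state is what keeps `"a#b"` and `"c,"` intact; a regex-only version would mangle them.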
I don't like this article all that much but I do like the MAML format the author wrote this to promote. It seems like it’s everything that’s good about JSON, plus everything fixed for human writing, and nothing more.
Hats off, wish my package.json and build.yml were package.maml and build.maml.
MAML is an improvement on JSON: the syntax is better, but it does not improve the data types much (the one improvement being that it distinguishes integers from floating point).
Dismissing full-blown programming languages is a mistake IMHO.
Sometimes it's useful to generate configuration (e.g. starting from a template and to build configuration for prod vs QA or for various deployment targets).
If you have a full-blown programming language you can sprinkle a little bit of code in the template and get to express everything you need in a clean way.
If all you have is a templating language you have to shoehorn your logic into it (likely making a mess) or use the templating language + a "real" programming language.
The ideal setup is something like the safe interpreters in Tcl, which let you do very fun things like having config files in pure Tcl, but with limited access to the machine and master interpreter.
Depending on the kind of configuration, different formats might be helpful.
I made up TER, which supports all ASN.1 types (although in some cases there is not a built-in syntax for them and you must write them in terms of other types; however, extensions can be made for your specific application) and is not limited to Unicode, nor does it limit integers to 64-bits (and you can write integers in any base up to thirty-six). You can convert TER to DER (I wrote a program to do this; it is not really intended for an application to use the full specification of TER itself except for using an external program to convert to DER and then use that instead), and you might also make a subset of TER for specific applications.
But, like any other one, it also has advantages and disadvantages (including some that are mentioned in that article; if they look at TER they would probably have criticisms about that too).
There are two kinds of criticism of JSON, one having to do with the data types and one having to do with the syntax, and I had mentioned some of these in the past (some of them are commonly mentioned by others too, but some are less commonly mentioned).
Another configuration format is the X resource manager format.
I wonder if "the right solution" is a programming language that is fast, concise, trivially easy to run, and outputs some efficient binary format like protobuf.
Programming languages have comments and control flow, multiple popular implementations, and can have nice literals. Lack of Turing completeness is actually not a terribly useful feature if you trust the input (and you should probably just use protobufs or similar for untrusted inputs in that case.)
I do not want control flow of any kind in my configuration file. Nor do I want expressions or any kind of evaluation.
Greppability is a must-have feature for me. As is simplicity - I don’t want to have to deal with internalizing interpreter mechanics, rules for precedence, variable scope, etc just to figure out what config values my program is going to be provided with.
Any time I’ve been forced to work with a system which used a general or restricted programming language to express configuration, it’s been a nightmare.
If your config language doesn't have its own control flow then it's going to get a meta layer added on with control flow. Like, what do you do if this service has a different hostname in staging versus prod? Or connects to a different DB, or whatever. Either have one template file and give it a values file when you go to deploy it, or have two fully-written out files... which becomes annoying when you have to make sure you keep them in sync across two or more environments, often leading to a hacked on and poorly implemented templating system anyway (eg, Helm)
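The "one template plus a values file per environment" option can be sketched with nothing more than Python's `string.Template`. All key names, hostnames and the `render` helper below are hypothetical; real tools in this space layer a lot more on top.

```python
from string import Template

# One template shared by all environments; the keys are hypothetical.
SERVICE_TEMPLATE = Template(
    "hostname = $hostname\n"
    "db_url = postgres://$db_host/app\n"
    "replicas = $replicas\n"
)

# Stand-ins for per-environment values files.
VALUES = {
    "staging": {"hostname": "svc.staging.example", "db_host": "db.staging.example", "replicas": 1},
    "prod": {"hostname": "svc.example", "db_host": "db.example", "replicas": 3},
}


def render(env):
    # substitute() raises KeyError if a placeholder has no value,
    # which catches environments that have drifted out of sync.
    return SERVICE_TEMPLATE.substitute(VALUES[env])


print(render("staging"))
print(render("prod"))
```

This is already a meta layer over a config format with no control flow of its own, which is the point being made above.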
There is also .ini, it's simpler than toml, though has more inconsistent implementations and formats. But the '[[table]]' is just '[table]' and I like the '[table.subtable]' syntax better.
But since we're criticizing formats ;-) maml https://maml.dev is fairly good.
* I like multiline blocks and comment style
* I do not like optional commas, either require them or not. I'd lean toward requiring them.
* Not sure about ordered object keys. How should languages represent that, a map/dict kind of a data structure won't work it would have to be an ordered dict/list of kvs etc. Two objects {"a":"x", "b":"y"} and {"b":"y", "a":"x"} will be different then. I get the idea, but I am leaning toward not liking it by the time I finished typing all that.
* I like booleans and null. Good for not having ons and offs those are just annoying.
About the ordered keys, most good languages have an ordered map type, either by accident or by design. Those few languages that don’t would need a list of key-value pairs. Given that this is for config, the performance cost of linear-scanning such a list would be negligible in realistic use cases.
Personally I think it’s a very good decision. Simply by declaring that objects are ordered, it becomes the case and it unlocks a whole lot of useful mechanisms.
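Here is a small sketch of both representations in Python, using the two objects from the parent comment. `object_pairs_hook` is a real parameter of `json.loads`; the `lookup` helper is just an illustration of the linear scan.

```python
import json

a = '{"a": "x", "b": "y"}'
b = '{"b": "y", "a": "x"}'

# Default: objects become dicts, and dict equality ignores key order.
dicts_equal = json.loads(a) == json.loads(b)

# Keep each object as a list of (key, value) pairs so order is part of the value.
pairs_a = json.loads(a, object_pairs_hook=list)
pairs_b = json.loads(b, object_pairs_hook=list)
pairs_equal = pairs_a == pairs_b


def lookup(pairs, key):
    # Linear scan; fine for config-sized objects.
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)


print(dicts_equal, pairs_equal)
print(pairs_a, lookup(pairs_a, "b"))
```

So whether the two objects are "different" is a choice the consuming program makes when it picks a representation.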
I agree on this in general. The awkward things here are that:
a) textprotos aren't really touted by the protobuf folks as a thing to use outside Google. I'm not 100% sure why this is.
b) inside Google, there's a perception that you shouldn't use textprotos for much other than hardcoding proto values inside the monorepo (where there aren't really schema-versioning concerns). I think this perception is misplaced, you just have to be aware that a given schema is used in textprotos. Which is usually an easy thing to be aware of. This is just because the schema-versioning concerns are different than with binary protos (e.g. field renames are now breaking).
c) IIRC most parsers unconditionally reject unknown fields. I think the reason for this is highlighted in the docs: you can't safely go from a textproto with an unknown field to a binary serialisation of that proto. IIRC there are some parsers that let you parse unknown fields anyway but then I think you're a bit more tied to a specific implementation than you'd like...
One strange but quite handy alternative is actually the JSON representation. You can use .proto files as a schema but then serialise the values to JSON as there's a canonical mapping. Then you get something that's human readable but with the type safety of protos. Although of course it's not really writable since... It's JSON.
I kind of wish it had two and only two features: defined ordering, and comments
As to languages for configuration (specifically), I wish there was a good specification for a program config, where you would have one well-defined configuration file for defaults, and a user-specified configuration file for the values you have set/overridden.
say default.config:
{"player_name":"player", "mouse_speed":5}
and my.config:
{"player_name":"me"}
in some way that upgrading the program could do stuff with default.config without destroying your existing config.
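A minimal sketch of that layering in Python, assuming flat JSON objects like the two above. `load_effective_config` is a hypothetical helper; nested settings would need a recursive merge.

```python
import json


def load_effective_config(default_text, user_text):
    """Overlay user values on shipped defaults without touching either file."""
    defaults = json.loads(default_text)
    overrides = json.loads(user_text)
    # Keys the user set that the program no longer knows about,
    # e.g. after an upgrade renamed or removed a setting.
    unknown = sorted(set(overrides) - set(defaults))
    effective = {**defaults, **overrides}
    return effective, unknown


default_config = '{"player_name":"player", "mouse_speed":5}'
my_config = '{"player_name":"me"}'

effective, unknown = load_effective_config(default_config, my_config)
print(effective)
print(unknown)
```

Because the user file only holds overrides, an upgrade can replace the defaults file wholesale and new settings show up automatically.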
That's a pretty common config pattern. SublimeText does that, and when you edit the settings it opens a 2 paned editor with the default config on the left and user config on the right.
> Well, this is some sort of programming language with dynamic types. But there are so many good programming languages, so I don't know why this one needs to be used.
RE: Jsonnet and others: because it has nice guarantees, like lack of arbitrary I/O and pure execution.
Need to configure 5 services with hundreds of replicas in 7 data centers? Some values depend on the service, some on the data center and some on the combination thereof? Maybe also overrides for a bunch of problematic machines?
And you also want a manageable config language which doesn't turn into a full blown Turing tar pit?
Then jsonnet is for you.
So it's not entirely fair to compare it in the "pleasant syntax" contest. It's like putting a Unimog into a ranking of city cars.
Very interesting (and opinionated) overview of the existing configuration languages.
At the very bottom, author is presenting its own language (maml.dev). It is yet another JSON with comments, multiline strings, and optional commas. It's the most boring part of the page.
agreed, but then even some third-party project is too exciting.
My choice is "whatever is available in the standard library" - which used to be INI, JSON or YAML. But with python 3.11 there is also TOML, which is nice...
Interestingly, at least in this list, after JSON/YAML were published in 2001 there's a 10y+ gap until competitors started arriving. Of those, considering only markup languages (so no Dhall/Nickel/etc*), EDN/TOML/JSON5/HJSON were published in early 2010s and all the rest (~10) are published late 10s-now 20s. Author's MAML being close to first in this new wave.
*Honorable mention to Nix, which is also functionally a configuration language, released a decade before those. Feel like recent few years there has been an increase in remaking existent things slightly different and marketing them like new.
There's a direct correlation between how exciting and fun a configuration syntax looks and how much you'll despise it after a year of having to deal with the fallout of those 'exciting features'.
Speaking of which, Pkl looks absolutely fascinating, I might have to check that out...
The Kubernetes ecosystem has made me hate YAML so much. Maybe I am just not using the tooling correctly but anchors all over the place, templating, etc.
I have yet to see a solution to that type of thing though as every config language exists on a spectrum of syntax to full blown programming language.
YAML isn't really the problem with Kubernetes though. The Kubernetes problem is that the config is complex and heavily typed, but also essentially ad hoc - outside some basic constructs there's conventions but they're all different or evolved and not uniform.
It does get a lot easier to deal with when you have VS Code tied into CRD definitions for example, but I'm not convinced config is the problem so much as just Kubernetes is all things to all people and as a result what any given configuration actually does is almost completely random and likely to hit a bunch of edge cases.
My problem with YAML is that even when you try to keep it simple it tries its best to trip you up with some obscure feature you never wanted to deal with
I agree that YAML has its issues. But saying it's bad because it has too many features seems misguided. Do those advanced features get in the way of using a simple YAML file? At least not in my experience.
Yes, the features in YAML absolutely get in the way. The "Norway problem" for example ("no" translates to "False" in versions prior to YAML 1.2), but that is just the one more people ran into. Here is a nice overview of some of the problems:
Could go the other way and only have strings a la CONL[0]. Which feels reasonable once I got over my initial shock. Your program is already going to enforce data types, why force that into the config language?
Personally I have embraced using sqlite as my configuration infrastructure, as the more I've used configuration formats, the more I've felt they are glorified databases.
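As a rough idea of what that can look like, here is a sketch using Python's built-in `sqlite3`. The single key/value table and the helper names are assumptions for illustration, not a description of the commenter's actual setup.

```python
import sqlite3


def open_config(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    return conn


def set_value(conn, key, value):
    # INSERT OR REPLACE acts as an upsert on the primary key.
    conn.execute("INSERT OR REPLACE INTO config (key, value) VALUES (?, ?)", (key, value))
    conn.commit()


def get_value(conn, key, default=None):
    row = conn.execute("SELECT value FROM config WHERE key = ?", (key,)).fetchone()
    return row[0] if row else default


conn = open_config()
set_value(conn, "mouse_speed", "5")
set_value(conn, "mouse_speed", "7")
print(get_value(conn, "mouse_speed"))
print(get_value(conn, "missing", "fallback"))
```

Values are stored as text here, so typing still lives in the application, same as with a strings-only config format.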
The reason they're called configuration files is because they're often meant to be edited directly by the user. Which is why there are so many configuration formats, since everyone has a different opinion on the ideal UI/UX.
You can store configuration in an actual database, but asking your users to use SQL to change it is sadistic. In that case you really need a UI to allow users to change configuration, which defeats the purpose of the simplicity of configuration files.
Just some opinions with no serious arguments. Okay XML is in the past but so what? There are still use cases for it. And Pkl/CUE/Dhall/Starlark looking like full-blown programming languages is a feature, but being deliberately weaker than say TypeScript is also a feature (no arbitrary I/O, in some cases no Turing completeness). I don’t think the author understands or appreciates why.
OpenBSD's daemons' configs are pleasant. But then you have to maintain a DSL for even small projects. Most people aren't going to want to implement and maintain a whole one-off config language for their small project. Which is why having a common, minimal option is nice.
I always found Django's approach smart. The configuration files are Python files.
That said, TOML is not unsexy.
I hate JSON as configuration format, it is an exchange format, for configuration TOML is clearly more pleasant. VS Code, Sublime, you are doing it wrong.
The author dismisses many entries, saying "it's a programming language, I want a config language, or I'd use Javascript". I completely disagree with the author, given my experience.
- You want bits of programmability in your config language. Else you end up with copy-pasting pieces and adding comments like "must be the same value as foo.bar below!". This sucks enough that you may end up with a config generator for some custom meta-config language, which you need to produce and maintain. You very certainly need symbolic constants, it's great to have simple arithmetics and string concatenation, and ideally you need a way to map a config fragment over a list.
- You don't want a Turing-complete language as your config language. You also don't want to allow any I/O or other effects in that language. You don't want to allow your config parser to do anything but to form a tree of keys and values in memory after reading the config. Javascript is right out. XML is also right out, because entities allow for external resource links, that is, uncontrolled I/O; this was a source of many exploits.
- Nevertheless, you want your configs to be modular when needed, allowing to graft into your main config imported values from some common templates, specific overrides, etc. Direct inclusion is a poor but serviceable approach.
- You want the values in your config be typed. They can be strings by default to save some typing. But you want your config parser to be able to complain when you write a string instead of a number, or a float instead of an integer, or an integer out of representable range (the bane of large numbers in JSON), or a date that cannot be parsed, or a null for a value that cannot be null. This saves you from a ton of nasty surprises when configs are updated.
- You want your config language to have explicit delimiters, so that incompletely copy-pasted constructs would be detected. Hence YAML is a bad choice, JSON5 is a much better choice.
- Of course, you absolutely want comments. JSON is a serialization format, not a human-operated config format.
This is why, I think, CUE, Dhall, and Jsonnet are doing more or less the right thing, but they are a tad large. Maybe something simpler and more compact exists, with some limited programmability and easy syntax.
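To make the typing bullet concrete, here is a sketch of the checks done by hand in Python on top of plain JSON. The `Config` fields, the range limit and the error messages are all hypothetical; languages like CUE or Dhall express the same constraints declaratively instead.

```python
import json
from dataclasses import dataclass
from datetime import date


@dataclass
class Config:
    name: str
    retries: int
    timeout_seconds: float
    start_date: date


def parse_config(text):
    """Validate types, range and date format; report every problem at once."""
    raw = json.loads(text)
    errors = []

    name = raw.get("name")
    if not isinstance(name, str):
        errors.append("name: expected a string")

    retries = raw.get("retries")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(retries, bool) or not isinstance(retries, int):
        errors.append("retries: expected an integer")
    elif not 0 <= retries < 2**31:
        errors.append("retries: out of range")

    timeout = raw.get("timeout_seconds")
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)):
        errors.append("timeout_seconds: expected a number")

    start = None
    try:
        start = date.fromisoformat(raw.get("start_date"))
    except (TypeError, ValueError):
        errors.append("start_date: expected an ISO date such as 2024-01-31")

    if errors:
        raise ValueError("; ".join(errors))
    return Config(name, retries, float(timeout), start)


good = parse_config(
    '{"name": "svc", "retries": 3, "timeout_seconds": 1.5, "start_date": "2024-01-31"}'
)
print(good)
```

A missing key is treated like a null here, so "a null for a value that cannot be null" falls out of the same type checks.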
For the reasons you mention (mainly the first one), it can sometimes be useful to use a separate programming language with functions that are used to create the required format, without needing the configuration format itself to do these things.
Can you do an infinite loop in that separate language? Throw an exception? Write to disk? Start processes? Access the Internet?
If not, you can use this language safely inside the config.
If yes, you could as well use the main language of your application, and not introduce an extra dependency.
There is, of course, always a temptation to take awk, or PowerShell, or Ruby, "because it's already installed on the system", which may not be the case on all platforms.
I did not mean that you have to introduce an extra dependency; the programming language used to make such a configuration can be the main language of the application. Although whoever needs to configure it can be someone other than whoever wrote it, and they can use whatever they want to use. The output can then be used as the configuration even on a different computer if it is not necessary to change the configuration but only use it on multiple computers. This will be appropriate for some uses but for other uses it will not be appropriate. Note also that different programs will have different requirements for what kind of configurations you will require.
The features you mention might be useful, but they're certainly not required.
If your program has grown so complex that users would benefit from modular configs, references, and so on, then a configuration file is the wrong approach for configuring it. You should rather provide a custom UI for users to modify the program's behavior, and persist the configuration transparently in the background for them. This can be done via a CLI, GUI, API, whatever.
The fact that tools like GitHub Actions, Ansible, Kubernetes, etc., abuse a format like YAML to make it work as a DSL, and we needed to invent tools like CUE, Dhall, and Jsonnet to manage this mess is absolute insanity. Most programs don't need such level of complexity. And for those, plain JSON, or minimal configuration formats like the one discussed here, are good enough.
> This can be done via a CLI, GUI, API, whatever.
So then I need to write a Terraform provider or whatever and a bunch of my own configuration to be able to declaratively configure the thing?
Shallow. Especially “The era of XML is in the past… now it's dead.”
That’s not an argument against the merits of XML, it’s just a fashion declaration.
It’s also wrong. Podcasting boils down to XML files. It even heavily uses XML’s extension mechanism. XML is the basis for RSS and Atom. XML is the basis for LibreOffice and modern Microsoft Office files. XML is at the core of the epub digital book standard. And on and on.
XML may not be ideal for config (reasonable people can disagree on the topic) but it’s not dead.
It’s also interesting that he declares JSON “won” then adds a bunch of XML features to it like ordered entries and not having to quote element/attribute names.
JSON “won” for web apis - e.g. browser-server data interchange. It is not, and was never claimed to be, good for serial documents like config, or for when you need an extensible format (the “X” in XML). It’s fine for what it is, and so is XML, which, when it “won”, was similarly overused, like for web apis.
I've often wondered if PostScript or PDF contained the roots of a very good config language. Perhaps it simply is (PDF docs at least) but nobody regards it as such.
My guess is the RPN nature would be a no-go for many people. Nevertheless: comments, dicts, arrays, good string syntax, numerics, binary data, etc. Maybe that makes it too complicated.
PostScript is not so bad as a programming language, and it has a binary format as well as a text format, although the limits of numbers and some other things can be too small for many applications. The syntax of TER is based on PostScript (although it is not actually compatible, for many reasons; for example, the syntax "1.0" does not denote a number in TER, but "+1.0" does).
The perfect configuration system would be able to support any number of parameters, enforce arbitrary validation rules, be searchable, provide metadata, examples, descriptions for each parameter, provide an audit log of who/why something was changed, support rollbacks, and so on.
The problem is that folks want a flat human readable file format to solve all these problems. That's a pipe dream.
JSON and CSV are pretty close to "syntactically optimal" in terms of losslessly storing nested and tabular data, respectively, in a human readable format. We ought to just stick with those and think about how to design effective configuration systems on top of them.
It totally is a pipe dream to find some magic flat human readable file that does all this, you need to think of it as a config system not a config file.
So then why aren't we questioning our core assumption here that the solution has to be the shape of a human readable file format at all and accepting "syntactically optimal" as the best we can come up with?
Other than that text files are a compatible format to work with for the tools that coders use to interact with everything (text editor/git) why does it need to be a text file at all? or even human readable? Should we even call csv human readable... ?
The perfect config system you describe could equally be implemented on top of any kind of storage (file, db, whatever) with a configuration editor ui to drive it all. And thanks to llm's even for totally one-off custom configuration files making a small web editor for them is basically free if you can give it the schema and some examples.
And ui doesn't mean it can't be keyboard driven (just to preempt people griping about using a mouse). But it does mean that you can present your config in the way that makes the most sense to actually do the configuration instead of having to work around any limitations of a syntax. An example is the configs out there with the trailing comment "foo bar #foo bar sets the the foo bars the bar in the blaze." Maybe you've configured foo 100 times, you don't need the comment to know what foo sets but the syntax demands it so you have to see it every time. In a dedicated visual config editor the comment can be pulled up only when needed.
Will your config "file" end up with some extra cruft to store things about how to show it properly in the ui? totally. but we don't care because we don't actually edit that "file" directly anymore remember?
I do think human-readability is a useful feature, to some extent. Human operators should be able to sanity-check individual components of the system for correctness, and this is so much easier if intermediate data formats are "human readable" (which I would extend as far as .sqlite, assuming a straightforward schema design).
Yeah I’m not saying you shouldn’t have a file that is possible to sanity check if that’s important to your system, but I would say (contrary to op and many comments) that we shouldn’t design around that as the core feature assumption. We don’t have trouble sanity checking our normal db tables so I’d imagine that any format that has a reliable way to see the loaded-in-memory representation would meet the same level of sanity checking.
CSV has too many features. I recently found out the hard way that line breaks are allowed inside values, so you can't actually read CSV line-by-line.
Or maybe not enough features, because there's no other way to define a linebreak. \n isn't a thing.
And also, some editors are picky about the number of commas on each line. Excel and Sheets don't care if you have uneven number of commas, but IntelliJ's CSV viewer I think just looks at the first line to determine how many columns there ought to be.
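The line-break issue is easy to reproduce. Here is a sketch in Python comparing a naive line-by-line split with the standard `csv` module on a made-up two-record file that has a quoted newline in one value.

```python
import csv
import io

# Two data records; the first has a line break inside a quoted value.
data = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: treat every physical line as a record.
naive_rows = [line.split(",") for line in data.splitlines()]

# csv.reader tracks quoting, so the embedded newline stays inside the field.
rows = list(csv.reader(io.StringIO(data)))

print(len(naive_rows))
print(len(rows))
print(rows[1])
```

The naive split sees four "rows" (header included) where the real parser sees three, and the broken rows also have the wrong number of columns.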
The author seems to be using "markup language" as a concept basically synonymous with a configuration language, or something that is not a programming language. A markup language is a language used to "mark up" text with formatting and structure. This may sound like a terminology nitpick, but I would argue the reason why for example XML is not a great configuration language is that it was designed to be something else – a markup language.
XML is a fantastic configuration language. Its primary purpose is text markup, right, but it can really shine as a configuration language too. It is meant to be used as a language, that is to describe structures of unbounded complexity while keeping them syntactically constrained.
The best configuration language is the one that is custom made for a specific application. A domain-specific language. But DSLs requires a parser and adding one is usually too cumbersome to merely parse a configuration. This is where XML comes in.
I think the author is probably aligning with "is" rather than "ought". So a language made this list if it IS being used as a configuration format, regardless of whether it OUGHT to be used as one.
Which is a way of deciding that makes sense given that I think the purpose of this article is "use my language instead". Getting lost in the weeds about each language's original design intent would bloat the article without meaningfully contributing to their thesis.
Given that it's an essential/definitional issue, I would have preferred that the author at least showed awareness of the differences between languages intended for marking up text and languages intended to represent data in a structured way.
Back when XML was first being developed, I was really anticipating having a standardized, easier-to-implement successor to SGML (which was hampered by its complexity and by the cost of the ISO standard) in the text markup space. It was disappointing that it ended up filling the vacuum in the space of serialized representations for structured data, then getting rejected when it wasn't quite as suitable for that as alternatives such as JSON.
Me too, I still think it's cool. But while indeed easier-to-implement compared to SGML, it could have been yet a bit simpler. :) I once attempted to write a conforming parser, I remember it being a _lot_ of work to even determine well-formedness.
Ah, I was perhaps a bit unclear – I didn't mean to comment on whether certain languages made the list or not, I enjoyed the thorough walkthrough of languages being used for configuration.
I was more commenting on comments such as this one under Pkl:
> This is not a markup language. This is a full-blown programming language.
And under Nickel:
> Nice programming language. Not a markup language.
It's like he's saying "we should use markup languages for configuration, not programming languages", which I don't think he means.
I don't get why so many people hate XML. I'd rather use it than YAML.
I’ve often wondered the same. The op doesn’t go far beyond:
> The era of XML is in the past.
Which is about as deep as it seems to get. My suspicion is that the reason people don’t like it is because it’s a bit of a pain to work with in JavaScript
XML has a DOM API. And most people use JavaScript in a browser to manipulate the DOM. So I would say that, among all the languages, it should be the JavaScript devs that are the most proficient at the DOM API.
Once you have learned the DOM API, it’s pretty much universal. Things like Document.getElementsByTagName not only work in JavaScript but also in Python: https://docs.python.org/3/library/xml.dom.html#xml.dom.Docum...
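To illustrate the point about the DOM API carrying over, here is a minimal sketch using Python's standard xml.dom.minidom. The config document and its element and attribute names are invented for the example; only getElementsByTagName and getAttribute are the DOM calls under discussion.

```python
from xml.dom.minidom import parseString

# Hypothetical config document, invented for this example.
CONFIG_XML = """<config>
  <server name="web" port="8080"/>
  <server name="db" port="5432"/>
</config>"""

doc = parseString(CONFIG_XML)

# Same method name a browser exposes as document.getElementsByTagName("server").
servers = doc.getElementsByTagName("server")

# Attribute values come back as strings, so convert the port explicitly.
ports = {s.getAttribute("name"): int(s.getAttribute("port")) for s in servers}
print(ports)
```

The traversal code would look almost the same in browser JavaScript, which is the portability being claimed.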
More generally XML is so thoroughly out of fashion that it is perceived as driving resume-driven architecture backwards in time, into an unfamiliar "dark age" of deliberate engineering decisions.
In the world of human-readable data formats (ie not programming languages), the best one I ever used was Jane Street's sexplib[1] s-expression format.
It was concise and expressive. There was a direct way to describe variants (types with multiple constructors), which is always awkward in JSON, but the format was still surprisingly low-noise for reading and editing by hand. I remember you could even use it as a lightweight markup format:
(The format leaves the interpretation of things like (em ...) up to you; you could use it as a slightly more verbose Markdown, but you could also use it to structure readable text with other sorts of metadata instead.)
And, unlike certain other formats I won't name, it has comments!
It also helps that Emacs with Paredit makes editing s-expressions flow. The tool doesn't need to know anything about the sexplib format specifically; just relying on basic s-expression structure gives us fluid but simple structural editing.
I am constantly sad that nobody else uses this sort of format, and I have to deal with a mixture of JSON, YAML, TOML and other ad-hoc formats instead.
[1]: https://github.com/janestreet/sexplib
Agree, sexp is quite nice. That was my favorite before json came around. Not that I like json particularly, it just ate the world so it's easier to go along with it.
No mention of The Configuration Complexity Clock [1] which I always think deserves a link, but credit to the author for actually keeping it slim and readable over (IMHO) most of the newer additions to the landscape which either declare themselves as 'obvious' and aren't or just add features such as Pkl's extends keyword, moving us further round the complexity clock.
Of course just adding multiline strings is the start of the rabbit hole, now you need to think about leading line breaks, trailing line breaks, intermediate line breaks, whitespace chomping, and- oh heavens I've reinvented YAML, I think I need to lie down.
[1] https://mikehadlow.blogspot.com/2012/05/configuration-comple...
Nice article. I've definitely moved the clock to 6.
And I remember toying with the idea of 9 but I think for a different purpose, I don't remember what. Never did get that far though, I'm acutely aware that tooling is also important, but it's still oh-so-tempting.
I wonder where the company is at now. I don't think they would have moved the clock ahead but they may have moved it backwards to 12.
I don't regret my little rules engine though. It held up very well. It was intended for non-devs to be able to use, but the article is quite right, a complex set up was not easy to understand! I often got questions when things looked wrong but after much thinking the rules were usually calculating things correctly.
The alternative would have been 100 hardcodings because each client needed something different.
One kinda-exception I'd like to raise: Cases where you'd like to use the regular proper programming language from the very beginning, but there are trust issues, and there's no good/reliable sandboxing option.
For example, B2B stuff where every customer has their own idiosyncratic sets of rules for if-this-then-that, which change at a different cadence than your releases.
In those cases, it's less that configuration slowly becomes too complex and evolves into code, and more that code is wanted from the get-go but configuration is the compromise.
> every customer has their own idiosyncratic sets of rules for if-this-then-that
So Varnish Configuration Language? It's definitely an awkward case that doesn't seem to fit neatly into the landscape.
Thanks for this - it expresses something I've felt for a while. Solve too hard and you end up back at midnight.
I'm sure there's no relation to The Doomsday Clock.
Regarding key order in JSON:
- in the spec, key order clearly exists at the syntactic level
- it's just that the spec says nothing about the semantics of JSON (besides "A conforming processor of JSON texts should not accept any inputs that are not conforming JSON texts.")
quote: "The JSON syntax [...] does not assign any significance to the ordering of name/value pairs"
As a result, the operation of "parsing then serializing again" is not even guaranteed to be idempotent across different environments.
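A small sketch of how this plays out in one environment, assuming Python's standard json module (other parsers may make different choices, which is exactly the problem). The input text is invented and deliberately repeats a name.

```python
import json

# Invented input: note the repeated name "b" and the non-alphabetical order.
text = '{"b": 1, "a": 2, "b": 3}'

# Loading into a dict keeps one value per name (the last one seen here).
as_dict = json.loads(text)

# The syntactic sequence of name/value pairs is still available on request.
as_pairs = json.loads(text, object_pairs_hook=list)

# Two equally legitimate re-serializations of the same parsed value.
round_trip = json.dumps(as_dict)
sorted_trip = json.dumps(as_dict, sort_keys=True)

print(as_pairs)
print(round_trip)
print(sorted_trip)
```

Neither output matches the input text, and the two outputs differ from each other, so any tool that cares about order or duplicates has to pin down behaviour the JSON syntax leaves open.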
FWIW, JSON with `#` comments and redundant comma before `}` and `]` is already supported by many programming languages, e.g., Perl's JSON module (the 'relaxed' option). The advantage is that correct JSON also works, i.e., that JSON module just makes it less error prone for humans, but otherwise sticks to the existing format.
> This is why I based my MAML on top of JSON. I fixed things that were a little bit annoying for me inside JSON.
And now it's incompatible with JSON, which is the worst part of the idea.
Instead, maybe propose a patch to your favorite programming language's JSON module to accept the same extensions for humans that the Perl module already supports, and maybe someone uses it for their JSON config files. This moves us together instead of further apart.
I don't like this article all that much but I do like the MAML format the author wrote this to promote. It seems like it’s everything that’s good about JSON, plus everything fixed for human writing, and nothing more.
Hats off, wish my package.json and build.yml were package.maml and build.maml.
MAML improves on JSON's syntax, but it does not improve the data types much (its one improvement there is that it distinguishes integers from floating point).
Dismissing full-blown programming languages is a mistake IMHO. Sometimes it's useful to generate configuration (e.g. starting from a template and to build configuration for prod vs QA or for various deployment targets).
If you have a full-blown programming language you can sprinkle a little bit of code in the template and express everything you need in a clean way. If all you have is a templating language you have to shoehorn your logic into it (likely making a mess) or use the templating language + a "real" programming language.
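As a rough sketch of what "a little bit of code in the template" can look like, here is a Python version. The function name build_config, the environments, and the host naming scheme are all assumptions made up for the example.

```python
import json

def build_config(env: str, replicas: int) -> dict:
    """Build the configuration for one deployment target (hypothetical layout)."""
    return {
        "environment": env,
        # A conditional that a plain template would have to encode awkwardly.
        "log_level": "debug" if env == "qa" else "info",
        # A computed list: one backend entry per replica.
        "backends": [
            f"app-{i}.{env}.example.internal" for i in range(1, replicas + 1)
        ],
    }

# One source of truth, several generated outputs.
configs = {env: build_config(env, n) for env, n in [("qa", 1), ("prod", 3)]}
print(json.dumps(configs["prod"], indent=2))
```

The generated JSON is what the application would actually read; the logic stays in the generator.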
The ideal setup is something like the safe interpreters in Tcl, which let you do very fun things like having config files in pure Tcl, but with limited access to the machine and master interpreter.
Depending on the kind of configuration, different formats might be helpful.
I made up TER, which supports all ASN.1 types (although in some cases there is not a built-in syntax for them and you must write them in terms of other types; however, extensions can be made for your specific application) and is not limited to Unicode, nor does it limit integers to 64-bits (and you can write integers in any base up to thirty-six). You can convert TER to DER (I wrote a program to do this; it is not really intended for an application to use the full specification of TER itself except for using an external program to convert to DER and then use that instead), and you might also make a subset of TER for specific applications.
But, like any other one, it also has advantages and disadvantages (including some that are mentioned in that article; if they look at TER they would probably have criticisms about that too).
There are two kinds of criticism of JSON, having to do with the data types, and having to do with the syntax, and I had mentioned some of these in the past (some of them are commonly mentioned by others too, but some are less commonly mentioned by others).
Another configuration format is the X resource manager format.
I wonder if "the right solution" is a programming language that is fast, concise, trivially easy to run, and outputs some efficient binary format like protobuf.
Programming languages have comments and control flow, multiple popular implementations, and can have nice literals. Lack of Turing completeness is actually not a terribly useful feature if you trust the input (and you should probably just use protobufs or similar for untrusted inputs in that case.)
> Programming languages have . . . control flow
I do not want control flow of any kind in my configuration file. Nor do I want expressions or any kind of evaluation.
Greppability is a must-have feature for me. As is simplicity - I don’t want to have to deal with internalizing interpreter mechanics, rules for precedence, variable scope, etc just to figure out what config values my program is going to be provided with.
Any time I’ve been forced to work with a system which used a general or restricted programming language to express configuration, it’s been a nightmare.
If your config language doesn't have its own control flow then it's going to get a meta layer added on with control flow. Like, what do you do if this service has a different hostname in staging versus prod? Or connects to a different DB, or whatever. Either have one template file and give it a values file when you go to deploy it, or have two fully-written out files... which becomes annoying when you have to make sure you keep them in sync across two or more environments, often leading to a hacked on and poorly implemented templating system anyway (eg, Helm)
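The "one template file plus a values file" option can be sketched with nothing but Python's string.Template. The keys, hostnames, and the render_for helper are hypothetical; in practice the template and the values would live in separate files.

```python
from string import Template

# Hypothetical template: one file shared by all environments.
CONFIG_TEMPLATE = Template(
    "hostname = $hostname\n"
    "db_url = postgres://$db_host/orders\n"
)

# Hypothetical values: the only part that differs between environments.
VALUES = {
    "staging": {
        "hostname": "svc.staging.example.internal",
        "db_host": "db.staging.example.internal",
    },
    "prod": {
        "hostname": "svc.prod.example.internal",
        "db_host": "db.prod.example.internal",
    },
}

def render_for(env: str) -> str:
    # substitute() raises KeyError if a placeholder has no value,
    # so a forgotten setting fails at render time, not at runtime.
    return CONFIG_TEMPLATE.substitute(VALUES[env])

print(render_for("staging"))
```

This is already a meta layer on top of the config format, which is the commenter's point: the logic ends up somewhere, whether or not the format itself allows it.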
There is also .ini, it's simpler than toml, though has more inconsistent implementations and formats. But the '[[table]]' is just '[table]' and I like the '[table.subtable]' syntax better.
But since we're criticizing formats ;-) maml https://maml.dev is fairly good.
* I like multiline blocks and comment style
* I do not like optional commas, either require them or not. I'd lean toward requiring them.
* Not sure about ordered object keys. How should languages represent that, a map/dict kind of a data structure won't work it would have to be an ordered dict/list of kvs etc. Two objects {"a":"x", "b":"y"} and {"b":"y", "a":"x"} will be different then. I get the idea, but I am leaning toward not liking it by the time I finished typing all that.
* I like booleans and null. Good for not having ons and offs those are just annoying.
About the ordered keys, most good languages have an ordered map type, either by accident or by design. Those few languages that don’t would need a list of key-value pairs. Given that this is for config, the performance cost of linear-scanning such a list would be negligible in realistic use cases.
Personally I think it’s a very good decision. Simply by declaring that objects are ordered, it becomes the case and it unlocks a whole lot of useful mechanisms.
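A quick Python sketch of the trade-off discussed here: the same two objects compare equal as plain dicts and unequal once order is treated as significant. The lookup helper is a hypothetical stand-in for the "list of key-value pairs" fallback.

```python
import json
from collections import OrderedDict

a = json.loads('{"a": "x", "b": "y"}')
b = json.loads('{"b": "y", "a": "x"}')

# Plain dict equality ignores insertion order.
plain_equal = a == b

# OrderedDict equality is order-sensitive, which is what
# "objects are ordered" would mean semantically.
ordered_equal = OrderedDict(a) == OrderedDict(b)

def lookup(pairs, key):
    """Linear scan over (key, value) pairs, for languages without an ordered map."""
    for k, v in pairs:
        if k == key:
            return v
    raise KeyError(key)

pairs = list(b.items())
print(plain_equal, ordered_equal, lookup(pairs, "a"))
```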
I've actually been getting a lot of mileage out of textproto: https://protobuf.dev/reference/protobuf/textformat-spec/
Well typed, simple syntax. Maps are annoying though.
I agree on this in general. The awkward things here are that:
a) textprotos aren't really touted by the protobuf folks as a thing to use outside Google. I'm not 100% sure why this is.
b) inside Google, there's a perception that you shouldn't use textprotos for much other than hardcoding proto values inside the monorepo (where there aren't really schema-versioning concerns). I think this perception is misplaced, you just have to be aware that a given schema is used in textprotos. Which is usually an easy thing to be aware of. This is just because the schema-versioning concerns are different than with binary protos (e.g. field renames are now breaking).
c) IIRC most parsers unconditionally reject unknown fields. I think the reason for this is highlighted in the docs: you can't safely go from a textproto with an unknown field to a binary serialisation of that proto. IIRC there are some parsers that let you parse unknown fields anyway but then I think you're a bit more tied to a specific implementation than you'd like...
One strange but quite handy alternative is actually the JSON representation. You can use .proto files as a schema but then serialise the values to JSON as there's a canonical mapping. Then you get something that's human readable but with the type safety of protos. Although of course it's not really writable since... It's JSON.
What is the name of the style found in (e.g.) ISC BIND9:
* https://www.zytrax.com/books/dns/ch6/#master
* https://bind9.readthedocs.io/en/latest/manpages.html#named-c...
ISC-style? (E/A)BNF-ish?
Personally I like (a) the open/close braces for better stanza navigation, (b) all statements/lines ending with semicolon.
(Something similar was used in (now-EOL) ISC DHCPd, and they've moved to "extended JSON" for their new DHCP server, Kea.)
I've always had a weird fondness for the apache configuration format.
It's a nice mix of syntax that looks like what it does and general support.
It's probably stockholm syndrome.
I like json
I kind of wish it had two and only two features: defined ordering, and comments
As to languages for configuration (specifically), I wish there was a good specification for a program config, where you would have one well-defined configuration file for defaults, and a user-specified configuration file for the values you have set/overridden.
say default.config:
and my.config: in some way that upgrading the program could do stuff with default.config without destroying your existing config.
Why do you and the author want 'defined ordering'? I don't think I've wanted that in a config language.
I rely on it a tiny bit in PHP. In JS, if order matters, I just use an array, and it's really never been a big deal.
Maps/dicts should be unordered IMO.
That's a pretty common config pattern. SublimeText does that, and when you edit the settings it opens a 2 paned editor with the default config on the left and user config on the right.
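The defaults-plus-user-overrides pattern can be sketched with Python's standard configparser. The section name, keys, and values are invented; in a real program the two strings would be the shipped default.config and the user's my.config.

```python
import configparser

# Hypothetical contents of default.config, replaced freely on upgrade.
DEFAULT_CONFIG = """
[editor]
font_size = 12
theme = light
tab_width = 4
"""

# Hypothetical contents of my.config: only what the user changed.
USER_CONFIG = """
[editor]
theme = dark
"""

parser = configparser.ConfigParser()
parser.read_string(DEFAULT_CONFIG)  # load defaults first
parser.read_string(USER_CONFIG)     # later reads override earlier values

settings = dict(parser["editor"])
print(settings)
```

Because the user file only holds overrides, an upgrade can add or change defaults without touching anything the user wrote.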
> Well, this is some sort of programming language with dynamic types. But there are so many good programming languages, so I don't know why this one needs to be used.
RE: Jsonnet and others: because it has nice guarantees, like lack of arbitrary I/O and pure execution.
See: https://sre.google/workbook/configuration-specifics/#pitfall...
Jsonnet solves a different problem though.
Need to configure 5 services with hundreds of replicas in 7 data centers? Some values depend on the service, some on the data center and some on the combination thereof? Maybe also overrides for a bunch of problematic machines?
And you also want a manageable config language which doesn't turn into a full blown Turing tar pit?
Then jsonnet is for you.
So it's not entirely fair to compare it in the "pleasant syntax" contest. It's like putting a Unimog into a ranking of city cars.
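To make the layering problem concrete, here is a sketch in Python rather than Jsonnet. It only imitates the idea of stacking overrides; the layer names, services, data centers, and numbers are all made up.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where values in override win, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical layers, from most general to most specific.
DEFAULTS = {"replicas": 100, "limits": {"cpu": 2, "memory_gb": 4}}
PER_SERVICE = {"search": {"limits": {"memory_gb": 16}}}
PER_DATACENTER = {"dc-east": {"replicas": 300}}
PER_PAIR = {("search", "dc-east"): {"limits": {"cpu": 8}}}

def resolve(service: str, datacenter: str) -> dict:
    config = DEFAULTS
    layers = (
        PER_SERVICE.get(service, {}),
        PER_DATACENTER.get(datacenter, {}),
        PER_PAIR.get((service, datacenter), {}),
    )
    for layer in layers:
        config = deep_merge(config, layer)
    return config

print(resolve("search", "dc-east"))
```

Writing out every service and data center combination by hand in a flat format would mean dozens of near-identical files, which is the problem the commenter says Jsonnet targets.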
What about bash? https://wiki.archlinux.org/title/PKGBUILD
> Spoiler: I created my own configuration language maml.dev
Of course you did! :-)
If you want to use JSON as a configuration language, why keep the top level braces and indentation?
I didn't see any indication in the article, or a glance at the spec, that indentation is significant.
Very interesting (and opinionated) overview of the existing configuration languages.
At the very bottom, the author presents their own language (maml.dev). It is yet another JSON with comments, multiline strings, and optional commas. It's the most boring part of the page.
A good configuration syntax isn’t supposed to be exciting.
agreed, but then even some third-party project is too exciting.
My choice is "whatever is available in the standard library" - which used to be INI, JSON or YAML. But with python 3.11 there is also TOML, which is nice...
Interestingly, at least in this list, after JSON/YAML were published in 2001 there's a 10y+ gap until competitors started arriving. Of those, considering only markup languages (so no Dhall/Nickel/etc*), EDN/TOML/JSON5/HJSON were published in early 2010s and all the rest (~10) are published late 10s-now 20s. Author's MAML being close to first in this new wave.
*Honorable mention to Nix, which is also functionally a configuration language, released a decade before those. Feel like recent few years there has been an increase in remaking existent things slightly different and marketing them like new.
There's a direct correlation between how exciting and fun a configuration syntax looks and how much you'll despise it after a year of having to deal with the fallout of those 'exciting features'.
Speaking of which, Pkl looks absolutely fascinating, I might have to check that out...
No but you'd have to do a lot of work to convince me it's really any better than basic YAML (yaml without going nuts with anchors and indirection).
The Kubernetes ecosystem has made me hate YAML so much. Maybe I am just not using the tooling correctly but anchors all over the place, templating, etc.
I have yet to see a solution to that type of thing though as every config language exists on a spectrum of syntax to full blown programming language.
YAML isn't really the problem with Kubernetes though. The Kubernetes problem is that the config is complex and heavily typed, but also essentially ad hoc - outside some basic constructs there's conventions but they're all different or evolved and not uniform.
It does get a lot easier to deal with when you have VS Code tied into CRD definitions for example, but I'm not convinced config is the problem so much as just Kubernetes is all things to all people and as a result what any given configuration actually does is almost completely random and likely to hit a bunch of edge cases.
Personally, I like configuration languages that don't let you "go nuts" in the first place.
My problem with YAML is that even when you try to keep it simple it tries its best to trip you up with some obscure feature you never wanted to deal with
I agree that YAML has its issues. But saying it's bad because it has too many features seems misguided. Do those advanced features get in the way of using a simple YAML file? At least not in my experience.
That being said TOML would be my choice.
Yes, the features in YAML absolutely get in the way. The "Norway problem" for example ("no" translates to "False" in versions prior to YAML 1.2), but that is just the one more people ran into. Here is a nice overview of some of the problems:
https://ruudvanasseldonk.com/2023/01/11/the-yaml-document-fr...
I would argue that most, if not all, of those problems stem from too many features.
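The "Norway problem" can be sketched without a YAML library. The code below is not a YAML parser; it only imitates implicit boolean resolution for unquoted scalars, using the literals from my reading of the YAML 1.1 boolean type. Real parsers differ in which of these they honor, so treat the sets as an assumption.

```python
# Literals treated as booleans by the YAML 1.1 bool type (assumed list).
YAML_1_1_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML_1_1_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}

def resolve_plain_scalar(text: str):
    """Imitate implicit typing of an unquoted scalar (booleans only)."""
    if text in YAML_1_1_TRUE:
        return True
    if text in YAML_1_1_FALSE:
        return False
    return text

# A list of country codes, written without quotes as people tend to do.
countries = [resolve_plain_scalar(code) for code in ["se", "dk", "no"]]
print(countries)
```

The author wrote three strings and got two strings and a boolean, with no error anywhere. Quoting the value avoids it, but only if you already know the rule exists.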
Could go the other way and only have strings a la CONL[0]. Which feels reasonable once I got over my initial shock. Your program is already going to enforce data types, why force that into the config language?
[0] https://cirw.in/blog/conl
Personally I have embraced using sqlite as my configuration infrastructure, as the more I've used configuration formats, the more I've felt they are glorified databases.
Glorified? They're just text files.
The reason they're called configuration files is because they're often meant to be edited directly by the user. Which is why there are so many configuration formats, since everyone has a different opinion on the ideal UI/UX.
You can store configuration in an actual database, but asking your users to use SQL to change it is sadistic. In that case you really need a UI to allow users to change configuration, which defeats the purpose of the simplicity of configuration files.
Yes, but imo in various ways they serve similar-ish purposes to databases.
Hence why I prefer something like sqlite.
Text configs are good for wildcard configurations such as a manifest or Cargo.toml.
But for a fixed and immutable configuration with multiple data types I think databases are better.
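A minimal sketch of what "sqlite as configuration" with typed values might look like, using Python's standard sqlite3. The table name, columns, and constraints are assumptions for illustration, not the commenter's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real program would open a file

# Hypothetical single-row config table; CHECK constraints act as the schema.
conn.execute("""
    CREATE TABLE app_config (
        id       INTEGER PRIMARY KEY CHECK (id = 1),
        hostname TEXT    NOT NULL,
        port     INTEGER NOT NULL CHECK (port BETWEEN 1 AND 65535),
        verbose  INTEGER NOT NULL CHECK (verbose IN (0, 1))
    )
""")
conn.execute("INSERT INTO app_config VALUES (1, 'localhost', 8080, 0)")

# An out-of-range value is rejected by the database itself.
try:
    conn.execute("UPDATE app_config SET port = 70000")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row = conn.execute("SELECT hostname, port, verbose FROM app_config").fetchone()
print(row, rejected)
```

The validation lives in the storage layer, but as the reply above notes, editing it requires SQL or a purpose-built UI rather than a text editor.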
Middle Aged Man Language - too many braces.
So, yaml with squiggles. Roger roger.
JSON sans commas, but with extra strings.
A configuration language for what? What kind of configurations?
Just some opinions with no serious arguments. Okay XML is in the past but so what? There are still use cases for it. And Pkl/CUE/Dhall/Starlark looking like full-blown programming languages is a feature, but being deliberately weaker than say TypeScript is also a feature (no arbitrary I/O, in some cases no Turing completeness). I don’t think the author understands or appreciates why.
To be honest, all I see is https://xkcd.com/927/
All config formats are bad. You either don't need all the features at all, except key:value. Or you quickly run into weird limitations and quirks.
What I rather like instead are custom-built, English-like DSLs, like:
- https://man.openbsd.org/pf.conf
- https://man.openbsd.org/smtpd.conf
- https://man.openbsd.org/httpd.conf
- … and many more
OpenBSD's daemons' configs are pleasant. But then you have to maintain a DSL for even small projects. Most people aren't going to want to implement and maintain a whole one-off config language for their small project. Which is why having a common, minimal option is nice.
I always found Django's approach smart. The configuration files are Python files.
That said, TOML is not unsexy.
I hate JSON as configuration format, it is an exchange format, for configuration TOML is clearly more pleasant. VS Code, Sublime, you are doing it wrong.
The author dismisses many entries, saying "it's a programming language, I want a config language, or I'd use Javascript". I completely disagree with the author, given my experience.
- You want bits of programmability in your config language. Else you end up with copy-pasting pieces and adding comments like "must be the same value as foo.bar below!". This sucks enough that you may end up with a config generator for some custom meta-config language, which you need to produce and maintain. You very certainly need symbolic constants, it's great to have simple arithmetic and string concatenation, and ideally you need a way to map a config fragment over a list.
- You don't want a Turing-complete language as your config language. You also don't want to allow any I/O or other effects in that language. You don't want to allow your config parser to do anything but to form a tree of keys and values in memory after reading the config. Javascript is right out. XML is also right out, because entities allow for external resource links, that is, uncontrolled I/O; this was a source of many exploits.
- Nevertheless, you want your configs to be modular when needed, allowing to graft into your main config imported values from some common templates, specific overrides, etc. Direct inclusion is a poor but serviceable approach.
- You want the values in your config be typed. They can be strings by default to save some typing. But you want your config parser to be able to complain when you write a string instead of a number, or a float instead of an integer, or an integer out of representable range (the bane of large numbers in JSON), or a date that cannot be parsed, or a null for a value that cannot be null. This saves you from a ton of nasty surprises when configs are updated.
- You want your config language to have explicit delimiters, so that incompletely copy-pasted constructs would be detected. Hence YAML is a bad choice, JSON5 is a much better choice.
- Of course, you absolutely want comments. JSON is a serialization format, not a human-operated config format.
This is why, I think, CUE, Dhall, and Jsonnet are doing more or less the right thing, but they are a tad large. Maybe something simpler and more compact exists, with some limited programmability and easy syntax.
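The typed-values point in the list above can be sketched as a validation pass over parsed JSON. This is an illustration in Python, not how CUE, Dhall, or Jsonnet work; the schema, field names, and the 64-bit limit are assumptions.

```python
import json
from datetime import date

# Hypothetical schema: field name -> expected type.
SCHEMA = {"name": str, "port": int, "timeout_s": float, "launch": date}
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def load_typed(text: str) -> dict:
    """Parse JSON and complain about wrong types, nulls, and bad ranges."""
    raw = json.loads(text)
    config = {}
    for field, expected in SCHEMA.items():
        value = raw.get(field)
        if value is None:
            raise ValueError(f"{field}: missing or null")
        if expected is date:
            if not isinstance(value, str):
                raise ValueError(f"{field}: expected an ISO date string")
            value = date.fromisoformat(value)  # ValueError if unparseable
        elif expected is int:
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"{field}: expected an integer")
            if not INT64_MIN <= value <= INT64_MAX:
                raise ValueError(f"{field}: integer out of 64-bit range")
        elif expected is float:
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{field}: expected a number")
            value = float(value)
        elif not isinstance(value, expected):
            raise ValueError(f"{field}: expected {expected.__name__}")
        config[field] = value
    return config

good = load_typed(
    '{"name": "api", "port": 8080, "timeout_s": 2.5, "launch": "2024-01-31"}'
)
print(good)
```

A config language with built-in types would do this at parse time; with plain JSON the application has to carry a pass like this itself.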
For the reasons you mention (mainly the first one), it can sometimes be useful to use a separate programming language with functions that are used to create the required format, without needing the configuration format itself to do these things.
Can you do an infinite loop in that separate language? Throw an exception? Write to disk? Start processes? Access the Internet?
If not, you can use this language safely inside the config.
If yes, you could as well use the main language of your application, and not introduce an extra dependency.
There is, of course, always a temptation to take awk, or PowerShell, or Ruby, "because it's already installed on the system", which may not be the case on all platforms.
I did not mean that you have to introduce an extra dependency; the programming language used to make such a configuration can be the main language of the application. Although whoever needs to configure it can be someone other than whoever wrote it, and they can use whatever they want to use. The output can then be used as the configuration even on a different computer if it is not necessary to change the configuration but only use it on multiple computers. This will be appropriate for some uses but for other uses it will not be appropriate. Note also that different programs will have different requirements for what kind of configurations you will require.
The features you mention might be useful, but they're certainly not required.
If your program has grown so complex that users would benefit from modular configs, references, and so on, then a configuration file is the wrong approach for configuring it. You should rather provide a custom UI for users to modify the program's behavior, and persist the configuration transparently in the background for them. This can be done via a CLI, GUI, API, whatever.
The fact that tools like GitHub Actions, Ansible, Kubernetes, etc., abuse a format like YAML to make it work as a DSL, and we needed to invent tools like CUE, Dhall, and Jsonnet to manage this mess is absolute insanity. Most programs don't need such level of complexity. And for those, plain JSON, or minimal configuration formats like the one discussed here, are good enough.
> This can be done via a CLI, GUI, API, whatever.
So then I need to write a Terraform provider or whatever and a bunch of my own configuration to be able to declaratively configure the thing?
Shallow. Especially “The era of XML is in the past… now it's dead.”
That’s not an argument against the merits of XML, it’s just a fashion declaration.
It’s also wrong. Podcasting boils down to XML files. It even heavily uses XML’s extension mechanism. XML is the basis for RSS and Atom. XML is the basis for LibreOffice and modern Microsoft Office files. XML is at the core of the epub digital book standard. And on and on.
XML may not be ideal for config (reasonable people can disagree on the topic) but it’s not dead.
It’s also interesting that he declares JSON “won” then adds a bunch of XML features to it like ordered entries and not having to quote element/attribute names.
JSON “won” for web apis - e.g. browser-server data interchange. It is not, and was never claimed to be, good for serial documents like config, or for when you need an extensible format (the “X” in XML). It’s fine for what it is, and so is XML, which, when it “won”, was similarly overused, like for web apis.
Are those examples using XML because it was the style at the time though?
I've often wondered if PostScript or PDF contained the roots of a very good config language. Perhaps it simply is (PDF docs at least) but nobody regards it as such.
My guess is the RPN nature would be a no-go for many people. Nevertheless: comments, dicts, arrays, good string syntax, numerics, binary data, etc. Maybe that makes it too complicated.
PostScript is not so bad as a programming language, and it has a binary format as well as a text format, although the limits of numbers and some other things can be too small for many applications. The syntax of TER is based on PostScript (although it is not actually compatible, for many reasons; for example, the syntax "1.0" does not denote a number in TER, but "+1.0" does).
The perfect configuration system would be able to support any number of parameters, enforce arbitrary validation rules, be searchable, provide metadata, examples, descriptions for each parameter, provide an audit log of who/why something was changed, support rollbacks, and so on.
The problem is that folks want a flat human readable file format to solve all these problems. That's a pipe dream.
JSON and CSV are pretty close to "syntactically optimal" in terms of losslessly storing nested and tabular data, respectively, in a human readable format. We ought to just stick with those and think about how to design effective configuration systems on top of them.
Yes this ^
It totally is a pipe dream to find some magic flat human readable file that does all this, you need to think of it as a config system not a config file.
So then why aren't we questioning our core assumption here that the solution has to be the shape of a human readable file format at all and accepting "syntactically optimal" as the best we can come up with?
Other than that text files are a compatible format to work with for the tools that coders use to interact with everything (text editor/git) why does it need to be a text file at all? or even human readable? Should we even call csv human readable... ?
The perfect config system you describe could equally be implemented on top of any kind of storage (file, db, whatever) with a configuration editor ui to drive it all. And thanks to LLMs, even for totally one-off custom configuration files, making a small web editor for them is basically free if you can give it the schema and some examples.
And ui doesn't mean it can't be keyboard driven (just to preempt people griping about using a mouse). But it does mean that you can present your config in the way that makes the most sense to actually do the configuration instead of having to work around any limitations of a syntax. An example is the configs out there with the trailing comment "foo bar #foo bar sets the the foo bars the bar in the blaze." Maybe you've configured foo 100 times, you don't need the comment to know what foo sets but the syntax demands it so you have to see it every time. In a dedicated visual config editor the comment can be pulled up only when needed.
Will your config "file" end up with some extra cruft to store things about how to show it properly in the ui? totally. but we don't care because we don't actually edit that "file" directly anymore remember?
I do think human-readability is a useful feature, to some extent. Human operators should be able to sanity-check individual components of the system for correctness, and this is so much easier if intermediate data formats are "human readable" (which I would extend as far as .sqlite, assuming a straightforward schema design).
Yeah I’m not saying you shouldn’t have a file that is possible to sanity check if that’s important to your system, but I would say (contrary to op and many comments) that we shouldn’t design around that as the core feature assumption. We don’t have trouble sanity checking our normal db tables so I’d imagine that any format that has a reliable way to see the loaded-in-memory representation would meet the same level of sanity checking.
CSV has too many features. I recently found out the hard way that line breaks are allowed inside values, so you can't actually read CSV line-by-line.
Or maybe not enough features, because there's no other way to define a linebreak. \n isn't a thing.
And also, some editors are picky about the number of commas on each line. Excel and Sheets don't care if you have uneven number of commas, but IntelliJ's CSV viewer I think just looks at the first line to determine how many columns there ought to be.
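The line-break issue is easy to reproduce with Python's standard csv module. The data below is invented; the second record contains a quoted field with an embedded newline.

```python
import csv
import io

# Invented data: the second record's comment spans two physical lines.
DATA = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# Naive approach: treat every physical line as a record.
naive_rows = DATA.splitlines()

# A CSV-aware reader keeps the quoted line break inside the field.
parsed_rows = list(csv.reader(io.StringIO(DATA)))

print(len(naive_rows), len(parsed_rows))
print(parsed_rows[1])
```

The naive split sees four lines where there are three records, so any line-oriented tool (grep, wc -l, a per-line loop) will miscount or cut a record in half.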
I think this proposed format is terrible. Almost like JSON, but not quite, so both humans and LLMs will be confused by it, and forget the rules.
name-of-config-language
meh.