Posted on Thu 21 March 2019
Let’s name a thing: dominant data formats.
A data format is a specification for how to turn information into a digital encoding that a computer can work with, and of course how to turn that encoding back into a human-readable form.
A dominant data format is one used so widely that any successor format needs tools for converting it. Those tools will stay maintained and universally available for many years after it stops being the most popular format, because so much data was stored in it that you are likely to run into it frequently.
A super-dominant data format is an extremely long-lived dominant format that becomes the foundation of multiple dominant data formats.
ASCII is super-dominant. Unicode, or at least UTF-8, is super-dominant.
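One way to see what "foundation" means here: UTF-8 was designed so that any pure-ASCII text is already valid UTF-8, byte for byte. The snippet below is only a small illustration of that layering; the sample strings and variable names are made up for the example.

```python
# Pure-ASCII text encodes to the same bytes under ASCII and UTF-8.
text = "Hello, world"
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# So a UTF-8 decoder reads old ASCII data without any conversion step.
decoded = ascii_bytes.decode("utf-8")

# Text outside ASCII uses bytes above 0x7F, which ASCII never assigns.
accented = "caf\u00e9".encode("utf-8")
beyond_ascii = any(b > 0x7F for b in accented)

print(same_bytes, decoded == text, beyond_ascii)
```

That backward compatibility is a big part of why ASCII data never needed a migration: the successor format simply absorbed it.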
RS-232C isn’t a storage format; it’s a data protocol. I think it’s a dominant data protocol, as are Ethernet and MIDI.
HTML is dominant.
CSV is dominant, even though it’s terrible.
For a long time, WordPerfect’s file format was dominant. Nobody uses it anymore, but tools to read and convert it are still close to hand.
XML and JSON are probably dominant, but it’s possible that they will be replaced by some other cross-program interchange format in the blink of an eye (relatively speaking).
WAV is dominant. MP3 is probably dominant.
ZIP is super-dominant. Tar and gzip are dominant – are they super-dominant?
EPUB is currently dominant; it’s built on top of UTF-8, HTML, and ZIP.
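The layering is easy to poke at with nothing but a standard library. The sketch below builds a toy, deliberately incomplete EPUB-like archive in memory and then peels the layers back off: ZIP on the outside, (X)HTML inside, UTF-8 underneath. The file name `chapter1.xhtml` and both helper functions are made up for illustration; a real EPUB also needs a `META-INF/container.xml` and a package document, which are left out here.

```python
import io
import zipfile


def build_toy_epub() -> bytes:
    """Build a minimal EPUB-like ZIP in memory (not a valid, complete EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # EPUB expects the first entry to be an uncompressed "mimetype" file.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        # Hypothetical chapter: HTML markup, encoded as UTF-8.
        html = "<html><body><p>Caf\u00e9</p></body></html>"
        zf.writestr("chapter1.xhtml", html.encode("utf-8"),
                    compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def read_layers(data: bytes):
    """Open the archive with a generic ZIP reader and decode its contents."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        mimetype = zf.read("mimetype").decode("ascii")
        chapter = zf.read("chapter1.xhtml").decode("utf-8")
    return names, mimetype, chapter


names, mimetype, chapter = read_layers(build_toy_epub())
print(names)
print(mimetype)
print(chapter)
```

Nothing in `read_layers` knows about EPUB at all. Generic ZIP and UTF-8 tooling is enough to get at the content, which is exactly the property that makes the underlying formats hard to displace.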
What’s a non-dominant data format, then? It’s anything which is:
- only used internally, or
- only used by one producer of software, or
- only used as an intermediate format, or
- not standardized enough to be re-implemented.