Xz format inadequate for long-term archiving:
Abstract
One of the challenges of digital preservation is the evaluation of data formats. It is important to choose well-designed data formats for long-term archiving. This article describes the reasons why the xz compressed data format is inadequate for long-term archiving and inadvisable for data sharing and for free software distribution. The relevant weaknesses and design errors in the xz format are analyzed and, where applicable, compared with the corresponding behavior of the bzip2, gzip and lzip formats. Key findings include: (1) safe interoperability among xz implementations is not guaranteed; (2) xz's extensibility is unreasonable and problematic; (3) xz is vulnerable to unprotected flags and length fields; (4) LZMA2 is unsafe and less efficient than the original LZMA; (5) xz includes useless features that increase the number of false positives for corruption; (6) xz shows inconsistent behavior with respect to trailing data; (7) error detection in xz is several times less accurate than in bzip2, gzip and lzip.
Disclosure statement: The author is also author of the lzip format.