the final archive format (in rust)

bitbottle

Bitbottle: the final archive format.

Bitbottle is a data & file format for archiving collections of files & folders, like "tar", "zip", and "winrar". Its primary differentiating features are:

  • All important posix attributes are preserved (owner & group by name, permissions, create/modify timestamps).
  • File contents are stored as a database of de-duplicated chunks using buzhash, similar to common backup utilities.
  • The format is streaming-friendly for readers: Metadata and content lists appear before file contents, to allow a subset of the files to be extracted with minimal buffering and no seeking.
  • Compression may occur per-file or over the whole archive, using snappy (very fast) or LZMA2 (very compact).
  • Encryption is built-in: AES-128-GCM or XCHACHA20-POLY1305(*), with Ed25519 or an argon2id password for authentication.
  • The container format (bottle) is easily extensible for future algorithms.
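To see why putting metadata ahead of contents helps a streaming reader, here is a minimal sketch using a made-up layout that is not bitbottle's real format: a file count, a header of (name, size) entries, then all file contents concatenated in header order. `write_archive` and `extract` are hypothetical helpers. Because every size is known before any contents arrive, `extract` can keep only the files it wants and discard the rest with plain sequential reads, never seeking.

```rust
use std::io::{self, Read, Write};

/// One header entry in this hypothetical layout: a file name and its content length.
struct Entry {
    name: String,
    size: u64,
}

/// Write a file count, then each (name, size) entry, then every file's contents in the same order.
fn write_archive<W: Write>(out: &mut W, files: &[(&str, &[u8])]) -> io::Result<()> {
    out.write_all(&(files.len() as u32).to_le_bytes())?;
    for (name, data) in files {
        out.write_all(&(name.len() as u32).to_le_bytes())?;
        out.write_all(name.as_bytes())?;
        out.write_all(&(data.len() as u64).to_le_bytes())?;
    }
    for (_, data) in files {
        out.write_all(data)?;
    }
    Ok(())
}

fn read_u32<R: Read>(r: &mut R) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Extract only the files named in `wanted`, reading the stream strictly front to back.
fn extract<R: Read>(input: &mut R, wanted: &[&str]) -> io::Result<Vec<(String, Vec<u8>)>> {
    // Read the whole header first: it tells us the size of every file that follows.
    let count = read_u32(input)?;
    let mut entries = Vec::with_capacity(count as usize);
    for _ in 0..count {
        let name_len = read_u32(input)? as usize;
        let mut name = vec![0u8; name_len];
        input.read_exact(&mut name)?;
        let name = String::from_utf8(name)
            .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
        let size = read_u64(input)?;
        entries.push(Entry { name, size });
    }
    let mut found = Vec::new();
    for entry in entries {
        // Limit reads to exactly this file's bytes.
        let mut body = input.by_ref().take(entry.size);
        if wanted.contains(&entry.name.as_str()) {
            let mut data = Vec::new();
            body.read_to_end(&mut data)?;
            found.push((entry.name, data));
        } else {
            // Unwanted contents are consumed into a sink; no seeking required.
            io::copy(&mut body, &mut io::sink())?;
        }
    }
    Ok(found)
}

fn main() -> io::Result<()> {
    let mut archive = Vec::new();
    write_archive(&mut archive, &[("a.txt", &b"hello"[..]), ("b.txt", &b"world!"[..])])?;
    let found = extract(&mut archive.as_slice(), &["b.txt"])?;
    for (name, data) in &found {
        println!("{}: {}", name, String::from_utf8_lossy(data));
    }
    Ok(())
}
```

A real archive would also carry the posix attributes and content lists described above; this sketch keeps only names and sizes.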

(*) I apologize for the ridiculous names. I did not name any of these algorithms.

Current status

I wrote a few drafts in typescript going back to 2015; this is a rust version intended for a wider audience. As of Sep 2021, only some modules are written and tested. Very much WIP.

There are a couple of command-line tools for testing so far.

rabbet

"rabbet" will Run A BitBottle Encryption Test. (Yes, I realize I just coined a silly name only a few lines after complaining about silly names. I need this.) It doesn't read or write an actual archive, but it can create and unpack an encrypted bitbottle of raw data, like a simple encryption tool. To encrypt a movie with an ssh public key:

> ./target/release/rabbet -ev ./startrek5-rifftrax.mp4 -o test.bb -r ./tests/data/test-key.pub
Encrypting for robey@togusa     (34fd22aae3c59072fd6f48147309eb302ea30f6ae5fc6376f683df3e74485a7c)
[00:00:00] Encrypting:  346M -- done.

[12:12] robey@togusa:~/projects/rust/bitbottle/ (main)
> ./target/release/rabbet -iv test.bb
Encryption:             AES_128_GCM
Public key algorithm:   ED25519_NACL_SEALED
Encrypted for:          robey@togusa     (34fd22aae3c59072fd6f48147309eb302ea30f6ae5fc6376f683df3e74485a7c)
Block size:             1.00M
Total size:             346M
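rabbet reports a block size of 1.00M, so the payload appears to be encrypted in fixed-size blocks. The sketch below shows one way to cut a stream into such blocks and give each a distinct 96-bit nonce (the size AES-GCM normally uses), since reusing a nonce under the same GCM key breaks its security. Everything here is an assumption rather than bitbottle's actual framing: "1.00M" is read as 1 MiB, `nonce_for_block` is a hypothetical scheme (a 4-byte prefix plus a big-endian block counter), and the cipher call itself is left as a comment to stay within the standard library.

```rust
use std::io::{self, Read};

/// Assumed block size: "1.00M" interpreted as 1 MiB.
const BLOCK_SIZE: usize = 1 << 20;

/// Hypothetical nonce scheme: a 4-byte per-archive prefix followed by the
/// 64-bit block index in big-endian, giving a distinct 12-byte nonce per block.
fn nonce_for_block(prefix: [u8; 4], index: u64) -> [u8; 12] {
    let mut nonce = [0u8; 12];
    nonce[..4].copy_from_slice(&prefix);
    nonce[4..].copy_from_slice(&index.to_be_bytes());
    nonce
}

/// Read `input` into blocks of at most `block_size` bytes; only the final block may be short.
fn read_blocks<R: Read>(mut input: R, block_size: usize) -> io::Result<Vec<Vec<u8>>> {
    let mut blocks = Vec::new();
    loop {
        let mut block = Vec::with_capacity(block_size);
        // `take` + `read_to_end` keeps reading until the block is full or the input ends,
        // so a short read from the underlying source doesn't produce a short block.
        let n = input.by_ref().take(block_size as u64).read_to_end(&mut block)?;
        if n == 0 {
            break;
        }
        blocks.push(block);
    }
    Ok(blocks)
}

fn main() -> io::Result<()> {
    let data = vec![0x42u8; 2 * BLOCK_SIZE + 10];
    let blocks = read_blocks(data.as_slice(), BLOCK_SIZE)?;
    for (i, block) in blocks.iter().enumerate() {
        let nonce = nonce_for_block([0xde, 0xad, 0xbe, 0xef], i as u64);
        // An AEAD such as AES-128-GCM would seal `block` under `nonce` here.
        println!("block {}: {} bytes, nonce {:02x?}", i, block.len(), nonce);
    }
    Ok(())
}
```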

buzscan

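"buzscan" is a rust implementation of the buzhash chunking algorithm. Buzhash is a rolling hash computed over only a sliding window of the most recent bytes; the chunker looks for positions where the hash is a particularly "round" number (typically, one whose lowest few bits are all zero) and places a block boundary there. Because each boundary depends only on the bytes inside the window, it's good (but not perfect) at finding the same places even as they move around inside a file.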

It computes a rolling hash of a stream of data and emits block boundaries and the SHA-256 of each block. This can be used by an archiver to identify duplicate blocks.
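A minimal buzhash chunker, to make the rolling-hash description concrete. Assumptions, none of them taken from buzscan: a 32-bit hash, a 48-byte window, and a boundary wherever the low 13 bits of the hash are zero (about one boundary per 8 KiB of random data). The byte table comes from an xorshift32 generator as a stand-in for buzscan's CRC-32 table, there are no minimum or maximum block sizes, and the per-block SHA-256 is omitted to keep the example dependency-free.

```rust
/// Assumed window size in bytes (buzscan's actual parameters may differ).
const WINDOW: usize = 48;
/// Boundary when the low 13 bits are zero: roughly 8 KiB average blocks on random data.
const MASK: u32 = (1 << 13) - 1;

/// Stand-in byte table from an xorshift32 generator (NOT buzscan's CRC-32 table).
fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut x: u32 = 0x9e37_79b9;
    for entry in table.iter_mut() {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        *entry = x;
    }
    table
}

/// Hash of one window computed from scratch: XOR of table entries, older bytes rotated further.
fn hash_window(table: &[u32; 256], window: &[u8]) -> u32 {
    let w = window.len();
    window.iter().enumerate().fold(0, |h, (i, &b)| {
        h ^ table[b as usize].rotate_left(((w - 1 - i) % 32) as u32)
    })
}

/// Return the end offset of each block; the last entry is always `data.len()`.
fn chunk_boundaries(table: &[u32; 256], data: &[u8]) -> Vec<usize> {
    let mut boundaries = Vec::new();
    let mut h: u32 = 0;
    let out_rot = (WINDOW % 32) as u32;
    for (i, &b) in data.iter().enumerate() {
        // Age every byte already in the window by one rotation, then mix in the new byte.
        h = h.rotate_left(1) ^ table[b as usize];
        if i >= WINDOW {
            // Remove the byte that just slid out; it has now been rotated WINDOW times.
            h ^= table[data[i - WINDOW] as usize].rotate_left(out_rot);
        }
        if i + 1 >= WINDOW && h & MASK == 0 {
            boundaries.push(i + 1);
        }
    }
    if !data.is_empty() && boundaries.last() != Some(&data.len()) {
        boundaries.push(data.len());
    }
    boundaries
}

fn main() {
    let table = make_table();
    // Deterministic pseudo-random input from a simple LCG.
    let mut x: u32 = 1;
    let data: Vec<u8> = (0..1_000_000)
        .map(|_| {
            x = x.wrapping_mul(1_103_515_245).wrapping_add(12_345);
            (x >> 16) as u8
        })
        .collect();
    let ends = chunk_boundaries(&table, &data);
    println!("{} bytes -> {} blocks", data.len(), ends.len());
}
```

The rolling value always equals `hash_window` over the last 48 bytes, which is why boundaries follow the content: prepend 10 bytes to the input and every boundary the original data produced shows up again, shifted by exactly 10.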

Some implementations, like borg (C source), use a random table or PRNG to map each byte to a hash value. Buzscan uses a deterministic table built from recursive applications of CRC-32 that homed in on a good bit distribution.
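The README doesn't describe exactly how the CRC-32 recursion works, so this is only a hypothetical sketch of the idea: a bitwise CRC-32 (the common IEEE/zlib variant with reflected polynomial 0xEDB88320) is applied over and over, each table entry being the CRC of the previous entry's four bytes, starting from a seed. `crc_table` and `bit_balance` are illustrative names; the second counts how many of the 256 entries have each bit set, one simple way to judge the bit distribution (ideally every count sits near 128).

```rust
/// Bitwise CRC-32 (IEEE / zlib variant, reflected polynomial 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xffff_ffffu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All-ones mask when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xedb8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical table construction: feed each CRC back in to produce the next entry.
fn crc_table(seed: u32) -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut value = seed;
    for entry in table.iter_mut() {
        value = crc32(&value.to_le_bytes());
        *entry = value;
    }
    table
}

/// For each of the 32 bit positions, count how many table entries have that bit set.
fn bit_balance(table: &[u32; 256]) -> [u32; 32] {
    let mut counts = [0u32; 32];
    for &entry in table {
        for (bit, count) in counts.iter_mut().enumerate() {
            *count += (entry >> bit) & 1;
        }
    }
    counts
}

fn main() {
    let table = crc_table(0);
    let counts = bit_balance(&table);
    let worst = counts.iter().map(|&c| (c as i32 - 128).abs()).max().unwrap();
    println!("worst deviation from 128 set bits: {}", worst);
}
```

Trying several seeds and keeping the table with the smallest worst-case deviation would be one way to home in on a good distribution.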

The "buzscan" CLI tool will traverse a list of files and folders (recursively) and build up a set of blocks, looking for duplicates, and report on the de-duplicated size of the data it found. It's very slow, because it's hashing everything it finds.
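The de-duplicated size in buzscan's report boils down to counting each block toward the total every time it appears, but toward the unique size only the first time. A minimal sketch, assuming the blocks have already been cut; `dedup_report` is a hypothetical helper, and it keys blocks by their contents directly, where a real tool would key them by their SHA-256 to avoid keeping every block in memory.

```rust
use std::collections::HashSet;

/// Totals for a de-duplication report, in bytes.
#[derive(Debug, PartialEq)]
struct DedupReport {
    blocks: usize,
    total: u64,
    unique: u64,
}

/// Count every block toward `total`, but only the first copy of each distinct block toward `unique`.
/// Blocks are keyed by their contents here; a real tool would key them by a digest such as SHA-256.
fn dedup_report<'a, I>(blocks: I) -> DedupReport
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut seen: HashSet<&'a [u8]> = HashSet::new();
    let mut report = DedupReport { blocks: 0, total: 0, unique: 0 };
    for block in blocks {
        report.blocks += 1;
        report.total += block.len() as u64;
        // `insert` returns true only the first time this block's contents are seen.
        if seen.insert(block) {
            report.unique += block.len() as u64;
        }
    }
    report
}

fn main() {
    let words = ["alpha", "beta", "alpha", "gamma"];
    let report = dedup_report(words.iter().map(|w| w.as_bytes()));
    println!("{} blocks, total {} bytes, {} unique", report.blocks, report.total, report.unique);
}
```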

> ./target/release/buzscan .
[00:00:01]      935 files,      885 blocks, total disk space:  236M,  154M unique

Build

Some of the dependencies, including argonautica and rust-lzma, are apparently not pure Rust. They require some local package installs:

  • pkg-config
  • liblzma-dev
  • libclang (for argonautica)

cargo build --release
./target/release/buzscan --help
./target/release/rabbet --help

Authors

License

Apache 2.0 license, included in LICENSE.txt.