Bitbottle: the final archive format.
Bitbottle is a data & file format for archiving collections of files & folders, like "tar", "zip", and "winrar". Its primary differentiating features are:
- All important POSIX attributes are preserved (owner & group by name, permissions, create/modify timestamps).
- File contents are stored as a database of de-duplicated chunks using buzhash, similar to common backup utilities.
- The format is streaming-friendly for readers: Metadata and content lists appear before file contents, to allow a subset of the files to be extracted with minimal buffering and no seeking.
- Compression may occur per-file or over the whole archive, using snappy (very fast) or LZMA2 (very compact).
- Encryption is built-in: AES-128-GCM or XCHACHA20-POLY1305(*), with Ed25519 or an argon2id password for authentication.
- The container format (bottle) is easily extensible for future algorithms.
(*) I apologize for the ridiculous names. I did not name any of these algorithms.
After a few drafts in TypeScript going back to 2015, this is a Rust version intended for a wider audience. As of Sep 2021, only some modules are written and tested. Very much WIP.
There are a couple of command-line tools for testing so far.
"rabbet" will Run A BitBottle Encryption Test. (Yes, I realize I dropped a silly name only a few lines after complaining about silly names. I need this.) It doesn't read or write an actual archive, but it can create and unpack an encrypted bitbottle of raw data, like a simple encryption tool. To encrypt a movie with an SSH public key:
    > ./target/release/rabbet -ev ./startrek5-rifftrax.mp4 -o test.bb -r ./tests/data/test-key.pub
    Encrypting for robey@togusa (34fd22aae3c59072fd6f48147309eb302ea30f6ae5fc6376f683df3e74485a7c)
    [00:00:00] Encrypting: 346M -- done.

    [12:12] robey@togusa:~/projects/rust/bitbottle/ (main)
    > ./target/release/rabbet -iv test.bb
    Encryption: AES_128_GCM
    Public key algorithm: ED25519_NACL_SEALED
    Encrypted for: robey@togusa (34fd22aae3c59072fd6f48147309eb302ea30f6ae5fc6376f683df3e74485a7c)
    Block size: 1.00M
    Total size: 346M
"buzscan" is a Rust implementation of the buzhash chunking algorithm. Buzhash is a type of rolling hash which incorporates data from only a sliding window, looking for positions where the hash value is a particularly round number (for example, one whose low bits are all zero). It's good (but not perfect) at finding these places even as they move around inside a file.
It computes a rolling hash of a stream of data and emits block boundaries and the SHA-256 of each block. This can be used by an archiver to identify duplicate blocks.
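The rolling-hash idea can be sketched in a few lines. This is a minimal illustration, not buzscan's code: the 48-byte window, the xorshift-filled table, and the names `WINDOW`, `make_table`, and `chunk_boundaries` are all assumptions made for the example, and the SHA-256 step is left out to keep it free of dependencies.

```rust
/// Number of trailing bytes that contribute to the rolling hash (assumed value).
const WINDOW: usize = 48;

/// Hypothetical byte-mixing table filled by a xorshift32 generator.
/// This is an assumption for illustration, not the table buzscan ships.
fn make_table() -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut x: u32 = 0x9E37_79B9;
    for entry in table.iter_mut() {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        *entry = x;
    }
    table
}

/// Returns the end offset of every block in `data`.
/// A block ends wherever the hash of the last WINDOW bytes has all of the
/// bits selected by `mask` equal to zero (the "round number").
fn chunk_boundaries(data: &[u8], table: &[u32; 256], mask: u32) -> Vec<usize> {
    let mut boundaries = Vec::new();
    let mut hash: u32 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Slide forward: rotate the old hash and mix in the new byte.
        hash = hash.rotate_left(1) ^ table[byte as usize];
        if i >= WINDOW {
            // Cancel the byte that just left the window. It has been
            // rotated WINDOW times since it was mixed in.
            let old = table[data[i - WINDOW] as usize];
            hash ^= old.rotate_left((WINDOW % 32) as u32);
        }
        // Only test for a boundary once the window is full.
        if i + 1 >= WINDOW && hash & mask == 0 {
            boundaries.push(i + 1);
        }
    }
    // Whatever follows the last boundary forms a final, shorter block.
    if !data.is_empty() && boundaries.last() != Some(&data.len()) {
        boundaries.push(data.len());
    }
    boundaries
}
```

With a mask of `0xFF`, a boundary is expected roughly once every 256 bytes on average. Because the hash depends only on the last `WINDOW` bytes, inserting bytes at the front of a file shifts the later boundaries by the same amount instead of moving them relative to the content, which is what lets an archiver recognize the same blocks again.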
Some implementations like borg (C source) use a random table or PRNG to map bytes. Buzscan uses a deterministic table built from recursive applications of CRC-32 that homed in on a good bit distribution.
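One way to picture a table built this way is to feed each CRC-32 result back in as the next input. The sketch below is a guess at the general shape only: the seed, the feedback scheme, and the names `build_table` and `bit_counts` are assumptions, and buzscan's actual construction may differ.

```rust
/// Bitwise CRC-32 (IEEE polynomial, reflected form 0xEDB88320).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, otherwise zero.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical construction: each entry is the CRC-32 of the previous one.
/// The same seed always yields the same table, so no PRNG state is needed.
fn build_table(seed: u32) -> [u32; 256] {
    let mut table = [0u32; 256];
    let mut state = seed;
    for entry in table.iter_mut() {
        state = crc32(&state.to_le_bytes());
        *entry = state;
    }
    table
}

/// Counts how many entries have each bit position set. A well-balanced
/// table has every count close to 128 (half of the 256 entries).
fn bit_counts(table: &[u32; 256]) -> [u32; 32] {
    let mut counts = [0u32; 32];
    for &entry in table.iter() {
        for (bit, count) in counts.iter_mut().enumerate() {
            *count += (entry >> bit) & 1;
        }
    }
    counts
}
```

Running `bit_counts` over tables built from different seeds is one plausible way to search for a table with a good bit distribution.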
The "buzscan" CLI tool will traverse a list of files and folders (recursively) and build up a set of blocks, looking for duplicates, and report on the de-duplicated size of the data it found. It's very slow, because it's hashing everything it finds.
    > ./target/release/buzscan .
    [00:00:01] 935 files, 885 blocks, total disk space: 236M, 154M unique
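The "total" and "unique" figures come from simple bookkeeping over the blocks. A minimal sketch, assuming the blocks are already in memory: `DedupStats` and `dedup_stats` are hypothetical names, and the set is keyed on the block contents themselves as a stand-in for the SHA-256 digest a real tool would store.

```rust
use std::collections::HashSet;

/// Summary of a scan, in the spirit of buzscan's report line.
struct DedupStats {
    blocks: usize,
    total_bytes: u64,
    unique_bytes: u64,
}

/// Walks a sequence of blocks and counts each distinct block only once.
/// Keying on the block bytes is a simplification; storing a digest per
/// block would avoid keeping the data itself around.
fn dedup_stats<'a, I>(blocks: I) -> DedupStats
where
    I: IntoIterator<Item = &'a [u8]>,
{
    let mut seen: HashSet<&'a [u8]> = HashSet::new();
    let mut stats = DedupStats { blocks: 0, total_bytes: 0, unique_bytes: 0 };
    for block in blocks {
        stats.blocks += 1;
        stats.total_bytes += block.len() as u64;
        // `insert` returns true only the first time a block is seen.
        if seen.insert(block) {
            stats.unique_bytes += block.len() as u64;
        }
    }
    stats
}
```

For three blocks where the first and third are identical 4-byte blocks and the second is a 6-byte block, this reports 3 blocks, 14 total bytes, and 10 unique bytes.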
Some of the modules are apparently not pure-Rust, including argonautica and rust-lzma. They require some local package installs:
- libclang (for argonautica)
    cargo build --release
    ./target/release/buzscan --help
    ./target/release/rabbet --help
- Robey Pointer <firstname.lastname@example.org> https://mastodon.technology/@robey
Apache 2.0 license, included in