ZIP 231: Memo Bundles

The key words “MUST”, “MUST NOT”, “SHOULD”, and “MAY” in this document are to be interpreted as described in BCP 14 ¹ when, and only when, they appear in all capitals.

The term “network upgrade” in this document is to be interpreted as described in ZIP 200. ²

The character § is used when referring to sections of the Zcash Protocol Specification. ³

The terms “Mainnet” and “Testnet” are to be interpreted as described in § 3.12 ‘Mainnet and Testnet’. ⁴

Abstract

Currently, the memo sent in a shielded output is limited to at most 512 bytes. This ZIP proposes to allow larger memos, and to enable memo data to be shared between multiple recipients of a transaction.

Motivation

In Zcash transaction versions v2 to v5 inclusive, each Sapling or Orchard shielded output contains a ciphertext comprised of a 52-byte note plaintext, and a corresponding 512-byte memo field. ⁵ Recipients can only decrypt the outputs sent to them, and thus can also only observe the memo fields included with the outputs they can decrypt.

The shielded transaction protocol hides the sender(s) (that is, the addresses corresponding to the keys used to spend the input notes) from all of the recipients. For certain kinds of transactions, it is desirable to make one or more sender addresses available to one or more recipients (for example, a reply address). In such circumstances it is important to authenticate the sender addresses, to give the recipient a guarantee that the address is controlled by a sender of the transaction; failure to authenticate this address can enable phishing attacks. These Authenticated Reply Addresses require zero-knowledge proofs, and for the Orchard protocol these proofs are too large to fit into a 512-byte memo field.

It is also desirable, for clients with more stringent bandwidth constraints, to be able to transmit encrypted notes to the client without including the encrypted memo data. In the current light client protocol ⁶, this is done by truncating the note ciphertext to just the part that encrypts the memo. However, that has the effect of truncating the authentication tag, and so the resulting decryption algorithm does not meet standard security notions for an authenticated encryption scheme. It is a goal of this proposal to rectify this, simplifying the security argument.

Instead of the memo data, this ZIP proposes that it is possible to indicate whether a memo is present for the recipient. When using the light client protocol, a recipient need not download full transaction information if this indication tells them that they have not received any memo in the transaction.

At present, it is not possible to transmit the same memo data to multiple transaction recipients without redundantly encoding that data, and sending memo data greater than 512 bytes requires sending multiple outputs; the problem is compounded when attempting to send more than 512 bytes to each recipient. By separating memo data from the decryption capability for those memos, it admits a greater variety of applications that utilize memo data, while decreasing the amount of data that needs to be stored on-chain overall.

Privacy Implications

Prior to the activation of this ZIP, every shielded output has an associated memo field. A chain observer can therefore infer a likely 1:1 correlation between transaction recipients and memo payloads. The maximum number of distinct memos is precisely known, and the bounds on possible memo sizes are very small (between 0 and 512 bytes). It is possible to send additional data to a single recipient by adding (potentially zero-valued) outputs; on-chain this is indistinguishable from sending more memos to more recipients or (in the case of Orchard) spending more input notes.

After the activation of this ZIP, recipient count and memo count are decoupled. It becomes possible to construct transactions for which there are many more recipients than distinct memos, or vice versa. This may result in more observably distinct patterns relating the number of recipients to the number of possible memos. One important example is that if a wallet includes an authenticated reply-to address, this may be distinguishable from other ordinary wallet behaviour.

On the flip side, a chain observer now only knows upper bounds on the amount of memo data being conveyed, and the number of possible distinct memos. They have no information about how memo data is split among the recipients. This can provide a privacy improvement in some situations. For example, it is not possible to distinguish between many recipients receiving many small memos, and the same set of recipients receiving one large shared memo.

In summary, when sending large amounts of memo data, the change introduced by this ZIP eliminates a potential distinguisher along one axis in exchange for a potential distinguisher along another.

Requirements

Non-requirements

Specification

Since this proposal is defined only for v6 and later transactions, it is not necessary to consider Sprout JoinSplit outputs. The following sections apply to both Sapling and Orchard outputs.

Memo bundle

A memo bundle consists of a sequence of 256-byte memo chunks, each individually encrypted. These memo chunks represent zero or more encrypted memos.

Each transaction may contain a single memo bundle, and a memo bundle may contain at most \(\mathsf{memo\_chunk\_limit}\) = 64 memo chunks. This limits the total amount of memo data that can be conveyed within a single transaction to \(\mathsf{memo\_chunk\_limit}\) × 256 = 16384 bytes, or 16 KiB.

Memo bundles are encoded in transactions in a prunable manner: each memo chunk can be replaced by its representative digest.

Memo encryption

During transaction construction, each output with memo data is assigned a 32-byte memo key \(\mathsf{K^{memo}}\). These keys SHOULD be generated randomly, and MUST NOT be used to encrypt more than one memo within a single transaction. If an output has no memo data, it is assigned the memo key consisting of 32 \(\mathtt{0xFF}\) bytes.

In note plaintexts of v6-onward transactions, the 512-byte memo field is replaced by \(\mathsf{K^{memo}}\).

The transaction builder generates a 32-byte salt value \(\mathsf{salt}\) from a CSPRNG. A new salt MUST be generated for each memo bundle.

The symmetric encryption key for a memo is derived from its \(\mathsf{K^{memo}}\) as follows:

\(\hspace{2em}\mathsf{encryption\_key} = \mathsf{PRF^{expand}_{K^{memo}}}([\mathtt{0xE0}] \,||\, \mathsf{salt})\)

The first byte \(\mathtt{0xE0}\) should be added to the documentation of inputs to \(\mathsf{PRF^{expand}}\) in § 4.1.2 ‘Pseudo Random Functions’ ⁷.

If the generated key is 32 \(\mathtt{0xFF}\) bytes, the transaction constructor MAY repeat this procedure with a different salt, in order to avoid the recipient misinterpreting the output as having no memo data. Since that has negligible probability, it alternatively MAY omit this check.

Each memo is padded to a multiple of 256 bytes with zeroes, and split into 256-byte chunks. Each memo chunk is encrypted with ChaCha20Poly1305 ⁸ as follows:

\(\hspace{2em}\mathsf{IETF\_AEAD\_CHACHA20\_POLY1305}(\mathsf{encryption\_key}, \mathsf{nonce}, \mathsf{memo\_chunk})\)

where \(\mathsf{nonce} = \mathsf{I2BEOSP}_{88}(\mathsf{counter}) \,||\, [\mathsf{is\_final\_chunk}]\).

Finally, the encrypted memo chunks for all memos are combined into a single sequence using an order-preserving shuffle. Memo chunks from different memos MAY be interleaved in any order, but memo chunks from the same memo MUST have the same relative order. The following diagram shows an example shuffle of three memos:

Memo decryption

When a recipient decrypts a shielded output, they obtain a memo key \(\mathsf{K^{memo}}\). From this they derive encryption_key as above, and then proceed as follows:

\(\hspace{1.5em}\) let mutable \(\mathsf{memo} \;{\small ⦂}\; \mathbb{N} \leftarrow 0 \\\) \(\hspace{1.5em}\) let mutable \(\mathsf{counter} \;{\small ⦂}\; \mathbb{N} \leftarrow 0 \\\) \(\hspace{1.5em}\) let mutable \(\mathsf{potential\_last\_chunk\_index} \;{\small ⦂}\; \mathbb{N} \leftarrow 0 \\\) \(\hspace{1.5em}\) for \(i\) from \(0\) up to \(\mathtt{nMemoChunks}-1\): \(\\\) \(\hspace{3.0em}\) if \(\mathtt{vMemoChunks}[i]\) is not pruned: \(\\\) \(\hspace{4.5em}\) let \(\mathsf{nonce} = \mathsf{I2BEOSP}_{88}(\mathsf{counter}) \,||\, [\mathtt{0x00}] \\\) \(\hspace{4.5em}\) let \(P = \mathsf{IETF\_AEAD\_CHACHA20\_POLY1305.Decrypt}(\mathsf{encryption\_key}, \mathsf{nonce}, \mathtt{vMemoChunks}[i]) \\\) \(\hspace{4.5em}\) if \(P \neq \bot\): \(\\\) \(\hspace{6.0em}\) set \(\mathsf{memo} \leftarrow \mathsf{memo} \,||\, P \\\) \(\hspace{6.0em}\) set \(\mathsf{counter} \leftarrow \mathsf{counter} + 1 \\\) \(\hspace{6.0em}\) set \(\mathsf{potential\_last\_chunk\_index} \leftarrow i + 1 \\\) \(\,\\\) \(\hspace{1.5em}\) let \(\mathsf{final\_nonce} = \mathsf{I2BEOSP}_{88}(\mathsf{counter}) \,||\, [\mathtt{0x01}] \\\) \(\hspace{1.5em}\) let mutable \(\mathsf{success} \leftarrow \kern0.05em\) false \(\\\) \(\hspace{1.5em}\) for \(i\) from \(0\) up to \(\mathtt{nMemoChunks}-1\): \(\\\) \(\hspace{3.0em}\) if \(\mathtt{vMemoChunks}[i]\) is not pruned: \(\\\) \(\hspace{4.5em}\) let \(P = \mathsf{IETF\_AEAD\_CHACHA20\_POLY1305.Decrypt}(\mathsf{encryption\_key}, \mathsf{final\_nonce}, \mathtt{vMemoChunks}[i]) \\\) \(\hspace{4.5em}\) if \(i \geq \mathsf{potential\_last\_chunk\_index}\) and \(P \neq \bot\) and not \(\mathsf{success}\): \(\\\) \(\hspace{6.0em}\) set \(\mathsf{memo} \leftarrow \mathsf{memo} \,||\, P \\\) \(\hspace{6.0em}\) set \(\mathsf{success} \leftarrow \kern0.05em\) true \(\\\) \(\,\\\) \(\hspace{1.5em}\) if \(\mathsf{success}\) then return \(\mathsf{memo}\) else return \(\bot \\\)

If any chunk of the memo encrypted to \(\mathsf{encryption\_key}\) has been pruned, the decryption process above returns nothing (as \(\mathsf{final\_nonce}\) will be set to the wrong counter value followed by \(\mathtt{0x01}\)), ensuring that a malformed memo is not returned.

Encoding in transactions

Bytes	Name	Data Type	Description
1	\(\mathtt{fAllPruned}\)	\(\mathtt{uint8}\)	1 if all chunks have been pruned, otherwise 0.
32	\(\mathtt{nonceOrHash}\)	\(\mathtt{byte}[32]\)	The nonce for deriving encryption keys, or the overall hash.
† varies	\(\mathtt{nMemoChunks}\)	\(\mathtt{compactSize}\)	The number of memo chunks.
† varies	\(\mathtt{pruned}\)	\(\mathtt{byte}[\mathsf{ceiling}(\mathtt{nMemoChunks}/8)]\)	Bitflags indicating the type of each entry in \(\mathtt{vMemoChunks}\).
† varies	\(\mathtt{vMemoChunks}\)	\(\mathtt{MemoChunk}[\mathtt{nMemoChunks}]\)	A sequence of encrypted memo chunks.

Transaction sighash

\(\mathsf{memo\_chunk\_digest}[i] = H(\mathtt{vMemoChunks}[i]) \\\) \(\mathsf{memo\_bundle\_digest} = H(\mathsf{concat}(\mathsf{memo\_chunk\_digests}))\)

The memo bundle digest structure is a performance optimization for the case where all memo chunks in a transaction have been pruned.

TODO: finish this to be a modification to the equivalent of ZIP 244 for transaction v6.

Changes to ZIP 317 ¹⁰

The conventional fee in ZEC is altered such that a memo bundle may contain two free chunks if there are any shielded outputs in the transaction. Any memo chunk beyond this requires marginal_fee. See the Fee calculation section of ZIP 317 ¹¹ for details.

Network protocol

Nodes must reject GetData responses having an \(\mathtt{fAllPruned}\) value that is nonzero, or any byte of \(\mathtt{pruned}\) that is nonzero.

Changes to the Zcash Protocol Specification

The following changes affecting the definitions of note plaintexts and note ciphertexts, and the algorithms for encryption and decryption.

In § 4.7.2 ‘Sending Notes (Sapling)’ ¹³ and § 4.7.3 ‘Sending Notes (Orchard)’ ¹⁴:

In § 4.20.2 ‘Decryption using an Incoming Viewing Key (Sapling and Orchard)’ ¹⁶ and § 4.20.3 ‘Decryption using a Full Viewing Key (Sapling and Orchard)’ ¹⁷:

Applicability

Open issues

Interaction with ZIP 302 ¹⁸

Rationale

Memo bundle size restriction

Restricting the total amount of memo data in a bundle, for example to 16 KiB, limits the rate at which the chain size can grow cheaply (from a computational perspective; memo bundles are much easier to produce than proofs or signatures).

The current behaviour for previous transaction versions (no limit on the number of memos) is not altered by this ZIP, because memos in those transactions are tied to individual shielded outputs (incurring their computational cost), and are not natively aggregatable.

Memo chunk size

To understand the effect of memo chunk size, we construct a table showing the total amount of data stored on-chain when encoding 16 KiB of memo data to as many recipients as possible.

Each table entry has the format “\(N\) @ \(M\) (\(O\))” where \(N\) is the maximum number of distinct recipients you can have within the memo data limits, \(M\) is the cost in bytes of that memo data plus memo keys and authentication tags when using a 32-byte memo key, and \(O\) is the relative overhead compared to pre-ZIP-231 memos.

In the “256 32-out” case you have a distinguisher compared to old transactions, in that you can tell the transaction is sending at most 256 bytes per recipient rather than 512 if it is sending the maximum number of memos. But that’s inherently baked into the decision to use a smaller memo chunk size (and it is still possible for the chunks to all be a single memo sent to all outputs, or anything in between).

Chunk size	Memo size ≤ 256 bytes	Memo size = 512 bytes
Pre-231	32 @ 16384 ( 0.00%)	32 @ 16384 ( 0.00%)
512	32 @ 17920 (+ 9.38%)	32 @ 17920 (+ 9.38%)
256	64 @ 19456 (+18.75%)	32 @ 18432 (+12.50%)
256 32-out	32 @ 9728 (-40.63%)

Memo key size

16-byte (128-bit) keys don’t meet Zcash’s target security level of 125 bits, as argued in ¹⁹.

However, for the sake of argument, if we used a 16-byte memo key instead of 32 bytes, the transaction size overhead would become:

The decrease in overhead is relatively modest in most cases, but more noticeable for small memos with a 256-byte memo chunk.

Chunk size	Memo size ≤ 256 bytes	Memo size = 512 bytes
Pre-231	32 @ 16384 ( 0.00%)	32 @ 16384 ( 0.00%)
512	32 @ 17408 (+ 6.25%)	32 @ 17408 (+ 6.25%)
256	64 @ 18432 (+12.50%)	32 @ 17920 (+ 9.38%)
256 32-out	32 @ 9216 (-43.75%)

Encryption format

Including a per-transaction \(\mathsf{salt}\) in the derivation of \(\mathsf{encryption_key}\) gives protection against accidental (or intentional) reuse of \(\mathsf{K^{memo}}\) reuse across multiple transactions. We do not protect against \(\mathsf{K^{memo}}\) reuse within a transaction; it is up to the transaction builder to ensure that the same \(\mathsf{K^{memo}}\) is not used to encrypt two different memos (and if they did so, normal clients would either never observe the second memo, or would decrypt parts of each memo and get a nonsensical and potentially insecure “spliced” memo).

We do not include commitments to the shielded outputs in the derivation of \(\mathsf{encryption\_key}\) for two reasons:

Pruned encoding

The separation of memo data from note data, and the new ability to easily store variable-length memo data, opens up an attack vector against node operators for storing arbitrary data. The transaction digest commitments to the memo bundle are structured such that if a node operator is presented with a memo key (i.e. they are given the capability to decrypt a particular memo), they can identify and prune the corresponding memo chunks, while still enabling the transaction to be validated as part of its corresponding block and broadcast over the network.

The transaction encoding permits pruning at the individual chunk level in order to facilitate pruning an individual memo from a transaction without affecting the other memos. This enables node operators to be responsive to, for example, GDPR deletion requests.

Note that broadcasting a partially-pruned transaction means that the pruned chunks no longer contribute to the upper bound on memo data.

The prunable structure does not introduce a censorship axis; memo bundles do not reveal which memo chunks correspond to which memos, and therefore a network adversary cannot selectively censor individual memos. They can censor any/all chunks within specific transactions, however shielded transactions do not reveal their senders, recipients, or amounts, and thus also cannot be individually targeted for censorship.

Deployment

Reference implementation

References

Terminology