Manifest discussions

Day 1

  • ‘Schema’ is what is exchanged over the wire protocol. It has never been broken, so old clients can still work with it.
  • Tree manifests: It’s possible to calculate both hashes (tree hash and flat manifest hashes).
  • Mozilla central needs 25MB of mapping from old to new hash scheme for both changeset and manifest
  • Having the mapping might cause people to reimplement git alias (spectral note: I don’t know what this is ;))
  • Two switches: new manifest format, and new manifest hash
    • Mozilla can use the new format (trees, manifest v2) without a new hash
  • While we’re changing the ‘schema’ (hashes), what do we want from changeset v2?
    • “extra” key/value pairs on individual lines
    • Support \n in filenames
    • Rename information?
    • add/edit/delete status in changeset (no need to touch manifest)?
    • More strict author field validation (require email/rfc format?)
  • hg log on directories becomes faster with tree manifests
  • A rename cache could record that a file has not been renamed, so that a lot of checks become much quicker
  • Historical note: the file list in the changeset exists for push/pull, so deletes didn’t originally show up there, because a delete doesn’t change the filelog.
  • sid0: do we want to store ‘this got deleted’ information in the filelogs, so that hg log <file> shows that it happened?
  • Default is ‘flat manifests’, since the gut feeling is that this is the best choice for ~98% of projects
  • Certain projects want tree manifests on disk (client? or server? or both (separately? concurrently?))
  • Could make a read-only copy using old hash to do a more gradual migration to exchanging tree manifests?

Three use cases:

  1. Mozilla central today: flag to turn on that uses a new disk format, but no hash changes, so exchange is unaffected.
  2. Google soon: start from scratch with new hashes
  3. Transition from 1->2

Two flags:

  1. Storing tree manifests locally (old hashing)
  2. Break the schema

For flag #1 without #2: The manifest revlog (root-level 00manifest.{i,d}) would have the old hash as its nodeid, and it wouldn’t strictly match the contents at that version.

An extension (client and server side) that can maintain a map for old-hashes in bug trackers?

For getting to Flag #2:

  • Default on the server is that it does not accept manifest v2
  • no v1 children with v2 parents
  • Server then enables v2 pushes to it; the next change pushed with v2 will upgrade all future changes
  • Upgrade during exchange v1->v2? Maybe not needed?
  • Command to downgrade from v2->v1 if you get ‘infected’ with the virus should be pretty easy.
  • flat-hashing a tree manifest would be more difficult than it might seem at first, because parent revisions
  • A new challenger appears! (4th use case?)
  • Matrix: flat-right-now vs. flat-with-subdir-hashes vs. tree manifests, manifestv1 vs. manifestv2, hashv1 vs. hashv2
    • Are deltas going to be broken in any of these?
    • Manifest Feature Matrix
    • So we’re thinking implement 6, 8/9, 14 on the way to 17, benchmark them, see if the benefits make it so that implementing the conversion-during-exchange makes sense.
      • benchmarks need to consider clone time, server cpu usage, on-disk size
      • 6=14 and 8=17 if we don’t care about breaking hashes, 8=9 if we don’t care about exchange
  • Client version announcement (User-Agent string?)
    • As a ‘backport extension’?
    • Include hg version, extensions? python version? platform?
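
A minimal sketch of what such a client announcement could look like, assuming it is sent as a single User-Agent-style string. The function name, the field names and the "name/value" layout are all hypothetical; nothing here is an existing Mercurial API.

```python
import platform


def client_user_agent(hg_version, extensions=()):
    """Build a hypothetical client announcement string.

    The field layout ("name/value", space separated) is an assumption
    made for illustration only.
    """
    fields = [
        "mercurial/%s" % hg_version,
        "python/%s" % platform.python_version(),
        "platform/%s" % (platform.system() or "unknown"),
    ]
    if extensions:
        # Sort so the same set of extensions always announces identically.
        fields.append("extensions/%s" % ",".join(sorted(extensions)))
    return " ".join(fields)


# Example with a made-up version number and extension list.
print(client_user_agent("3.4", ["rebase", "largefiles"]))
```

Whether extensions belong in the string at all was left open above; the sketch only includes them when the caller passes some.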

Day 2

Google wants new tree-structure manifests.

It’d be nice to not break old clients. Can compute old format hash for tree manifest on disk.
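
A rough sketch of how the old-format hash could be computed from a tree manifest, under stated assumptions. The nested-dict tree representation and all function names are hypothetical. The flat line layout (path, NUL, 40-character hex node, flags, newline) and the SHA-1 over the sorted parents followed by the text reflect the existing flat manifest and revlog conventions as we understand them.

```python
import hashlib

NULLID = b"\0" * 20


def flatten(tree, prefix=""):
    """Yield (path, hexnode, flags) for every file in a nested tree.

    Assumed representation: a directory is a dict, a file is a
    (hexnode, flags) tuple.
    """
    for name, entry in tree.items():
        path = prefix + name
        if isinstance(entry, dict):
            yield from flatten(entry, path + "/")
        else:
            hexnode, flags = entry
            yield path, hexnode, flags


def flat_manifest_text(tree):
    """Serialize the tree as flat manifest text, sorted by full path bytes."""
    entries = sorted(
        (path.encode("utf-8"), hexnode.encode("ascii"), flags.encode("ascii"))
        for path, hexnode, flags in flatten(tree)
    )
    return b"".join(p + b"\0" + h + f + b"\n" for p, h, f in entries)


def revlog_hash(text, p1=NULLID, p2=NULLID):
    """SHA-1 over the two parent ids (smaller first) followed by the text."""
    a, b = sorted((p1, p2))
    return hashlib.sha1(a + b + text).digest()


def flat_hash_of_tree(tree, p1=NULLID, p2=NULLID):
    """Old-style (flat) manifest hash for a tree manifest."""
    return revlog_hash(flat_manifest_text(tree), p1, p2)
```

The full paths are re-sorted after flattening because a per-directory walk does not produce flat byte order: a directory `a` sorts before a file `a.txt` by name, but `a.txt` sorts before `a/b` as a full path. The parents passed in must be the flat-hash ids of the parent revisions, which ties in with the Day 1 remark that parent revisions make this harder than it first seems.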

Four use cases:

  1. mozilla-central today
  2. Google soon
    • Never accept v1 manifests, ever.
  3. Transition from 1 to 2 case
    • (~2 years out, needs time for clients to upgrade naturally)
  4. prevention use case
    • Implementation-wise, this really means you don’t set the schema change flag on the server.
    • Idea: server could rewrite as v1 when receiving push using v2, tell client (using bundle2)
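
The use cases above, combined with the Day 1 rule of "no v1 children with v2 parents", can be read as a server-side acceptance policy. The sketch below is one hypothetical way to express it. The `Policy` names, the `check_push` function and the shape of its arguments are all assumptions, and the rewrite-as-v1 idea is not modeled.

```python
from enum import Enum


class Policy(Enum):
    V1_ONLY = "v1-only"        # use cases 1 and 4: schema change flag not set
    V2_ONLY = "v2-only"        # use case 2: never accept v1 manifests
    TRANSITION = "transition"  # use case 3: v1 history, v2 going forward


def check_push(policy, incoming, known_versions):
    """Return a list of rejection messages for an incoming push.

    incoming: list of (node, manifest_version, parent_nodes) tuples,
        assumed to be in topological order (parents first).
    known_versions: dict mapping nodes already on the server to 1 or 2.
    """
    versions = dict(known_versions)
    errors = []
    for node, version, parents in incoming:
        if policy is Policy.V1_ONLY and version != 1:
            errors.append("%s: server does not accept manifest v2" % node)
        elif policy is Policy.V2_ONLY and version != 2:
            errors.append("%s: server does not accept manifest v1" % node)
        elif version == 1 and any(versions.get(p) == 2 for p in parents):
            errors.append("%s: v1 child of a v2 parent" % node)
        # Record the version so later changesets in the push can be checked.
        versions[node] = version
    return errors
```

For example, under `Policy.TRANSITION` a push of `a` (v1), `b` (v2, child of `a`) and `c` (v1, child of `b`) would be rejected only because of `c`.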

Two flags:

  1. Store tree manifests locally but use old hashing
    • Transcode to old manifest format over the wire
    • store old hash in the changelog entry
  2. Break the schema
    • allow new hashing scheme to be recorded in changelog
    • exchange the new revlogs

MAY enforce a changeset schema change when we do flag 2? Not sure if it really matters.

Layout v2: orthogonal to all of these concerns?

  • Puts file hashes on separate lines for compression benefits