Recent changes to this wiki:
much better
diff --git a/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn index 00048cb64e..5e06039132 100644 --- a/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn +++ b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn @@ -1,72 +1,93 @@ -# Acquaintances: Sharing Files through Connected Projects +[[!meta author="Spencer"]] + +# Friends: Sharing Files through Connected Projects I often connect repos together during my scientific work, in which I like to use the [YODA (Datalad)](https://handbook.datalad.org/en/latest/_images/dataset_modules.svg) standard of connecting related projects via submodules. However, I've recently found that sometimes I have to connect an entire repo to, say, a paper just to use one resource. For the sake of provenance, this connection is essential, but it feels extremely inefficient and unscalable to have one repo filled with submodules just for individual files. -For these specific instances, I'm devising an alternative solution: acquaintance repos. +For these specific instances, I'm devising an alternative solution: friend repos. -## Acquaintances are Unrelated Repos +## Friends are Unrelated Repos -In general, an acquaintance is a repo whose *history* (branches, worktree, commits) is not relevant to the current repo, but is the origin for some files that the current repo uses. This is unlike *clones* (where everything is related), *parents/children* (where the entire child is derived or related to the parent, e.g. like superproject team repos and their children), or other [groups](https://git-annex.branchable.com/preferred_content/standard_groups/) defined by git-annex (archives, sources, etc.) +In general, a friend is a repo whose *history* (branches, worktree, commits) is not relevant to the current repo, but is the origin for some files that the current repo uses. 
This is unlike *clones* (where everything is related), *parents/children* (where the entire child is derived or related to the parent, e.g. like superproject team repos and their children), or other [groups](https://git-annex.branchable.com/preferred_content/standard_groups/) defined by git-annex (archives, sources, etc.) This definition requires upholding some technical details: -1. Acquaintances should **never sync**. This precludes defining them as normal git remotes unless you are very dilligent about undefining `remote.<name>.fetch` and setting `remote.<name>.sync=false` -1. Acquaintances don't need to know about *all* files in the acquaintance repo (neither in a git sense or annex sense), just the files used. Therefore `git annex filter-branch` is a bit overkill, but could be done manually via selecting exactly the keys needed. +1. Friends should **never sync**. This precludes defining them as normal git remotes unless you are very diligent about undefining `remote.<name>.fetch` and setting `remote.<name>.annex-sync=false` +1. Friends don't need to know about *all* files in the friend repo (neither their history (git) nor key logs (annex)), just the files they use. Therefore while `git annex filter-branch` could be used to filter for just the files needed, it is a bit overkill. ## Solution - A Special Remote with Custom Groups (`gx` is short for `git annex`) -Define a special repo that points to the primary storage location for the acquaintance repo. -I like to define it with a name like `acq.X` so it's obvious by inspection that it's an acquaintance. -Other metadata also tells you this (`gx group acq.X` will list `acquaintance`, or something could be added to the description), +Define a special repo that points to the primary storage location for the friend repo. +I like to define it with a name like `fri.X` so it's obvious by inspection that it's a friend. 
+Other metadata also tells you this (`gx group fri.X` will list `friend`, or something could be added to the description), but being in the name makes it clear especially for e.g. `gx list`. ### Depot: Primary Storage The depot is where a repo stores its *own* stuff. This prevents others' stuff from being duplicated into the referencing repo. -For those familiar with the `client` group, `depot`s are just clients with acquaintances replacing archives. +For those familiar with the `client` group, `depot`s are just clients with friends replacing archives. + +```bash +gx groupwanted depot "(include=* and (not (copies=friend:1))) or approxlackingcopies=1" +``` + +#### Client Replacement Version + +If you want to be able to use the assistant or archives, here's a version that can stand in for `client`: -`gx groupwanted depot "(include=* and (not (copies=acquaintance:1))) or approxlackingcopies=1"` +```bash +gx groupwanted depot "(include=* and ((exclude=*/archive/* and exclude=archive/*) or (not (copies=archive:1 or copies=smallarchive:1 or copies=friend:1)))) or approxlackingcopies=1" +``` -### Acquaintance +### Friend: Related Repos -The acquaintance is the source for stuff the current repo references. +The friend is the source for stuff the current repo references. Therefore, it doesn't need to be stored by the repo (i.e. in its depot) -`gx groupwanted acquaintance present` +```bash +gx groupwanted friend present +``` ### Finishing Up -To actually register where acquaintance files are, the ideal way is `gx fsck`. +To actually register where friend files are, the ideal way is `gx fsck`. This is better than e.g. `gx filter-branch` mentioned above because it's automatic. The default behavior of `fsck`, like other annex commands, is to check against files *in the current worktree*, so it will only populate the metadata for a special remote about the files the current repo is trained to care about. 
-`gx fsck -f acq.X -J 10` +```bash +gx fsck -f fri.X --fast -J 10 +``` -This may be a bit slow initially because it has to check each file in the worktree by seeking the remote, downloading known files, and verifying their hashes before they're registered as present in the new acquaintance. +Without `--fast`, the process will be slower as it verifies hashes by downloading files. In short the process involves: -1. For every external file desired by a repo: - 1. Copy the file (or a symlink) to the current repo and track it with annex - 1. Define a new special remote `acq.X` pointing to the depot/storage location for the file from the acquaintance repo. - 1. Assign the special remote with group `acquaintance` - 1. Assign any storage locations for the current remote with group `depot` - 1. Run `gx fsck -f acq.X` to populate the new special remote's contents relative to the current repo's worktree/branch - 1. Run `gx sync` if desired. The result should be files present in the current repo (if desired), and only in the acquaintance but not the depot(s). - 1. Now, the acquaintance acts as a link back to the origin for referenced files without duplication or having to add the entire acquaintance as a submodule! +1. For every repo that wants a friend: + 1. Define the group `friend` with its `groupwanted` rule (above for easy copying) + 1. Define the group `depot` with its `groupwanted` rule (above for easy copying) + 1. Set existing depots to use the `depot` group and have `groupwanted` as their `wanted` rule +1. For every friend: + 1. Define a new special remote `fri.X` pointing to the depot/storage location for friend repo. + 1. Assign the special remote with group `friend` and ensure it has `groupwanted` as its `wanted` rule +1. For every batch of files added from a friend: + 1. Copy the files (or symlinks) and track them with annex + 1. Run the `gx fsck` above to update the friend with the new files + 1. Run `gx sync` if desired. + 1. 
The result should be files present in the friend (and maybe the current), but not the depot(s). + 1. Now, the friend tells us where a file came from without having to add the entire friend as a submodule! ## FAQ/Open Questions -1. Is there a way to define the custom groups globally, or will I have to re-define special groups in every repo that uses acquaitances/depots? - 1. Not sure yet. I wonder where custom groups could be defined globally? Maybe in the user `.gitconfig`. +1. Is there a way to define the custom groups globally, or will I have to re-define special groups in every repo that uses friend/depots? + 1. Not sure yet. I wonder where custom groups could be defined globally? Maybe in the user `.gitconfig`. 1. Is there a way to get CLI autocomplete to suggest custom groups? - 1. Not sure yet. -1. Will this play well with standard groups and the assistant, especially if `client`s and `archive`s are used? - 1. Probably not, I don't use the assistant, but I suspect if one wanted to they'd have to define depots as clients with the acquantaince logic added instead of substituted for archives. + 1. I don't think there's support for this yet: only the standard groups are suggested in my zsh/omz setup. +1. Is this a replacement for Datalad datasets? + 1. I think of this as a tool to use alongside datasets. Datalad datasets are great when one project depends on the entirety of another (like a technical paper on an analysis) while this technique is better for collecting files from many projects under one umbrella (like a Thesis, which coincidentally, is what I'm developing this for). + 1. This also helps separate the ideas of storage (where files live) and referencing (how files are used). When I originally started using datasets, I had one special repo for each repo since I figured each repo has to have its own unique remote for git in whatever Github/Organization/Team the project belongs to anyway. 
Now, this is motivating me to consider how to rationally store contents for projects that share some commonality (a collaboration, an experimental phase, a taskforce, a super-repo as a parent). In this way, I can maintain a provenance record while minimizing the number of clones and remotes I need to maintain. -<!-- Work in progress! Feel free to leave comments like this if you have questions about the final idea once I finish it. --> <!-- Learning in Public: I've only just begun to use this for myself and am eliciting feedback and fleshing it out by describing it here (Feynmann Technique Style) -->
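The setup steps listed in the tip could be collected into one script, roughly as follows. This is only a sketch under stated assumptions, not the author's own script: the remote name `fri.X` and the two `groupwanted` expressions come from the tip, while the `directory` remote type, the `FRIEND_DIR` path, `encryption=none`, the depot name `my-depot`, and the `DRY_RUN` switch are hypothetical and only there for illustration. By default it prints the commands instead of running them.

```shell
#!/bin/sh
set -eu

# Hypothetical names; adjust to the real friend and depot.
FRIEND=${FRIEND:-fri.X}
FRIEND_DIR=${FRIEND_DIR:-/path/to/friend/storage}
DEPOT=${DEPOT:-my-depot}
DRY_RUN=${DRY_RUN:-1}   # set DRY_RUN=0 to actually run git annex
steps=0

# Print each git annex command (shell quoting is not reproduced in the
# printout) and run it only when DRY_RUN=0.
gx() {
  steps=$((steps + 1))
  printf 'git annex %s\n' "$*"
  if [ "$DRY_RUN" = 0 ]; then
    git annex "$@"
  fi
}

# Once per repo that wants a friend: define both groups' rules.
gx groupwanted friend present
gx groupwanted depot "(include=* and (not (copies=friend:1))) or approxlackingcopies=1"
gx group "$DEPOT" depot
gx wanted "$DEPOT" groupwanted

# Once per friend: a special remote pointing at the friend's storage.
# type=directory and encryption=none are assumptions for this sketch.
gx initremote "$FRIEND" type=directory directory="$FRIEND_DIR" encryption=none
gx group "$FRIEND" friend
gx wanted "$FRIEND" groupwanted

# Once per batch of files taken from the friend: record which of the
# files in the current worktree the friend holds, without downloading.
gx fsck --from="$FRIEND" --fast --jobs=10
```

Keeping the dry run as the default makes it easy to review the exact commands before they touch the `git-annex` branch.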
rename tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn to tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn
diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn similarity index 100% rename from doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn rename to doc/tips/Friends_-_Connecting_Projects_to_Share_Files.mdwn diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment b/doc/tips/Friends_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment similarity index 100% rename from doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment rename to doc/tips/Friends_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment
comment
diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment new file mode 100644 index 0000000000..f58b964892 --- /dev/null +++ b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files/comment_1_8abe6074c55f81ee3643b508e742c6cd._comment @@ -0,0 +1,7 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-10-06T13:31:33Z" + content=""" +Passing --fast to fsck will prevent it needing to download the files. +"""]]
new idea, work in progress
diff --git a/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn new file mode 100644 index 0000000000..00048cb64e --- /dev/null +++ b/doc/tips/Acquaintances_-_Connecting_Projects_to_Share_Files.mdwn @@ -0,0 +1,72 @@ +# Acquaintances: Sharing Files through Connected Projects + +I often connect repos together during my scientific work, in which I like to use the [YODA (Datalad)](https://handbook.datalad.org/en/latest/_images/dataset_modules.svg) standard of connecting related projects via submodules. However, I've recently found that sometimes I have to connect an entire repo to, say, a paper just to use one resource. For the sake of provenance, this connection is essential, but it feels extremely inefficient and unscalable to have one repo filled with submodules just for individual files. + +For these specific instances, I'm devising an alternative solution: acquaintance repos. + +## Acquaintances are Unrelated Repos + +In general, an acquaintance is a repo whose *history* (branches, worktree, commits) is not relevant to the current repo, but is the origin for some files that the current repo uses. This is unlike *clones* (where everything is related), *parents/children* (where the entire child is derived or related to the parent, e.g. like superproject team repos and their children), or other [groups](https://git-annex.branchable.com/preferred_content/standard_groups/) defined by git-annex (archives, sources, etc.) + +This definition requires upholding some technical details: + +1. Acquaintances should **never sync**. This precludes defining them as normal git remotes unless you are very dilligent about undefining `remote.<name>.fetch` and setting `remote.<name>.sync=false` +1. Acquaintances don't need to know about *all* files in the acquaintance repo (neither in a git sense or annex sense), just the files used. 
Therefore `git annex filter-branch` is a bit overkill, but could be done manually via selecting exactly the keys needed. + +## Solution - A Special Remote with Custom Groups + +(`gx` is short for `git annex`) + +Define a special repo that points to the primary storage location for the acquaintance repo. +I like to define it with a name like `acq.X` so it's obvious by inspection that it's an acquaintance. +Other metadata also tells you this (`gx group acq.X` will list `acquaintance`, or something could be added to the description), +but being in the name makes it clear especially for e.g. `gx list`. + +### Depot: Primary Storage + +The depot is where a repo stores its *own* stuff. +This prevents others' stuff from being duplicated into the referencing repo. +For those familiar with the `client` group, `depot`s are just clients with acquaintances replacing archives. + +`gx groupwanted depot "(include=* and (not (copies=acquaintance:1))) or approxlackingcopies=1"` + +### Acquaintance + +The acquaintance is the source for stuff the current repo references. +Therefore, it doesn't need to be stored by the repo (i.e. in its depot) + +`gx groupwanted acquaintance present` + +### Finishing Up + +To actually register where acquaintance files are, the ideal way is `gx fsck`. +This is better than e.g. `gx filter-branch` mentioned above because it's automatic. +The default behavior of `fsck`, like other annex commands, is to check against files *in the current worktree*, +so it will only populate the metadata for a special remote about the files the current repo is trained to care about. + +`gx fsck -f acq.X -J 10` + +This may be a bit slow initially because it has to check each file in the worktree by seeking the remote, downloading known files, and verifying their hashes before they're registered as present in the new acquaintance. + +In short the process involves: + +1. For every external file desired by a repo: + 1. 
Copy the file (or a symlink) to the current repo and track it with annex + 1. Define a new special remote `acq.X` pointing to the depot/storage location for the file from the acquaintance repo. + 1. Assign the special remote with group `acquaintance` + 1. Assign any storage locations for the current remote with group `depot` + 1. Run `gx fsck -f acq.X` to populate the new special remote's contents relative to the current repo's worktree/branch + 1. Run `gx sync` if desired. The result should be files present in the current repo (if desired), and only in the acquaintance but not the depot(s). + 1. Now, the acquaintance acts as a link back to the origin for referenced files without duplication or having to add the entire acquaintance as a submodule! + +## FAQ/Open Questions + +1. Is there a way to define the custom groups globally, or will I have to re-define special groups in every repo that uses acquaitances/depots? + 1. Not sure yet. I wonder where custom groups could be defined globally? Maybe in the user `.gitconfig`. +1. Is there a way to get CLI autocomplete to suggest custom groups? + 1. Not sure yet. +1. Will this play well with standard groups and the assistant, especially if `client`s and `archive`s are used? + 1. Probably not, I don't use the assistant, but I suspect if one wanted to they'd have to define depots as clients with the acquantaince logic added instead of substituted for archives. + +<!-- Work in progress! Feel free to leave comments like this if you have questions about the final idea once I finish it. --> +<!-- Learning in Public: I've only just begun to use this for myself and am eliciting feedback and fleshing it out by describing it here (Feynmann Technique Style) -->
Added a comment: My config works now
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add/comment_1_132d155d5445745e5ee086370be48aad._comment b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add/comment_1_132d155d5445745e5ee086370be48aad._comment new file mode 100644 index 0000000000..2055ee769c --- /dev/null +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add/comment_1_132d155d5445745e5ee086370be48aad._comment @@ -0,0 +1,30 @@ +[[!comment format=mdwn + username="incogshift" + avatar="http://cdn.libravatar.org/avatar/fe527f5047693f6657cd03a6893da975" + subject="My config works now" + date="2025-10-04T08:05:00Z" + content=""" +I have `.gitattributes`: + +``` +* annex.largefiles=nothing filter=annex +*.pdf annex.largefiles=anything filter=annex +``` + +and git config: + +``` +[annex] + gitaddtoannex = true +``` + +Using `git add` now adds it to annex. This can be confirmed with + +``` +git annex info file.pdf +``` + +The output should show `present = true` at the end. If it wasn't added to annex, the output would show `fatal: Not a valid object name file.pdf`. + +And it seems that, by default, the files are stored in the working tree in their unlocked state. So `git add` doesn't replace the file with a symlink unlike `git annex add` +"""]]
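One way to check which `annex.largefiles` value git resolves for a path, independent of git-annex itself, is `git check-attr`. The sketch below uses a throwaway repository and the two attribute lines from the comment; the file names `file.pdf` and `notes.txt` are hypothetical and do not need to exist.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so no real configuration is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# The attribute lines from the comment above.
cat > .gitattributes <<'EOF'
* annex.largefiles=nothing filter=annex
*.pdf annex.largefiles=anything filter=annex
EOF

# Ask git which value applies to each (hypothetical) path.
pdf_attr=$(git check-attr annex.largefiles -- file.pdf)
txt_attr=$(git check-attr annex.largefiles -- notes.txt)
echo "$pdf_attr"
echo "$txt_attr"
```

When the attributes live in a global file instead, it may be worth running the same check there: git reads global attributes from the file named by `core.attributesFile` (by default `$XDG_CONFIG_HOME/git/attributes`), so a file at `~/.gitattributes` is not picked up unless that setting points to it.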
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn index 27c94248c1..128341f4ca 100644 --- a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -14,3 +14,18 @@ My config is the one below: *.pptx annex.largefiles=anything *.docx annex.largefiles=anything ``` + +I'm using NixOS. My git annex version info is below: + +``` +git annex version +git-annex version: 10.20250630 +build flags: Assistant Webapp Pairing Inotify DBus DesktopNotify TorrentParser MagicMime Servant Feeds Testsuite S3 WebDAV +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.8.4 http-client-0.7.19 persistent-sqlite-2.13.3.0 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar git-lfs httpalso borg rclone hook external compute mask +operating system: linux x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +local repository version: 10 +```
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn index c43d41edda..27c94248c1 100644 --- a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -1,4 +1,4 @@ -I set up `annex.largefiles` in my global `.gitattributes` config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. +I set up `annex.largefiles` in my global `.gitattributes` config. But git add doesn't add the defined large files to annex. But git annex works with large and small files as intended. My config is the one below:
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn index b357fcf5a7..c43d41edda 100644 --- a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -1,7 +1,8 @@ -I set up annex.largefiles in my global .gitattributes config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. +I set up `annex.largefiles` in my global `.gitattributes` config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. My config is the one below: +``` * annex.largefiles=nothing *.pdf annex.largefiles=anything *.mp4 annex.largefiles=anything @@ -12,3 +13,4 @@ My config is the one below: *.DOC annex.largefiles=anything *.pptx annex.largefiles=anything *.docx annex.largefiles=anything +```
diff --git a/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn new file mode 100644 index 0000000000..b357fcf5a7 --- /dev/null +++ b/doc/forum/annex.largefiles_doesn__39__t_work_for_git_add.mdwn @@ -0,0 +1,14 @@ +I set up annex.largefiles in my global .gitattributes config. But git add doesn't add the defined large files to annex. But git annex works with large files and small as intended. + +My config is the one below: + +* annex.largefiles=nothing +*.pdf annex.largefiles=anything +*.mp4 annex.largefiles=anything +*.mp3 annex.largefiles=anything +*.mkv annex.largefiles=anything +*.odt annex.largefiles=anything +*.wav annex.largefiles=anything +*.DOC annex.largefiles=anything +*.pptx annex.largefiles=anything +*.docx annex.largefiles=anything
comment
diff --git a/doc/todo/very_confusing_name_annex.assistant.allowunlocked/comment_1_b7ad0090e29776c61babbc7bf0ccd684._comment b/doc/todo/very_confusing_name_annex.assistant.allowunlocked/comment_1_b7ad0090e29776c61babbc7bf0ccd684._comment new file mode 100644 index 0000000000..73e3afccec --- /dev/null +++ b/doc/todo/very_confusing_name_annex.assistant.allowunlocked/comment_1_b7ad0090e29776c61babbc7bf0ccd684._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-10-02T17:32:52Z" + content=""" +I think that "annex.assistant.allowlocked" would be as confusing, like you +say the user would then have to RTFM to realize that they need to use +annex.addunlocked to configure it, and that it doesn't cause files to be +locked by default. + +To me, "treataddunlocked" is vague. Treat it as what? +"allowaddunlocked" would be less vague since it does get the (full) +name of the other config in there, so says it's allowing use of +the other config. + +I agree this is a confusing name, and I wouldn't mind changing it, but I +don't think it warrants an entire release to do that. So there would be +perhaps a month for people to start using the current name. If this had +come up in the 2 weeks between implementation and release I would have +changed it, but at this point it starts to need a backwards compatibility +transition to change it, and I don't know if the minor improvement of +"allowaddunlocked" is worth that. +"""]]
Added a comment
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_2_33575c4a6477e3384a16533ff8b258ee._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_2_33575c4a6477e3384a16533ff8b258ee._comment new file mode 100644 index 0000000000..54f8af3378 --- /dev/null +++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_2_33575c4a6477e3384a16533ff8b258ee._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="caleb@2b0d6f0eabf955cc8fd04c634b09f0ca4aad9233" + nickname="caleb" + avatar="http://cdn.libravatar.org/avatar/1d84382865c6c3378c04a35348fdfa07" + subject="comment 2" + date="2025-10-01T22:15:14Z" + content=""" +Thank you for the fix, that built just fine and I've successfully bumped the Arch Linux package to 20250929. +"""]]
Added a comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_8_b545d29519e57fbc2d563ce6d9aafdb7._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_8_b545d29519e57fbc2d563ce6d9aafdb7._comment new file mode 100644 index 0000000000..268f8cab57 --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_8_b545d29519e57fbc2d563ce6d9aafdb7._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 8" + date="2025-10-01T20:18:43Z" + content=""" +FTR: apparently sshfs is based on sftp, which provides no means to access the original inode. Not yet sure what that says about the stability of inodes across remounts, or whether it is sensible to rely on them. Useful ref with pointers [sshfs/issues/109#issuecomment-2755824670](https://github.com/libfuse/sshfs/issues/109#issuecomment-2755824670) +"""]]
complaining about choice of variable
diff --git a/doc/todo/very_confusing_name_annex.assistant.allowunlocked.mdwn b/doc/todo/very_confusing_name_annex.assistant.allowunlocked.mdwn new file mode 100644 index 0000000000..e58d392424 --- /dev/null +++ b/doc/todo/very_confusing_name_annex.assistant.allowunlocked.mdwn @@ -0,0 +1,8 @@ +Thank you for addressing that [todo](https://git-annex.branchable.com/todo/allow_configuring_assistant_to_add_files_locked/)! + +But I must say though that the choice of `annex.assistant.allowunlocked` is very confusing! Without careful RTFM it suggests that by default assistant **does not** `allowunlocked`, thus using `locked` and thus to the **opposite** effect of the default behavior. + +Since really it instructs assistant to consider `addunlocked`, then I would have named it like `treataddunlocked` or alike. +Or the smallest change to make it semantically sensible would have been to remove `un` from it and make `annex.assistant.allowlocked` thus allowing for `locked` files in general, which would then in reality (after RTFM) mean using `addunlocked` config. + +Just wanted to check if you stick to current choice before I start making use of it!
comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_7_9716fc56ccfb622c964a64b37c1c5fdc._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_7_9716fc56ccfb622c964a64b37c1c5fdc._comment new file mode 100644 index 0000000000..1036bbd920 --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_7_9716fc56ccfb622c964a64b37c1c5fdc._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 7""" + date="2025-10-01T15:35:53Z" + content=""" +I wonder how sshfs manages stable inodes that differ from the actual ones? +But if it's really reliably stable, it would be ok to use it with the +directory special remote. + +Extending the external special remote interface to support +[import](https://git-annex.branchable.com/design/external_special_remote_protocol/export_and_import_appendix/#index1h2) +would let you roll your own special remote, that could use ssh with +rsync or whatever. + +The current design for that tries to support both import and export, but +no one has yet stepped up to the plate to try to implement a special remote +that supports both safely. So I am leaning toward thinking that it would be +a good idea to make the external special remote interface support *only* +import (or export) for a given external special remote, but not both. + +Then it would become pretty easy to make your own special remote that +implements import only. Using whatever ssh commands make sense for the +server. +"""]] diff --git a/doc/todo/importtree_only_remotes.mdwn b/doc/todo/importtree_only_remotes.mdwn index 8f140c9450..2f9174b670 100644 --- a/doc/todo/importtree_only_remotes.mdwn +++ b/doc/todo/importtree_only_remotes.mdwn @@ -32,7 +32,9 @@ the wrong content. (So the remote should have retrievalSecurityPolicy = RetrievalVerifiableKeysSecure to make downloads be verified well enough.) 
I said this would not use a ContentIdentifier, but it seems it needs some -simple form of ContentIdentifier, which could be just an mtime. +simple form of ContentIdentifier, which could be just an mtime +(but mtime or mtime+size is not able to detect swaps of 2 files that share +both; using inode or something like that is better). Without any ContentIdentifier, it seems that each time `git annex import --from remote` is run, it would need to re-download all files from the remote, because it would have no way of knowing
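The point about swaps can be reproduced locally. The sketch below is an illustration only, assuming GNU coreutils (`stat -c`) and a filesystem with ordinary inode numbers; the file names are hypothetical. Two files with the same size and mtime are swapped by renaming, and the path is then compared using mtime+size and using the inode.

```shell
#!/bin/sh
set -eu

dir=$(mktemp -d)

# Two files with different content but the same size and mtime.
printf 'aaaa' > "$dir/one"
printf 'bbbb' > "$dir/two"
touch -t 202510010000.00 "$dir/one" "$dir/two"

sig() { stat -c '%Y %s' "$1"; }   # mtime and size
ino() { stat -c '%i' "$1"; }      # inode number

sig_before=$(sig "$dir/one")
ino_before=$(ino "$dir/one")

# Swap the two files by renaming; rename keeps each file's mtime.
mv "$dir/one" "$dir/tmp"
mv "$dir/two" "$dir/one"
mv "$dir/tmp" "$dir/two"

sig_after=$(sig "$dir/one")
ino_after=$(ino "$dir/one")

if [ "$sig_before" = "$sig_after" ]; then
  echo "mtime+size: swap not detected"
fi
if [ "$ino_before" != "$ino_after" ]; then
  echo "inode: swap detected"
fi
```

Whether a given remote exposes stable inodes at all is a separate question, as the sshfs discussion elsewhere in this feed shows.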
followup
diff --git a/doc/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/comment_11_48d03d7cc1a5e007d3d06d9753d467ff._comment b/doc/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/comment_11_48d03d7cc1a5e007d3d06d9753d467ff._comment new file mode 100644 index 0000000000..5e8d14000d --- /dev/null +++ b/doc/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/comment_11_48d03d7cc1a5e007d3d06d9753d467ff._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 11""" + date="2025-10-01T15:32:45Z" + content=""" +This did get implemented, `git config annex.assistant.allowunlocked true` +and that will make it use your `annex.addunlocked` setting. +"""]]
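As a minimal sketch of the configuration described in this comment, the two settings can be set together in a repository. A throwaway repository is used here, and `annex.addunlocked false` is an assumed choice for someone who wants the assistant to add files locked; no git-annex command is run.

```shell
#!/bin/sh
set -eu

# Throwaway repository, so no real configuration is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Let the assistant honor annex.addunlocked (per the comment above).
git config annex.assistant.allowunlocked true
# Assumed choice for this sketch: add files in locked form.
git config annex.addunlocked false

allow=$(git config --get annex.assistant.allowunlocked)
addunlocked=$(git config --get annex.addunlocked)
echo "annex.assistant.allowunlocked=$allow"
echo "annex.addunlocked=$addunlocked"
```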
Added a comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_6_abc34860aed11d274a91d3134b6a7040._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_6_abc34860aed11d274a91d3134b6a7040._comment new file mode 100644 index 0000000000..da3609a6fd --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_6_abc34860aed11d274a91d3134b6a7040._comment @@ -0,0 +1,34 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 6" + date="2025-10-01T13:09:00Z" + content=""" +quick check -- according to `ls` - original inodes are not mapped but some are given and persist across remounts: + +``` +❯ ls -li /tmp/glances-root.log ~/.emacs ~/20250807-15forzabava.pdf + 132280 lrwxrwxrwx 1 yoh yoh 17 Nov 11 2014 /home/yoh/.emacs -> .etc/emacs/.emacs +152278557 -rw-rw-r-- 1 yoh yoh 207101 Aug 7 10:30 /home/yoh/20250807-15forzabava.pdf + 34 -rw-r--r-- 1 root root 1165 Oct 1 08:43 /tmp/glances-root.log + +❯ sshfs localhost:/ /tmp/localhost + +❯ ls -li /tmp/localhost{/tmp/glances-root.log,/home/yoh/{.emacs,20250807-15forzabava.pdf}} + 6 lrwxrwxrwx 1 yoh yoh 17 Nov 11 2014 /tmp/localhost/home/yoh/.emacs -> .etc/emacs/.emacs +10 -rw-rw-r-- 1 yoh yoh 207101 Aug 7 10:30 /tmp/localhost/home/yoh/20250807-15forzabava.pdf + 3 -rw-r--r-- 1 root root 1165 Oct 1 08:43 /tmp/localhost/tmp/glances-root.log + +❯ fusermount -u /tmp/localhost + +❯ sshfs localhost:/ /tmp/localhost + +❯ ls -li /tmp/localhost{/tmp/glances-root.log,/home/yoh/{.emacs,20250807-15forzabava.pdf}} + 6 lrwxrwxrwx 1 yoh yoh 17 Nov 11 2014 /tmp/localhost/home/yoh/.emacs -> .etc/emacs/.emacs +10 -rw-rw-r-- 1 yoh yoh 207101 Aug 7 10:30 /tmp/localhost/home/yoh/20250807-15forzabava.pdf + 3 -rw-r--r-- 1 root root 1165 Oct 1 08:43 /tmp/localhost/tmp/glances-root.log + +``` + +ok, if not `sshfs` and not `rsync` -- any other way you see? e.g. could it be easily setup for some `git` with ssh URL type \"special\" remote? 
;-) +"""]]
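The inode check in the comment above can be repeated for any set of paths. A minimal sketch, assuming a POSIX shell; the helper names `inode_snapshot` and `same_inodes` are made up for illustration and are not part of git-annex or sshfs:

```shell
# Print "<inode> <path>" for each path given, one pair per line.
# Relies only on POSIX `ls -di`, so it can be pointed at a local
# filesystem or at a mounted one.
inode_snapshot() {
    for p in "$@"; do
        # `ls -di` prints "<inode> <name>"; keep only the first field.
        inode=$(ls -di -- "$p" | awk '{print $1}')
        printf '%s %s\n' "$inode" "$p"
    done
}

# Succeeds only when two snapshots are identical.
same_inodes() {
    [ "$1" = "$2" ]
}

# Example, using the mount point from the comment above:
#   before=$(inode_snapshot /tmp/localhost/tmp/glances-root.log)
#   fusermount -u /tmp/localhost && sshfs localhost:/ /tmp/localhost
#   after=$(inode_snapshot /tmp/localhost/tmp/glances-root.log)
#   same_inodes "$before" "$after" && echo "inodes persisted"
```

This only shows whether the numbers reported by the mount stay stable across a remount; it says nothing about whether they match the inodes on the remote side.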
comments
diff --git a/doc/todo/Recent_remote_activities/comment_4_766ce3ab6c4ff368ec8e06e6c6f6aa8e._comment b/doc/todo/Recent_remote_activities/comment_4_766ce3ab6c4ff368ec8e06e6c6f6aa8e._comment new file mode 100644 index 0000000000..7ca4788d07 --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_4_766ce3ab6c4ff368ec8e06e6c6f6aa8e._comment @@ -0,0 +1,23 @@ +[[!comment format=mdwn + username="joey" + subject="""git-annex activity""" + date="2025-09-30T14:29:54Z" + content=""" +Copying a related idea from @nobodyinperson on [[todo/remove_webapp]]: + +Furthermore, a command like `git annex activity` that goes arbitrarily far back in time and statically (non-live) lists recent activities like: + +- yesterday 23:32: remote1 downloaded 5 files (45MB) +- today 10:45: you modified file `document.txt` (10MB) +- today 10:46: you uploaded file `document.txt` (from today 10:45) to remote1, remote2 and remote3 +- today 12:35: Fred McGitFace modified file `document.txt` (12MB) and uploaded to remote2 +- ... + +Basically a human-readable (or as JSON), chronological log of things that happened in the repo. This is a superpower of git-annex: all this information is available as far back as one wants, we just don't have a way to access it nicely. `git log` and `git annex log` exist, but they are too specific, too broad or a bit hard to parse on their own. For example: + +- `git annex activity --since=\"2 weeks ago\" --include='*.doc'` would list things (who committed, which remote received it, etc.) 
that happened in the last two weeks to *.doc files +- `git annex activity --only-annex --in=remote2` would list recent annex operations (in the `git-annex` branch only) of remote2 +- `git annex activity --only-changes --largerthan=10MB` would list recent file changes (additions, modifications, deletions, etc., in `git log` only) + +This `git annex assistant-log` and `git annex activity` would be a very nice feature to showcase git-annex's power (which other file syncing tool can to this? 🤔) and also solve [[todo/Recent_remote_activities]]. +"""]] diff --git a/doc/todo/Recent_remote_activities/comment_5_1f4f43b32af276ef3b3db54fc2cb33f7._comment b/doc/todo/Recent_remote_activities/comment_5_1f4f43b32af276ef3b3db54fc2cb33f7._comment new file mode 100644 index 0000000000..ca7d2061b6 --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_5_1f4f43b32af276ef3b3db54fc2cb33f7._comment @@ -0,0 +1,11 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-30T14:31:59Z" + content=""" +A `git-annex activity` (or `git-annex log`) could also optionally stream live +activity as it is happening. Eg, when a transfer is started it could display +the start, and then later the end. That would be easy to build with what's +in git-annex already. The assistant already uses the transfer logs that way, +using inotify to notice changes. +"""]] diff --git a/doc/todo/Recent_remote_activities/comment_6_9e686c20ccd2c81f72f479441ca57698._comment b/doc/todo/Recent_remote_activities/comment_6_9e686c20ccd2c81f72f479441ca57698._comment new file mode 100644 index 0000000000..7f06dd5337 --- /dev/null +++ b/doc/todo/Recent_remote_activities/comment_6_9e686c20ccd2c81f72f479441ca57698._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: git-annex activity""" + date="2025-09-30T14:34:50Z" + content=""" +> `git annex activity --since="2 weeks ago" --include='*.doc' + +This is essentially the same as `git-annex log` with a path. 
It also +supports --since and --json. The difference I guess is the idea to also +include information about git commits of the files, not only git-annex +location changes. That would complicate the output, and apparently +`git-annex log`'s output is too hard to parse already. So a design for a +better output would be needed. + +> `git annex activity --only-annex --in=remote2` + +This is the same as `git-annex log --all` with the output filtered to only +list a given remote. (`--in` does not influence `--all` currently). + +> `git annex activity --only-changes --largerthan=10MB` + +Can probably be accomplished with `git log` with some +-S regexp. +"""]] diff --git a/doc/todo/remove_webapp/comment_4_d80ec1b3534ffa514df926925a0105f7._comment b/doc/todo/remove_webapp/comment_4_d80ec1b3534ffa514df926925a0105f7._comment new file mode 100644 index 0000000000..ec4a8b0ae1 --- /dev/null +++ b/doc/todo/remove_webapp/comment_4_d80ec1b3534ffa514df926925a0105f7._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-09-30T14:22:09Z" + content=""" +git-annex does support desktop notifications of file uploads/downloads, +via --notify-start and --notify-finish. (When built with dbus support.) +That can be used with the assistant w/o webapp to keep a desktop user +informed about what is going on. +"""]] diff --git a/doc/todo/remove_webapp/comment_5_75c22d9f3a84c259084468c03f5735bb._comment b/doc/todo/remove_webapp/comment_5_75c22d9f3a84c259084468c03f5735bb._comment new file mode 100644 index 0000000000..f1225bb40d --- /dev/null +++ b/doc/todo/remove_webapp/comment_5_75c22d9f3a84c259084468c03f5735bb._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-30T14:55:25Z" + content=""" +I've copied the `git-annex activity` idea over to +[[todo/Recent_remote_activities]] so it doesn't get lost. + +I don't think it makes sense to make that a blocker for removing the webapp +though. 
That would only let an advanced user build some kind of activity +display, doesn't address the needs of most users of the webapp. +"""]]
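The `git-annex log` equivalent mentioned in comment 6 above can be wrapped in a small helper. This is only a sketch: the function name `annex_activity` is hypothetical, and it covers location changes only, not git commits of the files:

```shell
# Show git-annex location changes for one path since a given date.
# Both arguments come from the caller; per the comment above,
# `git-annex log` accepts --since together with a path.
annex_activity() {
    since=$1 file=$2
    git annex log --since="$since" "$file"
}

# Example, run inside a git-annex repository:
#   annex_activity "2 weeks ago" document.txt
# For the --only-annex case, the same comment suggests
# `git annex log --all`, with the output filtered to one remote.
```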
Added a comment: Fixed in 20050929
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_7_4d6559666e8b53957ed93ffa5928cb00._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_7_4d6559666e8b53957ed93ffa5928cb00._comment new file mode 100644 index 0000000000..00264d5eba --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_7_4d6559666e8b53957ed93ffa5928cb00._comment @@ -0,0 +1,26 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Fixed in 20050929" + date="2025-09-29T21:54:14Z" + content=""" +Thanks for the very quick turn around on a new release! + +Conveniently HomeBrew also turned around building the new release quickly (I suspect it might be one of the packages in their CI for auto upgrade now), so I've been able to test the HomeBrew build of 20050929. + +20250929 seems to be working correctly to download podcast feeds, parse them, and download the media attachments as before. + +Ewen + +PS: Test example below. But also worked for my regular podcast downloads, which were failing with 20250926. + +``` +ewen@basadi:/tmp/retest$ TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}' +ewen@basadi:/tmp/retest$ git annex importfeed --relaxed --template=\"${TEMPLATE}\" \"https://risky.biz/feeds/risky-business\" +importfeed gathering known urls ok +importfeed https://risky.biz/feeds/risky-business (\"Risky Business\") ok +addurl https://dts.podtrac.com/redirect.mp3/media3.risky.biz/RB808.mp3 (to archive/Risky_Business/Risky_Business__808_--_Insane_megabug_in_Entra_left_all_tenants_exposed.mp3) ok +addurl https://dts.podtrac.com/redirect.mp3/media3.risky.biz/RB807.mp3 (to archive/Risky_Business/Risky_Business__807_--_Shai-Hulud_npm_worm_wreaks_old-school_havoc.mp3) ok +... +``` +"""]]
Revert "webapp: Remove support for local pairing"
This reverts commit 8ea6d7acc548cb35b4905c9c663e8a7de66ac752.
Temporarily, until builds finish for today's release.
diff --git a/Assistant.hs b/Assistant.hs index cd81895861..911ebd33d3 100644 --- a/Assistant.hs +++ b/Assistant.hs @@ -40,6 +40,9 @@ import Assistant.Threads.Glacier #ifdef WITH_WEBAPP import Assistant.WebApp import Assistant.Threads.WebApp +#ifdef WITH_PAIRING +import Assistant.Threads.PairListener +#endif #else import Assistant.Types.UrlRenderer #endif @@ -152,6 +155,11 @@ startDaemon assistant foreground startdelay cannotrun listenhost listenport star then webappthread else webappthread ++ [ watch commitThread +#ifdef WITH_WEBAPP +#ifdef WITH_PAIRING + , assist $ pairListenerThread urlrenderer +#endif +#endif , assist pushThread , assist pushRetryThread , assist exportThread diff --git a/Assistant/Pairing/MakeRemote.hs b/Assistant/Pairing/MakeRemote.hs new file mode 100644 index 0000000000..f4468bc07c --- /dev/null +++ b/Assistant/Pairing/MakeRemote.hs @@ -0,0 +1,98 @@ +{- git-annex assistant pairing remote creation + - + - Copyright 2012 Joey Hess <id@joeyh.name> + - + - Licensed under the GNU AGPL version 3 or higher. + -} + +module Assistant.Pairing.MakeRemote where + +import Assistant.Common +import Assistant.Ssh +import Assistant.Pairing +import Assistant.Pairing.Network +import Assistant.MakeRemote +import Assistant.Sync +import Config.Cost +import Config +import qualified Types.Remote as Remote + +import Network.Socket +import qualified Data.Text as T + +{- Authorized keys are set up before pairing is complete, so that the other + - side can immediately begin syncing. -} +setupAuthorizedKeys :: PairMsg -> OsPath -> IO () +setupAuthorizedKeys msg repodir = case validateSshPubKey $ remoteSshPubKey $ pairMsgData msg of + Left err -> giveup err + Right pubkey -> do + absdir <- absPath repodir + unlessM (liftIO $ addAuthorizedKeys True absdir pubkey) $ + giveup "failed setting up ssh authorized keys" + +{- When local pairing is complete, this is used to set up the remote for + - the host we paired with. 
-} +finishedLocalPairing :: PairMsg -> SshKeyPair -> Assistant () +finishedLocalPairing msg keypair = do + sshdata <- liftIO $ installSshKeyPair keypair =<< pairMsgToSshData msg + {- Ensure that we know the ssh host key for the host we paired with. + - If we don't, ssh over to get it. -} + liftIO $ unlessM (knownHost $ sshHostName sshdata) $ + void $ sshTranscript + [ sshOpt "StrictHostKeyChecking" "no" + , sshOpt "NumberOfPasswordPrompts" "0" + , "-n" + ] + (genSshHost (sshHostName sshdata) (sshUserName sshdata)) + ("git-annex-shell -c configlist " ++ T.unpack (sshDirectory sshdata)) + Nothing + r <- liftAnnex $ addRemote $ makeSshRemote sshdata + repo <- liftAnnex $ Remote.getRepo r + liftAnnex $ setRemoteCost repo semiExpensiveRemoteCost + syncRemote r + +{- Mostly a straightforward conversion. Except: + - * Determine the best hostname to use to contact the host. + - * Strip leading ~/ from the directory name. + -} +pairMsgToSshData :: PairMsg -> IO SshData +pairMsgToSshData msg = do + let d = pairMsgData msg + hostname <- liftIO $ bestHostName msg + let dir = case remoteDirectory d of + ('~':'/':v) -> v + v -> v + return SshData + { sshHostName = T.pack hostname + , sshUserName = Just (T.pack $ remoteUserName d) + , sshDirectory = T.pack dir + , sshRepoName = genSshRepoName hostname (toOsPath dir) + , sshPort = 22 + , needsPubKey = True + , sshCapabilities = [GitAnnexShellCapable, GitCapable, RsyncCapable] + , sshRepoUrl = Nothing + } + +{- Finds the best hostname to use for the host that sent the PairMsg. + - + - If remoteHostName is set, tries to use a .local address based on it. + - That's the most robust, if this system supports .local. + - Otherwise, looks up the hostname in the DNS for the remoteAddress, + - if any. May fall back to remoteAddress if there's no DNS. Ugh. 
-} +bestHostName :: PairMsg -> IO HostName +bestHostName msg = case remoteHostName $ pairMsgData msg of + Just h -> do + let localname = h ++ ".local" + addrs <- catchDefaultIO [] $ + getAddrInfo Nothing (Just localname) Nothing + maybe fallback (const $ return localname) (headMaybe addrs) + Nothing -> fallback + where + fallback = do + let a = pairMsgAddr msg + let sockaddr = case a of + IPv4Addr addr -> SockAddrInet (fromInteger 0) addr + IPv6Addr addr -> SockAddrInet6 (fromInteger 0) 0 addr 0 + fromMaybe (showAddr a) + <$> catchDefaultIO Nothing + (fst <$> getNameInfo [] True False sockaddr) diff --git a/Assistant/Pairing/Network.hs b/Assistant/Pairing/Network.hs new file mode 100644 index 0000000000..62a4ea02e8 --- /dev/null +++ b/Assistant/Pairing/Network.hs @@ -0,0 +1,132 @@ +{- git-annex assistant pairing network code + - + - All network traffic is sent over multicast UDP. For reliability, + - each message is repeated until acknowledged. This is done using a + - thread, that gets stopped before the next message is sent. + - + - Copyright 2012 Joey Hess <id@joeyh.name> + - + - Licensed under the GNU AGPL version 3 or higher. + -} + +module Assistant.Pairing.Network where + +import Assistant.Common +import Assistant.Pairing +import Assistant.DaemonStatus +import Utility.ThreadScheduler +import Utility.Verifiable + +import Network.Multicast +import Network.Info +import Network.Socket +import qualified Network.Socket.ByteString as B +import qualified Data.ByteString.UTF8 as BU8 +import qualified Data.Map as M +import Control.Concurrent + +{- This is an arbitrary port in the dynamic port range, that could + - conceivably be used for some other broadcast messages. + - If so, hope they ignore the garbage from us; we'll certainly + - ignore garbage from them. Wild wild west. -} +pairingPort :: PortNumber +pairingPort = 55556 + +{- Goal: Reach all hosts on the same network segment. + - Method: Use same address that avahi uses. 
Other broadcast addresses seem + - to not be let through some routers. -} +multicastAddress :: AddrClass -> HostName +multicastAddress IPv4AddrClass = "224.0.0.251" +multicastAddress IPv6AddrClass = "ff02::fb" + +{- Multicasts a message repeatedly on all interfaces, with a 2 second + - delay between each transmission. The message is repeated forever + - unless a number of repeats is specified. + - + - The remoteHostAddress is set to the interface's IP address. + - + - Note that new sockets are opened each time. This is hardly efficient, + - but it allows new network interfaces to be used as they come up. + - On the other hand, the expensive DNS lookups are cached. + -} +multicastPairMsg :: Maybe Int -> Secret -> PairData -> PairStage -> IO () +multicastPairMsg repeats secret pairdata stage = go M.empty repeats + where + go _ (Just 0) = noop + go cache n = do + addrs <- activeNetworkAddresses + let cache' = updatecache cache addrs + mapM_ (sendinterface cache') addrs + threadDelaySeconds (Seconds 2) + go cache' $ pred <$> n + {- The multicast library currently chokes on ipv6 addresses. -} + sendinterface _ (IPv6Addr _) = noop + sendinterface cache i = void $ tryIO $ (Diff truncated)
webapp: Remove support for local pairing
As a feature only supported by the webapp, and not by git-annex at the
command line, this is by now a very obscure corner of git-annex, and not
one I want to keep maintaining.
It's worth removing it to avoid the security exposure alone. People using
the assistant w/o the webapp probably don't expect it to be listening on
a UDP port for a handrolled protocol, but it was.
The webapp has supported pairing via magic-wormhole since 2016, which
can link any two computers, local ones included, albeit with the overhead
of Tor. That sort of covers the same use case. Of course advanced users
can easily enough add an ssh remote to their repository themselves, using
a hostname on the local network.
Sponsored-by: unqueued
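For the manual route mentioned above, a minimal sketch of adding an ssh remote by local hostname. The helper name `add_lan_remote` and the values `laptop`, `laptop.local` and `annex` are placeholders, not anything git-annex defines:

```shell
# Add an ssh remote for another computer on the local network.
# Arguments: remote name, hostname, and repository directory relative
# to the remote user's home directory.
add_lan_remote() {
    name=$1 host=$2 dir=$3
    git remote add "$name" "ssh://${host}/~/${dir}"
}

# Example, run inside an existing git-annex repository:
#   add_lan_remote laptop laptop.local annex
#   git annex sync laptop
```

This assumes ssh access to the other computer is already set up, and that the `.local` name resolves on the network in question.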
diff --git a/Assistant.hs b/Assistant.hs index 911ebd33d3..cd81895861 100644 --- a/Assistant.hs +++ b/Assistant.hs @@ -40,9 +40,6 @@ import Assistant.Threads.Glacier #ifdef WITH_WEBAPP import Assistant.WebApp import Assistant.Threads.WebApp -#ifdef WITH_PAIRING -import Assistant.Threads.PairListener -#endif #else import Assistant.Types.UrlRenderer #endif @@ -155,11 +152,6 @@ startDaemon assistant foreground startdelay cannotrun listenhost listenport star then webappthread else webappthread ++ [ watch commitThread -#ifdef WITH_WEBAPP -#ifdef WITH_PAIRING - , assist $ pairListenerThread urlrenderer -#endif -#endif , assist pushThread , assist pushRetryThread , assist exportThread diff --git a/Assistant/Pairing/MakeRemote.hs b/Assistant/Pairing/MakeRemote.hs deleted file mode 100644 index f4468bc07c..0000000000 --- a/Assistant/Pairing/MakeRemote.hs +++ /dev/null @@ -1,98 +0,0 @@ -{- git-annex assistant pairing remote creation - - - - Copyright 2012 Joey Hess <id@joeyh.name> - - - - Licensed under the GNU AGPL version 3 or higher. - -} - -module Assistant.Pairing.MakeRemote where - -import Assistant.Common -import Assistant.Ssh -import Assistant.Pairing -import Assistant.Pairing.Network -import Assistant.MakeRemote -import Assistant.Sync -import Config.Cost -import Config -import qualified Types.Remote as Remote - -import Network.Socket -import qualified Data.Text as T - -{- Authorized keys are set up before pairing is complete, so that the other - - side can immediately begin syncing. -} -setupAuthorizedKeys :: PairMsg -> OsPath -> IO () -setupAuthorizedKeys msg repodir = case validateSshPubKey $ remoteSshPubKey $ pairMsgData msg of - Left err -> giveup err - Right pubkey -> do - absdir <- absPath repodir - unlessM (liftIO $ addAuthorizedKeys True absdir pubkey) $ - giveup "failed setting up ssh authorized keys" - -{- When local pairing is complete, this is used to set up the remote for - - the host we paired with. 
-} -finishedLocalPairing :: PairMsg -> SshKeyPair -> Assistant () -finishedLocalPairing msg keypair = do - sshdata <- liftIO $ installSshKeyPair keypair =<< pairMsgToSshData msg - {- Ensure that we know the ssh host key for the host we paired with. - - If we don't, ssh over to get it. -} - liftIO $ unlessM (knownHost $ sshHostName sshdata) $ - void $ sshTranscript - [ sshOpt "StrictHostKeyChecking" "no" - , sshOpt "NumberOfPasswordPrompts" "0" - , "-n" - ] - (genSshHost (sshHostName sshdata) (sshUserName sshdata)) - ("git-annex-shell -c configlist " ++ T.unpack (sshDirectory sshdata)) - Nothing - r <- liftAnnex $ addRemote $ makeSshRemote sshdata - repo <- liftAnnex $ Remote.getRepo r - liftAnnex $ setRemoteCost repo semiExpensiveRemoteCost - syncRemote r - -{- Mostly a straightforward conversion. Except: - - * Determine the best hostname to use to contact the host. - - * Strip leading ~/ from the directory name. - -} -pairMsgToSshData :: PairMsg -> IO SshData -pairMsgToSshData msg = do - let d = pairMsgData msg - hostname <- liftIO $ bestHostName msg - let dir = case remoteDirectory d of - ('~':'/':v) -> v - v -> v - return SshData - { sshHostName = T.pack hostname - , sshUserName = Just (T.pack $ remoteUserName d) - , sshDirectory = T.pack dir - , sshRepoName = genSshRepoName hostname (toOsPath dir) - , sshPort = 22 - , needsPubKey = True - , sshCapabilities = [GitAnnexShellCapable, GitCapable, RsyncCapable] - , sshRepoUrl = Nothing - } - -{- Finds the best hostname to use for the host that sent the PairMsg. - - - - If remoteHostName is set, tries to use a .local address based on it. - - That's the most robust, if this system supports .local. - - Otherwise, looks up the hostname in the DNS for the remoteAddress, - - if any. May fall back to remoteAddress if there's no DNS. Ugh. 
-} -bestHostName :: PairMsg -> IO HostName -bestHostName msg = case remoteHostName $ pairMsgData msg of - Just h -> do - let localname = h ++ ".local" - addrs <- catchDefaultIO [] $ - getAddrInfo Nothing (Just localname) Nothing - maybe fallback (const $ return localname) (headMaybe addrs) - Nothing -> fallback - where - fallback = do - let a = pairMsgAddr msg - let sockaddr = case a of - IPv4Addr addr -> SockAddrInet (fromInteger 0) addr - IPv6Addr addr -> SockAddrInet6 (fromInteger 0) 0 addr 0 - fromMaybe (showAddr a) - <$> catchDefaultIO Nothing - (fst <$> getNameInfo [] True False sockaddr) diff --git a/Assistant/Pairing/Network.hs b/Assistant/Pairing/Network.hs deleted file mode 100644 index 62a4ea02e8..0000000000 --- a/Assistant/Pairing/Network.hs +++ /dev/null @@ -1,132 +0,0 @@ -{- git-annex assistant pairing network code - - - - All network traffic is sent over multicast UDP. For reliability, - - each message is repeated until acknowledged. This is done using a - - thread, that gets stopped before the next message is sent. - - - - Copyright 2012 Joey Hess <id@joeyh.name> - - - - Licensed under the GNU AGPL version 3 or higher. - -} - -module Assistant.Pairing.Network where - -import Assistant.Common -import Assistant.Pairing -import Assistant.DaemonStatus -import Utility.ThreadScheduler -import Utility.Verifiable - -import Network.Multicast -import Network.Info -import Network.Socket -import qualified Network.Socket.ByteString as B -import qualified Data.ByteString.UTF8 as BU8 -import qualified Data.Map as M -import Control.Concurrent - -{- This is an arbitrary port in the dynamic port range, that could - - conceivably be used for some other broadcast messages. - - If so, hope they ignore the garbage from us; we'll certainly - - ignore garbage from them. Wild wild west. -} -pairingPort :: PortNumber -pairingPort = 55556 - -{- Goal: Reach all hosts on the same network segment. - - Method: Use same address that avahi uses. 
Other broadcast addresses seem - - to not be let through some routers. -} -multicastAddress :: AddrClass -> HostName -multicastAddress IPv4AddrClass = "224.0.0.251" -multicastAddress IPv6AddrClass = "ff02::fb" - -{- Multicasts a message repeatedly on all interfaces, with a 2 second - - delay between each transmission. The message is repeated forever - - unless a number of repeats is specified. - - - - The remoteHostAddress is set to the interface's IP address. - - - - Note that new sockets are opened each time. This is hardly efficient, - - but it allows new network interfaces to be used as they come up. - - On the other hand, the expensive DNS lookups are cached. - -} -multicastPairMsg :: Maybe Int -> Secret -> PairData -> PairStage -> IO () -multicastPairMsg repeats secret pairdata stage = go M.empty repeats - where - go _ (Just 0) = noop - go cache n = do - addrs <- activeNetworkAddresses - let cache' = updatecache cache addrs - mapM_ (sendinterface cache') addrs - threadDelaySeconds (Seconds 2) - go cache' $ pred <$> n - {- The multicast library currently chokes on ipv6 addresses. -} - sendinterface _ (IPv6Addr _) = noop - sendinterface cache i = void $ tryIO $ (Diff truncated)
remove old assistant release notes
diff --git a/doc/assistant/release_notes.mdwn b/doc/assistant/release_notes.mdwn deleted file mode 100644 index 6c7c432de4..0000000000 --- a/doc/assistant/release_notes.mdwn +++ /dev/null @@ -1,422 +0,0 @@ -## version 6.20170101 - -XMPP support has been removed from the assistant in this release. - -If your repositories used XMPP to keep in sync, that will no longer -work, and you should enable some other remote to keep them in sync. -A ssh server is one way, or use the new Tor pairing feature. - -## version 5.20140421 - -This release begins to deprecate XMPP support. In particular, if you use -the assistant with a ssh remote that has this version of git-annex -installed, you don't need XMPP any longer to get immediate syncing of -changes. - -## version 5.20140411 - -This release fixes a bug that could cause the assistant to use a *lot* of -CPU, when monthly fscking was set up. - -Automatic upgrading was broken on OSX for previous versions. This has been -fixed, but you'll need to manually upgrade to this version to get it going -again. Workaround: Remove the wget bundled inside the git-annex dmg. - -## version 5.20140221 - -The Windows port of the assistant and webapp is now considered to be beta -quality. There are important missing features (notably Jabber), documented -on [[todo/windows_support]], but the webapp is broadly usable on Windows -now. - -## version 5.20131221 - -There is now a arm [[install/linux_standalone]] build of git-annex, -including the assistant and webapp, -which can be installed on a variety of systems including Raspberry Pi, -Synology NAS, and Google Chromebooks. Details in -[[this forum thread|forum/new_linux_arm_tarball_build]]. - -## version 5.20131213 - -The assistant can now be used on Windows! However, it has known problems, -described in [[todo/windows_support]], and should be considered an -alpha-level preview. 
- -## version 5.20131127 - -Starting with this version, when git-annex is installed from a build on -this website, it will detect when new versions are available, and allow -easily upgrading. Automatic upgrades can also be configured if desired, -or automatic upgrade checking can be disabled in the preferences page. - -git-annex builds from distributions, like Debian will not automatically -upgrade; use the distribution's package manager for that. However, the -git-annex webapp will also detect when a distribution has upgraded -git-annex and offer to restart the assistant. - -## version 4.20131024 - -This version fixes several different bugs that could cause the webapp to -refuse to create a repository. Several other bugs are also fixed, including -a bug that caused it to not add files on Android. - -New in this release is the ability to use the webapp to set up scheduled -consistency checks of your repositories. Many problems with repositories -are now automatically corrected, and it can even repair damaged git -repositories. - -This is a recommended upgrade. - -## version 4.20131002 - -Now you can use the webapp to set up an encrypted git repository on a -remote ssh server, or on rsync.net, and use it as a live cloud backup. Or, -use the webapp to make an encrypted git repository on a removable drive, -and store it offsite as a secure backup. - -## version 4.20130920 - -This release is the first to support fully encrypted git repositories -stored on removable drives. This can be set up easily using the webapp. - -## version 4.20130909 - -This release fixes a crash that could occur when using XMPP with the -assitant. It has only been seen on OS X so far. The bug is not believed to -be explitable, but upgrading is still recommended. - -## version 4.20130802 - -This release fixes several bugs, including a reversion introduced in the last -version that broke direct mode on Windows, Android, and other crippled -filesystems. 
It contains a workaround for a bug in recent git pre-releases -that broke handling of filenames containing spaces. -It is a highly recommended upgrade. - -The webapp can now detect repositories that did not finish getting properly set -up, and can recover from one common bug that broke local pairing and remote -ssh server setups on systems using `ssh-agent`. - -## version 4.20130723 - -This release fixes some bugs. Notably it fixes a bug that could result in data -loss when adding a tarball of a git-annex repository to your git-annex -repository. - -Rsync.net have committed to support git-annex and offer a special -discounted rate for git-annex users. -<http://www.rsync.net/products/git-annex-pricing.html> - -## version 4.20130709 - -This release is mostly bug fixes. - -One of the bugs involved setting up rsync remotes on servers other than -rsync.net. The wrong `.ssh/authorized_keys` line was deployed to the -remote server. If you set up a rsync remote with a past release, and it does -not work, you will need to manually edit the `.ssh/authorized_keys` file, -and remove the `command=` forced command. - -## version 4.20130621, 4.20130627 - -These releases mostly consist of bug fixes. - -## version 4.20130601 - -This is a bugfix release, featuring significant XMPP improvements and -more robustness thanks to automated fuzz testing. Recommended upgrade. - -This version changes its XMPP protocol, so it will fail to sync with older -git-annex versions over XMPP. - -## version 4.20130521 - -This is a bugfix release. Recommended upgrade. - -## version 4.20130516 - -This version contains numerous bug fixes, and improvements. - -This is the first release with a fully usable Android app. No command-line -typing needed to set up syncing to your Android phone or tablet! -A few of the more advanced features may not work (or not work reliably) -on Android. The Android app is still beta quality. - -This is also the first release with a Windows port! 
The Windows port -is in an alpha quality state, and is missing many features. -It does not yet include the assistant. - -## version 4.20130501 - -This version contains numerous bug fixes, and improvements. - -## version 4.20130417 - -This version contains numerous bug fixes, and improvements. - -One bug that was fixed can affect users of gnome-keyring who -have set up remote repositories on ssh servers using the webapp. -The gnome-keyring may load the restricted key that is set up -for that, and make it be used for regular logins to the server; -with the result that you'll get an error message about "git-annex-shell" -when sshing to the server. - -If you experience this problem you can fix it by -moving `.ssh/key.git-annex*` to `.ssh/git-annex/` (creating -that directory first), and edit `.ssh/config` to reflect the new -location of the key. You will also need to restart gnome-keyring. - -Another change relates to files in `archive/` directories. Client repositories -now sync these files between themselves like any other files, until -the files reach an archive repository. Only then are they removed from -the client repositories. So you need to ensure you have at least one -archive repository if you want to use the `archive/` directory feature. - -## version 4.20130323, 4.20130405 - -These versions continue fixing bugs and adding features. - -## version 4.20130314 - -This version makes a great many improvements and bugfixes, and is -a recommended upgrade. - -If you have already used the webapp to locally pair two computers, -a bug caused the paired repository to not be given an appropriate cost. -To fix this, go into the Repositories page in the webapp, and drag the -repository for the locally paired computer to come before any repositories -that it's more expensive to transfer data to. - -## version 4.20130227 - -This release fixes a bug with globbing that broke preferred content expressions. 
-So, it is a recommended upgrade from the previous release, which introduced (Diff truncated)
add news item for git-annex 10.20250929
diff --git a/doc/news/version_10.20250605.mdwn b/doc/news/version_10.20250605.mdwn deleted file mode 100644 index 5a9016e9f5..0000000000 --- a/doc/news/version_10.20250605.mdwn +++ /dev/null @@ -1,19 +0,0 @@ -git-annex 10.20250605 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * sync: Push the current branch first, rather than a synced branch, - to better support git forges (gitlab, gitea, forgejo, etc.) which - use push-to-create with the first pushed branch becoming the default - branch. - * Added annex.fastcopy and remote.name.annex-fastcopy config setting. - When set, this allows the copy\_file\_range syscall to be used, which - can eg allow for server-side copies on NFS. (For fastest copying, - also disable annex.verify or remote.name.annex-verify.) - * map: Support --json option. - * map: Improve display of remote names. - * When annex.freezecontent-command or annex.thawcontent-command is - configured but fails, prevent initialization. This allows the user to - fix their configuration and avoid crippled filesystem detection - entering an adjusted branch. - * assistant: Avoid hanging at startup when a process has a *.lock file - open in the .git directory. - * Windows: Fix duplicate file bug that could occur when files were - supposed to be moved across devices."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250929.mdwn b/doc/news/version_10.20250929.mdwn new file mode 100644 index 0000000000..4d46ac2cf1 --- /dev/null +++ b/doc/news/version_10.20250929.mdwn @@ -0,0 +1,7 @@ +git-annex 10.20250929 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * enableremote: Allow type= to be provided when it does not change the + type of the special remote. + * importfeed: Fix encoding issues parsing feeds when built with OsPath. + * Fix build with ghc 9.0.2. + * Remove the Servant build flag; always build with support for + annex+http urls and git-annex p2phttp."""]] \ No newline at end of file
Fix build with ghc 9.0.2.
diff --git a/CHANGELOG b/CHANGELOG index aa63a40a0e..a220b171d2 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -5,6 +5,7 @@ git-annex (10.20250926) UNRELEASED; urgency=medium * enableremote: Allow type= to be provided when it does not change the type of the special remote. * importfeed: Fix encoding issues parsing feeds when built with OsPath. + * Fix build with ghc 9.0.2. -- Joey Hess <id@joeyh.name> Thu, 25 Sep 2025 13:36:21 -0400 diff --git a/Utility/OpenFd.hs b/Utility/OpenFd.hs index 95f18085a6..62ce4ace91 100644 --- a/Utility/OpenFd.hs +++ b/Utility/OpenFd.hs @@ -14,6 +14,9 @@ module Utility.OpenFd where import System.Posix.IO.ByteString import System.Posix.Types +#if ! MIN_VERSION_unix(2,8,0) +import Control.Monad +#endif import Utility.RawFilePath diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn index 5ae44072f5..a0a7a2882c 100644 --- a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn +++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn @@ -11,3 +11,5 @@ Utility/OpenFd.hs:28:9: error: ``` I'm not sure this error is directly caused by the antiquated compiler, but also not sure how to debug this further or work around it either. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_1_b8ace7d676bdecfd0e3bb47331e48a13._comment b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_1_b8ace7d676bdecfd0e3bb47331e48a13._comment new file mode 100644 index 0000000000..1c923b26e1 --- /dev/null +++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error/comment_1_b8ace7d676bdecfd0e3bb47331e48a13._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-29T15:19:41Z" + content=""" +git-annex is still targeting supporting ghc back to 9.0.2, so your old +ghc should not yet be a problem. 
However, I don't have any CI left that +uses such old versions of ghc, so it might break from time to time. + +I've fixed this one, which was a missing `import Control.Monad`. Please +report if you find other build failures. +"""]]
response
diff --git a/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__/comment_1_9657a0979fae0b88f8a9b8fcdd2417de._comment b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__/comment_1_9657a0979fae0b88f8a9b8fcdd2417de._comment new file mode 100644 index 0000000000..07e7cefce5 --- /dev/null +++ b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__/comment_1_9657a0979fae0b88f8a9b8fcdd2417de._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-29T15:16:06Z" + content=""" +The inode cache is something git-annex uses internally to keep track of +changes to files. There are some known situations where it can get out of +date, including an upgrade from a v8 repository. Sometimes inodes change +for various reasons, like copying a repository from one filesystem to +another. So this just means that fsck has detected and updated the +information. I would not worry about it unless git-annex has other +unexpected behavior. +"""]]
comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_5_28462adcccadd9a51a3c714a30cec23a._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_5_28462adcccadd9a51a3c714a30cec23a._comment new file mode 100644 index 0000000000..3e240acd9c --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_5_28462adcccadd9a51a3c714a30cec23a._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-29T15:12:00Z" + content=""" +How efficient that would be would depend, I think, on how stable inodes are +between remounts of a sshfs mount. If it sees new inodes, it will +re-download all the files. +"""]]
add libghc-unbounded-delays-dev to debian/control deps
diff --git a/debian/control b/debian/control index 3ae6084609..a6911a9724 100644 --- a/debian/control +++ b/debian/control @@ -75,6 +75,7 @@ Build-Depends: libghc-optparse-applicative-dev (>= 0.11.0), libghc-torrent-dev, libghc-concurrent-output-dev, + libghc-unbounded-delays-dev, libghc-disk-free-space-dev, libghc-mountpoints-dev, libghc-magic-dev, diff --git a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn index f0e3c7b47c..e032d72f17 100644 --- a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn +++ b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn @@ -11,3 +11,5 @@ unbounded-delays ``` oddly we still built fine I believe for the http://github.com/datalad/git-annex where we also do not have that one I think + +> [[fixed|done]] presumably --[[Joey]] diff --git a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev/comment_1_2415e9fc5ff3a66bacc039f5476dc013._comment b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev/comment_1_2415e9fc5ff3a66bacc039f5476dc013._comment new file mode 100644 index 0000000000..b670ce5a25 --- /dev/null +++ b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev/comment_1_2415e9fc5ff3a66bacc039f5476dc013._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-29T15:07:41Z" + content=""" +git-annex has a dependency on unbounded-delays, listed in git-annex.cabal. + +Nothing has changed here since 2024 when it stopped vendoring part of that +library and added the dependency. + +I do see that the debian/control shipped with git-annex was missing that +dep, I've added it and I *guess* that will fix your problem +"""]]
don't set locale encoding when opening binary file
importfeed: Fix encoding issues parsing feeds when built with OsPath.
diff --git a/CHANGELOG b/CHANGELOG index d61836f7c7..aa63a40a0e 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -4,6 +4,7 @@ git-annex (10.20250926) UNRELEASED; urgency=medium annex+http urls and git-annex p2phttp. * enableremote: Allow type= to be provided when it does not change the type of the special remote. + * importfeed: Fix encoding issues parsing feeds when built with OsPath. -- Joey Hess <id@joeyh.name> Thu, 25 Sep 2025 13:36:21 -0400 diff --git a/Utility/FileIO/CloseOnExec.hs b/Utility/FileIO/CloseOnExec.hs index 3d1bb739f7..1a91add1e7 100644 --- a/Utility/FileIO/CloseOnExec.hs +++ b/Utility/FileIO/CloseOnExec.hs @@ -3,9 +3,9 @@ - All functions have been modified to set the close-on-exec - flag to True. - - - Also, functions that return a Handle have been modified to - - use the locale encoding, working around this bug: - - https://github.com/haskell/file-io/issues/45 + - Also, functions that return a Handle (for a non-binary file) + - have been modified to use the locale encoding, working around + - this bug: https://github.com/haskell/file-io/issues/45 - - Copyright 2025 Joey Hess <id@joeyh.name> - Copyright 2024 Julian Ospald @@ -70,12 +70,12 @@ openFile osfp iomode = augmentError "openFile" osfp $ withBinaryFile :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withBinaryFile osfp iomode act = (augmentError "withBinaryFile" osfp - $ withOpenFileEncoding osfp iomode True False closeOnExec (try . act) True) + $ withOpenFile' osfp iomode True False closeOnExec (try . 
act) True) >>= either ioError pure openBinaryFile :: OsPath -> IOMode -> IO Handle openBinaryFile osfp iomode = augmentError "openBinaryFile" osfp $ - withOpenFileEncoding osfp iomode True False closeOnExec pure False + withOpenFile' osfp iomode True False closeOnExec pure False readFile :: OsPath -> IO BSL.ByteString readFile fp = withFileNoEncoding' fp ReadMode BSL.hGetContents diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn index a0ab387188..e6ee11eb86 100644 --- a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn @@ -96,3 +96,5 @@ And it seems a fairly recent breakage, as IIRC the previous installed was from 2 ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) Yes, for many years. git-annex has worked vey well for downloading/collecting podcasts for years, which is why t was surprising it's suddenly failing like this. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_6_9c6851e659c977eb5106dcd83ea7765a._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_6_9c6851e659c977eb5106dcd83ea7765a._comment new file mode 100644 index 0000000000..c11f288b81 --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_6_9c6851e659c977eb5106dcd83ea7765a._comment @@ -0,0 +1,18 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2025-09-29T14:45:50Z" + content=""" +Thanks for some really good detective work @ewen. 
+ +Note that this only happens when git-annex is built with the OsPath build +flag. + +That seems to indicate that the problem is in +Utility.FileIO.openBinaryFile, +which is the only way that parseFeedFromFile' varies depending on that +build flag. + +Aha yes, the problem is that it uses withOpenFileEncoding, which is +inappropriate for a binary file! +"""]]
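The diagnosis above (a binary file opened through an encoding-aware path) can be illustrated with a small sketch. This is only an analogy in Python, not git-annex's Haskell code: the feed fragment, the file name `feed.xml`, and the helper names `largest_unit` and `fits_in_bytes` are all made up for illustration. It shows that reading the same file without decoding yields values that fit in 0..255, while reading it through a UTF-8 decoder yields a code point (8217) that does not.

```python
import os
import tempfile

# Hypothetical feed fragment (not taken from a real feed); the bytes
# e2 80 99 are the UTF-8 encoding of U+2019, a right single quotation mark.
FEED_BYTES = b"<title>this week\xe2\x80\x99s show</title>"


def largest_unit(data):
    """Return the largest element: a byte value for bytes, a code point for str."""
    if isinstance(data, bytes):
        return max(data)
    return max(ord(ch) for ch in data)


def fits_in_bytes(data):
    """True when every element can be stored in one unsigned 8-bit value."""
    return largest_unit(data) <= 255


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "feed.xml")
    with open(path, "wb") as handle:
        handle.write(FEED_BYTES)

    # Binary mode: no decoding, so the content stays a sequence of bytes.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Text mode with an encoding: the three bytes collapse into one character.
    with open(path, "r", encoding="utf-8") as handle:
        decoded = handle.read()

print(fits_in_bytes(raw), largest_unit(raw))          # True 226
print(fits_in_bytes(decoded), largest_unit(decoded))  # False 8217
```

Any consumer that expects bytes but is handed the decoded characters has to fail or truncate on the 8217 value, which is consistent with the `tag (8217) is outside of bounds (0,255)` message quoted in the bug report.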
Added a comment: Cross link to importfeed parsing
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_26_6552491d65593df8346a764cb1cd3709._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_26_6552491d65593df8346a764cb1cd3709._comment new file mode 100644 index 0000000000..7f32b84e2e --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_26_6552491d65593df8346a764cb1cd3709._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Cross link to importfeed parsing" + date="2025-09-28T22:49:31Z" + content=""" +As a cross link, the changes in [comment 8 on this bug](http://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c) seem to have changed the feed parsing from binary mode to decoding UTF-8, which appears to be breaking on feeds which actually contain UTF-8 (eg, smart quotes, smart dashes, etc). + +See [comment on bug about importfeed breaking on `toEnum` out of range](http://git-annex.branchable.com/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/#comment-dbdafeb801ad23e2ccb3c2aa066a4efb) (where it took me a while to figure out what the root cause was). + +Ewen + + +"""]]
Added a comment: Feed seems to now be parsed as UTF-8 characters, not binary mode
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_5_9982bda0b8b224edd2300083f7e1ec00._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_5_9982bda0b8b224edd2300083f7e1ec00._comment new file mode 100644 index 0000000000..56b0b23315 --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_5_9982bda0b8b224edd2300083f7e1ec00._comment @@ -0,0 +1,31 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Feed seems to now be parsed as UTF-8 characters, not binary mode" + date="2025-09-28T22:42:32Z" + content=""" +I think the relevant change is likely to be: + +``` +* feed (update: parseFeedFromFile uses openBinaryFile, updated git-annex to open + the file itself instead) +``` + +from [https://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c](https://git-annex.branchable.com/bugs/35_failed_tests_on_beegfs/#comment-d7e4cf0592937215e3acd3c08c03288c) + +Based on the fact that's a 2025-09-04 change (so since previous release), refers to `parseFeedFromFile`, and the relevant commit seems to be: + +[http://source.git-annex.branchable.com/?p=source.git;a=commit;h=2b1e9eced2fe825c882b4e9549a3a12f41d08055](http://source.git-annex.branchable.com/?p=source.git;a=commit;h=2b1e9eced2fe825c882b4e9549a3a12f41d08055) + +and particular in this file: + 
+[http://source.git-annex.branchable.com/?p=source.git;a=blobdiff;f=Command/ImportFeed.hs;h=e36e72370204ece44a05bfae5954272a46f34f5c;hp=7b66a2b5077613b7e33dc8597a8272e7fdea7102;hb=2b1e9eced2fe825c882b4e9549a3a12f41d08055;hpb=56cd59a9f4e24c5a6842179e0da9180875d837cc](http://source.git-annex.branchable.com/?p=source.git;a=blobdiff;f=Command/ImportFeed.hs;h=e36e72370204ece44a05bfae5954272a46f34f5c;hp=7b66a2b5077613b7e33dc8597a8272e7fdea7102;hb=2b1e9eced2fe825c882b4e9549a3a12f41d08055;hpb=56cd59a9f4e24c5a6842179e0da9180875d837cc) + +My reading of that code is that the feed parsing switched from (implicitly) \"just bytes\" (`openBinaryFile`) to decoding UTF-8 into full UTF-8 characters, but there's either (a) something in the later git-annex code or (b) the XML parser that does not expect to receive non-ASCII Unicode characters resulting from opening in \"character\" mode rather than \"binary\" mode, resulting in out of range values. + +Which results in the crash on encountering the first non-ASCII character in the feed :-/ + +It's not clear to me why in fixing \"set close-on-exec bit on open files\" the feed parsing was changed from bytes (binary mode) to decoded characters. But it appears it wasn't tested on feeds where the text has been through a wordprocessor throwing in smart quotes and smart dashes and the like all over the place. + +Ewen +"""]]
Added a comment: importfeed: utf-8 XML is (now?) parsed into 8-bit characters
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_4_3ee57c43594f381747b8463b8acadb9f._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_4_3ee57c43594f381747b8463b8acadb9f._comment new file mode 100644 index 0000000000..fb5436d072 --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_4_3ee57c43594f381747b8463b8acadb9f._comment @@ -0,0 +1,69 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="importfeed: utf-8 XML is (now?) parsed into 8-bit characters" + date="2025-09-28T22:24:23Z" + content=""" +Based on looking at some examples, I'm fairly convinced that the podcast feeds are now being parsed into 8 bit characters (extended ASCII?), even when (only when?) they have `encoding=\"UTF-8\"` on the `<?xml ...?>` prelude tag. UTF-8 decoding can obviously can easily result in characters outside the 8-bit range, which seems to be the exception thrown, based on examining the feed contents (below) and the \"tag\" values outside range. + +8217 == 0x2019 (in hex). + +And [U+2019](https://www.compart.com/en/unicode/U+2019) is a single quotation mark, which encodes in UTF-8 as `0xE2 0x80 0x99`. 
+ +The first problematic feed is littered with that exact byte sequence: + +``` +ewen@basadi:/tmp$ curl -s https://risky.biz/feeds/risky-business/ | head -1 +<?xml version=\"1.0\" encoding=\"utf-8\" ?> +ewen@basadi:/tmp$ +``` + +``` +ewen@basadi:/tmp$ curl -s https://risky.biz/feeds/risky-business/ | hexdump -C | grep \"e2 80 99\" | head +000008b0 65 65 6b e2 80 99 73 20 73 68 6f 77 20 50 61 74 |eek...s show Pat| +00000a20 20 77 65 65 6b e2 80 99 73 20 65 70 69 73 6f 64 | week...s episod| +00000a60 e2 80 99 73 20 73 70 6f 6e 73 6f 72 20 69 6e 74 |...s sponsor int| +00000bf0 20 74 68 65 20 77 65 65 6b e2 80 99 73 20 63 79 | the week...s cy| +00000d60 20 77 65 65 6b e2 80 99 73 20 65 70 69 73 6f 64 | week...s episod| +00000da0 e2 80 99 73 20 73 70 6f 6e 73 6f 72 20 69 6e 74 |...s sponsor int| +00001580 65 e2 80 9d 20 69 73 6e e2 80 99 74 20 74 68 65 |e... isn...t the| +00001c20 e2 80 99 20 61 73 20 73 75 70 70 6c 69 65 72 20 |... as supplier | +00002290 20 74 68 69 73 20 77 65 65 6b e2 80 99 73 20 73 | this week...s s| +000022d0 65 6b e2 80 99 73 20 63 79 62 65 72 73 65 63 75 |ek...s cybersecu| +ewen@basadi:/tmp$ +``` + +Another of the problematic feeds (reported as 8211; see first post) has lots of the UTF-8 sequence `e2 80 93` for [U+2013](https://www.compart.com/en/unicode/U+2013) (an en dash), and 8211 == 0x2013: + +``` +ewen@basadi:/tmp$ curl -s https://theamphour.libsyn.com/rss | hexdump -C | grep \" e2 80 \" | head +0001e800 31 39 36 20 e2 80 93 20 41 6e 20 49 6e 74 65 72 |196 ... An Inter| +0001e860 31 39 36 20 e2 80 93 20 41 6e 20 49 6e 74 65 72 |196 ... An Inter| +0003e510 68 74 3d 22 30 22 3e 4c 6f 61 64 69 6e 67 e2 80 |ht=\"0\">Loading..| +0003f660 3e 20 3c 70 3e 4c 6f 61 64 69 6e 67 e2 80 a6 20 |> <p>Loading... | +00052440 6d 70 20 48 6f 75 72 20 23 33 37 39 20 e2 80 93 |mp Hour #379 ...| +0007a7d0 e2 80 93 20 4f 73 74 72 6f 62 6f 67 75 6c 6f 75 |... Ostrobogulou| +00088480 72 20 23 38 33 20 e2 80 94 20 41 67 67 72 61 76 |r #83 ... 
Aggrav| +00088b40 41 6d 70 20 48 6f 75 72 20 23 38 32 20 e2 80 94 |Amp Hour #82 ...| +000891e0 20 23 38 31 20 e2 80 94 20 4a 65 72 73 65 79 20 | #81 ... Jersey | +000898a0 30 20 e2 80 94 20 4f 74 69 6f 73 65 20 4f 6e 74 |0 ... Otiose Ont| +ewen@basadi:/tmp$ +``` + +``` +ewen@basadi:/tmp$ curl -s https://theamphour.libsyn.com/rss | head -1 +<?xml version=\"1.0\" encoding=\"UTF-8\"?> +ewen@basadi:/tmp$ +``` + +The working feed appears to have no non-ASCII characters in it: + +``` +ewen@basadi:/tmp$ curl -s 'https://www.2600.com/oth-broadband.xml' | hexdump -C | grep ' [89abcdef][0-9a-f] ' +ewen@basadi:/tmp$ +``` + +So it appears non-ASCII UTF-8 encoding is required to trigger this problem. + +Ewen +"""]]
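The arithmetic in the comment above (decimal tag, hex code point, UTF-8 byte sequence) can be double-checked with a short sketch. The helper names `utf8_hex` and `has_non_ascii` are hypothetical, chosen only for this illustration. The second helper mirrors the intent of the `grep ' [89abcdef][0-9a-f] '` check, under the assumption that "non-ASCII" means any byte with the high bit set.

```python
# Tag values copied from the error messages quoted in the bug report.
REPORTED_TAGS = [8217, 8211]


def utf8_hex(code_point):
    """Return the UTF-8 encoding of a code point as space-separated hex bytes."""
    encoded = chr(code_point).encode("utf-8")
    return " ".join(f"{byte:02x}" for byte in encoded)


def has_non_ascii(data):
    """True when the byte string contains any byte with the high bit set."""
    return any(byte >= 0x80 for byte in data)


for tag in REPORTED_TAGS:
    print(f"{tag} = U+{tag:04X} -> {utf8_hex(tag)}")
# 8217 = U+2019 -> e2 80 99
# 8211 = U+2013 -> e2 80 93
```

For example, `has_non_ascii(b"week\xe2\x80\x99s")` is true, while a feed containing only ASCII bytes gives false, matching the observation that the one working feed had no bytes in the 0x80..0xff range.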
Added a comment: Example still working feed
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_3_f9d976fc829826401838b285698e22ee._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_3_f9d976fc829826401838b285698e22ee._comment new file mode 100644 index 0000000000..b3763d3fcb --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_3_f9d976fc829826401838b285698e22ee._comment @@ -0,0 +1,109 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Example still working feed" + date="2025-09-28T22:05:57Z" + content=""" +My *hunch* is that this error occurs during *parsing* the feed XML, based on not getting to the feed *title* and \"ok\" being displayed in the error case. But I'm not sure if there's a specific way to test just that. + +Example of a podcast feed that still works: + +https://www.2600.com/oth-broadband.xml + +There's no redirect on this one, and the `Content-Type` header has an explicit `charset=utf-8`, but so far I don't know if that matters. + +The failing feed has `encoding=\"utf-8\"` in the `<?xml ...?>` header of the file, which in theory is functionally equivalent in terms of XML communicating how to expect the file to be encoded. But maybe git-annex is not treating that the same any longer? 
+ +``` +ewen@basadi:/tmp/podcasts$ git annex importfeed --relaxed \"https://www.2600.com/oth-broadband.xml\" +importfeed gathering known urls ok +importfeed https://www.2600.com/oth-broadband.xml (\"Off The Hook\") ok +ewen@basadi:/tmp/podcasts$ +``` + +second import attempt above, matching what my podcast downloads normally do; the first one was also `--relaxed` but with `--debug` and the debug output is quote long, so here's just the start of it, showing it got a lot further than the feeds that don't work: + +``` +ewen@basadi:/tmp/podcasts$ git annex importfeed --debug --relaxed \"https://www.2600.com/oth-broadband.xml\" +[2025-09-29 10:57:01.984117] (Utility.Process) process [8003] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"] +[2025-09-29 10:57:01.99142] (Utility.Process) process [8003] done ExitSuccess +[2025-09-29 10:57:01.992598] (Utility.Process) process [8004] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"] +[2025-09-29 10:57:01.999387] (Utility.Process) process [8004] done ExitSuccess +[2025-09-29 10:57:02.00066] (Utility.Process) process [8005] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"] +importfeed gathering known urls [2025-09-29 10:57:02.01013] (Utility.Process) process [8006] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"rev-parse\",\"--verify\",\"--quiet\",\"refs/heads/git-annex:\"] +[2025-09-29 10:57:02.016994] (Utility.Process) process [8006] done ExitSuccess +ok +importfeed https://www.2600.com/oth-broadband.xml [2025-09-29 10:57:02.101169] (Utility.Url) Request { + host = \"www.2600.com\" + port = 443 + secure = True + requestHeaders = [(\"Accept-Encoding\",\"identity\"),(\"User-Agent\",\"git-annex/10.20250925\")] + path = 
\"/oth-broadband.xml\" + queryString = \"\" + method = \"GET\" + proxy = Nothing + rawBody = False + redirectCount = 10 + responseTimeout = ResponseTimeoutDefault + requestVersion = HTTP/1.1 + proxySecureMode = ProxySecureWithConnect +} + +(\"Off The Hook\") ok +addurl https://download.2600.com/mediadownload/www.2600.com/offthehook/mp3files/2025/off_the_hook__20250924-128.mp3 [2025-09-29 10:57:02.798482] (Utility.Process) process [8025] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"-c\",\"annex.debug=true\",\"check-ignore\",\"-z\",\"--stdin\",\"--verbose\",\"--non-matching\"] +(to Off_The_Hook/Off_The_Hook__-_Wed__24_Sep_2025_19_00_00_EST.mp3) [2025-09-29 10:57:02.807045] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.web +[2025-09-29 10:57:02.808092] (Annex.Branch) set 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.web +[2025-09-29 10:57:02.808366] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log +[2025-09-29 10:57:02.809307] (Annex.Branch) set 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log +[2025-09-29 10:57:02.809547] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log +[2025-09-29 10:57:02.809684] (Messages.explain) [ Off_The_Hook/Off_The_Hook__-_Wed__24_Sep_2025_19_00_00_EST.mp3 does not match annex.addunlocked: nothing[FALSE] ] + +[2025-09-29 10:57:02.810562] (Utility.Process) process [8026] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"symbolic-ref\",\"-q\",\"HEAD\"] +[2025-09-29 10:57:02.816781] (Utility.Process) process [8026] done ExitSuccess +[2025-09-29 10:57:02.817626] (Utility.Process) process [8027] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"refs/heads/main\"] +[2025-09-29 10:57:02.824386] 
(Utility.Process) process [8027] done ExitFailure 1 +[2025-09-29 10:57:02.825815] (Utility.Process) process [8028] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"hash-object\",\"-w\",\"--no-filters\",\"--stdin-paths\"] +[2025-09-29 10:57:02.833619] (Annex.Branch) read 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.met +[2025-09-29 10:57:02.834642] (Annex.Branch) set 0d7/832/URL--https&c%%download.2600.com%media-1b3961da2b715a143256fcc3b5e6313a.log.met +ok +... +(recording state in git...) +[2025-09-29 10:57:03.027308] (Utility.Process) process [8047] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"] +[2025-09-29 10:57:03.033836] (Utility.Process) process [8047] done ExitSuccess +[2025-09-29 10:57:03.03538] (Utility.Process) process [8048] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-index\",\"-z\",\"--index-info\"] +[2025-09-29 10:57:03.052124] (Utility.Process) process [8048] done ExitSuccess +[2025-09-29 10:57:03.052815] (Utility.Process) process [8049] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"] +[2025-09-29 10:57:03.060486] (Utility.Process) process [8049] done ExitSuccess +[2025-09-29 10:57:03.061529] (Utility.Process) process [8050] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"write-tree\"] +[2025-09-29 10:57:03.080761] (Utility.Process) process [8050] done ExitSuccess +[2025-09-29 10:57:03.081517] (Utility.Process) process [8051] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"commit-tree\",\"6333ccc39e4da89afeb821dcb39b6ef3ba84c936\",\"--no-gpg-sign\",\"-p\",\"refs/heads/git-annex\",\"-m\",\"update\"] 
+[2025-09-29 10:57:03.090011] (Utility.Process) process [8051] done ExitSuccess +[2025-09-29 10:57:03.090729] (Utility.Process) process [8052] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"update-ref\",\"refs/heads/git-annex\",\"1b74c6e0e12093128fe6f6ba4c895e9115df3015\"] +[2025-09-29 10:57:03.099106] (Utility.Process) process [8052] done ExitSuccess +[2025-09-29 10:57:03.102583] (Utility.Process) process [8005] done ExitSuccess +[2025-09-29 10:57:03.102969] (Utility.Process) process [8028] done ExitSuccess +[2025-09-29 10:57:03.103307] (Utility.Process) process [8025] done ExitFailure 1 +``` + +``` +ewen@basadi:/tmp$ curl --head https://www.2600.com/oth-broadband.xml +HTTP/1.1 200 OK +Date: Sun, 28 Sep 2025 21:56:26 GMT +Server: Apache +X-Content-Type-Options: nosniff +X-Drupal-Cache: HIT +Etag: \"1759088060-0\" +Content-Language: en +X-Frame-Options: SAMEORIGIN +Cache-Control: public, max-age=0 +Last-Modified: Sun, 28 Sep 2025 19:34:20 GMT +Expires: Sun, 19 Nov 1978 05:00:00 GMT +Vary: Cookie,Accept-Encoding +Content-Type: application/rss+xml; charset=utf-8 +Strict-Transport-Security: max-age=16070400; +orig_req_proto: https +Connection: close + +ewen@basadi:/tmp$ +``` +"""]]
Added a comment: Debug output
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_2_0a790f8fd42304f17887536102af09d4._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_2_0a790f8fd42304f17887536102af09d4._comment new file mode 100644 index 0000000000..c39b2ba71b --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_2_0a790f8fd42304f17887536102af09d4._comment @@ -0,0 +1,70 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Debug output" + date="2025-09-28T21:58:18Z" + content=""" +Having found `--debug` (by trying to scan the source; I barely know Haskell, but found almost no *explicit* `toEnum` and none that have changed in the last month AFAICT), it does seem like it's getting as far as downloading the feed URL contents, and then failing (presumably on doing something about parsing it). 
+ +``` +ewen@basadi:/tmp/podcasts$ git annex importfeed --debug \"https://risky.biz/feeds/risky-business\" +[2025-09-29 10:51:54.947712] (Utility.Process) process [7859] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"git-annex\"] +[2025-09-29 10:51:54.954732] (Utility.Process) process [7859] done ExitSuccess +[2025-09-29 10:51:54.955442] (Utility.Process) process [7860] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"] +[2025-09-29 10:51:54.962048] (Utility.Process) process [7860] done ExitSuccess +[2025-09-29 10:51:54.963011] (Utility.Process) process [7861] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"cat-file\",\"--batch\"] +importfeed gathering known urls [2025-09-29 10:51:54.97316] (Utility.Process) process [7862] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"-c\",\"annex.debug=true\",\"rev-parse\",\"--verify\",\"--quiet\",\"refs/heads/git-annex:\"] +[2025-09-29 10:51:54.980308] (Utility.Process) process [7862] done ExitSuccess +ok +importfeed https://risky.biz/feeds/risky-business [2025-09-29 10:51:55.067656] (Utility.Url) Request { + host = \"risky.biz\" + port = 443 + secure = True + requestHeaders = [(\"Accept-Encoding\",\"identity\"),(\"User-Agent\",\"git-annex/10.20250925\")] + path = \"/feeds/risky-business\" + queryString = \"\" + method = \"GET\" + proxy = Nothing + rawBody = False + redirectCount = 10 + responseTimeout = ResponseTimeoutDefault + requestVersion = HTTP/1.1 + proxySecureMode = ProxySecureWithConnect +} + + +git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255) +failed +[2025-09-29 10:51:57.214034] (Utility.Process) process [7861] done ExitSuccess +importfeed: 1 failed +ewen@basadi:/tmp/podcasts$ +``` + +Request headers (via `curl`; note there's a 301 redirect, but asking 
`git-annex` to download the version at the end of the redirect doesn't change the `git-annex` symptoms): + +``` +ewen@basadi:/tmp$ curl --head https://risky.biz/feeds/risky-business +HTTP/1.1 301 Moved Permanently +Date: Sun, 28 Sep 2025 21:54:28 GMT +Server: Apache +Strict-Transport-Security: max-age=63072000; +Location: https://risky.biz/feeds/risky-business/ +Connection: close +Content-Type: text/html; charset=iso-8859-1 + +ewen@basadi:/tmp$ curl --head https://risky.biz/feeds/risky-business/ +HTTP/1.1 200 OK +Date: Sun, 28 Sep 2025 21:54:40 GMT +Server: Apache +Strict-Transport-Security: max-age=63072000; +Last-Modified: Sun, 28 Sep 2025 19:34:25 GMT +ETag: \"864f7-63fe19b449bd4\" +Accept-Ranges: bytes +Content-Length: 550135 +Vary: Accept-Encoding +Connection: close +Content-Type: application/xml + +ewen@basadi:/tmp$ +``` +"""]]
Added a comment: Previous working build was 20250828
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_1_f24dadf21fc4a95e627e508d1e22488d._comment b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_1_f24dadf21fc4a95e627e508d1e22488d._comment new file mode 100644 index 0000000000..9e55b54394 --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs/comment_1_f24dadf21fc4a95e627e508d1e22488d._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="ewen" + avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e" + subject="Previous working build was 20250828" + date="2025-09-28T21:35:07Z" + content=""" +For context, [previous HomeBrew build](https://github.com/Homebrew/homebrew-core/commit/9dac1897529335a9115830a1c646ca3e90f39292) that I would have had installed, and working, before was `20250828`. + +Ewen + + +"""]]
importfeed: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255)
diff --git a/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn new file mode 100644 index 0000000000..a0ab387188 --- /dev/null +++ b/doc/bugs/importfeed__58___Enum.toEnum__123__Word8__125____58___tag___40__8217__41___is_outs.mdwn @@ -0,0 +1,98 @@ +### Please describe the problem. + +Since upgrading to git-annex version 10.20250925, from the macOS HomeBrew build, `git annex importfeed` seems to *usually* fail with an `Enum.toEnum{Word8}` out of bounds error. The exact value reported for the out of bounds value varies between feed URLs, but from a little bit of testing the error appears deterministic between those feed URLs. + +A few examples (plus one more in the reproducer below): + +``` +importfeed https://popculturedetective.agency/feed/podcast +git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255) +failed +``` + +``` +importfeed https://contextualelectronics.com/feed/podcast/ +git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255) +failed +``` + +``` +importfeed https://theamphour.libsyn.com/rss +git-annex: Enum.toEnum{Word8}: tag (8211) is outside of bounds (0,255) +failed +``` + +(A couple of podcast feeds with no new changes just report "ok"; but I'd also expect most of the above to not have any recent changes as they're weekly-or-less podcasts.) + +### What steps will reproduce the problem? + +Indicative example (one of the feed URLs I follow; but it's happening on all of *most* of them that all worked with the previous version of git-annex): + +``` +ewen@basadi:/tmp/podcasts$ git init +Initialized empty Git repository in /private/tmp/podcasts/.git/ +ewen@basadi:/tmp/podcasts$ git annex init 'Test repo' +init Test repo ok +(recording state in git...) 
+ewen@basadi:/tmp/podcasts$ TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}' +ewen@basadi:/tmp/podcasts$ git annex importfeed --template="${TEMPLATE}" "https://risky.biz/feeds/risky-business" +importfeed gathering known urls ok +importfeed https://risky.biz/feeds/risky-business +git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255) +failed +importfeed: 1 failed +ewen@basadi:/tmp/podcasts$ +``` + +The `--template` part does not seem necessary to the reproducer either, as I get the same error without (it's just the `--template` is in my standard run that I've used for years): + +``` +ewen@basadi:/tmp/podcasts$ git annex importfeed "https://risky.biz/feeds/risky-business" +importfeed gathering known urls ok +importfeed https://risky.biz/feeds/risky-business +git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255) +failed +importfeed: 1 failed +ewen@basadi:/tmp/podcasts$ +``` + +### What version of git-annex are you using? On what operating system? + +``` +ewen@basadi:~$ git annex version +git-annex version: 10.20250925 +build flags: Assistant Webapp Pairing FsEvents TorrentParser MagicMime Servant Benchmark Feeds Testsuite S3 WebDAV OsPath +dependency versions: aws-0.24.4 bloomfilter-2.0.1.2 crypton-1.0.4 DAV-1.3.4 feed-1.3.2.1 ghc-9.10.3 http-client-0.7.19 persistent-sqlite-2.13.3.1 torrent-10000.1.3 uuid-1.3.16 yesod-1.6.2.1 +key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2BP512E BLAKE2BP512 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL GITBUNDLE GITMANIFEST VURL X* +remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar 
git-lfs httpalso borg rclone hook external compute mask +operating system: darwin x86_64 +supported repository versions: 8 9 10 +upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 +ewen@basadi:~$ +``` + +on macOS 15.6.1 (Sequoia), which is the latest release apart from the macOS 26 released this month. On Intel in this case, but seems to also reproduce on the same macOS 15.6.1 (Sequoia) on Apple M2 processor, with the same HomeBrew build of git-annex. + +### Please provide any additional information below. + +[[!format sh """ +ewen@basadi:/tmp/podcasts$ git annex --verbose --verbose importfeed --verbose --verbose "https://risky.biz/feeds/risky-business" +importfeed gathering known urls ok +importfeed https://risky.biz/feeds/risky-business +git-annex: Enum.toEnum{Word8}: tag (8217) is outside of bounds (0,255) +failed +importfeed: 1 failed +ewen@basadi:/tmp/podcasts$ +"""]] + +At this stage I don't know if this is specific to `importfeed` or specific to the HomeBrew build of git-annex. + +Other annexes tracking files do seem to work (`git annex add` / `git annex sync` / `git annex copy ...` all work) with this version of git-annex. So I suspect it's somehow specific to importfeed and/or the HomeBrew build. + +And it seems a fairly recent breakage, as IIRC the previously installed version was from 2025-08. + +[HomeBrew git-annex package information](https://formulae.brew.sh/formula/git-annex) + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +Yes, for many years. git-annex has worked very well for downloading/collecting podcasts for years, which is why it was surprising it's suddenly failing like this.
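A note on the numbers in the errors above: written in hexadecimal, the reported tag values 8217 and 8211 are 2019 and 2013, which are the Unicode code points of the right single quotation mark and the en dash. This is only an observation about the values, not a confirmed diagnosis, but it may suggest that a character from the feed text is being forced into a single byte (0 to 255) somewhere. The sketch below just checks the arithmetic; the `codepoint` helper is a made-up name for illustration and is not part of git-annex.

```sh
#!/bin/sh
# codepoint (hypothetical helper): print a decimal value in U+XXXX notation.
codepoint() {
    printf 'U+%04X\n' "$1"
}

# The two tag values that appear in the importfeed errors quoted above.
for tag in 8217 8211; do
    printf 'tag %s is %s\n' "$tag" "$(codepoint "$tag")"
done
```

Any value above 255 cannot fit in a Word8, which matches the "outside of bounds (0,255)" wording of the error.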
diff --git a/doc/forum/remove_old_versions_of_files_in_archive__63__.mdwn b/doc/forum/remove_old_versions_of_files_in_archive__63__.mdwn new file mode 100644 index 0000000000..cce238807e --- /dev/null +++ b/doc/forum/remove_old_versions_of_files_in_archive__63__.mdwn @@ -0,0 +1,9 @@ +I have set up a number of annex repos for storing various different things (media, ebooks, audiobooks, gopro footage, archived files, files to sync to mobile devices over adb, etc). Many of them I sync to backblaze (as s3 special remote) and gdrive (as rclone special remote). + +Both backblaze and gdrive remotes are configured as "redundantarchive" groups (configured as `not (copies=redundantarchive:2)`). This all seems to be working properly. + +As time goes on, I expect I'll run out of storage in gdrive (backblaze I can keep storing more stuff in it as long as I keep paying money). This got me thinking about longer term storage management. How should one limit the size of an "archive" remote? Or decide to delete versions of files? Are there preferred content configs I could use to be smarter about which data I store where? + +What about keeping a certain number of versions of a file (last n) or versions before a particular date (no older than)? I see expireunused, but I don't think I understand how it interacts with archive groups or special remotes generally. + +How are we supposed to think about archives and removing old versions of data generally?
Added a comment
diff --git a/doc/todo/import_tree_from_rsync_special_remote/comment_4_55351f379349d2d7e6c769fa54f8a7ee._comment b/doc/todo/import_tree_from_rsync_special_remote/comment_4_55351f379349d2d7e6c769fa54f8a7ee._comment new file mode 100644 index 0000000000..3528199efe --- /dev/null +++ b/doc/todo/import_tree_from_rsync_special_remote/comment_4_55351f379349d2d7e6c769fa54f8a7ee._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 4" + date="2025-09-27T17:27:51Z" + content=""" +Am I right to assume that I might achieve this (import with efficient reimports if needed from a remote via ssh directory) using a `directory` special remote on top of `sshfs` mount? +Or is there a better way to achieve that? +"""]]
missing build dep for debian?
diff --git a/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn new file mode 100644 index 0000000000..f0e3c7b47c --- /dev/null +++ b/doc/bugs/FTBFS__58___needs_build-dep_libghc-unbounded-delays-dev.mdwn @@ -0,0 +1,13 @@ +### Please describe the problem. + +Encountered while building standalone `10.20250828-1~ndall+1` under trixie: + +``` + checking git-remote-gcrypt... git-remote-gcrypt + checking ssh connection caching... yes +Configuring git-annex-10.20250828... +Error: Setup: Encountered missing or private dependencies: +unbounded-delays +``` + +Oddly, I believe we still built fine for http://github.com/datalad/git-annex, where I think we also do not have that dependency.
diff --git a/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__.mdwn b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__.mdwn new file mode 100644 index 0000000000..07dc326192 --- /dev/null +++ b/doc/forum/meaning___34__stale_or_missing_inode_cache__34____63__.mdwn @@ -0,0 +1 @@ +What does "stale or missing inode cache; updating" from a fsck mean?
diff --git a/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn new file mode 100644 index 0000000000..5ae44072f5 --- /dev/null +++ b/doc/bugs/Compiling_20250925__44___variable_not_in_scope_error.mdwn @@ -0,0 +1,13 @@ +I'm attempting to update Arch Linux packaging ... the major caveat being we're stuck using an old (9.4.8) version of `ghc` for now... + +Building the latest tagged release now produces this error: + +``` +Utility/OpenFd.hs:28:9: error: + Variable not in scope: when :: Bool -> IO () -> IO a0 + | +28 | when closeonexec $ + | ^^^^ +``` + +I'm not sure this error is directly caused by the antiquated compiler, but also not sure how to debug this further or work around it either.
add news item for git-annex 10.20250925
diff --git a/doc/news/version_10.20250520.mdwn b/doc/news/version_10.20250520.mdwn deleted file mode 100644 index 07a4e9c893..0000000000 --- a/doc/news/version_10.20250520.mdwn +++ /dev/null @@ -1,12 +0,0 @@ -git-annex 10.20250520 released with [[!toggle text="these changes"]] -[[!toggleable text=""" * Preferred content now supports "balanced=groupname:lackingcopies" - to make files be evenly balanced amoung as many repositories as are - needed to satisfy numcopies. - * map: Fix buggy handling of remotes that are bare git repositories - accessed via ssh. - * map: Avoid looping forever with mutually recursive paths between - repositories accessed via ssh. - * whereused: Fix bug that could find matches from grafts - in remote git-annex branches. - * Windows: Fix bug that can cause git status to show annexed files as - modified when built with OsPath."""]] \ No newline at end of file diff --git a/doc/news/version_10.20250925.mdwn b/doc/news/version_10.20250925.mdwn new file mode 100644 index 0000000000..3cba8b8b77 --- /dev/null +++ b/doc/news/version_10.20250925.mdwn @@ -0,0 +1,28 @@ +git-annex 10.20250925 released with [[!toggle text="these changes"]] +[[!toggleable text=""" * Fix bug that made changes to a special remote sometimes be missed when + importing a tree from it. After upgrading, any such missed changes + will be included in the next tree imported from a special remote. + Fixes reversion introduced in version 10.20230626. + * Fix crash operating on filenames that are exactly 21 bytes long + and begin with a utf-8 character. + * Fix hang that could occur when using git-annex adjust on a branch with + a number of files greater than annex.queuesize. + * Fix bug that could cause an invalid utf-8 sequence to be used in a + temporary filename when the input filename was valid utf-8. + * Improve performance when used with a local git remote that has a + large working tree. + * drop: --fast support when dropping from a remote. 
+ * Added annex.assistant.allowunlocked config. + * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds. + * enableremote: Disallow using type= to attempt to change the type of an + existing remote. + * Add build warnings when git-annex is built without the OsPath + build flag. + * version: Report on whether it was built with the OsPath build flag. + * Avoid leaking file descriptors to child processes started by git-annex + in some situations. Note that when not built with the OsPath build + flag, these leaks can still happen. + * git-annex.cabal: Turn on the OsPath build flag by default. + * p2phttp: Fix a hang that could occur when used with --directory, + and a repository in the directory got removed. + * Removed support for building with unmaintained cryptonite, use crypton."""]] \ No newline at end of file
Added a comment
diff --git a/doc/devblog/day_649-650__speeding_up_repeated_imports/comment_1_a58663214bc81c0cbd50d53f55e3325b._comment b/doc/devblog/day_649-650__speeding_up_repeated_imports/comment_1_a58663214bc81c0cbd50d53f55e3325b._comment new file mode 100644 index 0000000000..99856a1ab3 --- /dev/null +++ b/doc/devblog/day_649-650__speeding_up_repeated_imports/comment_1_a58663214bc81c0cbd50d53f55e3325b._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="nadir" + avatar="http://cdn.libravatar.org/avatar/2af9174cf6c06de802104d632dc40071" + subject="comment 1" + date="2025-09-24T21:52:32Z" + content=""" +A bit late, but this is actually the feature I use the most. I mainly use git-annex to catalogue my media, not so much to manage it directly. Appreciate the work you've put into improving imports (and of course git annex in general). +"""]]
update
diff --git a/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment index cf6c7cd167..0a5cd38c69 100644 --- a/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment +++ b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment @@ -17,6 +17,11 @@ This change will fix it: - hashtype="${0##*git-annex-backend-X}" + hashtype="${0##*git-annex-backend-}" +However, since the hash is named "XXHASH", and this is an external backend, +I think the backend name you should really be using is "XXXHASH". This +leaves the "XXHASH" backend name free for git-annex to use if it +implemented it as a built-in backend. + Once you have the program working, we can add it to the list of external backends. """]]
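The one-character fix discussed in this comment can be checked in isolation. The sketch below applies the two parameter expansions to the example path from the thread (`/bin/git-annex-backend-XXH3`); the helper names `strip_buggy` and `strip_fixed` are invented for illustration. With the trailing `X` in the pattern, the shell removes the first `X` of the backend name as well, so `XXH3` becomes `XH3`.

```sh
#!/bin/sh
# strip_buggy (hypothetical name): mirrors the original line. The "X" at the
# end of the pattern is removed together with the "git-annex-backend-" prefix.
strip_buggy() {
    printf '%s\n' "${1##*git-annex-backend-X}"
}

# strip_fixed (hypothetical name): mirrors the corrected line. Only the
# "git-annex-backend-" prefix (and anything before it) is removed.
strip_fixed() {
    printf '%s\n' "${1##*git-annex-backend-}"
}

prog=/bin/git-annex-backend-XXH3   # example path taken from the thread
echo "buggy: $(strip_buggy "$prog")"
echo "fixed: $(strip_fixed "$prog")"
```

The buggy form yields `XH3`, which is why git-annex later went looking for a `git-annex-backend-XH3` program; the fixed form yields `XXH3`.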
comments
diff --git a/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment new file mode 100644 index 0000000000..cf6c7cd167 --- /dev/null +++ b/doc/todo/add_xxHash_backend/comment_5_ad6f50e7d27d31028c81a4899f91f223._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 5""" + date="2025-09-24T16:30:12Z" + content=""" +This is a bug in your program. It is generating a +key using the XH3 backend, rather than the XXH3 backend. + + [2025-09-24 12:29:41.565937669] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-backend-XXH3[1] <-- GENKEY .git/annex/othertmp/ingest-bar89415-0 + [2025-09-24 12:29:41.568293334] (Annex.ExternalAddonProcess) /home/joey/bin/git-annex-backend-XXH3[1] --> GENKEY-SUCCESS XH3-s30--88ad06d188b880a1 + +When git-annex later wants to do something that that key, +it expects to find a git-annex-backend-XH3 program. + +This change will fix it: + + - hashtype="${0##*git-annex-backend-X}" + + hashtype="${0##*git-annex-backend-}" + +Once you have the program working, we can add it to the list of external +backends. +"""]] diff --git a/doc/todo/add_xxHash_backend/comment_6_6889f05d633cb340046c9d4796735a57._comment b/doc/todo/add_xxHash_backend/comment_6_6889f05d633cb340046c9d4796735a57._comment new file mode 100644 index 0000000000..a13c264ef8 --- /dev/null +++ b/doc/todo/add_xxHash_backend/comment_6_6889f05d633cb340046c9d4796735a57._comment @@ -0,0 +1,20 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 6""" + date="2025-09-24T16:39:02Z" + content=""" +I am inclined to keep this todo open despite external backend +programs existing, because it would be nice to have xxHash in +git-annex natively due to its speed. 
+ +I found this haskell library which includes xxh3 +and which would be easy to add as a git-annex dependency, +although it would need to be gated behind a build flag for now: +<https://hackage.haskell.org/package/xxhash-ffi> + +(Since that library uses Hashable, it generates an Int for the hash. +This seems to limit it to be used on 64 bit platforms. +<https://github.com/haskell-haskey/xxhash-ffi/issues/6> +The lower-level Data.Digest.XXHash.FFI.C uses CULLong so will work on 32 +bit.) +"""]]
fixed
diff --git a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn index 6be9060190..82eddd3aa3 100644 --- a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn +++ b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn @@ -58,3 +58,4 @@ Error: Process completed with exit code 1. e.g. [here](https://github.com/datalad/git-annex/actions/runs/17903960530/job/50901925018) +> [[fixed|done]] --[[Joey]]
Added a comment: the X prefix conflicts with the eXternal backend namespace
diff --git a/doc/todo/add_xxHash_backend/comment_4_3e5b815dfea0939a6affa7443701a911._comment b/doc/todo/add_xxHash_backend/comment_4_3e5b815dfea0939a6affa7443701a911._comment new file mode 100644 index 0000000000..bf18b44cfc --- /dev/null +++ b/doc/todo/add_xxHash_backend/comment_4_3e5b815dfea0939a6affa7443701a911._comment @@ -0,0 +1,79 @@ +[[!comment format=mdwn + username="Arnie97" + avatar="http://cdn.libravatar.org/avatar/607ed64cbd8e7a4cc2035a865b6cb5b2" + subject="the X prefix conflicts with the eXternal backend namespace" + date="2025-09-24T12:05:05Z" + content=""" +I'm trying to create an external backend for xxHash, but ran into some weird behavior. + +If only `/bin/git-annex-backend-XXH3` is present in `$PATH`, and `git config annex.backend XXH3` is set, then git annex complains `Cannot run git-annex-backend-XH3 -- It is not installed in PATH`, which seems like a bug. +And if `/bin/git-annex-backend-XXH3` is moved to `/bin/git-annex-backend-XH3` according to the error message, it will complain `Cannot run git-annex-backend-XXH3 -- It is not installed in PATH` (this is expected). +In the end, I had to link the same shell script to both `/bin/git-annex-backend-XH3` and `/bin/git-annex-backend-XXH3` to make the backend config `XXH3` work. 
+ +```bash +#!/bin/sh + +set -e + +hashtype=\"${0##*git-annex-backend-X}\" + +# could send PROGRESS while doing this, but it's +# hard to implement that in shell +case \"$hashtype\" in + BLAKE3_256) + hashfile() { b3sum --no-names \"$1\"; } ;; + BLAKE3_512) + hashfile() { b3sum --no-names -l 64 \"$1\"; } ;; + XXH32|XH32) + hashfile() { xxhsum -H0 \"$1\" | cut -d ' ' -f 1; } ;; + XXH64|XH64) + hashfile() { xxhsum -H1 \"$1\" | cut -d ' ' -f 1; } ;; + XXH128|XH128) + hashfile() { xxhsum -H2 \"$1\" | cut -d ' ' -f 1; } ;; + XXH3|XH3) + hashfile() { xxhsum -H3 --tag \"$1\" | awk '{ print $NF }'; } ;; +esac + +while read line; do + set -- $line + case \"$1\" in + GETVERSION) + echo VERSION 1 + ;; + CANVERIFY) + echo CANVERIFY-YES + ;; + ISSTABLE) + echo ISSTABLE-YES + ;; + ISCRYPTOGRAPHICALLYSECURE) + echo ISCRYPTOGRAPHICALLYSECURE-YES + ;; + GENKEY) + contentfile=\"$2\" + hash=$(hashfile \"$contentfile\") + sz=$(wc -c \"$contentfile\" | cut -d ' ' -f 1) + if [ -n \"$hash\" ]; then + echo \"GENKEY-SUCCESS\" \"$hashtype-s$sz--$hash\" + else + echo \"GENKEY-FAILURE\" \"calculate hash sum failed\" + fi + ;; + VERIFYKEYCONTENT) + key=\"$2\" + contentfile=\"$3\" + hash=$(hashfile \"$contentfile\") + khash=$(echo \"$key\" | sed 's/.*--//') + if [ \"$hash\" = \"$khash\" ]; then + echo \"VERIFYKEYCONTENT-SUCCESS\" + else + echo \"VERIFYKEYCONTENT-FAILURE\" + fi + ;; + *) + echo ERROR protocol error + ;; + esac +done +``` +"""]]
diff --git a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn index 02460f6426..9ba6311675 100644 --- a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn +++ b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn @@ -21,3 +21,5 @@ I have three repositories in groups archive and backup. These five failes are on Just manually copying them to other repositories solved it. What did I miss or did not understand? Git assistant runs on all repositories and are ssh connected nearly 24/7 and all other syncing works fine. + +`git annex sync --all --content ONE_OF_THE_ARCHIVE_BACKUP_REPOSITORIES` does not change anything either.
diff --git a/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn new file mode 100644 index 0000000000..02460f6426 --- /dev/null +++ b/doc/forum/does_assistant_autosolve___34__not_enough_copies__34____63__.mdwn @@ -0,0 +1,23 @@ + git annex fsck --quiet --fast --all + Only 2 of 3 trustworthy copies exist of SHA256E-s151552--fbffb0cb11ed636bff5ded57576601f97d375c2f4ab37e8c4ed72e9918e53f1b.pdf + Back it up with git-annex copy. + Only 2 of 3 trustworthy copies exist of SHA256E-s271--34654adac964ea7344753de2d6ce4385c8561abbc208bc73fd8e91ab6e97f53c.svg + Back it up with git-annex copy. + Only 2 of 3 trustworthy copies exist of SHA256E-s790305--4c1b2ba8088862f80f1cf40345715b7e3bed350245a4fb6808bfc928e8867feb.svg + Back it up with git-annex copy. + Only 2 of 3 trustworthy copies exist of SHA256E-s745093--8c0b45e513a55f2678c8a3bd2901a0f8673b43784a374f55c2646cfe41b6a11e.svg + Back it up with git-annex copy. + Only 2 of 3 trustworthy copies exist of SHA256E-s57327--663e9747669a3476cf14a697d715ac40d0da4d7c92df0ce42bed3c9c3eb77f96.pdf + Back it up with git-annex copy. + fsck: 5 failed + + +Shouldn't this be automatically resolved by git annex assistant? + +I have three repositories in groups archive and backup. These five failes are on/in neither of the three. + +`git annex copy --auto --to ONE_OF_THE_ARCHIVE_BACKUP_REPOSITORIES` does not solve the problem. + +Just manually copying them to other repositories solved it. + +What did I miss or did not understand? Git assistant runs on all repositories and are ssh connected nearly 24/7 and all other syncing works fine.
invalidate recorded content identifier tree when export changes
Fix bug that made changes to a special remote sometimes be missed when
importing a tree from it. The diff import would miss when a change was
exported, then manually undone on the special remote (eg deleting a newly
exported file). A full import is needed to catch such changes.
After upgrading, any such missed changes will be included in the next
tree imported from a special remote. This happens because the previously
recorded content identifier tree does not have export information included,
so it is treated as invalid, and a full import is done.
Fixes reversion introduced in version 10.20230626, commit
40017089f268391f79226592850b58855cdbf808
Unfortunately, this does mean that after each export, the next import will
be a full import, which can take significantly longer than the diff import
does when there are a lot of files in the tree.
It would be better if exporting also updated the content identifier tree.
However, I don't know if that can be done inexpensively. It would be future
optimisation work, in any case.
(That could only be done for an export that is run in the same
repository as the import. When an export is run in a different repository,
the export.log gets updated, and that propagates to the repository where
import is later run. At that point, a full import is done.)
Sponsored-by: Luke T. Shumaker
diff --git a/Annex/Import.hs b/Annex/Import.hs index b1ace3468e..9ba4caf1b1 100644 --- a/Annex/Import.hs +++ b/Annex/Import.hs @@ -21,6 +21,7 @@ module Annex.Import ( importKeys, makeImportMatcher, getImportableContents, + PostExportLogUpdate, ) where import Annex.Common @@ -74,6 +75,8 @@ import qualified Data.ByteArray.Encoding as BA #ifdef mingw32_HOST_OS import qualified System.FilePath.Posix as Posix #endif +import qualified Data.Semigroup as Sem +import Prelude {- Configures how to build an import tree. -} data ImportTreeConfig @@ -112,8 +115,9 @@ buildImportCommit -> ImportCommitConfig -> AddUnlockedMatcher -> Imported + -> PostExportLogUpdate -> Annex (Maybe Ref) -buildImportCommit remote importtreeconfig importcommitconfig addunlockedmatcher imported = +buildImportCommit remote importtreeconfig importcommitconfig addunlockedmatcher imported postexportlogupdate = case importCommitTracking importcommitconfig of Nothing -> go Nothing Just trackingcommit -> inRepo (Git.Ref.tree trackingcommit) >>= \case @@ -121,12 +125,14 @@ buildImportCommit remote importtreeconfig importcommitconfig addunlockedmatcher Just _ -> go (Just trackingcommit) where go trackingcommit = do - (importedtree, updatestate) <- recordImportTree remote importtreeconfig (Just addunlockedmatcher) imported + (importedtree, updatestate) <- recordImportTree remote importtreeconfig (Just addunlockedmatcher) imported postexportlogupdate buildImportCommit' remote importcommitconfig trackingcommit importedtree >>= \case Just finalcommit -> do updatestate return (Just finalcommit) - Nothing -> return Nothing + Nothing -> do + postExportLogUpdate postexportlogupdate + return Nothing {- Builds a tree for an import from a special remote. 
- @@ -138,8 +144,9 @@ recordImportTree -> ImportTreeConfig -> Maybe AddUnlockedMatcher -> Imported + -> PostExportLogUpdate -> Annex (History Sha, Annex ()) -recordImportTree remote importtreeconfig addunlockedmatcher imported = do +recordImportTree remote importtreeconfig addunlockedmatcher imported postexportlogupdate = do importedtree@(History finaltree _) <- buildImportTrees basetree subdir addunlockedmatcher imported return (importedtree, updatestate finaltree) where @@ -180,6 +187,7 @@ recordImportTree remote importtreeconfig addunlockedmatcher imported = do { oldTreeish = exportedTreeishes oldexport , newTreeish = importedtree } + postExportLogUpdate postexportlogupdate return oldexport -- importKeys takes care of updating the location log @@ -498,11 +506,26 @@ canImportKeys remote importcontent = where ia = Remote.importActions remote --- Result of an import. ImportUnfinished indicates that some file failed to --- be imported. Running again should resume where it left off. +-- Result of an import. data ImportResult t - = ImportFinished t + = ImportFinished PostExportLogUpdate t | ImportUnfinished + -- ^ ImportUnfinished indicates that some file failed to + -- be imported. Running again should resume where it left off. + +-- An action to run after the export log has been updated to reflect an +-- import. 
+newtype PostExportLogUpdate = PostExportLogUpdate (Annex ()) + +instance Sem.Semigroup PostExportLogUpdate where + PostExportLogUpdate a <> PostExportLogUpdate b = + PostExportLogUpdate (a >> b) + +noPostExportLogUpdate :: PostExportLogUpdate +noPostExportLogUpdate = PostExportLogUpdate (return ()) + +postExportLogUpdate :: PostExportLogUpdate -> Annex () +postExportLogUpdate (PostExportLogUpdate a) = a data Diffed t = DiffChanged t @@ -546,7 +569,10 @@ importChanges remote importtreeconfig importcontent thirdpartypopulated importab Nothing -> fullimport currcidtree Just lastimportedtree -> diffimport cidtreemap prevcidtree currcidtree lastimportedtree where - remember = recordContentIdentifierTree (Remote.uuid remote) + -- Record the content identifier tree after the export log is + -- updated for the import. + remember = PostExportLogUpdate . + recordContentIdentifierTree (Remote.uuid remote) -- In order to use a diff, the previous ContentIdentifier tree must -- not have been garbage collected. Which can happen since there @@ -567,11 +593,11 @@ importChanges remote importtreeconfig importcontent thirdpartypopulated importab ) fullimport currcidtree = - importKeys remote importtreeconfig importcontent thirdpartypopulated importablecontents >>= \case - ImportUnfinished -> return ImportUnfinished - ImportFinished r -> do - remember currcidtree - return $ ImportFinished $ ImportedFull r + importKeys remote importtreeconfig importcontent thirdpartypopulated importablecontents >>= return . 
\case + ImportUnfinished -> ImportUnfinished + ImportFinished a r -> + ImportFinished (a <> remember currcidtree) $ + ImportedFull r diffimport cidtreemap prevcidtree currcidtree lastimportedtree = do (diff, cleanup) <- inRepo $ Git.DiffTree.diffTreeRecursive @@ -589,17 +615,15 @@ importChanges remote importtreeconfig importcontent thirdpartypopulated importab ImportUnfinished -> do void $ liftIO cleanup return ImportUnfinished - ImportFinished (ImportableContentsComplete ic') -> - liftIO cleanup >>= \case - False -> return ImportUnfinished - True -> do - remember currcidtree - return $ ImportFinished $ - ImportedDiff lastimportedtree - (mkdiff ic' removed) + ImportFinished a (ImportableContentsComplete ic') -> + liftIO cleanup >>= return . \case + False -> ImportUnfinished + True -> ImportFinished (a <> remember currcidtree) $ + ImportedDiff lastimportedtree + (mkdiff ic' removed) -- importKeys is not passed ImportableContentsChunked -- above, so it cannot return it - ImportFinished (ImportableContentsChunked {}) -> error "internal" + ImportFinished _ (ImportableContentsChunked {}) -> error "internal" isremoval ti = Git.DiffTree.dstsha ti `elem` nullShas @@ -685,12 +709,12 @@ importKeys remote importtreeconfig importcontent thirdpartypopulated importablec ImportableContentsComplete ic -> go False largematcher cidmap importing db ic >>= return . \case Nothing -> ImportUnfinished - Just v -> ImportFinished $ ImportableContentsComplete v + Just v -> ImportFinished noPostExportLogUpdate $ ImportableContentsComplete v ImportableContentsChunked {} -> do c <- gochunked db (importableContentsChunk importablecontents) gohistory largematcher cidmap importing db (importableHistoryComplete importablecontents) >>= return . 
\case Nothing -> ImportUnfinished - Just h -> ImportFinished $ ImportableContentsChunked + Just h -> ImportFinished noPostExportLogUpdate $ ImportableContentsChunked { importableContentsChunk = c , importableHistoryComplete = h } diff --git a/CHANGELOG b/CHANGELOG index 821a554794..b6541fcb1a 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -23,6 +23,10 @@ git-annex (10.20250829) UNRELEASED; urgency=medium existing remote. * Fix hang that could occur when using git-annex adjust on a branch with a number of files greater than annex.queuesize. + * Fix bug that made changes to a special remote sometimes be missed when + importing a tree from it. After upgrading, any such missed changes + will be included in the next tree imported from a special remote. + Fixes reversion introduced in version 10.20230626. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Command/Import.hs b/Command/Import.hs index 7375b807df..ba4efdeb8c 100644 --- a/Command/Import.hs +++ b/Command/Import.hs @@ -349,9 +349,9 @@ seekRemote remote branch msubdir importcontent ci addunlockedmatcher importmessa , Remote.name remote , ". Re-run command to resume import." ] - ImportFinished imported -> void $ - includeCommandAction $ - commitimport imported + ImportFinished postexportlogupdate imported -> + void $ includeCommandAction $ + commitimport imported postexportlogupdate where importmessages' | null importmessages = ["import from " ++ Remote.name remote] @@ -383,10 +383,10 @@ listContents' remote importtreeconfig ci a = (Diff truncated)
diff --git a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn index fe4d0b1c6f..6be9060190 100644 --- a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn +++ b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn @@ -1,6 +1,6 @@ ### Please describe the problem. -Continuation to https://git-annex.branchable.com/bugs/windows_FTBFS__44___advise_needed/ which was marked fixed but now FTBFS with +Continuation to [bugs/windows_FTBFS__44___advise_needed](https://git-annex.branchable.com/bugs/windows_FTBFS__44___advise_needed/) which was marked fixed but now FTBFS with ``` [38 of 40] Compiling Utility.CopyFile ( Utility\CopyFile.hs, Utility\CopyFile.o )
Windows still FTBFS
diff --git a/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn new file mode 100644 index 0000000000..fe4d0b1c6f --- /dev/null +++ b/doc/bugs/still_FTBFS_on_Windows__58___more_advice_needed.mdwn @@ -0,0 +1,60 @@ +### Please describe the problem. + +Continuation to https://git-annex.branchable.com/bugs/windows_FTBFS__44___advise_needed/ which was marked fixed but now FTBFS with + +``` +[38 of 40] Compiling Utility.CopyFile ( Utility\CopyFile.hs, Utility\CopyFile.o ) +Utility\CopyFile.hs:9:1: warning: [GHC-94817] [-Wtabs] +Warning: Tab character found here, and in 84 further locations. + Suggested fix: Please use spaces instead. + | +9 | copyFileExternal, + + | ^^^^^^^^ + +[39 of 40] Compiling Main ( Build\NullSoftInstaller.hs, Build\NullSoftInstaller.o ) +Build\NullSoftInstaller.hs:64:34: error: [GHC-83865] +Error: • Couldn't match type ‘bytestring-0.12.2.0:Data.ByteString.Internal.Type.ByteString’ + with ‘[Char]’ + Expected: FilePath + Actual: System.FilePath.Windows.ByteString.RawFilePath + • In the second argument of ‘makeInstaller’, namely ‘gitannexcmd’ + In the second argument of ‘($)’, namely + ‘makeInstaller + gitannex gitannexcmd license htmlhelp (winPrograms ++ magicDLLs') + magicShare' [webappscript, autostartscript]’ + In a stmt of a 'do' block: + F.writeFileString (toOsPath nsifile) + $ makeInstaller + gitannex gitannexcmd license htmlhelp (winPrograms ++ magicDLLs') + magicShare' [webappscript, autostartscript] + | +64 | gitannex gitannexcmd license htmlhelp (winPrograms ++ magicDLLs') magicShare' + + | ^^^^^^^^^^^ + +Build\NullSoftInstaller.hs:64:54: error: [GHC-83865] +Error: • Couldn't match type ‘bytestring-0.12.2.0:Data.ByteString.Internal.Type.ByteString’ + with ‘[Char]’ + Expected: FilePath + Actual: System.FilePath.Windows.ByteString.RawFilePath + • In the fourth argument of ‘makeInstaller’, namely ‘htmlhelp’ + In the second argument of ‘($)’, namely + ‘makeInstaller + 
gitannex gitannexcmd license htmlhelp (winPrograms ++ magicDLLs') + magicShare' [webappscript, autostartscript]’ + In a stmt of a 'do' block: + F.writeFileString (toOsPath nsifile) + $ makeInstaller + gitannex gitannexcmd license htmlhelp (winPrograms ++ magicDLLs') + magicShare' [webappscript, autostartscript] + | +64 | gitannex gitannexcmd license htmlhelp (winPrograms ++ magicDLLs') magicShare' + + | ^^^^^^^^ + +Error: Process completed with exit code 1. +``` + +e.g. [here](https://github.com/datalad/git-annex/actions/runs/17903960530/job/50901925018) +
thought
diff --git a/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates/comment_3_8f6e2216765e610683a2684165c84201._comment b/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates/comment_3_8f6e2216765e610683a2684165c84201._comment
new file mode 100644
index 0000000000..c123fffa28
--- /dev/null
+++ b/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates/comment_3_8f6e2216765e610683a2684165c84201._comment
@@ -0,0 +1,18 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2025-09-23T03:52:45Z"
+ content="""
+Another way to think about this problem is that git-annex export does not
+update the remote tracking branch. If it did, there would be no need for
+git-annex import to learn about changes made by export, and its current
+diffing behavior would be ok.
+
+However, since one repository can export to a special remote, and a
+different repository import from the same special remote, updating the
+tracking branch would need to happen later, based on information the export
+records in the git-annex branch.
+
+Which is something that git-annex import could do. But it might not be any
+more efficient to do that than a non-diff based update.
+"""]]
worse
diff --git a/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates/comment_2_9b3b7d6af9add0e8d6cb3e59b2768ff1._comment b/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates/comment_2_9b3b7d6af9add0e8d6cb3e59b2768ff1._comment
new file mode 100644
index 0000000000..b9304aaff6
--- /dev/null
+++ b/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates/comment_2_9b3b7d6af9add0e8d6cb3e59b2768ff1._comment
@@ -0,0 +1,28 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2025-09-23T03:41:09Z"
+ content="""
+In just the right circumstance, this can also prevent import from
+adding files. If an export deleted a file, and then it was written back to
+the special remote, in a way that caused it to have an identical content
+identifier, then import would see no change when diffing, and so would not
+add the file back.
+
+With the directory special remote, the way to trigger this is:
+
+    git rm foo
+    git commit -m rm
+    mv ../directory/foo ..
+    git-annex export master --to remote
+    mv ../foo ../directory/foo
+    git-annex import master --from remote
+    git merge remote/master
+
+The `mv` is needed to preserve the inode, which is used in the content
+identifier.
+
+With other types of special remotes that have less good content
+identifiers, it might suffice for the same content to be written to the
+special remote.
+"""]]
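The failure mode described in comment 2 can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `CidTree`, `Change` and `diffTrees` are hypothetical names invented here, and a plain map from path to content identifier stands in for the git trees that the real import code diffs. It shows that when a file is written back with an identical content identifier, a diff against the previously seen listing is empty, so a purely diff-based import has nothing to add.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins, not git-annex types: a listing of the special
-- remote maps each path to its content identifier.
type ContentId = String
type CidTree = M.Map FilePath ContentId

data Change = Changed FilePath ContentId | Removed FilePath
  deriving (Eq, Show)

-- Compare the previously seen listing with the current one.
diffTrees :: CidTree -> CidTree -> [Change]
diffTrees prev curr = removed ++ changed
  where
    -- paths present before but missing now
    removed = [Removed p | p <- M.keys (M.difference prev curr)]
    -- paths that are new, or whose content identifier differs
    changed = [Changed p c | (p, c) <- M.toList curr, M.lookup p prev /= Just c]

main :: IO ()
main = do
  let previous = M.fromList [("bar", "cid-bar"), ("foo", "cid-foo")]
      -- foo was deleted by an export and then written back with the
      -- same content identifier (e.g. the inode was preserved by mv)
      sameCid = M.fromList [("bar", "cid-bar"), ("foo", "cid-foo")]
      -- foo was written back with a different content identifier
      newCid = M.fromList [("bar", "cid-bar"), ("foo", "cid-foo2")]
  print (diffTrees previous sameCid) -- prints []
  print (diffTrees previous newCid)  -- prints [Changed "foo" "cid-foo2"]
```

Only the second case gives the diff anything to act on, which matches the observation that remotes with weaker content identifiers may hit the problem more easily.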
promote forum post to bug, analysis
diff --git a/Annex/Import.hs b/Annex/Import.hs index b1ace3468e..06c6abab92 100644 --- a/Annex/Import.hs +++ b/Annex/Import.hs @@ -294,6 +294,7 @@ buildImportTrees buildImportTrees basetree msubdir addunlockedmatcher (ImportedFull imported) = buildImportTreesGeneric (convertImportTree addunlockedmatcher) basetree msubdir imported buildImportTrees basetree msubdir addunlockedmatcher (ImportedDiff (LastImportedTree oldtree) imported) = do + liftIO $ print $ importableContents imported importtree <- if null (importableContents imported) then pure oldtree else applydiff @@ -308,6 +309,7 @@ buildImportTrees basetree msubdir addunlockedmatcher (ImportedDiff (LastImported (importableContents imported) newtreeitems <- catMaybes <$> mapM mktreeitem new let removedfiles = map (mkloc . fst) removed + liftIO $ print ("removed", removedfiles) inRepo $ adjustTree (pure . Just) -- ^ keep files that are not added/removed the same @@ -507,7 +509,7 @@ data ImportResult t data Diffed t = DiffChanged t | DiffRemoved - deriving (Eq) + deriving (Eq, Show) data Imported = ImportedFull (ImportableContentsChunkable Annex (Either Sha Key)) @@ -577,7 +579,9 @@ importChanges remote importtreeconfig importcontent thirdpartypopulated importab (diff, cleanup) <- inRepo $ Git.DiffTree.diffTreeRecursive prevcidtree currcidtree + liftIO $ print (diff, prevcidtree, currcidtree) let (removed, changed) = partition isremoval diff + liftIO $ print (removed, changed) let mkicchanged ti = do v <- M.lookup (Git.DiffTree.dstsha ti) cidtreemap return (mkloc ti, v) diff --git a/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates.mdwn b/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates.mdwn new file mode 100644 index 0000000000..508acc612e --- /dev/null +++ b/doc/bugs/annex_import_doesn__39__t_delete_files_during_updates.mdwn @@ -0,0 +1,477 @@ +## git annex import does not delete files that have not been imported before, even if they were exported + +It looks like an import after each 
export is required in order to keep proper track of files, at least in the case of special remote `type=directory`. + +A full reproducer is below, but the abbreviated version is the following: + +1. Create a file, add it to the annex +2. Export the file to the directory special remote +3. Remove the file from the directory using `rm` +4. Import the changes from the directory +5. The deletion of the file is never detected and the file stays hanging locally indefinitely + +### Full reproducer + +First, we initialize an empty git-annex repo and a directory that will serve as the special remote: + +``` +$ mkdir git-annex +$ cd git-annex/ +$ mkdir repository directory +$ ls +directory repository +$ cd repository^C +$ cd repository/ +$ ls +$ git init +hint: Using 'master' as the name for the initial branch. This default branch name +hint: is subject to change. To configure the initial branch name to use in all +hint: of your new repositories, which will suppress this warning, call: +hint: +hint: git config --global init.defaultBranch <name> +hint: +hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and +hint: 'development'. The just-created branch can be renamed via this command: +hint: +hint: git branch -m <name> +hint: +hint: Disable this message with "git config set advice.defaultBranchName false" +Initialized empty Git repository in /tmp/git-annex/repository/.git/ +$ git annex init +init ok +(recording state in git...) +``` + +We add initial content. Note that removing a file that was added in the initial export is correctly detected on subsequent import: + +``` +$ echo one > one; sleep 1; echo two > two +$ git annex add * +add one +ok +add two +ok +(recording state in git...) 
+$ hh +git commit -m "Initial content" +$ git commit -m "Initial content" +[master (root-commit) e3cfdea] Initial content + 2 files changed, 2 insertions(+) + create mode 120000 one + create mode 120000 two +$ git annex initremote homeserver type=directory directory=/tmp/git-annex/directory exporttree=yes import +tree=yes encryption=none +initremote homeserver ok +(recording state in git...) +$ git annex export master --to homeserver +export homeserver one ok +export homeserver two ok +(recording state in git...) +$ rm ../directory/two +$ git annex -d import master --from homeserver -m "Deleted two, this works" +[2025-08-25 17:42:17.549242346] (Utility.Process) process [1346542] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","git-annex"] +[2025-08-25 17:42:17.551512992] (Utility.Process) process [1346542] done ExitSuccess +[2025-08-25 17:42:17.551977656] (Utility.Process) process [1346543] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 17:42:17.554079617] (Utility.Process) process [1346543] done ExitSuccess +[2025-08-25 17:42:17.554859544] (Utility.Process) process [1346544] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +log","refs/heads/git-annex..673de1e45f8751c3ac0066b4c827e3a046051c4f","--pretty=%H","-n1"] +[2025-08-25 17:42:17.557950381] (Utility.Process) process [1346544] done ExitSuccess +[2025-08-25 17:42:17.559947735] (Utility.Process) process [1346545] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +cat-file","--batch"] +[2025-08-25 17:42:17.563916375] (Utility.Process) process [1346546] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/remotes/homeserver/master"] +[2025-08-25 17:42:17.566239833] (Utility.Process) process [1346546] done 
ExitSuccess +list homeserver ok +[2025-08-25 17:42:17.57127565] (Utility.Process) process [1346548] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","m +ktree","--missing","--batch","-z"] +[2025-08-25 17:42:17.574520441] (Utility.Process) process [1346548] done ExitSuccess +[2025-08-25 17:42:17.577872901] (Utility.Process) process [1346549] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +rev-parse","--verify","--quiet","refs/heads/git-annex:"] +[2025-08-25 17:42:17.580406091] (Utility.Process) process [1346549] done ExitSuccess +[2025-08-25 17:42:17.581051093] (Utility.Process) process [1346550] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +diff-tree","-z","--raw","--no-renames","-l0","-r","9ab60e3cc17c42b13c16a28ae4c59f3716502756","fcf471e16d0105550406a29e13f9d385447cbd31","--"] +[2025-08-25 17:42:17.584953927] (Utility.Process) process [1346550] done ExitSuccess +[2025-08-25 17:42:17.585114599] (Database.Handle) commitDb start +[2025-08-25 17:42:17.585975051] (Database.Handle) commitDb done +update refs/remotes/homeserver/master [2025-08-25 17:42:17.588322083] (Utility.Process) process [1346551] read: git ["--git-dir=.git","--work-tree=.","--litera +l-pathspecs","-c","annex.debug=true","rev-parse","--verify","--quiet","e3cfdead06d04f4df78a01a309a1831af4961858:"] +[2025-08-25 17:42:17.590642005] (Utility.Process) process [1346551] done ExitSuccess +[2025-08-25 17:42:17.59120988] (Utility.Process) process [1346552] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","m +ktree","--missing","--batch","-z"] +[2025-08-25 17:42:17.591563939] (Messages.explain) [ one does not match annex.addunlocked: nothing[FALSE] ] + +[2025-08-25 17:42:17.592509593] (Utility.Process) process [1346553] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," 
+hash-object","-w","--no-filters","--stdin-paths"] +[2025-08-25 17:42:17.59575411] (Utility.Process) process [1346552] done ExitSuccess +[2025-08-25 17:42:17.596285972] (Utility.Process) process [1346554] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +log","e3cfdead06d04f4df78a01a309a1831af4961858","--full-history","--no-abbrev","--format=%T %H %P"] +[2025-08-25 17:42:17.598755905] (Utility.Process) process [1346554] done ExitSuccess +[2025-08-25 17:42:17.600719334] (Utility.Process) process [1346555] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","a2c2f21679a7c0864f7ca486c5d39998abb1f33f","--no-gpg-sign","-m","Deleted two, this works"] +[2025-08-25 17:42:17.603747862] (Utility.Process) process [1346555] done ExitSuccess +[2025-08-25 17:42:17.604331351] (Utility.Process) process [1346556] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","a2c2f21679a7c0864f7ca486c5d39998abb1f33f","--no-gpg-sign","-p","e3cfdead06d04f4df78a01a309a1831af4961858","-p","9344862d2a1be0162dfc63c2dcc5c1dd7 +c7406a3","-m","remote tracking branch"] +[2025-08-25 17:42:17.607310455] (Utility.Process) process [1346556] done ExitSuccess +[2025-08-25 17:42:17.609644622] (Utility.Process) process [1346557] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +diff-tree","-z","--raw","--no-renames","-l0","-r","6987476211346060b533f824472f34bd92602ccd","a2c2f21679a7c0864f7ca486c5d39998abb +1f33f","--"] +[2025-08-25 17:42:17.612718435] (Utility.Process) process [1346558] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"] +[2025-08-25 17:42:17.615267805] (Utility.Process) process [1346559] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," 
+cat-file","--batch"] +[2025-08-25 17:42:17.617452533] (Utility.Process) process [1346557] done ExitSuccess +[2025-08-25 17:42:17.6175722] (Database.Handle) commitDb start +[2025-08-25 17:42:17.618291237] (Database.Handle) commitDb done +[2025-08-25 17:42:17.619671807] (Utility.Process) process [1346560] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +show-ref","--hash","refs/heads/git-annex"] +[2025-08-25 17:42:17.622156138] (Utility.Process) process [1346560] done ExitSuccess +[2025-08-25 17:42:17.622664547] (Utility.Process) process [1346561] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +rev-parse","--verify","--quiet","673de1e45f8751c3ac0066b4c827e3a046051c4f:"] +[2025-08-25 17:42:17.624898473] (Utility.Process) process [1346561] done ExitSuccess +[2025-08-25 17:42:17.625451325] (Utility.Process) process [1346562] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +mktree","--missing","--batch","-z"] +[2025-08-25 17:42:17.625953724] (Utility.Process) process [1346563] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +ls-tree","--full-tree","-z","-t","--","fcf471e16d0105550406a29e13f9d385447cbd31"] +[2025-08-25 17:42:17.628045705] (Utility.Process) process [1346563] done ExitSuccess +[2025-08-25 17:42:17.629385044] (Utility.Process) process [1346562] done ExitSuccess +[2025-08-25 17:42:17.629801349] (Utility.Process) process [1346564] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","becfcef63bb6e2521c4a66e41bdd1f867eda5d62","--no-gpg-sign","-p","673de1e45f8751c3ac0066b4c827e3a046051c4f","-m","graft"] +[2025-08-25 17:42:17.632644385] (Utility.Process) process [1346564] done ExitSuccess +[2025-08-25 17:42:17.633131506] (Utility.Process) process [1346565] read: git 
["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +commit-tree","fcf471e16d0105550406a29e13f9d385447cbd31","--no-gpg-sign","-p","d8a2fbef419e45b925d61363326a062bd31911ae","-m","graft cleanup"] +[2025-08-25 17:42:17.635853854] (Utility.Process) process [1346565] done ExitSuccess +[2025-08-25 17:42:17.63629057] (Utility.Process) process [1346566] call: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","u +pdate-ref","refs/heads/git-annex","7cf4ac7abbfd7e4e8be1a9af024ffafd230f7d71"] +[2025-08-25 17:42:17.63888163] (Utility.Process) process [1346566] done ExitSuccess +[2025-08-25 17:42:17.639467114] (Annex.Branch) read export.log +[2025-08-25 17:42:17.640240661] (Annex.Branch) set export.log +[2025-08-25 17:42:17.640651989] (Utility.Process) process [1346567] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true"," +diff-tree","-z","--raw","--no-renames","-l0","-r","6987476211346060b533f824472f34bd92602ccd","a2c2f21679a7c0864f7ca486c5d39998abb1f33f","--"] +[2025-08-25 17:42:17.643916998] (Annex.Branch) read 5a2/b05/SHA256E-s4--27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a.log +[2025-08-25 17:42:17.644707405] (Annex.Branch) set 5a2/b05/SHA256E-s4--27dd8ed44a83ff94d557f9fd0412ed5a8cbca69ea04922d88c01184a07300a5a.log +[2025-08-25 17:42:17.644810572] (Utility.Process) process [1346567] done ExitSuccess (Diff truncated)
prevent deadlock when reconcileStaged runs restagePointerFiles
Fix hang that could occur when using git-annex adjust on a branch with a
number of files greater than annex.queuesize. Or potentially other
commands.
When reconcileStaged is running, the database is being opened. But
restagePointerFiles closes the database, and later writes to it. So it will
deadlock if called by reconcileStaged.
The deadlock occurred when the git queue happened to be full, so that
adding a call to restagePointerFiles to it flushed the queue and ran
restagePointerFiles at the wrong time.
Fixed by making reconcileStaged, when it populates or depopulates a pointer
file, arrange for restagePointerFiles to be run as a cleanup action, rather
than from the git queue.
But, what if restagePointerFiles is already in the git queue before
reconcileStaged is run? If it adds anything else to the git queue, causing
the queue to flush, it would still deadlock. To avoid this hypothetical
situation, added an Annex.inreconcilestaged flag, and made restagePointerFiles
check it and do nothing when it is set.
Note that I did consider the simpler approach of only running
restagePointerFiles as a cleanup action, rather than from the git queue.
But see commit 6a3bd283b8af53f810982e002e435c0d7c040c59 for why it was made
to use the queue in the first place. I wanted to avoid tying this bug fix
to a behavior change.
Sponsored-by: mycroft
diff --git a/Annex.hs b/Annex.hs index aba23587fb..421d152bf6 100644 --- a/Annex.hs +++ b/Annex.hs @@ -226,6 +226,7 @@ data AnnexState = AnnexState , cachedgitenv :: Maybe (AltIndexFile, OsPath, [(String, String)]) , urloptions :: Maybe UrlOptions , insmudgecleanfilter :: Bool + , inreconcilestaged :: Bool , getvectorclock :: IO CandidateVectorClock , proxyremote :: Maybe (Either ClusterUUID (Types.Remote.RemoteA Annex)) , reposizehandle :: Maybe RepoSizeHandle @@ -283,6 +284,7 @@ newAnnexState c r = do , cachedgitenv = Nothing , urloptions = Nothing , insmudgecleanfilter = False + , inreconcilestaged = False , getvectorclock = vc , proxyremote = Nothing , reposizehandle = Nothing diff --git a/Annex/Content.hs b/Annex/Content.hs index 638390b2bf..9162c34983 100644 --- a/Annex/Content.hs +++ b/Annex/Content.hs @@ -542,7 +542,7 @@ moveAnnex key src = ifM (checkSecureHashes' key) unless (null fs) $ do destic <- withTSDelta $ liftIO . genInodeCache dest - ics <- mapM (populatePointerFile (Restage True) key dest) fs + ics <- mapM (populatePointerFile QueueRestage key dest) fs Database.Keys.addInodeCaches key (catMaybes (destic:ics)) ) @@ -784,7 +784,7 @@ removeAnnex (ContentRemovalLock key) = withObjectLoc key $ \file -> resetpointer file = unlessM (liftIO $ isSymbolicLink <$> R.getSymbolicLinkStatus (fromOsPath file)) $ ifM (isUnmodified key file) ( adjustedBranchRefresh $ - depopulatePointerFile key file + depopulatePointerFile QueueRestage key file -- Modified file, so leave it alone. -- If it was a hard link to the annex object, -- that object might have been frozen as part of the diff --git a/Annex/Content/PointerFile.hs b/Annex/Content/PointerFile.hs index 22657a11c8..4054296c3b 100644 --- a/Annex/Content/PointerFile.hs +++ b/Annex/Content/PointerFile.hs @@ -52,8 +52,8 @@ populatePointerFile restage k obj f = go =<< liftIO (isPointerFile f) {- Removes the content from a pointer file, replacing it with a pointer. - - Does not check if the pointer file is modified. 
-} -depopulatePointerFile :: Key -> OsPath -> Annex () -depopulatePointerFile key file = do +depopulatePointerFile :: Restage -> Key -> OsPath -> Annex () +depopulatePointerFile restage key file = do st <- liftIO $ catchMaybeIO $ R.getFileStatus (fromOsPath file) let mode = fmap fileMode st secureErase file @@ -68,4 +68,4 @@ depopulatePointerFile key file = do (fmap Posix.modificationTimeHiRes st) #endif withTSDelta (liftIO . genInodeCache tmp) - maybe noop (restagePointerFile (Restage True) file) ic + maybe noop (restagePointerFile restage file) ic diff --git a/Annex/Ingest.hs b/Annex/Ingest.hs index 07b5dad282..84e6cadddf 100644 --- a/Annex/Ingest.hs +++ b/Annex/Ingest.hs @@ -172,7 +172,7 @@ ingestAdd' meterupdate ld@(Just (LockedDown cfg source)) mk = do {- Ingests a locked down file into the annex. Does not update the working - tree or the index. -} ingest :: MeterUpdate -> Maybe LockedDown -> Maybe Key -> Annex (Maybe Key, Maybe InodeCache) -ingest meterupdate ld mk = ingest' Nothing meterupdate ld mk (Restage True) +ingest meterupdate ld mk = ingest' Nothing meterupdate ld mk QueueRestage ingest' :: Maybe Backend -> MeterUpdate -> Maybe LockedDown -> Maybe Key -> Restage -> Annex (Maybe Key, Maybe InodeCache) ingest' _ _ Nothing _ _ = return (Nothing, Nothing) @@ -228,7 +228,7 @@ ingest' preferredbackend meterupdate (Just (LockedDown cfg source)) mk restage = finishIngestUnlocked :: Key -> KeySource -> Annex () finishIngestUnlocked key source = do cleanCruft source - finishIngestUnlocked' key source (Restage True) Nothing + finishIngestUnlocked' key source QueueRestage Nothing finishIngestUnlocked' :: Key -> KeySource -> Restage -> Maybe LinkAnnexResult -> Annex () finishIngestUnlocked' key source restage lar = do diff --git a/Annex/Link.hs b/Annex/Link.hs index 480a00ce25..b3c962a772 100644 --- a/Annex/Link.hs +++ b/Annex/Link.hs @@ -7,7 +7,7 @@ - - Pointer files are used instead of symlinks for unlocked files. 
- - - Copyright 2013-2022 Joey Hess <id@joeyh.name> + - Copyright 2013-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} @@ -32,6 +32,7 @@ import Git.Config import Annex.HashObject import Annex.InodeSentinal import Annex.PidLock +import Types.CleanupActions import Utility.FileMode import Utility.InodeCache import Utility.Tmp.Dir @@ -160,7 +161,10 @@ writePointerFile file k mode = do F.writeFile' file (formatPointer k) maybe noop (R.setFileMode (fromOsPath file)) mode -newtype Restage = Restage Bool +data Restage + = NoRestage + | QueueRestage + | LaterRestage {- Restage pointer file. This is used after updating a worktree file - when content is added/removed, to prevent git status from showing @@ -184,26 +188,27 @@ newtype Restage = Restage Bool - and will store it in the restage log. Displays a message to help the - user understand why the file will appear to be modified. - - - This uses the git queue, so the update is not performed immediately, - - and this can be run multiple times cheaply. Using the git queue also - - prevents building up too large a number of updates when many files - - are being processed. It's also recorded in the restage log so that, - - if the process is interrupted before the git queue is fulushed, the - - restage will be taken care of later. + - The update is not performed immediately, so and this can be run multiple + - times cheaply. It's also recorded in the restage log so that, if the + - process is interrupted before the git queue is fulushed, the restage + - will be taken care of later. 
-} restagePointerFile :: Restage -> OsPath -> InodeCache -> Annex () -restagePointerFile (Restage False) f orig = do +restagePointerFile NoRestage f orig = do flip writeRestageLog orig =<< inRepo (toTopFilePath f) toplevelWarning True $ unableToRestage $ Just f -restagePointerFile (Restage True) f orig = do +{- Using the git queue prevents building up too large a number of updates + - when many files are being processed. -} +restagePointerFile QueueRestage f orig = do flip writeRestageLog orig =<< inRepo (toTopFilePath f) - -- Avoid refreshing the index if run by the - -- smudge clean filter, because git uses that when - -- it's already refreshing the index, probably because - -- this very action is running. Running it again would likely - -- deadlock. unlessM (Annex.getState Annex.insmudgecleanfilter) $ Annex.Queue.addFlushAction restagePointerFileRunner [f] +{- Defer the restage until the end. -} +restagePointerFile LaterRestage f orig = do + flip writeRestageLog orig =<< inRepo (toTopFilePath f) + unlessM (Annex.getState Annex.insmudgecleanfilter) $ + Annex.addCleanupAction RestagePointerFiles $ + restagePointerFiles =<< Annex.gitRepo restagePointerFileRunner :: Git.Queue.FlushActionRunner Annex restagePointerFileRunner = @@ -219,7 +224,7 @@ restagePointerFileRunner = -- to bypass the lock. Then replace the old index file with the new -- updated index file. restagePointerFiles :: Git.Repo -> Annex () -restagePointerFiles r = unlessM (Annex.getState Annex.insmudgecleanfilter) $ do +restagePointerFiles r = checkcanrun $ do -- Flush any queued changes to the keys database, so they -- are visible to child processes. 
-- The database is closed because that may improve behavior @@ -330,6 +335,9 @@ restagePointerFiles r = unlessM (Annex.getState Annex.insmudgecleanfilter) $ do ck = ConfigKey "filter.annex.process" ckd = ConfigKey "filter.annex.process-temp-disabled" + checkcanrun a = unlessM (Annex.getState Annex.insmudgecleanfilter) $ + unlessM (Annex.getState Annex.inreconcilestaged) $ a + unableToRestage :: Maybe OsPath -> StringContainingQuotedPath unableToRestage mf = "git status will show " <> maybe "some files" QuotedPath mf diff --git a/CHANGELOG b/CHANGELOG index 530a560e34..821a554794 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -21,6 +21,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds. * enableremote: Disallow using type= to attempt to change the type of an existing remote. + * Fix hang that could occur when using git-annex adjust on a branch with + a number of files greater than annex.queuesize. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Command/PreCommit.hs b/Command/PreCommit.hs index a58bfc6a70..8404cd6665 100644 --- a/Command/PreCommit.hs +++ b/Command/PreCommit.hs @@ -42,7 +42,7 @@ seek ps = do =<< isAnnexLink f -- after a merge conflict or git cherry-pick or stash, pointer -- files in the worktree won't be populated, so populate them here - Command.Smudge.updateSmudged (Restage False) (Diff truncated)
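The deadlock fix in this commit can be modelled roughly as follows. This is a simplified sketch, not the git-annex implementation: the `St` record, the string-valued queue and the `runRestage` helper are hypothetical, and the `insmudgecleanfilter` check is left out. Only the three `Restage` constructors are taken from the diff. The model shows that `LaterRestage` leaves the git queue untouched and registers a cleanup action instead, and that the restage refuses to run while the reconcile flag is set.

```haskell
-- The three constructors mirror the diff; everything else is illustrative.
data Restage = NoRestage | QueueRestage | LaterRestage
  deriving (Eq, Show)

-- Hypothetical, heavily simplified stand-in for the Annex state.
data St = St
  { inReconcileStaged :: Bool
  , gitQueue :: [String]        -- actions run whenever the queue is flushed
  , cleanupActions :: [String]  -- actions run once, at the very end
  , restageLog :: [FilePath]    -- files recorded as needing a restage
  } deriving (Show)

-- Every mode records the file in the restage log; the modes differ only
-- in when (or whether) the actual restage gets scheduled.
restagePointerFile :: Restage -> FilePath -> St -> St
restagePointerFile how f st = schedule how logged
  where
    logged = st { restageLog = f : restageLog st }
    schedule NoRestage s = s
    schedule QueueRestage s = s { gitQueue = gitQueue s ++ ["restage"] }
    schedule LaterRestage s
      | "restage" `elem` cleanupActions s = s
      | otherwise = s { cleanupActions = cleanupActions s ++ ["restage"] }

-- Models the guard: Nothing means the restage declined to run.
runRestage :: St -> Maybe St
runRestage s
  | inReconcileStaged s = Nothing
  | otherwise = Just s { restageLog = [] }

main :: IO ()
main = do
  let start = St { inReconcileStaged = True, gitQueue = []
                 , cleanupActions = [], restageLog = [] }
      during = restagePointerFile LaterRestage "a" start
  print (gitQueue during)                      -- prints []
  print (cleanupActions during)                -- prints ["restage"]
  print (fmap restageLog (runRestage during))  -- prints Nothing
  let after = during { inReconcileStaged = False }
  print (fmap restageLog (runRestage after))   -- prints Just []
```

Because the queue stays empty in the `LaterRestage` case, a queue flush during reconciliation has no restage action to run, and the guard covers the case where one was queued earlier.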
update
diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment
index 954b8a76c2..33b3671fe9 100644
--- a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment
@@ -5,4 +5,21 @@
  content="""
 I set annex.queuesize to 100, made 1000 files, and was able to reproduce
 the hang.
+
+A `git-annex smudge --update` process has open the `.git/annex/gitqueue.lck` file.
+It is the only process with that lock file open. So it is in the process of trying to
+flush the queued changes to the index that it is locking up.
+
+I removed .git/hooks/post-checkout, and then after the `git-annex adjust`,
+manually running `git-annex smudge --update` causes the same hang.
+
+Debugging the git queue flush, it is hanging while running a FlushAction, specifically
+restagePointerFileRunner.
+
+And the hang occurs when restagePointerFiles calls
+Database.Keys.Handle.closeDbHandle.
+
+I suspect this bug may not be specific to `git-annex smudge --update` at all. It may be
+that any time the git queue gets flushed with restagePointerFileRunner in the queue it
+hangs like this. This needs further investigation.
 """]]
comment
diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment
new file mode 100644
index 0000000000..954b8a76c2
--- /dev/null
+++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_2_f04aebe3f9eaa4cb0044c9d305d9f649._comment
@@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2025-09-22T15:18:32Z"
+ content="""
+I set annex.queuesize to 100, made 1000 files, and was able to reproduce
+the hang.
+"""]]
comment
diff --git a/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_1_bce639fd90df8d8a3160a3425e4cb927._comment b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_1_bce639fd90df8d8a3160a3425e4cb927._comment new file mode 100644 index 0000000000..a51416c12e --- /dev/null +++ b/doc/bugs/Unlock_filter_seems_to_deadlock_for_huge_worktree/comment_1_bce639fd90df8d8a3160a3425e4cb927._comment @@ -0,0 +1,12 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-22T15:14:03Z" + content=""" +10240 is a very specific number, and is the *same* number that annex.queuesize +default to. + +So, that seems very likely to be the root of the problem, and at least +configuring annex.queuesize to something larger may work around the +problem. +"""]]
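The workaround proposed in comment 1 amounts to a single config change. A minimal sketch, assuming it is run inside the affected repository; the value 20480 is an arbitrary assumption (anything above the 10240 default), and the later comments on this bug suggest the hang may not depend on the queue size alone, so this may not help.

```shell
# Raise the git queue size above the 10240 default.
# 20480 is an arbitrary illustrative value, not a recommendation.
git config annex.queuesize 20480

# Confirm the setting took effect.
git config --get annex.queuesize
```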
comment
diff --git a/doc/design/external_special_remote_protocol/comment_56_1d600b1ed8ba2c563b8bc753d224ea05._comment b/doc/design/external_special_remote_protocol/comment_56_1d600b1ed8ba2c563b8bc753d224ea05._comment new file mode 100644 index 0000000000..01a4ade4ee --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_56_1d600b1ed8ba2c563b8bc753d224ea05._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: support for bulk write/read/test remote""" + date="2025-09-22T15:02:49Z" + content=""" +@psxvoid that's fundamentally different than how git-annex works, so there +will need to be some kind of translation layer. And I think you could put +it in your special remote. + +For example, you could store both the archive file, as well as annex object +files that have not yet made it into the archive. So that when git-annex +sends a file to your remote, the file is actually stored in the remote, +rather than in a temporary location. Then you could periodically make a +new archive file from the loose objects. +"""]]
comment
diff --git a/doc/design/external_special_remote_protocol/comment_55_428b42728d231546284449e59e9214f6._comment b/doc/design/external_special_remote_protocol/comment_55_428b42728d231546284449e59e9214f6._comment new file mode 100644 index 0000000000..f29f611aea --- /dev/null +++ b/doc/design/external_special_remote_protocol/comment_55_428b42728d231546284449e59e9214f6._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: Multi-line string in WHEREIS-SUCCESS?""" + date="2025-09-22T15:00:11Z" + content=""" +@matrss there is not currently a way to do that. If you need it, I suggest +you open a todo so we can design one. +"""]]
enableremote: Disallow using type= to attempt to change the type of an existing remote
Changing the type out from under an existing special remote exposes the
existing config to something that may interpret it wildly differently. As
seen in the bug report, this can even result in behavior that makes
git-annex say it's buggy. So prevent the user from doing this. --sameas is
the better way.
Sponsored-by: Kevin Mueller
diff --git a/CHANGELOG b/CHANGELOG index acb06c4485..530a560e34 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -19,6 +19,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium and a repository in the directory got removed. * Added annex.assistant.allowunlocked config. * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds. + * enableremote: Disallow using type= to attempt to change the type of an + existing remote. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Command/EnableRemote.hs b/Command/EnableRemote.hs index 3aeae7147e..875231d458 100644 --- a/Command/EnableRemote.hs +++ b/Command/EnableRemote.hs @@ -59,8 +59,11 @@ start _ [] = unknownNameError "Specify the remote to enable." start o (name:rest) = go =<< filter matchingname <$> Annex.getGitRemotes where matchingname r = Git.remoteName r == Just name - go [] = deadLast name $ - startSpecialRemote o name (Logs.Remote.keyValToConfig Proposed rest) + go [] = deadLast name $ + let config = Logs.Remote.keyValToConfig Proposed rest + in case M.lookup SpecialRemote.typeField config of + Nothing -> startSpecialRemote o name config + Just _ -> giveup "Cannot change type= of existing special remote. Instead, use: git-annex initremote --sameas" go (r:_) | not (null rest) = go [] | otherwise = do diff --git a/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn index 21a7c72cb6..0fd5af8a1f 100644 --- a/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn +++ b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash.mdwn @@ -40,3 +40,6 @@ Using the standalone amd64 build on Debian 12. ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) I use git-annex for "everything". 
I have somewhere along the lines of 14TiB stored in various git-annex repositories, synced in various degrees to anywhere between 3 and 10 hosts, with repos dating back to 2012. It's awesome. + +> [[fixed|done]] the git-annex bug by providing a better error message when +> this is attempted. --[[Joey]]
comment
diff --git a/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash/comment_1_fdf7b80ed29703d6439b42b9b34067fe._comment b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash/comment_1_fdf7b80ed29703d6439b42b9b34067fe._comment new file mode 100644 index 0000000000..5dbcf88487 --- /dev/null +++ b/doc/bugs/enableremote_type__61__rclone_on_existing_remote_crash/comment_1_fdf7b80ed29703d6439b42b9b34067fe._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-22T14:25:21Z" + content=""" +A better way to do this is to make a new special remote, but tell git-annex +it's the same underlying storage as the old special remote. + + git-annex initremote --sameas=oldremotename newremotename type=rclone ... +"""]]
Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds
diff --git a/Build/Standalone.hs b/Build/Standalone.hs index 44e8171707..b98fcaaf66 100644 --- a/Build/Standalone.hs +++ b/Build/Standalone.hs @@ -221,6 +221,8 @@ installGitAnnex topdir = go (topdir </> literalOsPath "bin") error "cp failed" unlessM (boolSystem "strip" [File (fromOsPath (bindir </> literalOsPath "git-annex"))]) $ error "strip failed" + -- Note that when adding more commands here, wrapper + -- scripts also need to be added in standalone/ createSymbolicLink "git-annex" (fromOsPath (bindir </> literalOsPath "git-annex-shell")) createSymbolicLink "git-annex" (fromOsPath (bindir </> literalOsPath "git-remote-p2p-annex")) createSymbolicLink "git-annex" (fromOsPath (bindir </> literalOsPath "git-remote-tor-annex")) diff --git a/CHANGELOG b/CHANGELOG index fe989c47c5..acb06c4485 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -18,6 +18,7 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * p2phttp: Fix a hang that could occur when used with --directory, and a repository in the directory got removed. * Added annex.assistant.allowunlocked config. + * Add git-remote-p2p-annex and git-remote-tor-annex to standalone builds. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn b/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn index 8733ffee3c..7a500ce63f 100644 --- a/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn +++ b/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn @@ -25,4 +25,4 @@ Try to use git-remote-p2p-annex from the latest standalone build. ### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) - +> [[fixed|done]] --[[Joey]] diff --git a/standalone/linux/skel/git-remote-p2p-annex b/standalone/linux/skel/git-remote-p2p-annex new file mode 100755 index 0000000000..8414df0d84 --- /dev/null +++ b/standalone/linux/skel/git-remote-p2p-annex @@ -0,0 +1,24 @@ +#!/bin/sh +link="$(readlink -f "$0" 2>/dev/null || readlink "$0")" || true +if [ -n "$link" ]; then + base="$(dirname "$link")" +else + base="$(dirname "$0")" +fi + +if [ ! -d "$base" ]; then + echo "** cannot find base directory (I seem to be $0)" >&2 + exit 1 +fi +if [ ! -e "$base/runshell" ]; then + echo "** cannot find $base/runshell" >&2 + exit 1 +fi + +# Get absolute path to base, to avoid breakage when things change directories. +orig="$(pwd)" +cd "$base" +base="$(pwd)" +cd "$orig" + +exec "$base/runshell" git-remote-p2p-annex "$@" diff --git a/standalone/linux/skel/git-remote-tor-annex b/standalone/linux/skel/git-remote-tor-annex new file mode 100755 index 0000000000..a9e722ea52 --- /dev/null +++ b/standalone/linux/skel/git-remote-tor-annex @@ -0,0 +1,24 @@ +#!/bin/sh +link="$(readlink -f "$0" 2>/dev/null || readlink "$0")" || true +if [ -n "$link" ]; then + base="$(dirname "$link")" +else + base="$(dirname "$0")" +fi + +if [ ! -d "$base" ]; then + echo "** cannot find base directory (I seem to be $0)" >&2 + exit 1 +fi +if [ ! -e "$base/runshell" ]; then + echo "** cannot find $base/runshell" >&2 + exit 1 +fi + +# Get absolute path to base, to avoid breakage when things change directories. 
+orig="$(pwd)" +cd "$base" +base="$(pwd)" +cd "$orig" + +exec "$base/runshell" git-remote-tor-annex "$@" diff --git a/standalone/osx/git-annex.app/Contents/MacOS/git-remote-p2p-annex b/standalone/osx/git-annex.app/Contents/MacOS/git-remote-p2p-annex new file mode 100755 index 0000000000..d498b4ba23 --- /dev/null +++ b/standalone/osx/git-annex.app/Contents/MacOS/git-remote-p2p-annex @@ -0,0 +1,31 @@ +#!/bin/sh +link="$(readlink "$0")" || true +if [ -n "$link" ]; then + base="$(dirname "$link")" +else + base="$(dirname "$0")" +fi + +if [ ! -d "$base" ]; then + echo "** cannot find base directory (I seem to be $0)" >&2 + exit 1 +fi +if [ ! -e "$base/runshell" ]; then + echo "** cannot find $base/runshell" >&2 + exit 1 +fi + +# Get absolute path to base, to avoid breakage when things change directories. +orig="$(pwd)" +cd "$base" +base="$(pwd)" +cd "$orig" + +# If this is a standalone app, set a variable that git-annex can use to +# install itself. +if [ -e "$base/git-annex" ]; then + GIT_ANNEX_APP_BASE="$base" + export GIT_ANNEX_APP_BASE +fi + +exec "$base/runshell" git-remote-p2p-annex "$@" diff --git a/standalone/osx/git-annex.app/Contents/MacOS/git-remote-tor-annex b/standalone/osx/git-annex.app/Contents/MacOS/git-remote-tor-annex new file mode 100755 index 0000000000..2a8061d7e5 --- /dev/null +++ b/standalone/osx/git-annex.app/Contents/MacOS/git-remote-tor-annex @@ -0,0 +1,31 @@ +#!/bin/sh +link="$(readlink "$0")" || true +if [ -n "$link" ]; then + base="$(dirname "$link")" +else + base="$(dirname "$0")" +fi + +if [ ! -d "$base" ]; then + echo "** cannot find base directory (I seem to be $0)" >&2 + exit 1 +fi +if [ ! -e "$base/runshell" ]; then + echo "** cannot find $base/runshell" >&2 + exit 1 +fi + +# Get absolute path to base, to avoid breakage when things change directories. +orig="$(pwd)" +cd "$base" +base="$(pwd)" +cd "$orig" + +# If this is a standalone app, set a variable that git-annex can use to +# install itself. 
+if [ -e "$base/git-annex" ]; then + GIT_ANNEX_APP_BASE="$base" + export GIT_ANNEX_APP_BASE +fi + +exec "$base/runshell" git-remote-tor-annex "$@"
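The four wrapper scripts added above share the same base-directory lookup. As a rough sketch of that step only, it can be written as one function; `resolve_base` is a hypothetical name used here for illustration and does not exist in the git-annex tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-annex): print the absolute directory
# that contains the script at path $1, following a symlink if there is one.
resolve_base() {
    script="$1"
    # readlink -f is not available everywhere, so fall back to plain readlink.
    link="$(readlink -f "$script" 2>/dev/null || readlink "$script")" || true
    if [ -n "$link" ]; then
        base="$(dirname "$link")"
    else
        base="$(dirname "$script")"
    fi
    # cd in a subshell so the caller's working directory is left untouched.
    (cd "$base" && pwd)
}
```

A wrapper could then call `base="$(resolve_base "$0")"` before checking for `$base/runshell`, as the scripts above do inline.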
diff --git a/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn b/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn new file mode 100644 index 0000000000..8733ffee3c --- /dev/null +++ b/doc/bugs/git-remote-p2p-annex_missing_in_standalone_build.mdwn @@ -0,0 +1,28 @@ +### Please describe the problem. + +The standalone build is missing an entrypoint script for git-remote-p2p-annex. There is only the binary in bin/git-remote-p2p-annex, but that is missing the setup work for the environment that the other scripts in the top-level directory do. + + +### What steps will reproduce the problem? + +Try to use git-remote-p2p-annex from the latest standalone build. + + +### What version of git-annex are you using? On what operating system? + +10.20250828-gfe7ecf505146342fe8df2430a0bcaf5f02d89a80 + + +### Please provide any additional information below. + +[[!format sh """ +# If you can, paste a complete transcript of the problem occurring here. +# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log + + +# End of transcript or log. +"""]] + +### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) + +
fix windows build
diff --git a/Annex/Multicast.hs b/Annex/Multicast.hs index 4fe3e0af6c..b117e39fb8 100644 --- a/Annex/Multicast.hs +++ b/Annex/Multicast.hs @@ -17,6 +17,7 @@ import Utility.Env import System.Posix.IO #else import System.Process (createPipeFd) +import GHC.IO.Handle.FD (fdToHandle) #endif import GHC.IO.Encoding (getLocaleEncoding) diff --git a/doc/bugs/windows_FTBFS__44___advise_needed.mdwn b/doc/bugs/windows_FTBFS__44___advise_needed.mdwn index cccf814793..c13008c672 100644 --- a/doc/bugs/windows_FTBFS__44___advise_needed.mdwn +++ b/doc/bugs/windows_FTBFS__44___advise_needed.mdwn @@ -44,3 +44,4 @@ Error: [S-7282] CI action with all the steps is [here](https://github.com/datalad/git-annex/blob/master/.github/workflows/build-windows.yaml#L112) +> [[fixed|done]] --[[Joey]]
response
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_25_31802dd28eaf83bcdd1d7b68caa58236._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_25_31802dd28eaf83bcdd1d7b68caa58236._comment new file mode 100644 index 0000000000..a9d95b107c --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_25_31802dd28eaf83bcdd1d7b68caa58236._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 25""" + date="2025-09-18T15:53:27Z" + content=""" +The pypi whl builds should already have it, as all stack builds default to +having the OsPath build flag enabled already. +"""]]
FTBFS on Windows
diff --git a/doc/bugs/windows_FTBFS__44___advise_needed.mdwn b/doc/bugs/windows_FTBFS__44___advise_needed.mdwn new file mode 100644 index 0000000000..cccf814793 --- /dev/null +++ b/doc/bugs/windows_FTBFS__44___advise_needed.mdwn @@ -0,0 +1,46 @@ +### Please describe the problem. + + +Our windows build was failing for a while + +[here is the recent log](https://github.com/datalad/git-annex/actions/runs/17753795690/job/50453300001) +which shows + +``` +[336 of 754] Compiling Database.Init +[337 of 754] Compiling Database.Benchmark +D:\a\git-annex\git-annex\Annex\Multicast.hs:36:15: error: [GHC-88464] +Error: Variable not in scope: + fdToHandle + :: ghc-internal-9.1002.0:GHC.Internal.System.Posix.Internals.FD + -> IO Handle + | +36 | rh <- fdToHandle rfd + | ^^^^^^^^^^ + +D:\a\git-annex\git-annex\Annex\Multicast.hs:48:22: error: [GHC-88464] +Error: Variable not in scope: fdToHandle :: t0 -> IO Handle + | +48 | h <- fdToHandle fd + | ^^^^^^^^^^ + +[338 of 754] Compiling Creds +... +[752 of 754] Compiling Command.Assistant + +Error: [S-7282] + Stack failed to execute the build plan. + + While executing the build plan, Stack encountered the error: + + [S-7011] + While building package git-annex-10.20250828 (scroll up to its section to see the error) + using: + D:\a\git-annex\git-annex\.stack-work\dist\56bb250d\setup\setup --verbose=1 --builddir=.stack-work\dist\56bb250d build exe:git-annex --ghc-options "" + Process exited with code: ExitFailure 1 +``` + +### What steps will reproduce the problem? + + CI action with all the steps is [here](https://github.com/datalad/git-annex/blob/master/.github/workflows/build-windows.yaml#L112) +
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_24_2181f4b0acc9d01c85d7263cfa2d0cc1._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_24_2181f4b0acc9d01c85d7263cfa2d0cc1._comment new file mode 100644 index 0000000000..862ebf8647 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_24_2181f4b0acc9d01c85d7263cfa2d0cc1._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 24" + date="2025-09-16T19:34:27Z" + content=""" +well, may be [at least Michael's pypi whl builds](https://github.com/psychoinformatics-de/git-annex-wheel/blob/main/.github/workflows/build-linux.yaml) which directly install stack (I do not think they use system pkgs) could be tuned as needed? +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_23_8e104885ed1e89c1b24bda54c7ba2bb4._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_23_8e104885ed1e89c1b24bda54c7ba2bb4._comment new file mode 100644 index 0000000000..c02320110f --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_23_8e104885ed1e89c1b24bda54c7ba2bb4._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 23" + date="2025-09-16T19:33:40Z" + content=""" +well, may be [at least Michael's pypi whl builds](https://github.com/psychoinformatics-de/git-annex-wheel/blob/main/.github/workflows/build-linux.yaml) which directly install stack (I do not think they use system pkgs) could be tuned as needed? +"""]]
annex.assistant.allowunlocked
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Assistant/Threads/Committer.hs b/Assistant/Threads/Committer.hs index 6ffc9eb0e1..53663e8b5b 100644 --- a/Assistant/Threads/Committer.hs +++ b/Assistant/Threads/Committer.hs @@ -62,6 +62,11 @@ commitThread = namedThread "Committer" $ do fmap Seconds . annexDelayAdd <$> Annex.getGitConfig largefilematcher <- liftAnnex largeFilesMatcher annexdotfiles <- liftAnnex $ getGitConfigVal annexDotFiles + addunlockedmatcher <- liftAnnex $ + ifM (annexSupportUnlocked <$> Annex.getGitConfig) + ( Just <$> addUnlockedMatcher + , return Nothing + ) msg <- liftAnnex Command.Sync.commitMsg lockdowndir <- liftAnnex $ fromRepo gitAnnexTmpWatcherDir liftAnnex $ do @@ -70,7 +75,7 @@ commitThread = namedThread "Committer" $ do void $ liftIO $ tryIO $ removeDirectoryRecursive lockdowndir void $ createAnnexDirectory lockdowndir waitChangeTime $ \(changes, time) -> do - readychanges <- handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd $ + readychanges <- handleAdds lockdowndir havelsof largefilematcher annexdotfiles addunlockedmatcher delayadd $ simplifyChanges changes if shouldCommit False time (length readychanges) readychanges then do @@ -275,8 +280,8 @@ commitStaged msg = do - Any pending adds that are not ready yet are put back into the ChangeChan, - where they will be retried later. 
-} -handleAdds :: OsPath -> Bool -> GetFileMatcher -> Bool -> Maybe Seconds -> [Change] -> Assistant [Change] -handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = returnWhen (null incomplete) $ do +handleAdds :: OsPath -> Bool -> GetFileMatcher -> Bool -> Maybe AddUnlockedMatcher -> Maybe Seconds -> [Change] -> Assistant [Change] +handleAdds lockdowndir havelsof largefilematcher annexdotfiles addunlockedmatcher delayadd cs = returnWhen (null incomplete) $ do let (pending, inprocess) = partition isPendingAddChange incomplete let lockdownconfig = LockDownConfig { lockingFile = False @@ -340,9 +345,9 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret Command.Add.addFile Command.Add.Small f =<< liftIO (R.getSymbolicLinkStatus (fromOsPath f)) - {- Avoid overhead of re-injesting a renamed unlocked file, by - - examining the other Changes to see if a removed file has the - - same InodeCache as the new file. If so, we can just update + {- When adding the file unlocked, avoid overhead of re-injesting a renamed + - unlocked file, by examining the other Changes to see if a removed + - file has the same InodeCache as the new file. If so, we can just update - bookkeeping, and stage the file in git. 
-} addannexed :: [Change] -> Assistant [Maybe Change] @@ -357,18 +362,36 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret , checkWritePerms = True } if M.null m - then forM toadd (addannexed' cfg) + then forM toadd $ \c -> do + mcache <- liftIO $ genInodeCache (changeFile c) delta + addunlocked <- checkaddunlocked c + addannexed' cfg c addunlocked mcache else forM toadd $ \c -> do mcache <- liftIO $ genInodeCache (changeFile c) delta - case mcache of - Nothing -> addannexed' cfg c - Just cache -> - case M.lookup (inodeCacheToKey ct cache) m of - Nothing -> addannexed' cfg c - Just k -> fastadd c k - - addannexed' :: LockDownConfig -> Change -> Assistant (Maybe Change) - addannexed' lockdownconfig change@(InProcessAddChange { lockedDown = ld }) = + ifM (checkaddunlocked c) + ( case mcache of + Nothing -> addannexed' cfg c True Nothing + Just cache -> + case M.lookup (inodeCacheToKey ct cache) m of + Nothing -> addannexed' cfg c True Nothing + Just k -> fastadd c k + , addannexed' cfg c False mcache + ) + + checkaddunlocked (InProcessAddChange { lockedDown = ld }) = + case addunlockedmatcher of + Just addunlockedmatcher' -> do + let mi = MatchingFile $ FileInfo + { contentFile = contentLocation (keySource ld) + , matchFile = keyFilename (keySource ld) + , matchKey = Nothing + } + liftAnnex $ addUnlocked addunlockedmatcher' mi True + Nothing -> return True + checkaddunlocked _ = return True + + addannexed' :: LockDownConfig -> Change -> Bool -> Maybe InodeCache -> Assistant (Maybe Change) + addannexed' lockdownconfig change@(InProcessAddChange { lockedDown = ld }) addunlocked mcache = catchDefaultIO Nothing <~> doadd where ks = keySource ld @@ -376,14 +399,14 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret (mkey, _mcache) <- liftAnnex $ do showStartMessage (StartMessage "add" (ActionItemOther (Just (QuotedPath (keyFilename ks)))) (SeekInput [])) ingest nullMeterUpdate (Just $ LockedDown 
lockdownconfig ks) Nothing - maybe (failedingest change) (done change $ keyFilename ks) mkey - addannexed' _ _ = return Nothing + maybe (failedingest change) (done change addunlocked mcache $ keyFilename ks) mkey + addannexed' _ _ _ _ = return Nothing fastadd :: Change -> Key -> Assistant (Maybe Change) fastadd change key = do let source = keySource $ lockedDown change liftAnnex $ finishIngestUnlocked key source - done change (keyFilename source) key + done change True Nothing (keyFilename source) key removedKeysMap :: InodeComparisonType -> [Change] -> Annex (M.Map InodeCacheKey Key) removedKeysMap ct l = do @@ -399,11 +422,14 @@ handleAdds lockdowndir havelsof largefilematcher annexdotfiles delayadd cs = ret liftAnnex showEndFail return Nothing - done change file key = liftAnnex $ do + done change addunlocked mcache file key = liftAnnex $ do logStatus NoLiveUpdate key InfoPresent - mode <- liftIO $ catchMaybeIO $ - fileMode <$> R.getFileStatus (fromOsPath file) - stagePointerFile file mode =<< hashPointerFile key + if addunlocked + then do + mode <- liftIO $ catchMaybeIO $ + fileMode <$> R.getFileStatus (fromOsPath file) + stagePointerFile file mode =<< hashPointerFile key + else addSymlink file key mcache showEndOk return $ Just $ finishedChange change key diff --git a/CHANGELOG b/CHANGELOG index 740253853a..72e4a419f5 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -17,6 +17,7 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Removed support for building with cryptonite, use crypton. * p2phttp: Fix a hang that could occur when used with --directory, and a repository in the repository got removed. + * Added annex.assistant.allowunlocked config. 
-- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/Types/GitConfig.hs b/Types/GitConfig.hs index 35b07a50a3..156b88c32c 100644 --- a/Types/GitConfig.hs +++ b/Types/GitConfig.hs @@ -157,6 +157,7 @@ data GitConfig = GitConfig , annexSkipUnknown :: Bool , annexAdjustedBranchRefresh :: Integer , annexSupportUnlocked :: Bool + , annexAssistantAllowUnlocked :: Bool , coreSymlinks :: Bool , coreSharedRepository :: SharedRepository , coreQuotePath :: QuotePath @@ -281,6 +282,7 @@ extractGitConfig configsource r = GitConfig (if getbool "adjustedbranchrefresh" False then 1 else 0) (getmayberead (annexConfig "adjustedbranchrefresh")) , annexSupportUnlocked = getbool (annexConfig "supportunlocked") True + , annexAssistantAllowUnlocked = getbool (annexConfig "assistant.allowunlocked") False , coreSymlinks = getbool "core.symlinks" True , coreSharedRepository = getSharedRepository r , coreQuotePath = QuotePath (getbool "core.quotepath" True) diff --git a/doc/git-annex.mdwn b/doc/git-annex.mdwn index 6b668f69b5..4f633d7f0e 100644 --- a/doc/git-annex.mdwn +++ b/doc/git-annex.mdwn @@ -1044,13 +1044,23 @@ repository, using [[git-annex-config]]. See its man page for a list.) To configure a default annex.addunlocked for all clones of the repository, this can be set in [[git-annex-config]](1). - (Using `git add` always adds files in unlocked form and it is not - affected by this setting.) + Using `git add` always adds files in unlocked form and it is not + affected by this setting. The assistant defaults to adding all files + unlocked, unless `annex.assistant.allowunlocked` is set. When a repository has core.symlinks set to false, or has an adjusted unlocked branch checked out, this setting is ignored, and files are always added to the repository in unlocked form. +* `annex.assistant.allowunlocked` + + The `git-annex assistant` defaults to adding all files unlocked, so that + files can be modified without the user needing to do anything to unlock + them. 
+ + If this is set to `true` then it will instead use the `annex.addunlocked` + configuration to decide which files to add unlocked. + * `annex.numcopies` This is a deprecated setting. You should instead use the diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn index ed0a60d5a4..2232354158 100644 --- a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn @@ -4,11 +4,10 @@ configured to add them locked. (Diff truncated)
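Going by the documentation change above, enabling the new behaviour might look like the following. This is a sketch under assumptions: the `include=*.txt` expression is only an illustrative value for `annex.addunlocked`, and the commands are assumed to be run inside the repository the assistant watches.

```shell
# Let the assistant consult annex.addunlocked instead of adding
# every file unlocked (its default).
git config annex.assistant.allowunlocked true

# Illustrative matching expression (an assumption): add only text files
# unlocked, so that everything else is added locked.
git config annex.addunlocked "include=*.txt"
```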
tag repronim based on https://git-annex.branchable.com/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/#comment-096bedb2d22d5aae6a51a53179372d4f
diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn index 00c67762ea..ed0a60d5a4 100644 --- a/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked.mdwn @@ -10,3 +10,5 @@ Or perhaps a better name would be annex.assistant.allowaddlocked. See here for some motivating use cases <https://git-annex.branchable.com/forum/Is_there_a_way_to_have_assistant_add_files_locked__63__/> + +[[!tag projects/repronim]] diff --git a/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_2_62adc0910dcf29c74690d9da4a054048._comment b/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_2_62adc0910dcf29c74690d9da4a054048._comment new file mode 100644 index 0000000000..2361480ecf --- /dev/null +++ b/doc/todo/allow_configuring_assistant_to_add_files_locked/comment_2_62adc0910dcf29c74690d9da4a054048._comment @@ -0,0 +1,9 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 2""" + date="2025-09-16T17:41:10Z" + content=""" +This looks like it would be a relatively simple feature to add, +eg an hour or two, and I see in the forum that @yarik thinks ReproNim +can use it. So I'll go ahead... +"""]]
improve example
diff --git a/doc/git-annex-unlock.mdwn b/doc/git-annex-unlock.mdwn index 1a2bd32596..6d0ad22a7a 100644 --- a/doc/git-annex-unlock.mdwn +++ b/doc/git-annex-unlock.mdwn @@ -39,9 +39,10 @@ repository. So, enable annex.thin with care. # git annex unlock photo.jpg # gimp photo.jpg - # git annex add photo.jpg - # git annex lock photo.jpg - # git commit -m "redeye removal" + # git commit photo.jpg -m "redeye removal" + # gimp photo.jpg + # git commit photo.jpg -m "fix oversaturation" + # git annex lock photo.jpg # OPTIONS diff --git a/doc/git-annex-unlock/comment_13_db3d6eb5f238edbd505b6909863167df._comment b/doc/git-annex-unlock/comment_13_db3d6eb5f238edbd505b6909863167df._comment new file mode 100644 index 0000000000..777bcdf187 --- /dev/null +++ b/doc/git-annex-unlock/comment_13_db3d6eb5f238edbd505b6909863167df._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="joey" + subject="""Re: We don’t need a 'git annex lock' after a 'git annex add', right?""" + date="2025-09-16T17:28:49Z" + content=""" +Well spotted. `git-annex add` defaults to adding files locked, even when +adding what was an unlocked file before. + +I've improved the example. +"""]]
close
diff --git a/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn index b46105c19d..57a0af5a38 100644 --- a/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn +++ b/doc/todo/Relative_Ignores_for_Relative_Imports.mdwn @@ -47,3 +47,5 @@ Then I commented out certain lines for each location. E.g. only try ignoring `a` Regardless of import or ignore, only `b` and `f` were ignored pertaining to the root `.gitignore` matching these files in the tree, even when the tree was imported to subtree `rel-ignore` or `root-ignore`. </details> + +> Closing as user error. [[done]] --[[Joey]]
improve error message when SETCREDS overwrites git-annex config
That is not allowed, so it's not a bug in git-annex when it happens and
instead tell the special remote developer how it's messed up.
Note that currently only Remote.External can overwrite the parsed remote
config with a PassedThrough value. PassedThrough values are otherwise
only generated for configs that are not parsed by the remote config
parser.
Sponsored-by: Joshua Antonishen
diff --git a/Annex/SpecialRemote/Config.hs b/Annex/SpecialRemote/Config.hs index 5f9d6db831..925b7e837c 100644 --- a/Annex/SpecialRemote/Config.hs +++ b/Annex/SpecialRemote/Config.hs @@ -206,13 +206,23 @@ getRemoteConfigValue :: HasCallStack => Typeable v => RemoteConfigField -> Parse getRemoteConfigValue f (ParsedRemoteConfig m _) = case M.lookup f m of Just (RemoteConfigValue v) -> case cast v of Just v' -> Just v' - Nothing -> error $ unwords - [ "getRemoteConfigValue" - , fromProposedAccepted f - , "found value of unexpected type" - , show (typeOf v) ++ "." - , "This is a bug in git-annex!" - ] + Nothing -> case cast v :: Maybe PassedThrough of + -- Handle the case where an external special remote + -- tries to SETCONFIG a value belonging to git-annex, + -- resulting in a PassedThrough type being stored. + Just _ -> error $ unwords + [ "Special remote config " + , fromProposedAccepted f + , "has been overwritten by SETCONFIG." + , "This is not supported." + ] + Nothing -> error $ unwords + [ "getRemoteConfigValue" + , fromProposedAccepted f + , "found value of unexpected type" + , show (typeOf v) ++ "." + , "This is a bug in git-annex!" + ] Nothing -> Nothing {- Gets all fields that remoteConfigRestPassthrough matched. -} diff --git a/doc/todo/encrypt_only_the_credentials/comment_8_7ec2d0de7c0b90f05475b86148982197._comment b/doc/todo/encrypt_only_the_credentials/comment_8_7ec2d0de7c0b90f05475b86148982197._comment new file mode 100644 index 0000000000..65c6774e52 --- /dev/null +++ b/doc/todo/encrypt_only_the_credentials/comment_8_7ec2d0de7c0b90f05475b86148982197._comment @@ -0,0 +1,24 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 8""" + date="2025-09-16T16:57:11Z" + content=""" +SETCONFIG is limited to setting the external program's configuration, +not to reaching inside git-annex and setting its own configuration. +The docs say that, but could perhaps be more clear. + +I have improved the error message. 
+ +git-annex sets up encryption for the remote based on the encryption= and +encryptonlycreds= settings before it ever starts up the external program. +That would need to change in order to support this. + +But I'm also doubtful it would be a good idea to support SETCONFIG +of any of the things git-annex uses for encryption, chunking, etc. +It's essentially monkey-patching git-annex from the external program. +Some changes to git-annex's configs could lead to very unexpected behavior. + +If you really need the ability to turn on onlyencryptcreds by default +with your special remote, there will need to be some other way implemented +to do it. Please open a new todo about that. +"""]]
fixed
diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn index 50e3f1ba51..187e725f20 100644 --- a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names.mdwn @@ -50,3 +50,4 @@ I'm running Arch Linux (kernel 6.15.1-arch1-2). The repo I'm running the command git-annex has been brilliant for managing my large media collection across several removable drives, and I'm confident it will continue to scale. This is the first issue I've run into with it. +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_4_68a3c0b736e1ba3d44177a0fbd18b257._comment b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_4_68a3c0b736e1ba3d44177a0fbd18b257._comment new file mode 100644 index 0000000000..f4ae0fef16 --- /dev/null +++ b/doc/bugs/symlink_already_exists_when_adding_non-ascii_names/comment_4_68a3c0b736e1ba3d44177a0fbd18b257._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 4""" + date="2025-09-16T16:44:16Z" + content=""" +There was another bug filed about the same problem, +[[bugs/git-annex_add__47__unlock_fails_for_some_names]]. + +Cause is a filename that is 21 bytes long and begins with a utf-8 +character. Which AFAICS all the filenames mentioned here are. + +[[!commit 67f00027d1b326c979db8b81c973a61234c406d7]] fixes this. +"""]]
close
diff --git a/doc/bugs/35_failed_tests_on_beegfs.mdwn b/doc/bugs/35_failed_tests_on_beegfs.mdwn index 28e1babc48..b6edddf4fd 100644 --- a/doc/bugs/35_failed_tests_on_beegfs.mdwn +++ b/doc/bugs/35_failed_tests_on_beegfs.mdwn @@ -81,3 +81,5 @@ upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 [[!meta author=yoh]] [[!tag projects/repronim]] + +> [[fixed|done]] when built with OsPath. --[[Joey]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_23_4c28579ce8bb003f0eca155184b0bdfc._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_23_4c28579ce8bb003f0eca155184b0bdfc._comment new file mode 100644 index 0000000000..5b67de98a2 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_23_4c28579ce8bb003f0eca155184b0bdfc._comment @@ -0,0 +1,22 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 23""" + date="2025-09-16T14:36:24Z" + content=""" +Yay! + +OsPath needs the os-string and file-io haskell packages. Which are not +currently in Debian. So either work will need to be done to package those, +or when Debian upgrades ghc to 9.12.2, it will include those libraries +automatically since they are bundled with ghc since that version. + +Maybe you know more than I do about the state of Debian's haskell support. + +The transition is being tracked at [[todo/RawFilePath_conversion]] but I +don't know yet what the solution is to getting the dependencies broadly +available. + +(Or I could implement the same fixes when not built with that flag of +course. It is doable. Just annoying especially since that code will have to +be carefully gotten just right, only to be thrown away later.) +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_21_ea61c9101b9779e75b49f898ebd1e91a._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_21_ea61c9101b9779e75b49f898ebd1e91a._comment new file mode 100644 index 0000000000..c57f30a299 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_21_ea61c9101b9779e75b49f898ebd1e91a._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 21" + date="2025-09-16T00:40:21Z" + content=""" +that version had no errors: `All tests succeeded. (Ran 50 test groups in 11m54s)` + +So, this will be default option? any specific dependency requirements we need to add/constrain? +"""]]
boot libs
diff --git a/doc/todo/RawFilePath_conversion.mdwn b/doc/todo/RawFilePath_conversion.mdwn index ec61f59b0a..cd0dec9b1a 100644 --- a/doc/todo/RawFilePath_conversion.mdwn +++ b/doc/todo/RawFilePath_conversion.mdwn @@ -50,6 +50,8 @@ of the status. The `require_OsPath` branch removes the OsPath build flag, and merging it would resolve this. That will need packagers to do some work -to package the libraries though. --[[Joey]] +to package the libraries though. Or to upgrade ghc, since file-io and +os-string are boot libraries since ghc 9.12.2 and 9.10.1 respectively. +--[[Joey]] [[!tag confirmed]]
work around file-io not setting locale encoding when opening a Handle
Works around this bug https://github.com/haskell/file-io/issues/45
The fix is in Utility.FileIO.CloseOnExec because all use of file-io is
already wrapped through that module. Although perhaps that ought to be
refactored at this point.
I'd hope that file-io will eventually fix this bug, and also provide
CloseOnExec variants of its functions. That would allow depending on the
fixed version, and removing this ugly code.
Note that functions like readFile, which don't care about the encoding
because they read or write a ByteString, were kept optimally fast by not
setting the encoding. This avoids an IORef read and write per open.
Sponsored-by: Graham Spencer
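As a rough illustration of the workaround described above, the sketch below opens a Handle and then explicitly applies the locale encoding to it. It uses base's FilePath-based `openFile` as a stand-in for file-io's OsPath functions, and `openFileLocale` and `writeLineLocale` are hypothetical names for this example, not part of git-annex or file-io.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (Handle, IOMode(..), hClose, hPutStrLn, hSetEncoding, openFile)

-- | Open a file, then set the locale encoding on the resulting Handle.
-- Assumption: base's openFile stands in for the file-io call here.
openFileLocale :: FilePath -> IOMode -> IO Handle
openFileLocale path mode = do
    h <- openFile path mode
    -- The same step the workaround performs on each opened Handle.
    getLocaleEncoding >>= hSetEncoding h
    pure h

-- | Small usage helper: write one line of text using the locale encoding.
writeLineLocale :: FilePath -> String -> IO ()
writeLineLocale path s = do
    h <- openFileLocale path WriteMode
    hPutStrLn h s
    hClose h
```

Code that only reads or writes a ByteString through the Handle could skip the `hSetEncoding` step, which is the optimisation the commit message mentions.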
diff --git a/Utility/FileIO.hs b/Utility/FileIO.hs index 3624f940d2..a775dca6c6 100644 --- a/Utility/FileIO.hs +++ b/Utility/FileIO.hs @@ -2,7 +2,8 @@ - readFileString, writeFileString, and appendFileString. - - When building with file-io, all exported functions set the close-on-exec - - flag. + - flag. Also, some other issues are handled that file-io does not handle + - correctly. - - When not building with file-io, this provides equvilant - RawFilePath versions. Note that those versions do not currently diff --git a/Utility/FileIO/CloseOnExec.hs b/Utility/FileIO/CloseOnExec.hs index 29e7c4b08a..3d1bb739f7 100644 --- a/Utility/FileIO/CloseOnExec.hs +++ b/Utility/FileIO/CloseOnExec.hs @@ -1,7 +1,12 @@ {- This is a subset of the functions provided by file-io. + - - All functions have been modified to set the close-on-exec - flag to True. - + - Also, functions that return a Handle have been modified to + - use the locale encoding, working around this bug: + - https://github.com/haskell/file-io/issues/45 + - - Copyright 2025 Joey Hess <id@joeyh.name> - Copyright 2024 Julian Ospald - @@ -34,7 +39,8 @@ module Utility.FileIO.CloseOnExec import System.File.OsPath.Internal (withOpenFile', augmentError) import qualified System.File.OsPath.Internal as I -import System.IO (IO, Handle, IOMode(..)) +import System.IO (IO, Handle, IOMode(..), hSetEncoding) +import GHC.IO.Encoding (getLocaleEncoding) import System.OsPath (OsPath, OsString) import Prelude (Bool(..), pure, either, (.), (>>=), ($)) import Control.Exception @@ -50,48 +56,47 @@ closeOnExec = True withFile :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withFile osfp iomode act = (augmentError "withFile" osfp - $ withOpenFile' osfp iomode False False closeOnExec (try . act) True) + $ withOpenFileEncoding osfp iomode False False closeOnExec (try . 
act) True) >>= either ioError pure -withFile' - :: OsPath -> IOMode -> (Handle -> IO r) -> IO r +withFile' :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withFile' osfp iomode act = (augmentError "withFile'" osfp - $ withOpenFile' osfp iomode False False closeOnExec (try . act) False) + $ withOpenFileEncoding osfp iomode False False closeOnExec (try . act) False) >>= either ioError pure openFile :: OsPath -> IOMode -> IO Handle openFile osfp iomode = augmentError "openFile" osfp $ - withOpenFile' osfp iomode False False closeOnExec pure False + withOpenFileEncoding osfp iomode False False closeOnExec pure False withBinaryFile :: OsPath -> IOMode -> (Handle -> IO r) -> IO r withBinaryFile osfp iomode act = (augmentError "withBinaryFile" osfp - $ withOpenFile' osfp iomode True False closeOnExec (try . act) True) + $ withOpenFileEncoding osfp iomode True False closeOnExec (try . act) True) >>= either ioError pure openBinaryFile :: OsPath -> IOMode -> IO Handle openBinaryFile osfp iomode = augmentError "openBinaryFile" osfp $ - withOpenFile' osfp iomode True False closeOnExec pure False + withOpenFileEncoding osfp iomode True False closeOnExec pure False readFile :: OsPath -> IO BSL.ByteString -readFile fp = withFile' fp ReadMode BSL.hGetContents +readFile fp = withFileNoEncoding' fp ReadMode BSL.hGetContents readFile' :: OsPath -> IO BS.ByteString -readFile' fp = withFile fp ReadMode BS.hGetContents +readFile' fp = withFileNoEncoding fp ReadMode BS.hGetContents writeFile :: OsPath -> BSL.ByteString -> IO () -writeFile fp contents = withFile fp WriteMode (`BSL.hPut` contents) +writeFile fp contents = withFileNoEncoding fp WriteMode (`BSL.hPut` contents) writeFile' :: OsPath -> BS.ByteString -> IO () -writeFile' fp contents = withFile fp WriteMode (`BS.hPut` contents) +writeFile' fp contents = withFileNoEncoding fp WriteMode (`BS.hPut` contents) appendFile :: OsPath -> BSL.ByteString -> IO () -appendFile fp contents = withFile fp AppendMode (`BSL.hPut` contents) 
+appendFile fp contents = withFileNoEncoding fp AppendMode (`BSL.hPut` contents) appendFile' :: OsPath -> BS.ByteString -> IO () -appendFile' fp contents = withFile fp AppendMode (`BS.hPut` contents) +appendFile' fp contents = withFileNoEncoding fp AppendMode (`BS.hPut` contents) {- Re-implementing openTempFile is difficult due to the current - structure of file-io. See this issue for discussion about improving @@ -99,16 +104,45 @@ appendFile' fp contents = withFile fp AppendMode (`BS.hPut` contents) - So, instead this uses noCreateProcessWhile. - -} openTempFile :: OsPath -> OsString -> IO (OsPath, Handle) -openTempFile tmp_dir template = +openTempFile tmp_dir template = do #ifdef mingw32_HOST_OS - I.openTempFile tmp_dir template + (p, h) <- I.openTempFile tmp_dir template + getLocaleEncoding >>= hSetEncoding h + pure (p, h) #else noCreateProcessWhile $ do (p, h) <- I.openTempFile tmp_dir template fd <- handleToFd h setFdOption fd CloseOnExec True h' <- fdToHandle fd + getLocaleEncoding >>= hSetEncoding h' pure (p, h') #endif +{- Wrapper around withOpenFile' that sets the locale encoding on the + - Handle. -} +withOpenFileEncoding :: OsPath -> IOMode -> Bool -> Bool -> Bool -> (Handle -> IO r) -> Bool -> IO r +withOpenFileEncoding fp iomode binary existing cloExec action close_finally = + withOpenFile' fp iomode binary existing cloExec action' close_finally + where + action' h = do + getLocaleEncoding >>= hSetEncoding h + action h + +{- Variant of withFile above that does not have the overhead of setting the + - locale encoding. Faster to use when the Handle is not used in a way that + - needs any encoding. -} +withFileNoEncoding :: OsPath -> IOMode -> (Handle -> IO r) -> IO r +withFileNoEncoding osfp iomode act = (augmentError "withFile" osfp + $ withOpenFile' osfp iomode False False closeOnExec (try . act) True) + >>= either ioError pure + +{- Variant of withFile' above that does not have the overhead of setting the + - locale encoding. 
Faster to use when the Handle is not used in a way that + - needs any encoding. -} +withFileNoEncoding' :: OsPath -> IOMode -> (Handle -> IO r) -> IO r +withFileNoEncoding' osfp iomode act = (augmentError "withFile'" osfp + $ withOpenFile' osfp iomode False False closeOnExec (try . act) False) + >>= either ioError pure + #endif diff --git a/doc/bugs/yt-dlp_mojibake.mdwn b/doc/bugs/yt-dlp_mojibake.mdwn index e4133f4fdc..ed7f8ac8b6 100644 --- a/doc/bugs/yt-dlp_mojibake.mdwn +++ b/doc/bugs/yt-dlp_mojibake.mdwn @@ -20,3 +20,5 @@ Unfortunatly, it is a bug in file-io: To fix it, git-annex will need to wrap file-io and call `getLocaleEncoding >>= hSetEncoding h` on each opened Handle. Or depend on a fixed version. --[[Joey]] + +> [[done]] --[[Joey]]
bug
diff --git a/doc/bugs/yt-dlp_mojibake.mdwn b/doc/bugs/yt-dlp_mojibake.mdwn new file mode 100644 index 0000000000..e4133f4fdc --- /dev/null +++ b/doc/bugs/yt-dlp_mojibake.mdwn @@ -0,0 +1,22 @@ +git-annex importfeed from an AvE video failed: + + renamePath:rename '/home/joey/lib/big/.git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/ï¼ÂCorrectionï¼ hydraulic spool motor [HPajFNxnuN8].webm' to '../.git/annex/tmp/URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8': does not exist (No such file or directory) + +Here's the file list: + + joey@darkstar:~/lib/big>cat .git/annex/tmp/work.URL--yt\&chttps\&c%%www.youtube.com%watch\,63v\,61HPajFNxnuN8/git-annex-file-list-file + /home/joey/lib/big/.git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/*Correction* hydraulic spool motor [HPajFNxnuN8].webm + /home/joey/lib/big/.git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/*Correction* hydraulic spool motor [HPajFNxnuN8].webm + +And the video was written to the file: + ".git/annex/tmp/work.URL--yt&chttps&c%%www.youtube.com%watch,63v,61HPajFNxnuN8/*Correction* hydraulic spool motor [HPajFNxnuN8].webm" + +This only affects a git-annex built with OsPath, and only recently +(not a released version). + +Unfortunatly, it is a bug in file-io: +<https://github.com/haskell/file-io/issues/45> + +To fix it, git-annex will need to wrap file-io and call +`getLocaleEncoding >>= hSetEncoding h` on each opened Handle. Or depend on +a fixed version. --[[Joey]]
require_OsPath branch
diff --git a/doc/todo/RawFilePath_conversion.mdwn b/doc/todo/RawFilePath_conversion.mdwn index b488353a60..ec61f59b0a 100644 --- a/doc/todo/RawFilePath_conversion.mdwn +++ b/doc/todo/RawFilePath_conversion.mdwn @@ -48,4 +48,8 @@ of the status. there use of FilePath remains in odd corners. These are unlikely to cause any noticiable performance impact. +The `require_OsPath` branch removes the OsPath build flag, +and merging it would resolve this. That will need packagers to do some work +to package the libraries though. --[[Joey]] + [[!tag confirmed]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_21_54d9c66876137c549caafa469140904e._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_21_54d9c66876137c549caafa469140904e._comment new file mode 100644 index 0000000000..204656bf99 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_21_54d9c66876137c549caafa469140904e._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 21""" + date="2025-09-15T20:12:04Z" + content=""" +I have enabled OsPath in the build at +<https://downloads.kitenet.net/git-annex/autobuild/amd64/git-annex-standalone-amd64.tar.gz> +"""]]
update
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment index 035a891794..5992e82d7f 100644 --- a/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment @@ -3,11 +3,10 @@ subject="""comment 19""" date="2025-09-15T18:20:21Z" content=""" -If your git-annex is not built with the OsPath build flag, -it will still not be using `O_CLOEXEC`. +Confirmed in your log that git-annex is not built with the +OsPath build flag, so it will still not be using `O_CLOEXEC`. -I'll bet it's not, since Debian doesn't have the necessary library packaged -yet.. - -Check for output from: `git-annex version | grep OsPath` +It would be good to get a build with OsPath and test it to see if my fixes +actually did work. Debian doesn't include the necessary library yet, so a +build using stack is needed. """]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment new file mode 100644 index 0000000000..035a891794 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_19_ef0de3bc5f73207817b27d31f1f96730._comment @@ -0,0 +1,13 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 19""" + date="2025-09-15T18:20:21Z" + content=""" +If your git-annex is not built with the OsPath build flag, +it will still not be using `O_CLOEXEC`. + +I'll bet it's not, since Debian doesn't have the necessary library packaged +yet.. + +Check for output from: `git-annex version | grep OsPath` +"""]]
drop problem end characters from filename operating on String not RawFilePath
Fix bug that could cause an invalid utf-8 sequence to be used in a
temporary filename when the input filename was valid utf-8.
Sponsored-by: k0ld
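The failure mode can be reproduced in isolation. The sketch below is not the git-annex code; it works on plain lists with hypothetical helper names, and only shows why trimming "space" bytes from the end of a UTF-8 byte sequence can cut into a multibyte character, while trimming decoded characters cannot. The byte values are those of U+52A0, which appear in the bug report's escaped filename as `\229\138\160`.

```haskell
import Data.Char (chr, isSpace, ord)
import Data.List (dropWhileEnd)
import Data.Word (Word8)

-- | UTF-8 encoding of U+52A0: 0xE5 0x8A 0xA0, i.e. 229 138 160.
-- The last continuation byte, 160, is also the Latin-1 code point of
-- the non-breaking space, which isSpace accepts.
kaBytes :: [Word8]
kaBytes = [229, 138, 160]

-- | Byte-wise predicate: treats each byte as if it were a whole character.
disallowedByte :: Word8 -> Bool
disallowedByte b =
    b == fromIntegral (ord '.') || isSpace (chr (fromIntegral b))

-- | Trimming at the byte level drops the trailing 160 and leaves an
-- incomplete multibyte sequence behind.
trimBytes :: [Word8] -> [Word8]
trimBytes = dropWhileEnd disallowedByte

-- | Trimming decoded characters only removes real dots and whitespace.
trimChars :: String -> String
trimChars = dropWhileEnd (\c -> c == '.' || isSpace c)
```

Here `trimBytes kaBytes` leaves `[229,138]`, an invalid UTF-8 sequence, whereas `trimChars "\21152"` returns its input unchanged.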
diff --git a/CHANGELOG b/CHANGELOG index 371f53f30f..740253853a 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -3,6 +3,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * drop: --fast support when dropping from a remote. * Fix crash operating on filenames that are exactly 21 bytes long and begin with a utf-8 character. + * Fix bug that could cause an invalid utf-8 sequence to be used in a + temporary filename when the input filename was valid utf-8. * git-annex.cabal: Turn on the OsPath build flag by default. * Add build warnings when git-annex is built without the OsPath build flag. diff --git a/Utility/Tmp.hs b/Utility/Tmp.hs index 582f6849fc..df6673eadd 100644 --- a/Utility/Tmp.hs +++ b/Utility/Tmp.hs @@ -116,20 +116,29 @@ relatedTemplate' :: RawFilePath -> RawFilePath #ifndef mingw32_HOST_OS relatedTemplate' f | len > templateAddedLength = - {- Some filesystems like FAT have issues with filenames - - ending in ".", and others like VFAT don't allow a - - filename to end with trailing whitespace, so avoid - - truncating a filename to end that way. -} - let p = B.dropWhileEnd disallowed $ - truncateFilePath (len - templateAddedLength) f + let p = fixend $ truncateFilePath (len - templateAddedLength) f in if B.null p then "t" else p | otherwise = f where len = B.length f - disallowed c = c == dot || isSpace (chr (fromIntegral c)) + {- Some filesystems like FAT have issues with filenames + - ending in ".", and others like VFAT don't allow a + - filename to end with trailing whitespace, so avoid + - truncating a filename to end that way. -} + fixend p = + {- B.dropWhileEnd doesn't take wide characters + - into account, but is fast, so use it to check + - the common case. -} + let p' = B.dropWhileEnd disallowed p + in if p' == p + then p + else toRawFilePath $ reverse $ + dropWhile (disallowed . fromIntegral . 
ord) $ + reverse $ fromRawFilePath p dot = fromIntegral (ord '.') + disallowed c = c == dot || isSpace (chr (fromIntegral c)) #else -- Avoids a test suite failure on windows, reason unknown, but -- best to keep paths short on windows anyway. diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn index ff868a0419..4b10d6419f 100644 --- a/doc/bugs/multibyte_characters_broken.mdwn +++ b/doc/bugs/multibyte_characters_broken.mdwn @@ -31,3 +31,5 @@ The original file obviously has a correct encoding, but it seems that git annex ### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) I use git annex to manage my whole music collection successfully. + +> [[fixed|done]] --[[Joey]] diff --git a/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment b/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment new file mode 100644 index 0000000000..8660462f46 --- /dev/null +++ b/doc/bugs/multibyte_characters_broken/comment_1_eb421648f585296f7c44f969bdcae7a4._comment @@ -0,0 +1,28 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2025-09-15T16:59:59Z" + content=""" +git-annex actually attempts to truncate the filename taking unicode +character width into account. 
+ +Here is the truncation on the wrong byte though: + + ghci> :t x + x :: String + ghci> x + "ingest-01-06 \19977\30707\29748\20035\12539\23500\27810\32654\26234\24693\12539\20037\24029\32190\12539\31712\21407\24693\32654\12539\28145\35211 \26792\21152 - Tuxedo Mirage.flac" + ghci> toRawFilePath x + "ingest-01-06 \228\184\137\231\159\179\231\144\180\228\185\131\227\131\187\229\175\140\230\178\162\231\190\142\230\153\186\230\129\181\227\131\187\228\185\133\229\183\157\231\182\190\227\131\187\231\175\160\229\142\159\230\129\181\231\190\142\227\131\187\230\183\177\232\166\139 \230\162\168\229\138\160 - Tuxedo Mirage.flac" + ghci> relatedTemplate (toRawFilePath x) + "ingest-01-06 \228\184\137\231\159\179\231\144\180\228\185\131\227\131\187\229\175\140\230\178\162\231\190\142\230\153\186\230\129\181\227\131\187\228\185\133\229\183\157\231\182\190\227\131\187\231\175\160\229\142\159\230\129\181\231\190\142\227\131\187\230\183\177\232\166\139 \230\162\168\229\138" + +What is going on is that '\160` is a space character, and filesystems like +FAT do not allow a filename to end with a space. So relatedTemplate trims +off trailing spaces, and accidentially trimmed off this byte, despite it +being part of a multibyte sequence. + +Aren't filesystems with arbitrary limitations on what valid filenames are fun? + +Fixed this. +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_18_3cddb2113a962827b495ae71f686453e._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_18_3cddb2113a962827b495ae71f686453e._comment new file mode 100644 index 0000000000..742a6d5403 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_18_3cddb2113a962827b495ae71f686453e._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 18" + date="2025-09-15T17:11:51Z" + content=""" +I think so -- I posted a [full log](http://www.oneukrainian.com/tmp/2025.09.11T11.15.27-2500297_stdout) now to check +"""]]
comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_17_bebe0ee51ba6a6c23ee3e4e5999d575b._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_17_bebe0ee51ba6a6c23ee3e4e5999d575b._comment new file mode 100644 index 0000000000..61164d930a --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_17_bebe0ee51ba6a6c23ee3e4e5999d575b._comment @@ -0,0 +1,15 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 17""" + date="2025-09-15T16:07:35Z" + content=""" +Drat. Reopened bug. + +Is the error for these still "export.ex [...] Device or resource busy"? + +If so, the problem must not be beegfs not liking an open file to be +renamed, but something else. + +I have verified that the temp file that gets renamed to the "export.ex" +log file is now opened with `O_CLOEXEC`. +"""]]
reopen
diff --git a/doc/bugs/35_failed_tests_on_beegfs.mdwn b/doc/bugs/35_failed_tests_on_beegfs.mdwn index 557ecc8d53..28e1babc48 100644 --- a/doc/bugs/35_failed_tests_on_beegfs.mdwn +++ b/doc/bugs/35_failed_tests_on_beegfs.mdwn @@ -81,5 +81,3 @@ upgrade supported from repository versions: 0 1 2 3 4 5 6 7 8 9 10 [[!meta author=yoh]] [[!tag projects/repronim]] - -> [[fixed|done]] --[[Joey]]
fix p2phttp worker thread leak with deleted repository LOCKCONTENT
p2phttp: Fix a hang that could occur when used with --directory, and a
repository in the directory got removed.
It could leak up to -J number of worker threads, but this only affected a
client trying to access the deleted repository.
It may be that this could also affect a non-deleted repository, and also
leak a worker thread, if invalid p2p protocol is sent.
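The shape of the fix can be sketched with plain MVars from base. This is only an illustration of the pattern, assuming hypothetical names (`LockWorker`, `startLockWorker`) and replacing the P2P protocol calls with arbitrary IO actions; the real code uses TMVars, async and inAnnexWorker.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)

-- | Handles returned to the caller (hypothetical type for this sketch).
data LockWorker = LockWorker
    { lockHeld :: Bool         -- did the lock attempt succeed?
    , unlockSignal :: MVar ()  -- fill this to ask the worker to unlock
    , workerDone :: MVar ()    -- filled when the worker thread exits
    }

-- | Start a worker that tries to take a lock and reports the result.
startLockWorker :: IO Bool -> IO () -> IO LockWorker
startLockWorker tryLock unlock = do
    resultVar <- newEmptyMVar
    signal <- newEmptyMVar
    done <- newEmptyMVar
    _ <- forkIO $ do
        ok <- tryLock
        putMVar resultVar ok
        -- Only wait for the unlock request when the lock is held.
        -- Waiting unconditionally would leave the worker blocked
        -- forever after a failed lock attempt.
        if ok
            then takeMVar signal >> unlock
            else pure ()
        putMVar done ()
    ok <- takeMVar resultVar
    pure (LockWorker ok signal done)
```

With this structure a failed lock attempt lets the worker exit immediately, so it no longer occupies one of the -J worker slots.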
diff --git a/CHANGELOG b/CHANGELOG index cb3a0e0c15..371f53f30f 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -13,6 +13,8 @@ git-annex (10.20250829) UNRELEASED; urgency=medium * Improve performance when used with a local git remote that has a large working tree. * Removed support for building with cryptonite, use crypton. + * p2phttp: Fix a hang that could occur when used with --directory, + and a repository in the repository got removed. -- Joey Hess <id@joeyh.name> Fri, 29 Aug 2025 12:34:06 -0400 diff --git a/P2P/Http/Server.hs b/P2P/Http/Server.hs index 6e3d530303..88e6fa3367 100644 --- a/P2P/Http/Server.hs +++ b/P2P/Http/Server.hs @@ -477,14 +477,19 @@ serveLockContent mst su apiver (B64Key k) cu bypass sec auth = do let lock = do lockresv <- newEmptyTMVarIO unlockv <- newEmptyTMVarIO + -- A single worker thread takes the lock, and keeps running +- -- until unlock in order to keep the lock held. annexworker <- async $ inAnnexWorker st $ do lockres <- runFullProto (clientRunState conn) (clientP2PConnection conn) $ do net $ sendMessage (LOCKCONTENT k) checkSuccess liftIO $ atomically $ putTMVar lockresv lockres - liftIO $ atomically $ takeTMVar unlockv - void $ runFullProto (clientRunState conn) (clientP2PConnection conn) $ do - net $ sendMessage UNLOCKCONTENT + case lockres of + Right True -> do + liftIO $ atomically $ takeTMVar unlockv + void $ runFullProto (clientRunState conn) (clientP2PConnection conn) $ do + net $ sendMessage UNLOCKCONTENT + _ -> return () atomically (takeTMVar lockresv) >>= \case Right True -> return (Just (annexworker, unlockv)) _ -> return Nothing diff --git a/doc/todo/p2phttp_serve_multiple_repositories.mdwn b/doc/todo/p2phttp_serve_multiple_repositories.mdwn index f2ad9c752e..47cf8ad8fd 100644 --- a/doc/todo/p2phttp_serve_multiple_repositories.mdwn +++ b/doc/todo/p2phttp_serve_multiple_repositories.mdwn @@ -19,3 +19,5 @@ I asked matrss if this would be useful for forgejo-aneksajo and he said very useful, although I think I can work with 
the limitation [of only 1]." [[!tag projects/INM7]] + +> [[done]] --[[Joey]] diff --git a/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment b/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment new file mode 100644 index 0000000000..c3dee29010 --- /dev/null +++ b/doc/todo/p2phttp_serve_multiple_repositories/comment_3_429520e5411c5785b63598ffee7dbb95._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 3""" + date="2025-09-15T15:18:15Z" + content=""" +Seems the bug is specific to LOCKCONTENT. When doing other operations, +like CHECKPRESENT after the repo is deleted, the server returns +FAILURE and continues being able to serve more requests for that repo. + +Ah, the problem is that serveLockContent is running a block of actions in +a single inAnnexWorker call, which first sends on the LOCKCONTENT, then +blocks waiting for the unlock to arrive. Which never happens, so it remains +blocked there forever, consuming a worker thread. + +Fixed that, finally. +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_16_db166b7303911b63ec458dfb5309862a._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_16_db166b7303911b63ec458dfb5309862a._comment new file mode 100644 index 0000000000..02a8f992bd --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_16_db166b7303911b63ec458dfb5309862a._comment @@ -0,0 +1,36 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 16" + date="2025-09-11T15:46:52Z" + content=""" +I have reran with freshish build 10.20250828+git58-g38786a4e5e-1~ndall+1 and still observe those FAILs as before IIRC + +```shell +$> show-paths -e FAIL -f full-lines .duct/logs/2025.09.11T11.15.27-2500297_stdout +1016 Tests +1017 Repo Tests v10 locked +1025: git-remote-annex exporttree: FAIL (3.60s) +1206 Tests +1207 Repo Tests v10 locked +1215: export and import of subdir: FAIL (7.19s) +1225 Tests +1226 Repo Tests v10 locked +1234: export and import: FAIL (4.91s) +1268 Tests +1269 Repo Tests v10 adjusted unlocked branch +1277: git-remote-annex exporttree: FAIL (4.68s) +1299 Tests +1300 Repo Tests v10 unlocked +1308: export and import of subdir: FAIL (10.38s) +1330 Tests +1331 Repo Tests v10 unlocked +1339: export and import: FAIL (10.19s) +1373 Tests +1374 Repo Tests v10 adjusted unlocked branch +1382: export and import: FAIL (6.96s) +1428 Tests +1429 Repo Tests v10 adjusted unlocked branch +1437: export and import of subdir: FAIL (10.63s) +``` +"""]]
Added a comment
diff --git a/doc/bugs/35_failed_tests_on_beegfs/comment_15_86bd1e45651c6153128775af2c4eab57._comment b/doc/bugs/35_failed_tests_on_beegfs/comment_15_86bd1e45651c6153128775af2c4eab57._comment new file mode 100644 index 0000000000..e0cbf3b0e8 --- /dev/null +++ b/doc/bugs/35_failed_tests_on_beegfs/comment_15_86bd1e45651c6153128775af2c4eab57._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="yarikoptic" + avatar="http://cdn.libravatar.org/avatar/f11e9c84cb18d26a1748c33b48c924b4" + subject="comment 15" + date="2025-09-11T12:30:49Z" + content=""" +FTR: fixed in [10.20250828-58-g38786a4e5e](https://git.kitenet.net/index.cgi/git-annex.git/commit/?id=38786a4e5ec2dd697d2abf1ee93a927a9e9fcf41) +"""]]
diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn index 9eedac0c7e..ff868a0419 100644 --- a/doc/bugs/multibyte_characters_broken.mdwn +++ b/doc/bugs/multibyte_characters_broken.mdwn @@ -2,20 +2,23 @@ git annex add is not fully compatible with multibyte-characters in filenames and may generate filenames with invalid character sequences. ### What steps will reproduce the problem? +``` $ git init test; cd test $ git annex init test $ echo bla > 01-06\ 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨加\ -\ Tuxedo\ Mirage.flac $ git annex add 01* - +``` The last command generates an invalid character sequence as filename which, depending on the filesystem, may cause an error: Example output: + +``` add "01-06 \344\270\211\347\237\263\347\220\264\344\271\203\343\203\273\345\257\214\346\262\242\347\276\216\346\231\272\346\201\265\343\203\273\344\271\205\345\267\235\347\266\276\343\203\273\347\257\240\345\216\237\346\201\265\347\276\216\343\203\273\346\267\261\350\246\213\346\242\250\345\212\240 - Tuxedo Mirage.flac" .git/annex/othertmp/: openTempFile template ingest-01-06 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨�: invalid argument (Invalid or incomplete multibyte or wide character) failed add: 1 failed - +``` ### What version of git-annex are you using? On what operating system? git annex 10.20250630
diff --git a/doc/bugs/multibyte_characters_broken.mdwn b/doc/bugs/multibyte_characters_broken.mdwn new file mode 100644 index 0000000000..9eedac0c7e --- /dev/null +++ b/doc/bugs/multibyte_characters_broken.mdwn @@ -0,0 +1,30 @@ +### Please describe the problem. +git annex add is not fully compatible with multibyte-characters in filenames and may generate filenames with invalid character sequences. + +### What steps will reproduce the problem? +$ git init test; cd test +$ git annex init test +$ echo bla > 01-06\ 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨加\ -\ Tuxedo\ Mirage.flac +$ git annex add 01* + +The last command generates an invalid character sequence as filename which, depending on the filesystem, may cause an error: + +Example output: +add "01-06 \344\270\211\347\237\263\347\220\264\344\271\203\343\203\273\345\257\214\346\262\242\347\276\216\346\231\272\346\201\265\343\203\273\344\271\205\345\267\235\347\266\276\343\203\273\347\257\240\345\216\237\346\201\265\347\276\216\343\203\273\346\267\261\350\246\213\346\242\250\345\212\240 - Tuxedo Mirage.flac" + .git/annex/othertmp/: openTempFile template ingest-01-06 三石琴乃・富沢美智恵・久川綾・篠原恵美・深見梨�: invalid argument (Invalid or incomplete multibyte or wide character) + +failed +add: 1 failed + + +### What version of git-annex are you using? On what operating system? +git annex 10.20250630 +NixOS 25.11pre851350.3b9f00d7a7bf + +### Please provide any additional information below. + +Creation of the file fails due to zfs being set to only accept valid utf-8 filenames (utf8only=on, normalization=formD), which greatly helps me detect encoding issues in filenames. +The original file obviously has a correct encoding, but it seems that git annex generates a new filename by just cutting off the filename after a specific byte, instead of taking character lengths into account. + +### Have you had any luck using git-annex before? 
(Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders) +I use git annex to manage my whole music collection successfully.
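The report above suspects that the temp file template is cut at a fixed byte offset, which can split a multibyte character. A minimal sketch of boundary-aware truncation follows, assuming valid UTF-8 input. `truncateUtf8` and `isContinuation` are hypothetical names for illustration; this is not git-annex's code and not necessarily how the bug is or will be fixed.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- A UTF-8 continuation byte has the bit pattern 10xxxxxx.
isContinuation :: Word8 -> Bool
isContinuation b = b .&. 0xC0 == 0x80

-- Hypothetical helper: keep at most n bytes without splitting a character.
-- If the first dropped byte is a continuation byte, the cut landed inside
-- a character, so the partial character is removed from the kept prefix.
truncateUtf8 :: Int -> [Word8] -> [Word8]
truncateUtf8 n bytes
  | n <= 0 = []
  | otherwise =
      case rest of
        (b : _) | isContinuation b -> dropPartial kept
        _ -> kept
  where
    (kept, rest) = splitAt n bytes
    -- Strip trailing continuation bytes, then the lead byte before them.
    dropPartial = reverse . drop 1 . dropWhile isContinuation . reverse

main :: IO ()
main = do
  -- "é" is C3 A9 and "三" is E4 B8 89 in UTF-8.
  let sample = [0xC3, 0xA9, 0xE4, 0xB8, 0x89] :: [Word8]
  print (truncateUtf8 4 sample == [0xC3, 0xA9]) -- cut fell inside "三"
  print (truncateUtf8 2 sample == [0xC3, 0xA9]) -- cut on a boundary
  print (truncateUtf8 1 sample == [])           -- cut fell inside "é"
```

This only respects code point boundaries. With `normalization=formD`, a base character and its combining marks could still be separated, which the sketch does not address.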
Added a comment: git annex bundle - questions
diff --git a/doc/forum/Equivalent_to_git_bundle__63__/comment_3_2935498815a3de295b7d573f28e12fdc._comment b/doc/forum/Equivalent_to_git_bundle__63__/comment_3_2935498815a3de295b7d573f28e12fdc._comment new file mode 100644 index 0000000000..c2d1dd3414 --- /dev/null +++ b/doc/forum/Equivalent_to_git_bundle__63__/comment_3_2935498815a3de295b7d573f28e12fdc._comment @@ -0,0 +1,14 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="git annex bundle - questions" + date="2025-09-11T08:02:08Z" + content=""" +I'm also interested in this feature, because my git annex repo has been corrupted a couple of times due to power loss. + +But it's not yet clear how it's supposed to work. Should it create an archive containing a git-bundle plus ONLY the annexed files in this bundle's commit range? + +It might be possible to write a script that does exactly that, but having something integrated into git-annex itself could be a bonus: cross-platform support (available on both Windows and Linux), standardized, and ready for archival (e.g. bundles can be written periodically onto M-Discs). + +I'm also wondering: if the current repository gets corrupted (at least partially), will git annex be able to \"restore\" its state after git-bundle-restore? +"""]]
Added a comment: resolved
diff --git a/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__/comment_1_570c154278cd2e94bcaf03eefeb9126e._comment b/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__/comment_1_570c154278cd2e94bcaf03eefeb9126e._comment new file mode 100644 index 0000000000..e4b8f2209c --- /dev/null +++ b/doc/forum/Current_git-annex_downloads_aren__39__t_available__63__/comment_1_570c154278cd2e94bcaf03eefeb9126e._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="psxvoid" + avatar="http://cdn.libravatar.org/avatar/fde068fbdeabeea31e3be7aa9c55d84b" + subject="resolved" + date="2025-09-11T05:12:29Z" + content=""" +Seems like it's fine now, thanks. +"""]]
noCreateProcessWhile to fix close-on-exec races
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
diff --git a/Annex/Multicast.hs b/Annex/Multicast.hs index 0af2d888db..a559c76c23 100644 --- a/Annex/Multicast.hs +++ b/Annex/Multicast.hs @@ -1,18 +1,23 @@ {- git-annex multicast receive callback - - - Copyright 2017 Joey Hess <id@joeyh.name> + - Copyright 2017-2025 Joey Hess <id@joeyh.name> - - Licensed under the GNU AGPL version 3 or higher. -} +{-# LANGUAGE CPP #-} + module Annex.Multicast where import Common import Annex.Path import Utility.Env -import Utility.Process -import GHC.IO.Handle.FD +#ifndef mingw32_HOST_OS +import System.Posix.IO +#else +import System.Process (createPipeFd) +#endif multicastReceiveEnv :: String multicastReceiveEnv = "GIT_ANNEX_MULTICAST_RECEIVE" @@ -20,8 +25,14 @@ multicastReceiveEnv = "GIT_ANNEX_MULTICAST_RECEIVE" multicastCallbackEnv :: IO (OsPath, [(String, String)], Handle) multicastCallbackEnv = do gitannex <- programPath - -- This will even work on Windows +#ifndef mingw32_HOST_OS + (rfd, wfd) <- noCreateProcessWhile $ do + (rfd, wfd) <- createPipe + setFdOption rfd CloseOnExec True + return (rfd, wfd) +#else (rfd, wfd) <- createPipeFd +#endif rh <- fdToHandle rfd environ <- addEntry multicastReceiveEnv (show wfd) <$> getEnvironment return (gitannex, environ, rh) diff --git a/Remote/Directory.hs b/Remote/Directory.hs index 75ec9b09cd..f204d50bf4 100644 --- a/Remote/Directory.hs +++ b/Remote/Directory.hs @@ -470,7 +470,7 @@ retrieveExportWithContentIdentifierM ii dir cow loc cids dest gk p = docopynoncow iv = do #ifndef mingw32_HOST_OS - let open = do + let open = noCreateProcessWhile $ do fd <- openFdWithMode f' ReadOnly Nothing defaultFileFlags (CloseOnExecFlag True) -- Need a duplicate fd for the post check. 
diff --git a/Utility/FileIO/CloseOnExec.hs b/Utility/FileIO/CloseOnExec.hs index a638ea2d9b..29e7c4b08a 100644 --- a/Utility/FileIO/CloseOnExec.hs +++ b/Utility/FileIO/CloseOnExec.hs @@ -42,6 +42,7 @@ import qualified Data.ByteString as BS import qualified Data.ByteString.Lazy as BSL #ifndef mingw32_HOST_OS import System.Posix.IO +import Utility.Process #endif closeOnExec :: Bool @@ -92,24 +93,22 @@ appendFile' :: OsPath -> BS.ByteString -> IO () appendFile' fp contents = withFile fp AppendMode (`BS.hPut` contents) -{- Unlike all other functions in this module, this only sets the - - close-on-exec flag after opening the file. Thus, it is vulnerable to - - races. - - - - Re-implementing openTempFile is difficult due to the current +{- Re-implementing openTempFile is difficult due to the current - structure of file-io. See this issue for discussion about improving - that: https://github.com/haskell/file-io/issues/44 + - So, instead this uses noCreateProcessWhile. - -} openTempFile :: OsPath -> OsString -> IO (OsPath, Handle) -openTempFile tmp_dir template = do - (p, h) <- I.openTempFile tmp_dir template -#ifndef mingw32_HOST_OS - fd <- handleToFd h - setFdOption fd CloseOnExec True - h' <- fdToHandle fd - pure (p, h') +openTempFile tmp_dir template = +#ifdef mingw32_HOST_OS + I.openTempFile tmp_dir template #else - pure (p, h) + noCreateProcessWhile $ do + (p, h) <- I.openTempFile tmp_dir template + fd <- handleToFd h + setFdOption fd CloseOnExec True + h' <- fdToHandle fd + pure (p, h') #endif #endif diff --git a/Utility/Gpg.hs b/Utility/Gpg.hs index 6c13392032..2566bfdf85 100644 --- a/Utility/Gpg.hs +++ b/Utility/Gpg.hs @@ -162,8 +162,10 @@ feedRead cmd params passphrase feeder reader = do #ifndef mingw32_HOST_OS let setup = liftIO $ do -- pipe the passphrase into gpg on a fd - (frompipe, topipe) <- System.Posix.IO.createPipe - setFdOption topipe CloseOnExec True + (frompipe, topipe) <- noCreateProcessWhile $ do + (frompipe, topipe) <- System.Posix.IO.createPipe + 
setFdOption topipe CloseOnExec True + return (frompipe, topipe) toh <- fdToHandle topipe t <- async $ do B.hPutStr toh (passphrase <> "\n") diff --git a/Utility/Process.hs b/Utility/Process.hs index 81fbef30bd..6052c7186b 100644 --- a/Utility/Process.hs +++ b/Utility/Process.hs @@ -1,5 +1,6 @@ {- System.Process enhancements, including additional ways of running - - processes, and logging. + - processes, logging, and amelorations for cases where FDs are not able to + - be opened with close-on-exec. - - Copyright 2012-2025 Joey Hess <id@joeyh.name> - @@ -21,6 +22,7 @@ module Utility.Process ( forceSuccessProcess', checkSuccessProcess, withNullHandle, + noCreateProcessWhile, createProcess, withCreateProcess, waitForProcess, @@ -46,7 +48,9 @@ import System.Exit import System.IO import Control.Monad.IO.Class import Control.Concurrent.Async +import Control.Concurrent import qualified Data.ByteString as S +import System.IO.Unsafe (unsafePerformIO) data StdHandle = StdinHandle | StdoutHandle | StderrHandle deriving (Eq) @@ -173,9 +177,34 @@ startInteractiveProcess cmd args environ = do (Just from, Just to, _, pid) <- createProcess p return (pid, to, from) --- | Wrapper around 'System.Process.createProcess' that does debug logging. +-- | Runs an action, preventing any new processes from being started +-- until it is finished. +-- +-- Unfortunately, Haskell has a pervasive problem with the close-on-exec +-- flag not being set when opening files. It's also difficult to portably +-- dup or pipe a FD with the close-on-exec flag set. So, this can be used +-- to run an action that opens a FD, and then calls setFdOption to set the +-- close-on-exec flag, without risking a race with a process being forked +-- at the same time. +-- +-- Note that only one of these actions can run at a time, and long-duration +-- actions are not advisable. +noCreateProcessWhile :: (MonadIO m, MonadMask m) => (m a) -> m a +noCreateProcessWhile = bracket setup cleanup . 
const + where + setup = liftIO $ takeMVar createProcessSem + cleanup () = liftIO $ putMVar createProcessSem () + +-- | A shared global MVar. Processes are not created while it is empty. +{-# NOINLINE createProcessSem #-} +createProcessSem :: MVar () +createProcessSem = unsafePerformIO $ newMVar () + +-- | Wrapper around 'System.Process.createProcess'. +-- This adds debug logging, and avoids starting a process when in a +-- noCreateProcessWhile block. createProcess :: CreateProcess -> IO (Maybe Handle, Maybe Handle, Maybe Handle, ProcessHandle) -createProcess p = do +createProcess p = noCreateProcessWhile $ do r@(_, _, _, h) <- Utility.Process.Shim.createProcess p debugProcess p h return r diff --git a/Utility/Process/Transcript.hs b/Utility/Process/Transcript.hs index 7bf94ffa05..cb71e30b91 100644 --- a/Utility/Process/Transcript.hs +++ b/Utility/Process/Transcript.hs @@ -45,7 +45,7 @@ processTranscript'' cp input = do #ifndef mingw32_HOST_OS {- This implementation interleves stdout and stderr in exactly the order - the process writes them. -} (Diff truncated)
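The locking pattern behind `noCreateProcessWhile` in the diff above can be sketched with only `base`. This is a simplified illustration, not the git-annex code: `spawnLock`, `withSpawnLock`, `openThenSetCloexec` and `spawn` are hypothetical stand-ins, and the real function is generalized over `MonadIO`/`MonadMask` rather than fixed to `IO`.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, putMVar, takeMVar)
import Control.Exception (bracket_)
import System.IO.Unsafe (unsafePerformIO)

-- Hypothetical global lock. While the MVar is empty, nobody else may
-- enter a guarded section.
{-# NOINLINE spawnLock #-}
spawnLock :: MVar ()
spawnLock = unsafePerformIO (newMVar ())

-- Run an action while holding the lock; bracket_ releases it even if
-- the action throws.
withSpawnLock :: IO a -> IO a
withSpawnLock = bracket_ (takeMVar spawnLock) (putMVar spawnLock ())

-- Stand-in for "open a FD, then set close-on-exec on it". The two steps
-- are not atomic, so they run under the lock.
openThenSetCloexec :: IO String
openThenSetCloexec = withSpawnLock (return "fd opened and flagged")

-- Stand-in for createProcess. Because it takes the same lock, it cannot
-- fork while another thread sits between the open and the flag being set.
spawn :: String -> IO String
spawn cmd = withSpawnLock (return ("spawned " ++ cmd))

main :: IO ()
main = do
  openThenSetCloexec >>= putStrLn
  spawn "true" >>= putStrLn
```

Since an `MVar` is not reentrant, a guarded action that itself tried to start a process through the wrapped `createProcess` would block on the lock it already holds. That fits the diff's advice to keep these actions short.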