The Internet Archive allows members to upload collections using an Amazon S3 compatible API, and this can be used with git-annex's S3 support.
So, you can locally archive things with git-annex, define remotes that correspond to "items" at the Internet Archive, and use git-annex to upload your files there. Of course, your use of the Internet Archive must comply with their terms of service.
A nice added feature is that whenever git-annex sends a file to the
Internet Archive, it records its url, the same as if you'd run
git annex addurl. So any users who can clone your repository can download
the files from archive.org, without needing any login or password info.
This makes the Internet Archive a nice way to publish the large files
associated with a public git repository.
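From the downloading side, this could look roughly like the sketch below. The helper name fetch_published_file and its arguments are hypothetical, and it assumes the clone can reach archive.org over the web; no archive.org keys are needed.

```shell
# Hypothetical helper: clone a public repository and fetch one annexed
# file. git-annex uses the url it recorded when the file was uploaded.
fetch_published_file() {
    repo_url=$1   # url of the public git repository (placeholder)
    file=$2       # path of an annexed file inside the repository
    dir=$3        # directory to clone into
    git clone "$repo_url" "$dir" &&
    git -C "$dir" annex get "$file"
}
```

It would be called as, for example, fetch_published_file followed by the repository url, photo1.jpeg, and a directory name.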
Sign up for an account, and get your access keys here: http://www.archive.org/account/s3.php
# export AWS_ACCESS_KEY_ID=blahblah
# export AWS_SECRET_ACCESS_KEY=xxxxxxx
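If these variables are set from a script, a small guard can catch a missing key before initremote is attempted. This is only a sketch; the function name check_ia_keys is made up for illustration.

```shell
# Hypothetical helper: fail early if the archive.org S3 keys are not set.
check_ia_keys() {
    if [ -z "${AWS_ACCESS_KEY_ID:-}" ] || [ -z "${AWS_SECRET_ACCESS_KEY:-}" ]; then
        echo "AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must be set" >&2
        return 1
    fi
}
```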
Specify host=s3.us.archive.org when doing initremote to set up
a remote at the Archive. This will enable a special Internet Archive mode:
Encryption is not allowed; you are required to specify a bucket name
rather than having git-annex pick a random one; and you can optionally
specify x-archive-meta* headers to add metadata as explained in their
documentation.
# git annex initremote archive-panama type=S3 \
host=s3.us.archive.org bucket=panama-canal-lock-blueprints \
x-archive-meta-mediatype=texts x-archive-meta-language=eng \
x-archive-meta-title="original Panama Canal lock design blueprints"
initremote archive-panama (Internet Archive mode) ok
# git annex describe archive-panama "a man, a plan, a canal: panama"
describe archive-panama ok
Then you can annex files and copy them to the remote as usual:
# git annex add photo1.jpeg --backend=SHA1E
add photo1.jpeg (checksum...) ok
# git annex copy photo1.jpeg --fast --to archive-panama
copy (to archive-panama...) ok
Once a file has been stored on archive.org, it cannot be (easily) removed from it. Also, git-annex whereis will tell you a public url for the file on archive.org. (It may take a while for archive.org to make the file publicly visible.)
Note the use of the SHA1E backend when adding files. That is the default backend used by git-annex, but even if you don't normally use it, it makes most sense to use the WORM or SHA1E backend for files that will be stored in the Internet Archive, since the key name will be exposed as the filename there, and since the Archive does special processing of files based on their extension.
It doesn't seem like git annex addurl by itself supports the archive.org urls...
[[!format txt """
anarcat@marcos:presentations$ git annex addurl --file=re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm http://archive.org/download/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm
addurl re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm
failed to verify url exists: http://archive.org/download/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm
failed
git-annex: addurl: 1 failed
"""]]
I also tried the "details" url (http://archive.org/details/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia) - but that just downloads the webpage, not the video either...
Even the ultimate video URL doesn't work:
[[!format txt """
anarcat@marcos:presentations$ git annex addurl --debug --file=re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm
[2013-10-09 18:26:30 EDT] call: quvi ["-v","mute","--support","http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm"]
addurl re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm [2013-10-09 18:26:30 EDT] read: curl ["-s","--head","-L","http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm","-w","%{http_code}"]

failed to verify url exists: http://ia601009.us.archive.org/9/items/Republica2012-EbenMoglen-FreedomOfThoughtRequiresFreeMedia/re_publica_2012Eben_MoglenFreedom_of_Thought_Requires_Free_Media.webm
failed
git-annex: addurl: 1 failed
"""]]
... even though that URL actually gives out a proper 200 OK response code.
Any ideas? --anarcat
This was a misleading error message. The url you are trying to add to the file does not match the size recorded for the file already in the annex. (Or possibly the file's key has no recorded size.) If you really want to add the url to the file despite it being a different encoding, you can use --relaxed, although fsck may not like the result if you ever end up downloading that url.
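To check which of the two cases applies, the size recorded in the key can be compared by hand with the Content-Length the server reports. The sketch below assumes the usual key layout of BACKEND-sSIZE--NAME, where the size field is optional; the helper name key_size is made up for illustration.

```shell
# Hypothetical helper: print the size recorded in a git-annex key,
# or print nothing if the key has no size field.
key_size() {
    # The size, when present, follows the backend name as -sNNN.
    printf '%s\n' "$1" | sed -n 's/^[^-]*-s\([0-9][0-9]*\)-.*$/\1/p'
}
```

The key for an annexed file can be printed with git annex lookupkey, and curl -s --head -L on the url (the same call that appears in the debug output above) shows the Content-Length to compare against.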
(Please file bug reports for problems in the future, rather than posting comments on only vaguely related pages, which, as we can see here, can turn out to be entirely off-topic.)