You can use git-annex as a podcatcher, to download podcast contents.
No additional software is required, but your git-annex must be built
with the Feeds feature (run `git annex version` to check).
All you need to do is put something like this in a cron job:

    cd somerepo && git annex importfeed http://url/to/podcast http://other/podcast/url
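For example, a crontab entry that checks the feeds once a day might look
like this (the repository path and feed url are placeholders):

    # m h dom mon dow  command
    0 4 * * * cd /home/me/podcasts && git annex importfeed http://url/to/podcast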
This downloads the urls, and parses them as RSS, Atom, or RDF feeds.
All enclosures are downloaded and added to the repository, the same as if you
had manually run `git annex addurl` on each of them.
git-annex will avoid downloading a file from a feed if its url has already
been stored in the repository before. So once a file is downloaded,
you can move it around, delete it, `git annex drop` its content, etc,
and it will not be downloaded again by repeated runs of
`git annex importfeed`. Just how a podcatcher should behave.
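For instance, assuming another copy of the content exists somewhere so
`drop` succeeds (the feed url and filename are hypothetical):

    git annex importfeed http://url/to/podcast    # downloads new episodes
    git annex drop 'Podcast Title/Episode 1.mp3'  # free the local copy
    git annex importfeed http://url/to/podcast    # the dropped episode is not re-fetched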
## templates
To control the filenames used for items downloaded from a feed,
there's a `--template` option. The default is

    --template='${feedtitle}/${itemtitle}${extension}'

Other available template variables:
feedauthor, itemauthor, itemsummary, itemdescription, itemrights, itemid
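For example, to file items under the feed's author rather than its title
(the feed url is a placeholder):

    git annex importfeed --template='${feedauthor}/${itemtitle}${extension}' http://url/to/podcast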
## catching up
To catch up on a feed without downloading its contents,
use `git annex importfeed --relaxed`, and delete the symlinks it creates.
Next time you run `git annex importfeed` it will only fetch any new items.
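A minimal catch-up might look like this (the feed url and directory name
are placeholders):

    git annex importfeed --relaxed http://url/to/podcast
    git rm -rf 'Podcast Title'    # delete the symlinks for already-published items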
## fast mode
To add a feed without downloading its contents right now,
use `git annex importfeed --fast`. Then you can use `git annex get` as
usual to download the content of an item.
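For example (the feed url and filename are placeholders):

    git annex importfeed --fast http://url/to/podcast
    git annex get 'Podcast Title/Episode 1.mp3'   # fetch just this one episode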
## storing the podcast list in git
You can check the list of podcast urls into git right next to the files
it downloads. Just make a file named `feeds` and add one podcast url per line.

Then you can run git-annex on all the feeds:

    xargs git-annex importfeed < feeds
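Setting that up might look like this (the urls are placeholders):

    cat > feeds <<EOF
    http://url/to/podcast
    http://other/podcast/url
    EOF
    git add feeds
    git commit -m 'track the podcast list in git'
    xargs git-annex importfeed < feeds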
## distributed podcatching
A nice benefit of using git-annex as a podcatcher is that you can
run `git annex importfeed` on the same url in different clones
of a repository, and `git annex sync` will sync it all up.
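For instance, with two clones of the same repository (the machines and the
feed url are placeholders):

    # on the laptop
    git annex importfeed http://url/to/podcast
    git annex sync

    # later, on the desktop
    git annex importfeed http://url/to/podcast   # already-seen items are skipped
    git annex sync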
## centralized podcatching
You can also have a designated machine which always fetches all podcasts
to local disk and stores them. That way, you can archive podcasts with
time-delayed deletion of upstream content. You can also work around slow
upstream downloads by podcatching to a server with ample bandwidth, or
work around a slow local Internet connection by podcatching to your home
server and transferring files to your laptop on demand.
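One possible sketch of such a setup (hostnames and paths are hypothetical):

    # on the home server, run from cron
    cd /srv/podcasts && xargs git annex importfeed < feeds

    # on the laptop, when content is wanted
    git annex sync
    git annex get 'Podcast Title/Episode 1.mp3'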
## comments

It seems that some of my feeds get stored into keys that generate a too
long filename. Is there a way to work around this?

`git-annex addurl` already deals with this sort of problem by limiting the
filename to 255 characters. If you'd like to file a bug report with details
about your system, I can try to make git-annex support its limitations, I
suppose.

Looking forward to seeing it in Debian unstable, where it will definitely
replace my hpodder setup.
I guess there is no easy way to re-use the files already downloaded with
hpodder? At first I thought that `git annex importfeed --relaxed` followed
by adding the files to the git annex would work, but `importfeed` stores
URLs, not content-based hashes, so it wouldn't match up.

@nomeata, well, you can, but it has to download the files again. When run
without --fast, `importfeed` does use content-based hashes, so if you run
it in a temporary directory, it will download the content redundantly,
hash it, see it's the same, and add the url to that hash. You can then
delete the temporary directory, and the files hpodder had downloaded will
have the url attached to them. I don't know if this really buys you
anything over deleting the hpodder files and starting over, though.

The only way it can skip downloading a file is if its url has already been
seen before. Perhaps you deleted them?
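That workflow might look roughly like this, assuming the hpodder files have
already been added to the annex (all names are hypothetical):

    cd somerepo
    git annex add old-hpodder-files/             # hash the existing downloads
    mkdir tmp && cd tmp
    git annex importfeed http://url/to/podcast   # re-downloads; hashes match the old files
    cd .. && git rm -rf tmp                      # drop the redundant copies; urls stay recorded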
I've made `importfeed --force` re-download files it's seen before.

Joey - your initial post said:
...but how do I actually switch on the feeds feature? I install git-annex
from cabal, so I do

which I did this morning, and now `git annex version` gives me:

So it is the latest version, but without Feeds. :-(
`cabal install feed` should get the necessary library installed so that
git-annex will build with feeds support.

Then I reinstalled `git-annex`, but it still doesn't find the feeds flag.
Do I need to do something like:
...but what are the default flags to include in addition to `-feed`?

`-f-Feed` will disable the feature. `-fFeed` will try to force it on.
You can probably work out what's going wrong using `cabal install -v3`.
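A cabal invocation forcing the flag on might look like this (assuming a
cabal-based install; exact steps may vary by cabal version):

    cabal update
    cabal install feed               # library needed for the Feeds feature
    cabal install git-annex -fFeed   # force the Feed flag on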
So I ran `cabal install -v3` and looked at the output. This looks like
feed should be on. There doesn't appear to be any errors in the compile
either. Is it as simple as a bug where this flag just doesn't show in the
`git annex version` command?

For a password-protected feed, I tried http://user:pass@site.com/rss.xml
but it didn't work.
Hi, the explanations of --fast and --relaxed on this page could be
extended a bit. I looked them up in the man page, but it is still not
clear to me when I would use one or the other with feeds. Also, does
"Next time you run git annex importfeed it will only fetch any new items."
really only apply to --relaxed, and not --fast?

Furthermore, it would be good if there were a template variable `itemnum`
that I could use to ensure that `ls` prints the casts in the right order,
even when the titles of the items are not helpful. Greetings, Joachim
`importfeed` just runs `wget` (or `curl`) to do all downloads, and wget's
documentation says that works. It also says you can use ~/.netrc to store
the password for a site.

The git-annex man page has a bit more to say about --relaxed and --fast.
Their behavior when used with `importfeed` is the same as with `addurl`.

If the podcast feed provides an `itemid`, you can use that in the filename
template. I don't know how common that is. Due to the way `importfeed`
works, it cannot keep track of, eg, an incrementing item number itself.
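A ~/.netrc entry for that sort of site might look like this (hostname and
credentials are placeholders; keep the file private with `chmod 600`):

    machine site.com
    login user
    password pass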