Posted on Sat 18 February 2017

pandoc for the win

If you are faced with any sort of text format conversion problem, pandoc is probably the place to start.

Suppose you have a portion of a website that you would like to turn into an ebook.

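  1. Use wget -nc -nd -v -r -l1 $URL to retrieve it. -r turns on recursive retrieval, and -l1 limits it to one level deep; raise the level if the pages you want sit further from the start page. -nc skips files that already exist locally, and -nd saves everything into the current directory instead of recreating the site's directory tree.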

  2. Use a for loop and pandoc -f html -t markdown $infile -o $infile.txt to convert each downloaded page to Markdown, which strips out CSS, JavaScript and other crap.

  3. Assemble your ebook with pandoc $file1 $file2 $file3... -t epub -o $ebook.epub
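The three steps above can be sketched as a small set of shell functions. This is only a sketch: the function names are made up for illustration, and it assumes the downloaded pages end in .html and that alphabetical order is the chapter order you want.

```shell
#!/bin/sh
# Sketch of the three steps as shell functions. The function names and the
# "*.html" naming assumption are illustrative, not from the original post.

# Step 1: fetch the start page and everything it links to, one level deep.
# -nc skips files already downloaded, -nd flattens the directory tree.
fetch_pages() {
    url=$1
    wget -nc -nd -v -r -l1 "$url"
}

# Step 2: convert every HTML file in the current directory to Markdown.
html_to_markdown() {
    for infile in *.html; do
        [ -e "$infile" ] || continue   # the glob matched nothing
        pandoc -f html -t markdown "$infile" -o "$infile.txt"
    done
}

# Step 3: assemble the converted files into one EPUB, in glob (sorted) order.
build_epub() {
    ebook=$1
    pandoc -f markdown -t epub -o "$ebook" ./*.html.txt
}
```

Run from an empty working directory, something like fetch_pages "$URL" && html_to_markdown && build_epub book.epub would chain the steps. Quoting the variables keeps file names with spaces from being split.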

Et voilà.

There are lots of other options, including the ability to add your own CSS, table of contents, cover image… It’s really quite nice.
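As a rough illustration of those options, here is a hypothetical helper that adds a table of contents, a stylesheet and a cover image. The helper name, the file names and the title are placeholders. The flags are the ones recent pandoc releases document (--toc, --css, --epub-cover-image, --metadata); older releases may spell the stylesheet option differently, so check pandoc --help for your version.

```shell
#!/bin/sh
# Hypothetical helper: build an EPUB with a table of contents, custom CSS
# and a cover image. The title and all file names are placeholders.
build_styled_epub() {
    ebook=$1
    css=$2
    cover=$3
    shift 3   # the remaining arguments are the chapter files, in order
    pandoc -f markdown -t epub \
        --toc \
        --css "$css" \
        --epub-cover-image "$cover" \
        --metadata title="My Ebook" \
        -o "$ebook" "$@"
}
```

It could be called as build_styled_epub book.epub style.css cover.png chapter1.txt chapter2.txt.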


© -dsr-.