Posted on Sat 18 February 2017
If you are faced with any sort of text formatting conversion problem, you should probably start with pandoc to solve it.
Suppose you have a portion of a website that you would like to turn into an ebook.
Use
wget -nc -nd -v -r -l1 $URL
to retrieve it. -r sets up recursion, and -l1 limits it to one level deep. You might need more than that.
Use a for loop and
pandoc -f html -t markdown $infile -o $infile.txt
to strip out CSS, JavaScript and other crap.
Assemble your ebook with
pandoc $file1 $file2 $file3... -t epub -o $ebook.epub
Et voilà.
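If you want the three steps in one place, something like the following rough sketch might do. The function names and the book.epub output name are made up for illustration, and it assumes the downloaded pages end up in the current directory with an .html extension, which will not be true for every site.

```shell
#!/bin/sh
# Sketch only: fetch a page plus one level of links, convert each HTML file
# to Markdown, then bundle the results into an EPUB.
# Function names and file names here are illustrative, not from pandoc or wget.

fetch_site() {
    # -nc: skip files already downloaded; -nd: do not recreate directories;
    # -v: verbose; -r -l1: recurse, one level deep.
    wget -nc -nd -v -r -l1 "$1"
}

convert_html() {
    # Convert every *.html file in the current directory to Markdown.
    # Assumption: the pages were saved with an .html extension.
    for infile in *.html; do
        [ -e "$infile" ] || continue   # nothing matched, the glob stayed literal
        pandoc -f html -t markdown "$infile" -o "$infile.txt"
    done
}

build_epub() {
    # $1 is the output file; the remaining arguments are the converted
    # files, in the order they should appear in the book.
    out=$1
    shift
    pandoc "$@" -t epub -o "$out"
}

# Example run, left commented out so that sourcing this file does nothing:
# fetch_site "$URL" && convert_html && build_epub book.epub *.html.txt
```

Note that a shell glob such as *.html.txt sorts alphabetically, so if chapter order matters you will probably want to list the files by hand.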
There are lots of other options, including the ability to add your own CSS, table of contents, cover image… It’s really quite nice.
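As a hedged example of those extras, the sketch below adds a table of contents, a stylesheet and a cover image. It assumes a reasonably recent pandoc, where --css applies to EPUB output (older releases used --epub-stylesheet instead). The names style.css, cover.jpg and the title are placeholders, as is the function name.

```shell
# Sketch only: style.css, cover.jpg and "My Ebook" are placeholder values.
build_styled_epub() {
    # $1 is the output file; the remaining arguments are the input files.
    styled_out=$1
    shift
    pandoc "$@" \
        -t epub \
        --toc \
        --css=style.css \
        --epub-cover-image=cover.jpg \
        --metadata title="My Ebook" \
        -o "$styled_out"
}

# Example (commented out): build_styled_epub book.epub chapter1.txt chapter2.txt
```

Setting a title through --metadata also stops pandoc from complaining that the EPUB has none.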