Compiling LaTeX to html with pandoc
Pandoc is a hugely versatile document converter written in Haskell. It is in active development with many contributors. There are many pandoc questions on tex.stackexchange.
This is the output for the tex file I’ve used in earlier tests.
Usage
pandoc --toc inputfile.tex -s --mathjax -o outputfile.html
The --toc option produces a table of contents (which will otherwise be omitted even if your LaTeX file has \tableofcontents), and -s makes it produce a complete html file. Images work well, and double primes are legible even if they aren’t aligned exactly right. It doesn’t know about \nolinkurl from the LaTeX hyperref package and silently fails to render any \href whose link text uses \nolinkurl. \qedhere doesn’t render either.
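If you want to run the same conversion over several files, the command can be wrapped in a small script. This is only a sketch: the function names pandoc_command and convert are my own, and it assumes pandoc is installed and on your PATH.

```python
import shutil
import subprocess


def pandoc_command(tex_path, html_path):
    """Build the argument list for the pandoc invocation used in this post."""
    return ["pandoc", "--toc", tex_path, "-s", "--mathjax", "-o", html_path]


def convert(tex_path, html_path):
    """Run pandoc on one file.

    Returns False without doing anything if pandoc is not on PATH;
    raises subprocess.CalledProcessError if pandoc exits with an error.
    """
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(tex_path, html_path), check=True)
    return True
```

Calling convert("inputfile.tex", "outputfile.html") then does the same thing as the command line above.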
Issues
The big problem is that section numbering and environments like theorem, definition, lemma, proof, and so on don’t work: in the html output the section numbers are omitted, and the text of each environment appears with no heading, number, or other decoration.
I can’t find an easy way to fix this. There are several numbering filters for pandoc, e.g. pandoc-numbering or pandoc-eqnos, but they are not designed for conversion of LaTeX to html. There is an issue filed at the pandoc GitHub asking for amsthm support, and it’s still open. Someone on the issue thread created pandoc-amsthm, but despite the promising name it is not, so far as I can see, meant for LaTeX-html conversion.
The University of Nevada, Reno has a page about math accessibility in which they describe their LaTeX-html conversion process. They create an html file rendering the math with mathjax, and then “update” it to restore theorem and definition environments and numbering. It’s not clear if they have an automatic tool for this, and the email address they provide for queries rejects mail from people not subscribed to their list.
Conclusion
It is probably possible to adapt pandoc, or to write filters, so that its LaTeX-html output gets environment support and numbering, but until that exists pandoc is unsuitable for this job.
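To give an idea of what such a filter might look like, here is a rough sketch of a pandoc JSON filter in plain Python. It rests on an assumption I have not checked against every pandoc version: that the LaTeX reader emits each theorem-like environment as a Div whose class is the environment name. The LABELS table, the single shared counter, and the function names are all my own choices for illustration.

```python
import json

# Assumed environment names and the labels to print for them.
LABELS = {"theorem": "Theorem", "lemma": "Lemma", "definition": "Definition"}


def number_blocks(blocks, counter):
    """Prepend a bold 'Label n.' to every Div whose class is in LABELS.

    counter is a one-element list so that the count is shared across
    recursive calls (one running number for all environment types).
    """
    for block in blocks:
        if block.get("t") != "Div":
            continue
        (_ident, classes, _attrs), body = block["c"]
        name = next((c for c in classes if c in LABELS), None)
        if name is not None:
            counter[0] += 1
            heading = {"t": "Strong", "c": [
                {"t": "Str", "c": LABELS[name]},
                {"t": "Space"},
                {"t": "Str", "c": f"{counter[0]}."},
            ]}
            if body and body[0].get("t") == "Para":
                # Put the heading at the start of the first paragraph.
                body[0]["c"] = [heading, {"t": "Space"}] + body[0]["c"]
            else:
                body.insert(0, {"t": "Para", "c": [heading]})
        number_blocks(body, counter)  # handle Divs nested inside this one
    return blocks


def run_filter(instream, outstream):
    """Read a pandoc JSON document, number its environments, write it back."""
    doc = json.load(instream)
    number_blocks(doc["blocks"], [0])
    json.dump(doc, outstream)
```

To use it as a filter you would save it as an executable script (say number_envs.py, a made-up name) with import sys and a final line run_filter(sys.stdin, sys.stdout), and pass it to pandoc with the --filter option. It does nothing about section numbers, equation numbers, or cross-references.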
UPDATE
As far as I know there still isn’t a good theorem numbering solution. But I’m getting more convinced that pandoc is the right way to go. This template produces really nice output (especially if you fiddle with it a bit) and I’ll be using it for MATH0005 lecture notes this year - there will be a link on my homepage.
UPDATE 2
Gavin McWhinnie sent me a link to his Pandoc filter that works with amsmath and amsthm to allow equation, section, and environment numbering. You can read the docs here and there’s an example document here.