The internet document class takes a different approach from the other programs we’ve used so far. It is a LaTeX document class, loaded with

\documentclass[<options>]{internet}

<options> can specify an output format such as xhtml, epub or markdown, amongst others. The document is then compiled to PDF, and the text extracted from that PDF is in the requested format.

The GitHub page shows that internet isn’t being actively developed.

Installation and usage

The readme on the GitHub page is a bit low on detail, so here’s how installation worked for me after I first cloned the repo. The readme says first that all the internet<type>.<format>.code.tex files, the h<type>.def files and the internet.cls file have to be “linked into your local texmf tree.” To find out where that tree is, you can run kpsepath -n latex tex, which lists all the directories TeX searches. For me (on Debian) it was ~/texmf/tex/latex/, which didn’t exist yet and had to be created.
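
The linking step can be sketched as a small shell function. This is only a sketch under assumptions: the function name link_internet_class is made up for illustration, the class files are assumed to sit in the top level of the clone, and both paths in the example call are placeholders that you should adjust to your own clone and to whatever kpsepath reports.

```shell
# Hypothetical helper: symlink the class files from a clone of the repo
# into a directory of the local texmf tree.
#   $1 = path to the cloned repo (assumed to hold the files at top level)
#   $2 = destination directory inside the texmf tree
link_internet_class() {
    # Resolve the repo to an absolute path so the symlinks do not dangle.
    repo=$(cd "$1" && pwd) || return 1
    dest=$2
    # Create the destination, including missing parent directories.
    mkdir -p "$dest" || return 1
    for f in "$repo"/internet*.code.tex "$repo"/h*.def "$repo"/internet.cls; do
        # Skip glob patterns that matched nothing.
        [ -e "$f" ] || continue
        ln -sf "$f" "$dest/"
    done
}

# Example call (both paths are assumptions, not taken from the readme):
# link_internet_class "$HOME/src/internet" "$HOME/texmf/tex/latex/internet"
```

Symlinks rather than copies follow the readme’s wording (“linked”), and they mean that later edits to the files in the clone are picked up without repeating the step.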

There are then some instructions for generating .tfm and .vf files, which “need to be put somewhere that TeX can find them”. For me, /usr/share/texmf/fonts/vf/ and /usr/share/texmf/fonts/tfm/ worked, but I had to run texhash as root after putting the files there.

I then got a lot of errors about the use of \tl_to_lowercase:n. Using grep -ri tl_to_lowercase, I found all the files containing it and replaced every occurrence in every file with \tex_lowercase:D, which at least silenced the errors.
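
The search-and-replace can be done in one go rather than file by file. The following is a rough sketch, assuming GNU grep and GNU sed (as on Debian); the function name replace_lowercase is made up for illustration, and the directory you pass should be your clone of the repo. It edits files in place, so it is worth running on a clean checkout where the changes can be reviewed or reverted.

```shell
# Hypothetical helper: replace \tl_to_lowercase:n with \tex_lowercase:D
# in every file below the given directory.
#   $1 = directory to search (e.g. the cloned repo)
replace_lowercase() {
    dir=$1
    # grep: -r recurse, -l print only file names, -F match a fixed string.
    grep -rlF 'tl_to_lowercase:n' "$dir" | while IFS= read -r f; do
        # sed -i edits in place (GNU sed); '\\' matches one literal backslash.
        sed -i 's/\\tl_to_lowercase:n/\\tex_lowercase:D/g' "$f"
    done
}

# Example call (run from inside the clone):
# replace_lowercase .
```

Afterwards, grep -r tl_to_lowercase . should print nothing.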

At this point some of the test documents (markdown_text.tex, maruku_test.tex, basicmaths_text.tex) can be compiled with

./latex2txt.sh testfile.tex

as long as you are in the folder containing latex2txt.sh. But the epub and xhtml tests fail with errors about \xhtm_verb:n and hyperref version mismatches. That’s above my pay grade, so I gave up.