This is code for doing simple processing on HTML. I know the code has bugs and limitations, but it suffices for simple purposes. Among the limitations: this is an HTML parser, not an SGML parser. It does not accept a DTD; instead, the model of HTML is built into the code. It also does not validate the HTML: it will attempt to parse invalid documents, and the results are undefined if a document contains errors.
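To illustrate what "the model of HTML is built into the code" means, here is a minimal Python sketch (not the author's Perl) in which knowledge a DTD would normally supply is hard-coded: a table of elements that never take an end tag, and a rule that a new P, LI, DT or DD start tag closes an innermost open element of the same kind. The names `EMPTY_ELEMENTS`, `IMPLICIT_PEERS`, `ModelParser` and `parse` are hypothetical. Like the code described here, it does not validate: stray end tags are simply ignored.

```python
from html.parser import HTMLParser

# Hypothetical built-in model: elements that never have an end tag.
EMPTY_ELEMENTS = {"br", "hr", "img", "meta", "link", "input", "isindex", "base"}
# Hypothetical built-in model: a start tag for one of these closes an
# innermost open element of the same kind (e.g. <p>one<p>two).
IMPLICIT_PEERS = {"p", "li", "dt", "dd"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # mix of Node objects and text strings


class ModelParser(HTMLParser):
    """Builds a tree from hard-coded rules instead of a DTD; does not validate."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # Implied end tag: a new <p> ends the <p> that is still open.
        if tag in IMPLICIT_PEERS and self.stack[-1].tag == tag:
            self.stack.pop()
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        # Empty elements are never pushed, so they cannot contain anything.
        if tag not in EMPTY_ELEMENTS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def parse(text):
    parser = ModelParser()
    parser.feed(text)
    parser.close()
    return parser.root
```

For example, `parse("<p>one<p>two")` yields two sibling P nodes, even though the first P is never explicitly closed.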
It runs under Perl 4.0, patch level 36; I have not tried other versions of Perl. This directory contains:
    <META name="status" content="Internet Draft">
    <META name="title" content="Internet audio protocol">
    <META name="date" content="July 1983">
    <META name="author" content="Nixon, Haldeman">

(The META tag is not officially part of HTML; it was proposed by Roy Fielding.) These tags should be placed in the HEAD.
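A minimal Python sketch of how such header fields might be collected (the Perl code itself is not shown here): it keeps the name/content pairs of META tags found inside HEAD and ignores any that appear elsewhere. The class `MetaCollector` and the function `rfc_header_fields` are hypothetical names, and treating a BODY start tag as the end of HEAD is an assumption made to cope with an omitted `</HEAD>`.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects name/content pairs from META tags that appear inside HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <HEAD> arrives as "head".
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False  # assumption: HEAD has ended even without </HEAD>
        elif tag == "meta" and self.in_head:
            a = dict(attrs)
            if a.get("name") is not None and a.get("content") is not None:
                self.meta[a["name"].lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def rfc_header_fields(html_text):
    """Hypothetical helper: returns {meta name: content} for META tags in HEAD."""
    collector = MetaCollector()
    collector.feed(html_text)
    collector.close()
    return collector.meta
```

Given the four example tags inside a HEAD, it would return a dictionary keyed by `status`, `title`, `date` and `author`.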
Although I don't see it stated as a requirement, RFCs customarily have a left margin of three characters, so this code produces one too.
Since table processing doesn't work, I suggest you use the PRE tag for the title page. As a special hack, the very first PRE tag is not indented.
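The three-character margin and the first-PRE exception can be sketched in Python as follows, assuming the document has already been reduced to a hypothetical list of `(kind, text)` blocks. The real code's data structures are not described here, the names `MARGIN` and `layout` are illustrative, and paragraph filling is omitted.

```python
MARGIN = "   "  # three-character left margin customary in RFCs


def layout(blocks):
    """blocks: list of (kind, text) pairs, e.g. ("p", "...") or ("pre", "...").

    Returns output lines with a three-space margin, except that the very
    first PRE block is emitted flush left (the title-page hack).
    """
    out = []
    seen_pre = False
    for kind, text in blocks:
        flush_left = kind == "pre" and not seen_pre
        if kind == "pre":
            seen_pre = True
        for line in text.split("\n"):
            # Empty lines get no margin, so no trailing whitespace is produced.
            out.append(line if flush_left or line == "" else MARGIN + line)
    return out
```

Leaving empty lines without a margin is this sketch's choice, not necessarily the original code's behavior.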
    <!DOCTYPE HTML [
    <!ENTITY % HTML.Minimal "INCLUDE">
    <!-- Include standard HTML DTD -->
    <!ENTITY % html PUBLIC "-//connolly hal.com//DTD WWW HTML 1.8//EN">
    %html;
    ]>
<TT>foo</TT>, yields "foo ," (with a spurious space before the comma).