The Perl script used to split the poems up makes use of the fact that almost all of the poems in pgbev10.txt (the plain text file from Project Gutenberg) are separated by two consecutive newlines. Unfortunately there are a few cases where this convention has not been adhered to. Instead of trying to brew multi-line regexps, which I was not competent to do at that time, I decided to change the text itself in these few places.
Originally, it had been my intention to make an SGML DTD for the poems and split the anthology into SGML documents conforming thereto, and thence convert them into a number of formats. This never materialised, however; I ended up turning them into raw HTML.
If you want the Perl script I wrote, here it is. It is the first piece of Perl that I ever wrote and as such is very disorganised and unstructured; there are almost no comments, and it is composed of bits of code cobbled on as I learnt new constructs. The script receives the anthology on STDIN and splits it into poem/pgbev-1.html to poem/pgbev-883.html, so you need to make a "poem" directory before running it.
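For readers who only want the gist of the splitting step, here is a minimal sketch in Python. It is not the Perl script itself, and it makes several assumptions: the function names split_poems, write_poems and main are made up for illustration, the separator is taken to be any run of at least min_newlines newline characters, and each poem is simply wrapped in a PRE element rather than given the CSS mark-up the real pages use. If stanzas within a poem are themselves separated by a blank line, min_newlines would have to be raised to avoid cutting poems apart.

```python
import os
import re
import sys
from html import escape


def split_poems(text, min_newlines=2):
    """Split the anthology text on runs of at least `min_newlines` newlines.

    `min_newlines` is an assumption; adjust it to match the real separator.
    """
    # Normalise DOS line endings so the separator pattern stays simple.
    text = text.replace("\r\n", "\n")
    chunks = re.split(r"\n{%d,}" % min_newlines, text)
    # Drop empty chunks and stray leading/trailing newlines.
    return [chunk.strip("\n") for chunk in chunks if chunk.strip()]


def write_poems(poems, out_dir="poem"):
    """Write each poem to out_dir/pgbev-N.html, numbering from 1."""
    # Unlike the original script, this creates the directory if it is missing.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for number, poem in enumerate(poems, 1):
        path = os.path.join(out_dir, "pgbev-%d.html" % number)
        with open(path, "w", encoding="utf-8") as handle:
            # Placeholder mark-up only: escape the text and keep line breaks.
            handle.write("<pre>%s</pre>\n" % escape(poem))
        paths.append(path)
    return paths


def main():
    """Read the anthology from standard input, as the Perl script does."""
    write_poems(split_poems(sys.stdin.read()))
```

To use it as a script, call main() at the end of the file and feed the anthology in on standard input. Editing the few irregular places in the text by hand, as I did, remains simpler than teaching the pattern about every exception.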
You will probably also want my typographically modified anthology. It is compressed with bzip2 and weighs in at only 300-400 KB (just under a third of its size uncompressed).
The HTML of the poems is, I hope, valid HTML and SGML. It uses CSS for all formatting, so it will not display properly in old browsers. In a browser with vague and desultory support for CSS (as, if my memory serves me, old versions of Netscape Navigator 4 have), the text might even appear unreadable, because the background-color directive is ignored or not inherited. Maybe you can remedy this by turning on the "always use my colours" feature in Edit/Preferences/Appearance/Colors.
Certainly the mark-up for some poems could be improved; "The Rime of the Ancient Mariner" (poem 549) is a mixture of verse and explanatory textual annotation, but the mark-up does not discriminate between the two and I cannot see a neat way of separating them.
The gloss definitions at the bottom of the poem are presented as DT-DD pairs in a DL (definition list) element, which is, as I understand it, unsupported in older browsers, so glosses may appear jumbled. My Netscape Navigator 4.51 for Linux cannot jump to the target of the gloss links in the text, but the meaning of the gloss term does appear in the status bar. Internet Explorer 5 does jump correctly to the links, but this is sometimes difficult to see if the text and gloss fit on the screen together.
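To make the DT-DD arrangement concrete, here is a rough Python sketch of how such a gloss list and its in-text links could be generated. It is only an illustration, not the code that produced the pages: the function names render_glosses and gloss_link and the "gloss-N" anchor names are hypothetical, and the title attribute stands in for whatever mechanism actually puts the meaning in the status bar.

```python
from html import escape


def render_glosses(glosses):
    """Render (term, meaning) pairs as a DL with one named anchor per term."""
    lines = ["<dl>"]
    for number, (term, meaning) in enumerate(glosses, 1):
        # The named anchor is the target that an in-text gloss link jumps to.
        lines.append('<dt><a name="gloss-%d">%s</a></dt>' % (number, escape(term)))
        lines.append("<dd>%s</dd>" % escape(meaning))
    lines.append("</dl>")
    return "\n".join(lines)


def gloss_link(number, word, meaning):
    """Link a word in the verse to its gloss entry at the foot of the poem."""
    # Assumption: the meaning is carried in a title attribute.
    return '<a href="#gloss-%d" title="%s">%s</a>' % (
        number,
        escape(meaning, quote=True),
        escape(word),
    )
```

Calling render_glosses([("term", "meaning")]) gives a one-entry list, and gloss_link(1, "term", "meaning") gives the matching link to place in the verse.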
If you are having difficulty with the glosses of poems, see above. I believe that the problem will likely be the fault of your browser.
Our budding anthologist, Sir Arthur Quiller-Couch, apparently went around hacking bits out of his victims' poems, patching fragments together and calling the results whatever took his fancy, though as far as I know he always credited these compositions to their original authors. So before reporting that a poem is missing two stanzas, it would be nice if you could check that the poem had those two stanzas in the original edition of the Oxford Book of English Verse. (I myself do not have the first edition but a later "new edition", in which some poems appear that were not in the original and vice versa, with the result that the poem numbers are different.)
If, having read the above, you still come across a typo, please e-mail me.