Ask what has been the single greatest influence on literary research since the Sixties and the answer might be the Xerox machine, the jumbo jet or Jacques Derrida. Ask what will transform literary research in the next ten years and a likely answer is The English Poetry Full-Text Database. This project, whose three serial instalments will be complete this summer, has reportedly clocked up almost a hundred sales. That may not seem a lot, until you multiply it by the unit price of £30,000 (£5000 cheaper if you got in early). Chadwyck-Healey is a commercial publisher and the sales figures are now well past break-even into substantial profit. More important, enough copies are now in circulation to support an academic information infrastructure – user networks, newsletters, bulletin boards, help groups and team research projects. Every English department in the UK will have at least one colleague who is nagging the library to buy it or is actually using it.
The success of EPFTD has been achieved by brash pragmatism on the part of a small, independent publisher in Cambridge whose name would not raise a flicker of recognition among the bulk of the book-buying public. It may be a portent. EPFTD represents the kind of alliance between profit-driven entrepreneurs and institutions of learning that Tory theorists foresee as the salvation of Britain’s higher-education sector. Not a penny of state, foundational or university funding went into a project which will richly benefit research and teaching. Many of the decisions that determined the shape of EPFTD can be queried on strict scholarly grounds, but meeting those queries would have delayed delivery to the market and increased the likelihood of competitors getting into a field where it was not clear that even one player could survive. At every stage, Charles Chadwyck-Healey and his editorial board have taken the nearest, speediest, cheapest option. Rather than fool around with optical scanners, Chadwyck-Healey used low-tech manual data input (two keyboarders in the Far East typed in the same text, and wherever their versions differed the copy was checked for error; it is a system which is far from foolproof). Rather than deal with all the problems involved in the definition of what ‘English Poetry’ comprises – a pinhead on which academics would happily dance for eternity – Chadwyck-Healey simply took the 1350 writers listed as poets in the Cambridge Bibliography of English Literature. Rather than wait for CBEL3, as it is to be called – whose volumes will materially reshape the canon – Chadwyck-Healey took its list from the obsolescent NCBEL (1969-72), starting at AD 600 and stopping at 1900. Co-operation with CUP would have been logical but it would have eroded the profitability of the venture.
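The double-keying check described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Chadwyck-Healey’s actual software, about which nothing is said here: the function name `flag_discrepancies` and the sample lines are hypothetical, and Python’s standard `difflib` is assumed to align the two keyings so that a dropped or extra line is reported once rather than throwing every later line out of step. The sketch also shows why the method is far from foolproof: a slip that both keyboarders happen to make identically never surfaces as a discrepancy.

```python
import difflib


def flag_discrepancies(keying_a, keying_b):
    """Compare two independent keyings of the same copy-text (hypothetical helper).

    Returns (line_number, lines_from_a, lines_from_b) for every stretch where
    the keyboarders disagree; line_number is 1-based and counts lines in
    keying_a. Only these stretches would go back to a checker.
    """
    # Align the two keyings line by line so that a dropped or extra line is
    # reported once instead of pushing every later line out of step.
    matcher = difflib.SequenceMatcher(a=keying_a, b=keying_b, autojunk=False)
    flagged = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            flagged.append((i1 + 1, keying_a[i1:i2], keying_b[j1:j2]))
    return flagged


# Assumed sample: keyboarder 2 has made a slip ("freind") in line 2.
keyboarder_1 = [
    "Season of mists and mellow fruitfulness,",
    "Close bosom-friend of the maturing sun;",
]
keyboarder_2 = [
    "Season of mists and mellow fruitfulness,",
    "Close bosom-freind of the maturing sun;",
]

for number, a, b in flag_discrepancies(keyboarder_1, keyboarder_2):
    print(f"check line {number}: {a} vs {b}")

# An error both keyboarders make identically is never flagged:
# the weakness that makes the system far from foolproof.
```

Only the second line of the Keats sample would be sent back for checking; had both typists written ‘freind’, the error would pass straight into the database.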
Rather than deal with the copyright problems which would arise from using the best, most recently edited texts, Chadwyck-Healey has resorted to the out-of-copyright editions that PhD students are specifically warned not to use. This does not matter with the mass of minor poets, most of whom are represented in first editions because their work never merited reprinting. It does matter with canonical poets – who will most often be invoked by users of the EPFTD. The EPFTD records stanza form, line indentation, typographic effect and page breaks, but the visual image of the printed page and the actual form of the book are not available, something that reduces the value of the database for bibliographical analysis. Chadwyck-Healey decided to base its publication on CD-Rom, a technology which is reliable, easy to run, and almost impossible to pirate. But it is an information device which is nearing the end of its life. New systems of networking will undermine the whole notion of site-restricted materials on which retail distribution of the database is founded.
Chadwyck-Healey’s Gordian-knot approach has had one overriding benefit. The EPFTD is here and is accessible to every moderately resourceful student. In a couple of years even sixth-formers will be using it. Handling the package is no problem. Anyone who can manage a desktop computer and a CD player can learn to operate EPFTD in a few hours. The hardest trick is to think up sufficiently clever questions to keep it busy. Most important – for the future development of literary database materials – EPFTD has proved sufficiently profitable to warrant future investment. Big money will be needed to convert all ‘literature’ (if one is utopian, all ‘writing’) into machine-readable form. One of the problems about the literary-research database is that – unlike telephone directories, court summaries or dictionaries – there is no obvious market for new issues of the complete works of George Ogle, William Oldisworth or Richardson Pack (all represented in EPFTD). Until Chadwyck-Healey, it was not clear that literary criticism could afford the panoply of advanced technology. Riding on the success of EPFTD, Chadwyck-Healey has followed up with English Verse Drama: The Full-Text Database. This will be launched in late 1994 and will be complete in 1998. A comparative snip at £10,000, it will gather 1500 works by 450 playwrights, with Shakespeare as its main attraction. Texts will be reproduced in original spelling (to save on the cost of editing). Again the selection is based on the NCBEL ‘Drama’ listings, and again the coverage stops short at 1900. If this proves another winner, pre-Edwardian fiction should be next in line.
EPFTD and its spin-offs have arrived at a timely moment. British libraries are bursting at the seams and have no plausible plan for accommodating the next ten years’ production of printed books beyond more compact shelving and remote storage. Chadwyck-Healey realises that although academics will bawl for their new high-tech toys, it is librarians who will buy them. The brochure for EVDFTD baits its hook with the claim that ‘the texts are more accessible than any printed source and yet take up no shelf-space. There are no problems of conservation and security of rare volumes and no significant cataloguing costs.’ In other words, buy our products, liberate hundreds of feet of shelves and fire half your workforce. With these bonuses, many libraries will see EPFTD as an irresistible purchase.
One can indulge some patriotic pride in EPFTD, the first commercially available corpus of its kind. But while beating the rest of the world, Chadwyck-Healey has done little to solve the most intractable problems involved in book-to-database conversion beyond making those problems acutely obvious. Foremost is the copyright obstacle. Cutting the coverage off at 1900 limits EPFTD’s utility drastically. Totality is the essence of any database. Most PhD theses in literature nowadays take modern writers as their subject and post-1900 literary material dominates school and higher-education curricula. In EPFTD, you can examine diction links between Keats and Tennyson, but not Tennyson and Yeats; you can compare the dramatic monologue techniques of Arnold and Browning, but not those of Browning and Eliot; you can examine usage of the word ‘time’ by Coleridge, but not Graves.
There are three ways in which the path to a total database might have been cleared for Chadwyck-Healey. In an ideal world, publishers and copyright-owners would have released protected material for a small sum, with the stipulation that it was used only for bona-fide educational and research purposes. Increasingly, however, British publishers are moving in the opposite direction, charging as much in permission fees as the market will bear. There are no special cases. The Publishers’ Association has set its face against any electrocopying of protected materials as a trespass on primary rights of literary ownership.
In view of this intransigence an enlightened government might have amended copyright law to permit limited use of copyright materials for research and teaching, as an extension of the fair-dealing provision (under which reviewers, for instance, quote from books). But reform of this kind is as unlikely as an outbreak of cultural philanthropy among publishers. As it is, Chadwyck-Healey’s only remaining option would have been to negotiate with copyright-holders and pay publishers like Faber the going rate for all Auden, all Eliot, all Larkin, all Heaney. The dealings would have taken years, the final costs would have been astronomical, the restrictions crippling, and the price to the end-user prohibitive.
Clearly at some future point the law will have to come to terms with post-Gutenberg technology. The publishing industry can hold the line until the arrival of cheap optical character-reading devices. Then for an interim period we shall have the same situation as now obtains with xeroxing – widespread piracy and the criminalisation of the academic classes. Eventual rationalisation of copyright law as it affects electrocopying will most likely be pioneered in America as a necessary consequence of the 1993 Congressional enactment of the ‘Information Superhighway’ legislation. Sooner or later the UK will tag along.
It is instructive to compare EPFTD, a commercially driven project, with American counterparts which have been funded by the National Endowment for the Humanities and other grant-giving bodies. The Rossetti edition which is going forward at Virginia, at the Institute for Advanced Technology in the Humanities, is the brainchild of Jerome McGann. His aim is to create a total package which will contain digitised reproductions of D.G. Rossetti’s pictures, facsimiles of key editions of his verse, facsimiles and transcripts of his manuscripts and electronic texts of the poetry in the latest and best-edited form. Using Windows it will be possible to collocate on the same screen different kinds of item, add marginal notes and move material around. The student will be able to call up critical articles, biography and background materials.
The Rossetti multi-media, hypertextual, electronic-variorum bundle puts into practice a theory which McGann outlined to readers of this journal in February 1988. He described an ideal edition which would fix ‘the entire sociohistory of the work – from its originary moments of production through all its subsequent reproductive adventures’. Such an edition, McGann added, would be ‘a kind of analogue computer designed to reconstitute past texts and versions in forms which make them usable in the present and for the future’. The Virginia Rossetti project began not with any commercial motive, but with a visionary textual theory and a huge dollop of cash from IBM. When finished it will not be a commodity for sale but a public domain scholarly resource – something which will help it circumvent copyright restrictions. The daunting amount of computer power required to reproduce colour paintings and the array of contextual matter is not generally available outside central institutional facilities. This is not a study aid for sixth-formers.
Given a choice between the Chadwyck-Healey Rossetti texts in their raw form and McGann’s ‘analogue computer’, any scholar would choose the second. There are similar attractions in the MIT-based ‘Shakespeare Interactive Video Archive’, an ingenious package which – when perfected – will enable the student or teacher to weave into the discussion not just secondary critical material but visual ‘quotations’ from the library of Shakespeare films which are currently available on video disk. This is another project for which funding has come in the form of grants and institutional support. Its not-for-profit educational character enables it to draw – as Chadwyck-Healey could never have done – on current performances like Branagh’s Henry V. As a single item it clearly has the edge on EVDFTD’s rough-and-ready Shakespeare texts.
Textual theory and experimental pedagogy are producing impressive electronic tools in the United States. But no one in the UK is going to come up with funding on the scale that IATH and MIT enjoy. The commercial option is all we have. The consolation is that commerce seems to work very well. From the purely financial standpoint, EPFTD is a good deal for the customer. What you get for your £30,000 is the oeuvre of 1350 poets, running to some 4500 volumes, costing around £6.70 apiece (it would take, my calculator tells me, 112 years of the LRB to review them all). As an instant library, it is not expensive. But, of course, EPFTD is much more than a library. The database is one gigantic poetic text – the largest ever created. Over the next few years it should inspire a boom in concordance-making, influence-tracking, rhyme studies, Nemo clue-cracking, and the kind of criticism William Empson pioneered in The Structure of Complex Words and Josephine Miles in Eras and Modes in English Poetry. Hitherto undetected plagiarisms will be uncovered. Teachers will be able to customise anthologies for classroom purposes and – who knows? – the gems of William Oldisworth will be on every civilised person’s lips.
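Concordance-making, the first of the activities listed, is the easiest to picture. What follows is a hypothetical key-word-in-context (KWIC) routine in Python, offered only as a sketch: it assumes the texts have been exported as plain (source, line) pairs, which is an assumption about the data format, since the EPFTD’s own search software is not described here. The function name `kwic` and the sample data are illustrative. It would answer the kind of question raised earlier about Coleridge’s use of the word ‘time’.

```python
import re


def kwic(lines, keyword, width=3):
    """Key-word-in-context concordance (hypothetical helper).

    `lines` is a list of (source, text) pairs. For every occurrence of
    `keyword` (case-insensitive, whole words only) return a tuple
    (source, left_context, word_as_printed, right_context), where each
    context holds up to `width` words from the same line.
    """
    target = keyword.lower()
    hits = []
    for source, text in lines:
        # Treat runs of letters and apostrophes as words; punctuation is dropped.
        words = re.findall(r"[A-Za-z']+", text)
        for i, word in enumerate(words):
            if word.lower() == target:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                hits.append((source, left, word, right))
    return hits


# Assumed sample data: one line each, labelled by poet.
sample = [
    ("Coleridge", "A weary time! a weary time!"),
    ("Keats", "Season of mists and mellow fruitfulness,"),
]

for source, left, word, right in kwic(sample, "time"):
    print(f"{source:<10} {left:>15} [{word}] {right}")
```

Context here stops at the line ending; a fuller version would let the window run across line breaks and would lemmatise, so that ‘times’ and ‘time’s’ were found as well.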
It would be nice to think of the Chadwyck-Healey databases as so much new gadgetry and nothing more. In fact, they will in time subvert the whole notion of what it is to be a scholar, a teacher or a student. Richard Altick called his 1960 guide for the aspiring student The Scholar Adventurers. The tiro entering the British Museum or the Widener could picture himself as Richard Burton going into darkest Africa. It was an ennobling thought that sustained many postgraduates as they hoed their lonely rows. The student at the keyboard with EPFTD at his command will feel less like a scholar adventurer than a scholar technician first-class – or, at worst, a scholar nerd. The whole idea of critical brilliance will come more and more to depend on the ability to devise elegant algorithms as the databases grow larger and more interactive, with subtler tagging and search facilities. Institutions which have invested in EPFTD will surely direct new PhD students to use the resource, and the skills it develops will (like theory) mark a line of division between academic generations. By the end of the century theorists – the wild men and women of the Seventies and Eighties – may well find themselves the old buffers pitted against the new virtuosi of the virtual library.