
Background information

Origins

In 2005, the ING received a subsidy from the NWO to continue the retrodigitisation of its own series, the Rijks Geschiedkundige Publicatien (RGP). This series is the largest national collection of documentary editions in the field of Dutch history. Because the series consists of more than 450 separate volumes, a selection had to be made. The criteria for this selection included the (international) demand for a publication and the extent to which digitisation would help unlock the source publication.

Outcomes

A special application has been developed that takes the facsimile as its starting point. The application not only makes it possible to browse through the book ‘virtually’, but is also built up in layers, so that further means of access can be added on top of the facsimile: a contents page with hyperlinks, indices, or even a database with keywords. The text itself is presented both as an image and as electronic text. The electronic text may come from OCR (Optical Character Recognition), from re-typing, or from an original that was already in electronic format.
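The layered structure described above can be illustrated with a small data model. This is only a sketch under stated assumptions, not the ING application itself: the names `Page`, `Volume`, `image_path` and `text_source` are hypothetical and chosen for illustration. It shows the facsimile image as the mandatory base layer, with the electronic text and the contents page as optional layers on top.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    """One book page: the facsimile is always present, the text layer is optional."""
    number: int                        # page number as printed in the book
    image_path: str                    # facsimile scan (hypothetical file name)
    text: Optional[str] = None         # electronic text, if any
    text_source: Optional[str] = None  # e.g. "ocr", "retyped" or "electronic"


@dataclass
class Volume:
    """A digitised volume with optional access layers on top of the facsimile."""
    title: str
    pages: dict[int, Page] = field(default_factory=dict)
    contents: list[tuple[str, int]] = field(default_factory=list)  # (heading, page)

    def add_page(self, page: Page) -> None:
        self.pages[page.number] = page

    def search(self, word: str) -> list[int]:
        """Return the numbers of pages whose text layer contains the word."""
        needle = word.lower()
        return sorted(
            number
            for number, page in self.pages.items()
            if page.text is not None and needle in page.text.lower()
        )
```

Because a search result is a page number, it points straight back to the facsimile, which is why references to the printed book stay valid in such a model.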

Advantages of the facsimile method:

  • The user sees that the edition has been published before (and therefore that it concerns a source that has been edited previously).
  • The original layout of the book is largely preserved.
  • When quoting, one can simply refer to the page in question. Old references to the book version remain valid.
  • By reusing the printed book's own means of access, the ING does not have to work through the contents of the (dated) material once more.
  • For the same expenditure, many more digital books can be produced. This results not only in a larger number of sources but also in many more searchable texts.

Remarks relating to the facsimile method:

  • In the majority of cases, the text is recognised automatically via OCR. However, the OCR software does not recognise every character correctly, so the text is not 100% accurate.
  • These recognition errors hamper searches on the frequency of specific words, word use, etc.
  • The text is not codified (marked up), so it cannot be unlocked any further on the basis of its structure.
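The effect of OCR errors on searching can be shown with a short example. This is a sketch only and not part of the ING application: the function name `fuzzy_find`, the sample sentence and the similarity cutoff of 0.8 are assumptions made for illustration. An exact search misses a misread word such as "Statcn", whereas approximate matching with Python's standard `difflib` module can still find it.

```python
import difflib
import re


def fuzzy_find(query: str, ocr_text: str, cutoff: float = 0.8) -> list[str]:
    """Return words from the OCR text that closely resemble the query."""
    words = re.findall(r"\w+", ocr_text.lower())
    unique_words = list(dict.fromkeys(words))  # drop duplicates, keep order
    # get_close_matches ranks candidates by similarity, best match first.
    return difflib.get_close_matches(
        query.lower(), unique_words, n=10, cutoff=cutoff
    )


# Hypothetical OCR output in which "Staten" was misread once as "Statcn".
sample = "De Staten-Generaal schreven aan den stadhouder; de Statcn antwoordden."
matches = fuzzy_find("staten", sample)
```

Approximate matching of this kind recovers some misread words, but it does not make word-frequency counts reliable, which is the limitation noted in the remarks above.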

To obtain a completely accurate text, the text would have to be either typed out again in full or edited by hand. This option was rejected.

Reasons:

  • Typing a text from scratch is very labour-intensive and expensive; the cost per page would be twenty times as high.
  • It is expected that OCR techniques will be improved in the future.
  • Aside from making it easier to search for specific words, combinations of words and elements of words, a completely accurate text has little added value. In many cases, the old-fashioned spelling and antiquated use of language make it difficult to carry out a search in the first place. Moreover, different languages are often used side by side in a single historical source.
  • The modern look of a freshly typed text might give the user the impression that the text is modern and has only recently been unlocked. In reality it is a selective edition based on dated editing principles, in which quite a lot of the original material may have been omitted or summarised.
  • A completely accurate text would first have to be codified to give it any added value. This would be very labour-intensive and would demand (expensive) expertise with regard to the content.

Completely accurate text:

  • It is always possible to add a completely accurate text to the application later on, and such texts will certainly become available for a few publications. Electronic copies already exist of books published during the last decade.
  • In the case of very structured publications such as the Repertorium van Vertegenwoordigers in Binnen- en Buitenland [Repertorium of Dutch representatives at home and abroad. In Dutch], the OCR will be corrected and the text will be codified so that it will eventually be possible to search and select separate parts of the text. These publications will be offered through a different application which has yet to be developed. One example of this is the Biografisch Woordenboek van Nederland, which has already been digitised.

Unlocking the contents

A concerted effort has been made to stay as close as possible to the original edition when unlocking it, using a minimum of means. The means of disclosure are:

  • Clickable contents page (for every edition).
  • The possibility to search for specific words (available with every edition).
  • Fixed elements in letter and document headings for each article such as date, name of correspondent, or title (if available).
  • Clickable index that can be searched (for the time being, in a few cases).