Archive for the 'Project Gutenberg' Category

Cutting up old books

Saturday, May 7th, 2011

Did I say ‘up’? I meant ‘open’. Bought a couple of candidates for everybody’s favourite free e-book repository. It turns out the two of them had never been read. I know this because the pages were still stuck together, and I had to cut them open with a pocket knife. One was from 1880, the other from 1897.

It’s not the first time I have discovered books in that state, but I still think it is weird. My guess is that these books have been in the inventories of book seller after book seller for well over 100 years, going from new to unsellable to antique to, oh no, unsellable after all. (We Dutch have a word for products that overstay their welcome in a store’s inventory: winkeldochter, lit. shop daughter. The Germans say Ladenhüter, lit. shop guard.)

Distributed Proofreaders sends its 20,000th ebook off to Project Gutenberg

Sunday, April 10th, 2011

Today the counter for public domain e-books at Distributed Proofreaders says: “20,000 titles preserved for the world!” At the top of the Recently Completed Titles list is Niederländische Volkslieder by Hoffmann von Fallersleben, though I don’t know if that is the official #20,000.

Project Gutenberg expects to post its 30,000th English language ebook sometime this week. Late last year Distributed Proofreaders produced its 500th Dutch language ebook, Vanden Vos Reynaerde, of which I was one of the two post-processors (together with the mysteriously named Clog, who may have put in one or two hours more work than I, he said understatingly).

Distributed Proofreaders started in 2001 (the same year as Wikipedia). Both were crowd-sourced projects five years before a Wired editor coined the term. Before Distributed Proofreaders it took a volunteer about 40 hours to produce a single ebook for Project Gutenberg. This was a problem, as volunteers often lost interest halfway through and abandoned projects without notification. Distributed Proofreaders doubled and in some instances tripled the total production time per book, but was nevertheless a better proposition for volunteers. An ebook project got split into pages, and people could work on as little as a single page and still be productive. It then took a single person to ‘glue’ those pages back together into a single ebook, but by then a lot of the work had already been done.
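The split-and-glue idea can be sketched in a few lines of Python. This is only an illustration of the workflow described above, not how the Distributed Proofreaders site is actually built; the names Page, split_into_pages, proofread and glue are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    text: str
    proofed: bool = False


def split_into_pages(scanned_pages):
    """Turn raw OCR text (one string per scanned page) into separate jobs."""
    return [Page(number=i, text=raw) for i, raw in enumerate(scanned_pages, start=1)]


def proofread(page, corrected_text):
    """One volunteer corrects one page; no other page is needed for that."""
    page.text = corrected_text
    page.proofed = True


def glue(pages):
    """The post-processor's step: join the proofed pages, in order, into one text."""
    unfinished = [p.number for p in pages if not p.proofed]
    if unfinished:
        raise ValueError(f"pages still waiting for a proofreader: {unfinished}")
    return "\n".join(p.text for p in sorted(pages, key=lambda p: p.number))


# Hypothetical two-page "book" with typical OCR errors.
pages = split_into_pages(["Tbe fox", "and tbe wolf"])
proofread(pages[1], "and the wolf")  # volunteers can work in any order
proofread(pages[0], "The fox")
book = glue(pages)
```

A volunteer who walks away now costs the project one page rather than a whole book, which is the point of the model.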

The success of this new model is shown in the number of books produced. Of the roughly 30,000 ebooks published by Project Gutenberg since 2001, two-thirds were produced by Distributed Proofreaders.

A welcome side-effect, the Project Gutenberg people assure us, is that the ebooks produced by Distributed Proofreaders are of a consistent and high quality. Another side-effect is that this production model makes it easier to tackle difficult books.

Project Gutenberg started publishing electronic books before anyone else, in 1971, when then-student Michael Hart was given a large amount of computer time on the then nascent internet, which he used to publish out-of-copyright books.

The spoils of 2010

Saturday, May 1st, 2010

Yesterday Queen’s Day. Grunt.

Comics:

  • Tangy & Laverdure, De verdwenen DC-8
  • De Blauwbloezen, De roos van Bantry

Books for the scanner:

  • Herman en Dorothea, Wolfgang von Goethe
  • De doorluchte vatenspoelster, Cervantes
  • Over toneel, Frans Mijnssen

Books for the Branko:

  • Momo, Michael Ende
  • Prodwitt’s Guide to Writing
  • Fotografie, Frans Naeff
  • Lucky Jim, Kingsley Amis
  • Paddeltje, de scheepsjongen van Michiel de Ruyter, Joh. H. Been
  • Carpe Jugulum, Terry Pratchett
  • Meneer Foppe in zijn blootje, Wim de Bie
  • Een tafel vol vlinders, Tim Krabbé
  • A Clockwork Orange, Anthony Burgess
  • Kramer Versus Kramer, Avery Corman
  • Maigret en het lijk aan de kerkdeur, Georges Simenon
  • Twee verhalen van Yury Kazakov

Read more antique texts

Saturday, January 10th, 2009

This posting was originally in Dutch, as it wouldn’t work in English on so many levels.

Well! At last you get done (eindelijk kom je klaar). By car you always end up getting more or less done. With dirty hands you sit at your wheel again, still panting and sweating from the toil, and with a worried face you stare at the air and at the sky, which is already beginning to take on its evening colour.

Cyriël Buysse, De vrolijke tocht. Now the common folk, too, have something fun to read that was written more than 2 years ago.

Shame on you, Branko!

Werewolf? Wehr-wolf? Werwolf?

Sunday, November 30th, 2008

Taking, then, the actual existence of werwolves to be an established fact, it is, of course, just as impossible to state their origin as it is to state the origin of any other extraordinary form of creation. Every religious creed, every Occult sect, advances its own respective views—and has a perfect right to do so, as long as it advances them as views and not dogmatisms.

I, for my part, bearing in mind that everything appertaining to the creation of man and the universe is a profound mystery, cannot see the object on the part of religionists and scientists in being arbitrary with regard to a subject which any child of ten will apprehend to be one whereon it is futile to do other than theorize. My own theory, or rather one of my own theories, is that the property of transmutation, i.e., the power of assuming any animal guise, was one of the many properties—including second sight, the property of becoming invisible at will, of divining the presence of water, metals, the advent of death, and of projecting the etherical body—which were bestowed on man at the time of his creation; and that although mankind in general is no longer possessed of them, a few of these properties are still, in a lesser degree, to be found among those of us who are termed psychic.

From Werwolves by Elliott O’Donnell (1912).

Barbara Tozier also produced the fictional Wagner, the Wehr-Wolf by George W. M. Reynolds.

Also blogging elsewhere

Friday, August 15th, 2008

Although my posting frequency here never has been a thing to brag about much, lately it has dropped below the “once a week” that I unconsciously saw as a minimum. This is not because of the dreaded blogging fatigue, but because I’ve joined a couple of other blogs—which I must have written about once or twice before, so let this be just a gentle reminder.

Most of my time goes to 24 Oranges, weird and wonderful news about the Netherlands (English). (Or: just my postings.)

I used to post about twice a week at the Teleread blog, but since 2007 my Teleread posting frequency has also suffered. At first that was because of lots of paid work, but when I had more time later it went to 24 Oranges. (Or: just my postings.)

Finally, the past few weeks I have had four guest blogs up at the Iusmentis blog, which is Arnoud Engelfriet’s blog about the meeting of technology and law. Writing mainly about copyright and Project Gutenberg, I have posted the following items there (in Dutch):

I will try and translate, and then post these four entries either here or at Teleread, when I have the time. I put a lot of research into these postings, so it would be a pity to limit them to speakers of Dutch. Also, the readers of the Iusmentis blog have added some valuable comments that could use a larger audience.

Getting a little bit back from Elsevier

Tuesday, April 8th, 2008

The British-Dutch mega-publisher Reed Elsevier spent more than 3 million dollars in bribes, sorry, lobbying fees in the US last year. What does the publisher hope to get back for this money? It probably won’t be a more balanced and more honest form of copyright. The US politicians that were bolstered by this “support” have been bullying most of the rest of the world into accepting ever stronger and more bizarre forms of copyright. Those countries unwilling to participate are threatened with economic sanctions.

On January 1 of this year it was more than 70 years since Herman Robbers, son of Elsevier founder Jacob G. Robbers, had died. In our current climate copyrights last insanely long, but not forever. To be precise, in the Netherlands copyrights last until 70 full calendar years after the death of the author. On January 1 of this year I uploaded Herman Robbers’ De Vreemde Plant (The Strange Plant) to The Internet Archive. Please consider that a tiny remuneration from Elsevier for whatever copyright hell it’s going to let loose on Dutch citizens.
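For those who want to check the arithmetic, here is a small Python sketch of the “70 full calendar years” rule as described above. The function name is made up, and the year of death (1937) is my assumption for the example; the text above only says “more than 70 years ago”.

```python
from datetime import date

TERM_YEARS = 70  # the Dutch term as stated above


def public_domain_date(death_year, term_years=TERM_YEARS):
    """Return the first day on which an author's work is free.

    The year of death itself does not count: protection runs until the
    end of the calendar year that lies term_years after it, so the work
    is free from January 1 of the year after that.
    """
    return date(death_year + term_years + 1, 1, 1)


# Assumption: Herman Robbers died in 1937.
robbers_free = public_domain_date(1937)
print(robbers_free.isoformat())
```

Note that the month and day of death play no role, which is why every newly free author is freed on a January 1.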

(Lobbying story via Teleread.)

Ontboezemingen by Gabriël

Thursday, February 28th, 2008

Last week I posted a book to Project Gutenberg that I had talked about earlier (“Haddockisms”): Ontboezemingen by Gabriël, Carel van Nievelt’s pseudonym. Van Nievelt was a writer of fantasy and travel stories. Oddly enough he still appears from time to time in translated collections, but he has been almost forgotten in the Netherlands. Only his stories about the Dutch East Indies (what is now Indonesia) have recently been reprinted in their original language.

His fame declined during his lifetime. As Metamorfoze, the digitization project of the Dutch national library, writes:

[...] Van Nievelt was not popular with the Tachtigers [a literary movement that made l'art pour l'art, Branko]. They thought him old-fashioned, pathetic and sentimental.

[But] in his productive years he was a well-read author, and literary historians and critics paid much attention to his work: “The novelist Van Nievelt is Somebody,” a reviewer wrote in De Gids in 1884. But after that his fame faded quickly, and oblivion remained.

Snatch! Thanks to Project Gutenberg his name lives on a little longer. Ontboezemingen (Confidences) is Van Nievelt’s first book, and it contains a number of short stories and one farcical play. There are a number of stories about his travels to and time in the Indies, and three love-letters (he continually calls young women “nonnas”, the Italian for “grannies”). The play appears to be referenced earlier, when he describes how he got so bored at sea that he wrote a play, which he and his friends performed to pass the time.

With the help of countless volunteers I have transcribed the two song fragments in the book into LilyPond format, which means you can turn them into anything you want: Project Gutenberg has PDF and MIDI files of both songs. According to Van Nievelt the songs are supposed to be local, Indonesian compositions, but that is doubtful, as they follow Western chord progressions. The second tune (Gamelan) sounds suspiciously like the first few notes of the theme tune to Dallas, by the way.

Haddockisms

Saturday, December 15th, 2007

(Due to untranslatableness of some words, the Dutch originals are given alongside rough English glosses.)

For some reason I associate creative, child-friendly insults so strongly with Hergé’s Captain Haddock that, when I came across such insults in a book from 1869, I did not even notice how unusual they were. These days you can call anybody a biscuit tin (koektrommel) or a waffle iron (wafelijzer) without it leading to any noteworthy raising of eyebrows.

The book is Ontboezemingen by Gabriël (pseudonym of Carel van Nievelt), and in a stage play two friends playfully hit each other with their hats; one tries to deliver a “serious” monologue, while the other interrupts him with insults: boekworm (bookworm)! … kinderkanibaal (child cannibal)! … hutspotverknoeier (hotchpotch botcher)! … mottige foliant (moth-eaten folio)! … vogelverschrikker (scarecrow)!

Distributed translation experiment, conclusions

Friday, December 7th, 2007

A couple of lessons I learned from my distributed translation experiment:

1. Don’t worry about volunteers showing up. Initially nobody seemed to be interested in participating, but after a while somewhere between ten and twenty people turned up, which was more than enough for my purposes. I had advertised my experiment in four places: this blog, the Dutch forum at Distributed Proofreaders, a chatty general purpose Usenet group, and a mailing list for (non-literary) professional translators. OK, so do worry, a lot. :) Thing is, if you’ve made something interesting, people will come and take a peek.

2. Don’t just dabble. I set up the site as minimally as possible using the very simple UseMod wiki. UseMod is great because it is so small; you can easily modify it if you have simple needs. Unfortunately, spammers found out about the site rather quickly and began hitting it heavily. If I had used better developed software, such as MediaWiki, I could probably have turned on all kinds of anti-spam measures that were not available to me in UseMod and that would have been too much work to develop myself. Even then I could probably have switched to MediaWiki, but that seemed too much work for a simple experiment. In hindsight it would have allowed me to keep the experiment running, so it’s a pity I chose not to take that path.

3. Don’t underestimate your volunteers. I had assumed that the level of quality would be fairly high, but perhaps a little too consistent; and in order to remedy this I had planned to add a few bad translations myself (remember, the experiment was to measure differences in consistency). Not necessary, it turned out. The quality of submitted translations was both high and varied.

4. Let your volunteers find things out for themselves. I had planned a translation dictionary, but nobody used the pages I set up for that. There is no need to provide your volunteers with things you think they would need; only provide them with what they actually need.

Looking at other translation projects:

5. There is more than one way to skin a cat. My experiment was set up to find out what happens when different volunteers tackle one paragraph at a time. That idea was borrowed from Distributed Proofreaders, where volunteers work on one page at a time. My fear was that you cannot slowly build a literary translation when every translated paragraph ends up with a different style (Wikipedia syndrome). My hope was that you could solve this problem by having post-processors try to smooth out the differences.

Harry auf Deutsch worked this way; volunteers would each get assigned a small bunch of pages; then chapter managers would iron out the differences chapter-wide, and a book manager would do something similar for an entire book.

I have since seen another distributed translation project that takes a radically different approach. Although volunteers there are still free to tackle a work one paragraph at a time, in practice they work on much more, sometimes even on entire novels at a time. The difference is that they limit themselves in the quality levels they try to achieve. The first volunteer or set of volunteers uses software to generate a machine translation. The second volunteer for a work tries to produce a rough translation from the machine translation. The third tries to clean up that rough translation a bit.
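The staged approach of that last project can be sketched as follows, in Python. This is my own rough model of the three quality levels described above, not that project's actual software; the names Stage and Work, and the sample sentences, are invented for the example.

```python
from enum import IntEnum


class Stage(IntEnum):
    UNTRANSLATED = 0
    MACHINE = 1   # raw machine translation
    ROUGH = 2     # a volunteer's rough translation based on the machine output
    CLEANED = 3   # the rough translation, cleaned up a bit


class Work:
    """A whole text that moves through the quality levels one step at a time."""

    def __init__(self, title, source_text):
        self.title = title
        self.text = source_text
        self.stage = Stage.UNTRANSLATED
        self.history = []  # (stage name, volunteer) pairs, oldest first

    def advance(self, volunteer, new_text):
        """Record one volunteer's pass and move the work up exactly one level."""
        if self.stage == Stage.CLEANED:
            raise ValueError(f"{self.title!r} is already at the final level")
        self.stage = Stage(self.stage + 1)
        self.text = new_text
        self.history.append((self.stage.name, volunteer))


# Hypothetical example: three different volunteers, one level each.
work = Work("Example story", "Er was eens een vos.")
work.advance("volunteer A", "There was once a fox.")
work.advance("volunteer B", "Once there was a fox.")
work.advance("volunteer C", "Once upon a time there was a fox.")
```

Each volunteer only has to get the text one level further, which is what lets them take on much more than a paragraph at a time.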