Monthly Archives: September 2012

New open-source dictionary for FBReader on Android

I’ve just finished work on a new dictionary for FBReader on Android (FBReaderJ). The code is on my GitHub account (a fork of geometer’s official version, which is available from the official website or the official GitHub account). It doesn’t look like much at the moment, and frankly the code’s a bit of a mess, but it reads the dictionary files! And it puts words on the screen! Seriously, what more could you ask for??!

Anyway, the main reason I wanted to do this was that, as far as I could see, there was no suitable open-source Android dictionary app to go alongside FBReaderJ. At the moment the only option is to install a closed-source app, which FBReader uses as an external service. This meant that there was no code I could work on to make the customisations I wanted. So I thought I’d remedy the situation!

What customisations, you might ask? Well, it may sound silly, but one major thing was that I wanted to be able to move from one entry to another without having to press the screen twice. The existing support requires you to press the screen to make an entry disappear, before pressing it again to bring up another one. It’s a small thing, but it can have a big impact on usability, particularly for a language learner.

Of course, I’ve since realised that my efforts were probably largely unnecessary. I probably didn’t have to reimplement StarDict dictionary support for Java – that was already available as part of the (mostly extinct) babiloo project, it seems. Also, I possibly could have figured out how to use the Android APIs to do a textual query to some service provided by (closed-source) ColorDict or something (not sure about this; what I’m really thinking of is the Android Quick Search Box functionality, and as far as I can see [from this page], it brings up its own dialog on screen). Also it’s just occurred to me that I might have been able to get the ColorDict Activity to behave in a similar way by giving it some kind of rendering hints in the Intent or something. I don’t know.
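For the curious, here is roughly what reading a StarDict index involves. This is a minimal sketch in Python rather than the Java in my fork, and it is not the code from the repository. It assumes the common case of 32-bit offsets and an uncompressed .dict file holding plain UTF-8 text; the function names parse_stardict_idx and lookup are made up for illustration.

```python
import struct

def parse_stardict_idx(data: bytes):
    """Parse the raw bytes of a StarDict .idx file into (word, offset, size) tuples.

    Assumption: 32-bit offsets, i.e. the .ifo file does not request 64-bit ones.
    """
    entries = []
    pos = 0
    while pos < len(data):
        end = data.index(b"\x00", pos)            # headword is NUL-terminated UTF-8
        word = data[pos:end].decode("utf-8")
        # Two big-endian unsigned 32-bit integers follow: offset and size in .dict
        offset, size = struct.unpack_from(">II", data, end + 1)
        entries.append((word, offset, size))
        pos = end + 1 + 8
    return entries

def lookup(entries, dict_bytes: bytes, word: str):
    """Return the definition text for `word`, or None if it is not in the index.

    Assumption: the .dict data is uncompressed plain text (not .dict.dz).
    A linear scan keeps the sketch short; a real reader would binary-search.
    """
    for w, offset, size in entries:
        if w == word:
            return dict_bytes[offset:offset + size].decode("utf-8")
    return None
```

The index only stores where each definition lives, so looking up a word is a matter of finding its entry and slicing the right bytes out of the .dict file.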

On the other hand, there are other customisations that I want to make, but more on those another time… 🙂

Well anyway, here’s a free screenshot in case you don’t believe me. 🙂 As you can see, it’s not displaying the whole entry (but this should be comparatively simple to fix). The text is from WordNet, I believe.

Right, now I’m off to bed ’cause it’s one o’clock in the morning. Either that or to stay up all night coding on my next crazy idea. Not quite sure yet.

Filed under EBooks

Turning ePub chapter end-notes into “popups”

Lately, as part of my efforts to learn French, I’ve been trying to read Rivarol’s French translation of Dante’s Inferno on my Android mobile phone. I’ve mostly been using Kindle, but I have also tried FBReader. I’m not sure what format Kindle uses, but with FBReader(J) I’ve been using the ePub version from gutenberg.org.

One problem I’ve been finding is that all of the end-notes are at the end of the chapter (as you might expect), and it is hard to flip back and forth between the end-note and its reference. To remedy this, I decided to find a way to make the end-notes appear as “popup” boxes over the main text. Unfortunately, this was not directly possible with the fairly limited ePub format (or, at least, not with FBReaderJ). A simple solution was to (1) put all of the end-notes at the end of the document, rather than the end of the chapter (so that you wouldn’t have to skip over them), (2) create hyperlinks allowing you to jump back and forth between each reference and its corresponding end-note, and (3) put a page break after each end-note entry so that you only see one at a time. To achieve (3), one way in FBReaderJ is to simply put a new level-2 heading at the start of each entry, which causes it to be put on a new page.
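To make steps (2) and (3) concrete, here is a minimal sketch of the link rewriting. It is written in Python with regexps for brevity, not the Java and DOM code I actually ended up with, and every id scheme and file name in it (ref-..., note-..., notes.xhtml) is a made-up assumption.

```python
import re
from html import escape

def link_references(body, chapter, notes_file="notes.xhtml"):
    """Turn each plain "[n]" marker in a chapter into a link to its end-note.

    The anchor also gets its own id so the end-note can link back to it.
    """
    def repl(match):
        n = match.group(1)
        return (f'<a id="ref-{chapter}-{n}" '
                f'href="{notes_file}#note-{chapter}-{n}">[end-note {n}]</a>')
    return re.sub(r"\[(\d+)\]", repl, body)

def render_note(chapter, n, text, chapter_file):
    """Render one end-note entry for the separate notes document.

    The level-2 heading is what makes FBReaderJ start a new page, so only
    one note is visible at a time.
    """
    return (f'<h2 id="note-{chapter}-{n}">End-note {n}</h2>\n'
            f'<p>{escape(text)} '
            f'<a href="{chapter_file}#ref-{chapter}-{n}">[back to text]</a></p>')
```

Including the chapter in each id keeps note 1 of one chapter from colliding with note 1 of another once all the notes share a single document.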

Luckily, the ePub format is very simple, consisting of just a few XML files contained in a ZIP archive, with the actual content being stored in XHTML format. My usual approach to this kind of problem would be to hack something up in Perl using regexps. However, this time, I decided to do something in Java, partly because I had some kind of idea that I might try to integrate it into FBReaderJ at some stage. This has probably led to it taking twice as long and ending up twice as complex, but on the other hand it has improved my knowledge of Java’s XML libraries.
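As a rough illustration of how little is needed to get at the content, here is a sketch in Python (again, not the Java I used) that opens an ePub as a ZIP archive, follows META-INF/container.xml to the OPF package file, and lists the manifest. The function names are hypothetical; the two namespaces are the standard container and OPF ones.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}

def find_opf_path(epub: zipfile.ZipFile) -> str:
    """Read META-INF/container.xml and return the path of the OPF package file."""
    root = ET.fromstring(epub.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")

def list_manifest(epub: zipfile.ZipFile):
    """Return (id, href) pairs for every item declared in the OPF manifest."""
    opf = ET.fromstring(epub.read(find_opf_path(epub)))
    return [(item.get("id"), item.get("href"))
            for item in opf.findall("opf:manifest/opf:item", OPF_NS)]
```

With a file on disk this would be used as `with zipfile.ZipFile("book.epub") as z: print(list_manifest(z))`, where book.epub is a placeholder name. Note that the hrefs in the manifest are relative to the OPF file's own directory.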

Here is an example of the result, looking at a paragraph from Dante’s “l’Enfer” chapter 18 (the text is public-domain). Notice the blue link reading “[end-note 1]”. Before processing, this was simply “[1]” (and not a link). I added “end-note” to make it a bigger target.

And here is what you get if you click it:

…And there’s a link back to the reference’s location in the main text. As you can see, the link text I have added is in English. In retrospect, perhaps it should be in French :).

It took surprisingly long to write the code for this, considering how simple it seems now. But I did have to update the OPF descriptor file and the toc file, build tables of all the relevant ids to create the links, and then split the end-notes out into a separate document so they could all go at the end without risking hitting some obscure ePub file size limitation, all while wrestling with the W3C DOM in Java… you get the picture :). I think my program should work with all of the rest of the ePubs in the same series (Rivarol’s French Dante; so far I’ve only tried it with l’Enfer [Inferno]), and it should be fairly easy to adapt it to work with slightly differently formatted documents.
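The OPF update is the least obvious part, so here is a minimal sketch of it in Python with ElementTree instead of the Java DOM. The idea is that a new document has to be declared in the manifest and then appended to the spine so it is read last. The function name and the default id and href are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

OPF = "http://www.idpf.org/2007/opf"
# Keep the OPF namespace as the default one when serialising, instead of "ns0:".
ET.register_namespace("", OPF)

def add_notes_document(opf_xml, item_id="endnotes", href="endnotes.xhtml"):
    """Declare a new XHTML document in the manifest and append it to the spine.

    Returns the updated OPF as a string. Elements in other namespaces
    (e.g. Dublin Core metadata) may be written with generated prefixes,
    which is still valid XML.
    """
    package = ET.fromstring(opf_xml)
    manifest = package.find(f"{{{OPF}}}manifest")
    spine = package.find(f"{{{OPF}}}spine")
    ET.SubElement(manifest, f"{{{OPF}}}item", {
        "id": item_id,
        "href": href,
        "media-type": "application/xhtml+xml",
    })
    # Appending the itemref last puts the notes at the end of the reading order.
    ET.SubElement(spine, f"{{{OPF}}}itemref", {"idref": item_id})
    return ET.tostring(package, encoding="unicode")
```

The toc file needs a similar treatment if the notes should show up in the table of contents; that part is left out here.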

I am continuing to work on trying to make FBReader a better platform for language learning. I have a funny feeling my next stop is going to be to do something with ColorDict…

Filed under EBooks

Chinese font with magic embedded Pinyin!

I’ve just about finished working on a new Chinese font (actually, an extension of WenQuanYi Micro Hei) that includes something I like to think of as “Pinyin characters”.

UPDATE: you can now download the font, and code, from my GitHub repository. Just click the button that says “ZIP”.

Basically, each Pinyin syllable in the Chinese text is converted into a character-sized block containing the Pinyin. The tone mark is also moved and enlarged to make it more visible. At the moment, the Pinyin must be entered using numbers for tones, and colons for umlauts, with a space after each syllable, as in “nu:3 ren2 ” (if you’re wondering, the space forms part of the “ligature”). The conversion happens automatically if your browser or application is set up with proper ligature support, and with the font correctly installed. I have currently managed to get this to work in both gEdit (a text editor) and Google Chrome on Linux. With a little work, it should be possible to get it to work in recent versions of IE and Firefox too (and probably on Windows/Mac as well as Linux).
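To show which character sequences get grouped, here is a small Python sketch that picks the ligature inputs out of a string. It only imitates the grouping; in the font itself the substitution is done by the text renderer, not by code like this. The pattern assumes lowercase letters and tone digits 1 to 5 (using 5 for the neutral tone is my assumption here), and the function names are made up.

```python
import re

# One syllable: letters, an optional colon for the umlaut, a tone digit,
# and the trailing space that is part of the ligature.
PINYIN_SYLLABLE = re.compile(r"[a-z]+:?[1-5] ")

def split_ligatures(text):
    """Return every character sequence that would be replaced by one glyph."""
    return PINYIN_SYLLABLE.findall(text)

def parse_syllable(seq):
    """Split one ligature sequence into (letters with ü restored, tone number)."""
    body = seq.rstrip(" ")
    letters, tone = body[:-1], int(body[-1])
    return letters.replace("u:", "ü"), tone
```

For the example above, `split_ligatures("nu:3 ren2 ")` gives the two sequences "nu:3 " and "ren2 ", each with its trailing space.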

The characters are generated automatically by a Python script, which needs to be imported into the FontForge user interface. The whole set is generated (on my PC) in under 10 minutes, and takes up slightly less than the whole “Private Use” area of Unicode (and also just slightly less than the maximum allowed number of ligatures in “liga”, I think).
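The code point bookkeeping can be sketched in a few lines of plain Python. This is not the FontForge script itself, and it assumes the glyphs live in the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF, 6400 code points); the function name is hypothetical.

```python
PUA_START, PUA_END = 0xE000, 0xF8FF  # BMP Private Use Area, 6400 code points

def allocate_codepoints(sequences):
    """Assign one Private Use code point to each ligature input sequence.

    Raises ValueError if the sequences would not fit in the assumed range.
    """
    capacity = PUA_END - PUA_START + 1
    if len(sequences) > capacity:
        raise ValueError(f"{len(sequences)} glyphs will not fit in {capacity} code points")
    return {seq: PUA_START + i for i, seq in enumerate(sequences)}
```

Each entry in the resulting table then corresponds to one generated glyph and one ligature rule mapping the typed sequence to it.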

First, here is a sample of some Chinese text with embedded Pinyin, using the standard “WenQuanYi Micro Hei” font, displayed in Google Chrome:

And here is the same text in my modified font, still in Google Chrome:

Note that the only difference here is that a different font is selected! The text is identical. Same characters, same number of characters; just some of them are grouped into ligatures. There is no need to re-encode the Pinyin; and saving to a file from gEdit (for instance) causes the original Latin characters to be saved, not the ligatures.

Why would I want to do this? Well, as with most of the things I do, it just seemed like a good idea at the time, I suppose. But the principal motivation really was that I have read online that one of the main reasons for the Chinese not using Pinyin more is that it doesn’t look right alongside Chinese characters. So I thought, what if I could change that? If Pinyin was made to fit in better with ordinary Chinese characters, would the Chinese start using Pinyin more? Would they find it useful? Or does it miss the point? I think it would be interesting to find out.

Filed under Chinese