Guide to importing dictionaries

Introduction

This document provides a guide to using the Wunderkammer Import Package 2 to port electronic dictionaries for display on mobile phones through Wunderkammer. There are three major steps to this process:

  1. Ensuring that the original dictionary file is in a suitable format
  2. Setting up the dictionary configuration in wkimport
  3. Transferring and installing the dictionary on mobile phones

There are two optional steps that can be undertaken to further customise the dictionary:

  1. Making a custom theme file
  2. Making a custom font (advanced)

Note that the Wunderkammer Import Package requires Java to run. This can be downloaded for most major computer platforms for free from the linked website if it is not already installed.

Common problems that arise when importing dictionaries are listed in the troubleshooting section. If you encounter a problem that is not listed there, write to James at the address james followed by the at sign and then pfed dot info.

Source dictionary format

The Wunderkammer Import Package can read and convert dictionaries stored in the backslash coded format used by Shoebox/Toolbox and in the XML format used by Kirrkirr. Although it is possible to create a mobile phone dictionary directly from an existing electronic dictionary, there are a few design features of Wunderkammer that should be taken into consideration when importing dictionaries to make the most of the platform.

Wunderkammer does not have any real support for multiple senses within a single entry. It is possible to suggest a subgrouping of fields in an entry through the order in which they appear (as long as wkimport is set to take the field order from the source dictionary - see under Mappings tab below). However, Wunderkammer does not recognise any groupings below the level of the entry and so it is not possible, for example, to target a menu search or link to a particular part of an entry. Long entries with multiple senses might also be difficult for users to read on their phones because they may have to scroll down a long way to read the entire entry. The best strategy for formatting dictionaries that make use of multiple senses is probably to divide the senses into separate homonymous entries.

The importing package automatically uniquifies the entries in the input dictionary. If there are any homographic entries in the input dictionary - that is, if there are any entries that have identical lemmas - then a number will be added after each of the lemmas to distinguish them. The numbering starts from 1 for each group of identical lemmas. For example, in a dictionary that contains two entries, each of which has the lemma turla, the lemmas will be renamed to turla 1 and turla 2.
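The numbering behaviour described above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code used by wkimport: the class name UniquifySketch, the method uniquify and the lemma kari are hypothetical, and the space between lemma and number is taken from the turla example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of wkimport: numbers homographic lemmas from 1.
final class UniquifySketch {
    static List<String> uniquify(List<String> lemmas) {
        // First pass: count how often each lemma occurs.
        Map<String, Integer> counts = new HashMap<>();
        for (String lemma : lemmas) {
            counts.merge(lemma, 1, Integer::sum);
        }
        // Second pass: append a running number to lemmas that occur more than once.
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String lemma : lemmas) {
            if (counts.get(lemma) > 1) {
                int n = seen.merge(lemma, 1, Integer::sum);
                result.add(lemma + " " + n);
            } else {
                result.add(lemma); // unique lemmas are left unchanged
            }
        }
        return result;
    }
}
```

Calling UniquifySketch.uniquify on the list turla, kari, turla would give turla 1, kari, turla 2.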

If a link points to a lemma that is uniquified when imported then the link will be broken. The uniquifying and link verification routines print lists of the lemmas uniquified and the broken links to the console so it is possible to see which lemmas were uniquified and which links were broken.
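To make the idea of a broken link concrete, here is a minimal sketch of a link check. It assumes that a link is valid only if its text exactly matches a lemma in the output dictionary; the class LinkCheckSketch and its method are hypothetical and do not reproduce the actual wkimport routine.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of wkimport.
final class LinkCheckSketch {
    // Returns the link targets that match no lemma in the output dictionary.
    static List<String> brokenLinks(List<String> outputLemmas, List<String> linkTargets) {
        Set<String> known = new HashSet<>(outputLemmas);
        List<String> broken = new ArrayList<>();
        for (String target : linkTargets) {
            if (!known.contains(target)) {
                broken.add(target);
            }
        }
        return broken;
    }
}
```

With output lemmas turla 1, turla 2 and kari, a link to turla would be reported as broken, while a link to kari would not.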

When the standard uniquify method renames a lemma, it does not check that there is not already a lemma with the new name. Dictionary makers should try to avoid using names with the same format as those outputted by the uniquifying method to prevent the possibility of creating new homographic entries through uniquification.
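One way to spot risky names before importing is to list the lemmas that already look like uniquifier output. The sketch below assumes that the output format is the lemma, a single space, then a number (as in turla 1); the class name and method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-import check, not part of wkimport.
final class LemmaNameCheckSketch {
    // Matches "<text><space><digits>", the shape of names such as "turla 1".
    private static final Pattern NUMBERED = Pattern.compile(".+ \\d+");

    // Returns the lemmas whose names could collide with uniquified names.
    static List<String> riskyLemmas(List<String> lemmas) {
        List<String> risky = new ArrayList<>();
        for (String lemma : lemmas) {
            if (NUMBERED.matcher(lemma).matches()) {
                risky.add(lemma);
            }
        }
        return risky;
    }
}
```

For a source dictionary with the lemmas turla, turla and turla 1, the check would flag turla 1, since uniquifying the two turla entries would create a second turla 1.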

Dictionary configuration

The importing process is driven by the Java application wkimport.jar. On most computer platforms that have Java installed, it should be enough to double click on wkimport.jar to start the program.

When wkimport is started for the first time, it will detect the operating system language and attempt to display the user interface in that language. If wkimport does not know the operating system language, it will default to English. The wkimport user interface language can be manually changed on the Settings > Language menu.

Before it can import a dictionary, wkimport needs to know where the source dictionary content is stored and how the content should be presented in Wunderkammer. This section describes what information is required and how it is entered into the three tabs of the wkimport user interface: Input/Output, Mappings and Menus.

The configuration data entered into wkimport can be saved and loaded using the Save and Open commands on the File menu. Two sets of sample dictionary data have been included in the Wunderkammer Import Package to provide working examples of dictionary configurations. The Kaurna sample dictionary is an XML dictionary. Its configuration file can be found at ./demodics/kaurnademo/kaurnaconfig.cfg The Tura sample dictionary is a Shoebox/Toolbox dictionary. Its configuration file is at ./demodics/tourademo/touraconfig.cfg

Once all the required data have been entered, the Wunderkammer dictionary can be created by selecting Create dictionary from the Run menu. The user interface view will jump to the Console tab where information about the progress of the importing process and reports of any errors will be shown. The jar and jad files for the resulting dictionary will then be available in the output directory specified in the Input/Output tab.

Input/Output tab

The Input/Output tab collects information about the dictionary source and output files. This information can be typed into the text fields. Text fields that require file or directory paths have buttons to their right that can be clicked to open a file selection dialog box that automatically enters the path of the selected file into the text field.

Figure 1. Input/Output tab.

The data required in the text fields are:

Mappings tab

In the Mappings tab, fields in the source dictionary can be 'mapped' to fields in the output dictionary. This means that a correspondence between the fields is established so that wkimport knows, for example, that an lx field in a source dictionary should appear as a lemma field in the output dictionary.
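The idea of a mapping can be illustrated with a small sketch that reads backslash-coded lines and renames their markers. It is a simplified model, not the wkimport implementation: the class MappingSketch is hypothetical, each input field is given at most one output field, and the ps and ge markers and their values are assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of field mapping, not part of wkimport.
final class MappingSketch {
    // Converts backslash-coded lines into "outputField=value" strings using a mapping table.
    static List<String> mapFields(List<String> lines, Map<String, String> mapping) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("\\")) {
                continue; // not a field line
            }
            int space = line.indexOf(' ');
            String marker = space < 0 ? line.substring(1) : line.substring(1, space);
            String value = space < 0 ? "" : line.substring(space + 1);
            String target = mapping.get(marker);
            if (target != null) { // input fields without a mapping are dropped
                out.add(target + "=" + value);
            }
        }
        return out;
    }
}
```

With a table that maps lx to lemma and ge to glossdef, the line \lx turla becomes lemma=turla, and an unmapped line such as \ps n is left out.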

When wkimport is first started the Input fields list will be empty. If the input dictionary is a Shoebox/Toolbox dictionary, the list can be automatically populated from the input dictionary by clicking the Populate list button immediately below the Input fields list, as can be seen in Figure 2 below.

Figure 2. Mappings tab Shoebox/Toolbox dictionary.

If the source dictionary is an XML dictionary, however, the XPaths for the input fields must be entered manually by clicking the Add XPath button and then entering the XPath in the dialog box that appears. Note that the XPaths must refer to XML elements; they cannot refer to XML attributes. This can be seen in Figure 3 below.
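The difference between an XPath that selects elements and one that selects attributes can be checked with the standard Java XPath API. The sketch below is only an illustration: the class XPathSketch, the element names DICTIONARY, ENTRY and HW, and the id attribute are hypothetical, and whether wkimport expects absolute paths like these is an assumption.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical check, not part of wkimport.
final class XPathSketch {
    // True if the XPath selects at least one node and every selected node is an element.
    static boolean selectsOnlyElements(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            if (nodes.getLength() == 0) {
                return false;
            }
            for (int i = 0; i < nodes.getLength(); i++) {
                if (nodes.item(i).getNodeType() != Node.ELEMENT_NODE) {
                    return false; // e.g. an attribute node
                }
            }
            return true;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + xpath, e);
        }
    }
}
```

For the sample document `<DICTIONARY><ENTRY id="1"><HW>turla</HW></ENTRY></DICTIONARY>`, the path /DICTIONARY/ENTRY/HW selects an element, whereas /DICTIONARY/ENTRY/@id selects an attribute and would not be usable as an input field.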

Figure 3. Mappings tab XML dictionary.

Both for Shoebox/Toolbox and XML dictionaries, unwanted input fields that appear in the Input fields list can be removed from the list by selecting the field and then clicking the Remove selected button immediately below the list.

To establish a mapping, select a field from the Input fields list on the left, select its corresponding Wunderkammer field from the Output fields on the right, and then click the Map button. The new mapping should then appear in the Mappings list at the bottom. It is possible to create multiple mappings from one Input field to different Output fields or from different Input fields to a single Output field. Unwanted mappings can be removed by selecting the mapping and clicking the Remove selected button immediately above the right side of the Mappings list.

Each of the Output fields has a conventional association to a particular type of data that is typically stored in dictionaries. These associations are spelt out in the list below.

Note that even though most of the fields have conventional associations to particular types of data, the data in sd, pos, glossdef, ri, rii and riii fields are treated simply as plain text by Wunderkammer. This means that any type of data intended to be displayed as text could be stored in these fields. The way that text should be rendered in each of these fields and the link field is determined by the theme. See below under Custom themes for information on how to modify themes. All of these fields can be repeated within a single entry.

The other fields are treated specially by Wunderkammer and must contain specific types of data. The lemma field must contain the lemma, or headword, of the entry. The sound and image fields must contain the names of sound and image files that should be played or shown in the entry. The link field must contain the value of the lemma of the entry that it links to. There can only be one lemma and sound field in each entry. The image and link fields can be repeated in a single entry.

The In entries checkbox below the Map button is used to determine whether the specific mapping should be shown in entries in the final dictionary. Some fields are only included in the input dictionary for the purpose of making indexes and should not appear in entries in the final dictionary. For example, an input dictionary might have a reverse index field that contains values that are the same as or simply transformations of those in a gloss field, e.g. from gloss 'swamp grass' to 'grass, swamp' or 'cockatoo' to 'cockatoo'. When In entries is selected and a mapping is made, the output field will be shown in entries in the final dictionary. When In entries is not selected the field will not be shown in entries in the final dictionary. In the Mappings list fields that will be shown in entries are marked as true and those that will not be shown are marked as false. In Figures 2 and 3 above it can be seen that the ri field is marked as false, since it is simply used in these dictionaries for creating a reverse index and should not appear in entries.

The checkbox Field order from source dictionary, immediately above the Mappings list, is used to determine whether the order in which fields are shown in the output dictionary follows the order in which they appear in the input dictionary or whether it follows the order of the mappings in the Mappings list. When the box is not selected, the fields in entries in the output dictionary will be in exactly the order listed in the Mappings list (except for lemma fields, which are not part of the body of entries). The order of fields in the Mappings list can be adjusted using the up and down arrows to the left of the Field order from source dictionary checkbox. When the checkbox is selected, the order of fields in entries in the output dictionary will be the same as those in the input dictionary. If the order of fields in the source dictionary is not consistent from entry to entry, this inconsistency will appear in the output dictionary. The Field order from source dictionary checkbox cannot be selected for XML dictionaries.

Menus tab

The Menus tab allows the menus of the output dictionary to be specified. The Wunderkammer menu system is structured as a tree. The first menu that is loaded is always the root menu. From this menu there can be any number of submenus that are embedded to any depth. Each submenu displays a list of the data contained in the field that the submenu is associated with. For example, a submenu associated with the field lemma will display a list of all lemmas in the dictionary. Submenus that are embedded within other submenus will not show the fields of all entries in the dictionary, but only those that would be contained under the item selected in the menus they are embedded in. For example, in the case where there is a menu of semantic domains that contains a menu of lemmas, when the user selects a semantic domain from the first menu only lemmas of entries within the selected semantic domain will be shown in the embedded menu of lemmas. When a user navigates to the bottom of the menu system they will be taken to the entry that corresponds to the last selected menu item.
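The way an embedded submenu narrows its contents can be sketched as follows. This is a simplified model rather than the Wunderkammer implementation: the class MenuSketch and its Entry type are hypothetical, and each entry is assumed to have exactly one semantic domain (sd) and one lemma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a two-level menu, not part of Wunderkammer.
final class MenuSketch {
    // Minimal entry model: one semantic domain and one lemma per entry.
    static final class Entry {
        final String sd;
        final String lemma;
        Entry(String sd, String lemma) {
            this.sd = sd;
            this.lemma = lemma;
        }
    }

    // Outer submenu: the distinct semantic domains in default string order.
    static List<String> domains(List<Entry> entries) {
        TreeSet<String> set = new TreeSet<>();
        for (Entry e : entries) {
            set.add(e.sd);
        }
        return new ArrayList<>(set);
    }

    // Embedded submenu: only the lemmas of entries in the selected domain.
    static List<String> lemmasIn(List<Entry> entries, String selectedDomain) {
        List<String> lemmas = new ArrayList<>();
        for (Entry e : entries) {
            if (e.sd.equals(selectedDomain)) {
                lemmas.add(e.lemma);
            }
        }
        return lemmas;
    }
}
```

Selecting a domain in the outer menu corresponds to calling lemmasIn with that domain, so the embedded menu never lists lemmas from other domains.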

Figure 4. Menus tab.

A submenu can be added to the tree by selecting the menu that should be its parent and clicking the Add child button. The name of the menu that will be displayed to the user in Wunderkammer can be set in the Menu name text field, the entry field that it is linked to can be selected in the Field selection box, and the sort order used for the menu can be entered in the Sort order text field. The syntax for describing sort orders follows that used by the Java RuleBasedCollator. To confirm changes to these properties of menus, click the Update node button. Unwanted menus can be removed by clicking the Remove selected button.
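As a rough illustration of the sort order syntax, the sketch below sorts a few words with java.text.RuleBasedCollator. The rule string and the sample words are invented for the example, and the class SortOrderSketch is hypothetical; the rule treats ng as a single letter that sorts after n.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of wkimport.
final class SortOrderSketch {
    // Sorts a copy of the items according to RuleBasedCollator rules.
    static List<String> sort(List<String> items, String rules) {
        try {
            RuleBasedCollator collator = new RuleBasedCollator(rules);
            List<String> sorted = new ArrayList<>(items);
            Collections.sort(sorted, collator);
            return sorted;
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid sort order: " + rules, e);
        }
    }
}
```

With the rules `< a < i < k < n < ng < u`, the words nu, nga and na sort as na, nu, nga, because nga begins with the letter ng rather than with n followed by g.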

The Custom font button can be used to load a custom font for displaying the menu tree, menu names and sort orders. Custom fonts might be needed for languages that use non-Roman scripts or special Roman-based characters (see Custom fonts for more information). Custom fonts must be installed on the host computer for wkimport to be able to use them.

Dictionary install

To run a Wunderkammer dictionary the jar and jad files produced by wkimport for the dictionary must be transferred to a mobile phone. The files could be transferred from a computer using Bluetooth, removable memory cards or a USB connection, depending on what options are available on the phone and the computer the files are being transferred from.

If a phone has internet access, it may also be possible to download the dictionary directly from the internet on to the phone. For instance, the Kaurna dictionary demo MIDlet can be downloaded by opening a phone's web browser and taking it to the address http://www.pfed.info/wunderkammer.jad. Note that the mobile network operator may charge extortionate fees for the data transfer. It costs nothing to transfer the files directly from a computer to the phone using any of the methods described in the paragraph above, however.

It should be fairly straightforward to install (if necessary) and run the files once they have been transferred to the mobile phone. There is too much diversity in mobile phone models to be able to describe the steps required here. Information about how to install software on particular phone models can probably be found in the phone manual or online.

Since Wunderkammer is a Java ME program, it cannot be run in Java SE, the standard environment used on desktop computers. To run a Wunderkammer dictionary on a computer, it is necessary to use an emulator. There are several Java ME emulators available, but the most reliable is probably the one included in the Sun Java Wireless Toolkit, which can be downloaded for free from the linked website.

Custom themes

It is possible to change the appearance and localisation settings of Wunderkammer by bundling the program with a modified resource file. The standard resource files can be found in the directory ./standardfiles/themes. These can be edited with the ResourceEditor application, which is bundled with the LWUIT library. ResourceEditor is located at LWUIT/util/ResourceEditor.jar in the package. There is documentation included with ResourceEditor.

To change the general appearance of Wunderkammer, the theme, images and animations stored within the resource file need to be edited. To modify the localisation settings or change the additional text that is added to fields within entries, the localisation settings need to be edited.

Custom fonts

Custom fonts may be needed for dictionaries of languages that use non-Roman scripts or special characters. Any custom fonts used must be included in the Wunderkammer theme file. It might also be necessary to write a special input method to allow users to enter characters from the custom font in the menu search box. Dmitry Idiatov has provided detailed instructions on creating custom fonts, incorporating them into theme files and creating custom input methods in a series of four posts on the PFED blog.

Troubleshooting

wkimport has trouble reading the input dictionary file. Error messages like java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 are often indicative of this problem.

The input dictionary must be encoded as UTF-8 without a byte order mark (often labelled 'UTF-8 no BOM' or 'UTF-8 without BOM' in text editors). To ensure this is the case, open the dictionary in an advanced text editor and save it again in the correct encoding. Free text editors that provide this functionality include TextWrangler (for Mac OS X) and Notepad++ (for Windows). Linux users should already have their own favourite text editor with this functionality.
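For those who prefer to check the file programmatically, the following sketch tests for the UTF-8 byte order mark (the bytes EF BB BF) at the start of the data and removes it. The class BomSketch is hypothetical and is not part of the Wunderkammer Import Package. Note that it does not verify that the rest of the file is valid UTF-8.

```java
import java.util.Arrays;

// Hypothetical helper, not part of wkimport.
final class BomSketch {
    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    // Returns the data without a leading BOM, or the same array if there is none.
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }
}
```

The bytes of a dictionary file could be obtained with java.nio.file.Files.readAllBytes and written back with Files.write after stripping.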

There is a semantic domain menu in the output dictionary (or other type of menu that groups entries together) and the same semantic domain is appearing multiple times, e.g. 'Living things' and 'Living things ' (the second with a trailing space).

Make sure that entries that should appear in the same semantic domain really do have exactly the same text in their semantic domain fields. wkimport is case sensitive and is also sensitive to leading and trailing spaces in all input fields (as in the example above) when Trim input fields is not turned on.
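The effect of stray spaces can be demonstrated with a short sketch that counts distinct field values before and after trimming. It is only an illustration: the class DomainCheckSketch is hypothetical, and the trimming shown uses String.trim, which may differ in detail from what the Trim input fields option does.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical check, not part of wkimport.
final class DomainCheckSketch {
    // Distinct values exactly as written; a trailing space makes a separate value.
    static Set<String> distinctRaw(List<String> values) {
        return new LinkedHashSet<>(values);
    }

    // Distinct values after removing leading and trailing whitespace.
    static Set<String> distinctTrimmed(List<String> values) {
        Set<String> out = new LinkedHashSet<>();
        for (String v : values) {
            out.add(v.trim());
        }
        return out;
    }
}
```

If distinctRaw returns more values than distinctTrimmed for the semantic domain fields, some entries differ only in surrounding spaces. Differences in case, such as Living things and living things, remain distinct in both results and have to be corrected in the source dictionary.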

Version 2.1 of Guide to importing dictionaries, 15 August 2010. Wunderkammer project.