Shoebox: A Review of Its Use as a Tool for Digitizing Linguistic Data
A BIFoCAL document by Laura Buszard-Welcher
How does this tool store and export data?
- Although Shoebox files have no file extension, they appear
to be plain text files. They can be opened in a text editor.
- Text files can be imported into Shoebox if they are marked
up properly (given field tags and a record identifier)
- You can either export the data as a text file, or cut
and paste individual records.
- Formatting exported records can be laborious,
but exporting records with field markers means you can do some
global formatting in a word processor.
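Because the files are plain text with field markers, they can also be read outside Shoebox. The following is a minimal sketch in Python, assuming the common convention that each field starts with a backslash marker at the beginning of a line and that `\lx` marks the start of a record. The marker names, the function name `parse_records`, and the sample entries are illustrative assumptions, not taken from a real project.

```python
from typing import List, Tuple

Field = Tuple[str, str]   # (marker, value), e.g. ("lx", "tama")
Record = List[Field]      # fields in the order they appear in the file


def parse_records(text: str, record_marker: str = "lx") -> List[Record]:
    """Split marker-tagged text into records.

    Assumes a field starts with a backslash marker at the start of a line,
    and that a line without a marker continues the previous field.
    Anything before the first record marker (e.g. a file header) is skipped.
    """
    records: List[Record] = []
    current: Record = []
    for line in text.splitlines():
        if line.startswith("\\"):
            parts = line[1:].split(None, 1)
            if not parts:
                continue  # a lone backslash carries no marker
            marker = parts[0]
            value = parts[1].strip() if len(parts) > 1 else ""
            if marker == record_marker:
                if current:
                    records.append(current)
                current = [(marker, value)]
            elif current:
                current.append((marker, value))
        elif line.strip() and current:
            # Continuation line: join it onto the previous field's value.
            prev_marker, prev_value = current[-1]
            current[-1] = (prev_marker, (prev_value + " " + line.strip()).strip())
    if current:
        records.append(current)
    return records


# Made-up sample data in the assumed format.
SAMPLE = """\\lx tama
\\ps n
\\ge woman
\\nt example entry,
wrapped onto a second line

\\lx suli
\\ps n
\\ge man
"""

records = parse_records(SAMPLE)
for record in records:
    print(record)
```

Keeping each record as an ordered list of pairs, rather than a dictionary, preserves repeated fields and the original field order.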
What types of data will you be working with?
- Shoebox is best suited for analyzing primary texts and
creating a lexicon that is used to analyze texts. It can also be used
to create a stand-alone dictionary and, because it is basically
a database, it could be used to store other types of data such as
paradigms and anthropological notes.
- Although Shoebox itself does not store sound files,
you can create a link in a record to a .wav file and play
it from within the Shoebox program. Other audio formats, and
video, do not appear to be supported.
How will you be annotating and marking up your data?
- Shoebox is designed to create interlinear glosses for texts,
and it has built-in tools for morphophonemic parsing
- It automatically generates concordances and can be used to
make dictionaries, but there is no way to "link" dictionary
entries with morphemes in the texts
Will you need to work with special characters?
- Shoebox does not support Unicode character encoding
- There is no 'insert symbol' menu option, so in Windows, if
you want to insert a special character, you will either have
to use the character map system accessory, or download Keyman
(also available from SIL).
- Note that fonts are a property of an entire field, so
specific data within a field cannot be in a different font.
This is a pain if you want to reference something with special
characters in a notes field.
- Some special characters and character combinations
seem to trip up the parser, even though they are defined
properly within Shoebox.
How will you input and mark up your data?
- Shoebox is not designed as a collaborative tool.
It is basically for a single user inputting data offline.
- There are both PC and Mac versions
What kinds of resources do you intend to produce with your data?
- Primary end products are a corpus of glossed texts and a lexicon
- A well-known problem with Shoebox is that it does not constrain the
user to particular fields in a particular order. The result is that
if you have not been religious about imposing this structure yourself,
you will have a lot of re-formatting to do before your dictionary is ready.
- You can select fields for exporting, which would allow you to easily
modify the output for various audiences (so long as you have designed your
fields with that goal in mind)
- Since the storage format is a text file, you could create other
kinds of objects with it, such as an on-line searchable corpus.
However, to make a searchable on-line database, the data would
have to be exported and then imported into another database, and the
interlinearization mapping would not be preserved.
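Selecting fields for a particular audience can also be done outside Shoebox on the exported text. The sketch below assumes records have already been read into lists of (marker, value) pairs; the function name `export_fields` and the markers `lx`, `ps`, `ge`, and `nt` are hypothetical choices for illustration. Writing the fields in a fixed order also shows one way to clean up records whose fields were entered inconsistently.

```python
from typing import List, Sequence, Tuple

Record = List[Tuple[str, str]]  # (marker, value) pairs; marker names are assumed


def export_fields(records: Sequence[Record], keep: Sequence[str]) -> str:
    """Return marker-tagged text with only the fields in `keep`.

    Fields are written in the order given by `keep`, regardless of the
    order in which they were entered in each record.
    """
    blocks = []
    for record in records:
        lines = []
        for marker in keep:
            for field_marker, value in record:
                if field_marker == marker:
                    lines.append(f"\\{marker} {value}")
        if lines:
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Made-up records; note the inconsistent field order in the second one.
RECORDS: List[Record] = [
    [("lx", "tama"), ("ps", "n"), ("ge", "woman"), ("nt", "check tone")],
    [("lx", "suli"), ("ge", "man"), ("ps", "n")],
]

# A learner's word list: headword and gloss only, no working notes.
learner = export_fields(RECORDS, ["lx", "ge"])
# A fuller version with part of speech, in a consistent order.
full = export_fields(RECORDS, ["lx", "ps", "ge"])
print(learner)
```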
What sorts of analysis will you do on your prepared data?
- Browsing
- Records can be browsed by record ID for quick navigation
- You can choose which fields to browse, and in what order.
- You can define particular colors and styles for viewing different fields
- Other than this, the browse layout is rather limited
- Sorting
- It is possible to sort on any field, and to subsort on multiple fields
- A nice feature, particularly for the lexicon, is end-sorting, which sorts entries from the end of the word (so you can look for word-final morphemes)
- Searching
- The basic function is find (single instance); there is also find next and find previous
- Find all requires you to create and store a data filter. This is something of a pain, not least because you have to save all of your searches.
- Concordancing
- The program can create concordances and word lists,
but apparently not glossaries.
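For readers unfamiliar with these operations, the following Python sketch illustrates end-sorting, a "find all" style filter, and a simple keyword-in-context concordance on exported data. It is not how Shoebox implements them; the function names and the sample forms are invented for illustration, and records are assumed to be lists of (marker, value) pairs.

```python
from typing import List, Sequence, Tuple

Record = List[Tuple[str, str]]  # (marker, value) pairs; marker names are assumed


def end_sorted(forms: Sequence[str]) -> List[str]:
    """Sort forms by their endings by comparing the reversed strings.

    This is a naive character reversal; combining diacritics would need
    extra handling.
    """
    return sorted(forms, key=lambda form: form[::-1])


def find_all(records: Sequence[Record], marker: str, text: str) -> List[Record]:
    """Return every record that has a `marker` field containing `text`."""
    return [
        record
        for record in records
        if any(m == marker and text in value for m, value in record)
    ]


def concordance(lines: Sequence[str], keyword: str,
                width: int = 2) -> List[Tuple[str, str, str]]:
    """List each occurrence of `keyword` with up to `width` words of context."""
    hits = []
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                hits.append((left, token, right))
    return hits


# Made-up forms: end-sorting groups the ones that share a final segment.
by_ending = end_sorted(["wikan", "tamak", "sulin", "ponak"])

records: List[Record] = [
    [("lx", "tamak"), ("ge", "woman")],
    [("lx", "sulin"), ("ge", "man")],
    [("lx", "ponak"), ("ge", "old woman")],
]
women = find_all(records, "ge", "woman")

hits = concordance(["the woman saw the man", "the man left"], "man")
print(by_ending, women, hits, sep="\n")
```

Unlike a stored Shoebox filter, the search here is just a function call, so nothing needs to be saved between searches.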
Other
- Shoebox 5 is available for Windows and Mac; the specific OS versions are not specified.
- What kind of support is available for this program?
Are they continuing to develop new versions? When will Unicode be supported?
(Note: SIL does not plan to develop new versions.)
- Talk to other people who have used it. Rumor has it that Shoebox gurus
can get the program to do backflips; for the average user, however,
there seems to be a steep learning curve. Documentation is labyrinthine,
and much basic information seems to be omitted (for example, the
help section only talks about importing data from SH2, and not how
to mark up a text file for import; this is apparently covered in the
'walk through').