Celtic Poets in North America – 1st Prototype

I’m happy to say that most of the basic functionality of the first version of the system is now running. I’m currently hosting it on CloudBees at this link.

(Please note that, due to the low priority of free web-hosting, it sometimes takes a long time for dormant websites to be reactivated, and the link may time out the first time you try it in your browser. If so, just try again.)

Here’s the home screen (of a previous draft):

Home

The “banner” I’ve created for the project contains two images: the image on the left is a section of a manuscript known traditionally as the Cathach of St. Columba, which dates from the late 6th century; the image on the right is the masthead of the all-Gaelic newspaper Mac-Talla, printed in Sydney, Cape Breton, from 1892 to 1904. Together they stand for the continuity of literary tradition that the Celtic Poets in North America project is meant to represent and promote.

The user can search through the database and visualize its contents in various ways, and particularly by looking through poets and poems.

PoetsFilter

Most of the poets currently in the database are taken from my editions of Scottish Gaelic poetry from North America (including an anthology of Canadian material on which I’m currently working), although it also contains some Welsh contributions from Robert Humphries and a few Irish items I’ve gleaned from a few articles.

The screen above allows the user to look at the entries for Poets in the database, narrowing down the entries displayed by particular criteria. As you’ll see, I’ve selected three criteria: the poet must have “Iain” in his name, must have been born in Scotland, and must have been born between the years 1750 and 1810. Note also that I’ve left the display option at its default, “As List.” So, when I push the Run filter button, I get the following list:

PoetsList

Four poets result from these criteria and we get their biographical details. If we wish to see the details about the places they were born or died, we can select the hotlinks. We can also select the associated See poems link to see the list of poems they composed.

Rather than see biographical details about the poets in textual form, we can also produce a map based on the places poets were born or died:

PoetsMap

Rather than look at individual items, we can also produce charts that summarize certain patterns in the database. This screen allows you to select the criteria for such charts:

PoetsSetChart

In this case, we’re just specifying that we’d like to see a summary of the data regarding the state, province or county where the poets were born. Here’s the resulting chart (of the current database):

Poets1DChart

You can deal with the poems in a very similar way. Here’s the filter screen for poems.

PoemsFilter

In this case, I’ve specified that I only want to see the poems that were composed in Ontario in Scottish Gaelic that have been tagged with IDENTITY (as a major topic). Here are the results as a list:

PoemsList

Note that for each entry (for each poem), you can select a hot link to see the details about the place where it was composed or the details about the composer. But rather than seeing lists of poems, you can also place them on maps. Here’s a map of the current poems in the database:

PoemsMap

So, as you can see, most of the functionality is in place. In the near future, I will add timelines (so that you can see poets and poems in their chronological order), and other visualization techniques.

But for now, I’d like to reiterate my call for data contributions from those of you who have details about poets who composed in Celtic languages in North America and their compositions! All contributions will be acknowledged on the home page. (Details on data formats can be seen in this previous blog.)

Celtic Poets: Representing Uncertainty and Ambiguity

The Current Design

Now that I’ve got a basic prototype of my Celtic Poets in North America project running and have entered a good number of sample data items, I’m realizing that one of the shortcomings of the design of my data types is that it does not allow for as much uncertainty and ambiguity as actually exists in the messy world we live in and in our inexact knowledge of it.

In a previous blog entry, I described the data design. The system database contains three data types: Place, Poet, and Poem. To save on storage space, some of the data fields in both Poets and Poems point to Place records, and each Poem record points to the Poet that composed it.
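
As a rough illustration of this pointer-based design, here is a minimal sketch in Python. The system itself is written in Groovy/Grails, so this is not its actual code; the field names follow the data model from the earlier post, while the class layout and the idea of keying each table by name are assumptions made for the example.

```python
from dataclasses import dataclass

# Field names follow the project's data model; only a few fields are shown.

@dataclass(frozen=True)
class Place:
    name: str
    country: str
    stateOrProv: str
    latitude: float
    longitude: float

@dataclass(frozen=True)
class Poet:
    nativeName: str
    birthPlace: str  # name of a Place record, not a copy of it
    deathPlace: str  # name of a Place record

@dataclass(frozen=True)
class Poem:
    name: str
    composer: str   # nativeName of a Poet record
    compPlace: str  # name of a Place record

# Each table is keyed by name (an assumption) so that references can be followed.
places = {
    "Tiree": Place("Tiree", "Scotland", "ARL", 56.516667, -6.816667),
    "Antigonish": Place("Antigonish", "Canada", "NS", 45.626522, -61.998253),
}
poets = {"Iain MacGilleain": Poet("Iain MacGilleain", "Tiree", "Antigonish")}
poem = Poem("Oran do dh'Ameireaga", "Iain MacGilleain", "Antigonish")

# Follow the references: Poem -> Poet -> Place.
composer = poets[poem.composer]
birth_country = places[composer.birthPlace].country
```

Because a Place is stored once and only referred to by name, correcting its coordinates corrects them for every Poet and Poem that points to it.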

I did build in some ambiguity to the system. As the date at which a Poem was composed can be hard to pin down, for example, I have allowed for a range of years (the compEarliestYear and compLatestYear fields). And if the date of the birth or death of a Poet is unknown, these can essentially be left blank (although the Poet record will be necessarily disqualified from filter criteria if nothing is known about his/her dates).

In the case of anonymous poems, I have a special Poet entry called “Anonymous,” and if the place of the birth or death of a Poet, or of the composition of a Poem, is unknown, there is a special record in the Place table called “Unknown.” Currently, the software I’ve written knows about these special records, so that it doesn’t mistake them for definite people or places.

The Predicament

This design is simple, fast and space efficient. But what if we do know something, rather than nothing, about the Poet or Place? For example, what if it is clear that the author was a female? If the poem(s) she composed date from 1886 (as a random date), then she must have been alive at that point, even if her lifespan could vary considerably on either side of that date.

What if we know that a Poem was composed somewhere in Montana? Or east of the Mississippi?

Potential Improvements

One workaround for an anonymous Poet about whom something is known would be to create a special record for the Poet. Her name might be “Anonymous Female” and as much as we know about her (e.g., her sex) could at least be represented accordingly. This would help to increase the accuracy of the representation of individuals, although it might complicate the code slightly (do I need to add another field to indicate anonymity?).

[NOTE: I have now added two new special cases to the database: “Unknown Male” and “Unknown Female.”]

How to allow for minimal information about the lifespan of a Poet? Similar to the compEarliestYear and compLatestYear fields in the Poem data type, the birth and death years of poets could be arranged as ranges to allow for uncertainty. So, for example, if we know that s/he composed a Poem in 1886, then the latestBirthYear would be set to 1886 (or perhaps 1876, as a 10-year-old would not likely compose a poem, let alone a new-born) and earliestDeathYear would be set to 1886. The only drawback to such a system is that it would complicate the logic for filtering poets according to birth/death year criteria, and increase the amount of data stored (which I maintain myself in a data file).
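
To see how much the filtering logic would change, here is a minimal Python sketch of range-based matching (not code from the system, which is in Groovy/Grails). The names latestBirthYear and earliestDeathYear come from the paragraph above; earliestBirthYear, latestDeathYear, the two function names, and the use of None for an unknown bound are assumptions for the example.

```python
from typing import Optional

def could_fall_within(earliest: Optional[int], latest: Optional[int],
                      start: int, end: int) -> bool:
    """True if at least one year in the uncertain range [earliest, latest]
    lies within [start, end]. None means that bound is unknown."""
    return ((earliest is None or earliest <= end) and
            (latest is None or latest >= start))

def must_fall_within(earliest: Optional[int], latest: Optional[int],
                     start: int, end: int) -> bool:
    """True only if every year in the uncertain range lies within [start, end]."""
    return (earliest is not None and latest is not None and
            start <= earliest and latest <= end)

# An anonymous poet known only from a poem composed in 1886.
anonymous_poet = {
    "earliestBirthYear": None, "latestBirthYear": 1876,
    "earliestDeathYear": 1886, "latestDeathYear": None,
}

possibly_born_1750_1810 = could_fall_within(
    anonymous_poet["earliestBirthYear"], anonymous_poet["latestBirthYear"], 1750, 1810)
certainly_born_1750_1810 = must_fall_within(
    anonymous_poet["earliestBirthYear"], anonymous_poet["latestBirthYear"], 1750, 1810)
```

The filter screen would then have to decide between the two readings: "possibly matches" keeps poorly documented poets in the results, while "certainly matches" behaves like the current exact-year filter.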

Uncertainty of Place is probably the most complex of these issues to address. I probably need to use special strings in the country and stateOrProv fields to indicate uncertainty (of course, if the country is unknown, the stateOrProv would necessarily be as well).

I currently store only a single long/lat pair to indicate a location on a map, which is efficient and simple, and this allows me to create neat points on a Google Map. Should I store another pair to allow for a potential geographical spread in the shape of a rectangle?
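
Here is a minimal Python sketch of what the rectangle option might look like (an illustration only, not code from the system). The class name Location and the fields latitude2 and longitude2 are hypothetical, and the Montana corners are rough, rounded values chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Location:
    latitude: float
    longitude: float
    # Hypothetical extra fields: the opposite corner of a rectangle.
    latitude2: Optional[float] = None
    longitude2: Optional[float] = None

    def is_point(self) -> bool:
        return self.latitude2 is None or self.longitude2 is None

    def centre(self) -> Tuple[float, float]:
        """Where to put a single map marker for this location."""
        if self.is_point():
            return (self.latitude, self.longitude)
        return ((self.latitude + self.latitude2) / 2,
                (self.longitude + self.longitude2) / 2)

    def contains(self, lat: float, lon: float) -> bool:
        """True if the given coordinates fall on the point or inside the rectangle."""
        if self.is_point():
            return (lat, lon) == (self.latitude, self.longitude)
        return (min(self.latitude, self.latitude2) <= lat <= max(self.latitude, self.latitude2)
                and min(self.longitude, self.longitude2) <= lon <= max(self.longitude, self.longitude2))

antigonish = Location(45.626522, -61.998253)
# Rough, rounded corners for "somewhere in Montana" (illustrative values only).
montana = Location(44.4, -116.0, 49.0, -104.0)
```

Existing records would stay as single points, so the extra pair would only be stored where the location is genuinely vague.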

I may not implement these provisions for uncertainty anytime soon, so I would be glad to hear any comments or suggestions about these issues.

Celtic Poets in North America – 3

In this, my third post about the Celtic Poets in North America Digital Humanities project, I’ll describe some of the functionality I’ve designed for the system, and speculate on how it might be extended. Any comments or suggestions would be welcome.

Search Filters

In my last post, I described the characteristics of the three kinds of data. One of the most basic things users will do is browse through the data after setting search filters that narrow the data set down to what interests them.

For example, show me just:

  • Defined locations in Wales
  • Male poets who were born before 1800
  • Female poets who died after 1900 in Canada
  • Poems in any language dealing with war composed by men
  • Poems in Irish composed before 1800 in Canada dealing with religion
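
Filters like these can be thought of as lists of small tests that a record must pass. The following Python sketch shows the idea for the second example above (male poets born before 1800); it is not the system's Groovy/Grails code, the function names are hypothetical, and every record except the first is invented for the example.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Criterion = Callable[[Record], bool]

def run_filter(records: List[Record], criteria: List[Criterion]) -> List[Record]:
    """Keep only the records that satisfy every criterion."""
    return [r for r in records if all(test(r) for test in criteria)]

def is_male(poet: Record) -> bool:
    return poet["sex"] == "M"

def born_before(year: int) -> Criterion:
    # A birthYear of 0 means "unknown", so such poets never match.
    return lambda poet: 0 < poet["birthYear"] < year

poets = [
    {"nativeName": "Iain MacGilleain", "sex": "M", "birthYear": 1787},
    {"nativeName": "Invented Poet A", "sex": "F", "birthYear": 1790},
    {"nativeName": "Invented Poet B", "sex": "M", "birthYear": 1850},
    {"nativeName": "Invented Poet C", "sex": "M", "birthYear": 0},
]

matches = run_filter(poets, [is_male, born_before(1800)])
```

Adding a new kind of criterion then means writing one more small test rather than changing the filtering loop.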

So, you can see how helpful this kind of system would be, not only for finding information but for analyzing patterns and asking research questions. For example, are there clusters of poems about death that form around well-known war events? How much discussion of war is there in inter-war periods? Leading up to major wars? How do gender and language correlate with the topic of war?

Visualization

The most basic form of browsing through the results of these search filters will be via textual lists. But lists are not always the most useful form of results, so I’m providing support for Google Maps and Simile Timelines. I’m also hoping to support graphs.

So, to return to the kind of examples given above, imagine being able to ask the system:

  • List all of the poems in Breton composed between 1939 and 1944
  • Show me a timeline of all of the female poets born in Canada
  • Show me a map of all of the poems in Welsh about morality
  • Show me a graph which compares the number of poems written in each Celtic language that deal with nature
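
The last request boils down to counting the matching poems per language before drawing anything. A minimal Python sketch of that counting step follows (an illustration, not the system's code); the function name is hypothetical and all of the sample poems except the first are invented.

```python
from collections import Counter

def poems_per_language(poems, tag):
    """Count, for each language, the poems that carry the given topic tag."""
    return Counter(p["language"] for p in poems if tag in p["tags"])

poems = [
    {"name": "Oran do dh'Ameireaga", "language": "Gaelic",
     "tags": ["MIGRATION", "IDENTITY", "COMMUNITY", "LANGUAGE", "NATURE"]},
    {"name": "Invented poem 1", "language": "Welsh", "tags": ["NATURE"]},
    {"name": "Invented poem 2", "language": "Welsh", "tags": ["NATURE", "LOVE"]},
    {"name": "Invented poem 3", "language": "Irish", "tags": ["WAR"]},
]

nature_counts = poems_per_language(poems, "NATURE")

# A crude text "graph": one row per language.
for language, count in sorted(nature_counts.items()):
    print(f"{language:8} {'#' * count} {count}")
```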

Future Extensions

Only the bare bones of the system are working at the moment, and I’m not even ready to publish screenshots, but I’d like to anticipate how it might be extended in the future. Such future functionality sometimes must be allowed for in data and software design.

The most obvious extension would be to allow for prose material as well as poetry: expository prose, memorates, historical narratives, folktales, etc. This would necessitate adding some kind of genre tag to items and extending the topic tags, but it probably wouldn’t have too much of a design impact. As there are a great many prose items, though, it would be a much larger commitment of data entry.

Another kind of extension would be dealing with the primary sources themselves. This project only handles the metadata: information which describes the primary texts. To date, software support for Celtic language primary texts in the form of parsers, tagged texts, and natural language processing in general seems almost non-existent, and such features of Celtic languages as mutation and noun case make primary texts much more complex to process than those in English. Until supporting technologies appear for all of the Celtic languages, it will be beyond the capacity of this system to extend into the primary texts themselves.

It has occurred to me that it might be useful to represent and incorporate social organizations which supported literary activity and their events. For example, there have been many Scottish Gaelic organizations around North America, some of which have organized Mòds (annual musical and literary competitions) since 1893. Although the Mòd was essentially imported from Scotland, it was largely modelled on the Welsh Eisteddfod, and these Welsh events were plentiful in the United States and certainly impressed Scottish Gaels in North America (as I’ve explained in an article in eKeltoi). The main complication with this idea is that the primary research on these activities is still largely undone and unwritten.

Another important aspect of literary activity that could be modelled and incorporated into the system would be books and journals printed in North America – media that contained and facilitated literary activity… Perhaps this would be the best candidate for extending the system. Just having a list and database of such items would be useful.

Are there other ways in which the system should and/or could be extended?

Celtic Poets in North America – 2

In this blog – my second blog on this topic – I’ll be going into the technical details of the Celtic Poets in North America Digital Humanities project on which I’m currently working. I’d be happy to receive comments and/or suggestions from those with knowledge of these matters; these details are important to those who wish to contribute (meta)data to the system.

The Platform

In my last blog I described the Finding the Celtic digital collaboratory that I created in 2007-8 by modifying the Collex system, written in Ruby on Rails. DH authority Bethany Nowviskie explained to me recently that Collex is now a stable and mature system that is being used by other scholarly communities for purposes similar to my own. One of the most interesting recent examples of this is the Medieval Electronic Scholarly Alliance, who have a digital collaboratory with federated data services, based on Collex.

I may decide that the most effective strategy for this project would be to return to Collex and spawn a new mutant of the system. For the time being, however, I’m creating my own system from scratch using Groovy/Grails, both to gain a comprehensive understanding of Groovy/Grails and to design my own architecture.

I really like the Groovy/Grails programming environment. That system is very analogous to Ruby/Rails in most respects (both are dynamic OOP languages incorporating closures and data models that map automatically onto SQL databases, with frameworks that handle web interactions with CRUD operations and scaffolding), but Groovy runs on the Java Virtual Machine and thus allows you to stay in the Java environment and draw on those resources. And because Groovy is compiled to Java byte code, it generally runs faster than Ruby.

After the (substantial) investment of learning about the Grails architecture and libraries, it has not taken me too long to get some basic operations up and running on my system. It was amazing to see how easily I could read and use a JSON file (which contains the database), for example.

I am a little concerned, though, that Groovy/Grails has not caught on as well as expected to date. Ruby on Rails seems to have really entrenched itself in the Digital Humanities world because it came along at the right time, when people needed a rapid application development environment for web apps. However, there are some signs that the RoR monopoly may give way, to some degree, to G/G in the long run (here’s an interesting discussion about the two).

Data Model

The implementation details above need not concern anyone but me, but the data model is of central importance for those who wish to comment on the system and contribute (meta)data.

There are currently three types of data in the CPNA database: Places, Poets and Poems. Each of these is defined differently. The software is very particular about the format of the file, so it would make my life much easier as editor of the data if you format the entries you send me carefully, following the guidelines below.

The data file format I am using (called “JSON”) consists of a number of field-name/data pairs, separated by commas. In other words, the name of a data field in double quotes, a colon and the value of the data of that field (double quotes for everything but numbers), followed by a comma, and then the next field-name/data pair, and so on.

In general, English names for common place names are privileged in this database where they are the dominant conventional forms, especially because few users will have adequate command of the relevant Celtic languages. Native Celtic names and words are to be used elsewhere as appropriate.

It would be easiest for me to have contributor data in three sections of the same type: all of the places, followed by all of the poets, followed by the poems.

Places

Both the Poet and Poem data types refer to Places.

A place is defined with the following data fields:

  • name: the name of the specific place, in whatever language is most appropriate.
  • country: the name of the country in English. The countries currently in use are: US, Canada, France, Ireland, Man, Scotland, Wales, England.
  • stateOrProv: the 2 to 4 letter abbreviation of the state (in the US), province (in Canada or France), shire (in UK), or county (in Ireland). Use BRT for Brittany and CON for Cornwall.
  • latitude: the decimal value of latitude.
  • longitude: the decimal value of longitude.

Most locations have entries on Wikipedia, which give their longitude and latitude.

See the following for stateOrProv abbreviations:

Here is an example definition of two places, one in Scotland and one in Nova Scotia:

{"name": "Tiree", "country": "Scotland", "stateOrProv": "ARL", "latitude": 56.516667, "longitude": -6.816667}
{"name": "Antigonish", "country": "Canada", "stateOrProv": "NS", "latitude": 45.626522, "longitude": -61.998253}

You must provide definitions for all places referred to in your Poet and Poem entries, unless they already exist in the system or are unknown.
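
If you would like to check a Place entry before sending it, something along the following lines may help. This is a minimal Python sketch, not the loader used by the system (which is written in Groovy/Grails); it assumes each entry is wrapped in curly braces so that it forms a complete JSON object, and the function name check_place is hypothetical.

```python
import json

COUNTRIES = {"US", "Canada", "France", "Ireland", "Man", "Scotland", "Wales", "England"}
PLACE_FIELDS = {"name", "country", "stateOrProv", "latitude", "longitude"}

def check_place(text: str) -> list:
    """Return a list of problems found in one Place entry (empty if none)."""
    place = json.loads(text)
    problems = []
    missing = PLACE_FIELDS - set(place)
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if place.get("country") not in COUNTRIES:
        problems.append(f"unrecognized country: {place.get('country')!r}")
    if not 2 <= len(str(place.get("stateOrProv", ""))) <= 4:
        problems.append("stateOrProv should be a 2 to 4 letter abbreviation")
    for field, limit in (("latitude", 90), ("longitude", 180)):
        value = place.get(field)
        # Coordinates must be numbers (no quotes) within the valid range.
        if not isinstance(value, (int, float)) or abs(value) > limit:
            problems.append(f"{field} should be a decimal number between -{limit} and {limit}")
    return problems

tiree = ('{"name": "Tiree", "country": "Scotland", "stateOrProv": "ARL", '
         '"latitude": 56.516667, "longitude": -6.816667}')
print(check_place(tiree))
```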

Poets

A poet is defined with the following data fields:

  • nativeName: the formal name of the poet in his/her Celtic language.
  • transName: the English or French equivalent for the name of the poet.
  • nickname: the nickname for the poet in his/her Celtic language.
  • sex: the sex of the poet (M, F or U)
  • birthYear: the year born, or 0 if unknown
  • birthPlace: the place where poet was born; use “unknown” if unknown
  • deathYear: the year died, or 0 if unknown (or still alive)
  • deathPlace: the place where the poet died; use “unknown” if unknown

Here is a definition of a poet who was born in Scotland and died in Canada:

{"nativeName": "Iain MacGilleain", "transName": "John MacLean", "nickname": "Am Bard MacGilleain", "sex": "M", "birthYear": 1787, "birthPlace": "Tiree", "deathYear": 1848, "deathPlace": "Antigonish"}

You must provide definitions for all poets whose poetry you list, unless the poet already exists in the system or is anonymous.
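
The conventions for Poet fields (M, F or U for sex, 0 for an unknown year) can be checked mechanically. Here is a minimal Python sketch of such a check; it is an illustration rather than the system's own validation, and the function name check_poet is hypothetical.

```python
def check_poet(poet: dict) -> list:
    """Return a list of problems found in one Poet entry (empty if none)."""
    problems = []
    if poet.get("sex") not in ("M", "F", "U"):
        problems.append("sex must be M, F or U")
    born = poet.get("birthYear", 0)
    died = poet.get("deathYear", 0)
    # 0 stands for "unknown" (or, for deathYear, "still alive").
    if born and died and died < born:
        problems.append("deathYear is earlier than birthYear")
    for field in ("birthPlace", "deathPlace"):
        if not poet.get(field):
            problems.append(f"{field} is empty; use 'unknown' if unknown")
    return problems

maclean = {
    "nativeName": "Iain MacGilleain", "transName": "John MacLean",
    "nickname": "Am Bard MacGilleain", "sex": "M",
    "birthYear": 1787, "birthPlace": "Tiree",
    "deathYear": 1848, "deathPlace": "Antigonish",
}
print(check_poet(maclean))
```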

Poems

A poem is defined with the following data fields:

  • name: the common name for the poem, in the language of the poem.
  • composer: the Poet who composed the poem. There are three special Poet entries: “Unknown male,” “Unknown female,” and “Anonymous” (if composer’s sex is unknown). Please be specific in the case of poets with the same name (disambiguate with nickname or birth year).
  • firstLine: the first line of the poem.
  • compPlace: the place where the poem was composed, or 0 if unknown
  • compYearEarliest: the earliest year it could have been composed.
  • compYearLatest: the latest year it could have been composed.
  • language: the language in which the poem was composed. Valid values are: Breton, Cornish, Irish, Manx, Gaelic, or Welsh.
  • textSource: a reference to a source where the item can be found (journal, MSS, etc).
  • urlSource: the URL to the text, if online; otherwise, leave empty.
  • tags: the list of tags which indicate the main topics of the poem (see below).
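
A few of these Poem fields lend themselves to a quick sanity check before submission. The Python sketch below is only an illustration (the function name check_poem is hypothetical, and the system's own Groovy/Grails code may check different things); it tests the language value, the order of the two composition years, and the presence of a text source.

```python
LANGUAGES = {"Breton", "Cornish", "Irish", "Manx", "Gaelic", "Welsh"}

def check_poem(poem: dict) -> list:
    """Return a list of problems found in one Poem entry (empty if none)."""
    problems = []
    if poem.get("language") not in LANGUAGES:
        problems.append("language is not one of the valid values")
    earliest = poem.get("compYearEarliest")
    latest = poem.get("compYearLatest")
    if not isinstance(earliest, int) or not isinstance(latest, int):
        problems.append("composition years must be whole numbers")
    elif earliest > latest:
        problems.append("compYearEarliest is later than compYearLatest")
    if not poem.get("textSource"):
        problems.append("textSource is empty")
    return problems

poem = {
    "name": "Oran do dh'Ameireaga", "composer": "Iain MacGilleain",
    "compPlace": "Antigonish", "compYearEarliest": 1819, "compYearLatest": 1819,
    "language": "Gaelic",
    "textSource": "Donald Meek, The Wiles of the World: 64-73",
}
print(check_poem(poem))
```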

The textSource field entry should be concise:

  • In the case of a book: Name of author/editor, book title (short), colon, page range.
  • In the case of an article in a journal: Name of author/editor of article, article title (short) in single quotes, comma, name of journal, issue number, year (in parentheses), colon, page range.
  • In the case of an article in an edited volume: Name of author/editor of article, article title (short) in single quotes, comma, name of book (short), colon, page range.
  • In the case of poems printed as stray items in newspapers or periodicals: name of periodical, volume and issue numbers (if any), date in parentheses.
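
The four patterns above can be written as small formatting functions. This Python sketch is only an illustration with hypothetical function names; the first three follow the punctuation described above, while the exact spacing in the periodical pattern is an assumption, since the guideline does not spell it out.

```python
def book_source(author: str, title: str, pages: str) -> str:
    return f"{author}, {title}: {pages}"

def journal_source(author: str, article: str, journal: str,
                   issue: int, year: int, pages: str) -> str:
    return f"{author}, '{article}', {journal} {issue} ({year}): {pages}"

def edited_volume_source(author: str, article: str, book: str, pages: str) -> str:
    return f"{author}, '{article}', {book}: {pages}"

def periodical_source(periodical: str, date: str, volume_issue: str = "") -> str:
    # Assumed layout: name, optional volume/issue, then the date in parentheses.
    middle = f" {volume_issue}" if volume_issue else ""
    return f"{periodical}{middle} ({date})"

print(book_source("Donald Meek", "The Wiles of the World", "64-73"))
print(journal_source("Robert Dunbar", "The Bard MacLean",
                     "Transactions of the Gaelic Society of Inverness", 55, 2011, "64-73"))
```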

If the poem has appeared in more than one source, provide the most authoritative edition of it; if there is more than one variation, provide an article that discusses these.

The tags field entry can be any combination of the following values (tags are a controlled vocabulary, so you can’t use just any label you want):

LOVE, DEATH, WAR, POLITICS, MIGRATION, RELIGION, IDENTITY, COMMUNITY, LANGUAGE, NATURE, MORALITY, ECONOMICS, TECHNOLOGY, HUMOR

Tags are intended to indicate to users (who may not have copies of these poems or even easy access to them) what is explicitly in the content of the poem. In other words, this should enable a user to get an idea of what the poem discusses substantially without having the text. Please limit the tags to the major topics of the poem, those that best characterize it, rather than giving an exhaustive inventory.

Do not use a tag merely because of the appearance of metaphors or symbols: for example, Gaelic elegies often compare their subject to a fallen tree or his/her dependents to a flock of sheep missing its shepherd, but this is not a reason to tag the poem with NATURE.
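
Because the tags are a controlled vocabulary, a list of tags can be checked against it very simply. A minimal Python sketch (the function name invalid_tags is hypothetical, and this is not the system's own code):

```python
ALLOWED_TAGS = frozenset({
    "LOVE", "DEATH", "WAR", "POLITICS", "MIGRATION", "RELIGION", "IDENTITY",
    "COMMUNITY", "LANGUAGE", "NATURE", "MORALITY", "ECONOMICS", "TECHNOLOGY", "HUMOR",
})

def invalid_tags(tags):
    """Return the tags that are not in the controlled vocabulary, in their original order."""
    return [t for t in tags if t not in ALLOWED_TAGS]

print(invalid_tags(["MIGRATION", "IDENTITY", "EXILE"]))
```

Note that the comparison is case-sensitive in this sketch, so "Nature" would be reported as invalid while "NATURE" would pass.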

Here are definitions of three poems composed by the poet above (the last two fictional), each with different types of sources.

{"name": "Oran do dh'Ameireaga", "firstLine": "Gum bheil mi 'm onrachd 's a' choille ghruamaich", "composer": "Iain MacGilleain", "compPlace": "Antigonish", "compYearEarliest": 1819, "compYearLatest": 1819, "language": "Gaelic", "textSource": "Donald Meek, The Wiles of the World: 64-73", "urlSource": "", "tags": ["MIGRATION", "IDENTITY", "COMMUNITY", "LANGUAGE", "NATURE"]}
{"name": "Oran do dh'Ailean MacGilleain", "firstLine": "Ailein, chunna mi uair thu", "composer": "Iain MacGilleain", "compPlace": "Antigonish", "compYearEarliest": 1820, "compYearLatest": 1826, "language": "Gaelic", "textSource": "Robert Dunbar, 'The Bard MacLean', Transactions of the Gaelic Society of Inverness 55 (2011): 64-73", "urlSource": "", "tags": ["MIGRATION", "COMMUNITY"]}
{"name": "Cumha do dh'Iain MacDhomhnaill", "firstLine": "Duisg, m' anam, air ball", "composer": "Iain MacGilleain", "compPlace": "Antigonish", "compYearEarliest": 1830, "compYearLatest": 1830, "language": "Gaelic", "textSource": "Calum MacLeod, 'Antigonish Poets', Literature in Nova Scotia: 64-73", "urlSource": "", "tags": ["RELIGION", "COMMUNITY"]}

Contributing Data

If you are contributing data to the project, you can either email me your data in a text file or else just insert the data entries into your email message. I’ll need to curate them into the system’s JSON data file in any case.

All data contributions will be acknowledged on the website.

Read the above guidelines carefully, examine the example entries, and ask me if you have any questions or need any clarifications. Your submission should include:

  • A Poet entry (with all of the fields defined as above) for every distinct poet who composed one or more of the poems you’re submitting (unless the author is unknown).
  • A Poem entry (with all of the fields defined as above) for every poem.
  • A Place entry for every distinct birth and death location of every Poet and for the place of composition of each Poem (unless the location is unknown).
  • Please send full bibliographic details for all books and articles cited (in abbreviated form) in the Poem entries. This is not necessary in the case of poems printed in isolation in periodicals (i.e., poems printed on their own in newspapers, etc.).
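
The first three points of this checklist amount to making sure that every name a record refers to has an entry of its own. Here is a minimal Python sketch of such a cross-check; it is an illustration with a hypothetical function name, it looks only at the submission itself, and it cannot know which places and poets already exist in the system.

```python
SPECIAL_POETS = {"Anonymous", "Unknown male", "Unknown female"}

def missing_entries(places, poets, poems):
    """Return (place names, poet names) that are referred to but not defined."""
    place_names = {p["name"] for p in places}
    poet_names = {p["nativeName"] for p in poets}

    needed_places = {p[f] for p in poets for f in ("birthPlace", "deathPlace")}
    needed_places |= {p["compPlace"] for p in poems}
    # Skip unknown places, whether given as the text "unknown" or as 0.
    needed_places = {n for n in needed_places
                     if isinstance(n, str) and n.lower() != "unknown"}

    needed_poets = {p["composer"] for p in poems} - SPECIAL_POETS
    return sorted(needed_places - place_names), sorted(needed_poets - poet_names)

# A deliberately incomplete submission (the second and third poems are invented).
places = [{"name": "Tiree"}]
poets = [{"nativeName": "Iain MacGilleain", "birthPlace": "Tiree", "deathPlace": "Antigonish"}]
poems = [
    {"composer": "Iain MacGilleain", "compPlace": "Antigonish"},
    {"composer": "Anonymous", "compPlace": 0},
    {"composer": "Invented Poet", "compPlace": "unknown"},
]

missing_places, missing_poets = missing_entries(places, poets, poems)
print(missing_places, missing_poets)
```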