The Current Design
Now that I’ve got a basic prototype of my Celtic Poets in North America project running and have entered a good number of sample data items, I’m realizing that one shortcoming of my data type design is that it does not allow for as much uncertainty and ambiguity as actually exists in the messy world we live in and in our inexact knowledge of it.
In a previous blog entry, I described the data design. The system database contains three data types: Place, Poet, and Poem. To save on storage space, some of the data fields in both Poets and Poems point to Place records, and each Poem record points to the Poet that composed it.
I did build some ambiguity into the system. As the date at which a Poem was composed can be hard to pin down, for example, I have allowed for a range of years (the compEarliestYear and compLatestYear fields). And if the date of the birth or death of a Poet is unknown, these can essentially be left blank (although the Poet record will necessarily be excluded by any date filter if nothing is known about his/her dates).
In the case of anonymous poems, I have a special Poet entry called “Anonymous.” If the place of the birth or death of a Poet, or of the composition of a Poem, is unknown, there is a special record in the Place table called “Unknown.” Currently, the software I’ve written knows about these special records, so that it doesn’t mistake them for definite people or places.
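As a rough illustration of the design described above, here is a minimal Python sketch (not the project's actual code). Only compEarliestYear, compLatestYear, country, and stateOrProv are field names from the real design. Every other name here (birthYear, compPlace, UNKNOWN_PLACE, poems_in_range, and so on) is a hypothetical stand-in, and object references stand in for whatever pointers the database uses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Place:
    name: str
    country: str
    stateOrProv: str
    lat: Optional[float] = None   # hypothetical field names for the map point
    long: Optional[float] = None


@dataclass
class Poet:
    name: str
    birthPlace: Place
    deathPlace: Place
    birthYear: Optional[int] = None   # None means "left blank"
    deathYear: Optional[int] = None


@dataclass
class Poem:
    title: str
    poet: Poet          # each Poem points to the Poet who composed it
    compPlace: Place    # hypothetical name for the place of composition
    compEarliestYear: int
    compLatestYear: int


# The special records that the software has to recognize.
UNKNOWN_PLACE = Place("Unknown", "", "")
ANONYMOUS = Poet("Anonymous", UNKNOWN_PLACE, UNKNOWN_PLACE)


def is_definite_place(place: Place) -> bool:
    """True unless the place is the special "Unknown" record."""
    return place is not UNKNOWN_PLACE


def is_definite_poet(poet: Poet) -> bool:
    """True unless the poet is the special "Anonymous" record."""
    return poet is not ANONYMOUS


def poems_in_range(poems: List[Poem], start: int, end: int) -> List[Poem]:
    """Keep poems whose composition range overlaps the years start..end."""
    return [p for p in poems
            if p.compEarliestYear <= end and p.compLatestYear >= start]
```

A filter for poems from 1885 to 1886 would then include a poem dated somewhere between 1880 and 1890, since the two ranges overlap, while the sentinel checks keep “Anonymous” and “Unknown” from being treated as real people or places.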
This design is simple, fast and space efficient. But what if we do know something, rather than nothing, about the Poet or Place? For example, what if it is clear that the author was female? If the poem(s) she composed date from 1886 (to pick a random date), then she must have been alive at that point, even if her lifespan could vary considerably on either side of that date.
What if we know that a Poem was composed somewhere in Montana? Or east of the Mississippi?
One workaround for an anonymous Poet about whom something is known would be to create a special record for the Poet. Her name might be “Anonymous Female” and as much as we know about her (e.g., her sex) could at least be represented accordingly. This would help to increase the accuracy of the representation of individuals, although it might complicate the code slightly (do I need to add another field to indicate anonymity?).
[NOTE: I have now added two new special cases to the database: “Unknown Male” and “Unknown Female.”]
How to allow for minimal information about the lifespan of a Poet? Similar to the compEarliestYear and compLatestYear fields in the Poem data type, the birth and death years of poets could be stored as ranges to allow for uncertainty. So, for example, if we know that s/he composed a Poem in 1886, then the latestBirthYear would be set to 1886 (or perhaps 1876, as a 10-year-old would not likely compose a poem, let alone a newborn) and earliestDeathYear would be set to 1886. The drawbacks of such a system are that it would complicate the logic for filtering poets according to birth/death year criteria, and that it would increase the amount of data stored (which I maintain myself in a data file).
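To get a feel for how much the filtering logic would grow, here is a minimal Python sketch of lifespan ranges. It is only an illustration under my own assumptions: latestBirthYear and earliestDeathYear are the fields discussed above, while earliestBirthYear, latestDeathYear, the Lifespan class, and the helper functions are hypothetical names, and the 10-year minimum age is just the guess from the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lifespan:
    # None means that bound is simply unknown.
    earliestBirthYear: Optional[int] = None
    latestBirthYear: Optional[int] = None
    earliestDeathYear: Optional[int] = None
    latestDeathYear: Optional[int] = None


def lifespan_from_poem_year(year: int, min_age: int = 10) -> Lifespan:
    """Infer the only bounds a single dated poem gives us."""
    return Lifespan(latestBirthYear=year - min_age, earliestDeathYear=year)


def possibly_alive(span: Lifespan, year: int) -> bool:
    """True if nothing we know rules out the poet being alive in `year`."""
    born_in_time = span.earliestBirthYear is None or span.earliestBirthYear <= year
    not_yet_dead = span.latestDeathYear is None or year <= span.latestDeathYear
    return born_in_time and not_yet_dead


def certainly_alive(span: Lifespan, year: int) -> bool:
    """True only if the known bounds guarantee the poet was alive in `year`."""
    if span.latestBirthYear is None or span.earliestDeathYear is None:
        return False
    return span.latestBirthYear <= year <= span.earliestDeathYear
```

The design choice this exposes is that a single "alive in year X" filter splits into two questions: a loose one (could the poet have been alive?) and a strict one (was the poet definitely alive?). For the 1886 example, the strict test passes for 1880 but not for 1890, while the loose test passes for both, because nothing is known about the outer bounds.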
Uncertainty of Place is probably the most complex of these issues to address. I probably need to use special strings in the country and stateOrProv fields to indicate uncertainty (of course, if the country is unknown, the stateOrProv would necessarily be as well).
I currently store only a single long/lat pair to indicate a location on a map, which is efficient and simple and lets me create neat points on a Google Map. Should I store another pair to allow for a potential geographical spread in the shape of a rectangle?
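A rectangle defined by two corner pairs could look something like the following Python sketch. This is only one possible shape for the idea: the BoundingBox class and its field names are hypothetical, the Montana coordinates are rough approximations for illustration, and the simple comparison assumes the rectangle does not cross the 180° meridian.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BoundingBox:
    south: float  # minimum latitude
    west: float   # minimum longitude
    north: float  # maximum latitude
    east: float   # maximum longitude

    def contains(self, lat: float, long: float) -> bool:
        """True if the point falls inside (or on the edge of) the rectangle."""
        return self.south <= lat <= self.north and self.west <= long <= self.east

    def center(self) -> Tuple[float, float]:
        """A single (lat, long) point, e.g. for placing one map marker."""
        return ((self.south + self.north) / 2, (self.west + self.east) / 2)

    def is_point(self) -> bool:
        """True when both corners coincide, i.e. the location is exact."""
        return self.south == self.north and self.west == self.east


# Approximate outline of Montana, for illustration only.
MONTANA = BoundingBox(south=44.36, west=-116.05, north=49.0, east=-104.04)
```

An exact location would just be a rectangle whose two corners coincide, so existing single-point Places would not need special handling, and center() still yields one point for a map marker when a rectangle cannot be drawn.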
I may not implement these provisions for uncertainty anytime soon, so I would be glad to hear any comments or suggestions about these issues.