Thursday, 27 November 2014

Some semi-random musings on evidence-based vs conclusion-based genealogy...

So I guess I am now taking a bit of a "study break" while I (try to) learn a bit more about genealogical standards and best practices, specifically "evidence-based genealogy". My early forays into this field have been quite interesting.

There seem to be two main approaches to genealogy, known as evidence-based genealogy and conclusion-based genealogy. Most well-known genealogical software is conclusion-based - users enter facts/events with associated sources proving these "conclusions", and usually only one fact exists for each event in the person's life. For example, a person in a genealogy database will (usually) have only one birth date listed, and the sources supporting this date will be referenced. If conflicting evidence arises, the date for the event might be adjusted and sources added or removed, but the working premise is that there is "one true date" (or date range) for the birth and that is that.

Evidence-based genealogy takes a different approach. An evidence-based genealogist will locate as many sources as possible for an event and will list conflicting versions separately. So a person might have several different birth dates - one sourced from a parish register, one (or more) from census data, one from a death certificate, and so on. After studying the various sources and their assertions, one date might be favoured over the others and labelled as the preferred date, but the other dates are all still recorded for that person. Of course the preferred date isn't just plucked out at random; it must be supported by the evidence and by a well-reasoned argument. While it is possible to use "traditional" genealogy software to record these extra dates, it can be a bit cumbersome when used this way.
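To make the idea of "several birth dates, one of them preferred" a little more concrete, here is a minimal sketch in Python. The class names (Assertion, Person), the field names and the sample data are all invented for illustration; this is not how any particular genealogy program stores its data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Assertion:
    """A single claim about an event, exactly as one source states it."""
    event_type: str          # e.g. "birth"
    value: str               # the date as recorded in the source
    source: str              # description of where the claim came from
    preferred: bool = False  # True for the claim the researcher favours
    reasoning: str = ""      # the argument for preferring this claim


@dataclass
class Person:
    name: str
    assertions: List[Assertion] = field(default_factory=list)

    def claims(self, event_type: str) -> List[Assertion]:
        """Return every recorded claim for the given kind of event."""
        return [a for a in self.assertions if a.event_type == event_type]

    def preferred(self, event_type: str) -> Optional[Assertion]:
        """Return the favoured claim, or None if none has been chosen."""
        for a in self.claims(event_type):
            if a.preferred:
                return a
        return None


# Invented example data, for illustration only.
person = Person("Example Person")
person.assertions += [
    Assertion("birth", "3 Mar 1850", "parish register", preferred=True,
              reasoning="Recorded closest to the event."),
    Assertion("birth", "abt 1851", "1861 census"),
    Assertion("birth", "abt 1849", "death certificate"),
]
```

Calling person.claims("birth") returns all three conflicting dates, while person.preferred("birth") returns only the parish register entry together with the reasoning for choosing it. Nothing is thrown away when a preference is made.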

As a side note: Maybe my understanding of these terms is flawed, but I don't think they have been appropriately named. To my way of thinking, the monikers "Evidence-based" and "Conclusion-based" should be referring to the same thing - using various pieces of evidence to form a reasoned conclusion. I personally don't like the name "Conclusion-based genealogy" - it just doesn't seem to fit, but I will continue to use it until a better name comes up.

My approach to date has been a mishmash of these two methodologies. I have been searching for all the sources I can lay my hands on and attaching them to the relevant events for the people in my family tree. Where sources contradict one another, I have tried my best to understand, explain and resolve these conflicts and have included notes explaining my conclusions. But I have felt constrained by the software I have used, or to put it another way, coerced into a certain way of thinking which just doesn't feel right.

One of my issues with "conclusion-based" software is what happens when you discover a source (or group of sources) relates to a different person. I have encountered this many times and it has been a pain to correct in every genealogy tool I have used to date.

All traditional genealogy packages are (in my mind at least) too closely coupled to the GEDCOM standard. Now I am all for standards - they help make interoperability easier and can allow for easy data migration from one tool to another - but GEDCOM is nearly 20 years old and hasn't adapted too well over the years. GEDCOM is based around the concepts of Person, Event, and Source. There is also very rudimentary support for the concept of a Location and a Family, which is a collection of Persons comprising two parents and zero or more children. Basically it boils a family tree down to Persons, Events and Sources, and that is how most genealogy programs are structured.
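The Person, Event and Source arrangement described above could be sketched roughly as follows. This is a simplified picture of the nesting only, with hypothetical class and field names and made-up sample data; it is not GEDCOM's actual record layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Source:
    title: str


@dataclass
class Event:
    kind: str   # e.g. "birth", "baptism", "death"
    date: str   # the single "concluded" date for this event
    sources: List[Source] = field(default_factory=list)


@dataclass
class Person:
    name: str
    events: List[Event] = field(default_factory=list)


@dataclass
class Family:
    parents: List[Person]
    children: List[Person] = field(default_factory=list)


# Invented example data, for illustration only.
register = Source("parish register")
child = Person("Example Child", [Event("birth", "3 Mar 1850", [register])])
father = Person("Example Father")
mother = Person("Example Mother")
family = Family(parents=[father, mother], children=[child])
```

Notice that the source sits at the bottom of the hierarchy: to reach it you go from the person, to the event, to the source. Everything hangs off the person.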

I have used and evaluated a number of different genealogy programs and just about every program starts with a person, to which you attach a series of events (birth, baptism, death, etc.), and for each event you attach one or more sources. The way these programs are designed encourages the user to start with a person, create events for that person and then attach sources to the events. I find this to be somewhat backwards. Sure, I start with a person and an event such as their date of birth, and then I locate sources, but the sources really drive the process for me. I locate all the relevant sources and try to extract as much information as I can from them. A birth certificate, for example, relates to more than a single person - the parents will also be listed on it, and possibly siblings will be named too. You are also likely to get an address or location for the family's residence along with the occupations of one or both parents, and possibly even ages for the parents and siblings. So one source can provide details of multiple events for multiple people. And that to me seems to be the key difference between the two genealogical approaches.

I have only just now realised that better terms to use would be "Source-based genealogy" and "Person-based genealogy". The "Evidence-based" people start with sources, extract a number of events (or facts or assertions - whatever you want to call them), then collate the events into a person. Meanwhile the "Conclusion-based" people start with a person, then create events, and then locate and add sources to affirm those events. Rightly or wrongly I am going to use "Source-based" and "Person-based" from now on.
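The "Source-based" flow described above (start with sources, extract assertions, then collate them into people) might look something like this in Python. It is only a sketch under my own assumptions: the Assertion class, the collate function, the identity mapping and the sample data are all hypothetical and are not taken from any existing tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Assertion:
    source: str      # which source made the claim
    subject: str     # the individual as named or described in that source
    event_type: str  # e.g. "birth", "occupation", "residence"
    value: str       # what the source says


def collate(assertions: List[Assertion],
            identity: Dict[Tuple[str, str], str]) -> Dict[str, List[Assertion]]:
    """Group assertions under the person each (source, subject) is linked to."""
    people = defaultdict(list)
    for a in assertions:
        person_id = identity.get((a.source, a.subject))
        if person_id is not None:  # unlinked assertions are simply left out
            people[person_id].append(a)
    return dict(people)


# Invented example data: one source yields claims about several people.
assertions = [
    Assertion("birth certificate", "child", "birth", "3 Mar 1850"),
    Assertion("birth certificate", "father", "occupation", "labourer"),
    Assertion("birth certificate", "mother", "residence", "Example Street"),
    Assertion("1861 census", "son", "birth", "abt 1851"),
]

# The researcher's current judgement about who is who.
identity = {
    ("birth certificate", "child"): "P1",
    ("birth certificate", "father"): "P2",
    ("birth certificate", "mother"): "P3",
    ("1861 census", "son"): "P1",
}
people = collate(assertions, identity)

# Later the census entry turns out to describe a different person:
# only the link changes, the extracted assertions stay untouched.
revised_identity = dict(identity)
revised_identity[("1861 census", "son")] = "P4"
revised = collate(assertions, revised_identity)
```

The design choice worth noting is that the link between a source and a person is kept separate from the extracted facts. When a source turns out to relate to a different person, correcting the tree means changing one entry in the mapping rather than re-entering events and sources by hand.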

So most traditional software is "Person-based", whereas my natural instinct has been to operate in a "Source-based" manner. This is why I started to create my own tools some time back, where I could start with the sources, extract the facts and then collate the facts to find my people. An interesting thing happened while I was researching the "Evidence-based vs Conclusion-based" argument - I discovered a small number of tools that have been designed to operate in an "Evidence-based" (or as I now think it should be called, "Source-based") manner. Had I realised these tools existed I probably wouldn't have started creating my own. However, I have started on my own tools and will continue, but with a slight change of focus informed by my current research into these methodologies.

Note: Part of this diversion into "Evidence-based" vs "Conclusion-based" genealogy was triggered by a series of blog posts, starting with “Evidence-based” and “Conclusion-based” software use by Michael Hait on his Planting the Seeds blog.

