Saturday, 10 January 2015

My file naming scheme

Before I (finally) dive back into my research, I'll take a brief moment to outline the file naming scheme I have chosen for my digital files. In previous posts I have outlined most of the folder structure I am using, but the real magic in my system is in the file names and the careful use of symbolic links, aka file aliases. This will be a long post, but I hope you will stick with it.

I might give a more detailed recap of the overall folder hierarchy in a future post, but a brief summary is in order for now. At the top level is a single folder for the family I am researching. I give this folder a descriptive name, which matches the name I give to the database in my genealogical software. In my case this is Bannisters of Stretford. Under this folder is a series of sub-folders for Books, People, Pictures, Places, and Sources.

Books contains, well, books of course, but only books that are not used as sources in my research. This is more a place for helpful reference books, histories of the town my family came from, auto-generated reports etc.

People is a special folder that I will come back to shortly.

Pictures contains photos, mostly of people in the family, but no images of source documents - source images belong elsewhere.

Places is another special folder like People and will be explained below.

Sources contains all my source documents, broken down by type of source such as BMD, Books, Census, Electoral Rolls, etc. All images of sources are appropriately named using the scheme I will define below and then placed in one of the subfolders under this Sources folder. The key is that every source has one and only one location for the original document, which can (hopefully) be easily discerned from the type of document it is.

Now, onto the actual naming scheme...

For sources relating to a specific person, the basic file naming scheme is as follows:
<Date> <Person Name> - <Type of document>, <Page number>.<file type>
  • <Date> is the date of the source document (or a date range if applicable) in the format YYYY, YYYY-MM or YYYY-MM-DD as applicable
  • <Person Name> should be self explanatory, but to be clear this should be the name of the person (or persons) as given in the document - not their nickname or a name they may be known by at a future date
  • <Type of document> will be something like "Birth Certificate", "Marriage Certificate", etc
  • <Page number> is optional, but should indicate the page number or range of page numbers if this image is just part of a larger source.
  • <File type> is just the standard file extension, like tiff, png, jpg, pdf, etc.
For other types of sources, such as census pages, electoral rolls, parish registers or newspapers:
<Date> <Source name>, <Page number>.<file type>
  • <Date> is the date of the source document (or a date range if applicable) in the format YYYY, YYYY-MM or YYYY-MM-DD as applicable
  • <Source Name> is a descriptive name for this source
  • <Page number> is optional, but should indicate the page number or range of page numbers if this image is just part of a larger source.
  • <File type> is just the standard file extension, like tiff, png, jpg, pdf, etc.
It is vitally important that the first part of the filename be a date in the format YYYY, YYYY-MM or YYYY-MM-DD as applicable. If the year is not known, then _unknown_ is used instead. Using YYYY-MM-DD for dates means the filesystem will automatically sort your documents by date, with unknown dates filtering to the top of the list.

Some sample document names would be:
1968-11-23 Amos Ross BANNISTER - Birth Certificate.png
1956-04-13 Kennett John BANNISTER and Colleen Dawn WALTERS - Marriage Certificate.png
2014-03-05 Narooma News, p34.tiff
Admittedly some of the filenames can get a little long using this scheme, so you may want to use abbreviations, but I like the verbosity - it makes it quite clear what each document contains.
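
For anyone who would rather script the naming than type it by hand, here is a minimal Python sketch of the two patterns above. The function name source_filename and its parameters are hypothetical, made up for illustration; only the resulting filename layout comes from the scheme described in this post.

```python
from typing import Optional


def source_filename(date: str, name: str, extension: str,
                    doc_type: Optional[str] = None,
                    pages: Optional[str] = None) -> str:
    """Build '<Date> <Name> - <Type>, <Pages>.<ext>', skipping optional parts."""
    # An empty date falls back to the "_unknown_" placeholder from the scheme.
    stem = f"{date or '_unknown_'} {name}"
    if doc_type:
        # Person-specific sources carry the type of document after the name.
        stem += f" - {doc_type}"
    if pages:
        # The page number (or range) is optional in both patterns.
        stem += f", {pages}"
    return f"{stem}.{extension}"


print(source_filename("1968-11-23", "Amos Ross BANNISTER", "png",
                      doc_type="Birth Certificate"))
print(source_filename("2014-03-05", "Narooma News", "tiff", pages="p34"))
```

The two calls reproduce the first and third sample names listed above. The function does not validate the date format, so passing YYYY, YYYY-MM or YYYY-MM-DD correctly is left to the caller.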

It might seem strange lumping all the sources into one set of folders, mixing up sources for different people in the one place, but here is where the magic comes into play. The People folder does not contain any actual files, but this is where you will come to find all the files relating to a particular person. How can this happen? The answer is a nifty feature of modern filesystems called symbolic links. A symbolic link allows you to give a file multiple names and even make it appear in multiple folders. My file naming scheme uses symbolic links to avoid duplication of files.

Some sources naturally pertain to multiple people. For example, a birth certificate is not only a relevant source for the child who was born, but it can also be a source for the father and mother of the child and possibly even siblings who may be named in the document. A marriage certificate is obviously a source for both the bride and the groom and might also contain details of the parents of one or both. By using symbolic links I can store a marriage certificate in the Sources > BMD > Marriage Certificates folder and then create a link to this document under the folders for each person named in the document. The actual file is located in only one place on disk, but it can be accessed from several alternate locations.

Even better, when creating a link to a file it is possible to rename the linked file. This means while the original file might be called "1956-04-13 Kennett John BANNISTER and Colleen Dawn WALTERS - Marriage Certificate.png", under the individual folders for Kennett John and Colleen Dawn the link might be renamed to simply "1956-04-13 Marriage Certificate.png".
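
If you prefer to create the links from a script rather than through the file manager, the idea can be sketched in Python roughly as follows. The helper link_source is a hypothetical name, and the demo builds everything in a temporary directory so nothing real is touched. It assumes a system where ordinary users may create symbolic links (macOS and Linux generally allow this). Note that this creates a filesystem-level symbolic link, which is a different mechanism from an alias made in the Finder, although it serves the same purpose here.

```python
import os
import tempfile
from pathlib import Path


def link_source(source: Path, person_dir: Path, link_name: str = "") -> Path:
    """Create a symbolic link to `source` inside `person_dir`, optionally renamed."""
    person_dir.mkdir(parents=True, exist_ok=True)
    link = person_dir / (link_name or source.name)
    # A relative target keeps the link valid if the whole family folder moves.
    target = os.path.relpath(source, start=person_dir)
    link.symlink_to(target)
    return link


# Demo in a throwaway directory (all paths below are illustrative).
root = Path(tempfile.mkdtemp())
cert_dir = root / "Sources" / "BMD" / "Marriage Certificates"
cert_dir.mkdir(parents=True)
cert = cert_dir / ("1956-04-13 Kennett John BANNISTER and "
                   "Colleen Dawn WALTERS - Marriage Certificate.png")
cert.write_bytes(b"placeholder image data")

bride = root / "People" / "WALTERS" / "Colleen Dawn WALTERS (1936 - living)"
link = link_source(cert, bride, "1956-04-13 Marriage Certificate.png")

# The link has the short name but opens the one original file.
print(link.name)
print(link.resolve() == cert.resolve())
```

Using a relative target is a design choice in this sketch: the links survive moving or copying the top-level family folder as a unit, but they break if a source file is later moved within Sources.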

So what does my People folder actually look like? Under the People folder I have a subfolder for each surname in my family tree. Within each surname folder I create folders for each person born with that name. Now it might be possible that at some point in my family tree I have two people sharing the same surname who are not directly related - this doesn't matter, they would both be placed under the same surname folder. These folders are not family groups, they are just names, just like the names in a phone directory.

In each surname folder I create a subfolder for each person with that surname. These folders have a specific naming convention:
<First name(s)> <SURNAME> (<born> - <died>)
(I tossed up whether to include the surname in the person's folder name, as it can be inferred from the parent folder, but decided to leave it in. You might choose otherwise.)

The <born> and <died> fields are the year only. If more than one person with the same surname shares the same birth and death years you could include the month and/or day to further differentiate them, but for most people the birth and death years should be enough to identify them. If the person is still living, then I use "living" for <died>, and if one or both of the dates is (as yet) unknown I simply use "unknown" for the year.

So some sample folder names would be:
Amos BANNISTER (unknown - 1782)
Amos Parker BANNISTER (1868 - 1954)
Amos Parker BANNISTER (1907 - 1979)
Amos Ross BANNISTER (1968 - living)
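
The folder naming convention is simple enough to express as a small function. This is only a sketch; person_folder and its parameters are hypothetical names, and it handles years only, not the optional month or day mentioned above.

```python
from typing import Optional


def person_folder(first_names: str, surname: str,
                  born: Optional[int] = None,
                  died: Optional[int] = None,
                  living: bool = False) -> str:
    """Build '<First name(s)> <SURNAME> (<born> - <died>)'."""
    # A missing year is written as "unknown".
    born_text = str(born) if born is not None else "unknown"
    if living:
        died_text = "living"
    else:
        died_text = str(died) if died is not None else "unknown"
    # Surnames are written in capitals, as in the samples above.
    return f"{first_names} {surname.upper()} ({born_text} - {died_text})"


print(person_folder("Amos", "Bannister", died=1782))
print(person_folder("Amos Ross", "Bannister", born=1968, living=True))
```

The two calls reproduce the first and last sample folder names above.
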
Here's where the symbolic links come into play. Within each person's individual folder, I create symbolic links to all the sources pertinent to that person. I can rename a link to a more user-friendly name if I choose, or I can leave the link with the original filename. Now when I want to find a particular document relating to a person, I can just drill down to that person's individual folder and I will see every source document for that person, sorted by date, so I can see their entire timeline at a glance. Opening a link opens the original document, no matter where in the Sources folder hierarchy the document actually lives.

So what does this look like? Here's a snippet of my family tree (which I am rebuilding) showing my father's folder as it currently looks. (Used with his permission.)

You can see how each source document appears in chronological order. The little arrow at the bottom left of each document icon indicates that it is a symbolic link and the original document lives somewhere else on disk. (When I get around to sorting through my photos, I will also be creating symbolic links to photos in each person's folder, so my father's folder will also contain links to all the photos he appears in.)

With this folder structure, file naming scheme and using symbolic links, I now have an easy way to find any document relating to an individual I want so long as I know their given name(s) and surname at birth. Of course there is the issue of what to do with people who change their name, whether through marriage, adoption or some other means? This is easily solved with symbolic links.

Let's look at marriage. Generally when two people get married, the woman will take her husband's surname as her own. In this case I create a symbolic link to the woman's individual folder (which lives under her maiden name's folder) and place that link in the folder for her new surname. So my mother's individual folder is "Colleen Dawn WALTERS (1936 - living)", which is located under the WALTERS surname folder, and I created a link to this folder under the BANNISTER folder. In the process I renamed the linked folder to "Colleen Dawn (WALTERS) BANNISTER (1936 - living)" to indicate her maiden name when I view her folder under the BANNISTER folder, but her original individual folder under WALTERS remains unchanged. So my People folder looks like this:

Now I can access my mother's source documents by drilling down through her maiden name (WALTERS) or her married name (BANNISTER). Whichever path I take, I will see all the same documents, covering her entire life. If she were to remarry, I would create a new link under her new married surname and she would appear in three places, all with the same documents accessible.
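
Linking a whole person folder under a second surname works the same way as linking a single file. Here is a minimal Python sketch, again in a temporary directory. The helper link_person_under_surname is a hypothetical name, and the sketch assumes a system where ordinary users may create symbolic links.

```python
import os
import tempfile
from pathlib import Path


def link_person_under_surname(person_dir: Path, surname_dir: Path,
                              link_name: str) -> Path:
    """Make `person_dir` also appear inside `surname_dir` as `link_name`."""
    surname_dir.mkdir(parents=True, exist_ok=True)
    link = surname_dir / link_name
    # Relative target, so the People tree can be moved as a unit.
    target = os.path.relpath(person_dir, start=surname_dir)
    link.symlink_to(target, target_is_directory=True)
    return link


# Demo: one real folder under WALTERS, one linked folder under BANNISTER.
people = Path(tempfile.mkdtemp()) / "People"
home = people / "WALTERS" / "Colleen Dawn WALTERS (1936 - living)"
home.mkdir(parents=True)
(home / "1956-04-13 Marriage Certificate.png").write_bytes(b"placeholder")

married = link_person_under_surname(
    home, people / "BANNISTER",
    "Colleen Dawn (WALTERS) BANNISTER (1936 - living)")

# Both paths list the same contents because there is only one real folder.
print(sorted(p.name for p in married.iterdir()))
print(sorted(p.name for p in home.iterdir()))
```

A further marriage would just be another call with a different surname folder and link name; the real folder under the birth surname is never touched.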

There is one special folder under People called "zz Unknown". The zz Unknown folder is a temporary holding cell for people whose surname I have not yet identified. For example, one of my ancestors, Amos Bannister, was born in 1771 and his parents were listed as Amos Bannister and Catherine. I have not yet found any other information about Catherine, so I do not know her maiden name, and she gets a folder in the zz Unknown folder until I find more information. Once I do identify who Catherine was, I would create a new folder for her surname, then move her individual folder into the correct location.

The Places folder uses symbolic links in a similar way, to group sources relating to a place, such as maps, pictures, histories, etc. The subfolder hierarchy for Places is broken down by Country, State, County, Town and optionally folders for individual sites, such as churches or houses. I haven't actually created many folders in my Places folder yet, but as a rough guide the folder hierarchy might look like this:

  • Places
    • Australia
      • Victoria
        • Hotham
        • West Brunswick
          • 33 Burnell Street
    • UK
      • England
        • Lancashire
          • Stretford
            • St Matthews
            • Edge House
      • Ireland
        • Cavan
          • Bailieborough
      • Scotland
        • Moray
          • Elgin


So that's an overview of my file and folder naming scheme. It might sound complex, but I find it to be quite a powerful system. When I get a new source document, I immediately name it appropriately and place it in the correct subfolder under the Sources folder. As I add a source to a person (or place) I create a link to the source document under the relevant person's (or place's) individual folder. I find if I do this as I get sources and add them to people, the folders stay in sync with my family tree and I can quickly and easily locate any document for any person (or place) without any problems.

Previously I had documents scattered all over the place and it was almost impossible to find what I wanted when I wanted it. A side effect of my previous "system" was massive duplication of files. I had multiple copies of the same file stored under different people, and when I was using family groups as the core of my folder structure I had no end of problems working out where people belonged when they married, divorced and remarried. Now every source document belongs in only one place and every individual has only one folder in a clearly defined place. By using symbolic links I can access these documents and people in various locations, but when I create a new person or a new document there is only ever one place that person/document could be created. This has helped me no end in keeping my files in order.

I would love to hear your feedback on this system. Do you have any questions about how it works or where certain documents would be stored? Can you see any problems with what I have described? Does it make sense to you or not? What file storage/naming scheme do you use for your digital files?

6 comments:

  1. Amos,

    I want to let you know that your blog is listed in today's Fab Finds post at http://janasgenealogyandfamilyhistory.blogspot.com/2015/01/follow-friday-fab-finds-for-january-9.html

    Have a great weekend!

  2. I have looked through your blog posts regarding naming folders and files. I am participating in the Do-Over and LOVE the system you have devised. Being able to put all of my original files in only a few folders and then use the aliases to distribute to all of the sub-folders is brilliant!! Thank you so much for sharing this in the Bag the Web!!

    Replies
    1. Thank you for the feedback - I'm glad you like my system. It works well for me and I hope you are able to a) understand it; and b) use it and tweak it to suit your own needs. ;^)

  3. I do understand your system. The one struggle I am having is that my mother's family is Norwegian, so I need to deal with farm names and patronymics. I am trying to work out whether to keep all direct lines intact, even though the "surnames" change, or whether to use farm name at birth as "surname" which breaks the lines. Any thoughts? Thank you.

    Replies
    1. I am not familiar with Norwegian naming conventions, so I cannot help directly with this, but I can perhaps give you a few ideas to consider.

      The key to my system is that everything has one canonical location. That is, every source document has one and only one place where the original file belongs and every person has one and only one "home folder" where their documents and pictures can be found. Where a document needs to be referenced in multiple places, an alias is used so there is no duplication of files. Similarly, when a person changes their name, aliases are used so that person's associated documents can be found no matter which name you are searching for. I guess in your case you need to decide what that canonical location should be for a given person.

      I suppose in a way my use of surname folders is just a convenience for me - they are not used to represent family groups or any connection between the people in a particular folder. I could just as easily leave out that level of the folder hierarchy and dump all the person folders under the People folder itself. If I did this, the name of each person's folder would change slightly to be "<SURNAME> <First names> ..." so the folders would sort in a logical (to me) order. People with multiple surnames over their lives would appear multiple times, but only one of the folders would be a "true" folder and all the others just aliases to that original.

      So using this style, my People folder would look something like:

      People
      + BANNISTER Amos Parker (1868 - 1954)
      + BANNISTER Amos Parker (1907 - 1979)
      + BANNISTER Amos Ross (1968 - living)
      + BANNISTER Colleen Dawn (WALTERS) (1936 - living)
      + BANNISTER Glynda Mavis (unknown) (unknown - unknown)
      + BANNISTER Kennett John (1935 - living)
      + BANNISTER Kennett John (1956 - living)
      + WALTERS Colleen Dawn (1936 - living)
      + zz UNKNOWN Glynda Mavis (unknown - unknown)

      (The two BANNISTER folders with a maiden name in brackets, Colleen Dawn (WALTERS) and Glynda Mavis (unknown), are aliases.)

      You know, I could also create aliases to my mother's and father's folders and call them "Mum" and "Dad" with this modification and it would still be kosher. The key is to only have one "real" folder for each person and as many aliased folders as you want/need to be able to find them. Remember, this isn't a substitute for a family tree or a database, it is just a way for me to be able to quickly and easily locate a file related to a particular person. ;^)

      Hope that helps. 8^)
