Computing - Introduction

On these pages, I discuss some issues involved in using computers in genealogy research. Since I'm a computer geek, the material is somewhat technical and is aimed at genealogists who aren't afraid to do a bit of programming themselves. There is a lot of commercial genealogy software available, and not all of it is worth the money. With some knowledge of the Gedcom format and of programming, you can develop your own tools without spending any money.

On other pages, I describe how to process Gedcom files using the Python and Perl programming languages. You may already know that Perl is very commonly used for programming web pages. (I use some Perl myself for these pages!) But its string-manipulation features also make it very useful in other application domains, such as genealogy.

Unfortunately, Perl is a difficult language to master, even for experienced programmers. Python has many of Perl's advantages in a form that is much easier to learn, and some programmers even claim to be ten times more productive in Python. So I now do all of my new genealogy programming in Python.

I'm not going to offer a full description of the Gedcom format, but I hope this overview is enough to get you started.

In addition, I offer a couple of my own Perl programs for performing various tasks on Gedcom files. The most ambitious of these is mkfamweb, which produces family group sheets in HTML format, suitable for publishing genealogy data on a web page.

Gedcom Basics

A Gedcom file is simply a plain text file and can be edited directly with a simple text editor. This is normally not a good idea, however: records refer to one another through labels, and a hand edit can easily break those links. So programs are normally used to manipulate the data.
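To illustrate why hand editing is risky, here is a minimal Python sketch that scans a file for pointers (values such as @F141@) that refer to a label no record defines. It assumes labels appear only on level-0 lines and that a pointer is the entire text of its line; the function name dangling_pointers is my own, not part of any library.

```python
import re
from typing import Iterable, List, Tuple

# A pointer is a whole value of the form "@...@", e.g. "@F141@".
# Assumption: pointers appear alone as a line's text (FAMS, FAMC, HUSB, WIFE, CHIL).
POINTER = re.compile(r"@[^@]+@")


def dangling_pointers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, pointer) pairs for pointers that match no defined label."""
    defined = set()
    used = []
    for number, line in enumerate(lines, start=1):
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 2:
            continue  # blank or malformed line
        if parts[0] == "0" and POINTER.fullmatch(parts[1]):
            defined.add(parts[1])  # "0 @I226@ INDI" defines the label @I226@
        elif len(parts) == 3 and POINTER.fullmatch(parts[2]):
            used.append((number, parts[2]))  # "1 FAMS @F141@" uses @F141@
    # Labels may be defined after they are used, so compare only at the end.
    return [(number, ptr) for number, ptr in used if ptr not in defined]


# Made-up sample: the CHIL line points at @I9@, which is never defined.
sample = [
    "0 @I1@ INDI",
    "1 FAMS @F1@",
    "0 @F1@ FAM",
    "1 HUSB @I1@",
    "1 CHIL @I9@",
]
print(dangling_pointers(sample))  # [(5, '@I9@')]
```

A check like this is cheap to run after any manual change to a file.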

A Gedcom file contains a list of records. Each record consists of the following:

Level number
The top-level records are numbered 0, and subsidiary records are numbered 1, 2, 3, 4, and so on. The records are organized hierarchically: a record with level number n+1 is subsidiary to the nearest preceding record with level number n.
Label
A label (such as @I226@) is required for all level-0 records except the HEAD and TRLR records. It is omitted for all other records.
Tag
The tag identifies the type of information the record specifies. Some examples are INDI for an individual and BIRT for a birth event.
Text
The format of the text varies with the type of record. For example, the text for CHIL, HUSB, and WIFE records is a label that identifies an individual.
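The record layout and the level-number rule can be sketched in a few lines of Python. In this minimal sketch, parse_line splits one line into level number, label, tag, and text, and build_tree uses a stack so that each record is nested under the nearest preceding record one level up. All names are my own, the code assumes single-space separators, it does no validation of tags or text, and the sample is made-up data.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, NamedTuple, Optional


class GedcomLine(NamedTuple):
    level: int            # 0 for top-level records; 1, 2, ... for subsidiary ones
    label: Optional[str]  # e.g. "@I226@"; None when the line has no label
    tag: str              # e.g. "INDI", "BIRT", "DATE"
    text: str             # everything after the tag; "" when there is none


def parse_line(line: str) -> GedcomLine:
    """Split one Gedcom line into level number, label, tag and text."""
    # Assumption: fields are separated by single spaces.
    level_str, _, rest = line.strip().partition(" ")
    label = None
    if rest.startswith("@"):
        label, _, rest = rest.partition(" ")
    tag, _, text = rest.partition(" ")
    return GedcomLine(int(level_str), label, tag, text)


@dataclass
class Record:
    line: GedcomLine
    children: List["Record"] = field(default_factory=list)


def build_tree(lines: Iterable[str]) -> List[Record]:
    """Attach each level n+1 record to the nearest preceding level n record."""
    roots: List[Record] = []
    stack: List[Record] = []  # stack[n] is the most recent record at level n
    for raw in lines:
        if not raw.strip():
            continue  # skip blank lines
        record = Record(parse_line(raw))
        level = record.line.level
        del stack[level:]  # close records at this level or deeper
        if level == 0:
            roots.append(record)
        elif len(stack) == level:
            stack[-1].children.append(record)
        else:
            raise ValueError(f"level {level} record has no level {level - 1} parent")
        stack.append(record)
    return roots


# Made-up sample data, not taken from a real file.
sample = """0 @I1@ INDI
1 NAME John/Smith/
1 BIRT
2 DATE 1 JAN 1900
1 FAMC @F1@""".splitlines()

person = build_tree(sample)[0]
print(person.line.label)                              # @I1@
print([child.line.tag for child in person.children])  # ['NAME', 'BIRT', 'FAMC']
print(person.children[1].children[0].line.text)       # 1 JAN 1900
```

Because a Gedcom file is only ever nested one level at a time, a single stack indexed by level number is enough; no look-ahead is needed.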

Here's an example of part of a Gedcom file:

0 @I226@ INDI
1 NAME Evert/van Koot/
1 NOTE Occupation: shopkeeper.
1 BIRT
2 DATE ABT 1808
2 PLAC Nijkerk, Gelderland, Netherlands
1 DEAT
2 DATE 7 NOV 1889
2 PLAC Nijkerk, Gelderland, Netherlands
1 FAMS @F141@
1 FAMC @F63@
0 @I227@ INDI
1 NAME Aaltje/van Koot/
1 BIRT
2 DATE 7 JUL 1811
2 PLAC Nijkerk, Gelderland, Netherlands
1 FAMC @F63@

This fragment describes two individuals, Evert van Koot and Aaltje van Koot. In the NAME records, the surname is delimited by slashes.
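Pulling the surname out of a NAME value is a typical small string-handling task. Here is a minimal Python sketch; split_name is a hypothetical helper, and it assumes the surname is whatever lies between the first pair of slashes, returning any text after the closing slash separately.

```python
from typing import Tuple


def split_name(value: str) -> Tuple[str, str, str]:
    """Split a NAME value into (given names, surname, text after the surname).

    Assumes the surname is the text between the first pair of slashes;
    a value without slashes is treated as having no marked surname.
    """
    before, slash, rest = value.partition("/")
    if not slash:
        return value.strip(), "", ""
    surname, _, after = rest.partition("/")
    return before.strip(), surname.strip(), after.strip()


print(split_name("Evert/van Koot/"))   # ('Evert', 'van Koot', '')
print(split_name("Aaltje/van Koot/"))  # ('Aaltje', 'van Koot', '')
```

Keeping multi-word surnames such as "van Koot" intact is exactly why the slashes are there: splitting on spaces alone would not tell the given name from the surname.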

Note the FAMS and FAMC tags. The FAMS record gives the label of the FAM structure that describes the family formed by the individual, the individual's spouse, and their children. The FAMC record gives the label of the FAM structure that describes the family in which the individual is a child. Those FAM structures contain HUSB, WIFE, and CHIL records that point back to the individual.
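Following these pointers in both directions is how a program answers questions such as "who are this person's parents?". Here is a minimal Python sketch: it indexes only the level-1 lines of labelled records, the helper names (index_records, values, parents, spouses, siblings) are hypothetical, and the sample uses made-up labels rather than the van Koot records.

```python
from typing import Dict, Iterable, List, Tuple

# label -> list of (tag, text) pairs for that record's level-1 lines
Index = Dict[str, List[Tuple[str, str]]]


def index_records(lines: Iterable[str]) -> Index:
    """Map each labelled level-0 record to the (tag, text) pairs of its level-1 lines."""
    records: Index = {}
    current = None  # label of the level-0 record we are currently inside
    for line in lines:
        parts = line.strip().split(" ", 2)
        if parts[0] == "0":
            has_label = len(parts) > 1 and parts[1].startswith("@")
            current = parts[1] if has_label else None
            if current:
                records[current] = []
        elif parts[0] == "1" and current and len(parts) > 1:
            records[current].append((parts[1], parts[2] if len(parts) > 2 else ""))
    return records


def values(records: Index, label: str, tag: str) -> List[str]:
    """Texts of the level-1 lines with the given tag inside the record `label`."""
    return [text for t, text in records.get(label, []) if t == tag]


def parents(records: Index, person: str) -> List[str]:
    """HUSB and WIFE of every family in which `person` is a child (FAMC)."""
    found: List[str] = []
    for fam in values(records, person, "FAMC"):
        found += values(records, fam, "HUSB") + values(records, fam, "WIFE")
    return found


def spouses(records: Index, person: str) -> List[str]:
    """The other partner in every family where `person` is a spouse (FAMS)."""
    found: List[str] = []
    for fam in values(records, person, "FAMS"):
        partners = values(records, fam, "HUSB") + values(records, fam, "WIFE")
        found += [p for p in partners if p != person]
    return found


def siblings(records: Index, person: str) -> List[str]:
    """Other CHIL entries of every family in which `person` is a child."""
    found: List[str] = []
    for fam in values(records, person, "FAMC"):
        found += [c for c in values(records, fam, "CHIL") if c != person]
    return found


# Made-up sample: two parents (@I1@, @I2@) and two children (@I3@, @I4@).
sample = """0 @I1@ INDI
1 FAMS @F1@
0 @I2@ INDI
1 FAMS @F1@
0 @I3@ INDI
1 FAMC @F1@
0 @I4@ INDI
1 FAMC @F1@
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
1 CHIL @I4@""".splitlines()

records = index_records(sample)
print(parents(records, "@I3@"))   # ['@I1@', '@I2@']
print(spouses(records, "@I1@"))   # ['@I2@']
print(siblings(records, "@I3@"))  # ['@I4@']
```

A family group sheet is essentially these three lookups combined for one FAM record.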

Links to more information

The GEDCOM Standard provides detailed information about Gedcom release 5.5. Needless to say, this is an invaluable resource.