Randy's Genealogy Program Wish List
by Randy Wilson
(randy@axon.cs.byu.edu),
September 1998.
Introduction
As I've done genealogical research over the past several years, I have
seen that technology has helped make the process much easier, but I am
frequently impressed by how much better we can still do. Although
computers have helped people organize and share data, it seems that
deficiencies in current genealogical software are keeping us from saving
people as much time as we could, and may in some cases cause data to be
lost or confused by not making it easy enough to include all the
information we have available. For example, we sometimes have to cram our
data into fewer characters than we need, or we fail to specify the source
of our data because doing so is too time-consuming and takes up too much
hard drive space.
Current Software
After having used Personal Ancestral File (PAF) on the Mac for several
years, I have been pleased with its stability and some of its features,
but there are quite a few features it lacks, many of which would be simple
to implement, and others of which are non-trivial. I have tried demo
versions of other genealogy programs as well, and some, such as Reunion,
have several features that I would like, including handling photos,
printing editable charts, etc. But even Reunion is missing some features
handled well by PAF, such as handling of LDS ordinances in a
straightforward fashion. In addition, many of the features I would like
are not currently available in any genealogy program.
Below is a wish list of features I would like to see in a genealogy
program. Although I would like to write the program myself ("If you need
something done right..."), I currently don't have nearly the time it would
take to complete the project. I hope that by making this list publicly
available I can encourage others to incorporate these features into
existing or new genealogy programs.
I would be interested in hearing which of these features are already
available in currently-existing programs. I would also like to hear from
anyone who feels like they would like to incorporate these features into
their software or start a new software project. That way if anyone else
expresses a similar interest, I can link you together and make sure we
aren't duplicating work. I would be happy to work on some of the trickier
algorithms and on porting stuff to the Macintosh, as well as help with the
initial design and other aspects of the project.
Your Ideas Wanted
If you have any other features that you have wished for, please send me an e-mail and I'll probably add
it to the list with credit given to you, if you wish.
Cool Genealogy Program Features
Import/Export
Though the GEDCOM standard is supposed to facilitate the exchange of
genealogical data, over half of the files I have received from other
researchers have not been usable to me without doing some tricky
preprocessing of the data. More powerful I/O features such as those below
would help make the sharing of data more convenient.
- Able to import any
valid GEDCOM file.
PAF cannot read GEDCOM files written by several other programs,
including Family Tree Maker. It fails to follow some of the
cross-reference tags used by other programs, doesn't handle leading white
space (tabs and spaces) at the beginning of lines (used by some programs
for indentation), and doesn't handle the different kinds of carriage
return/line feeds used on different platforms properly, making it almost
impossible to import valid GEDCOM files exported by several other
programs. It also doesn't recognize a host of valid GEDCOM events (e.g.,
OCCU), but instead just puts them into the notes along with an error
message. Some other genealogy programs have similar problems.
- Able to export and then import GEDCOM files without losing or changing
data.
PAF doesn't use the "CONC" tag on notes, so when you export a PAF
database to a GEDCOM file and then read it back in, the notes end up
having hard returns at the ends of lines instead of only at the end of
paragraphs like they had originally. A genealogy program should encode
data such that exporting it and importing it results in the exact same
file.
- Able to import
other formats such as PAF, Reunion, FTM, and other popular
formats.
While most programs can export and import GEDCOM files, it would
of course be more convenient to skip those steps and be able to import
directly from another program's format. Furthermore, sometimes
researchers send a database on a floppy disk or by e-mail that is in one
of these other formats either because it is more convenient, or they're
new at using the software and don't know how to export a GEDCOM. In such
cases it would be nice to be able to decode the information.
- Able to export
other formats such as PAF, Reunion, FTM, etc.
It might be convenient to export directly into another program's
format for easier sharing between programs. In addition, some other
programs may have trouble handling GEDCOM files that are not done in a
particular way, and may not handle some of the tags properly. In such
cases it would be nice for one's program to be able to export directly
into another format so that intelligent decisions can be made as to how to
translate from the internal format into what the other formats can
handle.
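The two GEDCOM complaints above (indentation and line endings on import, and notes that change on a round trip) can be illustrated with a minimal Python sketch. It is not how PAF or any other program works internally. The function names `parse_gedcom_lines`, `encode_note` and `decode_note` and the 60-character chunk width are assumptions made for illustration. The sketch relies only on the GEDCOM convention that CONT starts a new line of a note while CONC continues the same line.

```python
import re

# One GEDCOM line: level, optional @XREF@ id, tag, optional value.
LINE_RE = re.compile(r"^(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")

def parse_gedcom_lines(text):
    """Yield (level, xref, tag, value) tuples from raw GEDCOM text."""
    # splitlines() accepts \n, \r\n and bare \r (among other separators),
    # so Unix, DOS and classic Mac files are all handled.
    for raw in text.splitlines():
        line = raw.lstrip(" \t")  # tolerate indentation added by some exporters
        if not line:
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"not a GEDCOM line: {raw!r}")
        level, xref, tag, value = match.groups()
        yield int(level), xref, tag, value or ""

def encode_note(note, level=1, width=60):
    """Encode a note as NOTE/CONT/CONC lines (width is an arbitrary choice)."""
    out = []
    for p, paragraph in enumerate(note.split("\n")):
        chunks = [paragraph[i:i + width]
                  for i in range(0, len(paragraph), width)] or [""]
        for c, chunk in enumerate(chunks):
            if p == 0 and c == 0:
                out.append(f"{level} NOTE {chunk}")
            elif c == 0:
                out.append(f"{level + 1} CONT {chunk}")  # a real line break
            else:
                out.append(f"{level + 1} CONC {chunk}")  # same paragraph goes on
    return "\n".join(out)

def decode_note(gedcom_text):
    """Rebuild the note text written by encode_note."""
    note = ""
    for _level, _xref, tag, value in parse_gedcom_lines(gedcom_text):
        if tag == "NOTE":
            note = value
        elif tag == "CONT":
            note += "\n" + value
        elif tag == "CONC":
            note += value
    return note
```

With this encoding, `decode_note(encode_note(note))` gives back the original note, with hard returns only where the paragraphs ended. A real exporter would probably also avoid splitting a chunk right at a space, since some readers trim whitespace from values.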
Output Formats
- Able to view data on the
screen in many formats.
I have often wanted to get a quick overview of what information I
have on a family or set of individuals. For example, when looking over
someone's e-mail message containing a bunch of information on a line, I
would like to be able to look at the information for an entire family
without having to click on each individual and traverse through them.
Some printed reports may give me what I want, but I often do not want to
waste paper printing something that I should be able to look at on the
screen. PAF provides a "print preview" option, but it is painful to use
because it is geared towards the printed page. I would like to see a
variety of ways to display data which also allow you to click on
individuals to navigate through the data and edit or add information.
Following are some suggested formats that should be provided to help
people have a better interface to their data.
- Current PAF-like "questionnaire" format.
Having a form to fill out with blanks for given names; surnames;
title; and birth, death and burial dates and places is a convenient and
straightforward way to enter data. The default fields displayed there,
however, should be customizable, with other fields selectable. For
example, PAF includes "Christening" as one of the events though I have
rarely needed this. It would have been more convenient to leave that out
unless I selected it specifically from a list of other events. As another
example, Reunion allows you to select events from a list, but I didn't see
a way to change what the default fields are for each individual. That
made it very inconvenient to enter LDS ordinances.
- Pedigree chart
with information displayed.
Pedigree charts should display as much information as will fit
on the screen, even if something needs to be nudged a bit. For example, if
a place name is too long to fit in the given space, it should be wrapped
to another line and/or printed in a smaller font. Chances are that by
shifting that individual up a little or the following one down a little
they can all still fit on the page. Pedigree charts should be allowed to
have blank fields such as "b." so that they can be clicked on and filled
in. To allow more generations to fit on the screen, a "compressed"
pedigree chart should also be available that displays only the information
that is available for a person (e.g., "b. 23 March 1803, Lee Co., VA; d.
VA"). Clicking on the person could then allow additional fields to be
entered.
- Ancestry chart
(compact pedigree chart).
Ancestry charts show a person's ancestors in a compact format so
as to fit as many on the screen as possible. One possible format is one
horizontal line per individual, with as much information as desired about
the individual flowing to the right (e.g. "William WILSON, b. 1747; m.
1773, Jackson Co., GA; d. May 1802, Jackson Co., GA" in one line, even if
it goes off the end of the screen). Another format is to do the same
thing but wrap lines to fit on the screen, likely requiring a vertical
scrollbar. Another option is to compact things even tighter by having two
parents a half line above and below their child and only put other
information about a person when there is room and/or when clicked on.
- Family group
sheet.
Family group sheet like that displayed in PAF (with fields for
all the "default" fields used in the questionnaire display, even if some
fields are blank) or Reunion (with fields only for the information that a
person has data for).
- Descendant
chart
Descendant charts can be done as a hierarchy (like in PAF), as a
registry report, modified registry, or as a graphical tree (like a
backwards ancestry chart).
- Hourglass
chart
Chart in which ancestors are displayed to the right, and
descendants are displayed to the left of an individual. More information
could be displayed for the first generation or two where there is more
room.
- Everybody
chart.
This chart includes everyone in the entire database, beginning
with a particular individual. Ancestors go off to the right like an
ancestry chart, descendants fan off to the left, and descendants of
ancestors are also included. This is a tricky chart to do, but I have
pretty much worked out the algorithm for it. Perhaps more useful would be
a subset of this chart in which a person is displayed attached to their
relatives (parents, grandparents, aunts, uncles, cousins, children,
siblings, in-laws, nephews, nieces, etc.), who are added in order of
importance until there is no more room.
- Text/book
form.
Another useful format for a report would be as a text report
similar to that used in printing a book. Information on families can be
summarized there, along with notes and stories, and the information can be
edited there and/or links to other individuals in the "book" can be
followed.
- Ahnentafel
chart.
I haven't used these myself, but they're popular, and so they
should also be included, along with "tiny tafel" reports.
- Custom formats,
programmable by the user.
A powerful option would be to allow users to specify ("program")
their own custom views, determining who should be displayed where and what
information should be included with each individual. This would allow
someone to take one of the above formats and modify it for their
particular style, and/or create an entirely new way to approach the data.
Such additional views could then be shared with other users of the program
as add-ons.
- Reports imported from Lifelines.
There are many useful output report scripts for use by the
"Lifelines" unix genealogy program. Some of these may be usable as
additional ways to view and edit data.
Able to edit
information in any of the above formats.
In addition to being able to view information in the above
formats, the user should be able to edit information directly from within
any of these views. In some cases clicking on an individual could cause a
box in the upper-left portion of the screen to display the person's more
complete information (such as in a descendant chart, where there would not
be room to include everyone's information). In certain compact views,
too, there would need to be a way to add a new piece of information that
is not displayed, since there wouldn't be a way to click on a piece of
information that isn't there. For example, a person's parents, spouse,
birth place, etc., should all be able to be entered whenever that person
is visible on the screen. It has been annoying when entering information
from a book into PAF to have to go through the data twice--once to get the
individual's information, and then again later to get the spouse's
information, since you can't go to the spouse of a child when in family
view.
Able to follow links
to people displayed in any of the above formats.
As mentioned above, you should be able to click (or double-click)
on any individual in any of the above views to make that the "main" person
and/or edit that person's information, add relatives to that person,
etc.
Able to output hyperlinked
WWW pages in any of the above formats.
The program should also be able to generate WWW pages in any of
the above formats with individuals hyperlinked to their 'main' record in
the report, which in turn can have links to that person's corresponding
location in other reports. For example, in a registry report, each
individual has a main entry where their birth, death and marriage
information is listed, and where their children are listed. Clicking on
any child should take you to the page (or the position within the report)
of the child's main record.
Able to print reports
in any of the above formats.
Of course the program should be able to print any of the above
reports to a printer. Where a high-resolution printer is available,
compressed letters should be used to make information fit when necessary
instead of chopping off (abbreviating) words. Also, multi-page charts (to
be taped together or printed to a postscript file and rendered on a
plotter) should be allowed.
Able to copy text from
such reports to paste into e-mail messages.
Able to e-mail a
report on one or more individuals to someone.
I am often contacted by someone who is (or might be) related to a
particular individual. It would be nice to decide on a particular report
that can be generated in such situations (such as a book-like report on
the individual's spouse, children, grandchildren, and all his/her
ancestors and their children) and e-mailed or printed out instead of
having to build such a report by hand in this common situation.
Have a "main" person (e.g., the researcher) in the database and
easy ways to follow links back
to the main person.
It is easy to get lost in a database. While it is easy to follow
a person's ancestors, it is not so easy to follow the trail of descendants
back (at least in PAF). A "Go back" navigation button would be a good
start. However, if quite a bit of moving around has been done, what is
often wanted is just an indication of which way to go to get back "home".
Also, when I look up a name out of a list, I want to know how that person
relates to the "main" person in the database, where the "main" person
could be yourself, the closest relative you have in a database, the main
ancestor the database is constructed around, etc. It should be possible
to change the "main" person easily, too. Once the main person is set,
bold lines can show which way to go to get back to the main person using
the most direct relationship available. These can also serve as the
default person to move to in any particular direction when navigating with
cursor keys or similar methods.
Able to select trees,
bushes, families, etc.
It would be nice to be able to select an individual's ancestors or
descendants. It is also convenient, though, to select their ancestor
"bush", including their ancestors and everyone else those ancestors are
related to except for the original individual and anyone whose
relationship goes through that individual. It may be necessary to check
for double relationships to keep a connection in one ancestor bush from
causing the entire database to be selected. In such cases the selection of
a bush can stop when it tries to branch out to individuals that are more
closely connected to the main person than the current person being
considered. For example, if I have a large database and find a relative
of my paternal grandfather who is interested in that line but not the
other three-fourths of my database, it would be nice to select my
grandfather and say "select this person's ancestors and all their
relatives" and have it completely traverse the tree but without going to
any of my grandfather's descendants. It should also be possible to add
other groups together into a set of selected individuals.
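The bush selection described above can be sketched in Python as a breadth-first walk over parent/child links. It starts from the person's parents, never steps onto the person, and refuses to step toward anyone who is closer to the person than the current individual (one reading of the double-relationship rule above). The `{child: [parents]}` data layout and the function names are assumptions for illustration. Spouse links are left out to keep the sketch short.

```python
from collections import deque

def build_graph(parents):
    """Undirected parent/child adjacency map from {child: [parents]}."""
    graph = {}
    for child, child_parents in parents.items():
        graph.setdefault(child, set())
        for parent in child_parents:
            graph.setdefault(parent, set()).add(child)
            graph[child].add(parent)
    return graph

def distances(graph, start):
    """Number of parent/child steps from start to everyone reachable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in graph.get(current, ()):
            if other not in dist:
                dist[other] = dist[current] + 1
                queue.append(other)
    return dist

def select_bush(parents, person):
    """Ancestors of person plus their relatives, never passing through person."""
    graph = build_graph(parents)
    dist = distances(graph, person)
    selected = set(parents.get(person, []))
    queue = deque(selected)
    while queue:
        current = queue.popleft()
        for other in graph.get(current, ()):
            if other == person or other in selected:
                continue
            # Double-relationship guard: do not branch toward people who are
            # more closely connected to the person than the current one.
            if dist[other] < dist[current]:
                continue
            selected.add(other)
            queue.append(other)
    return selected
```

For the grandfather example, `select_bush(parents, "grandpa")` would return his parents, his siblings and their descendants, but not his own children or grandchildren.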
Able to cut, copy, paste, export, generate reports and otherwise operate on
selections.
Once a tree, bush, etc., has been selected, you should be able to
cut or copy the individuals from one database and paste them into another
database or another portion of a database. They could be pasted by
merging one of the new individuals with an existing one, by just
connecting one of the new individuals to an existing relative, or by
adding them in unconnected and allowing a manual connection to take place
later if desired.
Have undo/redo
list.
If you change several pieces of information and then realize that
you didn't want to do that, it would be nice if a multiple undo function
was available, along with a redo function to allow you to redo things in
case you did "undo" a few too many times.
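A multiple undo/redo list of the kind described above can be sketched with two stacks. This is only an illustration in Python. The `History` class and the idea of storing each person as a plain dictionary are assumptions, not a description of any existing program.

```python
_MISSING = object()  # marks "field did not exist before the edit"

class History:
    """Multi-level undo/redo for field edits on dict-based records."""

    def __init__(self):
        self._undo = []
        self._redo = []

    @staticmethod
    def _apply(record, field, value):
        if value is _MISSING:
            record.pop(field, None)
        else:
            record[field] = value

    def set_field(self, record, field, new_value):
        old_value = record.get(field, _MISSING)
        record[field] = new_value
        self._undo.append((record, field, old_value, new_value))
        self._redo.clear()  # a fresh edit invalidates the redo list

    def undo(self):
        if not self._undo:
            return False
        record, field, old, new = self._undo.pop()
        self._apply(record, field, old)
        self._redo.append((record, field, old, new))
        return True

    def redo(self):
        if not self._redo:
            return False
        record, field, old, new = self._redo.pop()
        self._apply(record, field, new)
        self._undo.append((record, field, old, new))
        return True
```

Each edit goes through `set_field`, so `undo()` can be called as many times as there are edits, and `redo()` brings them back one at a time if you went too far.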
Multiple
databases opened at once.
You should be able to look at two or more databases at once, copy
& paste (or drag & drop) individuals between them, etc. PAF currently
only allows one database open at a time, so you need two copies of the
application on your hard drive running at the same time to look at two
databases simultaneously. Even then some of the modal dialogs make it a
pain.
Multiple
windows/views available at once.
You should be able to have several views going at once. It is
often helpful to, for example, look at a pedigree view in one window and a
descendant view in another while typing notes about a tricky
situation.
Keep track of all
GEDCOM-tagged events (not just in notes).
All known GEDCOM tags should be available and translated into
human-readable form. GEDCOM-tagged events should be displayed as events,
not notes, except for actual note events, which still are interpreted as
such. Even unknown GEDCOM tags can be displayed as events, just with an
ugly 4-character description. The user should be allowed to add an
English name to display for new events (along with a warning that nobody
else's genealogy program will know what that event means).
Long enough "place"
fields for place names.
PAF currently uses four 16-character fields for place names. This
makes it impossible to enter some place names correctly ("Rose Hill
Cemetery", for example). Strangely, some experiments I ran showed it
would actually make PAF databases smaller to use a single
64-character field instead of four 16-character fields because of all the
extra (often-unused) pointers that are currently used. It has been a
bummer that there isn't enough room for some fields (like the cemetery
name above), while other fields are short ("VA") or empty. Reunion seems
to do a better job, treating the whole place name as one field that
happens to have commas in it.
Smart line-completion, especially for place
names.
PAF lets you hit "escape" to look for the place name you have
started to type, but that only helps with the current field (i.e., the
city OR county OR state, not the whole place name), making it pretty much
useless. Again, Reunion (and perhaps others) guess at what place you mean
as you type, allowing you to save typing. Such guessing should attempt to
use the most recently-used place name that matches the characters typed so
far.
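Completion against the most recently used place name could look something like the Python sketch below. The `PlaceCompleter` class is hypothetical. It matches against the whole place name rather than a single field, which is the behavior asked for above.

```python
class PlaceCompleter:
    """Suggest the most recently used place that starts with the typed text."""

    def __init__(self):
        self._recent = []  # most recently used first

    def use(self, place):
        """Record that a place name was just entered or accepted."""
        if place in self._recent:
            self._recent.remove(place)
        self._recent.insert(0, place)

    def complete(self, typed):
        """Return the best guess for the text typed so far, or None."""
        prefix = typed.casefold()
        for place in self._recent:
            if place.casefold().startswith(prefix):
                return place
        return None
```

After entering "Jonesville, Lee Co., VA" and then "Jackson Co., GA", typing "J" would suggest the Georgia county, and typing "Jo" would switch to Jonesville.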
Able to reorder
children by birthdate.
This currently has to be done by hand in most programs, and it
would be easy to have a command to reorder the children by birth date.
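As a rough illustration of how simple such a command could be, here is a Python sketch that sorts children by whatever part of the birth date is known. The dictionary layout and the loose date parsing (forms like "23 March 1803", "May 1802" or "1747") are assumptions. Children with no usable date are left at the end in their original order.

```python
MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

def date_key(text):
    """Sort key for loosely written dates; unknown dates sort last."""
    day = month = 0
    year = None
    for word in (text or "").replace(",", " ").split():
        if word.isdigit():
            if len(word) >= 3:
                year = int(word)   # three or more digits: treat as a year
            else:
                day = int(word)    # one or two digits: treat as a day
        elif word[:3].lower() in MONTHS:
            month = MONTHS[word[:3].lower()]
    if year is None:
        return (1, 0, 0, 0)
    return (0, year, month, day)

def sort_children(children):
    """Return the children ordered by birth date (stable for ties)."""
    return sorted(children, key=lambda child: date_key(child.get("birth")))
```

Because Python's sort is stable, children with identical or missing dates keep the order they were entered in.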
Sanity-checking to
make sure dates, names and places seem to make sense.
For example, the program could check to make sure people don't die
before they're born; don't have children before age 10 or after age 90;
aren't born before their parents; aren't married before age 10 or after
age 90. The program shouldn't prevent someone from entering this
information if they really want to, but should warn the user of
strange-looking information when it is entered. The program should also
have a command to do sanity-checking on the whole database.
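The checks listed above are easy to express as rules that produce warnings rather than errors. The Python sketch below works on years only, and the field names (`birth`, `death`, `marriage`) and the `check_person` function are assumptions for illustration. The age limits of 10 and 90 are the ones suggested above.

```python
MIN_AGE, MAX_AGE = 10, 90

def check_person(person, parents=(), children=()):
    """Return a list of warnings about strange-looking dates (years only)."""
    warnings = []
    birth = person.get("birth")
    death = person.get("death")
    marriage = person.get("marriage")
    if birth is not None and death is not None and death < birth:
        warnings.append("died before birth")
    if birth is not None and marriage is not None:
        if not MIN_AGE <= marriage - birth <= MAX_AGE:
            warnings.append("married before age 10 or after age 90")
    for parent in parents:
        parent_birth = parent.get("birth")
        if birth is not None and parent_birth is not None and birth < parent_birth:
            warnings.append("born before a parent")
    for child in children:
        child_birth = child.get("birth")
        if birth is not None and child_birth is not None:
            if not MIN_AGE <= child_birth - birth <= MAX_AGE:
                warnings.append("had a child before age 10 or after age 90")
    return warnings
```

Running the same function over every person gives the whole-database check, and an empty list means nothing looked odd.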
Built-in list of cities,
counties, states, etc., preferably by year.
Often a city and state is given as the location of an event, and
it would be nice if the county could be looked up and/or filled in by a
look-up table that keeps track of what county each city was in. The table
should also take the year into account since county boundaries change from
time to time, and the county at the time of the event should be
used. Similarly, a warning could be issued if a mismatch is found (city
in the wrong county for that year, etc.) In some cases, the warning
should be accompanied by an explanation (e.g., "Jonesville was in Lee
County in 1832, though it became part of Jackson County in 1887.")
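A year-aware look-up table could be sketched in Python as follows. The table layout and the function names are assumptions. The single data row just repeats the illustrative Jonesville example from above and is not meant as verified history.

```python
import bisect

# (city, state) -> list of (first_year, county), sorted by first_year.
# Illustrative data only, taken from the example sentence above.
COUNTY_HISTORY = {
    ("Jonesville", "VA"): [(0, "Lee"), (1887, "Jackson")],
}

def county_for(city, state, year):
    """Return the county the city belonged to in the given year, or None."""
    history = COUNTY_HISTORY.get((city, state))
    if not history:
        return None
    starts = [first_year for first_year, _county in history]
    index = bisect.bisect_right(starts, year) - 1
    return history[index][1] if index >= 0 else None

def check_county(city, state, year, county):
    """Return a warning string if the county does not match the table."""
    expected = county_for(city, state, year)
    if expected is None or expected == county:
        return None
    return f"{city} was in {expected} County in {year}, not {county} County."
```

The same table serves both purposes: `county_for` fills in a missing county, and `check_county` produces the mismatch warning.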
Built-in historical
atlas with detailed maps.
Another nice feature would be to have an electronic historical
atlas so you could look at, for example, a map of Jackson County, Georgia,
in 1850. It should be detailed enough to have all cities and towns in the
above list in it, including small ones and even nicknames that may not
appear on most regular maps. The ability to go down to street level would
be convenient at times, but not as important and perhaps not worth the
extra space, especially since it is doubtful that historical street maps
could be obtained easily.
Automatically enter
surnames when guessable from father or child.
In many countries, children are typically given the same surname
as their father. Some countries have other ways of passing on the family
name. The user should be able to have such surnames "guessed" (e.g.,
filled in but highlighted so they are easily replaced with something
typed). A more advanced version of this feature would use the country of
the child or parent (or the nearest relative that has its country
specified) to determine what method should be used in guessing the
surname, as well as what position it should be in.
Easy to search for
individuals by name, approximate date, etc.
You should be able to do a "command-F" and start typing a surname
(or given name, place, etc.) and have a list of individuals matching the
stuff typed so far appear. (In PAF you have to type the name really fast
or it starts over with names beginning with whatever letter you're
typing. On a slow computer you can only do one or two letters of the
first name, and then you have to scroll through the list).
Handle LDS
ordinances.
Though LDS ordinances are not important for everyone, they are an
essential part of genealogical information for members of The Church of Jesus Christ of Latter-day
Saints. The one thing PAF seems quite a bit better at than Reunion is
the handling of LDS ordinances. Even PAF, however, could be more
convenient in this respect.
- Quick way to identify good candidate families or
individuals for ordinance work.
Individuals need to have a name, birth or death date and place (at
least approximate) before being submitted for ordinances in order to avoid
duplication of ordinance work. For couples to be sealed to each other
there must also be at least an approximate date and place of the
marriage. People also need to have been deceased for at least 1 year
before any LDS ordinances can be performed for them. It would be nice to
be able to generate lists of individuals matching these criteria. It
would also be nice to be able to generate lists of families for which all
of this information is available for the parents and children so that the
work for the entire family can be done. Families with almost all of the
information available could also be listed separately indicating what
information needs to be gathered or estimated in order to complete
families.
- Able to generate temple submissions
and flag submitted individuals
as "submitted".
This is one thing that PAF can do currently. After selecting a
bunch of individuals and/or couples to submit for ordinances, the
individuals' LDS ordinance fields are filled with the word "Submitted" so
you know the names have been submitted until such a time as the work is
done and the ordinance date and temple can be entered.
- Able to import
IGI ordinance information back into your database
automatically.
When names are submitted for their ordinance work, the
International Genealogical Index (IGI) must be checked first to see if the
ordinances have been done. This is currently done at an LDS Family
History Library, where a printout can be made showing what ordinances are
actually being submitted as well as what ordinances have already been
done, and their date and place. It would be nice if this information
could be saved electronically and fed back into the computer program to
automatically enter the ordinance information that had been found to have
already been completed.
- Able to check IGI
on-line for LDS ordinance information.
Better yet, it would be nice if the IGI could be checked on-line
right from within your own genealogy program before (or while) going
through the process of submitting names. When matches are found, a single
click or keystroke should copy the ordinance information into the
individual's record. It would also be nice to prepare the temple
submission disks at home without having to go to an LDS Family History
Library, but being able to check the IGI at home would at least be a good
start.
Able to access
internet directly.
- Able to go to a WWW
location (URL) (possibly by launching an external viewer)
URL's can be stored in notes or sources, so it would be nice to be
able to click on them and go there.
- Allow usable WWW URL's
in notes and sources.
- Keep track of URL's of corresponding individuals in other on-line
databases.
More and more people are putting their genealogical information on
the WWW. Sometimes it is not practical (e.g., when there's too much
information there) or desirable (e.g., when the information conflicts
or is not a direct line) to import all of someone else's information
into your own database. However, when you find a match with someone
in your own database, it would be nice to keep track of links
to others' on-line information that you've run across so you can find
it again.
- Have a standardized
URL name that won't change next time the database is
generated.
One of the problems with storing a URL for a corresponding
individual is that URL's tend to change, due to (1) regenerating the
database and thus building new WWW pages with different names or
directory locations; (2) changing which directory the pages are stored
in; (3) changing internet providers; or (4) simply removing the
information from the internet. In the first case, it may be possible
to come up with reasonable filenames for the WWW URL's (e.g., based on
a person's name and birthdate, or perhaps a unique ID# like that used
in the Ancestral File) that remain the same even when the data
is regenerated or changes directories. In the second case, aliases
(i.e., symbolic links or shortcuts) can be used to reroute the old
directory to the new one. For the third case, perhaps a central site
could be built by someone to keep track of lists of people and their
current URL's. It could be automatically updated on-line by the
program regenerating the data if it is being put in a new place, and
changes can be noted so that anyone requesting the old URL could be
pointed to the new one. As for the last case, the only hope seems to
be to copy the information while it's still there, or to have some
central archive that is kept by someone (like the LDS church did with
its Ancestral File) and is automatically updated (sort of like
GENDEX). This last point is of course a general problem on the internet and may
require a more general solution there.
- Keep track of lists
of researchers working on sets of individuals.
After swapping a few e-mail messages with another researcher I
usually know what individuals they are interested in (typically, an
individual or couple and their ancestors and their children). When
new information is found for an individual, it would be nice to be able
to generate an e-mail message notifying those individuals of the new
information, without having to scour my old e-mail messages to find
out which of all those researchers care about this individual's
information. For example, I'd like to go to an individual; do "select
ancestors"; then, for everyone currently selected, do "select
children"; then do "add interested party". When updating any of those
people's information, I could either check for interested parties, or
there could be an on-screen flag indicating that there are some, or an
optional alert could pop up (perhaps at the end of the entire session)
asking if I want to send a note to interested people. Alternatively, a
log could be kept, and every once in a while you could sit down and send
e-mails to all the people who are interested in the individuals you
have found information on since the last time you did this.
- Able to send an e-mail
message (possibly by launching an external e-mail program)
E-mail addresses (e.g., of other researchers working on an
individual or line) are often stored in notes or sources. It would be
nice to be able to send an e-mail message to someone from within the
genealogy program so you don't have to manually copy & paste the
person's e-mail address.
- Able to look up
information on-line in LDS Ancestral File, IGI, GENDEX, RSL, and other
on-line sources.
You should be able to have an individual selected and then choose
a command to look that person up in one of several on-line sources.
Users should be able to add new on-line sources, though they may have
to know how to configure a cgi query for the particular site until
some standard can be arrived at.
- Able to automatically share information on WWW in indices, GENDEX,
Ancestral File, RSL, and as WWW pages.
You should be able to select an individual, branch, bush (see the
various selection options above), or the entire database and submit
them for inclusion in the various on-line databases.
- Able to generate
WWW pages (i.e., HTML) in various formats.
It would be nice to be able to generate WWW pages for your
database (or a selected subset of it). I believe Reunion does this
already. The program ged2html also does, but it's nice to have it
built-in. These pages should also include pictures, notes, etc. It
would be cool if the WWW pages could be programmable like they are with
ged2html, allowing the user to customize how they want the output to
work. Various built-in formats should be included, including pedigree
charts, registry charts, family group information, etc.
- CGI code to "serve" WWW
pages without having to store them (cross-platform code and data
format).
WWW pages can take up a lot of hard drive space and take some time
to update when changes are made to the database. However, a CGI program
could dynamically generate html pages on-the-fly. This would allow
the person visiting the WWW site to change how the information is
displayed (e.g., they could select any of the formats listed under
"Output Formats," above). Such a program would need to be available
for a variety of platforms, since they would need to run on the
WWW server's machine rather than the computer of the owner of
the database. (Since not all internet providers allow CGI
programs to be run on their systems, generation of the static
WWW pages would still be good, too.)
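For the stable-URL item above, one way to get page names that survive regenerating the database is to derive them from the person's own data rather than from a record counter. The Python sketch below is only an illustration. The `page_name` function and its naming scheme (surname, given name, birth year, or a unique ID when one exists) are assumptions.

```python
import re
import unicodedata

def slug(text):
    """Lowercase ASCII text with runs of other characters turned into '-'."""
    ascii_text = (unicodedata.normalize("NFKD", text)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")

def page_name(given, surname, birth_year=None, unique_id=None):
    """Build a file name that does not depend on export order."""
    if unique_id:
        return f"{slug(unique_id)}.html"
    parts = [surname, given]
    if birth_year is not None:
        parts.append(str(birth_year))
    return f"{slug(' '.join(parts))}.html"
```

Two people with the same name and birth year would collide under this scheme, so a real program would need a tie-breaker that is itself stable from one export to the next.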
Automatically NOT export living individuals and private notes, if
desired.
In order to protect the privacy of living individuals (or those
recently deceased), it would be nice to be able to export all
individuals except for those meeting some criteria such as not having
been deceased for at least 10 years. A manual override for some
individuals (such as yourself, if you don't mind your own information
being on there) should be available.
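A privacy filter along these lines might look like the Python sketch below. The 10-year rule comes from the text above. Treating anyone without a death date as possibly living until 100 years after birth is an added assumption, as are the field names and the `exportable` function.

```python
def is_private(person, export_year, years_deceased=10, assumed_lifespan=100):
    """True if the person should be withheld from an export."""
    death = person.get("death_year")
    if death is not None:
        return export_year - death < years_deceased
    birth = person.get("birth_year")
    if birth is not None:
        # Assumption: with no death date, presume living for 100 years.
        return export_year - birth < assumed_lifespan
    return True  # no dates at all: err on the side of privacy

def exportable(people, export_year, always_include=()):
    """People safe to export, plus any manually overridden IDs."""
    return [person for person in people
            if person["id"] in always_include
            or not is_private(person, export_year)]
```

The `always_include` argument is the manual override, for example the researcher's own record.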
Automated sources
- Able to have sources on every name, date, place, relationship,
event and note.
- Use links to sources for efficiency instead of storing sources as
notes.
- Able to refer to a different page number within the same
source.
- Able to put sources in format that identifies title, author,
publisher, year, URL, e-mail addresses, etc.
- Automatic source information: Set the current source, then enter
information that is from that source.
While PAF does none of these, it appears that Reunion does all of
them. Cool.
Be able to store "best
guess" information, such as the upper and lower bounds on birth,
marriage, or death year, or a very rough guess as to a place. Display in
brackets.
Able to automatically generate "best guess" information from
relatives' information.
There are many individuals in my database that I am pretty sure are
from a particular county (e.g., Lee Co., VA), but since I don't have any
solid evidence of this (e.g., no birth record, census record, etc.), I
leave the birthplace field blank until I have real information. However,
it would be nice to be able to have guesses show up in some cases (e.g.,
when exporting records for use by other people or when building indices
where an approximate birth date and place is needed). Usually a guess
can be made based on relatives' information. For example, birth dates
can be approximated from parents' and siblings' birth dates. At least an
approximate range can be arrived at given loose constraints such as assuming
the mother's age was between 15 and 65. Currently in PAF I have to say
"about" before a year and "of" before a place if I need an estimated
piece of information. However, these are not quite accurate, and there
is no indication that these are just guesses. "Estimated" (abbreviated
"Est.") is commonly used in such situations, but I don't think PAF
supports that in dates.
It should be possible to have estimates generated automatically and
marked as guesses, perhaps by using "est." or a tilde (~) before the info, by using
brackets around them, or some other method to indicate that the
information is only estimated from other information in the database,
and not taken from any original sources. It might also be nice to
click on an estimated piece of information to see what the guessed
information is based on.
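To make the idea of automatically generated bounds concrete, here is a
small Python sketch. It uses the loose constraint mentioned above (a
mother is between 15 and 65 when a child is born) in both directions:
from the person's mother, and from the person's own children. Applying
the same age limits to the person as a parent is my own simplification,
and the function names are invented for the example.

```python
def estimate_birth_range(mother_birth=None, child_births=(),
                         min_age=15, max_age=65):
    """Return (earliest, latest) plausible birth year.

    Returns None if there is nothing to base a guess on, or if the
    constraints contradict each other.
    """
    lows, highs = [], []
    if mother_birth is not None:
        # Born while the mother was between min_age and max_age.
        lows.append(mother_birth + min_age)
        highs.append(mother_birth + max_age)
    for year in child_births:
        # Was between min_age and max_age when each child was born.
        lows.append(year - max_age)
        highs.append(year - min_age)
    if not lows:
        return None
    earliest, latest = max(lows), min(highs)
    return (earliest, latest) if earliest <= latest else None


def format_estimate(bounds):
    """Mark the range as a guess, using brackets and "est."."""
    earliest, latest = bounds
    return f"[est. {earliest}-{latest}]"
```

For example, a person whose mother was born in 1830 and whose children
were born in 1875 and 1880 comes out as "[est. 1845-1860]". Siblings'
birth dates could tighten the range further, but are left out of this
sketch.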
Intelligent merging.
- Able to resynchronize two databases that used to be
one (i.e., auto-merge
except where there are conflicts).
Often I want to share my information with others in my family.
However, if we both make changes to the data, there is no good way to
merge the information back together. I have heard of people exporting
GEDCOM files and using a Unix "diff" command to highlight differences
and then typing the new information in by hand. That's ridiculous.
This problem comes up in other areas such as two programmers working
on the same source code file. Programs such as SourceSafe allow you
to merge changes back together as long as there are no conflicts, and
then manually resolve any conflicts that arise. The same thing should
be done with genealogical databases.
Databases that start out the same will have identical information in
most places, so when resynchronizing the databases, additional
information in one database can be fairly automatically added to the
other. For example, a birth date and place added to one database
that was not in the other one could be added to the corresponding
individual in the other database. Similarly, new individuals that
have been added to one database can be merged into the other,
possibly with confirmation from the user on each individual or
"bush" of individuals (e.g., "There are 240 new relatives of this
individual, including 4 ancestors. Do you want to add all of
them, some of them, or none of them?").
Conflicts will arise only when information has changed, such as the
order of children, or a date or place that exists in both databases but
is different for each one. These can be displayed on the screen and
the user can be allowed to decide which one they want (with the
default usually being the user's own database).
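Here is a minimal Python sketch of that kind of resynchronization for a
single individual. It assumes that a copy of the original shared record
(called "base" below) is still available, which is my assumption and
not something current programs keep around. Records are plain
dictionaries of field names to values, and the function name is
hypothetical.

```python
def merge_record(base, mine, theirs):
    """Three-way merge of one individual's fields.

    Returns (merged, conflicts). A conflict is a field that both sides
    changed to different values; the merged record defaults to "mine".
    """
    merged, conflicts = {}, {}
    for field in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(field), mine.get(field), theirs.get(field)
        if m == t:
            value = m            # both sides agree (or neither has it)
        elif m == b:
            value = t            # only the other database changed it
        elif t == b:
            value = m            # only my database changed it
        else:
            value = m            # real conflict: default to my own data
            conflicts[field] = (m, t)
        if value is not None:
            merged[field] = value
    return merged, conflicts
```

Everything that changed on only one side is merged silently, and only
the entries in "conflicts" would need to be shown to the user.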
- Able to intelligently merge two very different databases that
overlap.
Another common task is taking an electronic database provided by
someone else and merging it into your own database. Sometimes there are
quite a few overlapping individuals, so doing a GEDCOM import and then
merging by hand can be pretty painful. Once one or more matching
individuals can be positively identified (either with the user's help
or automatically), relationships and bounds on dates (like the
automatically-generated ones mentioned above) should be used to prevent
the user from having to decide if people are the same when they
clearly are not. When it is decided that two individuals are the same,
then it follows that their parents are the same, their spouse is often
the same, and their children are often the same, especially if spelled
the same or similarly. That is not to say that definite decisions can
be automatically made in all these situations, but it should at least
help guide the cases in which the user is asked to decide if people are
the same.
- Able to standardize
place name formats (e.g., so it looks nice after merging someone
else's data in).
When merging someone else's data in, I have often noticed that
they use different conventions with their data, such as putting their
surnames in all caps (instead of letting the genealogy program do it),
or leaving a blank field for the city (e.g., ", Lee, VA" instead of
"Lee Co., VA"), etc. It would be nice if there were a way to
standardize how place names, surnames, etc., are formatted so that
merging can be more automatic. It would also be nice to use the same
feature to make your own database more consistent or in case you change
your mind on what convention you want to use.
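A simple version of this could look like the following Python sketch.
It assumes that any three-part place is "city, county, state" and that
an all-caps surname should become mixed case; both rules and both
function names are only for illustration, and a real program would let
the user choose the conventions.

```python
def standardize_place(place):
    """Turn ", Lee, VA" into "Lee Co., VA" and tidy up spacing.

    Assumption: a three-part place is city, county, state.
    """
    parts = [p.strip() for p in place.split(",")]
    if len(parts) == 3:
        city, county, state = parts
        if county and not county.endswith("Co."):
            county += " Co."
        parts = [city, county, state]
    # Drop empty fields such as a blank city.
    return ", ".join(p for p in parts if p)


def standardize_surname(surname):
    """Store "SMITH" as "Smith"; leave mixed-case names alone.

    Note: str.title() would turn "MCDONALD" into "Mcdonald", so names
    like that would need extra rules or a manual fix.
    """
    return surname.title() if surname.isupper() else surname
```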
- Graphical drag & drop
merging.
When combining individuals from different databases, you should be
able to select a group of individuals and then drag them to where they
connect in the other database. A condensed view (e.g., a subset of the
"Everybody" chart) would be helpful in this process.
- Able to merge two
individuals and intelligently follow links to see what other
individuals might also need to be merged.
As mentioned above, when two individuals are merged--whether by
combining different databases, or by identifying duplicates in a single
database--their parents, spouses, and children should also be checked
(recursively) to see if they should be merged, too.
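The recursive checking could be sketched in Python roughly as follows.
Each database is assumed to be a dictionary from an ID to a record with
optional "parents", "spouses" and "children" lists of IDs, and
same_person is a stand-in for whatever test (or question to the user)
decides that two records match. All of these names are hypothetical.

```python
from collections import deque


def propagate_matches(db_a, db_b, seed, same_person):
    """Starting from one matched pair, follow links to find more matches.

    seed is a pair (id_in_a, id_in_b). Returns a dict mapping IDs in
    db_a to the matching IDs in db_b.
    """
    matched = {seed[0]: seed[1]}
    queue = deque([seed])
    while queue:
        a, b = queue.popleft()
        for role in ("parents", "spouses", "children"):
            for rel_a in db_a[a].get(role, []):
                if rel_a in matched:
                    continue
                for rel_b in db_b[b].get(role, []):
                    if rel_b in matched.values():
                        continue
                    if same_person(db_a[rel_a], db_b[rel_b]):
                        matched[rel_a] = rel_b
                        queue.append((rel_a, rel_b))
                        break
    return matched
```

In this sketch a match is accepted as soon as same_person says yes. In
practice that is the point where the program would ask the user to
confirm anything questionable.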
- Able to identify
likely duplicate individuals, using relatives to rule out obvious
mismatches.
Duplicate individuals are searched for when merging two databases
or when checking a single database for duplicates. In either case,
upper and lower bounds can be calculated for birth and death dates
(based on other relatives' birth and/or death dates), and individuals
whose dates do not overlap can be excluded from testing for merges.
Parents' names are also a good clue as to whether people can be
matched (though similar name matches should also be considered
there). The point is that PAF and other systems often present the
user with individuals that are clearly not the same person, and the
computer should be able to rule out some of these obvious mismatches
and leave only the questionable ones in there for someone to decide on.
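As a sketch of how the obvious mismatches could be ruled out, the
Python below only proposes a pair when the estimated birth ranges
overlap and the names are similar. The record layout ("id", "name",
"birth_range") and the 0.8 similarity threshold are assumptions made
for the example; the name comparison uses SequenceMatcher from Python's
standard difflib module.

```python
from difflib import SequenceMatcher


def ranges_overlap(a, b):
    """True if two (earliest, latest) year ranges share any year."""
    return a[0] <= b[1] and b[0] <= a[1]


def name_similarity(a, b):
    """Similarity between 0.0 and 1.0, ignoring upper/lower case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_pairs(people, min_similarity=0.8):
    """Return ID pairs worth showing to the user as possible duplicates."""
    pairs = []
    for i, p in enumerate(people):
        for q in people[i + 1:]:
            if not ranges_overlap(p["birth_range"], q["birth_range"]):
                continue  # dates cannot match, so never ask the user
            if name_similarity(p["name"], q["name"]) >= min_similarity:
                pairs.append((p["id"], q["id"]))
    return pairs
```

Comparing every pair is fine for a small database. A large one would
need an index (by surname, say) to avoid checking every combination.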
Multimedia.
- Include pictures on screen, pedigree, family group
sheets, HTML pages, CGI-served pages, etc.
A variety of graphics file formats should be supported. There
should also be an option to scale pictures down to 72 dots per inch
for display on the screen, or to keep the original resolution for
printing.
- Attach pictures
to families as well as individuals.
Family pictures belong on the family page more than on individuals'
pages, so they should be allowed to go there.
- Allow several people
to link to the same photo and indicate where they are in
it.
When there is a group picture, such as one taken at a family
reunion, or a picture of three children in a family, it should be
possible to select a rectangular portion of the picture and use that
as a picture for a particular individual. That way the picture is
stored only once, and if the entire picture is displayed, the
identities of several (or all) of the people in it are given. There
should also be a way to label the other people in a picture separately
in case they aren't in the genealogical database (e.g., "Mark Hanson,
Jeff's friend from next door"). It should also be possible to put a
caption on the picture to identify the date and event. This caption
might be sufficient for identifying out-of-database individuals.
- Audio and
video clips, too.
Audio and video files in various formats should also be supported
and attachable to individuals and families. It should also be possible
to have them included automatically on WWW pages, CGI-served pages,
etc., so that people on the internet can get at them, too.
- Allow long notes, photos, audio or video clips to either be stored internally as part
of the database, or externally via a link (path/file or
URL).
It is sometimes convenient to have a single (albeit large)
file containing all of the information for a database, so that it can
be moved from one directory to another or handed off on removable media
or sent over the internet in one file. However, if many images, video
clips, and/or audio clips are included, it might be wise to have these
files stored separately, especially if used by more than one database
or even by other documents and programs. It might even be desirable to
allow these images to be stored on removable media (such as writable
CDs or Jaz disks) and quietly ignored when the media is not
available. For example, there could be a checkbox that says "Display
warning when multimedia files are not available."
- Able to move or rename
files for photos, notes, video, or audio so that it is possible to
reorganize these files on a hard drive without breaking all the
links.
It should be possible to move external files without having to
remember where in the database they were referenced. One possibility
would be to have some file-moving (or -renaming) capability within the
genealogy program. Another would be to warn users when a needed file
is not there, and ask them to locate it. A little bit of quick
hard-drive searching for a file of the same name within a couple of
levels of directories up & down would be helpful so users can do
regrouping of files into new subfolders without worrying about it. If
the user ever has to show the program where the new file went to, that
should be an additional place the program should look for other files
that are not found, especially for files in common directories (or at
least with partial path matches). There should also be a command to
check all of the files' locations to make sure there aren't dangling
pointers anywhere. This command should be offered as an option when
any file is found to be missing.
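The "quick hard-drive searching" could be done along these lines in
Python. The function name and the default of two levels up and two
levels down are just choices for the example.

```python
import os


def find_moved_file(old_path, levels_up=2, levels_down=2):
    """Look for a file with the same name near its old location.

    Searches the old folder and its subfolders (up to levels_down
    deep), then repeats from each parent folder, up to levels_up
    levels. Returns the new path, or None if nothing is found.
    """
    name = os.path.basename(old_path)
    start = os.path.dirname(os.path.abspath(old_path))
    for _ in range(levels_up + 1):
        base_depth = start.rstrip(os.sep).count(os.sep)
        for dirpath, dirnames, filenames in os.walk(start):
            if name in filenames:
                return os.path.join(dirpath, name)
            depth = dirpath.rstrip(os.sep).count(os.sep) - base_depth
            if depth >= levels_down:
                dirnames[:] = []  # do not descend any further
        parent = os.path.dirname(start)
        if parent == start:
            break  # reached the top of the drive
        start = parent
    return None
```

This only matches on the file name, so two different photos with the
same name in nearby folders could be confused. Checking the file size
or date as well would make it safer.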
Able to keep different portions of a database in separate files, but able
to follow links between them so you can work on one (smaller) subtree but
access the others easily.
My parents' lines do not connect except through their marriage,
and the same is true of my father's parents. However, there are
multiple connections across some portions of my grandmother's tree
because many of her ancestors lived in the same area for many years.
Thus, it would be convenient to have my database broken up into my
father's father's line, my father's mother's line, and my mother's
line, so that I can keep the file sizes more manageable and share
information/databases with other people more easily. But I would like
to be able to print a pedigree chart of myself and have it know to link
into the various databases instead of having to duplicate information
in each one. Links to individuals in other databases would be handy
for this use, and also for keeping different databases from different
sources that have some overlapping individuals, i.e., you should be
able to say "This person is the same as that person in that database
(though we might have different information in both places),"
or else to say "This person's father IS that person in that database
(and there is no information in this database except for this
pointer)."
Mac/PC/Unix cross-platform compatibility.
I need the program to run on a Mac, but it should also be available
for PC, and ideally for other platforms such as Linux and Unix.
Free "players" for
Mac/PC/Unix so people without the program can look at your stuff
easily.
If you want to share your database with other people for free, you
should be able to send a free "player" program along with it so that
they can look at your data, though perhaps they would not be able to
edit it without buying the program. Players should be available for
different platforms, and the database format should either be the same,
or else the players should be able to play any of them.
Decent text
editor with spell check (and/or allow use of an external
editor).
PAF's text editor is pretty weak. It doesn't use many of the
extended keyboard keys such as forward delete, page up/down, etc. A
powerful text editor with spell check, formatted text, etc., should be
included, or an external word processor should be easily accessible as
the editor for notes.
Cost $99 or less, or
done by internet community and have source code freely available.
Most of the people working on genealogy that I have met are not
doing it professionally. It is therefore hard for them to justify
spending more than $99 on a genealogy program. PAF's $15 price tag is
attractive, but a program with all of the powerful features above
would of course be worth quite a bit more than that.
It may be that it would not be worth a company's time to implement all
of the above features into a genealogy program, since the market is
relatively small (compared to word processing, for example), and the
buyers are on a relatively tight budget (compared to businesses that can
afford to pay $600 for Photoshop, for example). If such is the case,
then it may be up to the academic and internet community to write the
various parts of this program, with one or more individuals
coordinating the efforts, as has been done with the Linux operating
system and several other large projects. Two of the main advantages to
this approach are: (1) the software would be free to whoever wanted
it, and (2) those adept at programming could add whatever features they
needed that weren't already part of the program. There are, of
course, disadvantages to this approach, including a lack of full-time
people working on it, a possibly sloppier and less cohesive interface,
and a possible lack of standardization. It would also hurt the current
genealogy software vendors if a freely available, powerful and elegant
genealogy program was made available to the public.
The challenge, therefore, is for a commercial system to do such a good
job on all of this stuff that the internet community doesn't need to
build a competitor. I hope someone steps up to bat.
Able to enter
information by typing text, like "Ann SMITH, b. 7 aug 1859, Lee
Co., VA; m. ...".
I'm constantly having to click in fields between typing
information, and moving my hands back and forth between the mouse and
the keyboard is time-consuming and annoying. I can "tab" between
fields, but usually I have to hit tab numerous times to get where I
need to be. Also, in PAF at least, I can't always access some part of
a person's information such as their spouse's name (and possibly the
spouse's entire information) without saving the individual, closing the
family, selecting the individual, and adding the spouse. Then I have
to go back to the family and pick up where I left off. The alternative
is to skip the spouse and come back to that information when I'm
finished with the rest of the person's siblings.
I would like to be able to enter information as I come to it so that I
don't risk skipping something. You should be able to add an
individual, then add that individual's spouse while the
individual's record is still open, and go on from there.
Furthermore, you should be able to enter this information by typing it
as it commonly occurs in printed form, namely, "b." for "born" and "d." for
"died", followed by a date and/or place (in either order, automatically
detected); "m." for "married", followed by a date and/or place and/or
spouse's name. Adding the spouse's name should create a new
individual for which information can be added just as was done with
the first individual. "son[ of] " or "dau[ghter][ of] " could be used
to introduce the parents and it should be possible to add them right
then, too. Semicolons or parentheses can indicate groupings of
information. For example, typing "Fred Wilson, b. 23 jan 1875, m.
Jennie Smith (b. 12 Jul 1888, dau. of Frank Smith & Mary Jensen), d.
19 dec 1899" should be the same as typing "Fred Wilson, b. 23 jan 1875, m.
Jennie Smith, b. 12 Jul 1888, dau. of Frank Smith & Mary Jensen; d.
19 dec 1899" (or maybe two semicolons would be needed to get back down
to Fred). You should still be allowed to click on fields, but this
other option would allow more natural entry of information.
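To show that the simple cases are not hard, here is a small Python
sketch that handles a flat entry with "b.", "m." and "d." markers. It
is deliberately limited: it does not handle the nested groupings in
parentheses, the "son of"/"dau. of" phrases, or a middle initial that
happens to be a lowercase b, m or d. The function name and the layout
of the returned dictionary are made up for the example.

```python
import re

# Lowercase "b.", "m." or "d." at the start of a word marks an event.
EVENT = re.compile(r"\b([bmd])\.\s*")
# Optional day, optional month name or abbreviation, then a 4-digit year.
DATE = re.compile(
    r"(?:\d{1,2}\s+)?"
    r"(?:(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\s+)?"
    r"\d{4}",
    re.IGNORECASE,
)
LABELS = {"b": "birth", "m": "marriage", "d": "death"}


def parse_entry(text):
    """Split a typed line into a name and its events."""
    markers = list(EVENT.finditer(text))
    name_end = markers[0].start() if markers else len(text)
    record = {"name": text[:name_end].strip(" ,;")}
    for i, marker in enumerate(markers):
        stop = markers[i + 1].start() if i + 1 < len(markers) else len(text)
        body = text[marker.end():stop].strip(" ,;")
        event = {}
        date = DATE.search(body)
        if date:
            # The date may come before or after the place.
            event["date"] = date.group(0)
            body = (body[:date.start()] + body[date.end():]).strip(" ,;")
        if body:
            # For a marriage the leftover text may be the spouse's name.
            key = "place" if marker.group(1) in "bd" else "spouse_or_place"
            event[key] = body
        record[LABELS[marker.group(1)]] = event
    return record
```

Typing "Ann SMITH, b. 7 aug 1859, Lee Co., VA; d. 1920" gives a name of
"Ann SMITH", a birth with date "7 aug 1859" and place "Lee Co., VA",
and a death with date "1920".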
Able to copy & paste
such text information into an interpreter.
When text information on an individual is available (such as from
an e-mail message or scanned text),
it would be nice to be able to copy & paste such
information into an interpreter that would build individuals and
information fields as they are entered from the pasted text. It would
be wise to include an option to allow such pasted text to be entered
one field at a time (e.g., by hitting "paste" repeatedly or hitting
some other key) to make sure all of the information gets interpreted
correctly and put into the right places.
Extract genealogical
information from text
I have spent many hours typing in information that was most likely
printed by a computer program in a consistent format. It should be
possible to build a set of interpreters that can take paragraphs,
sections, or entire pages or even books of genealogical information
and, with very little human intervention, convert them to usable
genealogical information. For example, one can imagine scanning a
page from a genealogy book into the computer, selecting several
adjacent paragraphs that contain information about the descendants of
an individual and *poof* having them automagically converted to a
little group of individuals linked together on the screen ready to be
hooked (or intelligently merged) into where they go on the family
tree.
Accomplishing this would not be trivial, but it is possible. Near the
beginning of this wish list are several formats that should be
available for display and report generation. Along with that is an
option for creating new reports. Most of the information in genealogy
books is fairly structured and in one of a few main styles, with
variations in how dates are reported, whether places are put before or
after dates, what words are used between events, how semicolons, commas
and parentheses are used, etc. However, it should be possible to
build up a grammar (or perhaps a stochastic/probabilistic grammar)
that "explains" the text, i.e., can "reproduce" it from the extracted
data. In other words, the trick is to figure out what format is being
used, and the most common and straightforward method that can be found
to regenerate the text from the extracted genealogical data is quite
likely to be correct.
A set of grammar constructs can be made available for use by a search
algorithm (such as a dynamic programming algorithm, for example). The
search tries applying different constructs to explain the names,
places, dates and punctuation it runs across, along with paragraph
breaks, child numbering, and textual notes. Some statistics can be
gathered from a bunch of different genealogical sources to determine
how likely the various grammar constructs are, and statistics on
exceptions to the rules can also be gathered. In the end it may be
possible to scan an entire book in and output a genealogical database
containing all the names, dates, places, notes, and perhaps even
pictures that appear in the book. This information could then be
indexed, searched and portions of it copied and pasted/linked into
your own database.
That's what I think would be cool, anyway, and I may be able to help
figure this part out.
More data available
on-line:
Information in books and on microfilm is archived but not readily
available. There is a huge difference between a book containing the
information you need and a searchable index having the information you
need. One of the main challenges of our era is to get all of the
information currently available in libraries, court houses, etc., put
into electronic format so that it can easily be indexed, searched,
shared and made publicly available. There are hundreds of man-years'
worth of data entry to be done. It would be nice if the data
extraction that is being done by so many different people were
coordinated to avoid duplication of data entry so that we can focus
our efforts on making all the information available more quickly.
- All available census
records available on-line.
Wouldn't it be nice to have all of the census records available
on-line so you could just type in a name and perhaps a state and look
the person up? Currently you have to go to a family history library,
use an index book to look up the person's name, then scroll through
microfilm to look at each of the potentially many different people
with that same name in the same state, and squint through handwritten
names that are often almost impossible to read, due to varying
microfilm quality and poor, antiquated or overly fancy penmanship.
- All available genealogical books available on-line.
There are thousands of genealogy books out there, and looking them
up, pulling them off the shelf, looking in the index (or cursing the
idiots who wrote a genealogy book without an index!) takes forever,
especially since most of the time the information you want isn't in
there anyway. It would be so nice to be able to enter a name and
approximate location, get all the references to that individual in
various books (perhaps narrowed with AND conditions such as a
spouse's name), and quickly browse all the places where that name shows
up.
- All LDS
microfilms available on-line (images, text transcriptions, linked
database extracted from data...).
The Church of Jesus Christ of Latter-day Saints has microfilmed
millions of documents, and it would be nice if that information were
available on-line as well. Ideally, the extracted text would be
available for searching and browsing, along with the original image
if you really want to look at it.
I realize not all of these things are easy, and the data acquisition part will take years of coordinated effort by large numbers of people, which is probably why I call it my wish list. But many of the features are within reach and would make the accumulation and distribution of genealogical information much easier.
I would love to hear any comments, suggestions, additions or corrections with regard to the above list.
--Randy Wilson, (randy@axon.cs.byu.edu).
Last updated 17 October 1998.