The LA Times is flabbergasted that, among a large number of entries in a database:

  1. There are similar entries
  2. People who have been banking on the public’s poor grasp of the statistics of large numbers are reluctant to have the public re-educated about the reality of the situation.

Here is the story.

The FBI DNA database has some close matches (strangers matching at 9 points of the DNA profile). Defense attorneys are jumping on this, trying to keep DNA evidence from being the nail in the coffin for their clients. Prosecutors have been lazily overstating the uniqueness of a 9-point match. The FBI, rather than just acknowledging that a higher match level might be needed to ensure uniqueness, is seeking court orders to stop broad match searches of its database. This seems foolish to me. Wouldn’t you rather crawl the database once, find all of your close matches, and then resolve those cases (a few hundred out of 65,000+), so that you know any of the people involved in those matches will require 11- or 12-point matches if they end up on trial? The FBI should just crawl its own database (Google iFBI!) once and publish the numerical results to DAs’ offices nationwide.
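A minimal sketch of that one-time crawl, in Python. It assumes, purely for illustration, that a profile is a fixed-length tuple of per-locus genotypes, each an unordered pair of allele repeat numbers, and it treats each "point" of a match as one locus where the genotypes agree. The names `crawl`, `matching_loci`, and the toy `db` are hypothetical; real CODIS records and matching rules are more involved. The histogram of match counts is the part that could be published, while the list of close pairs stays internal for follow-up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical profile format: one genotype per locus, each genotype an
# unordered pair of allele repeat numbers.
Genotype = tuple[int, int]
Profile = tuple[Genotype, ...]


def matching_loci(a: Profile, b: Profile) -> int:
    """Count the loci at which two profiles carry the same genotype."""
    # Sorting each pair makes (12, 14) and (14, 12) compare as equal.
    return sum(1 for ga, gb in zip(a, b) if sorted(ga) == sorted(gb))


def crawl(profiles: dict[str, Profile], threshold: int = 9):
    """Compare every unordered pair of profiles exactly once.

    Returns (histogram, close_pairs):
      histogram: Counter mapping "number of matching loci" to "number of
                 pairs"; it holds no identifiers, so it could be published.
      close_pairs: (id_a, id_b, k) for pairs with k >= threshold; kept
                   internal so those cases can be resolved.
    """
    histogram = Counter()
    close_pairs = []
    for (id_a, prof_a), (id_b, prof_b) in combinations(profiles.items(), 2):
        k = matching_loci(prof_a, prof_b)
        histogram[k] += 1
        if k >= threshold:
            close_pairs.append((id_a, id_b, k))
    return histogram, close_pairs


# Toy three-locus database (made-up values); threshold lowered to fit.
db = {
    "A": ((12, 14), (8, 9), (30, 31)),
    "B": ((14, 12), (8, 9), (29, 31)),
    "C": ((11, 13), (10, 10), (28, 30)),
}
hist, close = crawl(db, threshold=2)
print(dict(hist))
print(close)
```

Visiting each unordered pair exactly once keeps the work to n(n-1)/2 comparisons for n profiles; no pair is checked twice and no profile is compared with itself.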

It would seem that you’d want to eliminate the uncertainties that you can, so they don’t bite you in the butt unexpectedly.

The complaints about tying up the database or violating the right to privacy are ludicrous. My laptop could do billions of comparisons in a day. Depending on how expensive a single comparison is, this shouldn’t take more than overnight, unless the FBI database is running on a TI-85 graphing calculator. Borrow time on a DoE supercomputer overnight and get it done.

Compiling only the numerical results, with the names stripped off, would be enough to avoid violating anyone’s privacy. Yes, a person’s whole DNA sequence is essentially unique (identical twins being the exception), and the profile is apparently less so, but still highly distinctive. The counts of matches between profiles, however, aren’t unique. It’s analogous to comparing the names of the people in the database and returning only the number of matching letters: the names might be private, but the match counts won’t be.
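A back-of-the-envelope check on the workload, taking the 65,000 figure from the story as the approximate number of profiles $n$ and counting one comparison per unordered pair of profiles:

```latex
\binom{n}{2} = \frac{n(n-1)}{2},
\qquad
\binom{65{,}000}{2} = \frac{65{,}000 \times 64{,}999}{2} = 2{,}112{,}467{,}500 \approx 2.1 \times 10^{9}
```

So a full crawl is on the order of two billion pairwise comparisons, the same order of magnitude as the laptop estimate, assuming each comparison amounts to only a handful of integer checks per locus.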