Image Fingerprints

For computing the similarity between pictures, Pix maintains a database of image fingerprints. These are large natural numbers in F = [0, 2^128), calculated from the images by a rather simple algorithm by Neal Krawetz. The idea is that the Hamming distance between the fingerprints of two images is a measure of the visual difference between these images. The Hamming distance, i.e. the number of bit positions in which two fingerprints differ, is a metric on F that can take only natural numbers in [0, 128] as values (with, you guessed it, an expectation value of 64 for two random fingerprints).
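As a minimal sketch (in Python, which is not necessarily what Pix itself uses), the Hamming distance between two 128-bit fingerprints is the number of set bits in their XOR. The example also checks the expectation value empirically, assuming that fingerprints of unrelated images behave like uniformly random 128-bit numbers: each bit then differs with probability 1/2, so the mean distance is 128 × 1/2 = 64. The function names are illustrative only.

```python
import random

BITS = 128  # fingerprints are natural numbers in F = [0, 2**128)

def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ (0..128)."""
    if not (0 <= a < 2**BITS and 0 <= b < 2**BITS):
        raise ValueError("fingerprints must lie in [0, 2**128)")
    return bin(a ^ b).count("1")  # XOR keeps exactly the differing bits

def mean_random_distance(samples: int = 10_000, seed: int = 0) -> float:
    """Empirical mean distance between uniformly random 128-bit numbers."""
    rng = random.Random(seed)
    total = sum(hamming_distance(rng.getrandbits(BITS), rng.getrandbits(BITS))
                for _ in range(samples))
    return total / samples

if __name__ == "__main__":
    print(hamming_distance(0, 2**BITS - 1))  # 128: every bit differs
    print(hamming_distance(0b1011, 0b0001))  # 2
    print(mean_random_distance())            # close to 64
```

Small distances thus indicate visually similar images, while values near 64 are what one expects from two unrelated pictures.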

In Pix's preferences you can choose a folder as the root folder for the database. As soon as the "Active" checkbox is checked, the database can be used for finding similar images in the selected folder, but only images whose fingerprints are already in the database are taken into account. Thus the question is: how do the fingerprints get into the database? (And how are fingerprints whose original images have vanished removed from it?) There are basically three ways to maintain the database:

The images are represented in the database by their paths relative to the root folder, so renaming files or folders can make them invisible to the database. In principle this drawback could be overcome by identifying the files with so-called bookmark data, but bookmarks consume so much memory and are so cumbersome to handle that I thought occasionally rescanning the root folder, which has to be done anyway if the picture collection in it is not static, is the better solution.
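The following sketch illustrates such a rescan against a database keyed by relative paths. It is a hypothetical illustration, not Pix's implementation: the dictionary layout, the `IMAGE_EXTENSIONS` filter and the `fingerprint` callable (a stand-in for the actual fingerprint algorithm) are all assumptions. New files are fingerprinted and added, entries whose files have vanished are removed, and a renamed file therefore shows up as one removal plus one addition.

```python
import os
from typing import Callable, Dict, Set, Tuple

# Hypothetical filter: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".heic", ".tif", ".tiff"}

def rescan(root: str, db: Dict[str, int],
           fingerprint: Callable[[str], int]) -> Tuple[Set[str], Set[str]]:
    """Bring db (relative path -> fingerprint) in line with the images under root.

    `fingerprint` is a hypothetical stand-in for the real fingerprint algorithm.
    Returns the sets of added and removed relative paths.
    """
    present: Set[str] = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                present.add(rel.replace(os.sep, "/"))  # normalise separators for the keys

    removed = set(db) - present  # images that vanished (or were renamed/moved)
    added = present - set(db)    # images with no fingerprint yet

    for rel in removed:
        del db[rel]
    for rel in added:
        # Only new paths are fingerprinted; existing entries are kept as they are.
        db[rel] = fingerprint(os.path.join(root, rel))
    return added, removed
```

For example, after renaming `a/x.jpg` to `a/y.jpg` below the root folder, a rescan returns `({"a/y.jpg"}, {"a/x.jpg"})`: the old entry is dropped and the file is fingerprinted again under its new name.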

The same is true for the root folder itself. If it is moved, the database has to be rebuilt. Folder bookmark data is not stable enough to prevent this.