Image Fingerprints

To compute the similarity between pictures, Pix maintains a database of image fingerprints. A fingerprint is a large natural number in F = [0, 2¹²⁸), calculated from the image by a rather simple algorithm by Neal Krawetz. The idea is that the Hamming distance between the fingerprints of two images is a measure of the visual difference between these images. The Hamming distance is a metric on F that counts the bit positions in which two fingerprints differ, so it takes only natural numbers in [0, 128] as values (with, you guessed it, an expectation value of 64 for two independent, uniformly random fingerprints).
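
To make the distance concrete, here is a minimal sketch in Python. It is not Pix's actual code: the names FINGERPRINT_BITS, hamming_distance, and most_similar are made up for this illustration, and the fingerprints are assumed to be available as plain integers.

```python
FINGERPRINT_BITS = 128  # fingerprints are natural numbers in [0, 2**128)


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    limit = 1 << FINGERPRINT_BITS
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("fingerprints must lie in [0, 2**128)")
    # XOR leaves a 1 exactly where the two numbers disagree.
    return bin(a ^ b).count("1")


def most_similar(target: int, candidates: dict) -> list:
    """Return (distance, name) pairs, smallest distance first.

    `candidates` maps a hypothetical image name to its fingerprint.
    """
    pairs = [(hamming_distance(target, fp), name) for name, fp in candidates.items()]
    return sorted(pairs)
```

Identical fingerprints have distance 0, and two fingerprints that differ in every bit have distance 128. Small distances suggest visually similar images.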

In Pix's preferences you can choose a folder as the root folder for the database. As soon as the "Active" checkbox is checked, the database can be used for finding similar images in the selected folder, but only images whose fingerprints are already in the database are taken into account. Thus the question is: How do the fingerprints get into the database? (And how do fingerprints whose original images have vanished get removed from it?) There are basically three ways to maintain the database:

On-the-fly maintenance: Each time Pix opens, deletes, or moves an image file, the respective database entry is updated accordingly.
Explicit maintenance: The database window, opened with the action Open Database… of the Media menu, contains a button "Start Update". While the job is running, the button is labelled "Stop Update" and can be used to stop it. "Start Update" starts a background job that iterates through the database and its root folder, removing fingerprints of image files that no longer exist and adding fingerprints of image files that are not yet represented in the database. The job uses only one execution thread in order to keep the CPU load moderate.
Background maintenance: Pix can start the update job on its own. If and only if the checkbox "Automatic Update" is checked, Pix checks every p minutes, where p is the value configured as the "Update Check Period", whether the last completed run of the job finished more than t ago, where t is the value configured as the "Max. Update Age". If so, it starts the update job.
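
The logic of the update job and of the automatic check can be sketched as follows. This is an illustration in Python under stated assumptions, not Pix's implementation: the function names, the list of image suffixes, and the treatment of a database that has never been updated (the check then reports an update as due) are all invented for this example.

```python
from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional

# Assumption: which file suffixes count as images is not stated in the text.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tiff"}


def scan_root(root: Path) -> set:
    """Collect the relative paths of all image files below the root folder."""
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    }


def plan_update(in_db: set, on_disk: set) -> tuple:
    """Return (to_remove, to_add) as sets of relative paths.

    to_remove: database entries whose image file no longer exists.
    to_add:    image files that have no database entry yet.
    """
    return in_db - on_disk, on_disk - in_db


def update_is_due(last_completed: Optional[datetime],
                  now: datetime,
                  max_update_age: timedelta) -> bool:
    """Decide, at each periodic check, whether to start the update job."""
    if last_completed is None:
        return True  # assumption: never updated means an update is due
    return now - last_completed > max_update_age
```

A periodic timer would call update_is_due every p minutes and, if it returns True, run plan_update on the result of scan_root and apply both sets to the database.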

The images are represented in the database by their paths relative to the root folder, so renaming files or folders can make them invisible to the database. In principle this drawback could be overcome by using so-called bookmark data to identify the files, but these bookmarks consume so much memory and are so cumbersome to handle that I decided that rescanning the root folder from time to time, which has to be done anyway if the picture collection in it is not static, is the better solution.

The same is true for the root folder itself. If it is moved, the database has to be rebuilt. Folder bookmark data is not stable enough to prevent this.