Images of the Russian Empire:
Colorizing the Prokudin-Gorskii Photo Collection
Background
In the early 20th century, between 1909 and 1915, Russian photographer Sergey Prokudin-Gorsky obtained special permission from Emperor Nicholas II to travel throughout the Russian Empire. While the technology to display color images would not appear until much later, Prokudin-Gorsky strongly believed that color photography was the future, and that representing color with RGB color channels would work well. So he traveled throughout Russia and took thousands of images, using a special apparatus that captured 3 different exposures, each through a different color filter (red, green, blue), such that the resulting separate images could eventually be combined and displayed as color images. Since he had the blessing of Emperor Nicholas II, he was able to visit places that were often restricted to the public, giving us a very interesting and unique picture of Russia from over a hundred years ago. The challenge of this project is to take the per-channel digitized images from the Library of Congress and determine how to automatically align the channels to reconstruct the color images.
Approach
The digitized archives contain images at two different sizes. For the smaller images, brute-force search over a range of offsets under a chosen alignment metric works well, but the larger images require a more clever pyramid algorithm. We first discuss our methodology on the smaller images, and then explain how we extend the approach to the larger ones.
Part 1: Single-Scale Approach for Small Images
We start with digitized per-channel images, where the channels are stacked vertically as black and white images, in the order of blue, green, and red. Here is an example from the smaller cathedral image:
If we naively just stack these channels into an RGB color image, the result does not look very good or natural: the three exposures were not taken from the exact same position, leading to chromatic fringing and other visual artifacts. Below is the same cathedral image above, assembled into a color image with no alignment:
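As a concrete illustration, here is a minimal sketch of splitting the plate and assembling the unaligned color image, assuming NumPy and scikit-image (the file name is hypothetical):

```python
# A minimal sketch of loading the plate and composing the naive color image,
# assuming NumPy and scikit-image; the file name is hypothetical.
import numpy as np
import skimage.io as skio

plate = skio.imread("cathedral.jpg")   # grayscale plate, channels stacked vertically
height = plate.shape[0] // 3           # each channel occupies one third of the plate
b = plate[:height]                     # top third: blue
g = plate[height:2 * height]           # middle third: green
r = plate[2 * height:3 * height]       # bottom third: red

naive = np.dstack([r, g, b])           # unaligned color image in RGB order
skio.imsave("cathedral_naive.jpg", naive)
```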
To align these small images, we simply do a brute-force search over all possible displacements from -15 to 15 pixels in both directions, and pick the best displacement for each channel with respect to a reference channel according to a chosen metric.
In our implementation, we always align the red and green channels to the blue channel, and our metric of choice is Normalized Cross-Correlation (NCC), which is defined as:

$$\text{NCC}(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$

where $a$ and $b$ are the two channel images flattened into vectors.
We treat each image as a vector, normalize each vector by its norm, and then calculate the dot product. We want to maximize this metric, since higher values correspond to better alignment. This is likely better than other simple metrics, such as minimizing the Euclidean norm of the difference between channels, because the Euclidean norm incurs a loss whenever the overall intensity differs between color channels, whereas NCC measures how well the channels' structure lines up. Under NCC there is still a positive signal when the intensities differ, just a stronger one when they align as closely as possible.
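Below is a minimal sketch of the metric and the single-scale search, assuming equal-shape NumPy arrays; `ncc` and `align_single_scale` are our own hypothetical helpers, not library functions. For simplicity this version scores the full rolled images; the border crop discussed under Bells and Whistles would be applied before scoring.

```python
# A minimal sketch of the single-scale alignment, assuming equal-shape NumPy
# arrays; ncc and align_single_scale are hypothetical helper names.
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation: dot product of the two unit-norm vectors."""
    a = a.ravel() / np.linalg.norm(a)
    b = b.ravel() / np.linalg.norm(b)
    return a @ b

def align_single_scale(channel, reference, radius=15):
    """Brute-force search over shifts in [-radius, radius]^2, maximizing NCC."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))  # circular shift
            score = ncc(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

The returned `(dy, dx)` is then applied with the same `np.roll` call to produce the aligned channel.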
After alignment, the image no longer exhibits chromatic fringing and looks close to a normal color photograph. The edges do not look great, but this is to be expected: to keep the image dimensions fixed, we shift channels with a circular roll, so pixels that leave the top and right wrap around to the bottom and left. Future work could add automatic cropping. The same cathedral image after alignment is shown below:
Part 2: Image Pyramid For Larger Images
For the larger images, this approach is prohibitively slow. These images are roughly 10 times larger in each dimension, so if we scaled the brute-force search radius up accordingly, there would be about 100 times as many displacements to check, and each evaluation of our metric would also take about 100 times longer. The total runtime would therefore be roughly 10,000 times larger: the small images take about a second each, so a large image would take around 3 hours! Furthermore, comparing that many candidate alignments against each other could make the metric noisier.
Instead, we use a technique called an image pyramid. The big idea is to run the same brute-force alignment with the same small -15 to 15 pixel search radius, but on several copies of the image at different downsampling scales. This lets us find the coarsest shift first, and then repeatedly fine-tune it at the lower (more detail-rich) levels. Below is a helpful explanatory graphic created by Wikipedia user Cmglee:
In more detail, we build the pyramid by repeatedly downsampling the image by a factor of 2 until it is less than 400 pixels in each dimension (around the size of the smaller images). Alignment is performed first at the highest level (most downsampled), giving a coarse displacement vector (x, y). We then move down to the level immediately below, where the resolution has doubled, so we apply the shift (2x, 2y) and run the same small search to refine it. Each level thus returns twice the displacement from the level above plus the refinement found at the current level, and this process repeats until we are back at the full-resolution image.
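Here is a minimal recursive sketch of the pyramid, reusing the hypothetical `align_single_scale` from the earlier sketch and assuming scikit-image's `rescale` for downsampling:

```python
# A minimal recursive sketch of the pyramid, reusing the align_single_scale
# sketch above and assuming scikit-image's rescale for downsampling.
import numpy as np
from skimage.transform import rescale

def align_pyramid(channel, reference, radius=15, min_size=400):
    # Base case: below ~400px in each dimension, brute-force directly.
    if max(channel.shape) < min_size:
        return align_single_scale(channel, reference, radius)
    # Recurse on half-resolution copies to get the coarser displacement.
    cy, cx = align_pyramid(rescale(channel, 0.5), rescale(reference, 0.5),
                           radius, min_size)
    # Double the coarse displacement for this level, apply it, and refine.
    dy, dx = 2 * cy, 2 * cx
    shifted = np.roll(channel, (dy, dx), axis=(0, 1))
    fy, fx = align_single_scale(shifted, reference, radius)
    return (dy + fy, dx + fx)
```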
This leads to successful alignment on larger images, like the image shown below:
Bells and Whistles + Enhancements
To successfully align images, we need to exclude the edges of the image from the computation of the alignment metric. This makes sense: the borders of the three exposures will never match, since the original plates were not aligned, so they should not contribute to the score. In our implementation, we ignore a margin equal to 7.5% of the shorter image dimension, which ensures that the same relative section of the image is excluded at every level of the pyramid. This is very important: we initially used a fixed 30px crop when computing the alignment metric, which led to poor performance on some images. It was also very important to allow negative shifts in alignment; we were stuck on monastery for a long time because we forgot to allow them.
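A minimal sketch of that interior crop, assuming the 7.5% margin described above:

```python
# A minimal sketch of the interior crop, assuming a margin of 7.5% of the
# shorter image dimension as described above.
def interior(img, frac=0.075):
    """Drop a margin of frac * min(height, width) pixels from every side."""
    m = int(frac * min(img.shape[:2]))
    return img[m:img.shape[0] - m, m:img.shape[1] - m]
```

The metric is then computed as `ncc(interior(shifted), interior(reference))` rather than on the full images.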
We also used histogram-equalized versions of the channels to perform the alignment. This did not noticeably improve most images, though in theory it is more principled: histogram equalization remaps each channel so that pixel intensities are approximately uniformly distributed, which should sidestep some of the issues caused by overall intensity differing between channels.
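A minimal sketch of this variant, assuming scikit-image's `equalize_hist` and the hypothetical `align_pyramid` from the earlier sketch; the displacement is computed on equalized copies but applied to the original pixels:

```python
# A minimal sketch of aligning on histogram-equalized copies, assuming
# scikit-image's equalize_hist and the align_pyramid sketch above.
import numpy as np
from skimage.exposure import equalize_hist

def align_equalized(channel, reference):
    # Compute the displacement on equalized copies of the channels...
    shift = align_pyramid(equalize_hist(channel), equalize_hist(reference))
    # ...but apply it to the original, untouched pixels.
    return np.roll(channel, shift, axis=(0, 1))
```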
Results
All of the aligned images are displayed below: first the small images, which don't require the image pyramid, and then the larger images, which use it. We also include the (x, y) displacement vectors for the red and green channels with respect to the blue channel.
Small Images
Higher Resolution Larger Images
Additional Results on Other Images
Analysis
We note that alignment results are generally quite good; however, there is more noticeable chromatic fringing in pictures with human subjects, especially around the outline of the subject. This is likely because the individual moved between the three exposures, so the misalignment is not a pure translation and cannot be fully corrected by shifting alone. One such example of this is the following section from harvesters: