Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection

"... And three begot the ten thousand things."

Lao Tzu, Tao Te Ching (Feng/English trans.)

Introduction

The Prokudin-Gorskii photo collection is a series of glass plate exposures under red/green/blue filters, digitized by the Library of Congress (Fig. 1a). Since the exposures were not taken under perfectly identical conditions, a simple overlay of the three channels produces colorization artifacts (Fig. 1b). Here, we attempt to produce faithful colorizations of these photographs by combining the three exposures with automatic x-y alignment and RGB channel assignment (Fig. 1c).

Fig. 1a. Example digitized glass plate.
Fig. 1b. Simple overlay of the three channels.
Fig. 1c. Automatic composite of the three channels.

Single-Scale Grid Search

For low-resolution images such as cathedral.jpg, monastery.jpg, and tobolsk.jpg (edge length \(\sim 300\) px), we found that a simple grid search over displacements \((dx, dy) \in [-15,15]\times [-15,15]\) px sufficed to find a reasonable alignment (Fig. 2a-c). We ran the search separately for the green/blue and red/blue plate pairs, in each case solving \[\min_{(dx, dy)} \mathcal{L}(\text{displace}(p_1, dx, dy), p_2),\] where \(\mathcal{L}(p_1, p_2)\) is the \(\ell_2\) norm of the difference between the center \(90\%\) regions of images \(p_1\) and \(p_2\), and \(\text{displace}(p_1, dx, dy)\) is a circular shift of image \(p_1\) in the 2D plane by \((dx, dy)\) px.
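
The search described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code used to produce the figures: the function names (`center_crop`, `displace`, `l2_loss`, `grid_search`) are hypothetical, plates are assumed to be 2D grayscale arrays of equal shape, and `dx` is taken along columns and `dy` along rows.

```python
import numpy as np

def center_crop(img, keep=0.9):
    """Return the central `keep` fraction of a 2D image along each axis."""
    h, w = img.shape
    mh, mw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return img[mh:h - mh, mw:w - mw]

def displace(img, dx, dy):
    """Circularly shift `img` by dx px along columns and dy px along rows."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

def l2_loss(p1, p2, keep=0.9):
    """l2 norm of the difference between the center regions of p1 and p2."""
    a = center_crop(p1, keep).astype(np.float64)  # float avoids uint8 wrap-around
    b = center_crop(p2, keep).astype(np.float64)
    return np.linalg.norm(a - b)

def grid_search(p1, p2, radius=15):
    """Exhaustively search (dx, dy) in [-radius, radius]^2 minimizing l2_loss."""
    best, best_loss = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            loss = l2_loss(displace(p1, dx, dy), p2)
            if loss < best_loss:
                best, best_loss = (dx, dy), loss
    return best
```

Calling `grid_search(green, blue)` and `grid_search(red, blue)` would give one displacement per plate pair, assuming `green`, `red`, and `blue` hold the three plates.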

Fig. 2a-c. Baseline overlay and single-scale aligned composite for each image, with the recovered displacements \((dx, dy)\) in px and the runtime.

| Figure | Image | G displacement | R displacement | Runtime |
| --- | --- | --- | --- | --- |
| Fig. 2a | cathedral.jpg | \((2, 5)\) | \((3, 12)\) | \(306\) ms |
| Fig. 2b | monastery.jpg | \((2, 3)\) | \((2, 3)\) | \(250\) ms |
| Fig. 2c | tobolsk.jpg | \((2, 3)\) | \((3, 6)\) | \(219\) ms |

Multiscale Pyramid Search

For high-resolution images such as emir.tif (edge length \(> 3000\) px), we implemented an iterative image pyramid search. The top (coarsest) level is the image downscaled to a maximum edge length of \(256\) px, and each successive level differs in edge length by a factor of \(2\). At the top level, we performed a grid search over displacements \((dx, dy) \in [-15,15]\times [-15,15]\) px (as in the single-scale case); at each successive level, we restricted the search region to \(\pm 1\) px around the scaled estimate from the previous level.

To balance the effects of local and global structural differences on the alignment, at each level we optimized for \[\min_{(dx, dy)} \mathcal{L}_e(\text{displace}(p_1, dx, dy), p_2)+\lambda \mathcal{L}_c(\text{displace}(p_1, dx, dy), p_2),\] where \(\mathcal{L}_e(p_1, p_2)\) is the mean \(\ell_2\) norm between images \(p_1, p_2\) over the center \(90\%\) region, \(\mathcal{L}_c(p_1,p_2)\) is the normalized cross-correlation between images \(p_1, p_2\), and \(\text{displace}(p_1, dx, dy)\) is a circular shift of image \(p_1\) in the 2D plane by \((dx, dy)\) px. We found that \(\lambda = -5\times 10^{-5}\) sufficed to produce empirically well-aligned images (Fig. 3a-k). The weight is negative because a higher cross-correlation indicates a better alignment, so the correlation term must lower the objective.
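
A coarse-to-fine search of this kind might look like the sketch below. Several details are assumptions made for illustration and are not taken from the text: the pyramid is built by repeated 2x2 block averaging until the longest edge is at most `top_edge` (rather than resizing to exactly 256 px), the "mean \(\ell_2\)" term is interpreted as the mean squared difference, and all function names are hypothetical.

```python
import numpy as np

def center_crop(img, keep=0.9):
    """Return the central `keep` fraction of a 2D image along each axis."""
    h, w = img.shape
    mh, mw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return img[mh:h - mh, mw:w - mw]

def displace(img, dx, dy):
    """Circularly shift `img` by dx px along columns and dy px along rows."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

def halve(img):
    """Downscale by 2 using 2x2 block averaging (odd trailing row/column dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ncc(a, b):
    """Normalized cross-correlation of two equally shaped arrays."""
    a, b = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def loss(p1, p2, lam):
    """Mean squared difference plus lam times NCC, both on the center region."""
    a, b = center_crop(p1), center_crop(p2)
    return np.mean((a - b) ** 2) + lam * ncc(a, b)

def search(p1, p2, center, radius, lam):
    """Grid search within `radius` px of `center`; returns the best (dx, dy)."""
    cx, cy = center
    candidates = [(cx + ddx, cy + ddy)
                  for ddy in range(-radius, radius + 1)
                  for ddx in range(-radius, radius + 1)]
    return min(candidates, key=lambda d: loss(displace(p1, d[0], d[1]), p2, lam))

def pyramid_align(p1, p2, top_edge=256, top_radius=15, lam=-5e-5):
    """Coarse-to-fine alignment; returns (dx, dy) in full-resolution pixels."""
    levels = [(p1.astype(np.float64), p2.astype(np.float64))]
    while max(levels[-1][0].shape) > top_edge:
        a, b = levels[-1]
        levels.append((halve(a), halve(b)))
    a, b = levels[-1]
    est = search(a, b, (0, 0), top_radius, lam)      # full grid at the coarsest level
    for a, b in reversed(levels[:-1]):               # refine toward full resolution
        est = search(a, b, (2 * est[0], 2 * est[1]), 1, lam)
    return est
```

Because each refinement step evaluates only nine candidates, most of the cost sits in the full-resolution loss evaluations rather than in the search itself. The appropriate magnitude of `lam` depends on the intensity scale of the plates, so the default here simply mirrors the value quoted above.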

Fig. 3a-k. Baseline overlay and multiscale aligned composite for each image, with the recovered displacements \((dx, dy)\) in px and the runtime.

| Figure | Image | G displacement | R displacement | Runtime |
| --- | --- | --- | --- | --- |
| Fig. 3a | emir.tif | \((23, 48)\) | \((39, 103)\) | \(6179\) ms |
| Fig. 3b | church.tif | \((3, 24)\) | \((5, 55)\) | \(5763\) ms |
| Fig. 3c | harvesters.tif | \((15, 60)\) | \((11, 123)\) | \(6043\) ms |
| Fig. 3d | icon.tif | \((17, 41)\) | \((22, 89)\) | \(6262\) ms |
| Fig. 3e | lady.tif | \((3, 55)\) | \((5, 113)\) | \(6332\) ms |
| Fig. 3f | melons.tif | \((7, 82)\) | \((10, 177)\) | \(6242\) ms |
| Fig. 3g | onion_church.tif | \((25, 52)\) | \((36, 108)\) | \(6317\) ms |
| Fig. 3h | sculpture.tif | \((11, 33)\) | \((26, 140)\) | \(6406\) ms |
| Fig. 3i | self_portrait.tif | \((27, 78)\) | \((31, 173)\) | \(6678\) ms |
| Fig. 3j | three_generations.tif | \((11, 54)\) | \((9, 111)\) | \(5983\) ms |
| Fig. 3k | train.tif | \((3, 42)\) | \((31, 87)\) | \(6321\) ms |

Bells 🔔 and Whistles 🥳

More realistic colors

Since the digital R/G/B channels do not perfectly match the filters that Prokudin-Gorskii used, the composite images contain color casts. To correct for this, we used linear combinations of the aligned plates for each of the three digital color channels. As we did not have access to the exact spectra of Prokudin-Gorskii's filters, we tested several combinations and found that \((R',G',B')=(0.9R+0.1G, 0.8G+0.1R+0.1B, 0.9B+0.1G)\) produced visually realistic colors (Fig. 4a-e). We note that color correction is a subjective problem, and the choice of channel compositions could be further refined with larger-scale human psychophysics experiments.
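
The channel remapping is a per-pixel 3x3 matrix multiplication, which can be sketched as follows. The names `MIX` and `remap_colors` are hypothetical, and the sketch assumes an aligned composite stored as an (H, W, 3) float array in RGB order with values in \([0, 1]\).

```python
import numpy as np

# Each row holds the weights of the aligned (R, G, B) plates for one output
# channel, using the combination quoted in the text.
MIX = np.array([
    [0.9, 0.1, 0.0],  # R' = 0.9 R + 0.1 G
    [0.1, 0.8, 0.1],  # G' = 0.1 R + 0.8 G + 0.1 B
    [0.0, 0.1, 0.9],  # B' = 0.1 G + 0.9 B
])

def remap_colors(rgb, mix=MIX):
    """Apply a 3x3 channel-mixing matrix to an (H, W, 3) float image."""
    out = rgb @ mix.T  # out[..., i] = sum_j mix[i, j] * rgb[..., j]
    # Assumption: values live in [0, 1]; the clip only guards other matrices.
    return np.clip(out, 0.0, 1.0)
```

Each row of the matrix sums to 1, so neutral grays are left unchanged and only the balance between channels shifts.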

Fig. 4a-e. Multiscale aligned composite (Fig. 3) and remapped color channels for emir.tif (4a), lady.tif (4b), melons.tif (4c), self_portrait.tif (4d), and three_generations.tif (4e).


Automatic cropping

We attempt to remove regions of the images that contain color block artifacts caused by non-uniform placement of the exposure region on the glass plates. We apply two pixel-wise metrics: a variational metric (for colored blocks) \[(R-G)^2+(G-B)^2+(R-B)^2\] and an intensity metric (for black blocks) \[\frac{1}{R+G+B+\epsilon}.\] We then crop at the left/right/top/bottom lines where the number of pixels exceeding a threshold for either metric is sufficiently high (indicative of a line artifact). Results are shown in Fig. 5a-e. Note that this simple method has some failure modes, particularly when the artifact has low contrast.
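
One way to turn the two metrics into crop bounds is sketched below. The text does not give the thresholds or the exact scanning rule, so everything beyond the two formulas is an assumption for illustration: the threshold values, the restriction of the search to a band near each border (`max_frac`), and the function names are all hypothetical. The image is assumed to be an (H, W, 3) float array in \([0, 1]\).

```python
import numpy as np

def artifact_mask(rgb, color_thresh=0.5, dark_thresh=5.0, eps=1e-6):
    """Flag pixels that look like colored or black border blocks."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    color = (r - g) ** 2 + (g - b) ** 2 + (r - b) ** 2  # variational metric
    dark = 1.0 / (r + g + b + eps)                      # intensity metric
    return (color > color_thresh) | (dark > dark_thresh)

def border_cut(bad, limit):
    """Number of lines to drop: up to the innermost flagged line within `limit`."""
    idx = np.flatnonzero(bad[:limit])
    return int(idx[-1]) + 1 if idx.size else 0

def crop_bounds(mask, line_frac=0.5, max_frac=0.2):
    """Return (top, bottom, left, right) slice bounds for the cropped image."""
    h, w = mask.shape
    row_bad = mask.mean(axis=1) > line_frac  # rows that are mostly flagged
    col_bad = mask.mean(axis=0) > line_frac  # columns that are mostly flagged
    rows, cols = int(h * max_frac), int(w * max_frac)
    top = border_cut(row_bad, rows)
    bottom = h - border_cut(row_bad[::-1], rows)
    left = border_cut(col_bad, cols)
    right = w - border_cut(col_bad[::-1], cols)
    return top, bottom, left, right

def auto_crop(rgb):
    """Crop away border lines dominated by flagged pixels."""
    top, bottom, left, right = crop_bounds(artifact_mask(rgb))
    return rgb[top:bottom, left:right]
```

Limiting the search to a band near each border keeps strongly colored or dark scene content in the interior from triggering a crop, at the cost of missing artifacts wider than the band.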

Fig. 5a-e. Color-corrected composite before cropping and automatically cropped result for emir.tif (5a), lady.tif (5b), melons.tif (5c), self_portrait.tif (5d), and three_generations.tif (5e).

Comment: Dynamics as an artifact or feature?

Since the plates were exposed separately, the information content of the three plates is not identical. This can be seen most clearly in non-static scenes, such as waves on water (Fig. 6a) or sunlight patterns (Fig. 6b).

Fig. 6a. Waves.
Fig. 6b. Sunlight.

Clearly, these dynamic structures cannot be perfectly modeled by x-y translational alignment of the plates (although generative models could perhaps interpolate between the exposures). However, whether these structures are artifacts pointing to inferior imaging technology or features indicative of the passage of time is a matter of personal taste. Personally, I find these structures to be a quintessential feature of the Prokudin-Gorskii collection: they add a sparsely sampled temporal dimension to the pieces. In this sense, these are not just the first color photographs; they are the first live photos.