"... And three begot the ten thousand things."
—Lao Tzu, Tao Te Ching (Feng/English trans.)
The Prokudin-Gorskii photo collection is a series of glass plate exposures under red/green/blue filters, digitized by the Library of Congress (Fig. 1a).
Since the exposures were not taken under perfectly identical conditions, a simple overlay of the three channels produces colorization artifacts (Fig. 1b).
Here, we attempt to produce faithful colorizations of these photographs by combining the three exposures with automatic x-y alignment and RGB channel assignment (Fig. 1c).
Fig. 1a. Example digitized glass plate. | Fig. 1b. Simple overlay of the three channels. | Fig. 1c. Automatic composite of the three channels. |
---|---|---|
For low-resolution images such as cathedral.jpg, monastery.jpg, and tobolsk.jpg (where the edge length is \(\sim 300\) px), we found that a simple grid search over displacements \((dx, dy) \in [-15,15]\times [-15,15]\) px sufficed to find a reasonable alignment (Fig. 2a-c). We ran this search for both the green/blue and red/blue plate pairs, each time optimizing for
\[\min_{(dx, dy)} \mathcal{L}(\text{displace}(p_1, dx, dy), p_2),\]
where \(\mathcal{L}(p_1, p_2)\) is the \(\ell_2\) norm of the difference between images \(p_1, p_2\) over their center \(90\%\) region, and \(\text{displace}(p_1, dx, dy)\) is a circular shift of image \(p_1\) in the 2D-plane by \((dx, dy)\) px.
Legend | Fig. 2a. cathedral.jpg | Fig. 2b. monastery.jpg | Fig. 2c. tobolsk.jpg |
---|---|---|---|
Baseline overlay | |||
Single-scale aligned composite | |||
Displacements | G \((2, 5)\), R \((3, 12)\) | G \((2, 3)\), R \((2, 3)\) | G \((2, 3)\), R \((3, 6)\) |
Runtime | \(306\) ms | \(250\) ms | \(219\) ms |
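As a rough illustration of the single-scale search described above (a minimal sketch, not the code used to produce Fig. 2), the grid search might look as follows in Python with NumPy. The function names, the `keep` and `radius` parameters, and the convention that `dx` shifts along columns and `dy` along rows are assumptions made for this example.

```python
import numpy as np

def center_crop(img, keep=0.9):
    """Return the central `keep` fraction of a 2D image along each axis."""
    h, w = img.shape
    mh = int(round(h * (1 - keep) / 2))
    mw = int(round(w * (1 - keep) / 2))
    return img[mh:h - mh, mw:w - mw]

def displace(img, dx, dy):
    """Circularly shift `img` by dx px along columns and dy px along rows
    (axis convention assumed for this sketch)."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

def l2_loss(p1, p2):
    """l2 norm of the difference over the central 90% region."""
    return np.linalg.norm(center_crop(p1) - center_crop(p2))

def align_single_scale(p1, p2, radius=15):
    """Exhaustively search [-radius, radius]^2 for the (dx, dy) that
    minimizes the loss between the displaced p1 and p2."""
    best, best_loss = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            loss = l2_loss(displace(p1, dx, dy), p2)
            if loss < best_loss:
                best, best_loss = (dx, dy), loss
    return best
```

Under these assumptions, a call such as `align_single_scale(green, blue)` would return the displacement to apply to the green plate, where `green` and `blue` are hypothetical 2D float arrays holding two of the plates.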
For high-resolution images such as emir.tif and others (where the edge length is \(> 3000\) px), we implemented an iterative image pyramid search, where the top-level image is downscaled to a maximum edge length of \(256\) px, and each successive level differs in edge length by a factor of \(2\).
At the top level, we performed a grid search over displacements \((dx, dy) \in [-15,15]\times [-15,15]\) px (as in the single-scale case), and at each successive level, we restricted the search region to \(\pm 1\) px around the scaled estimate from the previous level. To balance the effects of local and global structural differences on the alignment, at each level we optimized for
\[\min_{(dx, dy)} \mathcal{L_e}(\text{displace}(p_1, dx, dy), p_2)+\lambda \mathcal{L_c}(\text{displace}(p_1, dx, dy), p_2),\]
where \(\mathcal{L_e}(p_1, p_2)\) is the mean \(\ell_2\) norm between images \(p_1, p_2\) over the center \(90\%\) region, \(\mathcal{L_c}(p_1,p_2)\) is the normalized cross-correlation between images \(p_1, p_2\), and \(\text{displace}(p_1, dx, dy)\) is a circular shift of image \(p_1\) in the 2D-plane by \((dx, dy)\) px. We found that \(\lambda = -5\times 10^{-5}\) sufficed to produce empirically well-aligned images (Fig. 3a-k). Note that \(\lambda\) is negative: a higher cross-correlation indicates a better match, so the correlation term has to lower the objective being minimized.
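To make the combined objective concrete, here is a hedged sketch of one way it could be evaluated. Several details are assumptions, since the text does not spell them out: the "mean \(\ell_2\) norm" is taken to be the \(\ell_2\) norm of the difference divided by the number of pixels, the cross-correlation is the zero-mean variant, and it is computed over the full images. All function names are hypothetical.

```python
import numpy as np

LAMBDA = -5e-5  # weight reported in the text

def center_crop(img, keep=0.9):
    """Return the central `keep` fraction of a 2D image along each axis."""
    h, w = img.shape
    mh = int(round(h * (1 - keep) / 2))
    mw = int(round(w * (1 - keep) / 2))
    return img[mh:h - mh, mw:w - mw]

def mean_l2(p1, p2):
    """l2 norm of the difference over the central 90% region, divided by
    the number of pixels (normalization assumed for this sketch)."""
    d = center_crop(p1) - center_crop(p2)
    return np.linalg.norm(d) / d.size

def ncc(p1, p2):
    """Zero-mean normalized cross-correlation over the full images
    (variant assumed for this sketch); 1.0 means a perfect match."""
    a = p1.ravel() - p1.mean()
    b = p2.ravel() - p2.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def combined_loss(p1, p2, lam=LAMBDA):
    """L_e + lambda * L_c; with lam < 0, high correlation lowers the loss."""
    return mean_l2(p1, p2) + lam * ncc(p1, p2)
```

With this formulation, two identical images give a loss of exactly \(\lambda\) (zero \(\ell_2\) term, correlation of one), which is the lowest value the objective can reach.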
Legend | Fig. 3a. emir.tif | Fig. 3b. church.tif | Fig. 3c. harvesters.tif | Fig. 3d. icon.tif | Fig. 3e. lady.tif | Fig. 3f. melons.tif | Fig. 3g. onion_church.tif | Fig. 3h. sculpture.tif | Fig. 3i. self_portrait.tif | Fig. 3j. three_generations.tif | Fig. 3k. train.tif |
---|---|---|---|---|---|---|---|---|---|---|---|
Baseline overlay | |||||||||||
Multiscale aligned composite | |||||||||||
Displacements | G \((23, 48)\), R \((39, 103)\) | G \((3, 24)\), R \((5, 55)\) | G \((15, 60)\), R \((11, 123)\) | G \((17, 41)\), R \((22, 89)\) | G \((3, 55)\), R \((5, 113)\) | G \((7, 82)\), R \((10, 177)\) | G \((25, 52)\), R \((36, 108)\) | G \((11, 33)\), R \((26, 140)\) | G \((27, 78)\), R \((31, 173)\) | G \((11, 54)\), R \((9, 111)\) | G \((3, 42)\), R \((31, 87)\) |
Runtime | \(6179\) ms | \(5763\) ms | \(6043\) ms | \(6262\) ms | \(6332\) ms | \(6242\) ms | \(6317\) ms | \(6406\) ms | \(6678\) ms | \(5983\) ms | \(6321\) ms |
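The coarse-to-fine procedure described above can be sketched as follows. This is an illustrative simplification rather than the implementation behind Fig. 3: downscaling uses plain 2x2 block averaging (the resampling method is not stated in the text), the pyramid is built by halving until the longest edge is at most `top_edge` (so the top level is not exactly 256 px), and the default loss is a plain mean \(\ell_2\) term. The `loss` argument accepts any callable, for instance an implementation of the combined objective above. All names are hypothetical.

```python
import numpy as np

def displace(img, dx, dy):
    """Circular shift by dx px along columns and dy px along rows (assumed)."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))

def l2_loss(p1, p2, keep=0.9):
    """l2 norm of the difference over the central region, per pixel."""
    h, w = p1.shape
    mh, mw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    d = p1[mh:h - mh, mw:w - mw] - p2[mh:h - mh, mw:w - mw]
    return np.linalg.norm(d) / d.size

def halve(img):
    """Downscale by 2 with 2x2 block averaging (odd trailing line dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(p1, p2, center, radius, loss):
    """Grid search of +/- radius px around `center`; returns the best (dx, dy)."""
    cx, cy = center
    candidates = [(cx + ddx, cy + ddy)
                  for ddy in range(-radius, radius + 1)
                  for ddx in range(-radius, radius + 1)]
    return min(candidates, key=lambda d: loss(displace(p1, d[0], d[1]), p2))

def align_pyramid(p1, p2, top_edge=256, top_radius=15, loss=l2_loss):
    # Level 0 is full resolution; keep halving until the image is small enough.
    levels = [(p1, p2)]
    while max(levels[-1][0].shape) > top_edge:
        a, b = levels[-1]
        levels.append((halve(a), halve(b)))
    # Coarsest level: full grid search, as in the single-scale case.
    dx, dy = search(*levels[-1], center=(0, 0), radius=top_radius, loss=loss)
    # Finer levels: double the estimate, then refine within +/- 1 px.
    for a, b in reversed(levels[:-1]):
        dx, dy = search(a, b, center=(2 * dx, 2 * dy), radius=1, loss=loss)
    return dx, dy
```

The design point is cost: each finer level evaluates only 9 candidates instead of the 961 of a full \([-15,15]^2\) search, so most of the work happens on the small top-level image.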
Since the digital R/G/B channels do not perfectly match the filters that Prokudin-Gorskii used, the composite images contain tints. To correct for this, we used linear combinations of the aligned plates for each of the three digital color channels.
As we did not have access to the exact spectra of Prokudin-Gorskii's filters, we tested several combinations and found that \((R',G',B')=(0.9R+0.1G, 0.8G+0.1R+0.1B, 0.9B+0.1G)\) produced visually realistic colors (Fig. 4a-e).
We note that color correction is a subjective problem, and the choice of channel compositions may be further refined with larger-scale human psychophysics experiments.
Legend | Fig. 4a. emir.tif | Fig. 4b. lady.tif | Fig. 4c. melons.tif | Fig. 4d. self_portrait.tif | Fig. 4e. three_generations.tif |
---|---|---|---|---|---|
Multiscale aligned composite (Fig. 3) | |||||
Remapped color channels |
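The channel remapping above is a per-pixel linear map, so it can be written as a single \(3\times 3\) matrix product. The sketch below assumes an `(H, W, 3)` float image with values in \([0, 1]\) and channels ordered R, G, B; the names `MIX` and `remap_colors` and the final clipping step are assumptions added for this example.

```python
import numpy as np

# Row i holds the weights on (R, G, B) for output channel i of (R', G', B').
MIX = np.array([
    [0.9, 0.1, 0.0],  # R' = 0.9 R + 0.1 G
    [0.1, 0.8, 0.1],  # G' = 0.1 R + 0.8 G + 0.1 B
    [0.0, 0.1, 0.9],  # B' = 0.1 G + 0.9 B
])

def remap_colors(rgb):
    """Apply the linear channel mix to an (H, W, 3) float image."""
    out = rgb @ MIX.T  # out[..., i] = sum_j MIX[i, j] * rgb[..., j]
    return np.clip(out, 0.0, 1.0)  # clipping is a safeguard assumed here
```

Each row of the matrix sums to one, so neutral grays are left unchanged and only saturated colors are pulled toward their neighboring channels.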
We attempt to remove regions of the images that contain color block artifacts caused by non-uniform placement of the exposure region on the glass plates. We use a combination of a variational metric (for colored blocks)
\[(R-G)^2+(G-B)^2+(R-B)^2\]
and an intensity metric (for black blocks)
\[\frac{1}{R+G+B+\epsilon}\]
applied pixel-wise, and crop at left/right/top/bottom lines where the number of pixels exceeding a threshold for either metric is sufficiently high (indicative of a line artifact). Results are shown in Fig. 5a-e. Note that this simple method has some failure modes, particularly when the artifact does not have enough contrast.
Legend | Fig. 5a. emir.tif | Fig. 5b. lady.tif | Fig. 5c. melons.tif | Fig. 5d. self_portrait.tif | Fig. 5e. three_generations.tif |
---|---|---|---|---|---|
Before cropping | |||||
After cropping |
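One possible reading of the cropping procedure above is sketched below. The two pixel-wise metrics follow the formulas in the text, but every threshold (`var_thresh`, `dark_thresh`, `line_frac`), the restriction of the search to an outer `margin`, and the rule of cropping just inside the innermost flagged line are assumptions chosen for illustration, as the text does not give these values. The input is assumed to be an `(H, W, 3)` float image in \([0, 1]\).

```python
import numpy as np

def artifact_mask(rgb, var_thresh=0.1, dark_thresh=5.0, eps=1e-6):
    """Flag pixels that look like colored blocks or black blocks."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    variation = (r - g) ** 2 + (g - b) ** 2 + (r - b) ** 2  # colored blocks
    darkness = 1.0 / (r + g + b + eps)                      # black blocks
    return (variation > var_thresh) | (darkness > dark_thresh)

def _lines_to_drop(bad, limit):
    """Count of leading lines to remove so that every flagged line among
    the first `limit` lines is cropped away."""
    idx = np.flatnonzero(bad[:limit])
    return int(idx[-1]) + 1 if idx.size else 0

def crop_artifacts(rgb, line_frac=0.5, margin=0.1, **mask_kwargs):
    """Crop border lines in which more than `line_frac` of pixels are flagged."""
    mask = artifact_mask(rgb, **mask_kwargs)
    h, w = mask.shape
    row_bad = mask.mean(axis=1) > line_frac  # one flag per row
    col_bad = mask.mean(axis=0) > line_frac  # one flag per column
    mh, mw = int(h * margin), int(w * margin)
    top = _lines_to_drop(row_bad, mh)
    bottom = h - _lines_to_drop(row_bad[::-1], mh)
    left = _lines_to_drop(col_bad, mw)
    right = w - _lines_to_drop(col_bad[::-1], mw)
    return rgb[top:bottom, left:right]
```

A low-contrast artifact keeps both metrics under their thresholds, so its lines are never flagged, which is consistent with the failure mode noted above.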
Since the plates were exposed separately, the information content of the three plates is not identical. This can be seen most clearly in non-static scenes, such as waves of water (Fig. 6a) or sunlight patterns (Fig. 6b).
Fig. 6a. Waves | Fig. 6b. Sunlight |
---|---|