CS 280A Fa 24 Projects

"... And three begot the ten thousand things."

—Lao Tzu, Tao Te Ching (Feng/English trans.)

Introduction

The Prokudin-Gorskii photo collection is a series of glass plate exposures under red/green/blue filters, digitized by the Library of Congress (Fig. 1a). Since the exposures were not taken under perfectly identical conditions, a simple overlay of the three channels produces colorization artifacts (Fig. 1b). Here, we attempt to produce faithful colorizations of these photographs by combining the three exposures with automatic x-y alignment and RGB channel assignment (Fig. 1c).

Fig. 1a. Example digitized glass plate.	Fig. 1b. Simple overlay of the three channels.	Fig. 1c. Automatic composite of the three channels.

Single-Scale Grid Search

For low-resolution images such as cathedral.jpg, monastery.jpg, and tobolsk.jpg (where the edge length is \(\sim 300\) px), a simple grid search over displacements \((dx, dy) \in [-15,15]\times [-15,15]\) px sufficed to find a reasonable alignment (Fig. 2a-c). We aligned the green and red plates to the blue plate separately, in each case optimizing for \[\min_{(dx, dy)} \mathcal{L}(\text{displace}(p_1, dx, dy), p_2),\] where \(\mathcal{L}(p_1, p_2)\) is the \(\ell_2\) norm of the difference between images \(p_1, p_2\) over the center \(90\%\) region, and \(\text{displace}(p_1, dx, dy)\) is a circular shift of image \(p_1\) in the 2D plane by \((dx, dy)\) px.

Legend	Fig. 2a. `cathedral.jpg`	Fig. 2b. `monastery.jpg`	Fig. 2c. `tobolsk.jpg`
Baseline overlay
Single-scale aligned composite
Displacements	G \((2, 5)\), R \((3, 12)\)	G \((2, 3)\), R \((2, 3)\)	G \((2, 3)\), R \((3, 6)\)
Runtime	\(306\) ms	\(250\) ms	\(219\) ms

Multiscale Pyramid Search

For high-resolution images such as emir.tif (where the edge length is \(> 3000\) px), we implemented an iterative image pyramid search, where the top-level image is downscaled to a maximum edge length of \(256\) px, and each successive level differs in edge length by a factor of \(2\). At the top level, we performed a grid search over displacements \((dx, dy) \in [-15,15]\times [-15,15]\) px (as in the single-scale case), and at each successive level, we restricted the search region to \(\pm 1\) px around the scaled estimate from the previous level. To balance the effects of local and global structural differences on the alignment, at each level we optimized for \[\min_{(dx, dy)} \mathcal{L}_e(\text{displace}(p_1, dx, dy), p_2)+\lambda \mathcal{L}_c(\text{displace}(p_1, dx, dy), p_2),\] where \(\mathcal{L}_e(p_1, p_2)\) is the mean \(\ell_2\) norm between images \(p_1, p_2\) over the center \(90\%\) region, \(\mathcal{L}_c(p_1,p_2)\) is the normalized cross-correlation between images \(p_1, p_2\), and \(\text{displace}(p_1, dx, dy)\) is a circular shift of image \(p_1\) in the 2D plane by \((dx, dy)\) px. Because \(\mathcal{L}_c\) is a similarity measure (larger for better-aligned images) inside a minimization, \(\lambda\) is negative; we found that \(\lambda = -5\times 10^{-5}\) sufficed to produce empirically well-aligned images (Fig. 3a-k).
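
As a rough sketch of the combined objective, the two terms might be computed as below. Several details are assumptions, since the text does not pin them down: the "mean \(\ell_2\) norm" is taken to be the root-mean-square difference over the central region, the normalized cross-correlation is taken to be the dot product of the two unit-normalized images over the full frame, and all function names are hypothetical.

```python
import numpy as np

LAMBDA = -5e-5  # weight reported in the text; negative because NCC is a similarity

def center_crop(img, keep=0.9):
    """Return the central `keep` fraction of a 2D image along each axis."""
    h, w = img.shape
    mh, mw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    return img[mh:h - mh, mw:w - mw]

def rms_diff(p1, p2, keep=0.9):
    """L_e (assumed form): root-mean-square difference over the central region."""
    diff = center_crop(p1.astype(float), keep) - center_crop(p2.astype(float), keep)
    return float(np.sqrt(np.mean(diff ** 2)))

def ncc(p1, p2, eps=1e-12):
    """L_c (assumed form): dot product of the two unit-normalized images."""
    a = p1.astype(float).ravel()
    b = p2.astype(float).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def combined_loss(p1, p2, lam=LAMBDA):
    """L_e + lambda * L_c; lower values indicate better alignment."""
    return rms_diff(p1, p2) + lam * ncc(p1, p2)
```

For two identical images the difference term vanishes and the correlation is 1, so the loss reaches its minimum value of \(\lambda\).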

Legend	Fig. 3a. `emir.tif`	Fig. 3b. `church.tif`	Fig. 3c. `harvesters.tif`	Fig. 3d. `icon.tif`	Fig. 3e. `lady.tif`	Fig. 3f. `melons.tif`	Fig. 3g. `onion_church.tif`	Fig. 3h. `sculpture.tif`	Fig. 3i. `self_portrait.tif`	Fig. 3j. `three_generations.tif`	Fig. 3k. `train.tif`
Baseline overlay
Multiscale aligned composite
Displacements	G \((23, 48)\), R \((39, 103)\)	G \((3, 24)\), R \((5, 55)\)	G \((15, 60)\), R \((11, 123)\)	G \((17, 41)\), R \((22, 89)\)	G \((3, 55)\), R \((5, 113)\)	G \((7, 82)\), R \((10, 177)\)	G \((25, 52)\), R \((36, 108)\)	G \((11, 33)\), R \((26, 140)\)	G \((27, 78)\), R \((31, 173)\)	G \((11, 54)\), R \((9, 111)\)	G \((3, 42)\), R \((31, 87)\)
Runtime	\(6179\) ms	\(5763\) ms	\(6043\) ms	\(6262\) ms	\(6332\) ms	\(6242\) ms	\(6317\) ms	\(6406\) ms	\(6678\) ms	\(5983\) ms	\(6321\) ms
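
The coarse-to-fine search described in this section can be sketched as follows. This is an illustration under stated assumptions, not the code behind the table above: downscaling is done by repeated 2x2 block averaging (so the coarsest level has a maximum edge of at most 256 px rather than exactly 256 px), the loss is a plain mean squared difference standing in for the combined objective, and all function names are hypothetical.

```python
import numpy as np

def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks (stand-in for a proper resize)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def loss(p1, p2, keep=0.9):
    """Mean squared difference over the central `keep` fraction of the images."""
    h, w = p1.shape
    mh, mw = int(h * (1 - keep) / 2), int(w * (1 - keep) / 2)
    diff = p1[mh:h - mh, mw:w - mw] - p2[mh:h - mh, mw:w - mw]
    return np.mean(diff ** 2)

def search(p1, p2, center, radius):
    """Grid search over (dx, dy) within `radius` px of `center`."""
    cx, cy = center
    best, best_loss = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            current = loss(np.roll(p1, shift=(dy, dx), axis=(0, 1)), p2)
            if current < best_loss:
                best, best_loss = (dx, dy), current
    return best

def pyramid_align(p1, p2, top_size=256, top_radius=15):
    """Estimate the (dx, dy) that aligns p1 to p2, from coarse to fine."""
    levels = [(p1.astype(float), p2.astype(float))]
    while max(levels[-1][0].shape) > top_size:
        a, b = levels[-1]
        levels.append((downscale2(a), downscale2(b)))
    # Full grid search at the coarsest level only.
    est = search(*levels[-1], center=(0, 0), radius=top_radius)
    # At each finer level, double the estimate and refine within +/- 1 px.
    for a, b in reversed(levels[:-1]):
        est = search(a, b, center=(2 * est[0], 2 * est[1]), radius=1)
    return est
```

The finer levels evaluate only nine candidate shifts each, so nearly all of the search effort is spent at the coarsest level, where each evaluation is cheap.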

Bells 🔔 and Whistles 🥳

More realistic colors

Since the digital R/G/B channels do not perfectly match the filters that Prokudin-Gorskii used, the composite images contain tints. To correct for this, we used linear combinations of the aligned plates for each of the three digital color channels. As we did not have access to the exact spectra of Prokudin-Gorskii's filters, we tested several combinations and found that \((R',G',B')=(0.9R+0.1G, 0.8G+0.1R+0.1B, 0.9B+0.1G)\) produced visually realistic colors (Fig. 4a-e). We note that color correction is a subjective problem, and the choice of channel compositions could be further refined with larger-scale human psychophysics experiments.

Legend	Fig. 4a. `emir.tif`	Fig. 4b. `lady.tif`	Fig. 4c. `melons.tif`	Fig. 4d. `self_portrait.tif`	Fig. 4e. `three_generations.tif`
Multiscale aligned composite (Fig. 3)
Remapped color channels
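
The channel remapping above is a 3x3 linear mix, which can be written as a single matrix product. The sketch below assumes the aligned composite is an (H, W, 3) float array in RGB order with values in [0, 1]; the names `MIX` and `remap_colors` are hypothetical, and the final clipping step is an added safeguard rather than something stated in the text.

```python
import numpy as np

# Each row gives one output channel as weights on the aligned input (R, G, B).
MIX = np.array([
    [0.9, 0.1, 0.0],  # R' = 0.9 R + 0.1 G
    [0.1, 0.8, 0.1],  # G' = 0.1 R + 0.8 G + 0.1 B
    [0.0, 0.1, 0.9],  # B' = 0.1 G + 0.9 B
])

def remap_colors(rgb, mix=MIX):
    """Apply a 3x3 channel-mixing matrix to an (H, W, 3) image."""
    out = rgb.astype(float) @ mix.T
    return np.clip(out, 0.0, 1.0)
```

Each row of the matrix sums to 1, so neutral grays are left unchanged and only the relative balance between channels shifts.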

Automatic cropping

We attempted to remove regions of the images that contain color block artifacts caused by non-uniform placement of the exposure region on the glass plates. We used a combination of a variational metric (for colored blocks) \[(R-G)^2+(G-B)^2+(R-B)^2\] and an intensity metric (for black blocks) \[\frac{1}{R+G+B+\epsilon}\] applied pixel-wise, and cropped at left/right/top/bottom lines where the number of pixels exceeding a threshold for either metric is sufficiently high (indicative of a line artifact). Results are shown in Fig. 5a-e. Note that this simple method has some failure modes, particularly when the artifact does not have high enough contrast.

Legend	Fig. 5a. `emir.tif`	Fig. 5b. `lady.tif`	Fig. 5c. `melons.tif`	Fig. 5d. `self_portrait.tif`	Fig. 5e. `three_generations.tif`
Before cropping
After cropping
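
One possible reading of the cropping rule is sketched below. The two pixel-wise metrics follow the formulas above, but everything else is an assumption made for illustration: the threshold values, the fraction of flagged pixels that marks a line as an artifact, the restriction of the scan to the outer 10% of each side, and the function names are all hypothetical. The image is assumed to be an (H, W, 3) float array in [0, 1].

```python
import numpy as np

def artifact_mask(rgb, var_thresh=0.1, int_thresh=5.0, eps=1e-6):
    """Flag pixels that look like colored or black border blocks."""
    r, g, b = (rgb[..., i].astype(float) for i in range(3))
    variation = (r - g) ** 2 + (g - b) ** 2 + (r - b) ** 2  # colored blocks
    darkness = 1.0 / (r + g + b + eps)                      # black blocks
    return (variation > var_thresh) | (darkness > int_thresh)

def crop_bounds(flag_frac, line_frac, max_border):
    """Return (start, stop) indices to keep, scanning inward from both ends."""
    n = len(flag_frac)
    start, stop = 0, n
    for i in range(int(n * max_border)):
        if flag_frac[i] > line_frac:
            start = i + 1        # drop everything up to this flagged line
        if flag_frac[n - 1 - i] > line_frac:
            stop = n - 1 - i     # drop this flagged line and everything after
    return start, stop

def auto_crop(rgb, line_frac=0.5, max_border=0.1, **mask_kwargs):
    """Crop rows/columns near the border where most pixels are flagged."""
    mask = artifact_mask(rgb, **mask_kwargs)
    top, bottom = crop_bounds(mask.mean(axis=1), line_frac, max_border)
    left, right = crop_bounds(mask.mean(axis=0), line_frac, max_border)
    return rgb[top:bottom, left:right]
```

Because a line is only cut when most of its pixels are flagged, faint or partial border artifacts pass through unchanged, which is consistent with the low-contrast failure mode noted above.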

Comment: Dynamics as an artifact or feature?

Since the plates were separately exposed, the information content in the three plates is not identical. This can be seen most clearly in non-static scenes, such as waves of water (Fig. 6a) or sunlight patterns (Fig. 6b).

Fig. 6a. Waves	Fig. 6b. Sunlight

Clearly, these dynamic structures cannot be perfectly modeled by x-y translational alignment of the plates (although generative models could perhaps interpolate between the exposures). However, whether these structures are artifacts pointing to inferior imaging technology, or features indicative of the passage of time, is a matter of personal taste. Personally, I find these structures to be a quintessential feature of the Prokudin-Gorskii collection; indeed, they add a sparsely sampled temporal dimension to the pieces. In this sense, these are not just the first color photographs; they are the first live photos.