1 Introduction

The Retinex theory [7] provides a computational model to estimate the human color sensation. It is based on the empirical evidence that, in the human vision system, the color signal is first processed separately by the retina photoreceptors, and then by the cortex. The latter re-works the color information taking into account the spatial arrangement of the other colors present in the observed scene. Therefore, the color sensation we derive when observing a point depends not only on the photometric properties of that point but also on those of the surrounding regions.

According to this principle, the Retinex algorithm estimates the color sensation from a color digital picture as follows. The chromatic channels of the image are processed separately. For each channel, the chromatic intensity of each pixel x is re-scaled by a local white reference. This is an intensity level obtained by re-working the intensities of the pixels in a neighborhood of x, with the general prescription that the intensities of the pixels closer to x influence the color sensation at x more than the intensities of the pixels far away [2, 12, 19]. This procedure outputs an enhanced color image with better visible details. This image, which we refer to as filtered, differs from the actual color sensation by a set of pre- or post- LUT calibrations that adjust the color gamut of the device in order to estimate the actual color sensation [20].

Two key points of any Retinex implementation are (i) the definition of the sampling figure, i.e. of the neighboring pixels relevant to color sensation, and (ii) the determination of the local white reference. Despite the importance of these issues, the original Retinex description does not provide specific details about them. This has led to many different Retinex implementations [12, 13, 21]. For instance, the original Retinex algorithm [7] scans the neighborhood of any pixel x by a set of paths, traveling randomly over the image and ending at x: the sampling figure at x is the union of the pixels lying along these paths. Each chromatic intensity at x in the filtered image is obtained by computing, over each path, the product of the intensity ratios of adjacent pixels, and then by averaging these products over the number of paths (division by zero is avoided). This path-based approach has been adopted by many subsequent Retinex implementations, e.g. [5, 9, 14, 15, 22, 23], which mainly impose some constraints on the path shape in order to improve the spatial exploration of the image. The methods in [1, 3, 6, 8, 16] define the neighborhood of any pixel x as a spray, i.e. a set of pixels distributed around x with radial density. They compute the local white reference through the equation of the so-called Milano Retinex algorithms [10, 11], mathematically formalized in [15] and implemented with sprays in [16]: for each x, a set of sprays is generated and the local white reference is computed as the maximum intensity over each spray, averaged over the number of sprays.

In this work, we present T-Rex, a novel method belonging to the Milano Retinex family [17]. In T-Rex, for each image channel, the sampling figure of any image pixel x is the set of the image pixels whose intensity value exceeds the intensity value of x. The local white reference is obtained by averaging the intensity of the sampled pixels, weighted by their spatial distance from x. The intensity at x acts as a threshold for defining the pixels relevant for estimating the color sensation. The name T-Rex just comes from the keywords Threshold and REtineX, which characterize this approach.

The main novelty of T-Rex is the definition of a sampling figure specific for each pixel and based on a self-regulating intensity threshold. T-Rex shares with the spray based methods [1, 3, 4, 6, 8, 16] the idea of defining the neighborhood of any pixel x as a 2D set of points and of computing the local white reference by re-working pixel intensities greater than the intensity at x. Nevertheless, unlike these spray based approaches, which are characterized by a radial distribution around the center, the sampling figure of T-Rex at any pixel x does not have any specific geometric structure: it may strongly vary from pixel to pixel, according to the intensity at x and to the spatial weights. Unlike the methods employing a random sampling, such as the original Retinex implementation, the path-based methods mentioned above, and many spray-based approaches, the exploration of the image performed by T-Rex is deterministic. This is an advantage because random sampling may introduce chromatic noise in the filtered image, which is usually removed a posteriori or mitigated by repeating the image sampling many times and then averaging the results. Finally, in T-Rex, the sampled intensities do not correspond to intensity extrema over the pixel neighborhood (as in [1, 3, 6, 8, 16]), and their selection is performed in an unsupervised manner, without requiring the user to input any threshold on intensity (as is done instead in [4]).

In this work, we do not consider any pre- or post- LUT calibration, and thus we employ and evaluate T-Rex as an image enhancer, not as a model of human vision [20]. The experiments, carried out on real world color pictures, show that, as a member of the family of Milano Retinex algorithms, T-Rex improves the readability of images captured with unbalanced exposures, increasing their brightness and contrast and equalizing their dynamic range.

The rest of the paper is organized as follows: Sect. 2 describes T-Rex in detail; Sect. 3 reports the experiments, and Sect. 4 outlines our conclusions and future work.

2 T-Rex

Let us introduce the notation used hereafter. We indicate an RGB image by \(\overline{I}\) and any chromatic channel of \(\overline{I}\) by I. We denote by S the image support, i.e. the set of pixel coordinates, and by |S| the size of S. For numerical reasons, we rescale the intensity values of I over [0, 1]. Moreover, in order to avoid division by zero, for any \(x \in S\) such that \(I(x) = 0\) we set \(I(x):= 10^{-6}\). Then, we represent I as a function \(I: S \rightarrow (0, 1]\). We denote the filtered version of \(\overline{I}\) by \(\overline{L}\) and any chromatic channel of \(\overline{L}\) by L.

T-Rex takes as input an RGB image \(\overline{I}\). According to the Retinex theory, it processes its channels independently. For each channel I and for each pixel \(x \in S\), T-Rex implements the following operations:

  1. Modeling the color spatial interaction: T-Rex defines the function \(v_x: S \rightarrow \mathbf {R}\) such that

    $$\begin{aligned} v_x(y) = I(y) \exp [-\lambda d(x, y)^2] \end{aligned}$$
    (1)

    where d(x, y) is the Euclidean spatial distance between x and y, normalized in order to range over [0, 1]. Precisely:

    $$\begin{aligned} d(x, y) = \frac{\parallel x - y \parallel }{D} \end{aligned}$$
    (2)

    where D is the length of the diagonal of S. The parameter \(\lambda \) is a positive real number, weighting the importance of the distance term versus the intensity. As suggested by the subscript, \(v_x\) varies from pixel to pixel. It is introduced to model the spatial interaction among colors. The multiplicative term \(\exp [-\lambda d(x, y)^2]\) acts as a penalty term: the intensities of the pixels close to x are weighted more than those of the pixels further from x. This is in line with the studies in [12], which report on the influence of the distance on the color sensation.

  2. Defining the Sampling Figure: T-Rex scans the neighborhood of x to find out the sampling figure N(x) at x. We refer to x as the center of N(x). Precisely, a pixel y of S belongs to N(x) iff

    (a) \(v_x(y) > I(x)\)

    (b) \(d(x, y) = \min \{ d(u, x): u \in S \text { and } v_x(u) = v_x(y)\}\).

    The sampling figure N(x) is defined by thresholding the function \(v_x\) by the intensity value I(x) (condition (a)). Among the pixels satisfying condition (a) and sharing the same value of \(v_x\), only those closest to x are retained (condition (b)). The size and the geometry of N(x) depend on the parameter \(\lambda \). Differently from the sampling set of the path-based approaches, which is simply connected, the sampling figure N(x) is usually not connected. Moreover, N(x) may also be empty. In this case, the local white reference is I(x), as explained next.

  3. Computing the Local White Reference: the local white reference is computed as follows:

    $$\begin{aligned} w(x) = {\left\{ \begin{array}{ll} \frac{1}{\sum _{y \in N(x)} \exp [-\lambda d(x, y)^2]} \mathop {\sum }\nolimits _{y \in N(x)} v_x(y) &{} \text { if } N(x) \ne \emptyset \\ I(x) &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
    (3)

    and the filtered intensity L(x) is given by

    $$\begin{aligned} L(x) = \frac{I(x)}{w(x)}. \end{aligned}$$
    (4)

Fig. 1. On top: the green circles indicate the pixels of the sampling figure at the pixel highlighted by the red circle on a grey level image, for \(\lambda = 0.5\) (left) and \(\lambda = 1.5\) (right). On bottom: the corresponding T-Rex filtered images. (Color figure online)

From Eq. (3), we have that the value of w(x) is always greater than or equal to I(x). This is in line with the principles of the Milano Retinex algorithms, which select as local white reference an intensity value equal to or greater than I(x).
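The bound can be checked directly from Eqs. (1) and (3). The exponential factor in Eq. (1) never exceeds 1, so every pixel of a non-empty N(x) is strictly brighter than x, and w(x) is a weighted average of such intensities:

$$\begin{aligned} y \in N(x) \;\Rightarrow \; I(y) \ge v_x(y) > I(x), \qquad w(x) = \frac{\sum _{y \in N(x)} I(y) \exp [-\lambda d(x, y)^2]}{\sum _{y \in N(x)} \exp [-\lambda d(x, y)^2]} > I(x). \end{aligned}$$

Since \(I \le 1\), the same weighted average also satisfies \(w(x) \le 1\), so that \(I(x) \le L(x) \le 1\) for every pixel, with \(L(x) = 1\) exactly when N(x) is empty.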

The T-Rex algorithm requires as input an image and a value for the parameter \(\lambda \). When \(\lambda = 0\), no penalty is applied to the image intensities and the sampling figure of T-Rex includes all the image pixels with intensity higher than I(x). When \(\lambda \rightarrow +\infty \), \(v_x(y)\) tends to zero for any \(y \ne x\), thus if \(I(x) \ne 0\), the set N(x) is empty, and \(w(x) = I(x)\). Therefore, different values of \(\lambda \) produce different sampling figures and lead to different color filtering (see for instance Fig. 1).
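The three operations of this section can be sketched in a few lines of NumPy. This is only a brute-force illustration, quadratic in the number of pixels and thus usable on small images, not the implementation used in the experiments. The function names `trex_channel` and `trex_rgb` are hypothetical, and D is assumed to be the distance between opposite corner pixels, so that d ranges over [0, 1].

```python
import numpy as np

def trex_channel(I, lam=1.0, eps=1e-6):
    """Filter one channel I (2D array, values in [0, 1]) following Eqs. (1)-(4)."""
    I = np.asarray(I, dtype=np.float64)
    I = np.where(I == 0, eps, I)                 # avoid division by zero
    h, w = I.shape
    rows, cols = np.mgrid[0:h, 0:w]
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(np.float64)
    vals = I.ravel()
    # Assumption: D is the corner-to-corner distance (1.0 for a 1x1 image).
    D = np.hypot(h - 1, w - 1) or 1.0
    L = np.empty_like(vals)
    for i in range(vals.size):
        d = np.linalg.norm(coords - coords[i], axis=1) / D   # Eq. (2)
        g = np.exp(-lam * d**2)                              # spatial penalty
        v = vals * g                                         # Eq. (1)
        idx = np.flatnonzero(v > vals[i])                    # condition (a)
        if idx.size:
            # Condition (b): among pixels sharing the same value of v_x,
            # keep only those at minimum distance from x.
            _, inv = np.unique(v[idx], return_inverse=True)
            dmin = np.full(inv.max() + 1, np.inf)
            np.minimum.at(dmin, inv, d[idx])
            idx = idx[d[idx] == dmin[inv]]
            w_x = v[idx].sum() / g[idx].sum()                # Eq. (3)
        else:
            w_x = vals[i]                                    # empty N(x)
        L[i] = vals[i] / w_x                                 # Eq. (4)
    return L.reshape(h, w)

def trex_rgb(img, lam=1.0):
    """Apply trex_channel independently to each channel of an H x W x C image."""
    img = np.asarray(img, dtype=np.float64)
    return np.stack([trex_channel(img[..., c], lam)
                     for c in range(img.shape[-1])], axis=-1)
```

With `lam=0.0` the white reference at x reduces to the plain mean of the intensities brighter than I(x), while for very large `lam` every sampling figure is empty and the output is uniformly 1, matching the two limit cases discussed above.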

3 Experiments

In Subsect. 3.1 we define the measures used for evaluating the T-Rex performance, while in Subsect. 3.2 we describe the dataset used in the experiments and the results, including also a comparison with two other Milano Retinex approaches.

3.1 Evaluation Measures

We evaluate the performance of T-Rex in terms of image enhancement. We observe that, in the literature, there are no agreed measures for assessing the quality and/or the accuracy of image enhancement algorithms. In this framework, we consider three measures already employed for analysing Retinex performance, e.g. [8, 9]. These measures are suitable to describe numerically the variations of visual features related to the readability of an image: its brightness, its details and its dynamic range. These features are modified by Retinex, which usually increases the brightness and the visibility of the details (i.e. the contrast) and equalizes the dynamic range of the input image.

Given a color image \(\overline{I}\) with support S, we compute its brightness \(B_{\overline{I}}\) as the 1-channel image defined on S such that

$$\begin{aligned} B_{\overline{I}}(x) = \frac{1}{3}\sum _{i = 1}^{3} I_i(x). \end{aligned}$$

Here \(I_i\) (\(1 \le i \le 3\)) denotes the i-th channel of \(\overline{I}\). We do not normalize the channel intensities over [0, 1]: the variability range of each \(I_i\) is thus the discrete set \(\{0, \ldots , 255\}\). For each pixel x, the value \(B_{\overline{I}}(x)\) is cast to an integer number between 0 and 255.
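As a minimal sketch, the brightness image can be computed as below. The function name `brightness` is hypothetical, and the cast to integer is assumed to truncate the fractional part, since the rounding rule is not specified.

```python
import numpy as np

def brightness(img_rgb):
    """Return the 1-channel brightness of an H x W x 3 image with values in 0..255."""
    img = np.asarray(img_rgb, dtype=np.float64)
    B = img.sum(axis=2) / 3.0          # mean of the three channels
    # Assumption: the cast to integer truncates the fractional part.
    return B.astype(np.uint8)
```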

The three measures employed for evaluating the image enhancement performance of T-Rex are:

  1. Mean brightness \(f_0\): it is the average of the values of \(B_{\overline{I}}\) over the number of pixels:

    $$\begin{aligned} f_0 = \frac{1}{|S|} \sum _{x \in S} B_{\overline{I}}(x). \end{aligned}$$
    (5)
  2. Multi-resolution Contrast \(f_1\): this measure, introduced in [18], is defined by building up a pyramid of K (\({>}0\)) images \(B_1, \ldots , B_K\), where \(B_1\) = \(B_{\overline{I}}\), and, for each \(1 < k \le K\), \(B_k\) is computed by rescaling \(B_{k-1}\) by 0.5. The value of \(f_1\) is then obtained by the following steps:

    • Computing the mean local contrast on each pyramid image: for each \(k \in \{1, \ldots , K\}\), for each x in the support \(S_k\) of \(B_k\), we compute the local contrast

      $$\begin{aligned} c_k (x) = \frac{1}{8} \sum _{u \in \mathcal {N}(x)} |B_k(u) - B_k(x)| \end{aligned}$$
      (6)

      where \(\mathcal {N}(x)\) indicates the 3 \(\times \) 3 window centered at x. Then, we compute the mean value

      $$\begin{aligned} \overline{c}_k = \frac{1}{|S_k|} \sum _{x \in S_k} c_k(x), \end{aligned}$$
      (7)

      where \(|S_k|\) is the cardinality of \(S_k\);

    • Computing the multi-resolution contrast on the pyramid: we average the values \(\overline{c}_k\) over the number of images \(B_k\):

      $$\begin{aligned} f_1 = \frac{1}{K} \sum _{k=1}^{K}\overline{c}_k. \end{aligned}$$
      (8)
  3. Histogram Flatness \(f_2\): it measures how much the dynamic range of the image brightness has been stretched by the T-Rex filtering. Let H be the histogram of \(B_{\overline{I}}\) normalized in order to range over [0, 1]; let U be the discrete uniform probability density function defined over the set \(\{0, \ldots , 255\}\). The histogram flatness is the \(L^1\) distance between H and U, i.e.

    $$\begin{aligned} f_2 = \frac{1}{255}\sum _{b = 0}^{255} |H(b) - U(b)|. \end{aligned}$$
    (9)
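The three measures can be sketched as follows for a brightness image B with integer values in 0..255. All function names are hypothetical, and several details left open by the definitions are filled with labeled assumptions: the pyramid is built by averaging 2 × 2 blocks, border pixels are handled by edge replication, H is normalized to relative frequencies (so that it sums to 1), and the pyramid stops early if an image becomes too small to be halved.

```python
import numpy as np

def mean_brightness(B):
    """f_0, Eq. (5): average of B over the pixels."""
    return float(np.mean(B))

def halve(B):
    """Assumption: rescaling by 0.5 is done by averaging 2x2 blocks."""
    h, w = (B.shape[0] // 2) * 2, (B.shape[1] // 2) * 2
    return B[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def mean_local_contrast(B):
    """Eqs. (6)-(7). Assumption: borders are padded by edge replication."""
    B = np.asarray(B, dtype=np.float64)
    h, w = B.shape
    P = np.pad(B, 1, mode="edge")
    acc = np.zeros_like(B)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):          # the centre term contributes zero
            acc += np.abs(P[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] - B)
    return float((acc / 8.0).mean())

def multires_contrast(B, K=3):
    """f_1, Eq. (8): mean of the per-level contrasts over the pyramid."""
    B = np.asarray(B, dtype=np.float64)
    levels = []
    for _ in range(K):
        levels.append(mean_local_contrast(B))
        if min(B.shape) < 2:           # assumption: stop when halving is impossible
            break
        B = halve(B)
    return sum(levels) / len(levels)

def histogram_flatness(B):
    """f_2, Eq. (9), with H taken as relative frequencies (assumption)."""
    B = np.asarray(B)
    H = np.bincount(B.astype(np.int64).ravel(), minlength=256)[:256] / B.size
    U = np.full(256, 1.0 / 256)
    return float(np.abs(H - U).sum() / 255.0)
```

Under these assumptions a constant image has zero contrast and a flatness far from zero, whereas an image containing each of the 256 levels equally often has flatness exactly zero.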

An image enhancer “usually” increases the values of \(f_0\) and \(f_1\), while it decreases the value of \(f_2\). We have quoted the word usually because the amount of the variations of \(f_0, f_1, f_2\) depends on the input image. In particular, we observe that the increment of \(f_0\) and \(f_1\) and the reduction of \(f_2\) are more evident when the input image is dark and its details are poorly visible than when the image is already clear.

3.2 Results

For our experiments, we consider a dataset of 20 real-world color pictures, depicting both indoor and outdoor environments. Despite its small size, this dataset is of interest because most of its images have been captured under poor illumination conditions, so they appear quite dark and with poorly visible details. Moreover, they display dark and bright regions with different size, proportion, and location. These characteristics make this image set suitable to evaluate the performance of T-Rex as an image enhancer, also from a qualitative point of view. Some examples are shown in Fig. 2.

Fig. 2. Some images used for evaluating T-Rex performance.

Table 1. Evaluation of T-Rex performance in comparison with RSR-P and QBRIX (local and global).

Table 1 reports the evaluation measures when no filtering is applied (NONE) and when T-Rex (with \(\lambda = 1.0\)) is applied. In addition, this table also reports the performance of two other Milano Retinex algorithms (QBRIX [4] and RSR-P [3]). We have chosen to compare these approaches with T-Rex because they present some similarities with it. Precisely: (a) QBRIX and RSR-P are Milano Retinex implementations; (b) as Milano Retinex implementations, they normalize the intensity I(x) of any pixel x with an intensity level greater than or equal to I(x); (c) like T-Rex, they are deterministic approaches.

Both QBRIX and RSR-P derive from the algorithm Random Spray Retinex (RSR) [16], which works as follows. Given an image channel I and a pixel x, RSR re-scales the intensity I(x) with the maximum intensity over a spray, i.e. a cloud of n pixels randomly sampled around x with radial density. In order to remove, as much as possible, the chromatic noise due to the random sampling, many sprays are generated and the final value L(x) is obtained by averaging the contribution from each spray. The size n of the spray and the number N of sprays are input by the user. When n equals the number of image pixels, RSR behaves like the scale-by-max algorithm.
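For illustration only, the RSR description above can be sketched for a single pixel as follows. This is not the reference implementation of [16]: the function name `rsr_pixel`, the spray radius (taken equal to the image diagonal), the polar sampling scheme, and the inclusion of x itself in every spray are all assumptions. The channel is assumed to take values in (0, 1], as in Sect. 2.

```python
import numpy as np

def rsr_pixel(I, x, n=50, N=20, rng=None):
    """Estimate L(x) at pixel x = (row, col) from N random sprays of n points."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = I.shape
    R = np.hypot(h, w)                       # assumption: spray radius = diagonal
    ratios = []
    for _ in range(N):
        # Uniform radius and angle give a point density that decreases
        # with the distance from x (radial density).
        rho = rng.uniform(0.0, R, size=n)
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
        rows = np.rint(x[0] + rho * np.sin(theta)).astype(int)
        cols = np.rint(x[1] + rho * np.cos(theta)).astype(int)
        inside = (rows >= 0) & (rows < h) & (cols >= 0) & (cols < w)
        # Assumption: x belongs to its own spray, so the maximum is >= I(x).
        spray_max = max(I[x], I[rows[inside], cols[inside]].max(initial=0.0))
        ratios.append(I[x] / spray_max)      # contribution of this spray
    return float(np.mean(ratios))            # average over the N sprays
```

Because each call draws new sprays, two runs on the same pixel generally differ unless a seeded generator is passed through `rng`; this run-to-run variability is the chromatic noise that the averaging over N sprays is meant to reduce.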

QBRIX proposes an approximated, probabilistic version of RSR. It is based on the observation that the colors rarely occurring in the image do not influence the color sensation, thus they can be ignored by the color filtering process. There are two implementations of QBRIX. The first one is a global filter (G-QBRIX): for each channel, it re-scales the chromatic intensity of each pixel by a local white reference \(I_Q\), corresponding to a quantile \(Q_G\) of the probability density function (pdf) of the intensities of that channel. The local white reference is thus the same for each pixel, determined by the value \(Q_G\) input by the user. The second implementation is a local filter (L-QBRIX): in this case, the local white reference \(I_L\) depends on the pixel, and it corresponds to a quantile \(Q_L\) of the pdf of the channel intensities, weighted by a function accounting for their spatial arrangement in the image. The value \(Q_L\) is fixed by the user.
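The global variant lends itself to a very short sketch. The function name `g_qbrix_channel` is hypothetical, the quantile is computed with NumPy's default linear interpolation, and clipping the output at 1 for the pixels brighter than \(I_Q\) is an assumption rather than a detail taken from [4].

```python
import numpy as np

def g_qbrix_channel(I, q=0.99):
    """Rescale one channel by the q-quantile of its intensities."""
    I = np.asarray(I, dtype=np.float64)
    I_Q = np.quantile(I, q)            # same white reference for every pixel
    # Assumption: pixels brighter than I_Q are clipped to 1.
    return np.minimum(I / I_Q, 1.0)
```

With `q=1.0` the white reference is the channel maximum and the sketch coincides with scale-by-max.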

RSR-P re-writes the random sampling procedure of RSR in a deterministic, noise free, population based approach. In RSR, the local white reference is the average of the maximum intensities selected from random sprays. In RSR-P, the same local white reference is determined by re-working suitable quantities from the pdf of the chromatic intensities, without performing any random sampling. These quantities are basically related to the probability that a given pixel has the maximum intensity over a set of n samples where, as in RSR, n is a user input. Differently from the RSR approximation provided by QBRIX, RSR-P is an exact mapping of RSR into a population based approach. In particular, when \(N \rightarrow +\infty \), RSR and RSR-P yield the same results.

T-Rex enhances the brightness and the contrast of the input pictures, producing higher values of \(f_0\) and \(f_1\), while it equalizes the brightness histogram, so that the value of \(f_2\) decreases. The algorithms QBRIX and RSR-P exhibit a similar behaviour. In these experiments, the quantiles \(Q_G\) and \(Q_L\) are set to 0.99, while in RSR-P n is set to 250.

Fig. 3. Some input images (in column (a)) and their color filtered versions obtained by T-Rex (in column (b)), RSR-P (in column (c)), L-QBRIX (in column (d)), and G-QBRIX (in column (e)).

Figure 3 shows some visual examples of color filtering produced by T-Rex, RSR-P, and QBRIX (local and global).

On average, the T-Rex outputs are close to those obtained by RSR-P and G-QBRIX. L-QBRIX obtains the highest values of brightness and contrast and the lowest value of flatness: this is because L-QBRIX weights the contribution of the distance versus the intensity much more than the other algorithms. Precisely, in the pdf computation performed by L-QBRIX, for any pixel x, the intensity value I(y) of any image pixel \(y \ne x\) is weighted by the quantity

$$\begin{aligned} \Big [\frac{\parallel x - y \parallel }{D} \Big ]^{-\alpha } \end{aligned}$$

where D is the length of the image support diagonal, while \(\alpha \) determines the metric adopted for modeling the spatial interaction among colors, and here \(\alpha = 2\).
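A small numerical comparison helps to see how differently the two spatial weights decay with the normalized distance. The sketch below evaluates the L-QBRIX weight with \(\alpha = 2\) and the T-Rex term of Eq. (1) with \(\lambda = 1\) at a few distances; the function names and the sample distances are chosen only for illustration.

```python
import numpy as np

def trex_weight(d, lam=1.0):
    """Spatial term of Eq. (1): exp(-lambda * d^2), with d in [0, 1]."""
    return np.exp(-lam * np.asarray(d, dtype=np.float64) ** 2)

def lqbrix_weight(d, alpha=2.0):
    """L-QBRIX spatial weight: d^(-alpha), defined for d > 0."""
    return np.asarray(d, dtype=np.float64) ** (-alpha)

d = np.array([0.1, 0.5, 1.0])              # normalized distances from x
near_far_trex = trex_weight(d)[0] / trex_weight(d)[-1]
near_far_lqbrix = lqbrix_weight(d)[0] / lqbrix_weight(d)[-1]
print(near_far_trex, near_far_lqbrix)
```

Between d = 0.1 and d = 1 the T-Rex weight changes by a factor of about 2.7, whereas the L-QBRIX weight changes by a factor of 100, so nearby pixels dominate the L-QBRIX reference far more strongly.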

Fig. 4. In clockwise order: an input image and its T-Rex filtered versions with \(\lambda \) = 0.50, 0.75, 1.00, 1.25, 1.50, 2.00, 2.25, 2.50.

Fig. 5. Two input images and their T-Rex filtered versions for \(\lambda = 0.50, 1.00, 2.00\). For the input image in the first row, varying the \(\lambda \) parameter yields very different outputs. This does not happen for the input image in the second row.

The spatial weight introduced in L-QBRIX is similar to that expressed by the term \(\exp [-\lambda d(x, y)^2]\) in Eq. (1) of T-Rex. We observe that a high value of \(\lambda \) (and a low value of \(\alpha \) in L-QBRIX) may produce a loss of the image local details, and even introduce artifacts: in particular, as already mentioned in Sect. 2, for \(\lambda \rightarrow +\infty \), the local reference is the pixel intensity itself, so that the final lightness is a white image. This is illustrated in Fig. 4, showing a gray level image and its T-Rex filtered versions for increasing values of \(\lambda \): for \(\lambda > 1.00\), an over-enhanced region is visible in the upper left corner of the image. This effect is emphasized in the color version of this image (see first row in Fig. 5). In general, the value of \(\lambda \) giving a “satisfactory” output in terms of image enhancement depends on the input image. For instance, for the image shown in the second row of Fig. 5, all the values \(\lambda = 0.5, 1.0, 2.0\) produce good results. At first sight, the dependency of the parameter tuning on the image content can appear as an unwanted characteristic, but it is actually a positive one. First, it mirrors a characteristic of the human visual system (HVS), which has no fixed response and thus cannot be modeled as a static filter. Second, due to the image variability and complexity, a fixed thresholding usually gives good results only for a subset of the input images. This characteristic is in fact common to the whole family of spatial color algorithms [20].

4 Conclusions

In this paper, we have presented T-Rex, a novel, deterministic Milano Retinex implementation. It is based on the definition of a sampling figure through a self-adaptive intensity thresholding strategy. The experiments, conducted on a set of real-world color pictures captured with unbalanced exposure, show that, in agreement with the principles of the Retinex theory, T-Rex works as an image enhancer: it equalizes the dynamic range of the input image and improves the visibility of its details. The experiments also show that the final output depends on the value of \(\lambda \). For instance, a very large value of \(\lambda \) may produce a loss of the image details. In the current implementation of T-Rex, the value of \(\lambda \) is input by the user. Our future work will address a more detailed analysis of the dependence of the T-Rex output on \(\lambda \), and the development of a technique for an unsupervised estimation of a variability range of \(\lambda \) suitable for image enhancement.