1 Introduction

Different image stabilization techniques, both block-based and feature-based, have been proposed in the literature. Most of them are computationally expensive and thus unsuitable for real-time applications. We work under the assumption of limited (i.e., a few pixels) misalignments between adjacent frames. Moreover, we consider only vertical and horizontal shifts, because rotation is less disturbing for optical models with a wide field of view and rotational compensation is computationally expensive.

The proposed technique is primarily inspired by the use of characteristics curves and aims to overcome some limitations that often occur in practical situations, especially with low-cost cameras. This is achieved through an accurate analysis of the characteristics curves, filtering out the deceptive information.

This paper is structured as follows: Sect. 2 reviews the prior art; Sect. 3 describes the proposed method; Sect. 4 presents the experimental results; Sect. 5 draws some conclusions and outlines future work.

2 Prior Art

Prior art digital video stabilization techniques can be grouped into two main categories: block-based and feature-based. Block-based techniques split the image into blocks; the blocks of the current frame are then compared with the blocks of the previous frame to calculate the related motion. The matching between blocks of the current and the previous frame is performed on the basis of a block matching criterion: the smaller its value, the better the match. To retrieve the motion vector for each block, different matching metrics have been proposed, such as Partial Distortion Elimination (PDE) [1], Mean Absolute Difference (MAD) [2] and the Universal Image Quality Index (UIQI) [3]. A compounding algorithm is then applied to detect the global motion vector. These techniques are usually robust but very slow, because the whole image must be processed block by block.

Feature-based techniques, on the contrary, retrieve the global motion vector directly through the analysis of particular features. Recent papers have mainly adopted Speeded Up Robust Features (SURF) [4], the Scale Invariant Feature Transform (SIFT) [5] and Kanade-Lucas-Tomasi (KLT) techniques [6]. Even though there is no need to process the whole image, as in block-based algorithms, the disadvantage of feature-based methods is that their accuracy strictly depends on the feature point extraction step [7]. Moreover, even if these algorithms are robust enough, they are too expensive for low-cost cameras. Their complexity is due not only to the expensive feature computation, but especially to the related feature matching.

Feature-based techniques usually rely on optical flow analysis [8,9,10]. The optical flow is a useful representation of the scene, consisting of the set of motion vectors calculated between the previous and the current frame. The main problem is to distinguish foreground motion vectors, caused by moving objects, which should not be considered for video stabilization, from background motion vectors, which must be used to determine the correct stabilization. To solve this problem, Random Sample Consensus (RANSAC) [11] is usually employed. Since RANSAC is a non-deterministic iterative method, the worst case may require too many iterations to converge, excessively slowing down the whole processing. In recent years some optimizations of the RANSAC algorithm have been proposed [12], but none significant enough to drastically improve performance.

Given the complexity of the aforementioned methods, we have chosen to start from a simpler technique, based on motion estimation through integral projection curves [13, 14]. The technique works as follows: for simplicity, consider two gray-scale temporally adjacent frames, where M and N are the horizontal and vertical dimensions and $p_{ij}$ is the pixel value at position (i, j). The characteristics curves along the horizontal and vertical dimensions are respectively defined as:

$$ C_{h}(j) = \frac{1}{N}\sum_{i = 1}^{N} p_{ij} $$
(1)
$$ C_{v}(i) = \frac{1}{M}\sum_{j = 1}^{M} p_{ij} $$
(2)
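As an illustration, the curves of Eqs. (1) and (2) reduce to column and row means. A minimal NumPy sketch follows; the function and variable names are ours, not from the paper:

```python
import numpy as np

def characteristics_curves(frame):
    """Characteristics curves of a gray-scale frame (Eqs. 1-2):
    C_h averages each column over the N rows, C_v averages each
    row over the M columns."""
    frame = frame.astype(np.float64)
    c_h = frame.mean(axis=0)  # length M: one value per column j
    c_v = frame.mean(axis=1)  # length N: one value per row i
    return c_h, c_v
```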

The meaning of the curves can be easily understood by referring to the drawing of Fig. 1, where two schematic characteristics curves are reported. Figure 2 shows the $C_h$ curves for two successive frames $F_1$ and $F_2$. A shift of the curves along the x-axis represents an equivalent shift of the frames in the horizontal dimension (and similarly for the y-axis and vertical shifts). From Fig. 2 the horizontal shift is particularly evident. Hence, in order to properly evaluate the motion occurring between consecutive frames, the shifts along the axes ($\mathit{off}_h$, $\mathit{off}_v$) of the $C_h$ and $C_v$ curves can be calculated as follows:

Fig. 1. A schematic representation of characteristics curves.

Fig. 2. Characteristics curves related to two adjacent frames.

$$ \begin{aligned} P_{h}(s) &= \frac{1}{M - |s|}\sum_{j = \max(1,\,1 - s)}^{\min(M,\,M - s)} \left| C_{h}^{F_{1}}(j) - C_{h}^{F_{2}}(j + s) \right| \\ \mathit{off}_{h} &= \left\{ s' : P_{h}(s') = \min_{s} P_{h}(s) \right\} \end{aligned} $$
(3)
$$ \begin{aligned} P_{v}(s) &= \frac{1}{N - |s|}\sum_{i = \max(1,\,1 - s)}^{\min(N,\,N - s)} \left| C_{v}^{F_{1}}(i) - C_{v}^{F_{2}}(i + s) \right| \\ \mathit{off}_{v} &= \left\{ s' : P_{v}(s') = \min_{s} P_{v}(s) \right\} \end{aligned} $$
(4)

The parameter s ranges over the search window, whose size represents the maximum retrievable displacement.
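For illustration, the exhaustive search of Eqs. (3) and (4) can be sketched as follows; this is a direct, unoptimized implementation under our own naming, not the authors' code:

```python
def curve_offset(c_prev, c_curr, search=16):
    """Shift between two characteristics curves: minimize the mean
    absolute difference P(s) of Eqs. (3)-(4) over |s| <= search."""
    m = len(c_prev)
    best_s, best_p = 0, np.inf
    for s in range(-search, search + 1):
        lo, hi = max(0, -s), m - max(0, s)  # overlap region of the two curves
        p = np.mean(np.abs(c_prev[lo:hi] - c_curr[lo + s:hi + s]))
        if p < best_p:
            best_p, best_s = p, s
    return best_s
```

Applying `curve_offset` to the $C_h$ curves of two adjacent frames gives $\mathit{off}_h$; applying it to the $C_v$ curves gives $\mathit{off}_v$.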

The described method is low-cost and robust to noise, but it suffers in the case of illumination changes and motion blur. The effect of illumination changes on the calculated characteristics curves is shown in the following example. Figure 3 shows the ideal matching (−29), i.e., one curve should be shifted by 29 pixels to match the other. Note that the shape of the two curves is similar, but the values are slightly different due to the scene illumination changes. Because of these differences, the integral projection method produces an incorrect matching, as indicated in Fig. 4. The effect is more evident along edges (a strong edge is visible around sample 1600).

Fig. 3. Ideal matching of two successive characteristics curves in the case of illumination changes (perfect matching = −29).

Fig. 4. Matching results with integral projection in the case of scene illumination changes (wrong matching = −25).

3 Proposed Filtering

The effect of an illumination change on the curve is basically a vertical offset, as indicated in Fig. 3. Removing this offset eliminates the matching problem. This can be done by removing the DC component from the integral projection signal, i.e., by pre-filtering the characteristics curves with a High Pass Filter (HPF), thus enhancing the curve peaks:

$$ C_{x}^{'} = C_{x} - (C_{x} \otimes LPF) $$
(5)
$$ C_{y}^{'} = C_{y} - (C_{y} \otimes LPF) $$
(6)

A very simple LPF with 16 unitary taps (i.e. [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]/16) allows obtaining a simple HPF, with good results and low extra cost, by subtracting the filtered data from the original. The filter response is shown in Fig. 5. In the proposed example, this simple filter yields the perfect matching (−29), as shown in Fig. 6.
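In code, the HPF of Eqs. (5)-(6) amounts to subtracting a 16-tap moving average from the curve. A minimal sketch follows; the boundary handling via `mode='same'` is our choice, as the paper does not specify it:

```python
def high_pass(curve, taps=16):
    """Eqs. (5)-(6): subtract a moving-average LPF from the curve,
    removing the DC (illumination) component."""
    kernel = np.ones(taps) / taps
    return curve - np.convolve(curve, kernel, mode='same')
```

The filtered curves are then matched with `curve_offset` exactly as before.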

Fig. 5. The magnitude response of the HPF: 1 − [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]/16.

Fig. 6. Perfect matching results obtained applying a simple HPF to the characteristics curves in the case of scene illumination changes.

Although this filter works better than prior art approaches in the case of scene illumination changes, the same does not hold when motion blur is present. To better understand this problem, consider the following example. In Fig. 7 the unfiltered characteristics curves are plotted; the matching is not perfect (+11 instead of +13 pixels). Using the HPF shown above, the matching is even worse (+7), as shown in Fig. 8. The mismatch is mainly due to the highest frequencies, which act as noise. By removing the highest frequencies, i.e., by applying a Low Pass Filter, the problem can be reduced.

Fig. 7. Matching results with prior art in the case of motion blur (good matching = +11, instead of perfect matching = +13).

Fig. 8. Matching results obtained applying a simple HPF to the characteristics curves in the case of motion blur (bad matching = +7).

To handle both conditions, illumination changes and motion blur, a Band Pass Filter (BPF) is the natural solution, so a Butterworth IIR BPF has been chosen. The general form of the IIR filter transfer function is:

$$ H(z) = \frac{\sum_{l = 0}^{L - 1} b_{l} z^{-l}}{1 + \sum_{m = 1}^{M} a_{m} z^{-m}} $$
(7)

The design problem consists of determining the coefficients $b_l$ and $a_m$ so that H(z) satisfies the given specifications. A second-order IIR filter has been chosen, as a good tradeoff between implementation cost and results. The normalized cutoff frequencies of the filter were fixed to w1 = 0.01 and w2 = 0.20, chosen after several simulations to obtain the best matching results. The magnitude response of this filter is shown in Fig. 9. With this filter, we obtain a good matching in both illumination change and motion blur conditions, as indicated in Fig. 10 (related to the example of Fig. 4) and Fig. 11 (related to the example of Fig. 7).
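Assuming the cutoffs are normalized to the Nyquist frequency, an equivalent filter can be designed with SciPy. Note that `scipy.signal.butter` doubles the requested order for band-pass designs, so `N=1` yields the second-order filter described here; this is a sketch under our assumptions, not the authors' implementation:

```python
from scipy.signal import butter, lfilter

# N=1 band-pass -> overall order 2; cutoffs assumed Nyquist-normalized.
b, a = butter(1, [0.01, 0.20], btype='bandpass')

def band_pass(curve):
    """Pre-filter a characteristics curve with the Butterworth IIR BPF."""
    return lfilter(b, a, curve)
```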

Fig. 9. The magnitude response of the 2nd-order Butterworth IIR BPF, Direct Form II, w1 = 0.01, w2 = 0.2.

Fig. 10. Scene illumination changes: matching results obtained applying an IIR BPF to the characteristics curves (perfect matching = −29).

Fig. 11. Motion blur: matching results obtained applying an IIR BPF to the characteristics curves (obtained matching = +11, perfect matching = +13).

This IIR BPF is a good solution, even though it requires several floating-point multiplications. It should also not be overlooked that IIR filters are often prone to coefficient quantization errors. The zeroes-poles plot of the IIR filter, shown in Fig. 12, reveals that one pole is quite close to the unit circle, so this filter is susceptible to finite precision effects; any fixed-point optimization of this filter must therefore be carefully designed and tested.
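The pole radii can be inspected directly; values close to 1 confirm the sensitivity to coefficient quantization (again a sketch under the same design assumptions as above):

```python
import numpy as np
from scipy.signal import butter, tf2zpk

b, a = butter(1, [0.01, 0.20], btype='bandpass')
zeros, poles, _ = tf2zpk(b, a)
print(np.abs(poles))  # radii near 1 -> strong finite-precision sensitivity
```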

Fig. 12. Zeroes-poles plot of the IIR Butterworth BPF, order 2, cutoff frequencies 0.01–0.2.

4 Experimental Results

About one hundred video sequences have been used to assess the quality of the results, both objectively and subjectively. Tables 1, 2 and 3 show the results obtained with the prior art method (Classical IP), the proposed FIR (HPF) method and the proposed IIR (BPF) method, respectively. In these tables, for each video sequence, the minimum (MinX and MinY), the maximum (MaxX and MaxY) and the standard deviation (StdX and StdY) of the error between the real motion vectors and the estimated ones (in pixels) were computed for each axis (X and Y). We note a reduction in the standard deviation of the error of about 11% for the FIR and about 16% for the IIR. In particular, the largest improvement of the IIR over the prior art method and the FIR is obtained in the weak cases of those methods, namely in the presence of overall motion blur and of artifacts due to the rolling shutter (videos 3 and 6, respectively).

Table 1. Previous art results.
Table 2. Proposed FIR HPF results.
Table 3. Proposed IIR BPF results.
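For reference, one plausible way to compute the per-axis statistics reported in the tables is sketched below; the exact definitions are not given in the paper, so this reflects our reading:

```python
import numpy as np

def error_stats(real, estimated):
    """Min/Max/Std of the per-frame estimation error (in pixels).
    `real` and `estimated` are (T, 2) arrays of (x, y) motion vectors."""
    err = np.asarray(estimated, float) - np.asarray(real, float)
    return {'MinX': err[:, 0].min(), 'MaxX': err[:, 0].max(),
            'StdX': err[:, 0].std(),
            'MinY': err[:, 1].min(), 'MaxY': err[:, 1].max(),
            'StdY': err[:, 1].std()}
```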

5 Conclusion and Future Work

A very low-cost algorithm for video stabilization has been developed, suitable for real-time processing. It achieves significant improvements, both subjectively and objectively, over state-of-the-art algorithms of similar complexity, reaching an improvement of about 16% in the standard deviation of the error. Moreover, besides being robust to noise, like the prior art methods, it is also robust to illumination changes, motion blur and rolling shutter distortions. Further investigation will involve the search for a better BPF approximating the chosen IIR, in order to work fully in finite precision, and the extension of the proposed technique to roto-translation.