1 Introduction

Frame interpolation, also called Frame Rate Up-Conversion (FRUC), generates new frames from existing ones in order to increase the frame rate. For example, FRUC can convert a video from 30 frames per second to 60 frames per second or more by interpolating new frames. Frame interpolation can improve the visual quality of video on a wide range of electronic equipment, such as televisions, game consoles and computers.

The conventional framework of frame interpolation consists of block-matching motion estimation followed by Motion-Compensated Interpolation. Block-matching motion estimation is our main concern, and many algorithms build on it. The method in [8] focuses on reducing computational complexity. The methods in [1, 9, 15] make use of multi-frame and multi-level information. The methods in [4, 6, 7, 12, 14] concentrate on obtaining true motion vectors through motion estimation and motion refinement.

Motivated by the efficiency and performance of block-matching motion estimation and motion refinement, we propose a new motion estimation method for frame interpolation that combines coarse-to-fine searching with an Angular-Distance Median Filter (ADMF), as illustrated in Fig. 1. The reasons for designing the framework from coarse to fine, and the motion refinement algorithm ADMF, are explained below.

In [4, 6, 12, 14], some researchers combined Unidirectional Motion Estimation (UME) with Bidirectional Motion Estimation (BME), while others combined Forward Motion Estimation with Backward Motion Estimation. These combinations aim to obtain more accurate motion vectors. However, simply combining several estimators is inefficient and involves much redundant calculation.

To reduce this redundant computation, Bidirectional Motion Estimation is applied in both coarse and fine searching, because BME requires much less computation than the combined motion estimation methods mentioned above. In addition, the relatively accurate motion vectors generated by BME can be used by the proposed ADMF algorithm to refine wrong motion vectors. Based on BME, the coarse-to-fine framework not only reduces the amount of calculation but also improves the final performance.

After the initial motion estimation, a variety of motion refinement methods can be applied to refine the motion vectors. These include Spatio-Temporal Motion Vector Smoothing [12], Two-Dimensional Weighted Motion Vector Smoothing (2DW-MVS) [14] and Trilateral Filtering Motion Smoothing [4]. These methods cannot correct all the wrong motion vectors generated by the initial motion estimation, which results in blurring problems. Blurring problems arise from overlap, where different estimated blocks move towards similar positions.

To correct wrong motion vectors effectively, a new motion refinement algorithm, the Angular-Distance Median Filter, is proposed and applied after the initial motion estimation. ADMF is based on the angle and the distance (magnitude) of motion vectors. A wrong motion vector is refined according to its neighbouring motion vectors, so the algorithm can correct most wrong motion vectors. More details of ADMF are given in Sect. 2.

The contributions of this paper are as follows. First, a new two-stage motion estimation method, proceeding from coarse searching to fine searching, is proposed for frame interpolation. Second, a novel motion refinement algorithm called the Angular-Distance Median Filter is put forward to correct wrong motion vectors effectively. Third, experimental results demonstrate that the proposed approach outperforms the compared techniques for frame interpolation in both subjective and objective evaluation.

The rest of this paper is organized as follows. Section 2 elaborates the proposed algorithm, including coarse-to-fine searching and the Angular-Distance Median Filter. Section 3 presents and analyses the experimental results. Section 4 concludes the paper.

2 Methodology

As illustrated in Fig. 1, the framework of frame interpolation is composed of motion estimation and Motion-Compensated Interpolation. This article proposes a new method of motion estimation based on coarse-to-fine searching and Angular-Distance Median Filter (ADMF).

Fig. 1. The proposed motion estimation for frame interpolation

First, the initial motion vectors are estimated from the previous frame and the current frame of the test video sequence by Bidirectional Motion Estimation (BME) over a wide search range. Second, ADMF is applied to update the motion vectors until the terminal conditions are met. Third, starting from the motion vectors updated by ADMF, BME is employed again to refine the motion vectors over a small search range. Fourth, Motion-Compensated Interpolation uses the final motion vectors to generate the interpolated frames.

2.1 Coarse Searching

In coarse searching, BME [3] is applied to estimate the initial motion vectors. BME is chosen for three reasons. First, its computational complexity is much lower than that of the combination of Forward Motion Estimation and Backward Motion Estimation [12]. Second, the hole problems that result from Unidirectional Motion Estimation (UME) do not arise in BME. Holes appear at positions in the interpolated frame to which no estimated block moves. Third, the initial motion vectors calculated by BME are sufficient for the subsequent motion refinement, which uses true motion vectors to correct wrong ones.

Fig. 2. Bidirectional motion estimation

The schematic diagram of BME is illustrated in Fig. 2. In the left half of Fig. 2, \( F(n-1) \), \( F(n-\dfrac{1}{2}) \) and \( F(n) \) denote the previous frame, the interpolated frame and the current frame, respectively. The motion of blocks is assumed to be linear. Motion vectors \( \varvec{v} \) are estimated by comparing the similarity of blocks. The similarity criterion is the sum of absolute differences (SAD) between the pixel values in the previous frame \( F(n-1) \) and those in the current frame \( F(n) \).

As shown in the right half of Fig. 2, \( B_{ij} \) represents the block in the \( i \)-th row and the \( j \)-th column of the interpolated frame. It is defined as:

$$\begin{aligned} \begin{aligned} B_{ij}=\lbrace (x,y)|1+(j-1)\times BS\le x \le j\times BS,\\ 1+(i-1)\times BS\le y \le i\times BS\rbrace \end{aligned} \end{aligned}$$
(1)

where \( (x, y) \) denotes the position of a pixel in the interpolated frame and BS is the block size of \( B_{ij} \). To enhance the accuracy of motion estimation, the block is enlarged before matching. \( EB_{ij} \) represents the expanded block of \( B_{ij} \), enlarged by the expanded size ES on every side:

$$\begin{aligned} \begin{aligned} EB_{ij} = \lbrace (x,y)|1+(j-1)\times BS-ES\le x \le j\times BS+ES,\ \\ 1+(i-1)\times BS-ES\le y \le i\times BS+ES\rbrace . \end{aligned} \end{aligned}$$
(2)
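As a minimal sketch of Eqs. (1) and (2), the pixel bounds of a block and of its expanded block can be computed as below. The function name `block_bounds` is hypothetical, coordinates are 1-based and inclusive as in the equations, and no clipping to the frame border is performed because the text does not state how borders are handled.

```python
def block_bounds(i, j, bs, es=0):
    """Inclusive 1-based pixel bounds (x_min, x_max, y_min, y_max) of block (i, j).

    es=0 gives B_ij from Eq. (1); es>0 gives the expanded block EB_ij from Eq. (2).
    The result is not clipped to the frame (border handling is left open).
    """
    x_min = 1 + (j - 1) * bs - es
    x_max = j * bs + es
    y_min = 1 + (i - 1) * bs - es
    y_max = i * bs + es
    return x_min, x_max, y_min, y_max


# Block in row 2, column 3 with BS = 16, then its expansion with ES = 8.
print(block_bounds(2, 3, 16))      # (33, 48, 17, 32)
print(block_bounds(2, 3, 16, 8))   # (25, 56, 9, 40)
```

With BS = 16 and ES = 8 the expanded block spans BS + 2 ES = 32 pixels in each direction.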

With the block and the expanded block defined, SAD is used to calculate the motion vectors \( \lbrace \varvec{v}_{ij}\rbrace \). A motion vector \( \varvec{v}_{ij}=(v_x,v_y)\) denotes the displacement of the block \( EB_{ij} \) in the interpolated frame relative to the previous frame and the current frame. To distinguish the SAD values of the different stages, the SAD value in coarse searching is called SADC. SADC and the motion vectors \( \lbrace \varvec{v}_{ij}\rbrace \) are given by:

$$\begin{aligned} \begin{aligned} SADC(v_x,v_y)= \sum \limits _{(x,y)\in EB_{ij}} \vert F_{n-1}(x-v_x,y-v_y)-F_{n}(x+v_x,y+v_y)\vert \end{aligned} \end{aligned}$$
(3)

and

$$\begin{aligned} \varvec{v}_{ij}=(v_x,v_y)=\mathop {\arg \min }_{(v_x,v_y)\in CSR}\lbrace SADC(v_x,v_y)\rbrace \end{aligned}$$
(4)

where

$$\begin{aligned} CSR=\lbrace (v_x,v_y)|-CWS\le v_x,v_y\le CWS\rbrace . \end{aligned}$$
(5)

In Eq. (5), CSR represents the search range in coarse searching and CWS is the search window size in coarse searching.
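The coarse search of Eqs. (3) to (5) can be sketched as an exhaustive arg-min over the search window. This is an illustrative sketch, not the authors' implementation: the names `sadc` and `coarse_search` are hypothetical, frames are plain lists of rows indexed as `frame[y][x]` with 0-based coordinates, and two details that the text leaves open are filled in by assumption (coordinates outside the frame are clamped to the border, and ties go to the first candidate in scan order).

```python
import random


def sadc(prev, cur, bounds, vx, vy):
    """Eq. (3): SAD between prev at (x - vx, y - vy) and cur at (x + vx, y + vy).

    bounds = (x0, x1, y0, y1), 0-based and inclusive, e.g. an expanded block.
    """
    h, w = len(prev), len(prev[0])
    x0, x1, y0, y1 = bounds
    total = 0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            # Assumption: coordinates outside the frame are clamped to the border.
            py, px = min(max(y - vy, 0), h - 1), min(max(x - vx, 0), w - 1)
            cy, cx = min(max(y + vy, 0), h - 1), min(max(x + vx, 0), w - 1)
            total += abs(prev[py][px] - cur[cy][cx])
    return total


def coarse_search(prev, cur, bounds, cws, step=1):
    """Eqs. (4)-(5): exhaustive arg-min of SADC over -cws <= vx, vy <= cws."""
    offsets = range(-cws, cws + 1, step)
    candidates = [(vx, vy) for vy in offsets for vx in offsets]
    # min() keeps the first candidate on ties (an arbitrary choice).
    return min(candidates, key=lambda v: sadc(prev, cur, bounds, v[0], v[1]))


# Synthetic check: a random texture that shifts 4 pixels to the right between
# the two frames, so the bidirectional vector (half the shift) should be (2, 0).
rng = random.Random(0)
base = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
prev = [[base[y][(x + 2) % 16] for x in range(16)] for y in range(16)]
cur = [[base[y][(x - 2) % 16] for x in range(16)] for y in range(16)]
best = coarse_search(prev, cur, bounds=(4, 8, 3, 7), cws=2)
print(best)
```

Because both frames are sampled symmetrically around the block position in the interpolated frame, the recovered vector is half of the frame-to-frame displacement.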

2.2 Angular-Distance Median Filter

Once the initial motion vectors \( \lbrace \varvec{v}_{ij}\rbrace \) have been generated by BME, the Angular-Distance Median Filter is applied to refine them, as illustrated in Fig. 3. Red arrows denote wrong motion vectors, while black arrows denote true motion vectors. As the blue circle in Fig. 3 shows, motion vectors of adjacent blocks may point to similar positions, which results in blurring problems in the final interpolated frame. We observe that most frames of the test video sequences have a dominant motion direction, which means that wrong motion vectors can be improved or corrected using neighbouring motion vectors. The mathematical formulation of ADMF is explained below.

Fig. 3. Motion vectors refined by ADMF (Color figure online)

ADMF uses the angle and the distance (magnitude) of motion vectors. The angle A and the distance D are defined as:

$$\begin{aligned} A(\varvec{v},\varvec{v_0})=\arccos \left( \dfrac{\varvec{v}\cdot \varvec{v}_0^T}{\Vert \varvec{v}\Vert \Vert \varvec{v_0}\Vert }\right) \end{aligned}$$
(6)

and

$$\begin{aligned} D(\varvec{v})=\Vert \varvec{v}\Vert \end{aligned}$$
(7)

where \( \varvec{v} \) denotes an initial motion vector generated by BME and \( \varvec{v}_0=(1,0) \) is chosen as the reference direction.
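Eqs. (6) and (7) translate directly into code. In the sketch below the function names `angle` and `distance` are hypothetical, and the value returned for a zero-length vector (for which Eq. (6) is undefined) is an assumption.

```python
import math


def angle(v, v0=(1.0, 0.0)):
    """Eq. (6): angle between v and the reference direction v0, in [0, pi]."""
    norm = math.hypot(v[0], v[1]) * math.hypot(v0[0], v0[1])
    if norm == 0.0:
        return 0.0  # Assumption: Eq. (6) is undefined for a zero vector.
    cos_a = (v[0] * v0[0] + v[1] * v0[1]) / norm
    # Guard against rounding slightly outside [-1, 1] before calling acos.
    return math.acos(max(-1.0, min(1.0, cos_a)))


def distance(v):
    """Eq. (7): Euclidean norm of the motion vector."""
    return math.hypot(v[0], v[1])


print(distance((3, 4)))   # 5.0
print(angle((0, 2)))      # pi / 2
print(angle((-3, 0)))     # pi
```

Since the arccosine ranges over \( [0,\pi ] \), Eq. (6) as written assigns the same angle to \( (0,2) \) and \( (0,-2) \).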

On the basis of \( A(\varvec{v},\varvec{v_0}) \) and \( D(\varvec{v}) \), Absolute Angular Difference (AAD) and Absolute Distance Difference (ADD) are calculated to judge the validity of motion vectors \( \lbrace \varvec{v}_{ij}\rbrace \). \( AAD(\varvec{v}_{ij}) \) and \( ADD(\varvec{v}_{ij}) \) are defined as:

$$\begin{aligned} AAD(\varvec{v}_{ij})=\vert A(\varvec{v}_{ij},\varvec{v}_0)-\dfrac{1}{N}\sum _{k=0}^{N-1} A(\varvec{v}_k,\varvec{v}_0)\vert \end{aligned}$$
(8)

and

$$\begin{aligned} ADD(\varvec{v}_{ij})=\vert D(\varvec{v}_{ij})-\dfrac{1}{N}\sum _{k=0}^{N-1} D(\varvec{v}_k)\vert \end{aligned}$$
(9)

where \( N=8 \) and \( \lbrace \varvec{v}_k \rbrace \) denotes the 8 neighbouring motion vectors of the centre motion vector \( \varvec{v}_{ij} \).
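As a small worked example of Eqs. (8) and (9), consider a centre vector that points straight down the \( y \) axis while all 8 neighbours point along the reference direction with the same length. The helper names below are hypothetical, and the zero-vector convention is an assumption.

```python
import math


def angle(v):
    """Eq. (6) with v0 = (1, 0); a zero vector maps to 0 (assumption)."""
    norm = math.hypot(v[0], v[1])
    return math.acos(max(-1.0, min(1.0, v[0] / norm))) if norm else 0.0


def aad_add(center, neighbours):
    """Eqs. (8)-(9): deviation of the centre vector from the neighbour means."""
    n = len(neighbours)
    mean_angle = sum(angle(u) for u in neighbours) / n
    mean_dist = sum(math.hypot(u[0], u[1]) for u in neighbours) / n
    aad = abs(angle(center) - mean_angle)
    add = abs(math.hypot(center[0], center[1]) - mean_dist)
    return aad, add


neighbours = [(4, 0)] * 8
aad, add = aad_add((0, 4), neighbours)
print(aad, add)   # pi / 2 and 0.0
```

Here the distance alone would not reveal the outlier (ADD is zero), whereas the angular difference of \( \pi /2 \) does, which illustrates why both quantities are used.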

After \( AAD(\varvec{v}_{ij}) \) and \( ADD(\varvec{v}_{ij}) \) are calculated, thresholds are set to judge the validity \( V_{ij} \) of the motion vector \( \varvec{v}_{ij} \):

$$\begin{aligned} V_{ij}=\left\{ \begin{aligned} 1,&\quad AAD(\varvec{v}_{ij})\le \dfrac{\pi }{6},\ ADD(\varvec{v}_{ij})\le \dfrac{BS}{16}\\ 0,&\quad AAD(\varvec{v}_{ij})\ge \dfrac{\pi }{4},\ ADD(\varvec{v}_{ij})\ge \dfrac{BS}{8}. \end{aligned} \right. \end{aligned}$$
(10)

If \( V_{ij}=0 \), the motion vector \( \varvec{v}_{ij} \) is updated by a median filter over its neighbouring motion vectors:

$$\begin{aligned} \varvec{v}_{ij}=median\lbrace \varvec{v}_1,\varvec{v}_2,\ldots ,\varvec{v}_k\rbrace ,\quad if\ V_{ij}=0. \end{aligned}$$
(11)
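The validity test of Eq. (10) and the update of Eq. (11) can be sketched as follows. Two points are assumptions rather than statements of the paper: Eq. (10) does not cover every combination of AAD and ADD, so the hypothetical function `validity` returns `None` for the uncovered cases, and the median of a set of vectors is read here as a component-wise median.

```python
import math
from statistics import median


def validity(aad, add, bs):
    """Eq. (10): 1 = valid, 0 = invalid, None = case not covered by Eq. (10)."""
    if aad <= math.pi / 6 and add <= bs / 16:
        return 1
    if aad >= math.pi / 4 and add >= bs / 8:
        return 0
    return None  # Eq. (10) is silent here; an implementation must pick a rule.


def vector_median(neighbours):
    """Eq. (11), read as a component-wise median (an assumption)."""
    return (median(u[0] for u in neighbours), median(u[1] for u in neighbours))


print(validity(0.1, 0.5, 16))   # 1
print(validity(1.0, 3.0, 16))   # 0
print(validity(1.0, 0.5, 16))   # None
print(vector_median([(4, 0), (4, 1), (5, 0), (-9, 7), (4, 0)]))   # (4, 0)
```

With an even number of neighbours, `statistics.median` averages the two middle values, so the result may be non-integer. A vector median that always returns one of the input vectors would be an alternative reading of Eq. (11).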

There are two terminal conditions: the number of filtering passes and the percentage of valid motion vectors. The number of passes is limited to \( num\le 5 \), because the interpolated frames become blurry after too many passes of median filtering. The lower limit on the percentage of valid motion vectors is set to \( 95\% \), that is,

$$\begin{aligned} \sum \limits _{i=1}^m \sum \limits _{j=1}^n V_{ij}\ge 95\%\times m \times n \end{aligned}$$
(12)

where m denotes the number of blocks in row direction, while n denotes the number of blocks in column direction.

The proposed ADMF consists of three steps, as shown in Table 1. First, num and V are initialized to zero. Second, the validity \( V_{ij} \) and the motion vector \( \varvec{v}_{ij} \) are updated according to Eqs. (10) and (11). Third, step 2 is repeated until the terminal conditions are met.
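The three steps can be sketched as the loop below. This is a hedged illustration and does not reproduce Table 1: all function names are hypothetical, and several details are assumptions, namely that a vector is treated as invalid whenever the first row of Eq. (10) does not hold, that border blocks use only the neighbours that exist, that the median is component-wise, and that every update in one pass reads the vectors of the previous pass.

```python
import math
from statistics import median


def angle(v):
    """Eq. (6) with v0 = (1, 0); a zero vector maps to 0 (assumption)."""
    norm = math.hypot(v[0], v[1])
    return math.acos(max(-1.0, min(1.0, v[0] / norm))) if norm else 0.0


def neighbours(field, i, j):
    """Up to 8 neighbouring vectors; border blocks have fewer (assumption)."""
    m, n = len(field), len(field[0])
    return [field[a][b]
            for a in range(max(0, i - 1), min(m, i + 2))
            for b in range(max(0, j - 1), min(n, j + 2))
            if (a, b) != (i, j)]


def is_valid(v, nbrs, bs):
    """First row of Eq. (10); everything else counts as invalid (assumption)."""
    aad = abs(angle(v) - sum(angle(u) for u in nbrs) / len(nbrs))
    add = abs(math.hypot(v[0], v[1])
              - sum(math.hypot(u[0], u[1]) for u in nbrs) / len(nbrs))
    return aad <= math.pi / 6 and add <= bs / 16


def admf(field, bs, max_passes=5, min_valid=0.95):
    """Repeat Eqs. (10)-(11) until Eq. (12) holds or max_passes is reached."""
    m, n = len(field), len(field[0])
    for _ in range(max_passes):
        valid = [[is_valid(field[i][j], neighbours(field, i, j), bs)
                  for j in range(n)] for i in range(m)]
        if sum(map(sum, valid)) >= min_valid * m * n:   # Eq. (12)
            break
        updated = [row[:] for row in field]
        for i in range(m):
            for j in range(n):
                if not valid[i][j]:
                    nb = neighbours(field, i, j)
                    updated[i][j] = (median(u[0] for u in nb),
                                     median(u[1] for u in nb))
        field = updated
    return field


# A 3 x 3 field with one outlier in the centre.
field = [[(4, 0)] * 3 for _ in range(3)]
field[1][1] = (-4, 3)
refined = admf(field, bs=16)
print(refined[1][1])
```

Note that Eq. (12) is checked over the whole frame, so a single outlier in a large, otherwise consistent field can already satisfy the 95% condition and remain uncorrected.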

Table 1. The proposed ADMF algorithm

2.3 Fine Searching

After ADMF, wrong motion vectors have been corrected, but true motion vectors may also have been slightly altered. The main concern in fine searching is therefore to ensure the accuracy of the refined motion vectors with a fine adjustment, so BME is applied again over a small search range. The main difference between coarse searching and fine searching is the search range, which reflects their different purposes. Coarse searching obtains the initial motion vectors over a wide search range, while fine searching refines the motion vectors over a small search range.

The SAD value in fine searching is called SADF. \( \varvec{\hat{v}}_{ij} = (\hat{v}_x,\hat{v}_y) \) denotes the refined motion vector generated by ADMF, while \( \varvec{v}_{ij} = (\hat{v}_x+v_x,\hat{v}_y+v_y) \) denotes the final motion vector used for Motion-Compensated Interpolation. SADF and the motion vectors \( \lbrace \varvec{v}_{ij}\rbrace \) are given by:

$$\begin{aligned} \begin{aligned} SADF(v_x,v_y)=\sum \limits _{(x,y)\in EB_{ij}} \vert F_{n-1}(x-\hat{v}_x-v_x,y-\hat{v}_y-v_y)\\ -F_{n}(x+\hat{v}_x+v_x,y+\hat{v}_y+v_y)\vert \end{aligned} \end{aligned}$$
(13)

and

$$\begin{aligned} \begin{aligned} \varvec{v}_{ij}=(\hat{v}_x+v_x,\hat{v}_y+v_y)=\mathop {\arg \min }_{(v_x,v_y)\in FSR}\lbrace SADF(v_x,v_y)\rbrace \end{aligned} \end{aligned}$$
(14)

where

$$\begin{aligned} FSR=\lbrace (v_x,v_y)|-FWS\le v_x,v_y\le FWS\rbrace . \end{aligned}$$
(15)

In Eq. (15), FSR represents the search range in fine searching and FWS is the search window size in fine searching.
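Fine searching differs from coarse searching only in that the candidates are offsets around the refined vector \( \varvec{\hat{v}}_{ij} \). The sketch below illustrates Eqs. (13) to (15) under the same assumptions as before (hypothetical function names, frames as lists of rows, border clamping, first candidate wins ties). It additionally assumes that \( \varvec{\hat{v}}_{ij} \) is rounded to integer pixels before the search.

```python
import random


def sadf(prev, cur, bounds, v_hat, dv):
    """Eq. (13): bidirectional SAD for the candidate vector v_hat + dv."""
    vx, vy = v_hat[0] + dv[0], v_hat[1] + dv[1]
    h, w = len(prev), len(prev[0])
    x0, x1, y0, y1 = bounds
    total = 0
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            # Assumption: coordinates outside the frame are clamped to the border.
            py, px = min(max(y - vy, 0), h - 1), min(max(x - vx, 0), w - 1)
            cy, cx = min(max(y + vy, 0), h - 1), min(max(x + vx, 0), w - 1)
            total += abs(prev[py][px] - cur[cy][cx])
    return total


def fine_search(prev, cur, bounds, v_hat, fws, step=1):
    """Eqs. (14)-(15): search offsets in [-fws, fws]^2 around v_hat."""
    v_hat = (round(v_hat[0]), round(v_hat[1]))  # Assumption: integer-pixel search.
    offsets = range(-fws, fws + 1, step)
    candidates = [(dx, dy) for dy in offsets for dx in offsets]
    dx, dy = min(candidates, key=lambda dv: sadf(prev, cur, bounds, v_hat, dv))
    return (v_hat[0] + dx, v_hat[1] + dy)


# Synthetic frames whose true bidirectional vector is (3, -1); the refined
# vector handed over by the previous stage is assumed to be (2, 0).
rng = random.Random(1)
base = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
prev = [[base[(y - 1) % 32][(x + 3) % 32] for x in range(32)] for y in range(32)]
cur = [[base[(y + 1) % 32][(x - 3) % 32] for x in range(32)] for y in range(32)]
final = fine_search(prev, cur, bounds=(12, 19, 12, 19), v_hat=(2, 0), fws=1)
print(final)
```

In this example the true vector lies one pixel away from the refined vector in each direction, so a window of \( FWS=1 \) is enough to recover it.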

3 Experiments

In our experiments, 10 test video sequences are used to verify the validity of the proposed motion estimation for frame interpolation: Akiyo, Crew, Football, Foreman, Ice, Mobile, Paris, Silent, Soccer and Stefan. In the Football and Soccer sequences in particular, adjacent frames differ significantly because of fast-moving objects, which poses a great challenge for motion estimation. The resolution of all 10 test video sequences is \( 352\times 288 \). The experiments are conducted in Matlab R2015b.

Fig. 4. (a) The original 78th frame of the Foreman sequence. (b), (c) and (d) The 78th frame interpolated by the other methods and by the proposed method, with PSNR values.

Table 2. Average PSNR and SSIM values of various methods in 10 test sequences

To evaluate the quality of the interpolated frames, the even frames of each video sequence are skipped and regenerated from the neighbouring odd frames by the various Frame Rate Up-Conversion methods. For example, the 2nd frame is predicted from the 1st frame and the 3rd frame. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [13] are used as the criteria for measuring the difference between the predicted even frames and the true even frames.
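The evaluation protocol can be sketched as follows for PSNR. The function names are hypothetical, a peak value of 255 (8-bit samples) is an assumption, and `average_frames` is only a placeholder interpolator used to make the example runnable. It is not the proposed method. SSIM is omitted for brevity.

```python
import math


def psnr(ref, test, peak=255.0):
    """PSNR in dB between two equally sized frames (lists of rows)."""
    n = len(ref) * len(ref[0])
    mse = sum((a - b) ** 2
              for row_r, row_t in zip(ref, test)
              for a, b in zip(row_r, row_t)) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def evaluate(frames, interpolate):
    """Average PSNR of the even frames predicted from their odd neighbours.

    Frames are numbered from 1 in the text, so 0-based index k = 1, 3, 5, ...
    corresponds to the 2nd, 4th, 6th, ... frame. Needs at least 3 frames.
    """
    scores = []
    for k in range(1, len(frames) - 1, 2):
        predicted = interpolate(frames[k - 1], frames[k + 1])
        scores.append(psnr(frames[k], predicted))
    return sum(scores) / len(scores)


def average_frames(a, b):
    """Placeholder interpolator (plain averaging), NOT the proposed method."""
    return [[(p + q) / 2 for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


# Three constant 2 x 2 frames: the 2nd frame is predicted from the 1st and 3rd.
frames = [[[10, 10], [10, 10]], [[25, 25], [25, 25]], [[30, 30], [30, 30]]]
score = evaluate(frames, average_frames)
print(round(score, 2))   # 34.15
```

In the toy example the prediction is 20 everywhere, the true frame is 25, so the mean squared error is 25 and the PSNR is \( 10\log _{10}(255^2/25) \approx 34.15 \) dB.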

A complete framework of frame interpolation can be divided into two modules: motion estimation and Motion-Compensated Interpolation. Since motion estimation is our focus, the proposed motion estimation is compared with five other methods: Bidirectional Motion Estimation (BME) [3], Forward-Backward Jointing Motion Estimation (FBJME) [12], Dual Motion Estimation (DME) [6], Direction-Select Motion Estimation (DSME) [14] and Linear Quadratic Motion Estimation (LQME) [4]. For Motion-Compensated Interpolation, Overlapped Block Motion Compensation (OBMC) as described in [2, 5] is applied to generate the final interpolated frames.

The experimental settings of the proposed approach are as follows. The block size is \( BS = 16 \) and the expanded size is \( ES = 8 \). In coarse searching, the search window size is \( CWS = 12 \) and the step size is 2. In fine searching, the search window size is \( FWS = 4 \) and the step size is 1. The settings of the compared methods follow those in [3, 4, 6, 12, 14].
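These settings determine how many candidate vectors are evaluated per block, which can be checked with a few lines of arithmetic. The helper name `candidates` is hypothetical. The single-stage search over \( \pm (CWS+FWS) \) with step 1 is included only as an illustrative point of comparison and is not a method evaluated in the paper. The cost of ADMF itself is not counted.

```python
def candidates(window, step):
    """Number of (vx, vy) candidates in a square window sampled with a step."""
    per_axis = len(range(-window, window + 1, step))
    return per_axis * per_axis


BS, ES, CWS, FWS = 16, 8, 12, 4

pixels_per_sad = (BS + 2 * ES) ** 2        # 32 x 32 expanded block = 1024 pixels
coarse = candidates(CWS, 2)                # 13 x 13 = 169 candidates
fine = candidates(FWS, 1)                  # 9 x 9 = 81 candidates
single_stage = candidates(CWS + FWS, 1)    # 33 x 33 = 1089 candidates

print(pixels_per_sad, coarse, fine, coarse + fine, single_stage)
# 1024 169 81 250 1089
```

Under these settings the two stages together evaluate 250 candidates per block, each over 1024 pixels, compared with 1089 candidates for a dense search of the same reach.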

The experimental results are analysed through both subjective and objective evaluation. In subjective evaluation, interpolated frames generated by the various methods are displayed as pictures. In objective evaluation, three numerical indexes are considered: PSNR, SSIM [13] and running time.

3.1 Subjective Evaluation

To compare the motion estimation methods subjectively, the original frame and the interpolated frames generated by DME, DSME and the proposed method are shown in Fig. 4. The 78th frame of the Foreman sequence is used as the reference picture, so that its visual quality can be compared with that of the interpolated frames.

As (b) and (c) of Fig. 4 show, blurring problems appear on the person's face. This indicates that DME and DSME are inaccurate, especially in details such as the eyes, the nose and the mouth. Compared with DME and DSME, the interpolated frame of the proposed method, picture (d), is much clearer and closer to the original picture (a). Furthermore, the PSNR values also indicate that the proposed motion estimation outperforms DME and DSME.

Table 3. Average running time of various methods in 10 test sequences

3.2 Objective Evaluation

In objective evaluation, a series of experiments is performed to test the performance of 6 motion estimation methods: BME, FBJME, DME, DSME, LQME and the proposed method. All of them are run on the 10 test video sequences. Three numerical indexes are considered: PSNR, SSIM and running time.

As shown in Table 2, the proposed motion estimation outperforms the compared methods in terms of average PSNR and SSIM values. The proposed approach performs especially well on the Football and Soccer sequences. This indicates that ADMF can effectively refine wrong motion vectors in scenes where objects move fast.

To compare the efficiency of motion estimation, FBJME, DME, DSME and the proposed method are analysed with joint consideration of average PSNR, average SSIM and average running time. The running time is the time taken to generate each interpolated frame. As shown in Table 3, the proposed motion estimation is the most efficient of the compared methods.

4 Conclusion

This paper has proposed a novel motion estimation method based on block matching and motion refinement for frame interpolation. First, the proposed framework consists of coarse searching and fine searching using Bidirectional Motion Estimation, and it has been shown to be efficient because it requires little computation. Second, the Angular-Distance Median Filter has been shown to correct wrong motion vectors effectively. Third, the proposed motion estimation has been analysed with respect to PSNR, SSIM, running time and different scenes. Fourth, experimental results have shown that the proposed method outperforms the compared techniques for frame interpolation in both subjective and objective evaluation.

In research on Frame Rate Up-Conversion, obtaining true motion vectors in motion estimation is our main focus. In addition, how to generate interpolated frames in Motion-Compensated Interpolation still needs to be studied. It would also be interesting to implement frame interpolation in other frameworks, e.g., the phase-based method [10] and the method based on convolutional neural networks [11].