1 Introduction

The performance of any pattern recognition system depends on the informative and discriminative power of the extracted features [26, 32], so effective extraction of discriminative features is a vital consideration in pattern recognition [30]. There are two broad categories of face image representation techniques: holistic and local. Holistic face representation techniques such as Principal Component Analysis (PCA) [27] and Linear Discriminant Analysis (LDA) [4] have proved successful in various pattern recognition application domains. A limitation of such holistic features is that they often fail in small sample size problems. Local face descriptors include Local Binary Patterns (LBP) [16,17,18], Local Directional Patterns (LDP) [9], Local Ternary Patterns (LTP) [25], Gabor wavelets [7], speeded-up robust features (SURF) [3], scale-invariant feature transform (SIFT) [14] and histogram of oriented gradients (HOG) [6], among others. A good image representation technique should encode as much between-class discriminative information as possible while keeping intra-class variation low.

Local texture descriptors have gained popularity in the computer vision and pattern recognition research community due to their robustness to illumination and pose variations. Penev and Atick [21] proposed a second-order statistical technique called local feature analysis (LFA), which uses a set of local-topological fields for local feature extraction. Gabor wavelets [7] extract texture and wrinkle features using a sinusoidal plane wave at different orientations, frequencies and scales. Their ability to encode spatial information of an object makes them suitable for local feature extraction [30]. Gabor wavelets exhibit the frequency and orientation selectivity observed in the visual cortex of mammals, which has led to their intensive use in the extraction of bio-inspired features (BIF). LBP [16, 17] is a first-order local texture descriptor that has been applied successfully to various pattern recognition problems such as face recognition [2], facial expression analysis [31] and age estimation. However, LBP is not robust to noise [9, 10], and the first-order information it encodes fails to capture detailed information from a given image [30]. LTP improves the robustness of LBP to noise by using a threshold to filter out possible noise before encoding texture features. Although LTP improves on the performance of LBP, it still does not capture detailed information about the image. LDP [9, 10] represents an image by encoding directional responses at each pixel. The top k significant responses are used to derive the LDP code of the reference pixel, and the remaining \((8-k)\) responses are ignored. In LDP, a significant response denotes the presence of an edge; consequently, a less significant response denotes the absence of an edge, and this ought to be captured as well. The absence of an edge signifies a relatively constant surface. Encoding this information across the image could help in tracking invariant features and thereby improve recognition accuracy.

This paper proposes the local ternary directional patterns (LTDP) operator for texture description. For every pixel in an image, LTDP considers directional responses in eight directions when encoding the image gradient. The probability of an edge in each direction is calculated from the absolute values of the responses. The threshold \(\xi \) used to generate the LTDP code is derived from the local region of the image being encoded. Because the threshold is dynamic, the LTDP operator is adaptive and can be applied to all images in a dataset, or across experimental datasets, without manual tuning.

The rest of the paper is organized as follows. Section 2 presents a review of operators related to LTDP. Section 3 discusses details of how LTDP operator encodes image texture while Sect. 4 outlines the experiments done to illustrate robustness of LTDP operator. Section 5 is devoted to results and discussion of LTDP performance in age estimation and Sect. 6 concludes the study and gives some recommendations.

2 Related Work

2.1 Local Binary Patterns

Texture features have been extensively used in age estimation techniques [20]. LBP is a texture description technique that can detect microstructure patterns such as spots, edges, lines and flat areas on the skin [16]. LBP has been used to describe texture for face recognition, gender classification, age estimation, face detection, and face and facial component tracking. Gunay and Nabiyev [8] used LBP to characterize texture features for age estimation. They reported an accuracy of 80% on the FERET [19] dataset using a nearest neighbor classifier and 80–90% accuracy on the FERET and PIE datasets using an AdaBoost classifier [29]. Figure 1 shows a sample \(3\times 3\) LBP operation.

Fig. 1. LBP operation with P = 8, R = 1. (a) Sample image region (b) Thresholding (c) Resultant LBP code.

LBP code is created using the function \(LBP_{P,R}\) defined as

$$\begin{aligned} LBP_{P,R}\left( x_c,y_c\right) = \sum _{n=0}^{N-1}2^n \tau \left( g_n-g_c\right) \end{aligned}$$
(1)

where the thresholding function \(\tau \) is defined as

$$\begin{aligned} \tau \left( x\right) = {\left\{ \begin{array}{ll} 1 &{} \text {if } x \ge 0 \\ \\ 0 &{} \text {otherwise } \end{array}\right. } \end{aligned}$$
(2)

N is the number of neighboring pixels (the P in the operator name \(LBP_{P,R}\)), R is the distance of each neighboring pixel from the center pixel, \(g_c\) is the gray value of the center pixel, and \(g_n\) for \(n=0,1,2,\dots ,N-1\) is the gray value of the n-th neighboring pixel on a circularly symmetric neighborhood of radius \(R>0\). Concatenating all 8 bits gives the binary code of \((x_c, y_c)\). The resulting binary code is converted to its decimal representation and assigned to the central pixel as its LBP code. A histogram of the LBP-encoded image \(f\left( x, y\right) \) is used to represent micro-pattern structures such as spots, edges, corners and flat regions. This histogram is computed as

$$\begin{aligned} H_i = \sum _{x, y}I\left( f\left( x, y\right) , i\right) , i = 0, 1, 2\dots , 2^p - 1 \end{aligned}$$
(3)

where p is the number of neighboring pixels (bits per code), so that the LBP operator can encode \(2^p\) patterns, and I is defined as

$$\begin{aligned} I(a,b) = {\left\{ \begin{array}{ll} 1, &{}\text {if a = b} \\ \\ 0 &{}\text {otherwise} \end{array}\right. } \end{aligned}$$
(4)
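As an illustration of Eqs. (1) to (4), the following minimal Python sketch computes the LBP code of each interior pixel for P = 8, R = 1 and accumulates the histogram. The neighbor ordering in `OFFSETS` and the function names are assumptions made for this example; the text does not fix a starting neighbor, and any consistent circular order gives an equivalent descriptor.

```python
# Offsets (row, col) of the 8 neighbours, n = 0..7, in circular order.
# The starting neighbour is an assumption of this sketch.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, r, c):
    """Eqs. (1)-(2): sum of 2^n * tau(g_n - g_c) over the 8 neighbours."""
    g_c = img[r][c]
    code = 0
    for n, (dr, dc) in enumerate(OFFSETS):
        g_n = img[r + dr][c + dc]
        if g_n - g_c >= 0:          # tau(x) = 1 when x >= 0
            code += 2 ** n
    return code


def lbp_histogram(img):
    """Eqs. (3)-(4): count occurrences of each code 0..255 over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


sample = [[5, 9, 1],
          [4, 6, 7],
          [2, 3, 8]]
code = lbp_code(sample, 1, 1)       # neighbours 9, 7 and 8 are >= 6
hist = lbp_histogram(sample)
```

Border pixels are skipped here for brevity; a real implementation would pad the image or use interpolation for other radii.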

Ojala et al. [17] found that, when using 8 neighbors and radius 1, about 90% of all patterns are uniform patterns. The original LBP operator was limited in capturing dominant features with large-scale structures. The operator was later extended to capture texture features with neighborhoods of different radii [17]. A neighborhood is defined by a set of sampling points distributed evenly on a circle around the pixel to be labeled. Sampling points that do not fall exactly on pixel centers are bilinearly interpolated, which allows any radius and any number of sampling points.

Ojala et al. [16] further categorized LBP codes as uniform and non-uniform patterns. An LBP pattern with at most two bitwise transitions from 0 to 1 or from 1 to 0, when the pattern is read circularly, is categorized as a uniform pattern. For instance, 00000000, 00010000 and 11011111 are uniform patterns, while 01010000, 11100101 and 10101001 are non-uniform patterns. Uniform patterns may represent microstructures such as a line, spot, edge or flat area. In order to extract rotation-invariant features using LBP, the generated LBP code is circularly rotated until its minimum value is obtained [15].
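The uniformity test and the rotation-invariant mapping described above can be sketched as follows. This is an illustrative sketch only; the function names are hypothetical and the pattern is treated as a circular bit string.

```python
def transitions(code, bits=8):
    """Number of circular 0/1 changes in the bit pattern."""
    count = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1
        count += int(a != b)
    return count


def is_uniform(code, bits=8):
    """A pattern is uniform when it has at most two circular transitions."""
    return transitions(code, bits) <= 2


def rotation_invariant(code, bits=8):
    """Smallest value among all circular rotations of the pattern."""
    mask = (1 << bits) - 1
    return min(((code >> i) | (code << (bits - i))) & mask for i in range(bits))


uniform_examples = [0b00000000, 0b00010000, 0b11011111]
non_uniform_examples = [0b01010000, 0b11100101, 0b10101001]
```

The six example patterns are the ones listed in the text, so the sketch can be checked directly against them.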

The extended LBP operator captures more texture features of an image, but it still does not preserve spatial information about these features. Ahonen et al. [1] proposed dividing a face image into n cells. A histogram is generated for each cell, and the histograms are concatenated into a single spatial histogram that preserves both the spatial and the texture description of the image. The image texture features are thus represented by a histogram of LBP codes, which contains a detailed texture description of all structures on the face image such as spots, lines, edges and flat areas.

2.2 Local Ternary Patterns

LBP is sensitive to illumination and noise. LTP [25] seeks to improve the robustness of image features in fairly uniform regions. LTP extends LBP to a 3-valued code by comparing the values of the neighboring pixels with that of the central pixel \(x_c\) using a preset threshold \(\xi \). The code 0 is assigned to neighbors whose values lie within \(\pm \xi \) of \(x_c\), 1 is assigned to values at or above \(x_c+\xi \), and \(-1\) is assigned to values at or below \(x_c-\xi \). The thresholding function is defined as

$$\begin{aligned} f(x_i, x_c, \xi ) = {\left\{ \begin{array}{ll} 1 \text { if } x_i \ge x_c + \xi \\ \\ 0 \text { if } |x_c - x_i| < \xi \\ \\ -1 \text { if } x_i \le x_c - \xi \end{array}\right. } \end{aligned}$$
(5)

where \(\xi \) is a preset threshold, \(x_c\) is the value of the central pixel and \(x_i\) for \(i=0, 1, 2,\dots ,7\) are the neighboring pixels of \(x_c\). Although this extension makes LTP robust to noise and lets it encode more patterns, it is not easy in practice to select an optimum \(\xi \) for all images in a dataset or for all datasets, and the resultant code is not invariant to pixel value transformations. LTP can encode \(3^8\) patterns. LTP codes are divided into positive and negative parts and a histogram is generated for each part. These histograms are concatenated and used as the feature descriptor for pattern recognition. Figure 2 shows LTP codes for a \(3 \times 3\) sample image region.
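A minimal sketch of Eq. (5) and of the split into positive and negative binary codes is given below. The pixel values are invented for illustration and are not those of Fig. 2; the neighbor ordering and the function name `ltp_codes` are likewise assumptions of this example.

```python
def ltp_codes(region, xi):
    """Return (ternary pattern, positive code, negative code) for a 3x3 region."""
    x_c = region[1][1]
    # Neighbours in circular order, starting at the top-left (assumed order).
    neighbours = [region[0][0], region[0][1], region[0][2], region[1][2],
                  region[2][2], region[2][1], region[2][0], region[1][0]]
    ternary = []
    for x_i in neighbours:
        if x_i >= x_c + xi:
            ternary.append(1)
        elif x_i <= x_c - xi:
            ternary.append(-1)
        else:
            ternary.append(0)       # |x_c - x_i| < xi
    # Positive code keeps the +1 entries, negative code keeps the -1 entries.
    positive = sum(2 ** i for i, t in enumerate(ternary) if t == 1)
    negative = sum(2 ** i for i, t in enumerate(ternary) if t == -1)
    return ternary, positive, negative


region = [[54, 57, 62],
          [50, 55, 70],
          [45, 60, 58]]
ternary, positive, negative = ltp_codes(region, xi=5)
```

Each of the two binary codes is then histogrammed exactly like an ordinary LBP code.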

Fig. 2. LTP code with \(\xi = \pm 5\) and corresponding positive and negative LBP codes. (a) Original image (b) LTP code (\(\xi =\pm 5\)) (c) Negative LBP code (d) Positive LBP code.

2.3 Local Directional Patterns

LBP [18] was found to be unstable to image noise and variations in illumination. Jabid et al. [9] proposed LDP which is robust to image noise and non-monotonic variations in illumination. Figure 3 shows robustness of LDP operator to noise compared to LBP.

Fig. 3. Robustness of LDP compared to LBP. (a) Original image (b) Noisy image.

LDP computes an 8-bit binary code for each pixel in the image by comparing the edge responses of the pixel in different orientations, instead of comparing raw pixel intensities as LBP does. Kirsch [11], Prewitt [23] and Sobel [24] are some of the edge detectors that can be used [22]. The Kirsch edge detector has gained popularity because it detects 8-directional edge responses more accurately than the others [12].

Kirsch Edge Detector. The Kirsch operator is a first-order derivative edge detector that obtains image gradients by convolving \(3 \times 3\) image regions with a set of masks. Kirsch defines a nonlinear edge detection technique as [22]

$$\begin{aligned} P\left( x, y\right) = \max \left\{ 1, \max _{k=0}^7\left[ |5S_k - 3T_k|\right] \right\} \end{aligned}$$
(6)

where

$$\begin{aligned} S_k = P_k + P_{k+1} + P_{k+2} \end{aligned}$$

and

$$\begin{aligned} T_k = P_{k+3} + P_{k+4} + P_{k+5} + P_{k+6} + P_{k+7} \end{aligned}$$

where \(P(x, y)\) is the Kirsch gradient, the subscript a in \(P_a\) is evaluated modulo 8 (\(a \bmod 8\)), and \(P_k\), \(k = 0, 1, 2,\dots ,7\), are the eight neighboring pixels of \(P(x, y)\) as shown in Fig. 4.
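Eq. (6) can be evaluated directly from the eight neighbors without building the masks explicitly. The sketch below assumes the neighbors are supplied as a list `p` in the circular order \(P_0,\dots ,P_7\) of Fig. 4; the function names are hypothetical.

```python
def kirsch_responses(p):
    """Directional responses 5*S_k - 3*T_k for k = 0..7 (indices taken modulo 8)."""
    responses = []
    for k in range(8):
        s_k = sum(p[(k + j) % 8] for j in range(3))       # P_k + P_{k+1} + P_{k+2}
        t_k = sum(p[(k + j) % 8] for j in range(3, 8))    # P_{k+3} + ... + P_{k+7}
        responses.append(5 * s_k - 3 * t_k)
    return responses


def kirsch_gradient(p):
    """Eq. (6): max{1, max_k |5*S_k - 3*T_k|}."""
    return max(1, max(abs(r) for r in kirsch_responses(p)))


flat = [10] * 8                          # constant neighbourhood: no edge
edge = [10, 10, 10, 0, 0, 0, 0, 0]       # bright band on three adjacent neighbours
edge_responses = kirsch_responses(edge)
```

On a constant neighborhood every response is zero, so the gradient falls back to the lower bound of 1 in Eq. (6).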

Fig. 4. (a) Eight neighbors of pixel \(P(x, y)\) (b) Corresponding Kirsch mask.

The Kirsch gradient in a particular direction is found by convolving \(3 \times 3\) image region with the respective mask \(M_k\). Figure 5 shows Kirsch Masks (kernels) for 8 directions.

Given a pixel \(P\left( i, j\right) \) in an image, the 8 directional responses are computed by convolving its \( 3 \times 3\) neighborhood with each of the Kirsch masks, so each pixel has 8 directional response values. The presence of an edge or a corner produces high (absolute) response values in the corresponding direction. LDP determines the k most significant directional responses, sets their corresponding bits to 1 and sets the remaining \(8-k\) bits to 0. The resulting 8-bit binary string is converted to decimal and assigned to the pixel \(P\left( i, j\right) \). This process is repeated for all pixels in the image to obtain the LDP representation of the image. Figure 6 shows the process of encoding an image using the LDP operator.

Fig. 5. Kirsch masks in eight directions.

Fig. 6. Process of encoding an image with the LDP operator with \(k=3\). (a) Result of convolving each pixel in Fig. 3(a) with the 8 Kirsch masks in Fig. 5 (b) Pick the top \(k=3\) significant responses, set their corresponding bits to 1 and the rest to 0 (c) Resultant LDP code.

Given an image region as shown in Fig. 3(a), the directional responses are obtained by convolving the \(3 \times 3\) image region with each of the Kirsch masks shown in Fig. 5. The absolute values of the directional responses are arranged in descending order. The \(LDP_k\) code is then calculated as

$$\begin{aligned} LDP_k = \sum _{i=0}^{i=7}\tau \left( m_i - m_k\right) \times 2^i \end{aligned}$$
(7)

where \(m_i\) is the absolute response in direction i, \(m_k\) is the \(k^{th}\) most significant directional response, and \(\tau \) is defined in (2).
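The following sketch shows one way to evaluate Eq. (7) from a list of eight directional responses. The response values are made up for illustration, and `ldp_code` is a hypothetical name. Note that when several responses tie with the k-th largest magnitude, Eq. (7) as written sets more than k bits; the example avoids ties.

```python
def ldp_code(responses, k=3):
    """Eq. (7): set bit i when |m_i| is at least the k-th largest magnitude."""
    magnitudes = [abs(m) for m in responses]
    m_k = sorted(magnitudes, reverse=True)[k - 1]   # k-th most significant response
    return sum(2 ** i for i, m in enumerate(magnitudes) if m - m_k >= 0)


responses = [-20, 310, 95, -400, 12, 60, -150, 7]
code = ldp_code(responses, k=3)      # directions 3, 1 and 6 are the top three

# Number of distinct 8-bit codes with exactly three bits set.
three_bit_codes = sum(1 for c in range(256) if bin(c).count("1") == 3)
```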

For \(k=3\), the LDP operator generates \(C_3^8=\frac{8!}{3!\times \left( 8-3\right) !}=56\) distinct patterns in the LDP-encoded image. A histogram \(H\left( i\right) \) with \(C_k^8\) bins can be used to represent the input image of size \(M \times N\) as

$$\begin{aligned} H_i=\sum _{m=0}^{M-1}\sum _{n=0}^{N-1}I\left( LDP_k\left( m,n\right) , i\right) \end{aligned}$$
(8)

where I is defined in (4) and i ranges over the \(C_k^8\) distinct LDP code values. The resultant histogram has dimensions \(1 \times C_k^8\) and is used to represent the image. The resultant feature carries spot, corner, edge and texture information about the image [10]. The limitation of LDP with \(k=3\) is that it uses the responses of at most 3 of the 8 possible directions. These directional responses could all lie on one side, for example South-East, East and North-East. The eight directional responses could instead be paired as in [12], which guarantees that each directional response is used to determine the image gradient.

3 Local Ternary Directional Patterns

LTP uses a static, user-defined threshold \(\xi \) for all images in a dataset or for all experimental datasets, which makes it not invariant to pixel value transformations. In practice it is not easy to select an optimum value for \(\xi \) in real application domains; the value of \(\xi \) should adapt to different image conditions and datasets. LDP only considers the top k directional responses and disregards the remaining \(8-k\) responses when encoding the image gradient. Furthermore, LDP does not consider the current reference pixel when calculating the image gradient. The presence of an edge is indicated by a sharp difference between a pixel and its neighbors [22]. Because LDP encodes the image gradient without considering the central pixel, it may "capture" an image edge even where there is none, which can result in a loss of discriminative information. In this section, we propose the LTDP operator, which considers the central reference pixel and all directional responses when encoding the image gradient. The LTDP operator uses an adaptive threshold that depends on the directional responses of the image region.

LTDP computes eight directional responses using Kirsch masks. Given a \(3 \times 3\) image region, LTDP first determines the differences in pixel intensity between the central pixel and its neighboring pixels. The absolute magnitude of each difference is set as the edge difference of the respective pixel as

$$\begin{aligned} P_{i, j} = |P_{i,j} - P_c| \end{aligned}$$
(9)

where \(P_{i, j}\) is the pixel value at index \((i, j)\) and \(P_c\) is the pixel value of the central pixel. Figure 7 shows an example of calculating differential directional responses.
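A possible reading of this step is sketched below: the neighbors are first replaced by their absolute differences from the central pixel (Eq. 9), and the Kirsch responses are then computed on those differences using the \(5S_k - 3T_k\) form of Eq. (6). The circular neighbor order and the name `differential_responses` are assumptions of this example, and the pixel values are not those of Fig. 7.

```python
def differential_responses(region):
    """Return (differential values, Kirsch responses on them) for a 3x3 region."""
    p_c = region[1][1]
    # Neighbours in circular order, starting at the top-left (assumed order).
    ring = [region[0][0], region[0][1], region[0][2], region[1][2],
            region[2][2], region[2][1], region[2][0], region[1][0]]
    diffs = [abs(p - p_c) for p in ring]                  # Eq. (9)
    responses = []
    for k in range(8):
        s_k = sum(diffs[(k + j) % 8] for j in range(3))
        t_k = sum(diffs[(k + j) % 8] for j in range(3, 8))
        responses.append(5 * s_k - 3 * t_k)
    return diffs, responses


region = [[20, 20, 20],
          [10, 10, 10],
          [10, 10, 10]]
diffs, responses = differential_responses(region)
```

Because the differences are taken relative to the central pixel, a region in which every neighbor equals the center yields all-zero responses, unlike plain LDP on raw intensities with a bright or dark center.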

Fig. 7. Differential LDP responses. (a) Image region (b) Differential values (c) Differential directional LDP responses.

Responses are then normalized before being used to generate LTDP code. Min-max normalization is done as

$$\begin{aligned} x_i^{norm} = \frac{x_i - min}{max - min} \end{aligned}$$
(10)

where \(x_i\) is the absolute value of the respective response for \(i=0, 1,\dots ,7\), min and max are the minimum and maximum of these values respectively, and \(x_i^{norm}\) is the normalized value of \(x_i\). The normalized responses lie between 0.0 and 1.0 and are interpreted as the probability of an edge stretching from the central reference pixel towards the respective direction.

The threshold \(\xi \) is set to a deviation of \(\pm 0.1667\) from the value 0.50. The value 0.5 is selected as the reference offset for \(\xi \) because it represents an equal chance of there being an edge or not. The value of \(\xi \) is chosen so that the probability space is divided into 3 equal segments, one for each ternary value. If the normalized response is greater than or equal to \(0.5+\xi \), its corresponding bit is set to \(+1\); if it is less than or equal to \(0.5-\xi \), its corresponding bit is set to \(-1\); and if it lies between \(0.5-\xi \) and \(0.5+\xi \), its corresponding bit is set to 0, as

$$\begin{aligned} f(x_i) = {\left\{ \begin{array}{ll} 1 &{}\text { if}\quad x_i^{norm} \ge 0.50 + \xi \\ \\ 0 &{}\text { if}\quad 0.50 - \xi< x_i^{norm} < 0.50 + \xi \\ \\ -1 &{}\text { if}\quad x_i^{norm} \le 0.50 - \xi \end{array}\right. } \end{aligned}$$
(11)
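Eqs. (10) and (11) can be sketched together as follows. The response values are invented for illustration. The handling of a region whose responses are all equal (where max equals min and Eq. 10 is undefined) is an assumption of this sketch, not something specified in the text.

```python
XI = 0.1667  # deviation from 0.5, giving cut points near 1/3 and 2/3


def min_max(values):
    """Eq. (10): scale values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a flat region: no edge evidence in any direction.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def ltdp_ternary(responses, xi=XI):
    """Eq. (11): map normalized absolute responses to +1, 0 or -1."""
    norm = min_max([abs(r) for r in responses])
    ternary = []
    for x in norm:
        if x >= 0.5 + xi:
            ternary.append(1)
        elif x <= 0.5 - xi:
            ternary.append(-1)
        else:
            ternary.append(0)
    return norm, ternary


responses = [-20, 310, 95, -400, 12, 60, -150, 7]
norm, ternary = ltdp_ternary(responses)
```

Unlike the fixed \(\xi \) of LTP, the cut points here are applied after normalization by the local minimum and maximum, so the effective threshold in response units changes from region to region.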

Figure 8 shows the process of encoding an image with the proposed LTDP operator.

Fig. 8. Process of encoding an image with the LTDP operator. (a) Normalization of the responses in Fig. 7(c) (b) Assigning the LTDP code with thresholds \(0.5\pm \xi \), \(\xi = 0.1667\).

The presence of an edge in a particular direction is signified not only by a significant differential directional response in that direction but also by a significant differential directional response in one of its neighboring directions. A differential directional response is significant if its value d is greater than \(0.5 \times m + \xi \), where m is the maximum differential directional response of the local region. Taking \(\bar{m}=0.5 \times m\) as the reference value, differential directional responses close to \(\bar{m}\) are coded as invariant relative to the central pixel, hence their corresponding bit is set to 0. Responses below \(\bar{m} - \xi \) are coded as having a negative image gradient, hence their corresponding bit is set to \(-1\), and those above \(\bar{m} + \xi \) are coded as having a positive image gradient, hence their corresponding bit is set to 1. Each LTDP code is split into its corresponding negative and positive segments as shown in Fig. 9.

Fig. 9. Resultant LDP codes from the LTDP code. (a) Positive LDP code (b) Negative LDP code.

These codes are converted to decimal and assigned to the corresponding central pixel of the positive and negative LTDP-encoded images respectively. A histogram is generated for each of the negative and positive LTDP-encoded images as

$$\begin{aligned} H_i = \sum _{x, y}I\left( f\left( x, y\right) , i\right) , i = 0, 1, 2\dots , 2^p - 1 \end{aligned}$$
(12)

where \(f\left( x, y\right) \) is the positive or negative LTDP-encoded image, p is the number of bits in each code (\(p=8\) here), so that \(2^p\) patterns can be encoded, and I is defined in (4).
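The split into positive and negative codes and the histogram of Eq. (12) can be sketched as below. The input is a list of ternary patterns, one per pixel, each with eight entries in \(\{-1, 0, 1\}\); the function names and the bit order (entry i maps to bit i) are assumptions of this example.

```python
def split_ternary(ternary):
    """Positive code keeps the +1 entries, negative code keeps the -1 entries."""
    positive = sum(2 ** i for i, t in enumerate(ternary) if t == 1)
    negative = sum(2 ** i for i, t in enumerate(ternary) if t == -1)
    return positive, negative


def concatenated_histogram(ternary_patterns, bits=8):
    """Eq. (12) for both encoded images, concatenated into one feature vector."""
    pos_hist = [0] * (2 ** bits)
    neg_hist = [0] * (2 ** bits)
    for ternary in ternary_patterns:
        pos, neg = split_ternary(ternary)
        pos_hist[pos] += 1
        neg_hist[neg] += 1
    return pos_hist + neg_hist


patterns = [[-1, 1, -1, 1, -1, -1, 0, -1],
            [0, 0, 0, 0, 0, 0, 0, 0]]
pos_code, neg_code = split_ternary(patterns[0])
feature = concatenated_histogram(patterns)
```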

The resultant positive and negative histograms are concatenated and used as the LTDP feature for pattern recognition. The histograms can be trimmed by assigning each uniform pattern its own bin and placing all non-uniform patterns into a single bin. A pattern is uniform if it contains at most 2 transitions from 0 to 1 or vice versa. For n-bit patterns, the total number of uniform patterns is

$$\begin{aligned} P_U = n(n - 1) + 2 \end{aligned}$$
(13)

where n is the number of bits used to represent the patterns. LTDP generates \(8(8-1) + 2 = 58\) uniform patterns for each of the negative and positive LTDP-encoded images. Each resultant histogram therefore has 59 bins, with 58 bins storing the uniform patterns and the \(59^{th}\) bin storing all non-uniform patterns. The two histograms are concatenated to form the final LTDP feature vector.
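The count in Eq. (13) and the 59-bin layout can be checked by enumeration. In this sketch the ordering of uniform patterns within the first 58 bins (ascending code value) is an arbitrary choice, and the function names are hypothetical.

```python
def transitions(code, bits=8):
    """Number of circular 0/1 changes in the bit pattern."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % bits)) & 1)
               for i in range(bits))


def uniform_codes(bits=8):
    """All codes with at most two circular transitions, in ascending order."""
    return [c for c in range(2 ** bits) if transitions(c, bits) <= 2]


def uniform_bin_table(bits=8):
    """Lookup table: code -> bin index; all non-uniform codes share the last bin."""
    uniform = uniform_codes(bits)
    index = {c: i for i, c in enumerate(uniform)}
    shared = len(uniform)
    table = [index.get(c, shared) for c in range(2 ** bits)]
    return table, shared + 1


table, n_bins = uniform_bin_table(8)
```

With this table, a 256-bin histogram is reduced to 59 bins by incrementing `table[code]` instead of `code`, so the concatenated positive and negative feature has 118 entries.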

4 Experiments

Experiments are performed on the FG-NET aging dataset to evaluate the performance of the LTP, LDP and LTDP operators in age estimation. A hybrid approach consisting of between-group classification followed by within-group regression is adopted. A Multilayer Perceptron (MLP) [5] artificial neural network (ANN) is used to classify an input image into an age group, after which a support vector regressor (SVR) estimates the exact age within that age group. We use SVR with a radial basis function (RBF) kernel since it can model complex aging patterns for large age ranges [28].

4.1 Dataset

FG-NET aging dataset was used to evaluate age estimation using LTDP. FG-NET has 1002 images of 82 subjects aged between 0 and 69 years. Images have wide variation in illumination, color and expression. Some images have poor quality since they were scanned.

4.2 Feature Extraction

The face region is detected in an input image using a Haar-cascade face detection classifier [13]. The face is then cropped, converted to gray scale and resized to \(120\times 120\) pixels. The gray-scale face image is smoothed using a Gaussian filter and then encoded using the LTDP operator. Figure 10 shows an image encoded with the LTDP, LTP and LDP operators.

Fig. 10. Image encoded with LTDP, LTP and LDP operators. (a) Input image (b) Resultant positive LTDP image (c) Resultant negative LTDP image (d) Resultant positive LTP image, \(\xi =\pm 3\) (e) Resultant negative LTP image, \(\xi =\pm 3\) (f) Resultant LDP image, \(k=3\).

A histogram is generated for each of these images. The positive and negative histograms are concatenated and used as a feature vector for age estimation. The dimensionality of the resultant feature vector is reduced using LDA.

4.3 Validation and Evaluation Protocol

We use the Leave-One-Person-Out (LOPO) validation protocol to evaluate the LTP, LDP and LTDP based age estimation techniques. In each LOPO iteration, the images of one person are held out as test images while the images of the remaining subjects are used to learn a model. Two commonly used measures of age estimation performance are the Cumulative Score (CS) and the Mean Absolute Error (MAE). MAE is the average of the absolute errors between estimated and actual ages, defined as

$$\begin{aligned} MAE = \frac{1}{N}\sum _{i=1}^{N}|a_i - \bar{a}_i| \end{aligned}$$
(14)

where N is size of the test set, \(a_i\) is ground truth age of image i, and \(\bar{a}_i\) is the estimated age of image i. CS is formulated as

$$\begin{aligned} CS(x) = \frac{N_{e\le x}}{N} \times 100\% \end{aligned}$$
(15)

where \(N_{e\le x}\) is the number of test images on which the age estimation technique makes an absolute error not exceeding the error tolerance of x years, and N is the size of the test set. Our error tolerance x was 5 years.
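Both measures are straightforward to compute from paired lists of ground-truth and estimated ages. The sketch below follows Eqs. (14) and (15); the ages in the example are invented purely to exercise the functions and are not experimental results.

```python
def mean_absolute_error(true_ages, estimated_ages):
    """Eq. (14): average absolute difference between true and estimated ages."""
    errors = [abs(a - e) for a, e in zip(true_ages, estimated_ages)]
    return sum(errors) / len(errors)


def cumulative_score(true_ages, estimated_ages, x=5):
    """Eq. (15): percentage of test images with absolute error of at most x years."""
    hits = sum(1 for a, e in zip(true_ages, estimated_ages) if abs(a - e) <= x)
    return 100.0 * hits / len(true_ages)


# Illustrative values only.
true_ages = [5, 20, 33, 61]
estimated_ages = [7, 26, 30, 50]
mae = mean_absolute_error(true_ages, estimated_ages)
cs5 = cumulative_score(true_ages, estimated_ages, x=5)
```

Under LOPO, the predictions from all held-out subjects would be pooled before these functions are applied.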

5 Results and Discussion

The FG-NET aging dataset is split into 7 age groups of 10 years each. The first age group is 0–9 years and the last age group is 60–69 years. Table 1 shows the MAE achieved in each group for the LTDP, LDP and LTP operators.

Table 1. Age-group MAE (years) comparison using LTDP, LDP and LTP

As shown in Table 1, LTDP achieved an MAE of 2.72 years in age group 0–9 compared to 2.83 and 2.94 achieved by LDP and LTP respectively. In age group 10–19, LTDP achieved an MAE of 3.26 compared to 3.47 achieved by LDP and 3.71 achieved by LTP. With 144 images in age group 20–29, LTDP achieved an MAE of 3.41 years compared to 4.20 and 4.04 achieved by LDP and LTP respectively. The performance of all three operators deteriorates sharply from age group 30–39 onward because the number of images per group drops steeply. Nevertheless, LTDP performed better than LDP and LTP in age group 30–39, achieving an MAE of 7.58 years compared to 8.74 achieved by LDP and 8.98 achieved by LTP. Performance is poorest in age group 60–69 because the dataset has only 8 images in this age group, which is not sufficient to learn any aging pattern. LTDP achieved an overall MAE of 4.35 years, which is better than the 5.12 achieved by LDP and the 5.74 achieved by LTP.

The experiments indicate that LTDP encodes more discriminative local texture features than LDP and LTP. LTDP reduces the overall MAE by 0.77 years compared to LDP (5.12 versus 4.35) and by 1.39 years compared to LTP (5.74 versus 4.35). The accuracy of LTDP could be attributed to the involvement of the central pixel as well as the use of all eight directional responses in calculating the image gradient. This suggests that all the responses, together with the central pixel, are vital in obtaining age-discriminative local texture features.

6 Conclusion and Recommendation

LTDP is proposed for local texture feature extraction. The magnitude of each neighboring pixel is determined by the absolute difference between its value and that of the central reference pixel. Kirsch masks are applied to these differences in pixel values to obtain directional responses. The directional responses are min-max normalized to obtain the probability of an edge stretching towards a particular direction. Applying a threshold to this probability space yields the LTDP code, which is used to obtain positive and negative LDP images. Histograms of the positive and negative LDP images are concatenated to obtain the texture feature for pattern recognition. Experimental results on the FG-NET aging dataset show that LTDP outperforms LTP and LDP in age estimation. Further research is required to make the threshold used in LTDP more adaptive to the local image region for effective extraction of more discriminative features.