
1 Introduction

The literature approaches the problem of texture characterization through several methods. To classify an image, the visual attributes present in it must either characterize the image uniquely or highlight characteristics shared by a class of images, making that class distinguishable from other classes.

According to [4], natural image statistics is highly relevant to several fields, such as computer vision, statistics, neuroscience and physiology. The authors studied the local behavior of images by analyzing the space of small high-contrast regions (patches) extracted from them.

The authors of [9] present a novel framework for estimating and representing distributions around low-dimensional submanifolds of pixel space using the Klein bottle.

This work proposes a new approach to computing the feature vector for texture characterization via projections of patches onto the topology of the Klein bottle. Based on the definitions of [9], it also proposes a new configuration of the cut-off frequency used with the projections onto the Klein bottle.

Classification experiments show that the method is robust for texture classification and provides very high accuracy on several texture databases, outperforming other state-of-the-art descriptors while reducing the dimensionality of the feature vector. In Sects. 2, 2.1 and 2.2, the results of Lee et al. [8] and Carlson et al. [4] on the Klein model \(\mathcal {K}\) are reviewed. In Sect. 2.3, the estimate of the probability density function of the projected space given in [9] is reviewed and new configurations for generating the descriptors are presented. Section 3 presents and discusses the results of this work, and Sect. 4 concludes the paper.

2 Overview of the Multi-scale Invariant Descriptor Process

The main steps for generating the multi-scale invariant descriptor are illustrated in Fig. 1. First, an image is selected (A) and patches are extracted from it (B). These patches are projected onto the Klein bottle (C). The Estimated K-Fourier Coefficients (EKFC) are then computed from the projected patches; after choosing the cut-off frequency of the estimated probability density function, the EKFC descriptor vector is built (D). This process is applied to every image of the texture databases. Finally, the resulting set of descriptor vectors is classified in order to assess the accuracy of the method (E).

Fig. 1. Flow of processing steps.

2.1 Patches

First, following [8], patches \(E\) of \(n \times n\) pixels are extracted from the image (Fig. 1B), and 5,000 of them are randomly selected.
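A minimal NumPy sketch of this sampling step follows. The function name `extract_random_patches` is ours, and the sampling scheme (top-left corners drawn uniformly with replacement, so patches may overlap) is an assumption; the text fixes only the patch size \(n\) and the count of 5,000.

```python
import numpy as np


def extract_random_patches(image, n, num_patches=5000, seed=None):
    """Return `num_patches` random n x n patches of a 2-D grayscale image.

    Assumption (not fixed by the text): top-left corners are drawn uniformly
    with replacement, so patches may overlap or repeat.
    """
    image = np.asarray(image, dtype=float)
    rng = np.random.default_rng(seed)
    h, w = image.shape
    # Upper bounds are exclusive, so the last valid corner is (h - n, w - n).
    rows = rng.integers(0, h - n + 1, size=num_patches)
    cols = rng.integers(0, w - n + 1, size=num_patches)
    # Result has shape (num_patches, n, n).
    return np.stack([image[r:r + n, c:c + n] for r, c in zip(rows, cols)])
```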

For each selected patch \(E=[e_{ij}]\), the logarithm is applied entrywise, giving \(A=[a_{ij}]\) with \(a_{ij}=\ln {(e_{ij})}\), and the D-norm is computed as \(\left\| A \right\| _D^2=\sum _{ij \sim kl}(a_{ij}-a_{kl})^2\), where \(a_{ij} \sim a_{kl}\) if and only if \(\left| i-k \right| +\left| j-l \right| \le 1\).
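The log transform and the D-norm can be sketched as follows. This is an illustration, not the authors' code: the names are hypothetical, intensities are assumed strictly positive so that the logarithm is defined, and each unordered pair of 4-adjacent pixels is counted once, which is our reading of the sum over \(ij \sim kl\) (pairs with \(i=k\), \(j=l\) contribute zero anyway).

```python
import numpy as np


def log_patch(E):
    # a_ij = ln(e_ij); assumes strictly positive intensities.
    return np.log(np.asarray(E, dtype=float))


def d_norm(A):
    """D-norm: sqrt of the summed squared differences of 4-adjacent pixels.

    Assumption: each unordered neighbor pair is counted once.
    """
    A = np.asarray(A, dtype=float)
    vertical = np.diff(A, axis=0)    # a_{i+1,j} - a_{i,j}
    horizontal = np.diff(A, axis=1)  # a_{i,j+1} - a_{i,j}
    return np.sqrt(np.sum(vertical ** 2) + np.sum(horizontal ** 2))
```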

Only patches whose D-norm is greater than or equal to a given threshold are kept (the authors of [9] used 0.01). Among them, the 1,000 patches with the highest D-norm are selected, or all remaining patches if fewer than 1,000 pass the threshold.
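A sketch of this selection step, assuming the D-norms of the candidate patches are already available as an array aligned with the patches; `select_high_contrast` is a hypothetical helper name.

```python
import numpy as np


def select_high_contrast(patches, norms, threshold=0.01, k=1000):
    """Keep patches with D-norm >= threshold, then the k largest (or all)."""
    patches = np.asarray(patches, dtype=float)
    norms = np.asarray(norms, dtype=float)
    keep = np.flatnonzero(norms >= threshold)
    # Sort surviving indices by decreasing D-norm and truncate to k.
    order = keep[np.argsort(norms[keep])[::-1]]
    return patches[order[:k]], norms[order[:k]]
```

If fewer than `k` patches survive the threshold, the slice simply returns all of them, matching the "or all remaining patches" rule.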

Each of these patches is then centered by subtracting its mean and normalized by the D-norm of the centered patch:

$$\begin{aligned} \mathbf {P}= \frac{\displaystyle A-\frac{1}{n^2}\sum _{j=1}^{n}\sum _{i=1}^ n a_{ij}B }{\displaystyle \left\| A-\frac{1}{n^2}\sum _{j=1}^{n}\sum _{i=1}^n a_{ij}B\right\| _D} \end{aligned}$$
(1)

where B is the \(n\times n \) matrix whose entries are all equal to 1.
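Equation (1) amounts to subtracting the patch mean and dividing by the D-norm. A short sketch with hypothetical names; as an assumption, `d_norm` counts each 4-adjacent pixel pair once.

```python
import numpy as np


def d_norm(A):
    # Squared differences of vertical and horizontal neighbors, each pair once.
    return np.sqrt(np.sum(np.diff(A, axis=0) ** 2) + np.sum(np.diff(A, axis=1) ** 2))


def center_and_normalize(A):
    """Eq. (1): subtract (1/n^2) * sum(a_ij) * B and divide by the D-norm.

    Assumes the patch is not constant (it passed the D-norm threshold).
    """
    A = np.asarray(A, dtype=float)
    centered = A - A.mean()  # same as A - (1/n^2) * sum(a_ij) * B
    return centered / d_norm(centered)
```

Subtracting a constant leaves every pixel difference unchanged, so the denominator equals the D-norm of \(A\) itself; the result has zero mean and unit D-norm.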

2.2 Projection

To create the projection (Fig. 1C), we use a model of the Klein bottle \(\mathcal {K}\) obtained from the rectangle \(R=\left[ \frac{\pi }{4}, \frac{5\pi }{4}\right] \times \left[ -\frac{\pi }{2}, \frac{3\pi }{2}\right] \) by identifying each point \(\left( \alpha ,-\frac{\pi }{2} \right) \) with \(\left( \alpha ,\frac{3\pi }{2} \right) \) and each point \(\left( \frac{\pi }{4}, \theta \right) \) with \(\left( \frac{5\pi }{4}, \pi - \theta \right) \), for all \(\left( \alpha , \theta \right) \in R\). The resulting quotient space \(\mathscr {K}\) has the topology of the Klein bottle. Figure 1(C) illustrates patches projected onto \(\mathscr {K}\).

Each patch can be parameterized by the direction \(\alpha \in \left[ \frac{\pi }{4}, \frac{5\pi }{4}\right) \) and by the angle \(\theta \), which describes the transition between edge and bar structure. Different pairs \((\alpha ,\theta )\) can describe the same patch; e.g., \(\left( \frac{\pi }{4},0 \right) \) and \(\left( \frac{5\pi }{4}, \pi \right) \) describe the same edge step, whose intensity gradient points in the direction \(\frac{\pi }{4}\).

Each point \((\alpha ,\theta ) \in \mathscr {K}\) determines a single patch through the unit complex numbers \(\displaystyle a + ib = e^{i\alpha }\) and \(\displaystyle c + id = e^{i\theta }\).

These numbers are the coefficients of the polynomial p that approximates the intensity function:

$$\begin{aligned} p(x,y) = c \frac{(ax+by)}{2}+d\frac{\displaystyle \sqrt{3}(ax+by)^2}{4},\;\;c^2+d^2=1. \end{aligned}$$
(2)

The patch \(E=[e_{ij}]\) can then be regarded as generated by this polynomial via the local averaging

$$\begin{aligned} e_{ij}= \int \limits _{1-\frac{2i}{n}}^{1-\frac{2i-2}{n}} \int \limits _{-1+\frac{2j-2}{n}}^{-1+\frac{2j}{n}} \;\; p(x,y)\;dxdy, \;\;\;i,j=1,...,n. \end{aligned}$$
(3)

and \(\mathbf {P}= \left[ p_{ij} \right] \) is obtained from \(E=[e_{ij}]\) by the log transform, centering and normalization of Sect. 2.1.
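Equations (2) and (3) can be checked numerically. The sketch below, with the hypothetical name `klein_patch`, integrates p over each pixel cell using a 2-point Gauss-Legendre rule per axis, which is exact here because p has degree 2. It returns \(E\) before the log transform, centering and normalization.

```python
import numpy as np


def klein_patch(alpha, theta, n):
    """n x n patch E of Eq. (3), built from the polynomial p of Eq. (2)."""
    a, b = np.cos(alpha), np.sin(alpha)
    c, d = np.cos(theta), np.sin(theta)

    def p(x, y):
        s = a * x + b * y
        return c * s / 2 + d * np.sqrt(3) * s ** 2 / 4

    nodes, weights = np.polynomial.legendre.leggauss(2)  # rule on [-1, 1]
    E = np.empty((n, n))
    for i in range(1, n + 1):
        y0, y1 = 1 - 2 * i / n, 1 - (2 * i - 2) / n  # row i -> y interval
        for j in range(1, n + 1):
            x0, x1 = -1 + (2 * j - 2) / n, -1 + 2 * j / n  # column j -> x interval
            # Map the nodes into the cell; the Jacobian is the product of half-widths.
            xs = (x1 - x0) / 2 * nodes + (x1 + x0) / 2
            ys = (y1 - y0) / 2 * nodes + (y1 + y0) / 2
            total = sum(wx * wy * p(x, y)
                        for x, wx in zip(xs, weights)
                        for y, wy in zip(ys, weights))
            E[i - 1, j - 1] = total * (x1 - x0) / 2 * (y1 - y0) / 2
    return E
```

As a sanity check, the cells tile \([-1,1]^2\), so the entries of \(E\) sum to \(\iint p\,dx\,dy\): zero for a pure edge (\(\theta=0\)) and \(1/\sqrt{3}\) for a pure bar (\(\theta=\frac{\pi}{2}\)).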

The gradient of the intensity function, \(\nabla I_\mathbf {P}\), is taken to be piecewise constant and equal to the discrete gradient of \(\mathbf {P}\), so that \(\nabla I_\mathbf {P} \cong \nabla \mathbf {P}\). Central differences are used for the discretization.

If \(2 \le i, j \le n-1\), then

$$\begin{aligned} \mathbf {\nabla P}(i,j)= \frac{1}{2} \begin{bmatrix} p_{i+1,j} -p_{i-1,j} \\ p_{i,j+1}-p_{i,j-1} \end{bmatrix} \end{aligned}$$
(4)
$$\begin{aligned} H\mathbf {P}(i,j)=\begin{bmatrix} p_{i+1,j}-2p_{i,j}+p_{i-1,j}&H_{xy}\mathbf {P}(i,j) \\ H_{xy}\mathbf {P}(i,j)&p_{i,j+1}-2p_{ij}+p_{i,j-1} \end{bmatrix} \end{aligned}$$
(5)

where

$$\begin{aligned} H_{xy}\mathbf {P}(i,j)=\frac{p_{i+1,j+1}-p_{i-1,j+1}-p_{i+1,j-1}+p_{i-1,j-1}}{4}. \end{aligned}$$
(6)
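A direct transcription of Eqs. (4)-(6) for one interior pixel. The code uses 0-based indices, so the interior is \(1 \le i, j \le n-2\); the function names are ours.

```python
import numpy as np


def interior_gradient(P, i, j):
    """Eq. (4): central differences at an interior pixel (0-based i, j)."""
    return 0.5 * np.array([P[i + 1, j] - P[i - 1, j],
                           P[i, j + 1] - P[i, j - 1]])


def interior_hessian(P, i, j):
    """Eqs. (5)-(6): second differences and the mixed term H_xy."""
    hxy = (P[i + 1, j + 1] - P[i - 1, j + 1] - P[i + 1, j - 1] + P[i - 1, j - 1]) / 4
    return np.array([[P[i + 1, j] - 2 * P[i, j] + P[i - 1, j], hxy],
                     [hxy, P[i, j + 1] - 2 * P[i, j] + P[i, j - 1]]])
```

For a patch that is a quadratic function of the pixel indices, these central differences reproduce the exact derivatives, which makes a convenient test.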

If \(r \in \{1,n\}\) or \(t \in \{1,n\}\), then there is a unique \((i,j) \in \{2,...,n-1\}^2\) that minimizes \(\left| r-i \right| + \left| t-j \right| \), and the gradient at \((r,t)\) is defined by

$$\begin{aligned} \mathbf {\nabla P}(r,t)=\mathbf {\nabla P}(i,j)+H\mathbf {P}(i,j)\begin{bmatrix} r-i\\t-j \end{bmatrix} \end{aligned}$$
(7)

Equation (7) is the first-order Taylor expansion of the gradient of \(\mathbf {P}\) at a border location \((r,t)\) around the nearest interior location \((i,j)\). To extend the gradient to the whole square \([-1,1]^2\), let \(\nabla I_\mathbf {P}(x,y)=\mathbf {\nabla P}(i,j)\) if

$$\begin{aligned} \left| x-\left( -1+\frac{2j-1}{n} \right) \right| +\left| y-\left( 1-\frac{2i-1}{n} \right) \right| <\frac{1}{n} \end{aligned}$$
(8)

for some \((i,j)\in \left\{ 1,...,n\right\} ^2\), and \(\mathbf {0}\) otherwise.
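The border rule of Eq. (7) can be sketched as follows (hypothetical names, 0-based indices). The nearest interior pixel in the \(L^1\) sense is obtained by clipping \((r,t)\) to the interior index range. For a quadratic patch the gradient is linear, so the first-order expansion is exact, which gives a simple sanity check.

```python
import numpy as np


def interior_gradient(P, i, j):
    # Eq. (4), 0-based interior indices.
    return 0.5 * np.array([P[i + 1, j] - P[i - 1, j], P[i, j + 1] - P[i, j - 1]])


def interior_hessian(P, i, j):
    # Eqs. (5)-(6), 0-based interior indices.
    hxy = (P[i + 1, j + 1] - P[i - 1, j + 1] - P[i + 1, j - 1] + P[i - 1, j - 1]) / 4
    return np.array([[P[i + 1, j] - 2 * P[i, j] + P[i - 1, j], hxy],
                     [hxy, P[i, j + 1] - 2 * P[i, j] + P[i, j - 1]]])


def full_gradient(P):
    """Gradient at every pixel: Eq. (4) inside, Eq. (7) on the border."""
    n = P.shape[0]  # assumes a square patch with n >= 3
    G = np.zeros((n, n, 2))
    for r in range(n):
        for t in range(n):
            # Clipping gives the unique L1-nearest interior pixel.
            i, j = min(max(r, 1), n - 2), min(max(t, 1), n - 2)
            g = interior_gradient(P, i, j)
            if (i, j) != (r, t):
                g = g + interior_hessian(P, i, j) @ np.array([r - i, t - j])
            G[r, t] = g
    return G
```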

Let \(C_\mathbf {P}\) be the \(2\times 2\) matrix with entries \(C_\mathbf {P}(k,l)=\iint _{[-1,1]^2}\frac{\partial I_\mathbf {P}}{\partial x_k}\frac{\partial I_\mathbf {P}}{\partial x_l}\;dxdy\), \(k,l=1,2\), where \(x_1=x\) and \(x_2=y\); it is obtained explicitly from \(\mathbf {\nabla P}\) (Eq. (4)) and discretized as in Remark 3.2 of [9]. If the eigenvalues of \(C_\mathbf {P}\) are distinct, then \(\alpha _\mathbf {P}\in \left[ \frac{\pi }{4},\frac{5\pi }{4}\right) \) is defined as the direction of the eigenspace corresponding to the largest eigenvalue; otherwise, the patch is discarded. Then a and b are such that \(a+ib=\cos {\alpha _\mathbf {P}}+i\sin {\alpha _\mathbf {P}}=e^{i\alpha _\mathbf {P}}\).
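A sketch of the orientation step. Two assumptions, also noted in the code: \(C_\mathbf {P}\) is approximated by a plain sum of outer products of the pixel gradients, not the exact discretization of Remark 3.2 of [9]; and the (row, column) components of Eq. (4) are converted to \((x,y)\) using the convention of Eq. (3), where x grows with the column index and y decreases with the row index. The name `dominant_direction` is ours.

```python
import numpy as np


def dominant_direction(G, tol=1e-12):
    """alpha_P in [pi/4, 5pi/4) from per-pixel gradients G (n x n x 2,
    components in (row, column) order as in Eq. (4)).

    Returns None when the two eigenvalues coincide (patch discarded).
    Assumption: C_P ~ sum of outer products of pixel gradients.
    """
    gx = G[..., 1].ravel()   # columns grow with x
    gy = -G[..., 0].ravel()  # rows grow downward, i.e. against y
    C = np.array([[gx @ gx, gx @ gy],
                  [gx @ gy, gy @ gy]])
    evals, evecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    if evals[1] - evals[0] <= tol * max(1.0, evals[1]):
        return None
    vx, vy = evecs[:, 1]  # eigenvector of the largest eigenvalue
    alpha = np.arctan2(vy, vx)
    # An eigenspace is a line, so shift by a multiple of pi into [pi/4, 5pi/4).
    return (alpha - np.pi / 4) % np.pi + np.pi / 4
```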

Let \( \left\langle f,g \right\rangle _D=\iint _{[-1,1]^2}\left\langle \nabla f(x,y),\nabla g(x,y) \right\rangle dxdy \) denote the inner product inducing the D-norm \(\left\| \cdot \right\| _D\). If \(u=\frac{(ax+by)}{2}\), then the vector \(\begin{bmatrix} c^*\\ d^* \end{bmatrix} \in S^1\) that minimizes \( \varPhi (c,d)=\left\| I_\mathbf {P}-(cu+d\sqrt{3}u^2) \right\| _D \), subject to \(c^2+d^2=1\), is given by

$$\begin{aligned} c^*=\frac{\left\langle I_\mathbf {P},u \right\rangle _{D}}{\sqrt{\left\langle I_\mathbf {P},u \right\rangle ^2_{D}+3\left\langle I_\mathbf {P},u^2 \right\rangle ^2_{D}}} ,\;\;\; d^*=\frac{\sqrt{3}\left\langle I_\mathbf {P},u^2 \right\rangle _{D}}{\sqrt{\left\langle I_\mathbf {P},u \right\rangle ^2_{D}+3\left\langle I_\mathbf {P},u^2 \right\rangle ^2_{D}}} \end{aligned}$$
(9)

whenever

$$\begin{aligned} \varphi (I_\mathbf {P},\alpha _\mathbf {P} )=\left\langle I_\mathbf {P},u \right\rangle ^2_{D} + 3\left\langle I_\mathbf {P},u^2 \right\rangle ^2_{D} \ne 0 \end{aligned}$$
(10)

and it determines a unique \(\theta _\mathbf {P} \in \left[ \frac{-\pi }{2}, \frac{3\pi }{2} \right) \) so that \(c^*+id^*=\cos (\theta _\mathbf {P})+i\sin {\theta _\mathbf {P}}=e^{i\theta _\mathbf {P}}\).
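The closed form of Eqs. (9)-(10) can be verified numerically. In the sketch below (hypothetical names), `d_inner` evaluates \(\langle f,g\rangle_D\) by tensor Gauss-Legendre quadrature from analytic gradients, and `fit_theta` returns \(c^*\), \(d^*\), \(\theta_\mathbf{P}\in[-\frac{\pi}{2},\frac{3\pi}{2})\) and \(\varphi\). The formula works because \(u\) and \(\sqrt{3}u^2\) are orthonormal in the D inner product on \([-1,1]^2\), so the best fit on the unit circle is the normalized vector of projections.

```python
import numpy as np


def d_inner(grad_f, grad_g, deg=8):
    """<f, g>_D over [-1, 1]^2 by tensor Gauss-Legendre quadrature.

    grad_f, grad_g map grid arrays (X, Y) to a tuple (d/dx, d/dy).
    """
    x, w = np.polynomial.legendre.leggauss(deg)
    X, Y = np.meshgrid(x, x, indexing="ij")
    W = np.outer(w, w)
    fx, fy = grad_f(X, Y)
    gx, gy = grad_g(X, Y)
    return np.sum(W * (fx * gx + fy * gy))


def fit_theta(ip_u, ip_u2):
    """Eqs. (9)-(10) from <I_P, u>_D and <I_P, u^2>_D."""
    phi = ip_u ** 2 + 3 * ip_u2 ** 2
    if phi == 0:
        return None  # Eq. (10) fails: theta_P is undefined
    c_star = ip_u / np.sqrt(phi)
    d_star = np.sqrt(3) * ip_u2 / np.sqrt(phi)
    # Wrap atan2 into [-pi/2, 3pi/2).
    theta = (np.arctan2(d_star, c_star) + np.pi / 2) % (2 * np.pi) - np.pi / 2
    return c_star, d_star, theta, phi
```

For an exact model patch \(I=\cos\theta_0\,u+\sin\theta_0\sqrt{3}u^2\), one has \(\langle I,u\rangle_D=\cos\theta_0\) and \(\langle I,u^2\rangle_D=\sin\theta_0/\sqrt{3}\), so `fit_theta` recovers \(\theta_0\) and \(\varphi=1\).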

If

$$\begin{aligned} \varPhi (c^*,d^*)= & {} \left\| I_\mathbf {P}-(c^*u+d^*\sqrt{3}u^2) \right\| _D =\sqrt{2\left( 1-\sqrt{\varphi (I_\mathbf {P},\alpha _\mathbf P )}\right) }, \end{aligned}$$
(11)

then \( \varPhi (c^*,d^*)\) can be interpreted as the distance from \(\mathbf {P}\) to \(\mathcal {K}\), and \( \varPhi (c^*,d^*) \le \sqrt{2}\). The projected point \((\alpha _\mathbf {P},\theta _\mathbf {P})\) is included in the sample \(S \subset \mathcal {K}\) only if \( \varPhi (c^*,d^*)\le r_n\), where the threshold \(r_n\) is set so that \(\sqrt{\varphi }\ge \frac{1}{2^{n-1}}\).
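Equation (11) and the sample-membership test translate directly. `radius_threshold` encodes our reading of the condition on \(r_n\): since \(\varPhi\) decreases as \(\varphi\) grows, \(\sqrt{\varphi}\ge 2^{1-n}\) is equivalent to \(\varPhi\le\sqrt{2(1-2^{1-n})}\). The names are hypothetical, and the clipping of \(\varphi\) to \([0,1]\) is our guard against round-off.

```python
import numpy as np


def klein_distance(phi):
    """Eq. (11): distance from P to the model; phi clipped to [0, 1]."""
    phi = np.clip(phi, 0.0, 1.0)
    return np.sqrt(2.0 * (1.0 - np.sqrt(phi)))


def radius_threshold(n):
    """r_n with Phi <= r_n  <=>  sqrt(phi) >= 1 / 2**(n - 1) (our reading)."""
    return np.sqrt(2.0 * (1.0 - 1.0 / 2 ** (n - 1)))


def keep_in_sample(phi, n):
    # True if the projected point is close enough to the model to enter S.
    return bool(klein_distance(phi) <= radius_threshold(n))
```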

After finding a, b, c and d for a patch \(\mathbf {P}\), the angles \(\alpha _\mathbf {P}\) and \(\theta _\mathbf {P}\) are recovered from \(\displaystyle a + ib = e^{i\alpha _\mathbf {P}}\) and \(\displaystyle c + id = e^{i\theta _\mathbf {P}}\).

The pair \((\alpha _\mathbf {P}, \theta _\mathbf {P})\) of each selected patch is its projection onto \(\mathscr {K}\).

2.3 EKFC Descriptor

Once the patches are projected, the Estimated K-Fourier Coefficients can be computed. They define the estimate \(\widehat{f}\) of the probability density function \(f:K\rightarrow \mathbb {R}\) of the projected points, \(\widehat{f}(\alpha ,\theta )=\sum _{k \in N_{\omega }} \widehat{f_{k}} \phi _{k}(\alpha ,\theta )\), where \(\begin{Bmatrix} \phi _{k} \end{Bmatrix}_{k \in \mathbb {N}}\) is a trigonometric basis of \(L^{2}(K,\mathbb {R})\) and \(N_{\omega }\) indexes the basis functions of frequency at most \(\omega \).

Let \(\varPi _{n,m}=\frac{\left( 1-(-1)^{n+m}\right) \pi }{4}\) and let N be the number of projected patches; then, for integers \(n,m\ge 1\), the trigonometric basis \(\left\{ \phi _k \right\} \) of \(L^2(K,\mathbb {R})\) is:

$$\begin{aligned}&1,\;\;\sqrt{2}\cos {\left( m\theta -\varPi _{0,m}\right) },\;\; \sqrt{2}\cos {\left( 2n\alpha \right) },\;\;\sqrt{2}\sin {\left( 2n\alpha \right) },&\\ {}&2\cos {\left( n\alpha \right) }\cos {\left( m\theta -\varPi _{n,m}\right) },\;\; 2\sin {\left( n\alpha \right) }\cos {\left( m\theta -\varPi _{n,m}\right) };&\nonumber \end{aligned}$$
(12)

and the Estimated K-Fourier Coefficients are:

$$\begin{aligned} \widehat{a}_m= & {} \frac{1}{N}\sum _{k=1}^{N}\sqrt{2}\cos {\left( m\theta _k-\varPi _{0,m}\right) },\; \widehat{b}_n=\frac{1}{N}\sum _{k=1}^{N}\sqrt{2}\cos {\left( 2n\alpha _k\right) },\; \widehat{c}_n=\frac{1}{N}\sum _{k=1}^{N}\sqrt{2}\sin {\left( 2n\alpha _k\right) },\\ \widehat{d}_{n,m}= & {} \frac{1}{N}\sum _{k=1}^N2\cos {\left( n\alpha _k\right) }\cos {\left( m\theta _k-\varPi _{n,m}\right) },\; \widehat{e}_{n,m}=\frac{1}{N}\sum _{k=1}^N2\sin {\left( n\alpha _k\right) }\cos {\left( m\theta _k-\varPi _{n,m}\right) }; \end{aligned}$$

where \((\alpha _k,\theta _k)\), \(k=1,\dots ,N\), are the projections \((\alpha _\mathbf {P}, \theta _\mathbf {P})\) of the N selected patches.
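Each estimated coefficient is the sample mean of one basis function of (12) over the projected points. The sketch below (hypothetical names) evaluates the basis and, in its check, confirms numerically that every function respects the gluing \((\alpha,\theta)\sim(\alpha+\pi,\pi-\theta)\); the phase \(\varPi_{n,m}\) switching cos to sin with the parity of \(n+m\) is what makes this hold.

```python
import numpy as np


def Pi(n, m):
    # Phase of Eq. (12): 0 when n + m is even, pi/2 (cos -> sin) when odd.
    return (1 - (-1) ** (n + m)) * np.pi / 4


def basis(alpha, theta, kind, n=0, m=0):
    """Evaluate one function of the basis (12); n, m >= 1 where used."""
    alpha = np.asarray(alpha, dtype=float)
    theta = np.asarray(theta, dtype=float)
    if kind == "1":
        return np.ones_like(alpha)
    if kind == "a":
        return np.sqrt(2) * np.cos(m * theta - Pi(0, m))
    if kind == "b":
        return np.sqrt(2) * np.cos(2 * n * alpha)
    if kind == "c":
        return np.sqrt(2) * np.sin(2 * n * alpha)
    if kind == "d":
        return 2 * np.cos(n * alpha) * np.cos(m * theta - Pi(n, m))
    if kind == "e":
        return 2 * np.sin(n * alpha) * np.cos(m * theta - Pi(n, m))
    raise ValueError(f"unknown basis kind: {kind}")


def estimate(alpha, theta, kind, n=0, m=0):
    # Estimated K-Fourier coefficient: mean of the basis function over the N points.
    return float(np.mean(basis(alpha, theta, kind, n, m)))
```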

If the coefficients are ordered by their (total) frequency and, within each frequency, alphabetically, as in

$$\begin{aligned} \widehat{a}_1, \underbrace{\widehat{a}_2, \widehat{b}_1, \widehat{c}_1, \widehat{d}_{1,1}, \widehat{e}_{1,1}}_{frequency=2}, \underbrace{\widehat{a}_3, \widehat{d}_{1,2}, \widehat{d}_{2,1}, \widehat{e}_{1,2}, \widehat{e}_{2,1}}_{frequency=3}, \widehat{a}_4, \widehat{b}_2,... \end{aligned}$$

then we obtain the ordered sequence \(\widehat{K\mathscr {F}}(f)\), and the parameterization \(\widehat{K\mathscr {F}}_{\omega }(f,S)\) consists of the estimated K-Fourier coefficients of the estimated probability density function \(\widehat{f}(\alpha ,\theta )\) with frequency less than or equal to \(\omega \).
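The ordering above can be generated programmatically. A sketch with the hypothetical name `ekfc`, which returns \(\widehat{K\mathscr{F}}_\omega(f,S)\) for given projected angles; as in the ordered sequence, it starts at \(\widehat{a}_1\) and leaves out the constant basis function, whose estimated coefficient is always 1.

```python
import numpy as np


def Pi(n, m):
    return (1 - (-1) ** (n + m)) * np.pi / 4


def ekfc(alpha, theta, omega):
    """Estimated K-Fourier coefficients of total frequency <= omega, in the
    order a_f, b_{f/2}, c_{f/2}, d_{n,f-n}..., e_{n,f-n}... for f = 1..omega."""
    alpha = np.asarray(alpha, dtype=float)
    theta = np.asarray(theta, dtype=float)
    coeffs = []
    for f in range(1, omega + 1):
        coeffs.append(np.mean(np.sqrt(2) * np.cos(f * theta - Pi(0, f))))  # a_f
        if f % 2 == 0:
            k = f // 2  # b_k and c_k have frequency 2k
            coeffs.append(np.mean(np.sqrt(2) * np.cos(2 * k * alpha)))  # b_k
            coeffs.append(np.mean(np.sqrt(2) * np.sin(2 * k * alpha)))  # c_k
        pairs = [(n, f - n) for n in range(1, f)]  # n + m = f with n, m >= 1
        for n, m in pairs:  # d_{n,m}
            coeffs.append(np.mean(2 * np.cos(n * alpha) * np.cos(m * theta - Pi(n, m))))
        for n, m in pairs:  # e_{n,m}
            coeffs.append(np.mean(2 * np.sin(n * alpha) * np.cos(m * theta - Pi(n, m))))
    return np.array(coeffs)
```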

This paper investigates configurations of the patch size (n) and of the cut-off frequency (\(\omega \)) of the estimated probability density function that differ from those given in [9].

In this paper, \(EKFC_n = \widehat{K\mathscr {F}}_{\omega }(f,S)\) denotes the Estimated K-Fourier Coefficients with frequency less than or equal to \(\omega \) obtained from the projected \(n\times n\) patches, and \(EKFC= \left[ EKFC_{n_1},EKFC_{n_2},...,EKFC_{n_j}\right] \) denotes their concatenation over j patch sizes \(n_1,\dots ,n_j\), which forms the descriptor (Fig. 1D).

For more details on the equations in Sects. 2.1, 2.2 and 2.3, see [11].

3 Results and Discussion

As the equations of the previous section show, the patch size and the cut-off frequency strongly influence both the quality and the size of the feature vectors.

The configuration of patch sizes was analyzed in [11], where good results were obtained with the reduced setting \(n = 4,5,6\) and a fixed cut-off frequency.

To compute the estimate \(\widehat{f}\), [9] kept the cut-off frequency proposed by [14] (\(\omega = 6\)), which determines which Estimated K-Fourier Coefficients are computed.

Motivated also by the lack of a criterion for choosing the cut-off frequency in [9], we performed experiments on five datasets: KTH-TIPS [6], CUReT [5], Brodatz [2], Vistex [12] and ALOT [3].

In all experiments, we used Large Margin Nearest Neighbor (LMNN) [13] metric learning with 3 nearest neighbors to learn a global metric, holding out 20% of the training set for cross-validation. Classification performance is reported as the mean and variance of the percentage of correctly labeled test images over 100 random training/test splits, with half of the images of each class used for training.
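A sketch of this evaluation protocol. It substitutes a plain Euclidean 3-nearest-neighbor classifier for LMNN and omits the 20% cross-validation used to tune the metric, so it reproduces only the split-and-average logic, not the reported numbers. Names are hypothetical, and class labels are assumed to be nonnegative integers.

```python
import numpy as np


def knn_predict(X_train, y_train, X_test, k=3):
    # Euclidean k-NN with majority vote: a stand-in for LMNN's learned metric.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])


def evaluate(X, y, splits=100, k=3, seed=0):
    """Mean and variance of test accuracy (%) over random splits that put
    half of the images of each class in the training set."""
    rng = np.random.default_rng(seed)
    accs = []
    for _ in range(splits):
        train = np.zeros(len(y), dtype=bool)
        for cls in np.unique(y):
            idx = rng.permutation(np.flatnonzero(y == cls))
            train[idx[: len(idx) // 2]] = True
        pred = knn_predict(X[train], y[train], X[~train], k)
        accs.append(100.0 * np.mean(pred == y[~train]))
    return np.mean(accs), np.var(accs)
```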

In the first experiment, we used the patch sizes of [9] (\(n = 3, 7, 11, 15, 19\)) and varied the cut-off frequency (\(\omega = 2, ..., 12\)). Table 1 presents the results obtained in the first experiment.

Table 1. Comparison of classification results for different cut-off frequencies on KTH-TIPS.

Table 1 shows that increasing the cut-off frequency increases the descriptor size and, in most cases, also the accuracy. The best result on the KTH-TIPS dataset, 99.27% accuracy, was obtained with cut-off frequency 12.

In the second experiment, we used two sets of patch sizes (\(n = 3, 7, 11, 15, 19\) and \(n= 3, 4, 5\)) and varied the cut-off frequency (\(\omega = 2, ..., 12\)). The second set was chosen after testing various patch configurations [11]. In this experiment, we report the best accuracy on each tested dataset, subject to a limit of 210 on the descriptor size, following [9]; we impose this limit because our goal is to achieve better accuracy with smaller descriptors. Table 2 presents the best results obtained in the second experiment.
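Under the ordering of Sect. 2.3, with the constant term excluded, a cut-off \(\omega\) yields \(\omega+2\lfloor\omega/2\rfloor+\omega(\omega-1)\) coefficients per patch size (\(\omega\) of type a, \(\lfloor\omega/2\rfloor\) each of types b and c, \(\omega(\omega-1)/2\) each of types d and e). This count is our derivation, not a figure quoted from [9]. The sketch below (hypothetical names) uses it to find the largest tested \(\omega\) that keeps the concatenated descriptor within the 210 limit.

```python
def ekfc_length(omega):
    """Coefficients of total frequency <= omega for one patch size:
    a_1..a_omega, b_k and c_k with 2k <= omega, d_{n,m} and e_{n,m}
    with n, m >= 1 and n + m <= omega (constant term excluded)."""
    return omega + 2 * (omega // 2) + omega * (omega - 1)


def max_cutoff(num_sizes, limit=210, omegas=range(2, 13)):
    """Largest tested cut-off whose concatenated EKFC fits within `limit`."""
    feasible = [w for w in omegas if num_sizes * ekfc_length(w) <= limit]
    return max(feasible) if feasible else None
```

Under this count, five patch sizes with \(\omega = 6\) give exactly \(5 \times 42 = 210\) coefficients, while three patch sizes admit up to \(\omega = 7\) (165 coefficients) within the same limit.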

Table 2. Results obtained with two different sets of patch sizes.

Table 2 shows the best accuracy obtained with each of the two sets of patch sizes on the five datasets, under the descriptor-size limit. Comparing the two sets on all datasets, we observed an increase in accuracy together with a reduction of the cut-off frequency.

Table 3. Comparison of our approach with methods from the literature.

We also compared our approach with traditional and state-of-the-art texture analysis methods; Table 3 presents the results for each method. These methods use classifiers other than LMNN, such as LDA or neural networks. We note that we were not able to implement all the approaches under comparison, owing to the complexity of some methods or to missing information in their papers. Consequently, for these methods the results shown are those reported in their respective papers, which is why some of them lack results for one or more datasets. We also compared our approach with the fixed configuration of patch sizes, using the results previously presented in Table 2.

The results in Table 3 indicate that our approach uses a smaller descriptor and achieves consistently high performance across all tested texture datasets. It yielded the highest success rate on the Vistex dataset while using a reduced number of descriptors.

4 Conclusions

As the experiments show, the approach proposed in this work yields feature vectors smaller than those of other state-of-the-art methods while achieving classification rates on par with them over several image databases. The cut-off frequency and the patch size strongly influence the computational cost, and our approach reduces both the number and the size of the patches.

As future work, we intend to investigate polynomials of degree greater than two, or other kinds of functions, which may further reduce the size of the descriptor.