
1 Introduction

In recent years, cost-sensitive learning has been widely studied and has become one of the most important topics for solving the class imbalance problem [1]. In [2], Zhou et al. empirically studied the effect of sampling and threshold-moving in training cost-sensitive neural networks, and revealed that threshold-moving and soft-ensemble are relatively good choices. In [3], Sun et al. proposed cost-sensitive boosting algorithms, which are developed by introducing cost items into the learning framework of AdaBoost. In [4], Jiang et al. proposed a novel Minority Cloning Technique (MCT) for class-imbalanced cost-sensitive learning. MCT alters the class distribution of the training data by cloning each minority class instance according to the similarity between it and the mode of the minority class. In [5], George proposed a new cost-sensitive metric to find the optimal tradeoff between the two most critical performance measures of a classification task: accuracy and cost. Generally, users focus more on the minority class and consider the cost of misclassifying a minority class instance to be higher. In our study, we adopt the same strategy to address this problem.

Motivated by the probabilistic collaborative representation based approach for pattern classification [6] and Zhang’s work [7], in this paper we propose a new method to handle misclassification costs and the class imbalance problem, called Cost-sensitive Collaborative Representation based Classification via Probability Estimation Addressing the Class Imbalance Problem (CSCRC). In Zhang’s cost-sensitive learning framework, the posterior probabilities of a testing sample are estimated by the KLR or KNN method. In [6], the Probabilistic Collaborative Representation based approach for pattern Classification (ProCRC) is designed to achieve the lowest recognition error and assumes the same loss for every type of misclassification, so it can hardly resolve the class imbalance problem. We therefore introduce the cost-sensitive learning framework into ProCRC, which not only clarifies the relationship between the Gaussian function and collaborative representation but also resolves the cost-sensitivity limitation of [6]. Firstly, we use the probabilistic collaborative representation framework to estimate the posterior probabilities. The posterior probabilities are generated directly from the coding coefficients by using a Gaussian function and applying the logarithmic operator to the probabilistic collaborative representation framework; this clearly explains the l2-norm regularized representation scheme used in the collaborative representation based classifier (CRC). Secondly, all the misclassification losses are calculated using Zhang’s cost-sensitive learning framework. Finally, the test sample is assigned to the class whose loss is minimal. Experimental results on UCI databases validate the effectiveness and efficiency of our method.

2 Proposed Approach

In Cai’s work [6], different data points x have different probabilities of \( l(x)\, \in \,l_{X} \), where l(x) denotes the label of x and lX denotes the label set of all candidate classes in X; \( P\left( {l\left( x \right)\, \in \,l_{X} } \right) \) should be higher if the l2-norm of \( \alpha \) is smaller, and vice versa. One intuitive choice is to use a Gaussian function to define such a probability:

$$ P\left( {l(x)\, \in \,l_{X} } \right)\,{ \propto }\,exp\left( { - c\left\| \alpha \right\|_{2}^{2} } \right) $$
(1)

where c is a constant and data points are assigned different probabilities based on \( \alpha \); all these data points lie inside the subspace spanned by the samples in X. For a sample y outside the subspace, the probability is defined as:

$$ P\left( {l(y) \in l_{X} } \right) = P\left( {l(y) = l(x)\left| {l(x) \in l_{X} } \right.} \right)P\left( {l(x) \in l_{X} } \right) $$
(2)

\( P(l(x) \in l_{X} ) \) has been defined in Eq. (1). \( P(l(y) = l(x)\left| {l(x) \in l_{X} } \right.) \) can be measured by the similarity between x and y. Here we adopt the Gaussian kernel to define it:

$$ P(l(y) = l(x)\left| {l(x) \in l_{X} } \right.)\, \propto \,exp( - k\left\| {y - x} \right\|_{2}^{2} ) $$
(3)

where k is a constant. With Eqs. (1)–(3), we have

$$ P(l(y) \in l_{X} )\, \propto \,exp( - (k\left\| {y - X\alpha } \right\|_{2}^{2} + c\left\| \alpha \right\|_{2}^{2} )) $$
(4)

In order to maximize the probability, we can apply the logarithmic operator to Eq. (4), which gives:

$$ \begin{aligned} \max P(l(y)\, \in \,l_{X} ) & = \max \ln (P(l(y)\, \in \,l_{X} )) \\ & = \min_{\alpha } \, k\left\| {y - X\alpha } \right\|_{2}^{2} + c\left\| \alpha \right\|_{2}^{2} \\ & = \min_{\alpha } \, \left\| {y - X\alpha } \right\|_{2}^{2} + \lambda \left\| \alpha \right\|_{2}^{2} \\ \end{aligned} $$
(5)

where \( \lambda = c/k \). Interestingly, Eq. (5) shares the same formulation as the representation formula of CRC [4], but it has a clear probabilistic interpretation.
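
To make Eq. (5) concrete, the following is a minimal sketch of the l2-regularized coding step using its closed-form ridge solution. The dictionary X, query y and regularization parameter lam are synthetic placeholders for illustration only, not the paper's data or implementation.

```python
# Sketch of Eq. (5): min_alpha ||y - X alpha||^2 + lam ||alpha||^2 (lam = c/k).
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 50                       # feature dimension, number of training samples
X = rng.standard_normal((d, n))     # dictionary of training samples (one per column)
y = rng.standard_normal(d)          # query sample
lam = 0.1                           # illustrative regularization parameter

# Closed-form ridge solution: alpha = (X^T X + lam I)^{-1} X^T y
alpha = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# The exponent of Eq. (4) evaluated at the optimum (up to the constant k)
objective = np.sum((y - X @ alpha) ** 2) + lam * np.sum(alpha ** 2)
print("coding objective:", objective)
```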

A sample x inside the subspace can be collaboratively represented as \( x = X\alpha = \sum\nolimits_{k = 1}^{K} {X_{k} \alpha_{k} } \), where \( \alpha = [\alpha_{1} ;\alpha_{2} ; \ldots ;\alpha_{K} ] \) and \( \alpha_{k} \) is the coding vector associated with Xk. Note that \( x_{k} = X_{k} \alpha_{k} \) is a data point falling into the subspace of class k. Then, we have

$$ P(l(x) = k\left| {l(x) \in l_{X} } \right.)\, \propto \,exp( - \delta \left\| {x - X_{k} \alpha_{k} } \right\|_{2}^{2} ) $$
(6)

where \( \delta \) is a constant. For a query sample y, we can compute the probability that \( l(y) = k \) as:

$$ \begin{aligned} & P(l(y) = k) \\ & = P(l(y) = l(x)\left| {l(x) = k} \right.) \cdot P(l(x) = k) \\ & = P(l(y) = l(x)\left| {l(x) = k} \right.) \cdot P(l(x) = k\left| {l(x) \in l_{X} } \right.) \cdot P(l(x) \in l_{X} ) \\ \end{aligned} $$
(7)

Since the probability definition in Eq. (3) is independent of k as long as \( k \in l_{X} \), we have \( P(l(y) = l(x)\left| {l(x) = k} \right.) = P(l(y) = l(x)\left| {l(x) \in l_{X} } \right.) \). With Eqs. (5)–(7), we have

$$ \begin{aligned} P(l(y) = k) & = P(l(y) \in l_{X} ) \cdot P(l(x) = k\left| {l(x) \in l_{X} } \right.) \\ & \propto \,exp( - (\left\| {y - X\alpha } \right\|_{2}^{2} + \lambda \left\| \alpha \right\|_{2}^{2} + \gamma \left\| {X\alpha - X_{k} \alpha_{k} } \right\|_{2}^{2} )) \\ \end{aligned} $$
(8)

where \( \gamma = \delta /k \). Applying the logarithmic operator to Eq. (8) and ignoring the constant term, we have:

$$ \hat{\alpha } = \arg \min_{\alpha } \{ \left\| {y - X\alpha } \right\|_{2}^{2} + \lambda \left\| \alpha \right\|_{2}^{2} + \left\| {X\alpha - X_{k} \alpha_{k} } \right\|_{2}^{2} \} $$
(9)

Referring to Eq. (9), let \( X^{\prime}_{k} \) be a matrix of the same size as X in which only the samples of \( X_{k} \) are kept at their corresponding locations, i.e., \( X^{\prime}_{k} = \left[ {0, \ldots ,X_{k} , \ldots ,0} \right] \). Let \( \bar{X}^{\prime}_{k} = X - X^{\prime}_{k} \). We can then compute the following projection matrix offline:

$$ T = (X^{T} X + (\bar{X}^{\prime}_{k} )^{T} \bar{X}^{\prime}_{k} + \lambda I)^{ - 1} X^{T} $$
(10)

where I denotes the identity matrix. Then, \( \hat{\alpha } = Ty \).
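
As an illustration of Eqs. (9)–(10), the sketch below builds \( X^{\prime}_{k} \) and \( \bar{X}^{\prime}_{k} \) from per-column class labels and forms the class-specific projection matrix T. The synthetic data and variable names are assumptions for illustration, not the paper's implementation.

```python
# Sketch of Eq. (10): T = (X^T X + (Xbar'_k)^T Xbar'_k + lam I)^{-1} X^T.
import numpy as np

rng = np.random.default_rng(0)
d, n, lam = 20, 50, 0.1
X = rng.standard_normal((d, n))          # all training samples as columns
labels = rng.integers(0, 2, size=n)      # class label of each column (0 or 1)
y = rng.standard_normal(d)               # query sample

def projection_matrix(X, labels, k, lam):
    """Class-specific projection matrix T for class k."""
    Xk_prime = X * (labels == k)         # X'_k: class-k columns kept, others zeroed
    Xbar_k = X - Xk_prime                # complement of X'_k
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + Xbar_k.T @ Xbar_k + lam * np.eye(m), X.T)

# T can be precomputed offline for each class; the coding vector is then T @ y.
alpha_hat = {k: projection_matrix(X, labels, k, lam) @ y for k in (0, 1)}
```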

With the model in Eq. (9), a solution vector \( \hat{\alpha } \) is obtained. The probability P(l(y) = k) can be computed by:

$$ P(l(y) = k) \propto \,exp( - (\left\| {y - X\hat{\alpha }} \right\|_{2}^{2} + \lambda \left\| {\hat{\alpha }} \right\|_{2}^{2} + \left\| {X\hat{\alpha } - X_{k} \hat{\alpha }_{k} } \right\|_{2}^{2} )) $$
(11)

Note that \( \left( {\left\| {y - X\hat{\alpha }} \right\|_{2}^{2} + \lambda \left\| {\hat{\alpha }} \right\|_{2}^{2} } \right) \) is the same for all classes, and thus we can omit it in computing P(l(y) = k). Then we have

$$ P_{k} = exp\left( { - \left( {\left\| {X\hat{\alpha } - X_{k} \hat{\alpha }_{k} } \right\|_{2}^{2} } \right)} \right) $$
(12)
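
A minimal sketch of turning the class-specific coding vectors into the probabilities of Eqs. (11)–(12) is given below; the normalization over classes is our own addition (the paper states only the proportionality), and the per-class dictionary of coding vectors is an assumed interface from the previous sketch.

```python
# Sketch of Eq. (12): P_k proportional to exp(-||X alpha - X_k alpha_k||^2).
import numpy as np

def class_probabilities(X, labels, alpha_hat):
    """alpha_hat maps each class k to its coding vector (from its own T, Eq. (10))."""
    scores = {}
    for k, alpha in alpha_hat.items():
        mask = (labels == k)
        diff = X @ alpha - X[:, mask] @ alpha[mask]   # X alpha - X_k alpha_k
        scores[k] = np.exp(-np.sum(diff ** 2))
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()}  # normalized for use in Eq. (13)
```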

In cost-sensitive learning, the loss function is regarded as the objective function for determining the label of a test sample. In a binary classification problem there are two misclassification costs: we denote the cost of misclassifying a positive (minority) sample as negative by C10, and the cost of misclassifying a negative (majority) sample as positive by C01. A cost matrix can then be constructed as shown in Table 1, where G1 and G0 represent the labels of the minority class and the majority class, respectively.

Table 1 The classification accuracy for the 5 methods on 10 data sets

It is well known that the loss function can be related to the posterior probability \( P (\phi (y )\left| y \right. )\, \approx \,P (l (y )= k ) \). The loss function can then be rewritten as follows:

$$ loss (y,\phi (y)) = \left\{ {\begin{array}{*{20}l} {\sum\limits_{i = G_{1} } {P_{i} C_{10} } } & {{\text{if }}\phi (y) = G_{0} } \\ {\sum\limits_{i = G_{0} } {P_{i} C_{01} } } & {{\text{if }}\phi (y) = G_{1} } \\ \end{array} } \right. $$
(13)

The test sample y is assigned to the class with the smaller expected loss. We obtain the label of the test sample y by minimizing Eq. (13):

$$ L(y) = \arg \min_{i \in \{ 0,1\} } \, loss (y,\phi (y)) $$
(14)
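
In the two-class case, the decision rule of Eqs. (13)–(14) reduces to comparing two expected losses. The sketch below assumes normalized class probabilities P[1] (minority, G1) and P[0] (majority, G0) from Eq. (12) and illustrative cost values; it is not the paper's implementation.

```python
# Sketch of Eqs. (13)-(14) for binary classification.
C10 = 10.0   # cost of predicting G0 when the true class is the minority class G1
C01 = 1.0    # cost of predicting G1 when the true class is the majority class G0

def predict_label(P):
    """Return 1 (minority, G1) or 0 (majority, G0), whichever has the smaller loss."""
    loss_predict_0 = P[1] * C10          # loss(y, phi(y) = G0) in Eq. (13)
    loss_predict_1 = P[0] * C01          # loss(y, phi(y) = G1) in Eq. (13)
    return 0 if loss_predict_0 < loss_predict_1 else 1

print(predict_label({1: 0.3, 0: 0.7}))   # losses 3.0 vs 0.7 -> predict the minority class (1)
```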

3 Results

Experiment 1 We compare the performance of 5 methods (sparse representation based classification (SRC), CRC, SVM, ProCRC, and CSCRC) on 10 UCI data sets, and the results are summarized in Tables 1 and 2. The last row of Table 1 is the average accuracy of each method over the ten data sets. From the data sets Haberman, Housing, Ionosphere and Balance we randomly select 31 positive and 31 negative samples as test samples and 41 positive and 41 negative samples as training samples; from the other 6 data sets we select 61 positive and 61 negative samples as test samples and 101 positive and 101 negative samples as training samples. The cost ratio (the cost of a false acceptance with respect to a false rejection) is set to 10. We repeat this process 50 times and report the average results.
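
For reference, the Average Cost reported in Table 2 can be computed as the mean misclassification cost over the test set. The sketch below assumes the cost ratio of 10 described above (C10 = 10, C01 = 1) and uses hypothetical label arrays; label 1 denotes the minority (positive) class.

```python
# Sketch of the Average Cost metric under the assumed cost settings.
import numpy as np

def average_cost(y_true, y_pred, C10=10.0, C01=1.0):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    fn = np.sum((y_true == 1) & (y_pred == 0))   # minority samples misclassified
    fp = np.sum((y_true == 0) & (y_pred == 1))   # majority samples misclassified
    return (fn * C10 + fp * C01) / len(y_true)

print(average_cost([1, 1, 0, 0], [1, 0, 0, 1]))  # (1*10 + 1*1) / 4 = 2.75
```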

Table 2 The average cost for the 5 methods on 10 data sets

On Letter, Balance, Abalone, Car, Nursery, Cmc and Haberman, our method achieves a very high accuracy compared with the other four methods. On the remaining three data sets our method does not always obtain the highest accuracy, but it achieves the highest average accuracy, and its accuracy values are higher than 0.93. In other words, our method performs better than SRC, CRC, SVM and ProCRC.

We calculate the misclassification cost of these 5 methods on the 10 UCI data sets and summarize the results in Table 2. On Letter, Balance, Abalone, Car, Pima, Nursery, Cmc and Haberman, our method achieves a very low average misclassification cost. In Table 1, SRC has the highest accuracy on Pima, but CSCRC obtains the best Average Cost on Pima, which indicates that CSCRC classifies the positive samples correctly. Furthermore, the accuracy of CSCRC is lower than that of CRC on Housing and Ionosphere, but the Average Cost shows the opposite trend.

Experiment 2 Similarly, we compare the performance of the 5 methods (SRC, CRC, SVM, ProCRC, CSCRC) on Letter, and evaluate their performance via G-mean and Average Cost for the class-imbalance problem. In this experiment, we vary the imbalance ratio over [1, 2, …, 10]. In the training set, the minority class contains 30 samples and the majority class contains 30 multiplied by the imbalance ratio. We select 61 positive samples and 61 negative samples as the test set. The costs are set as mentioned above.
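
The G-mean used in this experiment is the geometric mean of the true positive rate and the true negative rate. A minimal sketch with hypothetical labels (label 1 is the minority class) follows.

```python
# Sketch of the G-mean metric: sqrt(TPR * TNR).
import numpy as np

def g_mean(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = np.mean(y_pred[y_true == 1] == 1)   # sensitivity on the minority class
    tnr = np.mean(y_pred[y_true == 0] == 0)   # specificity on the majority class
    return np.sqrt(tpr * tnr)

print(g_mean([1, 1, 0, 0, 0], [1, 0, 0, 0, 1]))  # sqrt(0.5 * (2/3)) ~= 0.577
```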

These results highlight the situations in which CSCRC is preferred. From the results in Figs. 1 and 2 we can see that CSCRC has a higher G-mean than the other four methods except when the imbalance ratio is 1. Meanwhile, CSCRC achieves the lowest Average Cost among all methods, which suggests that CSCRC can focus on the more informative (minority) samples. As the imbalance ratio increases, more training samples are available, and the proposed method still classifies the samples correctly when the imbalance ratio reaches 4. Generally speaking, class imbalance does not strongly affect the proposed CSCRC: it is not sensitive to the distribution of samples, and a good classification result can still be obtained when the imbalance ratio is high.

Fig. 1 The result of G-mean on Letter

Fig. 2 The result of Average Cost on Letter

4 Conclusions

Class-imbalanced data sets occur in many real-world applications where the class distributions are highly skewed. In this paper, we propose a novel method to handle misclassification costs and the class imbalance problem, called Cost-sensitive Collaborative Representation based Classification via Probability Estimation (CSCRC). The proposed approach adopts a probabilistic model and the collaborative representation coefficients to estimate the posterior probabilities, and then obtains the label of a testing sample by minimizing the misclassification losses. The experimental results show that the proposed CSCRC achieves a comparable or even lower average cost with higher accuracy compared with the other four classification algorithms.

In order to simplify the cost matrix, we restricted our discussion to two-class problems, so extending our current work to the multi-class scenario is a main direction for future work.