Abstract
Within-network regression addresses the task of regression in partially labeled networked data where labels are sparse and continuous. Data for inference consist of entities associated with nodes for which labels are known, interlinked with nodes for which labels must be estimated. The premise of this work is that many networked datasets are characterized by a form of autocorrelation in which the value of the response variable at a node depends on the values of the predictor variables at interlinked nodes. This autocorrelation violates the assumption that observations are independent. To overcome this problem, lagged predictor variables are added to the regression model. We investigate a computational solution for this problem in the transductive setting, which asks for predictions of the response values only for the unlabeled nodes of the network. The neighborhood relation is computed on the basis of the node links. We propose a regression inference procedure based on a co-training approach in which two separate model trees are learned: one from the attribute values of labeled nodes, the other from attribute values aggregated over the neighborhood of labeled nodes. Each model tree is used to label the unlabeled nodes for the other during an iterative learning process. The set of labeled data is extended with the predicted labels that are estimated to be confident. The confidence estimate is based on the influence of the predicted labels on the known labels of interlinked nodes. Experiments with sparsely labeled networked data show that the proposed method improves on traditional model tree induction.
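The iterative procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a least-squares line on a single attribute stands in for a model tree, the lagged view is the plain mean of the neighbors' attribute values, and a predicted label counts as confident when adding it to the receiving learner's training set reduces that learner's squared error on the known labels of the node's neighbors. All function names (`lagged_view`, `fit_view`, `cotrain`, and so on) are hypothetical.

```python
from statistics import mean


def lagged_view(x, neighbors):
    """Mean attribute value over each node's linked neighbors.
    Assumption: a node without links falls back to its own value."""
    return {n: (mean(x[m] for m in neighbors[n]) if neighbors[n] else x[n])
            for n in x}


def fit_line(xs, ys):
    """Least-squares line (slope, intercept); a stand-in for a model tree."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def predict(model, value):
    slope, intercept = model
    return slope * value + intercept


def fit_view(view, train):
    """Fit the stand-in learner on one view, given a node -> label mapping."""
    nodes = sorted(train)
    return fit_line([view[n] for n in nodes], [train[n] for n in nodes])


def sq_error(model, view, labels, nodes):
    """Squared error of a model on the known labels of the given nodes."""
    return sum((labels[n] - predict(model, view[n])) ** 2 for n in nodes)


def cotrain(own, lag, labels, neighbors, rounds=10):
    """Two learners, one per view, label unlabeled nodes for each other."""
    views = {"own": own, "lag": lag}
    train = {"own": dict(labels), "lag": dict(labels)}
    unlabeled = sorted(set(own) - set(labels))
    for _ in range(rounds):
        added = False
        for src, dst in (("own", "lag"), ("lag", "own")):
            labeler = fit_view(views[src], train[src])
            receiver = fit_view(views[dst], train[dst])
            best, best_gain = None, 0.0
            for u in unlabeled:
                if u in train[dst]:
                    continue
                known = [n for n in neighbors[u] if n in labels]
                if not known:
                    continue  # no known neighbor labels to judge confidence
                guess = predict(labeler, views[src][u])
                trial = fit_view(views[dst], {**train[dst], u: guess})
                # Assumed confidence: error reduction on known neighbor labels.
                gain = (sq_error(receiver, views[dst], labels, known)
                        - sq_error(trial, views[dst], labels, known))
                if gain > best_gain:
                    best, best_gain = (u, guess), gain
            if best is not None:
                train[dst][best[0]] = best[1]
                added = True
        if not added:
            break  # neither learner found a confident label this round
    models = {v: fit_view(views[v], train[v]) for v in views}
    # Transductive output: predictions only for the unlabeled nodes.
    return {u: mean(predict(models[v], views[v][u]) for v in views)
            for u in unlabeled}


if __name__ == "__main__":
    # Toy chain network 0-1-2-3-4-5 with three labeled nodes.
    x = {i: float(i) for i in range(6)}
    nbrs = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
    print(cotrain(x, lagged_view(x, nbrs), {0: 0.0, 2: 4.0, 5: 10.0}, nbrs))
```

Averaging the two learners' predictions at the end is one simple way to combine the views; the abstract does not say how the final predictions are produced, so this choice is also an assumption.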
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Appice, A., Ceci, M., Malerba, D. (2009). An Iterative Learning Algorithm for Within-Network Regression in the Transductive Setting. In: Gama, J., Costa, V.S., Jorge, A.M., Brazdil, P.B. (eds) Discovery Science. DS 2009. Lecture Notes in Computer Science, vol 5808. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04747-3_6
DOI: https://doi.org/10.1007/978-3-642-04747-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04746-6
Online ISBN: 978-3-642-04747-3
eBook Packages: Computer Science (R0)