Logic and the Automatic Acquisition of Scientific Knowledge: An Application to Functional Genomics

Computational Discovery of Scientific Knowledge

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4660)

Abstract

This paper is a manifesto aimed at computer scientists interested in developing and applying scientific discovery methods. It argues that: science is experiencing an unprecedented “explosion” in the amount of available data; traditional data analysis methods cannot deal with this increased quantity of data; there is an urgent need to automate the process of refining scientific data into scientific knowledge; inductive logic programming (ILP) is a data analysis framework well suited for this task; and exciting new scientific discoveries can be achieved using ILP scientific discovery methods. We describe an example in which using ILP to analyse a large and complex bioinformatic database produced unexpected and interesting scientific results in functional genomics. We then point to a possible way forward: integrating machine learning with scientific databases to form intelligent databases.

Editor information

Sašo Džeroski Ljupčo Todorovski

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

King, R.D., Karwath, A., Clare, A., Dehaspe, L. (2007). Logic and the Automatic Acquisition of Scientific Knowledge: An Application to Functional Genomics. In: Džeroski, S., Todorovski, L. (eds) Computational Discovery of Scientific Knowledge. Lecture Notes in Computer Science, vol 4660. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73920-3_13

  • DOI: https://doi.org/10.1007/978-3-540-73920-3_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73919-7

  • Online ISBN: 978-3-540-73920-3

  • eBook Packages: Computer Science, Computer Science (R0)