Processing Neurology Clinical Data for Knowledge Discovery: Scalable Data Flows Using Distributed Computing

Machine Learning for Health Informatics

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9605)

Abstract

The rapidly increasing capabilities of neurotechnologies are generating massive volumes of complex multi-modal data. This neurological big data can be leveraged to provide new insights into complex neurological disorders using data mining and knowledge discovery techniques. For example, electrophysiological signal data consisting of electroencephalogram (EEG) and electrocardiogram (ECG) recordings can be analyzed for brain connectivity research, for studying physiological associations with neural activity, and for the diagnosis and care of patients with epilepsy. However, existing approaches to storing and modeling electrophysiological signal data have several limitations, which make it difficult for signal data to be used directly in data analysis, signal visualization tools, and knowledge discovery applications. Therefore, the use of neurological big data for secondary analysis and the potential development of personalized treatment strategies require scalable data processing platforms. In this chapter, we describe the development of a high-performance data flow system called Signal Data Cloud (SDC) to pre-process large-scale electrophysiological signal data using the open source Apache Pig platform. The features of this neurological big data processing system are: (a) efficient partitioning of signal data into fixed-size segments for easier storage in a high-performance distributed file system, (b) integration and semantic annotation of clinical metadata using an epilepsy domain ontology, and (c) transformation of raw signal data into an appropriate format for use in signal analysis platforms. We also discuss the challenges facing the biomedical informatics community in the context of Big Data, especially the increasing need to ensure data quality and scientific reproducibility.
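
Feature (a) above, partitioning signal data into fixed-size segments, can be illustrated with a minimal Python sketch. This is not the SDC implementation, which the abstract says is built on Apache Pig; the function name `partition_signal`, its parameters, and the choice to keep a shorter final segment are assumptions made purely for illustration.

```python
from typing import Iterator, List, Tuple


def partition_signal(
    samples: List[float],
    sampling_rate_hz: int,
    segment_seconds: int,
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (segment_index, segment_samples) for one signal channel.

    Hypothetical helper: each segment covers `segment_seconds` of
    recording, so it holds sampling_rate_hz * segment_seconds samples.
    The last segment may be shorter if the recording length is not an
    exact multiple of the segment length (an assumed design choice).
    """
    if sampling_rate_hz <= 0 or segment_seconds <= 0:
        raise ValueError("sampling rate and segment duration must be positive")

    segment_len = sampling_rate_hz * segment_seconds
    # Walk the recording in strides of one segment length.
    for index, start in enumerate(range(0, len(samples), segment_len)):
        yield index, samples[start:start + segment_len]


if __name__ == "__main__":
    # Toy recording: 10 samples at 2 Hz, split into 2-second segments.
    toy_channel = [float(i) for i in range(10)]
    for seg_index, seg in partition_signal(toy_channel, 2, 2):
        print(seg_index, seg)
```

Segments of a fixed size are independent of one another, which is what makes them convenient units for storage and parallel processing in a distributed file system; the segment index preserves the original ordering so that a recording can be reassembled.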



Acknowledgements

This work is supported in part by the National Institute of Biomedical Imaging and Bioengineering (NIBIB) Big Data to Knowledge (BD2K) grant (1U01EB020955) and the National Institute of Neurological Disorders and Stroke (NINDS) Center for SUDEP Research grant (1U01NS090407-01).

Author information

Corresponding author

Correspondence to Satya S. Sahoo.

Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Sahoo, S.S., Wei, A., Tatsuoka, C., Ghosh, K., Lhatoo, S.D. (2016). Processing Neurology Clinical Data for Knowledge Discovery: Scalable Data Flows Using Distributed Computing. In: Holzinger, A. (ed.) Machine Learning for Health Informatics. Lecture Notes in Computer Science, vol 9605. Springer, Cham. https://doi.org/10.1007/978-3-319-50478-0_15

  • DOI: https://doi.org/10.1007/978-3-319-50478-0_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-50477-3

  • Online ISBN: 978-3-319-50478-0

  • eBook Packages: Computer Science, Computer Science (R0)
