
Part of the book series: Human–Computer Interaction Series (HCIS)

Abstract

The preceding chapters laid the groundwork for an approach to building intercultural social simulations: they described the purposes of this book, surveyed existing approaches, introduced different scenarios of social-interaction simulation, and discussed data that can be used in intercultural experiments. However, something is still missing, namely a robust framework that builds on these findings and implements flexible prototypes of social systems. Such a framework would, for example, compose social systems that realize the required simulation behavior and tackle the shortcomings of existing approaches. This chapter describes SocioFramework, a framework for statistical processing and prototyping. Moreover, it presents additional findings on intercultural processing.



Appendices

Appendix A: The HMM Classifier

This appendix shows an implementation of an HMM classifier that relies on JAHMM and is compatible with WEKA:

public class JAHMM extends AbstractClassifier {

    //number of observations in case of emotional E/A
    //space - 5
    protected int m_NumClasses;

    //number of states in case of emotional E/A space - 5
    protected int m_States;

    //the number of the sequence attribute
    protected int m_SeqAttr = -1;

    //0 -- k-means, 1 -- Baum-Welch, 2 -- scaled Baum-Welch
    protected int m_LearningMethod = 0;

    //the HMM resulting from training
    protected Hmm<ObservationInteger> learntHmm_ = null;

    //Builds the current classifier. m_SeqAttr specifies
    //the sequential attribute.
    @Override
    public void buildClassifier(Instances data) throws
    Exception {

        ...

        //build HMM
        OpdfIntegerFactory factory =
          new OpdfIntegerFactory(m_NumClasses);
        Hmm<ObservationInteger> hmm =
          new Hmm<ObservationInteger>(m_States, factory);
        hmm.getOpdf(0).fit(new ObservationInteger(4));

        List<List<ObservationInteger>> sequences;
        sequences = extractSequences(data);

        switch (m_LearningMethod) {
            case 0:
                //k-means learning
                KMeansLearner<ObservationInteger> kml =
                   new KMeansLearner<ObservationInteger>
                   (m_NumClasses, factory, sequences);
                learntHmm_ = kml.learn();
                break;
            case 1:
                //Baum-Welch learning
                BaumWelchLearner bwl = new BaumWelchLearner();
                learntHmm_ = bwl.learn(hmm, sequences);
                break;
            case 2:
                //scaled Baum-Welch learning
                BaumWelchScaledLearner bwsl =
                  new BaumWelchScaledLearner();
                learntHmm_ = bwsl.learn(hmm, sequences);
                break;
        }
    }

    //Classifies the given test instance
    @Override
    public double classifyInstance(Instance instance) throws
    Exception {

        Instances seq = instance.relationalValue(m_SeqAttr);

        //extracts a test sequence from the given instance
        List<ObservationInteger> sequence =
          extractSequenceFromInstance(seq.instance(0));

        int[] states =
          learntHmm_.mostLikelyStateSequence(sequence);

        double bestClass = states[states.length - 1];

        return bestClass;
    }
} //endclass
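The classification step relies on Viterbi decoding: mostLikelyStateSequence computes the most probable hidden-state sequence for the observed sequence, and the last state is returned as the predicted class. The following self-contained sketch, independent of JAHMM and with purely illustrative names and model parameters, shows how such decoding can be done in log space:

```java
class ViterbiSketch {
    // pi[i]: initial probability of state i
    // a[i][j]: transition probability from state i to state j
    // b[i][o]: probability of emitting observation o in state i
    // obs: the observed integer sequence
    static int[] mostLikelyStates(double[] pi, double[][] a,
                                  double[][] b, int[] obs) {
        int n = pi.length, t = obs.length;
        double[][] delta = new double[t][n]; // best log-probabilities
        int[][] psi = new int[t][n];         // back-pointers

        //initialization
        for (int i = 0; i < n; i++)
            delta[0][i] = Math.log(pi[i]) + Math.log(b[i][obs[0]]);

        //recursion: extend the best path into each state
        for (int s = 1; s < t; s++)
            for (int j = 0; j < n; j++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    double v = delta[s - 1][i] + Math.log(a[i][j]);
                    if (v > best) { best = v; psi[s][j] = i; }
                }
                delta[s][j] = best + Math.log(b[j][obs[s]]);
            }

        //backtrack from the best final state
        int bestLast = 0;
        for (int j = 1; j < n; j++)
            if (delta[t - 1][j] > delta[t - 1][bestLast]) bestLast = j;
        int[] states = new int[t];
        states[t - 1] = bestLast;
        for (int s = t - 1; s > 0; s--)
            states[s - 1] = psi[s][states[s]];
        return states;
    }
}
```

With observation-dependent emissions (state 0 prefers observation 0, state 1 prefers observation 1), the decoded sequence tracks the observations, and its last element corresponds to the class value returned by classifyInstance above.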

Appendix B: The ARFF Wrapper

This appendix presents the Java source of a wrapper that maintains a WEKA-compatible classifier and ARFF data in the prototypes of SS systems. The wrapper inspects the name of the dataset file to decide how statistical features are extracted and evaluated, and which preprocessing steps, such as POS tagging and lemmatization by TreeTagger (Schmid 1994), are required. For example, TreeTagger-based POS tagging is assumed to be necessary for evaluating features in a dataset whose name contains the string _grammar_:

package coreEmotionalEngine.emotext;

//imports emotext
//imports TreeTagger
//imports java
//imports WEKA

public class ARFFWrapper {

    //a base classifier used for analyzing texts
    private Classifier c_ = null;

    //name of the dataset file
    private String datasetName_ = null;

    //WEKA instances used for training and testing
    protected Instances instances_ = null;

    //interface to evaluate statistical features
    private IFeatureEvaluation fe_ = null;

    //TreeTagger used for POS tagging and lemmatization
    private TreeTagger treetagger_ = null;

    public Classifier buildClassifier(Classifier clsr,
      String datasetName) throws Exception {
        ...
        return classifier;
    }

    public ARFFWrapper(Classifier c, TreeTagger tagger,
      String datasetName) {
        treetagger_ = tagger;

        try {
            if (datasetName.contains("fusion") &&
               (datasetName.endsWith(".spec"))) {
                //build a fusion classifier
                ...
            } else {
                c_ = buildClassifier(c, datasetName);
                datasetName_ = datasetName;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public double classifyInstance(Instance instance) throws
      Exception {
        return c_.classifyInstance(instance);
    }

    public Instance buildInstance(String text) {
        Instance i = null;
        if (datasetName_.contains("_fusion")) {
            //a spec is found that identifies part datasets
            //used for fusion
            i = buildFusedInstance(text);
        } else if (datasetName_.contains("_lexical_")) {
            //build a lexical instance of processed data
            i = buildLexicalInstance(text);
        } else if (datasetName_.contains("_stylometry_")) {
            //build a stylometric instance of processed data
            i = buildStylometricInstance(text);
        } else if (datasetName_.contains("_grammar_")) {
            //build a grammatical instance of processed data
            i = buildGrammarInstance(text);
        } else if (datasetName_.contains("_deixis_")) {
            //build a deictic instance of processed data
            i = buildDeixisInstance(text);
        } else {
            assert (false);
        }
        return i;
    }

    private Instance buildFusedInstance(String text) {
        ...
        return instance;
    }

    private Instance buildDeixisInstance(String text) {
        ...
        return instance;
    }

    private Instance buildGrammarInstance(String text) {
        ...
        return instance;
    }

    private Instance buildLexicalInstance(String text) {
        ...
        return instance;
    }

    private Instance buildStylometricInstance(String text) {
        ...
        return instance;
    }
}

To build statistical instances, the ARFFWrapper class references the variable fe_, which refers to the IFeatureEvaluation interface that maintains feature evaluation:

package coreEmotionalEngine.emotext;

public interface IFeatureEvaluation {
    public abstract double value(double original);
}

Three implementations of this interface are available: the presence evaluation, which evaluates a feature as 1 or 0 according to its presence in the analyzed text (PresenceFeatureEvaluation); the inverse evaluation, which evaluates a feature as a reciprocal frequency value (InverseFeatureEvaluation); and the frequency evaluation, which evaluates a feature as a frequency value (FrequencyFeatureEvaluation). See Osherenko (2011, p. 80) for details of feature evaluation.
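A minimal sketch of these three implementations, assuming each simply transforms a precomputed frequency value (the actual SocioFramework sources may differ in detail):

```java
//the interface as defined above
interface IFeatureEvaluation {
    public abstract double value(double original);
}

//evaluates a feature as 1 or 0 according to its presence
class PresenceFeatureEvaluation implements IFeatureEvaluation {
    public double value(double original) {
        return original > 0 ? 1.0 : 0.0;
    }
}

//evaluates a feature as a reciprocal frequency value
class InverseFeatureEvaluation implements IFeatureEvaluation {
    public double value(double original) {
        return original > 0 ? 1.0 / original : 0.0;
    }
}

//evaluates a feature as a frequency value, unchanged
class FrequencyFeatureEvaluation implements IFeatureEvaluation {
    public double value(double original) {
        return original;
    }
}
```

Because all three share one interface, the evaluation scheme can be swapped by assigning a different implementation to fe_ without touching the instance-building code.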

Appendix C: Storing Configuration

This appendix shows a configuration file used to store the parameters of SocioFramework (&#x9; is interpreted by the XML engine in Java as a tab character; &#xa; as a linebreak; &quot; as a quotation mark):

[Figure b: listing of the SocioFramework configuration file, not reproduced here]
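As a purely hypothetical illustration of the escaping described above (the element and attribute names are invented, not taken from the actual file), a parameter value containing whitespace and quotation marks would be stored as:

```xml
<entry key="template">line one&#xa;&#x9;indented line with a &quot;quoted&quot; word</entry>
```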


Copyright information

© 2014 Springer-Verlag London

About this chapter

Cite this chapter

Osherenko, A. (2014). Framework for Data Processing. In: Social Interaction, Globalization and Computer-Aided Analysis. Human–Computer Interaction Series. Springer, London. https://doi.org/10.1007/978-1-4471-6260-5_5


  • DOI: https://doi.org/10.1007/978-1-4471-6260-5_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-6259-9

  • Online ISBN: 978-1-4471-6260-5

