Annotating opinion—evaluation of blogs: the Blogoscopy corpus

  • Original Paper
Language Resources and Evaluation

Abstract

The blog phenomenon is universal. Blogs are characterized by their evaluative use: they enable Internet users to express their opinions on a given subject. This makes them an ideal resource for building an annotated sentiment analysis corpus that pairs each subject with the opinion expressed about it. This paper presents Blogoscopy, a French-language corpus built from personal thematic blogs. The annotation was governed by three principles: theoretical, as opinion is grounded in a linguistic theory of evaluation; practical, as every opinion is linked to an object; and methodological, as annotation rules and successive phases are defined to ensure quality and thoroughness.

Notes

  1. Translation of « un type de site web composé essentiellement de billets (ou d’actualités) publiés au fil de l’eau et apparaissant selon un ordre anté-chronologique (les plus récents en haut de page), le plus souvent enrichis de liens hypertextes externes ».

  2. Evan Williams launched Pyra Labs in 1999. This company created the first platform which allows people to create their own blog (Blogger.com). http://www.useit.com/alertbox/20001001_comments.html.

  3. Wikipedia—http://www.fr.wikipedia.org/wiki/M%C3%A9dia.

  4. Enunciation is also regarded as a constituent of the act of putting the elements of language into discourse. Working within the framework of “textual linguistics”, we do not use the term in this sense.

  5. Over-Blog is a blog platform, that is, a tool for creating blogs. The platform is managed by the company JFG network, the industrial partner (application software) of the Blogoscopy project, which was responsible for extracting the textual data.

  6. For a complete typology of the modal zones, consult Galatanu (2002, pp. 17–32).

  7. http://www.nrrc.mitre.org/NRRC/publications.htm.

  8. It should be noted, however, that the set of tags used to annotate [Blogoscopy] does not differentiate between sarcastic and ironic uses; both are grouped under the same attribute, irony. When the blogger uses metaphor to express an evaluation, the form is simply tagged according to the category of evaluation to which it belongs.

  9. http://www.lina.univ-nantes.fr/Ressources.html.

References

  • Anscombre, J.-C. (1989). Théorie de l’argumentation, topoï et structuration discursive. Revue Québécoise de linguistique, 18(1), 13–56.

  • Anscombre, J.-C., & Ducrot, O. (1983). L’argumentation dans la langue. Bruxelles: Pierre Mardaga.

  • Banea, C., Mihalcea, R., & Wiebe, J. (2008). A bootstrapping method for building subjectivity lexicons for languages with scarce resources. In Proceedings of the 6th international conference on language resources and evaluation (LREC 2008).

  • Banfield, A. (1982). Unspeakable sentences: Narration and representation in the language of fiction. London: Routledge & Kegan Paul.

  • Benveniste, E. (1966). Problèmes de linguistique générale. Paris: Gallimard.

  • Benveniste, E. (1974). Problèmes de linguistique générale II. Paris: Gallimard.

  • Cardon, D., & Delaunay-Téterel, H. (2006). La production de soi comme technique relationnelle: un essai de typologie des blogs par leurs publics. Réseaux, 138, 15–71.

  • Charaudeau, P. (1983). Langage et discours. Paris: Hachette.

  • Charaudeau, P. (1992). Grammaire du sens et de l’expression. Paris: Hachette.

  • Devitt, A., & Ahmad, K. (2007, August). A lexicon for polarity: Affective content in financial news text. In Proceedings of language for special purposes (LSP’07), Hamburg, Germany.

  • Dubreil, E., Monceaux, L., & Vernier, M. (2009). De l’usage des évaluations dans les blogs thématiques personnels. In Proceedings of the 11th symposium on social communication, January 19–22, Santiago de Cuba.

  • Fievet, C., & Turrettini, E. (2004). Blog story. Paris: Eyrolles.

  • Fleiss, J. (1971). Measuring nominal scale agreement among many raters. Psychological Bulletin, 76(5), 378–382.

  • Fourour, N., & Morin, E. (2003). Apport du web dans la reconnaissance des entités nommées. Revue Québécoise de Linguistique (RQL), 32(1), 41–60.

  • Galatanu, O. (2002). Le concept de modalité: les valeurs dans la langue et dans le discours. In Proceedings Les valeurs, Séminaire Le lien social (pp. 17–32).

  • Galatanu, O. (2005). La sémantique des modalités et ses enjeux théoriques et épistémologiques dans l’analyse des textes. In J. M. Gouvard (Ed.), De la langue au style (pp. 157–170). Lyon: Presses Universitaires de Lyon.

  • Galatanu, O. (2006). La dimension axiologique de la dénomination. In M. Riegel, C. Schnedecker, P. Swiggers, & I. Tamba (Eds.), Aux carrefours du sens: Hommages offerts à Georges Kleiber (pp. 499–510). Louvain: Peeters.

  • Hu, M., & Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the ACM SIGKDD conference on knowledge discovery and data mining (KDD) (pp. 168–177).

  • Jakobson, R. (1963). Essais de linguistique générale. Paris: Éditions de Minuit.

  • Kerbrat-Orecchioni, C. (1997). L’Énonciation, de la subjectivité dans le langage. Paris: Colin (réédition 2002).

  • Kessler, J., & Nicolov, N. (2009). Targeting sentiment expressions through supervised ranking of linguistic configurations. In Proceedings of the 3rd international AAAI conference on weblogs and social media (ICWSM 2009).

  • Kim, S.-M., & Hovy, E. H. (2004). Determining the sentiment of opinions. In Proceedings of the 20th international conference on computational linguistics (COLING ‘04), Geneva, Switzerland.

  • Kobayashi, N., Inui, K., & Matsumoto, Y. (2007). Extracting aspect-evaluation and aspect-of relations in opinion mining. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CoNLL 2007) (pp. 1065–1074), Prague, Czech Republic.

  • Legallois, D., & Ferrari, S. (2006). Vers une grammaire de l’évaluation des objets culturels. In: Schedae, 2006, fascicule n°1. Actes du colloque international discours et document, ISDD06, Caen, 15 et 16 juin 2006 (pp. 57–68). Presses universitaires de Caen. prépublication n°8.

  • Liu, B. (2010). Sentiment analysis and subjectivity. In Handbook of natural language processing (2nd ed.).

  • Maingueneau, D. (1987). Nouvelles tendances en analyse du discours. Paris: Hachette.

  • Maingueneau, D. (1990). Pragmatique pour le discours littéraire. Paris: Nathan.

  • Maingueneau, D. (1991). L’Analyse du discours. Paris: Hachette.

  • Maingueneau, D. (1995). « Présentation » du numéro 117 de Langages, mars 1995, “Les analyses du discours en France” (pp. 5–12).

  • Maingueneau, D. (1996). In Moirand, S. (éd.), L’analyse du discours en France aujourd’hui (pp. 8–15).

  • Martin, J., & White, P. (2005). The language of evaluation, appraisal in English. London, New York: Palgrave Macmillan.

  • Mishne, G. (2006). Multiple ranking strategies for opinion retrieval in blogs. In Proceedings of the text retrieval conference (TREC 2006).

  • MUC-6. (1995). In Proceedings of the 6th message understanding conference. Columbia, MD: Morgan Kaufmann.

  • Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1–2), 1–135.

  • Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the conference on empirical methods in natural language processing (EMNLP) (pp. 79–86).

  • Pêcheux, M., & Fuchs, C. (1975). Mise au point et perspective à propos de l’analyse du discours. Langages, 37, 7–80.

  • Popescu, A.-M., & Etzioni, O. (2005). Extracting product features and opinions from reviews. In Proceedings of the conference on human language technology and empirical methods in natural language processing (HLT-EMNLP 2005) (pp. 339–346), Vancouver, BC.

  • Quirk, R., Greenbaum, S., Leech, G., & Svartvik, J. (1985). A comprehensive grammar of the English language. London: Longman.

  • Rastier, F. (2001). Arts et sciences du texte. Paris: Presses Universitaires de France.

  • Riloff, E., & Wiebe, J. (2003). Learning extraction patterns for subjective expressions. In Proceedings of the 2003 conference on empirical methods in natural language processing (EMNLP-03).

  • Torres-Moreno, J.-M., El-Bèze, M., Béchet, F., & Camelin, N. (2007). Comment faire pour que l’opinion forgée à la sortie des urnes soit la bonne? In Application au défi fouille de textes 2007, DEFT07 (pp. 119–133), AFIA 2007, Grenoble, France.

  • Turney, P. D. (2002). Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th annual meeting of the association for computational linguistics, Philadelphia.

  • Wiebe, J., Wilson, T., & Cardie, C. (2006). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39(2–3), 165–210.

  • Yu, H., & Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the 2003 conference on empirical methods in natural language processing (EMNLP 2003).

Acknowledgments

We are grateful to the Syllabs coders Helena Blancafort, Sandra Goncalves, and Marguerite Leenhardt. This work was supported by the French National Research Agency (ANR) under grant number ANR-06-TLOG-028.

Author information

Corresponding author

Correspondence to Béatrice Daille.

Appendix: Example of an annotated post

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page SYSTEM "../pagev4.dtd">
<page mes_blog_rank="84" mes_mediametrie="" tags_blog="xbox old-gen" thematique="Wii" url="http://www.hoaxgames.net/">
<billet age="" auteur="Olivier &amp; Maxence" id_b="B1020329120" profession="" url="http://www.hoaxgames.net/article-13982542.html" orthographe="standard" syntaxe="correcte">
<date>2007-11-21 21:28:00</date>
<titre><IA cc="console">Wii</IA> LE <CA cc="C1">CADEAU</CA> <Appreciation type="PIA" forme="cadeau, Wii">LE PLUS EN VOGUE</Appreciation> A <IA cc="C2">NOEL</IA></titre>
<texte>
<partie organisation="narratif">
[Techno.branchez-vous.com] La <CC id_c="C1">console</CC> Wii de <IA cc="console">Nintendo</IA> serait encore une fois, cette année, le <CA cc="C1">cadeau de Noël</CA> le plus demandé. Le président de <IA cc="console">Nintendo America</IA> prévoit même des <CA cc="C1, Wii">ruptures de stock</CA> aux <IA cc="régions du monde">États-Unis</IA> et dans d’autres <CC id_c="C3">régions du monde</CC>. Même si on peut encore trouver la Wii dans plusieurs <CA cc="C1, Wii">magasins</CA>, <Opinion type="Medium_Supposition_Certitude" forme="Wii">elle risque vite</Opinion> de devenir <Appreciation type="PIA" forme="Wii">introuvable</Appreciation> d’ici le temps des <CC id_c="C2">fêtes</CC> même si Nintendo en produit 1,8 million par mois. Selon <IA cc="analyste">Gerrick Johnson</IA>, un <CA cc="C4">analyste</CA> de l’<CC id_c="C4">industrie du jouet</CC> chez <IA cc="industrie du jouet">BMO Capital Markets</IA>, plus personne n’achète de <CA cc="C4">jouets</CA> aux États-Unis en raison des <CA cc="C4">multiples rappels</CA>. En effet, des milliers de jouets ont été rappelés à cause de la présence de <CA cc="C4">peinture au plomb</CA>. À cause de cela, les <CA cc="C4">gens</CA> <Appreciation type="PIA" forme="jouets">ont perdu confiance</Appreciation> dans les jouets. […]
</partie>
</texte>
</billet>
</page>
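
The annotated post above can be read programmatically. The following is a minimal sketch, not the project's own tooling: it assumes the file is well-formed XML with straight quotes, and it uses a shortened excerpt of the post with the DOCTYPE omitted. The element names (Opinion, Appreciation) and attribute names (type, forme) are taken from the example; the function name list_evaluations and the constant names are hypothetical. In the example, forme appears to name the object being evaluated, but that reading is an assumption.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the annotated post (straight quotes, DOCTYPE omitted).
# It is an illustration only, not the full corpus file.
SAMPLE = """<page thematique="Wii" url="http://www.hoaxgames.net/">
<billet id_b="B1020329120">
<texte>
<partie organisation="narratif">
Même si on peut encore trouver la Wii dans plusieurs <CA cc="C1, Wii">magasins</CA>, <Opinion type="Medium_Supposition_Certitude" forme="Wii">elle risque vite</Opinion> de devenir <Appreciation type="PIA" forme="Wii">introuvable</Appreciation> d’ici le temps des <CC id_c="C2">fêtes</CC>
</partie>
</texte>
</billet>
</page>"""

# Tag names as they appear in the example; other evaluation tags may exist.
EVALUATION_TAGS = ("Opinion", "Appreciation")


def list_evaluations(xml_text):
    """Return (tag, type, forme, text) for each evaluation element, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        if elem.tag in EVALUATION_TAGS:
            rows.append(
                (elem.tag, elem.get("type"), elem.get("forme"), (elem.text or "").strip())
            )
    return rows


for row in list_evaluations(SAMPLE):
    print(row)
```

Each tuple pairs an evaluative expression with the value of its forme attribute, which mirrors the practical principle stated in the abstract that every opinion is linked to an object.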

About this article

Cite this article

Daille, B., Dubreil, E., Monceaux, L. et al. Annotating opinion—evaluation of blogs: the Blogoscopy corpus. Lang Resources & Evaluation 45, 409–437 (2011). https://doi.org/10.1007/s10579-011-9154-z

  • DOI: https://doi.org/10.1007/s10579-011-9154-z
