Integration of nonparametric fuzzy classification with an evolutionary-developmental framework to perform music sentiment-based analysis and composition


Abstract

Over the past years, several approaches have been developed to create algorithmic music composers. Most existing solutions focus on composing music that appears theoretically correct or interesting to the listener. However, few methods have targeted sentiment-based music composition: generating music that expresses human emotions. The few existing methods are restricted in the spectrum of emotions they can express (usually to two dimensions: valence and arousal) as well as in the level of sophistication of the music they compose (usually monophonic, following translation-based, predefined templates or heuristic textures). In this paper, we introduce a new algorithmic framework for autonomous music sentiment-based expression and composition, titled MUSEC, that perceives an extensible set of six primary human emotions (namely anger, fear, joy, love, sadness, and surprise) expressed by a MIDI musical file, and then composes new polyphonic, (pseudo-)thematic, and diversified musical pieces that express these emotions. Unlike existing solutions, MUSEC is: (i) a hybrid crossover between supervised learning (SL, to learn sentiments from music) and evolutionary computation (for music composition, MC), where SL serves as the fitness function of MC to compose music that expresses target sentiments; (ii) extensible in the panel of emotions it can convey, producing pieces that reflect a target crisp sentiment (e.g., love) or a collection of fuzzy sentiments (e.g., 65% happy, 20% sad, and 15% angry), compared with the crisp-only or two-dimensional (valence/arousal) sentiment models used in existing solutions; and (iii) built on the evolutionary-developmental model, using an extensive set of specially designed music-theoretic mutation operators (trille, staccato, repeat, compress, etc.), stochastically orchestrated to add atomic (individual chord-level) and thematic (chord pattern-level) variability to the composed polyphonic pieces, compared with traditional evolutionary solutions producing monophonic and non-thematic music. We conducted a large battery of tests to evaluate MUSEC’s effectiveness and efficiency in both sentiment analysis and composition. It was trained on a specially constructed set of 120 MIDI pieces, including 70 sentiment-annotated pieces: the first significant dataset of sentiment-labeled MIDI music made available online as a benchmark for future research in this area. Results are encouraging and highlight the potential of our approach in different application domains, ranging from music information retrieval and music composition to assistive music therapy and emotional intelligence.

Notes

  1. The simplest form of musical texture, where only one note is played at a time, in contrast with polyphonic music, where more than one note is played simultaneously.

  2. Musical Instrument Digital Interface: a digital music format designed for symbolic music representation and processing by computers.

  3. A fuzzy classifier is a classifier that assigns membership scores to input data objects, producing fuzzy categories with fuzzy boundaries, such that an object, e.g., a musical piece, can belong to more than one category at the same time (e.g., 80% excitement and 20% fear), in contrast with traditional crisp classifiers, which categorize data into crisp/distinct categories (Kotsiantis 2007). In our current system, we utilize fuzzy k-NN due to its flexibility and effectiveness, yet any other fuzzy classifier could be used, e.g., (Abu 2017; Abu et al. 2016; Amin et al. 2018; Fahmi et al. 2017, 2018, 2019).

  4. A chord is a combination of 3 or more notes (cf. Sect. 2).

  5. http://www.lau.edu.lb/news-events/news/archive/music_composers_face_off_with_/. Details are provided in Sect. 8.

  6. Available online at: http://sigappfr.acm.org/Projects/MUSEC, including MUSEC synthetic compositions and all experimental results.

  7. A fundamental frequency is the lowest frequency produced by the oscillation of an object. In music, it is perceived as the lowest partial (simple tone) present, as distinct from the harmonics of higher frequency. In the remainder of this paper, the terms frequency and fundamental frequency are used interchangeably, unless explicitly stated otherwise.

  8. Music produced following the traditions of Western (European) culture, compared with Oriental (Byzantine, Mizrahi, or Asian) music.

  9. Also referred to in the literature as affective music composition.

  10. With respect to.

  11. Also known as.

  12. Support vector machine.

  13. k-nearest neighbor.

  14. An ANN with several hidden layers between the input and the output layers is called a deep neural network or a deep learner.

  15. A machine learning approach that learns a function mapping an input (e.g., a musical piece) to an output (e.g., a sentiment category or sentiment score) from sample input–output pairs, so-called labeled training data, where each sample pair consists of a given input object (e.g., a music feature vector) and a desired output value (e.g., a sentiment category or score). The produced mapping function is an approximation of the true mapping function underlying the sample training pairs (Kotsiantis 2007).

  16. An evolutionary algorithm can be defined as a population-based metaheuristic optimization algorithm, which uses mechanisms inspired by biological evolution, such as reproduction, mutation, crossover, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. The evolution of the population then takes place after the repeated application of the above operators (Goldberg 1989; Whitley and Sutton 2012).

  17. More features could be later added following the user’s needs.

  18. To determine the dominant key, a chroma histogram for the input music file is first computed, denoting the percentage of the total piece duration in which every chroma can be heard. The histogram is then used to compute likelihood scores using Temperley’s key profiles, and the key with the highest score is finally selected as the dominant key (Temperley 2002).

  19. Dominant key misidentification can occasionally occur, particularly for pieces where modulations occur very frequently and for atonal music (Temperley 2002; Kyogu 2008) (e.g., modern music which does not abide by a fixed key).

  20. Note that 100% accuracy in chord progression identification is difficult to obtain due to the very nature of chord progressions: (i) the same chord progression can be played in many different ways while still portraying the same musical structure, and (ii) it can often be difficult to separate consecutive chords, since their notes are sometimes interleaved. Our heuristic performs accurately on relatively simple music with a clear chord structure and a clear separation between chords, without rapid transitions between them.

  21. It requires O(n × m log(n + m)) time, where n and m designate the numbers of chords in the two pieces (chord progression sequences) being processed.

  22. Consider two chord progression sequences A and B, consisting of chords A1, A2, …, Am and B1, B2, …, Bn, respectively. Without loss of generality, consider the case where m < n. Following the standard TPSD algorithm in Ayadi et al. (2016), the shorter sequence is compared with the longer one at every position, e.g., A1, …, Am versus B1, …, Bm, then A1, …, Am versus B2, …, Bm+1, and so forth until A1, …, Am versus Bn−m+1, …, Bn. The comparison yielding the smallest difference is then selected as the final similarity (or distance) value. With the more efficient version of the TPSD algorithm in Bas De Haas et al. (2013), the chord progression sequences are only compared from their starting positions, i.e., A1, …, Am is only compared with B1, …, Bm, and that score is utilized as the chord progression similarity (distance) score. Despite this linear relaxation of the original algorithm, TPSD computation remains more expensive than all other feature similarity computations put together (cf. experiments in Sect. 6.2.2).

  23. At the expense of a potential loss of precision when processing long musical pieces (consisting of large chord progression sequences).

  24. Available online at: http://sigappfr.acm.org/Projects/MUSEC, SL survey form #1 (first part, 24-pieces), #2 (second part, 8-pieces), and #3 (third part, 8-pieces), along with the resulting sentiment-labeled dataset.

  25. In our current implementation of MC, we hard-coded the chord probability distribution (through which a chord is selected) based on empirical sampling from our training set. Yet, learning the chord probability distribution can be a research project in and of itself, and can entail different composition styles. For instance, the distribution could be learned from a composer’s composition corpus, to produce pieces following the composer’s own style (which we further discuss as an ongoing work in Sect. 8).

  26. Randomness is guided by MUSEC’s KB music-theoretic rules.

  27. Emphasizing sentiment expression, while also promoting diversity.

  28. Pearson correlation coefficient. Note that any other vector similarity measure (such as cosine or Dice) could have been used. We adopt PCC here since it is commonly utilized in the literature (Abbasi et al. 2008; O’Connor et al. 2010).

  29. We consider this strategy to be similar to the way some human composers usually write music: producing multiple candidate (trial) pieces, slicing and mixing them up, developing them and making them evolve until reaching a final pool of best candidates, from which the single best candidate is usually adopted as the actual final piece.

  30. We adopted a ratio R = 0.7 in our current study, so that 70% of the offspring would be subject to fitness trimming, whereas only 30% would undergo variability trimming.

  31. Available online at: http://sigappfr.acm.org/Projects/MUSEC.

  32. Note that the number of beats in a piece is naturally less than the number of notes. While there is no straightforward relationship between the two, they can be paralleled to sentences and words in flat text: where beats represent music sentences, and notes represent the sentences’ words. In our sample test dataset of 100 pieces, the number of beats was on average 4-to-8 times less than the number of notes.

  33. \( PCC = \sigma_{XY} /(\sigma_X \times \sigma_Y) \), where x and y designate the user- and system-generated similarity values, respectively, \( \sigma_X \) and \( \sigma_Y \) denote the standard deviations of x and y, respectively, and \( \sigma_{XY} \) denotes the covariance between x and y. PCC ∈ [− 1, 1], such that − 1 designates that one of the variables is a decreasing function of the other (i.e., music pieces deemed similar by human testers are deemed dissimilar by the system, and vice versa), 1 designates that one of the variables is an increasing function of the other (i.e., pieces are deemed similar/dissimilar by human testers and the system alike), and 0 means that the variables are not correlated.

  34. MSE, computed as an average Euclidean distance measure, is a good indication of how close similarity scores are to human ratings, one by one (for every pair of pieces), whereas PCC compares the behavior of the vector of similarity ratings (for all pairs of pieces) as a whole.

  35. Available online at: http://sigappfr.acm.org/Projects/MUSEC.

  36. http://sigappfr.acm.org/Projects/MUSEC, SL survey form #1 (first part, 24-pieces), #2 (second part, 8-pieces), and #3 (third part, 8-pieces).

  37. While we could have asked the testers to provide a confidence score associated with every sentiment score, we felt this would complicate things for non-expert testers, especially since our objective was to capture their inherent feelings when listening to the music pieces, rather than have them “rationalize” their ratings by adding confidence scores. Nonetheless, tester rating confidence is an interesting factor that we plan to evaluate in a future study.

  38. With the 100-piece training set, the system had “less” to learn since it was trained on a more or less homogeneous training set, and thus over-fitted w.r.t. the well-represented sentiments, namely joy and sadness, but was less successful in inferring less-represented sentiments like anger and fear.

  39. To help illustrate this concept, consider the following example, consisting of three vectors: V1 = (0.8, 0.6), V2 = (0.95, 0.45), and V3 = (0.65, 0.75). Let V1 be our target vector and let V2 and V3 be our system estimate vectors. Upon first inspection, it is obvious that V2 is a better representative of V1 than V3, since it more or less exhibits the same behavior as V1 (higher first term). This similarity in behavior is visible through PCC, where PCC(V1, V2) = 1 and PCC(V1, V3) = − 1. However, with MSE, we obtain MSE(V1, V2) = MSE(V1, V3) = 0.0225. This shows that MSE is only a good indication of how close scores are to target sentiments one by one, while PCC reflects the overall similarity of a predicted sentiment vector to the target vector as a whole (a short numeric check of this example is sketched after these notes).

  40. The Turing test was proposed by Alan Turing in 1950, designed to test the ability of a machine to exhibit intelligent behavior that is equivalent to or indistinguishable from that of a human. It was originally used to evaluate machines mimicking human conversation (originally referred to as the “imitation game”). A machine passes the Turing test if, after a number of questions, the human tester (asking questions) cannot know if the answers come from a human or a machine (Epstein et al. 2009).

  41. Anthony Bou Fayad is a professional composer, pianist, and music instructor in the Antonine University’s School of Music, located in Baabda, Mount Lebanon. He also holds a Master’s in Computer Engineering, specializing in multimedia data processing, which allowed him to easily understand the context and purpose of our study, helping us set up the experimental process. Mr. Bou Fayad was partly remunerated for his efforts, mainly for playing and digitally recording all pieces, while volunteering his consulting services.

  42. Some music composition systems provide sample pieces online, e.g., (Diaz-Jerez 2011; SACEM 2016), yet none of them are sentiment-based.

  43. Using a population size S = 50, a generation size N varying between 50 and 80, a branching factor B = 10 and a fitness-to-variability ratio R = 0.7. All mutation probabilities were set to 0.1.

  44. Available online at: http://sigappfr.acm.org/Projects/MUSEC, MC survey forms #1-to-#10.

  45. http://www.conservatory.gov.lb/disciplines/discipline/21.

  46. http://comm.lau.edu.lb/joseph-khalife.

  47. Recall that states where the valence and arousal dimensions converge (e.g., both valence and arousal are high, or both are low) occur more often than states where they diverge, indicating a potential bias or ambiguity in the model (as stated by the model’s creator in Russell (1980)).

  48. https://sv.wikipedia.org/wiki/Jean-Marie_Riachi.

  49. http://www.lau.edu.lb/news-events/news/archive/music_composers_face_off_with_/. The event included an active participation from a live audience of LAU students, faculty, staff, and friends, who helped rate MUSEC’s compositions and evaluate its sentiment scoring accuracy.
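
To make the PCC-versus-MSE contrast of notes 33, 34, and 39 concrete, the following is a minimal, pure-Python numeric check of the example in note 39; the helper names pcc and mse are ours and not part of MUSEC.

```python
from statistics import mean, pstdev

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return mean((a - b) ** 2 for a, b in zip(x, y))

V1, V2, V3 = (0.8, 0.6), (0.95, 0.45), (0.65, 0.75)
print(pcc(V1, V2), pcc(V1, V3))  # ~1.0 and ~-1.0: V2 tracks V1, V3 inverts it
print(mse(V1, V2), mse(V1, V3))  # 0.0225 and 0.0225: MSE cannot tell them apart
```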

References

  • Abbasi A, Chen H, Thoms S, Fu T (2008) Affect analysis of web forums and blogs using correlation ensembles. IEEE Trans Knowl Data Eng 20(9):1168–1180

  • Abboud R, Tekli J (2018) MUSE prototype for music sentiment expression. In: IEEE international conference on cognitive computing (ICCC'18), San Francisco, pp 106–109

  • Abu AO (2017) Adaptation of reproducing kernel algorithm for solving fuzzy Fredholm–Volterra integrodifferential equations. Neural Comput Appl 28(7):1591–1610

  • Abu AO, Abo-Hammour ZS (2014) Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm. Inf Sci 279:396–415

  • Abu AO, Al-Smadi M, Momani S, Hayat T (2016) Numerical solutions of fuzzy differential equations using reproducing kernel Hilbert space method. Soft Comput 20(8):3283–3302

  • Adiloglu K, Alpaslan FN (2007) A machine learning approach to two-voice counterpoint composition. Knowl-Based Syst 20(3):300–309

  • Amin F, Fahmi A, Abdullah S, Ali A, Ahmed R, Ghani F (2018) Triangular cubic linguistic hesitant fuzzy aggregation operators and their application in group decision making. J Intell Fuzzy Syst 34(4):2401–2416

  • Ayadi MG, Bouslimi R, Akaichi J (2016) A medical image retrieval scheme with relevance feedback through a medical social network. Soc Netw Anal Min 6(1):53:1–53:23

  • Baeza-Yates R, Ribeiro-Neto B (2011) Modern information retrieval: the concepts and technology behind search, 2nd edn. ACM Press Books, New York

  • Barrett FS, Grimm KJ, Robins RW, Wildschut T, Sedikides C (2010) Music-evoked nostalgia: affect, memory, and personality. Emotion 10(3):390–403

  • Bas De Haas W, Veltkamp RC, Wiering F (2008) Tonal pitch step distance: a similarity measure for chord progressions. In: International society of music information retrieval (ISMIR), pp 51–56

  • Bas De Haas W, Wiering F, Veltkamp RC (2013) A geometrical distance measure for determining the similarity of musical harmony. Int J Multimed Inf Retr 2(3):189–202

  • Berrett LF (2017) How emotions are made: the secret life of the brain. Macmillan, London

  • Boden MA (1994) Precis of the creative mind: myths and mechanisms. Behav Brain Sci 17(3):519–570

  • Bradley M, Lang P (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. Technical report C-1, Center for Research in Psychophysiology, University of Florida

  • Burton AR (1998) A hybrid neuro-genetic pattern evolution system applied to musical composition. Ph.D. thesis, University of Surrey, UK

  • Cai Z, Hu H (2018) Session-aware music recommendation via a generative model approach. Soft Comput 22(3):1023–1031

  • Cao Y, Jia L, Chen Y, Lin N, Yang C, Zhang B, Liu Z, Li X, Dai H (2019) Recent advances of generative adversarial networks in computer vision. IEEE Access 7:14985–15006

  • Carnie A (2013) Syntax: a generative introduction, 3rd edn. Wiley, Malden

  • Chen Y, Garcia E, Gupta M, Rahimi A, Cazzanti L (2009) Similarity-based classification: concepts and algorithms. J Mach Learn Res 10:747–776

  • Chivadshetti P, Sadafale K, Thakare K (2015) Content based video retrieval using integrated feature extraction and personalization of results. In: International conference on information processing (ICIP'15). https://doi.org/10.1109/infop.2015.7489372

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge

  • Costa Y, Oliveira L, Silla C Jr (2017) An evaluation of convolutional neural networks for music classification using spectrograms. Appl Soft Comput 52:28–38

  • Danhauser A (1994) Theory of music (French). Henri Lemoine, Paris (original edition published in 1950)

  • Dell AG, Newton DA, Petroff JG (2011) Assistive technology in the classroom: enhancing the school experiences of students with disabilities, 2nd edn. Pearson, New Delhi

  • Demopoulos RJ, Katchabaw MJ (2007) Music information retrieval: a survey of issues and approaches. Technical report #677, Department of Computer Science, University of Western Ontario

  • Di Nunzio A (2014) Illiac suite for string quartet. http://www.musicainformatica.org/topics/illiac-suite.php. Accessed July 2017

  • Diaz-Jerez G (2011) Composing with melomics: delving into the computational world for musical inspiration. MIT Press J 21:3–14

  • Dubois RL (2003) Applications of generative string-substitution systems in computer music. Ph.D. dissertation, Columbia University

  • Ekman P (1993) Facial expression of emotion. Am Psychol 48:384–392

  • Epstein R, Roberts G, Beber G (2009) Parsing the Turing test: philosophical and methodological issues in the quest for the thinking computer. Springer, Berlin

  • Fahmi A, Abdullah S, Amin F, Siddiqui N, Ali A (2017) Aggregation operators on triangular cubic fuzzy numbers and its application to multi-criteria decision making problems. J Intell Fuzzy Syst 33(6):3323–3337

  • Fahmi A, Abdullah S, Amin F, Ali A, Khan WA (2018) Some geometric operators with triangular cubic linguistic hesitant fuzzy number and their application in group decision-making. J Intell Fuzzy Syst 35(2):2485–2499

  • Fahmi A, Abdullah S, Amin F, Sajjad Ali Khan M (2019) Trapezoidal cubic fuzzy number Einstein hybrid weighted averaging operators and its application to decision making. Soft Comput 24(14):5753–5783

  • Fernandez J, Vico F (2013) AI methods in algorithmic composition: a comprehensive survey. J Artif Intell Res 48:513–582

  • Fleischman MB, Deb KR (2013) Displaying estimated social interest in time-based media. U.S. patent no. 8,516,374

  • Freeman J (2015) Survey of music technology. Coursera. https://www.coursera.org/learn/music-technology. Accessed July 2017

  • Ghosh A, Strehl J (2003) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  • Gkonou C, Mercer S (2017) Understanding emotional and social intelligence among English language teachers. ELT research papers 17.03, British Council, ISBN 978-0-86355-842-9

  • Goldberg D (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading

  • Goleman D (2005) Emotional intelligence: why it can matter more than IQ, 10th anniversary edn. Bloomsbury Publishing, London

  • Hauger D, Schedl M, Kosir A, Tkalcic M (2013) The million musical tweet dataset: what we can learn from microblogs. In: Proceedings of the 14th international society for music information retrieval conference (ISMIR'13)

  • Hevner K (1935) The affective character of the major and minor modes in music. Am J Psychol 47(1):103–118

  • Hiller L (1970) Music composed with computers: a historical survey. In: Lincoln HB (ed) The computer and music. Cornell University Press, Ithaca, pp 42–97

  • Hiller L, Isaacson L (1959) Experimental music: composition with an electronic computer. McGraw-Hill, New York

  • Hoeberechts M, Shantz J (2009) Realtime emotional adaptation in automated composition. In: Proceedings of audio mostly, pp 1–8

  • Holland S, Wilkie K, Mulholland P, Seago A (2013) Music and human–computer interaction. Springer series on cultural computing. Springer, Berlin. https://doi.org/10.1007/978-1-4471-2990-5

  • Hopfield J, Tank D (1985) Neural computation of decisions in optimization problems. Biol Cybern 52(3):141–152

  • Hovy E (2015) What are sentiment, affect, and emotion? Applying the methodology of Michael Zock to sentiment analysis. In: Gala N et al (eds) Language production, cognition, and the lexicon, text, speech and language technology, vol 48. Springer, Berlin, pp 13–24

  • Huang C, Lin E (2013) An emotion-based method to perform algorithmic composition. In: The 3rd international conference on music & emotion, pp 244–247

  • Husarik S (1983) John Cage and LeJaren Hiller: HPSCHD, 1969. Am Music 1(2):1–21

  • Iakovidou C, Anagnostopoulos N, Kapoutsis A, Chatzichristofis Y, Boutalis Y (2014) Searching images with MPEG-7 (& MPEG-7-like) powered localized descriptors: the SIMPLE answer to effective content based image retrieval. In: 12th international workshop on content-based multimedia indexing (CBMI), pp 18–20

  • Iren D, Liem C, Yang J, Bozzon A (2016) Using social media to reveal social and collective perspectives on music. In: International ACM conference on web science (WebSci'16), Hannover, Germany, pp 296–300

  • Katayose H, Kato H, Imai M, Inokuchi S (1989) An approach to an artificial music expert. In: International computer music conference, pp 138–146

  • Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 4:580–585

  • Kim J, Wigram T, Gold C (2009) Emotional, motivational and interpersonal responsiveness of children with autism in improvisational music therapy. Autism 13(4):389–409

  • Kirke A, Miranda ER (2009) A survey of computer systems for expressive music performance. ACM Comput Surv (CSUR) 42(1):3

  • Kirke A, Miranda E (2011) Combining EEG frontal asymmetry studies with affective algorithmic composition and expressive performance model. In: International computer music conference, Huddersfield

  • Kirke A, Miranda E (2017) Aiding soundtrack composer creativity through automated film script-profiled algorithmic composition. J Creat Music Syst 1(2)

  • Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268

  • Kyogu L (2008) A system for acoustic chord transcription and key extraction from audio using hidden Markov models trained on synthesized audio. Dissertation, Department of Music, Stanford University

  • L’Hadj LS, Boughanem M, Amrouche K (2016) Enhancing information retrieval through concept-based language modeling and semantic smoothing. J Assoc Inf Sci Technol (JASIST) 67(12):2909–2927

  • Lin CL, Shih YH, Tzeng GH, Yu HC (2016) A service selection model for digital music service platforms using a hybrid MCDM approach. Appl Soft Comput 48:385–430

  • Liu J, Zhong W, Jiao L (2010) A multiagent evolutionary algorithm for combinatorial optimization problems. IEEE Trans Syst Man Cybern 40(1):229–240

  • Livingstone SR et al (2010) Changing musical emotion: a computational rule system for modifying score and performance. Comput Music J 34(1):41–64

  • Manousakis S (2006) Musical L-systems. Master's thesis, Koninklijk Conservatorium, The Hague

  • Marques M, Oliveira V, Vieira S, Rosa AC (2000) Music composition using genetic evolutionary algorithms. In: Proceedings of the IEEE conference on evolutionary computation. IEEE Press, New York

  • Matic D (2010) A genetic algorithm for composing music. Yugoslav J Oper Res 20(1):157–177

  • McAndrew S, Everett M (2015) Music as collective invention: a social network analysis of composers. Cult Sociol J 9(1):56–80. https://doi.org/10.1177/1749975514542486

  • McChord KA (2004) Moving beyond “that’s all I can do”: encouraging musical creativity in children with learning disabilities. Bull Counc Res Music Educ 159:23–32

  • McCormack J (1996) Grammar-based music composition. In: Stocker S et al (eds) Complex systems. IOS Press, Amsterdam, pp 321–336

  • Molina A, Daniel D, Moya JC, Vico FJ (2016) An Evo-Devo system for algorithmic composition that actually works. In: Proceedings of the 2016 genetic and evolutionary computation conference companion. ACM, pp 37–38

  • Morreale F, de Angeli A (2016) Collaborating with an autonomous agent to generate affective music. ACM Trans Comput Entertain (ACM CIE) 14(3):1–21

  • Mühling M et al (2016) Content-based video retrieval in historical collections of the German broadcasting archive. In: International conference on theory and practice of digital libraries (TPDL'16), pp 67–78

  • O’Connor B, Balasubramanyan R, Routledge BR, Smith NA (2010) From tweets to polls: linking text sentiment to public opinion time series. In: Proceedings of the fourth international AAAI conference on weblogs and social media, pp 122–129

  • Orio N (2006) Music retrieval: a tutorial and review. Found Trends Inf Retr 1(1):1–90

  • Ozcan E, Erçal T (2008) A genetic algorithm for generating improvised music. Lecture notes in computer science, vol 4926. Springer, Heidelberg

  • Panda R, Malheiro R, Rocha B, Oliveira A, Paiva RP (2013) Multi-modal music emotion recognition: a new dataset, methodology and comparative analysis. In: 10th international symposium on computer music multidisciplinary research (CMMR), pp 1–13

  • Papadopoulos G, Wiggins G (1999) AI methods for algorithmic composition: a survey, a critical view and future prospects. In: AISB symposium on musical creativity, pp 110–117

  • Pavlov S, Olsson C, Svensson C, Anderling V, Wikner J, Andreasson O (2014) Generation of music through genetic algorithms. Bachelor's thesis, University of Gothenburg, Sweden

  • Prusinkiewicz P, Lindenmayer A (1990) The algorithmic beauty of plants. Springer, New York

  • Rahim A, Civelek I, Liang FH (2015) A model of department chairs' social intelligence & faculty members' turnover intention. Intelligence 53:65–71

  • Ravi K, Ravi V (2015) A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Based Syst 89:14–46

  • Reimer MA, Garnett GE (2014) A hierarchical system for autonomous musical creation. In: Tenth artificial intelligence and interactive digital entertainment conference, pp 45–49

  • Russell J (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178

  • SACEM (Society of Authors, Composers, and Editors of Music) (2016) AIVA: artificial intelligence virtual artist. http://www.aiva.ai/about. Accessed May 2018

  • Sandred O, Laurson M, Kuuskankare M (2009) Revisiting the Illiac suite – a rule-based approach to stochastic processes. Sonic Ideas/Ideas Sonicas 2:42–46

  • Schank RC, Cleary C (1995) Making machines creative. In: Smith S, Ward TB, Finke RA (eds) The creative cognition approach. MIT Press, Cambridge, pp 229–247

  • Schedl M, Flexer A, Urbano J (2013) The neglected user in music information retrieval research. J Intell Inf Syst (JIIS) 41(3):523–539

  • Schedl M, Gómez E, Urbano J (2014) Music information retrieval: recent developments and applications. Found Trends Inf Retr 8(2–3):127–161

  • See CM (2012) The use of music and movement therapy to modify behaviour of children with autism. Pertanika J Soc Sci Hum 20(4):1103–1116

  • Serra MH (1993) Stochastic composition and stochastic timbre: Gendy3 by Iannis Xenakis. Perspect New Music 237–257

  • Shang W et al (2005) An improved kNN algorithm – fuzzy kNN. In: Computational intelligence and security, pp 741–746

  • Song Y, Dixon S, Pearce M (2012) A survey of music recommendation systems and future perspectives. In: 9th international symposium on computer music modeling and retrieval, pp 395–410

  • Subasic P, Huettner A (2001) Affect analysis of text using fuzzy semantic typing. IEEE Trans Fuzzy Syst 9(4):483–496

  • Temperley D (2002) A Bayesian approach to key-finding. In: International conference on music and artificial intelligence, LNAI 2445, pp 195–206

  • Troiano L, Birtolo C, Armenise R (2017) Modeling and predicting the user next input by Bayesian reasoning. Soft Comput 21(6):1583–1600

  • Verbeurgt K, Fayer M, Dinolfo M (2004) A hybrid neural-Markov approach for learning to compose music by example. In: Conference of the Canadian society for computational studies of intelligence, pp 480–484

  • Wan CY et al (2011) Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study. PLoS ONE 6(9):e25505. https://doi.org/10.1371/journal.pone.0025505

  • Whipple J (2004) Music in intervention for children and adolescents with autism: a meta-analysis. J Music Ther 41(2):90–106

  • Whitley D, Sutton AM (2012) Genetic algorithms – a survey of models and methods. In: Rozenberg G, Bäck T, Kok JN (eds) Handbook of natural computing. Springer, Berlin, pp 637–671

  • Wohlfahrt-Laymann J, Heimbürger A (2017) Content aware music analysis with multi-dimensional similarity measure. Inf Model Knowl Bases XXVIII:292–303

  • Wolfram Tones Inc (2005) WolframTones: how it works – scientific foundations. http://tones.wolfram.com/about/how-it-works. Accessed June 2019

  • Worth P, Stepney S (2005) Growing music: musical interpretations of L-systems. In: Workshop on applications of evolutionary computing, pp 545–550

  • Xiao H, Downie SJ (2010) Improving mood classification in music digital libraries by combining lyrics and audio. In: Proceedings of the 10th annual joint conference on digital libraries. ACM, pp 159–168

  • Yiu R (2013) A composer's imagining of musical tradition and the reinvention of heritage. Doctoral thesis, City, University of London

  • Yuanyuan W (2014) Music emotion cognition model and interactive technology. In: IEEE workshop on electronics, computer and applications, Ottawa, Canada. https://doi.org/10.1109/iweca.2014.6845608

  • Zangerle E, Pichl M, Gassler W, Specht G (2014) #nowplaying music dataset: extracting listening behavior from Twitter. In: Proceedings of the first international workshop on internet-scale multimedia management (WISMM), ACM Multimedia, Orlando, Florida, USA

  • Zentner M, Eerola T (2010) Self-report measures and models. In: Juslin PN, Sloboda JA (eds) Handbook of music and emotion: theory, research, applications. Oxford University Press, New York, pp 187–221

  • Zenz V (2007) Automatic chord detection in polyphonic audio data. Master's thesis, University of Vienna, Austria


Acknowledgements

This study is partly funded by the National Council for Scientific Research - Lebanon (CNRS-L, grant number NCSRLAU#887), by LAU (grant number SOERC1516R003), as well as by the Fulbright Visiting Scholar program (sponsored by the US Department of State, grant number PS00232737). Special thanks go to music experts Anthony Bou Fayad, Robert Lamah, and Joseph Khalifé, who helped evaluate the synthetic compositions, as well as to Jean Marie Riachi for his participation in a live demonstration of the system. We would also like to thank the non-expert testers (including LAU students, faculty, staff, and friends) who volunteered to participate in the experimental evaluation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Joe Tekli.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with animals performed by any of the authors.

Informed consent

Additional informed consent was obtained from all individual participants for whom identifying information is included in this article.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ralph Abboud is currently completing his Master's and has been accepted into the Ph.D. program of the Computer Science Department, University of Oxford, UK.

Appendix

1.1 Appendix I: pseudo-code for Chord_Realizations function

The pseudo-code for the Chord_Realizations recursive function is shown in Fig. 23. The isValid method mentioned in this pseudo-code refers to a method from MUSEC’s KB module which verifies that the progression from one chord to another is music-theoretically valid (i.e., conforming to all the rules built into the KB, such as no consecutive fifths, resolution of the sensible tone, etc.).

Fig. 23 Pseudo-code of the Chord_Realizations function
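
As a complement to Fig. 23, the following is a minimal Python sketch of the recursion's overall shape, assuming each abstract chord is given as a list of candidate voicings and that a kb_is_valid predicate stands in for the KB module's isValid check; it is an illustration of the technique, not MUSEC's actual implementation.

```python
def chord_realizations(chords, kb_is_valid, prefix=()):
    """Yield every voicing sequence whose successive steps pass the KB check."""
    if not chords:                        # all chords voiced: emit one realization
        yield prefix
        return
    head, rest = chords[0], chords[1:]
    for voicing in head:                  # try each candidate voicing in turn
        if not prefix or kb_is_valid(prefix[-1], voicing):
            yield from chord_realizations(rest, kb_is_valid, prefix + (voicing,))

# Toy usage: two chords with two candidate voicings each (MIDI pitch tuples),
# and a dummy KB rule that merely forbids a static bass between chords.
C = [(48, 52, 55), (52, 55, 60)]
G = [(43, 47, 50), (47, 50, 55)]
for seq in chord_realizations([C, G], lambda prev, nxt: prev[0] != nxt[0]):
    print(seq)
```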

1.2 Appendix II: mutation operators

1.2.1 II.1: Trille operator

The trille mutation operator affects the highest note, in terms of MIDI pitch, played in the chord. Its overall operation is visualized in Fig. 24. Based on the current key of the mutated chord, this operator uses MUSEC’s KB module to retrieve the next note in the key above the aforementioned note. It then proceeds to alternate rapidly between the two notes over the first half-beat of the chord being mutated. This mutation mainly increases overall piece note density and note onset density. For more variability, a random decision is made at the time of the mutation’s execution to decide the time range over which the alternation is performed. The following outcomes are possible: (i) first quarter-beat, (ii) second quarter-beat, and (iii) full half-beat. Alternatively, based on the previous mutations that have affected the chord being mutated, the trille operator can also affect the final half-beat rather than the first half-beat of the given chord (i.e., at the end of a chord’s execution rather than at the beginning).

Fig. 24 Simplified activity diagram describing the trille mutation operator
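
For illustration, here is a hedged sketch of the trille's mechanics under a simple note model of our own (dicts with "pitch", "onset", "dur", where onsets and durations are in beats relative to the chord); the eighth-beat alternation step and the kb_next_note_in_key helper (standing in for the KB lookup of the next scale note above a pitch) are assumptions, not MUSEC's documented internals.

```python
import random

def trille(chord, music_key, kb_next_note_in_key, step=0.125):
    """Alternate the chord's highest note with its upper neighbor in the key."""
    top = max(chord, key=lambda n: n["pitch"])
    upper = kb_next_note_in_key(top["pitch"], music_key)
    # Random range choice: first quarter-beat, second quarter-beat, full half-beat.
    start, end = random.choice([(0.0, 0.25), (0.25, 0.5), (0.0, 0.5)])
    chord.remove(top)
    t, use_upper = start, False
    while t < end - 1e-9:                 # rapid alternation in eighth-beat steps
        pitch = upper if use_upper else top["pitch"]
        chord.append({"pitch": pitch, "onset": t, "dur": step})
        use_upper, t = not use_upper, t + step
    # The decorated note re-enters for the remainder of the chord's duration.
    chord.append({"pitch": top["pitch"], "onset": end,
                  "dur": max(top["dur"] - end, 0.0)})
```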

1.2.2 II.2: Staccato operator

This mutation affects all the notes being played as part of a chord. It mainly alters the way they are performed. In music theory, a “staccato” refers to a note being played in a manner detached and separated from the others (such that its own duration is very short). This operator reduces the duration of every note to an eighth beat so as to emulate this effect.

1.2.3 II.3: Repeat operator

This mutation operator repeats the notes being played as part of a chord’s realization a second time within the current duration of the chord. Figure 25 provides a simplified activity diagram describing the process. It basically divides the current chord duration into two parts based on a random decision, takes all the notes currently being played, and then puts a copy of all the notes in both divisions. In order to maximize variability while maintaining musical structure, three divisions are allowed: (i) 0.75/0.25, where the first duplicate receives three quarters of the total chord duration and the latter duplicate receives the remaining quarter, (ii) 0.5/0.5, an equal split of duration among the two duplicates, and (iii) 0.25/0.75, as the inverse of the first division. To avoid overlap between notes, the copies are compressed to fit their new total duration (i.e., the individual note duration and onset time are scaled down to fit the smaller duplicate size). Any notes that have too short a duration following compression are discarded.

Fig. 25 Simplified activity diagram describing the repeat mutation operator
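
The duration-split-plus-compression logic can be sketched as follows, again under our assumed {"pitch", "onset", "dur"} note model; the eighth-beat minimum duration mirrors the discard rule described above, and the sketch is illustrative rather than MUSEC's actual code.

```python
import random

MIN_DUR = 0.125  # notes shorter than an eighth beat after compression are dropped

def repeat(chord, chord_dur):
    """Duplicate the chord's notes into two compressed copies of its time span."""
    share1 = random.choice([0.75, 0.5, 0.25])     # duration share of the first copy
    out = []
    for offset, share in ((0.0, share1), (share1 * chord_dur, 1.0 - share1)):
        for n in chord:                           # scale each note into its copy
            dur = n["dur"] * share
            if dur >= MIN_DUR:
                out.append({"pitch": n["pitch"],
                            "onset": offset + n["onset"] * share,
                            "dur": dur})
    return out
```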

1.2.4 II.4: Compress operator

This mutation affects the duration of a chord and aims to raise overall piece note and note onset density. Unlike the repeat operator, where the notes are duplicated, compressed, and then repeated across the whole chord duration, the compress operator only performs compression, thereby shrinking the chord’s overall duration. It halves a chord’s duration and applies to the chord’s notes the same compression process used by the repeat mutation operator.

1.2.5 II.5: Extend operator

The extend operator, as its name suggests, extends the length of a chord. Unlike the compress and repeat operators, this operator aims to lower the piece note and note onset densities. It first randomly decides an extension value in beats: either half a beat, or a full beat. Then, it identifies the notes that are played (i.e., that are audible) at the end of the chord’s duration and increases their duration by the extension value, while also increasing overall chord duration.

1.2.6 II.6: Silence operator

Similar to the extend operator, the silence operator lowers overall note and note onset density by extending the mutated chord’s duration. However, this operator does not add or extend any notes, instead creating a silence at the end of the chord. This mutation emulates the “rest” concept in music theory.

1.2.7 II.7: Single Suspension operator

The single suspension operator affects the notes that make up the chord’s definition (i.e., its root, third, and fifth notes) as specified in its frontier. This mutation identifies the note realizations of the frontier notes, then randomly chooses one of them and delays its entry by a quarter-beat, thus increasing note onset density. Note that since no new notes are added through this mutation, and no changes are made to the chord’s duration, note density is preserved. However, note onset density increases, since notes that would otherwise be played together are now played separately.

1.2.8 II.8: Progressive Entrance operator

This mutation, like the single suspension mutation, also increases note onset density. Unlike the latter operator, however, progressive entrance affects the onsets of all but one of the frontier notes. A simplified activity diagram highlighting its behavior is shown in Fig. 26. It randomly chooses a starting distribution, spreading over a half-beat duration, indicating the beat timing at which every frontier note should be played. For structural and musical purposes, the smallest beat timing unit used for this process is the eighth beat. This process produces 20 possible distributions, three of which are shown in the activity diagram for illustration purposes. Due to this decision process, some frontier notes could be dropped: this occurs when the duration distribution assigns zero values to certain frontier notes, which subsequently produces unexpected and musically diverse results, while also lowering note density and note onset density in a novel way. Finally, the operator plays the surviving frontier notes sequentially (from lowest to highest pitch) following the chosen distribution.

Fig. 26 Simplified activity diagram describing the progressive entrance mutation operator
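
A sketch of the onset-staggering step is given below. It enumerates eighth-beat allocations of the half-beat among the frontier notes (zeros allowed, so notes can be dropped); note that this plain weak-composition enumeration is our approximation and does not necessarily reproduce the exact set of 20 distributions MUSEC draws from.

```python
import itertools, random

def distributions(units, slots):
    """All weak compositions of `units` eighth-beats among `slots` notes."""
    return [d for d in itertools.product(range(units + 1), repeat=slots)
            if sum(d) == units]

def progressive_entrance(frontier_pitches, chord_dur, unit=0.125, total=0.5):
    """Stagger frontier-note entries; a zero allocation drops its note."""
    dist = random.choice(distributions(int(total / unit), len(frontier_pitches)))
    notes, t = [], 0.0
    for pitch, k in zip(sorted(frontier_pitches), dist):  # lowest to highest pitch
        if k > 0:
            notes.append({"pitch": pitch, "onset": t, "dur": chord_dur - t})
        t += k * unit
    return notes
```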

1.2.9 II.9: Nota Cambiata operator

The nota cambiata operator emulates the music-theoretical principle of nota cambiata, and is used to decorate the highest note of a chord. In a typical nota cambiata realization, the decorated note is preceded by three other notes in its chord’s key, in order: (i) a third above, (ii) a second above, and (iii) a second below it. The operator assigns a random duration to each of these notes following the same logic as the one described for the progressive entrance operator (i.e., eighth-beat time step, half-beat total duration), meaning that some of the decorative notes could also be dropped. It also delays the decorated note’s onset by half a beat so as to accommodate the decoration notes. As a result, this operator increases note density and note onset density in the given musical piece.

1.2.10 II.10: Appoggiatura operator

Another music-theoretically inspired operator, the appoggiatura precedes the decorated note with an adjacent note in its key, typically the note a second above or a second below it in the given key, analogously to the music-theoretic “appoggiatura” decoration. The operator first identifies the highest note in the chord, then retrieves both of its adjacent notes in the chord’s key using MUSEC’s KB module, and randomly chooses one of them to add to the first half-beat of the chord. Similarly to the nota cambiata operator, the decorated note is delayed by half a beat to accommodate the new decoration. This operator increases the piece’s note density and note onset density.
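
A minimal sketch, assuming the same note model as above and a kb_adjacent_notes helper (our stand-in for the KB lookup returning the scale notes directly above and below a pitch):

```python
import random

def appoggiatura(chord, music_key, kb_adjacent_notes):
    """Prepend one scale neighbor of the highest note, delaying that note."""
    top = max(chord, key=lambda n: n["pitch"])
    upper, lower = kb_adjacent_notes(top["pitch"], music_key)
    grace = random.choice([upper, lower])  # randomly pick one adjacent scale note
    top["onset"] += 0.5                    # delay the decorated note by half a beat
    chord.append({"pitch": grace, "onset": 0.0, "dur": 0.5})
```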

1.2.11 II.11: Double Appoggiatura operator

A more sophisticated version of the appoggiatura mutation, the double appoggiatura precedes the decorated note with both of its adjacent notes, in random order. A simplified activity diagram describing its behavior is shown in Fig. 27. It first identifies the decorated note and its adjacent notes using MUSEC’s KB module. Yet unlike the appoggiatura operator, the double appoggiatura does not select one of the two adjacent pitches, but rather chooses an order (i.e., which note is played first) and a duration distribution (using eighth-beat time units) for these two notes over the half-beat they are allocated, following which the notes are sequentially added and the decorated note is delayed by half a beat to fit the decoration. To avoid redundancy, the distribution in this case cannot include zero values, so that this operator, when applied, does not boil down to the appoggiatura operator described earlier. Under this constraint, three duration distributions are possible: (0.375, 0.125), (0.25, 0.25), and (0.125, 0.375), as shown in Fig. 27.

Fig. 27 Simplified activity diagram describing the double appoggiatura mutation operator

1.2.12 II.12: Octava operator

The octava operator affects the composition’s average MIDI pitch by shifting a chord’s notes’ MIDI pitches up or down by an octave (i.e., it adds or subtracts 12 to/from the said notes’ MIDI pitches). The choice of octave jump (up or down) is stochastically governed by the current average pitch of the chord, such that chords with a lower average pitch are likelier to be shifted up an octave, and chords with a higher average pitch are likelier to be shifted down an octave.
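
The stochastic up/down choice can be sketched as below; the linear probability mapping around middle C (MIDI 60) is our illustrative choice, not a documented MUSEC formula, and the same stochastic pattern recurs in the tempo change and intensity change operators described further down.

```python
import random

def octava(chord, center=60.0, span=24.0):
    """Shift a chord by +/-12 semitones, biased by its average MIDI pitch."""
    avg = sum(n["pitch"] for n in chord) / len(chord)
    # Chords below `center` are likelier to move up; chords above it, down.
    p_up = min(max(0.5 + (center - avg) / span, 0.0), 1.0)
    shift = 12 if random.random() < p_up else -12
    for n in chord:
        n["pitch"] += shift
```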

1.2.13 II.13: Tempo Steal operator

The tempo steal operator, unlike all previous operators, affects two chords rather than just one. In this mutation, two consecutive chords are selected such that one “steals” a certain duration in beats from the other. The steal value used in MUSEC is half a beat. This mutation does not take place if the chord being “stolen” from is less than a beat long. Essentially, this operation extends one chord by half a beat using the extend operator described previously, and shrinks the other by half a beat. Shrinking works using a similar logic to extending, where the notes at the beginning of the shrunken chord are shortened by half a beat. This mutation was introduced to break the static duration distribution among chords, and to make compositions more rhythmically diverse.

1.2.14 II.14: Passing Notes operator

This mutation also runs on two adjacent chords, adding notes to the first chord based on the highest note of the following chord. A simplified activity diagram describing the passing notes operator is shown in Fig. 28. It checks the highest notes in both chords, then verifies whether both chords are in the same key and whether the two highest notes are less than an octave apart, so as to ensure a reasonable number of notes is subsequently added. If these conditions are verified, MUSEC’s KB module is called to identify all intermediary notes between the two highest notes based on the common key. Finally, these notes are added in sequence (over a total duration of half a beat) to the end of the first chord following a duration distribution. The latter is decided by randomly allocating duration “chunks” equal to the total duration divided by the number of passing notes.

Fig. 28 Simplified activity diagram describing the passing notes mutation operator

As with the progressive entrance and nota cambiata duration distributions, zero values are possible and notes shorter than an eighth beat are discarded, which adds variability to this operator’s results. In total, \( \binom{2N-1}{N-1} \) distributions are possible for the addition of N passing notes, resulting in an increase in piece note density and note onset density (a small numeric check of this count is sketched below).
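
The count quoted above can be verified mechanically: allocating N equal duration chunks among N passing notes, with zeros allowed, is a stars-and-bars problem with \( \binom{2N-1}{N-1} \) outcomes. The brute-force check below is ours, for illustration only.

```python
import itertools
from math import comb

def chunk_allocations(n_notes):
    """Weak compositions of n_notes duration chunks among n_notes passing notes."""
    return [d for d in itertools.product(range(n_notes + 1), repeat=n_notes)
            if sum(d) == n_notes]

for n in (2, 3, 4):
    assert len(chunk_allocations(n)) == comb(2 * n - 1, n - 1)
    print(n, len(chunk_allocations(n)))  # 2 -> 3, 3 -> 10, 4 -> 35
```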

1.2.15 II.15: Anticipation operator

Also applied to two consecutive chords within the piece, the anticipation operator checks the highest note of the second chord, and then inserts it into the final half-beat of the first chord. This operator emulates the music-theoretical concept of “anticipation” and increases both note density and note onset density.

1.2.16 II.16: Tempo Change operator

The tempo change operator targets the piece’s overall tempo. It changes the tempo value in increments or decrements of 4 BPM (beats per minute). The increase/decrease decision is stochastically governed by the individual’s current tempo, where pieces that are slower are likelier to speed up following this mutation, and vice versa.

1.2.17 II.17: Intensity Change operator

The intensity change operator changes the piece’s current intensity value (i.e., MIDI velocity) in steps of 20, such that pieces that are too quiet are likelier to become louder and vice versa, thereby producing an effect similar to a composer’s dynamics. This is the only operator affecting a piece’s (individual’s) average velocity.

1.2.18 II.18: Modulation/Demodulation operator

This is the most music-theoretically intensive mutation implemented in MUSEC, changing a piece’s current key to a new key. In theory, a piece can change to any of the 23 possible other keys. For the sake of simplicity, MUSEC was restricted to modulate only to the current key’s neighbor keys, i.e., keys with which it shares an edge (direct connection) in the circle of fifths (cf. Sect. 4.3.2). Note that we adopt a transient approach to modulations in MUSEC: using a common chord between the source and destination keys to modulate (other modulation approaches, such as abrupt modulation, are not yet included in the current version of the system).

A simplified activity diagram describing the operator’s behavior is shown in Fig. 29. Modulation checks the last chord in the individual and identifies potential destination keys using MUSEC’s KB module. If several alternatives are possible, it randomly selects a destination key; if no destination key is compatible, the mutation is aborted. To announce the modulation, the operator also appends two chords to the modulated individual: (i) the new key’s dominant chord, and (ii) its root chord, thereby producing a perfect cadence. Finally, the piece’s current key is changed to the new key.

Fig. 29 Simplified activity diagram describing the modulation/demodulation mutation operator

Demodulation occurs when the individual’s main key is different from the current key, and follows a procedure analogous to that of modulation.
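
The transient modulation step can be sketched as follows; kb_neighbor_keys, kb_common_chord, dominant, and root are our stand-ins for MUSEC's KB lookups and chord constructors, and the piece is assumed to be a dict with "key" and "chords" fields.

```python
import random

def modulate(piece, kb_neighbor_keys, kb_common_chord, dominant, root):
    """Transient modulation to a circle-of-fifths neighbor via a common chord."""
    last = piece["chords"][-1]
    candidates = [k for k in kb_neighbor_keys(piece["key"])
                  if kb_common_chord(last, k) is not None]
    if not candidates:
        return False                      # no compatible destination: abort
    new_key = random.choice(candidates)
    # Announce the new key with a perfect cadence: dominant (V) then root (I).
    piece["chords"] += [dominant(new_key), root(new_key)]
    piece["key"] = new_key
    return True
```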


About this article

Cite this article

Abboud, R., Tekli, J. Integration of nonparametric fuzzy classification with an evolutionary-developmental framework to perform music sentiment-based analysis and composition. Soft Comput 24, 9875–9925 (2020). https://doi.org/10.1007/s00500-019-04503-4
