Integration of nonparametric fuzzy classification with an evolutionary-developmental framework to perform music sentiment-based analysis and composition


Abstract

Over the past years, several approaches have been developed to create algorithmic music composers. Most existing solutions focus on composing music that appears theoretically correct or interesting to the listener. However, few methods have targeted sentiment-based music composition: generating music that expresses human emotions. The few existing methods are restricted in the spectrum of emotions they can express (usually to two dimensions: valence and arousal) as well as in the level of sophistication of the music they compose (usually monophonic, following translation-based, predefined templates or heuristic textures). In this paper, we introduce a new algorithmic framework for autonomous music sentiment-based expression and composition, titled MUSEC, that perceives an extensible set of six primary human emotions (namely anger, fear, joy, love, sadness, and surprise) expressed by a MIDI musical file, and then composes new polyphonic, (pseudo-)thematic, and diversified musical pieces that express these emotions. Unlike existing solutions, MUSEC is: (i) a hybrid crossover between supervised learning (SL, to learn sentiments from music) and evolutionary computation (for music composition, MC), where SL serves as the fitness function of MC to compose music that expresses target sentiments; (ii) extensible in the panel of emotions it can convey, producing pieces that reflect a target crisp sentiment (e.g., love) or a collection of fuzzy sentiments (e.g., 65% happy, 20% sad, and 15% angry), compared with the crisp-only or two-dimensional (valence/arousal) sentiment models used in existing solutions; and (iii) built on the evolutionary-developmental model, using an extensive set of specially designed music-theoretic mutation operators (trille, staccato, repeat, compress, etc.), stochastically orchestrated to add atomic (individual chord-level) and thematic (chord pattern-level) variability to the composed polyphonic pieces, compared with traditional evolutionary solutions producing monophonic and non-thematic music. We conducted a large battery of tests to evaluate MUSEC’s effectiveness and efficiency in both sentiment analysis and composition. It was trained on a specially constructed set of 120 MIDI pieces, including 70 sentiment-annotated pieces: the first significant dataset of sentiment-labeled MIDI music made available online as a benchmark for future research in this area. Results are encouraging and highlight the potential of our approach in different application domains, ranging from music information retrieval and music composition to assistive music therapy and emotional intelligence.

Notes

  1. The simplest form of musical texture, where only one note is played at a time, in contrast with polyphonic music, where more than one note is played simultaneously.

  2. Musical Instrument Digital Interface: a digital music format designed for symbolic music representation and processing by computers.

  3. A fuzzy classifier is a classifier that assigns membership scores to input data objects, producing fuzzy categories with fuzzy boundaries, such that an object, e.g., a musical piece, can belong to more than one category at the same time (e.g., 80% excitement and 20% fear), in contrast with traditional crisp classifiers, which categorize data into crisp/distinct categories (Kotsiantis 2007). In our current system, we utilize fuzzy k-NN due to its flexibility and effectiveness, yet any other fuzzy classifier could be used, e.g., (Abu 2017; Abu et al. 2016; Amin et al. 2018; Fahmi et al. 2017, 2018, 2019).

  4. A chord is a combination of 3 or more notes (cf. Sect. 2).

  5. http://www.lau.edu.lb/news-events/news/archive/music_composers_face_off_with_/. Details are provided in Sect. 8.

  6. Available online at: http://sigappfr.acm.org/Projects/MUSEC, including MUSEC synthetic compositions and all experimental results.

  7. A fundamental frequency is the lowest frequency produced by the oscillation of an object. In music, it is perceived as the lowest partial (simple tone) present, as distinct from the harmonics of higher frequency. In the remainder of this paper, the terms frequency and fundamental frequency are used interchangeably, unless explicitly stated otherwise.

  8. Music produced following the traditions of Western (European) culture, compared with Oriental (Byzantine, Mizrahi, or Asian) music.

  9. Also referred to in the literature as affective music composition.

  10. With respect to.

  11. Also known as.

  12. Support vector machine.

  13. k-nearest neighbor.

  14. An ANN with several hidden layers between the input and the output layers is called a deep neural network or a deep learner.

  15. A machine learning approach that learns a function mapping an input (e.g., a musical piece) to an output (e.g., a sentiment category or sentiment score) from sample input–output pairs, so-called labeled training data, where each sample pair consists of a given input object (e.g., a music feature vector) and a desired output value (e.g., a sentiment category or score). The produced mapping function is an approximation of the true mapping function underlying the sample training pairs (Kotsiantis 2007).

  16. An evolutionary algorithm can be defined as a population-based metaheuristic optimization algorithm, which uses mechanisms inspired by biological evolution, such as reproduction, mutation, crossover, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. The evolution of the population then takes place after the repeated application of the above operators (Goldberg 1989; Whitley and Sutton 2012).

  17. More features could be later added following the user’s needs.

  18. To determine the dominant key, a chroma histogram for the input music file is first computed, denoting the percentage of the total piece duration in which every chroma can be heard. The histogram is then used to compute likelihood scores using Temperley’s key profiles, and the key with the highest score is finally selected as the dominant key (Temperley 2002).

  19. Dominant key misidentification can occasionally occur, particularly for pieces where modulations occur very frequently and for atonal music (Temperley 2002; Kyogu 2008) (e.g., modern music which does not abide by a fixed key).

  20. Note that 100% accuracy in chord progression identification is difficult to obtain due to the very nature of chord progressions: (i) the same chord progression can be played in many different ways while still portraying the same musical structure, and (ii) it can often be difficult to separate consecutive chords, since their notes are sometimes interleaved. Our heuristic performs accurately on relatively simple music with a clear chord structure and a clear separation between chords, without rapid transitions between them.

  21. It requires O(n × m log(n + m)) time, where n and m designate the numbers of chords in the two pieces (chord progression sequences) being processed.

  22. Consider two chord progression sequences A and B, consisting of chords A1, A2, …, Am and B1, B2, …, Bn, respectively. Without loss of generality, consider the case where m < n. Following the standard TPSD algorithm in Ayadi et al. (2016), the shorter sequence is compared with the longer one at every position, e.g., A1, …, Am versus B1, …, Bm, then A1, …, Am versus B2, …, Bm+1, and so forth until A1, …, Am versus Bn−m+1, …, Bn. The comparison yielding the smallest difference is then selected as the final similarity (or distance) value. With the more efficient version of the TPSD algorithm in Bas De Haas et al. (2013), the chord progression sequences are only compared from their starting positions, i.e., A1, …, Am is only compared with B1, …, Bm, and that score is utilized as the chord progression similarity (distance) score. Despite this linear relaxation of the original algorithm, TPSD computation remains more expensive than all other feature similarity computations put together (cf. experiments in Sect. 6.2.2).

  23. At the expense of a potential loss of precision when processing long musical pieces (consisting of large chord progression sequences).

  24. Available online at: http://sigappfr.acm.org/Projects/MUSEC, SL survey form #1 (first part, 24-pieces), #2 (second part, 8-pieces), and #3 (third part, 8-pieces), along with the resulting sentiment-labeled dataset.

  25. In our current implementation of MC, we hard-coded the chord probability distribution (through which a chord is selected) based on empirical sampling from our training set. Yet, learning the chord probability distribution can be a research project in and of itself, and can entail different composition styles. For instance, the distribution could be learned from a composer’s composition corpus, to produce pieces following the composer’s own style (which we further discuss as an ongoing work in Sect. 8).

  26. Randomness is guided by MUSEC’s KB music-theoretic rules.

  27. Emphasizing sentiment expression, while also promoting diversity.

  28. Pearson correlation coefficient. Note that any other vector similarity measure (such as cosine or Dice) could have been used. We adopt PCC here since it is commonly utilized in the literature (Abbasi et al. 2008; O’Connor et al. 2010).

  29. We consider this strategy to be similar to the way some human composers usually write music: producing multiple candidate (trial) pieces, slicing and mixing them up, developing them and making them evolve until reaching a final pool of best candidates, from which the single best candidate is usually adopted as the actual final piece.

  30. We adopted a ratio R = 0.7 in our current study, so that 70% of the offspring would be subject to fitness trimming, whereas only 30% would undergo variability trimming.

  31. Available online at: http://sigappfr.acm.org/Projects/MUSEC.

  32. Note that the number of beats in a piece is naturally less than the number of notes. While there is no straightforward relationship between the two, they can be paralleled to sentences and words in flat text: where beats represent music sentences, and notes represent the sentences’ words. In our sample test dataset of 100 pieces, the number of beats was on average 4-to-8 times less than the number of notes.

  33. \( PCC = \sigma_{XY} /(\sigma_X \times \sigma_Y) \), where x and y designate the user- and system-generated similarity values, respectively, \( \sigma_X \) and \( \sigma_Y \) denote the standard deviations of x and y, respectively, and \( \sigma_{XY} \) denotes the covariance between x and y. PCC ∈ [− 1, 1], such that − 1 designates that one of the variables is a decreasing function of the other (i.e., music pieces deemed similar by human testers are deemed dissimilar by the system, and vice versa), 1 designates that one of the variables is an increasing function of the other (i.e., pieces are deemed similar/dissimilar by human testers and the system alike), and 0 means that the variables are not correlated.

  34. MSE, computed as an average Euclidean distance measure, is a good indication of how close similarity scores are to human ratings, one by one (for every pair of pieces), whereas PCC compares the behavior of the vector of similarity ratings (for all pairs of pieces) as a whole.

  35. Available online at: http://sigappfr.acm.org/Projects/MUSEC.

  36. http://sigappfr.acm.org/Projects/MUSEC, SL survey form #1 (first part, 24-pieces), #2 (second part, 8-pieces), and #3 (third part, 8-pieces).

  37. While we could have asked the testers to provide a confidence score associated with every sentiment score, we felt this would complicate things for non-expert testers, especially since our objective was to capture their inherent feelings when listening to the music pieces, rather than have them “rationalize” their ratings by adding confidence scores. Nonetheless, tester rating confidence is an interesting factor that we plan to evaluate in a future study.

  38. With the 100-piece training set, the system had “less” to learn since it was trained on a more or less homogeneous training set, and thus over-fitted w.r.t. the well-represented sentiments, namely joy and sadness, but was less successful in inferring less-represented sentiments like anger and fear.

  39. To help illustrate this concept, consider the following example, consisting of three vectors: V1 = (0.8, 0.6), V2 = (0.95, 0.45), and V3 = (0.65, 0.75). Let V1 be our target vector and let V2 and V3 be our system estimate vectors. Upon first inspection, it is obvious that V2 is a better representative of V1 than V3, since it more or less exhibits the same behavior as V1 (higher first term). This similarity in behavior is visible through PCC, where PCC(V1, V2) = 1 and PCC(V1, V3) = − 1. However, with MSE, we obtain MSE(V1, V2) = MSE(V1, V3) = 0.0225. This shows that MSE is only a good indication of how close scores are to target sentiments one by one, while PCC reflects the overall similarity of a predicted sentiment vector to the target vector as a whole (a short numeric check of this example is sketched after these notes).

  40. The Turing test was proposed by Alan Turing in 1950, designed to test the ability of a machine to exhibit intelligent behavior that is equivalent to or indistinguishable from that of a human. It was originally used to evaluate machines mimicking human conversation (originally referred to as the “imitation game”). A machine passes the Turing test if, after a number of questions, the human tester (asking questions) cannot know if the answers come from a human or a machine (Epstein et al. 2009).

  41. Anthony Bou Fayad is a professional composer, pianist, and music instructor in the Antonine University’s School of Music, located in Baabda, Mount Lebanon. He also holds a Master’s in Computer Engineering, specializing in multimedia data processing, which allowed him to easily understand the context and purpose of our study, helping us set up the experimental process. Mr. Bou Fayad was partly remunerated for his efforts, mainly for playing and digitally recording all pieces, while volunteering his consulting services.

  42. Some music composition systems provide sample pieces online, e.g., (Diaz-Jerez 2011; SACEM 2016), yet none of them are sentiment-based.

  43. Using a population size S = 50, a generation size N varying between 50 and 80, a branching factor B = 10 and a fitness-to-variability ratio R = 0.7. All mutation probabilities were set to 0.1.

  44. Available online at: http://sigappfr.acm.org/Projects/MUSEC, MC survey forms #1-to-#10.

  45. http://www.conservatory.gov.lb/disciplines/discipline/21.

  46. http://comm.lau.edu.lb/joseph-khalife.

  47. Recall that states where the valence and arousal dimensions converge (e.g., both valence and arousal are high, or both are low) occur more often than states where they diverge, indicating a potential bias or ambiguity in the model (as stated by the model’s creator in Russell (1980)).

  48. https://sv.wikipedia.org/wiki/Jean-Marie_Riachi.

  49. http://www.lau.edu.lb/news-events/news/archive/music_composers_face_off_with_/. The event included an active participation from a live audience of LAU students, faculty, staff, and friends, who helped rate MUSEC’s compositions and evaluate its sentiment scoring accuracy.
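
To make the PCC-versus-MSE contrast of notes 33, 34, and 39 concrete, the following is a minimal, pure-Python numeric check of the example in note 39; the helper names pcc and mse are ours and not part of MUSEC.

```python
from statistics import mean, pstdev

def pcc(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))

def mse(x, y):
    """Mean squared error between two equal-length vectors."""
    return mean((a - b) ** 2 for a, b in zip(x, y))

V1, V2, V3 = (0.8, 0.6), (0.95, 0.45), (0.65, 0.75)
print(pcc(V1, V2), pcc(V1, V3))  # ~1.0 and ~-1.0: V2 tracks V1, V3 inverts it
print(mse(V1, V2), mse(V1, V3))  # 0.0225 and 0.0225: MSE cannot tell them apart
```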

References

  • Abbasi A, Chen H, Thoms S, Fu T (2008) Affect analysis of web forums and blogs using correlation ensembles. IEEE Trans Knowl Data Eng 20(9):1168–1180

  • Abboud R, Tekli J (2018) MUSE prototype for music sentiment expression. In: IEEE international conference on cognitive computing (ICCC'18), San Francisco, pp 106–109

  • Abu AO (2017) Adaptation of reproducing kernel algorithm for solving fuzzy Fredholm–Volterra integrodifferential equations. Neural Comput Appl 28(7):1591–1610

  • Abu AO, Abo-Hammour ZS (2014) Numerical solution of systems of second-order boundary value problems using continuous genetic algorithm. Inf Sci 279:396–415

  • Abu AO, Al-Smadi M, Momani S, Hayat T (2016) Numerical solutions of fuzzy differential equations using reproducing kernel Hilbert space method. Soft Comput 20(8):3283–3302

  • Adiloglu K, Alpaslan FN (2007) A machine learning approach to two-voice counterpoint composition. Knowl-Based Syst 20(3):300–309

  • Amin F, Fahmi A, Abdullah S, Ali A, Ahmed R, Ghani F (2018) Triangular cubic linguistic hesitant fuzzy aggregation operators and their application in group decision making. J Intell Fuzzy Syst 34(4):2401–2416

  • Ayadi MG, Bouslimi R, Akaichi J (2016) A medical image retrieval scheme with relevance feedback through a medical social network. Soc Netw Anal Min 6(1):53:1–53:23

  • Baeza-Yates R, Ribeiro-Neto B (2011) Modern information retrieval: the concepts and technology behind search, 2nd edn. ACM Press Books, New York

  • Barrett FS, Grimm KJ, Robins RW, Wildschut T, Sedikides C (2010) Music-evoked nostalgia: affect, memory, and personality. Emotion 10(3):390–403

  • Bas De Haas W, Veltkamp RC, Wiering F (2008) Tonal pitch step distance: a similarity measure for chord progressions. In: International society of music information retrieval (ISMIR), pp 51–56

  • Bas De Haas W, Wiering F, Veltkamp RC (2013) A geometrical distance measure for determining the similarity of musical harmony. Int J Multimed Inf Retr 2(3):189–202

  • Berrett LF (2017) How emotions are made: the secret life of the brain. Macmillan, London

  • Boden MA (1994) Precis of the creative mind: myths and mechanisms. Behav Brain Sci 17(3):519–570

  • Bradley M, Lang P (1999) Affective norms for English words (ANEW): instruction manual and affective ratings. Technical report C-1, Center for Research in Psychophysiology, University of Florida

  • Burton AR (1998) A hybrid neuro-genetic pattern evolution system applied to musical composition. Ph.D. thesis, University of Surrey, UK

  • Cai Z, Hu H (2018) Session-aware music recommendation via a generative model approach. Soft Comput 22(3):1023–1031

  • Cao Y, Jia L, Chen Y, Lin N, Yang C, Zhang B, Liu Z, Li X, Dai H (2019) Recent advances of generative adversarial networks in computer vision. IEEE Access 7:14985–15006

  • Carnie A (2013) Syntax: a generative introduction, 3rd edn. Wiley, Malden

  • Chen Y, Garcia E, Gupta M, Rahimi A, Cazzanti L (2009) Similarity-based classification: concepts and algorithms. J Mach Learn Res 10:747–776

  • Chivadshetti P, Sadafale K, Thakare K (2015) Content based video retrieval using integrated feature extraction and personalization of results. In: International conference on information processing (ICIP'15). https://doi.org/10.1109/infop.2015.7489372

  • Cormen TH, Leiserson CE, Rivest RL, Stein C (2009) Introduction to algorithms, 3rd edn. MIT Press, Cambridge

  • Costa Y, Oliveira L, Silla C Jr (2017) An evaluation of convolutional neural networks for music classification using spectrograms. Appl Soft Comput 52:28–38

  • Danhauser A (1994) Theory of music (French). Henri Lemoine, Paris (original edition published in 1950)

  • Dell AG, Newton DA, Petroff JG (2011) Assistive technology in the classroom: enhancing the school experiences of students with disabilities, 2nd edn. Pearson, New Delhi

  • Demopoulos RJ, Katchabaw MJ (2007) Music information retrieval: a survey of issues and approaches. Technical report #677, Department of Computer Science, University of Western Ontario

  • Di Nunzio A (2014) Illiac suite for string quartet. http://www.musicainformatica.org/topics/illiac-suite.php. Accessed July 2017

  • Diaz-Jerez G (2011) Composing with melomics: delving into the computational world for musical inspiration. MIT Press J 21:3–14

  • Dubois RL (2003) Applications of generative string-substitution systems in computer music. Ph.D. dissertation, Columbia University

  • Ekman P (1993) Facial expression of emotion. Am Psychol 48:384–392

  • Epstein R, Roberts G, Beber G (2009) Parsing the Turing test: philosophical and methodological issues in the quest for the thinking computer. Springer, Berlin

  • Fahmi A, Abdullah S, Amin F, Siddiqui N, Ali A (2017) Aggregation operators on triangular cubic fuzzy numbers and its application to multi-criteria decision making problems. J Intell Fuzzy Syst 33(6):3323–3337

  • Fahmi A, Abdullah S, Amin F, Ali A, Khan WA (2018) Some geometric operators with triangular cubic linguistic hesitant fuzzy number and their application in group decision-making. J Intell Fuzzy Syst 35(2):2485–2499

  • Fahmi A, Abdullah S, Amin F, Sajjad Ali Khan M (2019) Trapezoidal cubic fuzzy number Einstein hybrid weighted averaging operators and its application to decision making. Soft Comput 24(14):5753–5783

  • Fernandez J, Vico F (2013) AI methods in algorithmic composition: a comprehensive survey. J Artif Intell Res 48:513–582

  • Fleischman MB, Deb KR (2013) Displaying estimated social interest in time-based media. U.S. patent no. 8,516,374

  • Freeman J (2015) Survey of music technology. Coursera. https://www.coursera.org/learn/music-technology. Accessed July 2017

  • Ghosh A, Strehl J (2003) Cluster ensembles – a knowledge reuse framework for combining multiple partitions. J Mach Learn Res 3:583–617

  • Gkonou C, Mercer S (2017) Understanding emotional and social intelligence among English language teachers. ELT research papers 17.03, British Council, ISBN 978-0-86355-842-9

  • Goldberg D (1989) Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading

  • Goleman D (2005) Emotional intelligence: why it can matter more than IQ, 10th anniversary edn. Bloomsbury Publishing, London

  • Hauger D, Schedl M, Kosir A, Tkalcic M (2013) The million musical tweet dataset: what we can learn from microblogs. In: Proceedings of the 14th international society for music information retrieval conference (ISMIR'13)

  • Hevner K (1935) The affective character of the major and minor modes in music. Am J Psychol 47(1):103–118

  • Hiller L (1970) Music composed with computers: a historical survey. In: Lincoln HB (ed) The computer and music. Cornell University Press, Ithaca, pp 42–97

  • Hiller L, Isaacson L (1959) Experimental music: composition with an electronic computer. McGraw-Hill, New York

  • Hoeberechts M, Shantz J (2009) Realtime emotional adaptation in automated composition. In: Proceedings of audio mostly, pp 1–8

  • Holland S, Wilkie K, Mulholland P, Seago A (2013) Music and human–computer interaction. Springer series on cultural computing. Springer, Berlin. https://doi.org/10.1007/978-1-4471-2990-5

  • Hopfield J, Tank D (1985) Neural computation of decisions in optimization problems. Biol Cybern 52(3):141–152

  • Hovy E (2015) What are sentiment, affect, and emotion? Applying the methodology of Michael Zock to sentiment analysis. In: Gala N et al (eds) Language production, cognition, and the lexicon, text, speech and language technology, vol 48. Springer, Berlin, pp 13–24

  • Huang C, Lin E (2013) An emotion-based method to perform algorithmic composition. In: The 3rd international conference on music & emotion, pp 244–247

  • Husarik S (1983) John Cage and LeJaren Hiller: HPSCHD, 1969. Am Music 1(2):1–21

  • Iakovidou C, Anagnostopoulos N, Kapoutsis A, Chatzichristofis Y, Boutalis Y (2014) Searching images with MPEG-7 (& MPEG-7-like) powered localized descriptors: the SIMPLE answer to effective content based image retrieval. In: 12th international workshop on content-based multimedia indexing (CBMI), pp 18–20

  • Iren D, Liem C, Yang J, Bozzon A (2016) Using social media to reveal social and collective perspectives on music. In: International ACM conference on web science (WebSci'16), Hannover, Germany, pp 296–300

  • Katayose H, Kato H, Imai M, Inokuchi S (1989) An approach to an artificial music expert. In: International computer music conference, pp 138–146

  • Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Syst Man Cybern 4:580–585

  • Kim J, Wigram T, Gold C (2009) Emotional, motivational and interpersonal responsiveness of children with autism in improvisational music therapy. Autism 13(4):389–409

  • Kirke A, Miranda ER (2009) A survey of computer systems for expressive music performance. ACM Comput Surv (CSUR) 42(1):3

  • Kirke A, Miranda E (2011) Combining EEG frontal asymmetry studies with affective algorithmic composition and expressive performance model. In: International computer music conference, Huddersfield

  • Kirke A, Miranda E (2017) Aiding soundtrack composer creativity through automated film script-profiled algorithmic composition. J Creat Music Syst 1(2)

  • Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268

  • Kyogu L (2008) A system for acoustic chord transcription and key extraction from audio using hidden Markov models trained on synthesized audio. Dissertation, Department of Music, Stanford University

  • L’Hadj LS, Boughanem M, Amrouche K (2016) Enhancing information retrieval through concept-based language modeling and semantic smoothing. J Assoc Inf Sci Technol (JASIST) 67(12):2909–2927

  • Lin CL, Shih YH, Tzeng GH, Yu HC (2016) A service selection model for digital music service platforms using a hybrid MCDM approach. Appl Soft Comput 48:385–430

  • Liu J, Zhong W, Jiao L (2010) A multiagent evolutionary algorithm for combinatorial optimization problems. IEEE Trans Syst Man Cybern 40(1):229–240

  • Livingstone SR et al (2010) Changing musical emotion: a computational rule system for modifying score and performance. Comput Music J 34(1):41–64

  • Manousakis S (2006) Musical L-systems. Master's thesis, Koninklijk Conservatorium, The Hague

  • Marques M, Oliveira V, Vieira S, Rosa AC (2000) Music composition using genetic evolutionary algorithms. In: Proceedings of the IEEE conference on evolutionary computation. IEEE Press, New York

  • Matic D (2010) A genetic algorithm for composing music. Yugoslav J Oper Res 20(1):157–177

  • McAndrew S, Everett M (2015) Music as collective invention: a social network analysis of composers. Cult Sociol J 9(1):56–80. https://doi.org/10.1177/1749975514542486

  • McChord KA (2004) Moving beyond “that’s all I can do”: encouraging musical creativity in children with learning disabilities. Bull Counc Res Music Educ 159:23–32

  • McCormack J (1996) Grammar-based music composition. In: Stocker S et al (eds) Complex systems. IOS Press, Amsterdam, pp 321–336

  • Molina A, Daniel D, Moya JC, Vico FJ (2016) An Evo-Devo system for algorithmic composition that actually works. In: Proceedings of the 2016 genetic and evolutionary computation conference companion. ACM, pp 37–38

  • Morreale F, de Angeli A (2016) Collaborating with an autonomous agent to generate affective music. ACM Trans Comput Entertain (ACM CIE) 14(3):1–21

  • Mühling M et al (2016) Content-based video retrieval in historical collections of the German broadcasting archive. In: International conference on theory and practice of digital libraries (TPDL'16), pp 67–78

  • O’Connor B, Balasubramanyan R, Routledge BR, Smith NA (2010) From tweets to polls: linking text sentiment to public opinion time series. In: Proceedings of the fourth international AAAI conference on weblogs and social media, pp 122–129

  • Orio N (2006) Music retrieval: a tutorial and review. Found Trends Inf Retr 1(1):1–90

  • Ozcan E, Erçal T (2008) A genetic algorithm for generating improvised music. Lecture notes in computer science, vol 4926. Springer, Heidelberg

  • Panda R, Malheiro R, Rocha B, Oliveira A, Paiva RP (2013) Multi-modal music emotion recognition: a new dataset, methodology and comparative analysis. In: 10th international symposium on computer music multidisciplinary research (CMMR), pp 1–13

  • Papadopoulos G, Wiggins G (1999) AI methods for algorithmic composition: a survey, a critical view and future prospects. In: AISB symposium on musical creativity, pp 110–117

  • Pavlov S, Olsson C, Svensson C, Anderling V, Wikner J, Andreasson O (2014) Generation of music through genetic algorithms. Bachelor's thesis, University of Gothenburg, Sweden

  • Prusinkiewicz P, Lindenmayer A (1990) The algorithmic beauty of plants. Springer, New York

  • Rahim A, Civelek I, Liang FH (2015) A model of department chairs' social intelligence & faculty members' turnover intention. Intelligence 53:65–71

  • Ravi K, Ravi V (2015) A survey on opinion mining and sentiment analysis: tasks, approaches and applications. Knowl Based Syst 89:14–46

  • Reimer MA, Garnett GE (2014) A hierarchical system for autonomous musical creation. In: Tenth artificial intelligence and interactive digital entertainment conference, pp 45–49

  • Russell J (1980) A circumplex model of affect. J Pers Soc Psychol 39(6):1161–1178

  • SACEM (Society of Authors, Composers, and Editors of Music) (2016) AIVA: artificial intelligence virtual artist. http://www.aiva.ai/about. Accessed May 2018

  • Sandred O, Laurson M, Kuuskankare M (2009) Revisiting the Illiac suite – a rule-based approach to stochastic processes. Sonic Ideas/Ideas Sonicas 2:42–46

  • Schank RC, Cleary C (1995) Making machines creative. In: Smith S, Ward TB, Finke RA (eds) The creative cognition approach. MIT Press, Cambridge, pp 229–247

  • Schedl M, Flexer A, Urbano J (2013) The neglected user in music information retrieval research. J Intell Inf Syst (JIIS) 41(3):523–539

  • Schedl M, Gómez E, Urbano J (2014) Music information retrieval: recent developments and applications. Found Trends Inf Retr 8(2–3):127–161

  • See CM (2012) The use of music and movement therapy to modify behaviour of children with autism. Pertanika J Soc Sci Hum 20(4):1103–1116

  • Serra MH (1993) Stochastic composition and stochastic timbre: Gendy3 by Iannis Xenakis. Perspect New Music 237–257

  • Shang W et al (2005) An improved kNN algorithm – fuzzy kNN. In: Computational intelligence and security, pp 741–746

  • Song Y, Dixon S, Pearce M (2012) A survey of music recommendation systems and future perspectives. In: 9th international symposium on computer music modeling and retrieval, pp 395–410

  • Subasic P, Huettner A (2001) Affect analysis of text using fuzzy semantic typing. IEEE Trans Fuzzy Syst 9(4):483–496

  • Temperley D (2002) A Bayesian approach to key-finding. In: International conference on music and artificial intelligence, LNAI 2445, pp 195–206

  • Troiano L, Birtolo C, Armenise R (2017) Modeling and predicting the user next input by Bayesian reasoning. Soft Comput 21(6):1583–1600

  • Verbeurgt K, Fayer M, Dinolfo M (2004) A hybrid neural-Markov approach for learning to compose music by example. In: Conference of the Canadian society for computational studies of intelligence, pp 480–484

  • Wan CY et al (2011) Auditory-motor mapping training as an intervention to facilitate speech output in non-verbal children with autism: a proof of concept study. PLoS ONE 6(9):e25505. https://doi.org/10.1371/journal.pone.0025505

  • Whipple J (2004) Music in intervention for children and adolescents with autism: a meta-analysis. J Music Ther 41(2):90–106

  • Whitley D, Sutton AM (2012) Genetic algorithms – a survey of models and methods. In: Rozenberg G, Bäck T, Kok JN (eds) Handbook of natural computing. Springer, Berlin, pp 637–671

  • Wohlfahrt-Laymann J, Heimbürger A (2017) Content aware music analysis with multi-dimensional similarity measure. Inf Model Knowl Bases XXVIII:292–303

  • Wolfram Tones Inc (2005) WolframTones: how it works – scientific foundations. http://tones.wolfram.com/about/how-it-works. Accessed June 2019

  • Worth P, Stepney S (2005) Growing music: musical interpretations of L-systems. In: Workshop on applications of evolutionary computing, pp 545–550

  • Xiao H, Downie SJ (2010) Improving mood classification in music digital libraries by combining lyrics and audio. In: Proceedings of the 10th annual joint conference on digital libraries. ACM, pp 159–168

  • Yiu R (2013) A composer's imagining of musical tradition and the reinvention of heritage. Doctoral thesis, City, University of London

  • Yuanyuan W (2014) Music emotion cognition model and interactive technology. In: IEEE workshop on electronics, computer and applications, Ottawa, Canada. https://doi.org/10.1109/iweca.2014.6845608

  • Zangerle E, Pichl M, Gassler W, Specht G (2014) #nowplaying music dataset: extracting listening behavior from Twitter. In: Proceedings of the first international workshop on internet-scale multimedia management (WISMM), ACM Multimedia, Orlando, Florida, USA

  • Zentner M, Eerola T (2010) Self-report measures and models. In: Juslin PN, Sloboda JA (eds) Handbook of music and emotion: theory, research, applications. Oxford University Press, New York, pp 187–221

  • Zenz V (2007) Automatic chord detection in polyphonic audio data. Master's thesis, University of Vienna, Austria


Acknowledgements

This study is partly funded by the National Council for Scientific Research - Lebanon (CNRS-L, grant number NCSRLAU#887), by LAU (grant number SOERC1516R003), as well as by the Fulbright Visiting Scholar program (sponsored by the US Department of State, grant number PS00232737). Special thanks go to music experts Anthony Bou Fayad, Robert Lamah, and Joseph Khalifé, who helped evaluate the synthetic compositions, as well as to Jean Marie Riachi for his participation in a live demonstration of the system. We would also like to thank the non-expert testers (including LAU students, faculty, staff, and friends) who volunteered to participate in the experimental evaluation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Joe Tekli.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with animals performed by any of the authors.

Informed consent

Additional informed consent was obtained from all individual participants for whom identifying information is included in this article.

Additional information

Communicated by V. Loia.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Ralph Abboud is currently completing his Master's and has been accepted into the Ph.D. program of the Computer Science Department, University of Oxford, UK.

Appendix

1.1 Appendix I: pseudo-code for Chord_Realizations function

The pseudo-code for the Chord_Realizations recursive function is shown in Fig. 23. The isValid method mentioned in this pseudo-code refers to a method from MUSEC’s KB module which verifies that the progression from one chord to another is music-theoretically valid (i.e., conforming to all the rules built into the KB, such as no consecutive fifths, resolution of the sensible tone, etc.).

Fig. 23 Pseudo-code of the Chord_Realizations function
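
As a complement to Fig. 23, the following is a minimal Python sketch of the recursion's overall shape, assuming each abstract chord is given as a list of candidate voicings and that a kb_is_valid predicate stands in for the KB module's isValid check; it is an illustration of the technique, not MUSEC's actual implementation.

```python
def chord_realizations(chords, kb_is_valid, prefix=()):
    """Yield every voicing sequence whose successive steps pass the KB check."""
    if not chords:                        # all chords voiced: emit one realization
        yield prefix
        return
    head, rest = chords[0], chords[1:]
    for voicing in head:                  # try each candidate voicing in turn
        if not prefix or kb_is_valid(prefix[-1], voicing):
            yield from chord_realizations(rest, kb_is_valid, prefix + (voicing,))

# Toy usage: two chords with two candidate voicings each (MIDI pitch tuples),
# and a dummy KB rule that merely forbids a static bass between chords.
C = [(48, 52, 55), (52, 55, 60)]
G = [(43, 47, 50), (47, 50, 55)]
for seq in chord_realizations([C, G], lambda prev, nxt: prev[0] != nxt[0]):
    print(seq)
```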

1.2 Appendix II: mutation operators

1.2.1 II.1: Trille operator

The trille mutation operator affects the highest note, in terms of MIDI pitch, played in the chord. Its overall operation is visualized in Fig. 24. Based on the current key of the mutated chord, this operator uses MUSEC’s KB module to retrieve the next note in the key above the aforementioned note. It then proceeds to alternate rapidly between the two notes over the first half-beat of the chord being mutated. This mutation mainly increases overall piece note density and note onset density. For more variability, a random decision is made at the time of the mutation’s execution to decide the time range over which the alternation is performed. The following outcomes are possible: (i) first quarter-beat, (ii) second quarter-beat, and (iii) full half-beat. Alternatively, based on the previous mutations that have affected the chord being mutated, the trille operator can also affect the final half-beat rather than the first half-beat of the given chord (i.e., at the end of a chord’s execution rather than at the beginning).

Fig. 24 Simplified activity diagram describing the trille mutation operator
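
For illustration, here is a hedged sketch of the trille's mechanics under a simple note model of our own (dicts with "pitch", "onset", "dur", where onsets and durations are in beats relative to the chord); the eighth-beat alternation step and the kb_next_note_in_key helper (standing in for the KB lookup of the next scale note above a pitch) are assumptions, not MUSEC's documented internals.

```python
import random

def trille(chord, music_key, kb_next_note_in_key, step=0.125):
    """Alternate the chord's highest note with its upper neighbor in the key."""
    top = max(chord, key=lambda n: n["pitch"])
    upper = kb_next_note_in_key(top["pitch"], music_key)
    # Random range choice: first quarter-beat, second quarter-beat, full half-beat.
    start, end = random.choice([(0.0, 0.25), (0.25, 0.5), (0.0, 0.5)])
    chord.remove(top)
    t, use_upper = start, False
    while t < end - 1e-9:                 # rapid alternation in eighth-beat steps
        pitch = upper if use_upper else top["pitch"]
        chord.append({"pitch": pitch, "onset": t, "dur": step})
        use_upper, t = not use_upper, t + step
    # The decorated note re-enters for the remainder of the chord's duration.
    chord.append({"pitch": top["pitch"], "onset": end,
                  "dur": max(top["dur"] - end, 0.0)})
```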

1.2.2 II.2: Staccato operator

This mutation affects all the notes being played as part of a chord. It mainly alters the way they are performed. In music theory, a “staccato” refers to a note being played in a manner detached and separated from the others (such that its own duration is very short). This operator reduces the duration of every note to an eighth beat so as to emulate this effect.

1.2.3 II.3: Repeat operator

This mutation operator repeats the notes being played as part of a chord’s realization a second time within the current duration of the chord. Figure 25 provides a simplified activity diagram describing the process. It basically divides the current chord duration into two parts based on a random decision, takes all the notes currently being played, and then puts a copy of all the notes in both divisions. In order to maximize variability while maintaining musical structure, three divisions are allowed: (i) 0.75/0.25, where the first duplicate receives three quarters of the total chord duration and the latter duplicate receives the remaining quarter, (ii) 0.5/0.5, an equal split of duration among the two duplicates, and (iii) 0.25/0.75, as the inverse of the first division. To avoid overlap between notes, the copies are compressed to fit their new total duration (i.e., the individual note duration and onset time are scaled down to fit the smaller duplicate size). Any notes that have too short a duration following compression are discarded.

Fig. 25 Simplified activity diagram describing the repeat mutation operator
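
The duration-split-plus-compression logic can be sketched as follows, again under our assumed {"pitch", "onset", "dur"} note model; the eighth-beat minimum duration mirrors the discard rule described above, and the sketch is illustrative rather than MUSEC's actual code.

```python
import random

MIN_DUR = 0.125  # notes shorter than an eighth beat after compression are dropped

def repeat(chord, chord_dur):
    """Duplicate the chord's notes into two compressed copies of its time span."""
    share1 = random.choice([0.75, 0.5, 0.25])     # duration share of the first copy
    out = []
    for offset, share in ((0.0, share1), (share1 * chord_dur, 1.0 - share1)):
        for n in chord:                           # scale each note into its copy
            dur = n["dur"] * share
            if dur >= MIN_DUR:
                out.append({"pitch": n["pitch"],
                            "onset": offset + n["onset"] * share,
                            "dur": dur})
    return out
```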

1.2.4 II.4: Compress operator

This mutation affects the duration of a chord and aims to raise overall piece note and note onset density. Unlike the repeat operator, where the notes are duplicated, compressed, and then repeated across the whole chord duration, the compress operator only performs compression, thereby shrinking the chord’s overall duration. It halves a chord’s duration and applies to the chord’s notes the same compression process used by the repeat mutation operator.

1.2.5 II.5: Extend operator

The extend operator, as its name suggests, extends the length of a chord. Unlike the compress and repeat operators, this operator aims to lower the piece note and note onset densities. It first randomly decides an extension value in beats: either half a beat, or a full beat. Then, it identifies the notes that are played (i.e., that are audible) at the end of the chord’s duration and increases their duration by the extension value, while also increasing overall chord duration.

1.2.6 II.6: Silence operator

Similar to the extend operator, the silence operator lowers overall note and note onset density by extending the mutated chord’s duration. However, this operator does not add or extend any notes, instead creating a silence at the end of the chord. This mutation emulates the “rest” concept in music theory.

1.2.7 II.7: Single Suspension operator

The single suspension operator affects the notes that make up the chord’s definition (i.e., its root, third, and fifth notes) as specified in its frontier. This mutation identifies the note realizations of the frontier notes, then randomly chooses one of them and delays its entry by a quarter-beat, thus increasing note onset density. Note that since no new notes are added through this mutation, and no changes are made to the chord’s duration, note density is preserved. However, note onset density increases, since notes that would otherwise be played together are now played separately.

1.2.8 II.8: Progressive Entrance operator

This mutation, like the single suspension mutation, also increases note onset density. Unlike the latter operator, however, progressive entrance affects the onsets of all but one of the frontier notes. A simplified activity diagram highlighting its behavior is shown in Fig. 26. It randomly chooses a starting distribution, spreading over a half-beat duration, indicating the beat timing at which every frontier note should be played. For structural and musical purposes, the smallest beat timing unit used for this process is the eighth beat. This process produces 20 possible distributions, three of which are shown in the activity diagram for illustration purposes. Due to this decision process, some frontier notes could be dropped: this occurs when the duration distribution assigns zero values to certain frontier notes, which subsequently produces unexpected and musically diverse results, while also lowering note density and note onset density in a novel way. Finally, the operator plays the surviving frontier notes sequentially (from lowest to highest pitch) following the chosen distribution.

Fig. 26 Simplified activity diagram describing the progressive entrance mutation operator
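
A sketch of the onset-staggering step is given below. It enumerates eighth-beat allocations of the half-beat among the frontier notes (zeros allowed, so notes can be dropped); note that this plain weak-composition enumeration is our approximation and does not necessarily reproduce the exact set of 20 distributions MUSEC draws from.

```python
import itertools, random

def distributions(units, slots):
    """All weak compositions of `units` eighth-beats among `slots` notes."""
    return [d for d in itertools.product(range(units + 1), repeat=slots)
            if sum(d) == units]

def progressive_entrance(frontier_pitches, chord_dur, unit=0.125, total=0.5):
    """Stagger frontier-note entries; a zero allocation drops its note."""
    dist = random.choice(distributions(int(total / unit), len(frontier_pitches)))
    notes, t = [], 0.0
    for pitch, k in zip(sorted(frontier_pitches), dist):  # lowest to highest pitch
        if k > 0:
            notes.append({"pitch": pitch, "onset": t, "dur": chord_dur - t})
        t += k * unit
    return notes
```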

1.2.9 II.9: Nota Cambiata operator

The nota cambiata operator emulates the music-theoretical principle of nota cambiata, and is used to decorate the highest note of a chord. In a typical nota cambiata realization, the decorated note is preceded by three other notes in its chord’s key, in order: (i) a third above, (ii) a second above, and (iii) a second below it. The operator assigns a random duration to each of these notes following the same logic as the one described for the progressive entrance operator (i.e., eighth-beat time step, half-beat total duration), meaning that some of the decorative notes could also be dropped. It also delays the decorated note’s onset by half a beat so as to accommodate the decoration notes. As a result, this operator increases note density and note onset density in the given musical piece.

1.2.10 II.10: Appoggiatura operator

Another music-theoretically inspired operator, the appoggiatura precedes the decorated note with an adjacent note in its key, typically the note a second above or a second below it in the given key, analogously to the music-theoretic “appoggiatura” decoration. The operator first identifies the highest note in the chord, then retrieves both of its adjacent notes in the chord’s key using MUSEC’s KB module, and randomly chooses one of them to add to the first half-beat of the chord. Similarly to the nota cambiata operator, the decorated note is delayed by half a beat to accommodate the new decoration. This operator increases the piece’s note density and note onset density.
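
A minimal sketch, assuming the same note model as above and a kb_adjacent_notes helper (our stand-in for the KB lookup returning the scale notes directly above and below a pitch):

```python
import random

def appoggiatura(chord, music_key, kb_adjacent_notes):
    """Prepend one scale neighbor of the highest note, delaying that note."""
    top = max(chord, key=lambda n: n["pitch"])
    upper, lower = kb_adjacent_notes(top["pitch"], music_key)
    grace = random.choice([upper, lower])  # randomly pick one adjacent scale note
    top["onset"] += 0.5                    # delay the decorated note by half a beat
    chord.append({"pitch": grace, "onset": 0.0, "dur": 0.5})
```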

1.2.11 II.11: Double Appoggiatura operator

A more sophisticated version of the appoggiatura mutation, the double appoggiatura precedes the decorated note with both of its adjacent notes, in random order. A simplified activity diagram describing its behavior is shown in Fig. 27. It first identifies the decorated note and its adjacent notes using MUSEC’s KB module. Yet unlike the appoggiatura operator, the double appoggiatura does not select one of the two adjacent pitches, but rather chooses an order (i.e., which note is played first) and a duration distribution (using eighth-beat time units) for these two notes over the half-beat they are allocated, following which the notes are sequentially added and the decorated note is delayed by half a beat to fit the decoration. To avoid redundancy, the distribution in this case cannot include zero values, so that this operator, when applied, does not boil down to the appoggiatura operator described earlier. Under this constraint, three duration distributions are possible: (0.375, 0.125), (0.25, 0.25), and (0.125, 0.375), as shown in Fig. 27.

Fig. 27 Simplified activity diagram describing the double appoggiatura mutation operator

1.2.12 II.12: Octava operator

The octava operator affects the composition’s average MIDI pitch by shifting a chord’s notes’ MIDI pitches up or down by an octave (i.e., it adds or subtracts 12 to/from the said notes’ MIDI pitches). The choice of octave jump (up or down) is stochastically governed by the current average pitch of the chord, such that chords with a lower average pitch are likelier to be shifted up an octave, and chords with a higher average pitch are likelier to be shifted down an octave.
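
The stochastic up/down choice can be sketched as below; the linear probability mapping around middle C (MIDI 60) is our illustrative choice, not a documented MUSEC formula, and the same stochastic pattern recurs in the tempo change and intensity change operators described further down.

```python
import random

def octava(chord, center=60.0, span=24.0):
    """Shift a chord by +/-12 semitones, biased by its average MIDI pitch."""
    avg = sum(n["pitch"] for n in chord) / len(chord)
    # Chords below `center` are likelier to move up; chords above it, down.
    p_up = min(max(0.5 + (center - avg) / span, 0.0), 1.0)
    shift = 12 if random.random() < p_up else -12
    for n in chord:
        n["pitch"] += shift
```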

1.2.13 II.13: Tempo Steal operator

The tempo steal operator, unlike all previous operators, affects two chords rather than just one. In this mutation, two consecutive chords are selected such that one “steals” a certain duration in beats from the other. The steal value used in MUSEC is half a beat. This mutation does not take place if the chord being “stolen” from is less than a beat long. Essentially, this operation extends one chord by half a beat using the extend operator described previously, and shrinks the other by half a beat. Shrinking works using a similar logic to extending, where the notes at the beginning of the shrunken chord are shortened by half a beat. This mutation was introduced to break the static duration distribution among chords, and to make compositions more rhythmically diverse.

1.2.14 II.14: Passing Notes operator

This mutation also runs on two adjacent chords, adding notes to the first chord based on the highest note of the following chord. A simplified activity diagram describing the passing notes operator is shown in Fig. 28. It checks the highest notes in both chords, then verifies whether both chords are in the same key and whether the two highest notes are less than an octave apart, so as to ensure a reasonable number of notes is subsequently added. If these conditions are verified, MUSEC’s KB module is called to identify all intermediary notes between the two highest notes based on the common key. Finally, these notes are added in sequence (over a total duration of half a beat) to the end of the first chord following a duration distribution. The latter is decided by randomly allocating duration “chunks” equal to the total duration divided by the number of passing notes.

Fig. 28 Simplified activity diagram describing the passing notes mutation operator

As with the progressive entrance and nota cambiata duration distributions, zero values are possible and notes shorter than an eighth beat are discarded, which adds variability to this operator’s results. In total, \( \binom{2N-1}{N-1} \) distributions are possible for the addition of N passing notes, resulting in an increase in piece note density and note onset density (a small numeric check of this count is sketched below).
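
The count quoted above can be verified mechanically: allocating N equal duration chunks among N passing notes, with zeros allowed, is a stars-and-bars problem with \( \binom{2N-1}{N-1} \) outcomes. The brute-force check below is ours, for illustration only.

```python
import itertools
from math import comb

def chunk_allocations(n_notes):
    """Weak compositions of n_notes duration chunks among n_notes passing notes."""
    return [d for d in itertools.product(range(n_notes + 1), repeat=n_notes)
            if sum(d) == n_notes]

for n in (2, 3, 4):
    assert len(chunk_allocations(n)) == comb(2 * n - 1, n - 1)
    print(n, len(chunk_allocations(n)))  # 2 -> 3, 3 -> 10, 4 -> 35
```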

1.2.15 II.15: Anticipation operator

Also applied to two consecutive chords within the piece, the anticipation operator checks the highest note of the second chord, and then inserts it into the final half-beat of the first chord. This operator emulates the music-theoretical concept of “anticipation” and increases both note density and note onset density.

1.2.16 II.16: Tempo Change operator

The tempo change operator targets the piece’s overall tempo. It changes the tempo value in increments or decrements of 4 BPM (beats per minute). The increase/decrease decision is stochastically governed by the individual’s current tempo, where pieces that are slower are likelier to speed up following this mutation, and vice versa.

1.2.17 II.17: Intensity Change operator

The intensity change operator changes the piece’s current intensity value (i.e., MIDI velocity) in steps of 20, such that pieces that are too quiet are likelier to become louder and vice versa, thereby producing an effect similar to a composer’s dynamics. This is the only operator affecting a piece’s (individual’s) average velocity.

1.2.18 II.18: Modulation/Demodulation operator

This is the most music-theoretically intensive mutation implemented in MUSEC, changing a piece’s current key to a new key. In theory, a piece can change to any of the 23 possible other keys. For the sake of simplicity, MUSEC was restricted to modulate only to the current key’s neighbor keys, i.e., keys with which it shares an edge (direct connection) in the circle of fifths (cf. Sect. 4.3.2). Note that we adopt a transient approach to modulations in MUSEC: using a common chord between the source and destination keys to modulate (other modulation approaches, such as abrupt modulation, are not yet included in the current version of the system).

A simplified activity diagram describing the operator’s behavior is shown in Fig. 29. Modulation checks the last chord in the individual and identifies potential destination keys using MUSEC’s KB module. If several alternatives are possible, it randomly selects a destination key; if no destination key is compatible, the mutation is aborted. To announce the modulation, the operator also appends two chords to the modulated individual: (i) the new key’s dominant chord, and (ii) its root chord, thereby producing a perfect cadence. Finally, the piece’s current key is changed to the new key.

Fig. 29 Simplified activity diagram describing the modulation/demodulation mutation operator

Demodulation occurs when the individual’s main key is different from the current key, and follows a procedure analogous to that of modulation.
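
The transient modulation step can be sketched as follows; kb_neighbor_keys, kb_common_chord, dominant, and root are our stand-ins for MUSEC's KB lookups and chord constructors, and the piece is assumed to be a dict with "key" and "chords" fields.

```python
import random

def modulate(piece, kb_neighbor_keys, kb_common_chord, dominant, root):
    """Transient modulation to a circle-of-fifths neighbor via a common chord."""
    last = piece["chords"][-1]
    candidates = [k for k in kb_neighbor_keys(piece["key"])
                  if kb_common_chord(last, k) is not None]
    if not candidates:
        return False                      # no compatible destination: abort
    new_key = random.choice(candidates)
    # Announce the new key with a perfect cadence: dominant (V) then root (I).
    piece["chords"] += [dominant(new_key), root(new_key)]
    piece["key"] = new_key
    return True
```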


About this article

Cite this article

Abboud, R., Tekli, J. Integration of nonparametric fuzzy classification with an evolutionary-developmental framework to perform music sentiment-based analysis and composition. Soft Comput 24, 9875–9925 (2020). https://doi.org/10.1007/s00500-019-04503-4
