1 Introduction

Over the past few years, recommender systems [19] have come to play a critical role in helping users choose preferred products from massive amounts of data. With the continuous development and enrichment of music content in recent years, it is difficult for users to find suitable music in the vast ocean of available songs. The purpose of personalized music recommendation is to provide users with tailor-made music services, a research topic that benefits both users and music platforms.

With the development of natural language processing [17] in recent years, music recommendation based on lyrics has become possible. Emotional descriptions [7, 8] are useful and effective for describing music taxonomies. Since songs generally carry a certain emotion, we can obtain the emotions of songs through emotional analysis of lyrics and comment texts. As a pioneering effort to describe human emotions, Russell [9] proposed the circumplex model, in which each emotion is placed in a two-dimensional space. The two dimensions range from unpleasantness to pleasantness and from calmness to excitement, so each emotional word can be defined as some combination of these two dimensions. Later, Thayer [22] further refined Russell’s model. The two main dimensions of the Thayer model are “arousal” and “valence”: emotional terms range from calm to excited in the arousal dimension and from negative to positive in the valence dimension. The two-dimensional emotion plane of the Thayer model can be divided into four quadrants, composed of 11 emotions, as Fig. 1 shows.

Fig. 1.
figure 1

Thayer’s emotion model [22].

There are many types of attributes in music, such as singers, lyrics, and audio. More and more studies try to infer users’ listening intentions from attributes of the music. Sanchez-Moreno et al. [23] use the KNN algorithm to find similar users based on singers for music recommendation. Bu et al. [2] combine emotional analysis based on music audio with users’ social relationships for music recommendation. These works use attributes of the music to construct a music recommendation model. Emotional analysis based on music audio usually analyzes the melody, beat, and rhythm; it requires certain musical expertise and incurs a high cost. Meanwhile, existing work usually ignores the user’s social relationships [11, 13, 14], which makes it difficult to find songs that users may like. In this paper, we combine emotional analysis based on lyrics and comment texts with the user’s social relationships to improve music recommendation. Specifically, we aim to (1) find the user’s listening interest by analyzing the correlation between the user’s preferences and the emotional categories of songs, and (2) analyze the similarity between the listening behavior of users and their followees to find songs in which the user has potential interest.

Our contributions are threefold.

  1.

    We analyze the correlation between user preferences and the emotional categories of songs. We find that a user’s listening behavior is related to the emotion of a song, and so is the user’s preference. These analyses give us an idea of how to identify users’ listening intentions.

  2.

    We analyze the similarity of the emotional categories of songs that users and their followees listen to. We find that the similarity between the listening behavior of a user and that of each followee varies; that is, the similarity between the listening interests of users and their followees is diverse. Through this analysis, we find that a user’s listening intention can be reflected by his or her followees.

  3.

    We construct a hybrid personalized music recommendation model. We build a classification model based on KMeans and adjust different features (the correlation between the emotional category of a song and user preferences, and the similarity between users and their followees) to predict whether a user will listen to a song. The experimental results show that the importance of the features differs, because different features make different contributions to users’ listening intentions.

    The method of this paper applies to all music sites, because the data types of the datasets used in this paper are common on music websites.

2 Related Work

Since the advent of music platforms, various music recommendation studies have been proposed. There are content-based recommendations [20]; for example, Sanchez-Moreno et al. [23] use the KNN algorithm to find similar users based on singers for music recommendation. There are model-based recommendations [5, 15, 18]; for example, Pacula [21] uses matrix decomposition based on users’ implicit feedback for music recommendation. There are user-based recommendations [10, 16]; for example, Deng et al. [6] generate music recommendations from the listening records of similar users by obtaining the user’s emotion from blog text. There are also methods based on hybrid models [3]; for example, Bu et al. [2] combine content-based recommendation with collaborative filtering to build a hybrid recommendation model.

At present, some music recommendation systems combine music audio and users’ social relationships for hybrid recommendation, but they do not consider another important attribute of music, namely lyrics: they only consider the genre of music but ignore its emotions. There are also emotional analyses based on lyrics [1, 4, 24], but they do not consider the feelings of the user after listening, that is, the emotion in the comment text [12]. The emotions in comment texts can predict the emotions of the music more accurately. Moreover, these works do not consider the user’s social relationships, which makes it difficult to find songs of potential interest to the user.

In conclusion, our work differs from others in mainly two aspects. (1) We use emotional analysis based on lyrics and comment texts to get attributes of songs. (2) We consider the user’s social relationship to find songs that the target user is potentially interested in.

3 Correlation Analysis of User Listening Behavior in Online Music Platform

In this section, we take the user’s listening intention as the research object and analyze various attributes related to it. We aim to identify how the emotions of songs and the user’s social relationships affect the user’s listening behavior. The results of these analyses can help a company understand users’ listening intentions. Our method is suitable for most music platform websites because we only need the user’s listening records, lyrics, comments, and the user’s followees.

3.1 Raw Data

We use the scrapy crawler framework to crawl data from a music website, music.163. This dataset contains the users’ listening records, lyrics, comment texts, and the users’ followees as the original data. Some users’ listening records are shown in Table 1. The score field value ranges from 0 to 100 and indicates how much the user likes the song; the larger the value, the more the user likes the song.

Table 1. Some records of user behavior logs

The users’ followee information is displayed in Table 2. The first field is the target user’s ID, and the second field is his or her followee’s ID. For example, if A follows B, then B is a followee of A and A is a follower of B. Figure 2 shows some song information, such as the song ID, lyrics, and comments. Songs here are limited to Chinese.

Fig. 2.
figure 2

Some lyrics and comments information.

Table 2. Followees information of some users

3.2 Data Preprocessing

According to the statistics of the data, the dataset contains about 93,319 items and about 1,103 users. First, Chinese songs are filtered from the original data, which yields 16,482 items and 1,103 users. In addition, the IDs of songs and users are renumbered, represented by 1 to 16482 and 1 to 1103, respectively. Then, we remove non-Chinese symbols from the lyrics and comment texts to obtain text containing only Chinese. Finally, we use the THULAC tool to segment the Chinese text.
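The filtering and renumbering steps above can be sketched as follows; `keep_chinese` and `renumber` are illustrative helper names, and the THULAC segmentation step is omitted to keep the sketch dependency-free.

```python
import re

def keep_chinese(text):
    # Keep only characters in the CJK Unified Ideographs range,
    # removing all non-Chinese symbols from lyrics and comments.
    return "".join(re.findall(r"[\u4e00-\u9fff]+", text))

def renumber(ids):
    # Map original song/user IDs to consecutive integers starting at 1,
    # as done for the 16482 songs and 1103 users.
    return {orig: new for new, orig in enumerate(sorted(set(ids)), start=1)}
```

The cleaned text would then be passed to THULAC for word segmentation.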

3.3 Word Embedding

Since a computer cannot directly process Chinese text, the text must be converted into a word vector. In this paper, we construct it from the emotional values of the emotional words in the text, because not all words in lyrics and comment texts carry emotion; some words are only used as connectives. We use the SenticNet tool to analyze the emotions in lyrics and comments. SenticNet contains more emotional words than other lexicons (such as NRC and DUTIR). It includes not only the polarity of emotional words but also their emotional values in four emotional dimensions, which makes the emotional analysis more fine-grained.

We take the first 99 comments and the lyric of each song to make up 100 texts. We match the lyrics and comment texts after word segmentation against SenticNet to obtain the emotional value of each emotional word. We use the emotional values in the four sentiment dimensions to construct four matrices E\(_{dj}\), where d is the emotional dimension and j is the song’s ID. The number of emotional words varies across lyrics and comment texts; for convenience of processing, we use the maximum number of emotional words in any lyric or comment text as the standard, which is 169 in this paper. If the number of emotional words is less than 169, we pad with 0. The four emotional dimensions in SenticNet are Pleasantness, Attention, Sensitivity, and Aptitude.
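A minimal sketch of building one per-dimension matrix, assuming a stand-in `lexicon` dict in place of the real SenticNet lookup (which returns values in the four dimensions named above):

```python
import numpy as np

MAX_WORDS = 169  # maximum number of emotional words per text in the dataset

def emotion_matrix(texts, lexicon, dim):
    # texts: list of up to 100 token lists (lyric + first 99 comments)
    # lexicon: word -> 4-tuple of values (Pleasantness, Attention,
    #          Sensitivity, Aptitude); a stand-in for the SenticNet lookup
    rows = []
    for tokens in texts:
        vals = [lexicon[w][dim] for w in tokens if w in lexicon][:MAX_WORDS]
        vals = vals + [0.0] * (MAX_WORDS - len(vals))  # pad with 0 to 169
        rows.append(vals)
    return np.array(rows)  # shape: (number of texts, 169)
```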

We use SVD (Singular Value Decomposition) to reduce the dimension of the emotional matrix E\(_{dj}\), which has a dimension of 100 * 169. First, the matrix is reshaped into a square matrix of dimension 130 * 130 (valid since 100 * 169 = 130 * 130 = 16900), and then it is reduced to a matrix e\(_{dj}\) of dimension 1 * 130. Finally, we concatenate the reduced matrices of the four emotional dimensions to obtain a matrix of dimension 1 * 520 as the word vector of the song.
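The paper does not state exactly how the 130 * 130 matrix is reduced to 1 * 130; taking the vector of singular values is one plausible reading, sketched here under that assumption:

```python
import numpy as np

def song_vector(mats):
    # mats: the four (100, 169) emotion matrices of one song.
    parts = []
    for E in mats:
        square = E.reshape(130, 130)  # valid since 100*169 == 130*130 == 16900
        # One plausible 1x130 reduction: the vector of singular values.
        s = np.linalg.svd(square, compute_uv=False)
        parts.append(s)
    return np.concatenate(parts)  # the 1x520 word vector of the song
```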

3.4 Statistical Analysis

From Fig. 3, we can see that most of the 1,103 users listen to between 20 and 40 songs; a few listen to more than 50 songs, and a few listen to fewer than 10. In Fig. 4, we can see that most songs are listened to by no more than 50 people, while a few songs are listened to by many people, which indicates that the popularity of songs differs.

Fig. 3.
figure 3

The number of users listening to songs.

Fig. 4.
figure 4

The number of listeners of each song.

Fig. 5.
figure 5

The emotional clustering map of all songs.

Fig. 6.
figure 6

The number of samples in each cluster.

Figure 5 displays the distribution of the emotional categories of the songs in the dataset. We regard determining the emotional category of a song as a classification problem and train a classifier based on the KMeans algorithm, which groups the songs in the dataset into 11 emotional categories. Figure 6 shows the number of samples in each cluster. We can see that the number of samples differs across clusters, which suggests that users have different preferences for songs in different emotional categories.
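A sketch of the clustering step with sklearn (the library the paper names in Sect. 5.4); the 520-dimensional song vectors from Sect. 3.3 would be the input, replaced here by random data for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_songs(vectors, k=11, seed=0):
    # Group song emotion vectors into k emotional categories with KMeans
    # and return one cluster label per song.
    km = KMeans(n_clusters=k, n_init=10, random_state=seed)
    return km.fit_predict(np.asarray(vectors))
```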

4 Correlation Analysis

In this section, we analyze the correlation between the emotional category of songs and the user’s preferences. We also analyze the similarity between users and their followees. Through analysis, we try to find out the impact of these attributes on the user’s listening intentions.

Fig. 7.
figure 7

The relationship between emotional categories of songs and user’s preferences.

4.1 Correlation Analysis Between Emotional Categories of Songs and User’s Preferences

For each user, the emotional categories of the songs he or she likes may differ. Therefore, when predicting a user’s listening intention, we have to consider the attributes of each individual user. This section analyzes the correlation between the emotional categories of songs and users’ preferences to find out the listening interests of different users.

Figure 7 displays the correlation between the emotional category of songs and the user’s preference. We find that the user’s preference differs across emotional categories: for different emotional categories, the number of songs that the user listens to is different. The user usually listens to songs in his or her favorite emotional categories, and vice versa.

As can be seen from the above analysis, although a user listens to songs in several categories, the amount in each category is different. Therefore, we can say that a user’s listening preferences are related to the emotional categories of songs.

4.2 Analysis of the Similarity Between Users and Their Followees

The emotional categories of songs that different users like are different. Therefore, when looking for similar users, it is also necessary to consider the similarity of their listening interests. We analyze the similarity between the listening records of users and their followees to obtain the strength of the relationship between them.

Figure 8 displays the distribution of the emotional categories of songs that users and their followees listen to. We can see that the similarity between the target user and the first followee is lower than that between the target user and the second followee; the second followee’s listening interest is more similar to the target user’s. We can say that the similarity between a user and his or her followees varies.

Fig. 8.
figure 8

The similarity between emotional categories of songs that users and their followees listen to.

In summary, we obtain the following findings. (1) The listening behaviors of users and their followees have certain similarities. (2) The similarity between a user and his or her followees is diverse. Therefore, it is necessary to consider the strength of the relationship between users and their followees.
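The paper does not specify the similarity measure used in this analysis; as an illustration, one simple choice is the cosine similarity between two users’ category-frequency distributions. `category_distribution` and `cosine` are hypothetical helpers:

```python
import math
from collections import Counter

def category_distribution(labels, k=11):
    # Fraction of a user's listening records falling in each of the k
    # emotional categories.
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(i, 0) / total for i in range(k)]

def cosine(u, v):
    # Cosine similarity between two distributions; 0 if either is empty.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```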

5 Recommendation Generation

In this section, we generate candidate items based on the similarity between the user and his or her followees and the followees’ preferences for songs. We use a Gaussian function to calculate the similarity between a candidate item and the emotional categories of the target user’s listening records, and we generate recommended items based on this similarity and the category weights. We use metrics such as precision and recall to evaluate the recommendations.

5.1 Measuring User’s Preferences

We use the trained classification model to obtain the emotional categories C\(_{ui}\) of the target user’s listening records R, based on the emotion vector e\(_j\) of each song. Meanwhile, we calculate the proportion w\(_{ui}\) of each category C\(_{ui}\). The formula is as follows:

$$\begin{aligned} \mathrm {w}_{u i}=m_{ui} / M_{u} \end{aligned}$$
(1)

where m\(_{ui}\) represents the number of samples in the category C\(_{ui}\), M\(_{u}\) represents the total number of samples of all categories of the target user u, and i indicates the category.
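Eq. (1) amounts to the relative frequency of each category among the user’s records; a minimal sketch:

```python
from collections import Counter

def category_weights(labels):
    # labels: emotional-category label of each song in the user's records.
    counts = Counter(labels)
    total = sum(counts.values())  # M_u, the total number of records
    return {c: n / total for c, n in counts.items()}  # w_ui = m_ui / M_u
```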

5.2 Selecting Candidate Item and Calculating the Similarity

Selecting Candidate Item. We select as candidate items the songs that the target user’s followees like listening to but the target user has not listened to. Whether followee v likes listening to a song is judged by its score\(_{vj}\). The formula is as follows:

$$\begin{aligned} \text{ score } _{\mathrm {vj}} \ge \frac{a}{ \text{ num } } \times \sum _{j=1}^{ \text{ num } } \text{ score } _{\mathrm {vj}} \end{aligned}$$
(2)

where num represents the number of songs that followee v has listened to, and a represents the similarity between followee v and the target user u.
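Eq. (2) can be read as keeping the songs whose score is at least the fraction a of followee v’s average score; a sketch with hypothetical argument names:

```python
def candidate_items(followee_scores, listened, a):
    # followee_scores: {song_id: score_vj} for followee v
    # listened: set of songs the target user has already listened to
    # a: similarity between followee v and the target user u
    num = len(followee_scores)
    threshold = (a / num) * sum(followee_scores.values())  # Eq. (2)
    return {j for j, s in followee_scores.items()
            if s >= threshold and j not in listened}
```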

Calculating the Similarity. We use a Gaussian function to calculate the similarity between the candidate item and each emotional category of the target user’s listening records. We use the following formula to describe this:

$$\begin{aligned} S_{\mathrm {h,Cui}}=\frac{1}{\sqrt{2 \pi k_{\mathrm {g}} \sigma _{\mathrm {Cui}}^{2}}} \exp \left( -\frac{\left( \mathrm {e}_{h}-\overline{p_{\mathrm {Cui}}}\right) ^{2}}{2 k_{\mathrm {g}} \sigma _{\mathrm {Cui}}^{2}}\right) \end{aligned}$$
(3)

where k\(_g\) is a constant, \(\sigma _{\mathrm {Cui}}^{2}\) is the variance of category \(\mathrm {C}_{\mathrm {ui}}\), \(\overline{\mathrm {p}}_{\mathrm {Cui}}\) is the mean of category \(\mathrm {C}_{\mathrm {ui}}\), and \(\mathrm {e}_{\mathrm {h}}\) is the emotional vector of the candidate item.
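For a single emotional value, Eq. (3) is a scaled Gaussian density; a scalar sketch (the paper applies it to the candidate’s emotion vector, reduced here to one value for illustration):

```python
import math

def gaussian_similarity(e_h, mean_c, var_c, k_g=1.0):
    # Gaussian similarity between the candidate's emotion value e_h and
    # category C_ui with mean mean_c and variance var_c (Eq. 3).
    coeff = 1.0 / math.sqrt(2 * math.pi * k_g * var_c)
    return coeff * math.exp(-((e_h - mean_c) ** 2) / (2 * k_g * var_c))
```

As expected, the similarity peaks when the candidate’s value equals the category mean and decays with distance from it.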

5.3 Generating the Recommendation

We calculate the target user’s preference \(\mathrm {g}_{\mathrm {uh}}\) for the candidate item based on the similarity between the candidate item and the emotional category of the target user’s listening records, and the weight of each emotional category. The formula is as follows:

$$\begin{aligned} \mathrm {g}_{\mathrm {uh}}=\frac{1}{\mathrm {c}} \sum _{\mathrm {i}=1}^{\mathrm {c}} w_{\mathrm {u} i} s_{\mathrm {h,Cui}} \end{aligned}$$
(4)

where c is the number of emotional categories in the target user’s listening records, \(w_{\mathrm {u} i}\) is the weight of emotional category \(\mathrm {C}_{\mathrm {ui}}\), and \(s_{\mathrm {h,Cui}}\) is the similarity between the candidate item and the emotional category of the target user’s listening records.

When the target user’s preference \(\mathrm {g}_{\mathrm {uh}}\) for the candidate item h exceeds a threshold t, the candidate item h is added to the recommendation list CL; otherwise, it is discarded.
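Eq. (4) and the thresholding step can be sketched together; `preference` and `recommend` are illustrative names:

```python
def preference(weights, sims):
    # weights: {category i: w_ui}; sims: {category i: s_h,Cui} for item h.
    c = len(weights)
    return sum(weights[i] * sims[i] for i in weights) / c  # Eq. (4)

def recommend(candidates, weights, sims_by_item, t):
    # Keep candidate h only when its preference g_uh exceeds threshold t.
    return [h for h in candidates
            if preference(weights, sims_by_item[h]) > t]
```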

5.4 Result Analysis

We use sklearn, a Python library, as the tool to train the model and obtain the experimental results. Besides the random baseline, we present three methods to compare the effects of the two features, as follows.

  1.

    Method 1 only uses the similarity between users and their followees, which is described in Sect. 4.2.

  2.

    Method 2 uses the correlation between the user’s preferences and the emotional categories of songs, which is described in Sect. 4.1.

  3.

    Method 3 uses the similarity between users and their followees combined with the correlation between the user’s preferences and the emotional categories of songs.

The experimental results are shown in Table 3.

Table 3. Experimental results under four different methods.

The Integrated Effects of Different Features. Compared with the random baseline, Method 1 improves precision by 75.64% and recall by 162.60%; Method 2 improves precision by 20.09% and recall by 32.55%; Method 3 improves precision by 89.74% and recall by 210.10%. The experimental results verify that it is effective to consider both the similarity and the correlation, and that the similarity has a greater effect.

6 Conclusions

In this paper, we construct a hybrid personalized music recommendation model by combining emotional analysis based on lyrics and comment texts with users’ social relationships. Specifically, we analyze the correlation between user preferences for songs and the emotional categories of songs. Moreover, we analyze the similarity of the emotional categories of songs that users and their followees listen to, and we find that the similarities between the listening behaviors of users and their followees differ. We construct the recommendation model by combining the above correlation and similarity, test their effects on a real dataset, and find that they have different importance. In future work, we are interested in making music recommendations based on English songs and applying our model to more fields.