
1 Introduction

Nowadays, people tend to rely on technology to discover the next piece of content they are likely to enjoy online, rather than manually seeking recommendations from people they know, because of the tedious nature of the latter. For example, an online book recommendation system will typically recommend books that other people have bought or read alongside the book a user is about to buy or read, but that alone is not enough. This paper proposes a robust and rational framework for a recommender system that recommends jokes to users who have not read them yet, based on the rating patterns of those jokes by other users. A Collaborative Filtering approach is therefore chosen to recommend the next joke a user will like based on the preferences and tastes of many other users. The users are then divided into specific groups (clusters) depending upon their preference patterns in order to:

  • relate the Joke-Readers within each cluster, thereby magnifying the value of each Joke-Reader.

  • help Joke-Writers create jokes that draw maximum attention and suit specific groups of readers.

In this paper, the Bernoulli Restricted Boltzmann Machine (RBM) based Recommendation System takes a user’s likes and dislikes as input, analyzes and correlates the rating patterns of other users, and outputs the probable like or dislike rating. After that, a k-Means Clustering Model is trained only on the recommended ratings of all jokes for all users, and is applied on the true ratings of the jokes the users have read together with the recommended ratings of the jokes they have not read yet.

This paper has been organized as follows: Sect. 2 discusses the related works forming the Literature Review, Sect. 3 illuminates the Proposed Methodology followed by Sect. 4 elucidating the Implementation Details, Sect. 5 showcases the Results, Visualization and Performance Analysis, Sect. 6 highlights the future scope followed by Sect. 7 concluding the article.

2 Literature Review

There are many related works involving recommendation and segmentation attempted in the recent past.

Some of the existing contributions on Recommendation Systems associated with Jokes are as follows:

  • Goldberg et al. introduced Eigentaste, a constant-time Collaborative Filtering Algorithm, and implemented it to build a Joke Recommendation System [5].

  • Babu et al. applied Bayesian Networks, Pearson & Cosine Correlations, Clustering and Horting for constructing Joke Recommender System [2].

  • Polatidis et al. proposed a Multi-Label Collaborative Filtering Algorithm and tested its efficiency through experiments on 5 real datasets: MovieLens 100K, MovieLens 1M, Jester (joke dataset), Epinions and MovieTweetings [6].

  • Corso et al. formulated a Non-negative Matrix Factorization (NMF) of the rating matrix for dealing with Recommendation System problems and applied it to the MovieLens Dataset, the Jester Dataset (jokes) and the Amazon Fine Foods Dataset [3].

There have also been attempts in which Recommendation Systems are developed relying on Segmentation:

  • Rezaeinia et al. extracted and calculated the weights associated with the RFM (Recency, Frequency and Monetary) attributes of the clients. The weighted RFM and k-nearest-neighbour based clustering algorithms are used for obtaining independent recommendations for each cluster [7].

  • Vandenbulcke et al. proposed a way of segmenting consumers in the retail zone based on shopping behaviours. Firstly, the products to be recommended to an individual consumer are resolved and secondly, consumers are segmented according to those recommendations [10].

  • Eskandanian et al. developed a user segmentation technique for building personalized recommendation systems and also highlighted the impact of segmentation on the originality of recommendations [4].

  • Shi et al. proposed a smart recommendation system based on customer segmentation. Its usefulness is substantiated, and the fidelity (or net worth) of customers is also graded using the FCM clustering approach [9].

  • Rodrigues et al. proposed a hybrid recommender model combining content-based, collaborative filtering and data mining techniques in order to handle the varying customer behaviour [8].

In contrast, here the Joke-Reader Segmentation is done on the basis of the Recommendation Results given by the Recommender System together with the true values of likes and dislikes.

3 Proposed Methodology

3.1 Dataset

The Jester Dataset, containing 4.1 million anonymous ratings of 100 specific jokes from 73,421 users [1], is used for developing the Joke Recommender and Joke-Reader Segmentation Models. Every rating in the dataset is a value between −10.00 and +10.00. The dataset was prepared by Goldberg et al. [5] at UC Berkeley. Each row represents the ratings given by a unique user to some of the 100 specific jokes, with the remaining jokes left unrated. The unrated jokes are those the user has not read, and hence have to be recommended; they are denoted by the number 99 in the dataset.

3.2 Data Preprocessing

The Data Pre-processing steps applied to the Jester Dataset are as follows:

  1. All the jokes with a user rating between +7.00 and +10.00 (both inclusive) are considered liked by the user and marked as ‘1’.

  2. All the jokes with a user rating between −10.00 and +6.00 (both inclusive) are considered disliked by the user and marked as ‘0’.

  3. All the jokes with no user rating, carrying the default marker ‘99’, are re-marked as ‘\(-1\)’.
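A minimal sketch of these pre-processing steps, assuming the ratings have been loaded into a numeric array (the file name and loading details below are purely illustrative), is:

```python
import numpy as np
import pandas as pd

# Hypothetical loading step; the actual Jester files and their layout may differ
ratings = pd.read_csv('jester_ratings.csv', header=None).values.astype(float)

# Check for the unrated marker first, since 99 would otherwise count as a 'like'
processed = np.where(ratings == 99, -1,               # unrated jokes -> -1
                     np.where(ratings >= 7.0, 1, 0))  # +7.00 to +10.00 -> 1, rest -> 0
```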

3.3 Development of Joke-Recommender System Using Bernoulli Restricted Boltzmann Machine (RBM)

Bernoulli Restricted Boltzmann Machine is an Energy-based Probabilistic Graphical Deep Learning Model. In an RBM, there is a Visible Layer representing the Input Nodes and a Hidden Layer representing the Hidden Nodes. The hidden nodes intuitively help in Internal Feature Identification of the jokes, which is necessary in any Collaborative Filtering Approach. An energy is associated with the model at different states of training, governed by the weights of the synapses connecting the Visible and Hidden Layer nodes and by the biases. At each step of RBM training, these weights and biases are updated in such a way that the model attains the minimum energy possible. A graphical interpretation and formulation of the same is given in Fig. 1.
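For binary visible and hidden units, this energy takes the standard RBM form (the notation for the visible bias \(a\) and the hidden bias \(b\) is assumed here):

$$\begin{aligned} E(v,h) = -\sum _{i} a_{i}v_{i} - \sum _{j} b_{j}h_{j} - \sum _{i,j} v_{i}w_{ji}h_{j} \end{aligned}$$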

Fig. 1. RBM: an energy-based Probabilistic Graphical Model in which \(v^0\) and \(v^{\infty }\) denote the vectors of visible nodes initially and finally respectively, \(h^{0}\) and \(h^{\infty }\) denote the vectors of hidden nodes initially and finally respectively, and w is the weight matrix

Gibbs Sampling is used for training the RBM, with 20-step Contrastive Divergence as the learning algorithm.

The RBM Architecture consists of:

  1. 100 visible nodes corresponding to the 100 specific jokes, and 20 hidden nodes.

  2. A weight matrix of dimension \(20 \times 100\), randomly initialized according to the Standard Normal Distribution (Mean 0, Standard Deviation 1).

  3. The bias for the probability \(p(h = 1|v)\), randomly initialized according to the Standard Normal Distribution (Mean 0, Standard Deviation 1) as a vector of size 20.

  4. The bias for the probability \(p(v = 1|h)\), randomly initialized according to the Standard Normal Distribution (Mean 0, Standard Deviation 1) as a vector of size 100.
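A minimal PyTorch sketch of this initialization (tensor names are illustrative) is:

```python
import torch

W = torch.randn(20, 100)  # weight matrix: 20 hidden x 100 visible, ~ N(0, 1)
b = torch.randn(1, 20)    # bias for p(h = 1|v), one entry per hidden node
a = torch.randn(1, 100)   # bias for p(v = 1|h), one entry per visible node
```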

A sample RBM Architecture with 6 visible nodes and 5 hidden nodes is shown in Fig. 2.

Fig. 2. RBM Architecture with 6 visible nodes denoted by blue colour and 5 hidden nodes denoted by red colour (Color figure online)

Gibbs Sampling: The steps employed in Gibbs Sampling are as follows:

  1. Given the visible layer nodes forming the 1st sample of visible nodes, the probability \(p(h = 1|v)\) and the values of the hidden nodes are calculated. This gives a vector of size 20 representing the hidden layer values and forming the 1st sample of hidden nodes.

  2. Given the sample of hidden nodes created in the previous step, the probability \(p(v = 1|h)\) and the values of the visible nodes are calculated. This gives a vector of size 100 representing the visible layer values and forming the 2nd sample of visible nodes. Starting from this sample, steps 1 and 2 are repeated 19 more times.

Here, Step 1 creates Gibbs Samples of Hidden Layer nodes and Step 2 creates Gibbs Samples of Visible Layer Nodes.
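A minimal PyTorch sketch of these two sampling steps, reusing the tensors initialized above and assuming sigmoid activations with Bernoulli draws (function names are illustrative), is:

```python
def sample_h(v, W, b):
    # Step 1: p(h = 1|v), then a Bernoulli draw of the 20 hidden nodes
    p_h = torch.sigmoid(torch.mm(v, W.t()) + b)  # shape: (batch, 20)
    return p_h, torch.bernoulli(p_h)

def sample_v(h, W, a):
    # Step 2: p(v = 1|h), then a Bernoulli draw of the 100 visible nodes
    p_v = torch.sigmoid(torch.mm(h, W) + a)      # shape: (batch, 100)
    return p_v, torch.bernoulli(p_v)
```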

This whole process of Gibbs Sampling, with steps 1 and 2 repeated 20 times (in total) is known as 20-step Contrastive Divergence. An illustrative diagram of Gibbs Sampling is shown in Fig. 3.

Fig. 3. Illustrative explanation of Contrastive Divergence

Training the Bernoulli RBM Model: The dataset is split into Training and Validation Sets such that 80% of the 73,421 instances, i.e., 58,736 instances, are used for training the RBM and the remaining 14,685 instances are used for Validation and Performance Analysis. Here, 20-step Contrastive Divergence is used as the learning algorithm, with 10 epochs and a batch size of 3,671. The k-step Contrastive Divergence procedure is given in Algorithm 1.

Algorithm 1. k-step Contrastive Divergence
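A minimal sketch of one k-step Contrastive Divergence update for this Bernoulli RBM, reusing the sampling helpers above (the learning rate and the clamping of unrated entries, marked \(-1\), are illustrative choices), is:

```python
def contrastive_divergence(v0, W, a, b, k=20, lr=0.01):
    ph0, h = sample_h(v0, W, b)            # positive phase, driven by the data
    vk = v0.clone()
    for _ in range(k):                     # k-step blind walk of Gibbs Sampling
        _, h = sample_h(vk, W, b)
        _, vk = sample_v(h, W, a)
        vk[v0 < 0] = v0[v0 < 0]            # keep unrated jokes clamped at -1
    phk, _ = sample_h(vk, W, b)
    # Update weights and biases so that the model moves towards lower energy
    W += lr * (torch.mm(ph0.t(), v0) - torch.mm(phk.t(), vk))
    a += lr * torch.sum(v0 - vk, dim=0, keepdim=True)
    b += lr * torch.sum(ph0 - phk, dim=0, keepdim=True)
```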

For the Validation of the remaining 14,685 instances, a blind walk of 1-step Contrastive Divergence is performed, as sketched below.
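A corresponding sketch of this validation step, computing the Mean Absolute Error only over the jokes a user has actually rated (consistent with Sect. 5.1), is:

```python
def validate(v_true, W, a, b):
    _, h = sample_h(v_true, W, b)   # one blind-walk Gibbs step
    p_v, _ = sample_v(h, W, a)
    mask = v_true >= 0              # ignore unrated jokes (marked -1)
    return torch.mean(torch.abs(v_true[mask] - p_v[mask])).item()
```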

3.4 Joke-Reader Segmentation Using k-Means Clustering

Now, the recommended ratings obtained for both the Training and Validation Sets are combined to form Dataset D1. The Training and Test Sets with the original joke-ratings, together with the recommended ratings for the unrated jokes, are combined to form Dataset D2. For Joke-Reader Segmentation, a k-Means Clustering Model is to be developed; the k-means clustering algorithm is given in Algorithm 2.

Algorithm 2. k-means clustering
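A minimal NumPy sketch of the classical k-means iteration (the initialization scheme and iteration cap are illustrative choices) is:

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iters):
        # Assignment step: attach every sample to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned samples
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```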

Before training, the optimal number of clusters needs to be determined; the Elbow Method is employed for this purpose. In the Elbow Method, the k-Means Clustering Model is trained on a dataset multiple times for different values of k, and the corresponding Clustering Cost, i.e., the sum of the distances of the samples to their closest cluster centroid, is plotted. This plot of Clustering Cost vs. k is known as the Elbow Curve; the value of k at which the curve shows a ‘perfect elbow’ is taken as the optimal number of clusters. The Elbow Method is applied to Datasets D1 and D2 separately, as sketched below; the resulting Elbow Curves are shown in Fig. 4.
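A minimal sketch of this procedure with Scikit-Learn, where the inertia_ attribute (sum of squared distances of samples to their closest centroid) serves as the Clustering Cost and the range of k is an illustrative choice, is:

```python
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

def elbow_curve(D, k_max=10):
    ks = range(1, k_max + 1)
    costs = [KMeans(n_clusters=k, random_state=0).fit(D).inertia_ for k in ks]
    plt.plot(ks, costs, marker='o')
    plt.xlabel('k')
    plt.ylabel('Clustering Cost')
    plt.show()
```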

Fig. 4. Elbow curves for datasets D1 and D2

From Fig. 4, the elbow curve for Dataset D1 gives a ‘perfect elbow’ at k = 3. For Dataset D2, in contrast, no such perfect elbow is obtained; moreover, the Clustering Cost with Dataset D2 as the Training Set is much higher than with D1. Hence, Dataset D1 is chosen as the Training Set for the k-Means Clustering Model.

The trained k-Means Clustering Model is applied on Dataset D2 for Joke-Reader Segment Visualization in Sect. 5.2.

4 Implementation Details

The Deep Learning Model (RBM) is implemented in PyTorch with every vector and matrix represented as a Tensor, and the Machine Learning Model (k-Means) is implemented using Python’s Scikit-Learn Machine Learning Toolbox, on a TPU-configured Google Colab Notebook. The Elbow Curves and the Training-Loss Curve (shown in Figs. 4 and 5 respectively) are generated with Python’s data visualization library, Matplotlib. The Joke-Reader Segment (Cluster) visualizations in Fig. 6 are done using Microsoft Excel’s Line Charts.

5 Results, Visualization and Performance Analysis

5.1 Bernoulli RBM for Joke Recommender System

The evaluation of the performance of the Bernoulli RBM Model is done using the following metrics:

  1. Training Loss: the Mean Absolute Error on the Training Set.

  2. Validation/Test Loss: the Mean Absolute Error on the Test Set.

Also, the Mean Absolute Error is calculated only with respect to the jokes that were read and rated initially (marked as liked or disliked), excluding the unrated ones (marked as ‘\(-1\)’). The Performance Analysis of the Bernoulli RBM for the Joke Recommender System is tabulated in Table 1.

The training-loss history curve is given in Fig. 5.

Fig. 5. Training-loss history curve

Table 1. RBM model performance analysis

5.2 Visualization of the Joke-Reader Segments

The Performance Analysis of the k-Means Clustering Model is done by visualizing the clusters of Joke-Readers with Line Charts representing the Preference Pattern of Jokes of each segment (cluster) of Joke-Readers. Each of the 3 clusters obtained from D2 (after validation) is characterized by a vector of 100 Preference Values (one per joke), in which

$$\begin{aligned} Preference(c_{i},j_{k})=\frac{\text {Number of Readers belonging to cluster } c_{i} \text { who like joke } j_{k}}{\text {Total Number of Readers belonging to cluster } c_{i}} \end{aligned}$$
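A minimal sketch computing these Preference Values from the binary like/dislike matrix of Dataset D2 and the cluster labels (array names are illustrative) is:

```python
import numpy as np

def preference_vectors(likes, labels, n_clusters=3):
    # likes: (n_readers, 100) binary like/dislike matrix; labels: cluster per reader
    prefs = np.zeros((n_clusters, likes.shape[1]))
    for c in range(n_clusters):
        prefs[c] = likes[labels == c].mean(axis=0)  # fraction of cluster-c readers liking each joke
    return prefs
```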

Now, the 3 preference vectors, each representing a cluster, are plotted as Line Charts; the resulting Preference Patterns, i.e., the shapes of the Line Charts characterizing each cluster, are shown in Fig. 6.

Fig. 6. Joke-reader segment visualization

In Fig. 6, the preference pattern curves representing the 3 clusters have entirely different shapes, but the preference values for certain jokes, and the preference patterns over certain runs of consecutive jokes, are quite similar. This reflects the degree of overlap among the 3 clusters.

6 Future Scope

This methodology can be further utilized in Product-Customer relations, where products can be recommended and customer segmentation done accordingly. The same can similarly be adopted for Movie-Audience and Book-Reader relations.

7 Conclusion

We have proposed a methodology for recommending jokes to readers as per their preferences, as well as for guiding Joke-Writers in writing new jokes that draw the attention of readers. The key highlights of the proposed methodology are as follows. Firstly, our model uses a Collaborative Filtering approach for developing the Joke Recommendation System, on which the Segmentation of Joke-Readers is based. Secondly, the Bernoulli Restricted Boltzmann Machine (RBM), an Unsupervised Deep Learning approach, is used for building the recommendation system. Thirdly, the Performance Analysis of the k-Means Clustering Model is done by visualizing the clusters of Joke-Readers with Line Charts, each representing the Preference Pattern of Jokes of one segment (cluster) of Joke-Readers. Moreover, the methodology may also prove useful to Joke-Writers, who can get a rough idea of the jokes they should compose depending upon the mass preferences of all Joke-Readers for Generalized Entertainment, and of the Joke-Readers within specific segments (clusters) for Personalized Entertainment.