1 Introduction

As a novel style of online learning, micro learning aims to make use of learners' fragmented spare time by helping them carry out effective, personalized learning activities [1,2,3]. Such online learning activities can be formal, informal, or non-formal [4], and online knowledge sharing is one form of non-formal learning. Quora, Zhihu, and Stack Overflow are among the most representative and successful online knowledge-sharing platforms, where users share knowledge by asking and answering questions. Meanwhile, these platforms continuously recommend questions and topics to users based on their interests, background, and learning requirements.

As the key to personalized online learning services, the recommendation strategy determines what information is ultimately delivered to the target user [5]. For new online learning services in the big data era, conventional recommendation strategies such as collaborative filtering and content-based filtering [6] are no longer adequate for catering to personalized learning requirements. A recommender system typically needs to handle and merge information of different types and formats, ranging from user profiles to resource profiles. Moreover, higher-order feature interactions are crucial for good performance [7]. Weighting features precisely is also vital, because different features have different levels of importance for a personalized recommendation task [8].

In this paper, we propose a novel model that combines the advantages of several state-of-the-art recommender systems in a single, unified network. The rest of this paper is organized as follows. Section 2 reviews prior work on recommender systems for micro learning. Section 3 introduces and explains the proposed model. Section 4 presents and analyses the experiments. Section 5 concludes the paper.

2 Related Work

The recommendation problem has been studied for many years in many domains. However, recommendation tasks in online education often involve unique requirements and characteristics [9, 10]. One prior study [11] proposed an ant colony optimization (ACO) algorithm to recommend personalized learning paths to users based on their demographic information. Ontology-based methods have been used to enrich user profiles and alleviate the cold-start problem in micro learning services [12, 13]. Another study [14] investigated learning path recommendation for micro learning services from an exploitation perspective. So far, there has been little effort on deep learning solutions to this problem.

Feature interaction refers to the way features in a recommendation task influence each other in various combinations. The factorization machine (FM) [15] embeds each feature as a latent vector in a low-dimensional space and models pairwise feature interactions as inner products of these vectors. It also performs well on highly sparse data, where SVMs fail [15]. However, because of the high computational complexity of higher-order terms, FMs are in practice usually restricted to second-order (pairwise) feature interactions.
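To make the pairwise term concrete, the sketch below scores one input with a standard second-order FM. It is an illustrative pure-Python sketch, not code from [15] or from this paper; the function names `fm_score` and `fm_pairwise_naive` are hypothetical. It uses the well-known reformulation that computes all pairwise inner products in O(kn) time instead of O(kn²), and keeps a naive version for comparison.

```python
from typing import List

Vec = List[float]


def fm_score(x: Vec, w0: float, w: Vec, V: List[Vec]) -> float:
    """Second-order FM score: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.

    x  : feature values (n entries, typically sparse / one-hot)
    w0 : global bias
    w  : linear weights (n entries)
    V  : latent factors, n rows of k entries (hypothetical toy parameters)
    """
    n, k = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    # O(kn) identity for the pairwise term:
    # sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_pairwise_naive(x: Vec, V: List[Vec]) -> float:
    """O(k n^2) reference: explicit inner products over all pairs i < j."""
    n = len(x)
    return sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
```

Because the pairwise term only sums products of latent vectors, a pair of features that never co-occurs in training can still receive a meaningful interaction weight through the shared factors, which is why FMs cope well with sparse data.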

Deep learning has demonstrated strong capability in modelling non-linear transformations across a wide range of AI tasks. Besides using deep neural networks alone for recommendation (e.g. [16]), many researchers argue that combining deep neural networks (DNNs) with classical methods such as linear models or FMs can better learn sophisticated feature interactions [17,18,19].
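As one concrete instance of such a combination, DeepFM [17] feeds a shared embedding layer into both an FM component and a feed-forward network and adds their outputs before a sigmoid. The notation below is a generic sketch chosen here for illustration: the activation $\phi$, depth $H$, and linear read-out $\mathbf{w}_{\mathrm{out}}, b_{\mathrm{out}}$ are assumptions, and $\mathbf{a}^{(0)}$ is the concatenation of the field embeddings.

```latex
\hat{y} = \sigma\!\left(y_{\mathrm{FM}} + y_{\mathrm{DNN}}\right), \qquad
y_{\mathrm{FM}} = \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\\
\mathbf{a}^{(l+1)} = \phi\!\left(\mathbf{W}^{(l)} \mathbf{a}^{(l)} + \mathbf{b}^{(l)}\right), \quad l = 0, \dots, H-1, \qquad
y_{\mathrm{DNN}} = \mathbf{w}_{\mathrm{out}}^{\top} \mathbf{a}^{(H)} + b_{\mathrm{out}}.
```

The FM term captures low-order interactions explicitly, while the stacked layers learn higher-order interactions implicitly; sharing the embeddings means both parts are trained jointly on the same representation.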

3 The Proposed Model

In this study, we aim to combine three functionalities in a single network: mining and generating high-order feature interactions, distinguishing the importance of both implicit and explicit features, and preserving the original input information. To this end, we propose a new deep cross attention network (DCAN) model for recommendation in online knowledge-sharing services. The model takes both user-side and question-side information as input, and an embedding layer maps this information into a low-dimensional space. The embedding vectors are then fed separately into a DNN and a cross network, which mine latent information and high-order feature interactions. The outputs of the two branches are combined, and an attention network distinguishes the importance of different features. Finally, the output layer makes predictions from the weighted features.
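The data flow described above can be sketched as a single forward pass. This is a toy pure-Python sketch under explicit assumptions, not the authors' DCAN implementation: the cross branch uses the DCN-style cross layer $x_{l+1} = x_0 (x_l^\top w_l) + b_l + x_l$ from [20], the two branches are combined by concatenation, and the attention step is a hypothetical element-wise softmax over the combined units. All names (`init_params`, `dcan_forward`, etc.) and the random toy parameters are hypothetical, and embeddings are assumed to be already looked up per field.

```python
import math
import random
from typing import List

Vec = List[float]
Mat = List[Vec]


def matvec(W: Mat, x: Vec) -> Vec:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(v: Vec) -> Vec:
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def cross_layer(x0: Vec, xl: Vec, w: Vec, b: Vec) -> Vec:
    # DCN-style cross layer (assumed form): x_{l+1} = x0 * (xl . w) + b + xl
    s = sum(a * c for a, c in zip(xl, w))
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]


def init_params(in_dim: int, hidden: List[int], n_cross: int, seed: int = 0) -> dict:
    """Hypothetical small random parameters for the toy network."""
    rng = random.Random(seed)

    def rand_vec(n: int) -> Vec:
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    dnn, prev = [], in_dim
    for h in hidden:
        dnn.append(([rand_vec(prev) for _ in range(h)], rand_vec(h)))
        prev = h
    comb = prev + in_dim  # size of [deep output ; cross output]
    return {
        "dnn": dnn,
        "cross": [(rand_vec(in_dim), rand_vec(in_dim)) for _ in range(n_cross)],
        "att": [rand_vec(comb) for _ in range(comb)],
        "out_w": rand_vec(comb),
        "out_b": 0.0,
    }


def dcan_forward(field_embs: List[Vec], params: dict) -> float:
    # Embedding layer output (assumed precomputed): one vector per field.
    x0 = [v for emb in field_embs for v in emb]
    # Deep branch: fully connected ReLU layers (implicit interactions).
    h = x0
    for W, b in params["dnn"]:
        h = [max(0.0, z + bi) for z, bi in zip(matvec(W, h), b)]
    # Cross branch: stacked cross layers (explicit high-order interactions).
    c = x0
    for w, b in params["cross"]:
        c = cross_layer(x0, c, w, b)
    # Combine both branches by concatenation.
    z = h + c
    # Hypothetical attention: softmax scores over the combined units,
    # rescaled so that uniform attention leaves z unchanged.
    alpha = softmax(matvec(params["att"], z))
    weighted = [a * zi * len(z) for a, zi in zip(alpha, z)]
    # Output layer: logistic regression on the weighted features.
    logit = sum(w * v for w, v in zip(params["out_w"], weighted)) + params["out_b"]
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    fields = [[0.2, -0.1], [0.05, 0.3], [-0.4, 0.1]]  # 3 fields, embedding size 2
    params = init_params(in_dim=6, hidden=[8, 4], n_cross=2)
    print(dcan_forward(fields, params))  # a probability in (0, 1)
```

Keeping $x_0$ in every cross layer is what lets the cross branch preserve the original input while raising the interaction order by one per layer; in a real system these parameters would of course be learned, for example by minimizing binary cross-entropy.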

4 Experiments and Analysis

4.1 Evaluation Metrics and Baselines

Evaluation Metrics.

Because the task is binary classification, the first evaluation metric is the Area Under the ROC Curve (AUC), which measures how well a model distinguishes between the two labels. The second metric is the mean squared error (MSE), which directly reflects the prediction error of each model. We also compare the binary cross-entropy (log loss) of the models.
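For reference, the three metrics can be computed as in the following sketch. It is a stdlib-only illustration, not the evaluation code used in these experiments; the function names are hypothetical. AUC is computed with the rank-sum (Mann-Whitney U) formulation, giving tied scores their average rank, and the log loss clips probabilities with an assumed `eps` to avoid `log(0)`.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation; ties get average ranks."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[t] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs both positive and negative labels")
    rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mse(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error between 0/1 labels and predicted probabilities."""
    return sum((y - p) ** 2 for y, p in zip(labels, probs)) / len(labels)


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Binary cross-entropy, averaged over samples."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)
```

AUC depends only on the ranking of scores, whereas MSE and log loss also penalize poorly calibrated probabilities, so reporting all three gives complementary views of model quality.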

Baselines.

We compared our model with several state-of-the-art recommendation models, namely DeepFM [17], AutoInt [7], DCN [20], AFM [21], and FM [15]. The characteristics of these baselines are introduced in the previous sections.

4.2 Dataset

The dataset was collected from an online knowledge-sharing platform and contains around 1.8 million questions and users and more than 4 million answers to those questions. It involves nearly 10 million <question, user> pairs.

4.3 Experiment Results

The results in Table 1 show that FM and AFM have the lowest AUC values and the highest MSE scores. These two models involve only low-order feature interactions, whereas the other models also involve high-order feature interactions. This suggests that high-order (complex) feature interactions are vital in online learning resource recommendation tasks.

Table 1. Experiment results of different models

According to Table 1, our proposed model and AutoInt achieve the two highest AUC scores. Both models refine the results of high-order feature interactions through an attention mechanism [22]. This improvement indicates that different features and feature combinations are not equally important for personalized learning services, and that an attention mechanism can automatically distinguish the importance of the latent features or feature combinations generated by earlier layers of the network.
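To illustrate how an attention mechanism can weight feature combinations, the sketch below implements AFM-style attention pooling [21] over pairwise interactions: each interaction vector $v_i \odot v_j$ is scored by a one-hidden-layer network $h^\top \mathrm{ReLU}(W e + b)$, the scores are normalized with a softmax, and the interactions are summed with those weights. It is an illustrative sketch of that baseline's idea, not the DCAN attention layer; the function name and toy parameters are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def softmax(scores: Vec) -> Vec:
    m = max(scores)  # subtract the max for numerical stability
    e = [math.exp(s - m) for s in scores]
    total = sum(e)
    return [v / total for v in e]


def attention_pool(
    embs: List[Vec], W: List[Vec], b: Vec, h: Vec
) -> Tuple[Vec, Vec, List[Tuple[int, int]]]:
    """AFM-style attention pooling over all pairwise feature interactions.

    embs    : field embeddings already scaled by their feature values (v_i * x_i)
    W, b, h : attention-network parameters (hidden size t, embedding size k)
    Returns (pooled vector, attention weights, pair indices).
    """
    pairs, inter = [], []
    for i in range(len(embs)):
        for j in range(i + 1, len(embs)):
            pairs.append((i, j))
            # Element-wise product keeps one interaction vector per pair.
            inter.append([a * c for a, c in zip(embs[i], embs[j])])
    # Score each interaction: h^T ReLU(W e + b)
    scores = []
    for e in inter:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, e)) + bi) for row, bi in zip(W, b)]
        scores.append(sum(hh * z for hh, z in zip(h, hidden)))
    weights = softmax(scores)
    k = len(embs[0])
    pooled = [sum(a * e[f] for a, e in zip(weights, inter)) for f in range(k)]
    return pooled, weights, pairs
```

With all-zero attention parameters every pair receives the same weight, so the pooled vector reduces to the plain average of the interactions; learned parameters instead let informative feature pairs dominate, which is the behaviour the results above point to.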

5 Conclusions

In this study, we proposed a deep cross attention network (DCAN) for recommending personalized online learning resources to online learners. The experimental results demonstrate that our model has potential for handling complex online learning recommendation problems. More specifically, based on experiments with authentic online knowledge-sharing data, the strengths of DCAN can be summarized in two points: 1. the model can automatically mine and generate high-order feature interactions in both explicit and implicit ways; 2. the model can further distinguish the importance of different features.