
1 Introduction

Social media data are accumulating rapidly from bulletin board systems (BBS), wikis, instant messaging, blogs, and videos/images shared on channels such as Facebook, Twitter, and Sina Weibo. On the one hand, the information in these data is relatively consistent and persistent, which makes it valuable for studying how social events unfold [1, 2]. On the other hand, these data cover a wide variety of topics with complex hidden relationships. Accurately and effectively mining the hidden relationships among these topics is challenging, but the mining results are critical for identifying the root causes of public opinion and for taking timely actions in response.

The mining of topic relationships is related to research on Topic Detection and Tracking (TDT) [3,4,5]. Topic detection finds unknown topics by analyzing the relationships between new stories and known ones, clustering relevant stories into specific topics. Topic tracking identifies and follows events by monitoring the progressive relationships between a given topic and its follow-up stories; it mainly emphasizes internal relationships among multiple stories related to a single given topic. After topics are detected, there are few studies that track their progress and find relationships among new stories that span more than one topic.

Some researchers have improved the Apriori algorithm for mining association rules [6,7,8,9]. Association rules are in fact well suited to detecting relationships among multiple topics. In this paper, we propose an approach for detecting multiple-topic relationships based on parallel association rules (PARMTRD). Our main contributions include:

  (1) PARMTRD can detect unobvious but critical relationships among multiple different topics hidden beneath the superficial phenomena of events, and can thus explore the root causes of multiple events. This differs from existing work, which focuses on detecting relationships among stories within a single topic.

  (2) We improve the Apriori algorithm and apply it to find topic relationships in complex scenarios by mining association rules in parallel. After obtaining the frequent keyword sets and the association rules, we obtain the association keyword sets for each topic, from which we can select and assemble keywords to find the relevance among multiple seemingly unrelated events.

The rest of the paper is organized as follows. Section 2 presents an overview of related work. Section 3 introduces the definitions of the concepts used in this paper. Section 4 proposes the idea of parallel association rules for efficient multiple-topic relationship mining. Section 5 presents the PARMTRD method. Section 6 presents experiments and results. Section 7 concludes the paper and presents future work.

2 Related Work

A vector space model (VSM) ranks a document with respect to its similarity to a given user query. This similarity can be estimated by calculating the cosine of the angle between a document vector and a query vector [2]. However, the VSM assumes that terms are independent of each other and completely ignores the implicit relationships among the terms in a document, which leads to the loss of sequential information about keywords.

LSI (Latent Semantic Indexing) has been introduced into text representation models. For example, the topic model represented by LDA (Latent Dirichlet Allocation) [5, 10,11,12] is widely used. LDA is a three-layer Bayesian parameter model that introduces a Dirichlet prior distribution on top of PLSA (Probabilistic Latent Semantic Analysis) [13, 14]. The implicit themes of a text are modeled by a probabilistic generative model to describe the relationships among documents, topics, and words. Improved models based on LDA have been proposed; for example, the TOT (Topics over Time) model [15] includes time as an observable variable in LDA. Some models use a time window for detecting relationships among stories, such as the DTM (Dynamic Topic Model) [16, 17], CTDTM (Continuous Time Dynamic Topic Model) [18, 19], DMM (Dynamic Mixture Model) [18, 20], and OLDA (Online Latent Dirichlet Allocation) [21,22,23]. Although semantic information is introduced in these topic models, word co-occurrence relations are not explicitly considered. In addition, LDA needs to perform the sampling operation repeatedly, which increases space complexity.

Some research is based on the co-occurrence relationships of keywords [24,25,26,27,28]. Word co-occurrence refers to the fact that two or more keywords often appear together in the same part of a text, such as an article or a passage. We assume that such keywords are related, and that the higher the probability of their co-occurrence, the closer their relationship. Sayyadi and Raschid [24] proposed the KeyGraph approach based on co-occurrence relationships and demonstrated its accuracy on small data sets. Since the scenario graph visualized by KeyGraph is machine-oriented, Wang et al. [26] proposed a human-oriented algorithm called IdeaGraph. Neither method takes semantic relations between keywords into account, so Chen et al. [28] combined LDA with KeyGraph, proposing a hybrid term-term relationship analysis approach. In addition, Li et al. [27] and Zhao et al. [25] applied the word co-occurrence graph to microblog topic detection, demonstrating the effectiveness of this method. Although word co-occurrence expresses the semantic relations between words to a certain degree, it is mainly concerned with the co-occurrence relationship between two keywords, ignoring the practical case in which multiple words co-occur. Detecting relationships among multiple topics requires relationships among multiple keywords. Therefore, we propose a method for topic relationship detection based on parallel association rules.

3 Concept Definitions

Definition 1.

Support. Support indicates how frequently an itemset appears in a dataset \( I \); in TDT, it reflects the hotness of a keyword set. An association rule is an implication of the form \( A \Rightarrow B \), where \( A,B \subseteq I \). The support of the itemset \( \{ A,B\} \) is defined as (1):

$$ sup(A \Rightarrow B) = P(A \cup B) = num(A \cup B)/num(I) $$
(1)

\( sup(A \Rightarrow B) \) is the support of the itemset \( \{ A,B\} \); \( P(A \cup B) \) is the probability that the itemset \( \{ A,B\} \) occurs in the dataset \( I \); \( num(A \cup B) \) is the number of occurrences of the itemset \( \{ A,B\} \) in \( I \); and \( num(I) \) is the number of records in the dataset.

Definition 2.

Confidence. Confidence indicates the probability that \( B \) occurs given that \( A \) occurs. In TDT, it expresses how often the items of \( A \) and \( B \) appear together in a record of the dataset \( I \), given that the record contains \( A \). The confidence of an association rule \( A \Rightarrow B \) is defined as (2):

$$ conf(A \Rightarrow B) = P(B\left| A \right.) = \,sup(A \cup B)/sup(A) = num(A \cup B)/num(A) $$
(2)

where \( num(A \cup B) \) is the number of occurrences of the itemset \( \{ A,B\} \) in the dataset \( I \), and \( num(A) \) is the number of occurrences of the itemset \( \{ A\} \) in the dataset \( I \).
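To make Definitions 1 and 2 concrete, the following minimal Python sketch computes support and confidence over a toy set of keyword records; the records and values are invented for illustration and are not taken from our corpus.

```python
# Toy records: each one is the keyword set extracted from one story.
dataset = [
    {"subway", "explosion", "attack"},
    {"subway", "explosion", "terror"},
    {"subway", "church"},
    {"explosion", "church", "attack"},
]

def support(itemset, records):
    """Eq. (1): sup(A => B) = num(A u B) / num(I)."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Eq. (2): conf(A => B) = num(A u B) / num(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, records) / support(antecedent, records)

print(support({"subway", "explosion"}, dataset))       # 0.5
print(confidence({"subway"}, {"explosion"}, dataset))  # 0.666...
```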

Definition 3.

Candidate k_itemset. In TDT, a candidate k_itemset is an itemset containing \( k \) \( (k = 1,2, \ldots ,n) \) keywords. The i-th candidate k_itemset is denoted by \( c_{k} [i] = \{ w_{1} [i],w_{2} [i], \ldots ,w_{k} [i]\} \), where \( w_{1} [i] \) is the first keyword in \( c_{k} [i] \). The set \( C_{k} = \{ c_{k} [1],c_{k} [2], \ldots ,c_{k} [i]\} \) contains all candidate k_itemsets in the dataset \( I \), where \( i \) is the number of candidate k_itemsets in \( C_{k} \).

Definition 4.

Frequent k_itemset. A frequent k_itemset is a set of \( k \) keywords whose frequency exceeds a given support threshold. The j-th frequent k_itemset is denoted by \( l_{k} [j] = \{ w_{1} [j],w_{2} [j], \ldots ,w_{k} [j]\} \). The set \( L_{k} = \{ l_{k} [1],l_{k} [2], \ldots ,l_{k} [j]\} \) contains all frequent k_itemsets in the dataset \( I \), where \( j \le i \).

Definition 5.

Association k_itemset. An association k_itemset is a special frequent k_itemset for which the confidences of all its association rules exceed a given threshold. The h-th association k_itemset is denoted by \( a_{k} [h] = \{ w_{1} [h],w_{2} [h], \ldots ,w_{k} [h]\} \). The set \( AS_{k} = \{ a_{k} [1],a_{k} [2], \ldots ,a_{k} [h]\} \) contains all association k_itemsets in the dataset \( I \), where \( h \le j \).
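On the same toy records, the containment chain implied by Definitions 3-5 (every association k_itemset is frequent, and every frequent k_itemset is a candidate) can be sketched as follows, reusing the `support` and `confidence` helpers above; the thresholds are again illustrative.

```python
# Reuses `dataset`, `support`, and `confidence` from the sketch above.
C2 = {frozenset(p) for p in [("subway", "explosion"),
                             ("subway", "church"),
                             ("explosion", "attack")]}  # candidate 2_itemsets

# Definition 4: frequent 2_itemsets are candidates whose support meets min_sup.
L2 = {c for c in C2 if support(c, dataset) >= 0.5}

# Definition 5: association 2_itemsets are frequent sets whose rules
# all meet min_conf (for k = 2 the rules are {w1} => {w2} and {w2} => {w1}).
AS2 = {l for l in L2
       if all(confidence({w}, l - {w}, dataset) >= 0.6 for w in l)}

assert AS2 <= L2 <= C2  # AS_k is a subset of L_k, which is a subset of C_k
```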

4 Parallel Association Rules for Mining Association Keyword Sets

To improve the performance of mining topic relationships, we propose parallel association rules. Compared with the traditional Apriori algorithm, parallel association rules have two advantages:

  • Parallel association rules improve the mining speed of frequent keyword sets by processing public opinion data in parallel based on the MapReduce paradigm, which makes them more suitable for big data processing.

  • Parallel association rules introduce the concept of association keyword sets. By calculating the confidence of the frequent keyword sets from each intermediate step, we obtain the association keyword sets in which all association rules satisfy the confidence threshold, capturing important hidden information that the Apriori algorithm ignores.

Parallel association rules divide a computation task into \( N \) separate subtasks, each of which handles \( 1/N \) of the work. Based on \( L_{1} \), each subtask carries out the iteration from \( AS_{k - 1} \) to \( L_{k} \) for its assigned (k-1)_item association keyword sets. The global variable \( L_{k} \) is obtained by combining the results of all the subtasks and removing duplicates; from it, the association rules are derived and all k_item association keyword sets are obtained. On this basis, the next iteration is performed until \( AS_{k + 1} \) is empty. The process for obtaining the association keyword sets of one topic is shown in Fig. 1.

Fig. 1. Flow chart of association keyword set acquisition by parallel association rules

4.1 K_Item Frequent Keyword Set Mining

The acquisition of \( L_{k} \) consists of three steps. First, the \( L_{1} \) of each topic is taken as the premise of each iteration for that topic. Second, the \( AS_{k - 1} \) of each topic is divided into \( N \) subtasks, which independently form \( C_{k} \). Finally, the global variable \( L_{k} \) is obtained from the results of the subtasks.

1_item Frequent Keyword Set Mining

Using the TOP keywords of all topics, we filter the corresponding data so that the data we obtain contain all the details of the popular topics. The specific steps for obtaining the \( L_{1} \) of each topic are as follows:

First, the collection of candidate 1_item keyword sets is composed of all the keywords of each topic-related dataset. From Definition 3 in Sect. 3, the candidate 1_item keyword set at position \( i \) is \( c_{1} [i] \), and \( C_{1} = \{ c_{1} [1],c_{1} [2], \ldots ,c_{1} [t]\} \), where \( t \) is the number of all keywords in the data for each topic.

Second, we scan each topic-related dataset, counting the frequency \( num(c_{1} [i]) \) of each \( c_{1} [i] \). Derived from (1), the support of \( c_{1} [i] \) is calculated as follows:

$$ sup\_c_{1} [i] = num(c_{1} [i])/num(I) $$
(3)

Finally, we set the support threshold \( min\_sup \). If \( min\_sup \le sup\_c_{1} [i] \), then \( c_{1} [i] \) is added to \( L_{1} \); otherwise it is discarded. Thus, \( L_{1} = \{ l_{1} [1],l_{1} [2], \ldots ,l_{1} [j]\} \), \( j \le i \).
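A minimal sketch of this step, where `stories` stands for one topic's keyword records and `min_sup` is the chosen threshold (0.13 in the experiments of Sect. 6):

```python
from collections import Counter

def mine_L1(stories, min_sup):
    """Eq. (3): keep every keyword whose document frequency divided by
    the number of stories reaches min_sup."""
    n = len(stories)
    counts = Counter(w for story in stories for w in set(story))
    return {frozenset([w]) for w, c in counts.items() if c / n >= min_sup}
```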

Generating Candidate k_item Keyword Sets

The generation of candidate keyword sets includes a joining step and a pruning step. The joining step divides \( AS_{k - 1} \) into \( N \) separate subtasks, each of which consists of one or \( m \) non-repeating \( a_{k - 1} \), where the value of \( m \) is determined by the number of \( a_{k - 1} \). We then combine each \( a_{k - 1} \) with each \( l_{1} \) one by one, independently generating \( C_{k} \) in each subtask. The pruning step relies on the prior knowledge that all non-empty subsets of a frequent keyword set must also be frequent: it matches every subset of each \( c_{k} \) in \( C_{k} \) against all x_item association keyword sets \( (1 \le x \le k - 1) \), pruning any \( c_{k} \) that does not satisfy the prior knowledge and thereby obtaining the \( C_{k} \) used to generate frequent keyword sets.

For example, the candidate 3_item keyword set {St.Petersburg, subway, explosion} of "the explosion of St.Petersburg" has the 2_item subsets {St.Petersburg, subway}, {St.Petersburg, explosion}, and {subway, explosion}, and the 1_item subsets {St.Petersburg}, {subway}, and {explosion}. If any of these subsets fails to match the 2_item association keyword sets in \( AS_{2} \) or the 1_item frequent keyword sets in \( L_{1} \), the candidate 3_itemset cannot be a frequent keyword set according to the prior knowledge and is pruned.
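The following sketch illustrates the join and prune steps; for brevity it checks only the (k-1)_item subsets, whereas the description above matches all x_item subsets, and the function names are illustrative.

```python
from itertools import combinations

def gen_candidates(AS_prev, L1):
    """Join each a_{k-1} in AS_{k-1} with each l_1 in L_1, then prune
    candidates that have a (k-1)-subset outside AS_{k-1}."""
    joined = {a | w for a in AS_prev for w in L1 if not w <= a}
    return {c for c in joined
            if all(frozenset(s) in AS_prev
                   for s in combinations(c, len(c) - 1))}

# The pruning example from the text:
AS2 = {frozenset(p) for p in [("St.Petersburg", "subway"),
                              ("St.Petersburg", "explosion"),
                              ("subway", "explosion")]}
L1 = {frozenset([w]) for w in ("St.Petersburg", "subway", "explosion")}
print(gen_candidates(AS2, L1))
# -> {frozenset({'St.Petersburg', 'subway', 'explosion'})}
```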

Generating k_item Frequent Keyword Sets

The process of obtaining the global variable \( L_{k} \) from the pruned \( C_{k} \) of each subtask is similar to obtaining \( L_{1} \). The specific steps are as follows:

First, we scan the data set of the corresponding topic, counting the frequency \( num(c_{k} [i]) \) of each \( c_{k} [i] \). Derived from (1), the support of \( c_{k} [i] \) is as follows:

$$ sup\_c_{k} [i] = num(c_{k} [i])/num(I) $$
(4)

Second, we compare \( sup\_c_{k} [i] \) with the preset threshold \( min\_sup \). If \( min\_sup \le sup\_c_{k} [i] \), then the \( c_{k} [i] \) corresponding to \( sup\_c_{k} [i] \) is added to \( L_{k} \) and denoted \( l_{k} [j] \); otherwise \( c_{k} [i] \) is discarded. Thus, \( L_{k} \) is independently generated in each subtask.

Finally, the global variable \( L_{k} \) is obtained by combining the results of the \( N \) separate subtasks and removing duplicates.
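The per-subtask counting and the final merge can be sketched as follows; the MapReduce machinery is elided, and `subtask_results` simply stands for the local \( L_{k} \) sets returned by the \( N \) subtasks.

```python
def mine_Lk(candidates, stories, min_sup):
    """Eq. (4): keep the candidates whose support meets min_sup."""
    n = len(stories)
    return {c for c in candidates
            if sum(c <= set(s) for s in stories) / n >= min_sup}

def merge_subtasks(subtask_results):
    """Global L_k: the deduplicated union of the subtasks' local L_k
    (frozensets make duplicate removal automatic)."""
    global_Lk = set()
    for local_Lk in subtask_results:
        global_Lk |= local_Lk
    return global_Lk
```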

4.2 Association Keyword Set Mining

Support reflects how hotly a keyword set is discussed in the public opinion dataset, while confidence reflects the strength of the relationship among the keywords within a keyword set. Thus, the support and confidence of a keyword set directly indicate its relationship to the topic. We can filter out the association keyword sets that satisfy both the support and confidence thresholds, and then find potential relations by selecting and assembling the obtained association keyword sets. The specific steps to get the k_item association keyword sets are as follows:

First, we calculate the confidence of all association rules. Each \( l_{k} [j] \) in the global variable \( L_{k} \) can generate multiple association rules. Let \( l_{k} [j_{[s]} ] \) be the keyword set consisting of \( s \) keywords of \( l_{k} [j] \), where \( 1 \le s < k \), and let \( l_{k} [j_{[k - s]} ] \) be the set of the remaining keywords. Derived from (2), the confidence of the association rule \( l_{k} [j_{[s]} ] \Rightarrow l_{k} [j_{[k - s]} ] \) is as follows:

$$ conf(l_{k} [j_{[s]} ] \Rightarrow l_{k} [j_{[k - s]} ]) = \,sup\_l_{k} [j]/sup\_l_{k} [j_{[s]} ] $$
(5)

where \( sup\_l_{k} [j] \) is the support of \( l_{k} [j] \), and \( sup\_l_{k} [j_{[s]} ] \) is the support of the keyword set consisting of \( s \) keywords of \( l_{k} [j] \).

Second, we set the confidence threshold \( min\_conf \). If \( min\_conf \le conf(l_{k} [j_{[s]} ] \Rightarrow l_{k} [j_{[k - s]} ]) \), then the association rule \( l_{k} [j_{[s]} ] \Rightarrow l_{k} [j_{[k - s]} ] \) is saved; otherwise it is discarded.

Finally, we check whether all the rules of \( l_{k} [j] \) satisfy the given confidence threshold. If so, \( l_{k} [j] \) is added to \( AS_{k} \); if not, it is discarded.
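Combining Eq. (5) with the two filtering steps, a frequent keyword set is promoted to an association keyword set only if every split into antecedent and consequent passes \( min\_conf \). A minimal sketch:

```python
from itertools import combinations

def sup(itemset, stories):
    return sum(set(itemset) <= set(s) for s in stories) / len(stories)

def is_association_set(l_k, stories, min_conf):
    """Check every rule l_k[j_[s]] => l_k[j_[k-s]] for 1 <= s < k."""
    for s in range(1, len(l_k)):
        for antecedent in combinations(l_k, s):
            # Eq. (5): conf = sup(l_k[j]) / sup(l_k[j_[s]])
            if sup(l_k, stories) / sup(antecedent, stories) < min_conf:
                return False
    return True
```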

5 Topic Relationship Detection Using Parallel Association Rules

We propose the PARMTRD method to detect relationships among multiple topics. First, PARMTRD selects the public opinion data related to each topic. Second, PARMTRD applies parallel association rules to the data of each topic to obtain its association keyword sets, processing the multiple topics in parallel. Finally, PARMTRD uncovers the hidden relationships among the topics by selecting and assembling the association keyword sets.

The PARMTRD algorithm is described as follows:

Algorithm 1. PARMTRD

Input: the relevant data sets for all topics

Output: the association keyword sets for all topics

  (1) Scan the relevant datasets of all topics and filter out the dataset corresponding to each topic according to its TOP keyword.

  (2) Obtain the \( L_{1} \) of each topic that satisfies \( min\_sup \) by counting the frequency of each of the topic's keywords in the corresponding dataset. At this point \( k = 1 \).

  (3) Treat the processing of each topic as one subtask. Copy the \( L_{1} \) corresponding to each topic into its subtask, and perform steps 4 to 8 for each subtask.

  (4) Set \( k = k + 1 \). \( AS_{k - 1} \) is divided into \( N \) subtasks, which independently form \( C_{k} \).

  (5) Obtain \( L_{k} \) by counting the frequency of each \( c_{k} \) in \( C_{k} \) and keeping those that satisfy \( min\_sup \).

  (6) Obtain the global variable \( L_{k} \) by combining the results of the \( N \) separate subtasks and removing duplicates, then generate the association rules.

  (7) Obtain \( AS_{k} \) by selecting the \( l_{k} \) whose association rules all satisfy \( min\_conf \).

  (8) Repeat steps 4 to 7 until \( AS_{k + 1} \) is empty, and record the maximum size of an association keyword set as \( n \).

  (9) Obtain all association keyword sets by combining the n_item association keyword sets and removing duplicates; the related information from the multiple topics is then used to detect their potential relevance, where \( 2 \le n \le k \).
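Assuming the helper functions sketched in Sect. 4 (`mine_L1`, `gen_candidates`, `mine_Lk`, `is_association_set`), the core of Algorithm 1 condenses to the loop below. A process pool stands in for the MapReduce setting, and the further split of each topic's \( AS_{k - 1} \) into \( N \) subtasks is elided for brevity.

```python
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def parmtrd_topic(stories, min_sup, min_conf):
    """Mine all association keyword sets of one topic (steps 2, 4-8)."""
    L1 = mine_L1(stories, min_sup)                 # step (2)
    AS_prev, all_AS = L1, set()
    while AS_prev:
        Ck = gen_candidates(AS_prev, L1)           # step (4)
        Lk = mine_Lk(Ck, stories, min_sup)         # steps (5)-(6)
        AS_prev = {l for l in Lk                   # step (7)
                   if is_association_set(l, stories, min_conf)}
        all_AS |= AS_prev                          # repeat until empty, step (8)
    return all_AS

def parmtrd(topic_datasets, min_sup, min_conf):
    """Step (3): run the per-topic mining as parallel subtasks."""
    worker = partial(parmtrd_topic, min_sup=min_sup, min_conf=min_conf)
    with ProcessPoolExecutor() as pool:
        return list(pool.map(worker, topic_datasets))
```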

6 Experiments and Evaluation

6.1 Experimental Data

A web spider was used to collect news stories on 7 topics from 2017/4/1 to 2017/4/28, with 50 to 300 stories per topic. We used Ansj to extract 10, 15, 20, and 25 keywords from each story separately, and found that extracting 15 keywords best expresses a topic. We therefore selected 15 keywords from each story as experimental data. The dataset for the topics is shown in Table 1.

Table 1. Dataset for topic relationship mining

6.2 Evaluation

Set the Threshold of Support

Taking the topic "US military strike Syria" as an example, the association keyword sets for different support thresholds are shown in Table 2, with \( min\_conf = 0.60 \).

Table 2. Association keyword sets under different support thresholds

We can see that when \( min\_sup \) is between 0.10 and 0.13, the association keyword sets contain all the necessary information about the topic. When \( min\_sup = 0.14 \), the association keyword sets {strike, military} and {United States, strike} are lost, so some information about the topic is missed. When \( min\_sup = 0.09 \), an additional association keyword set such as {terror, terrorism} appears; it has no relationship with the other association keyword sets, so it is not necessary information about the topic. We therefore take 0.13 as the support threshold in the experiments.

Set the Threshold of Confidence

We randomly pick the keywords of two topics as experimental data and set 0.12, 0.15, and 0.18 as support thresholds. We study how the association keyword sets are influenced by different support thresholds and by the number of stories. Table 3 presents the support thresholds for different numbers of stories, and Table 4 shows the resulting association keyword sets. From Table 3 we obtain the trend graph (Fig. 2), which shows the relationship between the confidence thresholds and the number of stories.

Table 3. Support thresholds for different numbers of stories
Table 4. Association keyword sets for different numbers of stories
Fig. 2. Trend graph of the relationship between confidence thresholds and the number of stories

Increasing the number of stories increases the number of association keyword sets for a topic. To filter out redundant keyword sets, we should therefore reduce the confidence threshold as the number of stories increases.

When the dataset is fixed, increasing the support threshold filters out some valuable keyword sets, whereas reducing it yields redundant association keyword sets. Thus, for the same number of stories, the polylines with higher support thresholds always lie below those with lower ones. That is, the higher the support threshold, the lower the confidence threshold should be, and vice versa.

6.3 The Results of Topic Relationship Detection

This experiment takes the "explosion" theme as an example, with three topics: the explosion in St. Petersburg, the explosion at a church in Egypt, and the explosion targeting the bus of a German football team. The association keyword sets of each topic are obtained from the public opinion dataset. Table 5 presents the specific parameter settings and the experimental results.

Table 5. The parameter setting and experimental results

We treat each keyword in the association keyword sets as a data node (repeated keywords share a single node). Keywords in the same association keyword set are linked, building up the relationships among the 3 topics. Figure 3 shows the topology of the relationships among the three "explosion" topics.

Fig. 3. The relationship topology for the three "explosion" topics mined by PARMTRD

From Table 5 and Fig. 3 we can see that there are obvious relationships among the three topics under the "explosion" theme. All three topics include the keywords "explosion", "attack", "happen", "Islamic State", and "terror", suggesting that the three events may all be related to attacks by the terrorist organization "Islamic State". In addition, the keywords {unidentified, Islamic State} in the bus explosion topic also point to the actual cause of that case.

6.4 Comparison and Evaluation

To verify that PARMTRD can accurately and efficiently detect relationships among multiple topics, we compare it with the word co-occurrence graph [25].

The word co-occurrence graph is based on words co-occurring in the same time slice. First, we divide the data by time slice. Second, to obtain a keyword set for each time slice, we select keywords according to their frequency in the current time slice and in the previous time slice. The topic keyword sets are then obtained by integrating the keyword sets of all time slices. Finally, we calculate the word co-occurrence value for each pair of keywords.

If the co-occurrence value of a pair exceeds a given threshold, we add a link between the two keywords. The experimental results are shown in Fig. 4, where each connected subgraph is a cluster representing one topic.
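For reference, the baseline can be sketched as follows; the use of networkx is our assumption, since the cited work does not prescribe a library, and each connected component is read as one topic cluster.

```python
from collections import Counter
from itertools import combinations
import networkx as nx

def cooccurrence_clusters(stories, min_cooc):
    """Count how often each keyword pair co-occurs in a story; link the
    pairs above the threshold and return the connected components."""
    pair_counts = Counter()
    for story in stories:
        pair_counts.update(combinations(sorted(set(story)), 2))
    g = nx.Graph()
    g.add_edges_from(p for p, c in pair_counts.items() if c >= min_cooc)
    return list(nx.connected_components(g))
```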

Fig. 4. Keyword co-occurrence network graph

In Fig. 4, the keywords of the word co-occurrence graph are clearly divided into three clusters, in sharp contrast to Fig. 3. Each cluster represents one topic, and the clusters have no links to each other, meaning the method fails to detect the relationships among the topics. As shown in Table 6, PARMTRD obtains more keywords than the word co-occurrence method and each node has a higher degree, which means PARMTRD extracts more information about the topics and more relevance among them.

Table 6. Comparison between PARMTRD and Word co-occurrence

7 Conclusion and Future Work

Mining topic relationships from social big data is valuable for understanding the originating sources of specific events, yet research in this direction is lacking. This paper proposes a mining approach for multiple-topic relationship detection based on parallel association rules, called PARMTRD. It mines the association keyword sets of multiple topics with parallel association rules from public opinion data of low value density. By selecting and assembling the association keyword sets, it obtains related information from multiple topics and detects their potential relevance. The experiments show that our approach can discover the root causes of multiple events, yielding valuable information that existing approaches cannot mine.

Our future work focuses on the tracking and early warning of multiple topics. We aim to grasp the dynamic development and evolution trends of multiple topics over time in the presence of topic drift, and then predict the unknown public opinion behind them.