
1 Introduction

The service sector stands out in the 21st-century economy as an option for consolidating the production structures of countries that lack a broad industrial base. Although this process is not new, the service sector has entered an era in which social media has become a benchmark for the features, quality, and customer perception of services, as users of Twitter, Facebook, and other social networks express their opinions, desires, perceptions, and feelings or simply post about their experience with service companies. This document establishes a methodology for the service sector and evaluates two sub-sectors that are antagonistic but that, like other services, can use social media to define patterns for connecting with their target market. The perception of a service is no longer manifested only at the place of provision; the dynamics of the market and the perceptions of users can now be understood in one place: the web. Access is not exclusively through a computer but also from a mobile phone. This mobility makes it easier for people to interact in real time with the company that provides a service, so the feeling is perceived immediately and without a filter.

The literature offers several approaches to analyzing sentiment and opinion formation. Web-based analysis of feelings and opinions describes the different situations that users present when they interact in social media [1,2,3,4]. Additionally, emotion analysis uses algorithmic techniques [5,6,7,8] and examines the processes through which anxiety, emotions, and passions can direct and influence social media behavior [9, 10]. However, these approaches are not always linked to specific industries, and even less to companies that have treated their presence in social networks as a channel for information and, occasionally, for marketing. In some countries, competitive intelligence processes have not been consolidated because social media data mining, sentiment analysis, and opinion analysis are seldom practiced. Competitive intelligence, as part of knowledge management [11], is not yet an active part of the strategies of many companies, and social media data are not being fully exploited.

Consequently, companies that need to support their competitive intelligence tasks require appropriate data visualization tools and a process for extracting knowledge from massive data in a reasonable time. The objective of this document is to identify non-explicit patterns, trends, and interactions in social media data. To do so, it uses data mining to extract non-trivial information from large volumes of data, together with machine learning techniques and data analytics, so that service companies can build competitive intelligence as part of their digital strategy and make intelligent use of the data produced and disseminated in social media. The research is based on a heuristic: the interaction of users in the social media of the service sector admits several possible solutions, which are used to evaluate guidelines and define a competitive intelligence strategy based on social media data. The document is organized as follows: Section 2 presents the methodology, the data, and the model; Section 3 presents the results; and Section 4 offers a discussion and final remarks.

2 Method

2.1 Data and Competitive Intelligence

The democratization of the Web and people's desire to participate more actively in social media gave rise to customers' interest in expressing the opinions and emotions that a given situation provokes. Users are motivated to participate based on their interests. Consequently, social networks became the gathering point for individual opinions and ideas about an issue, bringing them together regardless of the distance or geographical location where they were generated. They consolidated the information that integrates a society [1] and created a digital community. This could not be achieved previously because geographical, spatial, cultural, and linguistic limitations made communication between communities more difficult.

Likewise, companies have developed social media strategies as an initial part of a marketing process and use them as a new way of disclosing relevant facts about their organizations and, sometimes, about their products or services, but not at the speed seen in developed markets. In developing markets, companies have not built a competitive intelligence structure that allows them to improve their market position, to draw on user interaction for feedback about the brand, service quality, and their corrective measures when difficulties arise, or to develop strategies for offering new products.

However, defining competitive intelligence from data first requires evaluating social media content, in which opinions and feelings are grouped so that they can be modeled using text mining and data mining schemes for social networks [12, 13]. Subsequently, competitive intelligence is defined as a continuous cycle [14] in which planning, collecting, processing, analyzing, disseminating, interacting, and giving feedback are the processes that can optimize it [15, 16]. Finally, integrating data visualization with competitive intelligence helps determine which tools and best practices can be applied to the selected sectors on the basis of Web analysis [11, 17, 18].

2.2 Data

The data collection is defined by several aspects. First, the analysis mines data in monthly periods from the official accounts of some service companies in Colombia. Second, it focuses on the main banks and some of the most important universities. Third, the social media activity of the universities intensifies at the beginning of the academic term, while that of the banks is continuous throughout the year. This data collection makes it possible to identify how the selected companies interact in social media with respect to brand analytics and service quality; in other words, it supports opinion mining. The collected data show that social media users tend to act as a block and that a single text can sometimes trigger a contagion that dissipates over time. This contagion can considerably affect opinion about the brand and the quality of its products and services, while in other cases the text remains an individual reaction. In some cases, the contagion takes the form of information clusters that also dissipate. Additionally, although Twitter offers hashtags for following the trends that arise at a specific time [19], the data show that neither the selected companies nor their clients frequently use hashtags in their social media communication; as a result, the network of terms is fragmented, as shown in Figs. 1c and 2c.

Fig. 1. Data visualization of education sector. (a) Word cloud. (b) Frequent words diagram. (c) Network of terms. (Source: R Studio results)

Fig. 2. Data visualization of banking sector. (a) Word cloud. (b) Frequent words diagram. (c) Network of terms. (Source: R Studio results)

This is one of the findings of the data collection, and it is reflected in the mining results and their subsequent visualization. It leads to another element, the information ecology of social media, which identifies the parameters of user interaction on Twitter and Facebook and makes it possible to define how populations form, perform, and interact [20]; that is, a dynamic topology that requires competitive intelligence to be in permanent, data-driven transformation. However, data mining reveals the shallowness of the information ecology in the selected companies and in the way they develop their social media strategy.

2.3 Model

From the interaction of service companies in social media, we performed text mining in which a representative set of words is incorporated to properly identify the processes of perception and, additionally, the likes and retweets. We use a modified N-gram model that allows better recognition of the terms, defined comments, variables, and parameters previously established in the data mining process. For the analysis, we developed scripts in R and a Java application to carry out the tasks of the project: downloading information from social networks (Twitter and Facebook), pre-processing, initial statistical analysis, model generation, and visualization of the data mining results. After pre-processing, the texts can be filtered according to different criteria; for example, we can select tweets from a specific user or group, from a given date range, or containing certain terms. To select the most relevant words, attribute selection algorithms were used to determine the utility and value of each attribute [21]. Likewise, we used filter algorithms, which evaluate attributes independently of the learning algorithm, and wrapper algorithms, which use the performance of the learning algorithm to determine the desirability of a set of attributes [22]. These algorithms provide a way to rank the attributes derived from searches in textual data, that is, in text classification or text categorization [23, 24]. Then:

$$ H(C) = - \sum_{c \in C} p(c) \log_{2} p(c) \tag{1} $$
$$ H(C \mid A) = - \sum_{a \in A} p(a) \sum_{c \in C} p(c \mid a) \log_{2} p(c \mid a) \tag{2} $$

If A is an attribute and C is the class, Eqs. (1) and (2) define the entropy of the class before and after observing the attribute. The amount by which the entropy decreases, $H(C) - H(C \mid A)$, reflects the additional information about the class provided by the attribute and is called information gain; each attribute $A_i$ is assigned a score based on the gain obtained [22, 25].
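As a minimal sketch of how Eqs. (1) and (2) turn into an attribute score, the following Python snippet computes the class entropy, the conditional entropy, and their difference. It is written in Python for brevity rather than in the R and Java tooling mentioned above. The function names and the toy data (a hypothetical binary attribute marking whether a tweet contains a given term, with hand-assigned sentiment labels) are assumptions for illustration only, not the authors' implementation or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy H(C) of Eq. (1), in bits."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def conditional_entropy(attribute_values, labels):
    """Conditional entropy H(C|A) of Eq. (2), in bits."""
    total = len(labels)
    # Group the class labels by the attribute value they co-occur with.
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weight the entropy of each group by p(a).
    return sum(
        (len(group) / total) * entropy(group) for group in groups.values()
    )


def information_gain(attribute_values, labels):
    """Decrease in class entropy after observing the attribute."""
    return entropy(labels) - conditional_entropy(attribute_values, labels)


# Hypothetical toy data: does the tweet contain a given term, and which
# sentiment label was assigned to it?
contains_term = [True, True, True, False, False, False]
sentiment = ["neg", "neg", "neg", "pos", "pos", "neg"]

gain = information_gain(contains_term, sentiment)
print(round(gain, 3))  # 0.459
```

In this toy case the class entropy is about 0.918 bits and observing the attribute halves it, because every tweet containing the term is negative. Ranking all candidate terms by this score is one way to read the attribute selection step described above.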

3 Results

Data visualization is usually the best way to understand and analyze the results of data mining, compared to other techniques, because it conveys the extracted knowledge in a simple form [26, 27]. Neural visualization techniques for mapping the data are suited to social media interaction because of the large amount of data generated under the parameters and variables set in the data mining process; they make it possible to classify this data and, in turn, to map comparative opinions and feelings towards the selected organizations [28, 29]. Visualizing the data makes it possible to understand rapid changes in the market and in the customer experience and, on that basis, to articulate a more agile strategy that takes advantage of real-time information and identifies the events that can influence the company's performance. Comparing two antagonistic service sub-sectors, banking and higher education, yielded the following general results.

3.1 Higher Education

In the educational sector, Educational Data Mining (EDM) has emerged as a paradigm oriented to designing models, tasks, methods, and algorithms for exploring data from educational settings. EDM seeks to discover patterns and make predictions that characterize learners' behaviors and achievements, domain knowledge content, assessments, educational functionalities, and applications. In this sector we studied the possibility of using social media analysis to propose new and relevant university programs. In the last twenty years, the university environment has come under new pressure from competition among universities, new financing models, and the introduction of business methodologies in educational systems [30]. For these reasons, some universities need to adopt new tools that address strategy, competitive advantage, and information systems and that make it possible to understand the educational environment [31]. Additionally, in the selected universities, patterns were identified at the beginning of the academic term in different periods, when students connect more with the university and its academic activities and not exclusively with claims (see Fig. 1a). On the other hand, the universities come closer to a greater use of hashtags.

3.2 Banking

The most important banks were selected and, through data mining in different periods, several patterns were found that can be illustrated with Colombia's largest bank. Figure 2a shows the word cloud generated from 1283 tweets during October 2017, which concentrates mainly on transactional issues and on the account's role as a customer service channel rather than on a product strategy or an accurate identification of the different customer perceptions. There is a tendency towards negative comments because the account is used as a customer service channel and not as part of a digital service strategy.

4 Discussion and Final Remarks

This work identifies brand and opinion elements concerning the services and products of some companies in the service sector in Colombia. Several text analytics alternatives were proposed to better identify the sentiment towards the selected companies. The results were not very encouraging: the strong inclination towards negative comments indicates that companies are not taking advantage of social media to improve their competitive strategy and that social media has instead become a customer service space. In our opinion, the use of social media is reactive, and only in very few cases is it possible to identify elements that encourage competitive intelligence. The selected service companies show a similar pattern: there is no strategic use of social media, except for some moments when universities achieve greater interaction with users (likes, retweets). At the current stage of our project, several activities remain: first, establish a process to determine which hashtags are used most frequently; second, sort the different tweet topics using clustering approaches; third, find the most frequent bigrams to generate a graph visualization; fourth, show how the different terms in the analyzed texts can be associated; and finally, find a causal relation, represented graphically, between social media content and new proposals for university programs. Future work will also extend the periods and the set of selected companies, making it possible to refine the patterns identified here and to verify the information ecology of social media for service companies and the networks generated by the proper use of a digital strategy.
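The third item of future work listed above, finding the most frequent bigrams as input for a graph visualization, could be approached roughly as in the following Python sketch. The tokenizer, the function names, and the sample tweets are hypothetical and chosen only for illustration; a real pipeline would also need the pre-processing described in Section 2.3 (for example, stop-word handling), which is omitted here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (including Spanish ones)."""
    return re.findall(r"[a-záéíóúñü]+", text.lower())


def bigram_counts(texts):
    """Count adjacent word pairs across all texts."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip pairs each token with its successor, yielding the bigrams.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical sample tweets, not taken from the collected data.
sample_tweets = [
    "The mobile app is down again",
    "Mobile app login fails after the update",
    "Please fix the mobile app",
]

# Each ((first, second), count) entry can serve as a weighted edge
# first -> second in a network of terms.
top_bigrams = bigram_counts(sample_tweets).most_common(3)
print(top_bigrams[0])  # (('mobile', 'app'), 3)
```

Treating each bigram as a weighted edge keeps the counting step independent of whichever graph library is later used for the visualization.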