
1 Introduction

Acquiring and maintaining situational awareness (SAW) demands the characterization of the current situation, that is, of what is going on with the entities of interest in the environment.

The process of situation assessment typically starts with objects assessment routines (also known as acquisition of data for situation assessment), which perform an initial analysis of the input data to find the first entities, their attributes and the relationships among them that may be of interest to start the SAW process. In the emergency management domain, such analysis may be triggered by a crime report made by a victim or a witness (Human Intelligence, HUMINT).

To accomplish this, HUMINT data must be retrieved and compared with known information so that systems can identify useful incoming information. However, HUMINT data are typically imperfect because of several internal and external factors, ranging from the reporter’s level of stress to variables of the physical environment in which the reporter is located. Such imperfections, inherent to the information, may introduce uncertainty into the minds of the humans who rely on it.

To mitigate the problem of dealing with imperfect reports from humans, data must be qualified, and the resulting quality metadata must be used as criteria for mining and processing.

This paper presents the organization of an objects assessment process that uses Natural Language Processing (NLP) techniques to better acquire and process information from humans for the benefit of SAW.

The paper is organized as follows: Introduction, Related Work, The Process of Object Assessment using NLP and Information Quality Awareness, Conclusion and References.

2 Related Work

Identifying entities of interest in human reports is a challenging task. Acquiring terms during NLP is not just a matter of retrieving and matching a specific word or sequence of words against a data model; it also involves assigning meaning to sentences in order to obtain more useful information that helps to describe objects of interest, their relationships and their statuses.

Regarding semantic approaches, Reckman et al. [8] show a way to take natural speech and attribute meaning to the collected words using machine learning. Using a virtual restaurant game, they developed a technique to analyze words that were considered unknown and to associate meaning with them. Hence, they were able to develop patterns to identify items of the menu. Variations of the AI technique developed in Reckman’s work may be used in future work to improve the results of the data acquisition step.

Paralic and Kostial [9] show a different way to use an ontology to perform information retrieval. According to their work, it is possible to link the information being analyzed with the ontology by means of word associations. Their work is closely related to ours because a good acquisition and preliminary assessment of information coming from human reports depends on the vocabulary, which can vary frequently and may assign many different meanings to the same word. Using this technique, it may be possible to identify relevant elements even when a word has only just been acquired and is presented with a meaning different from the expected one. The present work proposes a process that may contribute to the acquisition and preliminary assessment of information provided in natural language, using NLP techniques to identify relevant items and objects that support the other situation assessment routines in the emergency management domain.

Known solutions have shown that it is possible to improve the results of the acquisition by using ontologies that specify the meaning of the words present in textual reports. However, through a quality-aware ontology, it is possible to further improve the assessment by quantifying sentences while identifying words and meanings, and hence to increase the trustworthiness of the results provided by NLP.

The preliminary assessment process is the main entry point for the data to be analyzed. Incomplete or erroneous information coming from the NLP may cause problems for situational awareness and jeopardize decision-making, resulting in threats to life and property.

2.1 General Architecture for Objects and Situation Assessment

To provide means to assess emergency situations under imperfect and ever-changing data and information, a new situation assessment architecture was proposed. The architecture integrates syntactic and semantic objects and situation assessment routines for the emergency management domain. Figure 1 describes the process to perform objects and situation analysis starting from HUMINT data. HUMINT data are prone to failure, since they can be incomplete, incorrect and imprecise. Each failure that persists may propagate through the system and jeopardize decision-making.

Fig. 1. The general architecture for objects and situation assessment to improve emergency situation awareness.

To solve this problem, the architecture is based on a Data and Information Quality Assessment layer that spans the process from the first steps to the user interface. As a result, if the information currently being managed is uncertain, this will be known by all phases of the assessment process, including the human operator at the user interface. The new architecture for objects and situation assessment in the context of situational awareness has the following modules:

  • Data Sources (acquisition of data): the module in charge of acquiring HUMINT data from the environment to feed the process. The acquisition targets information from human reports made to the emergency response center and also from social networks.

  • Object Assessment: the main module of this work (red part of Fig. 1), concerned with the identification of relevant objects, their attributes and the properties that define their actions and the activities among them. This assessment is performed from a syntactic perspective (for further information about the syntactic analysis, the reader should refer to Sanches et al. [13]) and from a semantic perspective, to verify similarities both by the objects’ grammatical form and by their meaning. This module is described in more detail in the next section.

  • Data and Information Quality Assessment: every assessed object passes through this module to acquire indexes that quantify it under data quality dimensions such as completeness, precision, timeliness, consistency, relevance and uncertainty. Hence, it is possible to verify which pieces of information are better than others and to choose the best ones to feed the following modules of the process. The results of this module are also used to improve subsequent evaluations of the Object Assessment. For further information regarding data quality assessment, the reader should refer to [12].

  • Situation Assessment: the module where the qualified objects (objects with quality indexes) are analyzed to verify the synergy among them and to compose the relations that define situations. Like the Objects Assessment, this analysis is performed under syntactic and semantic perspectives.

  • Information Representation: to improve the results of the object assessment, this step gives meaning to the words and sentences being analyzed. This stage is responsible for representing the semantics of all objects, their attributes and their relations.
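The idea of a qualified object, that is, an object carrying quality indexes that are then used to choose the best information, can be illustrated with a minimal sketch. The class name, the 0 to 1 scale, the unweighted mean and the threshold are all assumptions made for illustration; the actual scoring method is described in [12] and is not reproduced here.

```python
from dataclasses import dataclass, field

# Quality dimensions named in the text; the 0..1 scale is an assumption.
DIMENSIONS = ("completeness", "precision", "timeliness",
              "consistency", "relevance", "uncertainty")


@dataclass
class QualifiedObject:
    """An assessed object together with its (hypothetical) quality indexes."""
    name: str
    attributes: dict
    quality: dict = field(default_factory=dict)

    def score(self) -> float:
        """Unweighted mean over all dimensions (an illustrative choice).

        Uncertainty counts against the object, and a dimension that was
        never measured contributes nothing to the score.
        """
        total = 0.0
        for dim in DIMENSIONS:
            value = self.quality.get(dim)
            if value is None:
                continue
            total += (1.0 - value) if dim == "uncertainty" else value
        return total / len(DIMENSIONS)


def select_best(objects, threshold=0.5):
    """Keep objects whose score reaches the threshold, best first."""
    kept = [obj for obj in objects if obj.score() >= threshold]
    return sorted(kept, key=lambda obj: obj.score(), reverse=True)
```

In this sketch an object with only one measured dimension scores low and is filtered out, which mirrors the intent of feeding only the best-qualified objects to the Situation Assessment module.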

3 The Process for Objects Assessment

In this section, the internal methods of the objects assessment approach are described, as also shown in Fig. 2.

Fig. 2. The process of object assessment using NLP and information quality awareness.

3.1 Data Acquisition and Preparation

The data coming from the Data Sources, which are text transcripts of natural language speech or posts on social media, are received by a web server, unpacked from JSON and transformed into an object. The preparation of the information is responsible for cleaning the data and adding the specific information that is necessary to search for important information in the data. Each step is described below.
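The unpacking step can be sketched as follows. The field names "source" and "text" and the Report class are hypothetical, since the payload format is not specified in this paper; the sketch only shows the general shape of turning a received JSON document into an object.

```python
import json
from dataclasses import dataclass


@dataclass
class Report:
    """A received report; both field names are assumptions."""
    source: str  # e.g. a call transcript or a social media post
    text: str


def unpack_report(payload: str) -> Report:
    """Parse a JSON payload and turn it into a Report object."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    text = data.get("text", "")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("report has no text to analyze")
    return Report(source=str(data.get("source", "unknown")), text=text.strip())
```

Rejecting payloads without text at this point keeps empty reports from reaching the preparation and analysis steps.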

As soon as the information is acquired, it needs to be cleaned to remove slang and words that can harm the final analysis. To this end, an application was developed that continues the acquisition flow. This application receives the data as soon as the acquisition finishes.

Each received word is compared with a predefined dictionary; when a slang term is found, it is replaced with the correct word. For instance, when the letter “u” is found, it is replaced with the word “you”.
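The dictionary-based replacement described above can be sketched as follows. Only the “u” to “you” entry comes from the text; the other dictionary entries and the tokenization rule are illustrative assumptions.

```python
import re

# Only "u" -> "you" is taken from the text; the other entries are illustrative.
SLANG = {"u": "you", "r": "are", "pls": "please"}

# Split into words, punctuation runs and whitespace so that the text
# can be reassembled exactly after replacement.
TOKEN = re.compile(r"\w+|[^\w\s]+|\s+")


def normalize(text, slang=SLANG):
    """Replace whole-word slang terms with their dictionary expansion.

    Matching ignores case, and the expansion is inserted in lower case.
    """
    parts = []
    for token in TOKEN.findall(text):
        parts.append(slang.get(token.lower(), token))
    return "".join(parts)
```

Because matching is done on whole tokens, the letter “u” inside a word such as “bus” is left untouched.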

The second step of the preparation adds important information to the sentence and its words. To perform this action, a parser called Cogroo [6] was used. The entire sentence is submitted to Cogroo, after which every word becomes a block of information. The result of the Cogroo analysis is similar to this: { "tokens": [ { "features": "F=S", "POSTag": "n", "lemmas": ["moto"], "lexeme": "moto" } ], "TAG": "NP" }.

Every word in the sentence receives this information. “features” is composed of the gender and number of the word: masculine or feminine, and singular or plural. “POSTag” is the class of the word, in this example a noun (“n”). “lemmas” are the variations of the word, and “lexeme” is the word that was analyzed.
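To make the structure of this block concrete, the sketch below reads the sample shown above and expands its codes into readable labels. It works only on the JSON text and does not call Cogroo itself. The reading of "F=S" as a gender code followed by a number code, and the code tables, are assumptions inferred from the description above rather than from the Cogroo documentation.

```python
import json

# The sample from the text, written with plain JSON quotes.
SAMPLE = ('{ "tokens": [ { "features": "F=S", "POSTag": "n", '
          '"lemmas": ["moto"], "lexeme": "moto" } ], "TAG": "NP" }')

# Hypothetical code tables; only "F", "S" and "n" appear in the sample.
GENDER = {"M": "masculine", "F": "feminine"}
NUMBER = {"S": "singular", "P": "plural"}
POS = {"n": "noun"}


def describe_tokens(chunk_json):
    """Turn one parser chunk into a list of readable token descriptions."""
    chunk = json.loads(chunk_json)
    described = []
    for token in chunk.get("tokens", []):
        # Assumption: "features" has the form <gender>=<number>.
        gender_code, _, number_code = token.get("features", "").partition("=")
        pos_tag = token.get("POSTag")
        described.append({
            "word": token.get("lexeme"),
            "lemmas": token.get("lemmas", []),
            "class": POS.get(pos_tag, pos_tag),
            "gender": GENDER.get(gender_code),
            "number": NUMBER.get(number_code),
            "chunk": chunk.get("TAG"),
        })
    return described
```

Unknown codes are passed through or mapped to None instead of raising an error, so an unexpected tag does not interrupt the flow.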

3.2 Semantic Analysis

Unlike the Syntactic Analysis [15], which checks each word, its grammatical class and their combination inside the text, the function of the Semantic Analysis is to give meaning to the words and sentences. This step works alongside the Syntactic Analysis; for this reason, they are executed at the same time.

A keyword and the words related to it are submitted to an ontology created for this purpose, which returns their semantic representation. Using this semantic representation, it is possible to better understand the meaning of the sentences and words. The ontology receives a word and returns the words that are related to it.

For instance, the word “guy” is submitted to the ontology, which returns the words related to it, making it possible to identify what “guy” means. Once it is known that the word may refer to the criminal, the words close to it are sent to the ontology: the word “stole” is submitted, and the ontology indicates that it is related to the criminal and to the action of stealing something. After this analysis, it is possible to identify the criminal in the sentence and his action in the report.
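The lookup described above can be illustrated with a toy example. A plain dictionary stands in for the ontology here; its entries, the function names and the two-word window are all assumptions for illustration, not the ontology actually built for this work.

```python
# A toy stand-in for the ontology: each word maps to related concepts.
# All entries are illustrative.
ONTOLOGY = {
    "guy": {"criminal", "person"},
    "stole": {"criminal", "theft"},
    "phone": {"property"},
}


def related(word):
    """Return the concepts related to a word (empty set if unknown)."""
    return ONTOLOGY.get(word.lower(), set())


def linked_neighbours(tokens, keyword, window=2):
    """Find words near the keyword that share a concept with it.

    The window size is an assumption; the text only says that words
    close to the keyword are sent to the ontology.
    """
    lowered = [token.lower() for token in tokens]
    if keyword not in lowered:
        return {}
    index = lowered.index(keyword)
    concepts = related(keyword)
    links = {}
    start = max(0, index - window)
    stop = min(len(lowered), index + window + 1)
    for position in range(start, stop):
        if position == index:
            continue
        shared = concepts & related(lowered[position])
        if shared:
            links[lowered[position]] = shared
    return links
```

For the sentence “a guy stole my phone”, the keyword “guy” and the nearby word “stole” share the concept “criminal” in this toy ontology, which is the kind of link that lets the analysis associate the criminal with his action.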

4 Conclusion

This work proposed a new general architecture and a specific objects assessment process to guide the development of new solutions for identifying objects that may be useful for situation assessment routines. Preliminary results are promising and indicate that our approach is applicable in this context.

After the Objects Assessment, the results are always submitted to the Data and Information Quality Assessment to receive data and information quality scores.

The full path of our approach to assess information about objects in the emergency management domain can be summarized as follows. Information is acquired from reports made to the police emergency response center and from social networks. This information is then sent to the Information Preparation, which replaces slang with the complete words and removes words that can disturb the analysis. Once prepared, the data are sent to the Syntactic and Semantic Analysis.

As the last step of this process, after all the information has been acquired, it is transformed into a JSON object and structured in a way that allows the next stage to understand and handle the data. The results are saved in a database, and the next stage is informed of the identifier of this analysis.
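This last step can be sketched as follows. The database is not named in this paper, so SQLite from the Python standard library is used purely as a stand-in; the table layout, the JSON keys and the function names are likewise assumptions.

```python
import json
import sqlite3
import uuid


def save_analysis(conn, objects):
    """Store the assessed objects as JSON and return the analysis identifier."""
    analysis_id = str(uuid.uuid4())
    payload = json.dumps(
        {"analysis_id": analysis_id, "objects": objects},
        ensure_ascii=False,
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analyses "
        "(id TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO analyses (id, payload) VALUES (?, ?)",
        (analysis_id, payload),
    )
    conn.commit()
    return analysis_id


def load_analysis(conn, analysis_id):
    """Fetch a stored analysis by identifier, as the next stage might do."""
    row = conn.execute(
        "SELECT payload FROM analyses WHERE id = ?", (analysis_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Only the identifier needs to be passed to the next stage, which can then load the structured result from the database when it is ready to process it.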