Predictive Dynamics: Modeling for Virological Surveillance and Clinical Management of Dengue

Rao, V. Sree Hari; Kumar, M. Naresh

doi:10.1007/978-1-4614-3961-5_1

Abstract

Dengue fever is a flu-like illness spread by the bite of an infected mosquito and is fast emerging as a major public health concern. Timely and cost-effective diagnosis would reduce mortality rates and provide better grounds for clinical management and disease surveillance. Identifying the clinical features for early diagnosis of dengue would be useful in reducing virus transmission in a community. In addition to the clinical features, obtaining the influential laboratory attributes and their ranges would aid in quick identification of disease severity in suspected individuals. This chapter discusses a new alternating decision tree methodology that generates more accurate and simpler decision tree structures with simplified classification rules. This approach helps one to obtain the influential clinical and laboratory features that would aid in identifying individuals with suspected dengue and in assessing the severity of their infection.

1 Introduction

Dengue fever (DF) is a mosquito-borne infectious disease caused by viruses of the genus Flavivirus, family Flaviviridae. The disease first appeared in the Philippines in 1953, and since then it has become the most important arthropod-borne viral disease because of its spread among humans (Monath 1994). The reemergence of this disease worldwide is causing larger and more frequent epidemics, especially in cities and in the tropics. Dengue virus infection has been reported in more than 100 countries, with 2.5 billion people living in areas where dengue is endemic (CDC 2000; Guzman and Kouri 2002; PAHO 2007) (see Fig. 1.1). Dengue is one of the major international public health concerns of the World Health Organization (WHO) because of the growing geographic distribution of the virus and its mosquito vectors, the co-circulation of multiple virus serotypes, and the higher frequency of epidemics.

The disease is caused by four distinct but closely related virus serotypes, DEN1, DEN2, DEN3, and DEN4, which are transmitted to humans through the bites of infective female Aedes mosquitoes (Gubler 1998). A person who recovers from infection with one of the serotypes has lifelong immunity against that serotype but remains susceptible to subsequent infection by the other three. There is strong evidence (De Paula and Fonseca 2004; Gubler 1998; Halstead 2007; Harris et al. 2000; Monath 1994; Nimmannitya 1997; Ooi et al. 2007; Wilder-Smith and Schwartz 2005) that subsequent infections increase the risk of the more acute forms of the disease, known as dengue hemorrhagic fever (DHF) and dengue shock syndrome (DSS), which can be fatal. The annual occurrence is estimated to be around 100 million cases of DF and 250,000 cases of DHF, with around 25,000 deaths per year (Gibbons 2002). Deaths occur most often in children. The main pathophysiology of DHF and DSS is the development of plasma leakage from the capillaries, resulting in hemoconcentration, ascites, and pleural effusion that may lead to shock (Halstead 1998).

The clinical symptoms of dengue illness overlap with other illnesses (George and Lum 1997; Harris et al. 2000; Wilder-Smith and Schwartz 2005) causing a confounding problem in disease surveillance and management (Ooi et al. 2007). Definitive laboratory diagnosis requires isolation of the virus ribonucleic acid (RNA) by polymerase chain reaction (PCR) test, immunofluorescence, or immunohistochemistry (De Paula and Fonseca 2004; Halstead 1998; Vaughn et al. 2000). Further, the places where dengue is endemic may not have the necessary infrastructure to carry out these tests (Ooi et al. 2007). Thus, a scheme for a reliable clinical diagnosis based on the data would be useful for early recognition of dengue fever.

WHO (2009) has developed a scheme for classifying dengue infection based on the symptoms of the disease (see Table 1.1). Halstead (2007) reviewed the clinical diagnosis and pathophysiology of vascular permeability and coagulopathy and the parenteral treatment of DHF/DSS, and suggested new laboratory tests.

Table 1.1 WHO characteristics of dengue fever

Input: (a) A data set S(m, n) for the purpose of decision making, where m and n are the number of records and attributes, respectively; the members of S may have missing values in any of the attributes except the decision attribute.
(b) The type of attribute C of the columns in the data set.
(c) The number of boosting iterations T.
(d) The number of validation folds k.
Output: (a) Classification accuracy of the RNIADT for a given data set S.
(b) RNIADT consisting of a rule that is the sign of the sum of all the base rules: \( \text{class}(x)=\text{sign}\left(\sum_{t=1}^{T} r_t(x)\right) \)
Algorithm
(1) Identify and collect all records in the data set S and split them into training and testing data sets using a k-fold cross-validation procedure. Denote the training and testing data sets by T_k and R_k, respectively.
(2) Consider the records in the training data pertaining to a particular cross fold and impute the missing values using the following procedure.
(i) Identify and collect all records in the data set S that have missing values in one or several attributes, excluding those with missing values in the decision attribute. Denote this set by M, i.e. M ⊆ S.
(ii) Pick a record R from the set M and compute its relative distances from all members of S using the procedure given in Sree Hari Rao and Naresh Kumar (2011c). Denote this set by D.
(iii) Arrange the elements of set D in ascending order and identify the nearest neighbors using the following procedure.
(iv) (a) Compute the score α defined as follows: where {x_1, x_2, …, x_n} denote the distances of R from R_k.
(b) Collect the data records in set S whose distances from the record R satisfy the condition α(x_k) ≤ 0. Denote this set by P.
(v) If the type of the attribute to be imputed in R is nominal or categorical, then determine the frequent item set from P using the following procedure:
(a) Find the frequency of each categorical value of the categorical attribute.
(b) The value to be imputed may be taken as the categorical value with the highest frequency obtained in Step (v)(a).
(vi) If the type of attribute is numeric and non-integer, then determine the value to be imputed using the following procedure.
(a) Identify and collect all non-zero elements in the set D computed in Step (ii). Denote this set by B.
(b) For each element in set B compute the quantity where g denotes the cardinality of the set B.
(c) Compute the weight matrix as \( W(j)=\frac{\beta(j)}{\sum_{i=1}^{g}\beta(i)} \quad \forall j=1,\dots,g \)
(d) The value to be imputed may be taken as \( \sum_{j=1}^{g} P(j)\cdot W(j) \)
(vii) If the type of attribute is numeric and integer, the procedure given in Step (v) is followed.
(viii) Repeat Steps (2)(i)–(vi) for every record R in the set M.
(3) Build the ADTree on the records obtained in Step (2) as follows.
(i) Initialize the rule set R_1 to consist of the single base rule whose precondition and condition are both set to True, so that P_1 = True. The symbols P_t and R_t denote the set of preconditions and rules, respectively.
(ii) Initialize the weight of each training sample to 1, i.e. \( w_{i,0}=1 \).
(iii) The prediction value of the root node is calculated as. W(c) represents the total weight of the training samples that satisfy the base condition c. \( W_{+}(c) \) and \( W_{-}(c) \) denote the weights of those examples that satisfy the condition c and are labeled +1 or −1, respectively.
(iv) Pre-adjustment: re-weight the training instances using the formula \( w_{i,1}=w_{i,0}\,e^{-a y_t} \) (for binary classification, the value of y_t is either +1 or −1).
(v) Perform the following steps for each boosting iteration t.
(a) For each precondition c_1 ∈ P_t and each base condition c_2 ∈ C calculate \( Z_t(c_1,c_2)=2\left(\sqrt{W_{+}(c_1\wedge c_2)\,W_{-}(c_1\wedge c_2)}+\sqrt{W_{+}(c_1\wedge \sim c_2)\,W_{-}(c_1\wedge \sim c_2)}\right)+W(\sim c_1) \). The set of base conditions (inequalities comparing a single feature and a constant) is denoted by C.
(b) Select the c_1, c_2 that minimize Z_t(c_1, c_2) and set R_{t+1} to be R_t with the addition of the rule r_t whose precondition is c_1, whose condition is c_2, and whose two prediction values are \( a=\frac{1}{2}\ln\frac{W_{+}(c_1\wedge c_2)+1}{W_{-}(c_1\wedge c_2)+1},\quad b=\frac{1}{2}\ln\frac{W_{+}(c_1\wedge \sim c_2)+1}{W_{-}(c_1\wedge \sim c_2)+1} \)
(c) Set P_{t+1} to be P_t with the addition of c_1 ∧ c_2 and c_1 ∧ ∼c_2.
(d) Update the weights of each training example following the equation
(4) Consider the records in the testing data set pertaining to that cross fold and classify them using the tree built in Step (3).
(5) Compute the percentage classification accuracy for a particular cross fold as the ratio of the number of correctly classified instances to the total number of instances in the testing data set.
(6) Repeat Steps (2)–(5) for each cross fold.
(7) Compute the mean accuracy A by summing the accuracies of all cross folds and dividing by the number of cross folds.
(8) RETURN A
(9) END
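
The imputation rules in Steps (2)(v) and (2)(vi) can be illustrated with a minimal Python sketch. It assumes the neighbor set P and the matching distances are already available as lists. Because the definition of the intermediate quantity in Step (vi)(b) is not reproduced above, the sketch takes β(j) to be the inverse of the j-th distance. That choice and the function names are assumptions made for illustration, not the authors' specification.

```python
from collections import Counter


def impute_categorical(neighbor_values):
    """Step (2)(v): return the most frequent value among the neighbors."""
    counts = Counter(neighbor_values)
    # most_common(1) yields [(value, count)] for the highest count.
    return counts.most_common(1)[0][0]


def impute_numeric(neighbor_values, distances):
    """Step (2)(vi): weighted combination of the neighbors' values.

    ASSUMPTION: beta(j) = 1 / distance_j (inverse-distance weighting).
    The source text does not show the definition of this quantity.
    """
    # Step (vi)(a): keep only neighbors with a non-zero distance.
    pairs = [(v, d) for v, d in zip(neighbor_values, distances) if d != 0]
    if not pairs:
        raise ValueError("need at least one neighbor with non-zero distance")
    betas = [1.0 / d for _, d in pairs]
    total = sum(betas)
    # Step (vi)(c): W(j) = beta(j) / sum_i beta(i), so the weights sum to 1.
    weights = [b / total for b in betas]
    # Step (vi)(d): imputed value = sum_j P(j) * W(j).
    return sum(v * w for (v, _), w in zip(pairs, weights))


if __name__ == "__main__":
    print(impute_categorical(["positive", "negative", "positive"]))
    print(impute_numeric([10.0, 20.0], [1.0, 3.0]))
```

Under this assumption closer neighbors receive larger weights, and the normalization in Step (vi)(c) keeps the imputed value within the range of the neighbors' values.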

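Steps (3)(v)(a) and (3)(v)(b), together with the output rule, reduce to a few arithmetic expressions on weight totals. The Python sketch below takes those totals as plain numbers. The function and parameter names are hypothetical, the final term of Z_t is passed in as `w_rest`, and mapping a zero sum to +1 is an assumption, since the text does not say how sign(0) is resolved.

```python
import math


def z_score(wp_yes, wn_yes, wp_no, wn_no, w_rest):
    """Split criterion Z_t for a (precondition, condition) pair.

    wp_yes, wn_yes: weights of +1 / -1 examples satisfying c1 and c2.
    wp_no,  wn_no : weights of +1 / -1 examples satisfying c1 and not c2.
    w_rest        : the final additive term of Z_t (weight of the
                    examples that fall outside the two branches).
    """
    return 2.0 * (math.sqrt(wp_yes * wn_yes) + math.sqrt(wp_no * wn_no)) + w_rest


def prediction_values(wp_yes, wn_yes, wp_no, wn_no):
    """Prediction values a and b of the new rule, with the +1 smoothing
    term that appears in Step (3)(v)(b)."""
    a = 0.5 * math.log((wp_yes + 1.0) / (wn_yes + 1.0))
    b = 0.5 * math.log((wp_no + 1.0) / (wn_no + 1.0))
    return a, b


def classify(rule_outputs):
    """class(x) = sign of the sum of the base-rule outputs r_t(x).

    ASSUMPTION: a sum of exactly zero is mapped to +1.
    """
    return 1 if sum(rule_outputs) >= 0 else -1


if __name__ == "__main__":
    # A perfectly separating split leaves both square-root terms at zero.
    print(z_score(4.0, 0.0, 0.0, 4.0, 0.0))
    print(prediction_values(3.0, 0.0, 0.0, 3.0))
    print(classify([0.5, -0.2]))
```

A lower Z_t indicates a purer split, which is why Step (3)(v)(b) selects the minimizing pair. The smoothing term keeps a and b finite when one of the weight totals is zero.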
Algorithm 1: The RNIADT Algorithm (Sree Hari Rao and Naresh Kumar 2011c)
