
1 Emotional Machines

1.1 Why Design Emotion into Automated Systems?

Man-made systems exhibiting emotional behaviors intuitively similar to those exhibited by humans occur frequently in popular literature [3]. Such automata provide a convenient literary device for portraying bald human behaviors that are uninformed by human experience, and therefore uncomplicated by the checks and balances that humans acquire as they mature. However, to the extent fiction is successful in distilling behaviors in this way, its characterizations become less authentic avatars for humans.

Automated systems are not endowed with what most people would regard as “emotions” unless these are essential to system effectiveness. Using a human voice in the interface to a query system [4] might not be essential to its proper functioning, but adding the appearance of emotion is probably a great help to gaining acceptance among a tech-skittish user base. Adding human-like “emotions” to a launch-control system makes an interesting movie concept [5], but no engineering sense.

Humans think of emotion as being somewhat spontaneous, presumably because of its unpredictability. Unpredictability can be emulated in an automated system by the introduction of random elements into its processes. Interestingly enough, such unpredictability is an essential aspect of “human intelligence”. A system that always answers the same questions in exactly the same way will certainly not pass the Turing Test [6].

Must “emotions” in automated systems necessarily be of the intuitive “human-like” variety? Certainly not. To further refine what “emotion” in an automated system is and does, we consider in the next section the relevant system engineering principles.

1.2 A Model for Emotion in Automated Systems: Functional and Performance Requirements

System Engineers design systems to satisfy specified user requirements. These requirements are of two types:

  • Functional Requirements: WHAT the system must do

  • Performance Requirements: HOW the system must do it

Under a behaviorist model, functional requirements can be thought of as stimulus-response pairs linking system states to system responses: “When the engine gets hot, cool it down” might become the pair (H → C), which is a rule informing the action of the system: when “Engine Hot” is the value of a particular system state variable, set a particular state variable’s value to “Activate Engine Cooling System”.

Such a rule needs more than just vague concepts like “hot” and “cool” for effective control (but, consider Fuzzy Logic [7]). Therefore, performance requirements are introduced to quantify, refine, and condition functional requirements. In the (H → C) scenario, there would likely be associated performance requirements that call out specific temperature thresholds, and the specific type and aggressiveness of cooling the system is to perform.
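As an illustration, the following is a minimal Python sketch of the (H → C) functional rule refined by performance requirements; the temperature thresholds and the two cooling levels are illustrative assumptions, not values from any actual specification:

```python
# Minimal sketch of the (H -> C) rule refined by performance requirements.
# The 95/105 degree thresholds and the two cooling levels are illustrative
# assumptions, not values from any actual specification.

def cooling_command(engine_temp_c: float) -> str:
    """Map an engine-temperature reading to a cooling action."""
    if engine_temp_c >= 105.0:      # performance requirement: hard threshold
        return "ACTIVATE_COOLING_AGGRESSIVE"
    if engine_temp_c >= 95.0:       # performance requirement: soft threshold
        return "ACTIVATE_COOLING_MODERATE"
    return "COOLING_OFF"            # functional rule (H -> C) does not fire

print(cooling_command(110.0))       # ACTIVATE_COOLING_AGGRESSIVE
```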

These two types of requirements provide the basis for our understanding of “emotion” in automated systems. Hard principles will be likened to functional requirements specifying unquantified and unrestricted cause-and-effect operational rules. Emotional principles will be likened to performance requirements, which amend functional requirements by appending conditions which must be satisfied when the rule is applied.

Performance requirements can also be thought of as a means of establishing secondary system goals. For example, an autopilot system has a primary goal of getting an airplane from A to B, and a secondary goal of maintaining passenger comfort. The “emotion” here, such as it is, appears in the form of derived requirements like, “Do not subject passengers to excessive G-forces”. The system is effective in getting passengers to their destination in a “kind” way (Fig. 2).

Fig. 2. A single EMOS/NOOS decision system. It scans the data state S, applies the EMOS and NOOS Rule Bases independently, then adjudicates the two recommendations. If cognitive dissonance is present, DISRAT is invoked to negotiate an agreement to produce a final recommendation to update P.

2 Experiment Concept

A “double-minded” decision system was developed having two Knowledge-Based System components: the EMOS and NOOS Knowledge-Based Systems described above. These two system components use the same decision-making algorithm, but different heuristics: to assess and combine facts, EMOS applies “soft emotional” factors, while NOOS applies “rigid principled” factors. Not surprisingly, the decisions they produce, given identical inputs, can be quite different.

Because the machines’ preferences are numerically specified, the level of cognitive dissonance that arises in the double-minded machine as a whole can be quantified using the difference between the components’ “commitments” to their separate conclusions. Further, for each preference, each machine has a numerical “psychological inertia” quantifying its reluctance to compromise on each of its preferences.
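A hypothetical sketch of how these quantities might be computed; the names, the absolute-difference measure of dissonance, and the inertia-weighted cost are assumptions for illustration only:

```python
# Hypothetical sketch: cognitive dissonance as the gap between the two
# components' commitments, and compromise cost weighted by per-preference
# "psychological inertia". Names and formulas are illustrative assumptions.

def dissonance(commit_emos: float, commit_noos: float) -> float:
    """Dissonance as the absolute difference between the two commitments."""
    return abs(commit_emos - commit_noos)

def compromise_cost(adjustments: dict, inertia: dict) -> float:
    """Psychological cost of preference adjustments, scaled by inertia."""
    return sum(abs(delta) * inertia[name] for name, delta in adjustments.items())

print(dissonance(0.8, 0.3))                                          # 0.5
print(compromise_cost({"prior_guilt": -0.1}, {"prior_guilt": 2.0}))  # ~0.2
```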

We selected a jury in a criminal trial as the domain for experiments with this system; the task to be performed is to consider facts-in-evidence and apply the rule bases to produce verdicts of “guilty”, “not guilty”, or “deadlocked”. This scenario offers a fundamentally binary decision problem that can be easily understood without special knowledge. To make interpretation of the machine’s decision processes transparent, EMOS and NOOS were developed with a conclusion-justification capability by which each can express (in natural language) how the pieces of evidence affected its decisions.

Once the double-minded machine produced a verdict, reinforcement learning was applied to the machines’ preferences to reduce the measured cognitive dissonance and force a compromise. This was done subject to the condition that the psychological cost (given by the psychological inertia terms) be kept small.

Because the case being tried involves 8 possible evidentiary facts, each of which could be present or absent, there are 2^8 = 256 possible evidence suites a trial could present. All 256 cases were presented to the system, and the results tabulated.
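A minimal sketch of how the full suite of evidence combinations could be enumerated (illustrative only):

```python
# Sketch of enumerating all 2**8 = 256 possible evidence suites, each a
# tuple of eight TRUE/FALSE contingencies (illustrative only).
from itertools import product

evidence_suites = list(product([True, False], repeat=8))
print(len(evidence_suites))   # 256
```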

3 Knowledge-Based Systems

3.1 Using Rules to Accrue Evidence

For simplicity and definiteness, the reasoning problem will be described here as the use of evidence to select one or more possible conclusions from a closed, finite list that has been specified a priori (the “Classifier Problem”).

Expert reasoning is based upon facts (colloquially, “interpretations of the collected data”). Facts function both as indicators and contra-indicators for conclusions. Positive facts are those that increase our belief in specified conclusions. Negative facts are those that increase our disbelief in specified conclusions. Negative facts can also be thought of as being exculpatory: they impose constraints upon the space of conclusions, militating against those unlikely to be correct. Facts are salient to the extent that they increase belief in the “truth”, and/or increase disbelief in “untruth”.

Pieces of evidence are quantified by how they alter beliefs, independent of other pieces of evidence. That is, by the belief held when that piece of evidence is the only one known.

A rule is an operator that uses facts to update beliefs by applying biases. In software, rules are often represented as structured constructs such as IF-THEN-ELSE, CASE, or SWITCH statements. We use IF-THEN-ELSE in what follows, since it corresponds to familiar colloquial reasoning processes.

Rules consist of an antecedent and a multi-part body. The antecedent evaluates a BOOLEAN expression; depending upon the truth-value of the antecedent, different parts of the rule body are executed.

The following is a notional example of a rule for classifying animals based upon their various attributes (features). It is intended to mimic the amount by which a human expert would alter her beliefs about an unknown animal should she determine whether or not it is a land-dwelling omnivore:

Figure a. Notional IF-THEN-ELSE rule for classifying an animal based on whether it is a land-dwelling omnivore.
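A hedged Python sketch of the kind of rule described above; the classes (“bear”, “trout”), the feature names, and the bias values are illustrative assumptions, not taken from the original figure:

```python
# Hedged sketch of the kind of rule described above. The classes ("bear",
# "trout"), the bias values, and the feature names are illustrative
# assumptions, not taken from the original figure.

def increase_belief(current: float, bias: float) -> float:
    """Accrue a bias onto an existing belief: b <- b + bias * (1 - b)."""
    return current + bias * (1.0 - current)

def omnivore_rule(features: dict, belief: dict, disbelief: dict) -> None:
    """IF the animal is a land-dwelling omnivore THEN raise belief in 'bear'
    and disbelief in 'trout'; ELSE raise disbelief in 'bear'."""
    if features["land_dwelling"] and features["omnivore"]:
        belief["bear"] = increase_belief(belief["bear"], 0.60)
        disbelief["trout"] = increase_belief(disbelief["trout"], 0.40)
    else:
        disbelief["bear"] = increase_belief(disbelief["bear"], 0.30)

belief = {"bear": 0.0, "trout": 0.0}
disbelief = {"bear": 0.0, "trout": 0.0}
omnivore_rule({"land_dwelling": True, "omnivore": True}, belief, disbelief)
print(belief, disbelief)   # belief in "bear" rises, disbelief in "trout" rises
```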

Using an INCREASE BELIEF function, and a DECREASE BELIEF function (“aggregation functions”, called AGG in Fig. 3 below), many such rules can be efficiently implemented in a looping structure. Such functions can be thought of as “update methods”, since they specify how existing beliefs are to be updated when new evidence is obtained.

Fig. 3. Multiple-rule execution loop to accumulate beliefs and disbeliefs for feature vector v.
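A minimal sketch of the loop in Fig. 3, assuming a rule is represented as a triple (antecedent predicate, positive biases by class, negative biases by class); this representation and the AGG update are assumptions consistent with Sect. 3.2:

```python
# Minimal sketch of the loop in Fig. 3. A rule is assumed to be a triple
# (antecedent predicate, positive biases by class, negative biases by class);
# AGG is the "probability AND" update of Sect. 3.2.

def agg(current: float, bias: float) -> float:
    return current + bias * (1.0 - current)

def accrue(v, rules, num_classes):
    """Run every rule against feature vector v, accruing beliefs/disbeliefs."""
    belief = [0.0] * num_classes
    disbelief = [0.0] * num_classes
    for predicate, pos_biases, neg_biases in rules:
        if predicate(v):                         # antecedent is TRUE
            for k, b in pos_biases.items():
                belief[k] = agg(belief[k], b)
            for k, d in neg_biases.items():
                disbelief[k] = agg(disbelief[k], d)
    return belief, disbelief

rules = [(lambda v: v["land_dwelling"] and v["omnivore"], {0: 0.6}, {1: 0.4})]
print(accrue({"land_dwelling": True, "omnivore": True}, rules, num_classes=2))
# -> ([0.6, 0.0], [0.0, 0.4])
```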

More generally, as demonstrated in Fig. 3, the process creates for each feature vector v an ordered tuple of positive class-membership beliefs (Belief(v,1), Belief(v,2), …, Belief(v,K)), and an ordered tuple of negative class-membership beliefs (Disbelief(v,1), Disbelief(v,2), …, Disbelief(v,K)). These two vectors are then combined (“adjudicated”) for each feature vector to determine a classification decision. For example, they can be adjudicated by simple differencing to create the vector of class-wise beliefs, B:

$$ \boldsymbol{B}(v) = \big( Belief(v,1) - Disbelief(v,1), \; \ldots, \; Belief(v,K) - Disbelief(v,K) \big) $$

This ordered-tuple of beliefs is the belief vector for the input feature vector v.

A final classification decision can be made by selecting the class having the largest belief. Confidence factors for the classification can be computed in several ways (e.g., as the difference between the two largest beliefs).
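Continuing the sketch above, adjudication by differencing and selection of the largest net belief might look like the following; the confidence measure shown is only one of several possible choices:

```python
# Continuing the sketch: adjudicate by differencing, select the class with
# the largest net belief, and report the gap between the two largest net
# beliefs as a simple confidence factor (one of several possible choices).

def classify(belief, disbelief):
    B = [b - d for b, d in zip(belief, disbelief)]          # class-wise beliefs
    ranked = sorted(range(len(B)), key=lambda k: B[k], reverse=True)
    decision = ranked[0]
    confidence = B[ranked[0]] - B[ranked[1]] if len(B) > 1 else B[ranked[0]]
    return decision, confidence

print(classify([0.6, 0.0], [0.0, 0.4]))   # (0, 1.0)
```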

Clearly, the inferential power here is not in the rule structure, but in the “knowledge” held numerically as biases. As is typical with heuristic reasoners, BBR allows the complete separation of knowledge from the inferencing process. This means that the structure can be retrained, even repurposed to another problem domain, by modifying only data; the inference engine need not be changed. An additional benefit of this separability is that the engine can be maintained openly apart from sensitive data.

3.2 Combining Accrued Evidence: Aggregation Methods

We propose a simple aggregation method that has many desirable properties; it can be modified to handle monotonicity or reasoning under uncertainty. Because it has a principled basis in probability theory, it can give excellent results as a classifier, and can be extended in a number of useful ways.

Proper aggregation of belief is essential. In particular, belief is not naively additive. For example, if I have 20 pieces of very weak evidence, each giving me 5% confidence in some conclusion, it would be foolish to assert that I am 100% certain of this conclusion just because 20 × 5% = 100%.

Important in all that follows: biases are required to lie in [0,1].

Illustrative Example of Belief Aggregation. Two rules, r1 and r2, having positive biases b1 and b2, respectively, are applied in sequence:

Belief(v,1) = 0.0 ‘the belief vector is initialized to the zero vector

Rule 1:

Belief(v,1) = AGG(Belief(v,1), b1) ‘accrue belief, bias = b1

Rule 2:

Belief(v,1) = AGG(Belief(v,1), b2) ‘accrue belief, bias = b2

What will the aggregate belief be after both rules are run? We define the simple aggregation rule for this two-rule system as the “probability AND”. The aggregate belief from combining two biases, b1 and b2, is:

$$ \text{(rules } r_{1} \text{ and } r_{2} \text{ both run)} \quad AGG = b_{1} + b_{2}\left( 1 - b_{1} \right) = b_{1} + b_{2} - b_{1} b_{2} = 1 - \left( 1 - b_{1} \right)\left( 1 - b_{2} \right) $$

If b1 and b2 separately give me 50% belief in a conclusion, after aggregation using this rule my belief in this conclusion is:

$$ 1 - \left( 1 - 0.5 \right)\left( 1 - 0.5 \right) = 0.75 = 75\% $$

(If this rule is applied for the case of twenty 5% beliefs, we arrive at an aggregate belief of about 64%, far from certainty.)

This simple aggregation rule says that we accrue additional belief as a proportion of the “unused” belief.
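A quick numeric check of this rule in Python, confirming the 75% and roughly 64% figures quoted above:

```python
# Quick numeric check of the simple aggregation rule: two 50% biases
# combine to 75%, and twenty 5% biases combine to about 64%.
from functools import reduce

def agg(b, bj):
    return b + bj * (1.0 - b)              # equivalently 1 - (1 - b)(1 - bj)

print(agg(0.5, 0.5))                       # 0.75
print(reduce(agg, [0.05] * 20, 0.0))       # ~0.6415
```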

If a third rule r3 with belief b3 fires, we find:

$$ \begin{aligned} \text{aggregate belief}\left( \left( r_{1} \text{ and } r_{2} \right) \text{ and } r_{3} \right) &= \left( b_{1} + b_{2}\left( 1 - b_{1} \right) \right) + b_{3}\left( 1 - \left( b_{1} + b_{2}\left( 1 - b_{1} \right) \right) \right) \\ &= b_{1} + b_{2} + b_{3} - b_{1} b_{2} - b_{1} b_{3} - b_{2} b_{3} + b_{1} b_{2} b_{3} \\ &= 1 - \left( 1 - b_{1} \right)\left( 1 - b_{2} \right)\left( 1 - b_{3} \right) \end{aligned} $$

In general, firing J rules rj having isolated beliefs bj gives aggregate belief:

$$ \text{aggregate belief}\left( r_{1} \text{ AND } r_{2} \text{ AND } \ldots \text{ AND } r_{J} \right) = 1 - \prod\limits_{j} \left( 1 - b_{j} \right) $$

The aggregate belief can be accumulated by application of the simple aggregation rule as rules fire. For, if J−1 rules have fired, giving a current belief of b, and another rule rJ having isolated belief bJ fires, the simple aggregation rule gives a new belief of b + bJ(1−b), which is easily shown to be in agreement with the above.
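The agreement is verified in one line: writing b = 1 − Π(1 − b_j), with the product taken over the first J − 1 rules,

$$ b + b_{J}\left( 1 - b \right) = 1 - \left( 1 - b \right)\left( 1 - b_{J} \right) = 1 - \left( \prod\limits_{j = 1}^{J - 1} \left( 1 - b_{j} \right) \right)\left( 1 - b_{J} \right) = 1 - \prod\limits_{j = 1}^{J} \left( 1 - b_{j} \right) $$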

The simple aggregation rule is clearly independent of the order of rule firings, assumes values in [0,1], and has partial derivatives of all orders in the biases bj. In fact, because…

$$ \partial \left( 1 - \prod\limits_{m} \left( 1 - b_{m} \right) \right) \Big/ \partial b_{n} = \left( \prod\limits_{m} \left( 1 - b_{m} \right) \right) \Big/ \left( 1 - b_{n} \right) = \prod\limits_{m \ne n} \left( 1 - b_{m} \right) $$

… all partials having multi-indices with repeated terms are zero.

Important: biases accrued must be in [0,1]. The aggregation rule defined here will not work if negative biases are accrued. This is why we accrue positive belief and positive disbelief, then difference them.

Given a set of data having ground truth tags, an iterative cycle using this (or a similar) update rule can be used to learn the beliefs and disbeliefs that will cause the heuristics to give correct answers on the training set (Fig. 4):

Fig. 4. Learning loop: how trainable systems learn from data tagged with “ground truth.”
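A hedged sketch of such a learning cycle; the fixed-step random search over biases shown here is an assumption for illustration, not the training algorithm used in this work:

```python
# Hedged sketch of the learning cycle in Fig. 4: score the current biases
# against ground-truth tags and keep random adjustments that do not reduce
# the score. The fixed-step random search is an assumption for illustration,
# not the training algorithm used in this work.
import random

def train(score, biases, epochs=200, step=0.05):
    """score(biases) -> accuracy on the tagged training set, in [0,1]."""
    biases = list(biases)
    best = score(biases)
    for _ in range(epochs):
        i = random.randrange(len(biases))
        trial = list(biases)
        trial[i] = min(1.0, max(0.0, trial[i] + random.choice([-step, step])))
        if score(trial) >= best:               # keep changes that do not hurt
            biases, best = trial, score(trial)
    return biases

target = [0.7, 0.2, 0.9]                       # toy "correct" biases
score = lambda b: 1.0 - sum(abs(x - t) for x, t in zip(b, target)) / len(b)
print(train(score, [0.5, 0.5, 0.5]))           # drifts toward the target
```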

Note that if a belief bj = 1 is accrued, the aggregate belief becomes and remains 1, no matter what other beliefs might be accrued. Similarly, accruing a belief bj = 0 has no effect on the aggregate belief. These are both consistent with intuitive expectations.

4 Experimental Data

To establish psychological “engrams” for EMOS and NOOS, researchers filled out a standard form specifying how each mind-set, EMOS and NOOS, might approach jury membership (Priors favoring guilt or innocence), and how reluctant they would be to change these to achieve a compromise verdict (Fig. 5).

Fig. 5. Human experimenter can specify the prior beliefs of EMOS and NOOS, and the level of their resistance to adjusting them during negotiations to reduce cognitive dissonance.

In essence, these priors quantify the minds’ assessment of the law enforcement process. Those with relatively higher priors favoring a guilty verdict would be more confident that defendants arrested, charged, and brought to trial are actually likely to be guilty (Fig. 5).

To further fill out the psychological “engrams” for EMOS and NOOS, researchers filled out a standard form specifying how each mind-set, EMOS and NOOS, might assess the relative importance and evidentiary significance of particular facts in evidence (Fig. 6).

Fig. 6. Establish the relative evidentiary importance of Contingency 6; set up the justification statements to be reported by the KBS when considering Contingency 6; and determine the resistance to changing the Contingency 6 biases in order to reduce cognitive dissonance. (Experiments are based upon what brains are given as evidence; admissibility is not relevant.)

For this work, eight possible facts in evidence are used. Each fact can be either true or false. Because of this, it was felt that the term “fact in evidence” could be confusing, so the term “contingency” was selected to designate each piece of evidence. These were expressed as statements, known to be either true or false. The eight contingencies are:

  1. TRUE or FALSE: The Defendant has a criminal history.

  2. TRUE or FALSE: The Defendant has been identified as the perpetrator by eye witnesses.

  3. TRUE or FALSE: There is forensic evidence that ties the Defendant to the crime.

  4. TRUE or FALSE: The Defendant had a motive for the crime.

  5. TRUE or FALSE: The Defendant is a member of a minority racial or religious group.

  6. TRUE or FALSE: The Defendant is under 26 years of age.

  7. TRUE or FALSE: The Defendant has a history of gang membership.

  8. TRUE or FALSE: The Defendant dropped out of high school.

Finally, the amounts by which EMOS and NOOS individually adjust their beliefs in light of the truth or falsity of the eight contingencies (the “biases”) were specified by researchers. This had to be done for each brain, for each contingency, and for each conclusion. Many different specifications were run as experiments, primarily to assess the sensitivity of the brains to the various factors. To avoid inconsistency (“schizophrenic” machines), each set of parameters for an experiment was prepared by one researcher (Fig. 7).

Fig. 7. Establish the “biases” that NOOS will apply when considering Contingency 6.

5 Details for a Specific Case: #155

Figures 8 and 9 below summarize the results for a particular case run: Case #155. The facts in evidence for this case are:

Fig. 8. Execution trace for Case #155 showing how EMOS used the evidence to arrive at its belief that the defendant in this case is innocent.

Fig. 9. Execution trace for Case #155 showing how NOOS used the evidence to arrive at its belief that the defendant in this case is guilty.

  • TRUE: The Defendant has a criminal history.

  • FALSE: The Defendant has been identified as the perpetrator by eye witnesses.

  • FALSE: There is forensic evidence that ties the Defendant to the crime.

  • TRUE: The Defendant had a motive for the crime.

  • TRUE: The Defendant is a member of a minority racial or religious group.

  • FALSE: The Defendant is under 26 years of age.

  • TRUE: The Defendant has a history of gang membership.

  • TRUE: The Defendant dropped out of high school.

EMOS believes that this defendant is innocent, while NOOS believes this defendant is guilty. After adjudication, this defendant was found innocent (Fig. 10).

Fig. 10. EMOS and NOOS disagree, so the verdict is decided in favor of the more confident brain.


Each case represents a unique combination of evidence features. As an experiment, each of these 255 combinations of evidence was tried 30 times, and the proportions of guilty and innocent verdicts rendered by the double-minded BOT were tabulated. The data were then sorted in ascending order of the proportion of GUILTY verdicts rendered for each combination. The proportions of GUILTY and INNOCENT verdicts are plotted in this sorted order (Fig. 11).
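A sketch of this tabulation procedure; here try_case is a hypothetical stand-in for the double-minded system:

```python
# Sketch of the tabulation: run each evidence combination 30 times, record
# the proportion of GUILTY verdicts, and sort ascending. `try_case` is a
# hypothetical stand-in for the double-minded system.
from itertools import product
import random

def tabulate(try_case, runs=30):
    rows = []
    for suite in product([True, False], repeat=8):
        verdicts = [try_case(suite) for _ in range(runs)]
        rows.append((suite, verdicts.count("GUILTY") / runs))
    return sorted(rows, key=lambda row: row[1])   # ascending GUILTY proportion

results = tabulate(lambda suite: random.choice(["GUILTY", "INNOCENT"]))
print(results[0])                                 # combination with fewest GUILTY verdicts
```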

Fig. 11. Plots are shown sorted in ascending order of the proportion of joint GUILTY verdicts, for 30 runs of each of the 255 combinations of evidence. On the left are the proportions of verdicts rendered by EMOS and NOOS jointly, after adjudication; on the right are the proportions of verdicts rendered by EMOS and NOOS separately, before adjudication.

The same plots for EMOS and NOOS separately are shown on the right. With the settings defining EMOS for this experiment, the emotional reaction to the evidence resulted in a larger proportion of GUILTY verdicts, while NOOS appears relatively balanced. Together, EMOS and NOOS moderated to render GUILTY verdicts about one-third of the time.

6 Conclusions

This investigation has demonstrated that specific aspects of unstructured mentation can be modeled as an efficient KBS. Further, the interplay between these aspects can be adjusted and the effects observed.

Additional experiments (not detailed in this paper for the sake of brevity) used the psychological inertia terms to negotiate compromises between EMOS and NOOS in such a way that the total psychological cost (the sum of the inertia terms) could be minimized. It is an interesting and somewhat surprising fact that the approach taken to resolve cognitive dissonance is both conceptually similar to how humans might behave and an NP-complete problem; in fact, minimizing the psychological cost during negotiation by the EMOS and NOOS brains is an instance of the Knapsack Problem [8].

The sensitivity of the EMOS-NOOS combination to small changes in bias settings was not as significant as had been anticipated, though some care was required in assigning biases to factors depending upon their specific manner of use.

The software implementation of the EMOS-NOOS KBS was very computationally efficient, requiring an average of only 13 microseconds to hear a case, adjudicate the results, and write the findings to disk (single-core Intel i3 processor). This efficiency facilitated the execution of a large number of experiments.

The modeling approach here can be made trainable. Given a set of desired adjudicated verdicts, machine learning can be used to establish biases and priors that will, to the extent possible while maintaining model consistency, return those verdicts (and we have done this).

This also suggests a forensic application for a KBS model of a multi-component decision system. The internal hidden variables leading to a particular set of decision outcomes could be estimated by:

  • holding known KBS parameters fixed

  • training the KBS components so that the target decisions are rendered

In this way, plausible estimates for hidden variables are obtained.

7 Future Work

The larger concept is to allow the decision system to move through time (Fig. 12). The upper track represents state variables sampled periodically from the external world and coded as the state sequence Pk. The lower track represents the decisions of the system coded as the state sequence Sk. The decision system also has available to it (through the diagonal connections) a history of prior input data and state estimates. This is to facilitate adjustment of the EMOS and NOOS rule bases, based upon a domain ontology (Fig. 12).

Fig. 12. Making the system Time-Aware: time increases from upper-left to lower-right.