
1 Introduction

Mobile augmented reality (MAR) is a media form that “augments” the real world (or its representation) with virtual objects. Since the virtual objects are situated in the “real” world, AR is intended to be used as a mobile system. Recent advances in mobile computing, spearheaded by smartphones and similar embedded systems, have made it possible for AR to become truly mobile. The same goes for the display, which has to be convenient to carry or to put on. Three display systems have typically been used: (1) a hand-held video see-through (smartphone) LCD as is, (2) a video see-through (smartphone) LCD inserted into, and isolated by, a cardboard case with magnifying lenses, and (3) optical see-through (glass-like) displays. Recently, an alternative form has appeared on the market in which the magnifying lenses are simply clipped onto the smartphone. These four displays differ in a few ways (see Table 1), which in turn can affect their levels of usability, presence and immersion. Understanding these relative qualities is important in assessing how likely consumer-level AR is to proliferate with the right type of display and platform (e.g. cost/benefit/usability).

In this paper, we examine and compare the levels of usability, presence and immersion provided by these four display configurations of mobile AR. We also control a related factor, the amount of environment light, which affects the display quality, by carrying out the experiment under three different conditions: indoor (office-level lighting), outdoor (medium sunlight), and outdoor (bright sunlight).

Note that in this comparison, ideally, the optical see-through or glass-type display would be considered the baseline, with the best usability and probably the highest user experience. However, the current level of technology unfortunately guarantees neither the wearability of regular glasses nor the augmentation image quality. Instead, the bare smartphone-based AR (first row in Table 1), with the commercial success of Pokémon GO [1] and its proven usability, can be used as the baseline for the relative evaluation and comparison.

Table 1. Characteristics of four typical mobile AR displays

2 Related Work: User Experience in AR

The most typical platform for mobile AR (M-AR) is perhaps the smartphone, which is now equipped with a high-resolution display and camera, sufficient computing and networking capability, and other sensors to meet the needs of AR: self-contained, convenient, inexpensive, and targeted for casual use. An immersive mobile VR platform such as the Cardboard type (an inexpensive lens-equipped headset into which a smartphone can be inserted) can serve as an alternative, offering augmented imagery of a different quality, with immersive isolation from the real world and magnified imagery at an almost matched scale (vs. viewing the smartphone from a nominal arm's-length distance). The recent open flip-on lenses offer similar features except for the real-world isolation [2]. All of these displays are video see-through systems, which use the live camera image as the backdrop to the augmented imagery that the user sees. Such video see-through systems generally offer, through computer vision, tracking and image processing techniques, more accurate object-augmentation registration and even image manipulation for harmonization. However, they can suffer from limited image quality (resolution and field of view), processing time (leading to latency), and focus problems.

On the other hand, the optical see-through glass has been envisioned as the ultimate display for AR [3]. For one, it preserves the richness of the real world as seen with the right focus by the naked eye. However, accurate alignment and registration of the virtual objects onto real objects is difficult and often requires a cumbersome calibration process. Optical and projective display systems still lack the technological sophistication to produce natural-looking renderings, which are often perceived as ghost-like images in the presence of bright environment light, not to mention being seen at a fixed focus distance. Finally, state-of-the-art AR glasses still do not have the long-wished-for form factor of regular vision glasses, being bulky and considerably heavy. Obviously, different levels of usability and user experience are expected from these displays, further compounded by the environment conditions.

There have not been many studies on the important factors that affect the user experience of AR. By contrast, in the context of VR, there has been an extensive line of studies on which elements affect, and how, the level of presence and immersion (the user's sense of being inside the virtual world, as distinct from the real one the user is in [4]), one of the main objectives of VR content. For example, the display type is regarded as one of the more important system-oriented factors that affect the level of presence and usability/UX. The display type can be further characterized in terms of resolution, stereoscopy, display size, field of view (FOV), world isolation and other convenience- or ergonomics-related factors (e.g. headset weight). However, in AR (even though AR might be treated as one type of VR), user presence is perhaps ill-defined, since AR is already used in the real world where the user is. Nevertheless, the sense of user presence or immersion can still be affected to some degree, as the augmented real world, under various lighting conditions, is seen through the “framed” display system. Several works in the literature also point to the concept of “object” presence as a way of assessing or evaluating AR systems [5]. Object presence refers to how much the virtual augmentation feels realistic, physical, actually part of the real world, natural and harmonious.

In our study, the focus is mainly on the effect of the display size, FOV and world isolation (and the amount of ambient light), with regard to how much of the outer real environment is visible in the background, and how. In VR, studies have shown that a higher level of immersion and presence is obtained through a display with a large size/FOV and high resolution, isolated from the distraction of the outer world [6]. Whether the same applies to AR is examined in this study.

3 Experiment

3.1 Experiment Design

The experiment examines the levels of user-felt immersion and presence and general usability in four different display configurations of mobile AR (see also Table 1): (1) hand-held video see-through (smartphone) LCD – “PhoneAR”, (2) video see-through (smartphone) LCD inserted into and isolated by a cardboard case with magnifying lenses – “ClosedAR”, (3) video see-through (smartphone) LCD with flip-on lenses – “EasyAR”, and (4) optical see-through (glass-like) display – “OpenAR”. As we expect the environment background condition to be an important factor, we test and compare these platforms under three different lighting conditions: (1) “Indoor” – at office-level luminance without extreme or direct sunlight, (2) “Outdoor low” – at usual outdoor daylight luminance, but without direct sunlight toward the screen, and (3) “Outdoor high” – at outdoor daylight luminance under direct sunlight toward the screen and operating environment. In summary, the experiment was designed as a 4 × 3 (resulting in 12 different testing conditions) within-subject repeated-measures design (see Table 2).

Table 2. Twelve experimental conditions from the two factors.
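The fully crossed 4 × 3 design can also be enumerated programmatically. The following is a minimal sketch in Python; the condition labels are those used in the text, while the variable and function names are illustrative assumptions and not part of the original study material.

```python
from itertools import product

# Factor levels as named in the experiment design.
DISPLAYS = ["PhoneAR", "ClosedAR", "EasyAR", "OpenAR"]
LIGHTING = ["Indoor", "Outdoor low", "Outdoor high"]


def enumerate_conditions(displays=DISPLAYS, lighting=LIGHTING):
    """Return every (display, lighting) pair of the fully crossed design."""
    return list(product(displays, lighting))


if __name__ == "__main__":
    # Print the 12 testing conditions, numbered from 1.
    for index, (display, light) in enumerate(enumerate_conditions(), start=1):
        print(f"{index:2d}: {display} / {light}")
```

Each participant in a within-subject design experiences every one of these pairs.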

To make sure the user could get as much of a sense of the augmented reality space as possible, as affected by the seam between the main display (whose video background shows part of the real space) and the rest of the real environment seen in the periphery, and by the given environment light condition, we set the experimental task as a navigated viewing of the immediate environment, with 8 augmentation objects scattered 360° around the initial user position (see Fig. 2). After the navigation, the user’s sense of immersion, object presence, general usability and various aspects of the user experience were assessed through a survey.

We hypothesized that ClosedAR, OpenAR and EasyAR would be regarded as more immersive, with higher user/object presence, compared to PhoneAR. In addition, we expected that EasyAR would show a level of presence and UX at least similar to that of ClosedAR, and even higher than that of OpenAR under direct sunlight (the “Outdoor high” condition).

3.2 Experimental Set-Up

PhoneAR was implemented and viewed (at a nominal arm's length) on the Samsung Galaxy S8 smartphone [7] using Unity [8] and the marker recognition module from Vuforia [9]. The same went for ClosedAR and EasyAR, except that the former used the Samsung GearVR [10] for the display (into which the smartphone was inserted) and the latter used the flip-on lenses from Homido [2]. OpenAR was implemented on the Microsoft Hololens (with the same development environment). Viewing the marker-augmented objects and navigating around the test augmented reality scene with the different devices is illustrated in Fig. 1. The three different lighting conditions and the scattered object placements are shown in Fig. 2.

Fig. 1. Viewing the augmented reality scene using the four different display configurations of AR, from the left: (1) PhoneAR, (2) ClosedAR, (3) EasyAR and (4) OpenAR.

Fig. 2. Three different lighting conditions for the test augmented space: Indoor (left), Outdoor low (middle) and Outdoor high (right).

The AR space that the user viewed and navigated contained 8 objects (augmented on the side of the markers), arranged in a circle around the initial user position (see Fig. 2). The objects (e.g. fire hydrant, bottle, etc.) were scaled to their actual life sizes for as much realism as possible. The markers (or augmentation objects) were placed about 1.2 m above the ground (on chairs/boxes) so that the user could view them closely without much difficulty while standing.

One of the main differences among the four display systems was their field of view. Although the view into the real world is open in PhoneAR, the display itself, when held and viewed at arm's length, subtended only about 23–30°. EasyAR, similarly open to the real world in the periphery, had a much larger display FOV of around 76.5° owing to its magnified imagery. Both of these displays, being open, have an overall FOV equal to that of the human eye. ClosedAR had around 96° of FOV, but the rest of the visual periphery was shut (black). OpenAR had the full human FOV; however, the portion available for augmentation covered only about 30°. Nevertheless, the objects were sized and augmented such that the entire object could be seen at once without being clipped. Figures 3, 4, 5 and 6 show the augmented views in the 12 different testing conditions.
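For a bare, unmagnified display such as PhoneAR, the angle subtended at the eye follows from simple geometry: a flat extent w viewed head-on from distance d subtends 2·atan(w / 2d). The sketch below illustrates this relation only. The display extent and viewing distances are hypothetical placeholders, not measurements from the study, and the formula does not model the lens-magnified FOV of EasyAR or ClosedAR, which depends on the optics.

```python
import math


def angular_fov_deg(extent_m, distance_m):
    """Visual angle in degrees subtended by a flat extent viewed head-on.

    extent_m:   width (or height) of the display, in meters
    distance_m: distance from the eye to the display center, in meters
    """
    if extent_m <= 0 or distance_m <= 0:
        raise ValueError("extent and distance must be positive")
    return math.degrees(2.0 * math.atan(extent_m / (2.0 * distance_m)))


if __name__ == "__main__":
    # Hypothetical values for illustration only (not taken from the paper).
    display_extent_m = 0.13
    for distance_m in (0.25, 0.30, 0.35):
        angle = angular_fov_deg(display_extent_m, distance_m)
        print(f"distance {distance_m:.2f} m -> {angle:.1f} degrees")
```

The same display covers a smaller visual angle the farther away it is held, which is why a range rather than a single value is reported for the hand-held case.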

Fig. 3. Looking at an augmented object (bottle) with PhoneAR, Indoor (left), Outdoor low (middle), and Outdoor high (right).

Fig. 4. Looking at an augmented object (bottle) with ClosedAR, Indoor (left), Outdoor low (middle), and Outdoor high (right).

Fig. 5. Looking at an augmented object (bottle) with EasyAR, Indoor (left), Outdoor low (middle), and Outdoor high (right).

Fig. 6. Looking at an augmented object (bottle) with OpenAR (Hololens), Indoor (left), Outdoor low (middle), and Outdoor high (right).

3.3 Detailed Experimental Procedure

Twelve people (mean age = 23) participated in the experiment. Most of them had prior AR experience with smartphone applications such as Pokémon GO. We first collected the subjects’ background information and had them fill out the informed consent forms. Then, the subjects were briefed about the purpose of the experiment and given instructions for the experimental task. Each participant was asked to stand in the middle of the test augmented space (see Fig. 2) and was given one of the four display systems (held in hand or worn), with which the participant went around the space and browsed through the eight augmented objects for 2.5 min, with a 1 min break between treatments. The test conditions were administered in a balanced Latin square order. The whole experiment took about an hour.
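The text does not detail how the balanced Latin square was constructed. One common construction for an even number of conditions is the Williams design, in which every condition appears once in each position and every ordered pair of conditions occurs exactly once in immediate succession. The sketch below is an illustration under that assumption; the function name is hypothetical, and assigning one row to each of the 12 participants is our reading rather than a stated detail.

```python
def balanced_latin_square(n):
    """Build an n x n balanced Latin square (Williams design) for even n.

    Row r gives the order in which participant r receives conditions 0..n-1.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction needs an even n >= 2")
    # First row interleaves low and high indices: 0, 1, n-1, 2, n-2, ...
    first_row = [0]
    low, high = 1, n - 1
    for position in range(1, n):
        if position % 2 == 1:
            first_row.append(low)
            low += 1
        else:
            first_row.append(high)
            high -= 1
    # Every other row is the first row shifted cyclically by the row index.
    return [[(value + row) % n for value in first_row] for row in range(n)]


if __name__ == "__main__":
    for row in balanced_latin_square(12):
        print(row)
```

Indices 0 to 11 would then be mapped onto the 12 display and lighting combinations. In practice the lighting factor may have constrained the order further, since each lighting condition was run at a different location.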

After each condition, the participant filled out a survey containing four categories of questions for evaluating the AR user experience (see Table 3): (1) user-felt presence and immersion, (2) object presence, (3) basic usability and (4) preference and overall satisfaction, all answered on a 7-level Likert scale (1: negative ~ 7: positive). In particular, object presence refers to how much the virtual augmentation objects felt realistic, physical, actually part of the real world, natural and harmonious. The preference was asked after the user had experienced all the treatments.

Table 3. The survey assessing various aspects of the AR experience, all answered on a 7-level Likert scale (1: negative ~ 7: positive).

The experiment was held in three different places according to the prescribed lighting conditions, all located close together so that the participant could proceed almost immediately to the next one. Each participant was compensated with ten dollars.

4 Results

One-way ANOVA with Tukey HSD post hoc tests was applied to statistically analyze the effects of the control factors on the various AR experience survey questions. We highlight and report only the main results.
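As a reminder of what the one-way ANOVA computes, the sketch below derives the F statistic from the between-group and within-group sums of squares using only the Python standard library. It is an illustration and not the analysis code used in the study: it treats the groups as independent (ignoring the repeated-measures structure), it omits the p-value and the Tukey HSD step, which would normally come from a statistics package, and the scores in the usage example are made up.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    if k < 2 or any(len(group) == 0 for group in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(group) for group in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")

    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [mean(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    # Variation of the scores around their own group mean.
    ss_within = sum(
        (score - m) ** 2
        for group, m in zip(groups, group_means)
        for score in group
    )
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


if __name__ == "__main__":
    # Made-up 7-level Likert scores for four hypothetical groups.
    scores = [[5, 6, 4, 5], [5, 5, 6, 4], [6, 5, 5, 6], [2, 3, 2, 4]]
    print(one_way_anova_f(scores))
```

A large F indicates that the group means differ by more than the within-group spread would suggest; the post hoc test then identifies which pairs of groups differ.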

4.1 User Presence and Immersion

The effects of the display type on the overall presence and immersion scores are shown in Fig. 7. Significant differences were found only between OpenAR and the other three displays (PhoneAR, ClosedAR and EasyAR). Our expectation that PhoneAR would exhibit the lowest presence and immersion, while ClosedAR and EasyAR would show similar levels, was validated only partially. OpenAR showed the lowest level, most likely due to its small augmentation FOV, poor image projection quality, and low usability (see other results). The lighting condition did not produce any significant differences.

Fig. 7. A one-way ANOVA performed on the factor of display type (left) and lighting condition (right) for level of presence/immersion (P5 + P6).
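The caption indicates that the presence/immersion level is a composite of two survey items (P5 + P6). A minimal sketch of such an aggregation follows, assuming each questionnaire response is stored as a mapping from item label to a rating on the 7-level scale. The data layout, the function name and the sample ratings are hypothetical.

```python
def composite_score(response, items, low=1, high=7):
    """Sum the ratings of the given survey items after a range check."""
    total = 0
    for item in items:
        rating = response[item]
        if not low <= rating <= high:
            raise ValueError(f"{item} rating {rating} is outside {low}..{high}")
        total += rating
    return total


if __name__ == "__main__":
    # Made-up ratings for one participant in one condition.
    response = {"P5": 4, "P6": 6, "O3": 5, "O4": 3}
    print(composite_score(response, ("P5", "P6")))  # presence/immersion
    print(composite_score(response, ("O3", "O4")))  # object presence
```

With two items on a 7-level scale, each composite ranges from 2 to 14.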

4.2 Object Presence

Figure 8 shows the effects of the display type on the augmentation object presence scores, as analyzed by one-way ANOVA. The analysis indicated that, similarly to the case of user presence/immersion, OpenAR exhibited significantly lower object presence than the other three, possibly for the same reasons. In fact, the response to O5 (object realism) is likewise significantly lower for OpenAR. The lighting conditions again had no effect.

Fig. 8. A one-way ANOVA performed on the factor of display type for level of object presence (O3 + O4) and object realism (O5).

4.3 General Usability and Satisfaction/Preference

There were seven major usability questions: U1: ease of use, U2: comfort, U3: suitability, U4: social awareness/unacceptance, U5: fatigue, U6: effect of the peripheral view, U7: effect of the lighting condition. Figure 9 shows the results. Only OpenAR and ClosedAR were considered relatively unusable in terms of ease of use, comfort, suitability, and fatigue (PhoneAR, EasyAR > ClosedAR > OpenAR). PhoneAR, as expected, showed the highest level of social acceptance. Peripheral view and lighting condition brought about no significant differences. Such a trend points to the possibility that the user experience in AR depends heavily on good basic usability. User and object presence are perhaps of less importance than in the case of VR. A similar trend was again found with regard to general satisfaction and relative preference, correlating with the usability of the display types (Fig. 10).

Fig. 9. A one-way ANOVA performed on the factor of display type for level of six categories of usability.

Fig. 10. A one-way ANOVA performed on the factor of display type for general satisfaction and preference.

5 Discussion and Conclusion

In this paper, we have compared the user experiences of four different AR displays under three different lighting conditions. The OpenAR (Hololens) display we used still fell technologically short of user expectations in its display performance and form factor, leading to a very low user experience. Surely, such a result could change as the device becomes smaller and lighter, with better image quality, in the future. In AR, the user has to wear and use a display (or a glass as the display surface) of a certain size and FOV. Whether the display system is shut off from the rest of the environment, and the seam/boundary between the display and the rest of the visible real environment (in the case of open displays), do not seem to affect the user experience all that much. This is shown by PhoneAR being on par with EasyAR and ClosedAR in its user experience. The same argument applies to the absolute display size, which is smaller for PhoneAR, even though a small absolute display size seems to induce underestimation [11]. It was rather the convenience of PhoneAR (the same as that of the regular smartphone) that won the hearts of the users. Although not tested, casual usage of VR will necessitate a quick switch to and from the regular smartphone mode, and access to the touch screen for seamless and familiar touch-based interaction. Again, in this regard, PhoneAR and EasyAR have advantages. There has also been a recent rise in the concept of Extended Reality (XR), a platform (or display) for both AR and VR. EasyVR, the flip-on lens version of VR, has already been shown to offer immersion and presence at a level equal to that of ClosedVR [12]. Therefore, EasyAR might be the best middle ground, offering reasonable usability with higher immersion/presence (even though a statistical difference was not found), and quick and easy dual usage with the smartphone.

In addition, the user experience results could be dramatically different if interaction were involved. In particular, PhoneAR and EasyAR offer the usual touchscreen interaction, while ClosedAR and OpenAR must resort to something else, such as mid-air gestures or separate interaction controllers. We plan to extend the relative comparison to take this important user experience feature into account.