
1 Introduction

Computer vision is the discipline in which the theory and technology of artificial systems are combined to extract, analyze and understand the necessary information from digital images or video. Computer vision is used in different application fields, e.g. in medical computer vision or medical image processing, in industry, for military purposes, and for autonomous land-based and aerial vehicles [1]. The scope of our current work is computer vision for space applications, namely visual pose estimation of a target spacecraft during the rendezvous phase of On-Orbit Servicing (OOS) missions. The main duties of an OOS mission include, e.g., refueling, repair or upgrade of satellite components, and also deorbiting of spacecraft that are no longer usable. The problem can be stated as follows: the chaser spacecraft should autonomously navigate to the target by estimating the relative position and orientation of the target spacecraft using a visual sensor and extracting information from an image or a sequence of images.

Different types of visual sensors have been considered and tested for pose estimation during the rendezvous phase: LiDAR [2], monocular and stereo cameras [3], and also time-of-flight sensors based on Photonic Mixer Device (PMD) technology [4, 5]. Since the PMD camera has never been used in space so far, we continue investigating it as one possible candidate for visual navigation in space. In this paper we present an experimental study of the fusion of state vectors estimated with a PMD sensor and suitable algorithms.

Looking at recent state-of-the-art techniques for sensor data fusion in other areas of computer vision, one can mention the work of Schramm et al. [6], where the authors present an approach to fuse data from stereo, depth and thermal cameras for robust self-localization. The resulting position is obtained through an Extended Kalman Filter (EKF). Kim et al. [7] show how to fuse radar and visual images for an advanced driver assistance system via an extrinsic calibration process. Deilamsalehy et al. [8] propose to fuse data from LiDAR, a camera and an Inertial Measurement Unit (IMU) using an EKF for pose estimation. Instead of using the mentioned EKF or any other filter for the data fusion, we create a distributed system with a weighted average algorithm, where one state vector is calculated from a depth image and the other one is calculated independently from an amplitude image. The weighted average approach for state vector fusion is simple to implement and, moreover, it is suitable for any application, since it does not require knowledge of the system dynamics.

All experiments presented in this paper were conducted at the European Proximity Operations Simulator (EPOS 2.0) [9] at the German Aerospace Center (DLR), which allows real-time simulation of close-range proximity operations under realistic space illumination conditions.

2 Visual Navigation with PMD Sensor

2.1 Problem Statement

The problem addressed in this paper is to accurately estimate the position and orientation of the target spacecraft from measurements of a time-of-flight PMD sensor. In this work we provide experiments with the DLR-Argos 3D-P320 camera prototype (Fig. 1, top left) released by the Bluetechnix company, with the technical characteristics presented in Table 1.

The depth measurement principle of PMD technology is based on calculating the phase shift between the NIR signal emitted by the camera's LEDs and the signal reflected from the target. The co-registered amplitude information of the reflected signal is calculated simultaneously. An example of depth and amplitude images acquired by the DLR-Argos 3D-P320 camera in DLR's EPOS is presented in Fig. 1 (bottom left and bottom right). The images show the target mockup (see Fig. 1, top right) used for the test simulations in this paper. In the following sections we present the pose estimation techniques and an experimentally justified fusion of the two resulting pose estimates in order to obtain one accurate pose for every frame.
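As background for this measurement principle, the following minimal Python sketch shows a generic four-phase continuous-wave time-of-flight demodulation that converts the measured phase shift into a radial distance and a co-registered amplitude. The four correlation samples, the 20 MHz modulation frequency and the four-tap scheme are illustrative assumptions and do not describe the internal processing of the DLR-Argos 3D-P320 camera.

```python
import numpy as np

C = 299_792_458.0  # speed of light [m/s]

def pmd_depth_amplitude(a0, a1, a2, a3, f_mod=20e6):
    """Generic 4-phase time-of-flight demodulation (illustrative sketch).

    a0..a3 are correlation samples taken at 0/90/180/270 deg internal
    phase offsets (arrays of the sensor resolution); f_mod is the
    modulation frequency of the NIR illumination in Hz (assumed value).
    """
    phase = np.arctan2(a3 - a1, a0 - a2)       # phase shift of the echo
    phase = np.mod(phase, 2.0 * np.pi)         # wrap into [0, 2*pi)
    depth = C * phase / (4.0 * np.pi * f_mod)  # radial distance [m]
    amplitude = 0.5 * np.hypot(a3 - a1, a0 - a2)
    return depth, amplitude
```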

Fig. 1. Top left: DLR-Argos 3D-P320 camera. Top right: the mockup in the EPOS laboratory. Bottom left: depth image. Bottom right: amplitude image.

Table 1. Technical data of the PMD sensor inside the DLR-Argos 3D-P320 camera.

2.2 Pose Estimation Techniques

Two completely different tracking methods are suggested for model-based pose estimation of the target spacecraft. Model-based techniques rely on knowledge of the 3D model of the target object throughout the entire tracking period. A modified version of the state-of-the-art Iterative Closest Point (ICP) algorithm [10] with the "reverse calibration" method [11] for the nearest-neighbor search is proposed for estimating the state vector from depth images. The pose estimation technique for the amplitude images is based on finding correspondences between the known 3D model and features detected in the 2D gray-scale image. Among the variety of solvers, we propose a Gauss-Newton solver [12] based on a least-squares minimization problem in order to estimate the pose of the target relative to the camera frame. Please refer to the work of Klionovska et al. [5] for a detailed description of the pose estimation technique for the depth images and of the feature identification from the amplitude images.
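To make the amplitude-image pipeline concrete, the sketch below refines a pose from 2D-3D feature correspondences with a Gauss-Newton least-squares iteration. It is not the implementation of [12] or [5]: the axis-angle parameterization, the numerical Jacobian and all function names are simplifying assumptions.

```python
import numpy as np

def rodrigues(rvec):
    """Rotation matrix from an axis-angle vector."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K

def project(points_3d, pose, fx, fy, cx, cy):
    """Pinhole projection of model points under pose = (rvec, t)."""
    p_cam = points_3d @ rodrigues(pose[:3]).T + pose[3:]
    u = fx * p_cam[:, 0] / p_cam[:, 2] + cx
    v = fy * p_cam[:, 1] / p_cam[:, 2] + cy
    return np.column_stack([u, v])

def gauss_newton_pose(points_3d, points_2d, pose0, fx, fy, cx, cy, iters=10):
    """Least-squares pose refinement from 2D-3D correspondences."""
    pose = np.asarray(pose0, dtype=float).copy()
    for _ in range(iters):
        r = (project(points_3d, pose, fx, fy, cx, cy) - points_2d).ravel()
        J = np.zeros((r.size, 6))     # numerical Jacobian, 6 pose parameters
        eps = 1e-6
        for j in range(6):
            d = np.zeros(6)
            d[j] = eps
            r_d = (project(points_3d, pose + d, fx, fy, cx, cy)
                   - points_2d).ravel()
            J[:, j] = (r_d - r) / eps
        step = np.linalg.lstsq(J, -r, rcond=None)[0]   # Gauss-Newton step
        pose += step
        if np.linalg.norm(step) < 1e-9:
            break
    return pose
```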

2.3 Fusion of Measurements

Thanks to the redundant information from the PMD sensor (depth and amplitude channels), there is a good chance to increase the reliability of the system and to enhance the accuracy of the calculated state vector during the approach by fusing the two estimated measurements. Moreover, with the redundant state information the tracking of the target can still be maintained even when one of the pose estimation techniques gives incorrect information. One of the simplest ways to combine the measurements is to take a weighted average [13] of the pose vectors obtained from the two different pose estimation techniques. A simple arithmetic mean of all measurements does not perform well enough, since one measurement can be more reliable than the other [14]. Taking this fact into account, it is better to assign more importance and a greater weight to an observation \(y_i\) from the more reliable output channel, whereas a less accurate observation from the other output channel receives a smaller weight. The weighted average for the fused estimate of n different measurements \(y_i\) with non-negative weights \(\omega _i\) is

$$\begin{aligned} y_{fused}= \frac{\sum _{i=1}^{n}\ \omega _i y_i}{\sum _{i=1}^{n}\ \omega _i}. \end{aligned}$$
(1)

We can simplify Eq. 1 when the weights are normalized so that they sum up to 1:

$$\begin{aligned} y_{fused}= \sum _{i=1}^{n}\ \omega _i^{'} y_i, \sum _{i=1}^{n}\ \omega _i^{'}=1. \end{aligned}$$
(2)

From the mathematical point of view, the weight \(\omega _i\) for every single component of the pose vector can be chosen as the inverse of the estimated variance \(\sigma _i^{2}\) of the measurement error occurring during pose estimation with one of the suggested methods:

$$\begin{aligned} \omega _i=\frac{1}{\sigma _i^{2}}. \end{aligned}$$
(3)

In his work, Elmenreich [15] shows that the variance of the fusion result \(y_{fused}\) is then minimized and is always smaller than each of the input variances:

$$\begin{aligned} \sigma _{fused}^{2}=\sum _{i=1}^{n}\ (\omega _i^{'})^{2}\sigma _i^{2}=\frac{1}{\sum _{i=1}^{n}\frac{1}{\sigma _i^{2}}}. \end{aligned}$$
(4)
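A minimal sketch of Eqs. 1-4 follows, assuming the per-component variances of the two pipelines are already known; the function name and the example values are illustrative, not the paper's actual numbers.

```python
import numpy as np

def fuse_inverse_variance(estimates, sigmas):
    """Inverse-variance weighted average of n estimates of the same
    quantity (Eqs. 1-4). `estimates` and `sigmas` have shape (n,) or
    (n, d) when d pose components are fused independently."""
    y = np.asarray(estimates, dtype=float)
    var = np.asarray(sigmas, dtype=float) ** 2
    w = 1.0 / var                        # Eq. 3: w_i = 1 / sigma_i^2
    w_norm = w / w.sum(axis=0)           # Eq. 2: normalized weights
    y_fused = (w_norm * y).sum(axis=0)   # weighted average (Eq. 1)
    var_fused = 1.0 / w.sum(axis=0)      # Eq. 4: fused variance
    return y_fused, var_fused

# Example with made-up numbers: fuse the Z coordinate estimated from the
# depth image (5.02 m, sigma 0.03 m) and from the amplitude image
# (5.10 m, sigma 0.08 m).
z_fused, z_var = fuse_inverse_variance([5.02, 5.10], [0.03, 0.08])
```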

3 Experimental Scenarios and Performance Analysis

In order to find correct weights for the fusion of the two estimates, we first propose to run both algorithms separately. The offline test represents a straight frontal approach scenario, which starts at a distance of approximately 8 m between chaser and target. The termination point lies at a distance slightly below 5 m. Both distances are chosen because of the characteristics of the DLR-Argos 3D-P320 sensor. The starting point of the simulation is determined by the resolution of the current PMD sensor: since its resolution is low compared with existing CCD sensors, the features of the imaged object are no longer clearly observable at larger distances, which leads to large errors in the pose estimation. Moreover, the current illumination unit of the camera is suitable for close-range simulations (<10 m) and not for longer ranges. The final point is limited by the field of view of the current camera: when the distance is less than 5 m, the whole target is no longer observable and pose estimation is not possible.

In this test the target (Fig. 1, top right) rotates around its principal axis of inertia at a rate of 2 deg/s. Overall, the data set consists of 170 images. In terms of image numbers, the distance range is divided as follows: images 1 to 67 correspond to the approach from 8 to 7 m; images 68 to 118 cover the range from 7 to 6 m (e.g. Fig. 2, first row); and from image 119 to 170 the distance decreases from 6 to 4.9 m (e.g. Fig. 2, second row). The ground truth for every logged image is provided by EPOS.

The experimental scenarios considered in this paper are as follows. Test scenario 1 presents the frame-to-frame pose estimation technique using depth images and ICP with the reverse calibration technique. Test scenario 2 contains the results of the pose estimation algorithm with the aforementioned image processing of the amplitude images. Test scenario 3 shows the results of the fusion technique with the calculated weights. A particularity of Test scenario 3 is that the fusion technique is applied only to the translational part, whereas the rotation is taken entirely from the pose vector estimated with the amplitude image. This is because the algorithm based on the amplitude images is less sensitive in the estimation of the orientation and usually provides better results. The camera coordinate frame is used to evaluate the results, with the Z-axis taken along the optical axis of the camera.

3.1 Test Cases 1 and 2: Pose Estimation Using Depth and Amplitude Images Separately

We ran the algorithms for the depth images and the amplitude images separately on the provided dataset. In Fig. 3 we present the plots of the errors of the rotation and translation components of the estimated pose for every frame in Test case 1 (left column) and in Test case 2 (right column). The mean errors for both cases are listed in Table 2. From the results depicted in Fig. 3 and collected in Table 2 after the offline simulation of the estimation techniques explained in Sect. 2.2, one can observe that the estimation of the rotational components with the 2D technique using the amplitude image outperforms the 3D pose estimation pipeline. This is due to the fact that the errors in the angles calculated with the 3D pose estimation algorithm tend to accumulate. This is caused by the nature of the algorithm: once the previous estimate strongly diverges from the real one, the orientation (and sometimes also the position vector) calculated for the next frame also has large measurement errors. However, the position of the target during tracking was determined more accurately using the depth images, especially the distance component (position along the Z-axis).

Fig. 2. Depth and amplitude images within the distance range 7 to 6 m (first row) and 6 to 5 m (second row).

Fig. 3. Translation and rotation errors for Test case 1 (left column) and Test case 2 (right column).

In order to apply the fusion technique described in Sect. 2.3, the weights have to be defined first. According to Eq. 3, it is necessary to estimate the variances. Table 2 presents the standard deviations for the three rotation angles and for the Z, Y, X components of the position vector.

Table 2. Standard deviations and mean errors.
Table 3. Weights for the translation components.
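For illustration, the normalized per-axis weights of Table 3 could be derived from the per-axis standard deviations of Table 2 as in the following sketch; the numeric standard deviations used here are placeholders, not the measured values from Table 2.

```python
import numpy as np

def per_axis_weights(sigma_depth, sigma_amp):
    """Normalized inverse-variance weights for the Z, Y, X translation
    components of the depth-based and amplitude-based estimates."""
    w = 1.0 / np.vstack([np.asarray(sigma_depth, dtype=float) ** 2,
                         np.asarray(sigma_amp, dtype=float) ** 2])
    return w / w.sum(axis=0)   # column k holds the two weights for axis k

# Placeholder standard deviations [m] for (Z, Y, X):
weights = per_axis_weights(sigma_depth=[0.03, 0.02, 0.02],
                           sigma_amp=[0.09, 0.03, 0.03])
```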

3.2 Test Case 3: Fusion of Pose Vectors with Weights

Taking into account the tendency revealed by Test cases 1 and 2, we simulate the pose estimation during the approach on the same dataset, this time using the weighted average technique for the translation components. We apply the weights for the Z, Y and X coordinates presented in Table 3; the fused pose is composed as in the sketch below. In Fig. 4 we plot the angular and position measurement errors obtained in Test case 3.
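A possible composition of the fused pose in Test case 3, assuming the per-axis weights of Table 3 are available: the translation is the weighted average of the two position estimates, while the rotation is copied from the amplitude-image estimate. The function and variable names are illustrative, not taken from the paper's software.

```python
import numpy as np

def fuse_pose(t_depth, t_amp, w_depth, w_amp, R_amp):
    """Translation fused per axis with normalized weights; rotation taken
    entirely from the amplitude-image pipeline (Test case 3)."""
    t_depth, t_amp = np.asarray(t_depth, float), np.asarray(t_amp, float)
    w_depth, w_amp = np.asarray(w_depth, float), np.asarray(w_amp, float)
    t_fused = w_depth * t_depth + w_amp * t_amp   # w_depth + w_amp == 1
    return R_amp, t_fused
```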

Fig. 4. Translation and rotation errors for Test case 3.

Indeed, as expected, the fused technique, with its measurement errors presented in Fig. 4, overcomes the drawbacks of both individual pose estimation techniques. The mean errors for Test case 3 are shown in the last column of Table 2. The attitude of the target has almost the same mean errors as in Test case 2, whereas the mean errors of the position are closer to those of Test case 1. The peaks of the pitch and yaw angles visible in Fig. 4 do not corrupt or interrupt the tracking process, so that the pose estimation of the target can reliably continue.

4 Discussion

With this paper we have shown experimentally that the proposed fusion technique ensures an accurate calculation of the position and orientation of the target using a PMD sensor. For every frame, the weighted average method fuses two estimates that were calculated independently from the depth and amplitude images. With different test simulations we have demonstrated the main advantage of the fusion: a decrease of the measurement errors for both the attitude and the position of the target. Moreover, with a fused estimate we ensure stable frame-to-frame tracking during the approach. The proposed concept of data fusion for pose estimation and tracking based on PMD sensor technology can be used not only for space applications. For example, the independent redundant information from a single PMD visual sensor can be used in the field of (semi-)autonomous driving and in driver assistance systems. This approach, together with the PMD sensor, helps to reduce the number of visual sensors while increasing the accuracy and reliability of the visual system.