1 Introduction

In a neurosurgical procedure, the exposed brain tissue undergoes a time-dependent elastic deformation caused by various factors, such as cerebrospinal fluid leakage, gravity and tissue manipulation. Conventional image-guided navigation systems do not take this elastic brain deformation (brain shift) into account. Consequently, without an intraoperative image update, the neuroanatomical overlays produced prior to surgery do not correspond to the actual anatomy of the brain. Hence, real-time intraoperative brain shift compensation has a great impact on the accuracy of image-guided neurosurgery.

Ultrasound (US) is an inexpensive modality for updating the preoperative MRI. Its intraoperative repeatability offers a further benefit with respect to real-time visualization of intra-procedural anatomical information [1]. Both feature- and intensity-based deformable, multi-modal (MR-US) registration approaches have been proposed for brain shift compensation.

In general, brain shift compensation approaches are based on feature-driven deformable registration methods that update the preoperative images by establishing correspondences between selected homologous landmarks. The performance of Chamfer Distance Map [2], Iterative Closest Point (ICP) [3] and Coherent Point Drift [4] has been evaluated in phantom [2, 4] and clinical studies [3]. Inherently, the accuracy of feature-based methods is limited by the quality of the landmark segmentation and of the feature-matching algorithm.

Intensity-based algorithms overcome these intrinsic problems of feature-based methods. Similarity metrics such as the sum of squared differences [5] and normalized mutual information [6] were first proposed to register preoperative MR and iUS non-rigidly. However, intensity-based US-MR non-rigid registration poses a significant challenge due to the low signal-to-noise ratio (SNR) of ultrasound images and the different image characteristics and resolutions of US and MR images. To tackle this challenge, Arbel et al. [7] first generate a pseudo-US image from the preoperative MR data and then perform US-US non-rigid registration by optimizing the normalized cross-correlation metric. Recently, the local correlation ratio was proposed in a PaTch-based cOrrelation Ratio (RaPTOR) framework [8], in which preoperative MR was registered to post-resection US for the first time.

Recent advances in reinforcement learning (RL) and imitation learning (or behavior cloning) encourage a reformulation of the MR-US non-rigid registration problem. Krebs et al. [9] trained an artificial agent to estimate the Q-value for a set of pre-calculated actions. Since the Q-value of an action affects the current and future registration accuracy, a sequence of deformation fields for optimal registration can be estimated by maximizing the Q-value. In general, reinforcement learning presupposes a finite set of reasonable actions and learns the optimal policy to predict a combinatorial action sequence from this finite set. In a real-world problem such as intraoperative brain shift correction, however, the number of feasible actions is infinite. Consequently, reinforcement learning can hardly be adapted to resolve brain shift. In contrast, imitation learning learns the actions themselves: an agent is trained to mimic the action taken by a demonstrator in the associated environment, so there is no restriction on the number of actions. It has been used to solve tasks in robotics [10] and autonomous driving [11]. Our previous work reformulated the organ segmentation problem as imitation learning and showed good results [12].

Inspired by Turing's original formulation of the imitation game, in this work we reformulate the brain shift correction problem based on the theory of imitation learning. A multi-task neural network is trained to predict the movement of the landmarks directly by mimicking the ground-truth actions exhibited by the demonstrator.

2 Imitation Game

We consider the registration of a preoperative MRI volume to the intraoperative ultrasound (iUS) volume for brain-shift correction as an imitation game. The game is constructed by first defining the environment. The environment \(\mathbb{E}\) for brain-shift correction using registration is defined as the underlying iUS volume and MRI volume. The key points \(\varvec{P}^{\mathbb{E}} = [\varvec{p}_1^{\mathbb{E}}, \varvec{p}_2^{\mathbb{E}}, \cdots, \varvec{p}_N^{\mathbb{E}}]^T\) in the MRI volume are shifted non-rigidly in three-space to the target points \(\varvec{Q}^{\mathbb{E}} = [\varvec{q}_1^{\mathbb{E}}, \varvec{q}_2^{\mathbb{E}}, \cdots, \varvec{q}_N^{\mathbb{E}}]^T\) in the iUS volume. Subsequently, we define the demonstrator as a system able to estimate the ideal action, in the form of a piece-wise linear transformation \(\varvec{a}_i^{\mathbb{E},t}\), that moves the \(i^{\text{th}}\) key point \(\varvec{p}_i^{\mathbb{E},t}\) in the \(t^{\text{th}}\) observation \(\varvec{O}^{\mathbb{E},t}\) to the corresponding target point \(\varvec{q}_i^{\mathbb{E},t}\). The goal of the game is to find an agent \(\mathcal{M}(\cdot)\) that mimics the demonstrator and predicts the transformations of the key points given an observation. This is formulated as a least-squares problem (Eq. 1).

$$\begin{aligned} \mathop{\text{arg}\,\min}_{\mathcal{M}} \sum_{\mathbb{E}} \sum_t \Vert \mathcal{M}(\varvec{O}^{\mathbb{E},t}) - \varvec{A}^{\mathbb{E},t} \Vert^2_2 \end{aligned}$$
(1)

Here, \(\varvec{A}^{\mathbb{E},t} = [\varvec{a}_1^{\mathbb{E},t}, \varvec{a}_2^{\mathbb{E},t}, \cdots, \varvec{a}_N^{\mathbb{E},t}]^T\) denotes the actions of all N key points. In the context of brain shift correction, we use annotated landmarks in the MRI as key points \(\varvec{p}_i^t\) and landmarks in the iUS as target points \(\varvec{q}_i^t\). A neural network is employed as our agent \(\mathcal{M}\).

2.1 Observation Encoding

We encode the observation of the point cloud in the environment as a feature vector. For each point \(\varvec{p}_i^{\mathbb{E},t}\) in the point cloud, we extract a cubic sub-volume centered at this point in three-space. The cubic sub-volume has an isotropic dimension of \(C^3\) voxels with a voxel size of \(S^3\) mm\(^3\), and its orientation is identical to the world coordinate system. The value of each voxel in the sub-volume is obtained by sampling the underlying iUS volume at the corresponding position using trilinear interpolation. We denote the sub-volume encoding as a matrix \(\varvec{V}^{\mathbb{E},t} = [\varvec{v}_1^{\mathbb{E},t}, \varvec{v}_2^{\mathbb{E},t}, \cdots, \varvec{v}_N^{\mathbb{E},t}]^T\), where each sub-volume is flattened into a vector \(\varvec{v}_i^{\mathbb{E},t} \in \mathbb{R}^{C^3}\). Apart from the sub-volumes, we also encode the point-cloud geometry into the observation: we normalize the point cloud to a unit sphere and use the normalized coordinates \(\tilde{\varvec{P}}^{\mathbb{E},t} = [\tilde{\varvec{p}}_1^{\mathbb{E},t}, \tilde{\varvec{p}}_2^{\mathbb{E},t}, \cdots, \tilde{\varvec{p}}_N^{\mathbb{E},t}]^T\) as part of the encoding. The observation \(\varvec{O}^{\mathbb{E},t}\) is the concatenation of \(\varvec{V}^{\mathbb{E},t}\) and \(\tilde{\varvec{P}}^{\mathbb{E},t}\).
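To make the encoding concrete, the following Python sketch shows one possible realization with NumPy/SciPy, assuming the iUS volume is given as a 3-D array together with a world-to-voxel affine; the function and parameter names are illustrative, not taken from the authors' implementation.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def encode_observation(us_volume, world_to_voxel, points, C=7, S=2.0):
    """Encode N key points as (flattened sub-volumes, normalized coordinates).

    us_volume      : 3-D array of iUS intensities (assumption)
    world_to_voxel : 4x4 affine mapping world coordinates (mm) to voxel indices
    points         : (N, 3) key-point coordinates in world space (mm)
    C, S           : sub-volume dimension (voxels) and isotropic voxel size (mm)
    """
    # Axis-aligned offsets of a C^3 grid centered at the origin, in mm.
    r = (np.arange(C) - (C - 1) / 2.0) * S
    gx, gy, gz = np.meshgrid(r, r, r, indexing="ij")
    offsets = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)        # (C^3, 3)

    subvolumes = np.empty((len(points), C ** 3), dtype=np.float32)
    for i, p in enumerate(points):
        world = p[None, :] + offsets                                 # (C^3, 3)
        homog = np.c_[world, np.ones(len(world))]                    # homogeneous
        vox = (world_to_voxel @ homog.T)[:3]                         # (3, C^3)
        # Trilinear interpolation (order=1) of the underlying iUS volume.
        subvolumes[i] = map_coordinates(us_volume, vox, order=1, mode="nearest")

    # Normalize the point cloud to a unit sphere.
    centered = points - points.mean(axis=0)
    normalized = centered / np.linalg.norm(centered, axis=1).max()

    # Observation: concatenation of texture and geometric encodings.
    return np.concatenate([subvolumes, normalized], axis=1)         # (N, C^3 + 3)
```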

2.2 Demonstrator

The demonstrator provides the actions \(\varvec{A}^{\mathbb{E},t} \in \mathbb{R}^{N \times 3}\) for the key points. We define the action for brain shift as the displacement vector that moves a key point to its respective target. As both the target points and the key points are known, an intuitive way to calculate the action for each key point is to compute the displacement directly as \(\varvec{a}_i^{\mathbb{E},t} = \varvec{q}_i^{\mathbb{E},t} - \varvec{p}_i^{\mathbb{E},t}\). Note, however, that this demonstrator estimates the displacement independently of the observation, which can make the learning difficult. Therefore, we also calculate the translation vector between the point-cloud centroids, \(\varvec{t}^{\mathbb{E},t} = \bar{\varvec{q}}^{\mathbb{E},t} - \bar{\varvec{p}}^{\mathbb{E},t} \in \mathbb{R}^{3 \times 1}\), as an auxiliary output of the demonstrator. Hence, the objective function is

$$\begin{aligned} \mathop{\text{arg}\,\min}_{\mathcal{M}, \mathcal{M}'} \sum_{\mathbb{E}} \sum_t \Vert \mathcal{M}(\varvec{O}^{\mathbb{E},t}) - \varvec{A}^{\mathbb{E},t} \Vert^2_2 + \lambda \Vert \mathcal{M}'(\varvec{O}^{\mathbb{E},t}) - \varvec{t}^{\mathbb{E},t} \Vert^2_2 \end{aligned}$$
(2)

where \(\mathcal{M}'\) denotes the agent estimating the auxiliary output and \(\lambda\) is the weighting of the auxiliary term. In practice, a single multi-task neural network realizes both \(\mathcal{M}\) and \(\mathcal{M}'\).
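For clarity, below is a minimal sketch of the demonstrator and the combined objective of Eq. 2, assuming the landmark sets are given as NumPy arrays; the predicted quantities are placeholders for the network outputs.

```python
import numpy as np

def demonstrate(P, Q):
    """Ground-truth supervision for one observation.

    P, Q : (N, 3) key points (MRI landmarks) and target points (iUS landmarks)
    Returns the per-point actions A (Eq. 1) and the auxiliary translation t (Eq. 2).
    """
    A = Q - P                               # displacement of each key point
    t = Q.mean(axis=0) - P.mean(axis=0)     # centroid shift (auxiliary output)
    return A, t

def multi_task_loss(pred_A, pred_t, A, t, lam=1.0):
    """Least-squares objective of Eq. 2 for a single environment/observation."""
    return np.sum((pred_A - A) ** 2) + lam * np.sum((pred_t - t) ** 2)
```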

2.3 Data Augmentation

To facilitate the learning process, we augment the training dataset to increase the number of samples and the overall variability. In the context of brain shift correction, data augmentation can be applied both to the environment \(\mathbb{E}\) and to the key points \(\varvec{P}^{\mathbb{E},t}\). To augment the environment \(\mathbb{E}\), the elastic deformation proposed by Simard et al. [13] is applied to the MRI and iUS volumes: a variety of brain shift deformations is simulated by warping the T1 and FLAIR MRI volumes and the iUS volume, together with their associated landmarks, independently, using two different deformation fields.
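A minimal 3-D variant of this elastic deformation, under the assumption that the field is built by Gaussian smoothing of uniform noise as in [13], could look as follows; the magnitude \(\alpha\) and smoothing scale \(\sigma\) are assumed hyper-parameters, and the same field must also be applied to the associated landmarks.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(volume, alpha=300.0, sigma=20.0, rng=None):
    """Warp a 3-D volume with a smooth random displacement field (cf. [13]).

    alpha : displacement magnitude (voxels); sigma : smoothing scale (voxels).
    Returns the warped volume and the displacement field, so that the
    associated landmarks can be transformed consistently with the same field.
    """
    if rng is None:
        rng = np.random.default_rng()
    shape = volume.shape
    # Smooth white noise -> one spatially coherent displacement map per axis.
    disp = [gaussian_filter(rng.uniform(-1, 1, shape), sigma) * alpha
            for _ in range(3)]
    grid = np.meshgrid(*[np.arange(s) for s in shape], indexing="ij")
    coords = [g + d for g, d in zip(grid, disp)]
    warped = map_coordinates(volume, coords, order=1, mode="nearest")
    return warped, np.stack(disp)
```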

In each of the augmented environments, we also augment the key points' (MRI landmark) coordinates in two different ways. For each key point, we add a random translation vector with a maximal magnitude of 1 mm in each direction; this synthetic non-rigid perturbation mimics the inter-rater differences that may be introduced during landmark annotation [14]. An additional translation vector with a maximal magnitude of 6 mm in each direction shifts all key points jointly; this simulates the residual rigid registration error introduced during the initial registration using fiducial markers. Of particular importance is how these augmentation steps are applied to the data. We assume the translation between the key points and target points in the training data to be a random registration error. Consequently, we initially align the key points to the center of gravity of the target points, where the center of gravity is defined as the mean of all associated points. The non-rigid and translational augmentation steps are applied subsequently to the key points.
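The key-point augmentation itself reduces to a few vector operations; a sketch, with the magnitudes taken from the text and the function name being illustrative:

```python
import numpy as np

def augment_key_points(P, Q, jitter_mm=1.0, shift_mm=6.0, rng=None):
    """Augment key points P against fixed targets Q (both (N, 3), in mm).

    1. Align P to the center of gravity of Q (the original offset is
       treated as a random registration error).
    2. Per-point jitter of at most `jitter_mm` per axis (inter-rater noise).
    3. A common random shift of at most `shift_mm` per axis (residual
       rigid error from the fiducial-based initial registration).
    """
    if rng is None:
        rng = np.random.default_rng()
    P = P - P.mean(axis=0) + Q.mean(axis=0)              # centroid alignment
    P = P + rng.uniform(-jitter_mm, jitter_mm, P.shape)  # non-rigid jitter
    P = P + rng.uniform(-shift_mm, shift_mm, (1, 3))     # global translation
    return P
```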

2.4 Imitation Network

As both the observation encoding and the demonstrator are based on a point cloud, the imitation network also operates on a point cloud. Inspired by PointNet [15], which efficiently processes point-cloud data without a neighborhood assumption, we propose a network architecture that exploits both the known neighborhood within each sub-volume \(\varvec{V}^{\mathbb{E},t}\) and the unknown permutation of the associated key points \(\tilde{\varvec{P}}^{\mathbb{E},t}\). The network is depicted in Fig. 1. It takes the sub-volumes and key points as two inputs and processes them independently. In the observation encoding, each row vector denotes a sub-volume \(\varvec{v}_i^{\mathbb{E},t} \in \mathbb{R}^{C^3}\) of the associated key point \(\varvec{p}_i^{\mathbb{E},t}\). Therefore, we use three consecutive \(C \times 1\) convolutions with a stride of \(C \times 1\) to approximate a 3D separable convolution and extract texture feature vectors. We also employ \(3 \times 1\) convolution kernels to extract features from the key points. These low-level features are concatenated for further processing. The main part of the network largely follows the PointNet architecture: a multilayer perceptron (MLP) extracts local features, and max pooling extracts a global feature. The local and global features are concatenated to propagate the gradient and facilitate the training process. The multi-task formulation of the network also helps improve overall robustness. We use batch normalization in each layer and ReLU as the activation function. One property of the network is that if a copy of a key point and its associated sub-volume is added as an additional input, the outputs for the original key points remain unchanged. This is especially useful in the context of brain-shift correction, where the number of key points usually varies before and after resection. We therefore use the maximum number of landmarks in the training data as the number of input key points of our network; for a training sample with fewer key points, we arbitrarily copy one of the key points. Finally, after predicting the deformation of the key points, the dense deformation field between them is interpolated using B-splines.
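The following PyTorch sketch illustrates the main ingredients named above (shared per-point layers, a max-pooled global feature, and two task heads), assuming \(C=7\); the layer widths and the exact realization of the low-level convolutions are our assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class ImitationNet(nn.Module):
    """Sketch of the multi-task imitation network (cf. Fig. 1).

    Inputs : subvol (B, N, C**3) -- flattened sub-volumes per key point
             points (B, N, 3)    -- normalized key-point coordinates
    Outputs: per-point actions (B, N, 3) and auxiliary translation (B, 3).
    """

    def __init__(self, C=7, feat=64):
        super().__init__()
        # Texture branch: three consecutive convolutions with kernel and
        # stride C along the flattened C^3 axis (343 -> 49 -> 7 -> 1),
        # approximating a separable 3-D convolution.
        self.texture = nn.Sequential(
            nn.Conv1d(1, feat, C, stride=C), nn.BatchNorm1d(feat), nn.ReLU(),
            nn.Conv1d(feat, feat, C, stride=C), nn.BatchNorm1d(feat), nn.ReLU(),
            nn.Conv1d(feat, feat, C, stride=C), nn.BatchNorm1d(feat), nn.ReLU())
        # Geometry branch: a per-point transform shared by all key points.
        self.geometry = nn.Sequential(
            nn.Linear(3, feat), nn.BatchNorm1d(feat), nn.ReLU())
        # PointNet-style shared MLP over the concatenated low-level features.
        self.mlp = nn.Sequential(
            nn.Conv1d(2 * feat, 128, 1), nn.BatchNorm1d(128), nn.ReLU(),
            nn.Conv1d(128, 256, 1), nn.BatchNorm1d(256), nn.ReLU())
        self.action_head = nn.Conv1d(512, 3, 1)  # local+global -> per-point action
        self.trans_head = nn.Linear(256, 3)      # global -> auxiliary translation

    def forward(self, subvol, points):
        B, N, V = subvol.shape
        tex = self.texture(subvol.reshape(B * N, 1, V)).squeeze(-1)  # (B*N, feat)
        geo = self.geometry(points.reshape(B * N, 3))                # (B*N, feat)
        local = torch.cat([tex, geo], dim=1).reshape(B, N, -1)
        local = self.mlp(local.transpose(1, 2))                      # (B, 256, N)
        glob = local.max(dim=2).values                               # (B, 256)
        fused = torch.cat([local, glob[:, :, None].expand(-1, -1, N)], dim=1)
        return self.action_head(fused).transpose(1, 2), self.trans_head(glob)
```

Because all key points pass through the same shared layers and the global feature is a channel-wise maximum over points, duplicating a key point, as done when padding to the fixed input size, leaves the outputs for the original points unchanged at inference time.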

Fig. 1. Illustration of the imitation network architecture.

Table 1. Evaluation of the mean distance between landmarks in MRI and ultrasound before and after correction.

3 Evaluation

We trained and tested our method on the data of the Correction of Brainshift with Intra-Operative Ultrasound (CuRIOUS) MICCAI challenge 2018, which uses the clinical dataset described in [14]. In the current phase of the challenge, 23 datasets are provided as training data, of which 22 comprise the required MRI and ultrasound landmark annotations before dura opening. The registration is evaluated using the mean target registration error (mTRE) in mm. We used leave-one-out cross-validation to train and evaluate our method: for each fold, 19 datasets were used for training, two for validation and one as the test set. Each training and validation dataset was augmented 32-fold for the environment, cascaded with a 32-fold key-point augmentation; in total, 19.4k samples were used for training and 2k for validation. We chose a sub-volume with isotropic dimension \(C=7\) and a voxel size of \(2\times 2\times 2\,\text{mm}^3\). Sixteen points were used as input key points and a batch size of 128 was used for training. The adapted Adam optimizer proposed by Reddi et al. [16] with a learning rate of 0.001 was used. The results are shown in Table 1. Using our method, the overall mean target registration error (mTRE) is reduced from 5.37 ± 4.27 mm to 1.21 ± 0.55 mm. In a similar setting, but applied to different datasets, the state-of-the-art registration method RaPTOR achieves an overall mTRE of 2.9 ± 0.8 mm [8].
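For reference, the mTRE used above is simply the mean Euclidean distance between corresponding landmarks after correction; a minimal sketch:

```python
import numpy as np

def mtre(warped_landmarks, target_landmarks):
    """Mean target registration error in mm for corresponding (N, 3) landmarks."""
    return np.linalg.norm(warped_landmarks - target_landmarks, axis=1).mean()
```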

The proposed imitation network has 0.22 M trainable parameters, requires 6.7 M floating-point operations (FLOPs), and converges within 7 epochs. For the computational complexity in the application phase, the pretrained network can be considered as \(\mathcal{O}(1)\). The observation encoding step has a complexity of \(\mathcal{O}(N \times C^3)\), where N denotes the number of key points and C the sub-volume dimension. The complexity of the proposed algorithm is therefore \(\mathcal{O}(N \times C^3)\), independent of the resolution of the underlying MRI or iUS volume. In the current implementation, the average runtime of the algorithm is 1.77 s, of which \(88\%\) is spent on observation encoding on the CPU.

4 Discussion

To the best of our knowledge, this is the first time an imitation learning based approach has been proposed in the context of brain shift correction. The presented method achieves encouraging results within 2 mm at real-time capability (<2 s). In 21 out of 22 datasets, the mTRE is reduced significantly. As the mTRE of the 22nd dataset is initially small, the inter-rater difference of 0.5 mm remains notable [14]. Hence, these results indicate the applicability of the proposed method in a clinical environment. However, the following aspects should be considered in further development. One important aspect is the number and variation of the training data used in the proposed imitation learning algorithm. Although the number of training datasets is increased effectively by the data augmentation methods described in Sect. 2.3, variations of the training data such as the location of the tumor and the orientation of the head cannot be augmented without further considerations. A common tool to simulate different image orientations is rotational augmentation; however, it implicitly alters the effect of gravity and therefore results in unrealistic training data. Thus, rotational augmentation is inappropriate for data augmentation in the context of brain shift compensation. The other aspect is the comprehensive validation of the proposed method: generalizability and robustness should be evaluated on a larger amount of data acquired with different intraoperative imaging modalities. In this challenge, we use landmarks as key points and predict the deformation of the landmarks directly. In future applications, we could also adapt our approach to the control points of a free-form deformation or to the contour points of a certain structure (e.g. a vessel). The associated target points could be either manually annotated or automatically estimated with point-matching algorithms.

5 Conclusion

In this study, we proposed a novel approach for intraoperative brain shift correction during tumor resection surgery using imitation learning. The presented method uses observation encoding to describe local texture and point-cloud information, and the trained imitation network estimates, based on this encoding, the movement of landmarks defined in the preoperative MR volume directly to their counterparts in the iUS volume. Our network substantially reduced the mean landmark distance between the pre- and intraoperative image pairs, from 5.37 ± 4.27 mm to 1.21 ± 0.55 mm, in real time, which is particularly compelling for its future use in a surgical setting. Additionally, the proposed approach is flexible, as it is not modality- or anatomy-specific, and could thus be employed in a variety of image-guided surgical interventions.