
1 Introduction and Relevant Work

Our novel contributions are (1) showing that a neural network trained to output two separate driving tasks (i.e., steering and motor throttle predictions) can yield different motion-sensitive neurons that contribute to different output behaviors, and (2) demonstrating that we can probe these hidden filters through controlled experiments inspired by psychology. The experimental results indicate that optical flow filters are used for steering decisions, whereas variance filters are used for motor throttle decisions.

Our self-driving network takes in video from left and right cameras to predict future steering and motor throttle values, so there are many possible spatiotemporal cues that our network could respond to.

We first tried reproducing receptive field visualizations [1, 7]. As shown in Fig. 1, we generated gradient ascent visualizations on the layers of an early CNN (2 convolutional layers and 2 dense layers) that takes in 2 frames at a time. Across frames and cameras for any given neuron filter, Layer 1 receptive fields appear sensitive to optical flow and natural stereoscopic disparity.
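For concreteness, a minimal sketch of such a gradient ascent procedure is shown below (PyTorch). The input shape and the packing of 2 cameras \(\,\times \,\) 2 frames into the channel axis are illustrative assumptions, not our network's exact configuration.

```python
import torch

def visualize_filter(conv_layer, filter_idx, shape=(1, 4, 94, 168),
                     steps=200, lr=0.1):
    """Gradient ascent on one filter's mean activation, starting from noise.

    shape is (batch, channels, H, W); the 4 channels stand for
    2 cameras x 2 frames stacked channel-wise (an assumption about
    the input packing, for illustration only).
    """
    x = torch.randn(shape, requires_grad=True)
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -conv_layer(x)[:, filter_idx].mean()  # negate to ascend
        loss.backward()
        optimizer.step()
    return x.detach()

# Hypothetical usage on a first-layer convolution:
# rf = visualize_filter(torch.nn.Conv2d(4, 8, kernel_size=5), filter_idx=0)
```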

Fig. 1.

Gradient ascent visualizations. Shown are four neurons’ receptive fields from Layer 1 of our first self-driving network. Each neuron filter is divided into sub-filters, with one sub-filter per camera, per input frame – hence the 2\(\,\times \,\)2 layout per neuron filter. These filters appear sensitive to optical flow and stereoscopic disparity

However, this sensitivity is hard to quantify, and later layers are even noisier. Furthermore, our current convolutional network is primarily the SqueezeNet architecture from Iandola et al. [2], and we did not want to interpret unstructured visualizations from 1\(\,\times \,\)1 and 3\(\,\times \,\)3 filters. Instead, though not a semantic analysis, we labeled and compared inputs by presumed relevant features, similar to Zhou et al. [8]. We then took inspiration from the general feature manipulation of predictive modeling experiments in psychophysics [6].

We studied optical flow because it provides cues about depth and future trajectories [5], and because our gradient ascent analysis offered early evidence of flow-sensitive filters.

2 Experimental Setup

We labeled input videos by their average steer and motor throttle combinations. We only used videos whose current and future driving commands had little variation, and whose future values were well predicted by the network. This allowed us to easily test on salient ego-motion videos containing one type of flow per video.
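A rough sketch of this selection step follows; the clip fields and thresholds are hypothetical, since the paper does not specify its exact criteria.

```python
import numpy as np

def select_clips(clips, var_thresh=0.01, err_thresh=0.05):
    """Keep clips with near-constant driving commands that the network
    already predicts well. Thresholds are illustrative placeholders.
    Each clip is assumed to carry per-frame ground truth ('steer',
    'throttle') and network predictions ('pred_steer', 'pred_throttle').
    """
    selected = []
    for c in clips:
        low_variation = (np.var(c["steer"]) < var_thresh and
                         np.var(c["throttle"]) < var_thresh)
        well_predicted = (
            abs(np.mean(c["pred_steer"]) - np.mean(c["steer"])) < err_thresh and
            abs(np.mean(c["pred_throttle"]) - np.mean(c["throttle"])) < err_thresh)
        if low_variation and well_predicted:
            selected.append(c)
    return selected
```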

Fig. 2.

Video speed manipulation. Natural videos are resampled for the optical flow experiment, to simulate optical flow changes while holding other natural features fixed. The network expects 10 frames of input video, so each manipulated video resamples the original frames to match that size. Sped-up versions can simply use future frames, but slowed-down versions need timepoints in between the normally captured frames, which are created using the interpolation method of Meyer et al. [4]

Fig. 3.

Driving predictions after input video speed manipulation. The output steer (left) and motor throttle (right) neurons’ activations with respect to video speed changes are plotted. The X coordinates are normal video predictions, and the Y coordinates are changed-speed video predictions. In both plots, zero means no output behavior. The fit lines indicate that speeding up the input video pushes steer predictions to become more extreme and increases throttle predictions; the opposite holds for slowed-down videos

As seen in Fig. 2, by speeding up and slowing down a given video, we created new videos with similar optical flow directions across the visual field but with larger or smaller magnitudes. We then compared how these affected output driving predictions to test the relevance of input video motion.
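A minimal resampling sketch is shown below. For slowed-down videos the paper uses the phase-based interpolation of Meyer et al. [4]; the linear blending here is only a stand-in for that method.

```python
import numpy as np

def resample_clip(frames, speed, n_out=10):
    """Resample a video to simulate a speed change.

    frames is a (T, H, W, C) float array. For speed > 1 we sample
    further into the future; for speed < 1 we need in-between
    timepoints, created in the paper via Meyer et al. [4] --
    linear blending below is a simplified substitute.
    """
    t = np.arange(n_out) * speed              # fractional source indices
    t = np.clip(t, 0, len(frames) - 1)
    lo = np.floor(t).astype(int)
    hi = np.ceil(t).astype(int)
    w = (t - lo)[:, None, None, None]         # blend weight per frame
    return (1 - w) * frames[lo] + w * frames[hi]

# Hypothetical usage: resample_clip(frames, 2.0) doubles the apparent
# speed; resample_clip(frames, 0.5) halves it by blending neighbors.
```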

We also controlled the frame order and stereoscopic disparity of the input videos after manipulating the video speed. If optical flow is a relevant feature for our driving predictions, then the response should depend on whether the time frames are properly ordered, similar to the network in Zhou et al. [9]. Furthermore, if the network is attempting to recover depth cues from motion, it could also be affected by stereoscopic disparity, another source of depth cues present with our network setup.
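The control transformations themselves are simple array operations; a sketch follows, where the assumed clip layout (time, cameras, height, width, channels) and the mode names are ours, not the paper's.

```python
import numpy as np

def temporal_control(clip, mode, seed=0):
    """Frame-order controls; clip has shape (T, cameras, H, W, C)."""
    if mode == "reverse":
        return clip[::-1]
    if mode == "shuffle":
        rng = np.random.default_rng(seed)
        return clip[rng.permutation(len(clip))]
    return clip  # "natural": leave the order untouched

def stereo_control(clip, mode):
    """Stereoscopic disparity controls on the camera axis."""
    if mode == "swap":                        # exchange left/right views
        return clip[:, ::-1]
    if mode == "mono":                        # remove disparity: one view twice
        return np.repeat(clip[:, :1], 2, axis=1)
    return clip  # "natural": keep both views as captured
```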

3 Results and Discussion

Theoretically, we expected lower frame rate sampling to push predictions toward zero, and for higher frame rate sampling to do the opposite.

As seen in Fig. 3, input video speed manipulation affects both steering and motor throttle predictions. This suggests potential optical flow sensitivity, which we explore further below.
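The effect can be summarized by the slope of the fit lines in Fig. 3; a least-squares version is sketched below, though the paper does not state its exact fitting method.

```python
import numpy as np

def speed_effect_slope(normal_preds, manipulated_preds):
    """Least-squares slope of manipulated vs. normal predictions,
    as in the fit lines of Fig. 3. A slope > 1 means the manipulation
    pushes predictions to be more extreme; < 1 means it damps them."""
    slope, _intercept = np.polyfit(normal_preds, manipulated_preds, 1)
    return slope
```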

3.1 Temporal Controls

In Fig. 4, steer and motor throttle predictions were plotted for input videos with different frame orders. Motor throttle predictions appear robust to frame order transformations, but the steering predictions are not.

Fig. 4.

Steer and motor throttle prediction changes from temporal frame ordering. Changes to the output steer (left) and motor throttle (right) neurons from input frame ordering are plotted. The X coordinates are naturally ordered video predictions, and the Y coordinates are predictions after temporal reordering. The fit lines for the steer plots indicate that randomizing the frame order nullifies any steering prediction, whereas reversing the order (not in the training set) reverses the steer prediction. The fit lines for the throttle plots indicate that randomizing and reversing the frame order had little impact on the throttle prediction

Fig. 5.

Steer prediction changes from temporal frame reordering after video speed manipulation. Here, input videos are sped up and slowed down as in Fig. 3, but also have their frame orders changed. We can see that reversing the frame order (left) maintains the natural steer changes correlated with video speed manipulation (as in Fig. 3), but randomizing the frame order (right) breaks the natural steer prediction changes after speeding up and slowing down the videos

As seen in Fig. 5, changing the frame order significantly impacts the video speed manipulation experiment for steer predictions. A smooth flow of time, either forward or reversed, is needed to reproduce results similar to those of the video speed experiment in Fig. 3. This implies that optical flow filters are used for steer decisions.

For motor throttle predictions, changing the frame order does not significantly impact the video speed manipulation experiment. Figure 6 shows that motor throttle predictions are sensitive to input motion independent of frame order, implying that variance filters are used: regardless of frame order, little motion yields little variance across the frames, whereas high motion yields the opposite.
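This reasoning can be checked directly: per-pixel variance across frames is invariant to any permutation of the frame order, yet grows with the amount of motion. A toy numpy demonstration (random frames standing in for real video):

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((10, 94, 168))        # toy 10-frame, single-channel clip

# Per-pixel variance over time is invariant to frame permutation...
shuffled = frames[rng.permutation(len(frames))]
assert np.allclose(frames.var(axis=0), shuffled.var(axis=0))

# ...but it scales with the amount of motion: a static clip has zero
# temporal variance, while shifting content frame-to-frame raises it.
static = np.repeat(frames[:1], 10, axis=0)
moving = np.stack([np.roll(frames[0], 3 * t, axis=1) for t in range(10)])
print(static.var(axis=0).mean())          # 0.0
print(moving.var(axis=0).mean())          # > 0, grows with the shift size
```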

Fig. 6.

Motor throttle prediction changes from temporal frame reordering after video speed manipulation. Here, input videos are sped up and slowed down as in Fig. 3, but also have their frame orders changed. We can see that both randomizing the frame order (left) and reversing the frame order (right) maintain the natural throttle prediction changes after changing the video speed

Fig. 7.

Steer and motor prediction changes from stereo effects after video speed manipulation. Here, input videos are sped up and slowed down as in Fig. 3, but also have their stereoscopic disparity changed. We can see that both switching the stereo channels (left) and removing the stereo disparity (right) maintain the natural steer (top) and speed (bottom) prediction changes after speeding up and slowing down the videos, as in Fig. 3

3.2 Steer and Motor Speed Results Across Stereo Controls

Lastly, for steer and motor speed predictions, stereoscopic disparity changes do not significantly impact the video speed experiment. Figure 7 shows that the motion selective filters for steer and motor speed predictions are independent of stereo features.

4 Conclusion

We show that our network trained to predict steering and motor throttle from stereo video exhibits different motion-selective behavior for steering and throttle. Through a series of controlled psychophysical experiments, we demonstrated that both the steer and motor throttle predictions are affected as expected by varying the motion in the input video. However, even though both behaviors look similar on the surface, correct steer predictions depend on a smooth frame order, whereas motor throttle predictions do not.

We show that steer decisions are based on optical flow filters in the hidden layers, whereas motor throttle decisions are based on variance filters.

Though not presented in this paper, we ran the same video speed experiments on hidden layer neurons as on the output neurons. By plotting average neuron activation for changed-speed videos versus normal-speed videos, we can generate the same steer-like and motor-like profiles as in Fig. 3. We further mapped the distribution of steer-like and motor-like neurons across the layers, arguing that these ultimately contribute to the final steer and motor throttle predictions. Linear SVMs were used to find the motor-like neurons based on their activation profiles, with the middle layers of our network containing the most motor-like neurons.
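A sketch of that SVM step follows. The featureization (one row per neuron, one column per tested video speed) is our assumption; the paper states only that linear SVMs were applied to activation profiles.

```python
import numpy as np
from sklearn.svm import LinearSVC

def find_motor_like(profiles, labels):
    """Fit a linear SVM separating motor-like neurons from the rest.

    profiles: (n_neurons, n_conditions) array, e.g. each neuron's mean
    activation at every tested video speed. labels: 1 = motor-like,
    0 = otherwise. Feature construction here is an assumption.
    """
    clf = LinearSVC(C=1.0, max_iter=10000)
    clf.fit(profiles, labels)
    return clf
```

Counting `clf.predict(layer_profiles).sum()` per layer would then give the layer-wise distribution of motor-like neurons described above.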

From a theoretical standpoint, motor throttle changes only affect radially-dependent optical flow, whereas steering creates optical flow that is consistent throughout the visual field. The latter is easier for convolutional filters to capture, which is what we see in our results.
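This intuition matches the standard pinhole-camera flow equations (a textbook result, not derived in this paper): for focal length \(f\), scene depth \(Z\), forward translation \(T_z\) (throttle), and yaw rate \(\omega_y\) (steering), the image flow at pixel \((x, y)\) is

\[
\dot{x} = \frac{x\,T_z}{Z} - \omega_y\!\left(f + \frac{x^2}{f}\right), \qquad
\dot{y} = \frac{y\,T_z}{Z} - \omega_y\,\frac{x\,y}{f}.
\]

The throttle terms are scaled by \(1/Z\) and point radially outward from the focus of expansion, while the steering term is approximately a uniform horizontal shift of \(-f\omega_y\) near the image center, a pattern a single convolutional filter can match anywhere in the visual field.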

Lastly, consistent with Lundquist et al. [3], depth-sensitive stereo features are more difficult for convolutional networks to learn than other features. Our results appear robust to changes in stereoscopic disparity; motion cues seem to have been more relevant than stereo cues in driving changes in steer or motor throttle predictions.