Abstract
Device-to-device (D2D) communication is direct communication between two D2D user equipments (DUEs) that does not traverse the evolved NodeB of 5G networks. In the underlay mode of resource reuse, DUEs and cellular user equipments share resource blocks, and this spectrum reuse improves system throughput. To further enhance performance, the Multi-Player Multi-Armed Bandit, an extended version of a reinforcement learning algorithm, is employed to control the transmission power of the DUEs and thereby reduce the interference induced by resource sharing. Three learning strategies are applied: Epsilon-first, Epsilon-greedy, and Upper-Confidence-Bound. Simulation results show that the proposed method improves performance in terms of the average transmission power of D2D pairs, the ratio of unallocated D2D pairs, energy efficiency, and total throughput.
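To make the three learning strategies concrete, the following is a minimal single-learner sketch, not the authors' algorithm. It assumes each arm is a candidate transmit-power level and uses a hypothetical noisy reward (`toy_reward`) in place of the paper's interference and throughput model; the function names, the value of `epsilon`, and the UCB1 form of the confidence bound are likewise assumptions made for illustration. In a multi-player setting each D2D pair would run such a learner on its own, with rewards coupled through interference, which this sketch does not model.

```python
import math
import random


def select_arm(strategy, counts, means, t, horizon, epsilon=0.1, rng=random):
    """Pick an arm index (assumed here to be a transmit-power level).

    counts[i] is how often arm i was played, means[i] its average reward,
    t the current round (0-based), horizon the total number of rounds.
    """
    n_arms = len(counts)
    best = max(range(n_arms), key=lambda i: means[i])
    if strategy == "epsilon_first":
        # Explore uniformly for the first epsilon * horizon rounds, then exploit.
        return rng.randrange(n_arms) if t < epsilon * horizon else best
    if strategy == "epsilon_greedy":
        # Explore with probability epsilon in every round.
        return rng.randrange(n_arms) if rng.random() < epsilon else best
    if strategy == "ucb":
        # UCB1: play every arm once, then maximise mean plus confidence bonus.
        for i in range(n_arms):
            if counts[i] == 0:
                return i
        total = sum(counts)
        return max(
            range(n_arms),
            key=lambda i: means[i] + math.sqrt(2.0 * math.log(total) / counts[i]),
        )
    raise ValueError(f"unknown strategy: {strategy}")


def run_bandit(strategy, reward_fn, n_arms, horizon, epsilon=0.1, seed=0):
    """Run one learner for `horizon` rounds and return (counts, means)."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(horizon):
        arm = select_arm(strategy, counts, means, t, horizon, epsilon, rng)
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means


def toy_reward(arm, rng):
    """Hypothetical Bernoulli reward that peaks at arm 2 (not the paper's model)."""
    expected = [0.2, 0.5, 0.8, 0.4][arm]
    return 1.0 if rng.random() < expected else 0.0
```

Calling `run_bandit("ucb", toy_reward, n_arms=4, horizon=5000)` returns the play counts and estimated mean reward per arm. The strategies differ only in how they split rounds between exploration and exploitation: Epsilon-first explores up front, Epsilon-greedy explores at a constant rate, and Upper-Confidence-Bound explores arms in proportion to how uncertain their estimates still are.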
Acknowledgements
This research was supported by the Ministry of Science and Technology of Taiwan under Grant Nos. 108-2221-E-197-009 and 108-2221-E-197-011. The authors would also like to thank Prof. Cho-Chin Lin and MS student Yu-Yang Hsieh for their help with the pseudocode.
Cite this article
Kuo, FC., Schindelhauer, C., Wang, HC. et al. D2D Resource Allocation with Power Control Based on Multi-player Multi-armed Bandit. Wireless Pers Commun 113, 1455–1470 (2020). https://doi.org/10.1007/s11277-020-07313-2
DOI: https://doi.org/10.1007/s11277-020-07313-2