D2D Resource Allocation with Power Control Based on Multi-player Multi-armed Bandit

Wireless Personal Communications

Abstract

Device-to-device (D2D) communication is direct communication between two D2D user equipments (DUEs) that does not traverse the evolved NodeB of 5G networks. In the underlay mode of resource reuse, DUEs and cellular user equipments share resource blocks, improving system throughput by reusing the spectrum. To further enhance performance, Multi-Player Multi-Armed Bandit, an extension of a reinforcement learning algorithm, is employed to control the transmission power of the DUEs and so reduce the interference induced by resource sharing. Three learning strategies are applied: Epsilon-first, Epsilon-greedy, and Upper-Confidence-Bound. Simulation results show that the proposed method improves performance in terms of the average transmission power of D2D pairs, the ratio of unallocated D2D pairs, energy efficiency, and total throughput.
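The three learning strategies named in the abstract can be illustrated with a minimal single-player sketch. This is not the authors' algorithm: it assumes that each arm is a discrete transmit-power level, that the reward is a Bernoulli success indicator (for example, whether a hypothetical SINR target is met), and that UCB1 is the Upper-Confidence-Bound variant in use. The function names, the `mean_rewards` values, and the default `epsilon` are all hypothetical, and the multi-player interaction between D2D pairs is not modeled.

```python
import math
import random


def epsilon_first(counts, values, t, horizon, epsilon=0.1):
    """Explore uniformly for the first epsilon * horizon rounds, then exploit."""
    if t < epsilon * horizon:
        return random.randrange(len(counts))
    return max(range(len(values)), key=values.__getitem__)


def epsilon_greedy(counts, values, t, horizon, epsilon=0.1):
    """Explore with probability epsilon in every round, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(counts))
    return max(range(len(values)), key=values.__getitem__)


def ucb1(counts, values, t, horizon):
    """Pull each arm once, then pick the arm with the highest UCB1 index."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(counts)),
        key=lambda a: values[a] + math.sqrt(2.0 * math.log(total) / counts[a]),
    )


def run(strategy, mean_rewards, horizon=2000, seed=0):
    """Simulate one learner; mean_rewards are assumed success probabilities."""
    random.seed(seed)
    k = len(mean_rewards)
    counts = [0] * k      # number of pulls per arm (power level)
    values = [0.0] * k    # running average reward per arm
    for t in range(horizon):
        arm = strategy(counts, values, t, horizon)
        reward = 1.0 if random.random() < mean_rewards[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    # Hypothetical success probabilities for three power levels.
    probs = [0.2, 0.5, 0.8]
    for strategy in (epsilon_first, epsilon_greedy, ucb1):
        counts, values = run(strategy, probs)
        print(strategy.__name__, counts, [round(v, 2) for v in values])
```

The strategies differ only in when they explore: Epsilon-first spends a fixed initial budget on exploration, Epsilon-greedy spreads exploration over all rounds, and UCB1 explores implicitly by favoring arms whose estimates are still uncertain.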



Acknowledgements

This research was supported by the Ministry of Science and Technology of Taiwan under Grant Nos. 108-2221-E-197-009 and 108-2221-E-197-011. The authors would also like to thank Prof. Cho-Chin Lin and MS student Yu-Yang Hsieh for their help with the pseudocode.

Author information

Corresponding author

Correspondence to Chih-Cheng Tseng.


About this article


Cite this article

Kuo, FC., Schindelhauer, C., Wang, HC. et al. D2D Resource Allocation with Power Control Based on Multi-player Multi-armed Bandit. Wireless Pers Commun 113, 1455–1470 (2020). https://doi.org/10.1007/s11277-020-07313-2


  • DOI: https://doi.org/10.1007/s11277-020-07313-2
