KR102669662B1

KR102669662B1 - Wireless communication base station and method thereof

Info

Publication number: KR102669662B1
Application number: KR1020220136044A
Authority: KR
Inventors: 김선우; 박현우; 전종현
Original assignee: 한양대학교 산학협력단
Priority date: 2022-10-20
Filing date: 2022-10-20
Publication date: 2024-05-28
Also published as: KR20240055577A

Abstract

The present disclosure relates to a wireless communication base station and a method thereof. The base station includes a learning unit that calculates state information based on a received signal received from at least one terminal and on information about the number of antennas used for beam forming, calculates beam control information based on the state information and preset action information, and calculates learning reward information based on the state information and the beam control information; and a control unit that performs beam forming control to form a communication beam for the terminal based on the beam control information. The beam control information includes direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam.

Description

WIRELESS COMMUNICATION BASE STATION AND METHOD THEREOF

The present disclosure relates to a wireless communication base station and method and, more specifically, to a wireless communication base station and method that perform beam tracking and beam width control based on a signal received from a terminal.

In wireless communication systems, beam forming technology is used to concentrate a signal in a specific direction using multiple antennas.

When a moving terminal is tracked with this beam forming technology, the communication link must be maintained continuously while the terminal's mobility is accommodated, so beams need to be formed at shorter intervals.

In addition, a higher beam gain can be obtained as the number of antennas used for beam forming increases, but the beam width narrows accordingly. If wireless communication continues with the narrowed beam width, the overhead of beam management increases.
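The trade-off between antenna count, beam gain, and beam width can be illustrated numerically. The sketch below is not taken from the disclosure: it assumes a broadside uniform linear array, uses the textbook large-array approximation for the half-power beamwidth (about 0.886 λ / (N d) radians), and takes the ideal array gain as 10 log10(N) dB. The function name and the half-wavelength default spacing are illustrative assumptions.

```python
import math


def ula_beam_metrics(num_antennas: int, spacing_wavelengths: float = 0.5):
    """Return (array_gain_db, half_power_beamwidth_deg) for a broadside ULA.

    Assumptions (not from the source): ideal uniform linear array,
    large-array approximation HPBW ~= 0.886 * lambda / (N * d).
    """
    if num_antennas < 1 or spacing_wavelengths <= 0:
        raise ValueError("need at least one antenna and a positive spacing")
    array_gain_db = 10.0 * math.log10(num_antennas)
    hpbw_rad = 0.886 / (num_antennas * spacing_wavelengths)
    return array_gain_db, math.degrees(hpbw_rad)


# Doubling the antenna count adds about 3 dB of gain and halves the beam width.
for n in (8, 16, 32, 64):
    gain_db, hpbw_deg = ula_beam_metrics(n)
    print(f"N={n:3d}  gain={gain_db:5.1f} dB  HPBW~{hpbw_deg:5.2f} deg")
```

Under these assumptions, each doubling of the antenna count halves the angular region the beam covers, which is why a narrow beam aimed at a moving terminal has to be updated more often.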

In view of these points, beam forming with multiple antennas calls for a technique that tracks a moving terminal through continuous beam forming while improving beam forming accuracy and reducing the overhead burden.

The present disclosure aims to provide a wireless communication base station and method capable of performing beam forming by tracking a moving terminal in real time.

In addition, the present disclosure aims to provide a wireless communication base station and method capable of improving the speed and accuracy of real-time tracking of a moving terminal based on deep reinforcement learning.

In addition, the present disclosure aims to provide a wireless communication base station and method capable of improving communication performance by controlling the beam width in real time when forming a beam for a moving terminal.

In addition, the present disclosure aims to provide a wireless communication base station and method capable of reducing beam forming overhead through beam tracking and beam width control based on deep reinforcement learning.

In one aspect, the present embodiments may provide a base station performing wireless communication, the base station including: a learning unit that calculates state information based on a received signal received from at least one terminal and on information about the number of antennas used for beam forming, calculates beam control information based on the state information and preset action information, and calculates learning reward information based on the state information and the beam control information; and a control unit that performs beam forming control to form a communication beam for the terminal based on the beam control information, wherein the beam control information includes direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam.

In another aspect, the present embodiments may provide a method by which a base station performs wireless communication, the method including: a learning step of calculating state information based on a received signal received from at least one terminal and on information about the number of antennas used for beam forming, calculating beam control information based on the state information and preset action information, and calculating learning reward information based on the state information and the beam control information; and a beam forming control step of controlling formation of a communication beam for the terminal based on the beam control information, wherein the beam control information includes direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam.
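The data flow named in these two aspects (state information, preset action information, beam control information, learning reward information) can be pictured with a toy sketch. Everything concrete below is a hypothetical stand-in rather than the disclosed method: the state encoding, the five preset actions, the greedy table lookup (used here in place of a deep reinforcement learning network), and the placeholder reward are all assumptions chosen only to make the flow runnable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BeamControl:
    direction_deg: float  # direction estimation information
    width_deg: float      # beam width control information


# Hypothetical preset action information: (direction shift in degrees, beam width scale).
PRESET_ACTIONS = [(-2.0, 1.0), (0.0, 1.0), (2.0, 1.0), (0.0, 0.5), (0.0, 2.0)]


def compute_state(received_power_dbm: float, num_antennas: int) -> tuple:
    """State information from the received signal and the antenna count (toy encoding)."""
    return (round(received_power_dbm), num_antennas)


def select_action(q_table: dict, state: tuple) -> int:
    """Greedy choice over the preset actions; unseen (state, action) pairs score 0.0."""
    return max(range(len(PRESET_ACTIONS)), key=lambda a: q_table.get((state, a), 0.0))


def apply_action(current: BeamControl, action: int) -> BeamControl:
    """Turn the chosen preset action into new beam control information."""
    shift_deg, width_scale = PRESET_ACTIONS[action]
    return BeamControl(current.direction_deg + shift_deg, current.width_deg * width_scale)


def placeholder_reward(state: tuple, control: BeamControl) -> float:
    """Placeholder only: favors received power and penalizes wide beams."""
    received_power_dbm, _ = state
    return received_power_dbm - 0.1 * control.width_deg


state = compute_state(received_power_dbm=-71.6, num_antennas=32)
q_table = {(state, 2): 0.8}  # pretend earlier learning favored "steer +2 degrees"
action = select_action(q_table, state)
control = apply_action(BeamControl(direction_deg=30.0, width_deg=10.0), action)
reward = placeholder_reward(state, control)
print(action, control, reward)
```

The point of the sketch is only the ordering: the state is built from the received signal and the antenna count, an action from a fixed set yields the beam control information, and the reward is a function of the state and that beam control information.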

According to the present disclosure, it is possible to provide a wireless communication base station and method capable of performing beam forming by tracking a moving terminal in real time.

In addition, according to the present disclosure, it is possible to provide a wireless communication base station and method capable of improving the speed and accuracy of real-time tracking of a moving terminal based on deep reinforcement learning.

In addition, according to the present disclosure, it is possible to provide a wireless communication base station and method capable of improving communication performance by controlling the beam width in real time when forming a beam for a moving terminal.

In addition, according to the present disclosure, it is possible to provide a wireless communication base station and method capable of reducing beam forming overhead through beam tracking and beam width control based on deep reinforcement learning.

FIG. 1 is a diagram briefly illustrating the structure of an NR wireless communication system to which the present embodiment can be applied.
FIG. 2 is a diagram for explaining the frame structure in an NR system to which the present embodiment can be applied.
FIG. 3 is a diagram for explaining a resource grid supported by a radio access technology to which the present embodiment can be applied.
FIG. 4 is a diagram for explaining a bandwidth part supported by a radio access technology to which the present embodiment can be applied.
FIG. 5 is a diagram illustrating an example of a synchronization signal block in a radio access technology to which the present embodiment can be applied.
FIG. 6 is a diagram for explaining a random access procedure in a radio access technology to which the present embodiment can be applied.
FIG. 7 is a diagram for explaining a CORESET.
FIG. 8 is a diagram for explaining communication beam formation in a wireless communication system according to the present disclosure.
FIG. 9 is a diagram illustrating an example of a configuration in which reinforcement learning is performed according to an embodiment.
FIG. 10 is a block diagram of a base station performing wireless communication according to the present disclosure.
FIG. 11 is a diagram illustrating an example of a configuration for calculating beam control information according to an embodiment.
FIG. 12 is a diagram illustrating an example of reinforcement-learning-based beam forming control according to an embodiment.
FIG. 13 is a flowchart for explaining a method by which a base station performs wireless communication according to the present disclosure.
FIG. 14 is a flowchart for explaining a reinforcement learning step according to an embodiment.

Hereinafter, some embodiments of the present disclosure are described in detail with reference to illustrative drawings. In assigning reference numerals to the components in each drawing, identical components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present embodiments, a detailed description of a related known configuration or function may be omitted if it is judged that it could obscure the gist of the present technical idea. Where "includes", "has", "consists of", and the like are used in this specification, other parts may be added unless "only" is used. A component expressed in the singular may include the plural unless explicitly stated otherwise.

In describing the components of the present disclosure, terms such as first, second, A, B, (a), and (b) may be used. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, order, or number of the components.

In describing the positional relationship of components, when two or more components are described as being "connected", "coupled", or "joined", it should be understood that the components may be directly "connected", "coupled", or "joined", but that they may also be "connected", "coupled", or "joined" with a further component "interposed" between them. Here, the further component may be included in one or more of the two or more components that are "connected", "coupled", or "joined" to each other.

In describing temporal flow relationships related to components, operating methods, manufacturing methods, and the like, when a temporal or sequential order is described with expressions such as "after", "following", "next", or "before", non-consecutive cases may also be included unless "immediately" or "directly" is used.

Meanwhile, when a numerical value or corresponding information (e.g., a level) for a component is mentioned, the value or information may be interpreted as including the margin of error that can arise from various factors (e.g., process factors, internal or external impact, noise), even without a separate explicit statement.

A wireless communication system in this specification means a system for providing various communication services, such as voice and data packets, using radio resources, and may include a terminal, a base station, and a core network.

The embodiments disclosed below can be applied to wireless communication systems using various radio access technologies. For example, the embodiments can be applied to radio access technologies such as CDMA (code division multiple access), FDMA (frequency division multiple access), TDMA (time division multiple access), OFDMA (orthogonal frequency division multiple access), and SC-FDMA (single carrier frequency division multiple access). CDMA can be implemented with a radio technology such as UTRA (universal terrestrial radio access) or CDMA2000. TDMA can be implemented with a radio technology such as GSM (global system for mobile communications)/GPRS (general packet radio service)/EDGE (enhanced data rates for GSM evolution). OFDMA can be implemented with a radio technology such as IEEE (Institute of Electrical and Electronics Engineers) 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, or E-UTRA (evolved UTRA). IEEE 802.16m is an evolution of IEEE 802.16e and provides backward compatibility with systems based on IEEE 802.16e. UTRA is part of UMTS (universal mobile telecommunications system). 3GPP (3rd generation partnership project) LTE (long term evolution) is part of E-UMTS (evolved UMTS), which uses E-UTRA (evolved UMTS terrestrial radio access); it employs OFDMA in the downlink and SC-FDMA in the uplink. The embodiments can thus be applied to radio access technologies that are already disclosed or commercialized, and also to radio access technologies that are under development or will be developed in the future.

Meanwhile, a terminal in this specification is a comprehensive concept meaning a device that includes a wireless communication module for communicating with a base station in a wireless communication system. It should be interpreted as covering not only the UE (User Equipment) in WCDMA, LTE, HSPA, IMT-2020 (5G or New Radio), and the like, but also the MS (Mobile Station) in GSM, the UT (User Terminal), the SS (Subscriber Station), a wireless device, and so on. Depending on the form of use, a terminal may be a user's portable device such as a smartphone; in a V2X communication system, it may mean a vehicle or a device including a wireless communication module in a vehicle. In a machine type communication (MTC) system, it may mean an MTC terminal, an M2M terminal, or the like that carries a communication module for performing machine type communication.

A base station or cell in this specification refers to the endpoint that communicates with a terminal on the network side. The term covers a variety of coverage areas and entities, including a Node-B, an eNB (evolved Node-B), a gNB (gNode-B), an LPN (Low Power Node), a sector, a site, various types of antennas, a BTS (Base Transceiver System), an access point, a point (for example, a transmission point, a reception point, or a transmission/reception point), a relay node, a mega cell, a macro cell, a micro cell, a pico cell, a femto cell, an RRH (Remote Radio Head), an RU (Radio Unit), and a small cell. A cell may also include a BWP (Bandwidth Part) configured for the terminal. For example, a serving cell may mean the active BWP of the terminal.

Since each of the cells listed above has a base station that controls it, "base station" can be interpreted in two senses: 1) the device itself that provides a mega cell, macro cell, micro cell, pico cell, femto cell, or small cell in relation to a radio area, or 2) the radio area itself. In sense 1), all devices that provide a given radio area and are controlled by the same entity, or that interact to configure the radio area cooperatively, are referred to as the base station. Depending on how the radio area is configured, a point, a transmission/reception point, a transmission point, a reception point, and the like are each an embodiment of a base station. In sense 2), the radio area in which signals are received or transmitted, seen from the perspective of the user terminal or of a neighboring base station, may itself be referred to as the base station.

In this specification, a cell may mean the coverage of a signal transmitted from a transmission/reception point, a component carrier having the coverage of a signal transmitted from a transmission/reception point (transmission point or transmission/reception point), or the transmission/reception point itself.

Uplink (UL) refers to a scheme in which data is transmitted from a terminal to a base station, and downlink (DL) refers to a scheme in which data is transmitted from a base station to a terminal. Downlink may mean communication or a communication path from multiple transmission/reception points to a terminal, and uplink may mean communication or a communication path from a terminal to multiple transmission/reception points. In the downlink, the transmitter may be part of the multiple transmission/reception points and the receiver may be part of the terminal. In the uplink, the transmitter may be part of the terminal and the receiver may be part of the multiple transmission/reception points.

In the uplink and downlink, control information is transmitted and received through control channels such as the PDCCH (Physical Downlink Control Channel) and PUCCH (Physical Uplink Control Channel), and data is transmitted and received through data channels such as the PDSCH (Physical Downlink Shared Channel) and PUSCH (Physical Uplink Shared Channel). Hereinafter, the transmission and reception of signals through channels such as the PUCCH, PUSCH, PDCCH, and PDSCH is also written as "transmitting and receiving the PUCCH, PUSCH, PDCCH, and PDSCH".

For clarity, the technical idea is described below mainly with reference to the 3GPP LTE/LTE-A/NR (New RAT) communication systems, but the technical features are not limited thereto.

Following its work on 4G (4th-Generation) communication technology, 3GPP has been studying 5G (5th-Generation) communication technology to meet the ITU-R requirements for next-generation radio access technology. Specifically, as 5G communication technologies, 3GPP has been studying LTE-A Pro, which enhances LTE-Advanced to meet the ITU-R requirements, and a new NR communication technology that is separate from 4G communication technology. Both LTE-A Pro and NR are expected to be submitted as 5G communication technologies; for convenience of explanation, the embodiments below are described with a focus on NR.

NR operating scenarios were defined by adding considerations for satellites, automobiles, and new verticals to the existing 4G LTE scenarios. In terms of services, NR supports the eMBB (Enhanced Mobile Broadband) scenario; the mMTC (massive Machine Type Communication) scenario, which has a high terminal density and is deployed over a wide area, requiring low data rates and asynchronous access; and the URLLC (Ultra-Reliable and Low Latency Communications) scenario, which requires high responsiveness and reliability and can support high-speed mobility.

To satisfy these scenarios, NR discloses a wireless communication system to which new waveform and frame structure technology, low latency technology, millimeter wave (mmWave) band support technology, and forward compatibility technology are applied. In particular, the NR system introduces various technical changes in terms of flexibility in order to provide forward compatibility. The main technical features are described below with reference to the drawings.

FIG. 1 is a diagram briefly illustrating the structure of an NR system to which the present embodiment can be applied.

Referring to FIG. 1, the NR system is divided into a 5GC (5G Core Network) and an NG-RAN part. The NG-RAN consists of gNBs and ng-eNBs that provide user plane (SDAP/PDCP/RLC/MAC/PHY) and control plane (RRC) protocol terminations toward the UE (User Equipment). gNBs are connected to each other, and gNBs to ng-eNBs, through the Xn interface. The gNBs and ng-eNBs are each connected to the 5GC through the NG interface. The 5GC may include an AMF (Access and Mobility Management Function), which handles the control plane, such as terminal access and mobility control, and a UPF (User Plane Function), which handles control functions for user data. NR supports both the frequency band below 6 GHz (FR1, Frequency Range 1) and the frequency band above 6 GHz (FR2, Frequency Range 2).

A gNB means a base station that provides NR user plane and control plane protocol terminations toward the terminal, and an ng-eNB means a base station that provides E-UTRA user plane and control plane protocol terminations toward the terminal. The base station described in this specification should be understood to encompass both the gNB and the ng-eNB, and the term may also be used to refer to a gNB or an ng-eNB specifically where necessary.

NR uses a CP-OFDM waveform, which employs a cyclic prefix, for downlink transmission, and uses CP-OFDM or DFT-s-OFDM for uplink transmission. OFDM is easy to combine with MIMO (Multiple Input Multiple Output) and has the advantage of allowing a low-complexity receiver together with high spectral efficiency.
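As a rough illustration of what "CP-OFDM" means, the following sketch modulates one OFDM symbol with a naive inverse DFT and prepends a cyclic prefix, then reverses the process. It is a teaching sketch under simplifying assumptions that are not from the disclosure: eight subcarriers, a two-sample prefix, an ideal channel, and a pure-Python O(N^2) transform in place of the FFT a real transceiver would use. All function names are illustrative.

```python
import cmath


def idft(freq_symbols):
    """Naive inverse DFT: subcarrier symbols -> time-domain samples."""
    n = len(freq_symbols)
    return [sum(x * cmath.exp(2j * cmath.pi * k * t / n)
                for k, x in enumerate(freq_symbols)) / n
            for t in range(n)]


def dft(time_samples):
    """Naive forward DFT: time-domain samples -> subcarrier symbols."""
    n = len(time_samples)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(time_samples))
            for k in range(n)]


def cp_ofdm_modulate(freq_symbols, cp_len):
    """Build one CP-OFDM symbol: copy the last cp_len samples to the front."""
    if not 0 < cp_len <= len(freq_symbols):
        raise ValueError("cp_len must be between 1 and the number of subcarriers")
    time = idft(freq_symbols)
    return time[-cp_len:] + time


def cp_ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and return to the subcarrier domain."""
    return dft(samples[cp_len:])


# Eight QPSK symbols, one per subcarrier (illustrative values).
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j, 1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j]
tx = cp_ofdm_modulate(qpsk, cp_len=2)
rx = cp_ofdm_demodulate(tx, cp_len=2)
print(len(tx), all(abs(a - b) < 1e-9 for a, b in zip(rx, qpsk)))
```

Because the prefix is a copy of the symbol's tail, a receiver can discard it and still recover every subcarrier with a single transform, which is one reason OFDM receivers can be kept simple.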

Meanwhile, because the requirements for data rate, latency, coverage, and so on differ among the three scenarios described above, NR needs to satisfy the requirements of each scenario efficiently within the frequency bands that make up a given NR system. To this end, a technique has been proposed for efficiently multiplexing radio resources based on a plurality of different numerologies.

Specifically, an NR transmission numerology is determined by the subcarrier spacing and the cyclic prefix (CP). As shown in Table 1 below, the subcarrier spacing scales exponentially from a 15 kHz baseline, with μ used as the exponent of 2 (that is, 2^μ × 15 kHz).

μ    Subcarrier spacing (kHz)    Cyclic prefix       Supported for data    Supported for synch
0    15                          Normal              Yes                   Yes
1    30                          Normal              Yes                   Yes
2    60                          Normal, Extended    Yes                   No
3    120                         Normal              Yes                   Yes
4    240                         Normal              No                    Yes
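The rule behind the table, subcarrier spacing = 15 kHz × 2^μ, can be checked with a few lines of code. The function name and the two sets below are illustrative; the set contents simply restate the "Supported for data" and "Supported for synch" columns of the table.

```python
def subcarrier_spacing_khz(mu: int) -> int:
    """Subcarrier spacing in kHz for NR numerology index mu (0 to 4)."""
    if mu not in range(5):
        raise ValueError("mu must be an integer from 0 to 4")
    return 15 * 2 ** mu


# Restated from the table: which numerologies carry data and which carry sync signals.
DATA_NUMEROLOGIES = {0, 1, 2, 3}
SYNC_NUMEROLOGIES = {0, 1, 3, 4}

for mu in range(5):
    print(mu, subcarrier_spacing_khz(mu),
          "data" if mu in DATA_NUMEROLOGIES else "-",
          "sync" if mu in SYNC_NUMEROLOGIES else "-")
```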

As shown in Table 1 above, NR numerologies can be divided into five types according to the subcarrier spacing. This differs from LTE, one of the 4G communication technologies, in which the subcarrier spacing is fixed at 15 kHz. Specifically, the subcarrier spacings used for data transmission in NR are 15, 30, 60, and 120 kHz, and those used for synchronization signal transmission are 15, 30, 120, and 240 kHz. The extended CP applies only to the 60 kHz subcarrier spacing. Meanwhile, the NR frame structure defines a frame with a length of 10 ms consisting of 10 subframes, each with the same length of 1 ms. One frame can be divided into two half-frames of 5 ms, and each half-frame contains 5 subframes. With a 15 kHz subcarrier spacing, one subframe consists of one slot, and each slot consists of 14 OFDM symbols.

FIG. 2 is a diagram for explaining the frame structure in an NR system to which this embodiment can be applied. Referring to FIG. 2, a slot always consists of 14 OFDM symbols in the case of normal CP, but the time length of the slot varies with the subcarrier spacing. For example, for a numerology with 15 kHz subcarrier spacing, a slot is 1 ms long, the same length as a subframe. In contrast, for a numerology with 30 kHz subcarrier spacing, a slot still consists of 14 OFDM symbols but is 0.5 ms long, so one subframe contains two slots. That is, subframes and frames are defined with a fixed time length, whereas a slot is defined by its number of symbols, so its time length varies with the subcarrier spacing. Meanwhile, NR defines the slot as the basic unit of scheduling and also introduces the mini-slot (also called sub-slot or non-slot-based scheduling) to reduce transmission delay over the radio interface. With a wide subcarrier spacing, the length of one slot shortens in inverse proportion, which reduces transmission delay over the radio interface. The mini-slot (or sub-slot) is intended to support URLLC scenarios efficiently and can be scheduled in units of 2, 4, or 7 symbols.
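The relationship between μ, subcarrier spacing, and slot length described above can be sketched as follows. This is a minimal illustration, not part of the disclosure: the class name `Numerology` and its attributes are hypothetical, and the general rule of 2^μ slots per subframe is taken from the NR numerology definition (the text above states only the 15 kHz and 30 kHz cases).

```python
from dataclasses import dataclass

SYMBOLS_PER_SLOT_NORMAL_CP = 14  # fixed for normal CP, per the description
SUBFRAME_MS = 1.0                # a subframe is always 1 ms long


@dataclass(frozen=True)
class Numerology:
    """Hypothetical helper: derives timing figures from the exponent mu."""
    mu: int

    @property
    def scs_khz(self) -> int:
        # Subcarrier spacing scales as 15 kHz * 2^mu (Table 1).
        return 15 * (2 ** self.mu)

    @property
    def slots_per_subframe(self) -> int:
        # The slot shortens in inverse proportion to the subcarrier spacing.
        return 2 ** self.mu

    @property
    def slot_ms(self) -> float:
        return SUBFRAME_MS / self.slots_per_subframe


for mu in range(5):
    n = Numerology(mu)
    print(f"mu={mu}: {n.scs_khz} kHz, "
          f"{n.slots_per_subframe} slot(s) per subframe, {n.slot_ms} ms per slot")
```

Because the frame and subframe lengths are fixed, only the slot count per subframe changes with μ; the symbol count per slot stays at 14 for normal CP.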

In addition, unlike LTE, NR defines uplink and downlink resource allocation at the symbol level within one slot. To reduce HARQ delay, a slot structure has been defined in which HARQ ACK/NACK can be transmitted directly within the transmission slot; this slot structure is referred to as a self-contained structure.

NR is designed to support a total of 256 slot formats, of which 62 are used in Rel-15. NR also supports a common frame structure in which FDD or TDD frames are formed by combining various slots. For example, it supports a slot structure in which all symbols of the slot are configured as downlink, a slot structure in which all symbols are configured as uplink, and a slot structure in which downlink and uplink symbols are combined. NR further supports data transmission that is scheduled across one or more slots. Accordingly, the base station can use a slot format indicator (SFI) to inform the terminal whether a slot is a downlink slot, an uplink slot, or a flexible slot. The base station can indicate the slot format by using the SFI to point to an index of a table configured through terminal-specific RRC signaling, and can indicate it dynamically through DCI (Downlink Control Information) or statically or semi-statically through RRC.
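The symbol-level slot structures described above can be illustrated with a short sketch. The string encoding used here (one character per symbol, `D` for downlink, `U` for uplink, `F` for flexible) and the function name `classify_slot` are assumptions made for illustration; the actual slot formats are table entries selected by an SFI index, which this sketch does not reproduce.

```python
SYMBOLS_PER_SLOT = 14  # normal CP


def classify_slot(symbols: str) -> str:
    """Classify a slot given one character per OFDM symbol.

    'D' = downlink, 'U' = uplink, 'F' = flexible (illustrative encoding).
    """
    if len(symbols) != SYMBOLS_PER_SLOT:
        raise ValueError("a slot must contain exactly 14 symbols")
    kinds = set(symbols)
    if not kinds <= {"D", "U", "F"}:
        raise ValueError("symbols must be 'D', 'U', or 'F'")
    if kinds == {"D"}:
        return "downlink"
    if kinds == {"U"}:
        return "uplink"
    if kinds == {"F"}:
        return "flexible"
    return "mixed"


print(classify_slot("D" * 14))
print(classify_slot("D" * 10 + "F" * 2 + "U" * 2))
```

A slot such as the second example, with downlink symbols followed by a gap and uplink symbols, is the kind of combined structure that allows feedback within the same slot.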

With regard to physical resources in NR, antenna ports, resource grids, resource elements, resource blocks, bandwidth parts, and the like may be considered.

An antenna port is defined such that the channel over which a symbol on the antenna port is conveyed can be inferred from the channel over which another symbol on the same antenna port is conveyed. If the large-scale properties of the channel over which a symbol on one antenna port is conveyed can be inferred from the channel over which a symbol on another antenna port is conveyed, the two antenna ports are said to be in a QC/QCL (quasi co-located or quasi co-location) relationship. Here, the large-scale properties include one or more of delay spread, Doppler spread, frequency shift, average received power, and received timing.

FIG. 3 is a diagram for explaining a resource grid supported by a radio access technology to which this embodiment can be applied.

Referring to FIG. 3, because NR supports multiple numerologies on the same carrier, a resource grid may exist for each numerology. A resource grid may also exist for each antenna port, subcarrier spacing, and transmission direction.

A resource block consists of 12 subcarriers and is defined only in the frequency domain. A resource element consists of one OFDM symbol and one subcarrier. Therefore, as shown in FIG. 3, the size of one resource block varies with the subcarrier spacing. NR also defines "Point A", which serves as a common reference point for the resource block grid, as well as common resource blocks, virtual resource blocks, and the like.
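Since a resource block always spans 12 subcarriers, its bandwidth follows directly from the subcarrier spacing. The sketch below makes this arithmetic explicit; the function names are hypothetical.

```python
SUBCARRIERS_PER_RB = 12          # per the description
SYMBOLS_PER_SLOT_NORMAL_CP = 14  # per the description


def rb_bandwidth_khz(scs_khz: int) -> int:
    """Bandwidth occupied by one resource block at the given spacing."""
    return SUBCARRIERS_PER_RB * scs_khz


def resource_elements_per_rb_slot() -> int:
    """Resource elements in one resource block over one normal-CP slot."""
    return SUBCARRIERS_PER_RB * SYMBOLS_PER_SLOT_NORMAL_CP


for scs in (15, 30, 60, 120, 240):
    print(f"{scs} kHz spacing: one resource block spans {rb_bandwidth_khz(scs)} kHz")
```

At 15 kHz spacing a resource block spans 180 kHz, and the span doubles with each increment of μ.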

FIG. 4 is a diagram for explaining a bandwidth part supported by a radio access technology to which this embodiment can be applied.

In NR, unlike LTE, in which the carrier bandwidth is fixed at 20 MHz, the maximum carrier bandwidth is set between 50 MHz and 400 MHz depending on the subcarrier spacing. It is therefore not assumed that every terminal uses the entire carrier bandwidth. Accordingly, in NR, a bandwidth part can be designated within the carrier bandwidth for use by the terminal, as shown in FIG. 4. A bandwidth part is associated with one numerology, consists of a subset of contiguous common resource blocks, and can be activated dynamically over time. Up to four bandwidth parts are configured for the terminal in each of the uplink and the downlink, and data is transmitted and received using the bandwidth part that is active at a given time.

In the case of paired spectrum, the uplink and downlink bandwidth parts are configured independently. In the case of unpaired spectrum, the downlink and uplink bandwidth parts are configured as a pair that shares a center frequency, in order to avoid unnecessary frequency re-tuning between downlink and uplink operation.

In NR, the terminal performs cell search and random access procedures in order to access the base station and communicate.

Cell search is a procedure in which the terminal uses a synchronization signal block (SSB) transmitted by the base station to synchronize with the cell of that base station, obtain the physical-layer cell ID, and obtain system information.

FIG. 5 is a diagram illustrating a synchronization signal block in a radio access technology to which this embodiment can be applied.

Referring to FIG. 5, the SSB consists of a primary synchronization signal (PSS) and a secondary synchronization signal (SSS), each occupying 1 symbol and 127 subcarriers, and a PBCH spanning 3 OFDM symbols and 240 subcarriers.

The terminal receives the SSB by monitoring for it in the time and frequency domains.

The SSB can be transmitted up to 64 times within 5 ms. Multiple SSBs are transmitted on different transmission beams within the 5 ms window, and the terminal performs detection on the assumption that, for any one specific beam used for transmission, the SSB is transmitted every 20 ms. The number of beams that can be used for SSB transmission within the 5 ms window increases with the frequency band. For example, up to 4 SSB beams can be transmitted at 3 GHz or below, up to 8 in the 3 to 6 GHz band, and up to 64 different beams in frequency bands of 6 GHz or above.
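The band-dependent limit on SSB beams can be written as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, and because the text assigns the 6 GHz boundary to both the middle and the upper band, the code arbitrarily places exactly 6 GHz in the upper band.

```python
SSB_BURST_WINDOW_MS = 5       # all SSB beams are sent within this window
ASSUMED_SSB_PERIOD_MS = 20    # period the terminal assumes for one beam


def max_ssb_beams(carrier_ghz: float) -> int:
    """Maximum number of SSB beams per 5 ms window for a carrier frequency."""
    if carrier_ghz <= 3.0:
        return 4
    if carrier_ghz < 6.0:   # assumption: exactly 6 GHz falls in the upper band
        return 8
    return 64


for f in (2.1, 3.5, 28.0):
    print(f"{f} GHz: up to {max_ssb_beams(f)} SSB beams per {SSB_BURST_WINDOW_MS} ms")
```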

Two SSBs are included in one slot, and the starting symbol and the number of repetitions within the slot are determined according to the subcarrier spacing.

Meanwhile, unlike the SS of conventional LTE, the SSB is not necessarily transmitted at the center frequency of the carrier bandwidth. That is, the SSB can be transmitted at locations other than the center of the system band, and multiple SSBs can be transmitted in the frequency domain when wideband operation is supported. Accordingly, the terminal monitors for the SSB using the synchronization raster, which is the set of candidate frequency positions at which the SSB is monitored. The carrier raster, which gives the center frequency position of the channel for initial access, and the synchronization raster were newly defined in NR. The synchronization raster has a wider frequency spacing than the carrier raster and therefore supports fast SSB search by the terminal.

The terminal can obtain the MIB through the PBCH of the SSB. The MIB (Master Information Block) contains the minimum information the terminal needs in order to receive the remaining minimum system information (RMSI) broadcast by the network. The PBCH may also include information on the position of the first DM-RS symbol in the time domain, information for the terminal to monitor SIB1 (for example, SIB1 numerology information, information related to the SIB1 CORESET, search space information, and PDCCH-related parameter information), and offset information between the common resource block and the SSB (the absolute position of the SSB within the carrier is transmitted through SIB1). Here, the SIB1 numerology information applies equally to message 2 and message 4 of the random access procedure that the terminal performs to access the base station after completing the cell search procedure.

The RMSI mentioned above refers to SIB1 (System Information Block 1), which is broadcast periodically (e.g., every 160 ms) in the cell. SIB1 contains the information the terminal needs to perform the initial random access procedure and is transmitted periodically on the PDSCH. To receive SIB1, the terminal must receive, through the PBCH, the numerology information used for SIB1 transmission and the CORESET (Control Resource Set) information used for scheduling SIB1. The terminal uses the SI-RNTI to check the scheduling information for SIB1 within the CORESET and acquires SIB1 on the PDSCH according to that scheduling information. SIBs other than SIB1 may be transmitted periodically or at the request of the terminal.

FIG. 6 is a diagram for explaining a random access procedure in a radio access technology to which this embodiment can be applied.

Referring to FIG. 6, when cell search is complete, the terminal transmits a random access preamble for random access to the base station. The random access preamble is transmitted on the PRACH. Specifically, the random access preamble is transmitted to the base station on a PRACH consisting of contiguous radio resources in specific slots that repeat periodically. In general, a contention-based random access procedure is performed when the terminal initially accesses a cell, and a contention-free random access procedure is performed when random access is carried out for beam failure recovery (BFR).

The terminal receives a random access response to the transmitted random access preamble. The random access response may include a random access preamble identifier (ID), a UL grant (uplink radio resource), a temporary C-RNTI (Temporary Cell - Radio Network Temporary Identifier), and a TAC (Time Alignment Command). Because one random access response may contain random access response information for more than one terminal, the random access preamble identifier may be included to indicate the terminal for which the included UL grant, temporary C-RNTI, and TAC are valid. The random access preamble identifier may be an identifier of the random access preamble received by the base station. The TAC may be included as information with which the terminal adjusts uplink synchronization. The random access response may be indicated by a random access identifier on the PDCCH, namely the RA-RNTI (Random Access - Radio Network Temporary Identifier).

A terminal that has received a valid random access response processes the information contained in it and performs a scheduled transmission to the base station. For example, the terminal applies the TAC and stores the temporary C-RNTI. It also uses the UL grant to transmit data stored in its buffer, or newly generated data, to the base station. In this case, information that identifies the terminal must be included.

Finally, the terminal receives a downlink message for contention resolution.
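The way a terminal picks out its own entry from a shared random access response, and then applies it, can be sketched as follows. All type and function names (`RarEntry`, `UeContext`, `select_rar`, `apply_rar`) and the field representations are hypothetical; the sketch covers only the matching by preamble identifier and the storing of the TAC and temporary C-RNTI described above, not the actual message encoding.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RarEntry:
    """One terminal's portion of a random access response (illustrative)."""
    preamble_id: int
    ul_grant: str       # placeholder for the uplink resource description
    temp_c_rnti: int
    tac: int            # time alignment command value


@dataclass
class UeContext:
    """Minimal terminal state updated by a valid response (illustrative)."""
    timing_advance: Optional[int] = None
    temp_c_rnti: Optional[int] = None
    ul_grant: Optional[str] = None


def select_rar(entries: Sequence[RarEntry], sent_preamble_id: int) -> Optional[RarEntry]:
    """Return the entry whose preamble ID matches the preamble this terminal sent."""
    for entry in entries:
        if entry.preamble_id == sent_preamble_id:
            return entry
    return None  # no valid response for this terminal


def apply_rar(ue: UeContext, entry: RarEntry) -> None:
    """Apply the TAC, store the temporary C-RNTI, and keep the UL grant."""
    ue.timing_advance = entry.tac
    ue.temp_c_rnti = entry.temp_c_rnti
    ue.ul_grant = entry.ul_grant


response = [RarEntry(7, "grant-A", 0x4601, 12), RarEntry(23, "grant-B", 0x4602, 5)]
ue = UeContext()
match = select_rar(response, sent_preamble_id=23)
if match is not None:
    apply_rar(ue, match)
print(ue)
```

In the contention-based case, two terminals may have sent the same preamble and both match the same entry, which is why the final contention resolution message is needed.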

The downlink control channel in NR is transmitted in a CORESET (Control Resource Set) with a length of 1 to 3 symbols and carries uplink and downlink scheduling information, SFI (Slot Format Indicator) information, TPC (Transmit Power Control) information, and the like.

NR thus introduced the CORESET concept to secure system flexibility. A CORESET (Control Resource Set) is a set of time-frequency resources for downlink control signals. The terminal can decode control channel candidates using one or more search spaces within the CORESET time-frequency resources. A QCL (Quasi CoLocation) assumption is configured for each CORESET. It is used to convey the characteristics of the analog beam direction, in addition to the delay spread, Doppler spread, Doppler shift, and average delay that are assumed under conventional QCL.

FIG. 7 is a diagram for explaining a CORESET.

Referring to FIG. 7, a CORESET can take various forms within the carrier bandwidth in one slot, and in the time domain a CORESET can consist of up to three OFDM symbols. In the frequency domain, a CORESET is defined in multiples of six resource blocks, up to the carrier bandwidth.
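The two dimensioning constraints stated above (1 to 3 OFDM symbols in time, a multiple of six resource blocks in frequency, bounded by the carrier bandwidth) can be checked with a short predicate. The function name and parameters are hypothetical, and other configuration constraints that a real system applies are not modeled.

```python
MAX_CORESET_SYMBOLS = 3
CORESET_RB_GRANULARITY = 6


def is_valid_coreset(num_symbols: int, num_rbs: int, carrier_rbs: int) -> bool:
    """Check the CORESET size against the constraints given in the description."""
    fits_in_time = 1 <= num_symbols <= MAX_CORESET_SYMBOLS
    fits_in_frequency = (
        num_rbs > 0
        and num_rbs % CORESET_RB_GRANULARITY == 0
        and num_rbs <= carrier_rbs
    )
    return fits_in_time and fits_in_frequency


print(is_valid_coreset(num_symbols=2, num_rbs=48, carrier_rbs=106))
print(is_valid_coreset(num_symbols=4, num_rbs=48, carrier_rbs=106))
```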

The first CORESET is indicated through the MIB as part of the initial bandwidth part configuration, so that the terminal can receive additional configuration information and system information from the network. After establishing a connection with the base station, the terminal can receive and configure one or more sets of CORESET information through RRC signaling.

In this specification, the frequencies, frames, subframes, resources, resource blocks, regions, bands, subbands, control channels, data channels, synchronization signals, various reference signals, various signals, and various messages related to NR (New Radio) may be interpreted according to meanings used in the past or at present, or according to various meanings that come into use in the future.

FIG. 8 is a diagram for explaining communication beam forming in a wireless communication system according to the present disclosure.

As an example, each base station and communication device included in the wireless communication system according to the present disclosure may include at least one antenna device, and each antenna device may be configured as an antenna array whose elements are arranged at a preset spacing. In this case, both the transmitting antenna and the receiving antenna may be configured as antenna arrays, or only one of them may be configured as an antenna array.

In addition, each antenna device may transmit and receive by combining its radiated beams so that beam forming is concentrated strongly in a specific direction.

As an example, beam forming can be divided into downlink beam forming, performed in the downlink in which the base station transmits a signal to the terminal, and uplink beam forming, performed in the uplink in which the terminal transmits a signal to the base station.

As an example, beam forming can be divided into transmit beam forming, which forms a beam based on the signals transmitted from each antenna, and receive beam forming, which forms a beam based on the signals received by each antenna.

As an example, beam forming can be divided into feedback beam forming, in which each device feeds its own location information back to other devices, and directional beam forming, in which each device measures the direction of another device and radiates radio waves in that direction.

In addition, when the terminal moves, a configuration may be included that performs beam tracking and beam width control, tracking the moving terminal and updating the related information so that beam forming is maintained continuously.

Beam forming may also be performed continuously. As an example, when the terminal moves, beam forming may be performed continuously based on the information that is updated while beam tracking and beam width control are carried out for the moving terminal.

Referring to FIG. 8, the base station (800) can exchange signals with each of the first terminal (810), the second terminal (820), and the third terminal (830). Here, the base station (800) may transmit and receive signals with the first terminal (810), the second terminal (820), and the third terminal (830) all at the same time, or with each terminal at a different time.

In this case, each signal that the first terminal (810), the second terminal (820), and the third terminal (830) transmit to the base station (800) may include a carrier signal, a training signal for obtaining specific timing, or a pilot signal.

As an example, the second terminal (820) may transmit a training signal to the base station (800). The base station (800) may transmit a certain signal based on the received training signal. Here, the signal transmitted by the base station (800) may be transmitted in all directions or in a specific direction.

For example, based on the received signal corresponding to the training signal transmitted by the second terminal (820), the base station (800) may form a beam in the direction in which the second terminal (820) is estimated to be located. When the location of the second terminal (820) changes, the base station (800) can track the changed location and, based on the tracking result, change the beam forming direction and adjust the beam width.
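To make the link between steering direction, antenna count, and beam width concrete, the sketch below steers a uniform linear array toward an estimated direction and evaluates the resulting gain. It is an illustration under assumptions that the text does not state: a uniform linear array with half-wavelength element spacing, narrowband far-field signals, and simple conjugate (matched) weights. It is not the beam control method of the disclosure, and all function names are hypothetical.

```python
import cmath
import math


def steering_vector(num_antennas, angle_deg, spacing_wavelengths=0.5):
    """Per-element phase response of a uniform linear array toward angle_deg."""
    phase = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return [cmath.exp(1j * phase * n) for n in range(num_antennas)]


def beam_weights(num_antennas, steer_deg, spacing_wavelengths=0.5):
    """Unit-norm weights that point the main lobe at steer_deg."""
    a = steering_vector(num_antennas, steer_deg, spacing_wavelengths)
    scale = 1.0 / math.sqrt(num_antennas)
    return [scale * x for x in a]


def power_gain(weights, angle_deg, spacing_wavelengths=0.5):
    """Array power gain |w^H a(angle)|^2 in the direction angle_deg."""
    a = steering_vector(len(weights), angle_deg, spacing_wavelengths)
    response = sum(w.conjugate() * x for w, x in zip(weights, a))
    return abs(response) ** 2


estimated_direction_deg = 20.0  # stand-in for the base station's direction estimate
for n in (4, 16):
    w = beam_weights(n, estimated_direction_deg)
    on_target = power_gain(w, estimated_direction_deg)
    off_target = power_gain(w, estimated_direction_deg + 5.0)
    print(f"{n} antennas: gain {on_target:.2f} on target, {off_target:.2f} at 5 degrees off")
```

Using more antennas raises the peak gain but narrows the main lobe, so a small pointing error costs more; using fewer antennas widens the beam at the price of gain. This trade-off is one reason the number of antennas used for beam forming is a natural control variable when a terminal is moving.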

The specific configuration for performing beam forming for a moving terminal as described above is explained in more detail below with reference to FIGS. 10 to 12.

FIG. 9 is a diagram for illustratively explaining a configuration in which reinforcement learning is performed according to an embodiment.

Referring to FIG. 9, reinforcement learning according to an embodiment may include a configuration in which a state S (930) is observed from a learning environment (910), a learning agent (920) selects and performs an action A (940) based on the state S (930), and a reward R (950) is calculated according to the action A (940) in the state S (930).

Here, calculating the reward R (950) may include a configuration in which the learning agent (920) selects, from among the one or more actions available in the state S (930), the action A (940) that can maximize the expected reward value, and the reward R (950) is calculated based on the state S (930) and the selected action A (940).

Here, the expected reward value may mean the reward that is expected to be calculated when a specific action is selected and performed in a specific state. The expected reward value may vary with the state and with the action selected. The method of calculating the expected reward value may also differ for each reinforcement learning method.

Such reinforcement learning may also include repeating a configuration in which, after the action A (940) is performed, the next state of the learning environment (910) is observed to update the state S (930), the action A (940) is updated based on the updated state S (930), and the reward R (950) is newly calculated accordingly.

As an example, reinforcement learning according to an embodiment may include a configuration that performs Q-learning, in which a Q-value is calculated based on a Q-function. Here, Q-learning may include a configuration that finds an optimal policy based on a Markov decision process.

The Q-function may include predicting the Q-value, a variable representing the expected reward that performing a given action will yield when that action is performed in a given state.

When Q-learning is performed based on the Q-function, the action that gives the highest Q-value in each state can be selected and performed. As an example, the Q-function may include a configuration that calculates the Q-value based on a preset Q-table.

For example, the learning agent (920) may select, in the state S (930), the action that can maximize the Q-value based on the Q-function. Here, when the action selected using the Q-function is the action A (940), the Q-value and the reward R (950) can be calculated based on the state S (930) and the action A (940) in the state S (930).

After the action A (940) is performed, the state of the learning environment (910) can be observed again to update the state S (930), and the action with the maximum Q-value in the updated state S (930) can be selected to update the action A (940). The Q-function can then be updated based on the updated values of the state S (930) and the action A (940).

That is, by repeatedly observing the state S (930), selecting and performing the action A (940), and calculating the reward R (950) based on the Q-function, an optimal policy that can maximize the cumulative reward can be learned.
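The observe, act, and update cycle described above can be sketched with a table-based learner. The text does not give an update equation, so the sketch uses the standard tabular Q-learning rule as an assumption; the learning rate `alpha`, discount factor `gamma`, and exploration rate `epsilon` are likewise assumptions, as are the class name and the state and action labels in the usage lines.

```python
import random
from collections import defaultdict


class QLearner:
    """Tabular Q-learning with epsilon-greedy action selection (illustrative)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate (assumption)
        self.gamma = gamma      # discount factor (assumption)
        self.epsilon = epsilon  # exploration probability (assumption)
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # Q-table: (state, action) -> value, default 0

    def best_action(self, state):
        # The action with the highest Q-value in this state.
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def select_action(self, state):
        # Mostly greedy, occasionally random so that all actions get tried.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.best_action(state)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning rule:
        # Q(S, A) <- Q(S, A) + alpha * (R + gamma * max_a Q(S', a) - Q(S, A))
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Hypothetical usage: two beam-width actions and hand-made state labels.
learner = QLearner(actions=["narrow", "wide"], epsilon=0.0)
learner.update("s0", "wide", reward=1.0, next_state="s1")
print(learner.best_action("s0"))
```

A table works only when the states and actions are few and discrete; once the state is built from continuous quantities such as received signals, the table is typically replaced by a function approximator.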

As another example, reinforcement learning according to an embodiment may include a configuration that performs deep reinforcement learning. Here, deep reinforcement learning may include using a deep neural network to carry out the learning process, which comprises observing the state, selecting the action that can maximize the expected reward value, and calculating the resulting reward.

For example, the learning agent (920) may use a deep neural network to select the action A (940) that can maximize the expected reward value in the observed state S (930) and to calculate the reward R (950) resulting from performing the action A (940).

As another example, reinforcement learning according to an embodiment may include a configuration performed using a DQN (Deep Q Network). Here, a DQN may include performing Q-learning, in which the Q-value is calculated using the Q-function, with the Q-function itself implemented by a deep neural network. As an example, the Q-function may include a configuration that calculates the Q-value using a deep neural network.

For example, in state S (930), the learning agent (920) may select an action that can maximize the Q value based on a Q function that uses a deep neural network. If the action selected in this way is action A (940), the Q value and reward R (950) may be calculated based on state S (930) and action A (940) taken in state S (930).

In addition, after action A (940) is performed, the state of the learning environment (910) may be observed again to update state S (930), and the action with the maximum Q value in the updated state S (930) may be selected to update action A (940). The deep neural network, and the Q function that uses it, may then be updated based on the updated values of state S (930) and action A (940).

That is, by repeatedly observing state S (930), selecting and performing action A (940), and calculating reward R (950) based on the Q function that uses a deep neural network, an optimal policy that maximizes the cumulative reward can be learned.

As described above, reinforcement learning according to an embodiment may include observing state S (930), selecting and performing action A (940) that can maximize the expected reward value, and calculating reward R (950). Such reinforcement learning may include at least one of a configuration using Q-learning, a configuration using a deep neural network, and a configuration using a DQN.

All of the above description of reinforcement learning also applies to the configuration that performs reinforcement-learning-based beam tracking and beam width control according to the present disclosure.

As an example, the reinforcement learning described with reference to FIG. 9 may be performed entirely in the learning unit of the base station described below. More specifically, state S (930) observed from the learning environment (910) may include state information calculated from a received signal from the terminal and from information on the number of antennas used for beamforming. Action A (940), selected based on state S (930), may include direction estimation information and beam width control information selected based on the state information. Reward R (950), calculated based on state S (930) and action A (940), may include learning reward information calculated based on the state information, the direction estimation information, and the beam width control information.

The configuration that performs beamforming control using reinforcement learning in this way is described in more detail below with reference to FIGS. 10 to 12.

FIG. 10 is a block diagram of a base station that performs wireless communication according to the present disclosure.

Referring to FIG. 10, the base station (1000) according to the present disclosure may include at least one of a learning unit (1010) and a control unit (1020). The learning unit (1010) and the control unit (1020) may be connected to each other.

As an example, the base station (1000) may include a learning unit (1010) and a control unit (1020). The learning unit (1010) performs learning in which it calculates state information based on a received signal from at least one terminal and on information on the number of antennas used for beamforming, calculates beam control information based on the state information and preset action information, and calculates learning reward information based on the state information and the beam control information. The control unit (1020) performs beamforming control to form a communication beam for the terminal based on the beam control information.

The beam control information may include direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam.

The learning unit (1010) may calculate state information based on the received signal from each terminal and on the antenna number information. Here, the received signal may include a training signal that the terminal transmits to the base station (1000).

As an example, received signals may continue to arrive over time. For example, a fixed number of received signals may arrive in each preset time interval, and different state information may be calculated from each received signal.

As an example, the learning unit (1010) may process the received signal in various ways to calculate state information. For example, state information may be calculated using at least one of the following: extracting part or all of the received signal, transforming part or all of the received signal in a defined manner, or deriving specific information from the information extracted or transformed from the received signal.

As an example, the state information may include beamforming vector information and channel vector information for the received signal. The beamforming vector information may include information, expressed in vector form, about the communication beam formed based on the received signal from the terminal. The channel vector information may include information, expressed in vector form, about the communication channel between the base station (1000) and the terminal.

As an example, the beamforming vector information may include, as one component, information about the angle of arrival calculated from the received signal from the terminal, or direction information in which the direction of the terminal is estimated from the angle of arrival. The beamforming vector information may further include, as additional components, information about the beam width of the communication beam and information about the beam gain of the communication beam.

As an example, the channel vector information may include information, expressed in vector form, about the channel characteristics of the communication link between the base station (1000) and the terminal. In some cases, the channel vector information may be calculated based on channel state information (CSI) for the signal that the base station (1000) receives from the terminal.

Meanwhile, the state information may include received signal strength information. Here, the received signal strength information may take any form that can express the strength of the received signal, such as RSSI (Received Signal Strength Indicator), RSRP (Reference Signal Received Power), or RSRQ (Reference Signal Received Quality).

As an example, the received signal strength information may be obtained by extracting information contained in the received signal. Alternatively, it may be obtained by computing the received signal strength from other information contained in the received signal. For example, a formula for the received signal strength may be defined using the beamforming vector information, the channel vector information, and the antenna number information, and the received signal strength information may be calculated from that formula.
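One possible form of such a formula is sketched below. The disclosure does not give the formula itself, so this sketch assumes a uniform linear array with half-wavelength spacing, a unit-norm beamforming vector w, and a noise-free received power of |h^H w|^2 for channel vector h. The function names are hypothetical.

```python
import cmath
import math

def steering_vector(n_antennas, angle_deg, spacing=0.5):
    """Unit-norm steering vector of an assumed uniform linear array.

    spacing is the element spacing in wavelengths.
    """
    phase = 2 * math.pi * spacing * math.sin(math.radians(angle_deg))
    scale = 1 / math.sqrt(n_antennas)
    return [scale * cmath.exp(1j * phase * k) for k in range(n_antennas)]

def received_power(beam_vec, channel_vec):
    """Assumed noise-free received power |h^H w|^2."""
    inner = sum(h.conjugate() * w for h, w in zip(channel_vec, beam_vec))
    return abs(inner) ** 2

# Hypothetical line-of-sight channel from a terminal at 20 degrees,
# with unit gain per antenna element.
n = 8
channel = [math.sqrt(n) * a for a in steering_vector(n, 20)]
aligned = received_power(steering_vector(n, 20), channel)
misaligned = received_power(steering_vector(n, 60), channel)
```

Under these assumptions the aligned beam collects a power equal to the number of antennas, while the beam pointed at 60 degrees collects far less, which is the kind of contrast a strength-based reward can exploit.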

The learning unit (1010) may calculate beam control information based on the state information and preset action information. In this case, the action information may include direction action information for performing a beam direction action and beam width action information for performing a beam width action. The beam control information may include direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam.

As an example, the action information may include preset information about the actions that the learning unit (1010) can take to calculate beam control information. For example, the action information may include direction action information for performing a beam direction action, which changes the direction of the communication beam by a preset direction interval in the state corresponding to the state information, and beam width action information for performing a beam width action, which changes the beam width of the communication beam according to a preset beam width gain. The learning unit (1010) may calculate the direction estimation information and the beam width control information based on this direction action information and beam width action information.

As an example, the direction action information may include information for performing a beam direction action that changes the direction of the communication beam by a preset direction interval. In this case, the direction action information may include information about at least one direction in which the base station can transmit and receive signals.

As an example, the direction action information may take the form of preset information about each direction in which the base station (1000) can transmit and receive signals. For example, for a base station that can transmit and receive signals over the full 360° azimuth, 36 items of direction action information may be preset at a unit interval of 10°. The azimuth range over which the base station (1000) can transmit and receive, the number of directions, and the unit interval may be set in many different ways.

As an example, the direction action information may be set, relative to the direction of a communication beam already formed by the base station (1000), as direction intervals of a fixed size applied to that direction. For example, with the direction interval set to 10°, the direction action information may include information about three beam direction actions relative to the direction of the existing communication beam: a -10° action, a +0° action, and a +10° action.
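The two forms of direction action information described above, a preset list of absolute directions and a set of offsets relative to the existing beam, can be sketched as follows. The function names are hypothetical, and wrapping the result into the range 0° to 360° is an assumption that the disclosure does not state.

```python
def absolute_directions(coverage_deg=360, step_deg=10):
    """Preset absolute directions, e.g. 36 directions at 10-degree intervals."""
    return list(range(0, coverage_deg, step_deg))

# Offsets relative to the existing beam for a 10-degree direction interval.
RELATIVE_OFFSETS = (-10, 0, 10)

def apply_direction_action(current_deg, offset_deg):
    """Apply a relative direction action; wrap-around at 360 degrees is assumed."""
    return (current_deg + offset_deg) % 360

directions = absolute_directions()
candidates = [apply_direction_action(0, off) for off in RELATIVE_OFFSETS]
```

Here `directions` holds 36 entries from 0° to 350°, and `candidates` lists the three directions reachable from a beam currently pointing at 0°.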

As an example, the beam width action information may include information for performing a beam width action that changes the beam width of the communication beam according to a preset beam width gain. In this case, the beam width action information may include information about at least one beam width that the base station (1000) can form. Depending on the case, the beam width may be expressed in terms of the angular size of the beam or in terms of the number of antennas, among the array antennas, that are used for beamforming.

As an example, the beam width action information may take the form of preset information about each beam width that the base station (1000) can form. For example, if the base station is configured to form a beam with one of the beam widths 15°, 30°, and 60°, the beam width action information may include information about a beam width action corresponding to 15°, a beam width action corresponding to 30°, and a beam width action corresponding to 60°.

As an example, the beam width action information may be set based on a beam width gain, which is multiplied by the beam width of a communication beam already formed by the base station (1000) to obtain a new beam width. For example, with the beam width gain set to 2, the beam width action information may include information about three beam width actions applied to the beam width of the existing communication beam: a ×1/2 action, a ×1 action, and a ×2 action.

As a more specific example, if the beam width is expressed as the angular size of the beam and the existing beam width is 30°, beam width action information may be calculated for a beam width of 15° under the ×1/2 action, 30° under the ×1 action, and 60° under the ×2 action.

As another specific example, if the beam width is expressed in terms of the number of antennas and the number of antennas corresponding to the existing beam width is N, beam width action information may be calculated for the beam width corresponding to 0.5N under the ×1/2 action, N under the ×1 action, and 2N under the ×2 action.
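The arithmetic of the two examples above can be checked with a short sketch. The helper names are hypothetical, and the antenna count N = 8 is chosen only for illustration.

```python
def beam_width_multipliers(gain=2):
    """Multipliers for the x1/gain, x1, and x gain beam width actions."""
    return (1 / gain, 1.0, float(gain))

def apply_beam_width_action(current, multiplier):
    """Scale the current beam width (an angle or an antenna count)."""
    return current * multiplier

# Beam width expressed as an angle: existing width of 30 degrees.
angle_widths = [apply_beam_width_action(30, m) for m in beam_width_multipliers()]

# Beam width expressed as an antenna count: hypothetical N = 8.
antenna_counts = [apply_beam_width_action(8, m) for m in beam_width_multipliers()]
```

With a gain of 2, `angle_widths` is 15, 30, and 60 degrees, and `antenna_counts` is 4, 8, and 16, matching the 0.5N, N, and 2N pattern.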

The learning unit (1010) may calculate direction estimation information based on the state information and the action information. For example, the learning unit (1010) may calculate direction estimation information based on the beamforming vector information and channel vector information in the state information and the beam direction action information in the action information.

As an example, the direction estimation information may be calculated based on channel state information included in the state information. For example, it may be calculated by extracting, from the channel state information, information about the channel over which the received signal arrived, and comparing the extracted information with the information for each direction.

As an example, the learning unit (1010) may calculate direction estimation information based on the state information. For example, the learning unit (1010) may calculate the direction estimation information based on the beamforming vector information and the channel vector information.

As an example, the learning unit (1010) may calculate direction estimation information by estimating the direction of the terminal from, among other things, the angle-of-arrival information included in the beamforming vector information and the information about the communication channel with the terminal included in the channel vector information.

As an example, the learning unit (1010) may use the beamforming vector information and the channel vector information to compute the magnitude of the received signal excluding the noise component, and may calculate the direction action information and the direction estimation information based on the direction in which the computed magnitude is largest.

As an example, if the state information includes received signal strength information, the learning unit (1010) may directly select the direction with the largest received signal strength and calculate the direction action information and the direction estimation information based on the selected direction.

As an example, the learning unit (1010) may calculate direction estimation information based on the action information. For example, the learning unit (1010) may calculate the direction estimation information based on the direction action information.

As an example, the learning unit (1010) may select one of the direction actions included in the direction action information and calculate direction estimation information by taking the direction determined by the selected direction action as the direction in which the communication beam should be formed.

Here, the direction action information may include information about at least one direction in which the base station (1000) can transmit and receive signals. The direction action information may take the form of preset information about each direction in which the base station (1000) can transmit and receive signals, or it may be set as direction intervals of a fixed size that can be applied to the direction of an existing communication beam.

The learning unit (1010) may calculate beam width control information based on the state information and the action information. For example, the learning unit (1010) may calculate beam width control information based on the antenna number information in the state information and the beam width action information in the action information.

In some cases, in addition to the antenna number information and the beam width action information, the learning unit (1010) may further use at least one of the beamforming vector information, the channel vector information, and the beam direction action information to calculate beam width control information. Calculating the beam width control information in this way can improve the efficiency of beamforming control.

As an example, the learning unit (1010) may calculate beam width control information based on the antenna number information. For example, the learning unit (1010) may calculate the beam width of the beam formed in a first state based on the antenna number information, determine the beam width of the communication beam to be formed in a second state based on the state information in the second state and the beam width action information, and calculate beam width control information according to the determined beam width.

In some cases, the learning unit (1010) may calculate direction estimation information and beam width control information based on (direction action, beam width action) ordered pairs. For example, the learning unit (1010) may perform the calculation of direction estimation information from the direction action information and the calculation of beam width control information from the beam width action information either separately or jointly.

As a specific example, the learning unit (1010) may consider the direction action information and the beam width action information together to generate (direction action, beam width action) ordered pairs, predict the beam direction and beam width that would result from forming a beam for each ordered pair, and calculate direction estimation information and beam width control information according to the prediction results.

As an example, the learning unit (1010) may calculate an expected reward value for each (direction action, beam width action) ordered pair based on the state information, the direction action information, and the beam width action information, and may calculate direction estimation information and beam width control information based on the action information of the ordered pair judged to have the largest expected reward value.
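Selecting the ordered pair with the largest expected reward value can be sketched as a search over the Cartesian product of the two action sets. The expected-reward function below is a hypothetical toy stand-in; in the disclosure this value would come from the Q function or the deep neural network.

```python
from itertools import product

DIRECTION_OFFSETS = (-10, 0, 10)     # degrees, relative to the existing beam
WIDTH_MULTIPLIERS = (0.5, 1.0, 2.0)  # beam width gain of 2

def best_action_pair(expected_reward, direction_actions, width_actions):
    """Return the (direction action, beam width action) pair with the largest score."""
    return max(product(direction_actions, width_actions), key=expected_reward)

def toy_expected_reward(pair):
    """Hypothetical score that peaks at a +10 degree shift with a halved beam width."""
    direction_offset, width_multiplier = pair
    return -abs(direction_offset - 10) - abs(width_multiplier - 0.5)

best = best_action_pair(toy_expected_reward, DIRECTION_OFFSETS, WIDTH_MULTIPLIERS)
```

With three direction actions and three beam width actions, nine pairs are scored and `best` is the pair `(10, 0.5)`. Scoring pairs jointly lets the direction and width choices interact, at the cost of an action set that grows as the product of the two.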

The learning unit (1010) may calculate learning reward information based on the state information and the beam control information.

As an example, the learning unit (1010) may calculate learning reward information based on the state information and the direction estimation information. In some cases, the learning unit (1010) may calculate direction estimation information based on the state information and then calculate learning reward information using the calculated direction estimation information. In this case, the beamforming vector information and the channel vector information in the state information may be used to calculate both the direction estimation information and the learning reward information.

As an example, the learning unit (1010) may calculate learning reward information based on the state information and the beam width control information. In some cases, the learning unit (1010) may calculate beam width control information based on the state information and then calculate learning reward information using the calculated beam width control information. In this case, the antenna number information in the state information may be used to calculate both the beam width control information and the learning reward information.

As an example, the learning reward information may be calculated based on the received signal strength information. For example, the learning unit (1010) may calculate the learning reward information so that reinforcement learning proceeds in the direction of increasing the received signal strength.

For example, the learning unit (1010) may set the learning reward information to 2 when the received signal strength information is greater than or equal to a preset upper threshold, to 1 when it is below the upper threshold and greater than or equal to a preset lower threshold, and to 0 when it is below the lower threshold.
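The three-level reward described above maps directly onto a pair of threshold comparisons. In the sketch below, the function name and the threshold values in dBm are hypothetical; the disclosure only states that the thresholds are preset.

```python
def learning_reward(signal_strength, lower_threshold, upper_threshold):
    """Three-level reward: 2 at or above the upper threshold, 1 at or above
    the lower threshold, and 0 otherwise."""
    if signal_strength >= upper_threshold:
        return 2
    if signal_strength >= lower_threshold:
        return 1
    return 0

# Hypothetical thresholds in dBm, for illustration only.
LOWER_DBM, UPPER_DBM = -90.0, -70.0
rewards = [learning_reward(s, LOWER_DBM, UPPER_DBM) for s in (-60.0, -80.0, -100.0)]
```

For strengths of -60, -80, and -100 dBm, `rewards` is 2, 1, and 0. A value exactly equal to a threshold falls in the higher band, matching the "greater than or equal to" wording.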

When the learning reward information is calculated in this way, reinforcement learning proceeds in the direction of making the received signal progressively stronger. As a result, the speed and accuracy of beam tracking based on reinforcement learning can be improved, and the overhead of beamforming can be reduced.

The learning reward information may also be calculated differently depending on the specific reinforcement learning method used. For example, the learning reward information may be calculated in different ways for Q-learning-based reinforcement learning, deep reinforcement learning, and deep reinforcement learning using a DQN.

As an example, when reinforcement learning is performed using Q-learning, the learning unit (1010) may, based on the Q function, select from the at least one item of direction information selectable in given state information the direction action information and beam width action information that can maximize the Q value, and thereby calculate direction estimation information and beam width control information. The learning unit (1010) may then, based on the Q function, calculate learning reward information using the state information and the direction estimation information and beam width control information calculated from it.

As another example, when deep reinforcement learning is performed, the learning unit (1010) may use a deep neural network to select, from the at least one item of direction information selectable in given state information, the direction action information and beam width action information that can maximize the expected reward value, and thereby calculate direction estimation information and beam width control information. The learning unit (1010) may then, based on the deep neural network, calculate learning reward information using the state information and the direction estimation information and beam width control information calculated from it.

As another example, when deep reinforcement learning is performed using a DQN, the learning unit (1010) may, based on a Q function that uses a deep neural network, select from the at least one item of direction information selectable in given state information the direction action information and beam width action information that can maximize the Q value, and thereby calculate direction estimation information and beam width control information. The learning unit (1010) may then, based on the Q function that uses the deep neural network, calculate learning reward information using the state information and the direction estimation information and beam width control information calculated from it.

Meanwhile, the learning unit 1010 may include first state information about a first state of the terminal and second state information about a second state of the terminal, and may include a configuration that performs learning based on the first state information, observes the second state based on the learning result, and thereby continues learning.

Described from a reinforcement learning perspective, the process of observing the second state and performing reinforcement learning in the second state, based on the result of the reinforcement learning performed in the first state, may be repeated.

Specifically, among the actions possible in the observed first state, the action with the maximum expected reward value may be selected, and the selected action may be performed.

Then, a reward may be calculated based on the first state and the action performed in the first state. In addition, the second state may be observed based on the first state and the action performed in the first state.

Thereafter, a new action may be selected and performed in the second state, and a new reward may be calculated based on the action performed in the second state.

As an example, the state information may include first state information about a first state of the terminal and second state information about a second state of the terminal. In this case, the first state information may be calculated based on a first received signal, and the second state information may be calculated based on a second received signal.

Based on the first state information, the learning unit 1010 may select, from among one or more pieces each of direction action information and beam width action information, the direction action information and beam width action information with the maximum expected reward value, and may calculate the direction estimation information and the beam width control information based on the selected direction action information and beam width action information. Beam tracking and beam width control for the first state of the terminal may then be performed using them.

Then, the learning unit 1010 may calculate learning reward information based on the first state information and the direction estimation information and beam width control information calculated in the first state. In addition, second state information about the second state of the terminal may be calculated based on the first state information and the direction estimation information and beam width control information calculated in the first state.

Thereafter, when the second state information has been calculated, the learning unit 1010 may update the direction estimation information and the beam width control information based on the second state information, and may repeatedly perform the process of updating the learning reward information based on the second state information and the updated direction estimation information.

As described with reference to FIG. 9, this reinforcement learning may be performed by selecting at least one of Q-learning, deep reinforcement learning, and DQN.

The control unit 1020 may control the base station 1000 so that a communication beam having a given beam direction and beam width is formed. For example, it may perform beam forming control that forms a communication beam for the terminal based on the beam control information calculated by the learning unit 1010.

In this case, any known beam forming technique may be used for the beam forming control performed by the control unit 1020.

As an example, the control unit 1020 may perform beam forming control so that the communication beam is formed with a beam direction determined based on the direction estimation information and a beam width determined based on the beam width control information.

FIG. 11 is a diagram for describing, by way of example, a configuration for calculating beam control information according to an embodiment.

Referring to FIG. 11, the base station 1000 according to an embodiment may calculate beam control information 1130 based on state information 1100, direction action information 1110, and beam width action information 1120.

As an example, the state information 1100 may include beam forming vector information and channel vector information included in the received signal from the terminal, and antenna number information about the number of antennas used for beam forming.

As an example, the direction action information 1110 may include information on a first direction action, a second direction action, and a third direction action. These are the direction actions that can be taken with respect to the direction of the communication beam formed according to the present disclosure, given the state of the received signal and of the beam as determined from the state information.

As an example, the beam width action information 1120 may include information on a first beam width action, a second beam width action, and a third beam width action. These are the beam width actions that can be taken with respect to the beam width of the communication beam formed according to the present disclosure, given the state of the received signal and of the beam as determined from the state information.

As an example, the beam control information 1130 may include direction estimation information for estimating the direction in which the terminal is located and beam width control information for controlling the beam width of the communication beam.

As an example, calculating the beam control information 1130 may include calculating the direction estimation information and calculating the beam width control information. In this case, the direction estimation information may be calculated based on the state information and the direction action information, and the beam width control information may be calculated based on the state information and the beam width action information.

As an example, the base station 1000 may determine the state of the received signal and of the already formed beam based on the beam forming vector information and channel vector information included in the state information 1100. Then, based on the determined state, it may calculate the direction estimation information from the result of selecting, among the first, second, and third direction actions included in the direction action information 1110, the direction action with the largest expected reward value.

For example, when expected reward values are calculated for the direction actions based on the direction action information 1110, the expected reward value of the first direction action may be calculated as 3.0, that of the second direction action as 4.0, and that of the third direction action as 5.0. In this case, because the third direction action has the largest of the three expected reward values, 5.0, the information on the third direction action may be selected from the direction action information 1110.

As an example, the base station 1000 may determine the state of the received signal and of the already formed beam based on the antenna number information included in the state information 1100. Then, based on the determined state, it may calculate the beam width control information from the result of selecting, among the first, second, and third beam width actions included in the beam width action information 1120, the beam width action with the largest expected reward value.

For example, when expected reward values are calculated for the beam width actions based on the beam width action information 1120, the expected reward value of the first beam width action may be calculated as 1.5, that of the second beam width action as 3.0, and that of the third beam width action as 6.0. In this case, because the third beam width action has the largest of the three expected reward values, 6.0, the information on the third beam width action may be selected from the beam width action information 1120.
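The two independent selections in the numerical examples above can be reproduced with a short sketch. The dictionary keys and the helper name `select_best` are hypothetical labels introduced only for illustration; the expected reward values are those given in the examples.

```python
# Expected reward values from the direction action example.
direction_rewards = {
    "first direction action": 3.0,
    "second direction action": 4.0,
    "third direction action": 5.0,
}

# Expected reward values from the beam width action example.
beam_width_rewards = {
    "first beam width action": 1.5,
    "second beam width action": 3.0,
    "third beam width action": 6.0,
}


def select_best(expected_rewards):
    """Return the action whose expected reward value is largest."""
    return max(expected_rewards, key=expected_rewards.get)


selected_direction = select_best(direction_rewards)
selected_beam_width = select_best(beam_width_rewards)
print(selected_direction, "/", selected_beam_width)
```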

As another example, the base station 1000 may calculate an expected reward value for each (direction action, beam width action) ordered pair based on the state information 1100, the direction action information 1110, and the beam width action information 1120, and may calculate the direction estimation information and the beam width control information based on the action information corresponding to the ordered pair determined to have the largest expected reward value.

For example, when expected reward values are calculated based on the direction action information 1110 and the beam width action information 1120, nine actions may be considered as (direction action, beam width action) ordered pairs: the (first direction, first beam width) action, (first direction, second beam width) action, (first direction, third beam width) action, (second direction, first beam width) action, (second direction, second beam width) action, (second direction, third beam width) action, (third direction, first beam width) action, (third direction, second beam width) action, and (third direction, third beam width) action.

Suppose the expected reward values are calculated as 6.5 for the (first direction, first beam width) action, 8.0 for the (first direction, second beam width) action, 11.0 for the (first direction, third beam width) action, 5.5 for the (second direction, first beam width) action, 7.0 for the (second direction, second beam width) action, 10.0 for the (second direction, third beam width) action, 4.5 for the (third direction, first beam width) action, 6.0 for the (third direction, second beam width) action, and 9.0 for the (third direction, third beam width) action. In this case, because the (first direction, third beam width) action has the largest of the nine expected reward values, 11.0, the information on the (first direction, third beam width) action may be selected from the direction action information 1110 and the beam width action information 1120.
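The joint selection over the nine ordered pairs can be sketched in the same way. The integer encoding of the pairs, with 1, 2, and 3 standing for the first, second, and third action, is an assumption made for illustration; the expected reward values are those of the example above.

```python
# Keys are (direction action, beam width action) pairs; values are the
# expected reward values listed in the example.
joint_rewards = {
    (1, 1): 6.5, (1, 2): 8.0, (1, 3): 11.0,
    (2, 1): 5.5, (2, 2): 7.0, (2, 3): 10.0,
    (3, 1): 4.5, (3, 2): 6.0, (3, 3): 9.0,
}

# Select the ordered pair with the largest expected reward value.
best_pair = max(joint_rewards, key=joint_rewards.get)
best_direction, best_beam_width = best_pair
print(best_pair, joint_rewards[best_pair])
```

Scoring the pair as a unit lets the selected direction depend on the selected beam width, which separate per-axis selection cannot express.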

FIG. 12 is a diagram for describing, by way of example, reinforcement-learning-based beam forming control according to an embodiment.

Referring to FIG. 12, reinforcement-learning-based communication beam tracking according to an embodiment may be performed based on each state of the terminal over time. In this case, the states of the base station may include a base station first state 1210, a base station second state 1220, a base station third state 1230, a base station fourth state 1240, and a base station fifth state 1250, and the states of the terminal may include a terminal first state 1260 and a terminal second state 1270.

In the description below, the first signal may denote the second received signal of the (k-1)-th time slot, the second signal the first received signal of the k-th time slot, and the third signal the second received signal of the k-th time slot.

The base station first state 1210 may represent a state in which beam forming has been performed based on the terminal first state 1260. In this case, the first signal may be transmitted in the terminal first state 1260, and the first signal transmitted from the terminal may be received by the base station.

The base station second state 1220 may represent a state in which the terminal has moved, changing from the terminal first state 1260 to the terminal second state 1270. This may include the case in which the second signal transmitted in the terminal second state 1270 has been received by the base station but beam forming based on the second signal has not yet been performed.

The base station third state 1230 may represent a state in which the second signal transmitted in the terminal second state 1270 has been received by the base station and reinforcement-learning-based beam tracking has been performed based on the second signal. Based on the beam tracking result, beam forming control may be performed toward the direction corresponding to the terminal second state 1270.

In this case, the base station may calculate the first state information based on the second signal, and may calculate the direction estimation information by selecting, from among the one or more pieces of direction action information selectable based on the first state information, the direction action information with the maximum expected reward value. Beam forming may then be performed based on the calculated direction estimation information.

For example, consider a situation in which, relative to the beam forming direction in the terminal first state 1260 and using a preset direction interval Δθ, the direction action information sets the first direction action to (-Δθ), the second direction action to (+0), and the third direction action to (+Δθ).

In this case, if the expected reward value of the first direction 1232 according to the first direction action information is calculated as 3.0, that of the second direction 1234 according to the second direction action information as 4.0, and that of the third direction 1236 according to the third direction action information as 5.0, the base station may calculate, as the direction estimation information, the third direction 1236 that results from performing the +Δθ action according to the third direction action information with the maximum expected reward value, and may perform beam forming control so that the communication beam is formed in the third direction.

The base station fourth state 1240 may represent a state in which the second signal transmitted in the terminal second state 1270 has been received by the base station and reinforcement-learning-based beam width control has been performed based on the second signal. Beam forming control may be performed based on the result of this beam width control.

In this case, the base station may calculate the first state information based on the second signal, and may calculate the beam width control information by selecting, from among the one or more pieces of beam width action information selectable based on the first state information, the beam width action information with the maximum expected reward value. Beam forming control may then be performed based on the calculated beam width control information.

For example, consider a situation in which, relative to the beam width of the beam formed in the terminal first state 1260 and using a preset beam width gain, the beam width action information sets the first beam width action to (*), the second beam width action to (*0), and the third beam width action to (*).

In this case, if the expected reward value of the first beam width 1242 according to the first beam width action information is calculated as 1.5, that of the second beam width 1244 according to the second beam width action information as 3.0, and that of the third beam width 1246 according to the third beam width action information as 6.0, the base station may calculate, as the beam width control information, the third beam width 1246 that results from performing the beam width action given by the third beam width action information with the maximum expected reward value, and may perform beam forming control so that the communication beam is formed with the third beam width.

In some cases, the direction action in the base station third state 1230, the beam width action in the base station fourth state 1240, and the calculation of the corresponding expected reward values may be performed together. In this case, each related piece of information may be calculated on the basis of (direction action, beam width action) ordered pairs.

The base station fifth state 1250 may represent a state in which beam forming has been performed based on the terminal second state 1270. In this case, the third signal may be transmitted from the terminal, and the base station may receive it and continue to perform beam forming control.
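The cycle walked through above (receive a signal, select a direction action, steer the beam, obtain a reward, then continue with the next signal) can be sketched as a loop. This is a toy sketch only: the value of `DELTA_THETA`, the terminal path, and the hand-written `expected_reward` function, which stands in for a learned expected reward value, are all assumptions and do not come from the present disclosure.

```python
DELTA_THETA = 5.0                                   # assumed direction interval, degrees
DIRECTION_ACTIONS = (-DELTA_THETA, 0.0, +DELTA_THETA)


def expected_reward(beam_deg, terminal_deg, action):
    """Stand-in for a learned value: beams closer to the terminal score higher."""
    return -abs((beam_deg + action) - terminal_deg)


def track(beam_deg, terminal_path):
    """Run one select/act/reward cycle per observed terminal direction."""
    history = []
    for terminal_deg in terminal_path:              # observe the next state
        action = max(
            DIRECTION_ACTIONS,
            key=lambda a: expected_reward(beam_deg, terminal_deg, a),
        )
        beam_deg += action                          # perform the selected action
        reward = -abs(beam_deg - terminal_deg)      # reward for the resulting beam
        history.append((beam_deg, reward))
    return history


# Toy terminal movement: one direction per received signal.
history = track(0.0, [5.0, 10.0, 10.0, 5.0])
beam_directions = [beam for beam, _ in history]
```

In this toy run the terminal never moves by more than one interval between signals, so the beam can follow it exactly; a trained agent would replace `expected_reward` with values learned from the reward.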

Below, the base station according to the present disclosure is briefly described once more from a method perspective. Content that overlaps with the description above is omitted where appropriate, but all of it applies equally to the method described below.

FIG. 13 is a flowchart for describing a method by which a base station according to the present disclosure performs wireless communication.

Referring to FIG. 13, the wireless communication method according to the present disclosure may include a reinforcement learning step (S1310) of calculating state information based on a received signal from at least one terminal and antenna number information used for beam forming, calculating beam control information based on the state information and preset action information, and calculating learning reward information based on the state information and the beam control information, and a beam forming control step (S1320) of controlling a communication beam for the terminal to be formed based on the beam control information.

As an example, the reinforcement learning step (S1310) may include performing beam tracking and beam width control using Q-learning-based reinforcement learning, which calculates a Q value based on a Q function. As another example, beam tracking and beam width control may be performed using deep reinforcement learning with a deep neural network, or using DQN-based deep reinforcement learning, in which the Q value of the Q function is calculated by a deep neural network.

The beam forming control step (S1320) may include controlling the communication beam to be formed by having the base station estimate the direction of the terminal and determine the beam width. Specifically, it may include controlling the communication beam to be formed with a beam direction determined based on the direction estimation information calculated in the reinforcement learning step (S1310) and a beam width determined based on the beam width control information.

FIG. 14 is a flowchart for describing the reinforcement learning step according to an embodiment.

Referring to FIG. 14, the reinforcement learning step (S1310) according to an embodiment may include a state information calculation step (S1410), a beam control information calculation step (S1420), and a learning reward information calculation step (S1430).

The state information calculation step (S1410) may include calculating state information based on at least one received signal and antenna number information used for beam forming. In this case, the received signal may include beam forming vector information, channel vector information, channel state information (CSI), received signal strength information, and the like.

The beam control information calculation step (S1420) may include calculating, based on the state information and preset action information, direction estimation information about the direction in which the terminal is located and beam width control information about the beam width of the communication beam.

As an example, the beam control information calculation step (S1420) may include calculating the direction estimation information based on direction action information for performing a beam direction action that changes the direction of the communication beam by a preset direction interval.

Specifically, the beam control information calculation step (S1420) may include calculating the direction estimation information using the beam forming vector information and the channel vector information. In this case, it may include using the beam forming vector information and the channel vector information to select the direction action information for the direction with the largest received signal magnitude, and calculating the direction estimation information based on the selected direction action information.
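One way to picture selecting the direction with the largest received signal magnitude is to compare the magnitude of the inner product between a candidate beam forming vector and the channel vector. The sketch below assumes a uniform linear array with half-wavelength spacing and a single line-of-sight path; these modelling choices, and every function name, are assumptions for illustration and are not specified in the present disclosure.

```python
import cmath
import math


def steering_vector(theta_deg, n_antennas, spacing=0.5):
    """Steering vector of a uniform linear array; spacing is in wavelengths."""
    phase = 2.0 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * phase * n) for n in range(n_antennas)]


def received_magnitude(beam_vector, channel_vector):
    """Magnitude of the inner product w^H h, normalized by the antenna count."""
    inner = sum(w.conjugate() * h for w, h in zip(beam_vector, channel_vector))
    return abs(inner) / len(beam_vector)


def select_direction(current_deg, delta_deg, channel_vector):
    """Among -delta, +0, +delta, pick the direction with the largest magnitude."""
    n = len(channel_vector)
    candidates = [current_deg - delta_deg, current_deg, current_deg + delta_deg]
    return max(
        candidates,
        key=lambda t: received_magnitude(steering_vector(t, n), channel_vector),
    )


channel = steering_vector(20.0, 8)            # toy channel: terminal at 20 degrees
best = select_direction(15.0, 5.0, channel)   # candidates: 10, 15, 20 degrees
```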

As an example, the beam control information calculation step (S1420) may include calculating the beam width control information based on beam width action information for performing a beam width action that determines the beam width of the communication beam according to a preset beam width gain.

Specifically, the beam control information calculation step (S1420) may include calculating the beam width control information using the antenna number information. In this case, it may include using the antenna number information to select the beam width action information for the beam width with the largest received signal magnitude, and calculating the beam width control information based on the selected beam width action information.

As an example, the beam control information calculation step (S1420) may include calculating the beam control information based on the received signal strength information. Specifically, it may include comparing the received signal strength information included in the state information, selecting the beam direction action information and beam width action information for the beam direction and beam width with the largest received signal strength, and calculating the direction estimation information and the beam width control information based on the selected beam direction action information and beam width action information.

The learning reward information calculation step (S1430) may include calculating learning reward information using the state information and the beam control information.

For example, the learning reward information calculation step (S1430) may include calculating the learning reward information based on received signal strength information. Specifically, the learning reward information may be calculated as 2 when the received signal strength information is greater than or equal to a preset upper threshold, as 1 when the received signal strength information is less than the upper threshold and greater than or equal to a preset lower threshold, and as 0 when the received signal strength information is less than the lower threshold.

In this case, the learning reward information takes only one of three values (2, 1, or 0), so the overhead of beam forming control based on reinforcement learning using the learning reward information can be reduced.
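The three-level reward rule can be written as a short function. This is a minimal sketch: the function name `learning_reward`, the parameter names, and the numeric thresholds in the usage lines are assumptions, since the text only says that the thresholds are preset.

```python
def learning_reward(rss, upper, lower):
    """Map a received signal strength to a reward of 2, 1, or 0.

    upper and lower are the preset thresholds; lower <= upper is assumed.
    """
    if lower > upper:
        raise ValueError("lower threshold must not exceed upper threshold")
    if rss >= upper:
        return 2
    if rss >= lower:
        return 1
    return 0

# Hypothetical thresholds in arbitrary units.
print([learning_reward(x, upper=0.8, lower=0.3) for x in (0.9, 0.5, 0.1)])
```

Values exactly on a threshold fall into the higher reward level, matching the "greater than or equal to" wording of the rule.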

Alternatively, depending on the case, the learning reward information calculation step (S1430) may use the received signal strength information calculated from the state information directly as the learning reward information.

In this case, reward assignment and updating of the learning network under reinforcement learning can be performed at a finer granularity, so the accuracy of beam forming control based on reinforcement learning using the learning reward information can be improved.

Below, a configuration that performs adaptive beam width control and beam tracking based on deep reinforcement learning according to another embodiment is described by way of example. This configuration may operate in a millimeter-wave V2X communication environment and may be used in various V2X scenarios such as vehicle-to-vehicle (V2V) communication and vehicle-to-infrastructure (V2I) communication. The base station may include a component that performs digital beam forming.

Here, the base station may include N_b uniform linear antenna arrays, and the antenna arrays may be arranged parallel to each other. The millimeter-wave communication channel may include a channel gain α, an angle of arrival θ, and a beam forming vector ω. It is also assumed that initial access and connection for wireless communication through the base station have already been established.

The deep reinforcement learning-based adaptive beam width control and beam tracking algorithm according to this embodiment may consist of a three-step process.

First, the moving mobile transmits a pilot symbol b_{k,l} (a signal whose content is known to both sides) to the base station so that the state can be measured.

Second, the base station takes a function of the received signal measured in real time, together with the number of antennas, as the state, and uses deep reinforcement learning to perform beam tracking and beam width control actions for the mobile.

Third, the base station compares the received signal strength of the training signal in real time and receives a reward value for deep reinforcement learning. The deep reinforcement learning network is updated so that the action maximizes the expected reward, training it to improve subsequent communication performance.
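The idea of choosing the action with the largest expected reward and then updating the learner from the observed reward can be sketched with tabular Q-learning. This is a deliberately simplified stand-in for the deep network described in the embodiment, not the disclosed method. The state keys, learning rate, discount factor, and the names `ACTIONS`, `best_action`, and `q_update` are all assumptions made for illustration.

```python
from itertools import product

DIRECTION_ACTIONS = (-1, 0, 1)   # decrease / keep / increase the angle estimate
WIDTH_ACTIONS = (-1, 0, 1)       # decrease / keep / increase the beam width
ACTIONS = list(product(DIRECTION_ACTIONS, WIDTH_ACTIONS))  # 9 ordered pairs

def _values(q_table, state):
    """Expected-reward table for a state, created with zeros on first use."""
    return q_table.setdefault(state, {a: 0.0 for a in ACTIONS})

def best_action(q_table, state):
    """Greedy choice: the ordered pair with the largest expected reward."""
    values = _values(q_table, state)
    return max(ACTIONS, key=lambda a: values[a])

def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.9):
    """One temporal-difference update toward reward + discounted best next value."""
    values = _values(q_table, state)
    target = reward + gamma * max(_values(q_table, next_state).values())
    values[action] += lr * (target - values[action])

# Hypothetical single step: action (1, 0) in state "s0" earned reward 2.
q = {}
q_update(q, "s0", (1, 0), reward=2, next_state="s1")
print(best_action(q, "s0"), q["s0"][(1, 0)])
```

A deep network would replace the lookup table so that continuous states built from received signals can be handled without discretization.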

Each step is described in detail below.

First, the channel model for the training signal that the moving mobile sends to the base station is assumed to have high path gain and moderate mobility. In this case, an expression such as Equation 1 below may be applied, by way of example, to the l-th received signal z_{k,l} of the k-th time slot.

Here, w_{k,l} is the receive beam forming vector of the base station, h_k is the channel vector between the base station and the mobile, b_{k,l} is the pilot symbol transmitted by the mobile, and n_{k,l} is noise.
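One form of the received signal that is consistent with these symbol definitions is the standard narrowband beamformed model shown below. The use of the conjugate transpose on the beam forming vector is an assumption based on common convention, not something stated in the text.

```latex
\[
  z_{k,l} = \mathbf{w}_{k,l}^{H}\,\mathbf{h}_{k}\,b_{k,l} + n_{k,l}
\]
```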

In this case, the channel gain (α) may be expressed in terms of path loss and small-scale fading.

Second, the base station estimates the direction (channel) of the mobile in real time using deep reinforcement learning-based beam width control and beam tracking. Deep reinforcement learning consists of a three-step process of state measurement, action, and reward, and the state reflects the angle change caused by the mobility of the mobile and the beam width of the base station. The state s_k may be expressed as a function of the received signals of the current and previous time slots together with the number of antennas, and expressions such as Equations 2 and 3 below may be applied by way of example.

Here, the state s_k may include Real(s_k), the real part of s_k, Imag(s_k), the imaginary part of s_k, and N_k, which relates to the number of antennas. s_k may be generated based on the first received signal z_{k,1} of the k-th time slot and the second received signal z_{k-1,2} of the (k-1)-th time slot. In some cases, s_k may also be expressed in terms of the received signal z_k of the k-th time slot and the received signal z_{k-1} of the (k-1)-th time slot.

L is the total number of symbols in one time slot, and the reward may be calculated after the action is performed at the L^*-th symbol. N_k may be set to the number of array antennas of the base station in the k-th time slot.
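The state layout described above (real part, imaginary part, antenna count) can be sketched as follows. How the two received signals are combined into one complex quantity is not recoverable from the text, so the default rule here (current sample times the conjugate of the previous sample) is purely an assumption, as are the names `build_state` and `combine`.

```python
def build_state(z_curr, z_prev, n_antennas, combine=None):
    """Build a 3-element state: (real part, imaginary part, antenna count).

    z_curr: first received signal of the current time slot (complex).
    z_prev: second received signal of the previous time slot (complex).
    combine: hypothetical rule merging both signals into one complex value.
    """
    if combine is None:
        # Assumed default: the phase of this product tracks the change
        # between the two slots. The original function is not given.
        combine = lambda a, b: a * b.conjugate()
    s = combine(z_curr, z_prev)
    return (s.real, s.imag, float(n_antennas))

# Hypothetical received samples and an 8-antenna configuration.
print(build_state(1 + 2j, 3 - 1j, 8))
```

Splitting the complex value into real and imaginary parts gives the learner a real-valued input vector, which is what the listed state components suggest.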

The action a_{k,m} of deep reinforcement learning, which takes the measured state information as input, may belong to an action space. For this action space, an expression such as Equation 4 below may be applied by way of example.

This action space may include a space for the beam tracking action and a space for the beam width control action. For the beam tracking space and the beam width control space, expressions such as Equations 5 and 6 below, respectively, may be applied by way of example.

Here, each of the two action spaces has its own action step size. Depending on the case, the step size for beam tracking may be set as the direction interval and used to perform the beam direction action that changes the direction of the communication beam, and the step size for beam width control may be set as the beam width gain and used to perform the beam width action that changes the beam width of the communication beam. The actions a_{k,1} and a_{k,2} may then behave as illustrated in Equations 7 and 8 below.

Here, these equations use the predicted value of the angle of arrival (θ) in the k-th time slot, and N_k may denote the number of antennas used for beam width control in the k-th time slot.

According to Equations 5 and 7, when the action a_{k,1} is performed in the k-th time slot, the estimated angle of arrival may be updated to one of three states relative to the angle of arrival in the (k-1)-th time slot: decreased by the angle step (negative step), unchanged (0), or increased by the angle step (positive step).
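The three-way direction update can be sketched as below. The encoding of the action as -1, 0, or +1, the use of degrees, and the function name `update_direction` are assumptions for illustration.

```python
def update_direction(prev_angle, action, step):
    """Return the new angle-of-arrival estimate.

    action: -1 (decrease), 0 (keep) or +1 (increase); an assumed encoding.
    step: the preset direction interval, in the same unit as prev_angle.
    """
    if action not in (-1, 0, 1):
        raise ValueError("action must be -1, 0 or 1")
    return prev_angle + action * step

# Hypothetical values: previous estimate 30 degrees, step 2 degrees.
print([update_direction(30.0, a, 2.0) for a in (-1, 0, 1)])
```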

According to Equations 6 and 8, the beam width in the k-th time slot may be updated to one of three states relative to the beam width in the (k-1)-th time slot: decreased by dividing by the beam width gain, unchanged (multiplied by 1), or increased by multiplying by the beam width gain.
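The multiplicative beam width update can be sketched in the same way. The action encoding, the requirement that the gain exceed 1, and the function name `update_beam_width` are assumptions.

```python
def update_beam_width(prev_width, action, gain):
    """Return the new beam width.

    action: -1 divides by the gain, 0 keeps the width, +1 multiplies by it.
    gain: the preset beam width gain; a value above 1 is assumed.
    """
    if action not in (-1, 0, 1):
        raise ValueError("action must be -1, 0 or 1")
    if gain <= 1:
        raise ValueError("gain is assumed to be greater than 1")
    return prev_width * gain ** action

# Hypothetical values: previous width 10 degrees, gain 2.
print([update_beam_width(10.0, a, 2.0) for a in (-1, 0, 1)])
```

Because the update is multiplicative, repeated actions change the beam width geometrically rather than by a fixed increment.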

Third, the base station may compare the received signal strength of the training signal in real time, receive a reward value for deep reinforcement learning, and update the deep reinforcement learning network, thereby training it to improve subsequent communication performance.

For the reward r_k for an action, an expression such as Equation 9 below may be applied by way of example.

Here, c_u is the upper threshold and c_l is the lower threshold; they may be used to reduce noise-induced errors in calculating the reward.
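Combining these thresholds with the three reward levels of 2, 1, and 0 described for the learning reward information calculation step (S1430), the reward can be written as the piecewise expression below. The symbol $P_k$ for the received signal strength in the k-th time slot is a hypothetical name introduced here, and the `cases` environment requires the `amsmath` package.

```latex
\[
  r_k =
  \begin{cases}
    2, & P_k \ge c_u \\
    1, & c_l \le P_k < c_u \\
    0, & P_k < c_l
  \end{cases}
\]
```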

Of course, in some cases the reward may instead be calculated directly from the calculation formulas presented by way of example.

As described above, as beam tracking training based on deep reinforcement learning continues, the deep reinforcement learning network can be trained to further increase the received signal magnitude. Accordingly, the accuracy of beam tracking can be improved, and the accuracy and efficiency of beam width control can be improved.

In addition, the present disclosure can provide a wireless communication base station and method that can perform beam forming by tracking a moving terminal in real time.

In addition, the present disclosure can provide a wireless communication base station and method that can improve the speed and accuracy of real-time tracking of a moving terminal based on deep reinforcement learning.

In addition, the present disclosure can provide a wireless communication base station and method that can improve communication performance by controlling the beam width in real time when forming a beam for a moving terminal.

In addition, the present disclosure can provide a wireless communication base station and method that can reduce beam forming overhead through beam tracking and beam width control based on deep reinforcement learning.

The above description merely illustrates the technical idea of the present disclosure, and those of ordinary skill in the art to which the present disclosure pertains will be able to make various modifications and variations without departing from the essential characteristics of that technical idea. In addition, the embodiments are intended to explain the technical idea of the present disclosure rather than to limit it, so the scope of the technical idea is not limited by these embodiments. The scope of protection of the present disclosure should be interpreted according to the claims below, and all technical ideas within a scope equivalent thereto should be interpreted as falling within the scope of rights of the present disclosure.

Claims

A base station performing wireless communication, comprising:
a learning unit that calculates state information based on a received signal received from at least one terminal and information on the number of antennas used for beam forming, calculates beam control information based on the state information and preset action information, and calculates learning reward information based on the state information and the beam control information; and
a control unit that performs beam forming control to form a communication beam for the terminal based on the beam control information,
wherein the beam control information includes direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam,
wherein the state information includes beam forming vector information and channel vector information regarding the received signal, and
wherein the learning unit calculates the direction estimation information based on the beam forming vector information and the channel vector information, and calculates the beam width control information based on the antenna number information.

(Deleted)

The base station according to claim 1,
wherein the control unit performs the beam forming control so that the communication beam is formed with a beam direction determined based on the direction estimation information and a beam width determined based on the beam width control information.

The base station according to claim 1,
wherein the action information includes direction action information for performing a beam direction action that changes the direction of the communication beam according to a preset direction interval in a state corresponding to the state information, and beam width action information for performing a beam width action that changes the beam width of the communication beam according to a preset beam width gain, and
wherein the learning unit calculates the direction estimation information and the beam width control information based on the direction action information and the beam width action information.

The base station according to claim 4,
wherein the learning unit calculates an expected reward value for each (direction action, beam width action) ordered pair based on the state information, the direction action information, and the beam width action information, and
calculates the direction estimation information and the beam width control information based on the action information corresponding to the (direction action, beam width action) ordered pair determined to have the largest of the calculated expected reward values.

The base station according to claim 1,
wherein the state information includes received signal strength information regarding the strength of the received signal, and
wherein the learning unit calculates the learning reward information so that reinforcement learning proceeds in the direction of increasing the magnitude of the received signal strength information.

The base station according to claim 6,
wherein the learning unit calculates the learning reward information as 2 when the magnitude of the received signal strength information is greater than or equal to a preset upper threshold,
calculates the learning reward information as 1 when the magnitude of the received signal strength information is less than the upper threshold and greater than or equal to a preset lower threshold, and
calculates the learning reward information as 0 when the magnitude of the received signal strength information is less than the lower threshold.

A method for a base station to perform wireless communication, comprising:
a reinforcement learning step of calculating state information based on a received signal received from at least one terminal and information on the number of antennas used for beam forming, calculating beam control information based on the state information and preset action information, and calculating learning reward information based on the state information and the beam control information; and
a beam forming control step of controlling formation of a communication beam for the terminal based on the beam control information,
wherein the beam control information includes direction estimation information regarding the direction in which the terminal is located and beam width control information regarding the beam width of the communication beam,
wherein the state information includes beam forming vector information and channel vector information regarding the received signal, and
wherein the reinforcement learning step calculates the direction estimation information based on the beam forming vector information and the channel vector information, and calculates the beam width control information based on the antenna number information.

(Deleted)

The method according to claim 8,
wherein the control step performs the beam forming control so that the communication beam is formed with a beam direction determined based on the direction estimation information and a beam width determined based on the beam width control information.

The method according to claim 8,
wherein the action information includes direction action information for performing a beam direction action that changes the direction of the communication beam according to a preset direction interval in a state corresponding to the state information, and beam width action information for performing a beam width action that changes the beam width of the communication beam according to a preset beam width gain, and
wherein the reinforcement learning step calculates the direction estimation information and the beam width control information based on the direction action information and the beam width action information.

The method according to claim 11,
wherein the reinforcement learning step calculates an expected reward value for each (direction action, beam width action) ordered pair based on the state information, the direction action information, and the beam width action information, and
calculates the direction estimation information and the beam width control information based on the action information corresponding to the (direction action, beam width action) ordered pair determined to have the largest of the calculated expected reward values.

The method according to claim 8,
wherein the state information includes received signal strength information regarding the strength of the received signal, and
wherein the reinforcement learning step calculates the learning reward information so that reinforcement learning proceeds in the direction of increasing the magnitude of the received signal strength information.

The method according to claim 13,
wherein the reinforcement learning step calculates the learning reward information as 2 when the magnitude of the received signal strength information is greater than or equal to a preset upper threshold,
calculates the reward information as 1 when the magnitude of the received signal strength information is less than the upper threshold and greater than or equal to a preset lower threshold, and
calculates the learning reward information as 0 when the magnitude of the received signal strength information is less than the lower threshold.