WO2020192938A1 - Network entity and method for supporting network fault detection

Network entity and method for supporting network fault detection

Info

Publication number
WO2020192938A1
Authority
WO
WIPO (PCT)
Prior art keywords
entity
network entity
network
indication message
irregularity
Application number
PCT/EP2019/057954
Other languages
French (fr)
Inventor
Ömer BULAKCI
Serkan AYAZ
Panagiotis SPAPIS
Chan Zhou
Josef Eichinger
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/EP2019/057954
Publication of WO2020192938A1

Classifications

    • H: ELECTRICITY
        • H04: ELECTRIC COMMUNICATION TECHNIQUE
            • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L 43/00: Arrangements for monitoring or testing data switching networks
                    • H04L 43/02: Capturing of monitoring data
                        • H04L 43/028: Capturing of monitoring data by filtering
                    • H04L 43/04: Processing captured monitoring data, e.g. for logfile generation
                    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
            • H04W: WIRELESS COMMUNICATION NETWORKS
                • H04W 24/00: Supervisory, monitoring or testing arrangements
                    • H04W 24/04: Arrangements for maintaining operational condition
                    • H04W 24/08: Testing, supervising or monitoring using real traffic
                • H04W 4/00: Services specially adapted for wireless communication networks; Facilities therefor
                    • H04W 4/30: Services specially adapted for particular environments, situations or purposes
                        • H04W 4/40: Services specially adapted for vehicles, e.g. vehicle-to-pedestrians [V2P]
                            • H04W 4/44: Services for communication between vehicles and infrastructures, e.g. vehicle-to-cloud [V2C] or vehicle-to-home [V2H]

Definitions

  • the present disclosure relates to the field of wireless communication networks.
  • the disclosure specifically proposes a network entity, a method and a computer program for supporting network fault detection.
  • the next generation of mobile and wireless communications, namely the fifth generation (5G), envisions new use cases, services and applications, for instance, enhanced mobile broadband (eMBB), ultra-reliable low latency communications (URLLC), and massive machine-type communications (mMTC).
  • enhanced Vehicle-to-everything (eV2X) can be seen as a special 5G service type, which includes both safety and non-safety services, e.g., according to 3GPP TS 22.186 - Service requirements for enhanced V2X scenarios.
  • Among the key requirements for eV2X services are the critical latency (e.g., 3-10 ms) and the reliability (e.g., 99.999% and higher).
  • Further requirements are, for example, key performance indicators (KPIs), which may need to be adapted on demand due to new application requests, for instance, a Level of Automation (LoA) change, dynamic group-UE formations, or network changes (network congestion at core network and/or access network entities, a change of the mode of transmission/operation).
  • Next Generation Radio Access Networks (NG-RAN) need to support V2X/URLLC services for the user plane (UP), the control plane (CP), or both, in order to ensure that the reliability and coverage requirements are met.
  • a faulty cell can be caused by an access node that is no longer functioning properly but has not generated any fault detection trigger at an operation and maintenance (OAM) domain.
  • Such a fault can occur due to different reasons: e.g. hardware (HW), software (SW), or firmware (FW) issues.
  • a fault can also occur at different levels, e.g., a total failure where an entire cell may stop functioning properly, or only a partial failure where some component carriers/part of the protocol stack may stop functioning properly.
  • Another example could be detecting a faulty cell by means of offline post-processing checks made through, e.g., Call Detailed Record.
  • the MNO would have to collect a vast amount of data, would then have to delete the data after a recovery, and may have to repeat the process.
  • Such a solution may not be optimal, since it would be too slow in detection.
  • embodiments of the present invention aim to provide a network entity 100 and method for supporting network fault detection.
  • An object is thereby to provide ultra-high availability and ultra-high resilience, e.g., for V2X and/or URLLC services, and fault detection as early as possible, in order to reduce safety issues and the impact on mission-critical services caused by faulty cells or other faults that may occur in the network.
  • a first aspect of the invention provides a network entity for supporting network fault detection, wherein the network entity is configured to: monitor expected traffic from at least one terminal device; and provide an indication message to a fault management entity and/or to a recovery entity, when determining an irregularity in the expected traffic within a predetermined time interval.
  • a failure or fault can be detected as soon as possible, in order to enable the ultra-reliability and low latency of e.g. a V2X/URLLC service.
  • the predetermined time interval can be adjusted during the runtime or can be pre-configured, e.g., via a fault management entity.
  • the predetermined time interval may be set or adjusted by the network entity, as well, or jointly by fault management and the network entity.
  • the fault management may provide policies for the time interval and the network entity can determine the time interval, e.g., based on the monitored statistics or obtained statistics.
  • the statistics may be obtained from a data analytics function, wherein the data analytics function can be a network data analytics function (NWDAF) in a core network (CN), or a management data analytics function (MDAF) in a management layer, or a data analytics function in the RAN.
  • the fault management entity may be part of the management layer or part of management and orchestration layer, which may operate on long-timescale statistics and may cooperate with the recovery entity for recovery solutions, where these recovery solutions can be of long-timescale.
  • the network entity can be a node, e.g., a physical node or virtual node in the network, or a network function, e.g., implemented as a virtual network function (VNF) or a physical network function (PNF).
  • a VNF implementation may have the advantage that the network entity may be deployed or instantiated at different parts of the network, e.g., in a central unit (CU) of RAN, or in a management layer.
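As a purely illustrative aid (not part of the disclosure), the behavior of the first aspect can be sketched as a watchdog that records the last arrival time of expected traffic per terminal device and raises an indication message once the predetermined time interval Δt elapses without traffic. All names and the callback-based delivery below are assumptions:

    import time

    class DetectionFunction:
        """Minimal sketch of the first aspect: monitor expected traffic and
        provide an indication message if it stays absent for delta_t seconds."""

        def __init__(self, delta_t, send_indication):
            self.delta_t = delta_t                   # predetermined time interval
            self.last_seen = {}                      # device id -> last arrival time
            self.send_indication = send_indication   # delivers to FM and/or recovery

        def on_expected_traffic(self, device_id, now=None):
            # Record every arrival of expected traffic from a monitored device.
            self.last_seen[device_id] = time.monotonic() if now is None else now

        def check(self, now=None):
            # Determine an irregularity for each device whose expected traffic
            # was not received within the predetermined time interval.
            now = time.monotonic() if now is None else now
            for device_id, seen in self.last_seen.items():
                if now - seen > self.delta_t:
                    self.send_indication({"type": "irregularity_report",
                                          "device": device_id,
                                          "silent_for_s": now - seen})

For instance, DetectionFunction(delta_t=1.0, send_indication=print), together with periodic check() calls, would report every monitored device that has been silent for more than one second.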
  • the indication message includes at least one of: an irregularity report; and a fault detection trigger.
  • the irregularity report may be provided to the fault management entity and/or the fault detection trigger may be provided to the recovery entity.
  • the network (e.g. RAN) can react fast to a fault or failure, e.g. by determining which specific entity shall react to the failure.
  • the irregularity report can be utilized by the fault management entity to generate a fault detection trigger that can be then provided to the recovery entity.
  • the fault management entity may receive one or more irregularity reports from one or more network entities.
  • the one or more irregularity reports can be utilized to detect a fault and to generate a fault detection trigger.
  • the network entity is further configured to: determine the irregularity, when the expected traffic is not received within the predetermined time interval.
  • the network entity can accordingly determine the irregularity, if the fault or failure can be recognized by the network entity, and can therefore support a faster recovery.
  • the time interval may depend on the service type, e.g., an URLLC service may have a shorter time interval compared to an eMBB service.
  • the indication message includes a reliability level regarding the determination of the irregularity.
  • the fault management entity can determine how likely it is that there is a fault or failure in the network.
  • the recovery entity may determine the need to execute the recovery based on the reliability level. If the reliability level is high, i.e., there is likely a fault, the recovery entity may perform the recovery immediately after receiving the indication message.
  • the network entity is further configured to: provide the indication message to the recovery entity, when the reliability level is above a pre-selected threshold.
  • the fact that the network entity may only provide the indication message to the recovery entity when the reliability level is above the pre-selected threshold can make the fault detection in the network (e.g. RAN) more efficient, since only more reliably detected faults or failures are reported.
  • the reliability level and the threshold/thresholds can be implemented in different ways such that the notion of being above or below a threshold may result in the same action.
  • the threshold or thresholds may be implemented in the form of ranges, e.g., between a minimum and maximum value.
  • the network entity is further configured to: determine the reliability level and/or the pre-selected threshold based on at least one of the following conditions: a number of the at least one monitored terminal device, and a number of monitored applications generating the expected traffic.
  • Applications generating traffic patterns that can be characterized more deterministically, and/or devices that generate more characteristic signaling, for example, static sensors generating keep-alive signals, can be used to determine the reliability level and/or the pre-selected threshold, in order to make the fault detection more efficient or reliable. This is because only highly likely faults or failures will be determined and need to be addressed.
  • the at least one terminal device to be monitored can be selected based on the signaling or traffic generated. During the runtime different devices or applications can be monitored. For instance, if the irregularity is monitored at the terminal devices with more characteristic signaling and/or the irregularity is monitored at more than one terminal device, the reliability level of fault detection can be increased.
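One way to turn these counts into a decision, sketched under assumptions (the equal weighting of the two ratios and the destination labels are illustrative choices, not taken from the disclosure):

    def reliability_level(irregular_devices, monitored_devices,
                          irregular_apps, monitored_apps):
        # Fractions of monitored devices/applications showing the irregularity;
        # averaging the two ratios is an assumed scoring choice.
        dev = irregular_devices / monitored_devices if monitored_devices else 0.0
        app = irregular_apps / monitored_apps if monitored_apps else 0.0
        return (dev + app) / 2.0

    def route_indication(level, threshold, send):
        # Above the pre-selected threshold, the fault detection trigger goes to
        # the recovery entity; otherwise only an irregularity report is provided
        # to the fault management entity.
        if level > threshold:
            send("recovery_entity", {"type": "fault_detection_trigger",
                                     "reliability": level})
        else:
            send("fault_management_entity", {"type": "irregularity_report",
                                             "reliability": level})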
  • the expected traffic includes at least one of an application level signaling and a network-level signaling.
  • the application level signaling includes at least one of a keep-alive signal, a Cooperative Awareness Message (CAM), a tele-operated driving (ToD) signal, and an HD Local Map signal.
  • the network-level signaling includes at least one of event-based signaling and periodic signaling.
  • the event-based signal is a location reporting triggered by an area event and/or a motion event of one or more terminal devices.
  • the network entity can detect different kinds of faults or failures under different situations.
  • the ToD is also known as remote driving.
  • the application level signaling can be any signaling generated for a service, e.g., ToD.
  • Such application-level signaling can be selected based on the criterion how well the signaling can be characterized, e.g., in terms of periodicity.
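A possible, purely illustrative data model for the traffic types listed above, with a service-dependent choice of the interval Δt (the multipliers are assumptions; the disclosure only states that an URLLC service may use a shorter interval than an eMBB service):

    from dataclasses import dataclass
    from enum import Enum

    class SignalingLevel(Enum):
        APPLICATION = "application"   # keep-alive, CAM, ToD, HD Local Map
        NETWORK = "network"           # event-based or periodic signaling

    @dataclass
    class ExpectedTrafficProfile:
        name: str              # e.g. "CAM"
        level: SignalingLevel
        period_s: float        # nominal periodicity of the signal
        service: str           # e.g. "URLLC" or "eMBB"

    def delta_t_for(profile: ExpectedTrafficProfile) -> float:
        # Assumed rule of thumb: tolerate fewer missed periods for URLLC
        # services than for eMBB before declaring an irregularity.
        missed_periods = 2 if profile.service == "URLLC" else 5
        return missed_periods * profile.period_s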
  • the network entity is further configured to: register with the recovery entity and/or with another network entity, in order to cooperate in monitoring the expected traffic and/or in order to obtain configuration information indicating how to monitor, what to monitor and/or when to provide the indication message.
  • the registration between the network entity and recovery entity and/or another network entity can enable the network entity to monitor different types of expected traffic in the network. It can also share the indication message with the recovery entity and/or the another network entity, or can provide early notification, e.g. in case of mission critical services.
  • the configuration of the network entity may be obtained from another network entity.
  • the another network entity can be of the same type as the network entity, can be an agent of the network entity (e.g. the network entity is a Detection Function (DF) and the another network entity is a DF agent), or a fault management entity.
  • the another network entity may be implemented in another RAN access node, where the network entity and the another network entity may communicate via an inter-access node interface, e.g., the Xn interface, or an intra-access node interface, e.g., the F1 interface, as specified by 3GPP.
  • the network entity is further configured to: send the indication message also to the another network entity.
  • the network entity is configured to be configured by the fault management entity.
  • the network entity is further configured to: transmit a notification to the at least one terminal device, said notification indicating the determined irregularity.
  • Such an indication message can be provided to another network entity or entities based on the registration.
  • the at least one terminal device can have access to, e.g., V2X and/or URLLC services such that the indication message can be used to avoid service interruption.
  • the network entity is further configured to: send the indication message to another network entity, for providing a second indication message by the another network entity to the fault management entity and/or to the recovery entity.
  • indication messages can be forwarded, or a new indication message including the fault information and/or irregularity information can be generated for the fault management entity, which is more flexible for the network entity.
  • the network entity is further configured to perform at least one of: receive location information and/or traffic information provided by a location management unit, concerning the at least one terminal device, and monitor the expected traffic based on the location information and/or the traffic information.
  • the facilitation of the location management unit makes the fault detection more efficient.
  • the location management unit can be in the form of a location management function (LMF) and/or Location Management component (LMC) and/or Distributed LMF.
  • the network entity is a RAN entity, in particular implemented in or as a Central Unit, Distributed Unit, or User Equipment; or the network entity is a CN entity; or the network entity is a Management or Orchestration entity; or the network entity is implemented in an application server or implemented as an application entity.
  • when the network entity is implemented as a network function, it can be implemented in different parts of the network, e.g., at a cluster head user equipment (UE) in a group communication.
  • a second aspect of the present invention provides a method for supporting network fault detection, comprising: monitoring expected traffic from at least one terminal device; and providing an indication message to a fault management entity and/or to a recovery entity, when determining an irregularity in the expected traffic within a predetermined time interval.
  • the method may be developed according to the features of the various implementation forms of the network entity of the first aspect described above.
  • a third aspect of the present invention provides a computer program comprising a program code for carrying out, when executed on a processor, the method according to the second aspect described above.
  • FIG. 1 illustrates a network entity according to an embodiment of the invention.
  • FIG. 2 shows a diagram illustrating network entities according to embodiments of the invention, which are Detection Functions (DFs) implemented at different domains and in different network elements in the 5GS.
  • FIG. 3 shows schematically an example of a regular operation in a cell, in accordance with at least one embodiment of the invention.
  • FIG. 4 shows schematically an example of DF operation, when an irregularity is found and an associated trigger is sent to a FM entity, in accordance with at least one embodiment of the invention.
  • FIG. 5 shows a message sequence chart illustrating how a DF can be operated in a 5GS, in accordance with at least one embodiment of the invention.
  • FIG. 6 illustrates utilization of application-level expected V2X/URLLC traffic to detect a fault in a 5GS, in accordance with at least one embodiment of the invention.
  • FIG. 7 illustrates utilization of application-level expected V2X/URLLC traffic to detect a fault in a 5GS, where FallBackNotification messages are used, in accordance with at least one embodiment of the invention.
  • FIG. 8 illustrates utilization of application-level expected V2X/URLLC traffic to detect a fault in a 5GS, where network exposure mechanisms are utilized along with the involvement of the application server, in accordance with at least one embodiment of the invention.
  • FIG. 9 illustrates utilization of network-level expected signaling to detect a fault in a 5GS, in accordance with at least one embodiment of the invention.
  • FIG. 10 illustrates utilization of network-level expected signaling to detect a fault in a 5GS, in accordance with at least one embodiment of the invention.
  • FIG. 11 shows a message sequence chart illustrating DF configuration by an FM entity, in accordance with at least one embodiment of the invention.
  • FIG. 12 shows a message sequence chart illustrating DF registration with NFs, in accordance with at least one embodiment of the invention.
  • FIG. 13 shows a message sequence chart illustrating DF registration with NFs in case of distributed implementations and via DF-R, in accordance with at least one embodiment of the invention.
  • FIG. 14 is a block diagram of a method according to an embodiment of the invention.
  • FIG. 1 shows a network entity 100 according to an embodiment of the invention.
  • the network entity 100 is configured to support network fault detection.
  • the network entity 100 is thus also referred to in this document as "Detection Function (DF)".
  • the network entity 100 may be implemented in a network node of, e.g., a RAN, e.g. in a base station.
  • the network entity 100 may also be implemented in a mobile terminal, a vehicle, a sensor, a cluster head UE, a central unit (CU), a distributed unit (DU), an (edge) User Plane Function (UPF) and/or an application server in the network.
  • the network entity 100 is configured to monitor expected traffic 101 from at least one terminal device 102.
  • the terminal device 102 may be a UE or mobile device, and/or may be a vehicle or sensor.
  • the expected traffic 101 may include at least one of an application level signaling and a network-level signaling.
  • the application level signaling may include at least one of a keep-alive signal, a CAM, a ToD signal, and a HD Local Map signal.
  • the network-level signaling may include at least one of an event-based signaling and a periodic signaling.
  • the network entity 100 is further configured to provide an indication message 103 to a fault management entity 104 and/or to a recovery entity 105, when determining an irregularity in the expected traffic 101 within a predetermined time interval. That is, the network entity 100 may monitor the expected traffic 101, determine whether the expected traffic 101 shows an irregularity, and generate and send the indication message 103 if it detects an irregularity.
  • FIG. 2 shows network entities 100 according to embodiments of the invention, which are based on the network entity 100 shown in FIG. 1.
  • the network entities 100 are referred to as "Fault DFs" (i.e. each network entity 100 implements a DF, i.e. the DFs are also labelled with reference sign 100 in the following).
  • the DFs 100 are configured to discover irregularities in the network traffic.
  • the traffic may pertain to the different parts of the network.
  • a DF at the DU can monitor the traffic between the DU and the UE(s), while a DF at the CU can additionally monitor the traffic on the F1 interface.
  • the network traffic that can be monitored by a DF can also depend on the implementation location.
  • a DF at the application server may more easily monitor the application-level signaling.
  • a DF 100 may generate and send an indication message 103, which may include irregularity reporting and/or a fault detection trigger.
  • the indication message 103 may be transmitted to trigger an Agile Recovery Mechanism in the RAN, i.e. may be provided to a recovery entity 105.
  • the DFs 100 may be configured by a central DF 100 or may be configured by a FM entity 104.
  • a DF 100 can analyze patterns in/of the expected traffic 101. Thereby, repeated traffic patterns may be taken as the expected traffic 101, such as keep-alive signals, tele-operated driving (ToD) signals, HD Local Map signals and/or Cooperative Awareness Messages (CAMs). This provides a good means of traffic "expectation", since such signals are typically transmitted periodically. If the expected traffic 101 does not arrive for a predetermined time interval Δt, the DF 100 can trigger the indication message 103. The indication message 103 may include a reliability level regarding the determination of the irregularity in the expected traffic 101.
  • the DF 100 can analyze user plane (U-Plane, aka UP) data flows as well as control plane (C-Plane, aka CP) signaling.
  • the DF 100 may also interact with a Header Compression function (ROHC) at the Packet Data Convergence Protocol (PDCP) layer, e.g., regarding a Type of Service (ToS) field.
  • the DF 100 can be implemented as a standalone RAN function, or can be implemented as a part of an existing RAN functionality, e.g. a radio resource control (RRC), radio resource management (RRM), or self-organizing network (SON) extension.
  • FIG. 2 shows multiple DFs 100, which are implemented at different network elements (terminal device 102, DU, CU, UPF, Application Server), and can each monitor expected traffic 101 to be utilized for the failure detection.
  • the expected traffic 101 can include different types of signals for different DFs 100. Due to low-latency requirements, DFs 100 may preferably be located close to the edges of the network.
  • a DF 100 providing an indication message 103 can be triggered by a lack of application-level expected signaling, e.g. lack of a keep-alive signal or CAM in a geographical area.
  • a DF 100 providing an indication message 103 can be triggered by a lack of expected network-level signaling, e.g. lack of positioning reporting.
  • a DF 100 providing an indication message 103 can be triggered by a lack of expected event-based network signaling, e.g. multiple radio link failures (RLFs), in a geographical area.
  • keep-alive signals of vehicles, Vehicle Sensors, UEs, and/or of internet of things (IoT) devices can be utilized, in order to detect failures/faults/faulty cells or nodes on-time.
  • keep-alive signals, also referred to as "heart beats", may stem from the terminal devices 102 themselves and from applications of the terminal devices 102, e.g., smartphone always-on applications and IoT applications.
  • Each terminal device 102 may further have several applications with different keep-alive signal characteristics.
  • the terminal devices 102 can be of UE type, which can be in the order of tens, or can be machines, which can be in the order of thousands. Some of these terminal devices 102 may also be static devices, e.g., sensors or semi-dynamic devices, e.g., a V2X UE or a drone. It can be expected that machines generate more predictable keep-alive signals to report their status.
  • the DF 100 can generate the indication message 103, when one or more of the keep-alive signals stop fully in a given cell for a given time interval Δt, and may communicate, as the indication message 103, specifically an irregularity/anomaly report and/or a fault detection trigger to the FM entity 104 and/or to the recovery entity 105.
  • the recovery entity 105 may operate more functions than just recovering a fault.
  • the recovery entity 105 may be an entity that is able to operate any function that facilitates fault detection or recovery, for example, providing a network function of registration, network exposure, and/or location management etc. Therefore, the recovery entity 105 can also be considered as a failure detection assisting entity.
  • FIG. 3 illustrates a regular operation in a cell, when there are exemplarily a UE1 and a plurality of IoT/Sensors (i.e. terminal devices 102) in the cell.
  • the terminal devices 102 are served by a base station (gNB1).
  • FIG. 4 illustrates exemplarily how the detection of a fault occurs in the same cell as shown in FIG. 3, namely when keep-alive signals stop for the time interval Δt (without a stop detection request from edge equipment or an initiative detection stop by the gNB1).
  • the predetermined time interval Δt provides a degree of freedom to adapt to different characteristics of different keep-alive signals. Δt can in particular depend on the application type, or can be pre-determined by the FM entity 104 depending on the existing applications.
  • the packet inter-arrival time (IAT) can be obtained, e.g., via traffic traces from a real network.
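As a hedged numerical illustration, Δt could be derived per application from such traces as a multiple of the median IAT; the safety factor is an assumed tuning parameter:

    import statistics

    def delta_t_from_iat(iat_samples, safety_factor=3.0):
        # iat_samples: observed packet inter-arrival times of one application,
        # e.g. obtained via traffic traces from a real network.
        return safety_factor * statistics.median(iat_samples)

For example, delta_t_from_iat([0.1, 0.1, 0.12]) yields a Δt of roughly 0.3 s.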
  • the DF 100 may determine that for the given time interval Δt, the keep-alive signals from a single UE, or multiple UEs, or all UEs (and from a single or from multiple applications), and/or the keep-alive signals from one or more machines (e.g. IoT/Sensors) in a cell are not received anymore. This implies that with a high probability this cell is a faulty cell, i.e., it is highly unlikely that the expected keep-alive signals stop in the cell for any other reason than a fault.
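This cell-level decision can be expressed, as a sketch under the assumption that the DF keeps a per-cell map of last keep-alive arrival times, as follows:

    def flag_faulty_cells(last_seen_by_cell, now, delta_t):
        # last_seen_by_cell: cell id -> {device id -> last keep-alive time}.
        flagged = {}
        for cell, devices in last_seen_by_cell.items():
            silent = [d for d, t in devices.items() if now - t > delta_t]
            if devices and len(silent) == len(devices):
                # All monitored keep-alive signals stopped for delta_t: highly
                # unlikely for any reason other than a fault, so flag the cell
                # with the device count as evidence.
                flagged[cell] = {"silent_devices": len(silent)}
        return flagged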
  • the DF 100 may in this case inform the FM entity 104, or any other management node or function, such that necessary measures can be taken on-time.
  • an Agile Recovery Mechanism e.g. configured by the FM entity 104, may take the necessary measures to mitigate the faulty cell condition, e.g., perform Fail-safe Protocol Configuration (e.g., activating a duplicate protocol stack as of PHY or MAC layer to generate the same or similar RF signals), Resetting the Cell (Deactivate & Activate the cell), or Extending the cell coverage areas of the neighboring cell & resetting the cell.
  • FIG. 5 shows a message sequence chart (MSC) illustrating how a DF 100 (network entity 100) can be operated in a 5GS.
  • UEs such as mobile phones, or one or more vehicles (i.e. generally terminal devices 102), may send (application-level) keep-alive signals to a 5G base station (gNB) or to a road side unit (RSU).
  • the DF 100 is in FIG. 5 implemented exemplarily in/by the gNB or RSU, analyzes the expected traffic 101, and can discover an irregularity by monitoring in particular whether the keep-alive signals stop fully in a given cell for the predetermined time interval Δt. If such an irregularity is discovered, and optionally the fault likelihood or the reliability level is additionally very high, a fault detection trigger (i.e. an indication message 103) can be sent, e.g., to an Agile Recovery function implemented by the recovery entity 105.
  • the reliability level may be analyzed by the DF 100, for instance, by determining whether the detection reliability is above a pre-selected threshold value (ThX), which threshold may depend on the network deployment, access node type, services/applications that are supported, and/or statistics collected. For example, the reliability level and the threshold may be determined on the basis of a percentage or number of terminal devices 102 showing the irregularity, or on the basis of a percentage or number of applications generating the expected traffic 101 that show the irregularity.
  • the DF 100 may make use of network context, for example, via self-learning algorithms that can interface with data analytics functions such as an MDAF in the management layer and an NWDAF in the core network (CN). These data analytics functions are to some extent specified by 3GPP.
  • a data analytics function can also be implemented in the RAN, e.g., at a central unit (CU), such that the DF 100 may interface with such a data analytics function in RAN.
  • the DF 100 can be implemented inside a data analytics function, e.g., as one of the features of a data analytics function.
  • the indication message 103 is sent by the DF 100 to the recovery entity 105 (exemplarily implemented by a Failsafe Node Configurator (FNC)) and a RAN Recovery Mechanism is triggered.
  • the FNC can, e.g., activate a protocol stack which can take over the operation of the failed part of the protocol stack, for instance, by providing a failsafe protocol configuration to the DF 100 and then performing an agile fault recovery, e.g. a failsafe protocol execution.
  • the DF 100 may send an event notification to an operation and maintenance (OAM) domain, which may implement the FM entity 104, so that the FM entity 104 is informed about the above actions.
  • the event notification may comprise event statistics, which may include a type of the event and the time interval, the recovery mechanism, and the current status.
  • the indication message 103 may be provided by the DF 100 to the FM entity 104 together with event information, even when the fault likelihood is small, i.e. the determined reliability level is below the threshold.
  • the FM entity 104 may then analyze the event information, and may determine whether a recovery mechanism should still be taken, and what type of recovery mechanism should be taken.
  • the first alternative, in which the indication message 103 is provided to the recovery entity 105, given that the detection reliability level is above the pre-selected threshold, has the advantage that the fault recovery mechanisms can be performed with less delay.
  • the second alternative has the advantage that a recovery mechanism is only performed if the FM entity decides that this is necessary.
  • FIG. 6 illustrates further embodiments of the network entity 100, which build on the network entity 100 shown in FIG. 1.
  • in these embodiments, the expected traffic 101 is application-level V2X/URLLC traffic 101, such as CAM signals or V2X/URLLC related semi-persistent scheduling (SPS) signals.
  • CAM signals are typically transmitted periodically, and the transmission period may change, e.g., between 100 ms and 1000 ms.
  • two exemplary cases ("Case 1" and "Case 2") are considered in FIG. 6, in which CAM signals or V2X/URLLC related SPS signals are utilized to detect a fault.
  • a central unit (CU) of a gNB can set an SPS configuration based on a UEAssistanceInformation signal (UE assistance information) in a Radio Resource Control (RRC) message.
  • the message sequence chart (shown in the upper right of FIG. 6) is an example of how the network, e.g. gNB, gets the UE assistance information from a UE.
  • an indication message 103 can be generated by a DF agent 100 in a Distributed Unit (DU).
  • a DF agent 100 may be a network entity 100 that functions as an agent to the DF 100. That is, the DF agent 100 may be configured to perform the same functions as the DF 100, and may report to and/or may be registered with the DF 100.
  • a DF 100 in a gNB-type RSU may determine that the reception of CAM messages has stopped.
  • an indication message 103 e.g. a fault detection trigger and/or irregularity report, can be generated by the DF 100.
  • a gNB may be disaggregated, which means that it may be split into CU and one or more distributed units (DUs).
  • a DF agent 100 may be implemented in a DU, and a DF agent 100 may be implemented in an RSU.
  • the DF agents 100 may generate irregularity reports, wherein a central DF 100 (here at the CU) or DF agent 100 (e.g. at the RSU) may trigger an alarm (indication message 103) on the failure to the FM entity 104.
  • the indication message 103 can include information on the event, e.g., time stamp, reliability level, type of the event, where or at which node the event occurred.
  • the indication message 103 may be directly or indirectly sent to the FM entity 104.
  • a cluster head (CH) of UEs (here mobile terminals (MTs)) has a DF agent 100, which may also generate an indication message 103, and e.g. send it to a DF agent 100 in a DU or RSU. If a DF 100 or the FM entity 104 receives multiple indication messages 103, it may combine the information received to decide on a fault. Here, combining may imply a joint consideration.
  • DF agents 100 may be configured by a central DF 100 or by the FM entity 104.
  • terminal devices 102 may provide a FallBackNotification (fall back notification) message to a neighboring access node, or DU, or a master cell group (MCG) or secondary cell group (SCG) in case of multi-connectivity, when the terminal devices 102 stop getting SPS grants on the Uu interface and they fall back to NR mode 2.
  • the FallBackNotification may provide details on the event, for example, a time stamp and cell ID where the event happened.
  • the FallBackNotification(s) may be used by a DF agent 100 or DF 100 at the DU, in order to generate an indication message 103.
  • the content and/or the number of the FallBackNotification messages can indicate, e.g. a type of the fault and a reliability level of a fault existence. For example, the increased number of FallBackNotifications can imply a higher chance of fault.
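A sketch of how a DF or DF agent might aggregate FallBackNotification messages into a fault detection trigger; the message fields, the count threshold and the reliability formula are illustrative assumptions:

    from collections import Counter

    def evaluate_fallback_notifications(notifications, count_threshold):
        # notifications: list of dicts such as {"cell_id": ..., "timestamp": ...},
        # received from terminal devices that fell back to NR mode 2.
        per_cell = Counter(n["cell_id"] for n in notifications)
        triggers = []
        for cell, count in per_cell.items():
            if count >= count_threshold:
                # An increased number of FallBackNotifications implies a
                # higher chance of fault.
                triggers.append({"type": "fault_detection_trigger",
                                 "cell_id": cell,
                                 "reliability": min(1.0, count / (2.0 * count_threshold)),
                                 "notifications": [n for n in notifications
                                                   if n["cell_id"] == cell]})
        return triggers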
  • the DU may be implemented in the form of a road-side unit (RSU). This scheme may also apply to non-disaggregated access nodes, such as gNBs, where the FallBackNotification(s) can be sent to a neighboring gNB.
  • two Mobile Terminals (MTs) 102, which may also be in a multi-connectivity mode, stop getting SPS grants from a DU1 on the Uu interface. Accordingly, they may fall back to NR mode 2, and transmit the FallBackNotification messages to a DU2, which implements a DF agent 100.
  • the DF agent 100 of DU2 e.g., if the number of FallBackNotifications is high, may determine that a reliability level is high, and may transmit an indication message 103 (irregularity report and/or fault detection trigger), which may include the multiple FallBackNotifications, to the gNB-CU which implements another DF 100, which can be of the form of a central DF.
  • the DF 100 at the gNB-CU may trigger another indication message 103 (irregularity report and/or fault detection trigger), indicating that the DU1 failed and including, e.g., information on the event, to the FM entity 104.
  • a V2X Application Server implementing a DF 100 may also trigger an indication message 103.
  • the application server may not know what the cause of the fault (or the network topology) is.
  • the application server may send an APP DF Trigger / Irregularity Report Message to a Network Exposure Function / Service Capability Exposure Function / Exposure Governance Management Function (NEF/SCEF/EGMF), and the exposure function(s) can forward this or generate a new indication message 103 comprising the fault information to the OAM and/or a per-domain FM entity 104.
  • the FM entity 104 can initiate a healing trigger and interact, such as via probes and report requests, with the involved network sub-nets to identify where the exact failure is (e.g. which gNB, RSU, etc.).
  • the FM entity 104 can send FM analytics for the particular domain which has the failure, e.g., RAN analytics via a management data analytics function (MDAF), or core network (CN) analytics for the control plane/user plane (CP/UP).
  • if multiple application servers report irregularity messages, e.g., about the keep-alive signal, then the network can know there is a problem.
  • network-level expected traffic 101 can be utilized to detect failures/faults/faulty cells or nodes in the network, e.g., 5GS.
  • One such example is the periodic and triggered location reporting.
  • 3GPP TR 23.731 sets the requirement on Periodic and Triggered Location Reporting, where different triggers are outlined, such as an Area event (UE entering, leaving or remaining within the area) and a Motion event (UE moving by more than a threshold straight-line distance from a previous location).
  • the DF 100 can coordinate with a location management unit, e.g., a location management function / location management component (LMF/LMC), to predict a fault via, for example, a lack of Periodic Location Reporting, a trajectory estimate, or Predicted Location Reporting, e.g., considering the velocity of the MT along with a Motion event. In other words, Location Reporting is predicted at a pre-determined time within a predicted range of locations, where the pre-determined time is based on the velocity of the MT along with the Motion event.
  • the DF 100 can register for the notification of the location reporting and/or trajectory estimate, for configuration information indicating how to monitor, what to monitor and/or when to provide the indication message 103.
  • the service registration can be made between the DF 100 and a location management function (LMF) and/or a local LMF (LLMF) in the RAN (which may also be referred to as a location management component, LMC).
  • LMF in the CN and LMC in RAN may also coordinate.
  • MTs 102 can register for a location service (LCS) periodic-triggered event at the LMF and/or LMC.
  • LMF and/or LMC can predict an area event after a time interval (in FIG. 9, t1-t0, which may indicate a Δt time difference), considering a predicted trajectory of the MTs 102, e.g., based on the velocity, speed, initial location, and/or location history.
  • DF 100 and/or a location management entity can determine that a location reporting trigger has not been initiated by the MTs considering the predicted trajectory of the MTs 102.
  • Such a notice can imply a fault and can trigger an irregularity report and/or fault detection trigger (indication message 103). This can be sent to an FM entity 104.
  • the number of MTs 102 that are not sending such location request(s) can be an indication of the reliability level of the failure; in other words, a threshold can be set to determine how many such notices indicate a highly likely fault in the network for which recovery needs to be initiated. This can also be a group of MTs 102 performing group communications.
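Under simplifying assumptions (straight-line motion, known distance to the area boundary, hypothetical field names), the predicted report time and the count of silent MTs could be computed as follows:

    def predicted_report_time(t0, boundary_distance_m, velocity_mps):
        # Predict when the MT should cross the area boundary and hence trigger
        # an area-event location report (simplified one-dimensional model).
        return t0 + boundary_distance_m / velocity_mps if velocity_mps > 0 else float("inf")

    def silent_mts(mts, now, delta_t):
        # mts: list of dicts with assumed fields "id", "report_due", "reported".
        # The number of MTs that have not reported past their predicted time
        # serves as the reliability indication; a threshold on this number
        # decides whether recovery needs to be initiated.
        return [m["id"] for m in mts
                if not m["reported"] and now > m["report_due"] + delta_t]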
  • a further embodiment is illustrated in FIG. 10, and uses Radio Link Failures (RLFs) for checking a lack of (network-level) expected traffic 101.
  • Multi-connectivity is a solution, e.g., to enable the Zero-Handover Interruption Time requirement set by IMT-2020.
  • MT(s) 102 can connect to multiple DUs that are served by the same or different CU(s).
  • the collection of the RLFs of a DU can be utilized to detect a failure and, for example, to determine which specific DU has caused the RLF.
  • a DU failure can result in multiple path switches to other DU(s) in case of multi-connectivity.
  • a DF 100 implemented in the CU can analyze this information to generate an indication message 103 (irregularity report and/or fault detection trigger). Such information can be shared with the FM entity 104, as illustrated in FIG. 10.
  • the indication message 103 can comprise type(s) of irregularity, detection reliability level, and detection granularity (such as DU or interface). If more than one CU serves the DUs, the DFs 100 or DF agents 100 may coordinate over an inter-access node interface, e.g., Xn interface. DFs 100 in different CUs may communicate over Xn interface.
  • the DF 100, here implemented in the CU, generates an irregularity report, i.e. an indication message 103, to the FM entity 104.
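A minimal sketch of how a DF in the CU might correlate the collected RLFs and path switches to point at a specific DU; the event representation and the threshold are assumptions:

    from collections import Counter

    def suspect_du(events, rlf_threshold):
        # events: list of (du_id, kind) tuples with kind "rlf" or "path_switch",
        # collected at the CU across the DUs it serves.
        rlfs = Counter(du for du, kind in events if kind == "rlf")
        switches = Counter(du for du, kind in events if kind == "path_switch")
        for du, count in rlfs.items():
            if count >= rlf_threshold and switches.get(du, 0) > 0:
                # Many RLFs combined with path switches away from the DU point
                # to that DU as the failed element.
                return {"type": "irregularity_report", "du": du,
                        "rlf_count": count, "path_switches": switches[du]}
        return None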
  • the DF 100 can be configured by the FM entity 104, which may be located in an OAM. The configuration mechanism of this embodiment is shown in FIG. 11 in an MSC.
  • the DF 100 analyzes the expected traffic 101 and may discover an irregularity.
  • the traffic analysis can include interaction between DF and a location management unit.
  • an irregularity report (indication message 103) can be triggered to the FM entity 104.
  • the FM entity 104 can configure the DF 100, for example, based on slice requirements. Such a configuration can include: a sensitivity comprising the reliability level, according to which the DF 100 and/or DF agent 100 may generate a fault detection trigger and/or send an irregularity report (periodic or event-based); a slice-based configuration (for example, a slice type of URLLC or V2X may require stricter failure detection); and whether the DF 100 and/or DF agent 100 shall first inform the FM entity 104, or can directly inform the failure recovery mechanisms.
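A possible shape of such a configuration, sketched with illustrative field names (none of them standardized):

    from dataclasses import dataclass

    @dataclass
    class DFConfiguration:
        sensitivity: float    # comprises the reliability level to react at
        reporting: str        # "periodic" or "event_based"
        slice_type: str       # e.g. a URLLC or V2X slice needs stricter detection
        inform_first: str     # "fault_management" or "recovery"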
  • the FM entity 104 can trigger the Agile Fault Recovery, e.g., by triggering a self-organizing network (SON) function.
  • the DF 100 and/or DF agent 100 may interact with other NFs and/or network elements (such as UEs 102 and access nodes). This can be, for example, based on a service offered as in the case of a service based architecture (SBA).
  • An embodiment of service request and registration among a DF 100 and NFs is shown in FIG. 12.
  • the DF 100 is shown as a standalone function for simplicity but as shown previously it may be collocated with other nodes (e.g., gNB/RSU, 5GC).
  • Other NFs and/or network elements may register for traffic analysis.
  • the DF 100 can send a traffic analysis subscription request, which can be followed by a response, such as an ACK (Acknowledge signal), a NACK (Not Acknowledge signal), or a Configuration Update for adapting to a new state of the network in time.
  • the traffic analysis subscription request may comprise a Traffic Analysis Type, a Time Granularity of Reporting, an Area of reporting, a Type of reporting (e.g., event-based), and UE IDs to be analyzed. This can enable the DF to observe different types of traffic in the 5GS.
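The request contents listed above could be modeled, purely as an assumed illustration, like this:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TrafficAnalysisSubscription:
        analysis_type: str          # Traffic Analysis Type, e.g. "keep_alive"
        time_granularity_s: float   # Time Granularity of Reporting
        area: str                   # Area of reporting, e.g. a cell or area id
        reporting_type: str         # Type of reporting, e.g. "event_based"
        ue_ids: List[str] = field(default_factory=list)  # UE IDs to be analyzed

    def handle_subscription(request, supported_types):
        # The receiving side answers with ACK or NACK; a later Configuration
        # Update can adapt the subscription to a new state of the network.
        return "ACK" if request.analysis_type in supported_types else "NACK"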
  • an irregularity report or fault detection trigger can be shared with the registered NFs and/or network elements. Further, an early notification can be sent to at least one of the UEs 102, preferably registered UEs 102, indicating the determined irregularity, e.g., for the mission critical services and over non-failed radio interfaces.
  • the non-failed radio interface can be a radio interface of a neighboring access node.
  • the early notification, which indicates the determined irregularity, may be transmitted to the registered UEs 102 when the indication message 103 is generated to the FM entity 104 and/or the recovery entity 105, so that the UE 102 can take an early emergency action, such as reset, self-recovery, fall back to NR mode 2, fall back to unlicensed access, or stop.
  • the DF registration may be performed by a DF registration function (marked as DF-R), for example, in distributed implementations.
  • the DF-R can send a traffic analysis subscription request, which can be followed by a response, such as ACK, NACK, and Configuration Update.
  • the traffic analysis subscription request may comprise Traffic Analysis Type, Time Granularity of Reporting, Area of reporting, Type of reporting, e.g., event-based, UE IDs to be analyzed.
  • the DF-R is implemented for the registration to the UEs, gNB/RSU, 5GC (here exemplarily the LMF), and OAM (FM).
  • the registration for 5GC is for detection of event-based predicted traffic by the LMF.
  • the DF and/or LMF can determine that a location reporting trigger has not been initiated by the UEs 102 considering the predicted trajectory of the UEs 102. Such a notice can imply a fault detection trigger and be sent to the FM entity 104.
  • FIG. 14 shows a block diagram of a method 1400 for supporting network fault detection.
  • the method 1400 comprises: a step 1401 of monitoring expected traffic 101 from at least one terminal device 102; and a step 1402 of providing an indication message 103 to a fault management (FM) entity 104 and/or to a recovery entity 105, when determining an irregularity in the expected traffic 101 within a predetermined time interval.
  • the method 1400 can be carried out by the network entity/DF 100 or DF agent 100 described above.
  • embodiments of the invention provide a network entity 100, a method and a computer program for supporting network fault detection, so that the network can benefit from improved low latency and high reliability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

This disclosure relates to fault detection in a network. In particular, the invention provides a network entity, a method and a computer program for supporting such network fault detection. The network entity is configured to monitor expected traffic from at least one terminal device, and to provide an indication message to a fault management entity and/or to a recovery entity, when determining an irregularity in the expected traffic within a predetermined time interval. Thus, the network can benefit from improved latency and reliability.

Description

NETWORK ENTITY AND METHOD FOR SUPPORTING NETWORK FAULT DETECTION
TECHNICAL FIELD
The present disclosure relates to the field of wireless communication networks. In this technical field, the disclosure specifically proposes a network entity, a method and a computer program for supporting network fault detection.
BACKGROUND
It is known that the next generation of mobile and wireless communications, namely, the fifth generation (5G), envisions new use cases, services and applications, for instance, enhanced mobile broadband (eMBB), ultra-reliable low latency communications (URLLC), and massive machine-type communications (mMTC). Any combination of these use cases can also be possible.
Further, enhanced Vehicle-to-everything (eV2X) can be seen as a special 5G service type, which includes both safety and non-safety services, e.g., according to 3GPP TS 22.186 - Service requirements for enhanced V2X scenarios. Among the key requirements for eV2X services are the critical latency (e.g., 3-10 ms) and the reliability (e.g., 99.999% and higher). Further requirements are, for example, key performance indicators (KPIs), which may need to be adapted on demand due to new application requests, for instance, a Level of Automation (LoA) change, dynamic group-UE formations, or network changes (network congestion at core network and/or access network entities, a change of the mode of transmission/operation). One key challenge is to ensure, under these changes, service continuity for V2X/URLLC services without any temporary loss of service, or with minimal or even no loss of packets. Next Generation Radio Access Networks (NG-RAN) need to support V2X/URLLC services for the user plane (UP), the control plane (CP), or both, in order to ensure that the reliability and coverage requirements are met.
In the 5G system (5GS), faults or failures may occur. For example, a faulty cell can be caused by an access node that is no longer functioning properly but has not generated any fault detection trigger at an operation and maintenance (OAM) domain. Such a fault can occur due to different reasons: e.g. hardware (HW), software (SW), or firmware (FW) issues. A fault can also occur at different levels, e.g., a total failure where an entire cell may stop functioning properly, or only a partial failure where some component carriers/part of the protocol stack may stop functioning properly.
Besides, in 5G, the virtualization/softwarization of network functions (NFs) is one of the enablers allowing for flexibility and end-to-end (E2E) optimization. This could lead to 1) more frequent failures from SW parts, and 2) a higher impact of SW failures on network failures. In addition, in 5G, Small Cells (Planned/Unplanned) and road side units (RSUs) can be required in order to boost performance in dense areas. However, these are low power, low cost nodes, which may be more vulnerable to partial/full SW/HW/FW failures.
Two example motivations to solve the above-mentioned fault problem are: 1) URLLC & V2X services need ultra-high availability and ultra-high resilience. 2) Failures/faults/faulty cells or nodes can cause safety issues and a significant impact on mission-critical services. Thus, a mechanism is required, e.g., within NG-RAN (5G), to ensure the requirements for the V2X/URLLC critical services, in particular by detecting a fault/failure as early as possible. Previous approaches provide long-term detection, e.g., detecting a faulty cell through customer complaints to the mobile network operator (MNO) hotline, which is too slow and has to be done manually. Another example could be detecting a faulty cell by means of offline post-processing checks made through, e.g., Call Detailed Records. However, the MNO would have to collect a vast amount of data, would then have to delete the data after a recovery, and may have to repeat the process. Such a solution may not be optimal, since it would be too slow in detection.
SUMMARY
In summary, an improved device and method are desired. In view of the above-mentioned problems and disadvantages, embodiments of the present invention aim to provide a network entity 100 and a method for supporting network fault detection. An object is thereby to provide ultra-high availability and ultra-high resilience, e.g., for V2X and/or URLLC services, and fault detection as early as possible, in order to reduce safety issues and the impact on mission-critical services caused by faulty cells or other faults that may occur in the network.
An object is achieved by embodiments of the invention as provided in the enclosed independent claims. Advantageous implementations of these embodiments are further defined in the dependent claims. A first aspect of the invention provides a network entity for supporting network fault detection, wherein the network entity is configured to: monitor expected traffic from at least one terminal device; and provide an indication message to a fault management entity and/or to a recovery entity, when determining an irregularity in the expected traffic within a predetermined time interval.
With the network entity of the first aspect, a failure or fault can be detected as soon as possible, in order to enable the ultra-reliability and low latency of e.g. a V2X/URLLC service. The predetermined time interval can be adjusted during the runtime or can be pre-configured, e.g., via a fault management entity. The predetermined time interval may be set or adjusted by the network entity, as well, or jointly by fault management and the network entity. For instance, the fault management may provide policies for the time interval and the network entity can determine the time interval, e.g., based on the monitored statistics or obtained statistics. The statistics may be obtained from a data analytics function, wherein the data analytics function can be a network data analytics function (NWDAF) in a core network (CN), or a management data analytics function (MDAF) in a management layer, or a data analytics function in the RAN. The fault management entity may be part of the management layer or part of management and orchestration layer, which may operate on long-timescale statistics and may cooperate with the recovery entity for recovery solutions, where these recovery solutions can be of long-timescale. The network entity can be a node, e.g., a physical node or virtual node in the network, or a network function, e.g., implemented as a virtual network function (VNF) or a physical network function (PNF). A VNF implementation may have the advantage that the network entity may be deployed or instantiated at different parts of the network, e.g., in a central unit (CU) of RAN, or in a management layer.
In an implementation form of the first aspect, the indication message includes at least one of: an irregularity report; and a fault detection trigger.
In an implementation, the irregularity report may be provided to the fault management entity and/or the fault detection trigger may be provided to the recovery entity.
With the use of the different report and/or trigger, the network (e.g. RAN) can react fast to a fault or failure in the network, e.g. by determining which specific entity shall react to the failure. The irregularity report can be utilized by the fault management entity to generate a fault detection trigger that can be then provided to the recovery entity. The fault management entity may receive one or more irregularity reports from one or more network entities. The one or more irregularity reports can be utilized to detect a fault and to generate a fault detection trigger.
In an implementation form of the first aspect, the network entity is further configured to: determine the irregularity, when the expected traffic is not received within the predetermined time interval.
The network entity can accordingly determine the irregularity, if the fault or failure can be recognized by the network entity, and can therefore support a faster recovery. The time interval may depend on the service type, e.g., an URLLC service may have a shorter time interval compared to an eMBB service.
In an implementation form of the first aspect, the indication message includes a reliability level regarding the determination of the irregularity.
Based on the reliability level, for example, the fault management entity can determine how likely it is that there is a fault or failure in the network. In another example, the recovery entity may determine the need to execute the recovery based on the reliability level. If the reliability level is high, i.e., there is likely a fault, the recovery entity may perform the recovery immediately after receiving the indication message.
In an implementation form of the first aspect, the network entity is further configured to: provide the indication message to the recovery entity, when the reliability level is above a pre-selected threshold.
The fact that the network entity may only provide the indication message to the recovery entity, when the reliability level is above the pre-selected threshold, can make the fault detection in the network (e.g. RAN) more efficient, since only more reliably detected faults or failures are reported. The reliability level and the threshold/thresholds can be implemented in different ways such that the notion of being above or below a threshold may result in the same action. The threshold or thresholds may be implemented in the form of ranges, e.g., between a minimum and a maximum value. In an implementation form of the first aspect, the network entity is further configured to: determine the reliability level and/or the pre-selected threshold based on at least one of the following conditions: a number of the at least one monitored terminal device, and a number of monitored applications generating the expected traffic.
Applications generating traffic patterns that can be characterized more deterministically and/or devices generating more characteristic signaling, for example, static sensors generating keep-alive signals, can be used to determine the reliability level and/or the pre-selected threshold, in order to make the fault detection more efficient or reliable. This is due to the fact that only highly likely faults or failures will be determined and need to be addressed. Furthermore, the at least one terminal device to be monitored can be selected based on the signaling or traffic generated. During runtime, different devices or applications can be monitored. For instance, if the irregularity is monitored at terminal devices with more characteristic signaling and/or at more than one terminal device, the reliability level of the fault detection can be increased.
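As a minimal, non-limiting sketch of how such device and application counts could feed the reliability level and the routing of the indication message (the helper names and weights are assumptions, not part of the original disclosure):

```python
def reliability_level(silent_devices: int, monitored_devices: int,
                      silent_apps: int, monitored_apps: int) -> float:
    """Combine device- and application-level evidence into a value in [0, 1]."""
    dev_ratio = silent_devices / monitored_devices if monitored_devices else 0.0
    app_ratio = silent_apps / monitored_apps if monitored_apps else 0.0
    # Weight device evidence higher: several independent silent devices are
    # a stronger fault indicator than several silent apps on one device.
    return 0.7 * dev_ratio + 0.3 * app_ratio

def route_indication(level: float, threshold: float = 0.8) -> str:
    # Above the pre-selected threshold: notify the recovery entity directly
    # for a fast reaction; otherwise report to the fault management entity.
    return "recovery_entity" if level >= threshold else "fault_management_entity"
```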
In an implementation form of the first aspect, the expected traffic includes at least one of an application level signaling and a network-level signaling.
In an implementation form of the first aspect, the application level signaling includes at least one of a: keep-alive signal, Cooperative Awareness Message (CAM), tele-operated driving (ToD) signal, and HD Local Map signal.
In an implementation form of the first aspect, the network-level signaling includes at least one of event-based signaling and periodic signaling.
In an implementation form of the first aspect, the event-based signal is a location reporting triggered by an area event and/or a motion event of one or more terminal devices.
With the above features, the network entity can detect different kinds of faults or failures under different situations. The ToD is also known as remote driving. The application-level signaling can be, for instance, signaling generated for a service, e.g., ToD. Such application-level signaling can be selected based on the criterion of how well the signaling can be characterized, e.g., in terms of periodicity.
In an implementation form of the first aspect, the network entity is further configured to: register with the recovery entity and/or with another network entity, in order to cooperate in monitoring the expected traffic and/or in order to obtain configuration information indicating how to monitor, what to monitor and/or when to provide the indication message.
The registration between the network entity and the recovery entity and/or another network entity can enable the network entity to monitor different types of expected traffic in the network. It can also share the indication message with the recovery entity and/or the another network entity, or can provide an early notification, e.g. in case of mission critical services. For instance, the configuration of the network entity may be obtained from another network entity. The another network entity can be of the same type as the network entity, can be an agent of the network entity (e.g. the network entity is a Detection Function (DF) and the another network entity is a DF agent), or can be a fault management entity. The another network entity may be implemented in another RAN access node, where the network entity and the another network entity may communicate via an inter-access node interface, e.g., the Xn interface, or an intra-access node interface, e.g., the F1 interface, as specified by 3GPP.
In an implementation form of the first aspect, the network entity is further configured to: send the indication message also to the another network entity.
In an implementation form of the first aspect, the network entity is configured to be configured by the fault management entity.
In an implementation form of the first aspect, the network entity is further configured to: transmit a notification to the at least one terminal device, said notification indicating the determined irregularity.
With an early indication message also to the another network entity, a faster reaction for recovery or an emergency action can be achieved. Such an indication message can be provided to another network entity or entities based on the registration. The at least one terminal device can have access to, e.g., V2X and/or URLLC services, such that the indication message can be used to avoid a service interruption.
In an implementation form of the first aspect, the network entity is further configured to: send the indication message to another network entity, for providing a second indication message by the another network entity to the fault management entity and/or to the recovery entity.
With communication between the network entities, indication messages can be forwarded, or a new indication message including the fault information and/or irregularity information can be generated and provided to the fault management entity, which is more flexible for the network entity.
In an implementation form of the first aspect, the network entity is further configured to perform at least one of: receive location information and/or traffic information provided by a location management unit, concerning the at least one terminal device, and monitor the expected traffic based on the location information and/or the traffic information.
The facilitation of the location management unit makes the fault detection more efficient. The location management unit can be in the form of a location management function (LMF) and/or a Location Management Component (LMC) and/or a Distributed LMF. The network entity can interact with the location management unit to obtain the location information or location-related information even if the area/motion event (e.g., a location-related service request) did not occur.
In an implementation form of the first aspect, the network entity is a RAN entity, in particular implemented in or as a Central Unit, Distributed Unit, or User Equipment, or the network entity is a CN entity, or the network entity is a Management or Orchestration entity, or the network entity is implemented in an application server or implemented as an application entity.
When the network entity is implemented as a network function, the network entity can be implemented in different parts of the network, e.g., at a cluster head user equipment (UE) in a group communication.
A second aspect of the present invention provides a method for supporting network fault detection, comprising: monitoring expected traffic from at least one terminal device; and providing an indication message to a fault management entity and/or to a recovery entity, when determining an irregularity in the expected traffic within a predetermined time interval.
In implementation forms of the second aspect, the method may be developed according to the features of the various implementation forms of the network entity of the first aspect described above.
A third aspect of the present invention provides a computer program comprising a program code for carrying out, when implemented on a processor, the method according to the second aspect.
It has to be noted that all devices, elements, units and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application, as well as the functionalities described to be performed by the various entities, are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear to a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above described aspects and implementation forms of the present invention will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
FIG. 1 illustrates a network entity according to an embodiment of the invention.
FIG. 2 shows a diagram illustrating network entities according to embodiments of the invention, which are Detection Functions (DFs) implemented at different domains and in different network elements in the 5GS.
FIG. 3 shows schematically an example of a regular operation in a cell, in accordance with at least one embodiment of the invention.
FIG. 4 shows schematically an example of DF operation, when an irregularity is found and an associated trigger is sent to an FM entity, in accordance with at least one embodiment of the invention.
FIG. 5 shows a message sequence chart illustrating how a DF can be operated in a 5GS, in accordance with at least one embodiment of the invention.
FIG. 6 illustrates utilization of application-level expected V2X/URLLC traffic to detect a fault in a 5GS, in accordance with at least one embodiment of the invention.
FIG. 7 illustrates utilization of application-level expected V2X/URLLC traffic to detect a fault in a 5GS, where FallBackNotification messages are used, in accordance with at least one embodiment of the invention.
FIG. 8 illustrates utilization of application-level expected V2X/URLLC traffic to detect a fault in a 5GS, where network exposure mechanisms are utilized along with the involvement of the application server, in accordance with at least one embodiment of the invention.
FIG. 9 illustrates utilization of network-level expected signaling to detect a fault in a 5GS, where periodic and triggered location reporting is used, in accordance with at least one embodiment of the invention.
FIG. 10 illustrates utilization of network-level expected signaling to detect a fault in a 5GS, where information on RLFs is used, in accordance with at least one embodiment of the invention.
FIG. 11 shows a message sequence chart illustrating DF configuration by an FM entity, in accordance with at least one embodiment of the invention.
FIG. 12 shows a message sequence chart illustrating DF registration with NFs, in accordance with at least one embodiment of the invention.
FIG. 13 shows a message sequence chart illustrating DF registration with NFs in case of distributed implementations and via DF-R, in accordance with at least one embodiment of the invention.
FIG. 14 is a block diagram of a method according to an embodiment of the invention.
DETAILED DESCRIPTION OF EMBODIMENTS
FIG. 1 shows a network entity 100 according to an embodiment of the invention. The network entity 100 is configured to support network fault detection. The network entity 100 is thus also referred to in this document as "Detection Function" (DF). The network entity 100 may be implemented in a network node of e.g. a RAN, e.g. in a base station. The network entity 100 may also be implemented in a mobile terminal, a vehicle, a sensor, a cluster head UE, a central unit (CU), a distributed unit (DU), an (edge) User Plane Function (UPF) and/or an application server in the network.
The network entity 100 is configured to monitor expected traffic 101 from at least one terminal device 102. The terminal device 102 may be a UE or mobile device, and/or may be a vehicle or sensor. The expected traffic 101 may include at least one of application-level signaling and network-level signaling. Thereby, the application-level signaling may include at least one of a keep-alive signal, a CAM, a ToD signal, and an HD Local Map signal. Further, the network-level signaling may include at least one of event-based signaling and periodic signaling.
The network entity 100 is further configured to provide an indication message 103 to a fault management entity 104 and/or to a recovery entity 105, when determining an irregularity in the expected traffic 101 within a predetermined time interval. That is, the network entity 100 may monitor the expected traffic 101, determine whether the expected traffic 101 shows an irregularity, and generate and send the indication message 103 upon detecting an irregularity.
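As a minimal, non-limiting sketch of this behavior (the names monitor, last_seen and send_indication are hypothetical and not part of the original disclosure), the monitoring can be pictured as a loop that flags every device whose expected traffic has been absent for longer than the predetermined interval Δt:

```python
import time

def monitor(devices, last_seen, delta_t: float, send_indication):
    """Flag devices whose expected traffic stayed absent longer than delta_t."""
    while True:
        now = time.monotonic()
        silent = [d for d in devices if now - last_seen(d) > delta_t]
        if silent:
            # Irregularity: expected traffic not received within delta_t.
            send_indication({"type": "irregularity_report",
                             "devices": silent,
                             "timestamp": now})
        time.sleep(delta_t / 10)  # poll well below the detection interval
```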
FIG. 2 shows network entities 100 according to embodiments of the invention, which are based on the network entity 100 shown in FIG. 1. In particular, the network entities 100 are referred to as "Fault DFs" (i.e. each network entity 100 implements a DF, i.e. the DFs are also labelled with reference sign 100 in the following). The DFs 100 are configured to discover irregularities in the network traffic. Depending on the location of the DF in the network, the traffic may pertain to different parts of the network. For example, a DF at the DU can monitor the traffic between the DU and the UE(s), while a DF at the CU can additionally monitor the traffic on the F1 interface. The network traffic that can be monitored by a DF can also depend on the implementation location. For instance, a DF at the application server may more easily monitor the application-level signaling. In case a DF 100 discovers such an irregularity, it may generate and send an indication message 103, which may include an irregularity report and/or a fault detection trigger. The indication message 103 may be transmitted to trigger an Agile Recovery Mechanism in the RAN, i.e. may be provided to a recovery entity 105. The DFs 100 may be configured by a central DF 100 or by an FM entity 104.
In particular, a DF 100 can analyze patterns in/of the expected traffic 101. Thereby, repeated traffic patterns may be taken as the expected traffic 101, such as keep-alive signals, tele-operated driving (ToD) signals, HD Local Map signals and/or Cooperative Awareness Messages (CAMs). This provides a good means of traffic "expectation", since such signals are typically transmitted periodically. If the expected traffic 101 does not arrive for a pre-determined time interval Δt, the DF 100 can trigger the indication message 103. The indication message 103 may include a reliability level regarding the determination of the irregularity in the expected traffic 101. The DF 100 can analyze user plane (U-Plane, aka UP) data flows as well as control plane (C-Plane, aka CP) signaling. The DF 100 may also interact with a Robust Header Compression (ROHC) function at the Packet Data Convergence Protocol (PDCP) layer, e.g., regarding a Type of Service (ToS) field.
The DF 100 can be implemented as a standalone RAN function, or can be implemented as a part of an existing RAN functionality, e.g. a radio resource control (RRC), radio resource management (RRM), or self-organizing network (SON) extension. FIG. 2 shows multiple DFs 100, which are implemented at different network elements (terminal device 102, DU, CU, UPF, Application Server), and can each monitor expected traffic 101 to be utilized for the failure detection. The expected traffic 101 can include different types of signals for different DFs 100. Due to low-latency requirements, DFs 100 may preferably be located close to the edges of the network. In some embodiments, a DF 100 providing an indication message 103 can be triggered by a lack of application-level expected signaling, e.g. a lack of a keep-alive signal or CAM in a geographical area. In other embodiments, a DF 100 providing an indication message 103 can be triggered by a lack of expected network-level signaling, e.g. a lack of positioning reporting. In other embodiments, a DF 100 providing an indication message 103 can be triggered by a lack of expected event-based network signaling, e.g. multiple radio link failures (RLFs), in a geographical area.
In terms of expected application-level traffic 101, for instance, keep-alive signals of vehicles, Vehicle Sensors, UEs, and/or of internet of things (IoT) devices can be utilized, in order to detect failures/faults/faulty cells or nodes on-time. Keep-alive signals, also referred to as "heart beats" of terminal devices 102, are typically generated by applications of the terminal devices 102 (e.g., smartphone always-on applications & IoT applications), e.g., to keep an application online or to provide regular information of a sensor. There may be a plurality of devices 102 in a cell, which each generate keep-alive signals. Each terminal device 102 may further have several applications with different keep-alive signal characteristics. The terminal devices 102 can be of UE type, which can be in the order of tens, or can be machines, which can be in the order of thousands. Some of these terminal devices 102 may also be static devices, e.g., sensors, or semi-dynamic devices, e.g., a V2X UE or a drone. It can be expected that machines generate more predictable keep-alive signals to report their status.
The DF 100 can generate the indication message 103, when one or more of the keep-alive signals stop fully in a given cell for a given time interval Δt, and may communicate, as the indication message 103, specifically an irregularity/anomaly report and/or a fault detection trigger to the FM entity 104 and/or to the recovery entity 105. It should be noted that the recovery entity 105 may operate more functions than just recovering a fault. In other words, the recovery entity 105 may be an entity that is able to operate any function that facilitates fault detection or recovery, for example, providing a network function of registration, network exposure, and/or location management, etc. Therefore, the recovery entity 105 can also be considered as a failure detection assisting entity.
FIG. 3 illustrates a regular operation in a cell, when there are exemplarily a UE1 and a plurality of IoT devices/sensors (i.e. terminal devices 102) in the cell. The terminal devices 102 are served by a base station (gNB1). FIG. 4 illustrates exemplarily how the detection of a fault occurs in the same cell as shown in FIG. 3, namely when keep-alive signals stop for the time interval Δt (without a stop detection request from edge equipment or a detection stop initiated by the gNB1). The predetermined time interval Δt provides a degree of freedom to adapt to different characteristics of different keep-alive signals. Δt can in particular depend on the application type, or can be pre-determined by the FM entity 104 depending on the existing applications. For determining the predetermined time interval Δt, the packet inter-arrival time (IAT) of the keep-alive signals can be utilized, wherein a short IAT may be exploited for low-latency communications. The IAT can be obtained, e.g., via traffic traces from a real network.
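As a non-limiting illustration of this IAT-based determination, Δt could be set to the mean observed IAT plus a safety margin; the helper name estimate_delta_t and the margin factor k are assumptions, not values from this disclosure:

```python
from statistics import mean, stdev

def estimate_delta_t(arrival_times: list[float], k: float = 3.0) -> float:
    """Derive a detection interval from observed keep-alive arrival times.

    Assumes at least two observed arrivals.
    """
    iats = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    # Expect the next keep-alive within the mean IAT plus k standard
    # deviations; a longer silence is treated as an irregularity.
    return mean(iats) + k * (stdev(iats) if len(iats) > 1 else 0.0)
```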
As is shown in FIG. 4, the DF 100 (in FIG. 4 at the gNB1) may determine that, for the given time interval Δt, the keep-alive signals from a single UE, or multiple UEs, or all UEs (and from a single or from multiple applications), and/or the keep-alive signals from one or more machines (e.g. IoT devices/sensors) in a cell are not received anymore. This implies that with a high probability this cell is a faulty cell, i.e., it is highly unlikely that the expected keep-alive signals stop in the cell for any other reason than a fault. The DF 100 may in this case inform the FM entity 104, or any other management node or function, such that necessary measures can be taken on time. For example, after a faulty cell/fault is detected, an Agile Recovery Mechanism, e.g. configured by the FM entity 104, may take the necessary measures to mitigate the faulty cell condition, e.g., perform a Fail-safe Protocol Configuration (e.g., activating a duplicate protocol stack as of the PHY or MAC layer to generate the same or similar RF signals), reset the cell (deactivate & activate the cell), or extend the cell coverage areas of the neighboring cells & reset the cell.
FIG. 5 shows a message sequence chart (MSC) illustrating how a DF 100 (network entity 100) can be operated in a 5GS. One or more UEs, such as mobile phones, or one or more vehicles (i.e. generally terminal devices 102) may send (application-level) keep-alive signals to a 5G base station (gNB) or to a road side unit (RSU). The DF 100 is in FIG. 5 implemented exemplarily in/by the gNB or RSU, analyzes the expected traffic 101, and can discover an irregularity by monitoring in particular whether the keep-alive signals stop fully in a given cell for the predetermined time interval Δt. If such an irregularity is discovered, and optionally the fault likelihood or the reliability level is additionally very high, a fault detection trigger (i.e. an indication message 103) can be sent e.g. to an Agile Recovery function implemented by the recovery entity 105. The reliability level may be analyzed by the DF 100, for instance, by determining whether the detection reliability is above a pre-selected threshold value (ThX), which threshold may depend on the network deployment, the access node type, the services/applications that are supported, and/or the statistics collected. For example, the reliability level and the threshold may be determined on the basis of a percentage or number of terminal devices 102 exhibiting an irregularity among all monitored terminal devices 102, or on the basis of a percentage or number of applications generating the expected traffic 101 that exhibit an irregularity. The DF 100 may make use of network context, for example, via self-learning algorithms that can interface with data analytics functions, such as the MDAF in the management layer and the NWDAF in the core network (CN). These data analytics functions are to some extent specified by 3GPP. A data analytics function can also be implemented in the RAN, e.g., at a central unit (CU), such that the DF 100 may interface with such a data analytics function in the RAN. The DF 100 can be implemented inside a data analytics function, e.g., as one of the features of a data analytics function.
In FIG. 5, the indication message 103 is sent by the DF 100 to the recovery entity 105 (exemplarily implemented by a Failsafe Node Configurator (FNC)) and a RAN Recovery Mechanism is triggered. The FNC can e.g. activate a protocol stack, which can take over the operation of the failed part of the protocol stack, for instance by providing a failsafe protocol configuration to the DF 100 and then performing an agile fault recovery, e.g. a failsafe protocol execution. Thereafter, the DF 100 may send an event notification to an operation administration management (OAM) domain, which may implement the FM entity 104, so that the FM entity 104 is informed about the above actions. The event notification may comprise event statistics, which may include a type of the event and the time interval, the recovery mechanism, and the current status.
In an alternative embodiment, the indication message 103 may be provided by the DF 100 to the FM entity 104 together with event information, even when the fault likelihood is small, i.e. the determined reliability level is below the threshold. The FM entity 104 may then analyze the event information, and may determine whether a recovery mechanism should still be taken, and what type of recovery mechanism should be taken.
In the above two alternatives, the first alternative, in which the indication message 103 is provided to the recovery entity 105, given that the detection reliability level is above the pre-selected threshold, has the advantage that the fault recovery mechanisms can be performed with less delay. The second alternative has the advantage that a recovery mechanism is only performed if the FM entity decides that this is necessary.
FIG. 6 illustrates further embodiments of the network entity 100, which build on the network entity 100 shown in FIG. 1. In FIG. 6, (application-level) expected V2X/URLLC traffic 101, such as CAM signals or V2X/URLLC-related semi-persistent scheduling (SPS) signals, is utilized to detect failures/faults/faulty cells or nodes. CAM signals are typically transmitted periodically, and the transmission interval may vary, e.g., between 100 ms and 1000 ms. Two exemplary cases ("Case 1" and "Case 2") are considered in FIG. 6, in which CAM signals or V2X/URLLC-related SPS signals are utilized to detect a fault.
In the first case ("Case 1", as illustrated in FIG. 6 on the left-hand side), a central unit (CU) of a gNB can set an SPS based on a UEAssistanceInformation signal (UE assistance information) in a Radio Resource Control (RRC) message. The message sequence chart (shown in the upper right of FIG. 6) is an example of how the network, e.g. the gNB, gets the UE assistance information from a UE. Without an SPS deactivation request, when the reception of the SPS traffic (e.g. tele-operated driving (ToD) or HD Local Map updates in SPS) stops, an indication message 103 can be generated by a DF agent 100 in a Distributed Unit (DU). A DF agent 100 may be a network entity 100 that functions as an agent of the DF 100. That is, the DF agent 100 may be configured to perform the same functions as the DF 100, and may report to and/or may be registered with the DF 100.
In the second case ("Case 2", as illustrated in FIG. 6 on the right-hand side), based on an application-level analysis, a DF 100 in a gNB-type RSU may determine that the reception of CAM messages has stopped. In this case, an indication message 103, e.g. a fault detection trigger and/or irregularity report, can be generated by the DF 100.
In particular, as shown in FIG. 6, a gNB may be disaggregated, which means that it may be split into a CU and one or more distributed units (DUs). A DF agent 100 may be implemented in a DU, and a DF agent 100 may be implemented in an RSU. The DF agents 100 may generate irregularity reports, wherein a central DF 100 (here at the CU) or a DF agent 100 (e.g. at the RSU) may trigger an alarm (indication message 103) on the failure to the FM entity 104. The indication message 103 can include information on the event, e.g., a time stamp, a reliability level, the type of the event, and where or at which node the event occurred. The indication message 103 may be directly or indirectly sent to the FM entity 104. It is also possible that a cluster head (CH) of UEs (here mobile terminals (MTs)) has a DF agent 100, which may also generate an indication message 103 and e.g. send it to a DF agent 100 in a DU or RSU. If a DF 100 or the FM entity 104 receives multiple indication messages 103, it may combine the information received to decide on a fault. Here, combining may imply a joint consideration. DF agents 100 may be configured by a central DF 100 or by the FM entity 104.
A further embodiment, which can be considered an extension of the above-described "Case 1", is illustrated in FIG. 7 and referred to as "Case 1b". In this case, terminal devices 102 (here UEs labelled mobile terminals (MTs)) may provide a FallBackNotification (fall-back notification) message to a neighboring access node, or DU, or a master cell group (MCG) or secondary cell group (SCG) in case of multi-connectivity, when the terminal devices 102 stop getting SPS grants on the Uu interface and fall back to NR mode 2. The FallBackNotification may provide details on the event, for example, a time stamp and the cell ID where the event happened. The FallBackNotification(s) may be used by a DF agent 100 or DF 100 at the DU, in order to generate an indication message 103. The content and/or the number of the FallBackNotification messages can indicate, e.g., a type of the fault and a reliability level of a fault existence. For example, an increased number of FallBackNotifications can imply a higher chance of a fault. The DU may be implemented in the form of a road-side unit (RSU). This scheme may also apply to non-disaggregated access nodes, such as gNBs, where the FallBackNotification(s) can be sent to a neighboring gNB.
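Before turning to the concrete example of FIG. 7, a minimal, non-limiting sketch of such notification counting; the message field cell_id and the threshold min_count are hypothetical, not a standardized format:

```python
from collections import Counter

def aggregate_fallbacks(notifications: list[dict], min_count: int = 3):
    """Yield a trigger for each cell named by enough FallBackNotifications."""
    per_cell = Counter(n["cell_id"] for n in notifications)
    for cell_id, count in per_cell.items():
        if count >= min_count:
            # More notifications pointing at the same cell imply a higher
            # chance of an actual fault, i.e. a higher reliability level.
            yield {"type": "fault_detection_trigger",
                   "suspected_cell": cell_id,
                   "reliability_level": min(1.0, count / (2 * min_count)),
                   "notification_count": count}
```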
In particular, as shown in FIG. 7, two Mobile Terminals (MTs) 102, which may also be in a multi-connectivity mode, stop getting SPS grants from a DU1 on the Uu interface. Accordingly, they may fall back to NR mode 2 and transmit FallBackNotification messages to a DU2, which implements a DF agent 100. The DF agent 100 of DU2, e.g., if the number of FallBackNotifications is high, may determine that the reliability level is high, and may transmit an indication message 103 (irregularity report and/or fault detection trigger), which may include the multiple FallBackNotifications, to the gNB-CU, which implements another DF 100, which can be in the form of a central DF. Based on the reliability level and the FallBackNotification information, the DF 100 at the gNB-CU may trigger another indication message 103 (irregularity report and/or fault detection trigger), indicating that the DU1 failed and e.g. information on the event, to the FM entity 104.

In a further embodiment, illustrated in FIG. 8, if a V2X Application Server (implementing a DF 100) does not receive expected V2X traffic 101, for example, expected CAMs, ToD signals or HD Local Map updates, for a time interval Δt, it may trigger an indication message 103. However, the application server may not know what the cause of the fault (or the network topology) is. So, the application server may send an APP DF Trigger / Irregularity Report message to a Network Exposure Function/Service Capability Exposure Function/Exposure Governance Management Function (NEF/SCEF/EGMF), and the exposure function(s) can forward this or generate a new indication message 103 comprising the fault information to the OAM and/or a per-domain FM entity 104. The FM entity 104 can initiate a healing trigger and interact, such as via probes and report requests, with the involved network sub-nets to identify where the exact failure is (e.g. which gNB, RSU, etc.). For example, the FM entity 104 can send FM analytics for the particular domain which has the failure (e.g. the RAN) to a management data analytics function (MDAF), and the MDAF can interact with the core network (CN, aka 5G core, 5GC) to start the recovery, e.g., by requesting different control plane/user plane (CP/UP) paths. In addition, multiple application servers may report irregularity messages, e.g., about missing keep-alive signals, so that the network can conclude that there is a problem.
In another embodiment, shown in FIG. 9, network-level expected traffic 101 can be utilized to detect failures/faults/faulty cells or nodes in the network, e.g., the 5GS. One such example is periodic and triggered location reporting. For example, 3GPP TR 23.731 sets the requirement on Periodic and Triggered Location Reporting, where different triggers are outlined, such as an Area event (UE entering, leaving or remaining within the area) and a Motion event (UE moving by more than a threshold straight-line distance from a previous location). On this basis, the DF 100 can coordinate with a location management unit, e.g., a location management function/location management component (LMF/LMC), to predict a fault via, for example, a lack of Periodic Location Reporting, a trajectory estimate, or Predicted Location Reporting, e.g., considering the velocity of the MT along with a Motion event; in other words, location reporting predicted to occur within a pre-determined time at a predicted range of locations, where the pre-determined time is based on the velocity of the MT along with the Motion event.
In this embodiment, the DF 100 can register for the notification of the location reporting and/or trajectory estimate, and for configuration information indicating how to monitor, what to monitor and/or when to provide the indication message 103. The service registration can be made between the DF 100 and a location management function (LMF) and/or a local LMF (LLMF) in the RAN (which may also be referred to as a location management component, LMC). The LMF in the CN and the LMC in the RAN may also coordinate. MTs 102 can register for a location service (LCS) periodic-triggered event at the LMF and/or LMC. When the location reporting is registered for an area event, the LMF and/or LMC can predict an area event after a time interval (in FIG. 9, t1−t0, which may indicate a Δt time difference), considering a predicted trajectory of the MTs 102, e.g., based on the velocity, speed, initial location, and/or location history. The DF 100 and/or a location management entity (LMF or LMC) can determine that a location reporting trigger has not been initiated by the MTs, considering the predicted trajectory of the MTs 102. Such a notice can imply a fault and can trigger an irregularity report and/or fault detection trigger (indication message 103). This can be sent to an FM entity 104. Further, the number of MTs 102 that are not sending such location request(s) can be an indication of the reliability level of the failure; in other words, a threshold can be set to determine how many such notices indicate a highly likely fault in the network that requires initiating recovery. This can also be a group of MTs 102 performing group communications.
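As a minimal, non-limiting sketch of such prediction under a straight-line motion assumption (the function names are hypothetical):

```python
from typing import Optional

def predict_event_time(pos: float, boundary: float, speed: float,
                       t0: float) -> Optional[float]:
    """Predict when an MT at pos, moving at speed, should cross boundary."""
    if speed <= 0:
        return None  # no crossing predicted for a static or receding MT
    return t0 + (boundary - pos) / speed  # predicted trigger time t1

def report_overdue(t1: Optional[float], now: float, margin: float) -> bool:
    # An irregularity is flagged once the predicted report time t1 has
    # passed by more than the configured margin without a location report.
    return t1 is not None and now > t1 + margin
```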
A further embodiment is illustrated in FIG. 10, and uses Radio Link Failures (RLFs) for checking a lack of (network-level) expected traffic 101. Multi-connectivity is a solution, e.g., to enable the Zero-Handover Interruption Time requirement set by IMT-2020. In this embodiment, MT(s) 102 can connect to multiple DUs that are served by the same or different CU(s). The collection of the RLFs of a DU can be utilized to detect a failure and, for example, to determine which specific DU has caused the RLF. A DU failure can result in multiple path switches to other DU(s) in case of multi-connectivity. A DF 100 implemented in the CU can analyze this information to generate an indication message 103 (irregularity report and/or fault detection trigger). Such information can be shared with the FM entity 104, as illustrated in FIG. 10. The indication message 103 can comprise type(s) of irregularity, detection reliability level, and detection granularity (such as DU or interface). If more than one CU serves the DUs, the DFs 100 or DF agents 100 may coordinate over an inter-access node interface, e.g., Xn interface. DFs 100 in different CUs may communicate over Xn interface.
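A minimal, non-limiting sketch of such per-DU RLF aggregation at the CU (the field names du_id and mt_id are hypothetical):

```python
from collections import defaultdict

def analyze_rlfs(rlf_events: list[dict], min_mts: int = 2):
    """Group RLF declarations per DU and report DUs hit by several MTs."""
    mts_per_du = defaultdict(set)
    for event in rlf_events:
        mts_per_du[event["du_id"]].add(event["mt_id"])
    for du_id, mts in mts_per_du.items():
        if len(mts) >= min_mts:
            # RLFs from several distinct MTs point at the DU, not the MTs.
            yield {"type": "irregularity_report",
                   "granularity": "DU",
                   "suspected_du": du_id,
                   "affected_mts": sorted(mts)}
```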
In FIG. 10, for example, multiple MTs 102 fail to connect with one DU in a Secondary Cell Group (SCG), which results in multiple RLF declarations to the Master Cell Group (MCG), e.g., in multi-connectivity, inside an SCGFailureInformation (SCG failure information) message. The DF 100, here implemented in the CU, generates an irregularity report, i.e. an indication message 103, to the FM entity 104.

In a further embodiment, which builds on the previous embodiments, the DF 100 can be configured by the FM entity 104, which may be located in an OAM. The configuration mechanism of this embodiment is shown in FIG. 11 in an MSC. The DF 100 analyzes the expected traffic 101 and may discover an irregularity. The traffic analysis can include an interaction between the DF and a location management unit. In case of an irregularity detection, an irregularity report (indication message 103) can be triggered to the FM entity 104. The FM entity 104 can configure the DF 100, for example, based on slice requirements. Such a configuration can include a sensitivity comprising the reliability level, a reporting mode (periodic or event-based), a slice-based configuration (for example, a slice type of URLLC or V2X may require stricter failure detection), and whether the DF 100 and/or DF agent 100 shall first inform the FM entity 104 or can directly inform the failure recovery mechanisms. According to the sensitivity, the DF 100 and/or DF agent 100 may generate a fault detection trigger and/or send an irregularity report. The FM entity 104 can trigger the Agile Fault Recovery, e.g., by triggering a self-organizing network (SON) function.
In a further embodiment, which builds on the previous embodiments, the DF 100 and/or DF agent 100 may interact with other NFs and/or network elements (such as UEs 102 and access nodes). This can be, for example, based on a service offered as in the case of a service-based architecture (SBA). An embodiment of service request and registration among a DF 100 and NFs is shown in FIG. 12. The DF 100 is shown as a standalone function for simplicity, but as shown previously it may be collocated with other nodes (e.g., gNB/RSU, 5GC). Other NFs and/or network elements (such as UEs 102 and access nodes) may register for traffic analysis. This may be particularly the case for UEs 102 with mission critical services, e.g., V2X and URLLC. To be able to analyze the traffic 101, the DF 100 can send a traffic analysis subscription request, which can be followed by a response, such as an ACK (acknowledgement), a NACK (negative acknowledgement), or a Configuration Update for adapting to a new state of the network in time. The traffic analysis subscription request may comprise a Traffic Analysis Type, a Time Granularity of Reporting, an Area of Reporting, a Type of Reporting (e.g., event-based), and UE IDs to be analyzed. This can enable the DF to observe different types of traffic in the 5GS. When an irregularity is detected, an irregularity report or fault detection trigger can be shared with the registered NFs and/or network elements. Further, an early notification can be sent to at least one of the UEs 102, preferably registered UEs 102, indicating the determined irregularity, e.g., with the mission critical services and over non-failed radio interfaces. The non-failed radio interface can be a radio interface of a neighboring access node. The early notification, which indicates the determined irregularity, may be transmitted to the registered UEs 102 when the indication message 103 is generated to the FM entity 104 and/or the recovery entity 105, so that the UE 102 can take an early emergency action, such as a reset, self-recovery, a fall-back to NR mode 2, a fall-back to unlicensed access, or a stop.
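As a non-limiting illustration, the listed contents of such a subscription request could be encoded as follows; the field names mirror the description above and are not a standardized message format:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrafficAnalysisSubscription:
    analysis_type: str               # e.g., "keep-alive", "CAM", "SPS"
    time_granularity_s: float        # time granularity of reporting
    reporting_area: str              # e.g., a cell or tracking area identifier
    reporting_type: str = "event-based"
    ue_ids: List[str] = field(default_factory=list)

# Example request; the subscribed NF would answer with an ACK, a NACK,
# or a Configuration Update.
request = TrafficAnalysisSubscription(
    analysis_type="keep-alive",
    time_granularity_s=1.0,
    reporting_area="cell-1234",
    ue_ids=["ue-1", "ue-2"],
)
```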
In a further embodiment, which builds on the previous embodiments, as illustrated in FIG. 13, the DF registration may be performed by a DF registration function (marked as DF-R) device, for example, in distributed implementations. Like above, to be able to analyze the expected traffic 101, the DF-R can send a traffic analysis subscription request, which can be followed by a response, such as an ACK, a NACK, or a Configuration Update. The traffic analysis subscription request may comprise a Traffic Analysis Type, a Time Granularity of Reporting, an Area of Reporting, a Type of Reporting (e.g., event-based), and UE IDs to be analyzed. Here, the DF-R is implemented for the registration to the UEs, gNB/RSU, 5GC (here exemplarily the LMF), and OAM (FM). The registration for the 5GC is for the detection of event-based predicted traffic by the LMF. The DF and/or LMF can determine that a location reporting trigger has not been initiated by the UEs 102, considering the predicted trajectory of the UEs 102. Such a notice can imply a fault detection trigger and be sent to the FM entity 104.
FIG. 14 shows a block diagram of a method 1400 for supporting network fault detection. The method 1400 comprises: a step 1401 of monitoring expected traffic 101 from at least one terminal device 102; and a step 1402 of providing an indication message 103 to a fault management (FM) entity 104 and/or to a recovery entity 105, when determining an irregularity in the expected traffic 101 within a predetermined time interval. The method 1400 can be carried out by the network entity/DF 100 or DF agent 100 described above.
In summary, embodiments of the invention provide a network entity 100, a method and a computer program for supporting network fault detection, so that the network can benefit from improved low latency and high reliability.
Embodiments of the present invention have been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those persons skilled in the art and practicing the claimed invention, from a study of the drawings, this disclosure and the independent claims. In the claims as well as in the description, the word "comprising" does not exclude other elements or steps and the indefinite article "a" or "an" does not exclude a plurality. A single element or other unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims

1. A network entity (100) for supporting network fault detection, wherein the network entity (100) is configured to:
monitor expected traffic (101) from at least one terminal device (102); and
provide an indication message (103) to a fault management entity (104) and/or to a recovery entity (105), when determining an irregularity in the expected traffic (101) within a predetermined time interval.
2. The network entity (100) according to claim 1, wherein the indication message (103) includes at least one of:
an irregularity report; and
a fault detection trigger.
3. The network entity (100) according to claim 1 or 2, wherein the network entity (100) is further configured to:
determine the irregularity, when the expected traffic (101) is not received within the predetermined time interval.
4. The network entity (100) according to one of the claims 1 to 3, wherein:
the indication message (103) includes a reliability level regarding the determination of the irregularity.
5. The network entity (100) according to claim 4, wherein the network entity (100) is further configured to:
provide the indication message (103) to the recovery entity (105), when the reliability level is above a pre-selected threshold.
6. The network entity (100) according to claim 5, wherein the network entity (100) is further configured to:
determine the reliability level and/or the pre-selected threshold based on at least one condition of:
a number of the at least one monitored terminal device (102), and
a number of monitored applications generating the expected traffic (101) thereof.
7. The network entity (100) according to one of the claims 1 to 6, wherein: the expected traffic (101) includes at least one of an application level signaling and a network-level signaling.
8. The network entity (100) according to claim 7, wherein:
the application level signaling includes at least one of a:
keep-alive signal,
Cooperative Awareness Message, CAM,
ToD signal, and
HD Local Map signal.
9. The network entity (100) according to claims 7 or 8, wherein the network-level signaling includes at least one of event-based signaling and periodic signaling.
10. The network entity (100) according to claim 9, wherein the event-based signal is a location reporting triggered by an area event and/or a motion event of one or more terminal devices (102).
11. The network entity (100) according to any one of the preceding claims, wherein the network entity (100) is further configured to:
register with the recovery entity (105) and/or with another network entity, in order to cooperate in monitoring the expected traffic (101) and/or in order to obtain configuration information indicating how to monitor, what to monitor and/or when to provide the indication message (103).
12. The network entity (100) according to claim 11, wherein the network entity (100) is further configured to:
send the indication message (103) also to the another network entity.
13. The network entity (100) according to any one of the preceding claims, wherein the network entity (100) is further configured to be configured by the fault management entity (104).
14. The network entity (100) according to any one of the preceding claims, wherein the network entity (100) is further configured to:
transmit a notification to the at least one terminal device (102), said notification indicating the determined irregularity.
15. The network entity (100) according to any one of the preceding claims, wherein the network entity (100) is further configured to:
send the indication message (103) to another network entity, for providing a second indication message by the another network entity to the fault management entity (104) and/or to the recovery entity (105).
16. The network entity (100) according to any one of the preceding claims, wherein the network entity (100) is further configured to perform at least one of:
receive location information and/or traffic information provided by a location management unit, concerning the at least one terminal device (102), and
monitor the expected traffic (101) based on the location information and/or the traffic information.
17. The network entity (100) according to any one of the preceding claims, wherein:
the network entity (100) is a Radio Access Network, RAN, entity, in particular implemented in or as a Central Unit, Distributed Unit, or User Equipment;
the network entity (100) is a Core Network, CN, entity;
the network entity (100) is a Management or Orchestration entity; or
the network entity (100) is implemented in an application server or implemented as an application entity.
18. A method (1400) for supporting network fault detection, wherein the method (1400) comprises:
monitoring (1401) expected traffic (101) from at least one terminal device (102); and providing (1402) an indication message (103) to a fault management entity (104) and/or to a recovery entity (105), when determining an irregularity in the expected traffic (101) within a predetermined time interval.
19. A computer program comprising a program code for carrying out, when implemented on a processor, the method according to claim 18.
PCT/EP2019/057954 2019-03-28 2019-03-28 Network entity and method for supporting network fault detection WO2020192938A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/057954 WO2020192938A1 (en) 2019-03-28 2019-03-28 Network entity and method for supporting network fault detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2019/057954 WO2020192938A1 (en) 2019-03-28 2019-03-28 Network entity and method for supporting network fault detection

Publications (1)

Publication Number Publication Date
WO2020192938A1 true WO2020192938A1 (en) 2020-10-01

Family

ID=65995740

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/057954 WO2020192938A1 (en) 2019-03-28 2019-03-28 Network entity and method for supporting network fault detection

Country Status (1)

Country Link
WO (1) WO2020192938A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140201353A1 (en) * 2013-01-16 2014-07-17 Hewlett-Packard Development Company, L.P. Connectivity notification
US20160134503A1 (en) * 2014-11-07 2016-05-12 Arbor Networks, Inc. Performance enhancements for finding top traffic patterns
WO2017072356A1 (en) * 2015-10-29 2017-05-04 Opt/Net Consulting B.V. Anomaly detection in a data stream

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HUAWEI ET AL: "Proposal for monitoring events", vol. SA WG6, no. Osaka, Japan; 20180521 - 20180525, 28 May 2018 (2018-05-28), XP051544898, Retrieved from the Internet <URL:http://www.3gpp.org/ftp/tsg%5Fsa/WG6%5FMissionCritical/TSGS6%5F024%5FOsaka/docs/S6%2D180883%2Ezip> [retrieved on 20180528] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190370A (en) * 2021-05-08 2021-07-30 京东数字科技控股股份有限公司 Emergency response method and device for application
CN115460635A (en) * 2021-06-08 2022-12-09 ***通信集团重庆有限公司 Fault detection method, device, equipment and computer storage medium
US11902804B2 (en) 2022-01-04 2024-02-13 Cisco Technology, Inc. Fault triage and management with restricted third-party access to a tenant network

Similar Documents

Publication Publication Date Title
US11943663B2 (en) Device and method for providing a quality of service function
US11323341B2 (en) Methods and apparatus for capturing and/or using packets to facilitate fault detection
CN112188533B (en) Method and device for reporting network performance
US20190181932A1 (en) Communications methods and apparatus using multiple beams
US9351188B2 (en) Wireless base station device, wireless system, and failure detection method
US11252007B2 (en) Methods and apparatus for supporting use of multiple beams for communications purposes
WO2020192938A1 (en) Network entity and method for supporting network fault detection
CN109996216B (en) Subscription request processing method, network entity and capability opening platform
US20210273890A1 (en) Devices and methods for time sensitive communication in a communication network
US20210385646A1 (en) End to end troubleshooting of mobility services
WO2020052775A1 (en) Device and method for providing a quality of service function
US20220286405A1 (en) Method for controlling communication availability in a cyber-physical system
CN104754632B (en) Method and apparatus for reporting UE cell location in a wireless communication network
WO2022150156A1 (en) Blockage map operations
CN111919463B (en) Method and apparatus for call setup failure control
CN113169763B (en) Apparatus and method for service continuity in a 5G communication network
EP4189940A1 (en) Control mechanism for enhancing communication reliability

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19714636

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19714636

Country of ref document: EP

Kind code of ref document: A1