WO2021244334A1 - Information processing method and related device - Google Patents

Information processing method and related device

Info

Publication number
WO2021244334A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
terminal device
entity
model
decision
Prior art date
Application number
PCT/CN2021/095336
Other languages
English (en)
French (fr)
Inventor
徐晨
王坚
张公正
李榕
王俊
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21816869.8A (published as EP4152797A4)
Publication of WO2021244334A1
Priority to US18/071,316 (published as US20230087821A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W48/00 Access restriction; Network selection; Access point selection
    • H04W48/08 Access restriction or access information delivery, e.g. discovery data delivery
    • H04W48/14 Access restriction or access information delivery, e.g. discovery data delivery using user query or user detection

Definitions

  • This application relates to the field of communication technology, and in particular to an information processing method and related equipment.
  • AI technology has very successful applications in the fields of image processing and natural language processing.
  • AI technology is applied to the network layer (such as network optimization, mobility management, resource allocation, etc.), or AI technology is applied to the physical layer (such as channel coding and decoding, channel prediction, receiver, etc.) and so on.
  • AI entities can be deployed in the access network to improve the processing capabilities of the access network (such as improving resource allocation efficiency, etc.), but there is currently no definition of the relationship between the AI entity in the access network and user equipment (UE).
  • without such a basic interaction method, AI technology cannot be effectively applied to the wireless access network.
  • the embodiments of the present application provide an information processing method and related equipment.
  • the information processing method can apply AI technology to a wireless access network, which is beneficial to improving the processing capability of the wireless access network.
  • an embodiment of the present application provides an information processing method, which can be applied to a first AI entity in an access network.
  • the first AI entity may receive second AI model information sent by the terminal device, and the second AI model information does not include user data of the terminal device.
  • the first AI entity updates the first AI model information according to the second AI model information.
  • the first AI entity sends the updated first AI model information to the terminal device.
  • the above method flow defines a basic interaction mode between the first AI entity and the terminal device.
  • if the first AI entity and the terminal device both have AI training capabilities, then the first AI entity can train and update the first AI model based on the second AI model information sent by the terminal device, and send the updated first AI model to the terminal equipment.
  • the second AI model information sent by the terminal device does not include the user data of the terminal device, which is beneficial to realize the privacy protection of the terminal device.
  • the aforementioned training interaction can update the first AI model of the first AI entity, which is beneficial to improve the processing capabilities of the first AI entity and the terminal device.
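  • As a concrete illustration of this model-information exchange, the following sketch (not part of the claimed method) aggregates the parameter sets reported by terminal devices into an updated first AI model in a federated-averaging style; the weighted-average rule, the use of NumPy arrays as model information, and all names are illustrative assumptions.

```python
import numpy as np

def update_first_ai_model(first_model, second_models, weights=None):
    """Aggregate second AI model information reported by terminal devices.

    first_model   : dict mapping parameter name -> np.ndarray (first AI model).
    second_models : list of dicts with the same keys (second AI model information
                    from terminal devices); parameters only, no user data.
    weights       : optional per-device aggregation weights (uniform if omitted).
    """
    if weights is None:
        weights = [1.0 / len(second_models)] * len(second_models)
    updated = {}
    for name in first_model:
        # Weighted average of the parameters reported by each terminal device.
        updated[name] = sum(w * m[name] for w, m in zip(weights, second_models))
    return updated

# The first AI entity would then send `updated` back to the terminal devices.
```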
  • the first AI entity may also receive a request message sent by the terminal device, where the request message is used to request the first AI model information.
  • the first AI entity sends the first AI model information to the terminal device.
  • the above method flow defines another basic interaction mode between the first AI entity and the terminal device.
  • the first AI entity receives the request message of the terminal device, and sends the first AI model information to the terminal device.
  • the terminal device can perform inferences based on the data to be decided and the AI model to obtain AI decision information.
  • the first AI entity may also receive AI information of the terminal device, where the AI information includes AI capability parameters.
  • the AI capability parameter is used to indicate whether the terminal device has AI reasoning capability and/or AI training capability.
  • the first AI entity receives the AI decision information and status information sent by the terminal device, where the AI decision information is obtained by the terminal device by inputting the state information into the second AI model for inference, and the state information is obtained by the terminal device according to observation information.
  • the terminal device can obtain AI decision information and send the AI decision information to the first AI entity, so that the first AI entity can obtain the AI decision information of the terminal device, which is beneficial for the first AI entity to update the AI model.
  • the first AI entity receives AI information of the terminal device, where the AI information includes AI update parameters;
  • the first AI entity receives feedback information, and the feedback information is used to indicate data used for AI training.
  • the first AI entity receives AI information of the terminal device, and the AI information includes AI update parameters. If the AI update parameter indicates a scheduled AI update or an event triggers an AI update, the first AI entity receives feedback information, which is used to indicate data used for AI training.
  • the AI update parameter in the AI information of the terminal device can instruct the terminal device to perform AI update.
  • the first AI entity may receive feedback information sent by the terminal device, and the feedback information may be used for training updates of the first AI entity, which is beneficial to improving the processing capability of the first AI entity.
  • the first AI entity updates the first AI model based on AI training data.
  • the AI training data includes one or more of AI decision information, status information or feedback information.
  • the feedback information includes reward information; the reward information is used to update the first AI model.
  • the reward information is determined based on the reward function.
  • the reward function is determined according to the target parameter and the weight value of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • the embodiment of the present application extends a deep reinforcement learning process, and the first AI entity can monitor the performance indicators of the system, which is beneficial to update the first AI model.
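  • As an illustration only: if the reward is taken to be a weighted sum of the target parameters, the reward information could be computed as in the sketch below; the linear combination and the parameter names are assumptions, since the embodiments only state that the reward function is determined from the target parameters and their weight values.

```python
def reward(target_params, weight_values):
    """Reward information computed from a reward function.

    target_params : performance data obtained by executing the AI decision,
                    e.g. {"throughput": 85.0, "delay": 12.0} (illustrative).
    weight_values : weight value per target parameter, chosen by the first AI
                    entity from the performance data of one or more terminal devices.
    """
    return sum(weight_values[k] * v for k, v in target_params.items())

# Illustrative call: favour throughput while penalizing delay.
r = reward({"throughput": 85.0, "delay": 12.0},
            {"throughput": 1.0, "delay": -0.5})
```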
  • an embodiment of the present application provides an information processing method, which is applied to a terminal device.
  • the terminal device sends a request message to the first AI entity, and the request message is used to request the first AI model information.
  • the terminal device receives the first AI model information sent by the first AI entity.
  • the terminal device inputs the state information into the first AI model for reasoning, and obtains the AI decision information of the terminal device.
  • the status information is determined based on observation information, and the observation information indicates data used for AI decision-making.
  • the terminal device can obtain the first AI model information from the first AI entity, and determine the second AI model of the terminal device according to the first AI model information.
  • the terminal device can input the data used for AI decision-making into the second AI model for reasoning, thereby obtaining AI decision-making information.
  • this completes the flow in which the terminal device realizes the AI reasoning function, which is beneficial to improving the processing capability of the terminal device.
  • the terminal device may also send the AI information of the terminal device to the first AI entity, where the AI information includes AI capability parameters.
  • the AI capability parameter indicates that the terminal device has AI reasoning capability.
  • the terminal device can notify the first AI entity of its AI capability through interaction with the first AI entity.
  • the terminal device may also send AI decision information and status information to the first AI entity.
  • the terminal device can send the AI decision information obtained by reasoning to the first AI entity through interaction with the first AI entity.
  • the AI information of the terminal device includes AI capability parameters and/or AI update parameters. If the AI update parameter indicates a scheduled AI update or an event triggers an AI update, the terminal device may send feedback information to the first AI entity, and the feedback information is used to indicate data used for AI training.
  • the terminal device can notify the first AI entity to also perform AI training and update data through interaction with the first AI entity.
  • the terminal device obtains the second AI model according to the AI training data.
  • the AI training data includes one or more of AI decision information, status information, or feedback information.
  • the terminal device can update the local second AI model through its own training.
  • the terminal device sends the second AI model information to the first AI entity.
  • the terminal device receives the updated first AI model information sent by the first AI entity, and the updated first AI model information is determined by the first AI entity according to the second AI model information.
  • the terminal device can send the local second AI model information to the first AI entity through interaction with the first AI entity, so that the first AI entity can update the first AI model information according to the second AI model information.
  • the second AI model information sent by the terminal device to the first AI entity does not involve the terminal device's own data, which is beneficial to the privacy protection of the terminal device.
  • the feedback information includes reward information; the reward information is used to update the first AI model.
  • the reward information is determined based on the reward function.
  • the reward function is determined according to the target parameter and the weight value of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • the embodiment of the present application extends a deep reinforcement learning process. If the terminal device has AI training capabilities, it can monitor the performance indicators of the system, which is beneficial to update the local second AI model.
  • the embodiments of the present application provide an information processing method, which can be applied to the first AI entity in the access network.
  • the first AI entity may receive observation information sent by the terminal device, and the observation information indicates data used for AI decision-making.
  • the first AI entity determines the AI decision information of the terminal device according to the observation information and the first AI model, and sends the AI decision information to the terminal device.
  • the above method flow defines another basic interaction mode between the first AI entity and the terminal device.
  • the first AI entity has AI reasoning capabilities, and can determine the AI decision information of the terminal device according to the data sent by the terminal device for AI decision-making and its own first AI model.
  • the first AI entity in the access network realizes the application of AI technology to the wireless access network, which is beneficial to improving the processing capability of the wireless access network.
  • before receiving the observation information sent by the terminal device, the first AI entity may also receive the AI information of the terminal device, where the AI information includes AI capability parameters.
  • the AI capability parameter is used to indicate whether the terminal device has AI reasoning capability and/or AI training capability.
  • the first AI entity receives the observation information sent by the terminal device.
  • the terminal device can implement the related AI function through the first AI entity.
  • the first AI entity may preprocess the observation information to obtain corresponding state information.
  • the first AI entity then inputs the state information into the first AI model for reasoning, and obtains the AI decision information of the terminal device.
  • the first AI entity first needs to convert the observation information into state information that can be processed by the AI model to obtain the AI decision information.
  • the embodiments of the present application provide an information processing method, which can be applied to terminal devices.
  • the terminal device sends observation information to the first AI entity, and the observation information indicates data used for AI decision-making.
  • the terminal device receives the AI decision information of the terminal device sent by the first AI entity, and executes the decision according to the AI decision information.
  • the terminal device can obtain the AI decision information of the terminal device through interaction with the first AI entity, and realize the corresponding AI function.
  • the terminal device may also send the AI information of the terminal device to the first AI entity, where the AI information includes an AI capability parameter, and the AI capability parameter indicates that the terminal device has no AI capability.
  • the AI decision information of the terminal device is obtained by the first AI entity inputting state information into the first AI model for inference; the state information is obtained by the first AI entity based on observation information.
  • the AI decision information of the terminal device can be obtained through interaction with the first AI entity.
  • an embodiment of the present application provides a first AI entity, and the first AI entity includes an intelligent decision-making module.
  • the intelligent decision-making module is used to receive the second AI model information sent by the terminal device, and the second AI model information does not include user data of the terminal device.
  • the intelligent decision module is also used to update the first AI model information according to the second AI model information.
  • the first AI model information is the AI model information of the first AI entity.
  • the intelligent decision module is also used to send the updated first AI model information to the terminal device.
  • the intelligent decision-making module is also used to receive a request message sent by the terminal device, and the request message is used to request the first AI model information. After receiving the request message, the intelligent decision module can send the first AI model information to the terminal device.
  • the first AI entity further includes a preprocessing module.
  • the preprocessing module is used to receive AI information of the terminal device, and the AI information includes AI capability parameters.
  • the intelligent decision module is also used to receive AI decision information and status information sent by the terminal device.
  • the AI decision information is obtained by the terminal device inputting state information into the second AI model for inference, and the state information is obtained by the terminal device based on observation information, and the observation information indicates data used for AI decision making.
  • the preprocessing module is also used to receive AI information of the terminal device, and the AI information includes AI update parameters.
  • the first AI entity may also include a data collection and training module. If the AI update parameter indicates a scheduled AI update or an event triggers an AI update, the data collection and training module is used to receive feedback information, which is used to indicate the data used for AI training.
  • the intelligent decision-making module is also used to update the first AI model based on the AI training data.
  • the AI training data includes one or more of AI decision information, status information, or feedback information.
  • the feedback information includes reward information
  • the reward information is used to update the first AI model.
  • the reward information is determined based on the reward function.
  • the reward function is determined according to the target parameter and the weight value of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • an embodiment of the present application provides a terminal device, which includes a transceiver module and a processing module.
  • the transceiver module is used to send a request message to the first AI entity, and the request message is used to request the first AI model information.
  • the transceiver module is also configured to receive the first AI model information sent by the first AI entity.
  • the processing module is used to input the state information into the second AI model for reasoning, and obtain the AI decision information of the terminal device; wherein the state information is determined based on the observation information; the observation information indicates the data used for AI decision-making; the second AI model is based on the terminal device The first AI model information is determined.
  • the transceiver module is also used to send AI information of the terminal device to the first AI entity, where the AI information includes an AI capability parameter, where the AI capability parameter indicates that the terminal device has AI inference capability.
  • the transceiver module is also used to send AI decision information and status information to the first AI entity.
  • the AI information of the terminal device includes AI capability parameters and/or AI update parameters.
  • the transceiver module is also used to send feedback information to the first AI entity, and the feedback information is used to indicate data used for AI training.
  • the processing module is also used to obtain the second AI model based on the AI training data.
  • the AI training data includes one or more of AI decision information, status information, or feedback information.
  • the transceiver module is also used to send the second AI model information to the first AI entity.
  • the transceiver module may also receive updated first AI model information sent by the first AI entity, where the updated first AI model information is determined by the first AI entity according to the second AI model information.
  • the feedback information includes reward information.
  • the reward information is used to update the first AI model.
  • the reward information is determined based on the reward function.
  • the reward function is determined according to the target parameter and the weight value of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • an embodiment of the present application provides a first AI entity, and the first AI entity includes a preprocessing module and an intelligent decision-making module.
  • the preprocessing module is used to receive observation information sent by the terminal device, and the observation information indicates data used for AI decision-making.
  • the intelligent decision module is used to determine the AI decision information of the terminal device based on the observation information and the first AI model.
  • the intelligent decision module is also used to send AI decision information to the terminal device.
  • the preprocessing module is also used to receive AI information of the terminal device, and the AI information includes AI capability parameters.
  • the preprocessing module is used to receive observation information sent by the terminal device.
  • the preprocessing module is also used to preprocess the observation information to obtain the corresponding state information.
  • the intelligent decision-making module is also used to input the state information into the first AI model for reasoning, and obtain the AI decision information of the terminal device.
  • an embodiment of the present application provides a terminal device, which includes a transceiver module and a processing module.
  • the transceiver module is used to send observation information to the first AI entity, and the observation information indicates data used for AI decision-making.
  • the transceiver module is also used to receive AI decision information of the terminal device sent by the first AI entity.
  • the processing module is used to execute decisions based on AI decision information.
  • the transceiver module is further configured to send AI information of the terminal device to the first AI entity, where the AI information includes AI capability parameters.
  • the AI capability parameter indicates that the terminal device has no AI capability.
  • the AI decision information of the terminal device is obtained by the first AI entity inputting state information into the first AI model for inference; the state information is obtained by the first AI entity based on observation information.
  • an embodiment of the present application provides a first AI entity, which has a function of implementing the information processing method provided in the first aspect.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • an embodiment of the present application provides a terminal device, which has a function of implementing the information processing method provided in the second aspect.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • an embodiment of the present application provides a first AI entity that has the function of implementing the information processing method provided in the third aspect.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • an embodiment of the present application provides a terminal device, which has a function of implementing the information processing method provided in the fourth aspect.
  • This function can be realized by hardware, or by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above-mentioned functions.
  • an embodiment of the present application provides a communication system.
  • the communication system includes the first AI entity provided in the fifth, seventh, ninth, or eleventh aspect, and the terminal device provided in the sixth, eighth, tenth, or twelfth aspect.
  • an embodiment of the present application provides a computer-readable storage medium; the readable storage medium includes a program or instruction, and when the program or instruction runs on a computer, the computer is caused to execute the method in the first aspect or any one of the possible implementations of the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium that includes a program or instruction, and when the program or instruction runs on a computer, the computer is caused to execute the method in the second aspect or any one of the possible implementations of the second aspect.
  • an embodiment of the present application provides a computer-readable storage medium that includes a program or instruction, and when the program or instruction runs on a computer, the computer is caused to execute the method in the third aspect or any one of the possible implementations of the third aspect.
  • an embodiment of the present application provides a computer-readable storage medium.
  • the readable storage medium includes a program or instruction.
  • when the program or instruction runs on a computer, the computer is caused to execute the method in the fourth aspect or any one of the possible implementations of the fourth aspect.
  • the embodiments of the present application provide a chip or chip system.
  • the chip or chip system includes at least one processor and an interface.
  • the interface and the at least one processor are interconnected through a wire, and the at least one processor is used to run computer programs or instructions to perform the method described in the first aspect or any one of the possible implementation manners of the first aspect.
  • an embodiment of the present application provides a chip or chip system.
  • the chip or chip system includes at least one processor and an interface.
  • the interface and the at least one processor are interconnected through a wire, and the at least one processor is used to run computer programs or instructions to perform the method described in the second aspect or any one of the possible implementation manners of the second aspect.
  • the embodiments of the present application provide a chip or chip system.
  • the chip or chip system includes at least one processor and an interface.
  • the interface and the at least one processor are interconnected through a wire, and the at least one processor is used to run computer programs or instructions to perform the method described in the third aspect or any one of the possible implementation manners of the third aspect.
  • an embodiment of the present application provides a chip or chip system.
  • the chip or chip system includes at least one processor and an interface; the interface and the at least one processor are interconnected through a wire, and the at least one processor is used to run computer programs or instructions to perform the method described in the fourth aspect or any one of the possible implementation manners of the fourth aspect.
  • the interface in the chip can be an input/output interface, a pin, or a circuit.
  • the chip system in the foregoing aspect may be a system on chip (SOC), or a baseband chip, etc., where the baseband chip may include a processor, a channel encoder, a digital signal processor, a modem, and an interface module.
  • the chip or chip system described above in this application further includes at least one memory, and the at least one memory stores instructions.
  • the memory may be a storage module inside the chip, for example, a register, a cache, etc., or a storage module located outside the chip (for example, a read-only memory, a random access memory, etc.).
  • an embodiment of the present application provides a computer program or computer program product, including code or instructions.
  • when the code or instructions run on a computer, the computer is caused to execute the method in the first aspect or any one of the possible implementations of the first aspect.
  • an embodiment of the present application provides a computer program or computer program product, including code or instructions.
  • when the code or instructions run on a computer, the computer is caused to execute the method in the second aspect or any one of the possible implementations of the second aspect.
  • an embodiment of the present application provides a computer program or computer program product, including code or instructions.
  • when the code or instructions run on a computer, the computer is caused to execute the method in the third aspect or any one of the possible implementations of the third aspect.
  • an embodiment of the present application provides a computer program or computer program product, including code or instructions; when the code or instructions run on a computer, the computer is caused to execute the method in the fourth aspect or any one of the possible implementations of the fourth aspect.
  • Figure 1 is a schematic diagram of the interaction between an agent and the environment
  • Figure 2 is a schematic diagram of a Markov decision process
  • FIG. 3a is a schematic diagram of a network architecture provided by an embodiment of this application.
  • Figure 3b is a schematic diagram of a 5G RAN architecture provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of a RAN architecture provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of an information processing method provided by an embodiment of this application.
  • FIG. 6 is a flowchart of information processing when a terminal device has no AI capability according to an embodiment of the application
  • FIG. 7 is a schematic flowchart of another information processing method provided by an embodiment of the application.
  • FIG. 8 is a flowchart of information processing when a terminal device has AI reasoning capability according to an embodiment of the application.
  • FIG. 9 is a schematic flowchart of another information processing method provided by an embodiment of this application.
  • FIG. 10 is a schematic diagram of a process of federated learning provided by an embodiment of this application.
  • FIG. 11 is a schematic diagram of an AI training process provided by an embodiment of the application.
  • FIG. 12 is a schematic diagram of a DRL online learning process provided by an embodiment of this application.
  • FIG. 13 is a schematic flowchart of a decision-making early stop technology provided by an embodiment of this application.
  • FIG. 14 is a schematic diagram of the application of a DRL algorithm deployed in a cell according to an embodiment of the application
  • FIG. 15 is a schematic diagram of virtual cell assisted training provided by an embodiment of this application.
  • FIG. 16 is a schematic diagram of a training terminal deployed in a real cell according to an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of a first AI entity provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of another first AI entity provided by an embodiment of this application.
  • FIG. 19 is a schematic structural diagram of another first AI entity provided by an embodiment of this application.
  • FIG. 20 is a schematic structural diagram of another first AI entity provided by an embodiment of this application.
  • FIG. 21 is a schematic structural diagram of a terminal device provided by an embodiment of this application.
  • FIG. 22 is a schematic structural diagram of another terminal device provided by an embodiment of this application.
  • FIG. 23 is a schematic structural diagram of another terminal device provided by an embodiment of this application.
  • FIG. 24 is a schematic structural diagram of another terminal device provided by an embodiment of this application.
  • Artificial intelligence (AI) technology has been applied very successfully in fields such as image processing and natural language processing.
  • In the academic community there is a great deal of research on applying AI technology to the network layer (such as network optimization, mobility management, resource allocation, etc.) and to the physical layer (such as channel coding and decoding, channel prediction, receivers, etc.).
  • the more commonly used AI technologies include supervised learning and reinforcement learning.
  • supervised learning refers to the process of adjusting the parameters of the classifier using a set of samples of known categories to achieve the required performance, which is also called supervised training.
  • the goal of supervised learning is to learn the mapping relationship between input and output in the training set given a training set.
  • the training set is a collection of correct input and output mapping relationships.
  • the supervised learning method is currently a widely studied machine learning method.
  • supervised learning methods include the neural network back-propagation algorithm, decision tree learning algorithms, and so on.
  • reinforcement learning is the interactive learning between the agent and the environment.
  • Figure 1 is a schematic diagram of the interaction between an agent and the environment.
  • the agent can act on the environment according to the state fed back by the environment, thereby obtaining a reward and the state at the next moment, so that the agent accumulates the largest possible reward over a period of time.
  • Reinforcement learning is different from supervised learning in that it does not require a training set.
  • the reinforcement signal provided by the environment evaluates the quality of the generated action (usually a scalar signal), rather than telling the reinforcement learning system how to generate the correct action. Since the information provided by the external environment is scarce, the agent must learn from its own experience. In this way, the agent acquires knowledge in an action-evaluation environment and improves its action plan to adapt to the environment.
  • Reinforcement learning combined with deep neural networks is referred to as deep reinforcement learning (DRL).
  • the mathematical model used may include, but is not limited to, the Markov decision process, neural networks, and other models.
  • Figure 2 is a schematic diagram of a Markov decision process (MDP).
  • MDP Markov decision process
  • the Markov decision process is a mathematical model for analyzing decision problems. It assumes that the environment has the Markov property (the conditional probability distribution of the future state of the environment depends only on the current state): the decision maker periodically observes the state of the environment (s0, s1, etc. in Figure 2), makes decisions based on the current state of the environment (a0, a1, etc. in Figure 2), and obtains a new state and a reward after interacting with the environment (r0, r1, etc. in Figure 2), as shown in Figure 2.
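  • The agent–environment loop of such a process can be sketched as below; the toy transition and reward rules are placeholders used only to make the state/action/reward notation of Figure 2 concrete and are not part of the embodiments.

```python
import random

def mdp_episode(n_steps=5, n_actions=3):
    """Toy Markov decision process: observe state s_t, take action a_t,
    receive reward r_t and the next state s_{t+1}."""
    state = 0.0                                        # s_0
    trajectory = []
    for _ in range(n_steps):
        action = random.randrange(n_actions)           # a_t (placeholder policy)
        reward = -abs(state - action)                  # r_t (placeholder reward)
        # Markov property: the next state depends only on the current state and action.
        next_state = 0.5 * state + 0.1 * action + random.uniform(-0.5, 0.5)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory
```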
  • FIG. 3a is a schematic diagram of a network system provided by an embodiment of this application.
  • the network system includes a core network (5GC), an access network (NG-RAN) and terminal equipment.
  • 5GC and NG-RAN exchange information through NG interface; access network devices (for example, gNB) in NG-RAN can exchange information through Xn interface.
  • the terminal equipment can be connected with the access network equipment through a wireless link to realize the information interaction between the terminal equipment and the access network equipment.
  • the network system may include, but is not limited to: global system for mobile communications (GSM), wideband code division multiple access (WCDMA), long term evolution (LTE), enhanced mobile broadband (eMBB) scenarios, ultra-reliable low latency communications (uRLLC) scenarios and massive machine type communications (mMTC) scenarios in the new generation radio access technology (NR), narrowband internet of things (NB-IoT), etc.
  • the access network device can be any device with a wireless transceiver function, which provides wireless communication services for terminal devices within the coverage area.
  • the access network equipment may include, but is not limited to: an evolved base station (NodeB, eNB, or e-NodeB, evolutional NodeB) in a long term evolution (LTE) system, a base station (gNodeB or gNB) or a transmission reception point (TRP) in the new generation radio access technology (new radio access technology, NR), a subsequent evolution of a 3GPP base station, an access node in a WiFi system, a wireless relay node, a wireless backhaul node, equipment that undertakes base station functions in the Internet of Vehicles, D2D communication, and machine communication, satellites, etc.
  • the terminal device may be a device with a wireless transceiver function, or the terminal device may also be a chip.
  • the terminal device may be user equipment (UE), a mobile phone, a tablet computer (Pad), a computer with a wireless transceiver function, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, an in-vehicle terminal device, a wireless terminal in remote medical care, a wireless terminal in a smart grid, a wearable terminal device, or a terminal in the Internet of Vehicles, D2D communication, machine communication, etc.
  • FIG. 3b is a schematic diagram of a 5G RAN architecture provided by an embodiment of this application.
  • the access network equipment (for example, gNB) in the NG-RAN may include a centralized module (central unit, CU) and a distributed module (distributed unit, DU).
  • CU and DU can exchange information through the F1 interface, as shown in Figure 3b.
  • AI entities can be deployed in the access network to improve the processing capabilities of the access network (such as improving resource allocation efficiency, etc.), but there is currently no definition of the relationship between the AI entity in the access network and user equipment (UE).
  • the embodiment of the present application provides an information processing method, which can apply AI technology to a wireless access network, which is beneficial to improving the processing capability of the wireless access network.
  • the information processing method can be applied to a RAN architecture provided in an embodiment of the present application.
  • FIG. 4 is a RAN architecture provided by an embodiment of the application.
  • the first AI entity (AI module) is added to the RAN architecture, which also defines that the first AI entity and the gNB can exchange information through the A1 interface, as shown in FIG. 4.
  • the first AI entity described in this embodiment may be located at the edge/cloud of the access network, which is beneficial to realizing the corresponding AI functions through edge computing/cloud computing.
  • the first AI entity may be further divided into a first AI entity-centralized module (AM-CU) and a first AI entity-distributed module (AM-DU).
  • the gNB can also be physically split into gNB-CU and gNB-DU.
  • an A1-C interface is defined for information exchange between the AM-CU and the gNB-CU.
  • an A1-D interface is defined for information exchange between the AM-DU and the gNB-DU, as shown in Figure 4.
  • the communication content of the AI interface may include, but is not limited to, upload/download of AI model, upload/download of data, information interaction between gNB and the first AI entity (for example, the performance tracking module in the first AI entity can monitor gNB performance data) and so on.
  • the A1 interface is divided into A1-C and A1-D interfaces according to functions, which can correspond to the functional division of gNB-CU and gNB-DU, and the communication content of each interface is different.
  • the A1-D interface carries messages involving the physical (PHY) layer, the media access control (MAC) layer, and the radio link control (RLC) layer.
  • the A1-C interface carries messages involving higher layers (such as the packet data convergence protocol (PDCP) layer).
  • FIG. 5 is a schematic flowchart of an information processing method provided by an embodiment of this application.
  • the flow of the information processing method in FIG. 5 is implemented by the interaction between the first AI entity and the terminal device, and may include the following steps:
  • the terminal device sends observation information to a first AI entity; correspondingly, the first AI entity receives observation information sent by the terminal device;
  • the first AI entity determines the AI decision information of the terminal device according to the observation information and the first AI model
  • the first AI entity sends AI decision information to the terminal device; correspondingly, the terminal device receives the AI decision information sent by the first AI entity.
  • This embodiment defines a basic interaction mode between the first AI entity and the terminal device when the terminal device has no AI capability.
  • whether the terminal device has AI capability can be indicated by the AI information of the terminal device.
  • the AI information of the terminal device may include but is not limited to the following parameters: AI capability parameter (AICapabilityClass), AI update parameter (AIUpdateType), AI interaction parameter (AIInteractionType), and so on.
  • the AI capability parameter is used to indicate whether the terminal device has AI capability.
  • the AI capability parameter may indicate whether the terminal device has AI capability through a specific parameter value.
  • when the parameter value of AICapabilityClass is class 0, it means that the terminal device has no AI capability; in other words, the terminal device has neither AI reasoning capability nor AI training capability, that is, the terminal device cannot implement AI functions on its own.
  • when the parameter value of AICapabilityClass is class 1, it means that the terminal device has AI inference capability; in other words, the terminal device can implement part of the AI functions, such as obtaining AI decisions.
  • when the parameter value of AICapabilityClass is class 2, it means that the terminal device has AI training capability; in other words, the terminal device can implement part of the AI functions, such as training the AI model to obtain a better AI model.
  • when the parameter value of AICapabilityClass is class 3, it means that the terminal device has both AI reasoning capability and AI training capability; in other words, the terminal device can implement the AI functions, such as training an AI model to obtain a better AI model and thereby obtaining better AI decisions.
  • AICapabilityClass is only an example, and the parameter value of AICapabilityClass may also be in other forms, for example, expressed in binary numbers, which is not limited in this embodiment.
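  • For illustration, the four capability classes described above could be represented as the enumeration sketched below; the Python names are hypothetical, and only the class values 0 to 3 and their meanings follow the description.

```python
from enum import IntEnum

class AICapabilityClass(IntEnum):
    """AI capability parameter of a terminal device (class values per the description)."""
    NO_AI_CAPABILITY = 0        # neither AI reasoning nor AI training capability
    INFERENCE_ONLY = 1          # AI inference capability only
    TRAINING_ONLY = 2           # AI training capability only
    INFERENCE_AND_TRAINING = 3  # both AI inference and AI training capability
```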
  • the AI update parameter is used to indicate whether the terminal device performs AI update.
  • AI update refers to updating data.
  • the terminal device may send feedback information to the first AI entity, so that the first AI entity can update data.
  • the AI update parameter may also indicate whether to perform the AI update through a specific parameter value.
  • when the parameter value of AIUpdateType is type 0, it means that no AI update is performed.
  • when the parameter value of AIUpdateType is type 1, it means that the AI update is triggered by an event; that is, when an external event occurs, for example when the AI model no longer fits because of environmental changes, the AI update can be triggered by a long-term KPI deterioration event.
  • when the parameter value of AIUpdateType is type 2, it means that the AI update is triggered periodically.
  • for example, the system can set a time parameter indicating that the system triggers an AI update once every preset time period.
  • the above parameter values of AIUpdateType are only examples, and the parameter value of AIUpdateType may also take other forms, for example, a binary number, which is not limited in this embodiment.
  • the AI interaction parameter is used to indicate the interaction content between the terminal device and the first AI entity.
  • the interaction content between the terminal device and the first AI entity in this embodiment may include, but is not limited to, data, models, and so on.
  • the data interacted between the terminal device and the first AI entity refers to data used for AI reasoning and/or AI training, and may include, but is not limited to, status information, observation information, and the like.
  • the state information may be the state fed back by the environment in the reinforcement learning algorithm, as shown in FIG. 1.
  • the interaction model between the terminal device and the first AI entity refers to a model used for AI inference and/or AI training.
  • the AI algorithm adopted by the first AI entity corresponds to different AI models, which is not limited in this embodiment.
  • the AI interaction parameter may indicate the interaction content between the terminal device and the first AI entity through a specific parameter value.
  • when the parameter value of AIInteractionType is type 0, it means that the interaction content between the terminal device and the first AI entity includes uploading data and/or downloading data.
  • when the parameter value of AIInteractionType is type 1, it means that the interaction content between the terminal device and the first AI entity includes uploading data and/or downloading a model.
  • when the parameter value of AIInteractionType is type 2, it means that the interaction content between the terminal device and the first AI entity includes uploading a model and/or downloading a model.
  • the above parameter values of AIInteractionType are only examples, and the parameter value of AIInteractionType may also take other forms, for example, a binary number, which is not limited in this embodiment.
  • the terminal device may send the AI information of the terminal device to the first AI entity.
  • the AI information of the terminal device may include one or more of the AI capability parameters, AI update parameters, or AI interaction parameters described in the above embodiments.
  • the terminal device may send a service request message (for example, a resource allocation request message) to the first AI entity, and the service request message may carry the AI information of the terminal device, so that the first AI entity knows whether the terminal device has AI capability.
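  • Putting the three parameters together, the AI information carried in such a service request message might look like the sketch below; the field and message names are hypothetical, and only the parameter semantics (AICapabilityClass, AIUpdateType, AIInteractionType) follow the description.

```python
from dataclasses import dataclass

@dataclass
class AIInformation:
    ai_capability_class: int  # 0: none, 1: inference, 2: training, 3: both
    ai_update_type: int       # 0: no update, 1: event-triggered, 2: periodic
    ai_interaction_type: int  # 0: data/data, 1: data/model, 2: model/model

@dataclass
class ServiceRequest:
    """Illustrative service request message, e.g. a resource allocation request."""
    terminal_id: str
    ai_information: AIInformation

request = ServiceRequest("ue-001", AIInformation(1, 2, 1))
# The first AI entity reads request.ai_information to learn whether the UE has AI capability.
```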
  • the first AI entity is a newly added entity in the access network, and the first AI entity has AI functions such as AI reasoning and AI training.
  • the first AI entity can be divided into multiple functional modules according to functions, including an intelligent decision-making module (intelligent policy function, IPF), a data collection and training module (data and training function, DTF), and a pre-processing module.
  • the terminal device sends the observation information to the first AI entity, which may be the terminal device sending the observation information to the preprocessing module of the first AI entity.
  • observation information indicates the data used for AI decision-making.
  • observation information is the data provided for AI decision-making.
  • the observation information sent by the terminal device to the preprocessing module may include data such as the throughput of the terminal device.
  • the first AI entity determines the AI decision information of the terminal device according to the observation information and the first AI model, which may be executed by the intelligent decision module of the first AI entity.
  • the first AI model is a model for AI reasoning and/or AI training in the first AI entity, that is, the first AI model is an AI model in the edge/cloud.
  • the first AI model may include multiple types.
  • the first AI model may be a fully connected neural network model.
  • S502 in this embodiment may also be executed separately by the preprocessing module and the intelligent decision-making module of the first AI entity, and includes the following two steps:
  • the preprocessing module preprocesses the observation information to obtain the corresponding status information
  • the intelligent decision module inputs the state information into the first AI model for reasoning, and obtains the AI decision information of the terminal device.
  • the preprocessing module may first preprocess the observation information (for example, normalize the data) to obtain status information.
  • the state information is the data that can be used directly when using the AI model for reasoning.
  • the state information can refer to the system state in the Markov decision process as shown in Figure 2 (such as s0, s1, etc.), or to preprocessed data (when the state, as in a hidden Markov model, cannot be obtained directly).
  • the intelligent decision-making module can input state information into the first AI model for reasoning.
  • the intelligent decision module may be an agent in the reinforcement learning algorithm as shown in FIG. 1, which can act on the environment, that is, obtain the AI decision information of the terminal device.
  • the AI decision information of the terminal device is the result of AI inference by the first AI entity based on the data used to make the AI decision.
  • the AI decision information is the action output by the agent.
  • the resource allocation result obtained by the first AI entity performing AI inference is the AI decision information of the terminal device.
  • the first AI entity applies AI technology to the resource allocation in the access network, which can allocate resources to the corresponding terminal devices in a more targeted manner, thereby helping to optimize overall network performance.
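  • A minimal sketch of this two-step reasoning within S502 is shown below: observation information is normalized into state information and then passed through a small fully connected network to pick a decision; the normalization, the network shape, and the argmax decision rule are illustrative assumptions.

```python
import numpy as np

def preprocess(observation):
    """Pre-processing module: turn observation information (e.g. per-UE
    throughput samples) into state information usable by the AI model."""
    obs = np.asarray(observation, dtype=float)
    return (obs - obs.mean()) / (obs.std() + 1e-8)   # simple normalization

def infer(state, w1, w2):
    """Intelligent decision module: run the state through a small fully
    connected network (standing in for the first AI model) and pick a decision."""
    hidden = np.maximum(state @ w1, 0.0)             # ReLU hidden layer
    scores = hidden @ w2                             # one score per candidate decision
    return int(np.argmax(scores))                    # e.g. index of the allocated resource

rng = np.random.default_rng(0)
state = preprocess([1.2, 0.8, 2.4, 0.5])             # throughput observations (illustrative)
decision = infer(state, rng.normal(size=(4, 8)), rng.normal(size=(8, 3)))
```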
  • the first AI entity sends the AI decision information to the terminal device, which may be the intelligent decision module of the first AI entity sending the AI decision information to the terminal device.
  • FIG. 6 is a flowchart of information processing when a terminal device has no AI capability according to an embodiment of the application. In this case, since the terminal device has no AI capability, the terminal device may choose to request an AI decision from the first AI entity of the edge/cloud.
  • when the terminal device uses the information processing method shown in FIG. 6, the delay in obtaining an AI decision is relatively large, so this method is suitable for services that are not sensitive to delay.
  • the terminal device sends observation information to the preprocessing module, and the observation information indicates data used for AI decision-making; correspondingly, the preprocessing module receives observation information sent by the terminal device;
  • the preprocessing module preprocesses the observation information to obtain corresponding status information
  • the preprocessing module sends status information to the intelligent decision-making module; correspondingly, the intelligent decision module receives the status information sent by the preprocessing module;
  • the intelligent decision-making module inputs the state information into the first AI model for reasoning, and obtains the AI decision information of the terminal device;
  • the intelligent decision-making module sends AI decision-making information to the terminal device; correspondingly, the terminal device receives the AI decision-making information sent by the intelligent decision-making module;
  • the terminal device executes a decision according to the AI decision information.
  • the above S601 to S606 are the overall processing flow when the AIUpdateType parameter value of the terminal device is type 0.
  • the information processing flow shown in FIG. 6 also includes the process of AI training data collection, including the following steps:
  • the intelligent decision-making module sends the status information and AI decision-making information to the data collection and training module;
  • the data collection and training module receives feedback information.
  • the feedback information is used to indicate the data used for AI training.
  • the feedback information received by the data collection and training module may also differ.
  • S608 may include two parallel steps S608a and S608b.
  • S608a is the terminal device sending feedback information to the data collection and training module
  • S608b is the performance tracking module sending feedback information to the data collection and training module.
  • the data collection and training module receives the tag information sent by the terminal device.
  • S607 and S606 are executed in no order, that is, S606 and S607 can be executed at the same time.
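  • The collection steps can be pictured as the data collection and training module accumulating (state, decision, feedback) tuples for later AI training, as in the sketch below; the buffer structure, capacity, and batching rule are illustrative assumptions.

```python
from collections import deque

class DataCollectionAndTraining:
    """Illustrative stand-in for the data collection and training module (DTF)."""

    def __init__(self, capacity=10000, batch_size=64):
        self.buffer = deque(maxlen=capacity)
        self.batch_size = batch_size

    def collect(self, state, decision, feedback):
        # S607: state info and AI decision info from the intelligent decision module;
        # S608: feedback info from the terminal device or the performance tracking module.
        self.buffer.append((state, decision, feedback))

    def training_batch(self):
        # Hand a batch of AI training data to the intelligent decision module
        # once enough samples have been collected.
        if len(self.buffer) < self.batch_size:
            return None
        return list(self.buffer)[-self.batch_size:]
```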
  • the embodiment of the present application provides an information processing method, which defines a basic interaction manner between a first AI entity and a terminal device.
  • the terminal device can implement the AI function through the first AI entity in the access network, and obtain the AI decision information of the terminal device.
  • the first AI entity in the access network realizes the application of AI technology to the wireless access network, which is beneficial to improving the processing capability of the wireless access network.
  • FIG. 7 is a schematic flowchart of another information processing method provided by an embodiment of the application.
  • the flow of the information processing method in FIG. 7 is implemented by the interaction between the first AI entity and the terminal device, and may include the following steps:
  • S701 The terminal device sends a request message to the first AI entity; correspondingly, the first AI entity receives the request message sent by the terminal device;
  • S702 The first AI entity sends first AI model information to the terminal device; correspondingly, the terminal device receives the first AI model information sent by the first AI entity;
  • S703 The terminal device inputs the status information into the first AI model for inference, and obtains the AI decision information of the terminal device.
  • This embodiment defines a basic interaction mode between the first AI entity and the terminal device when the terminal device has AI reasoning capabilities.
  • the terminal device can implement AI reasoning and obtain AI decisions.
  • since the terminal device only has the AI reasoning capability but not the AI training capability, the terminal device needs to send a request message to the first AI entity to obtain the AI model.
  • the request message is used to request the first AI model information from the first AI entity.
  • the first AI model information may be the first AI model or related parameters of the first AI model.
  • the first AI model information can be an overall neural network, or related parameters of the neural network (such as the number of layers of the neural network, the number of neurons, etc.).
  • the execution of the foregoing S701 and S702 may be determined according to the AI update parameter of the terminal device.
  • if the parameter value of AIUpdateType is type 0, S701 and S702 are executed only once, during initialization.
  • if the AI update parameter indicates an event-triggered AI update, S701 and S702 are triggered by the corresponding event; for example, the performance tracking module detects that system performance has deteriorated and triggers an update.
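  • As an illustration of how such capability and update parameters might be represented, the following sketch defines them as enumerations; apart from class 3 (inference plus training) and type 0 (initialize once), the exact value meanings are assumptions made only for this example.

```python
from enum import IntEnum

class AICapabilityClass(IntEnum):
    # Illustrative mapping; only class 3 (inference + training) is spelled out
    # in this description, the remaining values are assumptions for the sketch.
    NO_AI = 0
    INFERENCE_ONLY = 1
    TRAINING_ONLY = 2
    INFERENCE_AND_TRAINING = 3

class AIUpdateType(IntEnum):
    # type 0: the AI model is delivered once at initialization and not updated.
    INIT_ONLY = 0
    # Whether type 1 maps to scheduled updates and type 2 to event-triggered
    # updates is an assumption; the description only distinguishes the two modes.
    SCHEDULED = 1
    EVENT_TRIGGERED = 2

def needs_model_request(update_type: AIUpdateType, initialized: bool,
                        performance_degraded: bool) -> bool:
    """Decide whether S701/S702 (model request and delivery) should run."""
    if update_type is AIUpdateType.INIT_ONLY:
        return not initialized
    if update_type is AIUpdateType.EVENT_TRIGGERED:
        return performance_degraded
    return True  # scheduled updates: request on every scheduled occasion
```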
  • since the terminal device in this embodiment has AI inference capability, the terminal device may include multiple AI function modules.
  • the terminal device includes a pre-processing module and an intelligent decision-making module, which are used to implement the AI reasoning process.
  • the terminal device can also implement the AI reasoning function through the local second AI entity.
  • the local second AI entity is an AI entity that is physically close to the terminal device, and may be an external device of the terminal device, which is not limited in this embodiment.
  • the terminal device itself includes multiple AI function modules.
  • FIG. 8 is a flowchart of information processing when a terminal device has an AI reasoning capability according to an embodiment of the application.
  • the terminal device has AI reasoning capability, that is, the terminal device includes at least an intelligent decision-making module.
  • the terminal device may request the first AI model information from the first AI entity of the edge/cloud, and complete AI reasoning locally to obtain AI decision information.
  • the terminal device uses the information processing method shown in FIG. 8 to obtain AI decision information with a relatively small delay, and this method is suitable for delay-sensitive services.
  • the AI function modules included in the terminal device in the embodiment shown in FIG. 8 are respectively called second modules.
  • the intelligent decision-making module of the terminal device is called the second intelligent decision-making module.
  • the AI function modules included in the first AI entity in the embodiment shown in FIG. 8 are respectively referred to as the first module.
  • the intelligent decision-making module of the first AI entity is referred to as the first intelligent decision-making module.
  • the information processing flow may include the following steps:
  • S801 The second intelligent decision-making module sends a request message to the first intelligent decision-making module; correspondingly, the first intelligent decision-making module receives the request message sent by the second intelligent decision-making module;
  • S802 The first intelligent decision-making module sends the first AI model information to the second intelligent decision-making module; correspondingly, the second intelligent decision-making module receives the first AI model information sent by the first intelligent decision-making module;
  • S803 The second preprocessing module obtains observation information;
  • S804 The second preprocessing module preprocesses the observation information to obtain corresponding status information;
  • S805 The second preprocessing module sends the status information to the second intelligent decision-making module; correspondingly, the second intelligent decision-making module receives the status information sent by the second preprocessing module;
  • S806 The second intelligent decision-making module inputs the status information into the first AI model for inference, and obtains the AI decision information of the terminal device;
  • S807 The terminal device executes the decision according to the AI decision information.
  • the above S801 to S807 are the overall processing flow when the AIUpdateType parameter value of the terminal device is type 0.
  • the information processing flow shown in FIG. 8 also includes the process of AI training data collection, including the following steps:
  • S808 The second intelligent decision-making module sends the status information and the AI decision information to the first data collection and training module;
  • S809 The first data collection and training module receives feedback information.
  • the feedback information is used to indicate the data used for AI training.
  • the feedback information received by the first data collection and training module also differs from case to case.
  • S809 may include two parallel steps S809a and S809b.
  • S809a is the terminal device sending feedback information to the first data collection and training module
  • S809b is the first performance tracking module sending feedback information to the first data collection and training module.
  • optionally, S809 is that the first data collection and training module receives the label information sent by the terminal device.
  • S807 and S808 are executed in no order, that is, S807 and S808 can be executed at the same time.
  • the foregoing processing flow may further include the following steps:
  • S8071 The second intelligent decision-making module sends the AI decision information of the terminal device to the terminal device;
  • S808a The terminal device executes the decision according to the AI decision information.
  • S8071 indicates that this step is executed after S806, replacing the original S807.
  • S808a indicates that this step and S808 are executed in no order, that is, S808a and S808 can be executed at the same time.
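  • A minimal sketch of this terminal-side variant, in which the model information is fetched once (S801/S802) and inference then runs locally for each observation; the simple linear model and the module boundaries are assumptions for illustration.

```python
import numpy as np

class TerminalWithInference:
    """Terminal device with AI reasoning capability (second intelligent decision module)."""

    def __init__(self, fetch_model):
        # S801/S802: request and receive the first AI model information once
        # (AIUpdateType type 0), then keep it locally.
        self.model = fetch_model()

    def decide(self, observation: np.ndarray) -> int:
        state = (observation - observation.mean()) / (observation.std() + 1e-8)  # S804
        return int(np.argmax(state @ self.model))                                 # S806

# A toy stand-in for the first AI entity serving model parameters on request (S802).
fetch_model = lambda: np.random.default_rng(0).normal(size=(8, 4))
terminal = TerminalWithInference(fetch_model)
print(terminal.decide(np.arange(8, dtype=float)))  # local decision, no per-observation round trip
```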
  • the AI function implemented by the terminal device in this example is to use AI for channel decoding.
  • the foregoing S801 to S809 may specifically include the following steps:
  • the second intelligent decision-making module sends a request message to the first intelligent decision-making module, where the request message is used to request a channel decoding model;
  • the first intelligent decision-making module sends channel decoding model information to the second intelligent decision-making module
  • the second intelligent decision-making module determines the channel decoding model of the terminal device according to the channel decoding model information
  • the second preprocessing module receives a signal, and the signal is data to be decoded
  • the second preprocessing module preprocesses the signal to obtain the log-likelihood ratio of the signal
  • the second preprocessing module sends the log-likelihood ratio of the signal to the second intelligent decision-making module
  • the second intelligent decision-making module inputs the log-likelihood ratio of the signal into the channel decoding model of the terminal device for inference, and obtains the decoded data of the signal;
  • the terminal equipment uses the decoded data of the signal.
  • if the parameter value of the AIUpdateType of the terminal device is type 1 or type 2, the following steps are also included:
  • the second intelligent decision-making module sends the log-likelihood ratio of the signal and the decoded data of the signal to the first data collection and training module;
  • the first data collection and training module receives label information, which includes the correct decoded data; or, the first data collection and training module receives reward information, which is 1 when the decoding is correct and 0 when the decoding fails.
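  • A minimal sketch of this channel-decoding example: log-likelihood ratios (LLRs) go into a stand-in decoding model and the feedback is a 1/0 reward; the BPSK-style LLR computation and the single-layer decoder are assumptions for illustration, not the decoding model of this application.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class NeuralDecoder:
    """Stand-in for the channel decoding model requested from the first AI entity."""

    def __init__(self, n_llrs: int, n_bits: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=(n_llrs, n_bits))

    def decode(self, llrs: np.ndarray) -> np.ndarray:
        # Inference step: LLRs in, hard bit decisions out.
        return (sigmoid(llrs @ self.w) > 0.5).astype(int)

def preprocess(received_signal: np.ndarray, noise_var: float) -> np.ndarray:
    # BPSK-style log-likelihood ratios from the received samples (illustrative).
    return 2.0 * received_signal / noise_var

def feedback(decoded_bits: np.ndarray, true_bits: np.ndarray) -> int:
    # Reward information: 1 when the block is decoded correctly, 0 otherwise.
    return int(np.array_equal(decoded_bits, true_bits))

bits = np.array([1, 0, 1, 1])
tx = 1 - 2 * bits                     # BPSK mapping: bit 0 -> +1, bit 1 -> -1
rx = tx + np.random.default_rng(1).normal(scale=0.5, size=4)
decoder = NeuralDecoder(n_llrs=4, n_bits=4)
llrs = preprocess(rx, noise_var=0.25)
print(feedback(decoder.decode(llrs), bits))  # 1 only if this untrained toy decoder happens to be right
```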
  • the embodiment of the present application provides an information processing method, which defines another basic interaction mode between a first AI entity and a terminal device.
  • when the terminal device has the AI reasoning capability, the terminal device can perform inference according to the first AI model to obtain the AI decision information of the terminal device, so as to realize the corresponding AI function.
  • FIG. 9 is a schematic flowchart of another information processing method provided by an embodiment of the application.
  • the flow of the information processing method in FIG. 9 is implemented by the interaction between the first AI entity and the terminal device, and may include the following steps:
  • S901 The terminal device sends second AI model information to the first AI entity; correspondingly, the first AI entity receives the second AI model information sent by the terminal device;
  • S902 The first AI entity updates the first AI model information according to the second AI model information;
  • S903 The first AI entity sends the updated first AI model information to the terminal device; correspondingly, the terminal device receives the updated first AI model information sent by the first AI entity.
  • This embodiment defines a basic interaction mode between the first AI entity and the terminal device when the terminal device has AI training capabilities.
  • the terminal device can train the AI model.
  • the second AI model information is the AI model information in the terminal device or the second AI entity. Similar to the first AI model information, the second AI model information may be the second AI model, or may be related parameters of the second AI model, which is not limited in this embodiment.
  • the first AI model and/or the second AI model are both obtained through training by the corresponding first data collection and training module and/or second data collection and training module.
  • the neural network training method may be used to train the first AI model and/or the second AI model.
  • the data collection and training module can randomly initialize a neural network, and each training is a process of using existing data to obtain a new neural network starting from the randomly initialized weight matrices and bias vectors of the neurons.
  • a loss function can be used to evaluate the output of the neural network, and the error can be back-propagated.
  • the network parameters can then be iteratively optimized by gradient descent until the loss function reaches its minimum.
  • the data collection and training module can train the AI model through the above iterative optimization process to obtain a better AI model.
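  • The iterative loss/back-propagation/gradient-descent cycle described above can be sketched as follows for a single-layer model; the mean-squared-error loss and the toy data are assumptions for illustration only.

```python
import numpy as np

def train_model(x, y, lr=0.01, steps=500):
    """Minimal gradient-descent loop standing in for the AI training described above.

    A single-layer model with mean-squared-error loss; the real first/second AI
    model would be a deeper network, this only illustrates the iterative
    loss -> gradient -> update cycle.
    """
    rng = np.random.default_rng(0)
    w = rng.normal(size=(x.shape[1],))    # randomly initialized weights
    b = 0.0                               # randomly initialized bias
    for _ in range(steps):
        pred = x @ w + b
        err = pred - y
        loss = np.mean(err ** 2)          # the loss function evaluates the output
        grad_w = 2 * x.T @ err / len(y)   # back-propagated error for this single layer
        grad_b = 2 * np.mean(err)
        w -= lr * grad_w                  # gradient-descent update
        b -= lr * grad_b
    return w, b, loss

x = np.random.default_rng(1).normal(size=(100, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.1
w, b, final_loss = train_model(x, y)
print(final_loss)  # decreases as the iterative optimization converges
```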
  • the second AI model information does not include user data of the terminal device.
  • the second AI model information sent by the terminal device to the first AI entity has nothing to do with the data of the terminal device itself, which is beneficial to the privacy protection of the terminal device.
  • the second AI model information may also include user data of the terminal device, so that the trained AI model is better, and it is beneficial to obtain more suitable AI decision information.
  • FIG. 10 is a schematic flowchart of a federated learning provided by an embodiment of the application.
  • the federated learning process shown in FIG. 10 is an example of a specific application of a basic interaction method between the first AI entity and the terminal device when the terminal device has AI training capabilities.
  • the federated learning process includes the following steps:
  • S1001 The second intelligent decision-making module sends an AI training data request message to the second data collection and training module;
  • S1002 The second data collection and training module sends the second AI training data to the second intelligent decision-making module;
  • S1003 The second intelligent decision-making module trains the second AI model according to the second AI training data;
  • S1004 The second intelligent decision-making module sends the second AI model information to the first data collection and training module;
  • S1005 The first intelligent decision-making module sends an AI training data request message to the first data collection and training module;
  • S1006 The first data collection and training module sends the first AI training data to the first intelligent decision-making module;
  • S1007 The first intelligent decision-making module trains the first AI model according to the first AI training data;
  • S1008 The first intelligent decision-making module sends the trained first AI model information to the second intelligent decision-making module.
  • the first AI training data refers to the AI training data in the first AI entity
  • the second AI training data refers to the AI training data in the terminal device.
  • the first AI model refers to the AI model in the first AI entity
  • the second AI model refers to the AI model in the terminal device.
  • the step of the second intelligent decision-making module sending the second AI model information to the first data collection and training module may be triggered periodically. That is, one or more local terminal devices can periodically upload one or more second AI model information to the cloud, and the cloud can save the locally uploaded second AI model information.
  • the first AI entity in the cloud trains and updates the first AI model
  • the first AI entity can deliver the trained first AI model information to the local.
  • the first AI model is then trained and updated locally again, and this cycle is repeated.
  • the loop can run indefinitely, or a threshold can be set (for example, on the loss function): when the loss function falls below the threshold, the loop stops and the federated learning process ends.
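  • A minimal sketch of one round of this upload/aggregate/deliver loop, assuming federated averaging as the aggregation rule (the description above does not fix a specific aggregation rule):

```python
import numpy as np

def local_update(global_weights, local_x, local_y, lr=0.05, steps=20):
    """Terminal-device side: refine the delivered model on local training data."""
    w = global_weights.copy()
    for _ in range(steps):
        grad = 2 * local_x.T @ (local_x @ w - local_y) / len(local_y)
        w -= lr * grad
    return w  # second AI model information; no raw user data is uploaded

def aggregate(uploaded_weights):
    """First-AI-entity side: update the first AI model from the uploaded models."""
    return np.mean(uploaded_weights, axis=0)

def federated_round(global_weights, local_datasets):
    uploads = [local_update(global_weights, x, y) for x, y in local_datasets]
    return aggregate(uploads)  # delivered back to the terminal devices

rng = np.random.default_rng(0)
datasets = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
global_w = np.zeros(3)
for _ in range(5):                     # several upload/aggregate/deliver cycles
    global_w = federated_round(global_w, datasets)
print(global_w)
```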
  • when the terminal device has AI training capabilities, the terminal device can also perform local AI training. That is to say, the aforementioned AI training interaction process may be the interaction between modules inside the terminal device, and the second AI model information is obtained through AI training.
  • the aforementioned AI training interaction process may also be an interaction between modules within the first AI entity, and the first AI model information is obtained through AI training.
  • the following is a detailed example of performing local training on the terminal device, or performing cloud training on the first AI entity.
  • FIG. 11 is a schematic diagram of a flow of AI training provided by an embodiment of the application.
  • the intelligent decision-making module and/or data collection and training module in FIG. 11 may refer to the first/second intelligent decision-making module, and/or, the first/second data collection and training module.
  • the AI training data and/or AI model in FIG. 11 may refer to the first/second AI training data, and/or, the first/second AI model.
  • the intelligent decision-making module sends an AI training data request message to the data collection and training module;
  • the data collection and training module sends AI training data to the intelligent decision-making module
  • the intelligent decision-making module trains an AI model based on the AI training data.
  • the AI training data may include, but is not limited to, AI decision information, status information, or feedback information.
  • the second intelligent decision-making module may update the second AI model according to the status information.
  • the first intelligent decision-making module may update the first AI model according to the AI decision information.
  • when the terminal device has both the AI reasoning ability and the AI training ability, the terminal device can implement the processes of AI reasoning and AI training through internal modules. In other words, when the AICapabilityClass parameter value of the terminal device is class 3, the terminal device can train the AI model and perform AI reasoning to obtain AI decision information.
  • the process of the terminal device performing AI inference and AI training is obtained by combining the process of the terminal device performing AI inference with the process of the terminal device performing AI training described in the previous embodiments.
  • the embodiment of the present application provides an information processing method, which defines another basic interaction mode between a first AI entity and a terminal device.
  • the terminal device can train and update the local first AI model, or interact with the first AI entity in the cloud to train and update the first AI model, so that the AI model is more suitable for different application scenarios.
  • the AI algorithm adopted by the terminal device or the first AI entity is deep reinforcement learning (DRL).
  • the reward function of the system can be used as a performance indicator that indicates the final convergence of the algorithm.
  • the DRL online learning process can be implemented by the interaction between the terminal device and the first AI entity, or can be implemented by the internal module of the terminal device with AI reasoning capabilities and/or AI training capabilities.
  • the following takes the implementation of interaction between the terminal device and the first AI entity as an example for detailed description.
  • FIG. 12 is a schematic diagram of a DRL online learning process provided by an embodiment of the application.
  • the DRL online learning process includes the following steps:
  • S1201 The first data collection and training module sends a reward function request message to the first performance tracking module;
  • S1202 The first performance tracking module sends a reward function to the first data collection and training module;
  • S1203a The terminal device sends reward information to the first data collection and training module;
  • S1203b The first performance tracking module sends reward function update instruction information to the first data collection and training module;
  • S1204 The first data collection and training module updates the reward function according to the reward information.
  • the first performance tracking module can monitor the long-term key performance indicators (KPIs) of the system, and the KPIs can be used to guide the first data collection and training module to generate the reward function R.
  • R represents the reward.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, such as throughput and packet loss rate.
  • the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices, and is used to indicate the weight of different short-term KPIs. That is, the weight value of the target parameter may be obtained by the first performance tracking module in the first AI entity through long-term monitoring of the performance of all terminal devices in the system.
  • S1203a and S1203b are executed in no order, that is, S1203a and S1203b can be executed simultaneously.
  • S1203b may occur periodically, or may be caused by factors such as environmental changes that cause the AI model to not adapt, thereby triggering an update of the reward function.
  • the first data collection and training module may send a reward function request message to the first performance tracking module to request the update of the reward function.
  • the following uses a specific example to illustrate how the system adaptively adjusts the reward function during the DRL scheduling process.
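  • As a hedged illustration of such an adaptively weighted reward function (the KPI names, the linear combination, and the concrete weight values are assumptions made only for this sketch):

```python
def reward(perf, weights):
    """Weighted reward R built from short-term KPIs.

    `perf` holds the performance data obtained by executing the AI decision
    (e.g. throughput, packet loss rate); `weights` is the per-KPI weight
    delivered by the first performance tracking module. The exact KPI set and
    the linear combination are assumptions for illustration.
    """
    return sum(weights[k] * perf[k] for k in weights)

# Example: the tracking module initially favours throughput ...
weights = {"throughput_mbps": 1.0, "packet_loss_rate": -5.0}
print(reward({"throughput_mbps": 50.0, "packet_loss_rate": 0.02}, weights))

# ... and later updates the weights (S1203b) when long-term KPIs show that
# packet loss has become the bottleneck.
weights = {"throughput_mbps": 0.5, "packet_loss_rate": -20.0}
print(reward({"throughput_mbps": 50.0, "packet_loss_rate": 0.02}, weights))
```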
  • an embodiment of the present application provides an early decision-making stop technology.
  • the early decision-making stop technology can predict the system performance through the performance tracking module and determine whether a catastrophe will occur, so as to avoid, as early as possible, the catastrophic impact that exploration may have on the system.
  • FIG. 13 is a schematic flowchart of a decision-making early stop technology provided by an embodiment of the application.
  • the decision-making early stop technical process can be realized by the interaction between the terminal device and the first AI entity, or can be realized by the internal module of the terminal device with AI reasoning ability and/or AI training ability.
  • the multiple AI function modules in FIG. 13 may be function modules in the first AI entity in the cloud, or function modules in the second AI entity inside or externally connected to the local terminal device, which is not limited in this embodiment.
  • S1301 The terminal device sends observation information to the preprocessing module;
  • S1302 The preprocessing module preprocesses the observation information to obtain corresponding status information;
  • S1303 The preprocessing module sends the status information to the intelligent decision-making module;
  • S1304 The intelligent decision-making module performs model inference to obtain the AI decision information of the terminal device;
  • S1305 The performance tracking module predicts system performance, and obtains decision mask information and/or penalty information;
  • S1306 The performance tracking module sends the decision mask information to the intelligent decision-making module;
  • S1307 The intelligent decision-making module masks the AI decision information according to the decision mask information, and obtains the AI decision information after mask processing;
  • S1308a The performance tracking module sends one or more of the status information, the decision mask information, and the penalty information to the data collection and training module;
  • S1308b The intelligent decision-making module sends the masked AI decision information to the terminal device;
  • S1309a The terminal device executes the decision according to the AI decision information processed by the mask
  • S1309b The intelligent decision-making module sends the status information and the masked AI decision information to the data collection and training module;
  • S1310 The terminal device sends feedback information to the data collection and training module.
  • the performance tracking module needs to have long-term performance prediction capabilities. For example, the performance tracking module needs to determine whether a catastrophic performance loss will occur based on the current state of the system and the decisions made by the model.
  • the decision-making early stopping technology described in this embodiment may further include a step of model synchronization.
  • the intelligent decision-making module sends the AI model information to the performance tracking module.
  • whether the model synchronization step and S1308a are required depends on the predictive ability of the performance tracking module. That is to say, if the predictive ability of the performance tracking module is strong, the two steps of model synchronization and S1308a mentioned above are optional.
  • the decision mask information is used to mask the AI decision information so that the part that would reduce system performance is suppressed. For example, if the access of one or more users would significantly reduce system performance, the performance tracking module can minimize the weight of the AI decision for the one or more users, so that the one or more users no longer perform the corresponding AI decision.
  • the decision mask information can be obtained directly based on the prediction result, or can be obtained through the backup algorithm in the performance tracking module.
  • the performance tracking module may also use the decision mask information and/or penalty information as a training sample, and send the training sample to the data collection and training module.
  • the system makes scheduling decisions for 5 users, and the decision weights generated by DRL may be ⁇ 1.5, 1.1, 1.2, 0.2, 0 ⁇ .
  • if the predicted throughput of user 0 and user 4 is 0, then scheduling user 0 and/or user 4 in this case will inevitably lead to a waste of system resources.
  • the performance tracking module can generate a decision mask, for example, the decision masks of the 5 users are ⁇ 0,1,1,1,0 ⁇ . According to the above-mentioned decision mask, the performance tracking module can obtain the decision weights after mask processing as ⁇ 0, 1.1, 1.2, 0.2, 0 ⁇ respectively. Then according to the decision weight information, the system will schedule user 2. It can be seen that this scheduling is beneficial to reduce the waste of system resources and optimize the overall performance of the system.
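  • The masking step of this example can be sketched as follows; selecting the user with the largest masked decision weight is an assumption made only for this illustration.

```python
import numpy as np

def apply_decision_mask(decision_weights, decision_mask):
    """Mask the AI decision information before it is executed (illustrative)."""
    masked = np.asarray(decision_weights) * np.asarray(decision_mask)
    scheduled_user = int(np.argmax(masked))
    return masked, scheduled_user

weights = [1.5, 1.1, 1.2, 0.2, 0.0]   # decision weights generated by DRL
mask = [0, 1, 1, 1, 0]                # users 0 and 4 predicted to add no throughput
masked, user = apply_decision_mask(weights, mask)
print(masked.tolist(), user)          # [0.0, 1.1, 1.2, 0.2, 0.0], schedules user 2
```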
  • the embodiment of the present application provides an application example in which a DRL algorithm is deployed in a cell. After the DRL algorithm is deployed in each cell, the DRL algorithm can be divided into two stages: the imitation learning stage of the agent and the online reinforcement learning stage of the agent, as shown in Figure 14.
  • the agent described in this example may be the first AI entity, or it may be a local second AI entity with AI reasoning capabilities and/or AI training capabilities.
  • the imitation learning stage of the agent is the first stage.
  • the agent needs training data to initialize the agent.
  • the base station uses a traditional scheduling algorithm for initialization training, and saves trajectory information during the entire scheduling process, so that the base station can perform supervised learning based on the saved trajectory information, thereby realizing initialization training for the base station.
  • the embodiment of the present application proposes that the imitation learning stage of the agent may be a virtual cell (vCell) assisted training process.
  • the first AI entity may obtain basic real information of the cell for training to generate vCells.
  • the basic real information of the cell may include, but is not limited to, the location information, mobility information, service information, channel information and other related information of the terminal equipment in the cell.
  • vCell is generally composed of a neural network.
  • the first AI entity may adopt a generative adversarial network (GAN) algorithm.
  • the principle of the GAN training process is to first fix the generation network and train the identification network so that it can distinguish between real data and virtual data, and then fix the identification network and train the generation network so that the virtual data generated by the generation network is as similar to the real data as possible; the two networks are trained alternately in this way until convergence.
  • the first AI entity can obtain real data and the virtual data generated by the generation network, and alternately train the identification network and the generation network.
  • the first AI entity may obtain related information such as location information, mobility information, service information, and channel information of the terminal equipment in the cell, and input the related information into the generating network to obtain virtual data.
  • the data collection and training module in the first AI entity can train with the virtual data, that is, alternately train the identification network and the generation network according to the virtual data to generate a vCell, as shown in FIG. 15.
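  • A toy, one-dimensional sketch of the alternating training principle described above (fix the generation network and train the identification network, then fix the identification network and train the generation network); the scalar "cell feature", the logistic identification network, and the Gaussian generation network are drastic simplifications for illustration, not the multi-agent or conditional GAN used for vCell modeling.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# Real data: a stand-in for one scalar feature of the basic real cell information.
real_data = lambda n: rng.normal(loc=3.0, scale=0.7, size=n)

# Generation network G(z) = mu + sigma * z; identification network D(x) = sigmoid(a*x + b).
mu, sigma = 0.0, 1.0
a, b = 0.1, 0.0
lr = 0.05

for step in range(2000):
    z = rng.normal(size=64)
    real, fake = real_data(64), mu + sigma * z

    # Step 1: fix the generation network, train the identification network.
    d_real, d_fake = sigmoid(a * real + b), sigmoid(a * fake + b)
    grad_a = np.mean((d_real - 1) * real) + np.mean(d_fake * fake)
    grad_b = np.mean(d_real - 1) + np.mean(d_fake)
    a, b = a - lr * grad_a, b - lr * grad_b

    # Step 2: fix the identification network, train the generation network so that
    # its virtual data looks real (non-saturating generator loss).
    d_fake = sigmoid(a * fake + b)
    grad_mu = np.mean(-(1 - d_fake) * a)
    grad_sigma = np.mean(-(1 - d_fake) * a * z)
    mu, sigma = mu - lr * grad_mu, sigma - lr * grad_sigma

print(round(mu, 2))  # the generator mean drifts toward the real-data mean (≈ 3.0)
```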
  • the vCell may be further divided into multiple virtual user equipment (virtual UE, vUE) and virtual environment (virtual environment, vEnv).
  • vUE is used to model the UE
  • vEnv is used to model the environment.
  • vUE can use the multi-agent GAN algorithm to determine the location information, mobility information, and service information of the UE.
  • vEnv can use a conditional GAN algorithm to generate corresponding transmission channels based on UE location information, terrain information, weather information, and so on.
  • the online reinforcement learning stage of the agent is the second stage.
  • the agent can interact with the vCell that has completed training.
  • the introduction of vCell will greatly improve the convergence speed of the agent.
  • the agent can also perform online training.
  • the agent in this example can perform online training according to the deep reinforcement learning process shown in Figures 12 and 13.
  • for the specific implementation process, please refer to the description of the embodiments shown in FIG. 12 and FIG. 13, which will not be repeated here.
  • the embodiment of the present application also provides a training terminal, which is used to assist online training of the DRL algorithm.
  • FIG. 16 is a schematic diagram of a training terminal deployed in a real cell according to an embodiment of the application.
  • one or more training UEs (training UE, tUE) can be deployed in a real cell, and each tUE can interact with the agent.
  • the interaction methods can include, but are not limited to, interaction through imitation learning, interaction through reinforcement learning, and so on, as shown in FIG. 16.
  • tUE has the following characteristics: it directly interacts with the reinforcement learning algorithm; it can obtain a large number of training samples when idle; it can collect non-communication perceivable data in the cell; it can provide enhanced coverage services; and it can be a fixed-location device or a mobile device.
  • tUE may be any one or more of the types included in the terminal equipment described in the embodiments of the present application.
  • tUE has the feature of acquiring a large number of training samples when it is idle, so tUE can collect a large number of training samples at night.
  • since the tUE can collect non-communication perceivable data in the cell, the tUE can collect weather information, terrain information, obstruction information, etc., which can be used as training sample data and for vCell modeling.
  • since the tUE can provide enhanced coverage services, the tUE can also be a small cell, a drone, or other equipment.
  • tUE can effectively obtain training data without affecting actual services, and significantly improves training efficiency.
  • An embodiment of the present application provides a first AI entity, as shown in FIG. 17.
  • the first AI entity is used to implement the method executed by the first AI entity in the foregoing method embodiment, and specifically includes a preprocessing module 1701 and an intelligent decision module 1702.
  • the preprocessing module 1701 is used to receive observation information sent by the terminal device, and the observation information indicates data used for AI decision-making.
  • the intelligent decision module 1702 is used to determine the AI decision information of the terminal device according to the observation information and the first AI model.
  • the intelligent decision module 1702 is also used to send AI decision information to the terminal device.
  • the preprocessing module 1701 is also used to receive AI information of the terminal device, where the AI information includes AI capability parameters.
  • the preprocessing module 1701 is configured to receive observation information sent by the terminal device.
  • the preprocessing module 1701 is also used to preprocess the observation information to obtain corresponding status information.
  • the intelligent decision module 1702 is also used to input the state information into the first AI model for reasoning, and obtain AI decision information of the terminal device.
  • the above-mentioned preprocessing module 1701 may be used to execute S501 in FIG. 5 and S601 to S603 in FIG. 6, and the intelligent decision module 1702 may be used to execute S502 and S503 in FIG. 5, and S604, S605, and S607 in FIG. 6.
  • FIG. 18 is a schematic structural diagram of a first AI entity provided by an embodiment of the present application.
  • the first AI entity may be a device (such as a chip) capable of performing the information processing function described in the embodiment of the present application.
  • the first AI entity may include a transceiver 1801, at least one processor 1802, and a memory 1803.
  • the transceiver 1801, the processor 1802, and the memory 1803 may be connected to each other through one or more communication buses, or may be connected in other ways.
  • the transceiver 1801 can be used to send information or receive information. It can be understood that the transceiver 1801 is a general term and may include a receiver and a transmitter.
  • the receiver is used to receive observation information sent by the terminal device.
  • the transmitter is used to send AI decision information to the terminal device.
  • the transceiver 1801 may be used to implement part or all of the functions of the preprocessing module and the intelligent decision module shown in FIG. 17.
  • the processor 1802 can be used to process information.
  • the processor 1802 may call the program code stored in the memory 1803 to determine the AI decision information of the terminal device according to the observation information and the first AI model.
  • the processor 1802 may include one or more processors.
  • the processor 1802 may be one or more central processing units (CPUs), network processors (NPs), hardware chips, or any combination thereof.
  • if the processor 1802 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 1802 may be used to implement part or all of the functions of the pre-processing module and the intelligent decision-making module shown in FIG. 17.
  • the memory 1803 is used to store program codes and the like.
  • the memory 1803 may include a volatile memory, such as a random access memory (RAM); the memory 1803 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1803 may also include a combination of the foregoing types of memories.
  • processor 1802 and memory 1803 may be coupled through an interface, or may be integrated together, which is not limited in this embodiment.
  • transceiver 1801 and processor 1802 may be used to implement the information processing method in the embodiment of the present application, where the specific implementation manner is as follows:
  • the transceiver 1801 is configured to receive observation information sent by a terminal device, and the observation information indicates data used for AI decision-making.
  • the processor 1802 is configured to determine the AI decision information of the terminal device according to the observation information and the first AI model.
  • the transceiver 1801 is also used to send AI decision information to the terminal device.
  • the transceiver 1801 is also used to receive AI information of the terminal device, where the AI information includes AI capability parameters.
  • the transceiver 1801 is used to receive observation information sent by the terminal device.
  • the processor 1802 is further configured to preprocess the observation information to obtain corresponding state information, and then input the state information into the first AI model for reasoning to obtain AI decision information of the terminal device.
  • the above transceiver 1801 may be used to perform S501 and S503 in FIG. 5, and S601, S603, and S605 in FIG. 6, and the processor 1802 may be used to perform S502 in FIG. 5, and S602 and S604 in FIG. 6.
  • the embodiment of the present application provides another first AI entity, as shown in FIG. 19.
  • the first AI entity is used to implement the method executed by the first AI entity in the foregoing method embodiment, and specifically includes an intelligent decision module 1901, a preprocessing module 1902, a data collection and training module 1903, and a performance tracking module 1904.
  • the intelligent decision module 1901 is configured to receive the second AI model information sent by the terminal device, and the second AI model information does not include user data of the terminal device.
  • the intelligent decision module 1901 is also configured to update the first AI model information according to the second AI model information; the first AI model information is the AI model information of the first AI entity.
  • the intelligent decision module 1901 is also used to send the updated first AI model information to the terminal device.
  • the intelligent decision module 1901 is also used to receive a request message sent by the terminal device, and the request message is used to request the first AI model information.
  • the intelligent decision module 1901 is also used to send the first AI model information to the terminal device.
  • the preprocessing module 1902 is configured to receive AI information of the terminal device, where the AI information includes AI capability parameters.
  • the intelligent decision module 1901 is also used to receive AI decision information and status information sent by the terminal device; the AI decision information is obtained by the terminal device inputting the status information into the second AI model for inference, and the status information is obtained by the terminal device based on observation information; the observation information indicates the data used for AI decision-making.
  • the preprocessing module 1902 is further configured to receive AI information of the terminal device, and the AI information includes AI update parameters. If the AI update parameter indicates a scheduled AI update or an event triggers an AI update, the data collection and training module 1903 is used to receive feedback information, which is used to indicate data used for AI training.
  • the intelligent decision-making module 1901 is also used to update the first AI model according to the AI training data; where the AI training data includes one or more of AI decision information, status information, or feedback information.
  • the feedback information includes reward information; the reward information is used to update the first AI model.
  • the reward information is determined according to the reward function.
  • the reward function is determined according to the target parameter ⁇ and the weight value ⁇ of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • the performance tracking module 1904 is used to send reward information to the data collection and training module 1903.
  • the above-mentioned intelligent decision module 1901 may be used to execute S901 to S903 in FIG. 9 and S1005 to S1008 in FIG. 10.
  • the data collection and training module 1903 is used to execute S809a and S809b in Figure 8, S1203a, S1203b and S1204 in Figure 12, and S1309b and S1310 in Figure 13.
  • FIG. 20 is a schematic structural diagram of a first AI entity provided by an embodiment of the present application.
  • the first AI entity may be a device (such as a chip) capable of performing the information processing function described in the embodiment of the present application.
  • the first AI entity may include a transceiver 2001, at least one processor 2002, and a memory 2003.
  • the transceiver 2001, the processor 2002, and the memory 2003 may be connected to each other through one or more communication buses, or may be connected in other ways.
  • the transceiver 2001 can be used to send information or receive information. It can be understood that the transceiver 2001 is a general term and may include a receiver and a transmitter.
  • the receiver is used to receive the second AI model information sent by the terminal device.
  • the transmitter is used to send the updated first AI model information to the terminal device.
  • the transceiver 2001 may be used to implement part or all of the functions of the intelligent decision module 1901, the preprocessing module 1902, the data collection and training module 1903, and the performance tracking module 1904 shown in FIG. 19.
  • the processor 2002 can be used to process information.
  • the processor 2002 may call the program code stored in the memory 2003 to update the first AI model information according to the second AI model information.
  • the processor 2002 may include one or more processors.
  • the processor 2002 may be one or more central processing units (CPUs), network processors (NPs), hardware chips, or any combination thereof.
  • if the processor 2002 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 2002 may be used to implement part or all of the functions of the intelligent decision-making module 1901, the pre-processing module 1902, the data collection and training module 1903, and the performance tracking module 1904 shown in FIG. 19.
  • the memory 2003 is used to store program codes and the like.
  • the memory 2003 may include a volatile memory, such as a random access memory (RAM); the memory 2003 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • memory 2003 may also include a combination of the above types of memories.
  • processor 2002 and memory 2003 may be coupled through an interface, or may be integrated together, which is not limited in this embodiment.
  • the above transceiver 2001 and the processor 2002 may be used to implement the information processing method in the embodiment of the present application, where the specific implementation manner is as follows:
  • the transceiver 2001 is configured to receive second AI model information sent by a terminal device, where the second AI model information does not include user data of the terminal device;
  • the processor 2002 is configured to update the first AI model information according to the second AI model information; the first AI model information is the AI model information of the first AI entity;
  • the transceiver 2001 is also used to send the updated first AI model information to the terminal device.
  • the transceiver 2001 is also used to receive a request message sent by the terminal device, where the request message is used to request the first AI model information.
  • the transceiver 2001 is also used to send the first AI model information to the terminal device.
  • the transceiver 2001 is also used to receive AI decision information and status information sent by the terminal device; the AI decision information is obtained by the terminal device inputting the status information into the second AI model for inference, and the status information is obtained by the terminal device based on observation information; the observation information indicates the data used for AI decision-making.
  • the transceiver 2001 is also used to receive AI information of the terminal device, where the AI information includes AI update parameters;
  • the transceiver 2001 is also used to receive feedback information, which is used to indicate data used for AI training.
  • the processor 2002 is also used to update the first AI model according to AI training data; the AI training data includes one or more of AI decision information, status information, or feedback information.
  • the feedback information includes reward information; the reward information is used to update the first AI model.
  • the reward information is determined according to the reward function.
  • the reward function is determined according to the target parameter and the weight value of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • the above transceiver 2001 may be used to execute S901 and S903 in FIG. 9, S1004 in FIG. 10, S1201 to S1203a in FIG. 12, and S1301 and S1308b in FIG. 13.
  • the processor 2002 is configured to execute S902 in FIG. 9, S1005 to S1007 in FIG. 10, and S1204 in FIG. 12.
  • An embodiment of the present application provides a terminal device, as shown in FIG. 21.
  • the terminal device is used to implement the method executed by the terminal device in the foregoing method embodiment, and specifically includes a transceiver module 2101 and a processing module 2102.
  • the transceiver module 2101 is configured to send observation information to the first AI entity, and the observation information indicates data used for AI decision-making.
  • the transceiver module 2101 is also configured to receive AI decision information of the terminal device sent by the first AI entity.
  • the processing module 2102 is used to perform decision-making according to the AI decision-making information.
  • the transceiver module 2101 is further configured to send AI information of the terminal device to the first AI entity, where the AI information includes an AI capability parameter, where the AI capability parameter indicates that the terminal device has no AI capability.
  • the AI decision information of the terminal device is obtained by the first AI entity inputting state information into the first AI model for inference; the state information is obtained by the first AI entity based on observation information.
  • the foregoing transceiver module 2101 may be used to execute S501 and S503 in FIG. 5 and S601 and S605 in FIG. 6.
  • the processing module 2102 is used to execute S606 in FIG. 6.
  • FIG. 22 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device may be a device (such as a chip) that has the information processing function described in the embodiment of the present application.
  • the terminal device may include a transceiver 2201, at least one processor 2202, and a memory 2203.
  • the transceiver 2201, the processor 2202, and the memory 2203 may be connected to each other through one or more communication buses, or may be connected in other ways.
  • the transceiver 2201 can be used to send information or receive information. It can be understood that the transceiver 2201 is a general term and may include a receiver and a transmitter.
  • the receiver is configured to receive AI decision information of the terminal device sent by the first AI entity.
  • the transmitter is used to send observation information to the first AI entity.
  • the transceiver 2201 may be used to implement part or all of the functions of the transceiver module 2101 shown in FIG. 21.
  • the processor 2202 may be used to process information.
  • the processor 2202 may call the program code stored in the memory 2203 to implement decision making according to the AI decision information.
  • the processor 2202 may include one or more processors.
  • the processor 2202 may be one or more central processing units (CPUs), network processors (NPs), hardware chips, or any combination thereof.
  • if the processor 2202 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 2202 may be used to implement part or all of the functions of the processing module 2102 shown in FIG. 21.
  • the memory 2203 is used to store program codes and the like.
  • the memory 2203 may include a volatile memory, such as a random access memory (RAM); the memory 2203 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
  • the memory 2203 may also include a combination of the foregoing types of memories.
  • processor 2202 and memory 2203 may be coupled through an interface, or may be integrated together, which is not limited in this embodiment.
  • transceiver 2201 and processor 2202 may be used to implement the information processing method in the embodiment of the present application, where the specific implementation manner is as follows:
  • the transceiver 2201 is configured to send observation information to the first AI entity, where the observation information indicates data used for AI decision-making.
  • the transceiver 2201 is also configured to receive AI decision information of the terminal device sent by the first AI entity.
  • the processor 2202 is configured to perform a decision according to the AI decision information.
  • the transceiver 2201 is further configured to send AI information of the terminal device to the first AI entity, where the AI information includes an AI capability parameter, where the AI capability parameter indicates that the terminal device has no AI capability.
  • the AI decision information of the terminal device is obtained by the first AI entity inputting state information into the first AI model for inference; the state information is obtained by the first AI entity based on observation information.
  • the above transceiver 2201 may be used to perform S501 and S503 in FIG. 5, and S601 and S605 in FIG. 6.
  • the processor 2202 is configured to execute S606 in FIG. 6.
  • the embodiment of the present application provides another terminal device, as shown in FIG. 23.
  • the terminal device is used to implement the method executed by the terminal device in the foregoing method embodiment, and specifically includes a transceiver module 2301 and a processing module 2302.
  • the transceiver module 2301 is configured to send a request message to the first AI entity, and the request message is used to request first AI model information.
  • the transceiver module 2301 is also configured to receive first AI model information sent by the first AI entity.
  • the processing module 2302 is used to input the state information into the second AI model for reasoning to obtain the AI decision information of the terminal device; wherein the state information is determined based on the observation information; the observation information indicates the data used for AI decision-making; the second AI model is the terminal device Determined according to the first AI model information.
  • the transceiver module 2301 is further configured to send AI information of the terminal device to the first AI entity, where the AI information includes an AI capability parameter, where the AI capability parameter indicates that the terminal device has AI inference capability.
  • the transceiver module 2301 is further configured to send AI decision information and status information to the first AI entity.
  • the AI information of the terminal device includes AI capability parameters and/or AI update parameters; the transceiver module 2301 is also configured to send feedback to the first AI entity if the AI update parameters indicate a regular AI update or an event triggers an AI update Information, the feedback information is used to indicate the data used for AI training.
  • the processing module 2302 is further configured to, if the AI capability parameter indicates that the terminal device has AI training capability, obtain the second AI model according to the AI training data; where the AI training data includes one or more of AI decision information, status information, or feedback information.
  • the transceiver module 2301 is further configured to send the second AI model information to the first AI entity.
  • the transceiver module 2301 is further configured to receive updated first AI model information sent by the first AI entity, where the updated first AI model information is determined by the first AI entity according to the second AI model information.
  • the feedback information includes reward information; the reward information is used to update the first AI model.
  • the reward information is determined according to the reward function.
  • the reward function is determined according to the target parameter and the weight value of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices.
  • the above transceiver module 2301 can be used to execute S701 and S702 in FIG. 7, S801 and S802 in FIG. 8, S901 and S903 in FIG. 9, S1004 and S1008 in FIG. 10, S1203a in FIG. 12, and S1301 and S1308b in FIG. 13.
  • the processing module 2302 is used to execute S703 in FIG. 7, S803, S804, and S808 in FIG. 8, S1003 in FIG. 10, and S1309a in FIG. 13.
  • FIG. 24 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • the terminal device may be a device (such as a chip) that has the information processing function described in the embodiment of the present application.
  • the terminal device may include a transceiver 2401, at least one processor 2402, and a memory 2403.
  • the transceiver 2401, the processor 2402, and the memory 2403 may be connected to each other through one or more communication buses, or may be connected in other ways.
  • the transceiver 2401 can be used to send information or receive information. It can be understood that the transceiver 2401 is a general term and may include a receiver and a transmitter.
  • the receiver is configured to receive the first AI model information sent by the first AI entity.
  • the transmitter is used to send a request message to the first AI entity.
  • the transceiver 2401 may be used to implement part or all of the functions of the transceiver module 2301 shown in FIG. 23.
  • the processor 2402 may be used to process information.
  • the processor 2402 may call the program code stored in the memory 2403 to implement decision making according to AI decision information.
  • the processor 2402 may include one or more processors.
  • the processor 2402 may be one or more central processing units (CPUs), network processors (NPs), hardware chips, or any combination thereof.
  • if the processor 2402 is a CPU, the CPU may be a single-core CPU or a multi-core CPU.
  • the processor 2402 may be used to implement part or all of the functions of the processing module 2302 shown in FIG. 23.
  • the memory 2403 is used to store program codes and the like.
  • the memory 2403 may include a volatile memory, such as a random access memory (RAM); the memory 2403 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 2403 may also include a combination of the foregoing types of memories.
  • processor 2402 and memory 2403 may be coupled through an interface, or may be integrated together, which is not limited in this embodiment.
  • transceiver 2401 and processor 2402 may be used to implement the information processing method in the embodiment of the present application, where the specific implementation is as follows:
  • the transceiver 2401 is configured to send a request message to the first AI entity, where the request message is used to request first AI model information.
  • the transceiver 2401 is also configured to receive first AI model information sent by the first AI entity.
  • the processor 2402 is configured to input the state information into the second AI model for inference to obtain the AI decision information of the terminal device; wherein the state information is determined based on the observation information; the observation information indicates the data used for AI decision-making; and the second AI model is determined by the terminal device according to the first AI model information.
  • the transceiver 2401 is further configured to send AI information of the terminal device to the first AI entity, where the AI information includes an AI capability parameter, and the AI capability parameter indicates that the terminal device has AI inference capability.
  • the transceiver 2401 is also used to send AI decision information and status information to the first AI entity.
  • the AI information of the terminal device includes AI capability parameters and/or AI update parameters. If the AI update parameter indicates a scheduled AI update or an event-triggered AI update, the transceiver 2401 is also used to send feedback information to the first AI entity, where the feedback information indicates the data used for AI training.
  • the transceiver 2401 is also used to obtain a second AI model based on AI training data if the AI capability parameter indicates that the terminal device has AI training capability; wherein the AI training data includes one or more of the AI decision information, the status information, or the feedback information.
  • the transceiver 2401 is further configured to send the second AI model information to the first AI entity.
  • the transceiver 2401 is further configured to receive updated first AI model information sent by the first AI entity, where the updated first AI model information is determined by the first AI entity according to the second AI model information.
  • the feedback information includes reward information; the reward information is used to update the first AI model.
  • the reward information is determined according to the reward function.
  • the reward function is determined according to the target parameter θ and the weight value φ of the target parameter.
  • the target parameter is the performance data obtained by the terminal device executing the AI decision information, and the weight value of the target parameter is determined by the first AI entity according to the performance data of one or more terminal devices (a minimal sketch of this reward computation follows this list).
  • the above transceiver 2401 may be used to perform S701 and S702 in FIG. 7, S801 and S802 in FIG. 8, S901 and S903 in FIG. 9, S1004 and S1008 in FIG. 10, S1203a in FIG. 12, and S1301 and S1308b in FIG. 13.
  • the processor 2402 is configured to execute S703 in FIG. 7, S803, S804, and S808 in FIG. 8, S1003 in FIG. 10, and S1309a in FIG. 13.
  • An embodiment of the present application provides a communication system, which includes the terminal device described in the foregoing embodiment and a first AI entity.
  • the embodiment of the present application provides a computer-readable storage medium that stores a program or instruction, and when the program or instruction runs on a computer, the computer executes the information processing method in the embodiment of the present application.
  • the embodiment of the application provides a chip or a chip system.
  • the chip or chip system includes at least one processor and an interface.
  • the interface and the at least one processor are interconnected through a line, and the at least one processor is used to run a computer program or instruction, to perform the information processing method in the embodiments of this application.
  • the interface in the chip can be an input/output interface, a pin, or a circuit.
  • the chip system in the foregoing aspect may be a system on chip (SOC), or a baseband chip, etc., where the baseband chip may include a processor, a channel encoder, a digital signal processor, a modem, and an interface module.
  • the chip or chip system described above in this application further includes at least one memory, and instructions are stored in the at least one memory.
  • the memory may be a storage module inside the chip, for example, a register, a cache, etc., or may be a storage module of the chip (for example, a read-only memory, a random access memory, etc.).
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • Computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or in a wireless manner (for example, infrared, radio, or microwave).
  • a cable such as Coaxial cable, optical fiber, digital subscriber line (Digital Subscriber Line, DSL) or wireless (such as infrared, wireless, microwave, etc.) transmission to another website site, computer, server or data center.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid-state disk (SSD)), etc.
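
The reward computation described in the items above can be illustrated with a minimal sketch. The following Python fragment assumes the target parameter θ is a small set of performance metrics measured after the terminal device executes the AI decision information, and φ is the matching set of weights supplied by the first AI entity; the metric names, values, and weight signs are hypothetical and only show the weighted form of the reward function, not an actual implementation.

```python
# Minimal sketch of a weighted reward R(theta, phi), assuming theta and phi share keys.
def reward(theta: dict, phi: dict) -> float:
    # Weighted sum over the target parameters, e.g. R = a*thp + b*jfi + c*pdr.
    return sum(phi[name] * value for name, value in theta.items())

# Hypothetical metrics: throughput (thp), fairness index (jfi), packet drop rate (pdr).
theta = {"thp": 12.5, "jfi": 0.93, "pdr": 0.01}   # performance data from the terminal device
phi = {"thp": 1.0, "jfi": 1.0, "pdr": -1.0}       # weights from the first AI entity (signs are an assumption)
print(reward(theta, phi))                         # reward information used to update the first AI model
```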

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The embodiments of this application disclose an information processing method and a related device. The embodiments provide a first AI entity in an access network and define multiple basic interaction manners between the first AI entity and a terminal device. In one interaction manner, the first AI entity may receive second AI model information sent by the terminal device, where the second AI model information does not include user data of the terminal device. The first AI entity may update first AI model information of the first AI entity according to the second AI model information, and then send the updated first AI model information to the terminal device, so that the terminal device trains and updates the second AI model information. It can be seen that the first AI entity in the access network applies AI technology to the radio access network, which helps improve the processing capability of the radio access network.

Description

一种信息处理方法及相关设备
本申请要求于2020年5月30日提交中国国家知识产权局、申请号为202010480881.7、申请名称为“一种信息处理方法及相关设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信技术领域,尤其涉及一种信息处理方法及相关设备。
背景技术
人工智能(artificial intelligence,AI)技术在图像处理与自然语言处理领域有着非常成功的应用。例如,将AI技术应用于网络层(如网络优化,移动性管理,资源分配等),或者将AI技术应用于物理层(如信道编译码,信道预测、接收机等)等方面。AI实体可以部署在接入网中以提高接入网的处理能力(如提高资源分配效率等),但是目前并未定义接入网中的AI实体与用户设备(user equipment,UE)之间的基础交互方式,无法高效将AI技术应用于无线接入网中。
发明内容
本申请实施例提供一种信息处理方法及相关设备,该信息处理方法可以将AI技术应用于无线接入网,有利于提高无线接入网的处理能力。
第一方面,本申请实施例提供一种信息处理方法,可以应用于接入网中的第一AI实体。其中,第一AI实体可以接收终端设备发送的第二AI模型信息,该第二AI模型信息不包括所述终端设备的用户数据。第一AI实体根据第二AI模型信息,更新第一AI模型信息。第一AI实体向终端设备发送更新后的第一AI模型信息。
可见,上述方法流程定义了一种第一AI实体与终端设备之间的基础交互方式。其中,第一AI实体和终端设备均具备AI训练能力,那么第一AI实体可以基于终端设备发送第二AI模型进行训练并更新第一AI模型,并将更新后的第一AI模型发送给终端设备。
其中,终端设备发送的第二AI模型信息不包括终端设备的用户数据,有利于实现终端设备的隐私保护。上述训练交互可以更新第一AI实体的第一AI模型,有利于提高第一AI实体和终端设备的处理能力。
在一种可能的设计中,第一AI实体还可以接收所述终端设备发送的请求消息,该请求消息用于请求第一AI模型信息。第一AI实体向终端设备发送第一AI模型信息。
可见,上述方法流程定义了另一种第一AI实体与终端设备之间的基础交互方式。其中,当终端设备具备AI推理能力时,第一AI实体接收终端设备的请求消息,并向终端设备发送第一AI模型信息。对应的,终端设备接收到第一AI模型信息后,可以根据待决策的数据和AI模型进行推理,得到AI决策信息。
在一种可能的设计中,第一AI实体接收终端设备发送的请求消息之前,第一AI实体还可以接收终端设备的AI信息,该AI信息包括AI能力参数。其中,AI能力参数用于指示终端设备是否具备AI推理能力和/或AI训练能力。
在一种可能的设计中,若所述AI能力参数指示所述终端设备具备AI推理能力,所述第一AI实体接收所述终端设备发送的AI决策信息和状态信息,所述AI决策信息是所述终端设备将所述状态信息输入所述第二AI模型进行推理得到的,所述状态信息是所述终端设备根据观察信息得到的。
可见,当终端设备具备AI推理能力时,终端设备可以得到AI决策信息,并将该AI决策信息发送给第一AI实体,以使第一AI实体获取终端设备的AI决策信息,有利于第一AI实体进行AI模型的更新。
在一种可能的设计中,所述第一AI实体接收所述终端设备的AI信息,所述AI信息包括AI更新参数;
若所述AI更新参数指示定时AI更新或事件触发AI更新,所述第一AI实体接收反馈信息,所述反馈信息用于指示进行AI训练使用的数据。
在一种可能的设计中,第一AI实体接收终端设备的AI信息,该AI信息包括AI更新参数。若AI更新参数指示定时AI更新或事件触发AI更新,第一AI实体接收反馈信息,该反馈信息用于指示进行AI训练使用的数据。
可见,终端设备的AI信息中的AI更新参数可以指示终端设备进行AI更新。对应的,第一AI实体可以接收终端设备发送的反馈信息,该反馈信息可以用于第一AI实体的训练更新,有利于提高第一AI实体的处理能力。
在一种可能的设计中,第一AI实体根据AI训练数据,更新第一AI模型。其中,所述AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种可能的设计中,反馈信息包括奖励信息;奖励信息用于更新所述第一AI模型。
在一种可能的设计中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
可见,本申请实施例扩展了一种深度强化学习的流程,第一AI实体可以监控***的性能指标,有利于更新第一AI模型。
第二方面,本申请实施例提供一种信息处理方法,应用于终端设备。其中,终端设备向第一AI实体发送请求消息,该请求消息用于请求第一AI模型信息。终端设备接收第一AI实体发送的第一AI模型信息。终端设备将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。其中,状态信息基于观察信息确定,观察信息指示进行AI决策使用的数据。
可见,终端设备自身具备AI推理能力时,终端设备可以从第一AI实体获取第一AI模型信息,并根据第一AI模型信息确定终端设备的第二AI模型。终端设备可以将进行AI决策使用的数据输入第二AI模型进行推理,从而得到AI决策信息。通过终端设备和第一AI实体之间的交互,完整了终端设备实现AI推理功能的流程,有利于提升终端设备的处理能力。
在一种可能的设计中,终端设备向第一AI实体发送请求消息之前,终端设备还可以向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数。其中,AI能力参数指示该终端设备具备AI推理能力。
可见,当终端设备自身具备AI推理能力时,终端设备可以通过与第一AI实体之间的交互通知第一AI实体。
在一种可能的设计中,终端设备还可以向第一AI实体发送AI决策信息和状态信息。
可见,当终端设备自身具备AI推理能力时,终端设备可以通过与第一AI实体之间的交互将推理得到的AI决策信息可以发送给第一AI实体。
在一种可能的设计中,终端设备的AI信息包括AI能力参数和/或AI更新参数。若AI更新参数指示定时AI更新或事件触发AI更新,终端设备可以向第一AI实体发送反馈信息,该反馈信息用于指示进行AI训练使用的数据。
可见,当终端设备的AI更新参数指示需要AI更新时,终端设备可以通过与第一AI实体之间的交互通知第一AI实体也进行AI训练更新数据。
在一种可能的设计中,若AI能力参数指示终端设备具备AI训练能力,终端设备根据AI训练数据,获取第二AI模型。其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
可见,当终端设备具备AI训练能力时,终端设备可以通过自身的训练更新本地的第二AI模型。
在一种可能的设计中,终端设备向第一AI实体发送第二AI模型信息。终端设备接收第一AI实体发送的更新后的第一AI模型信息,更新后的第一AI模型信息是第一AI实体根据第二AI模型信息确定的。
可见,当终端设备具备AI训练能力时,终端设备可以通过与第一AI实体之间的交互向第一AI实体发送本地的第二AI模型信息,以使第一AI实体根据第二AI模型信息更新第一AI模型信息。并且,终端设备向第一AI实体发送的第二AI模型信息与终端设备本身的数据无关,有利于终端设备的隐私保护。
在一种可能的设计中,反馈信息包括奖励信息;奖励信息用于更新第一AI模型。
在一种可能的设计中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
可见,本申请实施例扩展了一种深度强化学习的流程,若终端设备具备AI训练能力,可以监控***的性能指标,有利于更新本地的第二AI模型。
第三方面,本申请实施例提供一种信息处理方法,可以应用于接入网中的第一AI实体。其中,第一AI实体可以接收终端设备发送的观察信息,该观察信息指示进行AI决策使用的数据。第一AI实体根据观察信息和第一AI模型,确定终端设备的AI决策信息,并将该AI决策信息发送给终端设备。
可见,上述方法流程定义了另一种第一AI实体与终端设备之间的基础交互方式。其中,第一AI实体具备AI推理能力,可以根据终端设备发送的进行AI决策使用的数据以及自身的第一AI模型,确定终端设备的AI决策信息。也就是说,接入网中的第一AI实体实现了将AI技术应用于无线接入网,有利于提高无线接入网的处理能力。
在一种可能的设计中,第一AI实体在接收终端设备发送的观察信息之前,还可以接收终端设备的AI信息,该AI信息包括AI能力参数。其中,AI能力参数用于指示终端设备 是否具备AI推理能力和/或AI训练能力。
在一种可能的设计中,若所述终端设备的AI能力参数指示所述终端设备无AI能力,所述第一AI实体接收所述终端设备发送的观察信息。
可见,若终端设备不具备AI推理能力,终端设备可以通过第一AI实体实现相关的AI功能。
在一种可能的设计中,第一AI实体可以对观察信息进行预处理,得到对应的状态信息。第一AI实体再将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。
可见,第一AI实体在获取终端设备的AI决策信息的过程中,先要将观察信息转换为AI模型可以处理的状态信息,才能得到AI决策信息。
第四方面,本申请实施例提供一种信息处理方法,可以应用于终端设备。其中,终端设备向第一AI实体发送观察信息,该观察信息指示进行AI决策使用的数据。终端设备接收第一AI实体发送的该终端设备的AI决策信息,并根据该AI决策信息执行决策。
可见,终端设备可以通过与第一AI实体之间的交互,来获取终端设备的AI决策信息,实现相应的AI功能。
在一种可能的设计中,终端设备向第一AI实体发送观察信息之前,终端设备还可以向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数,其中,该AI能力参数指示所述终端设备无AI能力。
在一种可能的设计中,终端设备的AI决策信息是第一AI实体将状态信息输入第一AI模型进行推理得到的;状态信息是第一AI实体根据观察信息得到的。
可见,当终端设备无AI能力时,可以通过与第一AI实体之间的交互,来获取终端设备的AI决策信息。
第五方面,本申请实施例提供一种第一AI实体,该第一AI实体包括智能决策模块。其中,智能决策模块用于接收终端设备发送的第二AI模型信息,第二AI模型信息不包括终端设备的用户数据。智能决策模块还用于根据第二AI模型信息,更新第一AI模型信息。其中,第一AI模型信息为第一AI实体的AI模型信息。智能决策模块还用于向终端设备发送更新后的第一AI模型信息。
在一种可能的设计中,智能决策模块还用于接收终端设备发送的请求消息,该请求消息用于请求第一AI模型信息。智能决策模块接收该请求消息后,可以向终端设备发送第一AI模型信息。
在一种可能的设计中,第一AI实体还包括预处理模块。其中,预处理模块用于接收终端设备的AI信息,该AI信息包括AI能力参数。
在一种可能的设计中,若AI能力参数指示终端设备具备AI推理能力,智能决策模块还用于接收终端设备发送的AI决策信息和状态信息。其中,AI决策信息是终端设备将状态信息输入第二AI模型进行推理得到的,状态信息是终端设备根据观察信息得到的,观察信息指示进行AI决策使用的数据。
在一种可能的设计中,预处理模块还用于接收终端设备的AI信息,该AI信息包括AI更新参数。其中,第一AI实体还可以包括数据收集与训练模块。若AI更新参数指示定时AI更新或事件触发AI更新,数据收集与训练模块用于接收反馈信息,该反馈信息用于指 示进行AI训练使用的数据。
在一种可能的设计中,智能决策模块还用于根据AI训练数据,更新第一AI模型。其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种可能的设计中,反馈信息包括奖励信息,奖励信息用于更新所述第一AI模型。
在一种可能的设计中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
第六方面,本申请实施例提供一种终端设备,该终端设备包括收发模块和处理模块。其中,收发模块用于向第一AI实体发送请求消息,该请求消息用于请求第一AI模型信息。收发模块还用于接收第一AI实体发送的第一AI模型信息。处理模块用于将状态信息输入第二AI模型进行推理,得到终端设备的AI决策信息;其中,状态信息基于观察信息确定;观察信息指示进行AI决策使用的数据;第二AI模型是终端设备根据第一AI模型信息确定的。
在一种可能的设计中,收发模块还用于向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数,其中,AI能力参数指示终端设备具备AI推理能力。
在一种可能的设计中,收发模块还用于向第一AI实体发送AI决策信息和状态信息。
在一种可能的设计中,终端设备的AI信息包括AI能力参数和/或AI更新参数。其中,若AI更新参数指示定时AI更新或事件触发AI更新,收发模块还用于向第一AI实体发送反馈信息,该反馈信息用于指示进行AI训练使用的数据。
在一种可能的设计中,若AI能力参数指示终端设备具备AI训练能力,处理模块还用于根据AI训练数据,获取第二AI模型。其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种可能的设计中,收发模块还用于向第一AI实体发送第二AI模型信息。收发模块还可以接收第一AI实体发送的更新后的第一AI模型信息,更新后的第一AI模型信息是第一AI实体根据第二AI模型信息确定的。
在一种可能的设计中,反馈信息包括奖励信息。其中,奖励信息用于更新所述第一AI模型。
在一种可能的设计中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
第七方面,本申请实施例提供一种第一AI实体,该第一AI实体包括预处理模块和智能决策模块。其中,预处理模块用于接收终端设备发送的观察信息,该观察信息指示进行AI决策使用的数据。智能决策模块用于根据观察信息和第一AI模型,确定终端设备的AI决策信息。智能决策模块还用于向终端设备发送AI决策信息。
在一种可能的设计中,预处理模块还用于接收终端设备的AI信息,该AI信息包括AI能力参数。
在一种可能的设计中,若终端设备的AI能力参数指示终端设备无AI能力,预处理模块用于接收终端设备发送的观察信息。
在一种可能的设计中,预处理模块还用于对观察信息进行预处理,得到对应的状态信息。智能决策模块还用于将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。
第八方面,本申请实施例提供一种终端设备,该终端设备包括收发模块和处理模块。其中,收发模块用于向第一AI实体发送观察信息,该观察信息指示进行AI决策使用的数据。收发模块还用于接收第一AI实体发送的终端设备的AI决策信息。处理模块用于根据AI决策信息执行决策。
在一种可能的设计中,收发模块还用于向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数。其中,AI能力参数指示终端设备无AI能力。
在一种可能的设计中,终端设备的AI决策信息是第一AI实体将状态信息输入第一AI模型进行推理得到的;状态信息是第一AI实体根据观察信息得到的。
第九方面,本申请实施例提供一种第一AI实体,该实体具有实现第一方面所提供的信息处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
第十方面,本申请实施例提供一种终端设备,该设备具有实现第二方面所提供的信息处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
第十一方面,本申请实施例提供一种第一AI实体,该实体具有实现第三方面所提供的信息处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
第十二方面,本申请实施例提供一种终端设备,该设备具有实现第四方面所提供的信息处理方法的功能。该功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。该硬件或软件包括一个或多个与上述功能相对应的模块。
第十三方面,本申请实施例提供一种通信***,该通信***包括上述第五方面、第七方面、第九方面或第十一方面提供的第一AI实体,以及第六方面、第八方面、第十方面或第十二方面提供的终端设备。
第十四方面,本申请实施例提供一种计算机可读存储介质,该可读存储介质包括程序或指令,当所述程序或指令在计算机上运行时,使得计算机执行第一方面或第一方面中任一种可能实现方式中的方法。
第十五方面,本申请实施例提供一种计算机可读存储介质,该可读存储介质包括程序或指令,当所述程序或指令在计算机上运行时,使得计算机执行第二方面或第二方面中任一种可能实现方式中的方法。
第十六方面,本申请实施例提供一种计算机可读存储介质,该可读存储介质包括程序或指令,当所述程序或指令在计算机上运行时,使得计算机执行第三方面或第三方面中任一种可能实现方式中的方法。
第十七方面,本申请实施例提供一种计算机可读存储介质,该可读存储介质包括程序或指令,当所述程序或指令在计算机上运行时,使得计算机执行第四方面或第四方面中任一种可能实现方式中的方法。
第十八方面,本申请实施例提供一种芯片或者芯片***,该芯片或者芯片***包括至少一个处理器和接口,接口和至少一个处理器通过线路互联,至少一个处理器用于运行计算机程序或指令,以进行第一方面或第一方面的任一种可能的实现方式中任一项所描述的方法。
第十九方面,本申请实施例提供一种芯片或者芯片***,该芯片或者芯片***包括至少一个处理器和接口,接口和至少一个处理器通过线路互联,至少一个处理器用于运行计算机程序或指令,以进行第二方面或第二方面的任一种可能的实现方式中任一项所描述的方法。
第二十方面,本申请实施例提供一种芯片或者芯片***,该芯片或者芯片***包括至少一个处理器和接口,接口和至少一个处理器通过线路互联,至少一个处理器用于运行计算机程序或指令,以进行第三方面或第三方面的任一种可能的实现方式中任一项所描述的方法。
第二十一方面,本申请实施例提供一种芯片或者芯片***,该芯片或者芯片***包括至少一个处理器和接口,接口和至少一个处理器通过线路互联,至少一个处理器用于运行计算机程序或指令,以进行第四方面或第四方面的任一种可能的实现方式中任一项所描述的方法。
其中,芯片中的接口可以为输入/输出接口、管脚或电路等。
上述方面中的芯片***可以是片上***(system on chip,SOC),也可以是基带芯片等,其中基带芯片可以包括处理器、信道编码器、数字信号处理器、调制解调器和接口模块等。
在一种可能的实现中,本申请中上述描述的芯片或者芯片***还包括至少一个存储器,该至少一个存储器中存储有指令。该存储器可以为芯片内部的存储模块,例如,寄存器、缓存等,也可以是该芯片的存储模块(例如,只读存储器、随机存取存储器等)。
第二十二方面,本申请实施例提供一种计算机程序或计算机程序产品,包括代码或指令,当代码或指令在计算机上运行时,使得计算机执行第一方面或第一方面中任一种可能实现方式中的方法。
第二十三方面,本申请实施例提供一种计算机程序或计算机程序产品,包括代码或指令,当代码或指令在计算机上运行时,使得计算机执行第二方面或第二方面中任一种可能实现方式中的方法。
第二十四方面,本申请实施例提供一种计算机程序或计算机程序产品,包括代码或指令,当代码或指令在计算机上运行时,使得计算机执行第三方面或第三方面中任一种可能实现方式中的方法。
第二十五方面,本申请实施例提供一种计算机程序或计算机程序产品,包括代码或指令,当代码或指令在计算机上运行时,使得计算机执行第四方面或第四方面中任一种可能实现方式中的方法。
附图说明
图1为一种智能体与环境之间的交互的示意图;
图2为一种马尔可夫决策过程的示意图;
图3a为本申请实施例提供的一种网络架构的示意图;
图3b为本申请实施例提供的一种5G RAN架构的示意图;
图4为本申请实施例提供的一种RAN架构的示意图;
图5为本申请实施例提供的一种信息处理方法的流程示意图;
图6为本申请实施例提供的一种终端设备无AI能力时的信息处理的流程图;
图7为本申请实施例提供的另一种信息处理方法的流程示意图;
图8为本申请实施例提供的一种终端设备具备AI推理能力时的信息处理的流程图;
图9为本申请实施例提供的另一种信息处理方法的流程示意图;
图10为本申请实施例提供的一种联邦学习的流程示意图;
图11为本申请实施例提供的一种AI训练的流程示意图;
图12为本申请实施例提供的一种DRL在线学习的流程示意图;
图13为本申请实施例提供的一种决策早停技术的流程示意图;
图14为本申请实施例提供的一种DRL算法部署在小区的应用示意图;
图15为本申请实施例提供的一种虚拟小区辅助训练的示意图;
图16为本申请实施例提供的一种训练终端部署在真实小区的示意图;
图17为本申请实施例提供的一种第一AI实体的结构示意图;
图18为本申请实施例提供的另一种第一AI实体的结构示意图;
图19为本申请实施例提供的另一种第一AI实体的结构示意图;
图20为本申请实施例提供的另一种第一AI实体的结构示意图;
图21为本申请实施例提供的一种终端设备的结构示意图;
图22为本申请实施例提供的另一种终端设备的结构示意图;
图23为本申请实施例提供的另一种终端设备的结构示意图;
图24为本申请实施例提供的另一种终端设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。
在本申请实施例的描述之前,首先对相关概念进行阐述。
人工智能(AI)技术在图像处理与自然语言处理领域有着非常成功的应用,目前,学术界在将AI技术应用于网络层(如网络优化,移动性管理,资源分配等)和物理层(如信道编译码,信道预测、接收机等)等方面均有大量研究。
人工智能(artificial intelligence,AI)技术在图像处理与自然语言处理领域有着非常成功的应用。例如,将AI技术应用于网络层(如网络优化,移动性管理,资源分配等),或者将AI技术应用于物理层(如信道编译码,信道预测、接收机等)等方面。其中,比较常用的AI技术有监督学习和强化学习等。
其中,监督学习是指利用一组已知类别的样本调整分类器的参数,使其达到所要求性能的过程,也称为监督训练。监督学习的目标是给定一个训练集,学习训练集中输入和输出的映射关系。其中,训练集为正确的输入与输出的映射关系的集合。监督学习方法是目前研究较为广泛的一种机器学习方法,举例来说,监督学习方法包括神经网络传播算法、决策树学习算法等。
其中,强化学习是智能体(agent)与环境(environment)以交互的方式进行学习。请 参见图1,图1为一种智能体与环境之间的交互的示意图。其中,智能体可以根据环境反馈的状态(state),对环境做出动作(action),从而获得奖励(reward)及下一个时刻的状态,使智能体可以在一段时间内积累最大的奖赏。
强化学习不同于监督学习,主要表现在无需训练集,强化学习中由环境提供的强化信号对产生动作的好坏进行评价(通常采用标量信号),而不是告诉强化学习***如何去产生正确的动作。由于外部环境提供的信息很少,智能体需要靠自身的经历进行学习。通过这种方式,智能体在行动-评价的环境中获得知识,改进行动方案以适应环境。
常见的强化学习算法有Q学习(Q-learning),策略梯度(policy gradient),演员-批评家(actor-critic)等。例如,目前常用的强化学习算法为深度强化学习(deep reinforcement learning,DRL),其主要将强化学习与深度学习结合,采用神经网络对策略/价值函数进行建模,从而适应更大输入/输出维度。
AI技术中通常可以采用多种数学模型进行推理,以获取AI决策。其中,数学模型可以包括但不限于马尔可夫决策过程、神经网络等模型。例如,请参见图2,图2为一种马尔可夫决策过程(Markov decision processes,MDP)的示意图。其中,马尔可夫决策过程是一种分析决策问题的数学模型,其假设环境具有马尔可夫性质(环境的未来状态的条件概率分布仅依赖于当前状态),决策者通过周期性地观察环境的状态(如图2中的s 0、s 1等),根据当前环境的状态做出决策(如图2中的a 0、a 1等),与环境交互后得到新的状态及奖励(如图2中的r 0、r 1等),如图2所示。
随着未来移动通信网络技术的演进,新无线接入技术(new radio access technology,NR)对接入网的架构进行了重新的定义。请参见图3a,图3a为本申请实施例提供的一种网络***的示意图。其中,该网络***包括核心网(5GC)、接入网(NG-RAN)以及终端设备。其中,5GC与NG-RAN通过NG接口进行信息交互;NG-RAN中的接入网设备(例如gNB)之间可以通过Xn接口进行信息交互。终端设备可以与接入网设备通过无线链路相连接,实现终端设备与接入网设备之间的信息交互。
其中,网络***可以包括但不限于:全球移动通信***(global system for mobile communications,GSM)、宽带码分多址***(wideband code division multiple access,WCDMA)、长期演进***(long term evolution,LTE)、新一代无线接入技术(new radio access technology,NR)中的增强型移动宽带(enhanced mobile broadband,eMBB)场景、超可靠低时延通信(ultra-reliable low latency communications,uRLLC)场景和海量机器类通信(massive machine type communications,mMTC)场景、窄带物联网***(narrow band-internet of things,NB-IoT)等。
其中,接入网设备可以是任意一种具有无线收发功能的设备,为覆盖范围内的终端设备提供无线通信服务。接入网设备可以包括但不限于:长期演进(long term evolution,LTE)***中的演进型基站(NodeB或eNB或e-NodeB,evolutional NodeB),新一代无线接入技术(new radio access technology,NR)中的基站(gNodeB或gNB)或收发点(transmission receiving point/transmission reception point,TRP),3GPP后续演进的基站,WiFi***中的接入节点,无线中继节点,无线回传节点,车联网、D2D通信、机器通信中承担基站功能的设备,卫星等。
其中,终端设备可以是一种具有无线收发功能的设备,或者终端设备也可以是一种芯片。所述终端设备可以是用户设备(user equipment,UE)、手机(mobile phone)、平板电脑(Pad)、带无线收发功能的电脑、虚拟现实(virtual reality,VR)终端设备、增强现实(augmented reality,AR)终端设备、车载终端设备、远程医疗(remote medical)中的无线终端、智能电网(smart grid)中的无线终端、可穿戴终端设备、车联网、D2D通信、机器通信中的终端等。
可选的,请参见图3b,图3b为本申请实施例提供的一种5G RAN的架构示意图。其中,NG RAN中的接入网设备(例如gNB)可以包括集中式模块(central unit,CU)和分布式模块(distribute unit)。CU和DU之间可以通过F1接口进行信息交互,如图3b所示。
可见,日益成熟的AI技术将对未来移动通信网络技术的演进产生重要的推动作用。例如,AI实体可以部署在接入网中以提高接入网的处理能力(如提高资源分配效率等),但是目前并未定义接入网中的AI实体与用户设备(user equipment,UE)之间的基础交互方式,无法高效将AI技术应用于无线接入网中。
本申请实施例提供一种信息处理方法,该信息处理方法可以将AI技术应用于无线接入网,有利于提高无线接入网的处理能力。
其中,该信息处理方法可以应用于本申请实施例提供的一种RAN架构中。请参见图4,图4为本申请实施例提供的一种RAN架构。其中,该RAN架构中增加了第一AI实体(AI module),该并定义了第一AI实体与gNB之间可以通过A1接口进行信息交互,如图4所示。需要注意的是,本实施例所述的第一AI实体可以位于边缘/云接入网中,有利于通过边缘计算/云计算实现相应的AI功能。
可选的,第一AI实体还可以进一步拆分为第一AI实体-集中式模块(AM-CU)和第一AI实体-分布式模块(AM-DU)。gNB也可以在物理上拆分为gNB-CU和gNB-DU。其中,AM-CU与gNB-CU之间定义了通过A1-C接口进行信息交互,AM-DU与gNB-DU之间定义了通过A1-D接口进行信息交互,如图4所示。
其中,该AI接口的通信内容可以包括但不限于AI模型的上传/下载,数据的上传/下载,gNB与第一AI实体之间的信息交互(如第一AI实体中的性能跟踪模块可以监控gNB的性能数据)等。可选的,A1接口按照功能拆分为A1-C与A1-D接口,可以对应gNB-CU和gNB-DU的功能划分,各个接口的通信内容也不相同。例如,A1-D接口传输涉及到物理层(physical,PHY)、介质访问控制(media access control,MAC)层和无线链路控制(radio link control,RLC)层的消息;A1-C接口传输涉及到更高层(如分组数据汇聚协议(packet data convergence protocol,PDCP)层)的消息。
下面将结合具体的实施例进行描述。
请参见图5,图5为本申请实施例提供的一种信息处理方法的流程示意图。其中,图5中的信息处理方法流程由第一AI实体和终端设备之间的交互实现,可以包括以下步骤:
S501,终端设备向第一AI实体发送观察信息;对应的,第一AI实体接收终端设备发送的观察信息;
S502,第一AI实体根据观察信息和第一AI模型,确定终端设备的AI决策信息;
S503,第一AI实体向终端设备发送AI决策信息;对应的,终端设备接收第一AI实 体发送的AI决策信息。
本实施例中定义了当终端设备无AI能力时,第一AI实体与终端设备之间的一种基础交互方式。其中,终端设备是否具备AI能力可以通过终端设备的AI信息来指示。终端设备的AI信息可以包括但不限于以下参数:AI能力参数(AICapabilityClass)、AI更新参数(AIUpdateType)和AI交互参数(AIInteractionType)等。
其中,AI能力参数用于指示终端设备是否具备AI能力。具体的,AI能力参数可以通过具体的参数值来指示终端设备是否具备AI能力。
例如,当AICapabilityClass的参数值为class 0时,表示终端设备无AI能力。也就是说,该终端设备不具备AI推理和/或AI训练能力,即终端设备不能实现AI功能。
又例如,当AICapabilityClass的参数值为class 1时,表示终端设备具备AI推理能力。也就是说,该终端设备可以实现部分的AI功能,如获取AI决策。
又例如,当AICapabilityClass的参数值为class 2时,表示终端设备具备AI训练能力。也就是说,该终端设备可以实现部分的AI功能,如对AI模型进行训练,以获取更优的AI模型。
又例如,当AICapabilityClass的参数值为class 3时,表示终端设备具备AI推理能力和AI训练能力。也就是说,该终端设备可以实现AI功能,如对AI模型进行训练,以获取更优的AI模型,从而获取更优的AI决策。
需要注意的是,上述AICapabilityClass的参数值仅为一种示例,AICapabilityClass的参数值还可以是其他形式,例如采用二进制数表示,本实施例不作限定。
其中,AI更新参数用于指示终端设备是否进行AI更新。AI更新是指对数据进行更新。例如,若采用的AI算法为强化学习算法,那么终端设备可以向第一AI实体发送反馈信息,以使第一AI实体进行数据更新。具体的,AI更新参数也可以通过具体的参数值来指示是否进行AI更新。
例如,当AIUpdateType的参数值为type 0时,表示不进行AI更新。
又例如,当AIUpdateType的参数值为type 1时,表示通过事件触发进行AI更新。也就是说,当存在外部事件触发时,例如,由于环境变化导致AI模型不适配,可以通过长期KPI恶化事件触发AI更新。
又例如,当AIUpdateType的参数值为type 2时,表示定时触发进行AI更新。举例来说,***可以设置一个时间参数,该时间参数可以指示每隔一个预设的时间段,***将触发进行AI更新。
需要注意的是,上述AIUpdateType的参数值仅为一种示例,AIUpdateType的参数值还可以是其他形式,例如采用二进制数表示,本实施例不作限定。
其中,AI交互参数用于指示终端设备与第一AI实体之间的交互内容。本实施例中的终端设备与第一AI实体之间的交互内容可以包括但不限于数据、模型等。
终端设备和第一AI实体之间交互的数据是指用于进行AI推理和/或进行AI训练的数据,可以包括但不限于状态信息、观察信息等。
例如,当第一AI实体采用的是强化学习算法时,状态信息可以是如图1所示的强化学习算法中的环境反馈的状态。终端设备和第一AI实体之间交互的模型是指用于进行AI推 理和/或进行AI训练的模型,根据第一AI实体采用的AI算法对应不同的AI模型,本实施例不作限定。
具体的,AI交互参数可以通过具体的参数值来指示终端设备与第一AI实体之间的交互内容。
例如,当AIInteractionType的参数值为type 0时,表示终端设备和第一AI实体之间的交互内容包括上传数据和/或下载数据。
又例如,当AIInteractionType的参数值为type 1时,表示终端设备和第一AI实体之间的交互内容包括上传数据和/或下载模型。
又例如,当AIInteractionType的参数值为type 2时,表示终端设备和第一AI实体之间的交互内容包括上传模型和/或下载模型。
需要注意的是,上述AIInteractionType的参数值仅为一种示例,AIInteractionType的参数值还可以是其他形式,例如采用二进制数表示,本实施例不作限定。
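For illustration only, the three AI information fields described above (AICapabilityClass, AIUpdateType, AIInteractionType) could be carried as simple enumerations. The Python sketch below is a hypothetical encoding; the numeric values merely mirror the example classes and types given in the text and are not a defined signalling format.

```python
from dataclasses import dataclass
from enum import IntEnum

class AICapabilityClass(IntEnum):
    NO_AI = 0                    # class 0: no AI capability
    INFERENCE = 1                # class 1: AI inference capability
    TRAINING = 2                 # class 2: AI training capability
    INFERENCE_AND_TRAINING = 3   # class 3: both inference and training

class AIUpdateType(IntEnum):
    NONE = 0                     # type 0: no AI update
    EVENT_TRIGGERED = 1          # type 1: update triggered by an event (e.g. long-term KPI degradation)
    PERIODIC = 2                 # type 2: timer-triggered update

class AIInteractionType(IntEnum):
    DATA_AND_DATA = 0            # type 0: upload data and/or download data
    DATA_AND_MODEL = 1           # type 1: upload data and/or download model
    MODEL_AND_MODEL = 2          # type 2: upload model and/or download model

@dataclass
class AIInformation:             # hypothetical container for the terminal device's AI information
    capability: AICapabilityClass
    update: AIUpdateType
    interaction: AIInteractionType

info = AIInformation(AICapabilityClass.INFERENCE, AIUpdateType.PERIODIC, AIInteractionType.DATA_AND_MODEL)
```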
可选的,在终端设备向第一AI实体发送观察信息之前,终端设备可以向第一AI实体发送终端设备的AI信息。其中,终端设备的AI信息包括可以包括上文实施例所述的AI能力参数、AI更新参数或AI交互参数中的一种或多种。
例如,当终端设备与第一AI实体建立通信连接后,终端设备可以向第一AI实体发送业务请求消息(例如资源分配请求消息),该业务请求消息中可以携带终端设备的AI信息,以使第一AI实体知晓终端设备是否具备AI能力。
其中,第一AI实体为接入网中新增的一种实体,该第一AI实体具备AI推理以及AI训练等AI功能。具体的,第一AI实体按照功能可以划分为多个功能模块,包括智能决策模块(intelligent policy function,IPF)、数据收集与训练模块(data and training function,DTF)、预处理模块(pre-processing function,PPF)、性能跟踪模块(performance monitoring function,PMF)等模块,各个模块分别用于执行相应的功能。
可选的,本实施例中的S501,终端设备向第一AI实体发送观察信息,可以是终端设备向第一AI实体的预处理模块发送观察信息。
其中,观察信息指示进行AI决策使用的数据。也就是说,观察信息是提供给AI决策使用的数据。例如,当终端设备向接入网请求资源调度时,终端设备向预处理模块发送的观察信息可以包括终端设备的吞吐量等数据。
可选的,本实施例中的S502,第一AI实体根据观察信息和第一AI模型,确定终端设备的AI决策信息,可以是第一AI实体的智能决策模块执行的。
其中,第一AI模型为第一AI实体中进行AI推理和/或AI训练的模型,也就是说,第一AI模型为边缘/云中的AI模型。根据采用的AI算法的不同,第一AI模型可以包括多种类型。例如,当采用的AI算法为深度强化学习时,第一AI模型可以是全连接神经网络模型。
可选的,本实施例中的S502还可以是第一AI实体的预处理模块和智能决策模块分别执行的,包括以下两个步骤:
预处理模块对观察信息进行预处理,得到对应的状态信息;
智能决策模块将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。
其中,由于第一AI模型不能直接对观察信息进行处理,那么预处理模块可以先对观察信息进行预处理(例如,对数据进行归一化处理),得到状态信息。
其中,状态信息为采用AI模型进行推理时可以直接使用的数据,例如,状态信息可以指如图2所示的马尔可夫决策过程中的***状态(如s 0、s 1等),也可以指经过预处理的数据(隐马尔可夫模型中状态无法直接得到)。
智能决策模块可以将状态信息输入第一AI模型进行推理。例如,当第一AI实体采用的是强化学习算法时,智能决策模块可以是如图1所示的强化学习算法中的智能体,可以对环境做出动作,即得到终端设备的AI决策信息。
终端设备的AI决策信息是第一AI实体根据进行AI决策使用的数据进行AI推理得到的结果。
例如,当第一AI实体采用的是强化学习算法时,AI决策信息即为智能体输出的动作。具体来说,当终端设备向接入网请求资源调度时,第一AI实体进行AI推理得到的资源分配结果即终端设备的AI决策信息。
需要注意的是,相较于传统的资源分配方法,第一AI实体将AI技术应用于接入网中的资源分配,可以更针对性地为对应的终端设备分配资源,从而有利于优化整体网络性能。
可选的,本实施例中的S503,第一AI实体向终端设备发送AI决策信息,可以是第一AI实体的智能决策模块向终端设备发送AI决策信息。
可选的,请参见图6,图6为本申请实施例提供的一种终端设备无AI能力时的信息处理的流程图。其中,由于终端设备无AI能力,那么终端设备可以选择向边缘/云的第一AI实体请求AI决策。
需要注意的是,终端设备在采用图6所示的信息处理方法获取AI决策的时延较大,该方法适用于对时延不敏感的业务。
S601,终端设备向预处理模块发送观察信息,观察信息指示进行AI决策使用的数据;对应的,预处理模块接收终端设备发送的观察信息;
S602,预处理模块对观察信息进行预处理,得到对应的状态信息;
S603,预处理模块向智能决策模块发送状态信息;对应的,智能决策模块接收预处理模块发送的状态信息;
S604,智能决策模块将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息;
S605,智能决策模块向终端设备发送AI决策信息;对应的,终端设备接收智能决策模块发送的AI决策信息;
S606,终端设备根据AI决策信息执行决策。
上述S601至S606为终端设备的AIUpdateType的参数值为type 0时的整体处理流程。可选的,当AIUpdateType的参数值为type 1或type 2时,图6所示的信息处理流程还包括AI训练数据收集的过程,包括以下步骤:
S607,智能决策模块将状态信息和AI决策信息发送给数据收集与训练模块;
S608,数据收集与训练模块接收反馈信息。
其中,反馈信息用于指示进行AI训练使用的数据。根据AI算法的不同,数据收集与 训练模块接收的反馈信息也不相同。
例如,若第一AI实体采用的AI算法为强化学习,那么数据收集与训练模块接收终端设备或性能跟踪模块发送的奖励信息。那么S608可以包括两个并行的步骤S608a和S608b。其中,S608a为终端设备向数据收集与训练模块发送反馈信息;S608b为性能跟踪模块向数据收集与训练模块发送反馈信息。
又例如,若第一AI实体采用的AI算法为监督学习,那么数据收集与训练模块接收终端设备发送的标签信息。
需要注意的是,S607与S606在执行时并没有先后顺序,也就是说,S606与S607可以同时执行。
本申请实施例提供一种信息处理方法,该方法定义了一种第一AI实体与终端设备之间的基础交互方式。当终端设备无AI能力时,终端设备可以通过接入网中的第一AI实体来实现AI功能,得到终端设备的AI决策信息。也就是说,接入网中的第一AI实体实现了将AI技术应用于无线接入网,有利于提高无线接入网的处理能力。
请参见图7,图7为本申请实施例提供的另一种信息处理方法的流程示意图。其中,图7中的信息处理方法流程由第一AI实体和终端设备之间的交互实现,可以包括以下步骤:
S701,终端设备向第一AI实体发送请求消息;对应的,第一AI实体接收终端设备发送的请求消息;
S702,第一AI实体向终端设备发送第一AI模型信息;对应的,终端设备接收第一AI实体发送的第一AI模型信息;
S703,终端设备将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。
本实施例中定义了当终端设备具备AI推理能力时,第一AI实体与终端设备之间的一种基础交互方式。也就是说,当终端设备的AICapabilityClass参数值为class 1时,终端设备可以实现AI推理,获取AI决策。
其中,由于终端设备只具备AI推理能力,而不具备AI训练能力,那么终端设备需要向第一AI实体发送请求消息以获取AI模型。其中,该请求消息用于向第一AI实体请求第一AI模型信息。第一AI模型信息可以是第一AI模型,也可以是第一AI模型的相关参数。
例如,当第一AI模型为神经网络时,第一AI模型信息可以是一个整体的神经网络,也可以是神经网络的相关参数(如该神经网络的层数,神经元的数目等)。
可选的,上述S701和S702的执行可以根据终端设备的AI更新参数来确定。也就是说,若AIUpdateType的参数值为type 0,则S701和S702只在初始化的时候执行一次。
若AIUpdateType的参数值为type1,则S701和S702根据事件触发执行,例如,性能跟踪模块监控到***性能发生恶化,则触发更新等。
若AIUpdateType的参数值为type2,则S701和S702定时执行。
可选的,本实施例中的终端设备具备AI推理能力,那么该终端设备可以包括多个AI功能模块。例如,该终端设备包括预处理模块和智能决策模块,用于实现AI推理过程。
需要注意的是,终端设备也可以通过本地的第二AI实体来实现AI推理功能。其中,本地的第二AI实体为与终端设备的物理上的距离较近的AI实体,可以是终端设备的外接设备,本实施例不作限定。
下面以终端设备自身包括多个AI功能模块为例进行描述。
请参见图8,图8为本申请实施例提供的一种终端设备具备AI推理能力时的信息处理的流程图。其中,终端设备具有AI推理能力,即终端设备至少包括智能决策模块。终端设备可以向边缘/云的第一AI实体请求第一AI模型信息,并在本地完成AI推理,得到AI决策信息。
需要注意的是,终端设备在采用图8所示的信息处理方法获取AI决策信息的时延较小,该方法适用于对时延敏感的业务。
为了便于描述,图8所示的实施例中的终端设备包括的AI功能模块分别称为第二模块,例如,终端设备的智能决策模块称为第二智能决策模块。图8所示的实施例中的第一AI实体包括的AI功能模块分别称为第一模块,例如,第一AI实体的智能决策模块称为第一智能决策模块。该信息处理的流程可以包括以下步骤:
S801,第二智能决策模块向第一智能决策模块发送请求消息;对应的,第一智能决策模块接收第二智能决策模块发送的请求消息;
S802,第一智能决策模块向第二智能决策模块发送第一AI模型信息;对应的,第二智能决策模块接收第一智能决策模块发送的第一AI模型信息;
S803,第二预处理模块获取观察信息;
S804,第二预处理模块对观察信息进行预处理,得到对应的状态信息;
S805,第二预处理模块向第二智能决策模块发送状态信息;对应的,第二智能决策模块接收第二预处理模块发送的状态信息;
S806,第二智能决策模块将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息;
S807,终端设备根据AI决策信息执行决策。
上述S801至S807为终端设备的AIUpdateType的参数值为type 0时的整体处理流程。可选的,当AIUpdateType的参数值为type 1或type 2时,图8所示的信息处理流程还包括AI训练数据收集的过程,包括以下步骤:
S808,第二智能决策模块将状态信息和AI决策信息发送给第一数据收集与训练模块;
S809,第一数据收集与训练模块接收反馈信息。
其中,反馈信息用于指示进行AI训练使用的数据。根据AI算法的不同,数据收集与训练模块接收的反馈信息也不相同。
例如,若第一AI实体采用的AI算法为强化学习,那么第一数据收集与训练模块接收终端设备或性能跟踪模块发送的奖励信息。那么S809可以包括两个并行的步骤S809a和S809b。其中,S809a为终端设备向第一数据收集与训练模块发送反馈信息;S809b为第一性能跟踪模块向第一数据收集与训练模块发送反馈信息。
又例如,若第一AI实体采用的AI算法为监督学习,那么第一数据收集与训练模块接收终端设备发送的标签信息。那么S809为第一数据收集与训练模块接收终端设备发送的标签信息。
需要注意的是,S807与S808在执行时并没有先后顺序,也就是说,S807与S808可以同时执行。
可选的,当本地的第二AI实体是终端设备的外接设备时,在S806之后,上述处理流程还可以包括以下步骤:
S8071,第二智能决策模块向终端设备发送终端设备的AI决策信息;
S808a,终端设备根据AI决策信息执行决策。
其中,S8071表示该步骤在S806之后执行,替代原S807。S808a表示该步骤与S808无先后执行顺序,即S808a可以与S808同时执行。
下面以一个具体的示例对图8所述的终端设备与第一AI实体之间交互的流程以及交互的信息进行详细的描述。其中,本示例中的终端设备实现的AI功能为使用AI进行信道译码。那么上述S801至S809具体可以包括以下步骤:
第二智能决策模块向第一智能决策模块发送请求消息,该请求消息用于请求信道译码模型;
第一智能决策模块向第二智能决策模块发送信道译码模型信息;
第二智能决策模块根据该信道译码模型信息,确定终端设备的信道译码模型;
第二预处理模块接收信号,该信号为待译码的数据;
第二预处理模块对该信号进行预处理,得到该信号的对数似然比;
第二预处理模块向第二智能决策模块发送该信号的对数似然比;
第二智能决策模块将该信号的对数似然比输入终端设备的信道译码模型进行推理,得到该信号的译码数据;
终端设备使用该信号的译码数据。
可选的,若终端设备的AIUpdateType的参数值为type 1或type 2,还包括以下步骤:
第二智能决策模块将该信号的对数似然比和该信号的译码数据发送给第一数据收集与训练模块;
第一数据收集与训练模块接收标签信息,该标签信息包括正确的译码数据;或者,第一数据收集与训练模块接收奖励信息,当正确译码时该奖励信息为1,译码失败时该奖励信息为0。
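As an illustration of the pre-processing and inference steps in this channel decoding example, the sketch below computes BPSK log-likelihood ratios over an AWGN channel (LLR = 2y/σ²) and then makes a hard decision on their sign as a stand-in for the channel decoding model, which the text does not specify; the received samples and noise variance are made up.

```python
import numpy as np

def llr_bpsk_awgn(y: np.ndarray, noise_var: float) -> np.ndarray:
    """Pre-processing: LLRs of received BPSK symbols (bit 0 -> +1, bit 1 -> -1) over AWGN."""
    return 2.0 * y / noise_var

def decode(llrs: np.ndarray) -> np.ndarray:
    """Placeholder for the channel decoding model: hard decision on the LLR sign."""
    return (llrs < 0).astype(int)   # negative LLR -> bit 1 under the mapping above

y = np.array([0.9, -1.1, 0.3, -0.2])            # hypothetical received signal
bits = decode(llr_bpsk_awgn(y, noise_var=0.5))  # decoded data used by the terminal device
```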
本申请实施例提供一种信息处理方法,该方法定义了另一种第一AI实体与终端设备之间的基础交互方式。当终端设备具备AI推理能力时,终端设备可以根据第一AI模型进行推理得到终端设备的AI决策信息,从而实现相应的AI功能。
请参见图9,图9为本申请实施例提供的另一种信息处理方法的流程示意图。其中,图9中的信息处理方法流程由第一AI实体和终端设备之间的交互实现,可以包括以下步骤:
S901,终端设备向第一AI实体发送第二AI模型信息;对应的,第一AI实体接收终端设备发送的第二AI模型信息;
S902,第一AI实体根据第二AI模型信息,更新第一AI模型信息;
S903,第一AI实体向终端设备发送更新后的第一AI模型信息;对应的,终端设备接收第一AI实体发送的更新后的第一AI模型信息。
本实施例中定义了当终端设备具备AI训练能力时,第一AI实体与终端设备之间的一种基础交互方式。也就是说,当终端设备的AICapabilityClass参数值为class 2时,终端设备可以训练AI模型。
其中,第二AI模型信息为终端设备或第二AI实体中的AI模型信息。类似于第一AI模型信息,第二AI模型信息可以是第二AI模型,也可以是第二AI模型的相关参数,本实施例不作限定。
其中,第一AI模型和/或第二AI模型都是对应的第一数据收集与训练模块和/或第二数据收集与训练模块通过训练得到的。例如,当第一AI模型和/或第二AI模型采用的是神经网络时,可以采用神经网络的训练方式对第一AI模型和/或第二AI模型进行训练。
举例来说,数据收集与训练模块可以随机初始化一个神经网络,每一次的训练为用已有数据从随机的神经元的权重矩阵和偏置向量中得到新的神经网络的过程。在训练过程中,可以采用损失函数(loss function)对神经网络的输出结果进行评价,并将误差反向传播,通过梯度下降的方法可以迭代优化,直至损失函数达到最小值。也就是说,数据收集与训练模块可以通过上述迭代优化的过程对AI模型进行训练,得到更优的AI模型。
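To make the training loop described above concrete, the sketch below trains a toy one-layer model with a mean-squared-error loss by gradient descent until the loss drops below a threshold; it is a generic NumPy illustration under these assumptions, not the actual procedure of the data collection and training module.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))                                        # training inputs
y = X @ rng.normal(size=(8, 1)) + 0.1 * rng.normal(size=(256, 1))    # training targets

W = rng.normal(size=(8, 1))   # randomly initialised weights
b = np.zeros((1, 1))          # bias
lr = 0.05

for step in range(500):
    pred = X @ W + b
    err = pred - y
    loss = float(np.mean(err ** 2))                # loss function evaluating the output
    grad_W = 2.0 * X.T @ err / len(X)              # error propagated back to the weights
    grad_b = 2.0 * err.mean(axis=0, keepdims=True)
    W -= lr * grad_W                               # gradient-descent update
    b -= lr * grad_b
    if loss < 1e-2:                                # stop once the loss is small enough
        break
```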
可选的,第二AI模型信息不包括终端设备的用户数据。也就是说,终端设备向第一AI实体发送的第二AI模型信息与终端设备本身的数据无关,有利于终端设备的隐私保护。
可选的,第二AI模型信息也可以包括终端设备的用户数据,以使训练后的AI模型更优,有利于获取更适用的AI决策信息。
在一种示例中,请参见图10,图10为本申请实施例提供的一种联邦学习的流程示意图。其中,图10所示的联邦学习流程为当终端设备具备AI训练能力时,第一AI实体与终端设备之间的一种基础交互方式的具体应用的示例。该联邦学习流程包括以下步骤:
S1001,第二智能决策模块向第二数据收集与训练模块发送AI训练数据请求消息;
S1002,第二数据收集与训练模块向第二智能决策模块发送第二AI训练数据;
S1003,第二智能决策模块根据第二AI训练数据,训练第二AI模型;
S1004,第二智能决策模块向第一数据收集与训练模块发送第二AI模型信息;
S1005,第一智能决策模块向第一数据收集与训练模块发送AI训练数据请求消息;
S1006,第一数据收集与训练模块向第一智能决策模块发送第一AI训练数据;
S1007,第一智能决策模块根据第一AI训练数据,训练第一AI模型;
S1008,第一智能决策模块向第二智能决策模块发送训练后的第一AI模型信息。
其中,第一AI训练数据是指第一AI实体中的AI训练数据,第二AI训练数据是指终端设备中的AI训练数据。第一AI模型是指第一AI实体中的AI模型,第二AI模型是指终端设备中的AI模型。
其中,第二智能决策模块向第一数据收集与训练模块发送第二AI模型信息的步骤可以是定时触发的。也就是说,一个或多个本地的终端设备可以定时向云端上传一个或多个第二AI模型信息,云端可以保存本地上传的第二AI模型信息。
当云端的第一AI实体训练更新第一AI模型后,第一AI实体可以将训练后的第一AI模型信息下发给本地。本地再对第一AI模型进行训练更新,以此循环。该循环流程可以是无限循环,也可以是设置一个阈值(如损失函数),当损失函数小于阈值时停止循环,联邦学习流程结束。
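A minimal sketch of one round of the federated-learning loop described above: each terminal refines the downloaded model on its own data and uploads only the resulting parameters (the second AI model information), and the first AI entity aggregates them into an updated first model. Parameter averaging is a common aggregation choice and is used here purely as an assumption.

```python
import numpy as np

def local_train(global_w: np.ndarray, local_data, lr: float = 0.01, steps: int = 10) -> np.ndarray:
    """Terminal side: refine the downloaded first-model weights on local data only."""
    w = global_w.copy()
    X, y = local_data
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(X)   # gradient of an MSE loss
        w -= lr * grad
    return w                                       # uploaded second AI model info (no user data)

def aggregate(client_weights) -> np.ndarray:
    """First AI entity side: update the first model from the uploaded second models."""
    return np.mean(client_weights, axis=0)

rng = np.random.default_rng(1)
global_w = rng.normal(size=(4, 1))                                     # first AI model
clients = [(rng.normal(size=(64, 4)), rng.normal(size=(64, 1))) for _ in range(2)]
uploads = [local_train(global_w, data) for data in clients]            # local training at each terminal
global_w = aggregate(uploads)                                          # updated first AI model sent back
```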
在一种示例中,当终端设备具备AI训练能力时,终端设备也可以进行本地的AI训练。也就是说,上述AI训练交互流程可以是终端设备内部的模块之间的交互,通过AI训练获 取第二AI模型信息。
类似的,由于第一AI实体具备AI训练能力,上述AI训练交互流程也可以是第一AI实体内部的模块之间的交互,通过AI训练获取第一AI模型信息。
下面对终端设备进行本地训练,或者第一AI实体进行云端训练进行详细的举例说明。
请参见图11,图11为本申请实施例提供的一种AI训练的流程示意图。为了便于描述,图10中的智能决策模块和/或数据收集与训练模块可以指代第一/第二智能决策模块,和/或,第一/第二数据收集与训练模块。
类似的,图11中的AI训练数据和/或AI模型可以指代第一/第二AI训练数据,和/或,第一/第二AI模型。
S1101,智能决策模块向数据收集与训练模块发送AI训练数据请求消息;
S1102,数据收集与训练模块向智能决策模块发送AI训练数据;
S1103,智能决策模块根据AI训练数据,训练AI模型。
其中,AI训练数据可以包括但不限于AI决策信息、状态信息或反馈信息等。例如,当上述AI训练流程是终端设备内部的AI训练流程时,第二智能决策模块可以根据状态信息,更新第二AI模型。又例如,当上述AI训练流程是第一AI实体内部的AI训练流程时,第一智能决策模块可以根据AI决策信息,更新第一AI模型。
在一种示例中,当终端设备具备AI推理能力和AI训练能力时,终端设备可以通过内部模块的实现AI推理和AI训练的过程。也就是说,当终端设备的AICapabilityClass参数值为class 3时,终端设备可以训练AI模型,并且进行AI推理,得到AI决策信息。
其中,终端设备进行AI推理和AI训练的过程即为将前文实施例中的终端设备进行AI推理的过程与终端设备进行AI训练的过程进行结合得到,具体可以参考前文图8和图11所示的实施例中的详细描述,在此不再赘述。
本申请实施例提供一种信息处理方法,该方法定义了另一种第一AI实体与终端设备之间的基础交互方式。当终端设备具备AI训练能力时,终端设备可以对本地的第一AI模型进行训练更新,也可以与云端的第一AI实体进行交互,训练更新第一AI模型,从而使AI模型更适用于不同的应用场景。
基于上文实施例中的描述,下面对本申请实施例所述的信息处理方法应用于不同场景时的具体实现方式进行详细的描述。
在一种示例中,假设终端设备或第一AI实体采用的AI算法为DRL。在DRL中,***的奖励(reward)函数可以作为指示算法最终收敛的性能指标。其中,DRL在线学习流程可以由终端设备和第一AI实体之间的交互实现,也可以是具备AI推理能力和/或AI训练能力的终端设备的内部模块实现的。
下面以终端设备和第一AI实体之间的交互实现为例进行详细的描述。
请参见图12,图12为本申请实施例提供的一种DRL在线学习的流程示意图。该DRL在线学习流程包括以下步骤:
S1201,第一数据收集与训练模块向第一性能跟踪模块发送奖励函数请求消息;
S1202,第一性能跟踪模块向第一数据收集与训练模块发送奖励函数;
S1203a,终端设备向第一数据收集与训练模块发送奖励信息;
S1203b,第一性能跟踪模块向第一数据收集与训练模块发送奖励函数更新指示信息;
S1204,第一数据收集与训练模块根据奖励信息更新奖励函数。
其中,第一性能跟踪模块可以监控***的长期关键绩效指标(key performance indicator,KPI),该KPI可以用于指导第一数据收集与训练模块生成奖励函数R(θ,φ)。其中,R表示奖励;目标参数θ为终端设备执行AI决策信息得到的性能数据,如吞吐量、丢包率等。其中,目标参数的权重值φ为第一AI实体根据一个或多个终端设备的性能数据确定的,用于指示不同短期KPI的权重。也就是说,目标参数的权重值φ可以是第一AI实体中的第一性能跟踪模块对***中的所有终端设备的性能进行长期监测得到的。
其中,S1203a和S1203b无先后执行顺序,即S1203a可以与S1203b同时执行。
可选的,S1203b可以是定时发生,也可以是环境变化等因素导致AI模型不适配,从而触发奖励函数更新。例如,长期KPI恶化触发奖励函数的自适应过程中,可以是第一数据收集与训练模块向第一性能跟踪模块发送奖励函数请求消息,以请求奖励函数的更新。
下面通过一个具体的示例来说明在DRL调度的过程中,***如何自适应地调整奖励函数。
假设奖励函数为R(θ,φ)=α×thp+β×jfi+γ×pdr,其中,目标参数θ={thp,jfi,pdr}即包含三类性能数据,分别表示吞吐量、公平性参数和丢包率。φ={α,β,γ}即包含上述三类性能数据的权重,假设初始值φ={1,1,1}。若***运行一段时间后由于突发事件,导致PMF监测到公平性恶化,则触发奖励函数更新,以使上述三类性能数据的权重更新为φ={1,2,1}。
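The weight adaptation in this example can be sketched directly; the trigger threshold and the new fairness weight are assumptions chosen only to mirror the φ = {1,1,1} → {1,2,1} update described above.

```python
def reward(theta: dict, phi: dict) -> float:
    # R(theta, phi) = alpha*thp + beta*jfi + gamma*pdr
    return phi["alpha"] * theta["thp"] + phi["beta"] * theta["jfi"] + phi["gamma"] * theta["pdr"]

phi = {"alpha": 1.0, "beta": 1.0, "gamma": 1.0}   # initial weights

def on_long_term_kpi(jfi_history, phi, threshold=0.7):
    """PMF-style check: if long-term fairness degrades, raise its weight in the reward function."""
    if sum(jfi_history) / len(jfi_history) < threshold:   # assumed degradation criterion
        phi["beta"] = 2.0                                 # phi becomes {1, 2, 1}
    return phi

phi = on_long_term_kpi([0.65, 0.60, 0.62], phi)   # long-term fairness degradation triggers the update
```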
在一种示例中,假设终端设备或第一AI实体采用的AI算法为DRL。为了避免DRL在线学习中探索对***造成灾难性的影响,本申请实施例提供一种决策早停技术,该决策早停技术可以通过性能跟踪模块对***性能进行预测,并判断是否会出现灾难性的性能损失,从而及早避免探索对***造成灾难性的影响。
请参见图13,图13为本申请实施例提供的一种决策早停技术的流程示意图。其中,该决策早停技术流程可以由终端设备和第一AI实体之间的交互实现,也可以是具备AI推理能力和/或AI训练能力的终端设备的内部模块实现的。
也就是说,图13中的多个AI功能模块可以是云端的第一AI实体中的功能模块,也可以是本地的终端设备内部或外接的第二AI实体中的功能模块,本实施例不作限定。
S1301,终端设备向预处理模块发送观察信息;
S1302,预处理模块对观察信息进行预处理,得到对应的状态信息;
S1303,预处理模块向智能决策发送状态信息;
S1304,智能决策模块进行模型推理得到终端设备的AI决策信息;
S1305,性能跟踪模块对***性能进行预测,获得决策掩码信息和/或惩罚信息;
S1306,性能跟踪模块向智能决策模块发送决策掩码信息;
S1307,智能决策模块根据决策掩码信息对AI决策信息进行掩码处理,得到掩码处理后的AI决策信息;
S1308a,性能跟踪模块向数据收集与训练模块发送状态信息、决策掩码信息和惩罚信息中的一种或多种;
S1308b,智能决策模块向终端设备发送掩码处理后的AI决策信息;
S1309a,终端设备根据掩码处理后的AI决策信息执行决策;
S1309b,智能决策模块向数据收集与训练模块发送状态信息及掩码处理后的AI决策信息;
S1310,终端设备向数据收集与训练模块发送反馈信息。
其中,性能跟踪模块需要具备长期性能预测能力。例如,性能跟踪模块需要根据***目前状态和模型所做的决策,判断是否会出现灾难性的性能损失。
可选的,本实施例所述的决策早停技术还可以包括模型同步的步骤。也就是说,在终端设备向预处理模块发送观察信息之前,还可以包括以下步骤:智能决策模块向性能跟踪模块发送AI模型信息。
其中,模型同步的步骤,以及S1308a这两个步骤是否需要取决于性能跟踪模块的预测能力。也就是说,若性能跟踪模块的预测能力较强,那么上述模型同步以及S1308a这两个步骤均为可选的步骤。
其中,决策掩码信息用于对AI决策信息进行掩码处理,以使降低***性能的部分被处理掉。例如,若接入***的某一个或多个用户将显著降低***性能,那么性能跟踪模块可以将该一个或多个用户的AI决策的权重降至最低,那么该一个或多个用户将不再执行对应的AI决策。决策掩码信息可以直接根据预测结果得到,也可以通过性能跟踪模块内的备份算法得到。
可选的,性能跟踪模块进行预测获得决策掩码信息和/或惩罚信息后,还可以将决策掩码信息和/或惩罚信息作为一个训练样本,将该训练样本发送给数据收集与训练模块。
可见,相较于没有决策早停的方案,图13所述的决策早停方案中一次采样将获得两组训练样本,提高了DRL的采样效率。
下面以DRL调度过程为例来说明图13所述的决策早停方案。
例如,***针对5个用户的调度决策,DRL产生的决策权重可能是{1.5,1.1,1.2,0.2,0}。但是在一种可能的情况下,用户0和用户4的预计吞吐量为0,那么在这种情况下调度用户0和/或用户4必然会带来***资源的浪费。
性能跟踪模块在预测到这种情况之后,可以产生决策掩码,例如该5个用户的决策掩码分别为{0,1,1,1,0}。根据上述决策掩码,性能跟踪模块可以得到进行掩码处理后的决策权重分别为{0,1.1,1.2,0.2,0}。那么根据该决策权重信息,***将会调度用户2。可见,该调度有利于降低***资源的浪费,优化***的整体性能。
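The masking step in this scheduling example can be written out directly with the numbers used in the text; selecting the scheduled user by argmax over the masked weights is an assumption about how the weights are consumed.

```python
import numpy as np

weights = np.array([1.5, 1.1, 1.2, 0.2, 0.0])   # DRL decision weights for users 0..4
mask = np.array([0, 1, 1, 1, 0])                # decision mask from the performance monitoring module

masked = weights * mask                          # -> [0.0, 1.1, 1.2, 0.2, 0.0]
scheduled_user = int(np.argmax(masked))          # user 2 is scheduled
```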
在一种示例中,本申请实施例提供一种DRL算法部署在小区的应用示例。当DRL算法在各个小区部署上线后,该DRL算法可以分为两个阶段:智能体的模仿学习阶段和智能体在线强化学习阶段,如图14所示。其中,本示例所述的智能体可以是第一AI实体,也 可以是具备AI推理能力和/或AI训练能力的本地的第二AI实体。
其中,智能体的模仿学习阶段为第一阶段。该第一阶段中,智能体需要训练数据对智能体进行初始化训练。例如,基站采用传统调度算法进行初始化训练,并且保存整个调度过程中的轨迹信息,使得基站可以根据保存的轨迹信息进行监督学习,从而实现对基站进行初始化训练。
可选的,为了解决强化学习需要大量交互数据的问题,本申请实施例提出智能体的模仿学习阶段可以是一种虚拟小区(virtual cell,vCell)辅助训练的过程。其中,第一AI实体可以获取小区的基础真实信息,用于训练生成vCell。其中,小区的基础真实信息可以包括但不限于小区内的终端设备的位置信息、移动性信息、业务信息、信道信息等相关信息。vCell一般由神经网络构成。
例如,在智能体的模仿学习阶段,第一AI实体可以采用生成对抗网络(generative adversarial networks,GAN)算法。其中,GAN的训练过程的原理是先固定生成网络并训练鉴别网络,使之能区分真实数据与虚拟数据;随后固定鉴别网络,训练生成网络,使生成网络生成的虚拟数据与真实数据尽量相似,然后交替直至收敛。
基于上述原理,第一AI实体可以获取真实数据与生成网络得到的虚拟数据,交替训练鉴别网络和生成网络。例如,第一AI实体可以获取小区内的终端设备的位置信息、移动性信息、业务信息、信道信息等相关信息,并将上述相关信息输入生成网络得到虚拟数据。第一AI实体中的数据收集与训练模块可以对虚拟数据进行训练,即根据虚拟数据交替训练鉴别网络和生成网络,从而生成vCell,如图15所示。
可选的,vCell可以进一步分解为多个虚拟用户设备(virtual UE,vUE)和虚拟环境(virtual environment,vEnv)。其中vUE用于对UE进行建模,vEnv用于对环境进行建模。例如,vUE可以采用multi-agentGAN算法,确定UE的位置信息、移动性信息和业务信息等。又例如,vEnv可以采用conditional GAN算法,根据UE位置信息、地形信息、天气信息等生成对应的传输信道。
其中,智能体的在线强化学习阶段为第二阶段。该第二阶段中,智能体可以与已完成训练的vCell进行交互。其中,由于与vCell交互的代价和风险远小于和真实小区进行交互,那么vCell的引入将会大大提高智能体的收敛速度。
可选的,智能体也可以进行在线训练。也就是说,本示例中的智能体可以根据如图12和13所示的深度强化学习流程进行在线训练。具体的实现流程请参考图12和图13所示的实施例中的描述,在此不再赘述。
可选的,本申请实施例还提供一种训练终端,用于辅助DRL算法的在线训练。举例来说,请参见图16,图16为本申请实施例提供的一种训练终端部署在真实小区的示意图。其中,在真实小区内可以部署多个和/或多种训练终端(training UE,tUE),各个tUE可以与智能体之间进行交互,交互的方式可以包括但不限于通过模仿学习进行交互、通过强化学习进行交互等,如图16所示。
其中,tUE具有以下特点:直接与强化学习算法交互;可以在空闲时获取大量训练样本;可以对小区内非通信可感知的数据进行采集;可以提供增强覆盖服务;可以是固定位置的设备,也可以是移动的设备。也就是说,tUE可以是本申请实施例所述的终端设备包 括的类型中的任意一种或多种。
例如,tUE具备在空闲时获取大量训练样本的特点,那么tUE可以在夜晚采集大量的训练样本。又例如,tUE具备对小区内非通信可感知的数据进行采集的特点,那么tUE可以采集天气信息,地形信息,阻挡物信息等可以作为训练样本的数据,并用于vCell建模。再例如,tUE具备提供增强覆盖服务的特点,那么tUE还可以是小站,无人机等设备。
可见,tUE可以有效获取训练数据且不影响实际业务,显著提高训练效率。
以下结合图17至图24详细说明本申请实施例的相关设备。
本申请实施例提供一种第一AI实体,如图17所示。该第一AI实体用于实现上述方法实施例中的第一AI实体所执行的方法,具体包括预处理模块1701和智能决策模块1702。
其中,预处理模块1701用于接收终端设备发送的观察信息,该观察信息指示进行AI决策使用的数据。智能决策模块1702用于根据观察信息和第一AI模型,确定终端设备的AI决策信息。智能决策模块1702还用于向终端设备发送AI决策信息。
在一种实现方式中,预处理模块1701还用于接收终端设备的AI信息,该AI信息包括AI能力参数。
在一种实现方式中,若终端设备的AI能力参数指示终端设备无AI能力,预处理模块1701用于接收终端设备发送的观察信息。
在一种实现方式中,预处理模块1701还用于对观察信息进行预处理,得到对应的状态信息。智能决策模块1702还用于将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。
示例性的,上述预处理模块1701可以用于执行图5中的S501和图6中的S601至S603,智能决策模块1702用于执行图5中的S502和S503,以及图6中的S604、S605和S607。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
在一种实现方式中,图17中的各个模块所实现的相关功能可以通过收发器和处理器来实现。请参见图18,图18是本申请实施例提供的一种第一AI实体的结构示意图,该第一AI实体可以为具有执行本申请实施例所述的信息处理功能的设备(例如芯片)。
该第一AI实体可以包括收发器1801、至少一个处理器1802和存储器1803。其中,收发器1801、处理器1802和存储器1803可以通过一条或多条通信总线相互连接,也可以通过其它方式相连接。
其中,收发器1801可以用于发送信息,或者接收信息。可以理解的是,收发器1801是统称,可以包括接收器和发送器。例如,接收器用于接收终端设备发送的观察信息。又例如,发送器用于向终端设备发送AI决策信息。
在一种实现方式中,收发器1801可以用于实现图17所示的预处理模块和智能决策模块的部分或全部功能。
其中,处理器1802可以用于对信息进行处理。例如,处理器1802可以调用存储器1803中存储的程序代码,实现根据观察信息和第一AI模型,确定终端设备的AI决策信息。
其中,处理器1802可以包括一个或多个处理器,例如该处理器1802可以是一个或多个中央处理器(central processing unit,CPU),网络处理器(network processor,NP),硬件 芯片或者其任意组合。在处理器1802是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
在一种实现方式中,处理器1802可以用于实现图17所示的预处理模块和智能决策模块的部分或全部功能。
其中,存储器1803用于存储程序代码等。存储器1803可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM);存储器1803也可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);存储器1803还可以包括上述种类的存储器的组合。
其中,上述处理器1802和存储器1803可以通过接口耦合,也可以集成在一起,本实施例不作限定。
上述收发器1801和处理器1802可以用于实现本申请实施例中的信息处理方法,其中,具体实现方式如下:
收发器1801用于接收终端设备发送的观察信息,该观察信息指示进行AI决策使用的数据。处理器1802用于根据观察信息和第一AI模型,确定终端设备的AI决策信息。收发器1801还用于向终端设备发送AI决策信息。
在一种实现方式中,收发器1801还用于接收终端设备的AI信息,该AI信息包括AI能力参数。
在一种实现方式中,若终端设备的AI能力参数指示终端设备无AI能力,收发器1801用于接收终端设备发送的观察信息。
在一种实现方式中,处理器1802还用于对观察信息进行预处理,得到对应的状态信息,再将状态信息输入第一AI模型进行推理,得到终端设备的AI决策信息。
示例性的,上述收发器1801可以用于执行图5中的S501和S503,以及图6中的S601、S603和S605,处理器1802用于执行图5中的S502,以及图6中的S602和S604。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
本申请实施例提供另一种第一AI实体,如图19所示。该第一AI实体用于实现上述方法实施例中的第一AI实体所执行的方法,具体包括智能决策模块1901、预处理模块1902、数据收集与训练模块1903和性能跟踪模块1904。其中,智能决策模块1901用于接收终端设备发送的第二AI模型信息,第二AI模型信息不包括终端设备的用户数据。智能决策模块1901还用于根据第二AI模型信息,更新第一AI模型信息;第一AI模型信息为第一AI实体的AI模型信息。智能决策模块1901还用于向终端设备发送更新后的第一AI模型信息。
在一种实现方式中,智能决策模块1901还用于接收终端设备发送的请求消息,该请求消息用于请求第一AI模型信息。智能决策模块1901还用于向终端设备发送第一AI模型信息。
在一种实现方式中,预处理模块1902用于接收终端设备的AI信息,该AI信息包括AI能力参数。
在一种实现方式中,若AI能力参数指示终端设备具备AI推理能力,智能决策模块1901 还用于接收终端设备发送的AI决策信息和状态信息;其中,AI决策信息是终端设备将状态信息输入第二AI模型进行推理得到的,状态信息是终端设备根据观察信息得到的;观察信息指示进行AI决策使用的数据。
在一种实现方式中,预处理模块1902还用于接收终端设备的AI信息,该AI信息包括AI更新参数。若AI更新参数指示定时AI更新或事件触发AI更新,数据收集与训练模块1903用于接收反馈信息,该反馈信息用于指示进行AI训练使用的数据。
在一种实现方式中,智能决策模块1901还用于根据AI训练数据,更新第一AI模型;其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种实现方式中,反馈信息包括奖励信息;奖励信息用于更新所述第一AI模型。
在一种实现方式中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
在一种实现方式中,性能跟踪模块1904用于向数据收集与训练模块1903发送奖励信息。
示例性的,上述智能决策模块1901可以用于执行图9中的S901至S903,以及图10中的S1005至S1008,预处理模块1902用于执行前文实施例中的接收终端设备的AI信息的步骤,数据收集与训练模块1903用于执行图8中的S809a和S809b,图12中的S1203a、S1203b和S1204,图13中的S1309b和S1310。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
在一种实现方式中,图19中的各个模块所实现的相关功能可以通过收发器和处理器来实现。请参见图20,图20是本申请实施例提供的一种第一AI实体的结构示意图,该第一AI实体可以为具有执行本申请实施例所述的信息处理功能的设备(例如芯片)。
其中,第一AI实体可以包括收发器2001、至少一个处理器2002和存储器2003。其中,收发器2001、处理器2002和存储器2003可以通过一条或多条通信总线相互连接,也可以通过其它方式相连接。
其中,收发器2001可以用于发送信息,或者接收信息。可以理解的是,收发器2001是统称,可以包括接收器和发送器。例如,接收器用于接收终端设备发送的第二AI模型信息。又例如,发送器用于向终端设备发送更新后的第一AI模型信息。
在一种实现方式中,收发器2001可以用于实现图19所示的智能决策模块1901、预处理模块1902、数据收集与训练模块1903和性能跟踪模块1904的部分或全部功能。
其中,处理器2002可以用于对信息进行处理。例如,处理器2002可以调用存储器2003中存储的程序代码,实现根据第二AI模型信息,更新第一AI模型信息。
其中,处理器2002可以包括一个或多个处理器,例如该处理器2002可以是一个或多个中央处理器(central processing unit,CPU),网络处理器(network processor,NP),硬件芯片或者其任意组合。在处理器2002是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
在一种实现方式中,处理器2002可以用于实现图19所示的智能决策模块1901、预处 理模块1902、数据收集与训练模块1903和性能跟踪模块1904的部分或全部功能。
其中,存储器2003用于存储程序代码等。存储器2003可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM);存储器2003也可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);存储器2003还可以包括上述种类的存储器的组合。
其中,上述处理器2002和存储器2003可以通过接口耦合,也可以集成在一起,本实施例不作限定。
上述收发器2001和处理器2002可以用于实现本申请实施例中的信息处理方法,其中,具体实现方式如下:
收发器2001,用于接收终端设备发送的第二AI模型信息,第二AI模型信息不包括终端设备的用户数据;
处理器2002,用于根据第二AI模型信息,更新第一AI模型信息;第一AI模型信息为第一AI实体的AI模型信息;
收发器2001还用于向终端设备发送更新后的第一AI模型信息。
在一种实现方式中,收发器2001还用于:
接收终端设备发送的请求消息,该请求消息用于请求第一AI模型信息;
向终端设备发送第一AI模型信息。
在一种实现方式中,收发器2001还用于:
接收终端设备的AI信息,该AI信息包括AI能力参数。
在一种实现方式中,若AI能力参数指示终端设备具备AI推理能力,收发器2001还用于:
接收终端设备发送的AI决策信息和状态信息;其中,AI决策信息是终端设备将状态信息输入第二AI模型进行推理得到的,状态信息是终端设备根据观察信息得到的;观察信息指示进行AI决策使用的数据。
在一种实现方式中,收发器2001还用于接收终端设备的AI信息,该AI信息包括AI更新参数;
若AI更新参数指示定时AI更新或事件触发AI更新,收发器2001还用于接收反馈信息,该反馈信息用于指示进行AI训练使用的数据。
在一种实现方式中,处理器2002还用于:
根据AI训练数据,更新第一AI模型;其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种实现方式中,反馈信息包括奖励信息;奖励信息用于更新所述第一AI模型。
在一种实现方式中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
示例性的,上述收发器1801可以用于执行图9中的S901和S903,图10中的S1004,图12中的S1201至S1203a,以及图13中的S1301和S1308b。处理器1802用于执行图9 中的S902,图10中的S1005至S1007,以及图12中S1204。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
本申请实施例提供一种终端设备,如图21所示。该终端设备用于实现上述方法实施例中的终端设备所执行的方法,具体包括收发模块2101和处理模块2102。其中,收发模块2101用于向第一AI实体发送观察信息,该观察信息指示进行AI决策使用的数据。收发模块2101还用于接收第一AI实体发送的终端设备的AI决策信息。处理模块2102用于根据AI决策信息执行决策。
在一种实现方式中,收发模块2101还用于向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数,其中,AI能力参数指示终端设备无AI能力。
在一种实现方式中,终端设备的AI决策信息是第一AI实体将状态信息输入第一AI模型进行推理得到的;状态信息是第一AI实体根据观察信息得到的。
示例性的,上述收发模块2101可以用于执行图5中的S501和S503,图6中的S601和S605。处理模块2102用于执行图6中的S606。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
在一种实现方式中,图21中的各个模块所实现的相关功能可以通过收发器和处理器来实现。请参见图22,图22是本申请实施例提供的一种终端设备的结构示意图,该终端设备可以为具有执行本申请实施例所述的信息处理功能的设备(例如芯片)。
其中,终端设备可以包括收发器2201、至少一个处理器2202和存储器2203。其中,收发器2201、处理器2202和存储器2203可以通过一条或多条通信总线相互连接,也可以通过其它方式相连接。
其中,收发器2201可以用于发送信息,或者接收信息。可以理解的是,收发器2201是统称,可以包括接收器和发送器。例如,接收器用于接收第一AI实体发送的终端设备的AI决策信息。又例如,发送器用于向第一AI实体发送观察信息。
在一种实现方式中,收发器2201可以用于实现图21所示的收发模块2101的部分或全部功能。
其中,处理器2202可以用于对信息进行处理。例如,处理器2202可以调用存储器2203中存储的程序代码,实现根据AI决策信息执行决策。
其中,处理器2202可以包括一个或多个处理器,例如该处理器2202可以是一个或多个中央处理器(central processing unit,CPU),网络处理器(network processor,NP),硬件芯片或者其任意组合。在处理器2202是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
在一种实现方式中,处理器2201可以用于实现图21所示的处理模块2102的部分或全部功能。
其中,存储器2203用于存储程序代码等。存储器2203可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM);存储器2203也可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快 闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);存储器2203还可以包括上述种类的存储器的组合。
其中,上述处理器2202和存储器2203可以通过接口耦合,也可以集成在一起,本实施例不作限定。
上述收发器2201和处理器2202可以用于实现本申请实施例中的信息处理方法,其中,具体实现方式如下:
收发器2201用于向第一AI实体发送观察信息,该观察信息指示进行AI决策使用的数据。收发器2201还用于接收第一AI实体发送的终端设备的AI决策信息。处理器2202用于根据AI决策信息执行决策。
在一种实现方式中,收发器2201还用于向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数,其中,AI能力参数指示终端设备无AI能力。
在一种实现方式中,终端设备的AI决策信息是第一AI实体将状态信息输入第一AI模型进行推理得到的;状态信息是第一AI实体根据观察信息得到的。
示例性的,上述收发器2201可以用于执行图5中的S501和S503,图6中的S601和S605。处理器2202用于执行图6中的S606。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
本申请实施例提供另一种终端设备,如图23所示。该终端设备用于实现上述方法实施例中的终端设备所执行的方法,具体包括收发模块2301和处理模块2302。其中,收发模块2301用于向第一AI实体发送请求消息,该请求消息用于请求第一AI模型信息。收发模块2301还用于接收第一AI实体发送的第一AI模型信息。处理模块2302用于将状态信息输入第二AI模型进行推理,得到终端设备的AI决策信息;其中,状态信息基于观察信息确定;观察信息指示进行AI决策使用的数据;第二AI模型是终端设备根据第一AI模型信息确定的。
在一种实现方式中,收发模块2301还用于向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数,其中,AI能力参数指示终端设备具备AI推理能力。
在一种实现方式中,收发模块2301还用于向第一AI实体发送AI决策信息和状态信息。
在一种实现方式中,终端设备的AI信息包括AI能力参数和/或AI更新参数;收发模块2301还用于若AI更新参数指示定时AI更新或事件触发AI更新,向第一AI实体发送反馈信息,该反馈信息用于指示进行AI训练使用的数据。
在一种实现方式中,处理模块2302还用于若AI能力参数指示终端设备具备AI训练能力,根据AI训练数据,获取第二AI模型;其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种实现方式中,收发模块2301还用于向第一AI实体发送第二AI模型信息。收发模块2301还用于接收第一AI实体发送的更新后的第一AI模型信息,更新后的第一AI模型信息是第一AI实体根据第二AI模型信息确定的。
在一种实现方式中,反馈信息包括奖励信息;奖励信息用于更新所述第一AI模型。
在一种实现方式中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参 数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
示例性的,上述收发模块2301可以用于执行图7中的S701和S702,图8中的S801和S802,图9中的S901和S903,图10中的S1004和S1008,图12中的S1203a,图13中的S1301和S1308b。处理模块2302用于执行图7中的S703,图8中的S803、S804和S808,图10中的S1003,图13中的S1309a。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
在一种实现方式中,图23中的各个模块所实现的相关功能可以通过收发器和处理器来实现。请参见图24,图24是本申请实施例提供的一种终端设备的结构示意图,该终端设备可以为具有执行本申请实施例所述的信息处理功能的设备(例如芯片)。
其中,终端设备可以包括收发器2401、至少一个处理器2402和存储器2403。其中,收发器2401、处理器2402和存储器2403可以通过一条或多条通信总线相互连接,也可以通过其它方式相连接。
其中,收发器2401可以用于发送信息,或者接收信息。可以理解的是,收发器2401是统称,可以包括接收器和发送器。例如,接收器用于接收第一AI实体发送的第一AI模型信息。又例如,发送器用于向第一AI实体发送请求消息。
在一种实现方式中,收发器2401可以用于实现图23所示的收发模块2301的部分或全部功能。
其中,处理器2402可以用于对信息进行处理。例如,处理器2402可以调用存储器2403中存储的程序代码,实现根据AI决策信息执行决策。
其中,处理器2402可以包括一个或多个处理器,例如该处理器2402可以是一个或多个中央处理器(central processing unit,CPU),网络处理器(network processor,NP),硬件芯片或者其任意组合。在处理器2402是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。
在一种实现方式中,处理器2402可以用于实现图23所示的处理模块2302的部分或全部功能。
其中,存储器2403用于存储程序代码等。存储器2403可以包括易失性存储器(volatile memory),例如随机存取存储器(random access memory,RAM);存储器2403也可以包括非易失性存储器(non-volatile memory),例如只读存储器(read-only memory,ROM),快闪存储器(flash memory),硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD);存储器2403还可以包括上述种类的存储器的组合。
其中,上述处理器2402和存储器2403可以通过接口耦合,也可以集成在一起,本实施例不作限定。
上述收发器2401和处理器2402可以用于实现本申请实施例中的信息处理方法,其中,具体实现方式如下:
收发器2401用于向第一AI实体发送请求消息,该请求消息用于请求第一AI模型信息。收发器2401还用于接收第一AI实体发送的第一AI模型信息。处理器2402用于将状态信 息输入第二AI模型进行推理,得到终端设备的AI决策信息;其中,状态信息基于观察信息确定;观察信息指示进行AI决策使用的数据;第二AI模型是终端设备根据第一AI模型信息确定的。
在一种实现方式中,收发器2401还用于向第一AI实体发送终端设备的AI信息,该AI信息包括AI能力参数,其中,AI能力参数指示终端设备具备AI推理能力。
在一种实现方式中,收发器2401还用于向第一AI实体发送AI决策信息和状态信息。在一种实现方式中,终端设备的AI信息包括AI能力参数和/或AI更新参数。若AI更新参数指示定时AI更新或事件触发AI更新,收发器2401还用于向第一AI实体发送反馈信息,该反馈信息用于指示进行AI训练使用的数据。
在一种实现方式中,收发器2401还用于若AI能力参数指示终端设备具备AI训练能力,根据AI训练数据,获取第二AI模型;其中,AI训练数据包括AI决策信息、状态信息或反馈信息中的一种或多种。
在一种实现方式中,收发器2401还用于向第一AI实体发送第二AI模型信息。收发器2401还用于接收第一AI实体发送的更新后的第一AI模型信息,更新后的第一AI模型信息是第一AI实体根据第二AI模型信息确定的。
在一种实现方式中,反馈信息包括奖励信息;奖励信息用于更新所述第一AI模型。
在一种实现方式中,奖励信息是根据奖励函数确定的。其中,奖励函数是根据目标参数θ和目标参数的权重值φ确定的。目标参数为终端设备执行AI决策信息得到的性能数据,目标参数的权重值是第一AI实体根据一个或多个终端设备的性能数据确定的。
示例性的,上述收发器2401可以用于执行图7中的S701和S702,图8中的S801和S802,图9中的S901和S903,图10中的S1004和S1008,图12中的S1203a,图13中的S1301和S1308b。处理器2402用于执行图7中的S703,图8中的S803、S804和S808,图10中的S1003,图13中的S1309a。
可理解,以上所示的各个模块所执行的方法仅为示例,对于该各个模块具体所执行的步骤可参照上文介绍的方法。
本申请实施例提供一种通信***,该通信***包括前述实施例所述的终端设备和第一AI实体。
本申请实施例提供一种计算机可读存储介质,该计算机可读存储介质存储有程序或指令,当所述程序或指令在计算机上运行时,使得计算机执行本申请实施例中的信息处理方法。
本申请实施例提供一种芯片或者芯片***,该芯片或者芯片***包括至少一个处理器和接口,接口和至少一个处理器通过线路互联,至少一个处理器用于运行计算机程序或指令,以进行本申请实施例中的信息处理方法。
其中,芯片中的接口可以为输入/输出接口、管脚或电路等。
上述方面中的芯片***可以是片上***(system on chip,SOC),也可以是基带芯片等,其中基带芯片可以包括处理器、信道编码器、数字信号处理器、调制解调器和接口模块等。
在一种实现方式中,本申请中上述描述的芯片或者芯片***还包括至少一个存储器,该至少一个存储器中存储有指令。该存储器可以为芯片内部的存储模块,例如,寄存器、 缓存等,也可以是该芯片的存储模块(例如,只读存储器、随机存取存储器等)。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机指令时,全部或部分地产生按照本申请实施例所述的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(Digital Subscriber Line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。
计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(Digital Video Disc,DVD))、或者半导体介质(例如,固态硬盘(Solid State Disk,SSD))等。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的模块及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (51)

  1. 一种信息处理方法,其特征在于,应用于接入网中的第一人工智能AI实体;所述方法包括:
    所述第一AI实体接收终端设备发送的第二AI模型信息,所述第二AI模型信息不包括所述终端设备的用户数据;
    所述第一AI实体根据所述第二AI模型信息,更新第一AI模型信息;所述第一AI模型信息为所述第一AI实体的AI模型信息;
    所述第一AI实体向所述终端设备发送更新后的第一AI模型信息。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    所述第一AI实体接收所述终端设备发送的请求消息,所述请求消息用于请求第一AI模型信息;
    所述第一AI实体向所述终端设备发送所述第一AI模型信息。
  3. 根据权利要求2所述的方法,其特征在于,所述第一AI实体接收所述终端设备发送的请求消息之前,所述方法还包括:
    所述第一AI实体接收所述终端设备的AI信息,所述AI信息包括AI能力参数。
  4. 根据权利要求3所述的方法,其特征在于,若所述AI能力参数指示所述终端设备具备AI推理能力,所述第一AI实体接收所述终端设备发送的AI决策信息和状态信息,所述AI决策信息是所述终端设备将所述状态信息输入所述第二AI模型进行推理得到的,所述状态信息是所述终端设备根据观察信息得到的;所述观察信息指示进行AI决策使用的数据。
  5. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    所述第一AI实体接收所述终端设备的AI信息,所述AI信息包括AI更新参数;
    若所述AI更新参数指示定时AI更新或事件触发AI更新,所述第一AI实体接收反馈信息,所述反馈信息用于指示进行AI训练使用的数据。
  6. 根据权利要求5所述的方法,其特征在于,所述方法还包括:
    所述第一AI实体根据AI训练数据,更新所述第一AI模型,所述AI训练数据包括所述AI决策信息、所述状态信息或所述反馈信息中的一种或多种。
  7. 根据权利要求5所述的方法,其特征在于,所述反馈信息包括奖励信息;所述奖励信息用于更新所述第一AI模型。
  8. 根据权利要求7所述的方法,其特征在于,所述奖励信息是根据奖励函数确定的;所述奖励函数是根据目标参数θ和所述目标参数的权重值φ确定的;所述目标参数为所述终端设备执行所述AI决策信息得到的性能数据,所述目标参数的权重值是所述第一AI实体根据一个或多个终端设备的性能数据确定的。
  9. 一种信息处理方法,其特征在于,包括:
    终端设备向第一人工智能AI实体发送请求消息,所述请求消息用于请求第一AI模型信息;
    所述终端设备接收所述第一AI实体发送的第一AI模型信息;
    所述终端设备将状态信息输入第二AI模型进行推理,得到所述终端设备的AI决策信息;其中,所述状态信息基于观察信息确定;所述观察信息指示进行AI决策使用的数据;所述第二AI模型是所述终端设备根据所述第一AI模型信息确定的。
  10. 根据权利要求9所述的方法,其特征在于,所述终端设备向所述第一AI实体发送请求消息之前,所述方法还包括:
    所述终端设备向所述第一AI实体发送所述终端设备的AI信息,所述AI信息包括AI能力参数,其中,所述AI能力参数指示所述终端设备具备AI推理能力。
  11. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    所述终端设备向所述第一AI实体发送所述AI决策信息和所述状态信息。
  12. 根据权利要求9所述的方法,其特征在于,所述终端设备的AI信息包括AI能力参数和/或AI更新参数;所述方法还包括:
    若所述AI更新参数指示定时AI更新或事件触发AI更新,所述终端设备向所述第一AI实体发送反馈信息,所述反馈信息用于指示进行AI训练使用的数据。
  13. 根据权利要求12所述的方法,其特征在于,所述方法还包括:
    若所述AI能力参数指示所述终端设备具备AI训练能力,所述终端设备根据AI训练数据,获取第二AI模型信息,所述AI训练数据包括所述AI决策信息、所述状态信息或所述反馈信息中的一种或多种。
  14. 根据权利要求13所述的方法,其特征在于,所述方法还包括:
    所述终端设备向所述第一AI实体发送所述第二AI模型信息;
    所述终端设备接收所述第一AI实体发送的更新后的第一AI模型信息,所述更新后的第一AI模型信息是所述第一AI实体根据所述第二AI模型信息确定的。
  15. 根据权利要求13所述的方法,其特征在于,所述反馈信息包括奖励信息;所述奖励信息用于更新所述第一AI模型。
  16. 根据权利要求15所述的方法,其特征在于,所述奖励信息是根据奖励函数确定的;所述奖励函数是根据目标参数θ和所述目标参数的权重值φ确定的;所述目标参数为所述终端设备执行所述AI决策信息得到的性能数据,所述目标参数的权重值是所述第一AI实体根据一个或多个终端设备的性能数据确定的。
  17. 一种第一人工智能AI实体,其特征在于,包括:
    智能决策模块,用于接收终端设备发送的第二AI模型信息,所述第二AI模型信息不包括所述终端设备的用户数据;
    所述智能决策模块还用于根据所述第二AI模型信息,更新第一AI模型信息;所述第一AI模型信息为所述第一AI实体的AI模型信息;
    所述智能决策模块还用于向所述终端设备发送更新后的第一AI模型信息。
  18. 根据权利要求17所述的实体,其特征在于,所述智能决策模块还用于:
    接收所述终端设备发送的请求消息,所述请求消息用于请求第一AI模型信息;
    向所述终端设备发送所述第一AI模型信息。
  19. 根据权利要求18所述的实体,其特征在于,所述第一AI实体还包括预处理模块;所述预处理模块用于:
    接收所述终端设备的AI信息,所述AI信息包括AI能力参数。
  20. 根据权利要求19所述的实体,其特征在于,若所述AI能力参数指示所述终端设备具备AI推理能力,所述智能决策模块还用于:
    接收所述终端设备发送的AI决策信息和状态信息;其中,所述AI决策信息是所述终端设备将所述状态信息输入所述第二AI模型进行推理得到的,所述状态信息是所述终端设备根据观察信息得到的;所述观察信息指示进行AI决策使用的数据。
  21. 根据权利要求17所述的实体,其特征在于,所述预处理模块还用于接收所述终端设备的AI信息,所述AI信息包括AI更新参数;
    所述第一AI实体还包括数据收集与训练模块;若所述AI更新参数指示定时AI更新或事件触发AI更新,所述数据收集与训练模块用于接收反馈信息,所述反馈信息用于指示进行AI训练使用的数据。
  22. 根据权利要求21所述的实体,其特征在于,所述智能决策模块还用于:
    根据AI训练数据,更新所述第一AI模型,所述AI训练数据包括所述AI决策信息、所述状态信息或所述反馈信息中的一种或多种。
  23. 根据权利要求21所述的实体,其特征在于,所述反馈信息包括奖励信息;所述奖励信息用于更新所述第一AI模型。
  24. 根据权利要求23所述的实体,其特征在于,所述奖励信息是根据奖励函数确定的;所述奖励函数是根据目标参数θ和所述目标参数的权重值φ确定的;所述目标参数为所述终端设备执行所述AI决策信息得到的性能数据,所述目标参数的权重值是所述第一AI实体根据一个或多个终端设备的性能数据确定的。
  25. 一种终端设备,其特征在于,包括:
    收发模块,用于向第一人工智能AI实体发送请求消息,所述请求消息用于请求第一AI模型信息;
    所述收发模块还用于接收所述第一AI实体发送的第一AI模型信息;
    处理模块,用于将状态信息输入第二AI模型进行推理,得到所述终端设备的AI决策信息;其中,所述状态信息基于观察信息确定;所述观察信息指示进行AI决策使用的数据;所述第二AI模型是所述终端设备根据所述第一AI模型信息确定的。
  26. 根据权利要求25所述的设备,其特征在于,所述收发模块还用于:
    向所述第一AI实体发送所述终端设备的AI信息,所述AI信息包括AI能力参数,其中,所述AI能力参数指示所述终端设备具备AI推理能力。
  27. 根据权利要求25所述的设备,其特征在于,所述收发模块还用于:
    向所述第一AI实体发送所述AI决策信息和所述状态信息。
  28. 根据权利要求25所述的设备,其特征在于,所述终端设备的AI信息包括AI能力参数和/或AI更新参数;所述收发模块还用于:
    若所述AI更新参数指示定时AI更新或事件触发AI更新,向所述第一AI实体发送反馈信息,所述反馈信息用于指示进行AI训练使用的数据。
  29. 根据权利要求28所述的设备,其特征在于,所述处理模块还用于:
    若所述AI能力参数指示所述终端设备具备AI训练能力,根据AI训练数据,获取第二 AI模型信息;其中,所述AI训练数据包括所述AI决策信息、所述状态信息或所述反馈信息中的一种或多种。
  30. 根据权利要求29所述的设备,其特征在于,所述收发模块还用于:
    向所述第一AI实体发送所述第二AI模型信息;
    接收所述第一AI实体发送的更新后的第一AI模型信息,所述更新后的第一AI模型信息是所述第一AI实体根据所述第二AI模型信息确定的。
  31. 根据权利要求29所述的设备,其特征在于,所述反馈信息包括奖励信息;所述奖励信息用于更新所述第一AI模型。
  32. 根据权利要求31所述的设备,其特征在于,所述奖励信息是根据奖励函数确定的;所述奖励函数是根据目标参数θ和所述目标参数的权重值φ确定的;所述目标参数为所述终端设备执行所述AI决策信息得到的性能数据,所述目标参数的权重值是所述第一AI实体根据一个或多个终端设备的性能数据确定的。
  33. 一种信息处理方法,其特征在于,应用于接入网中的第一AI实体;所述方法包括:
    所述第一AI实体接收来自终端设备的观察信息,所述观察信息指示进行AI决策使用的数据;
    所述第一AI实体根据所述观察信息和第一AI模型,确定所述终端设备的AI决策信息,并向所述终端设备发送所述AI决策信息。
  34. 根据权利要求33所述的方法,其特征在于,所述第一AI实体接收所述终端设备发送的观察信息之前,所述方法还包括:
    所述第一AI实体接收来自所述终端设备的AI信息,所述AI信息包括AI能力参数,所述AI能力参数用于指示所述终端设备是否具备AI推理能力和/或AI训练能力。
  35. 根据权利要求34所述的方法,其特征在于,所述第一AI实体接收来自终端设备的观察信息,包括:
    若所述终端设备的AI能力参数指示所述终端设备无AI能力,所述第一AI实体接收所述终端设备发送的观察信息。
  36. 根据权利要求33至35任一项所述的方法,其特征在于,所述第一AI实体根据所述观察信息和第一AI模型,确定所述终端设备的AI决策信息,包括:
    所述第一AI实体对所述观察信息进行预处理,确定对应的状态信息;
    所述第一AI实体将所述状态信息输入所述第一AI模型进行推理,确定所述终端设备的AI决策信息。
  37. 一种信息处理方法,其特征在于,包括:
    终端设备向第一AI实体发送观察信息,所述观察信息指示所述第一AI实体进行AI决策使用的数据;
    所述终端设备接收来自所述第一AI实体的AI决策信息,并根据所述AI决策信息执行决策。
  38. 根据权利要求37所述的方法,其特征在于,所述终端设备向第一AI实体发送观察信息之前,所述方法还包括:
    所述终端设备向所述第一AI实体发送所述终端设备的AI信息,所述AI信息包括AI 能力参数,所述AI能力参数指示所述终端设备无AI能力。
  39. 根据权利要求37或38所述的方法,其特征在于,所述终端设备的AI决策信息是所述第一AI实体将状态信息输入第一AI模型进行推理得到的;所述状态信息是所述第一AI实体根据所述观察信息得到的。
  40. 一种第一AI实体,其特征在于,包括:
    预处理模块,用于接收来自终端设备的观察信息,所述观察信息指示进行AI决策使用的数据;
    智能决策模块,用于根据所述观察信息和第一AI模型,确定所述终端设备的AI决策信息,并向所述终端设备发送所述AI决策信息。
  41. 根据权利要求40所述的实体,其特征在于,所述预处理模块还用于接收来自所述终端设备的AI信息,所述AI信息包括AI能力参数,所述AI能力参数用于指示所述终端设备是否具备AI推理能力和/或AI训练能力。
  42. 根据权利要求41所述的实体,其特征在于,所述预处理模块用于接收来自终端设备的观察信息,包括:
    若所述终端设备的AI能力参数指示所述终端设备无AI能力,所述预处理模块接收所述终端设备发送的观察信息。
  43. 根据权利要求40至42任一项所述的实体,其特征在于,所述智能决策模块用于根据所述观察信息和第一AI模型,确定所述终端设备的AI决策信息,包括:
    对所述观察信息进行预处理,确定对应的状态信息;
    将所述状态信息输入所述第一AI模型进行推理,确定所述终端设备的AI决策信息。
  44. 一种终端设备,其特征在于,包括:
    收发模块,用于向第一人工智能AI实体发送观察信息,所述观察信息指示所述第一AI实体进行AI决策使用的数据;
    所述收发模块还用于接收来自所述第一AI实体的AI决策信息;
    处理模块,用于根据所述AI决策信息执行决策。
  45. 根据权利要求44所述的设备,其特征在于,所述收发模块还用于向所述第一AI实体发送所述终端设备的AI信息,所述AI信息包括AI能力参数,所述AI能力参数指示所述终端设备无AI能力。
  46. 根据权利要求44或45所述的设备,其特征在于,所述终端设备的AI决策信息是所述第一AI实体将状态信息输入第一AI模型进行推理得到的;所述状态信息是所述第一AI实体根据所述观察信息得到的。
  47. 一种第一AI实体,其特征在于,包括存储器和处理器;
    所述存储器,用于存储指令;
    所述处理器,用于执行所述指令,使得如权利要求1至8或33至36中任一项所述的方法被执行。
  48. 一种终端设备,其特征在于,包括:存储器和处理器;
    所述存储器,用于存储指令;
    所述处理器,用于执行所述指令,使得如权利要求9至16或37至39中任一项所述的 方法被执行。
  49. 一种计算机可读存储介质,其特征在于,包括程序或指令,当所述程序或指令在计算机上运行时,如权利要求1至8、9至16、33至36或37至39中任一项所述的方法被执行。
  50. 一种包含指令的计算机程序产品,当其在计算机上运行时,使得如权利要求1至8、9至16、33至36或37至39中任一项所述的方法被执行。
  51. 一种通信***,包括权利要求17至24任一项所述的人工智能AI实体和权利要求25至32所述的终端设备;或者包括权利要求40至43任一项所述的人工智能AI实体和权利要求44至46任一项所述的终端设备。
PCT/CN2021/095336 2020-05-30 2021-05-21 一种信息处理方法及相关设备 WO2021244334A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21816869.8A EP4152797A4 (en) 2020-05-30 2021-05-21 INFORMATION PROCESSING METHOD AND ASSOCIATED APPARATUS
US18/071,316 US20230087821A1 (en) 2020-05-30 2022-11-29 Information processing method and related device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010480881.7 2020-05-30
CN202010480881.7A CN113747462B (zh) 2020-05-30 2020-05-30 一种信息处理方法及相关设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/071,316 Continuation US20230087821A1 (en) 2020-05-30 2022-11-29 Information processing method and related device

Publications (1)

Publication Number Publication Date
WO2021244334A1 true WO2021244334A1 (zh) 2021-12-09

Family

ID=78727837

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/095336 WO2021244334A1 (zh) 2020-05-30 2021-05-21 一种信息处理方法及相关设备

Country Status (4)

Country Link
US (1) US20230087821A1 (zh)
EP (1) EP4152797A4 (zh)
CN (1) CN113747462B (zh)
WO (1) WO2021244334A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023174325A1 (zh) * 2022-03-18 2023-09-21 维沃移动通信有限公司 Ai模型的处理方法及设备
WO2023198184A1 (zh) * 2022-04-15 2023-10-19 维沃移动通信有限公司 模型调整方法、信息传输方法、装置及相关设备
WO2023207269A1 (zh) * 2022-04-29 2023-11-02 富士通株式会社 信息传输方法与装置
GB2620495A (en) * 2022-07-06 2024-01-10 Samsung Electronics Co Ltd Artificial intelligence and machine learning capability indication

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116367187A (zh) * 2021-12-28 2023-06-30 维沃移动通信有限公司 Ai模型传输方法、装置、设备及存储介质
CN116419322A (zh) * 2021-12-31 2023-07-11 维沃移动通信有限公司 Ai网络信息传输方法、装置及通信设备
CN116419321A (zh) * 2021-12-31 2023-07-11 维沃移动通信有限公司 Ai网络信息传输方法、装置及通信设备
CN116541088A (zh) * 2022-01-26 2023-08-04 华为技术有限公司 一种模型配置方法及装置
CN116847312A (zh) * 2022-03-24 2023-10-03 北京邮电大学 数据处理方法、装置、通信***、电子设备及存储介质
WO2023206512A1 (en) * 2022-04-29 2023-11-02 Qualcomm Incorporated Data collection procedure and model training
CN117501777A (zh) * 2022-06-01 2024-02-02 北京小米移动软件有限公司 人工智能模型的确定方法及装置、通信设备及存储介质
WO2023245498A1 (zh) * 2022-06-22 2023-12-28 北京小米移动软件有限公司 一种al/ml模型的数据采集方法及其装置
CN117643134A (zh) * 2022-06-23 2024-03-01 北京小米移动软件有限公司 操作配置方法、装置
CN117768875A (zh) * 2022-09-24 2024-03-26 华为技术有限公司 一种通信方法及装置
WO2024087019A1 (zh) * 2022-10-25 2024-05-02 Oppo广东移动通信有限公司 无线通信的方法及设备
WO2024138384A1 (zh) * 2022-12-27 2024-07-04 Oppo广东移动通信有限公司 通信方案监测方法和设备
CN118282869A (zh) * 2022-12-29 2024-07-02 维沃移动通信有限公司 模型更新方法、装置及设备
CN118283669A (zh) * 2022-12-30 2024-07-02 华为技术有限公司 监控或训练ai模型的方法和通信装置
WO2024148523A1 (zh) * 2023-01-10 2024-07-18 富士通株式会社 信息交互方法以及装置

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766940A (zh) * 2017-11-20 2018-03-06 北京百度网讯科技有限公司 用于生成模型的方法和装置
CN108924910A (zh) * 2018-07-25 2018-11-30 Oppo广东移动通信有限公司 Ai模型的更新方法及相关产品
CN109313586A (zh) * 2016-06-10 2019-02-05 苹果公司 使用基于云端的度量迭代训练人工智能的***
WO2019067258A1 (en) * 2017-09-29 2019-04-04 Oracle International Corporation METHODS AND SYSTEMS FOR CONFIGURING COMMUNICATION DECISION TREES BASED ON CONNECTED ELEMENTS THAT CAN BE POSITIONED ON A CANEVAS
US20190364460A1 (en) * 2018-05-23 2019-11-28 Verizon Patent And Licensing Inc. Adaptable radio access network
CN110704850A (zh) * 2019-09-03 2020-01-17 华为技术有限公司 人工智能ai模型的运行方法和装置
CN110730954A (zh) * 2019-09-09 2020-01-24 北京小米移动软件有限公司 数据处理方法、装置、电子设备和计算机可读存储介质

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11475350B2 (en) * 2018-01-22 2022-10-18 Google Llc Training user-level differentially private machine-learned models
CN108667850B (zh) * 2018-05-21 2020-10-27 浪潮集团有限公司 一种人工智能服务***及其实现人工智能服务的方法
US20210345134A1 (en) * 2018-10-19 2021-11-04 Telefonaktiebolaget Lm Ericsson (Publ) Handling of machine learning to improve performance of a wireless communications network
KR20190110073A (ko) * 2019-09-09 2019-09-27 엘지전자 주식회사 인공 지능 모델을 갱신하는 인공 지능 장치 및 그 방법

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
OPPO, CMCC, China Telecom, China Unicom, Qualcomm: "New WID on Study on traffic characteristics and performance requirements for AI/ML model transfer in 5GS", 3GPP Draft S1-193606, 3rd Generation Partnership Project (3GPP), Mobile Competence Centre, 650 Route des Lucioles, F-06921 Sophia-Antipolis Cedex, France, vol. SA WG1, 25 November 2019 (2019-11-25), Reno, Nevada, XP051831338 *
See also references of EP4152797A4

Also Published As

Publication number Publication date
CN113747462B (zh) 2024-07-19
EP4152797A4 (en) 2023-11-08
CN113747462A (zh) 2021-12-03
US20230087821A1 (en) 2023-03-23
EP4152797A1 (en) 2023-03-22

Similar Documents

Publication Publication Date Title
WO2021244334A1 (zh) Information processing method and related device
WO2021233053A1 (zh) Computation offloading method and communication apparatus
Chen et al. Echo-liquid state deep learning for 360° content transmission and caching in wireless VR networks with cellular-connected UAVs
US20230179490A1 (en) Artificial intelligence-based communication method and communication apparatus
CN114116198A (zh) Asynchronous federated learning method, system, device, and terminal for mobile vehicles
He et al. Meta-hierarchical reinforcement learning (MHRL)-based dynamic resource allocation for dynamic vehicular networks
CN111629380A (zh) Dynamic resource allocation method for high-concurrency multi-service industrial 5G networks
EP4307634A1 (en) Feature engineering programming method and apparatus
WO2023143267A1 (zh) Model configuration method and apparatus
WO2022087930A1 (zh) Model configuration method and apparatus
US20230262489A1 (en) Apparatuses and methods for collaborative learning
CN116671068A (zh) Policy determination method and apparatus
CN114548416A (zh) Data model training method and apparatus
CN115633380A (zh) Multi-edge service caching and scheduling method and system considering dynamic topology
Koudouridis et al. An architecture and performance evaluation framework for artificial intelligence solutions in beyond 5G radio access networks
WO2023246517A1 (zh) Construction method, first communication node, storage medium, and construction system
US20220391731A1 (en) Agent decision-making method and apparatus
Dridi et al. Reinforcement Learning Vs ILP Optimization in IoT support of Drone assisted Cellular Networks
WO2023175381A1 (en) Iterative training of collaborative distributed coded artificial intelligence model
CN117597969A (zh) AI data transmission method, apparatus, device, and storage medium
Wang et al. Adaptive compute offloading algorithm for metasystem based on deep reinforcement learning
CN115835185A (zh) Artificial intelligence model download method, apparatus, and system
WO2023006205A1 (en) Devices and methods for machine learning model transfer
Saravanan et al. Performance analysis of digital twin edge network implementing bandwidth optimization algorithm
WO2023116787A1 (zh) Intelligent model training method and apparatus

Legal Events

Date Code Title Description
121 Ep: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21816869

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021816869

Country of ref document: EP

Effective date: 20221213

NENP Non-entry into the national phase

Ref country code: DE