WO2022267049A1 - Artificial intelligence model processing method, apparatus, device, and readable storage medium - Google Patents


Info

Publication number
WO2022267049A1
WO2022267049A1 PCT/CN2021/102522 CN2021102522W
Authority
WO
WIPO (PCT)
Prior art keywords
operator
unit
model
storage unit
artificial intelligence
Prior art date
Application number
PCT/CN2021/102522
Other languages
English (en)
French (fr)
Inventor
朱湘毅
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/CN2021/102522 (WO2022267049A1)
Priority to CN202180002364.1 (CN113614749B)
Publication of WO2022267049A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N20/00 Machine learning

Definitions

  • The present application relates to the technical field of processors, and in particular to an artificial intelligence model processing method, an artificial intelligence model processing apparatus, an artificial intelligence model processing device, a computer-readable storage medium, and a computer program.
  • AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • AI is a cutting-edge technology in modern science and technological development, and it is applied to a wide range of real-world scenarios.
  • AI models are basically deep neural networks. Neural networks are matrix- and vector-computation intensive and require high computing power (on the order of teraFLOPS). Ordinary CPUs generally cannot meet the computing-power requirements of deep neural networks, that is, the massive calculations of AI models. Therefore, dedicated accelerators are needed to execute AI models, such as specially customized graphics processing units (Graphics Processing Unit, GPU) or neural network processors (Network Process Unit, NPU).
  • In general, an application loads the AI model onto the AI accelerator and trains or infers (executes) the AI model through the AI accelerator.
  • When the AI accelerator in the prior art encounters conditional branch judgment operators and loop operators, it can only execute the current operator; it cannot control the execution of subsequent operators according to the execution result of the current operator. Some control functions therefore have to fall back to the main processor for execution. As a result, training or inference of the AI model requires frequent interactions between the main processor and the AI accelerator, resulting in low performance of model inference or model training.
  • Embodiments of the present application provide an artificial intelligence model processing method, an artificial intelligence model processing apparatus, an artificial intelligence model processing device, a computer-readable storage medium, and a computer program, so as to improve the performance of model inference or model training.
  • the embodiment of the present application provides a method for processing an artificial intelligence model, which is applied to an artificial intelligence processing unit, and the artificial intelligence processing unit includes a control unit, an operation logic unit, and a storage unit, and the method includes:
  • The artificial intelligence processing unit obtains the AI model delivered by the processor side based on a user-mode interface (Application Programming Interface, API); the AI model includes control operators and calculation operators, the API includes a first API, and the first API is used to issue the control operator.
  • The artificial intelligence processing unit executes the calculation operator through the operation logic unit, and after the operation logic unit executes the calculation operator, it stores the resulting data in the storage unit.
  • the artificial intelligence processing unit executes the control operator through the control unit based on the data in the storage unit.
  • In this way, the operation logic unit stores the data produced by executing an operator or task in the storage unit, so that the control unit can execute the control operator directly based on the data in the storage unit; the control unit can thus control the execution of subsequent operators according to the execution result of the current operator.
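As an illustration only (the class and field names below are invented, not the patented implementation), the handoff described above can be sketched in a few lines: the operation logic unit writes its result into the storage unit, and the control unit reads that value to pick the next operator without involving the main processor.

```python
# Illustrative sketch of the storage-mediated handoff; all names are invented.

class OperationLogicUnit:
    def execute_calc_operator(self, op, storage):
        # Execute the calculation operator and store the resulting data
        # in the storage unit.
        storage["result"] = op()

class ControlUnit:
    def execute_control_operator(self, storage, if_true, if_false):
        # Execute the control operator directly from the stored result,
        # with no fallback to the main processor.
        return if_true if storage["result"] else if_false

storage = {}
OperationLogicUnit().execute_calc_operator(lambda: 3 > 2, storage)
next_op = ControlUnit().execute_control_operator(
    storage, "first_branch", "second_branch")
```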
  • The storage unit includes a first storage unit and a second storage unit; storing the data after executing the calculation operator in the storage unit includes: storing the data produced by executing the calculation operator in the second storage unit.
  • Executing the control operator through the control unit based on the data in the storage unit includes: the control unit reads the data in the second storage unit and writes it to the first storage unit; then, when executing the control operator, the control operator is read and executed based on the data in the first storage unit.
  • The second storage unit is used to store the data produced after the operation logic unit executes an operator or task, and that data is read by the control unit into the first storage unit; when the control operator is executed, it can then be executed directly based on the data in the first storage unit. The control unit can thereby control subsequent operators according to the execution result of the current operator.
  • the first storage unit may be integrated in the control unit, that is to say, the first storage unit may be added in the control unit, and the first storage unit may be a dedicated register of the control unit.
  • The second storage unit may be integrated in the operation logic unit; that is to say, the second storage unit may be added in the operation logic unit and may be a dedicated register of the operation logic unit. The control unit can thus quickly and efficiently read the data produced after the operation logic unit executes an operator or task, so that the execution of subsequent operators can be controlled according to the execution result of the current operator. The entire AI model is executed in the control unit and the operation logic unit, without returning some control functions to the main processor for processing.
  • Each operation logic unit in the artificial intelligence processing unit can add a second storage unit (its own dedicated register), so that each operation logic unit can cooperate with the control unit to execute the control operator, thereby further improving the performance of model inference or model training on the artificial intelligence processing unit.
  • the AI model corresponds to at least one execution sequence, and each first storage unit corresponds to a different execution sequence.
  • the processor can customize the number of first storage units in the control unit according to the number of processed AI models and the number of execution sequences corresponding to each AI model.
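A hypothetical reading of the sizing rule above: the control unit needs one first storage unit (dedicated register) per execution sequence, summed over all AI models being processed. The function name is invented for illustration.

```python
# Illustrative sizing rule: one first storage unit (dedicated register)
# per execution sequence, across all AI models handled by the processor.
def first_storage_units_needed(streams_per_model):
    """streams_per_model: list with one entry per AI model, giving the
    number of execution sequences (streams) of that model."""
    return sum(streams_per_model)

# e.g. two AI models with 2 and 3 execution sequences respectively
needed = first_storage_units_needed([2, 3])
```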
  • control operator includes a branch judgment operator, where the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator;
  • the above-mentioned reading and executing the control operator based on the data in the first storage unit includes:
  • The control operator may include a branch judgment operator; that is to say, when executing the issued branch judgment operator, the control unit can directly read the data needed for the branch judgment and then proceed to the next operator or task (the first branch operator or the second branch operator). The entire AI model is thus executed in the control unit and the operation logic unit, without returning some control functions to the main processor for processing.
  • The control operator further includes a loop operator, which is used to cyclically execute the first calculation operator of the AI model. After the first branch operator is executed, the method further includes: executing the loop operator to jump back to the position of the first calculation operator of the AI model, so that the operation logic unit executes the first calculation operator cyclically; the iteration continues until the next branch judgment based on the result of the first calculation operator is negative, at which point the loop iteration ends and the second branch operator is executed.
  • The AI model in the embodiment of the present application may also include a loop operator. Executing the loop operator can control the operation logic unit to jump back to the first calculation operator of the AI model and execute it in a loop. In this way, the branch judgment operator and loop operator in the AI model can be executed directly inside the control unit without frequent interaction with the main processor, which solves the technical problem of low performance of model inference or model training and improves that performance.
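The branch-and-loop flow described in the last few paragraphs can be sketched as the following simulation. The counter, the limit, and all names are assumptions made for the example; the real operators act on model data rather than a plain integer.

```python
# Illustrative simulation of the branch-and-loop pattern: the first calculation
# operator runs until the branch judgment (based on the value in the first
# storage unit) says "no", at which point the second branch operator runs.
def run_model(limit):
    first_storage = 0          # stands in for the control unit's register
    trace = []
    while True:
        first_storage += 1     # first calculation operator updates the result
        trace.append("calc")
        if first_storage < limit:     # branch judgment operator: judged "yes"
            trace.append("branch_1")  # first branch operator, then loop back
        else:                         # judged "no": end the loop iteration
            trace.append("branch_2")  # second branch operator
            break
    return trace
```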
  • The API also includes a second API and a third API; the second API is used to create a label, and the third API is used to set the position of the label in the AI model. The AI model further includes a first label and a second label used for jumping, where the first label is placed immediately before the first calculation operator of the AI model, and the second label is placed immediately before the second branch operator.
  • Before the control unit reads and executes the control operator based on the data in the first storage unit, the method further includes: executing the first calculation operator through the operation logic unit.
  • Executing the loop operator to cyclically execute the calculation operator of the AI model through the operation logic unit includes: executing the loop operator and jumping to the position of the first label, so that the operation logic unit cyclically executes the first calculation operator.
  • Executing the second branch operator includes: if the judgment result is no, jumping to the position of the second label to execute the second branch operator.
  • The AI model in the embodiment of the present application may also include the first label and the second label, together with their respective positions in the AI model; through jumps to the first label and the second label, the jumps required when executing the branch judgment operator and the loop operator can be realized efficiently.
  • The branch judgment operator and loop operator in the AI model can thus be executed directly inside the control unit without frequent interaction with the main processor, which solves the technical problem of low performance of model inference or model training and improves that performance.
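The label mechanism above can be illustrated with a tiny interpreter over a task list. The task encoding, label names, and operator names are all invented for this sketch; they only mirror the roles of the first label, second label, branch judgment operator, and loop operator.

```python
# Illustrative interpreter for label-based jumps in an execution sequence.
def run(tasks, iterations):
    # Map each label name to its position in the task list.
    labels = {arg: i for i, (kind, arg) in enumerate(tasks) if kind == "label"}
    pc, count, trace = 0, 0, []
    while pc < len(tasks):
        kind, arg = tasks[pc]
        if kind == "calc":
            trace.append(arg)
            pc += 1
        elif kind == "branch_if_done":   # stands in for the branch judgment:
            count += 1                   # judged "no" -> jump out to label_2,
            pc = labels[arg] if count >= iterations else pc + 1  # else fall through
        elif kind == "jump":             # stands in for the loop operator
            pc = labels[arg]
        else:                            # "label": a position marker only
            pc += 1
    return trace

program = [
    ("label", "label_1"),            # first label: just before the first calc operator
    ("calc", "first_calc"),
    ("branch_if_done", "label_2"),   # loop finished -> jump to the second label
    ("jump", "label_1"),             # loop operator: jump back to the first label
    ("label", "label_2"),            # second label: just before the second branch op
    ("calc", "second_branch"),
]
```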
  • Before the control unit reads the data in the second storage unit, the method further includes: setting the second storage unit to an invalid value.
  • Writing the data of the second storage unit to the first storage unit includes: if the read data of the second storage unit is judged to be a valid value, writing it into the first storage unit; if the read data of the second storage unit is judged to be an invalid value, not writing it into the first storage unit.
  • The control unit first sets the second storage unit of the operation logic unit to an invalid value, and then writes the data into the first storage unit only when the read data of the second storage unit is judged to be a valid value.
  • In this way, the accuracy of training or inference of the AI model can be well ensured.
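The valid/invalid handshake just described can be sketched as follows. The sentinel value and the dictionary-based "registers" are assumptions for the example; the real units are hardware registers.

```python
# Illustrative sketch of the valid/invalid handshake between the two storage units.
INVALID = None  # sentinel: "the operation logic unit has not written a result yet"

def sync(second_storage, first_storage):
    # Control-unit side: copy into the first storage unit only if the
    # second storage unit currently holds a valid value.
    value = second_storage["value"]
    if value is not INVALID:
        first_storage["value"] = value

second = {"value": INVALID}   # control unit invalidates before dispatching the operator
first = {"value": 0}

sync(second, first)
unchanged = first["value"]    # still 0: an invalid value is never copied

second["value"] = 7           # operation logic unit finishes and writes its result
sync(second, first)           # valid value -> copied into the first storage unit
```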
  • the present application provides a method for processing an artificial intelligence model, the method comprising:
  • the processor (or called the main processor) creates an artificial intelligence AI model; the AI model includes control operators and calculation operators;
  • The API includes a first API, and the first API is used to issue the control operator; the artificial intelligence processing unit is used to execute the control operator and the calculation operator in the process of training or inference of the AI model.
  • By providing an API for issuing control operators, the entire AI model can be executed independently without returning some control functions to the main processor for processing. This solves the prior-art problem that training or inference of the AI model requires frequent interactions between the main processor and the AI accelerator, resulting in low performance of model inference or model training; that performance is thereby improved.
  • control operator includes a branch judgment operator, and the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator.
  • An API for issuing branch judgment operators can be provided, so that the AI model issued to the artificial intelligence processing unit enables the processing unit to complete the execution of the branch judgment operator independently during training or inference, without interacting back and forth with the main processor.
  • control operator also includes a loop operator, and the loop operator is used to loop execute the first calculation operator of the AI model;
  • The API also includes a second API and a third API; the second API is used to create a label, and the third API is used to set the position of the label in the AI model.
  • the artificial intelligence processing unit includes a control unit, an operation logic unit, and a storage unit.
  • The calculation operator in the AI model is dispatched by the control unit in the artificial intelligence processing unit to the operation logic unit for execution; after each calculation operator is executed, the resulting data is stored in the storage unit, so that the artificial intelligence processing unit can execute the control operator through the control unit based on the data in the storage unit.
  • An API for issuing a loop operator can also be provided, so that the AI model delivered to the artificial intelligence processing unit enables it to complete the execution of the loop operator independently. In the process of executing the loop operator, the artificial intelligence processing unit can perform the relevant jumps through the API for creating labels and the API for setting label positions, so that the execution of the loop operator can be completed quickly and efficiently.
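A hypothetical host-side sketch of the three APIs named above (issue a control operator, create a label, set a label's position). The function names and the task encoding are invented for illustration; the real runtime-library API may look quite different.

```python
# Illustrative host-side runtime API for building an AI model task list.
class AIModel:
    def __init__(self):
        self.tasks = []

def issue_control_operator(model, op):    # stands in for the first API
    model.tasks.append(("control", op))

def create_label(name):                   # stands in for the second API
    return {"name": name}

def set_label_position(model, label):     # stands in for the third API
    model.tasks.append(("label", label["name"]))

model = AIModel()
label_1 = create_label("label_1")
set_label_position(model, label_1)        # label placed before the first calc op
model.tasks.append(("calc", "first_calc"))
issue_control_operator(model, "branch_judgment")
issue_control_operator(model, "loop_to_label_1")
```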
  • an artificial intelligence model processing device which is an artificial intelligence processing unit, including:
  • The obtaining unit is used to obtain the AI model issued by the processor side based on the API; the AI model includes control operators and calculation operators, and is issued by the processor based on the API; the API includes a first API, and the first API is used to issue the control operator.
  • The first execution unit is used to execute the calculation operator through the operation logic unit of the artificial intelligence processing unit during training or inference of the AI model.
  • a storage processing unit configured to store the executed data in the storage unit of the artificial intelligence processing unit after the calculation operator is executed by the first execution unit;
  • the second execution unit is configured to execute the control operator based on the data in the storage unit through the control unit of the artificial intelligence processing unit.
  • the storage unit includes a first storage unit and a second storage unit;
  • the storage processing unit is specifically configured to: store the data after executing the calculation operator in the second storage unit;
  • the second execution unit includes:
  • a first reading unit configured to read data in the second storage unit
  • a first writing unit configured to write data in the second storage unit into the first storage unit
  • the read execution unit is used to read and execute the control operator based on the data in the first storage unit.
  • The control operator includes a branch judgment operator, where the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator.
  • read execution unit includes:
  • a second reading unit configured to read data in the first storage unit
  • a judging unit configured to judge whether to execute the first branch operator based on the data in the first storage unit and the parameters in the branch judging operator;
  • the judging processing unit is configured to execute the first branch operator if the judging unit judges yes, and execute the second branch operator if the judging unit judges no.
  • control operator further includes a loop operator, and the loop operator is used to loop execute the first calculation operator of the AI model;
  • the processing device further includes:
  • The third execution unit is configured to execute the loop operator after the judgment processing unit executes the first branch operator, so as to jump to the position of the first calculation operator of the AI model through the operation logic unit and thereby execute the first calculation operator cyclically; the iteration continues until, the next time the judging unit judges based on the data in the first storage unit and the parameters in the branch judgment operator, the result is no, at which point the loop iteration ends and the judgment processing unit executes the second branch operator.
  • The API also includes a second API and a third API; the second API is used to create a label, and the third API is used to set the position of the label in the AI model. The AI model further includes a first label and a second label used for jumping, where the first label is placed immediately before the first calculation operator of the AI model, and the second label is placed immediately before the second branch operator. The processing device also includes:
  • the fourth execution unit is configured to execute the first calculation operator through the operation logic unit before the read execution unit reads and executes the control operator based on the data in the first storage unit;
  • The third execution unit is specifically configured to: execute the loop operator after the judgment processing unit executes the first branch operator, and jump to the position of the first label, so as to cyclically execute the first calculation operator through the operation logic unit.
  • the judging processing unit is specifically configured to jump to the location where the second label is located to execute the second branch operator.
  • the AI model corresponds to at least one execution sequence, and each of the first storage units corresponds to a different execution sequence.
  • the processing device further includes:
  • a setting unit used to set the second storage unit to an invalid value before the first read unit reads the data in the second storage unit
  • The first writing unit is specifically configured to: write the data of the second storage unit into the first storage unit when the read data of the second storage unit is judged to be a valid value; if the read data of the second storage unit is judged to be an invalid value, the data of the second storage unit is not written into the first storage unit.
  • an artificial intelligence model processing device which includes:
  • the creation unit is used to create an AI model;
  • the AI model includes control operators and calculation operators;
  • a sending unit configured to send the AI model to the artificial intelligence processing unit based on the API
  • the API includes a first API, and the first API is used to issue the control operator; the artificial intelligence processing unit is used to execute the control operator and the calculation operator in the process of training or reasoning the AI model .
  • control operator includes a branch judgment operator, and the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator.
  • control operator also includes a loop operator, and the loop operator is used to loop execute the first calculation operator of the AI model;
  • The API also includes a second API and a third API; the second API is used to create a label, and the third API is used to set the position of the label in the AI model.
  • The present application provides an artificial intelligence model processing device, including the artificial intelligence processing unit and a memory; the memory is used to store program codes, and the artificial intelligence processing unit invokes the program codes stored in the memory to make the artificial intelligence model processing device execute the method in the above first aspect and its various possible implementations.
  • The present application provides an artificial intelligence model processing device, including a processor and a memory; the memory is used to store program codes, and the processor invokes the program codes stored in the memory to make the artificial intelligence model processing device execute the method in the foregoing second aspect and its various possible implementations.
  • The present application provides an artificial intelligence model processing device, including a processor, an artificial intelligence processing unit, and a memory; the memory may include multiple memories for storing program codes; the processor is coupled with the artificial intelligence processing unit; the artificial intelligence processing unit can call the memory coupled with itself, or call the program code stored in its own internal memory, to make the artificial intelligence model processing device execute the method in the above first aspect and its various possible implementations.
  • the processor can call the program code stored in the memory coupled with itself to make the processing device of the artificial intelligence model execute the method in the above second aspect and various possible implementations thereof.
  • The present application provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the method in the above first aspect or second aspect and its various possible implementations is realized.
  • The present application provides a computer program; the computer program includes instructions, and when the computer program is executed by a processor, the processor executes the method in the above first aspect or second aspect and its various possible implementations.
  • FIG. 1 is a functional block diagram of a vehicle 100 provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of the AI computing architecture provided by the embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an artificial intelligence model processing device provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an artificial intelligence model processing device according to another embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an artificial intelligence model processing device according to another embodiment of the present application.
  • FIG. 6 is a schematic diagram of the loop statement of the AI model provided by the embodiment of the present application.
  • FIG. 7 is a schematic diagram of the execution sequence of stream1 in the AI model provided by the embodiment of the present application.
  • FIG. 8 is a schematic diagram of an execution flow of a control unit provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the execution principle of the artificial intelligence processing unit provided by the embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of an artificial intelligence model processing device provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an artificial intelligence model processing device according to another embodiment of the present application.
  • FIG. 12 is a schematic flowchart of a method for processing an artificial intelligence model provided by an embodiment of the present application.
  • FIG. 13 is a schematic flowchart of a method for processing an artificial intelligence model according to another embodiment of the present application.
  • FIG. 14 is a conceptual partial view of a computer program or a computer program product provided by an embodiment of the present application.
  • a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device can be components.
  • One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • A component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet, by means of the signal).
  • The register is an integral part of the central processing unit. Registers are high-speed storage units with limited storage capacity that can be used to temporarily store instructions, data, and addresses.
  • In the controller, the registers included are the instruction register (IR) and the program counter (PC).
  • In the arithmetic logic unit, the register included is the accumulator (ACC).
  • AI accelerators are a type of specialized hardware accelerators or computer systems designed to accelerate the application of artificial intelligence, especially artificial neural networks, machine vision and machine learning. Typical applications include algorithms for robotics, IoT, and other data-intensive or sensor-driven tasks. As a type of hardware accelerator dedicated to special tasks, AI accelerators often assist or supplement the main processor in the computer system, including but not limited to specially customized GPUs or NPUs for executing AI models.
  • API refers to some pre-defined interfaces (such as functions, HTTP interfaces), or refers to the agreement on the connection of different components of the software system.
  • Runtime refers to the state in which a program is running (being executed); that is to say, runtime is when the program is running.
  • The runtime library is the library that a program depends on when it runs. In some programming languages, certain reusable programs or instances are packaged or rebuilt into runtime libraries; these instances can be linked to or invoked by any program while it is running.
  • the runtime library of the embodiment of the present application provides an API of an artificial intelligence processing unit (such as a GPU or an NPU), for example, including an API for generating and delivering a control operator.
  • An AI model is generally split into multiple streams, which correspond to execution sequences. Each execution sequence contains multiple tasks (also called operators; a task may encapsulate an operator), and events between streams are used for synchronization. Tasks from multiple streams can be executed in parallel on the artificial intelligence processing unit, while tasks within a stream can generally only be executed serially. Operators or tasks in this embodiment of the present application may include calculation operators and control operators; calculation operators may be used for data calculation, and control operators may be used to control the execution order of the execution sequence.
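The stream/event model just described can be illustrated with plain threads standing in for the hardware: tasks within a stream run serially, the two streams run concurrently, and an event lets the second stream wait for a task in the first. The thread-based mechanism and all task names are assumptions for this sketch only.

```python
# Illustrative simulation of two streams synchronized by an event.
import threading

trace, lock = [], threading.Lock()
event = threading.Event()

def record(name):
    with lock:
        trace.append(name)

def stream1():
    record("s1_task1")   # tasks within a stream execute serially
    event.set()          # event: signal that s1_task1 has finished
    record("s1_task2")

def stream2():
    event.wait()         # synchronize: wait for stream1's event
    record("s2_task1")

t1 = threading.Thread(target=stream1)
t2 = threading.Thread(target=stream2)
t2.start(); t1.start()   # streams may run in parallel
t1.join(); t2.join()
```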
  • the operator or task is essentially the code in the AI model, for example, the convolution code in the AI model is an operator or task.
  • the calculation operator in the embodiment of the present application is the code to realize or complete the data calculation, and usually runs on the operation logic unit in the artificial intelligence processing unit to complete the data calculation task.
  • the control operator in the embodiment of the present application is the code that controls the execution sequence of the execution sequence, and can be executed by the control unit in the artificial intelligence processing unit.
  • The artificial intelligence model processing method, processing apparatus, and processing device provided in the embodiments of the present application are applicable to all scenarios that require AI accelerators. These include performing AI processing on camera images using a convolutional neural network (Convolutional Neural Network, CNN) model or a mask region-based convolutional neural network (Mask Region-based CNN, Mask RCNN) model, such as autonomous driving, driver monitoring, and parking in smart-car scenarios. They also include performing AI processing on data using a recurrent neural network (Recurrent Neural Network, RNN) model, such as voice interaction between the car, the driver, and passengers in a smart-car scenario.
  • the vehicle 100 is configured in a fully or partially autonomous driving mode.
  • While in the autonomous driving mode, the vehicle 100 can control itself; it can determine the current state of the vehicle and its surrounding environment through human operation, determine the likely behavior of at least one other vehicle in the surrounding environment, determine a confidence level corresponding to the likelihood of that other vehicle performing the possible behavior, and control the vehicle 100 based on the determined information.
  • the vehicle 100 may be set to operate without human interaction.
• Vehicle 100 may be a car, truck, motorcycle, bus, boat, airplane, helicopter, lawn mower, recreational vehicle, amusement park vehicle, construction equipment, tram, golf cart, train, trolley, and the like; the embodiments of the present application are not particularly limited in this respect.
  • Vehicle 100 may include various subsystems such as travel system 102 , sensing system 104 , control system 106 , one or more peripheral devices 108 as well as power supply 110 , computer system 112 and user interface 116 .
  • vehicle 100 may include more or fewer subsystems, and each subsystem may include multiple elements.
  • each subsystem and element of the vehicle 100 may be interconnected by wire or wirelessly.
• The travel system 102 may include components that provide powered motion for the vehicle 100.
• For example, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121.
  • the engine 118 may be an internal combustion engine, an electric motor, an air compression engine or other types of engine combinations, such as a hybrid engine composed of a gasoline engine and an electric motor, or a hybrid engine composed of an internal combustion engine and an air compression engine.
  • Engine 118 converts energy source 119 into mechanical energy.
  • Examples of energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. Energy source 119 may also provide energy to other systems of vehicle 100 .
  • Transmission 120 may transmit mechanical power from engine 118 to wheels 121 .
  • Transmission 120 may include a gearbox, a differential, and drive shafts.
  • the transmission 120 may also include other devices, such as clutches.
  • drive shafts may include one or more axles that may be coupled to one or more wheels 121 .
  • Sensing system 104 may include a number of sensors that sense information about the environment surrounding vehicle 100 .
  • the sensing system 104 may include a global positioning system 122 (the global positioning system may be a GPS system, or a Beidou system or other positioning systems), an inertial measurement unit (inertial measurement unit, IMU) 124, a radar 126, a laser range finder 128 and a camera 130 .
• The sensing system 104 may also include sensors that monitor the internal systems of the vehicle 100 (e.g., an interior air quality monitor, a fuel gauge, an oil temperature gauge, etc.). Sensor data from one or more of these sensors can be used to detect objects and their corresponding properties (position, shape, orientation, velocity, etc.). Such detection and identification are critical functions for the safe operation of the autonomous vehicle 100.
  • the global positioning system 122 may be used to estimate the geographic location of the vehicle 100 .
  • the IMU 124 is used to sense changes in position and orientation of the vehicle 100 based on inertial acceleration.
  • IMU 124 may be a combination accelerometer and gyroscope.
  • the radar 126 may utilize radio signals to sense objects within the surrounding environment of the vehicle 100 .
  • radar 126 may be used to sense the velocity and/or heading of objects.
  • the laser rangefinder 128 may utilize laser light to sense objects in the environment in which the vehicle 100 is located.
  • laser rangefinder 128 may include one or more laser sources, a laser scanner, and one or more detectors, among other system components.
  • Camera 130 may be used to capture multiple images of the surrounding environment of vehicle 100 .
  • Camera 130 may be a still camera or a video camera.
  • Control system 106 controls the operation of the vehicle 100 and its components.
• Control system 106 may include various elements including steering system 132, throttle 134, braking unit 136, sensor fusion algorithm 138, computer vision system 140, route control system 142, and obstacle avoidance system 144.
  • the steering system 132 is operable to adjust the heading of the vehicle 100 .
  • it could be a steering wheel system.
  • the throttle 134 is used to control the operating speed of the engine 118 and thus the speed of the vehicle 100 .
  • the braking unit 136 is used to control the deceleration of the vehicle 100 .
  • the braking unit 136 may use friction to slow the wheels 121 .
  • the brake unit 136 can convert the kinetic energy of the wheel 121 into electric current.
  • the braking unit 136 may also take other forms to slow down the wheels 121 to control the speed of the vehicle 100 .
  • Computer vision system 140 is operable to process and analyze images captured by camera 130 in order to identify objects and/or features in the environment surrounding vehicle 100 .
  • the objects and/or features may include traffic signals, road boundaries and obstacles.
  • the computer vision system 140 may use object recognition algorithms, Structure from Motion (SFM) algorithms, video tracking, and other computer vision techniques.
  • computer vision system 140 may be used to map the environment, track objects, estimate the velocity of objects, and the like.
  • the route control system 142 is used to determine the travel route of the vehicle 100 .
  • route control system 142 may combine data from sensor fusion algorithm 138 , GPS 122 , and one or more predetermined maps to determine a travel route for vehicle 100 .
  • the obstacle avoidance system 144 is used to identify, evaluate and avoid or otherwise overcome potential obstacles in the environment of the vehicle 100 .
• The control system 106 may additionally or alternatively include components other than those shown and described. Alternatively, some of the components shown above may be removed.
  • Vehicle 100 interacts with external sensors, other vehicles, other computer systems, or users via peripherals 108 .
  • Peripherals 108 may include wireless communication system 146 , on-board computer 148 , microphone 150 and/or speaker 152 .
  • peripheral device 108 provides a means for a user of vehicle 100 to interact with user interface 116 .
  • on-board computer 148 may provide information to a user of vehicle 100 .
  • the user interface 116 may also operate the on-board computer 148 to receive user input.
  • the on-board computer 148 can be operated through a touch screen.
  • peripheral devices 108 may provide a means for vehicle 100 to communicate with other devices located within the vehicle.
  • microphone 150 may receive audio (eg, voice commands or other audio input) from a user of vehicle 100 .
  • speaker 152 may output audio to a user of vehicle 100 .
  • Wireless communication system 146 may communicate wirelessly with one or more devices, either directly or via a communication network.
• The wireless communication system 146 may use 3G cellular communication, such as CDMA, EVDO, or GSM/GPRS; 4G cellular communication, such as LTE; or 5G cellular communication.
  • the wireless communication system 146 can utilize WiFi to communicate with a wireless local area network (WLAN).
  • the wireless communication system 146 may communicate directly with the device using an infrared link, Bluetooth, or ZigBee.
• Other wireless protocols may also be used, such as various vehicle communication systems; for example, the wireless communication system 146 may include one or more dedicated short range communications (DSRC) devices, which may include public and/or private data communications.
  • the power supply 110 may provide power to various components of the vehicle 100 .
  • the power source 110 may be a rechargeable lithium-ion or lead-acid battery.
  • One or more battery packs of such batteries may be configured as a power source to provide power to various components of the vehicle 100 .
  • power source 110 and energy source 119 may be implemented together, such as in some all-electric vehicles.
  • Computer system 112 may include at least one processor 113 executing instructions 115 stored in a non-transitory computer-readable medium such as data storage device 114 .
  • the computer system 112 may also be a plurality of computing devices that control individual components or subsystems of the vehicle 100 in a distributed manner.
  • Processor 113 may be any conventional processor, such as a commercially available CPU.
  • the processor may be a special purpose device such as an ASIC or other hardware based processor.
• Although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer system 112 in the same block, those of ordinary skill in the art will understand that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be stored within the same physical housing.
• For example, the memory may be a hard drive or another storage medium located in a housing different from that of the computer system 112.
  • references to a processor or computer are to be understood to include references to collections of processors or computers or memories that may or may not operate in parallel.
  • some components such as the steering and deceleration components, may each have their own processor that only performs calculations related to component-specific functions.
  • the processor may be located remotely from the vehicle and be in wireless communication with the vehicle. In other aspects, some of the processes described herein are executed on a processor disposed within the vehicle while others are executed by a remote processor, including taking the necessary steps to perform a single maneuver.
  • data storage device 114 may contain instructions 115 (eg, program logic) executable by processor 113 to perform various functions of vehicle 100 , including those described above.
• The data storage device 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, the sensing system 104, the control system 106, and the peripheral devices 108.
  • data storage device 114 may also store data such as road maps, route information, the vehicle's position, direction, speed, and other such vehicle data, among other information. Such information may be used by the vehicle 100 and the computer system 112 during operation of the vehicle 100 in autonomous, semi-autonomous, and/or manual modes.
  • a user interface 116 for providing information to or receiving information from a user of the vehicle 100 .
  • user interface 116 may include one or more input/output devices within set of peripheral devices 108 , such as wireless communication system 146 , in-vehicle computer 148 , microphone 150 , and speaker 152 .
  • the computer system 112 may control functions of the vehicle 100 based on input received from various subsystems (eg, the travel system 102 , the sensing system 104 , and the control system 106 ), as well as from the user interface 116 .
  • computer system 112 may utilize input from control system 106 in order to control steering unit 132 to avoid obstacles detected by sensing system 104 and obstacle avoidance system 144 .
  • the computer system 112 is operable to provide control over many aspects of the vehicle 100 and its subsystems.
  • one or more of these components described above may be installed separately from or associated with the vehicle 100 .
  • data storage device 114 may exist partially or completely separate from vehicle 100 .
  • the components described above may be communicatively coupled together in a wired and/or wireless manner.
  • FIG. 1 should not be construed as limiting the embodiment of the present invention.
  • An autonomous vehicle traveling on a road can identify objects within its surroundings to determine adjustments to the current speed.
  • the object may be another vehicle, traffic control device, or other type of object.
• Each identified object may be considered independently, and the object's respective characteristics, such as its current speed, acceleration, and distance to the vehicle, may be used to determine the speed to which the autonomous vehicle is to adjust.
• The autonomous vehicle 100, or a computing device associated with the autonomous vehicle 100 (such as the computer system 112, the computer vision system 140, or the data storage device 114 of FIG. 1), may predict the behavior of the identified object based on the characteristics of the identified object and the state of the surrounding environment (e.g., traffic, rain, ice on the road, etc.).
  • each identified object is dependent on the behavior of the other, so all identified objects can also be considered together to predict the behavior of a single identified object.
  • the vehicle 100 can adjust its speed based on the predicted behavior of the identified object.
  • the self-driving car is able to determine what steady state the vehicle will need to adjust to (eg, accelerate, decelerate, or stop) based on the predicted behavior of the object.
  • other factors may also be considered to determine the speed of the vehicle 100 , such as the lateral position of the vehicle 100 in the traveling road, the curvature of the road, the proximity of static and dynamic objects, and the like.
• The computing device may also provide instructions to modify the steering angle of the vehicle 100 so that the self-driving car follows a given trajectory and/or maintains safe lateral and longitudinal distances from objects in the vicinity of the self-driving car (e.g., cars in adjacent lanes on the road).
  • the data storage device 114 in the computer system 112 may include a system memory, and the data running in the system memory may include the computer's operating system and application program APP.
  • the processor 113 in the computer system 112 can be connected to the system memory through the system bus, read the data in the system memory and process it.
  • the operating system includes Shell and kernel (kernel).
  • Shell is an interface between the user and the kernel of the operating system.
• The shell is the outermost layer of the operating system. The shell manages the interaction between the user and the operating system: it waits for user input, interprets the user's input to the operating system, and processes the various outputs of the operating system.
• The kernel consists of the parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with the hardware, the operating system kernel usually runs processes, provides inter-process communication, and provides CPU time-slice management, interrupt handling, memory management, IO management, and so on.
  • Applications include programs related to controlling the automatic driving of cars, for example, programs that manage the interaction between self-driving cars and obstacles on the road, programs that control the route or speed of self-driving cars, and programs that control the interaction between self-driving cars and other self-driving cars on the road.
  • the application also exists on the system of the software deployment server deploying server.
  • the computer system 112 can download the application program from the deploying server when the application program needs to be executed.
• The embodiment of the present application provides a processing method of an artificial intelligence model that can be specifically applied to the computer system 112 of FIG. 1.
• The artificial intelligence model processing device and the artificial intelligence model processing equipment provided by the embodiments of this application may specifically be equivalent to the computer system 112 in FIG. 1.
  • the process of training or inferring AI by the processor of the computer system 112 will be described below with reference to the schematic structural diagram of the AI computing architecture provided by the embodiment of the present application shown in FIG. 2 .
  • the AI computing architecture can be equivalent to the processor 113 in the computer system 112, which can specifically include a main processor (Host CPU) and an artificial intelligence processing unit, wherein:
• The Host CPU may include an artificial intelligence processing unit driver (driver), a runtime unit or runtime layer or user-mode driver layer (Runtime), and a library (Library); that is, the Host CPU can read the above data in the system memory or storage.
  • the artificial intelligence processing unit driver can provide the driving function of the artificial intelligence processing unit.
• The Runtime can provide the user-mode application programming interface (API) of the artificial intelligence processing unit, for deployment in the application program APP.
• The Library can provide operator library functions that can be directly executed on the operation logic unit of the artificial intelligence processing unit, which is convenient for the APP to develop business functions.
  • the artificial intelligence processing unit can also be called an AI accelerator, and can include AI processors such as GPU or NPU, where the NPU can be a dedicated or customized neural network processor.
  • the artificial intelligence processing unit may include a control unit (or controller) and an operation logic unit.
• The Runtime of the artificial intelligence processing unit may provide APIs such as model (model), data stream (stream), task (task), and event (event).
• By calling these APIs, the upper-level business (such as the APP) splits the AI calculation graph (that is, the AI model) into structures that the artificial intelligence processing unit can process.
  • the control unit of the artificial intelligence processing unit can be used to receive the AI model delivered by the Host CPU, schedule the AI model for training or reasoning, and report the execution result to the Host CPU.
  • the operation logic unit can execute the tasks in the AI model issued by the control unit, and return the result of each task to the control unit (an AI model can contain multiple tasks).
  • the APP loads the AI model to the AI processing unit, and the AI processing unit saves the AI model.
  • the model only needs to be loaded once, and it can be executed multiple times later.
• When the AI model needs to be replaced, the AI processing unit is first notified to unload the previously loaded model.
• The AI processing unit also saves the loaded AI model according to a similar structure of streams, tasks, and events.
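The load-once, execute-many lifecycle described above might be sketched from the host side as follows (a toy stand-in; the class and method names are illustrative, not the actual Runtime API):

```python
class AIProcessingUnit:
    """Toy stand-in for the AI accelerator: it stores loaded models so
    that a model is loaded once and can then be executed many times."""
    def __init__(self):
        self.loaded = {}          # model_id -> streams (lists of tasks)

    def load(self, model_id, streams):
        # Before loading a new model under the same id, the previously
        # loaded model is first unloaded.
        if model_id in self.loaded:
            self.unload(model_id)
        self.loaded[model_id] = streams

    def unload(self, model_id):
        self.loaded.pop(model_id, None)

    def execute(self, model_id, x):
        # Each stream is an ordered list of task callables.
        for stream in self.loaded[model_id]:
            for task in stream:
                x = task(x)
        return x

npu = AIProcessingUnit()
npu.load("resnet", [[lambda x: x + 1, lambda x: x * 3]])
# Loaded once, executed multiple times:
results = [npu.execute("resnet", v) for v in (0, 1)]
```

The design point the sketch illustrates is only that loading (building the stream/task structure on the accelerator) is decoupled from execution, so repeated inference pays the loading cost once.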
• In the prior art, the AI accelerator does not support control operators such as branch judgment operators and loop operators, which are used in network models such as MaskRCNN-type network models and recurrent neural network (Recurrent Neural Network, RNN) type network models.
• When branch judgment operators and loop operators are used, some calculations need to fall back to the Host CPU for execution during model training or model execution: the Host CPU executes the branch judgment operators and loop operators, and then triggers the control unit to schedule operators or tasks to the operation logic unit again.
• In this way, completing the branch judgment operators and loop operators of the entire AI model requires multiple interactions with the Host CPU, and some calculations need to be performed on the Host CPU, so the inference or training performance of the model is not high.
  • the processing device 30 of the artificial intelligence model includes a processor 300 and an artificial intelligence processing unit 301, wherein:
• The processor 300 is used to create an artificial intelligence (AI) model, wherein the AI model includes a control operator and a calculation operator, and then sends the AI model to the artificial intelligence processing unit 301 based on a user-mode interface API. The API includes a first API, and the first API is used to deliver the control operator. The processor 300 may be equivalent to a main processor.
  • the program code of the AI model is set in the APP, which can be used to perform AI processing on the input data.
  • the process of the processor 300 reading the data of the APP and running the program code about the AI model inside is to create the AI model.
  • the AI model in this embodiment of the present application includes a control operator, and the control operator may be a code or a function that implements control logic.
  • an API for issuing the control operator may be provided at the Runtime layer; the processor 300 may issue the AI model to the artificial intelligence processing unit 301 by calling the API of the Runtime layer.
  • the artificial intelligence processing unit 301 is configured to execute the control operator and the calculation operator during the process of acquiring the artificial intelligence AI model and training or reasoning the AI model.
  • the processor 300 in the embodiment of the present application may include its own controller and computing unit, etc., for interpreting computer instructions and processing data in computer software.
• The processor 300 is the core hardware unit of the artificial intelligence model processing device 30, and is mainly responsible for calculation and overall coordination, including controlling and deploying all hardware resources of the artificial intelligence model processing device 30 (such as the memory, the input and output units, and the artificial intelligence processing unit 301) and performing general operations.
  • the artificial intelligence processing unit 301 of the embodiment of the present application can actually be a processor or a processing chip, such as a dedicated or customized GPU or NPU, etc., which can be mounted on the processor 300 as a coprocessor, or coupled with the processor 300 , the task is assigned by the processor 300 .
  • the core component of the artificial intelligence processing unit 301 is an operation logic unit, which is controlled by the control unit to extract matrix data and perform operations.
  • the operation logic unit can include multiple processing units (Process Engine, PE).
  • PE Processing Engine
• In some implementations, the operation logic unit is a two-dimensional systolic array.
• In other implementations, the operation logic unit may also be a one-dimensional systolic array or another electronic circuit capable of performing mathematical operations such as multiplication and addition.
• In some implementations, the operation logic unit is a general-purpose matrix processor.
  • the artificial intelligence processing unit 301 may also include a unified memory, a storage unit access controller (Direct Memory Access Controller, DMAC), a weight memory, a bus interface unit, a vector calculation unit, and an instruction fetch buffer (Instruction Fetch Buffer), etc. in:
  • the unified memory can be used to store input data and output data.
  • Weight data can be directly transferred to the weight memory through DMAC.
  • Input data can also be moved to unified memory via the DMAC.
  • the bus interface unit can be used for the interaction between the AXI bus and the DMAC and fetch memory. Specifically, it can be used for the instruction fetch memory to obtain instructions from the external memory, and for the DMAC to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to move the input data in the external memory to the unified memory or to move the weight data to the weight memory or to move the input data to the input memory.
  • the vector calculation unit may include multiple operation processing units, and if necessary, further process the output of the operation logic unit, such as vector multiplication, vector addition, exponent operation, logarithmic operation, size comparison and so on. It is mainly used for non-convolutional/FC layer network calculations in neural networks, such as Pooling (pooling), Batch Normalization (batch normalization), Local Response Normalization (local response normalization), etc.
• The vector computation unit can store the processed output vectors to the unified memory.
• The vector computation unit may apply a non-linear function to the output of the operation logic unit, such as a vector of accumulated values, to generate activation values.
• In some implementations, the vector computation unit generates normalized values, merged values, or both.
• The vector of processed outputs can be used as an activation input to the operation logic unit, e.g., for use in a subsequent layer of a neural network.
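As a rough pure-Python illustration of the post-processing the vector calculation unit performs on the operation logic unit's output (the function names and values are invented for clarity; real hardware operates on wide vectors):

```python
def relu(v):
    """Non-linear activation applied to a vector of accumulated values."""
    return [max(0.0, x) for x in v]

def pool_max(v, window):
    """Max pooling over non-overlapping windows."""
    return [max(v[i:i + window]) for i in range(0, len(v), window)]

def normalize(v):
    """Simple normalization (a stand-in for batch/local response norm)."""
    m = max(abs(x) for x in v) or 1.0
    return [x / m for x in v]

# Output of the operation logic unit (e.g. accumulated matrix products):
acc = [-2.0, 4.0, 1.0, 8.0]
activated = relu(acc)            # apply the non-linear function
pooled = pool_max(activated, 2)  # pooling-style further processing
scaled = normalize(pooled)       # normalized values
# The processed vector can then feed a subsequent layer as activation input.
```

This only mirrors the division of labor described above: matrix accumulation happens in the operation logic unit, while element-wise activation, pooling, and normalization are vector post-processing steps.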
  • the instruction fetch memory connected to the control unit can be used to store instructions used or executed by the control unit.
• By providing an API for issuing control operators, and issuing the control operators to the artificial intelligence processing unit based on that API, the entire AI model can be executed independently without returning control functions to the main processor for processing. This solves the technical problem in the prior art that training or inference of an AI model requires frequent interaction between the main processor and the AI accelerator, resulting in low model inference or model training performance, and thereby improves the performance of model inference or model training.
  • control operator in the embodiment of the present application may include a branch judgment operator, where the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator.
  • control operator also includes a loop operator, and the loop operator is used to loop execute the first calculation operator of the AI model;
  • the API also includes a second API and a third API, The second API is used to create a label; the third API is used to set the position of the label in the AI model.
  • the API (that is, the first API) for issuing the control operator may include an API for issuing a branch judgment operator and an API for issuing a loop operator. specifically:
  • the processor 300 can create or add 4 APIs for delivering AI models at the Runtime layer:
• The second API, CreateLabel (create label): used for positioning when jumping;
• The third API, LabelSet (parameters: label, stream): places the label at the current position of the stream; that is, for the task where the label is placed, it sets the task's position in the execution sequence or data flow;
• Branch judgment (parameters: value, condition, false_label, stream): compares the data (or value) in the condition register with value according to condition, and if the result is false, jumps to the task after false_label for execution.
• Developers can develop based on the above-mentioned APIs provided by the Runtime layer, split the AI calculation graph according to the development requirements, and convert it into streams, tasks, events, etc. that can be processed by the artificial intelligence processing unit 301. Then, by calling the above-mentioned corresponding APIs, the control operators (such as the branch judgment operator and the loop operator) are delivered to the artificial intelligence processing unit 301 for execution.
• An API for issuing a loop operator can also be provided, so that the AI model delivered to the artificial intelligence processing unit enables the artificial intelligence processing unit to independently complete execution of the loop operator. In the process of executing the loop operator, the artificial intelligence processing unit can perform the relevant jumps through the API for creating labels and the API for setting the positions of labels in the AI model, so that execution of the loop operator can be completed quickly and efficiently.
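Assuming the semantics described for CreateLabel, LabelSet, and the branch judgment API, a loop inside a stream might be composed as in the following sketch (the `Stream` class and `run` interpreter are invented stand-ins for the artificial intelligence processing unit's scheduler):

```python
class Stream:
    """Toy execution sequence with the label/branch semantics above."""
    def __init__(self):
        self.tasks = []     # ("label", id) / ("task", fn) / ("branch", pred, id)
        self.labels = {}    # label id -> position in the task list
        self._next_label = 0

    def create_label(self):                      # CreateLabel
        label, self._next_label = self._next_label, self._next_label + 1
        return label

    def label_set(self, label):                  # LabelSet(label, stream)
        self.labels[label] = len(self.tasks)     # place label at current position
        self.tasks.append(("label", label))

    def add_task(self, fn):                      # an ordinary calculation task
        self.tasks.append(("task", fn))

    def branch(self, condition, false_label):    # branch judgment API
        # Evaluates `condition` on the condition-register value; on a
        # False result, jumps to the task after `false_label`.
        self.tasks.append(("branch", condition, false_label))

def run(stream, cond_reg):
    """Toy scheduler standing in for the artificial intelligence processing unit."""
    i = 0
    while i < len(stream.tasks):
        t = stream.tasks[i]
        if t[0] == "task":
            cond_reg = t[1](cond_reg)            # result lands in the condition register
        elif t[0] == "branch" and not t[1](cond_reg):
            i = stream.labels[t[2]]              # jump on False
            continue
        i += 1
    return cond_reg

# Build a loop: repeat the body until the branch judgment becomes True.
s = Stream()
start = s.create_label()
s.label_set(start)
s.add_task(lambda x: x + 1)           # loop body (a calculation operator)
s.branch(lambda x: x >= 3, start)     # False -> jump back to `start`
result = run(s, 0)
```

Note the loop is expressed entirely inside the stream with a label plus a backward branch, so in the scheme above no fallback to the Host CPU is needed per iteration.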
• FIG. 4 is a schematic structural diagram of an artificial intelligence model processing device according to another embodiment of the present application, illustrating one specific structure of the artificial intelligence processing unit provided by the present application and how it improves the performance of model inference or model training.
  • the processing device 40 of the artificial intelligence model in FIG. 4 may include a control unit (or controller) 400 and an operation logic unit 402 .
  • the artificial intelligence model processing device 40 may also include other units or modules, such as unified memory, DMAC, weight memory, bus interface unit, etc., but it is not shown in FIG. 4 of this embodiment.
  • the processing device 40 of the artificial intelligence model may often include a plurality of operation logic units 402 .
• The control unit 400 may include a plurality of first storage units (which may be referred to as first registers or condition registers (Condition, COND)). When the control unit 400 trains or infers the AI model, one first storage unit may correspond to one execution sequence of the AI model, and different synchronously parallel execution sequences correspond to different first storage units. The operation logic unit 402 includes a second storage unit (which may be called a second register or a special condition register (Cond_SPR)). The control unit 400 is used to schedule execution of the AI model, that is, to schedule the operators or tasks of the model either to the operation logic unit 402 for execution (such as calculation-type operators) or for the control unit 400 itself to execute (such as event-type tasks).
• The operation logic unit executes a single calculation-type operator. Specifically:
• The control unit 400 is used to obtain or read the AI model, where the AI model includes control operators and calculation operators; the AI model is the AI model delivered by the main processor based on the API; the API includes a first API, and the first API is used to issue the control operator;
  • the second storage unit is used to store data after the calculation operator is executed by the operation logic unit 402 .
• The control unit 400 is also used for reading the data of the second storage unit. After executing a task, the operation logic unit 402 writes the resulting data into its own second storage unit, and may notify the control unit 400 that the task is completed, so as to trigger the control unit 400 to read the data of the second storage unit. After the control unit 400 reads the data of the second storage unit, it writes the data into the first storage unit corresponding to the execution sequence. The control unit 400 can then execute the next task of the execution sequence based on the data stored in the first storage unit and the parameters in the control operator;
  • control operator in the embodiment of the present application is a control task that the control unit 400 needs to execute according to the data read from the operation logic unit 402 .
  • control operator may include a branch judgment operator, where the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator;
• When the control unit 400 executes the next task of the execution sequence based on the data and the parameters in the control operator, it can specifically judge whether to execute the first branch operator based on the data and the parameters in the branch judgment operator;
• Since the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator, if the judgment is yes, the first branch operator is executed; if the judgment is no, the second branch operator is executed.
  • the AI model in this embodiment of the present application may further include a loop operator, which is used to loop execute the first calculation operator of the AI model.
• the control unit 400 may execute the loop operator to schedule the operation logic unit 402 to loop execute the first calculation operator of the execution sequence, until the control unit 400 judges, based on the data and the parameters in the branch judgment operator, not to execute the first branch operator.
• when the control unit 400 executes the branch judgment operator, it typically judges whether the result is true (True) or false (False) based on the data and the parameters in the branch judgment operator, through specific judgment logic or judgment conditions. For example, when the judgment is True, the first branch operator is executed, and when the judgment is False, the second branch operator is executed; alternatively, when the judgment is False, the first branch operator is executed, and when the judgment is True, the second branch operator is executed.
• in other words, judging whether to execute the first branch operator corresponds to judging whether the condition is true (True) or false (False).
• accordingly, when executing the branch judgment operator, the embodiment of the present application may execute the loop operator when the specific judgment logic or judgment condition, evaluated on the data and the parameters in the branch judgment operator, is judged to be True, or alternatively may execute the loop operator when the judgment is False.
• the operation logic unit 402 is called by the control unit 400 to execute a certain calculation operator (such as the first calculation operator in this application). When the control unit 400 executes the loop operator, the control unit 400 may be triggered to call the operation logic unit 402 to execute that calculation operator again, thereby jumping back to the first calculation operator for loop execution, until the control unit 400 judges, based on the data and the parameters in the branch judgment operator, not to execute the first branch operator.
  • the above process of executing the branch judgment operator and the loop operator can be specifically implemented in the following manner:
• the AI model of the embodiment of the present application may also include a first label and a second label used for jumping, together with their respective positions in the execution sequence; the first label is placed in the task immediately preceding the first calculation operator of the execution sequence, and the second label is placed in the task immediately preceding the second branch operator;
• when the control unit 400 executes the next task of the execution sequence based on the data and the parameters in the control operator, this includes the control unit 400 dispatching the first calculation operator as a task to be executed by the operation logic unit;
• when the control unit 400 determines to execute the first branch operator, it can execute the loop operator and jump to the position of the first label, so that the next task returns to the first calculation operator of the execution sequence; that is, the control unit 400 schedules the operation logic unit 402 to execute the first calculation operator again for iteration;
• when the control unit 400 executes the second branch operator, it jumps to the location of the second label, and the next task is then the second branch operator, which is executed accordingly.
  • the control unit 400 is further configured to output instruction information indicating that the execution sequence is completed when the execution sequence is completed.
• the operation logic unit can write the data obtained after executing the task into the second storage unit, and the control unit can then read the data of the second storage unit and write the data into the first storage unit corresponding to the execution sequence. That is to say, the control unit can read the data produced by the current operator, so as to control the execution of subsequent operators according to the execution result of the current operator.
• the artificial intelligence model processing device 50 in FIG. 5 includes an artificial intelligence processing unit, and the artificial intelligence processing unit may include a control unit 500 and an operation logic unit 502.
  • the artificial intelligence model processing device 50 may further include a main processor 504 .
  • the control unit 500 is coupled with a main processor 504 .
  • a certain number of first storage units can be customized or set for the control unit 500 according to requirements.
  • the first storage unit may only allow the control unit 500 to read and write.
• one first storage unit corresponds to one execution sequence of the AI model, and different synchronous, parallel execution sequences correspond to different first storage units.
• AI model 1 may correspond to two execution sequences, and the two execution sequences correspond to the first storage unit 0 and the first storage unit 1 respectively. That is to say, execution sequence 0 (including task 01, task 02, etc.) corresponds to the first storage unit 0, and execution sequence 1 (including task 11, task 12, etc.) corresponds to the first storage unit 1. The number of tasks in execution sequence 0 and in execution sequence 1 can be coded and set by AI model 1 according to requirements.
• the AI model 2 may correspond to one execution sequence, and that execution sequence corresponds to the first storage unit 2.
  • the AI model 3 may correspond to three execution sequences, and the three execution sequences correspond to the first storage unit 3 , the first storage unit 4 and the first storage unit 5 respectively.
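The binding described above, in which each synchronous, parallel execution sequence gets its own first storage unit, can be sketched as follows. This is an illustrative model only (the class and method names are assumptions, not the patent's implementation), binding the six execution sequences of AI models 1 to 3 to first storage units 0 to 5:

```python
# Illustrative sketch: one dedicated first storage unit per execution sequence.
class FirstStorageAllocator:
    def __init__(self, num_units):
        self.free = list(range(num_units))   # unused first storage units
        self.binding = {}                    # (model, stream) -> unit id

    def bind(self, model, stream):
        if not self.free:
            raise RuntimeError("no free first storage unit; defer this sequence")
        unit = self.free.pop(0)
        self.binding[(model, stream)] = unit
        return unit

alloc = FirstStorageAllocator(6)
# AI model 1: two execution sequences   -> units 0 and 1
# AI model 2: one execution sequence    -> unit 2
# AI model 3: three execution sequences -> units 3, 4 and 5
for model, streams in (("model1", 2), ("model2", 1), ("model3", 3)):
    for s in range(streams):
        alloc.bind(model, s)
```

Once all units are bound, any further execution sequence must wait until a unit is released, which matches the batching behaviour described below.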
• each operation logic unit 502 may include a respective second storage unit.
  • the operation logic unit 502 can write the data or calculation results after executing the task into its second storage unit; the control unit 500 can access the second storage unit and read the data in the second storage unit.
  • the embodiment of the present application is not limited to 3 AI models, nor is it limited to 6 execution sequences in FIG. 5 .
• the processing device of the embodiment of the present application can set the number of first storage units in the control unit according to actual scene requirements; for example, if 50 first storage units are set, the control unit can process 50 execution sequences in parallel at the same time, and those 50 execution sequences can be the execution sequences of one or more AI models.
• for example, suppose 10 AI models need to be loaded for training or inference, and the 10 AI models have a total of 100 execution sequences; then the first 50 execution sequences can be sent to the control unit for scheduling and execution. After execution completes and the results are returned, the subsequent execution sequences can be issued and executed in turn until all execution sequences have run to completion.
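The batching just described (100 execution sequences served by 50 first storage units) amounts to issuing the sequences in groups no larger than the number of available units. A minimal sketch, with the helper name assumed:

```python
# Split execution sequences into batches no larger than the number of
# first storage units, so each in-flight sequence has a dedicated unit.
def batch_sequences(sequences, num_first_storage_units=50):
    return [sequences[i:i + num_first_storage_units]
            for i in range(0, len(sequences), num_first_storage_units)]

# 100 execution sequences, 50 first storage units -> two batches of 50
batches = batch_sequences(list(range(100)), num_first_storage_units=50)
```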
• the second storage unit in the operation logic unit 502 only stores the data from the most recently executed calculation operator; after a new calculation operator is subsequently executed, the data produced by that latest calculation operator refreshes the contents of the second storage unit.
• likewise, the control unit refreshes its first storage unit with each newly read value; that is, the first storage unit in the control unit may only store the data written once, and when new data is written, the old data is deleted and only the newly written data is kept, refreshing the stored contents.
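The refresh behaviour of both storage units (only the most recently written value survives) can be modelled as a single-slot register. This is an explanatory sketch, not the patent's hardware design:

```python
class SingleSlotRegister:
    """Holds at most one value; every write replaces the previous one."""
    def __init__(self):
        self._value = None

    def write(self, value):
        self._value = value    # old data is discarded, only the newest is kept

    def read(self):
        return self._value

cond_spr = SingleSlotRegister()   # second storage unit, inside the operation logic unit
cond = SingleSlotRegister()       # first storage unit, inside the control unit
cond_spr.write(3)                 # result of one calculation operator
cond.write(cond_spr.read())       # control unit reads it and refreshes its unit
cond_spr.write(4)                 # a newer result overwrites the old one
cond.write(cond_spr.read())       # the first storage unit now holds only 4
```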
• the artificial intelligence model processing device 50 in the embodiment of the present application may include, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a mobile internet device (MID), a wearable device, a virtual reality (VR) device, an augmented reality (AR) device, a wireless terminal in industrial control, a wireless terminal in self driving, a wireless terminal in remote medical surgery, a wireless terminal in a smart grid, a wireless terminal in transportation safety, a wireless terminal in a smart city, a wireless terminal in a smart home, an intelligent robot, a vehicle-mounted system, or a vehicle containing a cockpit domain controller, etc.
• in practical applications, the artificial intelligence model processing device 50 may also include at least one of other processors, storage modules, communication modules, display screens, batteries, battery management modules, multiple sensors with different functions, and other units; these are simply not shown for the processing device 50 in FIG. 5.
• the initial value of i is 0, and while i < 10, the i = i + 1 operation is executed in a loop.
  • the coding can be as follows:
  • the location of label1 (that is, the location of the execution task where label1 is placed) is adjacent to the Merge operator and before the Merge operator.
  • the location of label2 (that is, the location of the execution task where the label2 is placed) is adjacent to the Exit operator and before the Exit operator.
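The i < 10 loop above, expressed with the Enter/Merge/Switch/Add/Goto/Exit tasks and the two labels, can be sketched as a small interpreter. This is a hedged illustration only: the task encoding, the run function, and the collapsing of the COND_SPR/COND data path into a single COND slot are assumptions for clarity, not the patent's actual coding:

```python
def find_label(sequence, label):
    """Return the index of the task tagged with `label`."""
    for i, task in enumerate(sequence):
        if task.get("label") == label:
            return i
    raise ValueError("label %r not found" % label)

def run(sequence, first_storage):
    """Walk the execution sequence; COND plays the role of the first storage unit."""
    pc, trace = 0, []
    while pc < len(sequence):
        task = sequence[pc]
        kind = task["kind"]
        trace.append(kind)
        if kind == "switch":                      # branch judgment operator
            if first_storage["COND"] < task["value"]:
                pc += 1                           # true: fall through to first branch
            else:
                pc = find_label(sequence, task["else_label"])
        elif kind == "goto":                      # loop operator: unconditional jump
            pc = find_label(sequence, task["target"])
        else:
            if kind == "add":                     # first branch operator: i = i + 1
                first_storage["COND"] += 1
            pc += 1                               # enter/merge/exit simply advance
    return trace

# The stream1 sequence of FIG. 7: label1 tags the Merge task, label2 the Exit task.
stream1 = [
    {"kind": "enter"},
    {"kind": "merge", "label": "label1"},
    {"kind": "switch", "value": 10, "else_label": "label2"},
    {"kind": "add"},
    {"kind": "goto", "target": "label1"},
    {"kind": "exit", "label": "label2"},
]
storage = {"COND": 0}   # i starts at 0
trace = run(stream1, storage)
```

After running, COND holds 10 and the Add task has executed ten times, matching the i < 10 loop the coding describes.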
• as shown in FIG. 7, the execution sequence stream1 of the AI model may include 8 tasks.
  • the execution sequence of stream1 corresponds to the first storage unit in FIG. 7 .
• after the application program (such as APP1) passes the above code through the Runtime layer and sends it to the control unit 500 through the NPU driver, that is to say, after the model1 that needs training or inference is sent to the control unit 500, the control unit 500 is triggered to schedule the AI model.
  • the calculation operators of the execution sequence are executed by the operation logic unit 502 .
• the control unit 500 executes the loop operator (Goto) and, based on label1, jumps to the position where label1 is located in the execution sequence of the AI model (that is, the position of the execution task carrying the label label1), triggering the operation logic unit 502 to iteratively execute the tasks of the execution sequence.
• the control unit 500 executes the branch judgment operator (Switch); when the judgment is no, it jumps, based on label2, to the position where the second label label2 is located in the execution sequence of the AI model (that is, the position of the execution task carrying the label label2), and then schedules the exit task (i.e., the Exit operator) to the operation logic unit 502; after receiving the notification that the operation logic unit 502 has finished executing the exit task, it outputs the instruction information indicating completion of execution.
  • Step S800 the control unit dispatches the Enter task to the operation logic unit for execution
  • Step S802 the operation logic unit executes the Enter task
  • Step S804 after the execution of the operation logic unit is completed, notify the control unit of the execution completion;
  • the result or data after executing the task may be stored in its second storage unit.
• Step S806, after receiving the notification, the control unit can directly skip the task or operator of the first label (the Label1 task) and execute the following tasks;
• the control unit can read the data in the second storage unit after learning that the operation logic unit has completed the Enter task, and save (or write) the data into the first storage unit COND corresponding to the execution sequence; as for the Label1 task, the control unit skips it directly, that is, it proceeds to step S808.
  • Step S808 the control unit dispatches the Merge task to the operation logic unit for execution; in this embodiment of the application, the Merge task may be equivalent to the first calculation operator.
  • Step S810 the operation logic unit executes the Merge task, and writes i into its second storage unit COND_SPR;
• when the operation logic unit executes the Merge task for the first time, the i value is transparently passed to the Merge task by the Enter task; in subsequent iterations, the i value is passed to the Merge task by the loop operator.
  • Step S812 after the execution of the operation logic unit is completed, notify the control unit of the execution completion;
  • Step S814 After receiving the notification, the control unit reads the value of the COND_SPR (that is, i), and writes the read data into the first storage unit COND corresponding to stream1;
• the control unit can read the data (that is, i) in the second storage unit after learning that the operation logic unit has completed the Merge task, and write the data into the first storage unit COND corresponding to the execution sequence. It can be understood that, if the first storage unit COND already stores data, this write is equivalent to refreshing the data in the first storage unit COND.
• Step S816, the control unit executes the Switch task and performs judgment processing according to the read data and the parameters in the Switch task (for example, comparing i with value to judge whether i is less than value); if true, it continues to execute the following task (i.e., executes step S818); if false, it jumps to label2 (i.e., jumps to step S824);
• that is, if the current i is less than 10, step S818 is executed; if i is not less than 10, the flow jumps to step S824.
  • Step S818 the control unit dispatches the added task Add task (equivalent to the next task, such as the first branch operator) to the operation logic unit for execution;
• then the loop operator in the subsequent step S822 is executed; the embodiment of the present application is not limited thereto.
• Step S820, the operation logic unit completes execution and notifies the control unit;
  • the operation logic unit can execute Add task (the first branch operator), such as adding 1 to the i value, storing the current i value in its second storage unit, and notifying the control unit after the execution is completed.
• Step S822, the control unit executes the loop operator (Goto task) and unconditionally jumps to label1 to continue execution (that is, jumps to step S806, which triggers the first calculation operator, the Merge task, to be executed again);
• the control unit can read the data in the second storage unit (that is, the current i value) after learning that the operation logic unit has finished executing the Add task, and write the data into the first storage unit COND corresponding to the execution sequence. Then, when the Goto task is executed, the data in the current first storage unit COND (that is, the current i value) is passed to the Merge task.
• the control unit of the embodiment of the present application supports reading data from the second storage unit of the operation logic unit, so that execution of the branch judgment operator can be completed inside the control unit without repeated interaction with the Host CPU.
• Step S824, for the Label2 task (which may be equivalent to another next task, such as the second branch operator), the control unit can skip the Label2 task and dispatch the Exit task to the operation logic unit for execution;
• Step S826, the operation logic unit completes execution and notifies the control unit;
  • Step S828 After receiving the notification, the control unit judges that the execution sequence is completed, and then outputs instruction information indicating that the execution is completed.
  • the instruction information of execution completion may be output to the NPU driver.
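The S810 to S814 handshake described above (the operation logic unit writes i into COND_SPR, notifies the control unit, and the control unit copies the value into the COND unit of stream1) can be sketched as follows; the class and method names are assumptions for illustration:

```python
class OperationLogicUnitSim:
    def __init__(self):
        self.cond_spr = None              # second storage unit (COND_SPR)

    def execute_merge(self, i, notify):
        self.cond_spr = i                 # S810: write i into COND_SPR
        notify()                          # S812: notify the control unit of completion

class ControlUnitSim:
    def __init__(self, alu):
        self.alu = alu
        self.cond = None                  # first storage unit (COND) of stream1

    def on_done(self):
        self.cond = self.alu.cond_spr     # S814: read COND_SPR, refresh COND

alu = OperationLogicUnitSim()
ctrl = ControlUnitSim(alu)
alu.execute_merge(7, ctrl.on_done)        # ctrl can now run the Switch task on ctrl.cond
```

The point of the handshake is that the branch decision (S816) runs entirely on the control-unit side, using only the locally refreshed COND value.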
• the control unit 500 of the embodiment of the present application can also be used to set the second storage unit of the operation logic unit 502 to an invalid value before reading the data of the second storage unit; the control unit 500 subsequently writes the read data into the first storage unit corresponding to the execution sequence only when it determines that the data read from the second storage unit is a valid value. If the data is determined to be an invalid value, the read data is not written into the first storage unit corresponding to the execution sequence.
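The validity check just described can be sketched as below; the sentinel used for the invalid value is an assumption (real hardware might reserve a special encoding instead):

```python
INVALID = None  # assumed invalid marker; actual hardware may use a reserved encoding

class StorageUnits:
    def __init__(self):
        self.second = INVALID   # second storage unit, in the operation logic unit
        self.first = None       # first storage unit, in the control unit

def commit_if_valid(units):
    """Control-unit side: copy second -> first only when the value is valid."""
    value = units.second
    if value is INVALID:
        return False            # discarded: the ALU has not written real data yet
    units.first = value         # valid: refresh the first storage unit
    return True

units = StorageUnits()
units.second = INVALID          # control unit invalidates before dispatching the task
early = commit_if_valid(units)  # a premature read is discarded
units.second = 42               # operation logic unit writes its result
late = commit_if_valid(units)   # now the read is committed into the first unit
```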
  • the present application also provides an artificial intelligence model processing device and an artificial intelligence model processing method, which will be described below with reference to FIGS. 10 to 13 .
  • FIG. 10 is a schematic structural diagram of an artificial intelligence model processing device provided in the embodiment of the present application.
• the artificial intelligence model processing device 16 may be the processor 300 of the artificial intelligence model processing device 30 in the embodiment of FIG. 3, or the main processor 504 of the artificial intelligence model processing device 50 in the embodiment of FIG. 5, and may include a creation unit 160 and a delivery unit 162, wherein:
  • the creation unit 160 is used to create an artificial intelligence AI model;
  • the AI model includes a control operator and a calculation operator;
  • the creating unit 160 may be equivalent to the program code executed in the processor 300 or the main processor 504 for creating an artificial intelligence AI model.
• the sending unit 162 is used to send the AI model to the artificial intelligence processing unit based on the user-mode interface API; the API includes a first API, and the first API is used to send the control operator; the artificial intelligence processing unit is used to execute the control operator and the calculation operator during the process of training or inference of the AI model.
  • the sending unit 162 may be equivalent to the program code executed in the processor 300 or the main processor 504 for sending the AI model to the artificial intelligence processing unit based on the user mode interface API.
• the control operator includes a branch judgment operator, and the branch judgment operator is used to judge whether to execute the first branch operator or the second branch operator.
• the control operator also includes a loop operator, which is used to loop execute the first calculation operator of the AI model;
• the API also includes a second API and a third API; the second API is used to create a label, and the third API is used to set the position of the label in the AI model.
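A hedged sketch of how these user-mode APIs might be exposed (the Model class and the create_label and set_label names are illustrative assumptions, not the actual interface): the second API creates a label, and the third API fixes its position in the model so that it sits immediately before the task it tags:

```python
class Model:
    def __init__(self):
        self.tasks = []       # execution sequence under construction
        self.labels = {}      # label name -> position in the sequence

    def add_task(self, name):             # issue an operator into the sequence
        self.tasks.append(name)

    def create_label(self, name):         # second API: create a label
        self.labels[name] = None

    def set_label(self, name):            # third API: place the label at the current position
        self.labels[name] = len(self.tasks)

m = Model()
m.create_label("label1")
m.add_task("Enter")
m.set_label("label1")    # label1 immediately precedes the first calculation operator
m.add_task("Merge")
```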
• for the specific implementation of the creation unit 160 and the delivery unit 162, please refer to the process of processing the AI model by the processor 300 or the main processor 504 in the embodiments of FIG. 3 to FIG. 8 above; details are not repeated here.
  • FIG. 11 shows a schematic structural diagram of an artificial intelligence model processing device according to another embodiment provided by the present application.
• the artificial intelligence model processing device 17 may be the artificial intelligence processing unit 301 of the artificial intelligence model processing device 30 in the embodiment of FIG. 3, or the artificial intelligence processing unit of the artificial intelligence model processing device 40 in FIG. 4, or the artificial intelligence processing unit of the artificial intelligence model processing device 50 of the embodiment in FIG. 5.
  • the processing device 17 of the artificial intelligence model may include an acquisition unit 170 and an execution operator unit 172, wherein:
  • the acquiring unit 170 is used to acquire an artificial intelligence AI model;
  • the AI model includes control operators and calculation operators, and the AI model is an AI model issued by the processor based on the user mode interface API;
  • the API includes a first API, the first API is used to issue the control operator;
  • the execution operator unit 172 is used for executing the control operator and the calculation operator during the process of training or reasoning the AI model.
• the acquisition unit 170 may be equivalent to the program code, executed in the control unit of the artificial intelligence processing unit 301 or of the artificial intelligence model processing device 40 or of the artificial intelligence model processing device 50, for obtaining the artificial intelligence AI model.
• the execution operator unit 172 may be equivalent to the control unit and the operation logic unit in the artificial intelligence processing unit 301, or in the artificial intelligence model processing device 40, or in the artificial intelligence model processing device 50.
  • the execution operator unit 172 may include a first execution unit 1720, a storage processing unit 1721, and a second execution unit 1722, wherein:
• the first execution unit 1720 is used to execute calculation operators in the AI model through the operation logic unit of the artificial intelligence processing unit during the process of training or inference of the AI model; that is to say, the first execution unit 1720 may be equivalent to the program code used to execute the calculation operators of the AI model on the operation logic unit.
• the storage processing unit 1721 is used to store the data produced after executing the calculation operator in the storage unit of the artificial intelligence processing unit; that is to say, the storage processing unit 1721 may be equivalent to the program code for storing that data in the storage unit of the artificial intelligence processing unit.
  • the second execution unit 1722 is configured to execute the control operator based on the data in the storage unit through the control unit of the artificial intelligence processing unit. Specifically, the second execution unit 1722 may be equivalent to the program code on the control unit for executing the control operator based on the data in the storage unit.
  • the storage unit in the artificial intelligence processing unit may include a first storage unit and a second storage unit;
  • the storage processing unit 1721 may be specifically configured to: store the data after executing the calculation operator in the second storage unit;
  • the second execution unit 1722 may specifically include:
  • a first reading unit configured to read data in the second storage unit
  • a first writing unit configured to write data in the second storage unit into the first storage unit
  • the read execution unit is used to read and execute the control operator based on the data in the first storage unit.
  • the read execution unit may specifically include:
  • a second reading unit configured to read data in the first storage unit
  • a judging unit configured to judge whether to execute the first branch operator based on the data in the first storage unit and the parameters in the branch judging operator;
  • the judging processing unit is configured to execute the first branch operator if the judging unit judges yes, and execute the second branch operator if the judging unit judges no.
  • control operator may also include a loop operator, and the loop operator is used to cyclically execute the first calculation operator of the AI model;
• the processing device 17 of the artificial intelligence model may also include a third execution unit 174, configured to execute the loop operator after the judgment processing unit executes the first branch operator, so as to loop execute the first calculation operator through the operation logic unit until the judgment is no.
  • the third execution unit 174 may be equivalent to the program code on the control unit for executing the loop operator.
• the API also includes a second API and a third API: the second API is used to create a label, and the third API is used to set the position of the label in the AI model. The AI model includes a first label and a second label used for jumping; the first label is placed in the operator immediately preceding the first calculation operator of the AI model, and the second label is placed in the operator immediately preceding the second branch operator. The processing device 17 of the artificial intelligence model may also include a fourth execution unit 176, which is configured to execute the first calculation operator through the operation logic unit before the read execution unit reads and executes the control operator based on the data in the first storage unit; that is, the fourth execution unit 176 may be equivalent to the program code for executing the first calculation operator on the operation logic unit.
  • the third execution unit 174 is specifically configured to: execute the loop operator after the judgment processing unit executes the first branch operator, and jump to the position where the first label is located, so as to execute the loop through the operation logic unit. first calculation operator;
  • the judging processing unit is specifically configured to jump to the location where the second label is located to execute the second branch operator.
• the processing device 17 of the artificial intelligence model may also include a setting unit 178, which is used to set the second storage unit to an invalid value; specifically, the setting unit 178 may be equivalent to the program code on the control unit for setting the second storage unit to an invalid value.
  • the first writing unit is specifically configured to: write the data of the second storage unit into the first storage unit when it is judged that the read data of the second storage unit is a valid value.
• for the specific implementation of the artificial intelligence model processing device 17, refer to the process of processing the AI model by the artificial intelligence processing unit 301 of the artificial intelligence model processing device 30 in the above-mentioned embodiments of FIG. 3 to FIG. 8, or by the artificial intelligence model processing device 40 of FIG. 4, or by the artificial intelligence processing unit of the artificial intelligence model processing device 50 in the embodiment of FIG. 5; details are not repeated here.
• FIG. 12 is a schematic flowchart of an artificial intelligence model processing method provided by the embodiment of the present application, applied to the processor 300 of the artificial intelligence model processing device 30 in the embodiment of FIG. 3 or the main processor 504 of the artificial intelligence model processing device 50 in the embodiment of FIG. 5; the method may include the following steps:
  • Step S120 the processor (or called the main processor) creates an artificial intelligence AI model; the AI model includes control operators and calculation operators;
• Step S122, sending the AI model to the artificial intelligence processing unit based on the user-mode interface API.
  • the API includes a first API, and the first API is used to issue the control operator; the artificial intelligence processing unit is used to execute the control operator and the calculation operator in the process of training or reasoning the AI model .
• FIG. 13 is a schematic flowchart of another embodiment of the artificial intelligence model processing method provided by the present application, applied to the artificial intelligence processing unit 301 of the artificial intelligence model processing device 30 in the embodiment of FIG. 3, or the artificial intelligence model processing device 40 in FIG. 4, or the artificial intelligence processing unit of the artificial intelligence model processing device 50 of the embodiment of FIG. 5; the following steps may be performed:
• Step S130, read the artificial intelligence AI model; the AI model includes control operators and calculation operators, and the AI model is issued by the processor based on the user-mode interface API; the API includes a first API, and the first API is used to issue the control operator;
• Step S132, execute the calculation operator and store the resulting data in the storage unit; execute the control operator based on the data in the storage unit.
  • the artificial intelligence processing unit can execute calculation operators in the AI model through the operation logic unit, and the operation logic unit stores the data after executing the calculation operators in the storage unit; then the control unit can execute the control operator based on the data in the storage unit.
• the storage unit may include a first storage unit and a second storage unit; storing the data after executing the calculation operator in the storage unit may then include: storing the data obtained after executing the calculation operator in the second storage unit;
  • Executing the control operator based on the data in the storage unit may include: reading the data in the second storage unit, writing the data in the second storage unit into the first storage unit; reading and based on the The data in the first storage unit executes the control operator.
  • the first storage unit may be integrated in the control unit, that is to say, the first storage unit may be added in the control unit, and the first storage unit may be a dedicated register of the control unit.
• the second storage unit may be integrated in the operation logic unit; that is to say, the second storage unit may be added in the operation logic unit, and the second storage unit may be a dedicated register of the operation logic unit. In this way, the control unit can quickly and efficiently read the data produced after the operation logic unit executes an operator or task, so that the execution of subsequent operators can be controlled according to the execution result of the current operator. This allows the entire AI model to be executed within the control unit and the operation logic unit, without returning control functions to the main processor for processing.
• each operation logic unit in the artificial intelligence processing unit can add a second storage unit (its own dedicated register), so that each operation logic unit can cooperate with the control unit to execute the control operator, thereby further improving the performance of model inference or model training by the artificial intelligence processing unit.
  • the AI model corresponds to at least one execution sequence, and each of the first storage units corresponds to a different execution sequence.
  • the processor can customize the number of first storage units in the control unit according to the number of processed AI models and the number of execution sequences corresponding to each AI model.
  • The control operators may include a branch-judgment operator and a loop operator, where the branch-judgment operator is used to judge whether to execute a first branch operator or a second branch operator.
  • Reading and executing the control operator based on the data in the first storage unit may include: reading the data in the first storage unit; judging, based on that data and the parameters of the branch-judgment operator, whether to execute the first branch operator; if the judgment is yes, executing the first branch operator; if the judgment is no, executing the second branch operator.
  • The control operators may also include a loop operator; after the first branch operator is executed, the method may further include executing the loop operator, so as to iteratively execute the compute operators of the AI model through the operation logic unit until the judgment is no.
  • The AI model may include a first label and a second label used for jumping, where the first label is placed in the operator immediately preceding the first compute operator of the AI model, and the second label is placed in the operator immediately preceding the second branch operator. Before reading and executing the control operator based on the data in the first storage unit, the method may further include executing the first compute operator through the operation logic unit.
  • Executing the loop operator to iteratively execute the compute operators of the AI model through the operation logic unit may include: executing the loop operator and jumping to the position of the first label, so as to iteratively execute the first compute operator of the AI model through the operation logic unit. If the judgment is no, executing the second branch operator includes jumping to the position of the second label to execute the second branch operator.
  • Before reading the data in the second storage unit, the method may further include setting the second storage unit to an invalid value; writing the data of the second storage unit into the first storage unit may then include writing it only when the value read from the second storage unit is judged to be a valid value.
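The invalidate-then-read handoff described in the last bullet can be sketched as follows. This is a minimal illustrative model, not the hardware interface: the class names, the `None` sentinel, and the method names are all invented for illustration.

```python
INVALID = None  # sentinel standing in for the hardware "invalid value"

class OperationLogicUnit:
    """Models the second storage unit: a dedicated result register on the compute side."""
    def __init__(self):
        self.cond_spr = INVALID

    def run_compute_operator(self, op):
        # Executing a compute operator refreshes the register with its result.
        self.cond_spr = op()

class ControlUnit:
    """Models the first storage unit: a per-stream condition register."""
    def __init__(self):
        self.cond = INVALID

    def invalidate(self, alu):
        # Step 1: clear the compute-side register before the operator runs,
        # so a stale value from a previous operator can never be read.
        alu.cond_spr = INVALID

    def read_back(self, alu):
        # Step 2: copy into the first storage unit only once the value is valid.
        if alu.cond_spr is INVALID:
            return False
        self.cond = alu.cond_spr
        return True
```

A branch-judgment operator would then be evaluated against `ControlUnit.cond` only after `read_back` has returned `True`, which is how the text says the accuracy of training or inference is preserved.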
  • For the specific implementation of the processing method of the artificial intelligence model in this embodiment, reference may be made to the AI-model processing performed by the artificial intelligence processing unit 301 of the processing device 30 in the embodiments of FIGS. 3 to 8, by the processing device 40 of FIG. 4, or by the artificial intelligence processing unit of the processing device 50 in the embodiment of FIG. 5, which is not repeated here.
  • An embodiment of the present application also provides a computer-readable storage medium that can store a program; when the program is executed by the processor of the embodiments of the present application, it performs part or all of the steps of any of the artificial intelligence model processing methods described in the method embodiments above.
  • An artificial intelligence (AI) model can be created, the AI model including control operators; the AI model is then delivered to the artificial intelligence processing unit based on the user-mode interface API.
  • The API includes an API for delivering the control operators; the artificial intelligence processing unit is used to execute the control operators during training or inference of the AI model.
  • When the program is executed by the processor, it can acquire or read the AI model; the AI model includes control operators and is an AI model that the processor created and delivered based on the user-mode interface API.
  • The API includes an API for delivering the control operators; the control operators are then executed during training or inference of the AI model.
  • For its specific implementation, reference may be made to the process by which the artificial intelligence processing unit 301 of the processing device 30 in the embodiments of FIGS. 3 to 8, the processing device 40 of FIG. 4, or the artificial intelligence processing unit of the processing device 50 processes the AI model, which is not repeated here.
  • An embodiment of the present application also provides a computer program comprising instructions; when the computer program is executed by a multi-core processor, the processor of the embodiments of the present application can execute part or all of the steps of any artificial intelligence model processing method.
  • The disclosed methods can be implemented as computer program instructions encoded in a machine-readable format on a computer-readable storage medium or on other non-transitory media or articles of manufacture.
  • FIG. 14 schematically illustrates a conceptual partial view of an example computer program or computer program product, arranged according to at least some embodiments presented herein, comprising a computer program for executing a computer process on a computing device.
  • The example computer program product 1400 is provided using a signal-bearing medium 1401.
  • The signal-bearing medium 1401 may include one or more program instructions 1402 which, when executed by one or more processors, may provide the functions, or parts of the functions, described above for the processor 300 or main processor 504, the artificial intelligence processing unit 301 of the processing device 30, the processing device 40 of FIG. 4, or the processing device 50 of the embodiment of FIG. 5.
  • The signal-bearing medium 1401 may comprise a computer-readable medium 1403, such as, but not limited to, a hard drive, compact disc (CD), digital video disc (DVD), digital tape, memory, read-only memory (ROM), or random access memory (RAM).
  • The signal-bearing medium 1401 may comprise a computer-recordable medium 1404, such as, but not limited to, memory, read/write (R/W) CDs, R/W DVDs, and the like.
  • The signal-bearing medium 1401 may include a communication medium 1405, such as, but not limited to, digital and/or analog communication media (e.g., fiber-optic cables, waveguides, wired communication links, wireless communication links, etc.).
  • The signal-bearing medium 1401 may be conveyed by a wireless form of the communication medium 1405 (e.g., a wireless communication medium complying with the IEEE 802.11 standard or another transmission protocol).
  • The one or more program instructions 1402 may be, for example, computer-executable instructions or logic-implemented instructions.
  • The program instructions 1402 are communicated to the computing device to provide various operations, functions, or actions.
  • The disclosed device can be implemented in other ways.
  • The device embodiments described above are only illustrative.
  • The division of the above units is only a logical functional division.
  • In actual implementation there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
  • The mutual coupling, direct coupling, or communication connections shown or discussed may be through some interfaces, and the indirect coupling or communication connections between devices or units may be electrical or take other forms.
  • The units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • Each functional unit in the embodiments of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • The above integrated units can be implemented in the form of hardware or in the form of software functional units.
  • When the above integrated units are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, server, or network device, and specifically a processor in the computer device) execute all or part of the steps of the above methods in the various embodiments of the present application.
  • The aforementioned storage medium may include a USB flash drive, removable hard disk, magnetic disk, optical disc, read-only memory (ROM), random access memory (RAM), and the like.


Abstract

A processing method, apparatus, and device for an artificial intelligence model, and a readable storage medium. The processing method is applied to an artificial intelligence processing unit comprising a control unit, an operation logic unit, and a storage unit. The method comprises: acquiring an artificial intelligence (AI) model, the AI model comprising control operators and compute operators and being delivered by a processor via a user-mode interface (API), the API comprising a first API used to deliver the control operators (S130); executing the compute operators, the operation logic unit storing the resulting data in the storage unit; and executing the control operators based on the data in the storage unit (S132). The method can improve the performance of model inference or model training.

Description

Processing Method, Apparatus, and Device for an Artificial Intelligence Model, and Readable Storage Medium

Technical Field

This application relates to the field of processor technology, and in particular to a processing method for an artificial intelligence model, a processing apparatus for an artificial intelligence model, a processing device for an artificial intelligence model, a computer-readable storage medium, and a computer program.
Background

Artificial intelligence (AI) is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. AI is a hot area of science and a frontier of global development, applied in all kinds of everyday scenarios.

Autonomous driving systems rely on AI model inference in a large number of scenarios. AI models are mostly deep neural networks, which are dense in matrix and vector computation and demand very high compute (on the order of tera-operations). An ordinary CPU generally cannot meet the massive compute requirements of deep neural networks, i.e., AI models, so dedicated accelerators are needed to execute them, such as purpose-built graphics processing units (Graphics Processing Unit, GPU) or neural network processors (Network Process Unit, NPU).

An application loads the AI model onto an AI accelerator, which trains or infers (executes) the model. When a prior-art AI accelerator encounters conditional branch-judgment operators and loop operators, it can only execute the current operator and cannot control the execution of subsequent operators according to the current operator's result; the AI accelerator must therefore fall back to the main processor for part of its control functions. Training or inference of the AI model thus requires frequent interaction between the main processor and the AI accelerator, so model inference or model training performance is low.
Summary

Embodiments of this application provide a processing method for an artificial intelligence model, a processing apparatus for an artificial intelligence model, a processing device for an artificial intelligence model, a computer-readable storage medium, and a computer program, in order to improve the performance of model inference or model training.
In a first aspect, an embodiment of this application provides a processing method for an artificial intelligence model, applied to an artificial intelligence processing unit that comprises a control unit, an operation logic unit, and a storage unit. The method comprises:

the artificial intelligence processing unit acquires an AI model delivered by the processor side via a user-mode interface (Application Programming Interface, API), where the AI model comprises control operators and compute operators, and the API comprises a first API used to deliver the control operators;

during training or inference of the AI model, the artificial intelligence processing unit executes the compute operators via the operation logic unit, and after executing a compute operator the operation logic unit stores the resulting data in the storage unit;

the artificial intelligence processing unit executes the control operators via the control unit based on the data in the storage unit.

In this embodiment, by providing an API for delivering control operators, an AI model containing control operators can be delivered to the artificial intelligence processing unit. By adding a storage unit to the artificial intelligence processing unit, the operation logic unit can store the data produced after executing an operator or task in that storage unit, so that during training or inference the control unit can execute control operators directly on the data in that storage unit and thereby control the execution of subsequent operators according to the result of the current operator. The entire AI model is executed inside the controller and the operation logic unit, without returning part of the control functions to the main processor. This solves the prior-art technical problem that training or inference of an AI model requires frequent interaction between the main processor and the AI accelerator, which keeps performance low, and improves the performance of model inference or model training.
In one possible implementation, the storage unit comprises a first storage unit and a second storage unit. Storing the data after executing the compute operator in the storage unit comprises storing it in the second storage unit.

Executing the control operator via the control unit based on the data in the storage unit comprises: the control unit reads the data in the second storage unit and writes it into the first storage unit; when executing the control operator, it reads and acts on the data in the first storage unit.

In this embodiment, by providing the first and second storage units in the artificial intelligence processing unit, the second storage unit stores the data produced after the operation logic unit finishes an operator or task, and the control unit reads that data into the first storage unit. Control operators can then be executed directly on the data in the first storage unit, so the control unit can control the execution of subsequent operators according to the result of the current operator.
In one possible implementation, the first storage unit may be integrated in the control unit, that is, added to the control unit as its dedicated register; the second storage unit may be integrated in the operation logic unit, that is, added to the operation logic unit as its dedicated register. This further allows the control unit to quickly and efficiently read the data left after the operation logic unit finishes an operator or task, so that subsequent operators can be controlled according to the current operator's result, and the whole AI model executes inside the controller and operation logic unit without returning part of the control functions to the main processor.

Each operation logic unit in the artificial intelligence processing unit may be given its own second storage unit (a dedicated register), so that every operation logic unit can cooperate with the control unit in executing control operators, further improving the inference or training performance of the artificial intelligence processing unit.

In one possible implementation, the AI model corresponds to at least one execution sequence, and each first storage unit corresponds to a different execution sequence.

In this embodiment, the processor can customize the number of first storage units in the control unit according to the number of AI models being processed and the number of execution sequences of each AI model.
In one possible implementation, the control operators comprise a branch-judgment operator, which judges whether to execute a first branch operator or a second branch operator.

Reading and executing the control operator based on the data in the first storage unit comprises:

reading the data in the first storage unit;

judging, based on the data in the first storage unit and the parameters of the branch-judgment operator, whether to execute the first branch operator;

if yes, executing the first branch operator; if no, executing the second branch operator.

In this embodiment the control operators may include a branch-judgment operator; the control unit can directly read the data needed for the branch judgment when executing a delivered branch-judgment operator, and then dispatch the next operator or task (the first branch operator or the second branch operator). The whole AI model executes inside the control unit and operation logic unit without returning part of the control functions to the main processor.

In one possible implementation, the control operators further comprise a loop operator used to repeatedly execute a first compute operator of the AI model. After the first branch operator is executed, the method further comprises: executing the loop operator so that the operation logic unit jumps back to the first compute operator of the AI model and executes it again, iterating in this way until the next judgment, based on the data in the first storage unit and the parameters of the branch-judgment operator, of whether to execute the first branch operator is no, at which point the loop ends and the second branch operator is executed.

The AI model in this embodiment may thus include a loop operator whose execution directs the operation logic unit back to the first compute operator, so the branch-judgment and loop operators of the AI model can be executed directly inside the control unit without frequent interaction with the main processor, solving the technical problem of low inference or training performance and improving the performance of model inference or model training.
In one possible implementation, the API further comprises a second API used to create labels and a third API used to set a label's position in the AI model. The AI model further comprises a first label and a second label used for jumps: the first label is placed in the operator immediately preceding the first compute operator of the AI model, and the second label in the operator immediately preceding the second branch operator.

Before the control unit reads and executes the control operator based on the data in the first storage unit, the method further comprises executing the first compute operator via the operation logic unit.

Executing the loop operator to iteratively execute the AI model's compute operator via the operation logic unit then comprises: executing the loop operator and jumping to the position of the first label, so that the operation logic unit repeatedly executes the first compute operator.

If the judgment is no, executing the second branch operator comprises jumping to the position of the second label to execute the second branch operator.

The AI model in this embodiment may therefore include the first and second labels and their respective placements; jumping via the two labels efficiently realizes the operator following the branch judgment, for example the jump performed when executing the loop operator. Branch-judgment and loop operators thus execute directly inside the control unit without frequent interaction with the main processor, improving the performance of model inference or model training.
In one possible implementation, before the control unit reads the data in the second storage unit, the method further comprises setting the second storage unit to an invalid value.

Writing the data of the second storage unit into the first storage unit then comprises: writing the data of the second storage unit into the first storage unit only when the value read from the second storage unit is judged to be valid; if the value read is judged invalid, it is not written into the first storage unit.

In this embodiment, the control unit first sets the operation logic unit's second storage unit to an invalid value and writes its data into the first storage unit only after the value read is judged valid, which ensures the accuracy of training or inference of the AI model.
In a second aspect, this application provides a processing method for an artificial intelligence model, comprising:

a processor (or main processor) creates an AI model comprising control operators and compute operators;

the AI model is delivered to an artificial intelligence processing unit via an API;

where the API comprises a first API used to deliver the control operators, and the artificial intelligence processing unit is used to execute the control operators and compute operators during training or inference of the AI model.

In this embodiment, by providing an API for delivering control operators, an AI model containing them can be delivered to the artificial intelligence processing unit, which can then execute the entire model independently during training or inference without returning part of the control functions to the main processor. This solves the prior-art technical problem that training or inference requires frequent interaction between the main processor and the AI accelerator, which keeps performance low, and improves inference or training performance.

In one possible implementation, the control operators comprise a branch-judgment operator that judges whether to execute a first branch operator or a second branch operator.

In this embodiment, an API for delivering branch-judgment operators may be provided, so that the artificial intelligence processing unit can complete their execution independently during training or inference, without round trips to the main processor.

In one possible implementation, the control operators further comprise a loop operator used to repeatedly execute a first compute operator of the AI model; the API further comprises a second API used to create labels and a third API used to set a label's position in the AI model.

In one possible implementation, the artificial intelligence processing unit comprises a control unit, an operation logic unit, and a storage unit. The compute operators of the AI model are scheduled by the control unit onto the operation logic unit, and after each compute operator is executed the resulting data is stored in the storage unit, so that the artificial intelligence processing unit can execute the control operators via the control unit based on the data in the storage unit.

In this embodiment, an API for delivering loop operators may also be provided, so the artificial intelligence processing unit can complete loop execution independently during training or inference; while executing a loop operator it can perform the relevant jumps via the label-creation API and the label-placement API, completing loop execution quickly and efficiently.
In a third aspect, this application provides a processing apparatus for an artificial intelligence model. The apparatus is an artificial intelligence processing unit and comprises:

an acquisition unit, configured to acquire an AI model delivered by the processor side via an API, where the AI model comprises control operators and compute operators and the API comprises a first API used to deliver the control operators;

a first execution unit, configured to execute the compute operators via the operation logic unit of the artificial intelligence processing unit during training or inference of the AI model;

a storage processing unit, configured to store, after the first execution unit executes a compute operator, the resulting data in the storage unit of the artificial intelligence processing unit;

a second execution unit, configured to execute the control operators via the control unit of the artificial intelligence processing unit based on the data in the storage unit.

In one possible implementation, the storage unit comprises a first storage unit and a second storage unit;

the storage processing unit is specifically configured to store the data after executing the compute operator in the second storage unit;

the second execution unit comprises:

a first reading unit, configured to read the data in the second storage unit;

a first writing unit, configured to write the data in the second storage unit into the first storage unit;

a read-and-execute unit, configured to read and execute the control operators based on the data in the first storage unit.

In one possible implementation, the control operators comprise a branch-judgment operator that judges whether to execute a first branch operator or a second branch operator; the read-and-execute unit comprises:

a second reading unit, configured to read the data in the first storage unit;

a judgment unit, configured to judge, based on the data in the first storage unit and the parameters of the branch-judgment operator, whether to execute the first branch operator;

a judgment processing unit, configured to execute the first branch operator if the judgment unit judges yes, and the second branch operator if it judges no.

In one possible implementation, the control operators further comprise a loop operator used to repeatedly execute a first compute operator of the AI model; the processing apparatus further comprises:

a third execution unit, configured to execute the loop operator after the judgment processing unit executes the first branch operator, so that the operation logic unit jumps back to the first compute operator of the AI model and executes it repeatedly, iterating until the judgment unit's next judgment, based on the data in the first storage unit and the parameters of the branch-judgment operator, of whether to execute the first branch operator is no, at which point the loop ends and the judgment processing unit executes the second branch operator.

In one possible implementation, the API further comprises a second API used to create labels and a third API used to set a label's position in the AI model; the AI model further comprises a first label and a second label used for jumps, the first label being placed in the operator immediately preceding the first compute operator of the AI model and the second label in the operator immediately preceding the second branch operator; the processing apparatus further comprises:

a fourth execution unit, configured to execute the first compute operator via the operation logic unit before the read-and-execute unit reads and executes the control operator based on the data in the first storage unit;

the third execution unit is specifically configured to execute the loop operator after the judgment processing unit executes the first branch operator, jumping to the position of the first label so that the operation logic unit repeatedly executes the first compute operator;

if the judgment unit judges no, the judgment processing unit is specifically configured to jump to the position of the second label to execute the second branch operator.

In one possible implementation, the AI model corresponds to at least one execution sequence, and each first storage unit corresponds to a different execution sequence.

In one possible implementation, the processing apparatus further comprises:

a setting unit, configured to set the second storage unit to an invalid value before the first reading unit reads its data;

the first writing unit is specifically configured to write the data of the second storage unit into the first storage unit only when the value read is judged valid, and not to write it into the first storage unit when the value read is judged invalid.
In a fourth aspect, this application provides a processing apparatus for an artificial intelligence model, comprising:

a creation unit, configured to create an AI model comprising control operators and compute operators;

a delivery unit, configured to deliver the AI model to an artificial intelligence processing unit via an API;

where the API comprises a first API used to deliver the control operators, and the artificial intelligence processing unit is used to execute the control operators and compute operators during training or inference of the AI model.

In one possible implementation, the control operators comprise a branch-judgment operator that judges whether to execute a first branch operator or a second branch operator.

In one possible implementation, the control operators further comprise a loop operator used to repeatedly execute a first compute operator of the AI model; the API further comprises a second API used to create labels and a third API used to set a label's position in the AI model.

In a fifth aspect, this application provides a processing device for an artificial intelligence model, comprising the artificial intelligence processing unit and a memory, where the memory stores program code, and the artificial intelligence processing unit calls the program code stored in the memory so that the processing device performs the method of the first aspect and its various possible implementations.

In a sixth aspect, this application provides a processing device for an artificial intelligence model, comprising a processor and a memory, where the memory stores program code, and the processor calls the program code stored in the memory so that the processing device performs the method of the second aspect and its various possible implementations.

In a seventh aspect, this application provides a processing device for an artificial intelligence model, comprising a processor, an artificial intelligence processing unit, and a memory, where there may be several memories storing program code; the processor is coupled to the artificial intelligence processing unit; the artificial intelligence processing unit may call program code stored in a memory coupled to it or inside it so that the processing device performs the method of the first aspect and its various possible implementations; and the processor may call program code stored in a memory coupled to it so that the processing device performs the method of the second aspect and its various possible implementations.

In an eighth aspect, this application provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first or second aspect and their various possible implementations.

In a ninth aspect, this application provides a computer program comprising instructions that, when executed by a processor, cause the main processor to perform the method of the first or second aspect and their various possible implementations.
Brief Description of the Drawings

FIG. 1 is a functional block diagram of a vehicle 100 according to an embodiment of this application.

FIG. 2 is a schematic structural diagram of an AI computing architecture according to an embodiment of this application.

FIG. 3 is a schematic structural diagram of a processing device for an artificial intelligence model according to an embodiment of this application.

FIG. 4 is a schematic structural diagram of a processing device for an artificial intelligence model according to another embodiment of this application.

FIG. 5 is a schematic structural diagram of a processing device for an artificial intelligence model according to another embodiment of this application.

FIG. 6 is a schematic diagram of a loop statement of an AI model according to an embodiment of this application.

FIG. 7 is a schematic diagram of the execution sequence of stream1 in an AI model according to an embodiment of this application.

FIG. 8 is a schematic diagram of the execution flow of the control unit according to an embodiment of this application.

FIG. 9 is a schematic diagram of the execution principle of the artificial intelligence processing unit according to an embodiment of this application.

FIG. 10 is a schematic structural diagram of a processing apparatus for an artificial intelligence model according to an embodiment of this application.

FIG. 11 is a schematic structural diagram of a processing apparatus for an artificial intelligence model according to another embodiment of this application.

FIG. 12 is a schematic flowchart of a processing method for an artificial intelligence model according to an embodiment of this application.

FIG. 13 is a schematic flowchart of a processing method for an artificial intelligence model according to another embodiment of this application.

FIG. 14 is a conceptual partial view of a computer program or computer program product according to an embodiment of this application.
Detailed Description

The embodiments of this application are described below with reference to the accompanying drawings. The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a particular order. The terms "comprise" and "have", and any variants of them, are intended to cover non-exclusive inclusion: a process, method, system, product, or device comprising a series of steps or units is not limited to the listed steps or units, and may optionally include unlisted steps or units, or steps or units inherent to such a process, method, product, or device. Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application; occurrences of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they independent or alternative embodiments mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

The terms "component", "module", "system", and the like used in this specification denote computer-related entities: hardware, firmware, combinations of hardware and software, software, or software in execution. For example, a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself may be components. One or more components may reside within a process and/or thread of execution, and a component may be located on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer-readable media having various data structures stored thereon, and may communicate by local and/or remote processes, for example according to signals having one or more data packets (e.g., data from two components interacting with another component in a local system, a distributed system, and/or across a network such as the Internet interacting with other systems by signals).
First, some terms used in this application are explained for the understanding of those skilled in the art.

(1) Register: a constituent part of the central processing unit, associated with the CPU. Registers are high-speed storage elements of limited capacity, used to temporarily hold instructions, data, and addresses. The control part of a CPU contains registers such as the instruction register (IR) and program counter (PC); the arithmetic and logic part contains registers such as the accumulator (ACC).

(2) AI accelerator: a class of dedicated hardware accelerators or computer systems designed to accelerate artificial intelligence applications, especially artificial neural networks, machine vision, and machine learning. Typical applications include algorithms for robotics, the Internet of Things, and other data-intensive or sensor-driven tasks. As hardware accelerators dedicated to special-purpose tasks, AI accelerators usually assist or supplement the main processor of a computer system; examples include, without limitation, purpose-built GPUs or NPUs for executing AI models.

(3) API: a set of predefined interfaces (such as functions or HTTP interfaces), or conventions for connecting the different components of a software system; a set of routines through which applications and developers can access certain software or hardware without accessing its source code or understanding the details of its internal workings.

(4) Runtime: the state of a program while it is running (being executed), i.e., while the program executes. A Runtime library is the library a program depends on while running; in some programming languages, certain reusable programs or instances are packaged or rebuilt into Runtime libraries, which can be linked or called by any program while running. The Runtime library in the embodiments of this application provides the APIs of the artificial intelligence processing unit (e.g., a GPU or NPU), including the APIs for generating and delivering control operators.
(5) Execution sequence: an AI model is generally split into multiple streams, each stream corresponding to an execution sequence. Each execution sequence contains multiple tasks (also called operators; a task encapsulates an operator), and events synchronize the streams. Tasks of different streams can execute in parallel on the artificial intelligence processing unit, while tasks within a stream generally execute only serially. Operators or tasks in the embodiments of this application can include compute operators and control operators; a compute operator is used for data computation, and a control operator is used to control the execution order of execution sequences. An operator or task is in essence code of the AI model; for example, the convolution code of an AI model is an operator or task. That is, a compute operator is code that implements or completes data computation, usually run on the operation logic unit of the artificial intelligence processing unit; a control operator is code that controls the execution order of an execution sequence, executed by the control unit of the artificial intelligence processing unit.
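As a rough, purely illustrative analogy in ordinary Python threads (the task names and the two-stream split below are invented, not taken from the patent), the stream/task/event model just described behaves like this: tasks within a stream run serially, streams run in parallel, and an event gates execution across streams.

```python
from threading import Event, Thread

# Two "streams": tasks within each run serially; the event synchronizes
# across streams, so stream 2's task starts only after stream 1's first task.
sync_event = Event()
log = []

def stream1():
    log.append("s1.conv")   # first task (a compute operator)
    sync_event.set()        # event: signal stream 2
    log.append("s1.relu")   # second task, serial within the stream

def stream2():
    sync_event.wait()       # event: wait for stream 1's signal
    log.append("s2.matmul")

threads = [Thread(target=stream2), Thread(target=stream1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that the relative order of `s1.relu` and `s2.matmul` is intentionally unconstrained, mirroring how tasks of different streams may run in parallel once their event dependencies are satisfied.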
The processing method, processing apparatus, and processing device for an artificial intelligence model provided in the embodiments of this application can target all application scenarios requiring an AI accelerator, including scenarios where a convolutional neural network (Convolutional Neural Networks, CNN) model or a Mask Region-based CNN (Mask RCNN) model performs AI processing on camera images, such as the autonomous driving field of intelligent-vehicle scenarios (driver monitoring, parking, autonomous driving, and so on), as well as scenarios where a recurrent neural network (Recurrent Neural Network, RNN) model performs AI processing on data, such as voice interaction between the car, the driver, and in-vehicle passengers in intelligent-vehicle scenarios.
To facilitate understanding of the embodiments of this application, the specific technical problem addressed by this application is further analyzed below, taking the autonomous driving scenario of a vehicle as an example.

First, refer to the functional block diagram of the vehicle 100 in FIG. 1 according to an embodiment of this application. In one embodiment, the vehicle 100 is configured in a fully or partially autonomous driving mode. For example, the vehicle 100 may control itself while in autonomous mode and may, through human operation, determine the current state of the vehicle and its surroundings, determine the possible behavior of at least one other vehicle in the surroundings and a confidence level corresponding to the likelihood that the other vehicle performs that behavior, and control the vehicle 100 based on the determined information. While in autonomous mode, the vehicle 100 may be set to operate without human interaction.

The vehicle 100 may be a car, truck, motorcycle, bus, boat, airplane, helicopter, lawn mower, recreational vehicle, amusement-park vehicle, construction equipment, tram, golf cart, train, cart, and so on; the embodiments of this application impose no particular limitation.
The vehicle 100 may include various subsystems, such as a travel system 102, a sensing system 104, a control system 106, one or more peripheral devices 108, as well as a power supply 110, a computer system 112, and a user interface 116. Optionally, the vehicle 100 may include more or fewer subsystems, each of which may include multiple elements; the subsystems and elements of the vehicle 100 may be interconnected by wire or wirelessly.

The travel system 102 may include components providing powered motion for the vehicle 100. In one embodiment, the travel system 102 may include an engine 118, an energy source 119, a transmission 120, and wheels/tires 121. The engine 118 may be an internal-combustion engine, electric motor, air-compression engine, or another combination of engine types, such as a hybrid of gasoline engine and electric motor, or of internal-combustion engine and air-compression engine. The engine 118 converts the energy source 119 into mechanical energy.

Examples of the energy source 119 include gasoline, diesel, other petroleum-based fuels, propane, other compressed-gas-based fuels, ethanol, solar panels, batteries, and other sources of electrical power. The energy source 119 may also provide energy for other systems of the vehicle 100.

The transmission 120 may transmit mechanical power from the engine 118 to the wheels 121 and may include a gearbox, differential, and drive shaft. In one embodiment, the transmission 120 may also include other devices, such as a clutch. The drive shaft may include one or more axles couplable to one or more wheels 121.

The sensing system 104 may include several sensors sensing information about the environment around the vehicle 100. For example, the sensing system 104 may include a global positioning system 122 (which may be GPS, BeiDou, or another positioning system), an inertial measurement unit (IMU) 124, radar 126, a laser rangefinder 128, and a camera 130. The sensing system 104 may also include sensors monitoring internal systems of the vehicle 100 (e.g., an in-cabin air-quality monitor, fuel gauge, oil-temperature gauge). Sensor data from one or more of these sensors can be used to detect objects and their corresponding characteristics (position, shape, orientation, speed, etc.). Such detection and recognition is a key function for the safe operation of the autonomous vehicle 100.

The global positioning system 122 may be used to estimate the geographical position of the vehicle 100. The IMU 124 senses position and orientation changes of the vehicle 100 based on inertial acceleration; in one embodiment, the IMU 124 may be a combination of an accelerometer and a gyroscope.

The radar 126 may use radio signals to sense objects in the surroundings of the vehicle 100. In some embodiments, besides sensing objects, the radar 126 may also sense their speed and/or heading.

The laser rangefinder 128 may use lasers to sense objects in the environment in which the vehicle 100 is located. In some embodiments, the laser rangefinder 128 may include one or more laser sources, a laser scanner, one or more detectors, and other system components.

The camera 130 may capture multiple images of the surroundings of the vehicle 100 and may be a still camera or a video camera.
The control system 106 controls the operation of the vehicle 100 and its components. The control system 106 may include various elements, including a steering system 132, a throttle 134, a braking unit 136, a sensor-fusion algorithm 138, a computer vision system 140, a route control system 142, and an obstacle-avoidance system 144.

The steering system 132 is operable to adjust the heading of the vehicle 100; for example, in one embodiment it may be a steering-wheel system.

The throttle 134 controls the operating speed of the engine 118 and thus the speed of the vehicle 100.

The braking unit 136 controls deceleration of the vehicle 100 and may use friction to slow the wheels 121. In other embodiments, the braking unit 136 may convert the kinetic energy of the wheels 121 into electric current, or take other forms to slow the rotation of the wheels 121 and thereby control the vehicle's speed.

The computer vision system 140 may process and analyze images captured by the camera 130 to identify objects and/or features in the surroundings of the vehicle 100, which may include traffic signals, road boundaries, and obstacles. The computer vision system 140 may use object-recognition algorithms, structure-from-motion (SFM) algorithms, video tracking, and other computer-vision techniques. In some embodiments, the computer vision system 140 may map the environment, track objects, estimate object speeds, and so on.

The route control system 142 determines the travel route of the vehicle 100. In some embodiments, it may combine data from the sensor-fusion algorithm 138, the GPS 122, and one or more predetermined maps to determine the travel route for the vehicle 100.

The obstacle-avoidance system 144 identifies, evaluates, and avoids or otherwise negotiates potential obstacles in the environment of the vehicle 100.

Of course, in one example, the control system 106 may additionally or alternatively include components other than those shown and described, or some of the components shown above may be removed.

The vehicle 100 interacts with external sensors, other vehicles, other computer systems, or users via the peripheral devices 108. The peripheral devices 108 may include a wireless communication system 146, an onboard computer 148, a microphone 150, and/or a speaker 152.

In some embodiments, the peripheral devices 108 provide a means for the user of the vehicle 100 to interact with the user interface 116. For example, the onboard computer 148 may provide information to the user of the vehicle 100; the user interface 116 may also operate the onboard computer 148 to receive user input, and the onboard computer 148 may be operated via a touchscreen. In other cases, the peripheral devices 108 may provide a means for the vehicle 100 to communicate with other devices located in the vehicle. For example, the microphone 150 may receive audio (e.g., voice commands or other audio input) from the user of the vehicle 100; similarly, the speaker 152 may output audio to the user of the vehicle 100.

The wireless communication system 146 may communicate wirelessly with one or more devices, directly or via a communication network. For example, the wireless communication system 146 may use 3G cellular communication such as CDMA, EVDO, or GSM/GPRS, 4G cellular communication such as LTE, or 5G cellular communication. The wireless communication system 146 may use WiFi to communicate with a wireless local area network (WLAN). In some embodiments, the wireless communication system 146 may communicate directly with devices using an infrared link, Bluetooth, or ZigBee. Other wireless protocols include various vehicle communication systems; for example, the wireless communication system 146 may include one or more dedicated short-range communications (DSRC) devices, which may include public and/or private data communication between vehicles and/or roadside stations.

The power supply 110 may provide power to the various components of the vehicle 100. In one embodiment, the power supply 110 may be a rechargeable lithium-ion or lead-acid battery, with one or more of its battery packs configured as the power supply providing power to the components of the vehicle 100. In some embodiments, the power supply 110 and the energy source 119 may be implemented together, as in some all-electric cars.
Some or all of the functions of the vehicle 100 are controlled by the computer system 112. The computer system 112 may include at least one processor 113, which executes instructions 115 stored in a non-transitory computer-readable medium such as the data storage device 114. The computer system 112 may also be multiple computing devices controlling individual components or subsystems of the vehicle 100 in a distributed manner.

The processor 113 may be any conventional processor, such as a commercially available CPU. Alternatively, the processor may be a dedicated device such as an ASIC or another hardware-based processor. Although FIG. 1 functionally illustrates the processor, memory, and other elements of the computer 110 in the same block, those of ordinary skill in the art will understand that the processor, computer, or memory may actually comprise multiple processors, computers, or memories that may or may not be stored within the same physical enclosure. For example, the memory may be a hard drive or other storage medium located in an enclosure different from that of the computer 110. References to a processor or computer are therefore understood to include references to a collection of processors, computers, or memories that may or may not operate in parallel. Rather than using a single processor to perform the steps described here, some components, such as the steering and deceleration components, may each have their own processor performing only computations related to that component's specific function.

In the various aspects described here, the processor may be located remotely from the vehicle and communicate with it wirelessly. In other aspects, some of the processes described here are executed on a processor arranged within the vehicle while others are executed by a remote processor, including taking the steps necessary to execute a single maneuver.

In some embodiments, the data storage device 114 may contain instructions 115 (e.g., program logic) executable by the processor 113 to perform various functions of the vehicle 100, including those described above. The data storage device 114 may also contain additional instructions, including instructions to send data to, receive data from, interact with, and/or control one or more of the travel system 102, sensing system 104, control system 106, and peripheral devices 108.

Besides the instructions 115, the data storage device 114 may also store data such as road maps, route information, the vehicle's position, heading, and speed, other such vehicle data, and other information. Such information may be used by the vehicle 100 and the computer system 112 while the vehicle 100 operates in autonomous, semi-autonomous, and/or manual modes.

The user interface 116 provides information to or receives information from the user of the vehicle 100. Optionally, the user interface 116 may include one or more input/output devices within the set of peripheral devices 108, such as the wireless communication system 146, the onboard computer 148, the microphone 150, and the speaker 152.

The computer system 112 may control the functions of the vehicle 100 based on input received from the various subsystems (e.g., the travel system 102, sensing system 104, and control system 106) and from the user interface 116. For example, the computer system 112 may use input from the control system 106 to control the steering unit 132 to avoid obstacles detected by the sensing system 104 and the obstacle-avoidance system 144. In some embodiments, the computer system 112 is operable to provide control over many aspects of the vehicle 100 and its subsystems.

Optionally, one or more of these components may be installed or associated separately from the vehicle 100. For example, the data storage device 114 may exist partially or completely separate from the vehicle 100. The above components may be communicatively coupled together in wired and/or wireless fashion.

Optionally, the above components are only an example; in practical applications, components of the modules above may be added or removed according to actual needs, and FIG. 1 should not be understood as limiting the embodiments of the present invention.
An autonomous vehicle traveling on a road, such as the vehicle 100 above, can identify objects in its surroundings to determine adjustments to its current speed. The objects may be other vehicles, traffic-control devices, or other types of objects. In some examples, each identified object may be considered independently, and its respective characteristics, such as its current speed, acceleration, and distance to the vehicle, may be used to determine the speed the autonomous vehicle should adjust to.

Optionally, the autonomous vehicle 100, or a computing device associated with it (such as the computer system 112, computer vision system 140, or data storage device 114 of FIG. 1), may predict the behavior of an identified object based on its characteristics and the state of the surroundings (e.g., traffic, rain, ice on the road). Optionally, since the identified objects each depend on one another's behavior, all identified objects may also be considered together to predict the behavior of a single identified object. The vehicle 100 can adjust its speed based on the predicted behavior of the identified object; in other words, the autonomous vehicle can determine, based on the predicted behavior of the object, what stable state the vehicle needs to adjust to (e.g., accelerate, decelerate, or stop). Other factors may also be considered in this process to determine the speed of the vehicle 100, such as the lateral position of the vehicle 100 in the road it travels, the curvature of the road, and the proximity of static and dynamic objects.

Besides providing instructions to adjust the speed of the autonomous vehicle, the computing device may also provide instructions to modify the steering angle of the vehicle 100, so that the autonomous vehicle follows a given trajectory and/or maintains safe lateral and longitudinal distances to objects near it (e.g., cars in adjacent lanes on the road).

Further, the data storage device 114 in the computer system 112 may include system memory; data running in system memory may include the computer's operating system and application programs (APPs). The processor 113 in the computer system 112 may connect to the system memory via a system bus, reading its data and processing it.

The operating system includes a shell and a kernel. The shell is an interface between the user and the kernel of the operating system; it is the outermost layer of the operating system and manages the interaction between user and operating system: waiting for user input, interpreting that input for the operating system, and handling the operating system's various outputs.

The kernel consists of those parts of the operating system that manage memory, files, peripherals, and system resources. Interacting directly with the hardware, the operating-system kernel typically runs processes and provides inter-process communication, CPU time-slice management, interrupts, memory management, I/O management, and so on.

The application programs include programs related to controlling autonomous driving, such as programs managing the interaction between the autonomous vehicle and obstacles on the road, programs controlling the route or speed of the autonomous vehicle, and programs controlling interaction between the autonomous vehicle and other autonomous vehicles on the road. Applications also exist on the system of a software deployment server; in one embodiment, when an application needs to be executed, the computer system 112 may download it from the deployment server.

The processing method for an artificial intelligence model provided in the embodiments of this application can thus be applied specifically to the computer system 112 of FIG. 1, in scenarios where its processor 113 uses AI models such as CNN or Mask RCNN models to perform AI processing on images captured by the camera, predicting the behavior of identified objects from their characteristics and the state of the surroundings (e.g., traffic, rain, ice on the road) and thereby determining what stable state the vehicle needs to adjust to (e.g., accelerate, decelerate, or stop). The processing apparatus and processing device for an artificial intelligence model provided by this application may specifically correspond to the computer system 112 of FIG. 1.
The process by which the processor of the computer system 112 trains or infers AI is described below with reference to the schematic structural diagram of the AI computing architecture in FIG. 2, provided by an embodiment of this application. The AI computing architecture may correspond to the processor 113 of the computer system 112 and may specifically include a main processor (Host CPU) and an artificial intelligence processing unit, where:

The Host CPU may include an artificial intelligence processing unit driver, a runtime unit or runtime layer or user-mode driver layer (Runtime), and a library (Library); that is, the Host CPU can read the above data in system memory or storage. The artificial intelligence processing unit driver provides the driver functions of the artificial intelligence processing unit. The Runtime provides the user-mode interface (Application Programming Interface, API) of the artificial intelligence processing unit, deployed in the application program APP. The Library provides operator-library functions directly executable on the operation logic unit of the artificial intelligence processing unit, facilitating APP development of business functions.

The artificial intelligence processing unit, which may also be called an AI accelerator, may include AI processors such as a GPU or NPU, where the NPU may be a dedicated or custom neural network processor. The artificial intelligence processing unit may include a control unit (or controller) and an operation logic unit.

In one implementation, the Runtime of the artificial intelligence processing unit may provide APIs such as model, stream, task, and event. Upper-layer services (e.g., an APP) split the AI computation graph (i.e., the AI model) and convert it into streams, tasks, events, and so on that the artificial intelligence processing unit can process, delivering the AI model to the artificial intelligence processing unit by calling these APIs. The control unit of the artificial intelligence processing unit can receive the AI model delivered by the Host CPU, schedule the AI model for training or inference, and report execution results to the Host CPU. The operation logic unit can execute the tasks in the AI model dispatched by the control unit and return the result of each task to the control unit (one AI model may contain multiple tasks).

To improve model-execution efficiency, the APP loads the AI model onto the artificial intelligence processing unit, which saves the model. The model needs to be loaded only once and can then be executed many times; when the APP exits or the business ends, it first notifies the artificial intelligence processing unit to unload the previously loaded model. On the artificial intelligence processing unit side, the loaded AI model is likewise stored in a stream/task/event-like structure.

For neural network models containing control operators (such as branch-judgment operators) and loop operators, for example Mask RCNN-type networks and recurrent neural network (Recurrent Neural Network, RNN)-type networks, when the AI accelerator does not support branch-judgment and loop operators, part of the computation must fall back to the Host CPU during model training or model execution, reducing the model's inference performance.

That is, for a given branch-judgment operator or loop operator in the AI model, after the operation logic unit executes the current operator it cannot control the execution of subsequent operators; it can only return the data to the Host CPU side via the control unit, with the Host CPU executing the branch-judgment and loop operators so as to trigger the control unit to dispatch operators or tasks to the operation logic unit again. Completing the branch-judgment and loop operators of the whole AI model thus requires many interactions with the Host CPU, with part of the computation falling back to the Host CPU, and the model's inference or training performance suffers.
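The fallback just described can be caricatured in a few lines. This is purely illustrative: `accelerator_step`, the counter, and the loop bound are all invented, but the sketch shows the cost structure the patent targets: without on-device branch and loop operators, every loop iteration pays one Host-CPU round trip.

```python
def accelerator_step(x):
    """Stand-in for one compute operator executed on the AI accelerator."""
    return x + 1

def run_with_host_control(x, limit):
    """Loop whose condition is evaluated on the host: one round trip per pass."""
    round_trips = 0
    while True:
        x = accelerator_step(x)   # accelerator executes the operator...
        round_trips += 1          # ...and returns the result to the Host CPU
        if x >= limit:            # Host CPU evaluates the branch condition
            break
    return x, round_trips
```

Here `run_with_host_control(0, 5)` returns `(5, 5)`: five iterations, five host round trips. The architecture below moves that `if` into the accelerator's control unit, eliminating the per-iteration interaction.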
FIG. 3 is a schematic structural diagram of a processing device for an artificial intelligence model according to an embodiment of this application, illustrating how the AI computing architecture provided by this application improves model inference or model training performance. The processing device 30 for an artificial intelligence model includes a processor 300 and an artificial intelligence processing unit 301, where:

The processor 300 is used to create an AI model, the AI model comprising control operators and compute operators, and then deliver the AI model to the artificial intelligence processing unit 301 based on the user-mode interface API; the API comprises a first API used to deliver the control operators. The processor 300 may correspond to the main processor.

Specifically, the APP contains the program code of the AI model, usable for AI processing of input data. The process in which the processor 300 reads the APP's data and runs the AI-model program code within it constitutes creating the AI model.

The control operators in the embodiments of this application may be code or functions implementing control logic. An API for delivering the control operators may be provided at the Runtime layer, and the processor 300 can deliver the AI model to the artificial intelligence processing unit 301 by calling the API of that Runtime layer.

The artificial intelligence processing unit 301 is used to execute the control operators and compute operators during training or inference after acquiring the AI model.

It is understood that the processor 300 of the embodiments of this application may include its own controller, arithmetic units, and so on, used to interpret computer instructions and process data in computer software. The processor 300 is the core hardware unit of the processing device 30, mainly responsible for computation and overall coordination, including controlling and allocating all hardware resources of the processing device 30 (such as memory, input/output units, and the artificial intelligence processing unit 301 of this embodiment) and performing general-purpose operations.

The artificial intelligence processing unit 301 may in practice be a processor or processing chip, for example a dedicated or custom GPU or NPU, which may be mounted on the processor 300 as a coprocessor or coupled to the processor 300, with the processor 300 assigning tasks.

Taking an NPU as an example, the core component of the artificial intelligence processing unit 301 is the operation logic unit; the control unit controls the operation logic unit to fetch matrix data and perform operations. The operation logic unit may internally include multiple processing engines (Process Engine, PE). In some implementations, the operation logic unit is a two-dimensional systolic array; it may also be a one-dimensional systolic array or other electronic circuitry capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation logic unit is a general-purpose matrix processor. The artificial intelligence processing unit 301 may also include a unified memory, a direct memory access controller (Direct Memory Access Controller, DMAC), a weight memory, a bus interface unit, a vector computation unit, an instruction fetch buffer, and so on, where:

The unified memory may store input data and output data. Weight data may be moved into the weight memory directly via the DMAC; input data may also be moved into the unified memory via the DMAC.

The bus interface unit may handle the interaction between the AXI bus and the DMAC and the instruction fetch buffer, specifically fetching instructions from external memory for the instruction fetch buffer, and fetching the original data of input matrix A or weight matrix B from external memory for the DMAC.

The DMAC is mainly used to move input data from external memory into the unified memory, move weight data into the weight memory, or move input data into the input memory.

The vector computation unit may include multiple arithmetic processing units that, when needed, further process the output of the operation logic unit, such as vector multiplication, vector addition, exponentiation, logarithms, and magnitude comparison. It is mainly used for non-convolution/FC-layer computation in neural networks, such as pooling, batch normalization, and local response normalization.

In some implementations, the vector computation unit can store processed output vectors to the unified buffer. For example, the vector computation unit may apply a nonlinear function to the output of the operation logic unit, such as a vector of accumulated values, to generate activation values; in some implementations, it generates normalized values, merged values, or both. In some implementations, the processed output vector can serve as activation input to the operation logic unit, for example for use in subsequent layers of a neural network.

The instruction fetch buffer connected to the control unit may store instructions used or executed by the control unit.

In this embodiment, by providing an API for delivering control operators, an AI model containing them can be delivered to the artificial intelligence processing unit, which can execute the whole model independently during training or inference without returning part of the control functions to the main processor. This solves the prior-art technical problem that training or inference requires frequent interaction between the main processor and the AI accelerator, which keeps model inference or training performance low, and improves that performance.
In one possible implementation, the control operators of the embodiments of this application may include a branch-judgment operator that judges whether to execute a first branch operator or a second branch operator.

In one possible implementation, the control operators further include a loop operator used to repeatedly execute a first compute operator of the AI model; the API further includes a second API used to create labels and a third API used to set a label's position in the AI model.

That is, the API used to deliver the control operators (the first API) may include an API for delivering branch-judgment operators and an API for delivering loop operators. Specifically:

The processor 300 may create or add four APIs at the Runtime layer for delivering an AI model:

1. Second API, CreateLabel: creates a label used for positioning during jumps;

2. Third API, LabelSet (involving label, stream): places a label at the current position of a stream; that is, for the task carrying the label, sets its position in the execution sequence or data stream;

3. Switch (involving value, condition, false_label, stream): compares the data (or value) in the condition register against value; if the result is false, jumps to the task after false_label. The condition supports unsigned-integer comparisons such as ">", "<", "==", "!=", "<=", and ">=";

4. Goto (involving label, stream): unconditionally jumps to the task after label.

During APP development, developers can build on these APIs provided by the Runtime layer, splitting the AI computation graph according to development requirements into streams, tasks, events, and so on that the artificial intelligence processing unit 301 can process, and then delivering the control operators (such as branch-judgment and loop operators) to the artificial intelligence processing unit 301 for execution by calling the corresponding APIs above.

In this embodiment, an API for delivering loop operators may also be provided, so the artificial intelligence processing unit can complete loop execution independently during training or inference of the AI model; during loop execution it can perform the relevant jumps via the label-creation API and the label-placement API, completing loop execution quickly and efficiently.
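A hedged sketch of how an application might arrange the four Runtime calls in a stream to express a loop follows. The call names (CreateLabel, LabelSet, Switch, Goto) and their parameters are paraphrased from the text above, not taken from a published SDK; the toy implementations below merely record the calls so the resulting stream layout can be inspected.

```python
# Toy model of the four Runtime-layer calls; a stream is an ordered task
# list and labels are string markers (all encodings are hypothetical).
_labels = iter(range(1, 1000))

def create_label():                                   # CreateLabel
    return f"label_{next(_labels)}"

def label_set(label, stream):                         # LabelSet(label, stream)
    stream.append(("LABEL", label))

def switch(value, condition, false_label, stream):    # Switch(value, cond, false_label, stream)
    stream.append(("SWITCH", value, condition, false_label))

def goto(label, stream):                              # Goto(label, stream)
    stream.append(("GOTO", label))

def build_loop_stream(body_task, exit_task):
    """Express `while (cond_register >= 1) { body }` with the four APIs."""
    stream = []
    loop_start = create_label()
    loop_exit = create_label()
    label_set(loop_start, stream)        # loop entry point
    stream.append(("TASK", body_task))   # loop-body compute operator
    switch(1, ">=", loop_exit, stream)   # on false, jump past the loop
    goto(loop_start, stream)             # back-edge to the loop entry
    label_set(loop_exit, stream)         # loop exit point
    stream.append(("TASK", exit_task))
    return stream
```

The resulting stream mirrors the layout the patent describes: the first label precedes the repeated compute operator and the second label precedes the operator executed when the branch judgment fails.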
Further, FIG. 4 is a schematic structural diagram of a processing device for an artificial intelligence model according to another embodiment of this application, illustrating one specific structure of the artificial intelligence processing unit provided by this application and how it improves model inference or model training performance.

The processing device 40 of FIG. 4 may include a control unit (or controller) 400 and operation logic units 402. As described above, the processing device 40 may also include other units or modules, such as a unified memory, DMAC, weight memory, and bus interface unit, which are not drawn in FIG. 4 of this embodiment. As shown in FIG. 4, the processing device 40 typically includes multiple operation logic units 402. The control unit 400 may include multiple first storage units (which may be called first registers or condition registers (Condition, COND)); while the control unit 400 trains or infers an AI model, one first storage unit may correspond to one execution sequence of the AI model, with different concurrently parallel execution sequences corresponding to different first storage units. Each operation logic unit 402 includes a second storage unit (which may be called a second register or dedicated condition register (Condition Special Purpose Register, COND_SPR)). The control unit 400 schedules execution of the AI model, dispatching the model's operators or tasks to the operation logic unit 402 for execution (e.g., compute-type operators) or executing them itself (e.g., event-type tasks); the operation logic unit executes individual compute-type operators. Specifically:

The control unit 400 is used to acquire or read the AI model, the AI model comprising control operators and compute operators; the AI model is delivered by the main processor based on the API, the API comprising a first API used to deliver the control operators.

The second storage unit is used to store the data after the operation logic unit 402 executes a compute operator.

The control unit 400 is also used to read the data of the second storage unit. After executing a task, the operation logic unit 402 writes the resulting data into its own second storage unit and may notify the control unit 400 of completion, triggering the control unit 400 to read the second storage unit's data. Having read that data, the control unit 400 writes it into the first storage unit corresponding to the execution sequence. The control unit 400 can then execute the next task of that execution sequence based on the data stored in the first storage unit and the parameters in the control operator.

A control operator in this embodiment is a control task the control unit 400 executes based on data read from the operation logic unit 402. In one possible implementation, the control operators may include a branch-judgment operator that judges whether to execute a first branch operator or a second branch operator.

When executing the next task of the sequence based on the data and the parameters in the control operator, the control unit 400 may specifically judge, based on the data and the parameters in the branch-judgment operator, whether to execute the first branch operator.

Since the branch-judgment operator judges whether to execute the first or the second branch operator, the first branch operator is executed on a yes and the second branch operator on a no.

Further, the AI model of this embodiment may also include a loop operator used to repeatedly execute a first compute operator of the AI model. After judging in favor of the first branch operator, the control unit 400 may execute the loop operator, scheduling the operation logic unit 402 to execute the first compute operator of the execution sequence repeatedly, until the control unit 400 judges, based on the data and the parameters in the branch-judgment operator, not to execute the first branch operator.

It should be noted that when executing the branch-judgment operator, the control unit 400 typically judges true or false via concrete judgment logic or conditions based on the data and the operator's parameters. It may, for example, execute the first branch operator on true and the second on false, or execute the first branch operator on false and the second on true. This embodiment maps "whether to execute the first branch operator" onto that true/false judgment. Since the loop operator is executed after the first branch operator is chosen, this embodiment may execute the loop operator either when the concrete judgment logic or condition evaluates to true or when it evaluates to false.

For example, in the AI model's execution sequence, before the control unit 400 judges whether to execute the first branch operator based on the data and the branch-judgment operator's parameters, the operation logic unit 402 is invoked by the control unit 400 to execute some compute operator (the first compute operator of this application). When the control unit 400 executes the loop operator, it can trigger the control unit 400 to invoke the operation logic unit 402 to execute that compute operator again, jumping back to the first compute operator for repeated execution, until the control unit 400 judges, based on the data and the branch-judgment operator's parameters, not to execute the first branch operator.

In one possible implementation, the execution of the branch-judgment and loop operators above can be realized concretely as follows:

The AI model of this embodiment may further include a first label and a second label used for jumps, and their respective placements in the execution sequence: the first label is placed in the task immediately preceding the first compute operator of the execution sequence, and the second label in the task immediately preceding the second branch operator.

Suppose that, in the AI model's execution sequence, before the control unit 400 executes the next task based on the data and the control operator's parameters, the tasks include the control unit 400 scheduling the first compute operator onto the operation logic unit.

Then, after judging in favor of the first branch operator, the control unit 400 may execute the loop operator and jump to the position of the first label; the next task then returns to the first compute operator of the execution sequence, i.e., the control unit 400 schedules the operation logic unit 402 to execute the first compute operator again, so as to iterate.

When executing the second branch operator, the control unit 400 jumps to the position of the second label, so that the next task is the second branch operator, which is then executed.

The control unit 400 is also used to output completion indication information once the execution sequence has finished executing.

In this embodiment, with multiple first storage units in the control unit and a second storage unit in each operation logic unit, the operation logic unit can write post-execution data into the second storage unit, and the control unit can read that data and write it into the first storage unit of the corresponding execution sequence. The control unit thus reads the data resulting from the current operator and controls the execution of subsequent operators according to the current operator's result. The whole AI model executes inside the control unit and operation logic unit without returning part of the control functions to the main processor, solving the prior-art technical problem that training or inference requires frequent interaction between the main processor and the AI accelerator and keeps performance low, and improving the performance of model inference or model training.
下面结合图5来进一步说明主处理器和人工智能处理单元的结构,如图5示出的本申请提供的另一实施例的人工智能模型的处理设备的结构示意图,该人工智能模型的处理设备50包括人工智能处理单元,该人工智能处理单元可以包括控制单元500和运算逻辑单元502。在一种实现方式中,人工智能模型的处理设备50还可以包括主处理器504。控制单元500与主处理器504耦合。
可以按照需求为控制单元500定制或设置一定数量的第一存储单元。该第一存储单元可以只允许控制单元500进行读和写。在一种可能的实现方式中,控制单元在训练或推理人工智能AI模型的过程中,一个该第一存储单元对应该AI模型的一个执行序列,同步并行的不同执行序列对应不同的第一存储单元。
如图5中，列举了3个AI模型为例：AI模型1可以对应有2个执行序列，该2个执行序列分别对应第一存储单元0和第一存储单元1。也就是说，执行序列0(包括任务01、任务02等)对应第一存储单元0；执行序列1(包括任务11、任务12等)对应第一存储单元1。其中，执行序列0对应多少个任务，以及执行序列1对应多少个任务，可以由AI模型1根据需求进行编码设置。
类似的,AI模型2可以对应有1个执行序列,该1个执行序列分别对应第一存储单元2。AI模型3可以对应有3个执行序列,该3个执行序列分别对应第一存储单元3、第一存储单元4和第一存储单元5。
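按图5的例子，把各AI模型的执行序列静态映射到第一存储单元编号的做法，可以用如下Python草图说明“同步并行的不同执行序列对应不同的第一存储单元”。映射策略(顺序分配)仅为示例假设：

```python
# 示意：模型 -> 执行序列 -> 第一存储单元编号 的静态分配

models = {"AI模型1": 2, "AI模型2": 1, "AI模型3": 3}  # 模型 -> 执行序列数量

def assign_cond_units(models):
    mapping, next_unit = {}, 0
    for name, num_streams in models.items():
        for s in range(num_streams):
            mapping[(name, s)] = next_unit  # 每个执行序列独占一个第一存储单元
            next_unit += 1
    return mapping

m = assign_cond_units(models)
assert m[("AI模型1", 0)] == 0   # 执行序列0 -> 第一存储单元0
assert m[("AI模型3", 2)] == 5   # AI模型3的第3个执行序列 -> 第一存储单元5
```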
每个运算逻辑单元502可以包括各自的第二存储单元。运算逻辑单元502可以将执行任务后的数据或运算结果写入自身的第二存储单元;控制单元500可以访问到该第二存储单元,并读取第二存储单元中的数据。
可理解的是,本申请实施例不限于3个AI模型,也不限于图5中6个执行序列。本申请实施例的处理装置可以根据实际场景需求设置控制单元中第一存储单元的数量,例如设置有50个第一存储单元,那么该控制单元可以同时并行处理50个执行序列,该50个执行序列可以为一个或多个AI模型的执行序列。
假设当前三个应用程序(应用程序1、应用程序2和应用程序3)都在运行,需要加载10个AI模型进行训练或推理。该10个AI模型总共有100个执行序列,那么可以先将前50个执行序列下发到控制单元进行调度执行,在执行完毕并返回结果后,可以依次下发并执行后续的执行序列,直到所有执行序列都执行完毕。
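当执行序列总数超过第一存储单元数量时，上述“先下发前50个、执行完毕后再依次下发后续执行序列”的分批方式可以用如下Python草图示意。这里采用最简单的顺序整批下发策略，仅为一种示例假设：

```python
# 示意：按第一存储单元数量对执行序列分批下发

def schedule(streams, num_cond_units):
    """streams: 待调度的执行序列列表; num_cond_units: 第一存储单元数量。"""
    batches = []
    for i in range(0, len(streams), num_cond_units):
        batches.append(streams[i:i + num_cond_units])  # 每批最多占满全部第一存储单元
    return batches

batches = schedule(list(range(100)), 50)
assert len(batches) == 2        # 100个执行序列、50个第一存储单元 -> 分2批
assert len(batches[0]) == 50
```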
在其中一种实现方式中，运算逻辑单元502中的第二存储单元只会存储上一次执行完计算算子后的数据，在后续执行完新的计算算子后，会将最新的执行完计算算子后的数据刷新存储在该第二存储单元中。同样地，控制单元也会将读取到的新的数据刷新写入其第一存储单元中，即控制单元中的第一存储单元也只会存储最近一次写入的数据，当新一次数据写入时，会将旧的数据删除，只存储新写入的数据，以刷新里面存储的数据。
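这种“只保留最近一次写入”的刷新式寄存器语义，第一存储单元与第二存储单元均可按如下Python草图理解。类名为示例假设：

```python
# 示意：刷新式寄存器——新写入覆盖旧数据，只存最新值

class LatestValueRegister:
    def __init__(self):
        self._value = None

    def write(self, value):
        self._value = value  # 新数据覆盖(刷新)旧数据

    def read(self):
        return self._value

r = LatestValueRegister()
r.write(1)
r.write(2)            # 旧值1被刷新删除
assert r.read() == 2  # 只读到最新写入的数据
```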
本申请实施例中的人工智能模型的处理设备50可以包括但不限于手机(mobile phone)、平板电脑、笔记本电脑、掌上电脑、移动互联网设备(mobile internet device,MID)、可穿戴设备、虚拟现实(virtual reality,VR)设备、增强现实(augmented reality,AR)设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程手术(remote medical surgery)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端、智能机器人、车载***或含有驾驶舱域控制器的车辆等。
可理解的是,人工智能模型的处理设备50还可以包括其他处理器、存储模块、通信模块、显示屏、电池、电池管理模块、基于不同功能的多个传感器等单元中的至少一个。只是图5中的人工智能模型的处理设备50没有画出。
下面结合图6示出的本申请实施例提供的AI模型的循环语句的示意图,举例进行说明。假设model1(即AI模型1)的循环语句的执行逻辑是:
i初始值为0，当i<10时，循环执行i+1操作。图6中计算图入口为输入(Enter)算子，Enter透传输入张量0，融合(Merge)算子收到0后启动执行，将0传递给Switch算子，同时输出0和10进行比较，比较结果(true)作为Switch的控制输入(p)，Switch算子将输入0转发到true分支，增加(add)算子收到输入0后启动0+1计算，计算结果1输出给循环或迭代(NextIteration)算子，NextIteration算子将输入1传递给Merge，如此循环，直至Merge的输出>=10，Switch算子将输入传递给false分支，退出(Exit)循环。
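图6的Enter→Merge→比较→Switch→Add→NextIteration→Exit数据流循环，可以用如下Python草图近似其执行语义。各算子这里以普通语句近似，仅为说明执行逻辑，并非数据流框架的真实实现：

```python
# 示意：model1循环语句 "i初始为0, 当i<10时循环执行i+1" 的数据流语义

def run_graph(init, limit):
    i = init                 # Enter: 透传输入张量
    while True:
        merged = i           # Merge: 接收Enter或NextIteration送来的输入
        if merged < limit:   # merged与limit比较, 比较结果作为Switch的控制输入p
            i = merged + 1   # true分支: add算子计算merged+1
            # NextIteration: 把结果传回Merge, 进入下一轮迭代
        else:
            return merged    # false分支: Exit, 退出循环

result = run_graph(0, 10)
assert result == 10  # 循环在Merge的输出达到10时退出
```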
那么结合上述在Runtime层创建或增加的4个API,即包括第一API(生成分支判断算子的API和生成循环算子的API)、第二API和第三API,编码可以如下:
Create model1
Create stream1
Create label1
Create label2
Model1.add(stream1)
Launch(stream1,Enter,…)
LabelSet(label1,stream1)
Launch(stream1,Merge,…)
Switch(10,“<”,label2,stream1)
Launch(stream1,Add,…)
Goto(label1,stream1)
LabelSet(label2,stream1)
Launch(stream1,Exit,…)
也就是说，创建了用于跳转的第一标签label1和第二标签label2，以及第一标签label1和第二标签label2分别在数据流stream1中放置的位置。label1的位置(也即放置了label1这个标签的执行任务的位置)与Merge算子相邻，在该Merge算子之前。label2的位置(也即放置了label2这个标签的执行任务的位置)与Exit算子相邻，在该Exit算子之前。可对应参考图7示出的本申请实施例提供的AI模型中stream1的执行序列的示意图，AI模型中stream1的执行序列可包括8个任务，例如，如图中Enter即对应任务1、label1即对应任务2、Merge即对应任务3、Switch即对应任务4、add即对应任务5、Goto即对应任务6、label2即对应任务7、Exit即对应任务8。该stream1的执行序列对应有图7中的第一存储单元。
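按上文的编码顺序构造stream1任务序列的过程，可以用如下Python草图示意label1/label2作为跳转目标在序列中的放置位置。API名称沿用上文伪代码但签名做了简化(省略了部分参数)，仅为示意：

```python
# 示意：按编码顺序构造stream1的8个任务, 对应图7的执行序列

tasks = []
def Launch(stream, op): tasks.append(("op", op))
def LabelSet(label):    tasks.append(("label", label))    # 设置跳转目标位置
def Switch(label):      tasks.append(("switch", label))   # 判断为false时跳到label
def Goto(label):        tasks.append(("goto", label))     # 无条件跳回label

Launch("stream1", "Enter")   # 任务1
LabelSet("label1")           # 任务2: 放在Merge之前, 作为循环跳回点
Launch("stream1", "Merge")   # 任务3
Switch("label2")             # 任务4: i>=10时跳到label2
Launch("stream1", "Add")     # 任务5
Goto("label1")               # 任务6: 跳回label1, 再次执行Merge
LabelSet("label2")           # 任务7: 放在Exit之前
Launch("stream1", "Exit")    # 任务8

assert len(tasks) == 8
assert tasks[1] == ("label", "label1")  # label1紧邻Merge之前
```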
应用程序(例如APP1)将上述编码通过Runtime层，经过NPU driver下发给控制单元500后，也就是说将需要训练或推理的model1下发给控制单元500后，触发控制单元500调度AI模型执行序列的计算算子给运算逻辑单元502执行。控制单元500在执行循环Goto算子时，基于label1跳转到label1在AI模型的执行序列所在的位置(即放置了label1这个标签的执行任务的位置)，触发运算逻辑单元502迭代执行该执行序列的任务。控制单元500执行分支判断算子(Switch)，判断为否时，即基于label2跳转到第二标签label2在AI模型的执行序列所在的位置(即放置了label2这个标签的执行任务的位置)，然后调度退出任务(即Exit算子)给运算逻辑单元502；在接收到运算逻辑单元502执行完该退出任务的通知后，输出执行完毕的指示信息。
具体结合图8示出的本申请实施例提供的控制单元执行流程的示意图,以及图9示出的本申请实施例提供的人工智能处理单元执行的原理示意图,说明控制单元的执行流程:
步骤S800:控制单元将Enter task调度给运算逻辑单元执行;
步骤S802:运算逻辑单元执行该Enter task;
步骤S804:运算逻辑单元执行完毕后,通知控制单元执行完成;
具体地,运算逻辑单元执行完后,可将执行任务后的结果或数据存储在其第二存储单元中。
步骤S806：控制单元接收到通知后，针对该第一标签的任务或算子(Label1 task)，可直接跳过，执行后面的task；
具体地，控制单元获知运算逻辑单元执行完毕Enter task后，可以读取该第二存储单元中的数据，并将数据保存(或写入)到该执行序列对应的第一存储单元COND中；并且执行Label1 task，针对该Label1 task，控制单元是直接跳过的，即执行步骤S808。
步骤S808:控制单元将Merge task调度给运算逻辑单元执行;本申请实施例中,该Merge task即可以相当于第一计算算子。
步骤S810:运算逻辑单元执行该Merge task,并将i写入到自身的第二存储单元COND_SPR中;
具体地,运算逻辑单元第一次执行该Merge task时,i值是由该Enter task直接透传给该Merge task的。在后续迭代执行该Merge task时,i值是由循环算子传递给Merge task。
步骤S812:运算逻辑单元执行完成后,通知控制单元执行完成;
步骤S814:控制单元接收到通知后,读取该COND_SPR的值(也就是i),并将读取的数据写入到stream1对应的第一存储单元COND中;
具体地,控制单元获知运算逻辑单元执行完毕Merge task后,可以读取该第二存储单元中的数据(也就是i),并将数据写入该执行序列对应的第一存储单元COND中。可理解的是,若该第一存储单元COND已存储有数据,那么本次写入相当于刷新第一存储单元COND中的数据。
步骤S816:控制单元执行Switch task,根据读取的数据以及Switch task中的参数进行判断处理(例如,将i和value进行比较,判断是否i小于value),如果是true,继续执行下面的task(即执行步骤S818),如果是false,跳转到label2(即跳转到步骤S824);
具体地,例如value为10,那么若判断当前的i小于10,则执行步骤S818;若不小于10,则跳转到步骤S824。
步骤S818：控制单元将增加任务Add task(相当于下一任务，例如第一分支算子)调度给运算逻辑单元执行；也就是说，本申请实施例是举例在步骤S816中判断为true时，才执行后续步骤S822的循环算子。实际上如上述图4实施例中所描述，本申请实施例并不限定。
步骤S820:运算逻辑单元执行完成,通知控制单元;
例如,运算逻辑单元可以执行Add task(第一分支算子),例如将i值加1,将当前i值存储在自身的第二存储单元中,并在执行完毕后通知控制单元。
步骤S822:控制单元执行循环算子(Goto task),然后无条件跳转到label1开始执行(即跳转到步骤S806开始执行,并接着再次触发执行第一计算算子Merge task);
具体地，控制单元获知运算逻辑单元执行完毕Goto task后，可以读取该第二存储单元中的数据(也就是当前i值)，并将数据写入该执行序列对应的第一存储单元COND中。然后在执行该Goto task时将当前第一存储单元COND中的数据(即当前i值)，传递给Merge task。本申请实施例的控制单元支持从运算逻辑单元的第二存储单元中读取数据，从而可在控制单元内部来完成分支判断算子的执行，无需与Host CPU反复交互。
步骤S824：控制单元对于Label2 task(可以相当于另一个下一任务，例如第二分支算子)，可跳过Label2 task，调度Exit task给运算逻辑单元执行；
步骤S826:运算逻辑单元执行完成,通知控制单元;
步骤S828:控制单元接收到通知后,判断执行序列完成,然后输出执行完毕的指示信息。
具体的,可以向该NPU driver输出执行完毕的指示信息。
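图8的步骤S800~S828可以串成一个小型Python解释器草图，模拟控制单元通过COND/COND_SPR与运算逻辑单元配合执行stream1的完整流程。其中的变量与结构均为示例假设，并非NPU的真实指令格式：

```python
# 示意：图8控制单元执行流程(S800~S828)的简化模拟

def run_stream(init_i, value):
    cond = None            # stream1对应的第一存储单元COND
    cond_spr = None        # 运算逻辑单元的第二存储单元COND_SPR
    i, steps = init_i, []
    # S800~S806: 调度Enter task; label1 task直接跳过
    while True:
        cond_spr = i                 # S810: 运算逻辑单元执行Merge, i写入COND_SPR
        cond = cond_spr              # S814: 控制单元读COND_SPR, 刷新写入COND
        if cond < value:             # S816: 执行Switch task, 判断i < value
            i = cond + 1             # S818~S820: Add task(第一分支算子), i加1
            steps.append(i)          # S822: Goto task, 无条件跳回label1
        else:
            steps.append("Exit")     # S824~S826: 跳到label2, 调度Exit task
            return cond, steps       # S828: 执行序列完成, 输出指示信息

final, steps = run_stream(0, 10)
assert final == 10 and steps[-1] == "Exit"
```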
在一种可能的实现方式中,本申请实施例的控制单元500,还可以用于在读取第二存储单元的数据之前,将运算逻辑单元502的第二存储单元设置为无效值;然后控制单元500后续在判断出读取的第二存储单元的数据为有效值的情况下,再将该读取的数据写入该执行序列对应的第一存储单元中。若判断为无效值,则不将该读取的数据写入该执行序列对应的第一存储单元中。
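“读取前先置无效值、只有读到有效值才写入第一存储单元”的保护逻辑可用如下Python草图示意。INVALID哨兵值与函数名均为示例假设：

```python
# 示意：有效性校验——仅当第二存储单元的数据有效时才写入第一存储单元

INVALID = object()  # 示例哨兵: 表示第二存储单元被置为无效值

def read_and_commit(cond_spr_read, first_units, stream_id):
    data = cond_spr_read()                # 读取第二存储单元
    if data is not INVALID:               # 判断读取的数据是否为有效值
        first_units[stream_id] = data     # 有效才写入对应第一存储单元
        return True
    return False                          # 无效则不写入

first = {}
assert read_and_commit(lambda: 5, first, 0) is True
assert first[0] == 5
```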
可理解的是，上述图6-图8的实施例描述仅仅是本申请的一个实施例，本申请的第一分支算子不限定为上述的Add task，第二分支算子不限定为Label2 task或Exit task。
相应的,本申请还提供了人工智能模型的处理装置以及人工智能模型的处理方法,下面将结合图10至图13进行说明。
如图10示出的本申请实施例提供的一种人工智能模型的处理装置的结构示意图,人工智能模型的处理装置16可以为图3实施例中人工智能模型的处理设备30的处理器300或图5实施例中人工智能模型的处理设备50的主处理器504。其中,人工智能模型的处理装置16可以包括创建单元160和下发单元162,其中:
创建单元160用于创建人工智能AI模型;该AI模型包括控制算子和计算算子;
具体地,该创建单元160可以相当于处理器300或主处理器504中执行的用于创建人工智能AI模型的程序代码。
下发单元162用于基于用户态接口API将该AI模型下发给人工智能处理单元;其中,该API包括第一API,该第一API用于下发该控制算子;该人工智能处理单元用于在训练或推理该AI模型的过程中,执行该控制算子和该计算算子。
具体地,该下发单元162可以相当于在处理器300或主处理器504中执行的用于基于用户态接口API将该AI模型下发给人工智能处理单元的程序代码。
在其中一种实现方式中,该控制算子包括分支判断算子,该分支判断算子用于判断执行第一分支算子或第二分支算子。
在其中一种实现方式中,该控制算子还包括循环算子,该循环算子用于循环执行该AI模型的第一计算算子;该API还包括第二API和第三API,该第二API用于创建标签;该第三API用于设置标签在该AI模型中的位置。
该创建单元160和下发单元162的具体实现方式可以参考上述图3至图8实施例中关于处理器300或主处理器504进行AI模型处理的过程，这里不再赘述。
如图11示出的本申请提供的另一实施例的人工智能模型的处理装置的结构示意图,人工智能模型的处理装置17可以为图3实施例中人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元。其中,人工智能模型的处理装置17可以包括获取单元170和执行算子单元172,其中:
获取单元170用于获取人工智能AI模型;该AI模型包括控制算子和计算算子,该AI模型为处理器基于用户态接口API下发的AI模型;该API包括第一API,该第一API用于下发该控制算子;
执行算子单元172用于在训练或推理该AI模型的过程中,执行该控制算子和该计算算子。
具体地,该获取单元170可以相当于在人工智能处理单元301或人工智能模型的处理设备40或人工智能模型的处理设备50的人工智能处理单元中的控制单元中执行的用于获取人工智能AI模型的程序代码。
该执行算子单元172可以相当于在人工智能处理单元301或人工智能模型的处理设备40或人工智能模型的处理设备50的人工智能处理单元中的控制单元和运算逻辑单元中共同或相互配合执行的用于在训练或推理该AI模型的过程中执行该控制算子的程序代码。
在其中一种实现方式中,执行算子单元172可以包括第一执行单元1720、存储处理单元1721和第二执行单元1722,其中:
第一执行单元1720用于在训练或推理该AI模型的过程中,通过该人工智能处理单元的运算逻辑单元执行该AI模型中的计算算子;也就是说,该第一执行单元1720可以相当于在运算逻辑单元上用于执行该AI模型中的计算算子的程序代码。
存储处理单元1721用于将执行该计算算子后的数据存储在该人工智能处理单元的存储单元中;具体地,该存储处理单元1721可以相当于在运算逻辑单元上用于将执行该计算算子后的数据存储在该人工智能处理单元的存储单元中的程序代码。
第二执行单元1722用于通过该人工智能处理单元的控制单元,基于该存储单元中的数据执行该控制算子。具体地,该第二执行单元1722可以相当于在控制单元上用于基于该存储单元中的数据执行该控制算子的程序代码。
在其中一种实现方式中,人工智能处理单元中的存储单元可以包括第一存储单元和第二存储单元;
那么存储处理单元1721可以具体用于:将执行该计算算子后的数据存储在该第二存储单元中;
第二执行单元1722可以具体包括:
第一读取单元,用于读取该第二存储单元中的数据;
第一写入单元,用于将该第二存储单元中的数据写入该第一存储单元;
读取执行单元,用于读取并基于该第一存储单元中的数据执行该控制算子。
在其中一种实现方式中,该读取执行单元可以具体包括:
第二读取单元,用于读取该第一存储单元中的数据;
判断单元,用于基于该第一存储单元中数据和该分支判断算子中的参数判断是否执行该第一分支算子;
判断处理单元,用于若该判断单元判断为是,则执行该第一分支算子;若该判断单元判断为否,则执行该第二分支算子。
在其中一种实现方式中,该控制算子还可以包括循环算子,该循环算子用于循环执行该AI模型的第一计算算子;该人工智能模型的处理装置17还可以包括第三执行单元174,用于在该判断处理单元执行该第一分支算子之后,执行该循环算子,以通过该运算逻辑单元循环执行该第一计算算子,直到该判断为否。具体地,该第三执行单元174可以相当于在控制单元上用于执行该循环算子的程序代码。
在其中一种实现方式中,该API还包括第二API和第三API,该第二API用于创建标签;该第三API用于设置标签在所述AI模型中的位置;该AI模型包括用于跳转的第一标签和第二标签;其中该第一标签放置在与该AI模型的第一计算算子相邻的上一个算子中,该第二标签放置在与该第二分支算子相邻的上一个算子中;该人工智能模型的处理装置17还可以包括第四执行单元176,用于在该读取执行单元读取并基于该第一存储单元中的数据执行该控制算子之前,通过该运算逻辑单元执行该第一计算算子;也就是说,该第四执行单元176可以相当于在运算逻辑单元上用于执行该第一计算算子的程序代码。
该第三执行单元174具体用于:在该判断处理单元执行该第一分支算子之后,执行该循环算子,跳转到该第一标签所在的位置,以通过该运算逻辑单元循环执行该第一计算算子;
若该判断单元判断为否,该判断处理单元具体用于,跳转到该第二标签所在的位置,以执行该第二分支算子。
在其中一种实现方式中,该人工智能模型的处理装置17还可以包括设置单元178,用于在该第一读取单元读取该第二存储单元中的数据之前,将该第二存储单元设置为无效值;具体地,该设置单元178可以相当于在控制单元上用于将该第二存储单元设置为无效值的程序代码。
该第一写入单元具体用于:在判断出读取的该第二存储单元的数据为有效值的情况下,将该第二存储单元的数据写入该第一存储单元中。
需要说明的是,该人工智能模型的处理装置17的具体实现方式可以参考上述图3至图8实施例中关于人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元进行AI模型处理的过程,这里不再赘述。
如图12示出的本申请实施例提供的一种人工智能模型的处理方法的流程示意图,应用于图3实施例中人工智能模型的处理设备30的处理器300或图5实施例中人工智能模型的处理设备50的主处理器504,可以包括以下步骤:
步骤S120:处理器(或称为主处理器)创建人工智能AI模型;该AI模型包括控制算子和计算算子;
步骤S122:基于用户态接口API将该AI模型下发给人工智能处理单元。
其中,该API包括第一API,该第一API用于下发该控制算子;该人工智能处理单元用于在训练或推理该AI模型的过程中,执行该控制算子和该计算算子。
本实施例该人工智能模型的处理方法的具体实现方式可以参考上述图3至图8实施例中关于处理器300或主处理器504进行AI模型处理的过程,这里不再赘述。
如图13示出的本申请提供的另一种实施例的人工智能模型的处理方法的流程示意图,应用于图3实施例中人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元,该人工智能处理单元可以包括控制单元、运算逻辑单元以及存储单元,该人工智能处理单元可以执行以下步骤:
步骤S130:读取人工智能AI模型;该AI模型包括控制算子和计算算子,该AI模型为处理器基于用户态接口API下发的AI模型;该API包括第一API,该第一API用于下发该控制算子;
步骤S132:执行该计算算子,将执行该计算算子后的数据存储在该存储单元中;基于该存储单元中的数据执行该控制算子。
其中,该人工智能处理单元在训练或推理该AI模型的过程中,可以通过该运算逻辑单元执行该AI模型中的计算算子,该运算逻辑单元将执行该计算算子后的数据存储在该存储单元中;然后该控制单元可以基于该存储单元中的数据执行该控制算子。
在一种可能的实现方式中,该存储单元可以包括第一存储单元和第二存储单元;那么该将执行该计算算子后的数据存储在该存储单元中可以包括:将执行该计算算子后的数据存储在该第二存储单元中;
该基于该存储单元中的数据执行该控制算子可以包括：读取该第二存储单元中的数据，将该第二存储单元中的数据写入该第一存储单元；读取并基于该第一存储单元中的数据执行该控制算子。
在一种可能的实现方式中,该第一存储单元可集成在控制单元中,也就是说可以在控制单元中增加该第一存储单元,该第一存储单元可以为该控制单元的专用寄存器。该第二存储单元可集成在运算逻辑单元中,也就是说可以在运算逻辑单元中增加该第二存储单元,该第二存储单元可以为该运算逻辑单元的专用寄存器。可以进一步实现控制单元快速高效地读取到运算逻辑单元执行完算子或任务后的数据,从而可根据当前算子的执行结果控制后面算子的执行。实现了整个AI模型都在控制器和运算逻辑单元内执行,无需将部分控制功能返回主处理器处理。
其中,人工智能处理单元中每个运算逻辑单元都可增加第二存储单元(自身专用的寄存器),使得每一个运算逻辑单元都可用于配合控制单元来执行控制算子,从而可再进一步提高人工智能处理单元模型推理或模型训练的性能。
在一种可能的实现方式中,AI模型对应有至少一个执行序列,每个该第一存储单元对应不同的执行序列。
本申请实施例中，处理器可以根据处理AI模型的数量，以及每个AI模型对应的执行序列的数量，实现定制化地设置控制单元中第一存储单元的数量。
在一种可能的实现方式中,该控制算子可以包括分支判断算子和循环算子,其中该分支判断算子用于判断执行第一分支算子或第二分支算子;
那么该读取并基于该第一存储单元中的数据执行该控制算子,可以包括:读取该第一存储单元中的数据;基于该第一存储单元中数据和该分支判断算子中的参数判断是否执行该第一分支算子;若判断为是,则执行该第一分支算子;若判断为否,则执行该第二分支算子。
在一种可能的实现方式中,该控制算子还可以包括循环算子,那么在该执行该第一分支算子之后,还可以包括:执行该循环算子,以通过该运算逻辑单元迭代执行该AI模型的计算算子,直到该判断为否。
在一种可能的实现方式中,该AI模型可以包括用于跳转的第一标签和第二标签;其中该第一标签放置在与该AI模型的第一计算算子相邻的上一个算子中,该第二标签放置在与该第二分支算子相邻的上一个算子中;那么在该读取并基于该第一存储单元中的数据执行该控制算子之前,还可以包括:通过该运算逻辑单元执行该第一计算算子;
该执行该循环算子,以通过该运算逻辑单元迭代执行该AI模型的计算算子,可以包括:执行该循环算子,跳转到该第一标签所在的位置,以通过该运算逻辑单元迭代执行该AI模型的该第一计算算子;该若判断为否,则执行该第二分支算子,包括:若判断为否,则跳转到该第二标签所在的位置,以执行该第二分支算子。
在一种可能的实现方式中,在该读取该第二存储单元中的数据之前,还可以包括:将该第二存储单元设置为无效值;那么该将该第二存储单元中的数据写入该第一存储单元,可以包括:在判断出读取的该第二存储单元的数据为有效值的情况下,将该第二存储单元的数据写入该第一存储单元中。
本实施例该人工智能模型的处理方法的具体实现方式可以参考上述图3至图8实施例中关于人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元进行AI模型处理的过程,这里不再赘述。
本申请实施例还提供一种计算机可读存储介质,其中,该计算机可读存储介质可存储有程序,该程序被本申请实施例的处理器执行时包括上述方法实施例中记载的任意一种人工智能模型的处理方法的部分或全部步骤。
例如,该程序被处理器执行时,可以创建人工智能AI模型;该AI模型包括控制算子;然后基于用户态接口API将该AI模型下发给人工智能处理单元。其中,该API包括用于下发该控制算子的API;该人工智能处理单元用于在训练或推理该AI模型的过程中,执行该控制算子。其具体实现方式可以参考上述图3至图8实施例中关于处理器300或主处理器504进行AI模型处理的过程,这里不再赘述。
又如，该程序被处理器执行时，可以获取或读取人工智能AI模型；该AI模型包括控制算子，该AI模型为处理器通过创建AI模型，基于用户态接口API下发的AI模型；该API包括用于下发该控制算子的API；然后在训练或推理该AI模型的过程中，执行该控制算子。其具体实现方式可以参考上述图3至图8实施例中关于人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元进行AI模型处理的过程，这里不再赘述。
本申请实施例还提供一种计算机程序,该计算机程序包括指令,当该计算机程序被多核处理器执行时,使得该本申请实施例的处理器可以执行任意一种人工智能模型的处理方法的部分或全部步骤。
在一些实施例中,所公开的方法可以实施为以机器可读格式被编码在计算机可读存储介质上的或者被编码在其它非瞬时性介质或者制品上的计算机程序指令。图14示意性地示出根据这里展示的至少一些实施例而布置的示例计算机程序或计算机程序产品的概念性局部视图,该示例计算机程序产品包括用于在计算设备上执行计算机进程的计算机程序。在一个实施例中,示例计算机程序产品1400是使用信号承载介质1401来提供的。该信号承载介质1401可以包括一个或多个程序指令1402,其当被一个或多个处理器运行时可以提供以上针对图3至图8实施例中关于处理器300或主处理器504、或人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元描述的功能或者部分功能。
在一些示例中,信号承载介质1401可以包含计算机可读介质1403,诸如但不限于,硬盘驱动器、紧密盘(CD)、数字视频光盘(DVD)、数字磁带、存储器、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等等。在一些实施方式中,信号承载介质1401可以包含计算机可记录介质1404,诸如但不限于,存储器、读/写(R/W)CD、R/W DVD、等等。在一些实施方式中,信号承载介质1401可以包含通信介质1405,诸如但不限于,数字和/或模拟通信介质(例如,光纤电缆、波导、有线通信链路、无线通信链路、等等)。因此,例如,信号承载介质1401可以由无线形式的通信介质1405(例如,遵守IEEE 802.11标准或者其它传输协议的无线通信介质)来传达。一个或多个程序指令1402可以是,例如,计算机可执行指令或者逻辑实施指令。在一些示例中,诸如针对图3至图8实施例中关于处理器300或主处理器504、或人工智能模型的处理设备30的人工智能处理单元301、或图4的人工智能模型的处理设备40、或图5实施例的人工智能模型的处理设备50的人工智能处理单元可以被配置为,响应于通过计算机可读介质1403、计算机可记录介质1404、和/或通信介质1405中的一个或多个传达到计算设备的程序指令1402,提供各种操作、功能、或者动作。应该理解,这里描述的布置仅仅是用于示例的目的。因而,本领域技术人员将理解,其它布置和其它元素(例如,机器、接口、功能、顺序、和功能组等等)能够被取而代之地使用,并且一些元素可以根据所期望的结果而一并省略。另外,所描述的元素中的许多是可以被实现为离散的或者分布式的组件的、或者以任何适当的组合和位置来结合其它组件实施的功能实体。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其它实施例的相关描述。
需要说明的是，对于前述的各方法实施例，为了简单描述，将其都表述为一系列的动作组合，但是本领域技术人员应该知悉，本申请并不受所描述的动作顺序的限制，因为依据本申请，某些步骤可能可以采用其它顺序或者同时进行。其次，本领域技术人员也应该知悉，说明书中所描述的实施例均属于优选实施例，所涉及的动作和模块并不一定是本申请所必须的。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个***,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
上述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以为个人计算机、服务器或者网络设备等，具体可以是计算机设备中的处理器)执行本申请各个实施例上述方法的全部或部分步骤。其中，前述的存储介质可包括：U盘、移动硬盘、磁碟、光盘、只读存储器(Read-Only Memory，缩写：ROM)或者随机存取存储器(Random Access Memory，缩写：RAM)等各种可以存储程序代码的介质。
以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。

Claims (23)

  1. 一种人工智能模型的处理方法,其特征在于,应用于人工智能处理单元,所述人工智能处理单元包括控制单元、运算逻辑单元以及存储单元;所述方法包括:
    获取人工智能AI模型;所述AI模型包括控制算子和计算算子,所述AI模型为处理器基于用户态接口API下发的AI模型;所述API包括第一API,所述第一API用于下发所述控制算子;
    执行所述计算算子,将执行所述计算算子后的数据存储在所述存储单元中;
    基于所述存储单元中的数据执行所述控制算子。
  2. 如权利要求1所述的方法,其特征在于,所述存储单元包括第一存储单元和第二存储单元;所述将执行所述计算算子后的数据存储在所述存储单元中包括:将执行所述计算算子后的数据存储在所述第二存储单元中;
    所述基于所述存储单元中的数据执行所述控制算子包括:通过所述控制单元读取所述第二存储单元中的数据,将所述第二存储单元中的数据写入所述第一存储单元;读取并基于所述第一存储单元中的数据执行所述控制算子。
  3. 如权利要求2所述的方法,其特征在于,所述控制算子包括分支判断算子和循环算子,其中所述分支判断算子用于判断执行第一分支算子或第二分支算子;
    所述读取并基于所述第一存储单元中的数据执行所述控制算子,包括:
    读取所述第一存储单元中的数据;
    基于所述第一存储单元中数据和所述分支判断算子中的参数判断是否执行所述第一分支算子;
    若判断为是,则执行所述第一分支算子;若判断为否,则执行所述第二分支算子。
  4. 如权利要求3所述的方法,其特征在于,所述控制算子还包括循环算子,所述循环算子用于循环执行所述AI模型的第一计算算子;所述执行所述第一分支算子之后,还包括:执行所述循环算子,以通过所述运算逻辑单元循环执行所述第一计算算子,直到所述判断为否。
  5. 如权利要求4所述的方法,其特征在于,所述API还包括第二API和第三API,所述第二API用于创建标签;所述第三API用于设置标签在所述AI模型中的位置;所述AI模型还包括用于跳转的第一标签和第二标签;其中所述第一标签放置在与所述第一计算算子相邻的上一个算子中,所述第二标签放置在与所述第二分支算子相邻的上一个算子中;
    所述读取并基于所述第一存储单元中的数据执行所述控制算子之前,还包括:通过所述运算逻辑单元执行所述第一计算算子;
    所述执行所述循环算子，以通过所述运算逻辑单元循环执行所述第一计算算子，包括：执行所述循环算子，跳转到所述第一标签所在的位置，以通过所述运算逻辑单元循环执行所述第一计算算子；
    所述执行所述第二分支算子包括:跳转到所述第二标签所在的位置,以执行所述第二分支算子。
  6. 如权利要求2-5任一项所述的方法,其特征在于,所述读取所述第二存储单元中的数据之前,还包括:将所述第二存储单元设置为无效值;
    所述将所述第二存储单元中的数据写入所述第一存储单元,包括:在判断出读取的所述第二存储单元的数据为有效值的情况下,将所述第二存储单元的数据写入所述第一存储单元中。
  7. 一种人工智能模型的处理方法,其特征在于,所述方法包括:
    创建人工智能AI模型;所述AI模型包括控制算子和计算算子;
    基于用户态接口API将所述AI模型下发给人工智能处理单元;
    其中,所述API包括第一API,所述第一API用于下发所述控制算子;所述人工智能处理单元用于在训练或推理所述AI模型的过程中,执行所述控制算子和所述计算算子。
  8. 如权利要求7所述的方法,其特征在于,所述控制算子包括分支判断算子,所述分支判断算子用于判断执行第一分支算子或第二分支算子。
  9. 如权利要求8所述的方法,其特征在于,所述控制算子还包括循环算子,所述循环算子用于循环执行所述AI模型的第一计算算子;所述API还包括第二API和第三API,所述第二API用于创建标签;所述第三API用于设置标签在所述AI模型中的位置。
  10. 一种人工智能模型的处理装置,其特征在于,所述装置为人工智能处理单元,包括:
    获取单元,用于获取人工智能AI模型;所述AI模型包括控制算子和计算算子,所述AI模型为处理器基于用户态接口API下发的AI模型;所述API包括第一API,所述第一API用于下发所述控制算子;
    第一执行单元,用于执行所述计算算子;
    存储处理单元,用于将执行所述计算算子后的数据存储在所述人工智能处理单元的存储单元中;
    第二执行单元,用于基于所述存储单元中的数据执行所述控制算子。
  11. 如权利要求10所述的处理装置,其特征在于,所述存储单元包括第一存储单元和第二存储单元;
    所述存储处理单元具体用于:将执行所述计算算子后的数据存储在所述第二存储单元中;
    所述第二执行单元包括:
    第一读取单元,用于读取所述第二存储单元中的数据;
    第一写入单元,用于将所述第二存储单元中的数据写入所述第一存储单元;
    读取执行单元,用于读取并基于所述第一存储单元中的数据执行所述控制算子。
  12. 如权利要求11所述的处理装置,其特征在于,所述控制算子包括分支判断算子和循环算子,其中所述分支判断算子用于判断执行第一分支算子或第二分支算子;所述读取执行单元包括:
    第二读取单元,用于读取所述第一存储单元中的数据;
    判断单元,用于基于所述第一存储单元中数据和所述分支判断算子中的参数判断是否执行所述第一分支算子;
    判断处理单元,用于若所述判断单元判断为是,则执行所述第一分支算子;若所述判断单元判断为否,则执行所述第二分支算子。
  13. 如权利要求12所述的处理装置,其特征在于,所述控制算子还包括循环算子,所述循环算子用于循环执行所述AI模型的第一计算算子;所述处理装置还包括:
    第三执行单元,用于在所述判断处理单元执行所述第一分支算子之后,执行所述循环算子,以通过所述运算逻辑单元循环执行所述第一计算算子,直到所述判断为否。
  14. 如权利要求13所述的处理装置,其特征在于,所述API还包括第二API和第三API,所述第二API用于创建标签;所述第三API用于设置标签在所述AI模型中的位置;所述AI模型还包括用于跳转的第一标签和第二标签;其中所述第一标签放置在与所述AI模型的第一计算算子相邻的上一个算子中,所述第二标签放置在与所述第二分支算子相邻的上一个算子中;所述处理装置还包括:
    第四执行单元,用于在所述读取执行单元读取并基于所述第一存储单元中的数据执行所述控制算子之前,通过所述运算逻辑单元执行所述第一计算算子;
    所述第三执行单元具体用于:在所述判断处理单元执行所述第一分支算子之后,执行所述循环算子,跳转到所述第一标签所在的位置,以通过所述运算逻辑单元循环执行所述第一计算算子;
    若所述判断单元判断为否,所述判断处理单元具体用于,跳转到所述第二标签所在的位置,以执行所述第二分支算子。
  15. 如权利要求11-14任一项所述的处理装置,其特征在于,所述处理装置还包括:
    设置单元,用于在所述第一读取单元读取所述第二存储单元中的数据之前,将所述第二存储单元设置为无效值;
    所述第一写入单元具体用于:在判断出读取的所述第二存储单元的数据为有效值的情况下,将所述第二存储单元的数据写入所述第一存储单元中。
  16. 一种人工智能模型的处理装置,其特征在于,包括:
    创建单元,用于创建人工智能AI模型;所述AI模型包括控制算子和计算算子;
    下发单元,用于基于用户态接口API将所述AI模型下发给人工智能处理单元;
    其中,所述API包括第一API,所述第一API用于下发所述控制算子;所述人工智能处理单元用于在训练或推理所述AI模型的过程中,执行所述控制算子和所述计算算子。
  17. 如权利要求16所述的处理装置,其特征在于,所述控制算子包括分支判断算子,所述分支判断算子用于判断执行第一分支算子或第二分支算子。
  18. 如权利要求17所述的处理装置,其特征在于,所述控制算子还包括循环算子,所述循环算子用于循环执行所述AI模型的第一计算算子;所述API还包括第二API和第三API,所述第二API用于创建标签;所述第三API用于设置标签在所述AI模型中的位置。
  19. 一种人工智能模型的处理设备,其特征在于,包括所述人工智能处理单元和存储器;其中,所述存储器用于存储程序代码,所述人工智能处理单元调用所述存储器存储的程序代码使得所述人工智能模型的处理设备执行如权利要求1-6任一项所述的方法。
  20. 一种人工智能模型的处理设备,其特征在于,包括处理器和存储器;其中,所述存储器用于存储程序代码,所述处理器调用所述存储器存储的程序代码使得所述人工智能模型的处理设备执行如权利要求7-9任一项所述的方法。
  21. 一种人工智能模型的处理设备,其特征在于,包括处理器、人工智能处理单元和存储器;其中,所述存储器用于存储程序代码,所述处理器与所述人工智能处理单元耦合;所述人工智能处理单元调用所述存储器存储的程序代码使得所述人工智能模型的处理设备执行如权利要求1-6任一项所述的方法;所述处理器调用所述存储器存储的程序代码使得所述人工智能模型的处理设备执行如权利要求7-9任一项所述的方法。
  22. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时实现上述权利要求1-9任一项所述的方法。
  23. 一种计算机程序,其特征在于,所述计算机程序包括指令,当所述计算机程序被处理器执行时实现上述权利要求1-9任一项所述的方法。