WO2023116312A1 - Data processing method and apparatus, computer device, and storage medium

Data processing method and apparatus, computer device, and storage medium

Info

Publication number
WO2023116312A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
processing
processing unit
data
instruction
Application number
PCT/CN2022/133413
Other languages
English (en)
French (fr)
Inventor
孙炜
祝叶华
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Publication of WO2023116312A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to a data processing method, apparatus, computer device, and storage medium.
  • Embodiments of the present application provide a data processing method, apparatus, computer device, and storage medium, which can improve data processing efficiency.
  • The technical solution is as follows:
  • An embodiment of the present application provides a data processing method, the method comprising: processing input data based on a first operator of a neural network through a first processing unit to obtain a processing result, the first processing unit matching the first operator; sending the processing result to a second processing unit based on a direct connection path between the first processing unit and the second processing unit, the second processing unit matching a second operator of the neural network; and processing the processing result based on the second operator through the second processing unit.
  • An embodiment of the present application provides a data processing apparatus, the apparatus including:
  • a first processing module configured to process the input data based on the first operator of the neural network through the first processing unit to obtain a processing result, the first processing unit matching the first operator;
  • a data sending module configured to send the processing result to the second processing unit based on the direct connection path between the first processing unit and the second processing unit, the second processing unit matching the second operator of the neural network; and
  • a second processing module configured to process the processing result based on the second operator through the second processing unit.
  • An embodiment of the present application provides a computer device including a processor and a memory; the memory stores at least one program, and the at least one program is executed by the processor to implement the data processing method described in the above aspect.
  • An embodiment of the present application provides a computer-readable storage medium storing at least one program, the at least one program being executed by a processor to implement the data processing method described in the above aspect.
  • An embodiment of the present application provides a computer program product containing computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the data processing method provided in the above aspect.
  • An embodiment of the present application provides a chip including a programmable logic circuit and/or program instructions; when the chip runs on a terminal, the chip is used to implement the data processing method described in the above aspect.
  • FIG. 1 shows a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application.
  • FIG. 2 shows a flowchart of a data processing method provided by an exemplary embodiment of the present application.
  • FIG. 3 shows a flowchart of another data processing method provided by an exemplary embodiment of the present application.
  • FIG. 4 shows a schematic diagram of a connection manner between processing units provided by an exemplary embodiment of the present application.
  • FIG. 5 shows a schematic structural diagram of a processing unit provided by an exemplary embodiment of the present application.
  • FIG. 6 shows a schematic diagram of a data processing process provided by an exemplary embodiment of the present application.
  • FIG. 7 shows a flowchart of another data processing process provided by an exemplary embodiment of the present application.
  • FIG. 8 shows a structural block diagram of a data processing apparatus provided by an exemplary embodiment of the present application.
  • FIG. 9 shows a structural block diagram of a terminal provided by an exemplary embodiment of the present application.
  • FIG. 10 shows a structural block diagram of a server provided by an exemplary embodiment of the present application.
  • "At least one" mentioned herein means one or more, and "multiple" means two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, A and/or B may indicate: A exists alone, A and B exist simultaneously, or B exists alone.
  • The character "/" generally indicates that the associated objects are in an "or" relationship.
  • The embodiment of the present application provides a data processing method whose execution body is a computer device 100.
  • The computer device 100 may be a terminal, such as a mobile phone, a desktop computer, a notebook computer, a tablet computer, a smart TV, a smart speaker, a vehicle-mounted terminal, an intelligent robot, or another type of terminal.
  • Alternatively, the computer device 100 is a server, and the server may be a single server, a server cluster composed of several servers, or a cloud computing server center.
  • The computer device 100 processes input data based on the operators in a neural network through the method provided in this application, which can improve the efficiency of data processing.
  • Optionally, a chip, for example an artificial intelligence chip, is configured in the computer device, and the computer device can execute the data processing method in the embodiment of the present application through the chip.
  • FIG. 1 is a schematic diagram of a computer device 100 provided by an embodiment of the present application.
  • The computer device 100 includes at least two processing units, such as a first processing unit 101 and a second processing unit 102, where the operators matched by the first processing unit 101 and the second processing unit 102 are different.
  • Optionally, the operator matched by the first processing unit 101 is a linear operator with high calculation density, for example a convolution operator or a pooling operator, and the operator matched by the second processing unit 102 is a non-linear operator, for example an activation function.
  • The computer device 100 uses a processing unit to execute the operators matching that processing unit, so as to ensure the execution efficiency of the operators.
  • When the multiple operators of a neural network match different processing units, processing the input data of the neural network requires the processing units matched by those operators to cooperate.
  • The first processing unit 101 processes the input data based on the first operator in the neural network to obtain a processing result, and then sends the processing result to the second processing unit 102 based on the direct connection path between the first processing unit 101 and the second processing unit 102.
  • The second processing unit 102 continues to process the processing result based on the second operator in the neural network. That is to say, each processing unit executes the operators in the neural network that match itself, and sends the obtained processing results to other processing units through the direct connection paths with those processing units.
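  • As an illustration only (not part of the patent text), the following minimal Python sketch models this cooperation; the class ProcessingUnit and its connect/send methods are hypothetical names, and the two operators merely stand in for a convolution and an activation:

```python
# A minimal sketch of the scheme in FIG. 1 (illustrative only; the
# class ProcessingUnit and its connect/send methods are hypothetical
# names, not taken from the patent text). Two processing units each
# execute the operators that match them and hand the result to each
# other over a direct connection path, with no control unit involved.

class ProcessingUnit:
    def __init__(self, name):
        self.name = name
        self.memory = {}   # the unit's own memory
        self.peer = None   # the directly connected unit

    def connect(self, other):
        # model the direct connection path between the two units
        self.peer = other
        other.peer = self

    def run_operator(self, operator, data):
        # execute an operator that matches this unit
        return operator(data)

    def send(self, result):
        # store the result directly in the peer's memory over the
        # direct path (no bus, no common storage unit)
        self.peer.memory["input"] = result

# first operator: a linear, compute-dense operator (e.g. a convolution)
conv = lambda x: [2 * v for v in x]
# second operator: a non-linear operator (e.g. a ReLU activation)
relu = lambda x: [max(0, v) for v in x]

npu, dsp = ProcessingUnit("NPU"), ProcessingUnit("DSP")
npu.connect(dsp)

result = npu.run_operator(conv, [-1, 2, -3])          # step 201
npu.send(result)                                      # step 202
output = dsp.run_operator(relu, dsp.memory["input"])  # step 203
print(output)                                         # [0, 4, 0]
```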
  • The data processing method provided in the embodiment of the present application can be applied to image processing.
  • The computer device acquires the input image of an image processing network and processes the input image based on the first operator in the image processing network through the first image processing unit to obtain a processing result; based on the direct connection path between the first image processing unit and the second image processing unit, it sends the processing result to the second image processing unit.
  • The second image processing unit continues to process the processing result based on the second operator in the image processing network.
  • The first image processing unit matches the first operator, and the second image processing unit matches the second operator.
  • The data processing method provided in the embodiment of the present application can also be applied to audio processing.
  • The computer device obtains the input audio of an audio processing network and processes the input audio based on the first operator in the audio processing network through the first audio processing unit to obtain a processing result; based on the direct connection path between the first audio processing unit and the second audio processing unit, it sends the processing result to the second audio processing unit.
  • The second audio processing unit continues to process the processing result based on the second operator in the audio processing network.
  • The first audio processing unit matches the first operator, and the second audio processing unit matches the second operator.
  • The data processing method provided in the embodiment of the present application can likewise be applied to video processing.
  • The computer device obtains the input video of a video processing network and processes the input video based on the first operator in the video processing network through the first video processing unit to obtain a processing result; based on the direct connection path between the first video processing unit and the second video processing unit, it sends the processing result to the second video processing unit.
  • The second video processing unit continues to process the processing result based on the second operator in the video processing network.
  • The first video processing unit matches the first operator, and the second video processing unit matches the second operator.
  • The method provided by the embodiment of the present application can also be applied to other data processing scenarios, for example, processing multimedia data through a multimedia processing network or processing text data through a text processing network; this is not limited in the embodiment of the present application.
  • FIG. 2 shows a flowchart of a data processing method provided by an exemplary embodiment of the present application.
  • The execution body is a computer device, and the method includes the following steps.
  • Step 201: The first processing unit processes the input data based on the first operator of the neural network to obtain a processing result, where the first processing unit matches the first operator.
  • The neural network is any type of data processing network, for example an image processing network, an audio processing network, a text processing network, or a multimedia processing network.
  • The neural network includes multiple operators.
  • Optionally, the operators include convolution operators, activation operators, pooling operators, and normalization operators.
  • Any type of data is input into the corresponding type of data processing network, and the computer device processes the data based on the multiple operators in the data processing network, which realizes the effect corresponding to the function of the data processing network.
  • For example, if the function of an image processing network is to denoise images, then after the input image is processed based on the multiple operators in the image processing network, the noise in the image is removed.
  • The first operator is the operator ranked first in the neural network. Since different operators have different computing characteristics and different processing units have different data processing methods, different operators may match different processing units. Although processing units other than the matching one can also execute an operator, they execute it more slowly. Therefore, in the embodiment of the present application, each operator in the neural network is executed by the processing unit matched with that operator, so as to ensure the efficiency of data processing based on the neural network.
  • The computer device includes multiple processing units, where the operators matched by the first processing unit and the second processing unit are different.
  • The first processing unit and the second processing unit are arbitrary processing engines. For example, the first processing unit is an NPU (Neural-network Processing Unit) and the second processing unit is a DSP (Digital Signal Processor); or the first processing unit is a TPU (Tensor Processing Unit) and the second processing unit is a GPU (Graphics Processing Unit); or the first processing unit is an NPU and the second processing unit is a TPU.
  • The first processing unit and the second processing unit can be any data processing acceleration engines, which is not limited in this embodiment of the present application.
  • Step 202: Based on the direct connection path between the first processing unit and the second processing unit, the processing result is sent to the second processing unit, where the second processing unit matches the second operator of the neural network.
  • A direct connection path refers to a path that directly connects two processing units, with no other units between the two processing units.
  • Each pair of the processing units matched with the multiple operators in the neural network has a direct connection path between them, and any two processing units can directly perform data interaction based on the direct connection path.
  • Step 203: The second processing unit processes the processing result based on the second operator.
  • The second processing unit receives the processing result sent by the first processing unit and then continues to process it based on the second operator. It should be noted that if the second operator is the last operator in the neural network, the second processing unit obtains the output data of the neural network after processing the processing result based on the second operator. If the second operator is not the last operator in the neural network, the second processing unit processes the processing result based on the second operator to obtain an updated processing result, and subsequently the second processing unit or other processing units execute the remaining operators in the neural network until the execution of the multiple operators in the neural network is completed.
  • The data processing solution provided by the embodiment of the present application adds a direct connection path between the processing units, so that the processing units can perform data interaction directly through the direct connection path, which greatly improves the interaction efficiency between the processing units.
  • After a processing unit performs data processing based on its matching operator, it directly sends the processing result to another processing unit, which can directly obtain the processing result and continue to process it.
  • This solution dispenses with the control unit, so that the processing units can cooperate directly and without obstruction, thereby improving the efficiency of data processing through the operators in the neural network.
  • Optionally, the first processing unit includes a first processor, and processing the input data based on the first operator of the neural network to obtain a processing result includes: executing a waiting instruction through the first processor to wait for a data processing instruction, and, in response to receiving the data processing instruction, processing the input data based on the first operator to obtain the processing result.
  • Optionally, the second processing unit includes a second processor and a second memory, and sending the processing result to the second processing unit based on the direct connection path between the first processing unit and the second processing unit includes: storing the processing result in the second memory based on the direct connection path, and sending a data processing instruction to the second processor.
  • Optionally, the method also includes: executing a waiting instruction through the second processor to wait for the data processing instruction, and, in response to receiving the data processing instruction, processing the processing result based on the second operator.
  • Optionally, the method also includes: determining the matching processing unit of each operator among the multiple operators; and, for each processing unit, according to the order in which the multiple operators matched by the processing unit are arranged in the neural network, storing the multiple operators in the memory of the processing unit, inserting a waiting instruction before at least one operator in the memory, and inserting a data sending instruction after at least one operator.
  • The waiting instruction is used to instruct the processing unit to stop executing data processing operations until a data processing instruction is received, whereupon data processing operations are resumed.
  • The data sending instruction is used to instruct the current processing unit, when the processing based on the operator is completed, to send the processing result and a data processing instruction to another processing unit.
  • Optionally, inserting a waiting instruction before at least one operator in the memory and inserting a data sending instruction after at least one operator includes: in a case where there is an association relationship between at least two adjacent operators in the memory, inserting a waiting instruction before the first of the at least two operators and inserting a data sending instruction after the last of the at least two operators (see the sketch after this list).
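  • As a sketch of how such an instruction layout could be produced (illustrative only; WAIT, SEND, and build_instruction_streams are hypothetical names), a run of consecutive operators matching the same unit gets one waiting instruction before its first operator and one data sending instruction after its last operator:

```python
# A sketch of the instruction layout described above (illustrative
# only; WAIT, SEND and build_instruction_streams are hypothetical
# names). A run of consecutive operators matching the same unit gets
# one waiting instruction before its first operator and one data
# sending instruction after its last operator.
from itertools import groupby

WAIT, SEND = "WAIT", "SEND"

def build_instruction_streams(network):
    """network: list of (operator_name, matching_unit) in network order."""
    streams = {}  # unit -> the instruction list stored in its memory
    for unit, run in groupby(network, key=lambda op: op[1]):
        stream = streams.setdefault(unit, [])
        stream.append(WAIT)                     # wait before the first operator
        stream.extend(name for name, _ in run)  # associated operators run back to back
        stream.append(SEND)                     # send after the last operator
    return streams

# the operator-to-unit matching of FIG. 7: op1..op3 all match unit u2
network = [("op0", "u1"), ("op1", "u2"), ("op2", "u2"),
           ("op3", "u2"), ("op4", "u1"), ("op5", "u2")]
for unit, stream in build_instruction_streams(network).items():
    print(unit, stream)
# u1 ['WAIT', 'op0', 'SEND', 'WAIT', 'op4', 'SEND']
# u2 ['WAIT', 'op1', 'op2', 'op3', 'SEND', 'WAIT', 'op5', 'SEND']
```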
  • Optionally, the method further includes: processing the processing result through the first processing unit based on a third operator to obtain an updated processing result, where the third operator matches the first processing unit.
  • Optionally, before the processing result is sent to the second processing unit based on the direct connection path between the first processing unit and the second processing unit, the method further includes: obtaining the associated operator of the first operator and determining the associated operator as the second operator, where the data processing of the associated operator depends on the processing result of the first operator.
  • FIG. 3 shows a flowchart of a data processing method provided by an exemplary embodiment of the present application. Referring to FIG. 3, the method includes the following steps.
  • Step 301: The input data of the neural network is acquired, where the neural network includes multiple operators.
  • Optionally, the computer device obtains the data stored by the CPU (Central Processing Unit) in a target storage location and determines the data as the input data of the neural network.
  • The target storage location is used to store the input data of the neural network.
  • Step 302: The first processing unit processes the input data based on the first operator to obtain a processing result, where the first processing unit matches the first operator among the multiple operators.
  • The CPU sends a data processing instruction to the first processing unit to instruct it to start executing data processing operations.
  • The first processing unit processes the input data based on the first operator to obtain a processing result.
  • Optionally, the first processing unit includes a first processor, and the computer device processing the input data based on the first operator through the first processing unit to obtain a processing result includes: the first processor executes a waiting instruction to wait for a data processing instruction; in response to receiving the data processing instruction, the first processor processes the input data based on the first operator to obtain the processing result.
  • The waiting instruction is used to instruct the processor to stop executing data processing operations until a data processing instruction is received, whereupon data processing operations are resumed.
  • Optionally, the first processor in the first processing unit is an ALU (Arithmetic and Logic Unit), and the first memory is referred to simply as a memory.
  • The waiting instruction is sent by the CPU to the first processing unit, or the waiting instruction is read by the first processing unit from a memory, where the memory is any memory, for example a public memory or the memory of the first processing unit; this is not limited in this embodiment of the present application.
  • Since the waiting instruction is used to instruct the processor to stop executing data processing operations until a data processing instruction is received, having the first processor execute the waiting instruction allows the data processing instruction to control the timing at which the first processor executes data processing operations.
  • Optionally, the first processing unit further includes a first memory, and the first memory stores the instructions to be executed by the first processing unit.
  • The computer device processing the input data through the first processing unit to obtain the processing result includes: the computer device reads, through the first processor, the waiting instruction before the first operator from the first memory; the first processor then executes the waiting instruction, that is, it stops executing data processing operations; and when the data processing instruction is received, it reads the first operator from the first memory and processes the input data based on the first operator to obtain the processing result.
  • The computer device, through the first processor, reads and executes the instructions sequentially in the order in which they are stored in the first memory, each time reading the first-ranked instruction among the unexecuted instructions. Since there is a waiting instruction before the first operator, the first processor first executes the waiting instruction; when it receives a data processing instruction, it reads the next instruction after the waiting instruction in the first memory, that is, the first operator, and executes it.
  • In this way, the first processing unit can execute the first operator only after receiving the data processing instruction, thereby controlling the timing at which the first operator is executed.
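  • The fetch/execute behavior described above can be sketched as follows (a minimal model under stated assumptions; all names are hypothetical, and a queue stands in for the arrival of data processing instructions):

```python
# A minimal model of the fetch/execute loop described above (all names
# hypothetical; a queue stands in for the arrival of data processing
# instructions). The processor walks its memory in order; WAIT stalls
# it until a data processing instruction (carrying data) arrives, and
# SEND hands the current result onward.
import queue

WAIT, SEND = "WAIT", "SEND"

def run_unit(memory, inbox, outbox, operators):
    """memory: ordered instruction list; inbox/outbox: queues standing in
    for data processing instructions; operators: name -> callable."""
    data = None
    for instruction in memory:     # read instructions in stored order
        if instruction == WAIT:
            data = inbox.get()     # stop until a data processing instruction arrives
        elif instruction == SEND:
            outbox.put(data)       # notify the peer unit with the result
        else:
            data = operators[instruction](data)  # execute the operator

inbox, outbox = queue.Queue(), queue.Queue()
inbox.put([1, -2, 3])              # the CPU's initial data processing instruction
run_unit([WAIT, "op0", SEND], inbox, outbox,
         {"op0": lambda x: [v + 1 for v in x]})
print(outbox.get())                # [2, -1, 4]
```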
  • Optionally, before the computer device obtains the input data of the neural network, it first divides the multiple operators in the neural network among the matching processing units and determines the execution order of the operators assigned to each processing unit. That is, the computer device determines the matching processing unit of each operator among the multiple operators; for each processing unit, according to the order in which the multiple operators matching the processing unit are arranged in the neural network, it stores the multiple operators in the memory of the processing unit, inserts a waiting instruction before at least one operator in the memory, and inserts a data sending instruction after at least one operator.
  • The waiting instruction is used to instruct the processing unit to stop executing data processing operations until a data processing instruction is received, whereupon data processing operations are resumed.
  • The data sending instruction is used to instruct the current processing unit, when the processing based on the operator is completed, to send the processing result and a data processing instruction to another processing unit.
  • Because the waiting instruction inserted before an operator indicates that the data processing operation is performed only when a data processing instruction is received, because the data sending instruction inserted after an operator instructs the current processing unit, when the processing based on the operator is completed, to send the processing result and a data processing instruction to another processing unit so that the other processing unit processes the processing result in response to the data processing instruction, and because each processing unit executes the instructions sequentially in the order of the instructions in its memory, there are no resource occupation conflicts: a processing unit never executes multiple operators at the same time, which ensures that a processing unit executes only one operator at a time. There are also no data conflicts: the multiple operators of the neural network are processed sequentially, and a later operator can depend on the processing result of the previous operator.
  • The waiting instruction and the data sending instruction are in effect handshake signals between processing units, so that, without the participation of a control unit, multiple processing units can avoid data conflicts and resource conflicts when they jointly process data based on the neural network.
  • Optionally, the computer device uses a compiler to determine the matching processing unit of each operator among the multiple operators and, for each processing unit, according to the order in which the multiple operators matching the processing unit are arranged in the neural network, to store the multiple operators in the memory of the processing unit, to insert a waiting instruction before at least one operator in the memory, and to insert a data sending instruction after at least one operator.
  • A compiler is a program that compiles a high-level language into a low-level language that can be executed by a computer device.
  • Here the compiler is used to divide the operators in the neural network among the processing units and to determine the execution order of the operators assigned to each processing unit, so that the division of the operators and their execution order are fixed at the compilation stage of the neural network.
  • Because the operators in the neural network may match different processing units, the operators in the neural network are divided among the matching processing units before the input data is processed through the neural network, and the execution order of the multiple operators assigned to each processing unit is determined. In this way, when the input data of the neural network is later obtained, each processing unit can perform data processing directly based on the multiple operators assigned to it and the determined execution order, instead of the matching processing unit of each operator being determined on the fly during data processing based on the neural network, which improves data processing efficiency.
  • Because the waiting instruction before an operator indicates that the data processing operation is performed when a data processing instruction is received, and the data sending instruction after an operator instructs the current processing unit to send the processing result and a data processing instruction to another processing unit so that the other processing unit processes the processing result in response to the data processing instruction, multiple processing units can jointly process data based on the neural network without the participation of a control unit while avoiding data conflicts and resource conflicts, realizing a self-synchronization mechanism.
  • Optionally, the computer device inserting a waiting instruction before at least one operator in the memory and inserting a data sending instruction after the at least one operator includes: in a case where there is an association relationship between at least two adjacent operators in the memory, the computer device inserts a waiting instruction before the first of the at least two operators and inserts a data sending instruction after the last of the at least two operators.
  • Step 303: The associated operator of the first operator is obtained, and the associated operator is determined as the second operator, where the data processing of the associated operator depends on the processing result of the first operator.
  • That the data processing of an associated operator depends on the processing result of the first operator means that the output of the first operator is the input of the associated operator.
  • Optionally, the computer device obtaining the associated operator of the first operator and determining the associated operator as the second operator includes: the computer device obtains an operator association relationship, where the operator association relationship indicates the associated operators of the operators included in the neural network, and the data processing of the associated operator of any operator depends on the processing result of that operator; the computer device then queries the associated operator of the first operator from the operator association relationship and determines the queried associated operator as the second operator.
  • Optionally, the operator association relationship is the arrangement order of the multiple operators in the neural network, where the operator after each operator depends on the processing result of that operator and is therefore its associated operator.
  • The computer device queries the operator after the first operator from this order and determines that operator as the associated operator of the first operator.
  • Because the associated operator of the first operator is determined as the second operator and the processing result of the first operator is subsequently sent to the processing unit matched by the second operator, it is ensured that the execution of the second operator can be based on the processing result of the first operator, guaranteeing the correct execution of the operators in the neural network.
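  • A minimal sketch of this query, assuming the operator association relationship is simply the arrangement order of the operators (the function and variable names are hypothetical):

```python
# A sketch of the associated-operator query (illustrative only; the
# function and variable names are hypothetical). Here the operator
# association relationship is taken to be the arrangement order of the
# operators in the neural network, so the associated operator of an
# operator is the one right after it.

def associated_operator(order, op):
    """order: operator names in network order. Returns the operator whose
    data processing depends on op's result, or None if op is the last."""
    i = order.index(op)
    return order[i + 1] if i + 1 < len(order) else None

order = ["op0", "op1", "op2"]
matching_unit = {"op0": "u1", "op1": "u2", "op2": "u1"}

second_op = associated_operator(order, "op0")     # "op1"
print(second_op, "->", matching_unit[second_op])  # op1 -> u2, the send target
```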
  • Step 304: The processing result is sent to the second processing unit based on the direct connection path between the first processing unit and the second processing unit, where the second processing unit matches the second operator among the multiple operators.
  • Optionally, the second processing unit includes a second processor and a second memory, and the computer device sending the processing result to the second processing unit includes: the computer device, through the first processing unit and based on the direct connection path with the second processing unit, stores the processing result in the second memory and sends a data processing instruction to the second processor.
  • Optionally, the data processing instruction carries the storage address of the processing result.
  • The first processing unit does not need to store the processing result in the common storage unit over the bus, with the second processing unit then reading the processing result from the common storage unit over the bus. Instead, based on the direct connection path with the second processing unit, the first processing unit stores the processing result directly in the memory of the second processing unit, so that the processor of the second processing unit can read the processing result directly from its own memory and process it, which greatly shortens the data transmission link and thereby improves the efficiency of data interaction.
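  • Step 304 might be sketched as follows (illustrative only; Memory, send_over_direct_path, and on_data_processing are hypothetical names), with the data processing instruction carrying the storage address of the result in the second memory:

```python
# A sketch of step 304 (illustrative only; Memory,
# send_over_direct_path and on_data_processing are hypothetical names).
# The data sending instruction writes the result straight into the
# second unit's memory over the direct connection path and hands the
# second processor a data processing instruction carrying the storage
# address of the result.

class Memory:
    def __init__(self, size):
        self.cells = [None] * size

def send_over_direct_path(result, second_memory, addr, instruction_queue):
    second_memory.cells[addr] = result  # direct store, no bus access
    instruction_queue.append({"kind": "data_processing", "addr": addr})

def on_data_processing(second_memory, instruction):
    # the second processor reads the result from its own memory
    return second_memory.cells[instruction["addr"]]

mem2, iq = Memory(16), []
send_over_direct_path([0, 4, 0], mem2, addr=3, instruction_queue=iq)
print(on_data_processing(mem2, iq.pop(0)))  # [0, 4, 0]
```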
  • Optionally, the first processing unit includes a first processor and a first memory.
  • The computer device, through the first processing unit and based on the direct connection path with the second processing unit, storing the processing result in the second memory and sending the data processing instruction to the second processor includes: the computer device reads, through the first processor, the data sending instruction after the first operator from the first memory; in response to the data sending instruction, it stores the processing result in the second memory based on the direct connection path with the second processing unit and sends the data processing instruction to the second processor, where the data sending instruction is used to instruct the current processing unit, when the processing based on the operator is completed, to send the processing result and a data processing instruction to another processing unit.
  • Because the data sending instruction instructs the current processing unit to send the processing result and the data processing instruction to another processing unit, storing the data sending instruction after the operator in the memory of the processing unit allows the processing unit to execute the data sending instruction after executing the operator, thereby storing the processing result in the memory of the second processing unit and sending the data processing instruction to the second processor, so that the second processing unit can read the processing result from its own memory based on the data processing instruction and continue to process it.
  • Optionally, after the computer device processes the input data based on the first operator through the first processing unit and obtains the processing result, and before it sends the processing result to the second processing unit, the method further includes: the computer device processes the processing result through the first processing unit based on a third operator to obtain an updated processing result, where the third operator is an operator in the neural network and the third operator matches the first processing unit.
  • The third operator is an associated operator of the first operator; that is, the input of the third operator is the output of the first operator, and the third operator depends on the processing result of the first operator.
  • After the current processing unit finishes processing based on the current operator, if the next operator still matches this processing unit, the processing unit continues processing based on that operator and obtains an updated processing result, until the operator after the currently processed operator does not match the current processing unit; the current processing unit then sends the latest processing result to the processing unit matching that next operator.
  • The first processing unit can process the processing result based on the third operator to obtain an updated processing result and then send the updated processing result to the second processing unit; that is, the same processing unit can execute multiple matching operators consecutively, which ensures that when multiple processing units perform data processing based on the neural network, each processing unit executes the operators matching itself and the execution order of the multiple operators in the neural network is exact.
  • Step 305: The second processing unit processes the processing result based on the second operator.
  • Optionally, the computer device executes, through the second processor of the second processing unit, a waiting instruction to wait for the data processing instruction from the first processing unit; in response to receiving the data processing instruction from the first processing unit, it processes the processing result based on the second operator.
  • The waiting instruction is sent by the CPU to the second processing unit, or the waiting instruction is read by the second processing unit from a memory, where the memory is any memory, for example a public memory or the memory of the second processing unit; this is not limited in this embodiment of the present application.
  • Optionally, the computer device stores the processing result in the second memory of the second processing unit through the first processor of the first processing unit, and the data processing instruction includes the storage address of the processing result in the second memory; correspondingly, in response to receiving the data processing instruction from the first processing unit, the computer device reads the processing result from the second memory through the second processor based on the storage address.
  • Since the waiting instruction is used to instruct the processor to stop executing data processing operations until a data processing instruction is received, having the second processor execute the waiting instruction allows the data processing instruction to control the timing at which the second processor executes data processing operations.
  • Optionally, the computer device reads, through the second processor, the waiting instruction before the second operator from the second memory; the second processor then executes the waiting instruction, that is, it stops executing data processing operations; and when the data processing instruction is received, it reads the second operator from the second memory and processes the processing result based on the second operator.
  • The computer device, through the second processor, reads and executes the instructions sequentially in the order in which they are stored in the second memory, each time reading the first-ranked instruction among the unexecuted instructions. Since there is a waiting instruction before the second operator, the second processor first executes the waiting instruction; when it receives the data processing instruction, it reads the next instruction after the waiting instruction in the second memory, that is, the second operator, and executes it.
  • In this way, the second processing unit can execute the second operator only after receiving the data processing instruction, thereby controlling the timing at which the second operator is executed.
  • If the second operator is the last operator in the neural network, the second processing unit obtains the output data of the neural network after processing the processing result based on the second operator. If the second operator is not the last operator in the neural network, the second processing unit processes the processing result based on the second operator to obtain an updated processing result, and subsequently the second processing unit or other processing units execute the remaining operators in the neural network. For example, if the next operator after the second operator in the neural network is a fourth operator and the fourth operator matches a third processing unit, the second processing unit sends the updated processing result to the third processing unit, and the third processing unit continues to process the processing result based on the fourth operator, and so on until the execution of the multiple operators in the neural network is completed.
  • The manner in which the second processing unit sends the processing result to the third processing unit is the same as the manner in which the first processing unit sends the processing result to the second processing unit, and the manner in which the third processing unit performs data processing based on the fourth operator is the same as the manner in which the second processing unit performs data processing based on the second operator; details are not repeated here.
  • It should be noted that step 303 is an optional step.
  • In a case where step 303 is not executed, the computer device does not need to first determine the second operator and then decide, based on the second operator, to which processing unit the first processing unit sends the processing result. For example, after the computer device executes the first operator through the first processing unit, if the next instruction after the first operator in the first memory is a data sending instruction, the computer device directly sends, through the first processing unit, the processing result and the data processing instruction to the other processing unit besides the first processing unit.
  • FIG. 4 is a schematic diagram of a connection manner between processing units provided in an embodiment of the present application.
  • Without a direct connection path, the first processing unit and the second processing unit are each connected to the control unit, and the first processing unit and the second processing unit are connected to the common storage unit through a bus.
  • In that case, after the first processing unit executes an operator in the neural network, it stores the obtained processing result in the common storage unit over the bus and then notifies the control unit by way of an interrupt, and the control unit determines the processing unit matched by the next operator in the neural network.
  • If that processing unit is the second processing unit, the control unit sends a notification to the second processing unit; the second processing unit then reads the processing result from the common storage unit over the bus and performs data processing based on the next operator, and so on until the processing of the multiple operators in the neural network is completed.
  • In this procedure, the control unit needs to interact frequently with each processing unit, which occupies the resources of the control unit, and the interaction efficiency between the processing units is extremely low; moreover, because the processing results are stored in the common storage unit, the link between data storage and data reading is long, and the efficiency of data transmission is extremely low.
  • In the embodiment of the present application, a direct connection path is added between the first processing unit and the second processing unit, so that the first processing unit and the second processing unit perform data interaction directly based on the direct connection path, which greatly shortens the transmission link of the processing result; moreover, the first processing unit and the second processing unit realize self-synchronization based on waiting instructions and data sending instructions and dispense with the control unit, which improves the interaction efficiency and thus the overall efficiency of data processing based on the neural network.
  • FIG. 5 is a schematic structural diagram of a processing unit provided in an embodiment of the present application.
  • Any processing unit includes a processor and a memory; the processor and the memory are connected, and the processor can read data from the memory and store data into the memory.
  • The two processing units are directly connected, and either processing unit can store data directly in the memory of the other processing unit directly connected to it.
  • For example, the first processor in the first processing unit can store data directly into the second memory, and the second processor in the second processing unit can store data directly into the first memory.
  • The following takes the neural network shown in FIG. 6 as an example to illustrate the data processing process in this application.
  • In the first step, the computer device compiles the neural network with the compiler, so as to divide the multiple operators in the neural network among the matching processing units.
  • Since operator 0, operator 2, and operator 4 in the neural network match the first processing unit, and operator 1, operator 3, and operator 5 match the second processing unit, the computer device allocates operator 0, operator 2, and operator 4 to the first memory of the first processing unit, and operator 1, operator 3, and operator 5 to the second memory of the second processing unit.
  • The computer device inserts a waiting instruction before each operator in the first memory and the second memory, and inserts a data sending instruction after each operator.
  • In the second step, the first processor in the first processing unit reads and executes the instructions sequentially in the order of the instructions in the first memory: it first reads the waiting instruction and executes it, that is, it stops data processing operations and waits for a data processing instruction.
  • Likewise, the second processor in the second processing unit reads and executes the instructions sequentially in the order of the instructions in the second memory: it first reads the waiting instruction and executes it, that is, it stops data processing operations and waits for a data processing instruction.
  • In the third step, the computer device stores the input data of the neural network in the target storage location through the CPU and then sends a data processing instruction to the first processor in the first processing unit. In response to the data processing instruction, the first processor reads the input data from the target storage location, reads operator 0 from the first memory, and processes the input data based on operator 0 to obtain a processing result.
  • The first processor then reads the data sending instruction after operator 0 from the first memory and, in response to the data sending instruction, stores the processing result in the second memory of the second processing unit based on the direct connection path between the first processing unit and the second processing unit, and sends a data processing instruction to the second processor in the second processing unit.
  • Then the first processor reads the waiting instruction before operator 2 from the first memory and executes it.
  • In the fourth step, while executing the waiting instruction, the second processor receives the data processing instruction sent by the first processor; in response to the data processing instruction, it reads operator 1 and the processing result from the second memory and continues to process the processing result based on operator 1 to obtain an updated processing result. It then reads the data sending instruction after operator 1, stores the updated processing result in the first memory in response to the data sending instruction, and sends a data processing instruction to the first processor. The second processor then reads the waiting instruction before operator 3 and executes it.
  • In the fifth step, while executing the waiting instruction, the first processor receives the data processing instruction sent by the second processor; in response to the data processing instruction, it reads operator 2 and the current latest processing result from the first memory and continues to process the processing result based on operator 2 to obtain an updated processing result. The first processor then reads the data sending instruction after operator 2, stores the updated processing result in the second memory in response to the data sending instruction, and sends a data processing instruction to the second processor. The first processor then reads the waiting instruction before operator 4 and executes it. And so on, until the data processing of the multiple operators in the neural network is completed.
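  • For reference, the instruction layout that this walkthrough implies can be written out as follows (an illustrative reconstruction, with WAIT and SEND standing for the waiting instruction and the data sending instruction):

```python
# The instruction layout implied by the FIG. 6 walkthrough (an
# illustrative reconstruction; WAIT and SEND stand for the waiting
# instruction and the data sending instruction). Each processor walks
# its memory top to bottom, so the two units ping-pong through
# operator 0 .. operator 5.
first_memory  = ["WAIT", "op0", "SEND",
                 "WAIT", "op2", "SEND",
                 "WAIT", "op4", "SEND"]
second_memory = ["WAIT", "op1", "SEND",
                 "WAIT", "op3", "SEND",
                 "WAIT", "op5", "SEND"]
```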
  • In the sixth step, take the case where the processing unit matched by the last operator in the neural network is the first processing unit as an example.
  • After the first processor executes the last operator, it stores the obtained output data in the common storage unit over the bus and then sends a processing completion notification to the CPU in the form of an interrupt, indicating that the processing of the current input data is complete.
  • In the seventh step, in response to the processing completion notification, the CPU stores new input data in the target storage location and sends a data processing instruction to the first processor, and the first processor then processes the new input data in the same way.
  • The following takes the neural network shown in FIG. 7 as an example to illustrate the data processing process in this application.
  • In the first step, the computer device compiles the neural network with the compiler, so as to divide the multiple operators in the neural network among the matching processing units: operator 0 and operator 4 are allocated to the first memory of the first processing unit, and operator 1, operator 2, operator 3, and operator 5 are allocated to the second memory of the second processing unit.
  • The computer device inserts a waiting instruction before each operator in the first memory and inserts a data sending instruction after each operator in the first memory.
  • In the second memory, the computer device inserts a waiting instruction before operator 1 and before operator 5, and inserts a data sending instruction after operator 3 and after operator 5.
  • In the second step, the first processor in the first processing unit reads and executes the instructions sequentially in the order of the instructions in the first memory: it first reads the waiting instruction and executes it, that is, it stops data processing operations and waits for a data processing instruction.
  • Likewise, the second processor in the second processing unit reads and executes the instructions sequentially in the order of the instructions in the second memory: it first reads the waiting instruction and executes it, that is, it stops data processing operations and waits for a data processing instruction.
  • In the third step, the computer device stores the input data of the neural network in the target storage location through the CPU and then sends a data processing instruction to the first processor in the first processing unit. In response to the data processing instruction, the first processor reads the input data from the target storage location, reads operator 0 from the first memory, and processes the input data based on operator 0 to obtain a processing result.
  • The first processor then reads the data sending instruction after operator 0 from the first memory and, in response to the data sending instruction, stores the processing result in the second memory of the second processing unit based on the direct connection path between the first processing unit and the second processing unit, and sends a data processing instruction to the second processor in the second processing unit.
  • Then the first processor reads the waiting instruction before operator 4 from the first memory and executes it.
  • In the fourth step, while executing the waiting instruction, the second processor receives the data processing instruction sent by the first processor; in response to the data processing instruction, it reads operator 1 and the processing result from the second memory and continues to process the processing result based on operator 1 to obtain an updated processing result.
  • It then reads operator 2 from the second memory and continues to process the current processing result based on operator 2 to obtain an updated processing result.
  • It then reads operator 3 from the second memory, continues to process the current processing result based on operator 3 to obtain an updated processing result, then reads the data sending instruction after operator 3 and, in response to the data sending instruction, stores the updated processing result in the first memory and sends a data processing instruction to the first processor.
  • Then the second processor reads the waiting instruction before operator 5 and executes it.
  • In the fifth step, while executing the waiting instruction, the first processor receives the data processing instruction sent by the second processor; in response to the data processing instruction, it reads operator 4 and the current latest processing result from the first memory and continues to process the processing result based on operator 4 to obtain an updated processing result. The first processor then reads the data sending instruction after operator 4 and, in response to the data sending instruction, stores the updated processing result in the second memory and sends a data processing instruction to the second processor. The first processor then reads the next waiting instruction and executes it. And so on, until the data processing of the multiple operators in the neural network is completed.
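  • The corresponding instruction layout for FIG. 7 (again an illustrative reconstruction) shows the effect of the run-based insertion rule: operator 1, operator 2, and operator 3 share a single waiting/data-sending pair:

```python
# The instruction layout implied by the FIG. 7 walkthrough (an
# illustrative reconstruction). Operators 1, 2, and 3 are adjacent
# associated operators that all match the second unit, so they share a
# single waiting/data-sending pair and run back to back.
first_memory  = ["WAIT", "op0", "SEND",
                 "WAIT", "op4", "SEND"]
second_memory = ["WAIT", "op1", "op2", "op3", "SEND",
                 "WAIT", "op5", "SEND"]
```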
  • In the sixth step, take the case where the processing unit matched by the last operator in the neural network is the first processing unit as an example.
  • After the first processor executes the last operator, it stores the obtained output data in the common storage unit over the bus and then sends a processing completion notification to the CPU in the form of an interrupt, indicating that the processing of the current input data is complete.
  • In the seventh step, in response to the processing completion notification, the CPU stores new input data in the target storage location and sends a data processing instruction to the first processor, and the first processor then processes the new input data in the same way.
  • In summary, the data processing solution provided by the embodiment of the present application adds a direct connection path between the processing units, so that the processing units can perform data interaction directly through the direct connection path, which greatly improves the interaction efficiency between the processing units.
  • After a processing unit performs data processing based on its matching operator, it directly sends the obtained processing result to another processing unit, which can directly obtain the processing result and continue to process it.
  • This solution dispenses with the control unit, so that the processing units can cooperate directly and without obstruction, thereby improving the efficiency of data processing through the operators in the neural network.
  • Since the waiting instruction is used to instruct the processor to stop executing data processing operations until a data processing instruction is received, having the first processor execute the waiting instruction allows the data processing instruction to control the timing at which the first processor executes data processing operations; the first processing unit can execute the first operator only after receiving the data processing instruction, thereby controlling the timing at which the first operator is executed.
  • Because the matching processing units of the operators in the neural network may be different, before the input data is processed through the neural network, the operators in the neural network are divided among the matching processing units, and the execution order of the multiple operators assigned to each processing unit is determined. This enables each processing unit, when the input data of the neural network is later obtained, to perform data processing directly based on the multiple operators assigned to it and the determined execution order, instead of the matching processing unit of each operator being determined on the fly during data processing based on the neural network, which improves data processing efficiency.
  • Moreover, the waiting instruction is inserted before an operator and indicates that the data processing operation is performed only when a data processing instruction is received, while the data sending instruction is inserted after an operator and instructs the current processing unit, when its operator-based processing is complete, to send the processing result and a data processing instruction to the other processing unit, so that the other processing unit processes the processing result in response to that instruction. In this way, when multiple processing units jointly process data based on the neural network without the participation of a control unit, data conflicts and resource conflicts can be avoided, realizing a self-synchronization mechanism.
  • Moreover, adjacent operators in the neural network have data dependencies between them. If there is no data dependency between two adjacent operators assigned to a given processing unit, this means that the operator that depends on the former of the two was assigned to another processing unit. Therefore, when an association relationship exists between at least two adjacent operators in the memory, the waiting instruction is inserted before the first of the at least two operators and the data sending instruction is inserted after the last of them, so that operators with data dependencies can be processed consecutively, ensuring the correct execution of the operators in the neural network. A sketch of this insertion rule follows.
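The following sketch illustrates one way the insertion rule could look, assuming, purely for illustration, that adjacent network indices imply the association relationship; `insert_sync`, `WAIT`, and `SEND` are hypothetical names, not the patent's instruction encoding.

```python
# Within one unit's ordered operator list, operators whose network indices
# are consecutive form a dependent run; a waiting instruction goes before
# the first operator of each run and a data sending instruction after the
# last, so the run executes without interruption (cf. FIG. 7).
def insert_sync(ops_with_index):
    """ops_with_index: [(name, index_in_network), ...] in execution order."""
    program, prev = [], None
    for name, idx in ops_with_index:
        if prev is None or idx != prev + 1:  # a new dependent run starts here
            if prev is not None:
                program.append("SEND")       # close the previous run
            program.append("WAIT")
        program.append(name)
        prev = idx
    if prev is not None:
        program.append("SEND")
    return program

# Unit 2 in FIG. 7 matches operators 1, 2, 3 and 5:
print(insert_sync([("op1", 1), ("op2", 2), ("op3", 3), ("op5", 5)]))
# ['WAIT', 'op1', 'op2', 'op3', 'SEND', 'WAIT', 'op5', 'SEND']
```

Grouping the dependent run under a single wait/send pair is what lets operators 1, 2, and 3 execute back to back on the same unit.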
  • Moreover, considering that the operators in the neural network have data dependencies among them, after the first operator is executed, its associated operator is determined as the second operator, and the processing result of the first operator is then sent to the processing unit matched by the second operator. This ensures that the execution of the second operator can rely on the processing result of the first operator, guaranteeing the correct execution of the operators in the neural network.
  • Moreover, the first processing unit does not need to store the processing result in the common storage unit via the bus and have the second processing unit read it back from the common storage unit via the bus. Instead, based on the direct connection path with the second processing unit, the processing result is stored directly in the memory of the second processing unit, so that the processor of the second processing unit can read the processing result directly from its own memory and process it. This greatly shortens the data transmission link and thereby improves the efficiency of data interaction; the toy sketch below contrasts the two paths.
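The difference between the two hand-off paths can be made concrete with a toy sketch in which Python dictionaries stand in for the memories; this is a shape-of-the-dataflow illustration only, not an implementation of the bus or of the direct connection path.

```python
def handoff_via_bus(result, common_storage, unit2_memory):
    # bus path: unit 1 writes to the common storage unit over the bus,
    # then unit 2 reads it back over the bus: two bus transfers
    common_storage["buf"] = result
    unit2_memory["in"] = common_storage["buf"]
    return unit2_memory["in"]

def handoff_direct(result, unit2_memory):
    # direct path: unit 1 stores straight into unit 2's own memory
    unit2_memory["in"] = result
    return unit2_memory["in"]

unit2_mem = {}
print(handoff_direct(7, unit2_mem), unit2_mem)  # 7 {'in': 7}
```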
  • Moreover, the first processing unit can process the processing result based on a third operator to obtain an updated processing result, and then send the updated processing result to the second processing unit. That is, the same processing unit can execute multiple matched operators consecutively, which ensures that when multiple processing units perform data processing based on the neural network, each processing unit executes the operators that match it, and the execution order of the multiple operators in the neural network is accurate.
  • Moreover, since the waiting instruction instructs the processor to stop performing data processing operations until a data processing instruction is received, having the second processor execute the waiting instruction means that the data processing instruction controls the timing at which the second processor performs data processing operations. Likewise, by storing the waiting instruction before the second operator in the second memory, the second processing unit can execute the second operator only after receiving the data processing instruction, thereby controlling the timing at which the second processor performs data processing operations.
  • FIG. 8 shows a structural block diagram of a data processing device provided by an exemplary embodiment of the present application.
  • The data processing device includes:
  • the first processing module 801 is configured to use the first processing unit to process the input data based on the first operator of the neural network to obtain a processing result, and the first processing unit matches the first operator;
  • a data sending module 802 configured to send a processing result to the second processing unit based on the direct connection between the first processing unit and the second processing unit, and the second processing unit matches the second operator of the neural network;
  • the second processing module 803 is configured to use the second processing unit to process the processing result based on the second operator.
  • the first processing unit includes a first processor
  • the first processing module 801 is configured to: execute, through the first processor, a waiting instruction to wait for a data processing instruction; and in response to receiving the data processing instruction, process the input data based on the first operator to obtain the processing result.
  • the second processing unit includes a second processor and a second memory
  • the data sending module 802 is configured to store the processing result in the second memory and send the data processing instruction to the second processor through the first processing unit based on the direct connection path.
  • the second processing module 803 is configured to: execute, through the second processor, a waiting instruction to wait for a data processing instruction from the first processing unit; and in response to receiving the data processing instruction from the first processing unit, process the processing result based on the second operator.
  • the device also includes:
  • a unit determination module, configured to determine a matching processing unit for each operator among the multiple operators included in the neural network;
  • a data storage module, configured to, for each processing unit, store the multiple operators matched by the processing unit in the memory of the processing unit according to the order in which they are arranged in the neural network, insert a waiting instruction before at least one operator in the memory, and insert a data sending instruction after at least one operator;
  • where the waiting instruction instructs the processing unit to stop performing data processing operations until a data processing instruction is received, and the data sending instruction instructs the current processing unit, when its operator-based processing is complete, to send the processing result and a data processing instruction to the other processing unit.
  • the data storage module is configured to: when an association relationship exists between at least two adjacent operators in the memory, insert the waiting instruction before the first of the at least two operators and insert the data sending instruction after the last of the at least two operators, where operators having the association relationship have data dependencies between them.
  • the first processing module 801 is further configured to use the first processing unit to process the processing result based on the third operator to obtain an updated processing result, and the third operator matches the first processing unit.
  • the device further includes:
  • an operator determination module, configured to obtain an operator association relationship, where the operator association relationship indicates the associated operators of the operators included in the neural network, and the data processing of an associated operator depends on the processing result of the corresponding operator; query the associated operator of the first operator from the operator association relationship; and determine the queried associated operator as the second operator.
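A minimal sketch of this lookup, assuming, for illustration, the simple case in which the operator association relationship is just the arrangement order of the operators in the network, so that an operator's associated operator is the one that follows it; `associated_operator` and the operator labels are invented names.

```python
# When the association relationship is the network's operator order, the
# associated operator of any operator is the one that follows it; that
# operator is taken as the "second operator" whose unit receives the result.
def associated_operator(order, op):
    """order: operator names in network order; returns op's associated operator."""
    i = order.index(op)
    return order[i + 1] if i + 1 < len(order) else None  # None: last operator

order = [f"op{i}" for i in range(6)]
print(associated_operator(order, "op0"))  # 'op1' is the second operator
```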
  • To sum up, the data processing solution provided by the embodiments of the present application adds a direct connection path between the processing units, so that the processing units can exchange data directly through the direct connection path, which greatly improves the interaction efficiency between the processing units. After a processing unit performs data processing based on its matched operator, it sends the processing result directly to another processing unit, and that processing unit can obtain the processing result directly and continue to process it. This solution dispenses with the control unit, so that the processing units can cooperate directly and without obstruction, thereby improving the efficiency of data processing through the operators in the neural network.
  • An embodiment of the present application provides a computer device. The computer device includes a processor and a memory; the memory stores at least one program, and the at least one program is to be executed by the processor to implement the data processing methods provided by the above method embodiments.
  • In some embodiments, the computer device is provided with a chip, for example an artificial intelligence chip, through which the computer device can execute the data processing method in the embodiments of the present application.
  • the computer device is a terminal.
  • FIG. 9 shows a structural block diagram of a terminal provided in an exemplary embodiment of the present application.
  • the terminal 900 is a terminal capable of accessing a wireless local area network as a wireless station, such as a smart phone, a tablet computer, and a wearable device.
  • the terminal 900 in this application includes one or more of the following components: a processor 910, a memory 920, and at least two wireless links 930.
  • processor 910 includes one or more processing cores.
  • the processor 910 uses various interfaces and lines to connect the various parts of the entire terminal 900, and executes the various functions of the terminal 900 and processes data by running or executing the program code stored in the memory 920 and calling the data stored in the memory 920.
  • the processor 910 is implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA).
  • the processor 910 can integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Neural-network Processing Unit (NPU), a modem, and the like.
  • The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; the NPU is used to implement Artificial Intelligence (AI) functions; and the modem is used to handle wireless communication. It can be understood that the above modem may also not be integrated into the processor 910 and may instead be implemented by a separate chip.
  • the processor 910 is used to control the working conditions of the at least two wireless links 930; correspondingly, the processor 910 is a processor integrated with a Wireless Fidelity (Wi-Fi) chip.
  • the Wi-Fi chip is a chip with dual Wi-Fi processing capabilities.
  • the Wi-Fi chip is a Dual Band Dual Concurrent (DBDC) chip, or a Dual Band Simultaneous (DBS) chip.
  • the memory 920 includes a Random Access Memory (RAM), and in some embodiments the memory 920 includes a Read-Only Memory (ROM). In some embodiments, the memory 920 includes a non-transitory computer-readable storage medium. The memory 920 can be used to store program code.
  • the memory 920 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playback function, or an image playback function), instructions for implementing the following method embodiments, and the like; and the data storage area may store data created according to the use of the terminal 900 (such as audio data and a phonebook).
  • the memory 920 stores the reception schemes for the beacon frames of the different wireless links 930, as well as the identifiers of the access nodes to which the different wireless links 930 are connected, the identifiers of the wireless links 930, and the like.
  • the at least two wireless links 930 are used to connect to different Access Points (APs) and to receive downlink data delivered by the APs, where the different access nodes are access nodes in the same router or in different routers.
  • In some embodiments, the terminal 900 further includes a display screen, which is a display component for displaying a user interface. In some embodiments, the display screen is a display screen with a touch function, through which the user can perform touch operations on the display screen using a finger, a stylus, or any other suitable object. The display screen is usually arranged on the front panel of the terminal 900, and may be designed as a full screen, a curved screen, a special-shaped screen, a double-sided screen, or a folding screen, or as a combination of a full screen and a curved screen, a combination of a special-shaped screen and a curved screen, and so on, which is not limited in this embodiment.
  • the structure of the terminal 900 shown in the above figures does not constitute a limitation on the terminal 900; the terminal 900 may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • the terminal 900 also includes components such as a microphone, a loudspeaker, an input unit, a sensor, an audio circuit, a module, a power supply, and a Bluetooth module, which will not be repeated here.
  • the computer device is a server.
  • FIG. 10 shows a structural block diagram of a server provided by an exemplary embodiment of the present application.
  • the server 1000 may vary greatly due to different configurations or performance, and may include one or more processors (CPUs) 1001 and one or more memories 1002, where the memory 1002 stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor 1001 to implement the methods provided by the above method embodiments.
  • the server may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the server may also include other components for implementing device functions, which will not be repeated here.
  • the present application also provides a computer-readable storage medium, the storage medium stores at least one program, and the at least one program is loaded and executed by the processor to implement the data processing methods shown in the above embodiments.
  • a chip is provided, the chip includes a programmable logic circuit and/or program instructions, and when the chip is run on a terminal, it is used to implement the data processing methods shown in the above embodiments .
  • the present application also provides a computer program product. The computer program product includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device implements the data processing methods shown in the above embodiments.
  • All or part of the steps in the data processing methods of the above embodiments may be completed by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.


Abstract

A data processing method and apparatus, a computer device, and a storage medium, belonging to the field of computer technology. The method includes: processing input data based on a first operator of a neural network through a first processing unit to obtain a processing result, the first processing unit matching the first operator (201); sending the processing result to a second processing unit based on a direct connection path between the first processing unit and the second processing unit, the second processing unit matching a second operator of the neural network (202); and processing the processing result based on the second operator through the second processing unit (203). By adding a direct connection path between the processing units, the method enables the processing units to exchange data directly through the direct connection path without a control unit, so that the processing units can cooperate directly and without obstruction, improving the efficiency of data processing through the operators in the neural network.


Claims (20)

  1. A data processing method, the method comprising:
    processing input data based on a first operator of a neural network through a first processing unit to obtain a processing result, the first processing unit matching the first operator;
    sending the processing result to a second processing unit based on a direct connection path between the first processing unit and the second processing unit, the second processing unit matching a second operator of the neural network; and
    processing the processing result based on the second operator through the second processing unit.
  2. The method according to claim 1, wherein the first processing unit comprises a first processor; and
    the processing input data based on a first operator of a neural network through a first processing unit to obtain a processing result comprises:
    executing, through the first processor, a waiting instruction to wait for a data processing instruction; and
    in response to receiving the data processing instruction, processing the input data based on the first operator to obtain the processing result.
  3. The method according to claim 1, wherein the second processing unit comprises a second processor and a second memory; and
    the sending the processing result to the second processing unit based on a direct connection path between the first processing unit and the second processing unit comprises:
    storing, through the first processing unit and based on the direct connection path, the processing result in the second memory, and sending a data processing instruction to the second processor.
  4. The method according to claim 3, wherein the method further comprises:
    executing, through the second processor, a waiting instruction to wait for a data processing instruction from the first processing unit; and
    in response to receiving the data processing instruction from the first processing unit, processing the processing result based on the second operator.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    determining a matching processing unit for each operator among multiple operators included in the neural network; and
    for each processing unit, storing the multiple operators matched by the processing unit in a memory of the processing unit according to the order in which the multiple operators are arranged in the neural network, inserting a waiting instruction before at least one operator in the memory, and inserting a data sending instruction after at least one operator;
    wherein the waiting instruction is used to instruct to stop performing data processing operations until a data processing instruction is received, whereupon data processing operations are resumed, and the data sending instruction is used to instruct the current processing unit, when processing based on the operator is completed, to send the processing result and the data processing instruction to another processing unit.
  6. The method according to claim 5, wherein the inserting a waiting instruction before at least one operator in the memory and inserting a data sending instruction after at least one operator comprises:
    when an association relationship exists between at least two adjacent operators in the memory, inserting the waiting instruction before the first operator of the at least two operators, and inserting the data sending instruction after the last operator of the at least two operators;
    wherein operators having the association relationship have data dependencies between them.
  7. The method according to any one of claims 1 to 4, wherein after the processing input data based on a first operator of a neural network through a first processing unit to obtain a processing result, and before the sending the processing result to the second processing unit, the method further comprises:
    processing the processing result based on a third operator through the first processing unit to obtain an updated processing result, the third operator matching the first processing unit.
  8. The method according to any one of claims 1 to 4, wherein before the sending the processing result to the second processing unit based on the direct connection path between the first processing unit and the second processing unit, the method further comprises:
    obtaining an operator association relationship, the operator association relationship indicating associated operators of the operators included in the neural network, data processing of an associated operator depending on a processing result of the corresponding operator;
    querying an associated operator of the first operator from the operator association relationship; and
    determining the queried associated operator as the second operator.
  9. A data processing device, the device comprising:
    a first processing module, configured to process input data based on a first operator of a neural network through a first processing unit to obtain a processing result, the first processing unit matching the first operator;
    a data sending module, configured to send the processing result to a second processing unit based on a direct connection path between the first processing unit and the second processing unit, the second processing unit matching a second operator of the neural network; and
    a second processing module, configured to process the processing result based on the second operator through the second processing unit.
  10. The device according to claim 9, wherein the first processing unit comprises a first processor; and
    the first processing module is configured to:
    execute, through the first processor, a waiting instruction to wait for a data processing instruction; and
    in response to receiving the data processing instruction, process the input data based on the first operator to obtain the processing result.
  11. The device according to claim 9, wherein the second processing unit comprises a second processor and a second memory; and
    the data sending module is configured to:
    store, through the first processing unit and based on the direct connection path, the processing result in the second memory, and send a data processing instruction to the second processor.
  12. The device according to claim 11, wherein the second processing module is configured to:
    execute, through the second processor, a waiting instruction to wait for a data processing instruction from the first processing unit; and
    in response to receiving the data processing instruction from the first processing unit, process the processing result based on the second operator.
  13. The device according to any one of claims 9 to 12, wherein the device further comprises:
    a unit determination module, configured to determine a matching processing unit for each operator among multiple operators included in the neural network; and
    a data storage module, configured to, for each processing unit, store the multiple operators matched by the processing unit in a memory of the processing unit according to the order in which the multiple operators are arranged in the neural network, insert a waiting instruction before at least one operator in the memory, and insert a data sending instruction after at least one operator;
    wherein the waiting instruction is used to instruct to stop performing data processing operations until a data processing instruction is received, whereupon data processing operations are resumed, and the data sending instruction is used to instruct the current processing unit, when processing based on the operator is completed, to send the processing result and the data processing instruction to another processing unit.
  14. The device according to claim 13, wherein the data storage module is configured to:
    when an association relationship exists between at least two adjacent operators in the memory, insert the waiting instruction before the first operator of the at least two operators, and insert the data sending instruction after the last operator of the at least two operators;
    wherein operators having the association relationship have data dependencies between them.
  15. The device according to any one of claims 9 to 12, wherein the first processing module is configured to:
    process the processing result based on a third operator through the first processing unit to obtain an updated processing result, the third operator matching the first processing unit.
  16. The device according to any one of claims 9 to 12, wherein the device further comprises:
    an operator determination module, configured to obtain an operator association relationship, the operator association relationship indicating associated operators of the operators included in the neural network, data processing of an associated operator depending on a processing result of the corresponding operator; query an associated operator of the first operator from the operator association relationship; and determine the queried associated operator as the second operator.
  17. A computer device, the computer device comprising a processor and a memory; the memory storing at least one program, the at least one program being executed by the processor to implement the data processing method according to any one of claims 1 to 8.
  18. A computer-readable storage medium, the storage medium storing at least one program, the at least one program being executed by a processor to implement the data processing method according to any one of claims 1 to 8.
  19. A chip, the chip comprising a programmable logic circuit and/or program instructions, the chip, when running on a terminal, being used to implement the data processing method according to any one of claims 1 to 8.
  20. A computer program product, the computer program product comprising computer instructions stored in a computer-readable storage medium; a processor of a computer device reading the computer instructions from the computer-readable storage medium and executing the computer instructions, so that the computer device implements the data processing method according to any one of claims 1 to 8.
PCT/CN2022/133413 2021-12-24 2022-11-22 Data processing method and apparatus, computer device and storage medium WO2023116312A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111596506.X 2021-12-24
CN202111596506.XA CN116362305A (zh) 2021-12-24 2021-12-24 Data processing method and apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2023116312A1 true WO2023116312A1 (zh) 2023-06-29

Family

ID=86901190

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/133413 WO2023116312A1 (zh) 2021-12-24 2022-11-22 数据处理方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN116362305A (zh)
WO (1) WO2023116312A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549934A (zh) * 2018-04-25 2018-09-18 福州瑞芯微电子股份有限公司 一种基于自动集群神经网络芯片组的运算方法和装置
CN109359732A (zh) * 2018-09-30 2019-02-19 阿里巴巴集团控股有限公司 一种芯片及基于其的数据处理方法
US20200065654A1 (en) * 2018-08-22 2020-02-27 Electronics And Telecommunications Research Institute Neural network fusion apparatus and modular neural network fusion method and matching interface generation method for the same
CN111782403A (zh) * 2020-07-17 2020-10-16 Oppo广东移动通信有限公司 数据处理方法、装置以及电子设备
CN111860820A (zh) * 2020-07-31 2020-10-30 北京灵汐科技有限公司 神经网络算子的划分方法、装置及划分设备


Also Published As

Publication number Publication date
CN116362305A (zh) 2023-06-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22909628; Country of ref document: EP; Kind code of ref document: A1)