CN114461400A - Data processing method and device, electronic equipment and storage medium


Info

Publication number
CN114461400A
CN114461400A (application CN202210135614.5A)
Authority
CN
China
Prior art keywords
data
information
data processing
cores
model parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210135614.5A
Other languages
Chinese (zh)
Inventor
杨尊程
张演龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210135614.5A
Publication of CN114461400A
Legal status: Pending (current)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units, using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027: Allocation of resources, e.g. of the central processing unit [CPU], to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F9/00
    • G06F 2209/50: Indexing scheme relating to G06F9/50
    • G06F 2209/5017: Task decomposition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Advance Control (AREA)

Abstract

The disclosure provides a data processing method and device, electronic equipment and a storage medium, and relates to the field of artificial intelligence, in particular to the field of deep learning. The specific implementation scheme is as follows: acquiring first information, second information, third information and fourth information; splitting the plurality of model parameters into a plurality of first data segments and splitting the input data into a plurality of second data segments based on the first information, the second information and the third information; distributing the plurality of data processing operations to the plurality of cores using the second information and the fourth information; and transferring the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches so as to execute the data processing operation corresponding to each core in the plurality of cores.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method and an apparatus for data processing in the field of deep learning, an electronic device, and a storage medium.
Background
At present, the execution scheme of a deep learning model involves a large number of data processing operations, and the model itself is large, so data must be transferred frequently between the processor and the memory, which severely affects the execution speed of the model.
Disclosure of Invention
The present disclosure provides a method, apparatus, device and storage medium for data processing.
According to an aspect of the present disclosure, a method of data processing is provided. The method can comprise the following steps: acquiring first information, second information, third information and fourth information, wherein the first information is a plurality of model parameters used by a deep learning model, the second information is operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores set in a processor; splitting the plurality of model parameters into a plurality of first data segments and splitting the input data into a plurality of second data segments based on the first information, the second information and the third information; distributing the plurality of data processing operations to the plurality of cores using the second information and the fourth information; and transferring the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches so as to execute the data processing operation corresponding to each core in the plurality of cores.
According to another aspect of the present disclosure, there is also provided an apparatus for data processing. The apparatus may include: an acquisition module, configured to acquire first information, second information, third information and fourth information, where the first information is a plurality of model parameters used by a deep learning model, the second information is the operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores arranged in a processor; a splitting module, configured to split the plurality of model parameters into a plurality of first data segments and the input data into a plurality of second data segments based on the first information, the second information and the third information; an allocation module, configured to allocate the plurality of data processing operations to the plurality of cores using the second information and the fourth information; and a processing module, configured to transfer the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches, so as to execute the data processing operation corresponding to each core in the plurality of cores.
According to another aspect of the present disclosure, an electronic device is also provided. The electronic device may include: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of data processing of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of data processing of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is also provided a computer program product, which may comprise a computer program, which when executed by a processor, implements the method of data processing of an embodiment of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a method of data processing according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 4 is a block diagram of an electronic device for a data processing method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flow chart of a data processing method according to an embodiment of the present disclosure. As shown in fig. 1, the method may include the steps of:
step S102, obtaining first information, second information, third information and fourth information, wherein the first information is a plurality of model parameters used by the deep learning model, the second information is operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores set in the processor.
In the technical solution provided in the above step S102 of the present disclosure, the deep learning model may be composed of a large number of model parameters, data processing operations, and a process control module, wherein the parameters in the model may be generated during training by using a deep learning framework and stored in a file.
In this embodiment, the third information may be the input data to be processed by the deep learning model, which may be data input from outside the system; the fourth information may be the number of cores of the plurality of cores set in the processor. A central processing unit contains multiple cores; dual-core, quad-core and eight-core central processing units, for example, are all common.
For example, parameters of a deep learning model and data processing operation types are analyzed in advance to obtain first information and second information; acquiring data input from the outside of the system to obtain third information; and analyzing the processor to obtain the number of the cores of the plurality of cores arranged in the processor.
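By way of non-limiting illustration, this acquisition step might be sketched in Python as follows, assuming the model's parameters and operation types were exported to files in advance; the file layout, the ModelInfo structure and all names here are hypothetical, not part of the disclosure:

    import os
    from dataclasses import dataclass
    from typing import List

    import numpy as np

    @dataclass
    class ModelInfo:
        params: List[np.ndarray]   # first information: the model parameters
        op_types: List[str]        # second information: operation types used by the model
        input_data: np.ndarray     # third information: input data to be processed
        num_cores: int             # fourth information: number of cores in the processor

    def acquire_information(param_file: str, op_file: str, input_data: np.ndarray) -> ModelInfo:
        # First and second information: parsed in advance from the exported model files.
        params = list(np.load(param_file, allow_pickle=True))
        with open(op_file) as f:
            op_types = [line.strip() for line in f if line.strip()]
        # Fourth information: analyze the processor to obtain its core count.
        num_cores = os.cpu_count() or 1
        return ModelInfo(params, op_types, input_data, num_cores)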
Step S104, splitting the plurality of model parameters into a plurality of first data segments and splitting the input data into a plurality of second data segments based on the first information, the second information and the third information.
In the technical solution provided in the above step S104 of the present disclosure, the model parameters indicated by the first information and the input data indicated by the third information are split, yielding the plurality of first data segments and the plurality of second data segments.
For example, before the deep learning model data is processed and executed, a program analyzes the parameters of the deep learning model and the data processing operation types of the model in advance, then splits the parameters into small units and stores them in a file, which yields the plurality of first data segments; when the deep learning model data is processed, the input data is split in the same way, which yields the plurality of second data segments.
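A minimal sketch of such splitting, assuming a simple fixed-size partition; the segment size of 256 is an arbitrary illustration, as real sizing would be chosen to fill the processor's registers:

    import numpy as np

    def split_into_segments(array: np.ndarray, segment_size: int = 256) -> list:
        """Split a flat array into contiguous small units ("data segments")."""
        flat = array.ravel()
        return [flat[i:i + segment_size] for i in range(0, flat.size, segment_size)]

    # First data segments come from the model parameters, second from the input data.
    params = np.random.rand(10_000)   # stand-in for the model's parameters
    inputs = np.random.rand(4_096)    # stand-in for externally input data
    first_segments = split_into_segments(params)
    second_segments = split_into_segments(inputs)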
Step S106, distributing the plurality of data processing operations to the plurality of cores using the second information and the fourth information.
In the technical solution provided in the above step S106 of the present disclosure, the second information and the fourth information are utilized to distribute the plurality of data processing operations to the plurality of cores, so that the data processing operations can just fill all the cores of the processor, thereby enhancing throughput of data processing and improving instruction execution efficiency.
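As a sketch, an even round-robin allocation over the core count would realize this step; the operation names below are illustrative only:

    def distribute_operations(op_types: list, num_cores: int) -> dict:
        """Evenly distribute the data processing operations across the available cores."""
        assignment = {core: [] for core in range(num_cores)}
        for i, op in enumerate(op_types):
            assignment[i % num_cores].append(op)
        return assignment

    # For example, 8 operations on a 4-core processor give 2 operations per core.
    print(distribute_operations(["mul", "add", "mean", "mul",
                                 "add", "mul", "mean", "add"], num_cores=4))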
Optionally, the data processing operations may be performed serially or in parallel on the multiple data inputs accepted by the deep learning model.
Step S108, transferring the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches, so as to execute the data processing operation corresponding to each core in the plurality of cores.
In the technical solution provided by the above step S108 of the present disclosure, according to the type of each core's data processing operation, at least one first data segment and at least one second data segment matching that operation type are selected from the plurality of first data segments and the plurality of second data segments, and are transferred to the core to which the data processing operation is assigned.
Optionally, a matching data distribution scheme is selected for each data processing operation type; that is, according to the plurality of data processing operations allocated to the plurality of cores, the matching first data segments and second data segments are selected and transferred to the plurality of cores in batches, so as to execute the data processing operation corresponding to each core in the plurality of cores.
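A sketch of this matching step, assuming each segment carries a tag naming the operation type that consumes it; the (tag, segment) pairing is an illustrative convention, not something defined by the disclosure:

    def segments_for_core(op_type: str, tagged_first: list, tagged_second: list) -> tuple:
        """Select the parameter and data segments matching one core's operation type.

        Each entry in tagged_first / tagged_second is assumed to be an
        (op_type, segment) pair.
        """
        matched_params = [seg for tag, seg in tagged_first if tag == op_type]
        matched_data = [seg for tag, seg in tagged_second if tag == op_type]
        return matched_params, matched_data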
Through the above steps S102 to S108, the first information, second information, third information and fourth information are acquired, where the first information is the plurality of model parameters used by the deep learning model, the second information is the operation types of the plurality of data processing operations used by the deep learning model, the third information is the input data to be processed by the deep learning model, and the fourth information is the number of cores of the plurality of cores set in the processor; the plurality of model parameters are split into the plurality of first data segments and the input data into the plurality of second data segments based on the first, second and third information; the plurality of data processing operations are distributed to the plurality of cores using the second and fourth information; and the plurality of first data segments and the plurality of second data segments are transferred to the plurality of cores in batches, so as to execute the data processing operation corresponding to each core in the plurality of cores. That is, the present disclosure splits the model parameters and the input data into data segments and streams the split segments to the cores in batches, so that the model parameters, input data and data processing operations reside in the processor as much as possible. This achieves the technical effect of improving efficiency during data processing and solves the technical problem of low efficiency during data processing.
The above-described method of this embodiment is described in further detail below.
As an optional implementation, splitting the plurality of model parameters into the plurality of first data segments and splitting the input data into the plurality of second data segments based on the first information, the second information and the third information comprises: determining the operation type corresponding to each data processing operation in the plurality of data processing operations based on the second information; acquiring associated model parameters from the plurality of model parameters and associated data from the input data using the operation type corresponding to each data processing operation; and splitting the associated model parameters to obtain the plurality of first data segments, and splitting the associated data to obtain the plurality of second data segments.
In this embodiment, the operation type corresponding to each of the plurality of data processing operations is determined based on the second information; using that operation type, the associated model parameters are obtained from the plurality of model parameters and the associated data from the input data; the associated model parameters are then split to obtain the plurality of first data segments, and the associated data is split to obtain the plurality of second data segments.
Optionally, the types of data processing operations of the model are simplified, and the associated model parameters and associated data are split using the operation types corresponding to the simplified data processing operations, so that frequent switching between data processing operations is avoided. The data processing operations of the model may be reduced to a few types, for example multiplying (or adding) certain parameter values with certain data values, or averaging certain data values, which is not limited herein.
For example, rules are designed according to actual requirements, and the first information and the third information required by the data processing operations are split using data distribution schemes such as fine-grained and regularized splitting to obtain the plurality of first data segments and the plurality of second data segments. For example, the parameters of the model and its data processing operation types are analyzed in advance, the parameters are split into small units and stored in a file to obtain the plurality of first data segments, and data subject to the same operation type is classified together to obtain the plurality of second data segments.
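The classification described above might be sketched as follows, sorting (operation type, segment) pairs so that same-type segments end up contiguous; this is a hypothetical illustration of the "classified together" rule:

    from itertools import groupby

    def group_by_operation(tagged_segments: list) -> dict:
        """Classify (op_type, segment) pairs so same-type segments sit together."""
        ordered = sorted(tagged_segments, key=lambda pair: pair[0])
        return {op: [seg for _, seg in group]
                for op, group in groupby(ordered, key=lambda pair: pair[0])}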
As an optional implementation, obtaining the associated model parameters from the plurality of model parameters and the associated data from the input data using the operation type corresponding to each data processing operation comprises at least one of: determining model parameters belonging to the same operation type as associated model parameters, and input data belonging to the same operation type as associated data; determining model parameters participating in the same instruction operation as associated model parameters, and input data participating in the same instruction operation as associated data; and determining model parameters participating in the operation of two adjacent instructions as associated model parameters, and input data participating in the operation of two adjacent instructions as associated data.
In this embodiment, with the operation type corresponding to each data processing operation, model parameters belonging to the same operation type are determined as associated model parameters, and input data belonging to the same operation type are determined as associated data.
Optionally, model parameters to be subjected to the same operation type are classified together and are arranged in order in the processor to obtain associated model parameters; input data to be subjected to the same operation type are classified together and are arranged in order in a processor to obtain associated data.
In this embodiment, with the operation type corresponding to each data processing operation, the model parameters participating in the same instruction operation are determined as the associated model parameters, and the input data participating in the same instruction operation is determined as the associated data.
Optionally, an operation type corresponding to each model parameter and input data is determined, and the model parameters and the input data to be involved in the same instruction operation are arranged together to obtain associated model parameters and associated data.
In this embodiment, with the operation type corresponding to each data processing operation, the model parameters participating in the operation of two adjacent instructions are determined as associated model parameters, and the input data participating in the operation of two adjacent instructions are determined as associated data.
Optionally, the model parameters participating in the operation of two adjacent instructions are determined as associated model parameters; for example, the model parameters to be operated on by the next instruction are stored immediately after those of the previous instruction. Likewise, the input data participating in the operation of two adjacent instructions is determined as associated data; for example, the data to be operated on by the next instruction is stored immediately after the data of the previous instruction. When the previous instruction is executed, two parts of data and model parameters are loaded into the processor, so that the central processing unit can acquire the data quickly when the next instruction is executed, reducing waiting time.
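A sketch of this adjacent-instruction layout: segments are laid out in instruction order, and each load covers the current instruction plus the next one, mimicking the two parts loaded at once described above; the list-of-lists representation is an assumption made for illustration:

    def interleave_by_instruction(per_instruction: list) -> list:
        """Lay segments out in instruction order, so the data for instruction i+1
        sits immediately after the data for instruction i."""
        layout = []
        for segments in per_instruction:   # one entry per instruction, in order
            layout.extend(segments)
        return layout

    def two_part_loads(per_instruction: list):
        """Yield instruction i's data together with instruction i+1's data, so the
        next instruction finds its data already loaded and does not wait."""
        for i, current in enumerate(per_instruction):
            nxt = per_instruction[i + 1] if i + 1 < len(per_instruction) else []
            yield current, nxt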
As an optional implementation, distributing the plurality of data processing operations to the plurality of cores using the second information and the fourth information comprises: determining the number of types of the plurality of data processing operations using the second information, and determining the number of cores of the plurality of cores using the fourth information; and evenly distributing the plurality of data processing operations to the plurality of cores based on the number of types and the number of cores.
In this embodiment, the second information is used to determine the number of types of the plurality of data processing operations, the fourth information is used to determine the number of cores of the plurality of cores, and the plurality of data processing operations are evenly distributed to the plurality of cores.
Optionally, according to actual requirements, the size and granularity of the model splitting are designed so that the data processing operations can be split evenly onto the different cores of the processor; the input data and model parameters are also split into multiple parts and then loaded into the memory together, so as to fill all the registers.
As an optional implementation, the method further comprises: loading the plurality of first data segments and the plurality of second data segments into the memory.
Optionally, the split plurality of first data segments and the plurality of second data segments are loaded into the memory together for operation.
It should be noted that different instructions require input data in different formats, so frequent switching brings extra data loading overhead. If the same type of data operation instruction is executed within a period of time, the corresponding first data segments and second data segments can be loaded into the memory together for operation, reducing data transmission loss and avoiding the overhead caused by frequently switching between different operation instructions.
As an optional implementation, transferring the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches comprises: transferring the plurality of first data segments and the plurality of second data segments from the memory to corresponding cores in the plurality of cores in batches.
Optionally, since the register space of the processor is limited and all data and parameters cannot be stored and operated at one time, the plurality of first data segments and the plurality of second data segments need to be transferred from the memory to corresponding cores of the plurality of cores in batches.
In this embodiment, the split data and the model parameters are loaded into the memory together, and then the split data and the model parameters are sequentially sent to the processor in batches for operation.
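A minimal sketch of this batched hand-off, using Python's multiprocessing pool as a stand-in for the per-core dispatch; the element-wise multiply is an arbitrary example operation, and the code should be run under an if __name__ == "__main__": guard:

    from multiprocessing import Pool

    import numpy as np

    def core_task(batch):
        """Stand-in for one core's data processing operation (element-wise multiply)."""
        param_segment, data_segment = batch
        return param_segment * data_segment

    def run_in_batches(first_segments, second_segments, num_cores, batch_size):
        # Load all split segments into memory together...
        buffer = list(zip(first_segments, second_segments))
        results = []
        with Pool(processes=num_cores) as pool:
            # ...then transfer them to the cores batch by batch, since register
            # space cannot hold everything at once.
            for start in range(0, len(buffer), batch_size):
                results.extend(pool.map(core_task, buffer[start:start + batch_size]))
        return results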
The above technical solutions of the embodiments of the present disclosure are further described below with reference to preferred embodiments.
Deep learning models on the market generally define the parameters, the specific data processing operations to be performed on the data, and the execution flow; existing schemes generally design model files containing a large number of parameters and combine multiple data processing operations. Specifically, the model file is loaded into the memory, the data processing operations to be executed are parsed from the file, and, following the execution flow defined by the model file, data and parameters are read from the memory into a hardware computing chip such as a central processing unit, where mathematical operations are performed on the parameters and data to obtain the execution result.
In such execution schemes, because the model parameters are too large to be transferred to the central processing unit all at once, most data is kept in the memory; each time the program executes one step, a portion of the data is transferred from the memory to the central processing unit, and the next portion is transferred only after execution finishes. Each data processing operation therefore has to gather parameters and data scattered across the memory, and the frequent transfers between the central processing unit and the memory cause overhead and low computation speed.
Common model prediction libraries in the related art are general-purpose systems and often need to support multiple algorithms, that is, various data processing operations; as a result, when the processor reads model data, cache misses in the processor and stalls in instruction execution easily occur.
To address these problems, the types of data processing operations in the model are simplified, so that frequent switching between operations is avoided; the parameters and data required by the data processing operations are split in a fine-grained, regularized way and enter the central processing unit in batches, so that the data can fill all the registers of the central processing unit; and the size and granularity of the model splitting are designed so that the data processing operations can be split evenly across the logical operation units of the different cores of the central processing unit. The data of the deep learning model can thus be kept in the registers of the central processing unit as much as possible, avoiding the cost of frequent data transfers, optimizing parallel data processing, and improving model execution speed.
Faster execution of the deep learning model makes full use of the computing power of the hardware, and a good model execution speed can also be obtained on lower-priced hardware, such as a low-compute processor.
It should be noted that the present disclosure can be applied to various current artificial intelligence products, helping to improve their cost performance.
To further introduce the technical solution of the embodiment of the present disclosure, fig. 2 is a schematic diagram of a data processing method according to an embodiment of the present disclosure. As shown in fig. 2, the implementation of the data processing method involves four parts: external input, model processing, the memory, and the central processing unit.
Firstly, data is input externally.
Second, the model generally contains parameters, data processing operations, and flow control.
The kinds of data processing operations in the model are reduced, so that frequent switching between data processing operations is avoided. The kinds of data processing operations may be several pre-designed types, for example multiplying and/or adding certain parameter values with certain data values, and/or averaging certain data values.
Optionally, because different instructions require input data in different formats, frequent switching brings extra data loading overhead. For example, when data enters the processor from outside, simplifying the model's data processing operation types allows the data and parameters needed by the same type of data operation instruction executed within a period of time to be loaded into the processor together, covering several operation instructions at once; this reduces data transmission loss and avoids the overhead of frequently switching between different operation instructions.
Third, in the memory, the parameters and data required by the data processing operations are split using data distribution schemes such as fine-grained and regularized splitting, so that they enter the central processing unit in batches and the data volume exactly fills all the registers of the processor, enhancing the throughput of data processing and improving instruction execution efficiency.
Optionally, regularized splitting means that the user sets rules according to actual requirements and the data is then split according to those rules. For example: data subject to the same operation is classified together and arranged in order in the memory, so it can easily be transferred to the processor in batches; the "parameters" and "data" participating in the same instruction operation are arranged together and sent into the processor; and the data to be operated on by the next instruction is stored immediately after the data of the previous instruction, so that when the previous instruction is executed, two parts of data are loaded into the processor and the processor can acquire the data for the next instruction quickly, without waiting.
Optionally, at least one of the multiple data distribution schemes may be selected for use, because a model often accepts multiple data inputs on which multiple different types of data processing operations are performed serially or in parallel; a matching data distribution scheme can therefore be selected for each data processing operation type.
Optionally, batch entry means that the parameters and the data required by a data operation enter the processor in the same batch simultaneously, avoiding frequent calls for external data or parameters. Because the register space of the processor is limited, not all data and parameters can be stored and operated on at once, so batching is still required.
Fourth, the size and granularity of the model splitting are designed so that the data processing operations are split evenly across the processor's cores, and so that the data input from outside the system together with the parameters carried by the model fill all the registers, avoiding delay and waiting.
A processor includes a plurality of cores (dual-core, quad-core and eight-core processors, for example, are all common), and each core includes a plurality of logical operation units. The data processing operations should therefore be divided evenly among the plurality of cores, and within each core the logical operation units should be used fully in parallel.
Optionally, the parameters in the model and the input data to be processed are split into small units. The parameters in the model are generated when the deep learning framework trains the model and are stored in a file. Before the method for processing deep learning model data is executed, a program analyzes the model's parameters and data processing operation types in advance, then splits the parameters into small units and stores them in a file. When the method is executed, the input data is likewise split into multiple parts; the split data and model parameters are then loaded into the memory together and sent to the processor in batches, in order, for operation.
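The offline preparation could be sketched as follows: a one-time program splits the trained parameter file into small units and saves them, so that the runtime only loads and streams them; the file names and the .npy format are illustrative assumptions:

    import numpy as np

    def prepare_model_offline(param_file: str, out_file: str, unit_size: int = 256) -> None:
        """One-time analysis: split trained parameters into small units and save them."""
        params = np.load(param_file).ravel()
        units = [params[i:i + unit_size] for i in range(0, params.size, unit_size)]
        # Store as an object array so that a shorter tail unit is preserved as-is.
        boxed = np.empty(len(units), dtype=object)
        for i, unit in enumerate(units):
            boxed[i] = unit
        np.save(out_file, boxed, allow_pickle=True)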
According to the data processing method and device of the present disclosure, data distribution is organized: the data is divided into small-unit data segments, and parameters and data are laid out continuously and in order. This reduces frequent transfers of data through the memory-processor channel and allows a single data input to go through multiple data processing operations before being output, thereby solving the technical problem of low processing efficiency caused by frequent data transfers during data processing and achieving the technical effect of improving efficiency during data processing.
The embodiment of the disclosure also provides a data processing device for executing the data processing method of the embodiment shown in fig. 1.
Fig. 3 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure. As shown in fig. 3, the data processing apparatus 300 may include: an acquisition module 301, a splitting module 302, an assignment module 303, and a processing module 304.
The obtaining module 301 is configured to obtain first information, second information, third information, and fourth information, where the first information is a plurality of model parameters used by the deep learning model, the second information is an operation type of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores set in the processor.
A splitting module 302, configured to split the plurality of model parameters into a plurality of first data segments and split the input data into a plurality of second data segments based on the first information, the second information, and the third information.
An assigning module 303 for assigning the plurality of data processing operations to the plurality of cores using the second information and the fourth information.
The processing module 304 is configured to batch-transfer the plurality of first data segments and the plurality of second data segments to the plurality of cores to perform the data processing operation corresponding to each of the plurality of cores.
Optionally, the splitting module 302 includes: a first determining unit, configured to determine, based on the second information, the operation type corresponding to each of the plurality of data processing operations; an acquisition unit, configured to acquire associated model parameters from the plurality of model parameters and associated data from the input data, using the operation type corresponding to each data processing operation; and a splitting unit, configured to split the associated model parameters to obtain the plurality of first data segments, and to split the associated data to obtain the plurality of second data segments.
Optionally, the obtaining unit includes: a first determining subunit, configured to determine, as associated model parameters, model parameters belonging to the same operation type and determine input data belonging to the same operation type as associated data, using an operation type corresponding to each data processing operation; the second determining subunit is used for determining the model parameters participating in the same instruction operation as the associated model parameters and determining the input data participating in the same instruction operation as the associated data by using the operation type corresponding to each data processing operation; and the third determining subunit is used for determining the model parameters participating in the operation of the two adjacent instructions as the associated model parameters and determining the input data participating in the operation of the two adjacent instructions as the associated data by using the operation type corresponding to each data processing operation.
Optionally, the allocating module 303 comprises: a second determining unit for determining the number of types of the plurality of data processing operations using the second information and the number of cores of the plurality of cores using the fourth information; and the distribution unit is used for distributing the plurality of data processing operations to the plurality of cores on average based on the type number and the core number.
Optionally, the apparatus further includes a loading module, configured to load the plurality of first data segments and the plurality of second data segments to the memory.
Optionally, the processing module 304 comprises: and the transfer unit is used for transferring the plurality of first data segments and the plurality of second data segments from the memory to corresponding cores in the plurality of cores in batches.
In the data processing apparatus of this embodiment, the plurality of model parameters are split to obtain the plurality of first data segments, and the input data is split to obtain the plurality of second data segments; the split first and second data segments are transferred to the plurality of cores in batches, and the data processing operation corresponding to each core in the plurality of cores is executed. The model parameters, input data and data processing operations are thereby placed in the processor as much as possible, achieving the technical effect of improving efficiency during data processing and solving the technical problem of low efficiency during data processing.
In the technical solution of the present disclosure, the acquisition, storage and use of the personal information of the users involved comply with the provisions of relevant laws and regulations and do not violate public order and good morals.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Embodiments of the present disclosure provide an electronic device, which may include: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of data processing of the embodiments of the present disclosure.
Optionally, the electronic device may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
According to embodiments of the present disclosure, the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the data processing method of the embodiments of the present disclosure.
Alternatively, in the present embodiment, the above-mentioned non-transitory storage medium may be configured to store a computer program for executing the following steps:
s1, acquiring first information, second information, third information and fourth information, wherein the first information is a plurality of model parameters used by the deep learning model, the second information is operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores arranged in the processor;
s2, splitting the plurality of model parameters into a plurality of first data segments and splitting the input data into a plurality of second data segments based on the first information, the second information and the third information;
s3, distributing a plurality of data processing operations to a plurality of cores by using the second information and the fourth information;
s4, the first data segments and the second data segments are transmitted to the cores in batches, so that the data processing operation corresponding to each core in the cores is executed.
Alternatively, in the present embodiment, the non-transitory computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to an embodiment of the present disclosure, the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, realizes the steps of:
s1, acquiring first information, second information, third information and fourth information, wherein the first information is a plurality of model parameters used by the deep learning model, the second information is operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores arranged in the processor;
s2, splitting the plurality of model parameters into a plurality of first data segments and splitting the input data into a plurality of second data segments based on the first information, the second information and the third information;
s3, distributing a plurality of data processing operations to a plurality of cores by using the second information and the fourth information;
s4, transmitting the plurality of first data segments and the plurality of second data segments to the plurality of kernels in batches so as to execute the data processing operation corresponding to each kernel in the plurality of kernels.
Fig. 4 is a block diagram of an electronic device of a data processing method according to an embodiment of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the device 400 comprises a computing unit 401, which may perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the device 400 can also be stored. The computing unit 401, the ROM 402, and the RAM 403 are connected to each other via a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
A number of components in device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, or the like; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408 such as a magnetic disk, optical disk, or the like; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Computing unit 401 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 401 executes the methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 400 via the ROM 402 and/or the communication unit 409. When the computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method of data processing, comprising:
acquiring first information, second information, third information and fourth information, wherein the first information is a plurality of model parameters used by a deep learning model, the second information is operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores set in a processor;
splitting the plurality of model parameters into a plurality of first data segments and splitting the input data into a plurality of second data segments based on the first information, the second information, and the third information;
distributing the plurality of data processing operations to the plurality of cores using the second information and the fourth information;
and transferring the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches, so as to execute the data processing operation corresponding to each core in the plurality of cores.
2. The method of claim 1, wherein splitting the plurality of model parameters into the plurality of first data segments and splitting the input data into the plurality of second data segments based on the first information, the second information, and the third information comprises:
determining an operation type corresponding to each data processing operation in the plurality of data processing operations based on the second information;
acquiring associated model parameters from the plurality of model parameters and associated data from the input data by using an operation type corresponding to each data processing operation;
splitting the associated model parameters to obtain the plurality of first data segments, and splitting the associated data to obtain the plurality of second data segments.
3. The method of claim 2, wherein obtaining the associated model parameters from the plurality of model parameters and the associated data from the input data with the operation type corresponding to each data processing operation comprises at least one of:
determining model parameters belonging to the same operation type as the associated model parameters and determining input data belonging to the same operation type as the associated data by using the operation type corresponding to each data processing operation;
determining model parameters participating in the same instruction operation as the associated model parameters and determining input data participating in the same instruction operation as the associated data by using the operation type corresponding to each data processing operation;
and determining model parameters participating in the operation of two adjacent instructions as the associated model parameters and determining input data participating in the operation of two adjacent instructions as the associated data by utilizing the operation type corresponding to each data processing operation.
4. The method of claim 1, wherein distributing the plurality of data processing operations to the plurality of cores using the second information and the fourth information comprises:
determining a number of types of the plurality of data processing operations using the second information and a number of cores of the plurality of cores using the fourth information;
evenly distributing the plurality of data processing operations to the plurality of cores based on the number of types and the number of cores.
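
One possible reading of the even distribution of claim 4, again as an illustrative assumption rather than the claimed implementation, is to derive the number of operation types from the second information and the core count from the fourth information, then assign at most ceil(types / cores) types to each core:

    # Hypothetical sketch of the even distribution of claim 4.
    import math

    def distribute(op_types, num_cores):
        types = sorted(set(op_types))                 # number of types (second information)
        per_core = math.ceil(len(types) / num_cores)  # number of cores (fourth information)
        return {t: i // per_core for i, t in enumerate(types)}

    print(distribute(["conv", "relu", "matmul", "softmax"], num_cores=2))
    # {'conv': 0, 'matmul': 0, 'relu': 1, 'softmax': 1}
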
5. The method of claim 1, wherein the method further comprises:
loading the plurality of first data segments and the plurality of second data segments into a memory.
6. The method of claim 5, wherein transferring the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches comprises:
transferring the plurality of first data segments and the plurality of second data segments from the memory to corresponding cores in the plurality of cores in batches.
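
Claims 5 and 6 stage the segments in memory before dispatch. One hypothetical way to realize this pair of steps, with deques standing in for a memory region and for per-core queues, is:

    # Hypothetical sketch of claims 5 and 6: load into memory, then transfer
    # from memory to the corresponding cores in batches.
    from collections import deque

    def load_to_memory(first_segments, second_segments):
        # Claim 5: stage all (first, second) segment pairs in memory.
        return deque(zip(first_segments, second_segments))

    def transfer_in_batches(memory, core_queues, batch_size=2):
        # Claim 6: move staged pairs to the corresponding cores, batch by batch.
        core = 0
        while memory:
            batch = [memory.popleft() for _ in range(min(batch_size, len(memory)))]
            core_queues[core].extend(batch)
            core = (core + 1) % len(core_queues)

    queues = [deque(), deque()]
    memory = load_to_memory(["w0", "w1", "w2", "w3"], ["x0", "x1", "x2", "x3"])
    transfer_in_batches(memory, queues)
    print(queues[0])  # deque([('w0', 'x0'), ('w1', 'x1')])
    print(queues[1])  # deque([('w2', 'x2'), ('w3', 'x3')])
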
7. An apparatus for data processing, comprising:
an acquisition module configured to acquire first information, second information, third information and fourth information, wherein the first information is a plurality of model parameters used by a deep learning model, the second information is operation types of a plurality of data processing operations used by the deep learning model, the third information is input data to be processed by the deep learning model, and the fourth information is the number of cores of a plurality of cores provided in a processor;
a splitting module configured to split the plurality of model parameters into a plurality of first data segments and the input data into a plurality of second data segments based on the first information, the second information, and the third information;
an allocation module configured to allocate the plurality of data processing operations to the plurality of cores using the second information and the fourth information;
and a processing module configured to transfer the plurality of first data segments and the plurality of second data segments to the plurality of cores in batches, so as to execute the data processing operation corresponding to each core in the plurality of cores.
8. The apparatus of claim 7, wherein the splitting module comprises:
a first determining unit configured to determine, based on the second information, an operation type corresponding to each of the plurality of data processing operations;
an obtaining unit configured to obtain, by using the operation type corresponding to each data processing operation, associated model parameters from the plurality of model parameters and associated data from the input data;
and a splitting unit configured to split the associated model parameters to obtain the plurality of first data segments and split the associated data to obtain the plurality of second data segments.
9. The apparatus of claim 8, wherein the obtaining unit comprises at least one of:
a first determining subunit configured to determine, by using the operation type corresponding to each data processing operation, model parameters belonging to the same operation type as the associated model parameters and input data belonging to the same operation type as the associated data;
a second determining subunit configured to determine, by using the operation type corresponding to each data processing operation, model parameters participating in the same instruction operation as the associated model parameters and input data participating in the same instruction operation as the associated data;
and a third determining subunit configured to determine, by using the operation type corresponding to each data processing operation, model parameters participating in two adjacent instruction operations as the associated model parameters and input data participating in two adjacent instruction operations as the associated data.
10. The apparatus of claim 7, wherein the assignment module comprises:
a second determining unit configured to determine the number of types of the plurality of data processing operations using the second information, and determine the number of cores of the plurality of cores using the fourth information;
an allocation unit configured to evenly allocate the plurality of data processing operations to the plurality of cores based on the number of types and the number of cores.
11. The apparatus of claim 7, wherein the apparatus further comprises:
a loading module configured to load the plurality of first data segments and the plurality of second data segments into a memory.
12. The apparatus of claim 11, wherein the processing module comprises:
a transfer unit configured to transfer the plurality of first data segments and the plurality of second data segments from the memory to corresponding cores in the plurality of cores in batches.
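
The apparatus of claims 7 through 12 mirrors the method claims, one module per step. As a structural sketch only (the class and method names are invented here; the claims say nothing about Python or about these signatures), the module decomposition could look like:

    # Hypothetical structural sketch of the apparatus of claims 7-12.
    class AcquisitionModule:
        def acquire(self, model_params, op_types, input_data, num_cores):
            # First through fourth information, gathered in one place.
            return model_params, op_types, input_data, num_cores

    class SplittingModule:
        def split(self, model_params, input_data, size=4):
            def chunk(xs):
                return [xs[i:i + size] for i in range(0, len(xs), size)]
            return chunk(model_params), chunk(input_data)

    class AllocationModule:
        def allocate(self, op_types, num_cores):
            return {op: i % num_cores for i, op in enumerate(op_types)}

    class ProcessingModule:
        def transfer(self, first_segments, second_segments, assignment):
            # Yield (core, operation, segment pair) work items batch by batch.
            for pair in zip(first_segments, second_segments):
                for op, core in assignment.items():
                    yield core, op, pair

    info = AcquisitionModule().acquire(list(range(8)), ["conv", "relu"], list(range(8)), 2)
    firsts, seconds = SplittingModule().split(info[0], info[2])
    plan = AllocationModule().allocate(info[1], info[3])
    for item in ProcessingModule().transfer(firsts, seconds, plan):
        print(item)

The determining, obtaining, splitting, loading and transfer units of claims 8 through 12 would then be helper objects inside SplittingModule and ProcessingModule.
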
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202210135614.5A 2022-02-14 2022-02-14 Data processing method and device, electronic equipment and storage medium Pending CN114461400A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210135614.5A CN114461400A (en) 2022-02-14 2022-02-14 Data processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114461400A 2022-05-10

Family

ID=81413343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210135614.5A Pending CN114461400A (en) 2022-02-14 2022-02-14 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114461400A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156810A * 2015-04-26 2016-11-23 Alibaba Group Holding Ltd General-purpose machine learning algorithm model training method, system and computing node
US20200134508A1 (en) * 2018-10-31 2020-04-30 EMC IP Holding Company LLC Method, device, and computer program product for deep learning
CN113490917A * 2019-03-15 2021-10-08 Intel Corp Local memory sharing among kernels
US20210110247A1 (en) * 2019-10-11 2021-04-15 International Business Machines Corporation Hybrid data-model parallelism for efficient deep learning
KR20210042992A * 2019-11-25 2021-04-20 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for training a deep learning model
CN110928696A * 2020-02-13 2020-03-27 Beijing OneFlow Technology Co., Ltd. User-level thread control system and method thereof
CN112132269A * 2020-09-29 2020-12-25 Tencent Technology (Shenzhen) Co., Ltd. Model processing method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
COLIN O'REILLY et al.: "Adaptive Anomaly Detection with Kernel Eigenspace Splitting and Merging", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 16 May 2014 (2014-05-16) *
CHEN Huaguang: "Design of a business foundation platform based on a microkernel architecture", Electronics World, no. 17, 8 September 2016 (2016-09-08) *

Similar Documents

Publication Publication Date Title
CN115880132A (en) Graphics processor, matrix multiplication task processing method, device and storage medium
CN107680144B (en) WebP file conversion method and device
CN113849312A (en) Data processing task allocation method and device, electronic equipment and storage medium
CN112925587A (en) Method and apparatus for initializing applications
CN112506581B (en) Method, apparatus, electronic device and readable storage medium for rendering applet
CN116451174A (en) Task execution device, method, electronic device, and storage medium
CN111767023A (en) Data sorting method and data sorting system
CN116467235B (en) DMA-based data processing method and device, electronic equipment and medium
CN114579187B (en) Instruction distribution method and device, electronic equipment and readable storage medium
CN114461400A (en) Data processing method and device, electronic equipment and storage medium
CN114466012B (en) Content initialization method, device, electronic equipment and storage medium
CN115081607A (en) Reverse calculation method, device and equipment based on embedded operator and storage medium
CN115237574A (en) Scheduling method and device of artificial intelligence chip and electronic equipment
CN114090247A (en) Method, device, equipment and storage medium for processing data
CN115346099A (en) Image convolution method, chip, equipment and medium based on accelerator chip
CN114968170A (en) Method for generating fixed sum of floating point number, related device and computer program product
CN114138358A (en) Application program starting optimization method, device, equipment and storage medium
EP4254272A1 (en) Method for processing data and apparatus, electronic device and storage medium
CN114418063B (en) Method and device for distributing network layer in neural network model
CN114816758B (en) Resource allocation method and device
US20240126610A1 (en) Apparatus and method of processing data, electronic device, and storage medium
CN115292662A (en) Convolution acceleration operation method and device, electronic equipment and storage medium
CN115129488A (en) Streaming data processing method, device, equipment and storage medium
CN116166408A (en) Task processing method and device, electronic equipment and storage medium
CN115934178A (en) Code running method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination