CN109739514B - Parameter processing method and related product - Google Patents

Parameter processing method and related product

Info

Publication number
CN109739514B
Authority
CN
China
Prior art keywords
parameter
deep learning
container
learning framework
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811570061.6A
Other languages
Chinese (zh)
Other versions
CN109739514A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201811570061.6A priority Critical patent/CN109739514B/en
Publication of CN109739514A publication Critical patent/CN109739514A/en
Priority to PCT/CN2019/087631 priority patent/WO2020124948A1/en
Application granted granted Critical
Publication of CN109739514B publication Critical patent/CN109739514B/en
Status: Active

Landscapes

  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a parameter processing method and a related product, applied to an artificial intelligence chip in which an upper layer language interface and a deep learning framework are deployed. The deep learning framework comprises a container, which is a class or structure for storing parameters and is connected to the upper layer language interface. The deep learning framework obtains a first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container; the upper layer language interface then retrieves the second parameter from the container. By writing the first parameter into the container, the embodiments of the application improve the parallel operation effect within the deep learning framework; by counting and retrieving the second parameter, they improve the monitorability of the parallel operation performance.

Description

Parameter processing method and related product
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to a parameter processing method and related products.
Background
With the development of the artificial intelligence industry, more and more deep learning frameworks are being developed and used. When developing and using a deep learning framework adapted to an artificial intelligence chip, a user usually needs to set certain parameters for the framework to achieve a better computing effect, or to obtain certain parameters from the framework to monitor its running state.
At present, deep learning frameworks have no mechanism for setting parameters related to an artificial intelligence chip, so a user can neither set such parameters nor acquire data related to the chip's operation. How to improve this situation is an urgent problem to be solved.
Summary
In view of this, an object of the present disclosure is to provide a parameter processing method and a related product in which a container is newly added: a first parameter describing the parallelism degree of a deep learning framework is written into the container, and the first parameter in the container is then combined with other modules of the deep learning framework to obtain a second parameter for monitoring parallel computing performance, thereby improving the computing effect of the deep learning framework and the monitorability of its parallel computing performance.
In order to solve the above technical problems, a first aspect of the embodiments of the present invention provides a parameter processing method applied to an artificial intelligence chip,
an upper language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework comprises a container, and the container is connected with the upper language interface, and the method comprises the following steps:
the upper layer language interface injects a first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container, wherein the second parameter is used for monitoring the parallel operation performance of the deep learning framework described by the first parameter, and the container is a class or a structure body used for storing the parameter;
and the upper layer language interface acquires a second parameter from the container.
Optionally, before the upper layer language interface writes the first parameter into the container, the method further comprises:
creating a parameter data field in the container, wherein the parameter data field is used for pointing to the first parameter and the second parameter.
Optionally, the first parameter includes data parallelism and model parallelism.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, the interacting the first parameter with the module data of the deep learning framework to obtain a second parameter includes:
transmitting the data parallelism to a module of the deep learning framework for data interaction, and obtaining a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
and transmitting the model parallelism to a module of the deep learning framework for data interaction, and obtaining the CET and the CETS corresponding to the model parallelism.
Optionally, the deep learning framework is an MXNet deep learning framework.
Optionally, the deep learning framework further comprises a carrier, and the method further comprises:
performing parameter passing interactions between the container and modules of the deep learning framework through the carrier, the parameters including a first parameter and a second parameter.
Optionally, the artificial intelligence chip further includes an underlying library module, and the method further includes:
and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
Optionally, the container includes a native class or structure in the deep learning framework, or a class or structure independently created in the deep learning framework for the artificial intelligence chip.
A second aspect of the embodiments of the present invention provides a parameter processing apparatus, which is applied to an artificial intelligence chip, wherein an upper language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper language interface, and the apparatus includes:
a writing module, configured to write a first parameter into a container through the upper layer language interface, where the first parameter is used to describe a parallelism degree of the deep learning framework;
the calculation module is used for acquiring the first parameter from the container through the deep learning framework, interacting the first parameter with data of a module of the deep learning framework to acquire a second parameter, and transmitting the second parameter to the container, wherein the second parameter is used for monitoring the performance of parallel operations, and the container is a class or structure used for storing parameters;
and the acquisition module is used for acquiring a second parameter from the container through the upper layer language interface.
A third aspect of the embodiments of the present invention provides a chip, including the parameter processing apparatus provided in the second aspect.
A fourth aspect of the embodiments of the present invention provides a chip packaging structure, where the chip packaging structure includes the chip described in the third aspect;
a fifth aspect of the embodiments of the present invention provides a board card, where the board card includes the chip packaging structure described in the fourth aspect.
In a sixth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes the chip packaging structure described in the fourth aspect or the board card described in the fifth aspect.
A seventh aspect of embodiments of the present invention provides a storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the steps of the method of the first aspect.
It can be seen that, in the parameter processing method disclosed in the embodiments of the present application, an upper layer language interface and a deep learning framework are deployed in an artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper layer language interface. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1A is an artificial intelligence chip according to an embodiment of the present disclosure.
Fig. 1B is a schematic flow chart of a parameter processing method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of another parameter processing method according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of another parameter processing method according to an embodiment of the present application.
Fig. 4 is a parameter processing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic view of a combined processing device according to an embodiment of the present application.
Fig. 6 is a block diagram of another combined processing device according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a board card provided in the embodiment of the present application.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1A, fig. 1A shows an artificial intelligence chip provided in an embodiment of the present application. As shown in fig. 1A, the artificial intelligence chip 10 includes an upper layer language interface 101 and a deep learning framework 100. The upper layer language interface is used for accessing a programming language. The deep learning framework includes a container and the other modules of the deep learning framework, and the container can perform data interaction with those modules, which include a graph executor module, the operator modules, and an engine module. Optionally, the upper layer language interface 101 may instead be disposed on another chip or device connected to the artificial intelligence chip, in which case information interaction between the two can still take place. In addition, the artificial intelligence chip 10 may also include an underlying library module 102, which includes an underlying runtime library, a driver module, and the like. The deep learning framework 100 also includes a carrier for data transmission between the container and the other modules of the deep learning framework or the underlying library module.
Referring to fig. 1B, fig. 1B is a schematic flow chart of a parameter processing method disclosed in an embodiment of the present application. The parameter processing method is applied to the artificial intelligence chip shown in fig. 1A and, as shown in fig. 1B, specifically includes the following steps:
111. The upper layer language interface writes a first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework.
A deep learning framework is a code framework for carrying out deep learning projects; currently popular deep learning frameworks include TensorFlow, Caffe, Theano, MXNet, Torch, PyTorch, and the like. An interface is a shared boundary across which two separate components of a system exchange information. The upper layer language and the deep learning framework are two independent components, so an interface exists between them for information interaction. Upper layer languages such as Python and R can be used in deep learning, and conventionally the upper layer language interface is connected directly to the deep learning framework. However, such an interface lacks a parameter setting mechanism, so a user can neither set parameters on the artificial intelligence chip nor acquire parameters from it. Therefore, a container is newly added below the upper layer language interface for setting parameters and acquiring the related data. The parameter data field used for parameter setting and parameter acquisition may be newly added in the container itself or in another module, with the location for parameter setting and parameter acquisition then designated as the container's location.
The container is a class or structure for storing data and belongs to a module in the deep learning framework. The container can be a native class or structure of the deep learning framework, such as the GraphExecutor class, to which a field for parameter setting and parameter acquisition is newly added. Alternatively, the container can be a class or structure independently created by the user for the parameter processing method on the artificial intelligence chip, such as an MLUDevice class, with a dedicated field for parameter setting and parameter acquisition.
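By way of illustration only, a minimal C++ sketch of such an independently created container might look as follows; the class name MLUDevice and every field name here are assumptions made for the example, not identifiers defined by the present disclosure:

```cpp
// Hypothetical container: a plain class/structure added below the upper
// layer language interface to store the chip-related parameters. The
// disclosure only requires "a class or structure used for storing
// parameters"; all names below are illustrative.
struct MLUDevice {
  // First parameter: describes the parallelism degree of the framework.
  int data_parallelism  = 1;   // DP: maximum parallel executions over data
  int model_parallelism = 1;   // MP: maximum parallel executions of a model/operator

  // Second parameter: written back by framework modules and read out
  // through the upper layer language interface for performance monitoring.
  double channel_elapsed_time     = 0.0;  // CET of the most recent channel
  double channel_elapsed_time_sum = 0.0;  // CETS accumulated over channels
};
```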
Optionally, the method further includes: creating a parameter data field in the container, wherein the parameter data field is used for pointing to the first parameter and the second parameter.
Specifically, before the parameter data field is created in the container, the artificial intelligence chip has no data field related to the first parameter and the second parameter, so the first parameter cannot be set and the second parameter cannot be acquired. A parameter data field related to the first parameter and the second parameter is therefore created in the container; it indicates how the two parameters are acquired, how they interact with other modules or interfaces, where their data is stored, and so on, and it also makes the first parameter and the second parameter easier to manage. In addition, the parameter data field may be created at another location while the data itself is still stored by the container.
Optionally, the first parameter includes data parallelism and model parallelism.
Optionally, the deep learning framework in this embodiment is an MXNet deep learning framework.
Data parallelism (DP) means that different cores or processing units process different parts of the data in parallel, and the degree of data parallelism is the maximum number of parallel executions when data is processed in parallel. Model parallelism (MP) means that an operator or a model runs on multiple cores in parallel, and the degree of model parallelism is the maximum number of parallel executions when the model or operator is processed in parallel. When the MXNet deep learning framework runs on an artificial intelligence chip, the amount of computation is huge, and DP, MP, or both must be adopted to reduce computation time and improve computing efficiency. To achieve a better operation effect, the data parallelism and the model parallelism need to be set: on the one hand, the parallelism parameters must match the hardware of the artificial intelligence chip; on the other hand, different parallelism parameters are needed when the scale, sparsity, or other characteristics of the input data differ. The chosen data parallelism and/or model parallelism is written in a programming language and then injected into the container through the upper layer language interface, which completes the setting of the first parameter.
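As a hedged sketch of this write path, the fragment below shows how a binding behind the upper layer language interface might inject the first parameter into the container from the earlier MLUDevice sketch; the function name, its signature, and the global container instance are assumptions for illustration, not the patent's actual API:

```cpp
// Reuses the illustrative MLUDevice container from the sketch above.
static MLUDevice g_container;  // container instance shared with the framework

// Hypothetical entry point that an upper layer language (e.g. a Python
// front end) would reach through the interface; writing the first
// parameter into the container completes step 111.
void SetParallelism(int data_parallelism, int model_parallelism) {
  g_container.data_parallelism  = data_parallelism;
  g_container.model_parallelism = model_parallelism;
}
```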
MXNet is a deep learning framework that supports languages such as C++, Python, R, Scala, Julia, MATLAB, and JavaScript, supports both imperative and symbolic programming, and can run on a wide range of hardware including artificial intelligence chips; it is among the leading deep learning frameworks available today. The MXNet deep learning framework can therefore be combined well with the method of the embodiments of the present application to complete the setting of the first parameter and the acquisition of the second parameter.
112. The deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container, wherein the second parameter is used for monitoring the parallel operation performance of the deep learning framework described by the first parameter.
After the first parameter is set and injected into the container, a module of the deep learning framework acquires the first parameter from the container; the modules of the deep learning framework include the graph executor module, the operator modules, the engine module, and the like. For example, if an operator module needs to perform a parallel operation, it first obtains the first parameter and can then derive a second parameter from the first parameter combined with other parameters in the operator module, such as the data size. The second parameter is a parameter for monitoring parallel operation performance, and once obtained it is transmitted back to the container.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, interacting the first parameter with the module data of the deep learning framework to obtain a second parameter includes: transmitting the data parallelism to a module of the deep learning framework for data interaction to obtain a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism; and transmitting the model parallelism to a module of the deep learning framework for data interaction to obtain the CET and the CETS corresponding to the model parallelism, wherein the CET and the CETS are used for counting the computation time of operators.
Specifically, when the deep learning framework adopts DP or MP, there are multiple parallel channels. The channel elapsed time (CET) and the channel elapsed time sum (CETS) are both performance parameters describing the parallel operations performed over these channels and are used for counting the computation time of operators. Transferring into the container the second parameters, obtained from the first parameter and the modules of the deep learning framework for a single module or for the whole deep learning framework, completes the acquisition of the second parameter.
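A minimal sketch of how a framework module might derive the second parameter from the first is given below, reusing the illustrative MLUDevice container; the channels are timed sequentially purely for clarity, and the timing source and function shape are assumptions, not the framework's real machinery:

```cpp
#include <chrono>

// Runs `parallelism` channels of work, recording each channel's elapsed
// time (CET) in the container and accumulating their sum (CETS).
template <typename Fn>
void RunChannels(int parallelism, Fn work, MLUDevice& container) {
  container.channel_elapsed_time_sum = 0.0;
  for (int channel = 0; channel < parallelism; ++channel) {
    auto start = std::chrono::steady_clock::now();
    work(channel);  // one parallel channel's share of the operator
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    container.channel_elapsed_time = elapsed.count();       // CET
    container.channel_elapsed_time_sum += elapsed.count();  // CETS
  }
}
```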
113. The upper layer language interface acquires the second parameter from the container.
The upper layer language interface can acquire the second parameter from the container and expose it, so that the second parameter is visible to the user. The user can then monitor the operation performance of the deep learning framework through the second parameter, and adjust or improve it by modifying the first parameter or other parameters, thereby improving the operation effect of the deep learning framework.
Optionally, the deep learning framework further includes a carrier, and the method further includes: the container and the module of the deep learning framework are in data transmission interaction through the carrier.
The carrier is a class or structure used for data transmission within the deep learning framework. Since the container is not directly connected to the other modules of the deep learning framework, data transmission is carried out through the carrier. For example, the carrier in the MXNet framework may be the operator context class OpContext: after the first parameter is injected into the container, the container assigns the first parameter to the carrier, and the carrier passes it to a module of the deep learning framework. Likewise, the second parameter is communicated by the carrier from a module of the deep learning framework back to the container.
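The carrier's role can be sketched as follows, reusing the illustrative MLUDevice container; the Carrier structure merely stands in for an OpContext-like class, and its fields and the two helper functions are assumptions about how the parameters could ride along with it, not MXNet's actual definitions:

```cpp
// Hypothetical carrier mirroring the role of an operator context such as
// OpContext: the container copies the first parameter into the carrier,
// the carrier delivers it to a framework module, and the second parameter
// travels back the same way.
struct Carrier {
  int data_parallelism;
  int model_parallelism;
  double channel_elapsed_time;      // filled in by the module (CET)
  double channel_elapsed_time_sum;  // filled in by the module (CETS)
};

Carrier LoadFromContainer(const MLUDevice& container) {
  return {container.data_parallelism, container.model_parallelism, 0.0, 0.0};
}

void StoreToContainer(const Carrier& carrier, MLUDevice& container) {
  container.channel_elapsed_time     = carrier.channel_elapsed_time;
  container.channel_elapsed_time_sum = carrier.channel_elapsed_time_sum;
}
```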
Optionally, the artificial intelligence chip further includes a bottom library module, and the method further includes: and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
Specifically, the underlying library module comprises an underlying runtime library, a driver module, and the like. Parameters of these underlying libraries may also affect the parallel performance or other aspects of the performance of the deep learning framework, so the container may also perform data interaction with the underlying library module through the carrier in order to obtain parallel operation performance parameters or other performance parameters.
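The same round trip toward the underlying library can be sketched with the Carrier above; the function below is purely hypothetical and implies no real runtime or driver API:

```cpp
#include <chrono>

// Purely hypothetical underlying-library interaction: consume the
// parallelism settings from the carrier, perform the timed runtime work,
// and report the elapsed time back through the carrier.
void UnderlyingLibraryInteract(Carrier& carrier) {
  auto start = std::chrono::steady_clock::now();
  // ... configure the runtime library / driver using
  //     carrier.data_parallelism and carrier.model_parallelism ...
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  carrier.channel_elapsed_time = elapsed.count();
  carrier.channel_elapsed_time_sum += elapsed.count();
}
```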
It can be seen that, in the embodiment of the present application, an upper layer language interface and a deep learning framework are deployed in an artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper layer language interface. First, the upper layer language interface writes a first parameter into the container; then the deep learning framework obtains the first parameter from the container, obtains a second parameter by combining the first parameter with module parameters of the deep learning framework, and transmits the second parameter to the container; finally, the upper layer language interface obtains the second parameter from the container and provides it to the user. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
In accordance with the above, please refer to fig. 2, fig. 2 is a schematic flow chart of another parameter processing method provided in the embodiment of the present application, and as shown in fig. 2, the parameter processing method includes:
201. creating parameter data fields related to the artificial intelligence chip in the container, wherein the parameter data fields relate to a first parameter and a second parameter;
202. an upper language interface injects the first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework;
203. the deep learning framework further comprises a carrier, the deep learning framework acquires the first parameter from the container, and the first parameter is interacted with module data of the deep learning framework through the carrier to acquire a second parameter;
204. the deep learning framework passes the second parameter into the container through the carrier, the second parameter being used for monitoring performance of parallel operations;
205. the artificial intelligence chip also comprises a bottom layer library module, the container and the bottom layer library module carry out parameter transmission interaction through the carrier, and the parameters comprise a first parameter and a second parameter.
For detailed descriptions of steps 201-205, reference may be made to the corresponding descriptions of the parameter processing method in steps 111-113, which are not repeated here.
It can be seen that, in this embodiment of the application, a container is newly added in the deep learning framework, and the carrier then handles the parameter interaction between the deep learning framework and the container as well as between the underlying library module and the container. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
In accordance with the above, please refer to fig. 3, fig. 3 is a schematic flow chart of another parameter processing method provided in the embodiment of the present application, and as shown in fig. 3, the parameter processing method includes:
301. setting data parallelism, wherein the data parallelism is used for describing the maximum number of parallel executions when different cores process different parts of the data;
302. setting model parallelism, wherein the model parallelism is used for describing the maximum number of parallel executions when an operator or a model runs on multiple cores;
303. injecting the data parallelism and/or the model parallelism into the container through the upper language interface;
304. transmitting the data parallelism to a module of the deep learning framework for data interaction to obtain a CET and a CETS corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
305. transmitting the model parallelism to a module of the deep learning framework for data interaction to obtain a CET and a CETS corresponding to the model parallelism;
306. passing CETS and CET corresponding to the data parallelism and/or the model parallelism into the container;
307. and the upper layer language interface acquires the CETS and the CET corresponding to the data parallelism and/or the model parallelism from the container.
For detailed descriptions of steps 301-307, reference may be made to the corresponding descriptions of the parameter processing method in steps 111-113, which are not repeated here.
It can be seen that, in this embodiment of the present application, a container is added in the deep learning framework, and parameter interaction between the deep learning framework and the container, as well as between the underlying library module and the container, is performed through the carrier. Setting the data parallelism and/or the model parallelism improves the parallel operation effect within the deep learning framework, and counting and obtaining the CET and the CETS improves the monitorability of the parallel operation performance.
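Tying the earlier illustrative fragments together, a usage sketch of the full round trip of steps 301-307 might read as follows (all names carried over from the preceding sketches and equally hypothetical):

```cpp
int main() {
  // Step 303: inject the first parameter through the upper layer interface.
  SetParallelism(/*data_parallelism=*/4, /*model_parallelism=*/2);

  // Steps 304-306: a framework module interacts with the first parameter
  // and writes the second parameter (CET/CETS) back into the container;
  // channels are shown for data parallelism only.
  RunChannels(g_container.data_parallelism,
              [](int /*channel*/) { /* one channel's share of an operator */ },
              g_container);

  // Step 307: the upper layer interface reads the second parameter out of
  // the container for monitoring.
  double cet  = g_container.channel_elapsed_time;
  double cets = g_container.channel_elapsed_time_sum;
  (void)cet; (void)cets;  // monitoring/reporting would go here
  return 0;
}
```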
Referring to fig. 4, fig. 4 is a parameter processing apparatus according to an embodiment of the present application, applied to the artificial intelligence chip shown in fig. 1A, and as shown in fig. 4, the parameter processing apparatus 400 includes:
a writing module 401, configured to write a first parameter into a container through the upper layer language interface, where the first parameter is used to describe a parallelism degree of the deep learning framework;
a calculating module 402, configured to obtain the first parameter from the container through the deep learning framework, perform interaction between the first parameter and data of a module of the deep learning framework to obtain a second parameter, and transmit the second parameter to the container, where the second parameter is used to monitor performance of parallel operations;
an obtaining module 403, configured to obtain a second parameter from the container through the upper layer language interface.
For a detailed description of the parameter processing apparatus, reference may be made to the corresponding descriptions of the parameter processing method in steps 111-113, which are not repeated here.
It can be seen that, in the parameter processing apparatus of this embodiment of the application, the upper layer language interface writes the first parameter into the container; the deep learning framework then obtains the first parameter from the container, obtains the second parameter by combining the first parameter with module parameters of the deep learning framework, and transmits the second parameter to the container; finally, the upper layer language interface obtains the second parameter from the container and provides it to the user. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
In an alternative embodiment, the writing module is further configured to:
create a parameter data field in the container, wherein the parameter data field is used for pointing to the first parameter and the second parameter.
In an alternative embodiment, the first parameter includes data parallelism and model parallelism.
In an alternative embodiment, the second parameter includes the channel elapsed time and the channel elapsed time sum.
In an optional embodiment, the calculation module is specifically configured to:
transmitting the data parallelism to a module of the deep learning framework for data interaction, and obtaining a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
and transmitting the model parallelism to a module of the deep learning framework for data interaction, and obtaining the CET and the CETS corresponding to the model parallelism.
In an alternative embodiment, the deep learning framework is an MXNet deep learning framework.
In an optional embodiment, the deep learning framework further comprises a carrier, and the computing module is further configured to:
performing parameter passing interactions between the container and modules of the deep learning framework through the carrier, the parameters including a first parameter and a second parameter.
In an optional embodiment, the artificial intelligence chip further comprises an underlying library module, and the calculation module is further configured to:
and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
In an alternative embodiment, the container includes a native class or structure in the deep learning framework, or a class or structure independently created in the deep learning framework for the artificial intelligence chip.
The application also discloses a combined processing device, which comprises the above parameter processing device, a universal interconnection interface, and other processing devices. The parameter processing device interacts with the other processing devices to jointly complete operations designated by the user. Fig. 5 is a schematic view of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the parameter processing device and external data and control, performing operations such as data transfer and completing basic control of the parameter processing device such as starting and stopping; the other processing devices can also cooperate with the parameter processing device to complete computing tasks.
The universal interconnection interface is used for transmitting data and control instructions between the parameter processing device and the other processing devices. The parameter processing device acquires the required input data from the other processing devices and writes it into the on-chip storage of the parameter processing device; it can obtain control instructions from the other processing devices and write them into an on-chip control cache of the parameter processing device; it can also read the data in the storage module of the parameter processing device and transmit it to the other processing devices.
Optionally, as shown in fig. 6, the structure may further include a storage device, which is connected to the parameter processing device and the other processing devices respectively. The storage device is used for storing data of the parameter processing device and the other processing devices, and is particularly suitable for data that cannot be entirely held in the internal storage of the parameter processing device or the other processing devices.
The combined processing device can serve as an SoC (system on chip) for devices such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, effectively reducing the core area of the control part, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the apparatus, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, a chip is also claimed, which includes the parameter processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip packaging structure. Referring to fig. 7, fig. 7 provides a board card that may include other components in addition to the chip, including but not limited to: a storage device 710, an interface device 720, and a control device 730;
the memory device 710 is connected to the chips in the chip package structure through a bus for storing data. The memory device may include a plurality of sets of memory cells 711. Each group of the storage units is connected with the chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the transfer speed of SDRAM without increasing the clock frequency, because it allows data to be transferred on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips (dies). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 dies are adopted in each group of storage units, the theoretical bandwidth of data transmission per controller can reach 25600 MB/s.
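The quoted figure follows directly from the die rating; a short sketch of the arithmetic, assuming a 64-bit payload path at 3200 MT/s per controller, is:

```cpp
#include <cstdio>

int main() {
  // DDR4-3200: 3200 mega-transfers per second on a 64-bit (8-byte) data
  // path per controller; the remaining 8 bits of the 72-bit bus carry ECC
  // and move no payload.
  const double transfers_per_second = 3200e6;
  const double bytes_per_transfer   = 64.0 / 8.0;
  const double bandwidth_mb_per_s   =
      transfers_per_second * bytes_per_transfer / 1e6;
  std::printf("theoretical bandwidth: %.0f MB/s\n", bandwidth_mb_per_s);  // 25600
  return 0;
}
```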
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used for implementing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface, and the data to be processed is transmitted to the chip by the server through the standard PCIe interface to complete the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present application does not limit the concrete form of such other interfaces, as long as the interface unit can implement the switching function. In addition, the calculation results of the chip are transmitted back to the external device (e.g., the server) by the interface device.
The control device is electrically connected to the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single chip microcomputer (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads; therefore, the chip can be in different working states such as multi-load and light-load. The control device can regulate and control the working states of the multiple processing chips, processing cores, and/or processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; the division into units is only a division by logical function, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the essence of the technical solution of the present application, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by related hardware instructed by a program, and the program may be stored in a computer-readable memory, which may include: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A parameter processing method, characterized in that it is applied to an artificial intelligence chip, wherein an upper layer language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework comprises a container, and the container is a class or structure used for storing parameters and is connected with the upper layer language interface, the method comprising the following steps:
the upper layer language interface injects a first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container, wherein the second parameter is used for monitoring the parallel operation performance of the deep learning framework described by the first parameter;
and the upper layer language interface acquires a second parameter from the container.
2. The method of claim 1, further comprising:
the container comprises a parameter data field, and the parameter data field is used for pointing to a first parameter and a second parameter.
3. The method of claim 1 or 2, wherein the first parameters include data parallelism and model parallelism.
4. The method of claim 3, wherein the second parameter comprises a channel elapsed time and a channel elapsed time sum.
5. The method of claim 4, wherein interacting the first parameter with module data of the deep learning framework to obtain a second parameter comprises:
transmitting the data parallelism to a module of the deep learning framework for data interaction, and obtaining a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
and transmitting the model parallelism to a module of the deep learning framework for data interaction, and obtaining the CET and the CETS corresponding to the model parallelism.
6. The method of claim 1, wherein the deep learning framework is an MXNet deep learning framework.
7. The method of claim 1, wherein the deep learning framework further comprises a carrier, the method further comprising:
performing parameter passing interactions between the container and modules of the deep learning framework through the carrier, the parameters including a first parameter and a second parameter.
8. The method of claim 7, wherein the artificial intelligence chip further comprises an underlying library module, the method further comprising:
and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
9. The method of claim 1, wherein the container comprises a native class or structure in the deep learning framework or a class or structure independently created in the deep learning framework for the artificial intelligence chip.
10. A parameter processing apparatus, characterized in that it is applied to an artificial intelligence chip, wherein an upper layer language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework comprises a container, and the container is a class or structure used for storing parameters and is connected with the upper layer language interface, the apparatus comprising:
a writing module, configured to write a first parameter into a container through the upper layer language interface, where the first parameter is used to describe a parallelism degree of the deep learning framework;
the calculation module is used for acquiring the first parameter from the container through the deep learning framework, interacting the first parameter with data of a module of the deep learning framework to acquire a second parameter, and transmitting the second parameter to the container, wherein the second parameter is used for monitoring the performance of parallel operation;
and the acquisition module is used for acquiring a second parameter from the container through the upper layer language interface.
11. An electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-9.
12. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any one of claims 1-9.
CN201811570061.6A 2018-12-21 2018-12-21 Parameter processing method and related product Active CN109739514B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811570061.6A CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product
PCT/CN2019/087631 WO2020124948A1 (en) 2018-12-21 2019-05-20 Network offline model processing method, artificial intelligence processing device, and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811570061.6A CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product

Publications (2)

Publication Number Publication Date
CN109739514A CN109739514A (en) 2019-05-10
CN109739514B true CN109739514B (en) 2021-03-02

Family

ID=66360837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811570061.6A Active CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product

Country Status (1)

Country Link
CN (1) CN109739514B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114931B (en) * 2019-06-21 2023-12-26 富联精密电子(天津)有限公司 Deep learning program configuration method and device, electronic equipment and storage medium
CN112860424A (en) * 2019-11-28 2021-05-28 上海商汤智能科技有限公司 Task processing method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156851A (en) * 2016-06-24 2016-11-23 科大讯飞股份有限公司 The accelerator pursued one's vocational study towards the degree of depth and method
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229258A (en) * 2016-12-21 2018-06-29 田文洪 A kind of face parallelism recognition method based on deep learning and Spark
US11074500B2 (en) * 2017-06-20 2021-07-27 Battelle Memorial Institute Prediction of social media postings as trusted news or as types of suspicious news
CN107480789B (en) * 2017-08-07 2020-12-29 北京中星微电子有限公司 Efficient conversion method and device of deep learning model
CN109032671B (en) * 2018-06-25 2022-05-03 电子科技大学 Distributed deep learning method and system based on data parallel strategy
CN109034386A (en) * 2018-06-26 2018-12-18 中国科学院计算机网络信息中心 A kind of deep learning system and method based on Resource Scheduler
CN108921210B (en) * 2018-06-26 2021-03-26 南京信息工程大学 Cloud classification method based on convolutional neural network
CN110110621B (en) * 2019-04-23 2022-03-08 安徽大学 Oblique photography point cloud classification method based on multi-feature integration deep learning model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156851A (en) * 2016-06-24 2016-11-23 科大讯飞股份有限公司 The accelerator pursued one's vocational study towards the degree of depth and method
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Summary of hyperparameter tuning for deep learning model training (深度学习 模型训练超参数调整总结); pandsu; https://blog.csdn.net/m0_37167788/article/details/84059452?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~first_rank_v2~rank_v25-3-84059452.nonecase&utm_term; 2018-11-14; 1-3 *

Also Published As

Publication number Publication date
CN109739514A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN110096309B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN109739514B (en) Parameter processing method and related product
CN104850516B (en) A kind of DDR Frequency Conversion Designs method and apparatus
CN111767995B (en) Operation method, device and related product
US20240004650A1 (en) Data processing method and apparatus, and related product
CN111723920B (en) Artificial intelligence computing device and related products
CN111813449A (en) Operation method, device and related product
CN111382852B (en) Data processing device, method, chip and electronic equipment
CN111382856B (en) Data processing device, method, chip and electronic equipment
CN111767999B (en) Data processing method and device and related products
US11983535B2 (en) Artificial intelligence computing device and related product
CN112395009A (en) Operation method, operation device, computer equipment and storage medium
CN111340202A (en) Operation method, device and related product
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN112396169B (en) Operation method, device, computer equipment and storage medium
CN111723921A (en) Artificial intelligence computing device and related products
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111382855B (en) Data processing device, method, chip and electronic equipment
CN112232498B (en) Data processing device, integrated circuit chip, electronic equipment, board card and method
CN111339060B (en) Operation method, device, computer equipment and storage medium
CN112396170B (en) Operation method, device, computer equipment and storage medium
CN111045729A (en) Operation method, device and related product
WO2020124948A1 (en) Network offline model processing method, artificial intelligence processing device, and related product
CN111047028A (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

GR01 Patent grant