CN109739514B - Parameter processing method and related product - Google Patents

Parameter processing method and related product

Info

Publication number
CN109739514B
Authority
CN
China
Prior art keywords
parameter
deep learning
container
learning framework
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811570061.6A
Other languages
Chinese (zh)
Other versions
CN109739514A (en)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201811570061.6A priority Critical patent/CN109739514B/en
Publication of CN109739514A publication Critical patent/CN109739514A/en
Priority to PCT/CN2019/087631 priority patent/WO2020124948A1/en
Application granted granted Critical
Publication of CN109739514B publication Critical patent/CN109739514B/en
Status: Active

Landscapes

  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a parameter processing method and a related product, applied to an artificial intelligence chip in which an upper layer language interface and a deep learning framework are deployed. The deep learning framework comprises a container, which is a class or structure for storing parameters and is connected to the upper layer language interface. The deep learning framework obtains a first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container; the upper layer language interface then retrieves the second parameter from the container. By writing the first parameter into the container, the embodiments of the application improve the parallel operation effect within the deep learning framework; by counting and retrieving the second parameter, they improve the monitorability of the parallel operation performance.

Description

Parameter processing method and related product
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly to a parameter processing method and related products.
Background
With the development of the artificial intelligence industry, more and more deep learning frameworks are being developed and used. When developing and using a deep learning framework adapted to an artificial intelligence chip, a user usually needs to set certain parameters for the framework to achieve a better computing effect, or to obtain certain parameters from the framework to monitor its running state.
At present, deep learning frameworks have no mechanism for setting parameters related to an artificial intelligence chip, so a user can neither set such parameters nor acquire data related to the chip's operation. How to improve this situation is an urgent problem to be solved.
Summary
In view of this, an object of the present disclosure is to provide a parameter processing method and a related product in which a container is newly added: a first parameter describing the parallelism degree of a deep learning framework is written into the container, and the first parameter in the container is then combined with other modules of the deep learning framework to obtain a second parameter for monitoring parallel computing performance, thereby improving the computing effect of the deep learning framework and the monitorability of its parallel computing performance.
In order to solve the above technical problems, a first aspect of the embodiments of the present invention provides a parameter processing method applied to an artificial intelligence chip,
an upper language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework comprises a container, and the container is connected with the upper language interface, and the method comprises the following steps:
the upper layer language interface injects a first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container, wherein the second parameter is used for monitoring the parallel operation performance of the deep learning framework described by the first parameter, and the container is a class or a structure body used for storing the parameter;
and the upper layer language interface acquires a second parameter from the container.
Optionally, before the upper layer language interface writes the first parameter into the container, the method further comprises:
creating a parameter data field in the container, wherein the parameter data field is used for pointing to the first parameter and the second parameter.
Optionally, the first parameter includes data parallelism and model parallelism.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, the interacting the first parameter with the module data of the deep learning framework to obtain a second parameter includes:
transmitting the data parallelism to a module of the deep learning framework for data interaction, and obtaining a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
and transmitting the model parallelism to a module of the deep learning framework for data interaction, and obtaining the CET and the CETS corresponding to the model parallelism.
Optionally, the deep learning framework is an MXNet deep learning framework.
Optionally, the deep learning framework further comprises a carrier, and the method further comprises:
performing parameter passing interactions between the container and modules of the deep learning framework through the carrier, the parameters including a first parameter and a second parameter.
Optionally, the artificial intelligence chip further includes an underlying library module, and the method further includes:
and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
Optionally, the container includes a native class or structure in the deep learning framework, or a class or structure independently created in the deep learning framework for the artificial intelligence chip.
A second aspect of the embodiments of the present invention provides a parameter processing apparatus, which is applied to an artificial intelligence chip, wherein an upper language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper language interface, and the apparatus includes:
a writing module, configured to write a first parameter into a container through the upper layer language interface, where the first parameter is used to describe a parallelism degree of the deep learning framework;
the calculation module is used for acquiring the first parameter from the container through the deep learning framework, interacting the first parameter with data of a module of the deep learning framework to acquire a second parameter, and transmitting the second parameter to the container, wherein the second parameter is used for monitoring the performance of parallel operations, and the container is a class or structure used for storing parameters;
and the acquisition module is used for acquiring a second parameter from the container through the upper layer language interface.
A third aspect of the embodiments of the present invention provides a chip, including the parameter processing apparatus provided in the second aspect.
A fourth aspect of the embodiments of the present invention provides a chip packaging structure, where the chip packaging structure includes the chip described in the third aspect;
a fifth aspect of the embodiments of the present invention provides a board card, where the board card includes the chip packaging structure described in the fourth aspect.
In a sixth aspect, an embodiment of the present application provides an electronic device, where the electronic device includes the chip packaging structure described in the fourth aspect or the board card described in the fifth aspect.
A seventh aspect of embodiments of the present invention provides a storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the steps of the method of the first aspect.
It can be seen that, in the parameter processing method disclosed in the embodiments of the present application, an upper layer language interface and a deep learning framework are deployed in an artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper layer language interface. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and other drawings can be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1A is an artificial intelligence chip according to an embodiment of the present disclosure.
Fig. 1B is a schematic flow chart of a parameter processing method according to an embodiment of the present application.
Fig. 2 is a schematic flow chart of another parameter processing method according to an embodiment of the present application.
Fig. 3 is a schematic flow chart of another parameter processing method according to an embodiment of the present application.
Fig. 4 is a parameter processing apparatus according to an embodiment of the present application.
Fig. 5 is a schematic view of a combined processing device according to an embodiment of the present application.
Fig. 6 is a block diagram of another combined processing device according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a board card provided in the embodiment of the present application.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to fig. 1A, fig. 1A shows an artificial intelligence chip provided in an embodiment of the present application. As shown in fig. 1A, the artificial intelligence chip 10 includes an upper layer language interface 101 and a deep learning framework 100. The upper layer language interface is used for accessing a programming language. The deep learning framework includes a container and the other modules of the deep learning framework, and the container can perform data interaction with those modules, which include a graph executor module, the operator modules, and an engine module. Optionally, the upper layer language interface 101 may instead be disposed on another chip or device connected to the artificial intelligence chip, in which case information interaction between the two can still take place. In addition, the artificial intelligence chip 10 may also include an underlying library module 102, which includes an underlying runtime library, a driver module, and the like. The deep learning framework 100 also includes a carrier for data transmission between the container and the other modules of the deep learning framework or the underlying library module.
Referring to fig. 1B, fig. 1B is a schematic flow chart of a parameter processing method disclosed in an embodiment of the present application. The parameter processing method is applied to the artificial intelligence chip shown in fig. 1A and, as shown in fig. 1B, specifically includes the following steps:
111. The upper layer language interface writes a first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework.
A deep learning framework is a code framework for carrying out deep learning projects; currently popular deep learning frameworks include TensorFlow, Caffe, Theano, MXNet, Torch, PyTorch, and the like. An interface is a shared boundary across which two separate components of a system exchange information. The upper layer language and the deep learning framework are two independent components, so an interface exists between them for information interaction. Upper layer languages such as Python and R can be used in deep learning, and conventionally the upper layer language interface is connected directly to the deep learning framework. However, such an interface lacks a parameter setting mechanism, so a user can neither set parameters on the artificial intelligence chip nor acquire parameters from it. Therefore, a container is newly added below the upper layer language interface for setting parameters and acquiring the related data. The parameter data field used for parameter setting and parameter acquisition may be newly added in the container itself or in another module, with the location for parameter setting and parameter acquisition then designated as the container's location.
The container is a class or structure for storing data and belongs to a module in the deep learning framework. The container can be a native class or structure of the deep learning framework, such as the GraphExecutor class, to which a field for parameter setting and parameter acquisition is newly added. Alternatively, the container can be a class or structure independently created by the user for the parameter processing method on the artificial intelligence chip, such as an MLUDevice class, with a dedicated field for parameter setting and parameter acquisition.
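By way of illustration only, a minimal C++ sketch of such an independently created container might look as follows; the class name MLUDevice and every field name here are assumptions made for the example, not identifiers defined by the present disclosure:

```cpp
// Hypothetical container: a plain class/structure added below the upper
// layer language interface to store the chip-related parameters. The
// disclosure only requires "a class or structure used for storing
// parameters"; all names below are illustrative.
struct MLUDevice {
  // First parameter: describes the parallelism degree of the framework.
  int data_parallelism  = 1;   // DP: maximum parallel executions over data
  int model_parallelism = 1;   // MP: maximum parallel executions of a model/operator

  // Second parameter: written back by framework modules and read out
  // through the upper layer language interface for performance monitoring.
  double channel_elapsed_time     = 0.0;  // CET of the most recent channel
  double channel_elapsed_time_sum = 0.0;  // CETS accumulated over channels
};
```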
Optionally, the method further includes: creating a parameter data field in the container, wherein the parameter data field is used for pointing to the first parameter and the second parameter.
Specifically, before the parameter data field is created in the container, the artificial intelligence chip has no data field related to the first parameter and the second parameter, so the first parameter cannot be set and the second parameter cannot be acquired. A parameter data field related to the first parameter and the second parameter is therefore created in the container; it indicates how the two parameters are acquired, how they interact with other modules or interfaces, where their data is stored, and so on, and it also makes the first parameter and the second parameter easier to manage. In addition, the parameter data field may be created at another location while the data itself is still stored by the container.
Optionally, the first parameter includes data parallelism and model parallelism.
Optionally, the deep learning framework in this embodiment is an MXNet deep learning framework.
Data parallelism (DP) means that different cores or processing units process different parts of the data in parallel, and the degree of data parallelism is the maximum number of parallel executions when data is processed in parallel. Model parallelism (MP) means that an operator or a model runs on multiple cores in parallel, and the degree of model parallelism is the maximum number of parallel executions when the model or operator is processed in parallel. When the MXNet deep learning framework runs on an artificial intelligence chip, the amount of computation is huge, and DP, MP, or both must be adopted to reduce computation time and improve computing efficiency. To achieve a better operation effect, the data parallelism and the model parallelism need to be set: on the one hand, the parallelism parameters must match the hardware of the artificial intelligence chip; on the other hand, different parallelism parameters are needed when the scale, sparsity, or other characteristics of the input data differ. The chosen data parallelism and/or model parallelism is written in a programming language and then injected into the container through the upper layer language interface, which completes the setting of the first parameter.
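As a hedged sketch of this write path, the fragment below shows how a binding behind the upper layer language interface might inject the first parameter into the container from the earlier MLUDevice sketch; the function name, its signature, and the global container instance are assumptions for illustration, not the patent's actual API:

```cpp
// Reuses the illustrative MLUDevice container from the sketch above.
static MLUDevice g_container;  // container instance shared with the framework

// Hypothetical entry point that an upper layer language (e.g. a Python
// front end) would reach through the interface; writing the first
// parameter into the container completes step 111.
void SetParallelism(int data_parallelism, int model_parallelism) {
  g_container.data_parallelism  = data_parallelism;
  g_container.model_parallelism = model_parallelism;
}
```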
MXNet is a deep learning framework that supports languages such as C++, Python, R, Scala, Julia, MATLAB, and JavaScript, supports both imperative and symbolic programming, and can run on a wide range of hardware including artificial intelligence chips; it is among the leading deep learning frameworks available today. The MXNet deep learning framework can therefore be combined well with the method of the embodiments of the present application to complete the setting of the first parameter and the acquisition of the second parameter.
112. The deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container, wherein the second parameter is used for monitoring the parallel operation performance of the deep learning framework described by the first parameter.
After the first parameter is set and injected into the container, a module of the deep learning framework acquires the first parameter from the container; the modules of the deep learning framework include the graph executor module, the operator modules, the engine module, and the like. For example, if an operator module needs to perform a parallel operation, it first obtains the first parameter and can then derive a second parameter from the first parameter combined with other parameters in the operator module, such as the data size. The second parameter is a parameter for monitoring parallel operation performance, and once obtained it is transmitted back to the container.
Optionally, the second parameter includes a channel elapsed time and a channel elapsed time sum.
Optionally, interacting the first parameter with the module data of the deep learning framework to obtain a second parameter includes: transmitting the data parallelism to a module of the deep learning framework for data interaction to obtain a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism; and transmitting the model parallelism to a module of the deep learning framework for data interaction to obtain the CET and the CETS corresponding to the model parallelism, wherein the CET and the CETS are used for counting the computation time of operators.
Specifically, when the deep learning framework adopts DP or MP, there are multiple parallel channels. The channel elapsed time (CET) and the channel elapsed time sum (CETS) are both performance parameters describing the parallel operations performed over these channels and are used for counting the computation time of operators. Transferring into the container the second parameters, obtained from the first parameter and the modules of the deep learning framework for a single module or for the whole deep learning framework, completes the acquisition of the second parameter.
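A minimal sketch of how a framework module might derive the second parameter from the first is given below, reusing the illustrative MLUDevice container; the channels are timed sequentially purely for clarity, and the timing source and function shape are assumptions, not the framework's real machinery:

```cpp
#include <chrono>

// Runs `parallelism` channels of work, recording each channel's elapsed
// time (CET) in the container and accumulating their sum (CETS).
template <typename Fn>
void RunChannels(int parallelism, Fn work, MLUDevice& container) {
  container.channel_elapsed_time_sum = 0.0;
  for (int channel = 0; channel < parallelism; ++channel) {
    auto start = std::chrono::steady_clock::now();
    work(channel);  // one parallel channel's share of the operator
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    container.channel_elapsed_time = elapsed.count();       // CET
    container.channel_elapsed_time_sum += elapsed.count();  // CETS
  }
}
```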
113. The upper layer language interface acquires the second parameter from the container.
The upper layer language interface can acquire the second parameter from the container and expose it, so that the second parameter is visible to the user. The user can then monitor the operation performance of the deep learning framework through the second parameter, and adjust or improve it by modifying the first parameter or other parameters, thereby improving the operation effect of the deep learning framework.
Optionally, the deep learning framework further includes a carrier, and the method further includes: the container and the module of the deep learning framework are in data transmission interaction through the carrier.
The carrier is a class or structure used for data transmission within the deep learning framework. Since the container is not directly connected to the other modules of the deep learning framework, data transmission is carried out through the carrier. For example, the carrier in the MXNet framework may be the operator context class OpContext: after the first parameter is injected into the container, the container assigns the first parameter to the carrier, and the carrier passes it to a module of the deep learning framework. Likewise, the second parameter is communicated by the carrier from a module of the deep learning framework back to the container.
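The carrier's role can be sketched as follows, reusing the illustrative MLUDevice container; the Carrier structure merely stands in for an OpContext-like class, and its fields and the two helper functions are assumptions about how the parameters could ride along with it, not MXNet's actual definitions:

```cpp
// Hypothetical carrier mirroring the role of an operator context such as
// OpContext: the container copies the first parameter into the carrier,
// the carrier delivers it to a framework module, and the second parameter
// travels back the same way.
struct Carrier {
  int data_parallelism;
  int model_parallelism;
  double channel_elapsed_time;      // filled in by the module (CET)
  double channel_elapsed_time_sum;  // filled in by the module (CETS)
};

Carrier LoadFromContainer(const MLUDevice& container) {
  return {container.data_parallelism, container.model_parallelism, 0.0, 0.0};
}

void StoreToContainer(const Carrier& carrier, MLUDevice& container) {
  container.channel_elapsed_time     = carrier.channel_elapsed_time;
  container.channel_elapsed_time_sum = carrier.channel_elapsed_time_sum;
}
```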
Optionally, the artificial intelligence chip further includes a bottom library module, and the method further includes: and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
Specifically, the underlying library module comprises an underlying runtime library, a driver module, and the like. Parameters of these underlying libraries may also affect the parallel performance or other aspects of the performance of the deep learning framework, so the container may also perform data interaction with the underlying library module through the carrier in order to obtain parallel operation performance parameters or other performance parameters.
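The same round trip toward the underlying library can be sketched with the Carrier above; the function below is purely hypothetical and implies no real runtime or driver API:

```cpp
#include <chrono>

// Purely hypothetical underlying-library interaction: consume the
// parallelism settings from the carrier, perform the timed runtime work,
// and report the elapsed time back through the carrier.
void UnderlyingLibraryInteract(Carrier& carrier) {
  auto start = std::chrono::steady_clock::now();
  // ... configure the runtime library / driver using
  //     carrier.data_parallelism and carrier.model_parallelism ...
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  carrier.channel_elapsed_time = elapsed.count();
  carrier.channel_elapsed_time_sum += elapsed.count();
}
```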
It can be seen that, in the embodiment of the present application, an upper layer language interface and a deep learning framework are deployed in an artificial intelligence chip, the deep learning framework includes a container, and the container is connected to the upper layer language interface. First, the upper layer language interface writes a first parameter into the container; then the deep learning framework obtains the first parameter from the container, obtains a second parameter by combining the first parameter with module parameters of the deep learning framework, and transmits the second parameter to the container; finally, the upper layer language interface obtains the second parameter from the container and provides it to the user. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
In accordance with the above, please refer to fig. 2, fig. 2 is a schematic flow chart of another parameter processing method provided in the embodiment of the present application, and as shown in fig. 2, the parameter processing method includes:
201. creating parameter data fields related to the artificial intelligence chip in the container, wherein the parameter data fields relate to a first parameter and a second parameter;
202. an upper language interface injects the first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework;
203. the deep learning framework further comprises a carrier, the deep learning framework acquires the first parameter from the container, and the first parameter is interacted with module data of the deep learning framework through the carrier to acquire a second parameter;
204. the deep learning framework passes the second parameter into the container through the carrier, the second parameter being used for monitoring performance of parallel operations;
205. the artificial intelligence chip also comprises a bottom layer library module, the container and the bottom layer library module carry out parameter transmission interaction through the carrier, and the parameters comprise a first parameter and a second parameter.
For detailed descriptions of steps 201-205, reference may be made to the corresponding descriptions of the parameter processing method in steps 111-113, which are not repeated here.
It can be seen that, in this embodiment of the application, a container is newly added in the deep learning framework, and the carrier then handles the parameter interaction between the deep learning framework and the container as well as between the underlying library module and the container. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
In accordance with the above, please refer to fig. 3, fig. 3 is a schematic flow chart of another parameter processing method provided in the embodiment of the present application, and as shown in fig. 3, the parameter processing method includes:
301. setting data parallelism, wherein the data parallelism is used for describing the maximum number of parallel executions when different cores process different parts of the data;
302. setting model parallelism, wherein the model parallelism is used for describing the maximum number of parallel executions when an operator or a model runs on multiple cores;
303. injecting the data parallelism and/or the model parallelism into the container through the upper language interface;
304. transmitting the data parallelism to a module of the deep learning framework for data interaction to obtain a CET and a CETS corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
305. transmitting the model parallelism to a module of the deep learning framework for data interaction to obtain a CET and a CETS corresponding to the model parallelism;
306. passing CETS and CET corresponding to the data parallelism and/or the model parallelism into the container;
307. and the upper layer language interface acquires the CETS and the CET corresponding to the data parallelism and/or the model parallelism from the container.
For detailed descriptions of steps 301-307, reference may be made to the corresponding descriptions of the parameter processing method in steps 111-113, which are not repeated here.
It can be seen that, in this embodiment of the present application, a container is added in the deep learning framework, and parameter interaction between the deep learning framework and the container, as well as between the underlying library module and the container, is performed through the carrier. Setting the data parallelism and/or the model parallelism improves the parallel operation effect within the deep learning framework, and counting and obtaining the CET and the CETS improves the monitorability of the parallel operation performance.
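Tying the earlier illustrative fragments together, a usage sketch of the full round trip of steps 301-307 might read as follows (all names carried over from the preceding sketches and equally hypothetical):

```cpp
int main() {
  // Step 303: inject the first parameter through the upper layer interface.
  SetParallelism(/*data_parallelism=*/4, /*model_parallelism=*/2);

  // Steps 304-306: a framework module interacts with the first parameter
  // and writes the second parameter (CET/CETS) back into the container;
  // channels are shown for data parallelism only.
  RunChannels(g_container.data_parallelism,
              [](int /*channel*/) { /* one channel's share of an operator */ },
              g_container);

  // Step 307: the upper layer interface reads the second parameter out of
  // the container for monitoring.
  double cet  = g_container.channel_elapsed_time;
  double cets = g_container.channel_elapsed_time_sum;
  (void)cet; (void)cets;  // monitoring/reporting would go here
  return 0;
}
```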
Referring to fig. 4, fig. 4 is a parameter processing apparatus according to an embodiment of the present application, applied to the artificial intelligence chip shown in fig. 1A, and as shown in fig. 4, the parameter processing apparatus 400 includes:
a writing module 401, configured to write a first parameter into a container through the upper layer language interface, where the first parameter is used to describe a parallelism degree of the deep learning framework;
a calculating module 402, configured to obtain the first parameter from the container through the deep learning framework, perform interaction between the first parameter and data of a module of the deep learning framework to obtain a second parameter, and transmit the second parameter to the container, where the second parameter is used to monitor performance of parallel operations;
an obtaining module 403, configured to obtain a second parameter from the container through the upper layer language interface.
For a detailed description of the parameter processing apparatus, reference may be made to the corresponding descriptions of the parameter processing method in steps 111-113, which are not repeated here.
It can be seen that, in the parameter processing apparatus of this embodiment of the application, the upper layer language interface writes the first parameter into the container; the deep learning framework then obtains the first parameter from the container, obtains the second parameter by combining the first parameter with module parameters of the deep learning framework, and transmits the second parameter to the container; finally, the upper layer language interface obtains the second parameter from the container and provides it to the user. Because the first parameter describes the parallelism degree of the deep learning framework and the second parameter is used for monitoring the performance of parallel operations, writing the first parameter into the container improves the parallel operation effect within the deep learning framework, and counting and acquiring the second parameter improves the monitorability of the parallel operation performance.
In an alternative embodiment, the writing module is further configured to:
create a parameter data field in the container, wherein the parameter data field is used for pointing to the first parameter and the second parameter.
In an alternative embodiment, the first parameter includes data parallelism and model parallelism.
In an alternative embodiment, the second parameter includes the channel elapsed time and the channel elapsed time sum.
In an optional embodiment, the calculation module is specifically configured to:
transmitting the data parallelism to a module of the deep learning framework for data interaction, and obtaining a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
and transmitting the model parallelism to a module of the deep learning framework for data interaction, and obtaining the CET and the CETS corresponding to the model parallelism.
In an alternative embodiment, the deep learning framework is an MXNet deep learning framework.
In an optional embodiment, the deep learning framework further comprises a carrier, and the computing module is further configured to:
performing parameter passing interactions between the container and modules of the deep learning framework through the carrier, the parameters including a first parameter and a second parameter.
In an optional embodiment, the artificial intelligence chip further comprises an underlying library module, and the calculation module is further configured to:
and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
In an alternative embodiment, the container includes a native class or structure in the deep learning framework, or a class or structure independently created in the deep learning framework for the artificial intelligence chip.
The application also discloses a combined processing device, which comprises the above parameter processing device, a universal interconnection interface, and other processing devices. The parameter processing device interacts with the other processing devices to jointly complete operations designated by the user. Fig. 5 is a schematic view of the combined processing device.
The other processing devices include one or more types of general-purpose or special-purpose processors such as central processing units (CPUs), graphics processing units (GPUs), and neural network processors. The number of processors included in the other processing devices is not limited. The other processing devices serve as the interface between the parameter processing device and external data and control, performing operations such as data transfer and completing basic control of the parameter processing device such as starting and stopping; the other processing devices can also cooperate with the parameter processing device to complete computing tasks.
The universal interconnection interface is used for transmitting data and control instructions between the parameter processing device and the other processing devices. The parameter processing device acquires the required input data from the other processing devices and writes it into the on-chip storage of the parameter processing device; it can obtain control instructions from the other processing devices and write them into an on-chip control cache of the parameter processing device; it can also read the data in the storage module of the parameter processing device and transmit it to the other processing devices.
Optionally, as shown in fig. 6, the structure may further include a storage device, which is connected to the parameter processing device and the other processing devices respectively. The storage device is used for storing data of the parameter processing device and the other processing devices, and is particularly suitable for data that cannot be entirely held in the internal storage of the parameter processing device or the other processing devices.
The combined processing device can serve as an SoC (system on chip) for devices such as mobile phones, robots, unmanned aerial vehicles, and video monitoring equipment, effectively reducing the core area of the control part, increasing the processing speed, and reducing the overall power consumption. In this case, the universal interconnection interface of the combined processing device is connected to certain components of the apparatus, such as a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface.
In some embodiments, a chip is also claimed, which includes the parameter processing device.
In some embodiments, a chip package structure is provided, which includes the above chip.
In some embodiments, a board card is provided, which includes the above chip packaging structure. Referring to fig. 7, fig. 7 provides a board card that may include other components in addition to the chip, including but not limited to: a storage device 710, an interface device 720, and a control device 730;
the memory device 710 is connected to the chips in the chip package structure through a bus for storing data. The memory device may include a plurality of sets of memory cells 711. Each group of the storage units is connected with the chip through a bus. It is understood that each group of the memory cells may be a DDR SDRAM (Double Data Rate SDRAM).
DDR can double the transfer speed of SDRAM without increasing the clock frequency, because it allows data to be transferred on both the rising and falling edges of the clock pulse, making it twice as fast as standard SDRAM. In one embodiment, the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips (dies). In one embodiment, the chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 dies are adopted in each group of storage units, the theoretical bandwidth of data transmission per controller can reach 25600 MB/s.
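The quoted figure follows directly from the die rating; a short sketch of the arithmetic, assuming a 64-bit payload path at 3200 MT/s per controller, is:

```cpp
#include <cstdio>

int main() {
  // DDR4-3200: 3200 mega-transfers per second on a 64-bit (8-byte) data
  // path per controller; the remaining 8 bits of the 72-bit bus carry ECC
  // and move no payload.
  const double transfers_per_second = 3200e6;
  const double bytes_per_transfer   = 64.0 / 8.0;
  const double bandwidth_mb_per_s   =
      transfers_per_second * bytes_per_transfer / 1e6;
  std::printf("theoretical bandwidth: %.0f MB/s\n", bandwidth_mb_per_s);  // 25600
  return 0;
}
```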
In one embodiment, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip, for controlling the data transmission and data storage of each storage unit.
The interface device is electrically connected to the chip in the chip packaging structure. The interface device is used for implementing data transmission between the chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface, and the data to be processed is transmitted to the chip by the server through the standard PCIe interface to complete the data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may be another interface; the present application does not limit the concrete form of such other interfaces, as long as the interface unit can implement the switching function. In addition, the calculation results of the chip are transmitted back to the external device (e.g., the server) by the interface device.
The control device is electrically connected to the chip. The control device is used for monitoring the state of the chip. Specifically, the chip and the control device may be electrically connected through an SPI interface. The control device may include a single chip microcomputer (MCU). The chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads; therefore, the chip can be in different working states such as multi-load and light-load. The control device can regulate and control the working states of the multiple processing chips, processing cores, and/or processing circuits in the chip.
In some embodiments, an electronic device is provided that includes the above board card.
The electronic device includes a data processing device, a robot, a computer, a printer, a scanner, a tablet computer, an intelligent terminal, a mobile phone, a vehicle data recorder, a navigator, a sensor, a camera, a server, a cloud server, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a vehicle, a household appliance, and/or a medical device.
The vehicle comprises an airplane, a ship and/or a vehicle; the household appliances comprise a television, an air conditioner, a microwave oven, a refrigerator, an electric cooker, a humidifier, a washing machine, an electric lamp, a gas stove and a range hood; the medical equipment comprises a nuclear magnetic resonance apparatus, a B-ultrasonic apparatus and/or an electrocardiograph.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are exemplary embodiments and that the acts and modules referred to are not necessarily required in this application.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; the division into units is only a division by logical function, and there may be other divisions in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software program module.
The integrated units, if implemented in the form of software program modules and sold or used as stand-alone products, may be stored in a computer-readable memory. Based on such understanding, the essence of the technical solution of the present application, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by related hardware instructed by a program, and the program may be stored in a computer-readable memory, which may include: a flash memory disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.
The foregoing detailed description of the embodiments of the present application has been presented to illustrate the principles and implementations of the present application, and the above description of the embodiments is only provided to help understand the method and the core concept of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (12)

1. A parameter processing method, characterized in that it is applied to an artificial intelligence chip, wherein an upper layer language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework comprises a container, and the container is a class or structure used for storing parameters and is connected with the upper layer language interface, the method comprising the following steps:
the upper layer language interface injects a first parameter into the container, wherein the first parameter is used for describing the parallelism degree of the deep learning framework;
the deep learning framework obtains the first parameter from the container, interacts the first parameter with module data of the deep learning framework to obtain a second parameter, and transmits the second parameter to the container, wherein the second parameter is used for monitoring the parallel operation performance of the deep learning framework described by the first parameter;
and the upper layer language interface acquires a second parameter from the container.
2. The method of claim 1, further comprising:
the container comprises a parameter data field, and the parameter data field is used for pointing to a first parameter and a second parameter.
3. The method of claim 1 or 2, wherein the first parameters include data parallelism and model parallelism.
4. The method of claim 3, wherein the second parameter comprises a channel elapsed time and a channel elapsed time sum.
5. The method of claim 4, wherein interacting the first parameter with module data of the deep learning framework to obtain a second parameter comprises:
transmitting the data parallelism to a module of the deep learning framework for data interaction, and obtaining a channel elapsed time (CET) and a channel elapsed time sum (CETS) corresponding to the data parallelism, wherein the CET and the CETS are used for counting the computation time of operators;
and transmitting the model parallelism to a module of the deep learning framework for data interaction, and obtaining the CET and the CETS corresponding to the model parallelism.
6. The method of claim 1, wherein the deep learning framework is an MXNet deep learning framework.
7. The method of claim 1, wherein the deep learning framework further comprises a carrier, the method further comprising:
performing parameter passing interactions between the container and modules of the deep learning framework through the carrier, the parameters including a first parameter and a second parameter.
8. The method of claim 7, wherein the artificial intelligence chip further comprises an underlying library module, the method further comprising:
and performing parameter transmission interaction between the container and the bottom layer library module through the carrier, wherein the parameters comprise a first parameter and a second parameter.
9. The method of claim 1, wherein the container comprises a native class or structure in the deep learning framework or a class or structure independently created in the deep learning framework for the artificial intelligence chip.
10. A parameter processing apparatus, characterized in that it is applied to an artificial intelligence chip, wherein an upper layer language interface and a deep learning framework are deployed in the artificial intelligence chip, the deep learning framework comprises a container, and the container is a class or structure used for storing parameters and is connected with the upper layer language interface, the apparatus comprising:
a writing module, configured to write a first parameter into a container through the upper layer language interface, where the first parameter is used to describe a parallelism degree of the deep learning framework;
the calculation module is used for acquiring the first parameter from the container through the deep learning framework, interacting the first parameter with data of a module of the deep learning framework to acquire a second parameter, and transmitting the second parameter to the container, wherein the second parameter is used for monitoring the performance of parallel operation;
and the acquisition module is used for acquiring a second parameter from the container through the upper layer language interface.
11. An electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method of any of claims 1-9.
12. A computer-readable storage medium, characterized in that a computer program for electronic data exchange is stored, wherein the computer program causes a computer to perform the method according to any one of claims 1-9.
CN201811570061.6A 2018-12-21 2018-12-21 Parameter processing method and related product Active CN109739514B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811570061.6A CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product
PCT/CN2019/087631 WO2020124948A1 (en) 2018-12-21 2019-05-20 Network offline model processing method, artificial intelligence processing device, and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811570061.6A CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product

Publications (2)

Publication Number Publication Date
CN109739514A CN109739514A (en) 2019-05-10
CN109739514B true CN109739514B (en) 2021-03-02

Family

ID=66360837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811570061.6A Active CN109739514B (en) 2018-12-21 2018-12-21 Parameter processing method and related product

Country Status (1)

Country Link
CN (1) CN109739514B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112114931B (en) * 2019-06-21 2023-12-26 富联精密电子(天津)有限公司 Deep learning program configuration method and device, electronic equipment and storage medium
CN112860424A (en) * 2019-11-28 2021-05-28 上海商汤智能科技有限公司 Task processing method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156851A (en) * 2016-06-24 2016-11-23 科大讯飞股份有限公司 The accelerator pursued one's vocational study towards the degree of depth and method
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229258A (en) * 2016-12-21 2018-06-29 田文洪 A kind of face parallelism recognition method based on deep learning and Spark
US11074500B2 (en) * 2017-06-20 2021-07-27 Battelle Memorial Institute Prediction of social media postings as trusted news or as types of suspicious news
CN107480789B (en) * 2017-08-07 2020-12-29 北京中星微电子有限公司 Efficient conversion method and device of deep learning model
CN109032671B (en) * 2018-06-25 2022-05-03 电子科技大学 Distributed deep learning method and system based on data parallel strategy
CN109034386A (en) * 2018-06-26 2018-12-18 中国科学院计算机网络信息中心 A kind of deep learning system and method based on Resource Scheduler
CN108921210B (en) * 2018-06-26 2021-03-26 南京信息工程大学 Cloud classification method based on convolutional neural network
CN110110621B (en) * 2019-04-23 2022-03-08 安徽大学 Oblique photography point cloud classification method based on multi-feature integration deep learning model

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156851A (en) * 2016-06-24 2016-11-23 科大讯飞股份有限公司 The accelerator pursued one's vocational study towards the degree of depth and method
CN107844371A (en) * 2017-10-12 2018-03-27 北京京东尚科信息技术有限公司 Task processing method, system and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Summary of hyperparameter tuning for deep learning model training (深度学习 模型训练超参数调整总结); pandsu; https://blog.csdn.net/m0_37167788/article/details/84059452?utm_medium=distribute.pc_aggpage_search_result.none-task-blog-2~all~first_rank_v2~rank_v25-3-84059452.nonecase&utm_term; 2018-11-14; 1-3 *

Also Published As

Publication number Publication date
CN109739514A (en) 2019-05-10

Similar Documents

Publication Publication Date Title
CN110096309B (en) Operation method, operation device, computer equipment and storage medium
CN110119807B (en) Operation method, operation device, computer equipment and storage medium
CN109739514B (en) Parameter processing method and related product
CN104850516B (en) A kind of DDR Frequency Conversion Designs method and apparatus
CN111767995B (en) Operation method, device and related product
US20240004650A1 (en) Data processing method and apparatus, and related product
CN111723920B (en) Artificial intelligence computing device and related products
CN111813449A (en) Operation method, device and related product
CN111382852B (en) Data processing device, method, chip and electronic equipment
CN111382856B (en) Data processing device, method, chip and electronic equipment
CN111767999B (en) Data processing method and device and related products
US11983535B2 (en) Artificial intelligence computing device and related product
CN112395009A (en) Operation method, operation device, computer equipment and storage medium
CN111340202A (en) Operation method, device and related product
CN112395008A (en) Operation method, operation device, computer equipment and storage medium
CN112396169B (en) Operation method, device, computer equipment and storage medium
CN111723921A (en) Artificial intelligence computing device and related products
CN111124497B (en) Operation method, operation device, computer equipment and storage medium
CN111382855B (en) Data processing device, method, chip and electronic equipment
CN112232498B (en) Data processing device, integrated circuit chip, electronic equipment, board card and method
CN111339060B (en) Operation method, device, computer equipment and storage medium
CN112396170B (en) Operation method, device, computer equipment and storage medium
CN111045729A (en) Operation method, device and related product
WO2020124948A1 (en) Network offline model processing method, artificial intelligence processing device, and related product
CN111047028A (en) Operation method, device and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

GR01 Patent grant