CN114819106A - Calculation graph optimization method and device, electronic equipment and computer readable medium - Google Patents

Calculation graph optimization method and device, electronic equipment and computer readable medium

Info

Publication number
CN114819106A
CN114819106A
Authority
CN
China
Prior art keywords
graph
sub
computational
neural network
optimized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210617134.2A
Other languages
Chinese (zh)
Inventor
张伟豪
王封
祝夭龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Lynxi Technology Co Ltd
Original Assignee
Beijing Lynxi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Lynxi Technology Co Ltd filed Critical Beijing Lynxi Technology Co Ltd
Priority to CN202210617134.2A priority Critical patent/CN114819106A/en
Publication of CN114819106A publication Critical patent/CN114819106A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure provides a computation graph optimization method and apparatus, an electronic device, and a computer-readable medium. The method includes: acquiring a first computation graph to be optimized; detecting the first computation graph through a pre-trained graph neural network to determine a first sub-graph to be optimized in the first computation graph; and optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph. According to the embodiments of the present disclosure, the optimization efficiency of the computation graph can be improved, and thereby the efficiency of compiling it.

Description

Calculation graph optimization method and device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of computation graph technology, and in particular, to a computation graph optimization method and apparatus, an electronic device, and a computer-readable storage medium.
Background
With the continuous development of artificial intelligence technology, more and more neural networks are applied in production practice. These neural networks are typically described as computation graphs during programming, deployment, and execution, and computation graphs are also used as an intermediate representation (IR) in other fields such as high-performance computing, so optimization of computation graphs (e.g., compilation optimization and execution optimization) has received increasing attention.
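For illustration only (the Node and Graph classes below are assumptions introduced here, not structures from the patent), the following minimal Python sketch shows how one neural-network layer, y = relu(x·W + b), can be described as a computation graph of operator nodes and data edges:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str      # unique node identifier
    op: str        # operator type, e.g. "matmul", "add", "relu"
    inputs: list   # names of the upstream nodes feeding this node

@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # name -> Node

    def add(self, name, op, inputs=()):
        self.nodes[name] = Node(name, op, list(inputs))

# y = relu(x @ W + b) expressed as a computation graph
g = Graph()
g.add("x", "input")
g.add("W", "weight")
g.add("b", "weight")
g.add("mm", "matmul", ["x", "W"])
g.add("sum", "add", ["mm", "b"])
g.add("y", "relu", ["sum"])
```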
Disclosure of Invention
The present disclosure provides a computation graph optimization method and apparatus, an electronic device, and a computer-readable storage medium.
In a first aspect, the present disclosure provides a computation graph optimization method, including: acquiring a first computation graph to be optimized; detecting the first computation graph through a pre-trained graph neural network to determine a first sub-graph to be optimized in the first computation graph; and optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph.
In a second aspect, the present disclosure provides a computation graph optimization apparatus, including: an obtaining module, used for obtaining a first computation graph to be optimized; a detection module, used for detecting the first computation graph through a pre-trained graph neural network and determining a first sub-graph to be optimized in the first computation graph; and an optimization module, used for optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph.
In a third aspect, the present disclosure provides an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores one or more computer programs executable by the at least one processor, and the one or more computer programs, when executed by the at least one processor, enable the at least one processor to perform the computation graph optimization method described above.
In a fourth aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor/processing core, implements the computational graph optimization method described above.
According to the embodiments of the present disclosure, a graph neural network (GNN) is used to detect the computation graph, so that fuzzy structures or operators to be optimized that are difficult to define with uniform, explicit rules can be detected. This improves the detection success rate and the optimization efficiency of the computation graph, and in turn the efficiency of compiling it.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. The above and other features and advantages will become more apparent to those skilled in the art by describing in detail exemplary embodiments thereof with reference to the attached drawings, in which:
FIG. 1 is a schematic diagram of a flow of optimizing a computation graph in the related art;
FIG. 2 is a flowchart of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a portion of the steps of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a portion of the steps of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart illustrating a portion of the steps of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a portion of the steps of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating a portion of the steps of a computation graph optimization method provided by an embodiment of the present disclosure;
FIG. 10 is a block diagram of a computation graph optimization apparatus provided by an embodiment of the present disclosure;
FIG. 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To facilitate a better understanding of the technical aspects of the present disclosure, exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, wherein various details of the embodiments of the present disclosure are included to facilitate an understanding, and they should be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Embodiments of the present disclosure and features of embodiments may be combined with each other without conflict.
As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In some related technologies, a series of optimization procedures are written for a specific operator or structure. Such a procedure is generally referred to as a "pass"; a pass completes the conversion, analysis, or optimization of a compiled object (e.g., a computation graph). Referring to FIG. 1, executing passes is the process by which the compiler converts, analyzes, and optimizes the compiled object, and the compilation result corresponding to the computation graph is obtained by executing the passes corresponding to it (pass1, pass2, pass3, and pass4 in the figure).
A specific operator or structure can be searched for in the compiled object (such as a computation graph) by template matching: the specific operator or structure is expressed as an explicit rule, the rule is matched against the compiled object, and once an operator or structure satisfying the rule is matched, the computation graph is optimized with the corresponding compilation procedure or optimization method (e.g., a pass).
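As a deliberately simplified sketch of such rule-based template matching (reusing the toy Graph from the earlier example; real pass frameworks use much richer pattern languages), a pattern can be written as a linear chain of operator names and matched against the graph:

```python
def match_chain(graph, pattern):
    """Return node-name chains whose ops match `pattern` in order."""
    matches = []
    for start in graph.nodes.values():
        if start.op != pattern[0]:
            continue
        chain, cur = [start.name], start
        for want in pattern[1:]:
            # follow an edge to a consumer node with the wanted op
            nxt = next((n for n in graph.nodes.values()
                        if cur.name in n.inputs and n.op == want), None)
            if nxt is None:
                break
            chain.append(nxt.name)
            cur = nxt
        if len(chain) == len(pattern):
            matches.append(chain)
    return matches

# A fusion pass could rewrite every matmul -> add -> relu chain it finds:
print(match_chain(g, ["matmul", "add", "relu"]))  # [['mm', 'sum', 'y']]
```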
In practice, however, some operators or structures are fuzzy and difficult to define with a uniform, explicit rule, for example, branch structures that may greatly increase intermediate storage on hardware, or unbalanced structures that may reduce compute utilization. During compilation, these operators or structures often need to be identified so that a targeted optimization method can be invoked, yet they are hard to identify by template matching. The resulting low identification accuracy affects the optimization efficiency of the computation graph and, in turn, its compilation efficiency.
In the computation graph optimization method according to embodiments of the present disclosure, a graph neural network (GNN) is used to detect the computation graph, so that fuzzy structures or operators that are difficult to define with a uniform, explicit rule can be detected. This raises the detection success rate for such structures and operators, improves the optimization efficiency of the computation graph, and thereby improves its compilation efficiency.
The computation graph optimization method according to the embodiments of the present disclosure may be performed by an electronic device such as a terminal device or a server. The terminal device may be a vehicle-mounted device, user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, or a wearable device, and the method may be implemented by a processor calling computer-readable program instructions stored in a memory. Alternatively, the method may be performed by a server.
FIG. 2 is a flowchart of a computation graph optimization method according to an embodiment of the present disclosure. Referring to FIG. 2, the method includes:
in step S21, a first computation graph to be optimized is acquired;
in step S22, detecting the first computation graph through a pre-trained graph neural network, and determining a first sub-graph to be optimized in the first computation graph;
in step S23, the first computation graph is optimized according to the first sub-graph, and an optimized second computation graph is obtained.
For example, in step S21, the first computation graph may be a computation graph that has not been compiled or optimized, or an uncompiled computation graph that has already been optimized by a related-art optimization technique; the embodiments of the present disclosure do not limit the specific manner of obtaining the first computation graph.
In some possible implementations, the first computational graph may be an intermediate representation of the neural network during programming, deployment, or execution. The neural network is used for executing processing tasks, and the processing tasks comprise any one of image processing tasks, voice processing tasks, text processing tasks and video processing tasks. The present disclosure does not limit the type of tasks that the neural network performs.
In step S22, the graph neural network for detecting the first computation graph is a graph neural network trained in advance.
In some possible implementations, the graph neural network is trained using, as positive samples, computation graphs that contain a specific operator or structure, together with the positions of that operator or structure in those graphs, so that the trained graph neural network is sensitive to the specific operator or structure.
To detect the first computation graph with the graph neural network, the first computation graph may be input into the trained network, which scans and detects it. The output of the graph neural network may indicate whether the first computation graph contains a specific operator or structure and where in the graph it is located; the specific operator or structure found in the first computation graph is the first sub-graph to be optimized.
In some possible implementations, the output of the graph neural network may include an identifier of the first sub-graph corresponding to a specific operator or structure. The identifier may be an index of that operator or structure, from which the corresponding optimizer or optimization result can be determined. For example, the output of the graph neural network may directly be the index of the optimizer corresponding to the specific operator or structure, with which that optimizer can be found in an optimization library composed of multiple optimizers.
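A sketch of the interface this description implies is given below; trained_gnn stands in for the actual trained model, and fuse_branches and rebalance_structure are hypothetical optimizer names introduced only to populate the optimization library:

```python
def detect_subgraphs(graph, trained_gnn):
    """Run the trained GNN over the graph; it is assumed to yield, for each
    suspect region, (node names of the first sub-graph, optimizer index)."""
    return [(set(node_set), opt_index)
            for node_set, opt_index in trained_gnn(graph)]

# Optimization library: the identifier output by the GNN indexes into it.
def fuse_branches(sub):        # placeholder: would fuse a branch structure
    return sub

def rebalance_structure(sub):  # placeholder: would rebalance an unbalanced structure
    return sub

OPTIMIZER_LIBRARY = {0: fuse_branches, 1: rebalance_structure}
```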
In some possible implementations, the first computation graph may be detected using multiple graph neural networks, each sensitive to a different specific operator or structure; alternatively, the first computation graph may be detected using a single general-purpose graph neural network that is sensitive to multiple specific operators or structures.
In some possible implementations, there may be multiple first sub-graphs; everything detected by the graph neural networks constitutes the set of first sub-graphs.
In step S23, when the graph neural network detects a first sub-graph in the first computation graph, the first computation graph is optimized according to it: the first sub-graph may be extracted, the optimizer corresponding to it found through its identifier, the first sub-graph optimized with that optimizer, and the optimized result merged back into the first computation graph to obtain the second computation graph.
According to the embodiments of the present disclosure, using a graph neural network to detect the computation graph makes it possible to detect fuzzy structures or operators to be optimized that are difficult to define with a uniform, explicit rule, improving the detection success rate, the optimization efficiency of the computation graph, and in turn its compilation efficiency.
The following is a description of a computation graph optimization method according to an embodiment of the present disclosure.
As described above, the first computation graph may be detected using multiple graph neural networks, for example a plurality of first graph neural networks in a network library, each sensitive to one or more different specific operators or structures. The first computation graph may also be detected using a single graph neural network that is sensitive to multiple specific operators or structures, to determine the first sub-graph.
FIG. 3 is a flowchart of the step of detecting the first computation graph using a plurality of first graph neural networks in the network library. Referring to FIG. 3, step S22 may include steps S31 and S32.
In step S31, dividing the first graph neural networks in the network library into a plurality of detection groups, each detection group including at least one graph neural network;
in step S32, the first computation graph is detected by using each detection group, and at least one first sub-graph in the first computation graph is determined.
For example, in step S31, the plurality of first graph neural networks in the network library may be divided into a plurality of detection groups according to a certain rule. The detection groups may contain equal or unequal numbers of first graph neural networks, but each detection group should include at least one.
In a special case, each detection group comprises only one first graph neural network; that is, there are as many detection groups as there are first graph neural networks in the network library.
In step S32, the detection groups detect the first computation graph using multiple operation units that run in parallel. Each operation unit runs one first graph neural network, so the parallel operation units amount to multiple first graph neural networks detecting the first computation graph in parallel, each determining whether the graph contains a specific operator or structure, i.e., a first sub-graph, to which its group's networks are sensitive.
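One possible arrangement of step S32 is sketched below, assuming each detector is a callable with the detect_subgraphs interface from the earlier sketch; threads stand in for the parallel operation units:

```python
from concurrent.futures import ThreadPoolExecutor

def detect_with_groups(graph, detection_groups):
    """detection_groups: list of detection groups, each a list of detectors."""
    def run_group(group):
        found = []
        for detector in group:      # one first graph neural network per unit
            found.extend(detector(graph))
        return found

    # One worker per detection group: the groups scan the graph in parallel.
    with ThreadPoolExecutor(max_workers=len(detection_groups)) as pool:
        per_group = list(pool.map(run_group, detection_groups))
    # The union of all groups' detections is the set of first sub-graphs.
    return [hit for hits in per_group for hit in hits]
```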
In some possible implementations, after one detection group finishes detecting the first computation graph and a first sub-graph in it has been determined, the first computation graph may be optimized according to that sub-graph, and the optimized computation graph used as the input to the first graph neural networks of the next detection group, and so on; after the last detection group finishes and the graph is optimized according to its detection result, the result obtained is the second computation graph.
That is, within each detection group the inputs of the first graph neural networks are the same, while the inputs of different detection groups differ: each group receives the computation graph optimized according to the output of the previous group's first graph neural networks (the first detection group receives the first computation graph).
In some possible implementations, the first computation graph may instead be optimized only after all the detection groups have completed detection, according to the first sub-graphs determined by all the groups, to obtain the second computation graph.
Because each first graph neural network is sensitive to only one or a few specific operators or structures, it is structurally simpler, easier to train, and more accurate than a general-purpose graph neural network. At the same time, running the operation units in parallel offsets the drawback of using many first graph neural networks, namely an overly long total detection time; the overall detection time of all the first graph neural networks is shortened and detection efficiency is improved.
FIG. 4 is a flowchart of the step of detecting the first computation graph using a graph neural network sensitive to a plurality of specific operators or structures. Referring to FIG. 4, step S22 may include step S41.
In step S41, the first computation graph is detected by the second graph neural network, and at least one first sub-graph in the first computation graph is determined.
The second graph neural network may be a more general graph neural network that is sensitive to multiple specific operators or structures. After the first computation graph is input into it, the second graph neural network can detect all the specific operators or structures in the first computation graph and output them in the form of first sub-graphs.
Because the second graph neural network is sensitive to multiple specific operators or structures, its structure and training process are necessarily more complex than those of a first graph neural network, and a single detection takes longer. However, it needs to detect only once, i.e., to read the first computation graph only once, so it occupies less read memory. And although one detection is slower than with a first graph neural network, all the first sub-graphs are determined in that single detection, so the total time for detecting the first computation graph with the second graph neural network may be shorter than with multiple first graph neural networks.
In any case, the process of optimizing the first computation graph according to a first sub-graph is the same, whether each detection group's first graph neural networks detect and the optimization result serves as the next group's input, or all the detection groups detect and the first computation graph is optimized according to all their results, or the second graph neural network detects the first computation graph and it is optimized according to that detection result.
FIG. 5 is a flowchart of the step of optimizing the first computation graph according to the first sub-graph. Referring to FIG. 5, step S23 may include steps S51, S52, and S53.
In step S51, determining an optimizer corresponding to the first sub-graph according to the identifier of the first sub-graph;
in step S52, calling an optimizer corresponding to the first sub-graph, and optimizing the first sub-graph to obtain a second sub-graph;
in step S53, the second sub-graph is used to replace the corresponding first sub-graph in the first computation graph, so as to obtain a second computation graph.
The identifier of the first sub-graph may be an index of the specific operator or structure, from which the optimizer or optimization result corresponding to that operator or structure can be determined. For example, the output of the graph neural network may directly be the index of the optimizer corresponding to the specific operator or structure, with which that optimizer can be found in the optimization library.
The optimization library may be composed of multiple optimizers, each used to optimize one specific operator or structure. Since the first sub-graph is a specific operator or structure present in the first computation graph, the corresponding optimizer can be used to optimize the first sub-graph.
In some possible implementations, there may be more than one first sub-graph. For each first sub-graph: in step S51, the optimizer corresponding to the first sub-graph is determined from the optimization library according to the sub-graph's identifier; in step S52, that optimizer is called from the optimization library to optimize the first sub-graph, the optimization result being the second sub-graph corresponding to it; and in step S53, the corresponding first sub-graph in the first computation graph is replaced with that second sub-graph.
After these steps have been executed for all the first sub-graphs, the computation graph finally obtained is the second computation graph.
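Under the toy structures assumed in the earlier sketches, steps S51 to S53 might look as follows; re-wiring the edges between the spliced-in second sub-graph and the rest of the graph is elided:

```python
def optimize_graph(graph, detections, optimizer_library):
    """detections: [(node names of a first sub-graph, optimizer index), ...]"""
    for node_names, opt_index in detections:
        optimizer = optimizer_library[opt_index]         # S51: resolve by identifier
        first_sub = {n: graph.nodes[n] for n in node_names}
        second_sub = optimizer(first_sub)                # S52: optimize the sub-graph
        for name in node_names:                          # S53: splice the result in
            graph.nodes.pop(name, None)
        graph.nodes.update(second_sub)
    return graph                                         # the second computation graph
```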
During optimization, different first sub-graphs may intersect. For example, suppose first sub-graph A and first sub-graph B share operator C. After the optimizer corresponding to A optimizes it, operator C is fused with other operators; once the second sub-graph corresponding to A replaces A in the first computation graph, first sub-graph B no longer exists in the optimized graph, so the second sub-graph corresponding to B cannot be used to optimize the computation graph.
In some possible implementations, when multiple first sub-graphs are determined, the first computation graph is optimized according to each first sub-graph and the preset priority corresponding to it, to obtain the optimized second computation graph.
FIG. 6 is a flowchart of the step of optimizing the first computation graph according to each first sub-graph and the preset priority corresponding to it. Referring to FIG. 6, this step may include steps S61 and S62.
In step S61, in a case where the plurality of first sub-graphs intersect, priorities of the plurality of first sub-graphs are respectively determined;
in step S62, the first computation graph is optimized according to the priority order of the plurality of first subgraphs.
For example, in descending order of the priorities corresponding to the first sub-graphs, the optimizer corresponding to each first sub-graph optimizes it to obtain a second sub-graph, which then replaces the first sub-graph. If, when a given first sub-graph's turn comes, it is found to have disappeared from the partially optimized computation graph as a result of earlier optimization, that first sub-graph is not used to optimize the computation graph.
An intersection between first sub-graphs means that the sub-graphs share an operator or a structure.
Suppose first sub-graph A and first sub-graph B share operator C. After the optimizer corresponding to A optimizes it, operator C is fused with other operators, and once the second sub-graph corresponding to A replaces A in the first computation graph, the optimized graph no longer contains first sub-graph B. Since the preset priority of A is higher than that of B, the optimization of the first computation graph using B is abandoned.
Meanwhile, suppose first sub-graph B and first sub-graph D share operator E, which the optimizer corresponding to B would fuse with other operators. Because B intersects A and its optimization was abandoned, the "conflict" between B and D disappears, and first sub-graph D can still be used to optimize the first computation graph.
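The priority rule of steps S61 and S62 can be layered on top of the previous sketch; priorities are assumed to accompany each detection, and a first sub-graph is skipped once any of its nodes has been consumed by a higher-priority rewrite:

```python
def optimize_with_priority(graph, detections, optimizer_library):
    """detections: [(node names, optimizer index, priority), ...]"""
    for node_names, opt_index, _ in sorted(detections, key=lambda d: -d[2]):
        if not all(name in graph.nodes for name in node_names):
            continue  # the sub-graph has disappeared: abandon this optimization
        optimize_graph(graph, [(node_names, opt_index)], optimizer_library)
    return graph
```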
FIGS. 7 and 8 are schematic diagrams of the overall flow of a computation graph optimization method according to an embodiment of the present disclosure.
Referring to FIGS. 7 and 8, the first computation graph is input into the graph neural network, which detects it and identifies a specific structure or operator in it. The optimizer corresponding to that structure or operator is then invoked from the optimization library to optimize it, yielding the optimization result corresponding to the specific operator or structure.
The graph neural network that detects the first computation graph may be the plurality of first graph neural networks in a network library as shown in FIG. 7, or a general-purpose graph neural network, i.e., the second graph neural network, as shown in FIG. 8.
As described above, the optimization library is composed of a plurality of optimizers, and the optimizer corresponding to a specific structure or operator can be obtained from the library through its identifier.
After the optimization result corresponding to the specific operator or structure is obtained, it may be used to replace that operator or structure in the first computation graph, yielding the optimization result corresponding to the first computation graph, i.e., the second computation graph.
In some possible implementations, after the second computation graph is obtained, the second computation graph is compiled, and the compilation result is loaded for use.
FIG. 9 is a flowchart illustrating the steps of compiling the second computation graph and loading the compilation result for use. Referring to FIG. 9, compiling the second computation graph and loading the compilation result includes steps S91 and S92.
In step S91, the second computation graph is compiled to obtain a compilation result of the second computation graph.
In step S92, the compilation result is loaded to the plurality of processing cores of the many-core system so that the plurality of processing cores execute the processing task corresponding to the second computation graph.
That is, the first computation graph may be the intermediate representation corresponding to a task loaded on the many-core system; for example, the first computation graph is an intermediate representation in graph form constructed from a neural network to be loaded on the many-core system, and the processing cores of the many-core system cannot run the computation graph directly.
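A rough sketch of steps S91 and S92 under the same assumptions; compile_graph and the per-core load interface are hypothetical placeholders, not a real many-core SDK:

```python
def compile_graph(graph, n_cores=4):
    """Placeholder compiler: partition the node names across cores round-robin."""
    parts = [[] for _ in range(n_cores)]
    for i, name in enumerate(graph.nodes):
        parts[i % n_cores].append(name)
    return parts

def deploy(second_graph, cores):
    artifacts = compile_graph(second_graph, len(cores))  # S91: compile
    for core, artifact in zip(cores, artifacts):         # S92: load per core
        core.load(artifact)  # the cores then execute the corresponding task
```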
It is understood that the above method embodiments of the present disclosure may be combined with one another to form combined embodiments without departing from principle or logic; for reasons of space, the details are not repeated here. Those skilled in the art will appreciate that, in the methods of the specific embodiments above, the specific order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the present disclosure also provides a computation graph optimization apparatus, an electronic device, and a computer-readable storage medium, each of which can be used to implement any computation graph optimization method provided by the present disclosure; for the corresponding technical solutions, refer to the descriptions in the method sections, which are not repeated here.
FIG. 10 is a block diagram of a computation graph optimization apparatus according to an embodiment of the present disclosure.
Referring to FIG. 10, an embodiment of the present disclosure provides a computation graph optimization apparatus, including:
the obtaining module is used for obtaining a first computation graph to be optimized;
the detection module is used for detecting the first computation graph through a pre-trained graph neural network and determining a first sub-graph to be optimized in the first computation graph;
and the optimization module is used for optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph.
FIG. 11 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Referring to FIG. 11, an embodiment of the present disclosure provides an electronic device including: at least one processor 1101; at least one memory 1102; and one or more I/O interfaces 1103 connected between the processor 1101 and the memory 1102. The memory 1102 stores one or more computer programs executable by the at least one processor 1101, and the one or more computer programs are executed by the at least one processor 1101 to enable the at least one processor 1101 to perform the computation graph optimization method described above.
The disclosed embodiments also provide a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor/processing core, implements the computational graph optimization method described above. The computer readable storage medium may be a volatile or non-volatile computer readable storage medium.
Embodiments of the present disclosure also provide a computer program product, including computer-readable code or a non-transitory computer-readable storage medium carrying computer-readable code; when the code runs in a processor of an electronic device, the processor performs the computation graph optimization method described above.
It will be understood by those of ordinary skill in the art that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable storage media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable program instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, Random Access Memory (RAM), Read Only Memory (ROM), Erasable Programmable Read Only Memory (EPROM), Static Random Access Memory (SRAM), flash memory or other memory technology, portable compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer. In addition, communication media typically embodies computer readable program instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), can execute the computer-readable program instructions, utilizing their state information to personalize the electronic circuitry and thereby implement aspects of the present disclosure.
The computer program product described herein may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, features, characteristics and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics and/or elements described in connection with other embodiments, unless expressly stated otherwise, as would be apparent to one skilled in the art. Accordingly, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the disclosure as set forth in the appended claims.

Claims (11)

1. A computational graph optimization method, comprising:
acquiring a first computation graph to be optimized;
detecting the first computational graph through a pre-trained graph neural network, and determining a first sub-graph to be optimized in the first computational graph;
and optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph.
2. The method of claim 1, wherein the graph neural network comprises a plurality of first graph neural networks in a network library;
the detecting the first computational graph through a pre-trained graph neural network to determine a first sub-graph to be optimized in the first computational graph includes:
dividing a plurality of first graph neural networks in the network library into a plurality of detection groups, each detection group comprising at least one graph neural network;
and detecting the first computation graph with each detection group, respectively, to determine at least one first sub-graph in the first computation graph.
3. The method of claim 1, wherein the graph neural network comprises a second graph neural network;
the detecting the first computational graph through a pre-trained graph neural network to determine a first sub-graph to be optimized in the first computational graph includes:
and detecting the first computational graph through the second graph neural network, and determining at least one first sub-graph in the first computational graph.
4. The method according to claim 2 or 3, wherein the optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph comprises:
and under the condition of determining a plurality of first subgraphs, optimizing the first computation graph according to each first subgraph and the preset priority corresponding to the first subgraph to obtain an optimized second computation graph.
5. The method of claim 4, wherein the optimizing the first computation graph according to each first sub-graph and the preset priority corresponding to the first sub-graph comprises:
respectively determining the priorities of the plurality of first sub-graphs in a case where the plurality of first sub-graphs intersect;
and optimizing the first computation graph according to the priority order of the plurality of first sub-graphs.
6. The method according to claim 1, wherein the optimizing the first computational graph according to the first sub-graph to obtain an optimized second computational graph comprises:
determining an optimizer corresponding to the first sub-graph according to the identification of the first sub-graph;
calling an optimizer corresponding to the first sub-graph, and optimizing the first sub-graph to obtain a second sub-graph;
and replacing the corresponding first subgraph in the first computational graph by using the second subgraph to obtain the second computational graph.
7. The method of claim 1, further comprising:
compiling the second computation graph to obtain a compilation result of the second computation graph;
and loading the compiling result to a plurality of processing cores of a many-core system so that the plurality of processing cores execute the processing tasks corresponding to the second computational graph.
8. The method according to any of claims 1-7, wherein the first computational graph is an intermediate representation in the form of a graph constructed from a neural network;
the neural network is used for executing processing tasks, and the processing tasks comprise any one of image processing tasks, voice processing tasks, text processing tasks and video processing tasks.
9. A computation graph optimization apparatus, comprising:
the obtaining module is used for obtaining a first computation graph to be optimized;
the detection module is used for detecting the first computational graph through a pre-trained graph neural network and determining a first sub-graph to be optimized in the first computational graph;
and the optimization module is used for optimizing the first computation graph according to the first sub-graph to obtain an optimized second computation graph.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores one or more computer programs executable by the at least one processor to enable the at least one processor to perform the optimization method of any one of claims 1-8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the optimization method according to any one of claims 1 to 8.
CN202210617134.2A 2022-06-01 2022-06-01 Calculation graph optimization method and device, electronic equipment and computer readable medium Pending CN114819106A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210617134.2A CN114819106A (en) 2022-06-01 2022-06-01 Calculation graph optimization method and device, electronic equipment and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210617134.2A CN114819106A (en) 2022-06-01 2022-06-01 Calculation graph optimization method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN114819106A true CN114819106A (en) 2022-07-29

Family

ID=82518293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210617134.2A Pending CN114819106A (en) 2022-06-01 2022-06-01 Calculation graph optimization method and device, electronic equipment and computer readable medium

Country Status (1)

Country Link
CN (1) CN114819106A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374914A (en) * 2022-10-24 2022-11-22 北京白海科技有限公司 Distributed training method, parallel deep learning framework and electronic equipment

Similar Documents

Publication Publication Date Title
US9715663B2 (en) Predicting application performance on hardware accelerators
CN110502227B (en) Code complement method and device, storage medium and electronic equipment
CN111966334A (en) Service processing method, device and equipment
US9134975B1 (en) Determining which computer programs are candidates to be recompiled after application of updates to a compiler
CN110888756A (en) Diagnostic log generation method and device
CN104375875A (en) Method for compiler optimization of applications and compiler
CN114841323A (en) Processing method and processing device of neural network computation graph
CN114819106A (en) Calculation graph optimization method and device, electronic equipment and computer readable medium
US11656869B2 (en) Using big code to construct code conditional truth tables
US11500619B1 (en) Indexing and accessing source code snippets contained in documents
US20210041991A1 (en) System and method for implementing a self service machine learning framework
US10754630B2 (en) Build-time code section-specific compiler selection
CN112990461A (en) Method and device for constructing neural network model, computer equipment and storage medium
CN110968500A (en) Test case execution method and device
CN115951916A (en) Component processing method and device, electronic equipment and storage medium
CN114548407A (en) Hierarchical target oriented cause and effect discovery method and device and electronic equipment
CN114047923A (en) Error code positioning method, device, storage medium and electronic equipment
CN114861009A (en) Calculation graph optimization method and device, electronic equipment and computer-readable storage medium
CN115729797A (en) Code similarity function detection method and device, electronic equipment and storage medium
CN113031952A (en) Method and device for determining execution code of deep learning model and storage medium
US20220253723A1 (en) Amplifying source code signals for machine learning
CN116974922B (en) Performance analysis method, device, equipment and storage medium of deep learning model
CN112230935B (en) Privacy risk detection method, device and equipment in application
CN118093143B (en) Data scheduling method and device in large language model decoding stage
US11656855B2 (en) Identify, suggest, and auto-implement best practices for requirement specific software development

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination