WO2021037196A1

WO2021037196A1 - Smart contract code vulnerability detection method and apparatus, computer device and storage medium

Info

Publication number: WO2021037196A1
Application number: PCT/CN2020/112050
Authority: WO
Inventors: 邱炜伟; 李伟; 李启雷; 张帅; 匡立中
Original assignee: 杭州趣链科技有限公司
Priority date: 2019-08-28
Filing date: 2020-08-28
Publication date: 2021-03-04
Also published as: CN110543419B; CN110543419A

Abstract

The present application relates to a smart contract code vulnerability detection method and apparatus, a computer device, and a storage medium. The method comprises: acquiring an ordered list of the syntax nodes of training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in an abstract syntax tree generated from the training smart contract source code; inputting the node vector into a recurrent neural network, obtaining the output of the global max pooling layer in the recurrent neural network, and using that output as the intermediate representation of the smart contract source code; and inputting the intermediate representation into a random forest classifier to train the random forest classifier, then detecting smart contract code vulnerabilities with the trained random forest classifier. The method can predict and locate code vulnerabilities more flexibly and accurately and is more sensitive to new code vulnerabilities, without requiring developers to formulate and add corresponding rules or formal specifications in time.

Description

Smart contract code vulnerability detection method, apparatus, computer device, and storage medium

Related application

This application claims priority to Chinese patent application No. 201910802157.9, filed on August 28, 2019 and entitled "A method for detecting smart contract code vulnerabilities based on deep learning technology", the entire content of which is incorporated herein by reference.

Technical field

This application relates to the field of blockchain technology, and in particular to a smart contract code vulnerability detection method, apparatus, computer device, and storage medium.

Background

Blockchain is a distributed data management technology that achieves decentralization through data encryption, timestamps, and a distributed consensus mechanism. It is traceable, tamper-proof, and highly available. A smart contract is a computer protocol intended to propagate, verify, or execute a contract digitally; it allows trusted transactions without a third party, and these transactions are traceable and irreversible. Blockchain technology provides smart contracts with a programmable digital system. Once a smart contract has been successfully deployed on the blockchain it cannot be changed, and a security vulnerability in it can cause huge losses, so detecting code vulnerabilities in smart contracts is essential.

The current mainstream solutions for smart contract vulnerability detection use security-rule-based techniques or formal verification. This means that when a new code vulnerability is disclosed, developers must promptly formulate and add corresponding rules or formal specifications to update the vulnerability detection system. Such rule-set-based detection depends mainly on the accuracy and completeness of the rule set, and the detection process is relatively time-consuming.

Summary of the invention

To overcome the shortcomings of the prior art, the purpose of this application is to provide a smart contract code vulnerability detection method, apparatus, computer device, and storage medium.

According to a first aspect of this application, a smart contract code vulnerability detection method is provided. The method includes: obtaining an ordered list of the syntax nodes of training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code; inputting the node vector into a recurrent neural network, obtaining the output of the global max pooling layer in the recurrent neural network, and using that output as the intermediate representation of the training smart contract source code; and inputting the intermediate representation into a random forest classifier to train the random forest classifier, then detecting smart contract code vulnerabilities with the trained random forest classifier.

In some of the embodiments, obtaining the ordered list of the syntax nodes of the training smart contract source code to obtain the node vector includes: parsing the training smart contract source code with a parsing tool to generate an abstract syntax tree; and obtaining the ordered list of the syntax nodes in the abstract syntax tree, designating the ordered list as a one-dimensional vector, and determining the node vector from the one-dimensional vector.

In some of the embodiments, obtaining the ordered list of the syntax nodes in the abstract syntax tree includes: serializing the syntax nodes in the abstract syntax tree by depth-first traversal to obtain the ordered list.

In some of the embodiments, after obtaining the ordered list of the syntax nodes in the abstract syntax tree and designating the ordered list as a one-dimensional vector, the method includes: performing syntax marking on the one-dimensional vector, wherein the syntax marking maps each syntax node to an integer and assembles the integers corresponding to the syntax nodes in the one-dimensional vector into an array; the one-dimensional vector after syntax marking is designated as the node vector.

In some of the embodiments, after the syntax marking is performed on the one-dimensional vector, the method includes: padding or truncating the array, wherein the array is truncated at the end or zero-padded at the end so that its length equals a preset length.

In some of the embodiments, the recurrent neural network is a bidirectional long short-term memory neural network.

According to a second aspect of this application, a smart contract code vulnerability detection method based on deep learning technology is provided. The method specifically includes the following steps: S1: serialize and preprocess the syntax nodes of the smart contract source code to obtain a node vector; S2: use the node vector obtained in S1 as the input of a bidirectional long short-term memory network, and take the output of the global max pooling layer in the neural network as the intermediate representation of the smart contract source code; use the intermediate representation obtained in S2 as the input of a random forest classifier, train the random forest classifier, and use the trained classifier to detect vulnerabilities in new smart contract code.

In some of the embodiments, S1 is specifically: S1.1: parse the smart contract source code string with a parsing tool to generate an abstract syntax tree; S1.2: traverse the abstract syntax tree depth-first and serialize it into a one-dimensional vector composed of syntax nodes; S1.3: preprocess the one-dimensional vector, generating the preprocessed node vector through syntax marking and padding/truncation; the code structure and semantics must be kept intact during preprocessing.

In some of the embodiments, the one-dimensional vector in S1.2 is an ordered list composed of syntax nodes, and each one-dimensional vector represents a corresponding smart contract function.

In some of the embodiments, the padding/truncation length c in S1.3 should lie between the length of the longest one-dimensional vector list and the length of the shortest one-dimensional vector list.

In some of the embodiments, the padding/truncation rule in S1.3 is as follows: a list longer than the padding/truncation length c is truncated at the end so that its length equals c; a list shorter than c is zero-padded at the end so that its length equals c.

According to a third aspect of this application, a smart contract code vulnerability detection apparatus is provided. The apparatus includes a vector module, a neural network module, and a classifier module. The vector module is configured to obtain an ordered list of the syntax nodes of training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code. The neural network module is configured to input the node vector into a recurrent neural network, obtain the output of the global max pooling layer in the recurrent neural network, and use that output as the intermediate representation of the smart contract source code. The classifier module is configured to input the intermediate representation into a random forest classifier, train the random forest classifier, and detect smart contract code vulnerabilities with the trained random forest classifier.

According to a fourth aspect of this application, a smart contract code vulnerability detection apparatus is provided. The apparatus includes a vector module, a neural network module, and a classifier module. The vector module is configured to serialize and preprocess the syntax nodes of the smart contract source code to obtain a node vector. The neural network module is configured to use the node vector as the input of a bidirectional long short-term memory network and take the output of the global max pooling layer in the neural network as the intermediate representation of the smart contract source code. The classifier module is configured to use the intermediate representation as the input of a random forest classifier, train the random forest classifier, and use the trained classifier to detect vulnerabilities in new smart contract code.

According to a fifth aspect of this application, a computer device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the above-mentioned configuration-block-based blockchain multi-level signature method.

According to a sixth aspect of this application, a computer-readable storage medium is provided, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the above-mentioned configuration-block-based blockchain multi-level signature method.

The above smart contract code vulnerability detection method, apparatus, computer device, and storage medium obtain an ordered list of the syntax nodes of training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code; input the node vector into a recurrent neural network, obtain the output of the global max pooling layer in the recurrent neural network, and use that output as the intermediate representation of the smart contract source code; and input the intermediate representation into a random forest classifier, train the random forest classifier, and detect smart contract code vulnerabilities with the trained random forest classifier. This makes it possible to predict and locate code vulnerabilities more flexibly and accurately and to be more sensitive to new code vulnerabilities, without requiring developers to formulate and add corresponding rules or formal specifications in time.

Description of the drawings

To better describe and illustrate the embodiments and/or examples of the inventions disclosed herein, reference may be made to one or more drawings. The additional details or examples used to describe the drawings should not be regarded as limiting the scope of any of the disclosed inventions, the presently described embodiments and/or examples, or the best mode of these inventions as presently understood.

Fig. 1 is a flowchart of a smart contract code vulnerability detection method according to an embodiment of this application.

Fig. 2 is a flowchart of obtaining the node vector in a smart contract code vulnerability detection method according to an embodiment of this application.

Fig. 3 is a flowchart of a smart contract code vulnerability detection method according to another embodiment of this application.

Detailed description

To make the purpose, technical solutions, and advantages of this application clearer, this application is described and illustrated below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain this application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments provided in this application, without creative effort, fall within the scope of protection of this application.

Obviously, the drawings in the following description are only some examples or embodiments of this application; a person of ordinary skill in the art can also apply this application to other similar scenarios on the basis of these drawings without creative effort. It should further be understood that, although the effort made in such a development process may be complex and lengthy, for a person of ordinary skill in the art related to the content disclosed in this application, changes in design, manufacture, or production made on the basis of the technical content disclosed here are merely conventional technical means, and should not be taken to mean that the disclosure of this application is insufficient.

Reference to an "embodiment" in this application means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of this application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. A person of ordinary skill in the art understands, explicitly and implicitly, that the embodiments described in this application may be combined with other embodiments where no conflict arises.

Unless otherwise defined, the technical or scientific terms used in this application have the ordinary meaning understood by a person of ordinary skill in the technical field to which this application belongs. Words such as "a", "an", "one", and "the" used in this application do not limit quantity and may denote the singular or the plural. The terms "include", "comprise", "have", and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a series of steps or modules (units) is not limited to the listed steps or units, but may also include steps or units that are not listed, or other steps or units inherent to such a process, method, product, or device. Words such as "connected", "linked", and "coupled" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect. "A plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first", "second", "third", and the like merely distinguish similar objects and do not denote a specific ordering of the objects.

In one embodiment, Fig. 1 is a flowchart of a smart contract code vulnerability detection method according to an embodiment of this application. As shown in Fig. 1, the method includes the following steps. Step S110: obtain an ordered list of the syntax nodes of the training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code. An abstract syntax tree (AST) is an abstract representation of the syntactic structure of source code. It represents the syntactic structure of a programming language as a tree, and each node of the tree represents a construct in the source code. The training smart contract source code used for model training is parsed to generate an abstract syntax tree. Parsing includes reading the source code and grouping its characters into tokens according to predetermined rules, while removing whitespace, comments, and the like; in the end, the entire source code is split into a list of tokens. During lexical analysis the source code is read character by character, a process that can be called scanning; when a space, an operator, or a special symbol is encountered, the parser considers the current word complete. The parser then converts the array produced by lexical analysis into a tree. For each syntax node in the abstract syntax tree, an ordered list from the root node to the current syntax node can be obtained from the tree structure, and this ordered list is designated as the node vector. In this step, the smart contract source code used for training includes both source code with vulnerabilities and source code without vulnerabilities.
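The scanning step described above can be illustrated with a minimal sketch. The token pattern and the `tokenize` helper are illustrative assumptions and do not reproduce the grammar of any real smart contract language or parser generator; the sketch only shows characters being grouped into tokens while whitespace and comments are dropped.

```python
import re

# Hypothetical token pattern (an assumption for illustration only).
# Comments and whitespace are matched so that they can be skipped.
TOKEN_PATTERN = re.compile(
    r"(?P<comment>//[^\n]*|/\*.*?\*/)"  # line comments and block comments
    r"|(?P<space>\s+)"                   # whitespace
    r"|(?P<ident>[A-Za-z_]\w*)"          # identifiers and keywords
    r"|(?P<number>\d+)"                  # integer literals
    r"|(?P<symbol>.)",                   # any other single character
    re.DOTALL,
)


def tokenize(source):
    """Split source text into a list of tokens, dropping whitespace and comments."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(source):
        if match.lastgroup in ("comment", "space"):
            continue
        tokens.append(match.group())
    return tokens


example = "function pay() public { x = 1; } // send"
tokens = tokenize(example)
print(tokens)
```

A real parser would go on to build the tree from this token list; that stage is left to the parsing tool.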

Step S120: input the node vector into a recurrent neural network, obtain the output of the global max pooling layer in the recurrent neural network, and use that output as the intermediate representation of the training smart contract source code. When learning long sequences, a recurrent neural network suffers from vanishing and exploding gradients and cannot capture long-span nonlinear relationships. To address this long-term dependency problem, a commonly used recurrent neural network such as a bidirectional recurrent neural network (Bidirectional RNN, Bi-RNN) or a long short-term memory network (Long Short-Term Memory network, LSTM) is selected to process the node vector. In some embodiments, the node vector is processed by an LSTM network. Because some smart contract vulnerabilities are embedded in the syntactic structure of the source code, constructing an abstract syntax tree and traversing it depth-first preserves the structural and semantic information of the source code to a certain extent. This representation makes full use of the strengths of a bidirectional LSTM network in processing time-series data. The node vector obtained in step S110 is input into the recurrent neural network, and the output of the global max pooling layer in the recurrent neural network is taken as the intermediate representation of the training smart contract source code.
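As a rough illustration of the pooling operation only, the sketch below applies global max pooling to a small matrix of per-time-step outputs. The values and the `global_max_pool` helper are hypothetical; in the method itself these outputs would come from the recurrent layer, which is not modelled here.

```python
def global_max_pool(step_outputs):
    """Return the element-wise maximum over all time steps.

    step_outputs is a list of equal-length feature vectors, one per time step.
    """
    if not step_outputs:
        raise ValueError("need at least one time step")
    return [max(column) for column in zip(*step_outputs)]


# Hypothetical recurrent-layer outputs: 3 time steps, 4 features per step.
step_outputs = [
    [0.1, -0.5, 0.3, 0.0],
    [0.7, -0.2, 0.1, 0.4],
    [0.2, -0.9, 0.6, 0.4],
]
intermediate = global_max_pool(step_outputs)
print(intermediate)  # [0.7, -0.2, 0.6, 0.4]
```

The pooled vector has one entry per feature regardless of how many time steps were processed, which is what lets it serve as a fixed-size input to a downstream classifier.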

Step S130: input the intermediate representation into a random forest classifier, train the random forest classifier, and detect smart contract code vulnerabilities with the trained random forest classifier. The intermediate representation obtained in step S120 is input into the random forest classifier and used to train it. Once training has passed, the trained random forest classifier is used to detect vulnerabilities in the smart contract code under test. It should be noted that, because the output of a random forest classifier is usually a judgment of "vulnerable" or "not vulnerable", an appropriate granularity can be chosen during training to measure the level of a vulnerability, and vulnerability libraries and code libraries can be surveyed and vulnerabilities identified. In one embodiment, because the classifier can only indicate that a certain code fragment of the contract is vulnerable and cannot give the specifics and content of the vulnerability, description information can be prepared in advance for each type of vulnerability, and a one-to-one correspondence is then established between the description information and the vulnerabilities detected by the classifier; this process is vulnerability identification. Optionally, the vulnerability description information may include the vulnerability level, the cause of the problem, the solution, and so on. This information requires a manual survey of existing vulnerability libraries and code libraries, and measuring the vulnerability level requires setting a corresponding granularity, such as possible economic loss, loss of runtime performance, or scope of impact.
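The one-to-one correspondence between classifier outputs and prepared descriptions can be pictured as a simple lookup table. The sketch below is an assumption-laden illustration: the class labels, the `VulnerabilityInfo` fields, and the example entries are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnerabilityInfo:
    """Description prepared in advance for one vulnerability type."""
    level: str
    cause: str
    solution: str


# Hypothetical label-to-description table; label 0 is assumed to mean "no vulnerability".
DESCRIPTIONS = {
    1: VulnerabilityInfo(
        level="high",
        cause="external call made before the contract state is updated",
        solution="update the state before making the external call",
    ),
    2: VulnerabilityInfo(
        level="medium",
        cause="arithmetic result not checked for overflow",
        solution="use checked arithmetic",
    ),
}


def identify(label):
    """Return the prepared description for a predicted label, or None if there is none."""
    return DESCRIPTIONS.get(label)


print(identify(1))
print(identify(0))  # None
```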

In steps S110 to S130, an ordered list of the syntax nodes of the training smart contract source code is obtained to produce a node vector, wherein the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code; the node vector is input into a recurrent neural network, the output of the global max pooling layer in the recurrent neural network is obtained, and that output is used as the intermediate representation of the smart contract source code; the intermediate representation is input into a random forest classifier, the random forest classifier is trained, and smart contract code vulnerabilities are detected with the trained random forest classifier. By training a neural network model on smart contract code vulnerabilities, the deep-learning-based smart contract code vulnerability detection method of the present invention can predict and locate code vulnerabilities more flexibly and accurately.

In some of the embodiments, Fig. 2 is a flowchart of obtaining the node vector in a smart contract code vulnerability detection method according to an embodiment of this application. As shown in Fig. 2, obtaining the ordered list of the syntax nodes of the training smart contract source code to obtain the node vector includes: step S210, parsing the training smart contract source code with a parsing tool to generate an abstract syntax tree; and step S220, obtaining the ordered list of the syntax nodes in the abstract syntax tree, designating the ordered list as a one-dimensional vector, and determining the node vector from the one-dimensional vector. In steps S210 to S220, after the training smart contract source code is obtained, a parsing tool such as ANTLR4 or Yacc is used, together with a written list of matching rules that conform to the smart contract grammar, to parse the smart contract source code string and generate an abstract syntax tree. Each syntax node in the abstract syntax tree is serialized, and each syntax node is represented by a one-dimensional vector. The serialized vector preserves the structural and semantic information of the nodes well. The one-dimensional vectors correspond one-to-one with the functions in the smart contract; that is, each one-dimensional vector can represent the corresponding smart contract function.

In some of the embodiments, obtaining the ordered list of the syntax nodes in the abstract syntax tree includes: serializing the syntax nodes in the abstract syntax tree by depth-first traversal to obtain the ordered list. A graph is usually searched in one of two ways: depth-first traversal or breadth-first traversal. Breadth-first traversal processes vertices layer by layer, so the vertices closest to the starting point are visited first and the farthest ones last. Depth-first traversal follows each possible branch as deep as it can go before backtracking, and each node is visited only once. In this embodiment, serializing the syntax nodes by depth-first traversal allows the traversal to complete with less memory, saving memory resources during the traversal.
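A minimal sketch of depth-first serialization is given below. The `Node` class, the node kind names, and the choice of pre-order (a node before its children) are assumptions made for illustration; the text above specifies depth-first traversal but not a concrete tree type or visiting order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node: a kind label plus an ordered list of children."""
    kind: str
    children: List["Node"] = field(default_factory=list)


def serialize_depth_first(root):
    """Return node kinds in pre-order, visiting every node exactly once."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node.kind)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return order


tree = Node("FunctionDefinition", [
    Node("public"),
    Node("Block", [
        Node("Assignment", [Node("Identifier"), Node("NumberLiteral")]),
    ]),
])
sequence = serialize_depth_first(tree)
print(sequence)
```

An explicit stack is used instead of recursion so that very deep trees do not hit Python's recursion limit.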

In some of the embodiments, after the ordered list of the syntax nodes in the abstract syntax tree is obtained and designated as a one-dimensional vector, the method further includes the following step: performing syntax marking on the one-dimensional vector, wherein the syntax marking maps each syntax node to an integer and assembles the integers corresponding to the syntax nodes in the one-dimensional vector into an array; the one-dimensional vector after syntax marking is designated as the node vector. Syntax marking means mapping each constituent node of the one-dimensional vector to an integer, for example "public" to the number 1 and "private" to the number 2, and then expressing the one-dimensional vector as an array that follows the order of the nodes in the list; the array may be a numeric string obtained by concatenating the integers. Syntax marking can improve the efficiency with which the vectors are processed.
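The mapping from syntax nodes to integers can be sketched as follows. The helper names and the first-seen numbering scheme are assumptions for illustration; reserving 0 is also an assumption, made here so that the value stays free for zero padding.

```python
def build_vocabulary(sequences):
    """Assign each distinct node a positive integer in order of first appearance."""
    vocabulary = {}
    for sequence in sequences:
        for node in sequence:
            if node not in vocabulary:
                # Start at 1; 0 is kept free for padding (an assumption).
                vocabulary[node] = len(vocabulary) + 1
    return vocabulary


def mark(sequence, vocabulary):
    """Replace every node in the sequence with its integer code."""
    return [vocabulary[node] for node in sequence]


sequences = [
    ["public", "private", "function"],
    ["private", "function", "return"],
]
vocabulary = build_vocabulary(sequences)
marked = [mark(sequence, vocabulary) for sequence in sequences]
print(vocabulary)  # {'public': 1, 'private': 2, 'function': 3, 'return': 4}
print(marked)      # [[1, 2, 3], [2, 3, 4]]
```

The same vocabulary has to be applied to every vector in a run, otherwise identical nodes would receive different codes.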

In some embodiments, after the syntax marking of the one-dimensional vector, the array is also padded or truncated: the array is truncated at the end or zero-padded at the end so that its length equals a preset length. The purpose of padding and truncation is to give every one-dimensional vector a uniform length. A value c is chosen as the preset array length; a one-dimensional vector shorter than c is padded with zeros at the end, and a one-dimensional vector longer than c is truncated. The padding/truncation length c should lie between the length of the longest one-dimensional vector list and the length of the shortest one-dimensional vector list. The list length of a one-dimensional vector depends on the specific smart contract source code that is input: the vectors are generated from the source code, so the longest and shortest lengths depend on the structure of that code. This approach can further improve the efficiency of vector processing.
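The padding and truncation rule can be illustrated with a short sketch. The function name `pad_or_truncate` and the sample arrays are hypothetical; the behaviour (cut at the end, or append zeros at the end, until the length is c) follows the rule stated above.

```python
from typing import List


def pad_or_truncate(array: List[int], c: int) -> List[int]:
    """Return a copy of `array` whose length is exactly c."""
    if len(array) >= c:
        # Longer than (or equal to) c: drop everything past position c.
        return array[:c]
    # Shorter than c: append zeros at the end.
    return array + [0] * (c - len(array))


short = pad_or_truncate([1, 2, 3], 5)
long_ = pad_or_truncate([1, 2, 3, 4, 5, 6], 4)
```

Choosing c between the shortest and longest list lengths, as the text recommends, means at least one array is padded or left unchanged and at least one is truncated or left unchanged.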

According to another aspect of the present application, a smart contract code vulnerability detection method is provided. FIG. 3 is a flowchart of the smart contract code vulnerability detection method according to another embodiment of the present invention. As shown in FIG. 3, the method includes the following steps.

Step S1: The syntax nodes of the smart contract source code are serialized and preprocessed to obtain a node vector. Step S1 further includes:

S1.1: Using a parsing tool such as ANTLR4 or Yacc, and by writing a matching list that conforms to the syntax rules of the smart contract language, the smart contract source code string is parsed to generate an abstract syntax tree (AST);

S1.2: Each syntax node is serialized by depth-first traversal (DFT), so that the syntax nodes are represented as a one-dimensional vector. The serialized vector preserves the structural and semantic information of the nodes well. Each one-dimensional vector corresponds one-to-one to a function in the smart contract, and each one-dimensional vector can represent the corresponding smart contract function;

S1.3: The one-dimensional vector is preprocessed; this operation consists of syntax marking and padding/truncation. Syntax marking means mapping each component node of the one-dimensional vector to an integer, for example "public" to 1 and "private" to 2, in order to improve processing efficiency. The choice of integers does not have to follow any particular rule, as long as a single conversion rule is used throughout one run of the algorithm: "public" may be represented as 1 or as 100 without affecting the result. However, once one vector has been processed under a given mapping rule, all vectors in that run should be processed under the same rule. Padding and truncation standardize the length of every one-dimensional vector to make the data easier to process. A suitable value c is chosen; vectors shorter than c are padded with zeros at the end, and vectors longer than c are truncated. The padding/truncation length c should lie between the length of the longest one-dimensional vector list and the length of the shortest one-dimensional vector list. The rule is as follows: a list longer than c is truncated at the end so that its length equals c, and a list shorter than c is zero-padded at the end so that its length equals c.

Step S2: An LSTM neural network processes the node information. The node vector obtained in S1 is used as the input of a bidirectional long short-term memory (LSTM) network, and the output of the global max pooling layer in the neural network is taken as the intermediate representation of the smart contract source code. LSTM is an improved recurrent neural network (RNN) that addresses the difficulty plain RNNs have with long-distance dependencies. The present invention uses an LSTM neural network because some smart contract vulnerabilities are embedded in the syntactic structure of the source code, and constructing an abstract syntax tree and traversing it depth-first preserves the structure and semantic information of the source code to a certain extent. This representation can take full advantage of the strengths of a bidirectional LSTM neural network in processing time-series data.
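To make the role of the global max pooling layer concrete, the sketch below shows only the pooling operation, in plain Python. It assumes that a bidirectional LSTM (which in practice would come from a deep learning framework and is not reproduced here) has already produced one output vector per position in the node sequence; the `step_outputs` values are made up for illustration.

```python
from typing import List


def global_max_pool(step_outputs: List[List[float]]) -> List[float]:
    """Collapse a (time steps x features) sequence into one vector.

    For each feature dimension, keep the maximum value seen at any time step.
    """
    if not step_outputs:
        raise ValueError("need at least one time step")
    # zip(*rows) regroups the values by feature dimension.
    return [max(feature_values) for feature_values in zip(*step_outputs)]


# Hypothetical per-step outputs: 3 time steps, 3 features each.
step_outputs = [
    [0.1, -0.4, 0.7],
    [0.9, 0.2, -0.3],
    [0.0, 0.5, 0.6],
]
intermediate = global_max_pool(step_outputs)
```

The pooled vector has a fixed size regardless of the sequence length, which is what allows it to serve as a fixed-length intermediate representation for a downstream classifier.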

Step S3: The intermediate representation of the smart contract source code obtained in S2 is used as the input of a random forest classifier, the random forest classifier is trained, and the trained classifier is used to detect vulnerabilities in new smart contract code. This step requires an appropriate granularity for grading vulnerabilities, as well as surveying vulnerability databases and code repositories and labeling the vulnerabilities.

The above smart contract code vulnerability detection method, which is based on deep learning technology, trains a neural network model on smart contract code vulnerabilities and can therefore predict and locate code vulnerabilities more flexibly and accurately. Compared with current approaches that detect code vulnerabilities using security-rule-based techniques or formal verification, this method is more sensitive to new code vulnerabilities and does not require developers to formulate and add corresponding rules or formal specifications in a timely manner. In addition, as blockchain applications become more widespread, the growing number of available smart contract code data sets also helps greatly in refining the model. With limited experimental data, for example 10-fold cross-validation on a data set of about 3,000 items, the prediction accuracy of the model can ideally reach 85% or even more than 90%.
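For readers unfamiliar with 10-fold cross-validation, the following sketch shows how about 3,000 samples could be divided into ten train/test splits. It is an illustration only: the function name `k_fold_indices` is hypothetical, the samples are split in their original order, and the text does not say whether the data were shuffled or stratified.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(3000, k=10))
```

Each sample appears in exactly one test fold, so every item is used for evaluation once and for training nine times; the reported accuracy would then be an average over the ten folds.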

According to another aspect of the present application, a smart contract code vulnerability detection apparatus is provided, including a vector module, a neural network module, and a classifier module. The vector module is used to obtain an ordered list of syntax nodes of training smart contract source code to obtain a node vector, where the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code. The neural network module is used to input the node vector into a recurrent neural network, obtain the output result of the global max pooling layer in the recurrent neural network, and use the output result as the intermediate representation of the smart contract source code. The classifier module is used to input the intermediate representation into a random forest classifier, train the random forest classifier, and perform smart contract code vulnerability detection with the trained random forest classifier.

The above smart contract code vulnerability detection apparatus obtains an ordered list of syntax nodes of training smart contract source code to obtain a node vector, where the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code; inputs the node vector into a recurrent neural network, obtains the output result of the global max pooling layer in the recurrent neural network, and uses the output result as the intermediate representation of the smart contract source code; and inputs the intermediate representation into a random forest classifier, trains the random forest classifier, and performs smart contract code vulnerability detection with the trained random forest classifier. It can thereby predict and locate code vulnerabilities more flexibly and accurately, is more sensitive to new code vulnerabilities, and does not require developers to formulate and add corresponding rules or formal specifications in a timely manner.

According to another aspect of the present application, a smart contract code vulnerability detection apparatus is provided, including a vector module, a neural network module, and a classifier module. The vector module is used to serialize and preprocess the syntax nodes of the smart contract source code to obtain a node vector. The neural network module is used to take the node vector as the input of a bidirectional long short-term memory network and obtain the output of the global max pooling layer in the neural network as the intermediate representation of the smart contract source code. The classifier module is used to take the intermediate representation as the input of a random forest classifier, train the random forest classifier, and use the trained classifier to detect vulnerabilities in new smart contract code.

The above smart contract code vulnerability detection apparatus serializes and preprocesses the syntax nodes of the smart contract source code to obtain a node vector; takes the node vector as the input of a bidirectional long short-term memory network and obtains the output of the global max pooling layer in the neural network as the intermediate representation of the smart contract source code; and takes that intermediate representation as the input of a random forest classifier, trains the random forest classifier, and uses the trained classifier to detect vulnerabilities in new smart contract code. Compared with current approaches that detect code vulnerabilities using security-rule-based techniques or formal verification, this approach is more sensitive to new code vulnerabilities and does not require developers to formulate and add corresponding rules or formal specifications in a timely manner.

According to another aspect of the present application, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above smart contract code vulnerability detection method when executing the computer program. The above computer device obtains an ordered list of syntax nodes of training smart contract source code to obtain a node vector, where the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code; inputs the node vector into a recurrent neural network, obtains the output result of the global max pooling layer in the recurrent neural network, and uses the output result as the intermediate representation of the smart contract source code; and inputs the intermediate representation into a random forest classifier, trains the random forest classifier, and performs smart contract code vulnerability detection with the trained random forest classifier. It can thereby predict and locate code vulnerabilities more flexibly and accurately, is more sensitive to new code vulnerabilities, and does not require developers to formulate and add corresponding rules or formal specifications in a timely manner.

According to another aspect of the present application, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program implements the above smart contract code vulnerability detection method when executed by a processor. The above computer-readable storage medium obtains an ordered list of syntax nodes of training smart contract source code to obtain a node vector, where the syntax nodes are nodes in the abstract syntax tree generated from the training smart contract source code; inputs the node vector into a recurrent neural network, obtains the output result of the global max pooling layer in the recurrent neural network, and uses the output result as the intermediate representation of the smart contract source code; and inputs the intermediate representation into a random forest classifier, trains the random forest classifier, and performs smart contract code vulnerability detection with the trained random forest classifier. It can thereby predict and locate code vulnerabilities more flexibly and accurately, is more sensitive to new code vulnerabilities, and does not require developers to formulate and add corresponding rules or formal specifications in a timely manner.

A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.

The above embodiments describe only several implementations of the present application. Although the description is relatively specific and detailed, it should not be construed as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art may make a number of modifications and improvements without departing from the concept of this application, all of which fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.

Claims

A smart contract code vulnerability detection method, characterized in that the method comprises:

obtaining an ordered list of syntax nodes of training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in an abstract syntax tree generated from the training smart contract source code;

inputting the node vector into a recurrent neural network, obtaining an output result of a global max pooling layer in the recurrent neural network, and using the output result as an intermediate representation of the training smart contract source code;

inputting the intermediate representation into a random forest classifier, training the random forest classifier, and performing smart contract code vulnerability detection with the trained random forest classifier.
The method according to claim 1, characterized in that obtaining the ordered list of syntax nodes of the training smart contract source code to obtain the node vector comprises:

parsing the training smart contract source code with a parsing tool to generate an abstract syntax tree;

obtaining an ordered list of the syntax nodes in the abstract syntax tree, designating the ordered list as a one-dimensional vector, and determining the node vector from the one-dimensional vector.
The method according to claim 2, characterized in that obtaining the ordered list of the syntax nodes in the abstract syntax tree comprises:

serializing the syntax nodes in the abstract syntax tree by depth-first traversal to obtain the ordered list.
The method according to claim 2, characterized in that, after obtaining the ordered list of the syntax nodes in the abstract syntax tree and designating the ordered list as a one-dimensional vector, the method comprises:

performing syntax marking on the one-dimensional vector, wherein the syntax marking maps each syntax node to an integer and assembles the integers corresponding to the syntax nodes in the one-dimensional vector into an array, and designating the marked one-dimensional vector as the node vector.
The method according to claim 4, characterized in that, after the syntax marking of the one-dimensional vector, the method comprises:

padding or truncating the array, wherein the array is truncated at the end or zero-padded at the end so that the length of the array equals a preset length.
The method according to claim 1, characterized in that the recurrent neural network is a bidirectional long short-term memory neural network.
A smart contract code vulnerability detection method based on deep learning technology, characterized in that the method comprises the following steps:

S1: serializing and preprocessing the syntax nodes of the smart contract source code to obtain a node vector;

S2: using the node vector obtained in S1 as the input of a bidirectional long short-term memory network, and obtaining the output of the global max pooling layer in the neural network as an intermediate representation of the smart contract source code;

S3: using the intermediate representation of the smart contract source code obtained in S2 as the input of a random forest classifier, training the random forest classifier, and using the trained classifier to detect vulnerabilities in new smart contract code.
The smart contract code vulnerability detection method based on deep learning technology according to claim 7, characterized in that S1 specifically comprises:

S1.1: parsing the smart contract source code string with a parsing tool to generate an abstract syntax tree;

S1.2: traversing the abstract syntax tree depth-first and serializing it into a one-dimensional vector composed of syntax nodes;

S1.3: preprocessing the one-dimensional vector, and generating a preprocessed node vector through syntax marking and padding/truncation, wherein the code structure and semantics are kept intact during the preprocessing.
The smart contract code vulnerability detection method based on deep learning technology according to claim 8, characterized in that the one-dimensional vector in S1.2 is an ordered list composed of syntax nodes, and each one-dimensional vector represents the corresponding smart contract function.
The smart contract code vulnerability detection method based on deep learning technology according to claim 8, characterized in that the padding/truncation length c in S1.3 lies between the length of the longest one-dimensional vector list and the length of the shortest one-dimensional vector list.
The smart contract code vulnerability detection method based on deep learning technology according to claim 8, characterized in that the padding/truncation rule in S1.3 is: a list whose length is greater than the padding/truncation length c is truncated at the end so that its length equals c; a list whose length is less than the padding/truncation length c is zero-padded at the end so that its length equals c.
A smart contract code vulnerability detection apparatus, characterized in that the apparatus comprises a vector module, a neural network module, and a classifier module,

the vector module being used to obtain an ordered list of syntax nodes of training smart contract source code to obtain a node vector, wherein the syntax nodes are nodes in an abstract syntax tree generated from the training smart contract source code;

the neural network module being used to input the node vector into a recurrent neural network, obtain an output result of a global max pooling layer in the recurrent neural network, and use the output result as an intermediate representation of the smart contract source code;

the classifier module being used to input the intermediate representation into a random forest classifier, train the random forest classifier, and perform smart contract code vulnerability detection with the trained random forest classifier.
A smart contract code vulnerability detection apparatus, characterized in that the apparatus comprises a vector module, a neural network module, and a classifier module,

the vector module being used to serialize and preprocess the syntax nodes of the smart contract source code to obtain a node vector;

the neural network module being used to take the node vector as the input of a bidirectional long short-term memory network and obtain the output of the global max pooling layer in the neural network as an intermediate representation of the smart contract source code;

the classifier module being used to take the intermediate representation as the input of a random forest classifier, train the random forest classifier, and use the trained classifier to detect vulnerabilities in new smart contract code.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 11 when executing the computer program.
A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 11 when executed by a processor.