CN114297075A - Code detection method and device, electronic equipment and computer readable medium - Google Patents

Publication number
CN114297075A
Authority
CN
China
Prior art keywords
defect
code
codes
false
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111651812.9A
Other languages
Chinese (zh)
Inventor
东红林
闫保奇
纪妙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd
Priority to CN202111651812.9A
Publication of CN114297075A
Legal status: Pending

Landscapes

  • Testing And Monitoring For Control Systems (AREA)

Abstract

The disclosure relates to a code detection method, a code detection apparatus, an electronic device, and a computer-readable medium, and belongs to the technical field of computers. The method comprises: inputting code to be detected into a defect detection engine to obtain an initial defect detection result of the code, wherein the initial defect detection result comprises defect-related code in the code; obtaining defect-related features corresponding to the defect-related code according to the initial defect detection result; inputting the defect-related features into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result for the defect-related code; and filtering the initial defect detection result according to the defect false-alarm discrimination result to obtain the defect detection result of the code. By filtering the initial defect detection result through the defect false-alarm discrimination model, the disclosure can reduce the false alarm rate of engine detection and improve the efficiency of code defect detection.

Description

Code detection method and device, electronic equipment and computer readable medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a code detection method, a code detection apparatus, an electronic device, and a computer-readable medium.
Background
With the rapid development and continuing maturation of the internet, network security has become increasingly important. Detecting and analyzing code can effectively improve its security and uncover potential risks.
In code security detection, efficient static analysis of code inevitably yields a large number of reported defects and a high false alarm rate, so considerable time and effort must be spent manually checking the detection results.
In view of the above, there is a need in the art for a method that can reduce the false alarm rate of defect detection.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure provides a code detection method, a code detection apparatus, an electronic device, and a computer-readable medium, which can reduce the false alarm rate of engine detection at least to a certain extent and improve the defect detection efficiency of codes.
According to a first aspect of the present disclosure, there is provided a code detection method, including:
inputting code to be detected into a defect detection engine to obtain an initial defect detection result of the code, wherein the initial defect detection result comprises defect-related code in the code;
obtaining defect-related features corresponding to the defect-related code according to the initial defect detection result, wherein the defect-related features comprise control flow features, data features, and defect pattern features;
inputting the defect-related features corresponding to the defect-related code into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code;
and filtering the initial defect detection result of the code according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
In an exemplary embodiment of the disclosure, the obtaining the defect-related features corresponding to the defect-related code according to the initial defect detection result includes:
extracting an abstract syntax tree from the defect-related code, and generating a corresponding control flow graph from the abstract syntax tree;
and obtaining the defect-related features corresponding to the defect-related code according to the abstract syntax tree and the control flow graph.
In an exemplary embodiment of the disclosure, the obtaining, according to the abstract syntax tree and the control flow graph, the defect-related features corresponding to the defect-related code includes:
obtaining a function call relation in the defect-related code according to the abstract syntax tree and the control flow graph;
performing global path analysis on the defect-related code based on the control flow graph and the function call relation, and obtaining data features corresponding to the defect-related code according to the analysis result;
and obtaining control flow features corresponding to the defect-related code according to the control flow graph, and obtaining defect pattern features corresponding to the defect-related code according to the function call relation.
In an exemplary embodiment of the present disclosure, the method further comprises:
updating the defect false-alarm discrimination model according to the defect false-alarm discrimination result of the defect-related code.
In an exemplary embodiment of the disclosure, the updating the defect false-alarm discrimination model according to the defect false-alarm discrimination result of the defect-related code includes:
obtaining a defect discrimination label corresponding to the defect-related code according to the defect false-alarm discrimination result of the defect-related code, and auditing the defect discrimination label;
and updating the training data of the defect false-alarm discrimination model according to the audited defect-related code and the corresponding defect discrimination label.
In an exemplary embodiment of the present disclosure, the training method of the defect false-alarm discrimination model includes:
acquiring defect-related code in open source code, and obtaining training code for the defect false-alarm discrimination model from historical detection code and the defect-related code in the open source code;
dividing the training code into defect training code and false-alarm training code according to the defect discrimination labels corresponding to the training code;
obtaining a corresponding defect feature vector set from the defect training code, and obtaining a corresponding false-alarm feature vector set from the false-alarm training code;
and training an initial defect false-alarm discrimination model using the defect feature vector set and the false-alarm feature vector set as training data to obtain the defect false-alarm discrimination model.
In an exemplary embodiment of the present disclosure, the obtaining a corresponding defect feature vector set from the defect training code and obtaining a corresponding false-alarm feature vector set from the false-alarm training code includes:
obtaining corresponding defect sample features from the defect training code, and obtaining corresponding false-alarm sample features from the false-alarm training code;
standardizing the defect sample features to obtain the defect feature vector set corresponding to the defect training code;
and standardizing the false-alarm sample features to obtain the false-alarm feature vector set corresponding to the false-alarm training code.
According to a second aspect of the present disclosure, there is provided a code detection apparatus including:
an initial detection result determining module, configured to input code to be detected into a defect detection engine to obtain an initial defect detection result of the code, wherein the initial defect detection result comprises defect-related code in the code;
a defect-related feature extraction module, configured to obtain defect-related features corresponding to the defect-related code according to the initial defect detection result, wherein the defect-related features comprise control flow features, data features, and defect pattern features;
a false-alarm discrimination result determining module, configured to input the defect-related features corresponding to the defect-related code into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code;
and a defect detection result determining module, configured to filter the initial defect detection result of the code according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to perform the code detection method of any one of the above by executing the executable instructions.
According to a fourth aspect of the present disclosure, there is provided a computer-readable medium having stored thereon a computer program which, when executed by a processor, implements the code detection method described in any one of the above.
The exemplary embodiments of the present disclosure may have the following advantageous effects:
In the code detection method of the example embodiments of the present disclosure, code to be detected is input into a defect detection engine to obtain an initial defect detection result; defect-related features are then extracted based on the initial defect detection result and input into a pre-trained defect false-alarm discrimination model; and the initial defect detection result is filtered according to the output of that model to obtain the final code defect detection result. On the one hand, by extracting and comparing multiple defect features of the code together, the method can effectively improve false-alarm filtering and reduce the false alarm rate of engine detection, thereby improving the accuracy of code detection. On the other hand, the discrimination model is built from both defect code and false-alarm code, so the initial detection result is discriminated in both directions, which can improve the accuracy of false-alarm discrimination.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 shows a flow diagram of a method of detecting code of an example embodiment of the present disclosure;
FIG. 2 illustrates a flowchart of obtaining defect-related features from initial defect detection results according to an example embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram for deriving defect-related features from an abstract syntax tree and a control flow graph according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a diagram of deriving data characteristics and control flow characteristics from defect-related code, according to an embodiment of the present disclosure;
FIG. 5 shows a flowchart of a training method of a defect false positive discriminant model according to an example embodiment of the present disclosure;
FIG. 6 schematically shows a flowchart for processing sample features into a feature vector set according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of a method for updating a defect false positive discriminant model according to an example embodiment of the present disclosure;
FIG. 8 is a flow diagram illustrating a method for detecting code in accordance with one embodiment of the present disclosure;
FIG. 9 shows a block diagram of a detection apparatus of code of an example embodiment of the present disclosure;
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In code security detection, the requirements on static security analysis are strict, so a detection engine generally errs on the side of over-reporting (in the words of the saying, it would rather wrongly kill a thousand than let one escape) in order to ensure the security of the code. In addition, the same detection rule set is applied to many code scenarios, and a given rule set is effective for some scenarios but not for others, which is another source of false alarms.
For the sake of detection efficiency, the engine does not deeply analyze the relevant features of the code during detection. Efficient static analysis therefore inevitably yields a large number of reported defects with a high false alarm rate, so that the detection engine generates many false alarms. After detection finishes, the defects reported by the engine must be audited and checked manually one by one, which consumes a great deal of time and effort, greatly reduces the practicality of the code detection tool, and erodes developers' confidence in it.
In some related embodiments, when false alarms are evaluated or handled, code fragments characterizing the sensitive points of defective source code are generally used as training data. Only features related to the sensitive information are considered; control flow information is not compared, and data features and defect pattern features are not extracted and considered together. As a result, false-alarm filtering accuracy is low, which in turn lowers detection efficiency and accuracy.
In view of the above problem, the present exemplary embodiment first provides a code detection method. Referring to FIG. 1, the code detection method may include the following steps:
Step S110: inputting code to be detected into a defect detection engine to obtain an initial defect detection result of the code, wherein the initial defect detection result comprises defect-related code in the code.
Step S120: obtaining defect-related features corresponding to the defect-related code according to the initial defect detection result, wherein the defect-related features comprise control flow features, data features, and defect pattern features.
Step S130: inputting the defect-related features corresponding to the defect-related code into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code.
Step S140: filtering the initial defect detection result of the code according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
In the code detection method of the example embodiments of the present disclosure, code to be detected is input into a defect detection engine to obtain an initial defect detection result; defect-related features are then extracted based on the initial defect detection result and input into a pre-trained defect false-alarm discrimination model; and the initial defect detection result is filtered according to the output of that model to obtain the final code defect detection result. On the one hand, by extracting and comparing multiple defect features of the code together, the method can effectively improve false-alarm filtering and reduce the false alarm rate of engine detection, thereby improving the accuracy of code detection. On the other hand, the discrimination model is built from both defect code and false-alarm code, so the initial detection result is discriminated in both directions, which can improve the accuracy of false-alarm discrimination.
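Steps S110 to S140 can be sketched as a small pipeline. The sketch below is only an illustration under stated assumptions: the `Finding` record, the `detect_with_filter` function, and the three toy stand-ins for the engine, the feature extractor, and the discrimination model are hypothetical names that do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Finding:
    """One entry of the initial defect detection result (hypothetical shape)."""
    defect_type: str
    related_code: str  # the defect code together with its context

Features = Dict[str, List[float]]

def detect_with_filter(
    source: str,
    engine: Callable[[str], List[Finding]],            # S110
    extract_features: Callable[[Finding], Features],   # S120
    is_false_alarm: Callable[[Features], bool],        # S130
) -> List[Finding]:
    """Run S110 to S140 and return the filtered defect detection result."""
    initial = engine(source)
    kept = []
    for finding in initial:
        features = extract_features(finding)
        if not is_false_alarm(features):
            kept.append(finding)  # S140: findings judged false alarms are dropped
    return kept

# Toy stand-ins, assumptions made purely so the sketch can run.
def toy_engine(source: str) -> List[Finding]:
    return [Finding("null_dereference", line)
            for line in source.splitlines() if "deref" in line]

def toy_features(finding: Finding) -> Features:
    guarded = 1.0 if "if p" in finding.related_code else 0.0
    return {"control_flow": [guarded], "data": [0.0], "defect_pattern": [1.0]}

def toy_model(features: Features) -> bool:
    return features["control_flow"][0] == 1.0  # a guarded use is treated as a false alarm

code = "deref(p)\nif p: deref(p)"
result = detect_with_filter(code, toy_engine, toy_features, toy_model)
print([f.related_code for f in result])
```

Passing the engine, the extractor, and the model in as callables keeps the filtering stage independent of any particular detection engine, which matches the way the method treats the engine as a given component.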
The above steps of the present exemplary embodiment will be described in more detail with reference to fig. 2 to 7.
In step S110, a code to be detected is input into a defect detection engine, and an initial defect detection result of the code is obtained, where the initial defect detection result includes a defect-related code in the code.
In this exemplary embodiment, a defect detection engine first performs preliminary defect detection on the input code file to obtain an initial defect detection result of the code. The initial defect detection result includes defect-related code in the code, where defect-related code refers to the portion of code that constitutes the defect itself together with the code in the defect's context. In addition, the initial defect detection result may further include information such as the defect type and the call relationships corresponding to the defect-related code.
In step S120, defect-related features corresponding to the defect-related codes are obtained according to the initial defect detection result, where the defect-related features include control flow features, data features, and defect pattern features.
In this exemplary embodiment, as shown in FIG. 2, obtaining the defect-related features corresponding to the defect-related code according to the initial defect detection result may specifically include the following steps:
Step S210: extracting an abstract syntax tree from the defect-related code, and generating a corresponding control flow graph from the abstract syntax tree.
An abstract syntax tree (AST) is a tree representation of the abstract syntactic structure of source code, in which each node represents a construct in the source code. A control flow graph (CFG) is an abstract representation of a procedure or program; it is an abstract data structure used in compilers that represents all the paths a program may traverse during execution. It represents graphically the possible flow of execution among all basic blocks in a procedure, and can also reflect the actual execution of the procedure at run time.
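As an illustration of step S210, the following sketch parses a snippet with Python's standard `ast` module and derives a statement-level control flow graph from the tree. It assumes the code under analysis is Python, which the disclosure does not state, and the `build_cfg` helper is hypothetical. Only straight-line statements and if/else are modeled; loops, early returns, and exceptions are left out.

```python
import ast
from typing import Dict, List, Set

def build_cfg(func: ast.FunctionDef) -> Dict[str, Set[str]]:
    """Build a statement-level CFG (node -> successors) for a function body."""
    cfg: Dict[str, Set[str]] = {"entry": set(), "exit": set()}

    def label(stmt: ast.stmt) -> str:
        return f"L{stmt.lineno}:{type(stmt).__name__}"

    def visit(stmts: List[ast.stmt], preds: List[str]) -> List[str]:
        """Wire stmts after preds and return the nodes that fall through."""
        for stmt in stmts:
            node = label(stmt)
            cfg.setdefault(node, set())
            for pred in preds:
                cfg[pred].add(node)
            if isinstance(stmt, ast.If):
                then_out = visit(stmt.body, [node])
                # Without an else branch, the condition itself falls through.
                else_out = visit(stmt.orelse, [node]) if stmt.orelse else [node]
                preds = then_out + else_out
            else:
                preds = [node]
        return preds

    for pred in visit(func.body, ["entry"]):
        cfg[pred].add("exit")
    return cfg

SOURCE = """
def read(buf, n):
    if n > len(buf):
        n = len(buf)
    return buf[n - 1]
"""

tree = ast.parse(SOURCE)   # abstract syntax tree of the snippet
func = tree.body[0]        # the FunctionDef node for read()
cfg = build_cfg(func)
for node, successors in cfg.items():
    print(node, "->", sorted(successors))
```

Each CFG node here is a single statement rather than a basic block, which keeps the sketch short; a production analysis would normally merge straight-line statements into basic blocks.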
Step S220: obtaining the defect-related features corresponding to the defect-related code according to the abstract syntax tree and the control flow graph.
In this exemplary embodiment, as shown in FIG. 3, obtaining the defect-related features corresponding to the defect-related code according to the abstract syntax tree and the control flow graph may specifically include the following steps:
Step S310: obtaining a function call relation in the defect-related code according to the abstract syntax tree and the control flow graph.
That is, the call relationships between the functions contained in the defect-related code and the functions of other modules in the code are obtained according to the abstract syntax tree and the control flow graph.
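A function call relation of this kind can be approximated from the abstract syntax tree alone. The sketch below again assumes Python source and uses the standard `ast` module; `call_relation` is a hypothetical helper, and it records only direct calls by name, without resolving which module a callee belongs to.

```python
import ast
from typing import Dict, Set

def call_relation(source: str) -> Dict[str, Set[str]]:
    """Map each top-level function to the names it calls directly."""
    tree = ast.parse(source)
    relation: Dict[str, Set[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees: Set[str] = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call):
                    if isinstance(sub.func, ast.Name):         # foo(...)
                        callees.add(sub.func.id)
                    elif isinstance(sub.func, ast.Attribute):  # obj.foo(...)
                        callees.add(sub.func.attr)
            relation[node.name] = callees
    return relation

SOURCE = """
def load(path):
    return open(path).read()

def handle(request):
    data = load(request.path)
    return run_query(data)
"""

relation = call_relation(SOURCE)
for caller, callees in relation.items():
    print(caller, "calls", sorted(callees))
```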
Step S320: performing global path analysis on the defect-related code based on the control flow graph and the function call relation, and obtaining data features corresponding to the defect-related code according to the analysis result.
Global path analysis is performed on the defect-related code based on the control flow graph and the function call relation, global data flow analysis is then performed on the basis of the global path analysis, and the data features corresponding to the defect-related code are obtained from the analysis results.
The data features may include information such as the types related to the functions in the defect-related code and the types of the parameters of those functions. The following table lists some of the data features.
[Table not reproduced in this text: partial list of data features (original image BDA0003446755540000081)]
Step S330: obtaining control flow features corresponding to the defect-related code according to the control flow graph, and obtaining defect pattern features corresponding to the defect-related code according to the function call relation.
Control flow analysis is performed on the defect-related code according to the control flow graph to obtain the control flow features corresponding to the defect-related code. Defect pattern analysis is performed on the defect-related code according to the function call relation to obtain the defect pattern features corresponding to the defect-related code, where the defect pattern features refer to the related defect types. The following table lists some of the control flow features.
[Table not reproduced in this text: partial list of control flow features (original image BDA0003446755540000091)]
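Because the feature table is not reproduced here, the following sketch uses generic graph measures as stand-ins for control flow features. The choice of node count, edge count, branch count, and McCabe cyclomatic complexity is an assumption made for illustration, not the feature list of the disclosure, and `control_flow_features` is a hypothetical helper.

```python
from typing import Dict, List, Set

def control_flow_features(cfg: Dict[str, Set[str]]) -> List[float]:
    """Return [nodes, edges, branch nodes, cyclomatic complexity] for a CFG."""
    nodes = len(cfg)
    edges = sum(len(successors) for successors in cfg.values())
    branches = sum(1 for successors in cfg.values() if len(successors) > 1)
    cyclomatic = edges - nodes + 2  # McCabe's formula for one connected component
    return [float(nodes), float(edges), float(branches), float(cyclomatic)]

# A CFG for "if cond: clamp; use", written as node -> successors.
cfg = {
    "entry": {"check"},
    "check": {"clamp", "use"},
    "clamp": {"use"},
    "use": {"exit"},
    "exit": set(),
}

features = control_flow_features(cfg)
print(features)
```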
FIG. 4 schematically illustrates deriving data features and control flow features from defect-related code according to one embodiment of the present disclosure. A global abstract syntax tree is extracted from the defect-related code, and the corresponding control flow graph is generated from it. A global function call relation is obtained based on the abstract syntax tree and the control flow graph. On the basis of the function call relation and the control flow graph, and by analyzing function call information, predecessor node relations, successor node relations, and the like, the set of paths from the function entry to the defect target point is then obtained. Control flow feature extraction and data feature extraction are completed according to the results of the global path analysis and the global data flow analysis.
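The path set from a function entry to a defect target point can be illustrated with a depth-first search over a successor map. This is a minimal sketch that assumes an intraprocedural graph and enumerates only acyclic paths; the `paths_to_target` helper and the node names are hypothetical.

```python
from typing import Dict, List, Set

def paths_to_target(cfg: Dict[str, Set[str]], entry: str, target: str) -> List[List[str]]:
    """Enumerate acyclic paths from entry to target by depth-first search."""
    paths: List[List[str]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for successor in sorted(cfg.get(node, ())):
            if successor not in path:  # skip nodes already on the path so loops terminate
                dfs(successor, path + [successor])

    dfs(entry, [entry])
    return paths

cfg = {
    "entry": {"check"},
    "check": {"clamp", "use"},
    "clamp": {"use"},
    "use": {"exit"},
    "exit": set(),
}

paths = paths_to_target(cfg, "entry", "use")  # "use" plays the defect target point
for path in paths:
    print(" -> ".join(path))
```

Exhaustive enumeration grows quickly with the number of branches, so a real analysis would likely bound the path length or prune infeasible paths.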
In step S130, the defect-related features corresponding to the defect-related code are input into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code.
In this exemplary embodiment, the defect-related features corresponding to the defect-related code may be input into the pre-trained defect false-alarm discrimination model and matched against it; the output of the model is the defect false-alarm discrimination result. For example, if the model outputs "confirmed defect", the initial defect detection result for the code is correct; if the model outputs "false-alarm defect", the initial defect detection result is incorrect and the reported defect is a false alarm, that is, the code does not contain the defect.
In addition, if the defect false-alarm discrimination model cannot output a corresponding discrimination result, the reported defect matches neither the existing defect types nor the known false alarms, and a supplementary false-alarm judgment may be made for it by means of sensitive path analysis and symbolic execution.
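The three possible outcomes (confirmed defect, false alarm, or no result followed by a supplementary judgment) can be sketched as follows. How the model signals "no result" is not specified in the text; the sketch assumes it returns `None` when a score falls in an ambiguous band, and the fallback is only a placeholder for sensitive path analysis and symbolic execution. All names and thresholds are hypothetical.

```python
from enum import Enum
from typing import Callable, List, Optional

class Verdict(Enum):
    CONFIRMED_DEFECT = "confirmed defect"
    FALSE_ALARM = "false alarm"

Model = Callable[[List[float]], Optional[Verdict]]
Fallback = Callable[[List[float]], Verdict]

def judge(features: List[float], model: Model, fallback: Fallback) -> Verdict:
    """Use the model's verdict, or the supplementary analysis when it has none."""
    verdict = model(features)
    if verdict is None:
        verdict = fallback(features)
    return verdict

def toy_model(features: List[float]) -> Optional[Verdict]:
    score = features[0]  # assumed score in [0, 1]; the thresholds are illustrative
    if score >= 0.8:
        return Verdict.CONFIRMED_DEFECT
    if score <= 0.2:
        return Verdict.FALSE_ALARM
    return None  # the model cannot decide

def toy_fallback(features: List[float]) -> Verdict:
    # Placeholder for sensitive path analysis and symbolic execution;
    # this stand-in conservatively keeps the finding.
    return Verdict.CONFIRMED_DEFECT

verdicts = [judge([score], toy_model, toy_fallback) for score in (0.9, 0.1, 0.5)]
print([v.value for v in verdicts])
```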
In step S140, the initial defect detection result of the code is filtered according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
Finally, the initial defect detection result of the code is filtered according to the defect false-alarm discrimination result of the defect-related code: the entries judged to be false-alarm defects are removed from the initial defect detection result, yielding the final code defect detection result.
In this exemplary embodiment, after the initial defect detection result has been filtered according to the defect false-alarm discrimination result, a defect audit may be performed on the filtered defect detection result, and the result may be checked and adjusted according to the audit to obtain the final defect detection result of the code.
In the present exemplary embodiment, as shown in FIG. 5, the training method of the defect false-alarm discrimination model may specifically include the following steps:
Step S510: acquiring defect-related code in open source code, and obtaining training code for the defect false-alarm discrimination model from historical detection code and the defect-related code in the open source code.
In this exemplary embodiment, global data flow tracking may be performed on the source files of the open source code, and the code constituting each defect together with the code in the defect's context may be obtained through global path analysis, thereby obtaining the defect-related code in the open source code.
In this exemplary embodiment, the defect-related code in the open source code and its corresponding defect discrimination labels, together with the existing historical detection code and its corresponding defect discrimination labels, may be used to train the defect false-alarm discrimination model; the training code of the model consists of the historical detection code and the defect-related code in the open source code.
Step S520: dividing the training code into defect training code and false-alarm training code according to the defect discrimination labels corresponding to the training code.
The defect discrimination labels corresponding to the training code comprise a confirmed-defect label and a false-alarm label, and the training code can be divided into defect training code and false-alarm training code according to these labels.
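The division in step S520 amounts to partitioning labeled samples. A minimal sketch follows, assuming each training sample is a (code, label) pair; the label strings and the `split_training_codes` helper are hypothetical.

```python
from typing import List, Tuple

CONFIRMED = "confirmed_defect"   # hypothetical label values
FALSE_ALARM = "false_alarm"

def split_training_codes(samples: List[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Split (code, label) pairs into defect and false-alarm training code."""
    defect_codes = [code for code, label in samples if label == CONFIRMED]
    false_alarm_codes = [code for code, label in samples if label == FALSE_ALARM]
    return defect_codes, false_alarm_codes

samples = [
    ("strcpy(dst, src);", CONFIRMED),
    ("if (p) { use(p); }", FALSE_ALARM),
    ("free(p); use(p);", CONFIRMED),
]

defect_codes, false_alarm_codes = split_training_codes(samples)
print(len(defect_codes), len(false_alarm_codes))
```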
Step S530: obtaining a corresponding defect feature vector set from the defect training code, and obtaining a corresponding false-alarm feature vector set from the false-alarm training code.
Specifically, the corresponding defect sample features may be obtained from the defect training code and the corresponding false-alarm sample features from the false-alarm training code. The defect sample features are then standardized to obtain the defect feature vector set P corresponding to the defect training code, and the false-alarm sample features are standardized to obtain the false-alarm feature vector set Q corresponding to the false-alarm training code. The defect feature vector set P and the false-alarm feature vector set Q each include the control flow features, data features, and defect pattern features of the corresponding training code. The way these features are obtained is shown in FIG. 6; it is similar to the feature extraction of FIG. 2 and is not described again here.
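The text does not say which standardization is used. The sketch below assumes z-score standardization (subtract the column mean, divide by the column standard deviation) and, as a further assumption, computes the statistics over both sample sets together so that P and Q share one scale. The helper names `fit_scaler` and `standardize` are hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Tuple

Matrix = List[List[float]]

def fit_scaler(rows: Matrix) -> Tuple[List[float], List[float]]:
    """Compute the per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(column) for column in columns]
    # A constant column has zero deviation; use 1.0 to avoid dividing by zero.
    stds = [pstdev(column) or 1.0 for column in columns]
    return means, stds

def standardize(rows: Matrix, means: List[float], stds: List[float]) -> Matrix:
    """Apply z-score standardization column by column."""
    return [[(value - mean) / std for value, mean, std in zip(row, means, stds)]
            for row in rows]

defect_samples = [[1.0, 10.0], [3.0, 10.0]]
false_alarm_samples = [[5.0, 10.0], [7.0, 10.0]]

means, stds = fit_scaler(defect_samples + false_alarm_samples)
P = standardize(defect_samples, means, stds)       # defect feature vector set
Q = standardize(false_alarm_samples, means, stds)  # false-alarm feature vector set
print(P)
print(Q)
```

Standardizing each set with its own statistics would also be a valid reading of the text, but it would center both sets on zero and remove differences in their means.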
Step S540: training the initial defect false-alarm discrimination model using the defect feature vector set and the false-alarm feature vector set as training data to obtain the defect false-alarm discrimination model.
The defect feature vector set P and the false-alarm feature vector set Q are input as training data into the initial defect false-alarm discrimination model for training, and the trained model is the defect false-alarm discrimination model. The initial defect false-alarm discrimination model may be, for example, an MLP (Multilayer Perceptron) model.
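To make the training step concrete, here is a minimal multilayer perceptron written with the standard library only. It is a sketch under assumptions, not the model of the disclosure: the `TinyMLP` class, the single tanh hidden layer, the sigmoid output, the cross-entropy loss, the learning rate, and the label convention (1 for confirmed defect, 0 for false alarm) are all illustrative choices, and the two-dimensional vectors in P and Q are made-up data.

```python
import math
import random
from typing import List, Tuple

class TinyMLP:
    """One tanh hidden layer and a sigmoid output, trained by batch gradient descent."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict_proba(self, x: List[float]) -> float:
        """Probability that x is a confirmed defect (label 1)."""
        return self._forward(x)[1]

    def loss(self, xs: List[List[float]], ys: List[int]) -> float:
        """Mean binary cross-entropy over the data set."""
        eps = 1e-12
        total = 0.0
        for x, y in zip(xs, ys):
            p = self.predict_proba(x)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(xs)

    def fit(self, xs: List[List[float]], ys: List[int], lr: float = 0.1, epochs: int = 200) -> None:
        n = len(xs)
        for _ in range(epochs):
            gw1 = [[0.0] * len(row) for row in self.w1]
            gb1 = [0.0] * len(self.b1)
            gw2 = [0.0] * len(self.w2)
            gb2 = 0.0
            for x, y in zip(xs, ys):
                h, p = self._forward(x)
                dz = p - y  # gradient of cross-entropy w.r.t. the pre-sigmoid output
                gb2 += dz
                for j, hj in enumerate(h):
                    gw2[j] += dz * hj
                    dh = dz * self.w2[j] * (1.0 - hj * hj)  # back through tanh
                    gb1[j] += dh
                    for i, xi in enumerate(x):
                        gw1[j][i] += dh * xi
            # Apply the averaged gradients once per epoch.
            self.b2 -= lr * gb2 / n
            for j in range(len(self.w2)):
                self.w2[j] -= lr * gw2[j] / n
                self.b1[j] -= lr * gb1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * gw1[j][i] / n

# Made-up standardized feature vectors: P for defects, Q for false alarms.
P = [[1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
Q = [[-1.0, -1.0], [-0.8, -1.1], [-1.1, -0.9]]
xs, ys = P + Q, [1] * len(P) + [0] * len(Q)

model = TinyMLP(n_in=2, n_hidden=3)
loss_before = model.loss(xs, ys)
model.fit(xs, ys)
loss_after = model.loss(xs, ys)
print(loss_before, loss_after)
```

In practice a library implementation would usually replace this hand-written network; the sketch is only meant to show how P and Q become labeled training data for a binary classifier.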
In addition, in the code detection method according to the exemplary embodiment, the defect false-alarm discrimination model may be updated according to the defect false-alarm discrimination result of the defect-related code, so as to improve the defect false-alarm filtering accuracy and coverage rate.
In the present exemplary embodiment, as shown in fig. 7, updating the defect false-positive discrimination model according to the defect false-positive discrimination result of the defect-related code may specifically include the following steps:
Step S710, obtaining a defect discrimination label corresponding to the defect-related code according to the defect false-alarm discrimination result of the defect-related code, and auditing the defect discrimination label.
Step S720, updating the training data of the defect false-alarm discrimination model according to the audited defect-related code and the corresponding defect discrimination label.
In the embodiment of the present invention, the defect-related code obtained after the defect audit and its corresponding defect discrimination label can be added to the historical detection codes as new training data to update the defect false-alarm discrimination model, so that the model produces more accurate discrimination results in subsequent use.
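A minimal sketch of the training data update is given below. The record layout (LabeledSample), the field names, and the rule that an audited label replaces an older label for the same code are assumptions of the sketch; the disclosure only states that audited codes and labels are added as new training data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabeledSample:
    code_id: str          # hypothetical identifier of a defect-related code
    features: tuple       # feature vector of that code
    is_false_alarm: bool  # audited defect discrimination label

def merge_audited(history, audited):
    """Return the historical samples plus the audited ones.

    Assumption: when the same code_id appears in both, the audited label wins.
    """
    merged = {sample.code_id: sample for sample in history}
    for sample in audited:
        merged[sample.code_id] = sample
    return list(merged.values())

history = [
    LabeledSample("a", (0.1, 0.2), False),
    LabeledSample("b", (0.3, 0.4), False),
]
audited = [
    LabeledSample("b", (0.3, 0.4), True),   # auditor overturned the earlier label
    LabeledSample("c", (0.5, 0.6), False),  # newly audited code
]
training_data = merge_audited(history, audited)
```

The merged list would then be split by label and used to retrain the model as in step S540.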
Fig. 8 is a complete flowchart of a method for detecting a code in an embodiment of the present disclosure, which is an illustration of the above steps in the present exemplary embodiment, and the specific steps in the flowchart are as follows:
Step S810, acquiring historical code detection results.
The historical code detection results may include defect-related codes in open source code together with their corresponding defect discrimination labels, as well as historical detection codes already held by the discrimination model together with their corresponding defect discrimination labels.
Step S820, acquiring defect features and false-alarm features.
The defect features and the false-alarm features each include the relevant control flow features, data features, and defect pattern features.
Step S830, establishing or updating the discrimination model.
Step S840, inputting a code file.
Step S850, performing defect detection.
Step S860, acquiring an initial defect detection result.
Step S870, filtering defect false alarms.
The discrimination model is used to discriminate and filter out false-alarm defects in the initial defect detection result.
Step S880, auditing the filtering result.
A defect audit is performed on the filtered defect detection result.
Step S890, obtaining an audit result.
Finally, the audit result is added to the training data of the discrimination model to complete the update of the discrimination model.
Steps S810 to S830 establish and update the discrimination model, steps S840 to S860 perform the initial detection of the code, step S870 filters the defect false alarms, and steps S880 and S890 audit the defects.
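The order of steps S840 to S890 can be sketched as a single function, shown below purely as an illustration. Every callable passed in (engine, extract_features, is_false_alarm, audit) is a hypothetical stand-in for the corresponding component, and the toy implementations exist only so that the sketch runs end to end.

```python
def run_detection_round(code, engine, extract_features, is_false_alarm, audit, training_data):
    """One pass through steps S840 to S890; all callables are hypothetical."""
    initial_findings = engine(code)                        # S850, S860
    kept = [
        finding for finding in initial_findings
        if not is_false_alarm(extract_features(finding))   # S870
    ]
    for finding in kept:                                   # S880, S890
        label = audit(finding)
        # The audit result is fed back as new training data.
        training_data.append((extract_features(finding), label))
    return kept

# Toy stand-ins, not real detection logic.
def toy_engine(code):
    return [line for line in code.splitlines() if "free(" in line or "strcpy(" in line]

def toy_features(finding):
    return {"guarded": finding.strip().startswith("if")}

def toy_model(features):
    return features["guarded"]  # treat guarded calls as false alarms

def toy_audit(finding):
    return "defect"

training_data = []
code = "if (p) free(p);\nstrcpy(dst, src);\nreturn 0;"
report = run_detection_round(code, toy_engine, toy_features, toy_model,
                             toy_audit, training_data)
```

Here the engine reports two lines, the model filters out the guarded call, and the single remaining finding is audited and appended to the training data.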
It should be noted that although the various steps of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these steps must be performed in this particular order, or that all of the depicted steps must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions, etc.
Further, the present disclosure also provides a device for detecting a code. Referring to fig. 9, the apparatus for detecting a code may include an initial detection result determining module 910, a defect-related feature extracting module 920, a false positive discrimination result determining module 930, and a defect detection result determining module 940.
Wherein:
the initial detection result determining module 910 may be configured to input a code to be detected into a defect detection engine to obtain an initial defect detection result of the code, where the initial defect detection result includes a defect-related code in the code;
the defect-related feature extraction module 920 may be configured to obtain defect-related features corresponding to the defect-related codes according to the initial defect detection result, where the defect-related features include control flow features, data features, and defect pattern features;
the false positive discrimination result determining module 930 may be configured to input the defect-related features corresponding to the defect-related code into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code;
the defect detection result determining module 940 may be configured to filter the initial defect detection result of the code according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
In some exemplary embodiments of the present disclosure, the defect-related feature extraction module 920 may include a control flow graph generation unit and a defect feature extraction unit. Wherein:
the control flow graph generating unit may be configured to extract an abstract syntax tree according to the defect-related code, and generate a corresponding control flow graph according to the abstract syntax tree;
the defect feature extraction unit may be configured to obtain a defect-related feature corresponding to the defect-related code according to the abstract syntax tree and the control flow graph.
In some exemplary embodiments of the present disclosure, the defect feature extraction unit may include a function call relation determination unit, a data feature acquisition unit, and a control flow feature acquisition unit. Wherein:
the function call relation determining unit may be configured to obtain a function call relation in the defect-related code according to the abstract syntax tree and the control flow graph;
the data feature acquisition unit may be configured to perform global path analysis on the defect-related code based on the control flow graph and the function call relation, and to obtain data features corresponding to the defect-related code according to the analysis result;
the control flow feature acquisition unit may be configured to obtain control flow features corresponding to the defect-related code according to the control flow graph, and to obtain defect pattern features corresponding to the defect-related code according to the function call relation.
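By way of a hedged example, the extraction of a function call relation and of simple control-flow indicators from an abstract syntax tree might be sketched with Python's standard ast module. The disclosure does not name a target language, a parser, or the concrete features, so Python source, the helper names, and the use of branch counts as a stand-in for control flow features are assumptions; building an actual control flow graph and the global path analysis are not shown.

```python
import ast

def call_relations(source):
    """Map each function name to the sorted names called anywhere inside its body."""
    tree = ast.parse(source)
    relations = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            callees = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
            relations[node.name] = sorted(callees)
    return relations

def branch_counts(source):
    """Count branching constructs as a crude proxy for control flow features."""
    counts = {"if": 0, "loop": 0, "try": 0}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.If):
            counts["if"] += 1
        elif isinstance(node, (ast.For, ast.While)):
            counts["loop"] += 1
        elif isinstance(node, ast.Try):
            counts["try"] += 1
    return counts

SAMPLE = '''
def load(path):
    handle = open(path)
    if handle:
        return parse(handle)
    return None

def parse(handle):
    for line in handle:
        check(line)
'''

relations = call_relations(SAMPLE)
counts = branch_counts(SAMPLE)
```

Only calls made through a plain name are recorded; method calls such as obj.method() are ignored in this sketch.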
In some exemplary embodiments of the present disclosure, the detection apparatus for a code provided by the present disclosure may further include a discriminant model updating module, which may be configured to update the defect false-positive discriminant model according to a defect false-positive discriminant result of the defect-related code.
In some exemplary embodiments of the present disclosure, the discriminant model update module may include a defect code acquisition unit and a training data update unit. Wherein:
the defect code acquisition unit may be configured to obtain a defect discrimination label corresponding to the defect-related code according to the defect false-alarm discrimination result of the defect-related code;
the training data updating unit may be configured to update the training data of the defect false-alarm discrimination model according to the defect-related code and the corresponding defect discrimination label.
In some exemplary embodiments of the present disclosure, a code detection apparatus provided by the present disclosure may further include a discriminant model training module, which may include a training code acquisition unit, a training code division unit, a feature vector set acquisition unit, and a discriminant model training unit. Wherein:
the training code acquisition unit may be configured to acquire defect-related codes in open source code, and to obtain training codes of the defect false-alarm discrimination model according to historical detection codes and the defect-related codes in the open source code;
the training code division unit may be configured to divide the training codes into defect training codes and false-alarm training codes according to the defect discrimination labels corresponding to the training codes;
the feature vector set acquisition unit may be configured to obtain a corresponding defect feature vector set according to the defect training codes, and to obtain a corresponding false-alarm feature vector set according to the false-alarm training codes;
the discriminant model training unit may be configured to train an initial defect false-alarm discrimination model by using the defect feature vector set and the false-alarm feature vector set as training data to obtain the defect false-alarm discrimination model.
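The division of training codes by label can be illustrated with the short sketch below. The label values "defect" and "false_alarm" and the pair layout (code, label) are assumptions; the disclosure does not specify how the defect discrimination labels are encoded.

```python
def split_by_label(training_codes):
    """Split (code, label) pairs into defect training codes and false-alarm training codes."""
    defect_codes, false_alarm_codes = [], []
    for code, label in training_codes:
        if label == "defect":
            defect_codes.append(code)
        elif label == "false_alarm":
            false_alarm_codes.append(code)
        else:
            # Unknown labels are rejected rather than silently assigned to a class.
            raise ValueError(f"unknown defect discrimination label: {label!r}")
    return defect_codes, false_alarm_codes

training_codes = [
    ("strcpy(dst, src);", "defect"),
    ("if (p) free(p);", "false_alarm"),
    ("gets(buf);", "defect"),
]
defect_codes, false_alarm_codes = split_by_label(training_codes)
```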
In some exemplary embodiments of the present disclosure, the feature vector set acquisition unit may include a sample feature acquisition unit, a defect feature vector set generating unit, and a false-alarm feature vector set generating unit. Wherein:
the sample feature acquisition unit may be configured to obtain corresponding defect sample features according to the defect training codes, and to obtain corresponding false-alarm sample features according to the false-alarm training codes;
the defect feature vector set generating unit may be configured to standardize the defect sample features to obtain a defect feature vector set corresponding to the defect training codes;
the false-alarm feature vector set generating unit may be configured to standardize the false-alarm sample features to obtain a false-alarm feature vector set corresponding to the false-alarm training codes.
The details of each module/unit in the above-mentioned code detection apparatus have already been described in detail in the corresponding method embodiment section, and are not described herein again.
FIG. 10 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device to implement an embodiment of the invention.
It should be noted that the computer system 1000 of the electronic device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of the application of the embodiment of the present invention.
As shown in fig. 10, the computer system 1000 includes a Central Processing Unit (CPU) 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for system operation. The CPU 1001, ROM 1002, and RAM 1003 are connected to each other via a bus 1004. An input/output (I/O) interface 1005 is also connected to the bus 1004.
The following components are connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. A drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read from it can be installed into the storage section 1008 as necessary.
In particular, according to an embodiment of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the invention include a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1009 and/or installed from the removable medium 1011. When the computer program is executed by the Central Processing Unit (CPU) 1001, various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method as described in the embodiments above.
It should be noted that although several modules of the device for performing actions are mentioned in the above detailed description, this division is not mandatory. Indeed, in accordance with embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided so as to be embodied by a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for detecting a code, comprising:
inputting a code to be detected into a defect detection engine to obtain an initial defect detection result of the code, wherein the initial defect detection result comprises a defect-related code in the code;
obtaining defect-related features corresponding to the defect-related code according to the initial defect detection result, wherein the defect-related features comprise control flow features, data features, and defect pattern features;
inputting the defect-related features corresponding to the defect-related code into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code; and
filtering the initial defect detection result of the code according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
2. The method for detecting a code according to claim 1, wherein the obtaining defect-related features corresponding to the defect-related code according to the initial defect detection result comprises:
extracting an abstract syntax tree according to the defect-related code, and generating a corresponding control flow graph according to the abstract syntax tree; and
obtaining the defect-related features corresponding to the defect-related code according to the abstract syntax tree and the control flow graph.
3. The method for detecting a code according to claim 2, wherein the obtaining the defect-related features corresponding to the defect-related code according to the abstract syntax tree and the control flow graph comprises:
obtaining a function call relation in the defect-related code according to the abstract syntax tree and the control flow graph;
performing global path analysis on the defect-related code based on the control flow graph and the function call relation, and obtaining data features corresponding to the defect-related code according to the analysis result; and
obtaining control flow features corresponding to the defect-related code according to the control flow graph, and obtaining defect pattern features corresponding to the defect-related code according to the function call relation.
4. The method for detecting a code according to claim 1, further comprising:
updating the defect false-alarm discrimination model according to the defect false-alarm discrimination result of the defect-related code.
5. The method for detecting a code according to claim 4, wherein the updating the defect false-alarm discrimination model according to the defect false-alarm discrimination result of the defect-related code comprises:
obtaining a defect discrimination label corresponding to the defect-related code according to the defect false-alarm discrimination result of the defect-related code, and auditing the defect discrimination label; and
updating the training data of the defect false-alarm discrimination model according to the audited defect-related code and the corresponding defect discrimination label.
6. The method for detecting a code according to claim 1, wherein a method for training the defect false-alarm discrimination model comprises:
acquiring defect-related codes in open source code, and obtaining training codes of the defect false-alarm discrimination model according to historical detection codes and the defect-related codes in the open source code;
dividing the training codes into defect training codes and false-alarm training codes according to defect discrimination labels corresponding to the training codes;
obtaining a corresponding defect feature vector set according to the defect training codes, and obtaining a corresponding false-alarm feature vector set according to the false-alarm training codes; and
training an initial defect false-alarm discrimination model by using the defect feature vector set and the false-alarm feature vector set as training data to obtain the defect false-alarm discrimination model.
7. The method for detecting a code according to claim 6, wherein the obtaining a corresponding defect feature vector set according to the defect training codes and obtaining a corresponding false-alarm feature vector set according to the false-alarm training codes comprises:
obtaining corresponding defect sample features according to the defect training codes, and obtaining corresponding false-alarm sample features according to the false-alarm training codes;
standardizing the defect sample features to obtain the defect feature vector set corresponding to the defect training codes; and
standardizing the false-alarm sample features to obtain the false-alarm feature vector set corresponding to the false-alarm training codes.
8. An apparatus for detecting a code, comprising:
an initial detection result determining module, configured to input a code to be detected into a defect detection engine to obtain an initial defect detection result of the code, wherein the initial defect detection result comprises a defect-related code in the code;
a defect-related feature extraction module, configured to obtain defect-related features corresponding to the defect-related code according to the initial defect detection result, wherein the defect-related features comprise control flow features, data features, and defect pattern features;
a false positive discrimination result determining module, configured to input the defect-related features corresponding to the defect-related code into a pre-trained defect false-alarm discrimination model to obtain a defect false-alarm discrimination result of the defect-related code; and
a defect detection result determining module, configured to filter the initial defect detection result of the code according to the defect false-alarm discrimination result of the defect-related code to obtain the defect detection result of the code.
9. An electronic device, comprising:
a processor; and
a memory for storing one or more programs which, when executed by the processor, cause the processor to implement the method for detecting a code according to any one of claims 1 to 7.
10. A computer-readable medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the method for detecting a code according to any one of claims 1 to 7.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111651812.9A CN114297075A (en) 2021-12-30 2021-12-30 Code detection method and device, electronic equipment and computer readable medium

Publications (1)

Publication Number Publication Date
CN114297075A true CN114297075A (en) 2022-04-08

Family

ID=80973511

Country Status (1)

Country Link
CN (1) CN114297075A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115454855A (en) * 2022-09-16 2022-12-09 中国电信股份有限公司 Code defect report auditing method and device, electronic equipment and storage medium
CN115454855B (en) * 2022-09-16 2024-02-09 中国电信股份有限公司 Code defect report auditing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination