CN110162335A - Code refactoring method, apparatus, computer equipment and medium - Google Patents

Publication number
CN110162335A
Authority
CN
China
Prior art keywords
code
sample
code data
sorted
type
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910345048.9A
Other languages
Chinese (zh)
Inventor
成明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd
Priority to CN201910345048.9A
Publication of CN110162335A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/72 Code refactoring

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a code refactoring method, apparatus, computer device, and storage medium. Code data to be classified is first obtained and subjected to feature extraction, yielding the sample features to be classified of that code data. The sample features to be classified are then input into a preset code classification model to obtain the target code refactoring type of the code data to be classified. A corresponding target refactoring script is obtained according to that target code refactoring type. Finally, the target refactoring script is used to refactor the code data to be classified. This not only improves the efficiency of code refactoring but also enables automatic classification and refactoring of code data belonging to different code refactoring types.

Description

Code refactoring method, apparatus, computer equipment and medium
Technical field
The present invention relates to the field of intelligent decision-making, and in particular to a code refactoring method, apparatus, computer device, and storage medium.
Background art
With the rapid development of computer technology, the ease of use and response speed of software products have increasingly become important factors affecting the user experience. The ease of use and response speed of a software product are closely related to the quality of its code. If a product with a poor code architecture is to keep up with current technology and with iterative updates, its internal code structure must be modified repeatedly. As the number of modifications grows, the modified code inevitably accumulates errors, and the operating efficiency of the product gradually declines, so the code needs to be refactored. At present, refactoring is mainly done by senior engineers or architects, who first read the source code manually to find the code that needs refactoring and then refactor that code by hand, often several times. However, when there are many different types of code to refactor, or the amount of code to refactor is large, engineers or architects usually have to spend a great deal of time and energy refactoring the code with different refactoring methods. This both increases the difficulty of code refactoring and greatly reduces its efficiency.
Summary of the invention
Embodiments of the present invention provide a code refactoring method, apparatus, computer device, and storage medium to solve the problem of inefficient code refactoring.
A code refactoring method, comprising:
obtaining code data to be classified, and performing feature extraction on the code data to be classified to obtain sample features to be classified of the code data to be classified;
inputting the sample features to be classified of the code data to be classified into a preset code classification model to obtain a target code refactoring type of the code data to be classified;
obtaining a corresponding target refactoring script according to the target code refactoring type of the code data to be classified; and
performing code refactoring on the code data to be classified using the target refactoring script.
A code refactoring apparatus, comprising:
a to-be-classified sample feature extraction module, configured to obtain code data to be classified and perform feature extraction on the code data to be classified to obtain sample features to be classified of the code data to be classified;
an input module, configured to input the sample features to be classified of the code data to be classified into a preset code classification model to obtain a target code refactoring type of the code data to be classified;
a target refactoring script obtaining module, configured to obtain a corresponding target refactoring script according to the target code refactoring type of the code data to be classified; and
a code refactoring module, configured to perform code refactoring on the code data to be classified using the target refactoring script.
A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the above code refactoring method when executing the computer program.
A computer-readable storage medium storing a computer program, wherein the computer program implements the above code refactoring method when executed by a processor.
In the above code refactoring method, apparatus, computer device, and storage medium, code data to be classified is obtained and subjected to feature extraction to obtain its sample features to be classified. The sample features to be classified are then input into a preset code classification model to obtain the target code refactoring type of the code data to be classified. A corresponding target refactoring script is obtained according to that target code refactoring type, and the target refactoring script is finally used to refactor the code data to be classified. This not only improves the efficiency of code refactoring but also enables automatic classification and refactoring of code data belonging to different code refactoring types.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is an application environment schematic diagram of code refactoring method in one embodiment of the invention;
Fig. 2 is an exemplary diagram of code refactoring method in one embodiment of the invention;
Fig. 3 is another exemplary diagram of code refactoring method in one embodiment of the invention;
Fig. 4 is another exemplary diagram of code refactoring method in one embodiment of the invention;
Fig. 5 is another exemplary diagram of code refactoring method in one embodiment of the invention;
Fig. 6 is another exemplary diagram of code refactoring method in one embodiment of the invention;
Fig. 7 is a functional block diagram of code refactoring device in one embodiment of the invention;
Fig. 8 is another functional block diagram of code refactoring device in one embodiment of the invention;
Fig. 9 is another functional block diagram of code refactoring device in one embodiment of the invention;
Fig. 10 is a schematic diagram of computer equipment in one embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The code refactoring method provided by the embodiments of the present invention can be applied in the application environment shown in Fig. 1. Specifically, the method is applied in a code refactoring system that, as shown in Fig. 1, includes a client and a server communicating over a network, and that addresses the problem of inefficient code refactoring. The client, also called the user terminal, is a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as an independent server or as a cluster of multiple servers.
In one embodiment, as shown in Fig. 2, a code refactoring method is provided. The method is described as applied to the server in Fig. 1 and includes the following steps:
S10: Obtain code data to be classified, and perform feature extraction on it to obtain the sample features to be classified of the code data to be classified.
The code data to be classified is code data whose refactoring type is yet to be determined. Optionally, it may be obtained directly from a code database on the server; it may be code data saved in advance in a code file on the client; or it may be local code data uploaded or sent directly to the client. The client sends the code data to be classified to the server, and the server thereby obtains it.
The sample features to be classified are a set of features that reflect the structure of the code data to be classified. They may include at least one of: number of code lines, code redundancy, member variables, number of functions, and function parameters. Specifically, feature extraction may be performed by obtaining a pre-written feature extraction script from the database on the server and applying the corresponding script to the code data to be classified.
Preferably, feature extraction may also be performed with the code metrics tool SourceMonitor. SourceMonitor measures code written in several languages (C++, C#, VB.NET, Java, and HTML), and the metrics it outputs differ by language. Specifically, once the code data to be classified is input into SourceMonitor, the tool directly outputs the corresponding sample features to be classified. The number of code lines can be extracted through the <Lines> tag in SourceMonitor, code redundancy through the <Repetive rate> tag, and member variables through the <Variable> tag. Similarly, the other sample features to be classified can be extracted through other SourceMonitor tags.
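As a rough illustration of the kind of structural features described in step S10, the sketch below computes a few of them for Python source with the standard `ast` module. It is neither SourceMonitor nor the feature extraction script of the embodiment; the function name `extract_features` and the feature keys are hypothetical.

```python
import ast


def extract_features(source: str) -> dict:
    """Compute a few structural features of Python source code.

    Hypothetical stand-in for the feature extraction scripts in the text.
    """
    tree = ast.parse(source)
    functions = [
        node for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    # Count non-blank lines as the line-count feature.
    line_count = sum(1 for line in source.splitlines() if line.strip())
    # Largest positional parameter count among all functions (0 if none).
    max_params = max((len(f.args.args) for f in functions), default=0)
    return {
        "lines": line_count,
        "function_count": len(functions),
        "max_function_params": max_params,
    }


sample = """
def add(a, b):
    return a + b

def scale(x, factor, offset):
    return x * factor + offset
"""
features = extract_features(sample)
print(features)
```

Features such as code redundancy would need a duplicate-detection pass and are left out of this sketch.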
S20: Input the sample features to be classified of the code data to be classified into a preset code classification model to obtain the target code refactoring type of the code data to be classified.
The code classification model, also called a decision tree classification model, is a model that uses the classification properties of a decision tree to classify code data automatically. A decision tree is a decision-analysis method that, given the known probabilities of various outcomes, builds a tree to find the probability that the expected net present value is greater than or equal to zero. It is a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of that test, and each leaf node represents a class. Decision trees are a form of supervised learning. In supervised learning, a set of samples is given, each with a set of attributes and a class that is determined in advance, and a classifier is learned that can correctly classify newly encountered objects.
Because the code classification model can classify code data automatically, inputting the sample features to be classified into the model directly yields the target code refactoring type of the code data to be classified. A code refactoring type is the type name that results from classifying code by refactoring type; the target code refactoring type is the type name obtained for the code data to be classified. In this embodiment, after the sample features to be classified are input into the code classification model, the resulting target code refactoring type may be any one of: the long function type, the large class type, the too-many-function-parameters type, the duplicated code type, the direct class member variable access type, or the repeated conditional expression type. Specifically, the long function type refers to code in which the number of lines in a function exceeds a threshold; the large class type refers to code in which the number of member variables and functions exceeds a threshold; the too-many-function-parameters type refers to code in which the number of function parameters exceeds a threshold; the duplicated code type refers to code in which the amount of duplicated code blocks exceeds a threshold; the repeated conditional expression type refers to code in which, within conditional functions, the code blocks containing repeated checks exceed a threshold; and the direct class member variable access type refers to code that accesses member variables through the functions of a class.
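To make the tree structure concrete, the sketch below hand-writes a tiny decision tree in which each `if` is an internal node testing one attribute and each `return` is a leaf naming a refactoring type. In the embodiment the tree is learned from samples rather than written by hand; the thresholds, the feature keys, and the `RefactoringType` enum are assumptions made only for illustration, since the source gives no concrete threshold values.

```python
from enum import Enum
from typing import Optional


class RefactoringType(Enum):
    LONG_FUNCTION = "long function"
    LARGE_CLASS = "large class"
    TOO_MANY_PARAMETERS = "too many function parameters"
    DUPLICATED_CODE = "duplicated code"
    DIRECT_MEMBER_ACCESS = "direct class member variable access"
    REPEATED_CONDITION = "repeated conditional expression"


# Hypothetical thresholds; the source does not state any values.
MAX_FUNCTION_LINES = 50
MAX_PARAMETERS = 5
MAX_DUPLICATED_BLOCKS = 3


def classify(features: dict) -> Optional[RefactoringType]:
    """Walk a hand-written tree; return None when no leaf matches."""
    # Internal node: test the function length attribute.
    if features.get("function_lines", 0) > MAX_FUNCTION_LINES:
        return RefactoringType.LONG_FUNCTION
    # Internal node: test the parameter count attribute.
    if features.get("parameters", 0) > MAX_PARAMETERS:
        return RefactoringType.TOO_MANY_PARAMETERS
    # Internal node: test the duplicated block count attribute.
    if features.get("duplicated_blocks", 0) > MAX_DUPLICATED_BLOCKS:
        return RefactoringType.DUPLICATED_CODE
    return None


print(classify({"function_lines": 80, "parameters": 2}))
```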
S30: Obtain the corresponding target refactoring script according to the target code refactoring type of the code data to be classified.
The target refactoring script is a pre-written script that can refactor code data of a given target code refactoring type. It follows that different target code refactoring types lead to different target refactoring scripts. In this step, the database on the server stores in advance the refactoring scripts corresponding to the different code refactoring types.
Specifically, obtaining the corresponding target refactoring script may include: obtaining the target code refactoring type of the code data to be classified; matching that type, by regular-expression matching, against the script identifier of each refactoring script in the database one by one; and taking the refactoring script that matches the target code refactoring type as the target refactoring script. A script identifier is a label that distinguishes the different refactoring scripts. For example, the script identifier of the target refactoring script for the long function type is "long function"; for the duplicated code type it is "duplicated code"; and for the large class type it is "large class".
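The lookup in step S30 can be sketched as follows, assuming the script identifiers are plain strings held in an in-memory dictionary rather than a server database. The dictionary `SCRIPT_STORE`, the script paths, and the function `find_script` are hypothetical names introduced for this example.

```python
import re
from typing import Optional

# Hypothetical store: script identifier -> location of the refactoring script.
SCRIPT_STORE = {
    "long function": "scripts/split_long_function.py",
    "duplicated code": "scripts/merge_duplicates.py",
    "large class": "scripts/split_large_class.py",
}


def find_script(refactoring_type: str) -> Optional[str]:
    """Match the type name against each script identifier in turn."""
    for identifier, script in SCRIPT_STORE.items():
        # re.escape keeps the identifier literal inside the pattern.
        if re.search(re.escape(identifier), refactoring_type, flags=re.IGNORECASE):
            return script
    return None


print(find_script("Long function type"))
```

Returning `None` for an unmatched type leaves it to the caller to decide whether to skip the code or report an error.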
S40: Perform code refactoring on the code data to be classified using the target refactoring script.
Because the target refactoring script is able to refactor the code data to be classified, once the corresponding target refactoring script has been obtained in step S30 it can be applied directly to the code data to be classified. Optionally, the refactoring may proceed as follows: first split the code data to be classified into multiple sub-functions, then encapsulate the resulting sub-functions, and finally realize the refactoring by calling the encapsulated sub-functions.
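The split, encapsulate, and call pattern described for step S40 resembles the familiar "extract function" refactoring. The example below shows a hand-written before and after for one small function, only to illustrate the intended result; it does not show how a refactoring script would produce it automatically, and all function names are made up for the example.

```python
def report_before(values):
    # One function that filters, aggregates, and formats.
    cleaned = []
    for v in values:
        if v is not None and v >= 0:
            cleaned.append(v)
    total = 0
    for v in cleaned:
        total += v
    mean = total / len(cleaned) if cleaned else 0.0
    return f"n={len(cleaned)} mean={mean:.2f}"


def _clean(values):
    # Sub-function 1: drop missing and negative entries.
    return [v for v in values if v is not None and v >= 0]


def _mean(values):
    # Sub-function 2: arithmetic mean, 0.0 for an empty list.
    return sum(values) / len(values) if values else 0.0


def report_after(values):
    # The refactored function only calls the encapsulated sub-functions.
    cleaned = _clean(values)
    return f"n={len(cleaned)} mean={_mean(cleaned):.2f}"


data = [3, None, -1, 5]
print(report_before(data), report_after(data))
```

Checking that both versions return the same result on the same input is a simple way to confirm that the refactoring preserved behavior.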
In this embodiment, code data to be classified is obtained and subjected to feature extraction to obtain its sample features to be classified. The sample features to be classified are then input into a preset code classification model to obtain the target code refactoring type of the code data to be classified. Finally, the corresponding target refactoring script is obtained according to that target code refactoring type and used to refactor the code data to be classified. This achieves automatic classification and refactoring of code data belonging to different target code refactoring types.
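Steps S10 to S40 compose into a short pipeline. The sketch below wires them together with the extractor, classifier, and script table passed in as parameters, so it makes no claim about how any of them is implemented; the function name `refactor` and the stand-in callables in the usage lines are hypothetical.

```python
from typing import Callable, Dict


def refactor(
    code: str,
    extract_features: Callable[[str], dict],
    classify: Callable[[dict], str],
    scripts: Dict[str, Callable[[str], str]],
) -> str:
    features = extract_features(code)       # S10: feature extraction
    refactoring_type = classify(features)   # S20: classification
    script = scripts[refactoring_type]      # S30: script lookup
    return script(code)                     # S40: apply the script


# Toy stand-ins, for demonstration only.
result = refactor(
    "a\nb\nc",
    extract_features=lambda c: {"lines": len(c.splitlines())},
    classify=lambda f: "long function" if f["lines"] > 2 else "none",
    scripts={"long function": str.upper, "none": lambda c: c},
)
print(result)
```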
In one embodiment, as shown in Fig. 3, before the sample features to be classified of the code data to be classified are input into the preset code classification model to obtain the target code refactoring type of the code data to be classified, the code refactoring method further includes:
S201: Obtain N sample code data items, each of which includes a corresponding original code refactoring type, where N is a positive integer.
Sample code data is sample data whose code architecture is poor and needs refactoring. Each sample code data item includes a corresponding original code refactoring type, which is the type name obtained by classifying the sample code data by refactoring type. In other words, every sample code data item has already been processed, classified, and labeled with its original code refactoring type. Specifically, the original code refactoring types of the sample code data may include at least one of: the long function type, the large class type, the too-many-function-parameters type, the duplicated code type, the direct class member variable access type, and the repeated conditional expression type.
Optionally, the sample code data may be produced by obtaining source code from open-source systems in advance and then inspecting, identifying, and classifying it. The open-source systems may be, for example, Tomcat, ArgoUML, or Apache Ant. The sample code data may also be already classified code data saved in advance in a code file on the client, or already classified local code data uploaded or sent directly to the client. The client sends the sample code data to the server, and the server thereby obtains it.
S202: Perform feature extraction on each sample code data item to obtain the training sample features of each sample code data item.
The training sample features are a set of features that reflect the structure of the sample code data. They may include at least one of: number of code lines, code redundancy, member variables, number of functions, and function parameters. The method and procedure for extracting the training sample features from each sample code data item are the same as those used in step S10 to extract the sample features to be classified from the code data to be classified, and are not repeated here.
S203: Train a decision tree model on the training sample features of each sample code data item and the corresponding original code refactoring types to obtain the code classification model.
The code classification model is the model produced by training on the training sample features of a large amount of sample code data and the corresponding original code refactoring types, and it can classify code to be classified. Specifically, the training sample features of each sample code data item and the corresponding original code refactoring type are input into the decision tree model, and the decision tree model is trained with the C4.5 algorithm to produce the trained code classification model. C4.5 is a family of algorithms used for classification problems in machine learning and data mining. Its goal is supervised learning: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes, and this mapping can be used to classify new entities whose class is unknown.
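C4.5 chooses the attribute to split on by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below computes that criterion for a single discrete attribute; it is not a full C4.5 implementation (there is no threshold search for continuous attributes, no tree construction, and no pruning), and the function names and toy data are illustrative.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a sequence of discrete values."""
    total = len(items)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(items).values()
    )


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the discrete attribute `values`."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after the split.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data: a discretized "function length" attribute against a type label.
length = ["long", "long", "short", "short"]
label = ["long function", "long function", "other", "other"]
print(gain_ratio(length, label))
```

At each node, a C4.5-style learner would evaluate this ratio for every candidate attribute and split on the one with the highest value.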
In this embodiment, N sample code data items are obtained, each including a corresponding original code refactoring type, where N is a positive integer. Feature extraction is then performed on each sample code data item to obtain its training sample features. Finally, the decision tree model is trained on the training sample features of each sample code data item and the corresponding original code refactoring types to obtain the code classification model. This helps ensure the accuracy of the trained code classification model.
In one embodiment, as shown in Fig. 4, training the decision tree model on the training sample features of each sample code data item and the corresponding original code refactoring types to obtain the code classification model includes:
S2031: Combine the training sample features of each sample code data item and the corresponding original code refactoring types into a sample feature set, and divide the sample feature set into a sample training set and a sample validation set.
The sample feature set is the sample data used to train the decision tree model. It is a data set made up of sample features, each of which contains the training sample features of one sample code data item together with that item's original code refactoring type. That is, the training sample features of each sample code data item are associated with its original code refactoring type.
The sample training set is the data set used to build the code classification model. The sample validation set is the data set used to check how well the resulting code classification model performs. Specifically, the sample feature set may be divided by random splitting or by cross-validation. After division, the ratio of the sample training set to the sample validation set may be, for example, 6:4, 7:3, or 7.5:2.5. Preferably, to improve the precision of the code classification model, in this step 75% of the sample feature set is used as the sample training set for training the model, and 25% is used as the sample validation set for evaluating the trained model.
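A random 75/25 division of the sample feature set can be sketched as follows. The function name `split_samples` and the fixed seed are choices made for this example; the seed only makes the shuffle reproducible.

```python
import random


def split_samples(samples, train_fraction=0.75, seed=0):
    """Shuffle a copy of the samples and cut it into training and validation parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, validation_set = split_samples(range(100))
print(len(train_set), len(validation_set))
```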
S2032: Input the sample training set into the decision tree model for training to obtain a preliminary classification model.
Specifically, because the sample training set consists of training sample features and the corresponding original code refactoring types, the training sample features in the sample training set and the corresponding original code refactoring types are input into the decision tree model, and the decision tree model is trained with the C4.5 algorithm to obtain the preliminary classification model. Optionally, the training sample features may include at least one of: number of code lines, code redundancy, member variables, number of functions, and function parameters. The original code refactoring types include at least one of: the long function type, the large class type, the too-many-function-parameters type, the duplicated code type, the direct class member variable access type, and the repeated conditional expression type.
S2033: Validate the preliminary classification model with the sample validation set to obtain the validation accuracy.
The validation accuracy is the result of evaluating the trained preliminary classification model on the sample validation set. Specifically, validation proceeds as follows: the training sample features in the sample validation set are first input into the preliminary classification model to obtain the predicted original code refactoring types; the predicted types are then compared one by one with the corresponding preset original code refactoring types; and the validation accuracy is computed from the comparison results.
For example, suppose the sample validation set contains 10 sample code data items. After the training sample features of these 10 items are input into the preliminary classification model, 7 of the 10 predicted original code refactoring types match the corresponding preset original code refactoring types and 3 do not, so the validation accuracy is 0.7.
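The validation accuracy in the example is simply the fraction of predictions that match their labels. A minimal sketch, with made-up type names reproducing the 7-out-of-10 case:

```python
def validation_accuracy(predicted, expected):
    """Fraction of predicted types that equal the labeled types."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Illustrative labels: the first 7 predictions match, the last 3 do not.
expected = ["long function"] * 5 + ["duplicated code"] * 5
predicted = ["long function"] * 5 + ["duplicated code"] * 2 + ["large class"] * 3
accuracy = validation_accuracy(predicted, expected)
print(accuracy)
```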
S2034: If the validation accuracy is greater than or equal to a preset accuracy, take the preliminary classification model as the code classification model.
The preset accuracy is a value obtained in advance from an initial assessment of the preliminary classification model. Specifically, the validation accuracy obtained in step S2033 is compared with the preset accuracy. If the validation accuracy is greater than or equal to the preset accuracy, the preliminary classification model is taken as the code classification model; if it is less than the preset accuracy, the preliminary classification model needs to be optimized. The optimization may be carried out by minimizing a loss function: a loss function is first determined and then iteratively minimized until convergence, which yields the optimized preliminary classification model.
In this embodiment, the training sample features of each sample code data item and the corresponding original code refactoring types are combined into a sample feature set, which is divided into a sample training set and a sample validation set. The sample training set is then input into the decision tree model for training to obtain a preliminary classification model, and the preliminary classification model is validated with the sample validation set to obtain the validation accuracy. If the validation accuracy is greater than or equal to the preset accuracy, the preliminary classification model is taken as the code classification model. This further ensures the accuracy of the resulting code classification model.
In one embodiment, as shown in Fig. 5, performing feature extraction on the code data to be classified to obtain the sample features to be classified of the code data to be classified specifically includes:
S101: Obtain a feature parameter set, which includes M parameter identifiers, where M is a positive integer.
The feature parameter set is the preset set of features to be extracted from the code data to be classified. It may include at least one of: number of code lines, code redundancy, member variables, number of functions, and function parameters. A parameter identifier is a label assigned to each feature parameter. For example, the parameter identifier of the number of code lines may be Lines; that of code redundancy may be Repetive rate; that of member variables may be Variable; that of the number of functions may be Number; and that of function parameters may be Parameter.
S102: corresponding feature extraction script is obtained according to each parameter identification.
Wherein, feature extraction script, which refers to, can directly treat the text that classification code data carry out feature extraction.In this implementation In example, feature extraction script is database that is compiled in advance and being stored in server-side, therefore according to the parameter identification of acquisition Can corresponding feature extraction script directly be obtained from the database of server-side.Such as: according to the parameter identification Lines of lines of code Corresponding feature extraction script<Lines>can be got from the database of server-side;According to the parameter identification of code redundancies Repetive rate can get corresponding feature extraction script<Repetive rate>from the database of server-side;According at The parameter identification Variable of member's variable can get corresponding feature extraction script<Variable>from the database of server-side; Corresponding feature extraction script < Number can be got from the database of server-side according to the parameter identification Number of function numbers >;Corresponding feature extraction script can be got from the database of server-side according to the parameter identification Parameter of function parameter <Parameter>。
S103: classification code data are treated using feature extraction script and carry out feature extraction, obtain code data to be sorted Sample characteristics to be sorted.
Wherein, sample characteristics to be sorted refer to one group of characteristic for reflecting code data to be sorted in structure.Specifically, The sample characteristics to be sorted of code data to be sorted may include: lines of code, code redundancies, member variable, function numbers And function parameter.Specifically, the direct feature extraction of classification code data progress is treated since each feature extraction script all has Function, therefore can directly by the feature extraction script obtained according to step S102 treat classification code data carry out feature mention It takes, obtains the sample characteristics to be sorted of code data to be sorted.
For the present embodiment by obtaining characteristic parameter collection, characteristic parameter collection includes M parameter identification;Then according to each parameter Mark obtains corresponding feature extraction script, finally treats classification code data using feature extraction script and carries out feature extraction, Obtain the sample characteristics to be sorted of code data to be sorted;Further ensure the sample to be sorted of the code data to be sorted of acquisition The accuracy and validity of eigen.
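Steps S101 to S103 amount to a lookup from parameter identifier to extraction script followed by applying each script. A minimal sketch, assuming an in-memory dictionary in place of the server-side database and three hypothetical extractor functions written for Python-style source text (the identifiers Lines, Repetive rate, and Number are taken from the text; everything else is illustrative):

```python
import re
from collections import Counter


def count_lines(code):
    """Number of non-blank lines."""
    return sum(1 for line in code.splitlines() if line.strip())


def repetition_rate(code):
    """Share of non-blank lines that repeat an earlier line."""
    lines = [line.strip() for line in code.splitlines() if line.strip()]
    if not lines:
        return 0.0
    duplicated = sum(count - 1 for count in Counter(lines).values())
    return duplicated / len(lines)


def count_functions(code):
    """Number of `def` statements (assumes Python-style source)."""
    return len(re.findall(r"^\s*def\s+\w+\s*\(", code, flags=re.MULTILINE))


# Stand-in for the server-side database of feature extraction scripts.
FEATURE_EXTRACTION_SCRIPTS = {
    "Lines": count_lines,
    "Repetive rate": repetition_rate,
    "Number": count_functions,
}


def extract_features(code, parameter_identifiers):
    """S102 + S103: look up each script by identifier and apply it to the code."""
    return {pid: FEATURE_EXTRACTION_SCRIPTS[pid](code) for pid in parameter_identifiers}


sample_code = "def a(x):\n    return x\n\ndef b(x):\n    return x\n"
features = extract_features(sample_code, ["Lines", "Repetive rate", "Number"])
```

Keeping the scripts behind identifiers means a new feature can be added by registering one more entry, without touching the extraction loop.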
In one embodiment, as shown in Fig. 6, before the N pieces of sample code data are obtained, the code refactoring method further includes:
S2011: Obtain L pieces of source code data, where L is a positive integer.
Here, source code data refers to code data that has not undergone any processing. Optionally, the L pieces of source code data may be source code obtained from a particular code system, or source code obtained from open-source systems such as Tomcat, ArgoUML, or Apache Ant. In this step, the L pieces of source code data may be written in different programming languages; for example, source code data may be written in Java, in C++, or in Python.
S2012: Perform code detection on each piece of source code data to obtain the programming language type of each piece of source code data.
Here, the programming language type is an abstract label describing the language in which a piece of source code data is written. Optionally, the programming language type of source code data may include the Java language, the C++ language, the Python language, the SSH language, and so on. Specifically, code detection can be performed automatically by a programming-language detection tool: each piece of source code data is input into the detection tool, which outputs the corresponding programming language type.
Preferably, after code detection has been performed on each piece of source code data and its programming language type obtained, each piece of source code data may also be assigned a corresponding type identifier according to its programming language type. Here, a type identifier is a mark used to distinguish the different programming language types of different source code data. A type identifier may be expressed as a digit, an uppercase letter, or a lowercase letter. For example, source code data written in Java is assigned the type identifier a; source code data written in C++ is assigned the type identifier b; source code data written in Python is assigned the type identifier c; and source code data written in the SSH language is assigned the type identifier d.
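The detection tool itself is not specified in the text. As a rough sketch, the following assumes detection by file extension, which is only one possible heuristic, and reuses the type identifiers a, b, and c from the example above; the mapping tables and function names are hypothetical.

```python
from pathlib import PurePath

# Assumed extension-based detection; a real detection tool may inspect file contents instead.
EXTENSION_TO_LANGUAGE = {".java": "java", ".cpp": "c++", ".cc": "c++", ".py": "python"}

# Type identifiers as in the example: a for Java, b for C++, c for Python.
TYPE_IDENTIFIERS = {"java": "a", "c++": "b", "python": "c"}


def detect_language(filename):
    """Return the programming language type, or None if it is not recognized."""
    return EXTENSION_TO_LANGUAGE.get(PurePath(filename).suffix.lower())


def tag_source(filename):
    """Attach the detected language type and its type identifier to a source file."""
    language = detect_language(filename)
    return {"file": filename, "language": language, "type_id": TYPE_IDENTIFIERS.get(language)}


tagged = [tag_source(name) for name in ("Main.java", "parser.cpp", "train.py", "notes.txt")]
```

Unrecognized files keep `None` for both fields, so a later screening step can simply skip them.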
S2013: According to preset sample screening information, screen each piece of source code data by programming language type to obtain the sample code data.
Here, sample screening information refers to a rule or requirement used to screen the source code data. Specifically, since different source code data may have different programming language types, the screening can be done by regular-expression matching: the programming language type of each piece of source code data is matched against the preset sample screening information, and the source code data that match successfully are taken as the sample code data.
For example, if the preset sample screening information is to select code data whose programming language type is Java, then after the programming language type of each piece of source code data has been matched against the preset sample screening information by regular-expression matching, all of the resulting sample code data are code data written in Java.
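The regular-expression screening in S2013 can be sketched as below. The record layout (`name`, `language`) and the function `screen_samples` are assumptions for illustration; `re.fullmatch` is used so that the pattern for Java does not also accept a longer language name that merely begins with the same letters.

```python
import re


def screen_samples(sources, screening_pattern):
    """Keep only the source code records whose language type fully matches the pattern."""
    pattern = re.compile(screening_pattern)
    return [source for source in sources if pattern.fullmatch(source["language"])]


# Hypothetical source code records with already-detected language types.
sources = [
    {"name": "A.java", "language": "java"},
    {"name": "b.py", "language": "python"},
    {"name": "C.java", "language": "java"},
    {"name": "app.js", "language": "javascript"},
]

java_samples = screen_samples(sources, r"java")
```

A screening rule that accepts several languages can be written as an alternation, for example `r"java|python"`.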
In this embodiment, L pieces of source code data are obtained, where L is a positive integer; code detection is then performed on each piece of source code data to obtain its programming language type; finally, each piece of source code data is screened by programming language type according to the preset sample screening information to obtain the sample code data. This not only ensures the accuracy of the sample code data but also further improves the accuracy of the code classification model.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
In one embodiment, a code refactoring device is provided, and the code refactoring device corresponds one-to-one to the code refactoring method in the above embodiments. As shown in Fig. 7, the code refactoring device includes a to-be-classified sample feature extraction module 10, an input module 20, a target refactoring script obtaining module 30, and a code refactoring module 40. The functional modules are described in detail as follows:
The to-be-classified sample feature extraction module 10 is configured to obtain code data to be classified and perform feature extraction on the code data to be classified to obtain the to-be-classified sample features of the code data to be classified;
The input module 20 is configured to input the to-be-classified sample features of the code data to be classified into a preset code classification model to obtain the target code refactoring type of the code data to be classified;
The target refactoring script obtaining module 30 is configured to obtain the corresponding target refactoring script according to the target code refactoring type of the code data to be classified;
The code refactoring module 40 is configured to perform code refactoring on the code data to be classified using the target refactoring script.
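The four modules form a straight pipeline: extract features, classify, look up a script, apply it. A minimal sketch of that wiring, in which the class name, the toy feature, the refactoring types `remove_blank_lines` and `no_change`, and the scripts themselves are all hypothetical stand-ins chosen only to make the flow executable:

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CodeRefactoringDevice:
    extract_features: Callable[[str], dict]               # role of module 10
    classify: Callable[[dict], str]                       # role of module 20
    refactoring_scripts: Dict[str, Callable[[str], str]]  # lookup used by module 30

    def refactor(self, code: str) -> str:
        features = self.extract_features(code)
        refactoring_type = self.classify(features)
        script = self.refactoring_scripts[refactoring_type]
        return script(code)                               # role of module 40


device = CodeRefactoringDevice(
    # Toy feature: number of blank lines in the code.
    extract_features=lambda code: {"blank": sum(1 for line in code.splitlines() if not line.strip())},
    # Toy classifier standing in for the trained code classification model.
    classify=lambda features: "remove_blank_lines" if features["blank"] else "no_change",
    refactoring_scripts={
        "remove_blank_lines": lambda code: "\n".join(line for line in code.splitlines() if line.strip()),
        "no_change": lambda code: code,
    },
)

result = device.refactor("x = 1\n\ny = 2")
```

Because each stage is passed in as a callable, the trained classifier and the real script store can replace the toy versions without changing `refactor`.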
Preferably, as shown in Fig. 8, the code refactoring device further includes:
A sample code data obtaining module 201, configured to obtain N pieces of sample code data, where each piece of sample code data includes a corresponding original code refactoring type and N is a positive integer;
A training sample feature extraction module 202, configured to perform feature extraction on each piece of sample code data to obtain the training sample features of each piece of sample code data;
A model training module 203, configured to train a decision tree model according to the training sample features of each piece of sample code data and the corresponding original code refactoring type to obtain the code classification model.
Preferably, as shown in Fig. 9, the model training module 203 includes:
A composition unit 2031, configured to combine the training sample features of each piece of sample code data and the corresponding original code refactoring type into a sample feature set, and to divide the sample feature set into a sample training set and a sample validation set;
An input unit 2032, configured to input the sample training set into the decision tree model for training to obtain a preliminary classification model;
A verification unit 2033, configured to verify the preliminary classification model using the sample validation set to obtain a verification accuracy;
A determination unit 2034, configured to determine the preliminary classification model to be the code classification model when the verification accuracy is greater than or equal to the preset accuracy rate.
Preferably, the to-be-classified sample feature extraction module 10 includes:
A feature parameter set obtaining unit, configured to obtain a feature parameter set, where the feature parameter set includes M parameter identifiers and M is a positive integer;
A feature extraction script obtaining unit, configured to obtain the corresponding feature extraction script according to each parameter identifier;
A to-be-classified sample feature extraction unit, configured to perform feature extraction on the code data to be classified using the feature extraction scripts to obtain the to-be-classified sample features of the code data to be classified.
Preferably, the code refactoring device further includes:
A source code data obtaining module, configured to obtain L pieces of source code data, where L is a positive integer;
A code detection module, configured to perform code detection on each piece of source code data to obtain the programming language type of each piece of source code data;
A screening module, configured to screen each piece of source code data by programming language type according to preset sample screening information to obtain the sample code data.
For specific limitations on the code refactoring device, reference may be made to the limitations on the code refactoring method above, which are not repeated here. Each module in the above code refactoring device may be implemented wholly or partly in software, hardware, or a combination thereof. The above modules may be embedded in hardware form in, or be independent of, a processor in a computer device, or may be stored in software form in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the data used by the code refactoring method in the above embodiments. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements a code refactoring method.
In one embodiment, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the code refactoring method in the above embodiments when executing the computer program.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, where the computer program implements the code refactoring method in the above embodiments when executed by a processor.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is given only as an example. In practical applications, the above functions may be assigned to different functional units or modules as needed; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A code refactoring method, characterized by comprising:
obtaining code data to be classified, and performing feature extraction on the code data to be classified to obtain to-be-classified sample features of the code data to be classified;
inputting the to-be-classified sample features of the code data to be classified into a preset code classification model to obtain a target code refactoring type of the code data to be classified;
obtaining a corresponding target refactoring script according to the target code refactoring type of the code data to be classified;
performing code refactoring on the code data to be classified using the target refactoring script.
2. The code refactoring method according to claim 1, characterized in that, before the inputting of the to-be-classified sample features of the code data to be classified into the preset code classification model to obtain the target code refactoring type of the code data to be classified, the code refactoring method further comprises:
obtaining N pieces of sample code data, wherein each piece of sample code data includes a corresponding original code refactoring type, and N is a positive integer;
performing feature extraction on each piece of sample code data to obtain training sample features of each piece of sample code data;
training a decision tree model according to the training sample features of each piece of sample code data and the corresponding original code refactoring type to obtain the code classification model.
3. The code refactoring method according to claim 2, characterized in that the training of the decision tree model according to the training sample features of each piece of sample code data and the corresponding original code refactoring type to obtain the code classification model comprises:
combining the training sample features of each piece of sample code data and the corresponding original code refactoring type into a sample feature set, and dividing the sample feature set into a sample training set and a sample validation set;
inputting the sample training set into the decision tree model for training to obtain a preliminary classification model;
verifying the preliminary classification model using the sample validation set to obtain a verification accuracy;
if the verification accuracy is greater than or equal to a preset accuracy rate, determining the preliminary classification model to be the code classification model.
4. The code refactoring method according to claim 1, characterized in that the performing of feature extraction on the code data to be classified to obtain the to-be-classified sample features of the code data to be classified comprises:
obtaining a feature parameter set, wherein the feature parameter set includes M parameter identifiers, and M is a positive integer;
obtaining a corresponding feature extraction script according to each parameter identifier;
performing feature extraction on the code data to be classified using the feature extraction scripts to obtain the to-be-classified sample features of the code data to be classified.
5. The code refactoring method according to claim 2, characterized in that, before the obtaining of the N pieces of sample code data, the code refactoring method further comprises:
obtaining L pieces of source code data, wherein L is a positive integer;
performing code detection on each piece of source code data to obtain a programming language type of each piece of source code data;
screening each piece of source code data by the programming language type according to preset sample screening information to obtain the sample code data.
6. A code refactoring device, characterized by comprising:
a to-be-classified sample feature extraction module, configured to obtain code data to be classified and perform feature extraction on the code data to be classified to obtain to-be-classified sample features of the code data to be classified;
an input module, configured to input the to-be-classified sample features of the code data to be classified into a preset code classification model to obtain a target code refactoring type of the code data to be classified;
a target refactoring script obtaining module, configured to obtain a corresponding target refactoring script according to the target code refactoring type of the code data to be classified;
a code refactoring module, configured to perform code refactoring on the code data to be classified using the target refactoring script.
7. The code refactoring device according to claim 6, characterized in that the code refactoring device further comprises:
a sample code data obtaining module, configured to obtain N pieces of sample code data, wherein each piece of sample code data includes a corresponding original code refactoring type, and N is a positive integer;
a training sample feature extraction module, configured to perform feature extraction on each piece of sample code data to obtain training sample features of each piece of sample code data;
a model training module, configured to train a decision tree model according to the training sample features of each piece of sample code data and the corresponding original code refactoring type to obtain the code classification model.
8. The code refactoring device according to claim 7, characterized in that the model training module comprises:
a composition unit, configured to combine the training sample features of each piece of sample code data and the corresponding original code refactoring type into a sample feature set, and to divide the sample feature set into a sample training set and a sample validation set;
an input unit, configured to input the sample training set into the decision tree model for training to obtain a preliminary classification model;
a verification unit, configured to verify the preliminary classification model using the sample validation set to obtain a verification accuracy;
a determination unit, configured to determine the preliminary classification model to be the code classification model when the verification accuracy is greater than or equal to a preset accuracy rate.
9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the code refactoring method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the code refactoring method according to any one of claims 1 to 5.
CN201910345048.9A 2019-04-26 2019-04-26 Code refactoring method, apparatus, computer equipment and medium Pending CN110162335A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910345048.9A CN110162335A (en) 2019-04-26 2019-04-26 Code refactoring method, apparatus, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910345048.9A CN110162335A (en) 2019-04-26 2019-04-26 Code refactoring method, apparatus, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN110162335A true CN110162335A (en) 2019-08-23

Family

ID=67640146

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910345048.9A Pending CN110162335A (en) 2019-04-26 2019-04-26 Code refactoring method, apparatus, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN110162335A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625275A (en) * 2020-04-29 2020-09-04 贝壳技术有限公司 Code reconstruction planning method and device, storage medium and electronic equipment
CN112947993A (en) * 2019-12-31 2021-06-11 深圳市明源云链互联网科技有限公司 Method and device for reconstructing system framework, electronic equipment and storage medium
CN113238796A (en) * 2021-05-17 2021-08-10 北京京东振世信息技术有限公司 Code reconstruction method, device, equipment and storage medium
CN114327415A (en) * 2022-03-17 2022-04-12 武汉天喻信息产业股份有限公司 Compiling method and device for compiling java file
WO2022121724A1 (en) * 2020-12-07 2022-06-16 华为技术有限公司 Data processing apparatus and method
WO2023155487A1 (en) * 2022-02-18 2023-08-24 华为云计算技术有限公司 Code refactoring method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107220180A (en) * 2017-06-08 2017-09-29 电子科技大学 A kind of code classification method based on neural network language model
CN108446115A (en) * 2018-03-12 2018-08-24 中国银行股份有限公司 A kind of method and device of code reuse
CN108897572A (en) * 2018-07-19 2018-11-27 北京理工大学 A kind of complicated type reconstructing method based on variable association tree

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947993A (en) * 2019-12-31 2021-06-11 深圳市明源云链互联网科技有限公司 Method and device for reconstructing system framework, electronic equipment and storage medium
CN112947993B (en) * 2019-12-31 2021-12-07 深圳市明源云链互联网科技有限公司 Method and device for reconstructing system framework, electronic equipment and storage medium
CN111625275A (en) * 2020-04-29 2020-09-04 贝壳技术有限公司 Code reconstruction planning method and device, storage medium and electronic equipment
CN111625275B (en) * 2020-04-29 2023-10-20 贝壳技术有限公司 Code reconstruction planning method and device, storage medium and electronic equipment
WO2022121724A1 (en) * 2020-12-07 2022-06-16 华为技术有限公司 Data processing apparatus and method
CN113238796A (en) * 2021-05-17 2021-08-10 北京京东振世信息技术有限公司 Code reconstruction method, device, equipment and storage medium
WO2023155487A1 (en) * 2022-02-18 2023-08-24 华为云计算技术有限公司 Code refactoring method and device
CN114327415A (en) * 2022-03-17 2022-04-12 武汉天喻信息产业股份有限公司 Compiling method and device for compiling java file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination