CN113448861B - Method and device for detecting repeated form - Google Patents


Info

Publication number
CN113448861B
Authority
CN
China
Prior art keywords
detected
history
repeated
similar
forms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110779913.8A
Other languages
Chinese (zh)
Other versions
CN113448861A (en)
Inventor
党娜
刘洋
李�昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Filing date
Publication date
Application filed by Bank of China Ltd
Priority to CN202110779913.8A
Publication of CN113448861A
Application granted
Publication of CN113448861B
Active (current legal status)
Anticipated expiration


Abstract

The invention discloses a method and a device for detecting repeated forms. The method comprises: acquiring first feature data of a form to be detected; performing word segmentation on the descriptive content of the form to be detected and, according to the word segmentation results of the form to be detected and of the historical forms, screening out from the historical forms the similar historical forms whose similarity to the form to be detected is greater than a preset threshold; acquiring second feature data corresponding to each similar historical form and, for each similar historical form, using a pre-trained form detection model to determine from the first feature data and that form's second feature data whether the form to be detected duplicates it; and deleting the form to be detected when it duplicates any similar historical form. The invention relates to the technical field of big data. It can screen out duplicate forms, prevent developers from handling the same problem more than once, and improve problem-handling efficiency.

Description

Method and device for detecting repeated form
Technical Field
The invention relates to the technical field of big data, and in particular to a method and a device for detecting repeated forms.
Background
This section is intended to provide background or context for the embodiments of the invention recited in the claims. The description herein is not admitted to be prior art by its inclusion in this section.
Before an application or function is put into use, it must be tested. A tester records each problem found during testing in a form and submits the form to a developer, who then examines and fixes the problems it describes. Different testers may file forms for the same problem, which produces duplicate forms. Developers find such duplicates hard to pick out, so the same problem may be handled more than once, which lowers problem-handling efficiency.
Disclosure of Invention
An embodiment of the invention provides a method for detecting repeated forms. It addresses the prior-art problem that, when processing forms, developers find it difficult to screen out duplicate forms and may handle the same problem repeatedly, which lowers problem-handling efficiency. The method comprises:
Acquiring first feature data of a form to be detected, where the first feature data are the data corresponding to each reference feature of the form to be detected, and the form to be detected contains descriptive content that describes the form;
Performing word segmentation on the descriptive content of the form to be detected and, according to the word segmentation results of the form to be detected and of the historical forms, screening out from the historical forms the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, where historical forms are previously obtained forms to be detected that were found not to be duplicates;
Acquiring second feature data corresponding to each similar historical form and, for each similar historical form, using a pre-trained form detection model to determine, from the first feature data and that form's second feature data, whether the form to be detected duplicates it, where the form detection model is obtained by machine learning on the reference features and judges whether two forms are duplicates, and the first feature data and the second feature data correspond to the same reference features;
and deleting the form to be detected when it duplicates any similar historical form.
An embodiment of the invention further provides a device for detecting repeated forms, which addresses the same prior-art problem of duplicate forms being hard to screen out and the same problem being handled repeatedly. The device comprises:
An acquisition module, configured to acquire first feature data of the form to be detected, where the first feature data are the data corresponding to each reference feature of the form to be detected, and the form to be detected contains descriptive content that describes the form;
A first processing module, configured to perform word segmentation on the descriptive content of the form to be detected and, according to the word segmentation results of the form to be detected and of the historical forms, screen out from the historical forms the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, where historical forms are previously obtained forms to be detected that were found not to be duplicates;
A second processing module, configured to acquire second feature data corresponding to each similar historical form and, for each similar historical form, use a pre-trained form detection model to determine, from the first feature data and that form's second feature data, whether the form to be detected duplicates it, where the form detection model is obtained by machine learning on the reference features and judges whether two forms are duplicates, and the first feature data and the second feature data correspond to the same reference features;
and a deletion module, configured to delete the form to be detected when it duplicates any similar historical form.
An embodiment of the invention further provides a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the above method for detecting repeated forms when executing the computer program.
An embodiment of the invention further provides a computer-readable storage medium storing a computer program for executing the above method for detecting repeated forms.
In the embodiments of the invention, first feature data of a form to be detected are acquired; these are the data corresponding to each reference feature of the form, which contains descriptive content describing it. The descriptive content is segmented, and similar historical forms whose similarity to the form to be detected is greater than a preset threshold are screened out from the historical forms according to the word segmentation results of both; historical forms are previously obtained forms to be detected that were found not to be duplicates. This filters out historical forms with low similarity and reduces the load on the subsequent form detection model that decides whether the form to be detected duplicates a historical form. Second feature data are then acquired for each similar historical form, and for each one the pre-trained form detection model determines, from the first feature data and that form's second feature data, whether the two forms are duplicates; the model is obtained by machine learning on the reference features of forms, and the first and second feature data correspond to the same reference features. The form to be detected is deleted when it duplicates any similar historical form. The form detection model can thus accurately judge whether a historical form highly similar to the form to be detected is a duplicate of it, so that duplicate forms are screened out, developers avoid handling the same problem repeatedly, and problem-handling efficiency is improved.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a flowchart of a method for detecting a repeated form according to an embodiment of the present invention;
FIG. 2 is a flowchart, according to an embodiment of the present invention, of a method for performing word segmentation on the descriptive content of a form to be detected and selecting from the historical forms the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, according to the word segmentation results of the form to be detected and of the historical forms;
FIG. 3 is a flowchart, according to an embodiment of the present invention, of another method for performing word segmentation on the descriptive content of a form to be detected and selecting from the historical forms the similar historical forms whose similarity to the form to be detected is greater than a preset threshold, according to the word segmentation results of the form to be detected and of the historical forms;
FIG. 4 is a flowchart of a method for training a form detection model according to an embodiment of the present invention;
FIG. 5 is an exemplary diagram of a device for detecting repeated forms according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the embodiments are described in further detail below with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions are intended to explain the invention and are not to be construed as limiting it.
The term "and/or" herein merely describes an association between objects and indicates that three relationships may exist; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. In addition, the term "at least one" herein means any one of a plurality of items or any combination of at least two of them; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
In this specification, the terms "comprising," "including," "having," "containing," and the like are open-ended terms meaning "including but not limited to." References to "one embodiment," "a particular embodiment," "some embodiments," "for example," and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such schematic expressions do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The order of steps in the embodiments illustrates one way of practicing the application; it is not limiting and may be adjusted as needed.
Research shows that before an application or function is put into use, it must be tested: a tester records each problem found in testing in a form and submits the form to a developer, who examines and fixes the problems it describes. Different testers may file forms for the same problem, producing duplicate forms. Developers check the problems form by form; when the number of forms is large and duplicates exist, a developer cannot reliably tell which forms are duplicates, and because different developers handle different forms, none of them can tell whether a duplicate has already been handled. The same problem may therefore be examined and fixed more than once, which lowers problem-handling efficiency.
To address this, an embodiment of the present invention provides a method for detecting repeated forms which, as shown in FIG. 1, comprises:
S101: Acquire first feature data of a form to be detected. The first feature data are the data corresponding to each reference feature of the form to be detected, and the form to be detected contains descriptive content that describes the form.
S102: Perform word segmentation on the descriptive content of the form to be detected and, according to the word segmentation results of the form to be detected and of the historical forms, screen out from the historical forms the similar historical forms whose similarity to the form to be detected is greater than a preset threshold. Historical forms are previously obtained forms to be detected that were found not to be duplicates.
S103: Acquire second feature data corresponding to each similar historical form and, for each similar historical form, use a pre-trained form detection model to determine, from the first feature data and that form's second feature data, whether the form to be detected duplicates it. The form detection model is obtained by machine learning on the reference features and judges whether two forms are duplicates; the first feature data and the second feature data correspond to the same reference features.
S104: Delete the form to be detected when it duplicates any similar historical form.
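The four steps S101 to S104 can be sketched as a single function. This is a minimal illustration under assumptions, not the patented implementation: the segmenter, the similarity measure, and the pairwise form detection model are passed in as plain callables, and the function name `detect_duplicate` and the dictionary keys `description`, `tokens`, and `features` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces; the text does not prescribe any of them.
Features = Dict[str, float]
Segmenter = Callable[[str], List[str]]
Similarity = Callable[[List[str], List[str]], float]
PairModel = Callable[[Features, Features], bool]


def detect_duplicate(form: dict, history: List[dict], segment: Segmenter,
                     similarity: Similarity, is_duplicate: PairModel,
                     threshold: float) -> bool:
    """Return True when `form` duplicates a historical form and should be deleted."""
    first_features = form["features"]                      # S101: first feature data
    first_words = segment(form["description"])             # S102: segment the description
    similar = [h for h in history                          # S102: keep only similar forms
               if similarity(first_words, h["tokens"]) > threshold]
    for h in similar:                                      # S103: model check per pair
        if is_duplicate(first_features, h["features"]):
            return True                                    # S104: caller deletes the form
    return False
```

A form for which the function returns False would then be appended to `history` together with its tokens, since historical forms are the previously checked forms that were not duplicates. The cheap similarity filter runs before the model so that the model only sees a short list of candidates.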
Some terms used in the embodiments of the present invention are explained below:
The first feature data and the second feature data correspond to the same reference features: the first feature data comprise the data for each reference feature of the form to be detected, and the second feature data comprise, for example, the data for each reference feature of a similar historical form.
Historical forms in the embodiments of the invention are previously obtained forms to be detected that were found not to be duplicates; a similar historical form is, for example, a historical form determined to be highly similar to the form to be detected.
S101 to S104 are described in detail below.
For S101: the form to be detected contains descriptive content that summarizes its main content. Forms to be detected include, for example, bank business forms and forms submitted by testers in a test scenario that describe test problems.
Taking as an example a form submitted by a tester in a test scenario that contains a test problem, the first feature data include at least one of the following: the priority of the form to be detected, the severity of the test problem in it, the number of test cases affected by that test problem, and the service identifier corresponding to the form to be detected. Correspondingly, the second feature data include, for example, at least one of the following: the priority of the similar historical form, the severity of its test problem, the number of test cases affected by that problem, and the service identifier corresponding to the similar historical form.
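The four test-scenario features can be held in a small record. The sketch below is illustrative only: the class `FormFeatures`, its field names, and the pair encoding in `pair_vector` are assumptions, since the text lists the features but not a schema or how a pair of forms is fed to the model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FormFeatures:
    """Feature data of one form; field names are illustrative only."""
    priority: int          # priority of the form
    severity: int          # severity of the test problem
    affected_cases: int    # number of test cases affected by the problem
    service_id: str        # service identifier of the form


def pair_vector(first: FormFeatures, second: FormFeatures) -> List[float]:
    """Encode a (form to be detected, similar historical form) pair as numbers.

    Assumed encoding: absolute differences of the numeric features plus a
    0/1 flag for equal service identifiers.
    """
    return [
        float(abs(first.priority - second.priority)),
        float(abs(first.severity - second.severity)),
        float(abs(first.affected_cases - second.affected_cases)),
        1.0 if first.service_id == second.service_id else 0.0,
    ]
```

Differences make the encoding symmetric in the two forms; simply concatenating both feature vectors is an equally plausible reading of the text.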
For S102, the word segmentation and the selection of similar historical forms can be performed, for example, by the method shown in FIG. 2, a flowchart of performing word segmentation on the descriptive content of the form to be detected and selecting from the historical forms those whose similarity to the form to be detected is greater than a preset threshold, according to the word segmentation results of the form to be detected and of the historical forms. The method comprises:
S201: Segment the descriptive content of the form to be detected with a word segmentation tool to obtain the first words of the form to be detected.
Specifically, for example but without limitation, one or more segmentation passes can be performed on the descriptive content of the form to be detected using a unigram language model, a bigram language model, or the like, yielding at least one first word for the form to be detected.
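A minimal sketch of the unigram/bigram idea, assuming plain character n-gram splitting as a stand-in: it enumerates the character unigrams and bigrams of the text but does not score them with a language model, as a real unigram or bigram language-model segmenter would. The names `char_ngrams` and `first_words` are hypothetical.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Split text into overlapping character n-grams (n=1 unigrams, n=2 bigrams)."""
    chars = [c for c in text if not c.isspace()]   # ignore whitespace
    if n <= 0 or len(chars) < n:
        return []
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def first_words(text: str) -> List[str]:
    # "At least one" segmentation pass: here, a unigram pass plus a bigram pass.
    return char_ngrams(text, 1) + char_ngrams(text, 2)
```

For Chinese descriptions such as "登录失败", the bigram pass yields "登录", "录失", and "失败"; overlapping bigrams avoid the need for a dictionary at the cost of some meaningless tokens.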
S202: Obtain the second words corresponding to each historical form.
When each historical form was itself received and checked for duplicates, its descriptive content was segmented in the same way as that of the form to be detected, yielding at least one second word per historical form. These second words, obtained during that earlier check, can therefore be fetched directly.
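Reusing the second words amounts to storing each form's segmentation once, at the moment it is accepted as a historical form. The class `HistoryStore` below is a hypothetical helper sketching that idea with an in-memory dictionary; a real system would presumably persist the tokens alongside the form.

```python
from typing import Callable, Dict, List


class HistoryStore:
    """Keeps the segmentation of each accepted (non-duplicate) form for reuse in S202."""

    def __init__(self, segment: Callable[[str], List[str]]):
        self._segment = segment
        self._tokens: Dict[str, List[str]] = {}

    def add(self, form_id: str, description: str) -> None:
        # Called once, when a form passes detection and becomes a historical form.
        self._tokens[form_id] = self._segment(description)

    def second_words(self) -> Dict[str, List[str]]:
        # S202: return the stored segmentations without re-segmenting anything.
        return dict(self._tokens)
```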
S203: For each historical form, compare the first words of the form to be detected with the second words of the historical form and count the second words that match a first word. When the ratio of this count to the total number of second words in the historical form is greater than a preset threshold, the historical form is determined to be a similar historical form.
For example, with a preset threshold of 75%, a historical form is a similar historical form if more than 75% of its second words match first words of the form to be detected.
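S203 can be sketched as a ratio test. In this minimal version, assumed for illustration, every occurrence of a second word counts toward the numerator, and the comparison with the threshold is strict, as in the text; `overlap_ratio` and `similar_history` are hypothetical names.

```python
from typing import Dict, List


def overlap_ratio(first_words: List[str], second_words: List[str]) -> float:
    """Share of the historical form's second words that also occur among the first words."""
    if not second_words:
        return 0.0
    first = set(first_words)
    matched = sum(1 for w in second_words if w in first)   # counts each occurrence
    return matched / len(second_words)


def similar_history(first_words: List[str], history: Dict[str, List[str]],
                    threshold: float = 0.75) -> List[str]:
    """Ids of historical forms whose ratio is strictly greater than the threshold."""
    return [fid for fid, words in history.items()
            if overlap_ratio(first_words, words) > threshold]
```

With the 75% threshold, a historical form matching exactly 3 of its 4 second words (ratio 0.75) is not selected, because the test is "greater than".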
In another embodiment of the present invention, the word segmentation and the selection of similar historical forms can also be performed by the method shown in FIG. 3, a flowchart of another method, provided by an embodiment of the present invention, for performing word segmentation on the descriptive content of the form to be detected and selecting from the historical forms those whose similarity to the form to be detected is greater than a preset threshold. The method comprises:
S301: Perform word segmentation on the descriptive content of the form to be detected and of each historical form, and obtain the similarity between the form to be detected and each historical form from the respective word segmentation results.
Specifically, for example but without limitation, any of the following methods (1) to (3) can be used to segment the descriptive contents of the form to be detected and of the historical forms:
(1) Segment the descriptive contents with a pre-trained word segmentation model to obtain the similarity between the form to be detected and each historical form.
Word segmentation models include, for example, hidden Markov models (HMM), structured perceptrons (SP), and conditional random fields (CRF).
(2) Segment the descriptive contents with a word segmentation method based on string matching to obtain the similarity between the form to be detected and each historical form.
String-matching-based word segmentation methods include forward maximum matching, reverse maximum matching, and bidirectional maximum matching.
Taking forward maximum matching as an example, for each historical form: take a preset number of characters, from left to right, from the descriptive content of the form to be detected (the first characters). Then take the same number of characters from the descriptive content of the historical form, starting at its first character (the second characters), and compare them with the first characters. If they do not match, take new second characters starting one character further to the right in the historical form and compare again, until the second characters match the first characters or no further second characters can be taken from the historical form. Next, take new first characters starting at the character after the last of the previously taken first characters, and compare them against the second characters of the historical form in the same way. Stop when no new first characters can be taken from the form to be detected. The similarity between the form to be detected and the historical form is then calculated from the number of matches between first and second characters and the number of comparisons.
(3) Segment the descriptive contents with an understanding-based word segmentation method to obtain the similarity between the form to be detected and each historical form.
Understanding-based word segmentation performs syntactic and semantic analysis during segmentation and uses the syntactic and semantic information to resolve ambiguities.
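The window-matching procedure given for method (2) can be sketched as follows. It follows the steps as written, which amount to fixed-width window matching rather than dictionary-based maximum matching (the forward, reverse, and bidirectional dictionary variants would additionally need a vocabulary). Two details are assumptions: `width` stands for the "preset number" of characters, and the similarity is taken as matched windows divided by windows taken from the form to be detected, since the text does not give the exact formula. `window_match_similarity` is a hypothetical name.

```python
def window_match_similarity(candidate: str, historical: str, width: int = 2) -> float:
    """Similarity from fixed-width window matching (assumed formula: matches / windows)."""
    if width <= 0:
        raise ValueError("width must be positive")
    windows = matches = 0
    # Take successive, non-overlapping windows ("first characters") from the left.
    for start in range(0, len(candidate) - width + 1, width):
        first = candidate[start:start + width]
        windows += 1
        # Slide one character at a time over the historical text ("second
        # characters") until a window matches or no full window remains.
        for pos in range(len(historical) - width + 1):
            if historical[pos:pos + width] == first:
                matches += 1
                break
    return matches / windows if windows else 0.0
```

For "abcdef" against "xxabyyef" with width 2, the windows "ab" and "ef" are found and "cd" is not, giving a similarity of 2/3.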
S302: Determine the historical forms whose similarity to the form to be detected is greater than a preset threshold as similar historical forms.
This filters out historical forms with low similarity, reduces the load on the subsequent form detection model that decides whether the form to be detected duplicates a historical form, and improves the efficiency of duplicate-form detection.
For S103, the form detection model is obtained by machine learning on the reference features and judges whether two forms are duplicates. Taking as an example a form to be detected that is submitted by a tester in a test scenario and contains a test problem, FIG. 4 shows a flowchart of a method for training the form detection model according to an embodiment of the present invention, comprising:
S401: Acquire the forms previously submitted by testers and extract a plurality of reference features from them.
The reference features include, for example, at least one of the following: the priority of the form, the severity of the test problem in the form, the test cases affected by the test problem in the form, and the service identifier corresponding to the form.
S402: Extract the feature data corresponding to each reference feature from each form previously submitted by testers.
S403: Obtain training samples from the feature data of the forms previously submitted by testers.
Specifically, for example, each pair of duplicate forms among the forms submitted by testers is taken as a positive sample, and the feature data of the two forms in the pair are used as its positive sample data; each pair of non-duplicate forms is taken as a negative sample, and the feature data of its two forms are used as its negative sample data. The training samples consist of the positive samples with their positive sample data and the negative samples with their negative sample data.
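Building the pairs in S403 can be sketched as below. The sketch assumes a hypothetical `duplicate_group` mapping that assigns each historical form the id of its duplicate cluster (forms sharing an id are duplicates), and it concatenates the two forms' feature data as the sample data; `build_pairs` is likewise a hypothetical name.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_pairs(features: Dict[str, List[float]],
                duplicate_group: Dict[str, str]) -> List[Tuple[List[float], int]]:
    """Label every pair of historical forms: 1 = duplicate (positive), 0 = not (negative)."""
    samples = []
    for a, b in combinations(sorted(features), 2):
        label = 1 if duplicate_group[a] == duplicate_group[b] else 0
        samples.append((features[a] + features[b], label))   # concatenated feature data
    return samples
```

All-pairs generation grows quadratically, and negatives usually vastly outnumber positives, so in practice the negative pairs would likely need to be subsampled.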
S404: Construct a plurality of decision trees from the plurality of reference features.
Specifically, for example, for each decision tree, at least one reference feature is selected from the plurality of reference features, and a node is generated for each selected reference feature to form the decision tree.
S405: Train the decision trees with the training samples and select from among them the target decision trees whose detection results meet expectations.
Specifically, for example, supervised training is performed with the training samples: a decision tree whose detection results meet expectations outputs "duplicate" when given the positive sample data of a positive sample and "not duplicate" when given the negative sample data of a negative sample. Each decision tree is trained in this supervised way, and the target decision trees whose detection results meet expectations are selected from all the decision trees.
S406: Prune the target decision trees to a preset recursion depth to obtain the form detection model.
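S404 to S406 can be sketched with a small pure-Python ensemble. The choices here are substitutions, since the text names no tree algorithm: CART-style binary splits with Gini impurity, random feature subsets per tree, "meets expectations" read as training accuracy of at least `min_accuracy` (a held-out set would be the more usual check), and pruning as cutting each kept tree at `max_depth` and replacing deeper subtrees with the majority label of the cut node. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, Union

Sample = Tuple[List[float], int]   # (sample data, 1 = duplicate, 0 = not duplicate)
Tree = Union[int, tuple]           # leaf label, or (feature, threshold, left, right, majority)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a set of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def build_tree(samples: List[Sample], features: List[int]) -> Tree:
    """Grow a CART-style binary tree using only the given feature indices."""
    labels = [y for _, y in samples]
    majority = 1 if 2 * sum(labels) > len(labels) else 0   # ties -> "not duplicate"
    if len(set(labels)) <= 1:
        return majority
    best = None
    for f in features:
        for t in sorted({x[f] for x, _ in samples}):
            left = [s for s in samples if s[0][f] <= t]
            right = [s for s in samples if s[0][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([y for _, y in left])
                     + len(right) * gini([y for _, y in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:                       # no split separates these samples
        return majority
    _, f, t, left, right = best
    return (f, t, build_tree(left, features), build_tree(right, features), majority)


def predict(tree: Tree, x: List[float]) -> int:
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, tuple):
        f, t, left, right, _ = tree
        tree = left if x[f] <= t else right
    return tree


def prune(tree: Tree, max_depth: int, depth: int = 0) -> Tree:
    """S406: cut the tree at a preset depth, keeping each cut node's majority label."""
    if not isinstance(tree, tuple):
        return tree
    f, t, left, right, majority = tree
    if depth >= max_depth:
        return majority
    return (f, t, prune(left, max_depth, depth + 1),
            prune(right, max_depth, depth + 1), majority)


def train_forest(samples: List[Sample], n_features: int, n_trees: int = 10,
                 subset_size: int = 2, min_accuracy: float = 0.9,
                 max_depth: int = 3, seed: int = 0) -> List[Tree]:
    """S404-S406 sketch: random feature subsets, training, selection, pruning."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        feats = rng.sample(range(n_features), min(subset_size, n_features))  # S404
        tree = build_tree(samples, feats)                                    # S405: train
        accuracy = sum(predict(tree, x) == y for x, y in samples) / len(samples)
        if accuracy >= min_accuracy:                                         # S405: select
            forest.append(prune(tree, max_depth))                            # S406: prune
    return forest
```

The result closely resembles a random forest with a tree-selection step; a library such as scikit-learn would normally replace the hand-written tree code.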
In step S103, when determining whether the form to be detected and the similar history form are repeated according to the first feature data and the second feature data corresponding to the similar history form by using the form detection model trained in advance, for example: determining whether the form to be detected and the similar history form are repeated or not by utilizing each decision tree in the form detection model according to the first characteristic data and the second characteristic data corresponding to the similar history form; and determining whether the to-be-detected form and the similar historical form are repeated or not according to the number of the first decision trees for determining the to-be-detected form and the similar historical form to be repeated in the form detection model and the number of the second decision trees for determining the to-be-detected form and the similar historical form to be not repeated in the form detection model.
Specifically, when determining whether the form to be detected and the similar history form are repeated according to the number of first decision trees and the number of second decision trees in the form detection model, either of the following methods A and B may be adopted, for example but without limitation:
A: When the number of first decision trees is greater than the number of second decision trees, the form to be detected is determined to be repeated with the similar history form; when the number of first decision trees is less than or equal to the number of second decision trees, the form to be detected is determined not to be repeated with the similar history form.
B: A weight parameter is set for each decision tree according to its recursion depth in the form detection model. For each recursion depth, the number of first decision trees at that depth is multiplied by the corresponding weight parameter to obtain a first calculation result for that depth, and the first calculation results of all recursion depths are added to obtain a predicted value that the form to be detected is repeated with the similar history form. Likewise, for each recursion depth, the number of second decision trees at that depth is multiplied by the corresponding weight parameter to obtain a second calculation result for that depth, and the second calculation results of all recursion depths are added to obtain a predicted value that the form to be detected is not repeated with the similar history form. When the repetition predicted value is greater than the non-repetition predicted value, the form to be detected is repeated with the similar history form; when the repetition predicted value is not greater than the non-repetition predicted value, the form to be detected is not repeated with the similar history form.
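Methods A and B above can be sketched as two small voting functions. This assumes each tree reports its vote together with its own recursion depth; the source does not say how weight parameters are derived from depth, so the `weights` mapping and the values in the usage lines are hypothetical. In both functions a tie counts as "not repeated", matching method A.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One entry per decision tree: (tree judged the pair repeated?, that tree's recursion depth).
Vote = Tuple[bool, int]


def majority_vote(votes: List[Vote]) -> bool:
    """Method A: repeated only if 'repeated' trees strictly outnumber 'not repeated' trees."""
    first = sum(1 for repeated, _ in votes if repeated)
    second = len(votes) - first
    return first > second


def depth_weighted_vote(votes: List[Vote], weights: Dict[int, float]) -> bool:
    """Method B: per-depth tree counts are multiplied by that depth's weight and summed."""
    first_by_depth = Counter(depth for repeated, depth in votes if repeated)
    second_by_depth = Counter(depth for repeated, depth in votes if not repeated)
    repeated_score = sum(n * weights[d] for d, n in first_by_depth.items())
    not_repeated_score = sum(n * weights[d] for d, n in second_by_depth.items())
    # Ties (and lower scores) count as "not repeated".
    return repeated_score > not_repeated_score


votes = [(True, 2), (True, 2), (False, 3), (False, 3), (False, 3)]
print(majority_vote(votes))                          # False: 2 trees vs 3 trees
print(depth_weighted_vote(votes, {2: 2.0, 3: 1.0}))  # True: 2 * 2.0 = 4.0 > 3 * 1.0 = 3.0
```

The example shows why method B exists: the same set of votes can flip once trees of different depths carry different weights.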
For S104, when the form to be detected is repeated with any similar history form, the form to be detected is a repeated form, and it is deleted so that no duplicate is kept. Taking as an example a form to be detected that contains a test problem submitted by a tester in a test scenario, deleting it keeps the repeated form from reaching developers, so developers do not process the same test problem repeatedly and problem-handling efficiency is improved.
In addition, in another embodiment of the present invention, when the form to be detected is not repeated with any of the similar history forms, the test problem contained in the form to be detected is reviewed and corrected, and the form to be detected is stored in the database that stores the history forms.
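How the similarity screening, S103, S104 and the storage embodiment fit together can be sketched as one control-flow function. This is a sketch under assumptions: `is_similar` stands for the word-segmentation screen, `is_repeated` stands for the form detection model, an in-memory list stands in for the history-form database, and the manual review-and-correct step is not modelled. All names are hypothetical.

```python
from typing import Callable, List, TypeVar

F = TypeVar("F")


def handle_form(form: F, history: List[F],
                is_similar: Callable[[F, F], bool],
                is_repeated: Callable[[F, F], bool]) -> bool:
    """Return True if the form is kept and stored, False if it is discarded as a duplicate."""
    # Screening: only similar history forms are passed on to the detection model.
    similar = [h for h in history if is_similar(form, h)]
    # S103/S104: a duplicate of any similar history form is deleted (never stored).
    if any(is_repeated(form, h) for h in similar):
        return False
    # Other embodiment: after the test problem is reviewed and corrected (a manual
    # step, not modelled here), the form joins the history-form store.
    history.append(form)
    return True
```

Because only forms that survive this check are appended, the history store keeps holding non-repeated forms, which matches the definition of a history form.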
The embodiments of the present invention further provide a device for detecting repeated forms, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the repeated form detection method, its implementation can refer to the implementation of the method, and repeated details are not described again.
Fig. 5 is an exemplary diagram of a repeated form detection apparatus according to an embodiment of the present invention. The apparatus includes an acquisition module 501, a first processing module 502, a second processing module 503 and a deletion module 504, where:
The acquisition module 501 is configured to acquire first feature data of a form to be detected; the first feature data are the data corresponding to each reference feature of the form to be detected, and the form to be detected contains description content describing the form;
The first processing module 502 is configured to perform word segmentation on the description content of the form to be detected and, according to the word segmentation results of the form to be detected and of the history forms, screen out from the history forms the similar history forms whose form similarity is greater than a preset threshold; a history form is a form, among the forms to be detected acquired in the past, that was determined not to be a duplicate;
The second processing module 503 is configured to acquire second feature data corresponding to each similar history form and, for each similar history form, determine with a pre-trained form detection model, according to the first feature data and the second feature data corresponding to that similar history form, whether the form to be detected and the similar history form are repeated; the form detection model is obtained through machine learning on the reference features and judges whether two forms are repeated, and the first feature data and the second feature data correspond to the same reference features;
The deletion module 504 is configured to delete the form to be detected when the form to be detected is repeated with any similar history form.
In one possible implementation, the first feature data include at least one of: priority data of the form to be detected, severity data of the test problem in the form to be detected, the number of test cases affected by that test problem, and a service identifier corresponding to the form to be detected. The second feature data include at least one of: priority data of the similar history form, severity data of the test problem in the similar history form, the number of test cases affected by that test problem, and a service identifier corresponding to the similar history form.
In a possible implementation, the first processing module is specifically configured to: perform word segmentation on the description content of the form to be detected with a word segmentation tool to obtain the first segmented words of the form to be detected; acquire the second segmented words of each history form; and, for each history form, compare the first segmented words with the second segmented words of that history form, count the second segmented words that are consistent with the first segmented words, and determine that the history form is a similar history form when the ratio of that count to the total number of second segmented words in the history form is greater than a preset threshold.
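The overlap-ratio screen above can be sketched as follows. The ratio is normalized by the history form's word count, as the text specifies, so it is not symmetric. The `segment` function is a hypothetical stand-in for the word segmentation tool (lowercase plus whitespace split); a deployment handling Chinese descriptions would substitute a real segmenter, for example the open-source `jieba` package's `jieba.lcut`. Function names and the threshold value are illustrative assumptions.

```python
from typing import Dict, List


def segment(text: str) -> List[str]:
    # Hypothetical stand-in for the word segmentation tool: lowercase + whitespace split.
    return text.lower().split()


def overlap_ratio(first_words: List[str], second_words: List[str]) -> float:
    """Share of the history form's segmented words that also occur in the form to be detected."""
    if not second_words:
        return 0.0
    first_set = set(first_words)
    matched = sum(1 for word in second_words if word in first_set)
    return matched / len(second_words)


def screen_similar(description: str, history: Dict[str, str], threshold: float) -> List[str]:
    """Return IDs of history forms whose overlap ratio is strictly greater than threshold."""
    first_words = segment(description)
    return [form_id for form_id, text in history.items()
            if overlap_ratio(first_words, segment(text)) > threshold]


history = {"H1": "Login page returns error 502", "H2": "Report export is slow"}
print(screen_similar("login page returns error 500", history, 0.6))  # ['H1']: 4/5 = 0.8 > 0.6
```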
In one possible implementation, the apparatus further includes a third processing module configured to: acquire the forms historically submitted by testers and extract a plurality of reference features from them; extract, from each historically submitted form, the feature data corresponding to each reference feature; obtain training samples from the feature data of the historically submitted forms; construct a plurality of decision trees according to the reference features; train the decision trees with the training samples and select from them a target decision tree whose detection result meets expectations; and prune the target decision tree according to a preset recursion depth to obtain the form detection model.
In one possible implementation, the third processing module is specifically configured to take each pair of repeated forms among the forms submitted by the testers as a positive sample, and take the feature data of the two repeated forms in the positive sample as the positive sample data of that positive sample;
take each pair of non-repeated forms among the forms submitted by the testers as a negative sample, and take the feature data of the two non-repeated forms in the negative sample as the negative sample data of that negative sample;
and obtain the training samples from the positive samples with their positive sample data and the negative samples with their negative sample data.
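The pairing of historical forms into positive and negative samples can be sketched as below, using the reference features listed earlier (priority, severity, number of affected test cases, service identifier). Two things are assumptions rather than details from the text: a hypothetical `duplicate_group` field records which historical forms were triaged as duplicates of each other, and each pair is encoded as one vector of per-feature absolute differences plus a same-service flag. The caller may pass all forms or one tester's forms.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass
class HistoricalForm:
    priority: int
    severity: int
    affected_cases: int      # number of test cases affected by the test problem
    service_id: str
    duplicate_group: int     # hypothetical label: same value = triaged as duplicates


def pair_features(a: HistoricalForm, b: HistoricalForm) -> List[float]:
    # Assumed encoding of one form pair as a single feature vector.
    return [float(abs(a.priority - b.priority)),
            float(abs(a.severity - b.severity)),
            float(abs(a.affected_cases - b.affected_cases)),
            1.0 if a.service_id == b.service_id else 0.0]


def build_samples(forms: List[HistoricalForm]) -> Tuple[List[List[float]], List[bool]]:
    """Every pair of forms becomes one sample: True = positive (repeated), False = negative."""
    X: List[List[float]] = []
    y: List[bool] = []
    for a, b in combinations(forms, 2):
        X.append(pair_features(a, b))
        y.append(a.duplicate_group == b.duplicate_group)
    return X, y
```

Encoding a pair as differences keeps the vector independent of which form is "first" and which is "second"; concatenating the two forms' raw feature data would be an equally plausible reading of the text.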
In a possible implementation, the second processing module is specifically configured to: for each similar history form, use each decision tree in the form detection model to determine, according to the first feature data and the second feature data corresponding to the similar history form, whether the form to be detected and the similar history form are repeated; and then determine whether they are repeated according to the number of first decision trees (those judging the two forms to be repeated) and the number of second decision trees (those judging the two forms not to be repeated) in the form detection model.
In a possible implementation, the second processing module is specifically configured to determine that the form to be detected is repeated with the similar history form when the number of first decision trees is greater than the number of second decision trees, and to determine that the form to be detected is not repeated with the similar history form when the number of first decision trees is less than or equal to the number of second decision trees.
In a possible implementation, the third processing module is further configured to set a weight parameter for each decision tree according to its recursion depth in the form detection model. The second processing module is specifically configured to: multiply the number of first decision trees at each recursion depth by the corresponding weight parameter to obtain a first calculation result for that depth, and add the first calculation results of all recursion depths to obtain a predicted value that the form to be detected is repeated with the similar history form; multiply the number of second decision trees at each recursion depth by the corresponding weight parameter to obtain a second calculation result for that depth, and add the second calculation results of all recursion depths to obtain a predicted value that the form to be detected is not repeated with the similar history form; determine that the two forms are repeated when the repetition predicted value is greater than the non-repetition predicted value; and determine that they are not repeated when the repetition predicted value is not greater than the non-repetition predicted value.
In one possible implementation, the apparatus further includes a fourth processing module configured to, when the form to be detected is not repeated with any of the similar history forms, review and correct the test problem contained in the form to be detected and store the form to be detected in the database that stores the history forms.
The embodiments of the present invention further provide a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above method for detecting repeated forms when executing the computer program.
The embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program for executing the method for detecting the repeated forms.
In the embodiments of the present invention, first feature data of a form to be detected are acquired; the first feature data are the data corresponding to each reference feature of the form to be detected, and the form to be detected contains description content describing the form. Word segmentation is performed on the description content, and similar history forms whose form similarity is greater than a preset threshold are screened out from the history forms according to the word segmentation results of the form to be detected and of the history forms, where a history form is a previously acquired form to be detected that was determined not to be a duplicate. This filters out history forms with low similarity and reduces the load on the subsequent form detection model that determines whether the form to be detected and a history form are repeated. Second feature data corresponding to each similar history form are then acquired, and for each similar history form a pre-trained form detection model determines, according to the first feature data and the second feature data of that similar history form, whether the two forms are repeated; the form detection model is obtained through machine learning on the reference features of the forms and judges whether two forms are repeated, and the first feature data and the second feature data correspond to the same reference features. When the form to be detected is repeated with any similar history form, it is deleted. The form detection model can therefore accurately judge whether a history form highly similar to the form to be detected duplicates it, so repeated forms are screened out, developers do not process the same problem repeatedly, and problem-handling efficiency is improved.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of protection of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. made within the spirit and principles of the invention are intended to be included within the scope of protection of the invention.

Claims (18)

1. A method for detecting a duplicate form, comprising:
Acquiring first characteristic data of a form to be detected; the first characteristic data are data corresponding to each reference characteristic of the form to be detected; the form to be detected contains descriptive contents for describing the form;
Performing word segmentation processing on the descriptive content of the form to be detected, and screening out, from the history forms, similar history forms whose form similarity is greater than a preset threshold according to the word segmentation result of the form to be detected and the word segmentation results of the history forms; a history form is a form, among the forms to be detected acquired historically, that is not repeated;
Acquiring second characteristic data corresponding to each similar history form, and, for each similar history form, determining with a pre-trained form detection model, according to the first characteristic data and the second characteristic data corresponding to the similar history form, whether the form to be detected and the similar history form are repeated; the form detection model is a model, obtained through machine learning according to each reference characteristic, that judges whether two forms are repeated; the first characteristic data and the second characteristic data correspond to the same reference characteristics;
Deleting the form to be detected when the form to be detected is repeated with any similar historical form;
wherein performing word segmentation processing on the description content of the form to be detected, and screening out from the history forms the similar history forms whose form similarity is greater than the preset threshold according to the word segmentation result of the form to be detected and the word segmentation results of the history forms, comprises:
Performing word segmentation processing on the description content of the form to be detected by using a word segmentation tool to obtain first segmented words corresponding to the form to be detected;
acquiring second segmented words corresponding to each history form;
for each history form:
Comparing the first segmented words of the form to be detected with the second segmented words of the history form, and determining the number of second segmented words in the history form that are consistent with the first segmented words; and when the ratio of that number to the total number of second segmented words in the history form is greater than a preset threshold, determining that the history form is a similar history form.
2. The method of claim 1, wherein the first characteristic data comprises at least one of:
Priority data of a form to be detected, severity data of a test problem in the form to be detected, the number of test cases affected by the test problem in the form to be detected, and a service identifier corresponding to the form to be detected;
the second characteristic data includes at least one of:
Priority data of similar history forms, severity data of test problems in the similar history forms, number of test cases affected by the test problems in the similar history forms, and service identifications corresponding to the similar history forms.
3. The detection method according to claim 1, further comprising:
acquiring forms historically submitted by testers, and extracting a plurality of reference features from the historically submitted forms;
extracting, from each form historically submitted by a tester, the feature data corresponding to each reference feature;
obtaining training samples according to the feature data corresponding to the historically submitted forms;
Constructing a plurality of decision trees according to a plurality of reference features;
Training a plurality of decision trees by using training samples, and selecting a target decision tree with a detection result meeting expectations from the plurality of decision trees;
performing a pruning operation on the target decision tree according to a preset recursion depth to obtain the form detection model.
4. The method of claim 3, wherein obtaining training samples according to the characteristic data corresponding to the forms historically submitted by the testers comprises:
taking each pair of repeated forms among the forms submitted by the testers as a positive sample, and taking the characteristic data of the two repeated forms in the positive sample as the positive sample data of the positive sample;
Taking each pair of non-repeated forms among the forms submitted by the testers as a negative sample, and taking the characteristic data of the two non-repeated forms in the negative sample as the negative sample data of the negative sample;
Obtaining the training samples from the positive samples with their positive sample data and the negative samples with their negative sample data.
5. The method according to claim 3 or 4, wherein for each similar history form, determining whether the form to be detected and the similar history form are repeated by using a pre-trained form detection model according to the first feature data and the second feature data corresponding to the similar history form comprises:
Determining whether the form to be detected and the similar history form are repeated or not by utilizing each decision tree in the form detection model according to the first characteristic data and the second characteristic data corresponding to the similar history form;
And determining whether the to-be-detected form and the similar historical form are repeated or not according to the number of the first decision trees for determining the to-be-detected form and the similar historical form to be repeated in the form detection model and the number of the second decision trees for determining the to-be-detected form and the similar historical form to be not repeated in the form detection model.
6. The method of claim 5, wherein determining whether the form to be detected and the similar history form are repeated based on the number of first decision trees in the form detection model that determine that the form to be detected and the similar history form are repeated and the number of second decision trees in the form detection model that determine that the form to be detected and the similar history form are not repeated comprises:
when the number of the first decision trees is greater than that of the second decision trees, determining that the form to be detected is repeated with the similar historical form;
and when the number of the first decision trees is smaller than or equal to that of the second decision trees, determining that the form to be detected is not repeated with the similar historical form.
7. The detection method according to claim 5, further comprising:
Setting weight parameters for each decision tree according to the recursion depth of each decision tree in the form detection model;
wherein determining whether the form to be detected and the similar history form are repeated according to the number of first decision trees in the form detection model that determine the two forms to be repeated and the number of second decision trees in the form detection model that determine the two forms not to be repeated comprises:
Multiplying the number of the first decision trees of each recursion depth by the weight parameter corresponding to the first decision tree of the recursion depth to obtain a first calculation result of the recursion depth; adding the first calculation results of the recursion depths to obtain a predicted value of the repetition of the form to be detected and the similar historical form;
Multiplying the number of the second decision trees of each recursion depth by the weight parameter corresponding to the second decision tree of the recursion depth to obtain a second calculation result of the recursion depth; adding the second calculation results of the recursion depths to obtain a predicted value of the form to be detected, which is not repeated with the similar historical form;
when the predicted value of the repetition of the to-be-detected form and the similar historical form is larger than the predicted value of the non-repetition of the to-be-detected form and the similar historical form, the to-be-detected form and the similar historical form are repeated;
when the predicted value of the repetition of the to-be-detected form and the similar historical form is not greater than the predicted value of the non-repetition, the to-be-detected form and the similar historical form are not repeated.
8. The detection method according to claim 1, further comprising:
when the form to be detected is not repeated with any of the similar history forms, checking and correcting the test problem contained in the form to be detected, and storing the form to be detected in the database storing the history forms.
9. A device for detecting a duplicate form, comprising:
The acquisition module is used for acquiring first characteristic data of the form to be detected; the first characteristic data are data corresponding to each reference characteristic of the form to be detected; the form to be detected contains descriptive contents for describing the form;
The first processing module is used for performing word segmentation processing on the descriptive content of the form to be detected, and screening out, from the history forms, similar history forms whose form similarity is greater than a preset threshold according to the word segmentation result of the form to be detected and the word segmentation results of the history forms; a history form is a form, among the forms to be detected acquired historically, that is not repeated;
The second processing module is used for acquiring second characteristic data corresponding to each similar history form, and, for each similar history form, determining with a pre-trained form detection model, according to the first characteristic data and the second characteristic data corresponding to the similar history form, whether the form to be detected and the similar history form are repeated; the form detection model is a model, obtained through machine learning according to each reference characteristic, that judges whether two forms are repeated; the first characteristic data and the second characteristic data correspond to the same reference characteristics;
the deleting module is used for deleting the form to be detected when the form to be detected is repeated with any similar historical form;
the first processing module is specifically used for performing word segmentation processing on the description content of the form to be detected by using a word segmentation tool to obtain first segmented words corresponding to the form to be detected;
acquiring second segmented words corresponding to each history form;
for each history form:
Comparing the first segmented words of the form to be detected with the second segmented words of the history form, and determining the number of second segmented words in the history form that are consistent with the first segmented words; and when the ratio of that number to the total number of second segmented words in the history form is greater than a preset threshold, determining that the history form is a similar history form.
10. The detection apparatus according to claim 9, wherein the first characteristic data includes at least one of:
Priority data of a form to be detected, severity data of a test problem in the form to be detected, the number of test cases affected by the test problem in the form to be detected, and a service identifier corresponding to the form to be detected;
the second characteristic data includes at least one of:
Priority data of similar history forms, severity data of test problems in the similar history forms, number of test cases affected by the test problems in the similar history forms, and service identifications corresponding to the similar history forms.
11. The detection apparatus according to claim 9, characterized by further comprising:
The third processing module is used for acquiring forms historically submitted by testers, and extracting a plurality of reference features from the historically submitted forms;
extracting, from each form historically submitted by a tester, the feature data corresponding to each reference feature;
obtaining training samples according to the feature data corresponding to the historically submitted forms;
Constructing a plurality of decision trees according to a plurality of reference features;
Training a plurality of decision trees by using training samples, and selecting a target decision tree with a detection result meeting expectations from the plurality of decision trees;
performing a pruning operation on the target decision tree according to a preset recursion depth to obtain the form detection model.
12. The apparatus according to claim 11, wherein the third processing module is specifically configured to take each pair of repeated forms among the forms submitted by the testers as a positive sample, and take the characteristic data of the two repeated forms in the positive sample as the positive sample data of the positive sample;
Taking each pair of non-repeated forms among the forms submitted by the testers as a negative sample, and taking the characteristic data of the two non-repeated forms in the negative sample as the negative sample data of the negative sample;
Obtaining the training samples from the positive samples with their positive sample data and the negative samples with their negative sample data.
13. The detection apparatus according to claim 11 or 12, wherein the second processing module is specifically configured to determine, for each similar history form, whether the form to be detected and the similar history form are repeated according to the first feature data and the second feature data corresponding to the similar history form, using each decision tree in the form detection model;
And determining whether the to-be-detected form and the similar historical form are repeated or not according to the number of the first decision trees for determining the to-be-detected form and the similar historical form to be repeated in the form detection model and the number of the second decision trees for determining the to-be-detected form and the similar historical form to be not repeated in the form detection model.
14. The apparatus according to claim 13, wherein the second processing module is specifically configured to determine that the form to be detected repeats the similar history form when the number of first decision trees is greater than the number of second decision trees;
and to determine that the form to be detected does not repeat the similar history form when the number of first decision trees is less than or equal to the number of second decision trees.
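The two claims above amount to a plain majority vote over the trees of the form detection model. The sketch below assumes each tree is exposed as a hypothetical callable that takes the two characteristic vectors and returns `True` for "repeated". A tie goes to "not repeated", as the claim specifies.

```python
from typing import Callable, Sequence

# Hypothetical tree interface: (first_vector, second_vector) -> True if "repeated".
Tree = Callable[[Sequence[float], Sequence[float]], bool]


def is_repeated(trees: Sequence[Tree], first: Sequence[float], second: Sequence[float]) -> bool:
    votes_repeated = sum(1 for tree in trees if tree(first, second))  # "first decision trees"
    votes_not_repeated = len(trees) - votes_repeated                  # "second decision trees"
    # Repeated only on a strict majority; a tie counts as "not repeated".
    return votes_repeated > votes_not_repeated
```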
15. The apparatus according to claim 13, wherein the third processing module is further configured to set a weight parameter for each decision tree according to the recursion depth of that decision tree in the form detection model;
the second processing module is specifically configured to: for each recursion depth, multiply the number of first decision trees at that depth by the weight parameter corresponding to the first decision trees at that depth to obtain a first calculation result for that depth, and add up the first calculation results of all recursion depths to obtain a predicted value that the form to be detected repeats the similar history form;
for each recursion depth, multiply the number of second decision trees at that depth by the weight parameter corresponding to the second decision trees at that depth to obtain a second calculation result for that depth, and add up the second calculation results of all recursion depths to obtain a predicted value that the form to be detected does not repeat the similar history form;
determine that the form to be detected repeats the similar history form when the predicted value of repetition is greater than the predicted value of non-repetition;
and determine that the form to be detected does not repeat the similar history form when the predicted value of repetition is less than or equal to the predicted value of non-repetition.
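The claim above replaces the head count with a vote weighted by recursion depth. It does not say how the weights are chosen, so the `weight_by_depth` mapping below is a hypothetical input, and each tree is reduced to a `(depth, says_repeated)` pair. The sketch groups votes by depth, multiplies each count by that depth's weight, sums the products, and compares the two totals. A tie again counts as "not repeated".

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def weighted_vote(votes: Iterable[Tuple[int, bool]], weight_by_depth: Dict[int, float]) -> bool:
    """votes: one (recursion_depth, says_repeated) pair per decision tree."""
    first: Counter = Counter()   # depth -> number of trees voting "repeated"
    second: Counter = Counter()  # depth -> number of trees voting "not repeated"
    for depth, says_repeated in votes:
        (first if says_repeated else second)[depth] += 1
    # Per-depth count x weight, summed over depths, for each side.
    score_repeated = sum(n * weight_by_depth[d] for d, n in first.items())
    score_not_repeated = sum(n * weight_by_depth[d] for d, n in second.items())
    return score_repeated > score_not_repeated
```

Take weights `{2: 1.0, 4: 0.4}`. A single depth-2 tree voting "repeated" then outweighs two depth-4 trees voting "not repeated" (1.0 against 0.8), although a plain majority vote would go the other way.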
16. The detection apparatus according to claim 9, further comprising:
a fourth processing module, configured to check and correct the test problems contained in the form to be detected when the form to be detected repeats none of the similar history forms, and to store the form to be detected in the database that stores the history forms.
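The claim above, together with the deletion step of the method, decides what happens to a form once it has been compared with every similar history form. The following is a minimal sketch using Python's built-in `sqlite3` module. The `history_forms` table, its columns and the `is_repeat` callback are hypothetical. The checking and correction of test problems is assumed to happen outside this function.

```python
import sqlite3
from typing import Callable, Iterable


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema for the database that stores history forms.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history_forms (form_id TEXT PRIMARY KEY, description TEXT)"
    )


def handle_form(
    conn: sqlite3.Connection,
    form_id: str,
    description: str,
    similar_ids: Iterable[str],
    is_repeat: Callable[[str], bool],
) -> bool:
    """Store the form unless it repeats a similar history form; return True if stored."""
    if any(is_repeat(history_id) for history_id in similar_ids):
        # Repeated with at least one similar history form: drop it, store nothing.
        return False
    # Not repeated with any similar history form: keep it as a new history form.
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO history_forms (form_id, description) VALUES (?, ?)",
            (form_id, description),
        )
    return True
```

A form with no similar history forms at all is stored, because `any()` over an empty iterable is `False`.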
17. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method for detecting repeated forms according to any of claims 1 to 8 when the computer program is executed by the processor.
18. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for executing the method for detecting repeated forms according to any one of claims 1 to 8.
CN202110779913.8A 2021-07-09 Method and device for detecting repeated form Active CN113448861B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110779913.8A CN113448861B (en) 2021-07-09 Method and device for detecting repeated form

Publications (2)

Publication Number Publication Date
CN113448861A (en) 2021-09-28
CN113448861B (en) 2024-06-21

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163409A (en) * 2020-09-23 2021-01-01 平安直通咨询有限公司上海分公司 (Ping An Zhitong Consulting Co., Ltd., Shanghai Branch) Similar document detection method, system, terminal device and computer readable storage medium

Non-Patent Citations (1)

Title
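一种基于核心词相似度的重复数据检测框架构建 (Construction of a duplicate-data detection framework based on core-word similarity); 吴善鹏 (Wu Shanpeng); 李萍 (Li Ping); 信息***工程; 2020-05-20, issue 05 *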

Similar Documents

Publication Publication Date Title
CN111124840B (en) Method and device for predicting alarm in business operation and maintenance and electronic equipment
CN111338692B (en) Vulnerability classification method and device based on vulnerability codes and electronic equipment
CN114936158B (en) Software defect positioning method based on graph convolution neural network
CN111177655B (en) Data processing method and device and electronic equipment
CN110188196B (en) Random forest based text increment dimension reduction method
US11385988B2 (en) System and method to improve results of a static code analysis based on the probability of a true error
CN112328499A (en) Test data generation method, device, equipment and medium
CN114547318A (en) Fault information acquisition method, device, equipment and computer storage medium
CN112866292A (en) Attack behavior prediction method and device for multi-sample combination attack
CN114443331A (en) Time series data abnormity detection method and device
CN116467171A (en) Automatic test case construction device, method, electronic equipment and storage medium
CN116756041A (en) Code defect prediction and positioning method and device, storage medium and computer equipment
CN108875810B (en) Method and device for sampling negative examples from word frequency table aiming at training corpus
CN112783508B (en) File compiling method, device, equipment and storage medium
CN112257332B (en) Simulation model evaluation method and device
CN113448861B (en) Method and device for detecting repeated form
CN110808947B (en) Automatic vulnerability quantitative evaluation method and system
CN116383048A (en) Software quality information processing method and device
CN113448861A (en) Method and device for detecting repeated forms
CN114792007A (en) Code detection method, device, equipment, storage medium and computer program product
CN114610576A (en) Log generation monitoring method and device
CN113778902A (en) Method and device for detecting coverage of test case
CN109739950B (en) Method and device for screening applicable legal provision
CN109284354B (en) Script searching method and device, computer equipment and storage medium
US20240160696A1 (en) Method for Automatic Detection of Pair-Wise Interaction Effects Among Large Number of Variables

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant