CN109635254A - Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model - Google Patents
- Publication number
- CN109635254A
- Authority
- CN
- China
- Prior art keywords
- decision tree
- classification
- class
- keyword
- svm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention proposes a paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM. First, a keyword database is built by querying the frequency of occurrence of keywords. Second, the keywords are classified. Third, a fusion of decision tree and naive Bayes performs a preliminary coarse screening to determine the plagiarism type of an article. Finally, where decision tree classification cannot specify a classification standard, an SVM is used for learning, forming a fine screening. The patent aims to improve current paper duplicate checking systems and raise their accuracy.
Description
Technical field:
The present invention relates to a text checking method, and in particular to a paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM.
Background art:
The Internet is now highly developed, and many researchers upload their research results to it. Many positions carry writing requirements: teachers and doctors, for example, must complete a professional-title paper to compete for a title, and graduates must complete a graduation thesis. However, many people cross the moral bottom line and plagiarize the research results of others for personal ends. Paper duplicate checking software emerged to combat academic fraud and academic misconduct, but the technology is not yet mature and the likelihood of misjudgment is high. Current paper duplicate checking systems have the following problems: (1) Duplicate checking of the text of an article is very strict, but plagiarism of an article's central idea is hard to identify. (2) Articles inevitably contain some formulas or descriptions of general knowledge; these should not count as plagiarism, yet many current systems judge them to be plagiarism. (3) Plagiarism types are not clearly distinguished, so the severity of an author's plagiarism cannot be judged. These problems need to be solved by those skilled in the art.
Summary of the invention:
In view of the above problems, the present invention proposes a paper duplicate checking method, as follows:
1. A paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM, characterized by comprising the following four steps:
S1: establish a keyword database by querying the frequency of occurrence of keywords;
S2: classify the keywords;
S3: perform a preliminary coarse screening using a fusion of decision tree and naive Bayes;
S4: where decision tree classification cannot specify a classification standard, learn with an SVM to form a fine screening.
2. The paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM according to claim 1, characterized in that step S2 comprises the following sub-steps:
S21: classify the keywords into an innovation class and a knowledge class;
S22: the tolerated repetition rate for knowledge-class keywords may be relaxed to 40%, while the tolerated rate for innovation-class keywords is lower, at 5%; this prevents misjudgments caused by the use of common knowledge in the article being checked.
3. The paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31: extract key indicators by examining charts, data, keywords and the central idea;
S32: use the Spearman rank correlation coefficient to determine the pairwise correlation of the indicators, reduce the dimensionality of the strongly correlated indicators thus screened out using principal component analysis, and recombine them into a new set of mutually uncorrelated composite variables;
S33: take six parts of the article, namely the opening paragraph, four middle paragraphs and the concluding paragraph, determine their weights using the analytic hierarchy process, and obtain the composite value of the six parts by weighted combination. The middle paragraphs are extracted as follows: if the body contains more than four core viewpoints, take the longest paragraph of each core viewpoint, sort these by word count in descending order, and choose the top four; if there are exactly four core viewpoints, take the longest paragraph of each of the four; if there are fewer than four, sort all body paragraphs by word count and choose the four longest;
S34: express the set of plagiarism types as the dependent variable and the set of indicator attributes as the independent variables, use the composite values of the six parts and their corresponding plagiarism types as training samples, and build a CART decision tree by recursive partitioning of the training samples;
S35: count the training samples that the CART decision tree and the Bayesian model each classify correctly during training and divide by the total number of training samples to obtain the classification accuracies of the two algorithms, $A_{CART}$ and $A_{NB}$; then compute the decision tree model's training accuracy for each plagiarism type, $b(k)$, $k = 1, 2, \ldots, m$, where $m$ is the total number of plagiarism types. The posterior probability of the decision tree model for each plagiarism type when its output plagiarism type is $Y_t$ is defined as
It is combined by weighting with the posterior probability $P(Y_k \mid X)_{NB}$ output by the Bayesian model to obtain
The plagiarism type corresponding to the maximum resulting probability is the final classification output.
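The rank-correlation screening in S32 can be illustrated with a minimal pure-Python sketch. The function names, the input layout (a dictionary of indicator name to list of values) and the 0.8 threshold are assumptions made for illustration; the source does not state a threshold, and the subsequent principal component analysis is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant indicator has no defined rank correlation
    return cov / (var_x * var_y) ** 0.5


def strongly_correlated_pairs(indicators, threshold=0.8):
    """Return indicator-name pairs whose |rho| meets the (hypothetical) threshold."""
    names = sorted(indicators)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(spearman(indicators[a], indicators[b])) >= threshold:
                pairs.append((a, b))
    return pairs
```

Pairs returned by `strongly_correlated_pairs` would be the candidates for dimensionality reduction; indicators that do not appear in any pair could be passed through unchanged.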
4. The paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM according to claim 1, characterized in that step S4 comprises the following sub-steps:
S41: generate the training sample sets by actively selecting training samples; that is, from the various training articles, delineate the training article sets of $C$ classes, $I_1, I_2, \ldots, I_C$, and sample each of $I_1, I_2, \ldots, I_C$ uniformly to generate the training sample sets $I'_1, I'_2, \ldots, I'_C$; each sample set contains the same number of articles, and the plagiarism probability of an article is used as the sample vector;
S42: the class partition scheme of a node classifier is as follows:
Suppose the positive and negative class sets of a node classifier's partition are $S_1$ and $S_2$, and $N_1$ and $N_2$ are the numbers of classes in $S_1$ and $S_2$ respectively; $C = N_1 + N_2$ is the total number of classes the node must partition; $X_j$ denotes the sample set of class $j$, $j = 1, 2, \ldots, C$; $n_j$ is the number of samples in $X_j$; and $x$ is a sample vector;
1) Compute the center of each class
2) Let $i$ be the index of a class partition scheme; for every partition scheme, carry out the calculations in steps 3) and 4)
3) Compute the centers of the positive and negative class sets $S_1$ and $S_2$:
Compute the Euclidean distance between the centers of $S_1$ and $S_2$:
$d^{i}_{S1S2} = \lVert e^{i}_{1} - e^{i}_{2} \rVert$
4) Compute the average distance from the class centers in $S_1$ to the center of $S_1$, and the average distance from the class centers in $S_2$ to the center of $S_2$:
5) Compute $d^{i}$ according to the following formula; the scheme that maximizes it is the required scheme
$d^{i} = d^{i}_{S1S2} + d^{i}_{S1} + d^{i}_{S2}$
Using the node classifier class partition method above, design the class partition scheme of each node classifier from the top down, and finally build the complete decision tree;
S43: train each node classifier using the training sample sets $I'_1, I'_2, \ldots, I'_C$, finally forming the complete SVM decision tree classifier;
S44: take all pixels of the image to be classified as the test sample set, classify them with the SVM decision tree classifier, and map the classification results back to the image to achieve image classification.
The beneficial effects of the present invention are as follows: it solves some of the problems in current paper duplicate checking systems and refines the specific circumstances of plagiarism. Keyword classification and keyword repetition-rate queries reduce misjudgments that duplicate checking may otherwise make on repeated knowledge-type content. Naive Bayes and the decision tree algorithm are fused, with a CART decision tree built, to judge the plagiarism type of a paper. For plagiarism types that cannot be clearly classified, an SVM decision tree classifier built by fusing the SVM and decision tree algorithms performs the classification and further analyzes the degree of plagiarism.
Brief description of the drawings
Additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is the overall flow chart of the present invention.
Specific embodiment
The embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.
In the description of the present invention, it should be understood that orientation or positional terms such as "longitudinal", "transverse", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner" and "outer" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation; they therefore should not be construed as limiting the invention.
The present invention proposes a paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM. A fusion of naive Bayes and the decision tree algorithm coarsely screens the plagiarism in a paper to determine the plagiarism type; a fusion of the decision tree and SVM algorithms then further classifies plagiarism types that could not be classified.
The present invention is described in detail with reference to Fig. 1 and mainly comprises the following steps:
Step 1: Start.
Step 2: Extract the keywords and check the keyword repetition rate.
A keyword database is established by querying the frequency of occurrence of keywords, and the keywords are classified into an innovation class and a knowledge class. The tolerated repetition rate for knowledge-class keywords may be relaxed to 40%, while the tolerated rate for innovation-class keywords is lower, at 5%; this prevents misjudgments caused by the use of common knowledge in the article being checked.
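The tolerance rule of Step 2 can be sketched as follows. The two thresholds come from the text; the definition of "repetition rate" as the fraction of a paper's keywords found in the database, the function names and the dictionary layout are assumptions, since the source does not define them.

```python
# Tolerated repetition rates taken from the text: 40% for knowledge-class
# keywords, 5% for innovation-class keywords.
TOLERANCE = {"knowledge": 0.40, "innovation": 0.05}


def repetition_rate(paper_keywords, database_keywords):
    """Fraction of the paper's keywords that also appear in the database.

    ASSUMPTION: this definition of repetition rate is illustrative only.
    """
    if not paper_keywords:
        return 0.0
    known = set(database_keywords)
    repeated = sum(1 for kw in paper_keywords if kw in known)
    return repeated / len(paper_keywords)


def flag_keyword_classes(paper, database):
    """Return, per keyword class, whether the repetition rate exceeds tolerance.

    `paper` and `database` map a class name to a list of keywords.
    """
    flags = {}
    for cls, limit in TOLERANCE.items():
        rate = repetition_rate(paper.get(cls, []), database.get(cls, []))
        flags[cls] = rate > limit
    return flags
```

With this layout, a paper that reuses one of five knowledge-class keywords (20%) stays within tolerance, while a paper that reuses one of two innovation-class keywords (50%) is flagged.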
Step 3: Build the CART decision tree.
Key indicators are extracted by examining charts, data, keywords and the central idea. The Spearman rank correlation coefficient is used to determine the pairwise correlation of the indicators; the strongly correlated indicators thus screened out are reduced in dimensionality using principal component analysis and recombined into a new set of mutually uncorrelated composite variables. Six parts of the article are taken, namely the opening paragraph, four middle paragraphs and the concluding paragraph; their weights are determined using the analytic hierarchy process, and the composite value of the six parts is obtained by weighted combination. The middle paragraphs are extracted as follows: if the body contains more than four core viewpoints, take the longest paragraph of each core viewpoint, sort these by word count in descending order, and choose the top four; if there are exactly four core viewpoints, take the longest paragraph of each of the four; if there are fewer than four, sort all body paragraphs by word count and choose the four longest. The set of plagiarism types is expressed as the dependent variable and the set of indicator attributes as the independent variables; the composite values of the six parts and their corresponding plagiarism types are used as training samples, and a CART decision tree is built by recursive partitioning of the training samples.
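The middle-paragraph selection rule described in Step 3, together with the weighted combination of the six part scores, can be sketched as below. The input layout (a list of core viewpoints, each a list of paragraph strings), the use of character count as a stand-in for word count, and the function names are assumptions; the weights are taken as given, since deriving them with the analytic hierarchy process is outside this sketch.

```python
def select_middle_paragraphs(viewpoints):
    """Pick four body paragraphs following the rule described in the text.

    `viewpoints` is a list of core viewpoints, each a list of paragraph
    strings (a hypothetical layout; the source does not say how viewpoints
    are identified). Length is measured with len() as a stand-in for word count.
    """
    if len(viewpoints) >= 4:
        # Longest paragraph of each viewpoint; with exactly four viewpoints
        # this already yields the four paragraphs.
        longest = [max(paras, key=len) for paras in viewpoints if paras]
        return sorted(longest, key=len, reverse=True)[:4]
    # Fewer than four viewpoints: take the four longest paragraphs overall.
    all_paras = [p for paras in viewpoints for p in paras]
    return sorted(all_paras, key=len, reverse=True)[:4]


def composite_value(scores, weights):
    """Weighted sum of the six part scores (opening, four middle, conclusion)."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, weights))
```

The composite values produced this way, paired with known plagiarism types, would form the training samples for the CART decision tree.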
Step 4: Determine whether the plagiarism type can be judged.
Count the training samples that the CART decision tree and the Bayesian model each classify correctly during training and divide by the total number of training samples to obtain the classification accuracies of the two algorithms, $A_{CART}$ and $A_{NB}$. Then compute the decision tree model's training accuracy for each plagiarism type, $b(k)$, $k = 1, 2, \ldots, m$, where $m$ is the total number of plagiarism types. The posterior probability of the decision tree model for each plagiarism type when its output plagiarism type is $Y_t$ is defined as
It is combined by weighting with the posterior probability $P(Y_k \mid X)_{NB}$ output by the Bayesian model to obtain
The plagiarism type corresponding to the maximum resulting probability is the final classification output.
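The fusion in Step 4 can be illustrated with a sketch, but the two formulas are not reproduced in this text, so the code below rests on explicit assumptions rather than on the patent's definitions: the tree's predicted class $t$ receives probability $b(t)$ with the remainder shared equally among the other classes, and the two posteriors are weighted by the overall accuracies normalized to sum to 1. All function and parameter names are hypothetical.

```python
def tree_posterior(predicted, class_accuracy):
    """Turn a hard tree prediction into a probability vector.

    ASSUMPTION: the predicted class t receives b(t), and the remaining
    1 - b(t) is shared equally among the other classes.
    """
    m = len(class_accuracy)
    if m == 1:
        return [1.0]
    b_t = class_accuracy[predicted]
    rest = (1.0 - b_t) / (m - 1)
    return [b_t if k == predicted else rest for k in range(m)]


def fuse(predicted, class_accuracy, nb_posterior, acc_cart, acc_nb):
    """Weighted combination of the tree and naive Bayes posteriors.

    ASSUMPTION: weights are the overall accuracies normalised to sum to 1.
    Returns (index of the most probable class, fused probabilities).
    """
    w_cart = acc_cart / (acc_cart + acc_nb)
    w_nb = 1.0 - w_cart
    p_tree = tree_posterior(predicted, class_accuracy)
    fused = [w_cart * pt + w_nb * pn for pt, pn in zip(p_tree, nb_posterior)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused
```

Under these assumptions, a confident naive Bayes posterior can override a tree prediction for a class on which the tree's training accuracy was low, which is the general behavior a weighted fusion is meant to provide.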
Step 5: Form the SVM decision tree classifier.
Generate the training sample sets by actively selecting training samples. That is, from the various training articles, delineate the training article sets of $C$ classes, $I_1, I_2, \ldots, I_C$, and sample each of $I_1, I_2, \ldots, I_C$ uniformly to generate the training sample sets $I'_1, I'_2, \ldots, I'_C$; each sample set contains the same number of articles, and the plagiarism probability of an article is used as the sample vector. The class partition scheme of a node classifier is as follows:
Suppose the positive and negative class sets of a node classifier's partition are $S_1$ and $S_2$, and $N_1$ and $N_2$ are the numbers of classes in $S_1$ and $S_2$ respectively; $C = N_1 + N_2$ is the total number of classes the node must partition; $X_j$ denotes the sample set of class $j$, $j = 1, 2, \ldots, C$; $n_j$ is the number of samples in $X_j$; and $x$ is a sample vector;
1) Compute the center of each class
2) Let $i$ be the index of a class partition scheme; for every partition scheme, carry out the calculations in steps 3) and 4)
3) Compute the centers of the positive and negative class sets $S_1$ and $S_2$:
Compute the Euclidean distance between the centers of $S_1$ and $S_2$:
$d^{i}_{S1S2} = \lVert e^{i}_{1} - e^{i}_{2} \rVert$
4) Compute the average distance from the class centers in $S_1$ to the center of $S_1$, and the average distance from the class centers in $S_2$ to the center of $S_2$:
5) Compute $d^{i}$ according to the following formula; the scheme that maximizes it is the required scheme
$d^{i} = d^{i}_{S1S2} + d^{i}_{S1} + d^{i}_{S2}$
Using the node classifier class partition method above, design the class partition scheme of each node classifier from the top down, and finally build the complete decision tree;
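The partition search for a single node can be sketched as an exhaustive enumeration of two-way splits. The formulas for the class centers and for the centers of $S_1$ and $S_2$ are not reproduced in this text, so the sketch assumes each class center is the mean of its sample vectors and each set center is the mean of the class centers it contains. The criterion adds all three distance terms exactly as the text states. Function names and the input layout are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
from itertools import combinations
from math import dist


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def best_partition(class_samples):
    """Enumerate two-way splits of the classes and return the one maximising d.

    `class_samples` maps a class label to its list of sample vectors.
    ASSUMPTION: the centre of S1 (or S2) is the mean of its class centres.
    Returns (d, S1 labels, S2 labels), or None if fewer than two classes.
    """
    centres = {c: mean_vector(xs) for c, xs in class_samples.items()}
    labels = sorted(centres)
    first, others = labels[0], labels[1:]
    best = None
    # Fixing the first label in S1 avoids counting each split twice;
    # range(len(others)) keeps S2 non-empty.
    for size in range(len(others)):
        for extra in combinations(others, size):
            s1 = [first, *extra]
            s2 = [c for c in others if c not in extra]
            e1 = mean_vector([centres[c] for c in s1])
            e2 = mean_vector([centres[c] for c in s2])
            d_between = dist(e1, e2)
            d_s1 = sum(dist(centres[c], e1) for c in s1) / len(s1)
            d_s2 = sum(dist(centres[c], e2) for c in s2) / len(s2)
            d = d_between + d_s1 + d_s2  # criterion as stated in the text
            if best is None or d > best[0]:
                best = (d, s1, s2)
    return best
```

Applying `best_partition` recursively to `S1` and `S2` until each set holds a single class would give the top-down tree structure described above; the number of splits grows exponentially with the number of classes, so the enumeration is only practical for a small number of plagiarism types.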
Each node classifier is trained using the training sample sets $I'_1, I'_2, \ldots, I'_C$, finally forming the complete SVM decision tree classifier. All pixels of the image to be classified are taken as the test sample set and classified with the SVM decision tree classifier, and the classification results are mapped back to the image to achieve image classification.
Step 6: terminating.
Although embodiments of the present invention have been shown and described, those skilled in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (4)
1. A paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM, characterized by comprising the following four steps:
S1: establish a keyword database by querying the frequency of occurrence of keywords;
S2: classify the keywords;
S3: perform a preliminary coarse screening using a fusion of decision tree and naive Bayes;
S4: where decision tree classification cannot specify a classification standard, learn with an SVM to form a fine screening.
2. The paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM according to claim 1, characterized in that step S2 comprises the following sub-steps:
S21: classify the keywords into an innovation class and a knowledge class;
S22: the tolerated repetition rate for knowledge-class keywords may be relaxed to 40%, while the tolerated rate for innovation-class keywords is lower, at 5%; this prevents misjudgments caused by the use of common knowledge in the article being checked.
3. The paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM according to claim 1, characterized in that step S3 comprises the following sub-steps:
S31: extract key indicators by examining charts, data, keywords and the central idea;
S32: use the Spearman rank correlation coefficient to determine the pairwise correlation of the indicators, reduce the dimensionality of the strongly correlated indicators thus screened out using principal component analysis, and recombine them into a new set of mutually uncorrelated composite variables;
S33: take six parts of the article, namely the opening paragraph, four middle paragraphs and the concluding paragraph, determine their weights using the analytic hierarchy process, and obtain the composite value of the six parts by weighted combination. The middle paragraphs are extracted as follows: if the body contains more than four core viewpoints, take the longest paragraph of each core viewpoint, sort these by word count in descending order, and choose the top four; if there are exactly four core viewpoints, take the longest paragraph of each of the four; if there are fewer than four, sort all body paragraphs by word count and choose the four longest;
S34: express the set of plagiarism types as the dependent variable and the set of indicator attributes as the independent variables, use the composite values of the six parts and their corresponding plagiarism types as training samples, and build a CART decision tree by recursive partitioning of the training samples;
S35: count the training samples that the CART decision tree and the Bayesian model each classify correctly during training and divide by the total number of training samples to obtain the classification accuracies of the two algorithms, $A_{CART}$ and $A_{NB}$; then compute the decision tree model's training accuracy for each plagiarism type, $b(k)$, $k = 1, 2, \ldots, m$, where $m$ is the total number of plagiarism types. The posterior probability of the decision tree model for each plagiarism type when its output plagiarism type is $Y_t$ is defined as
It is combined by weighting with the posterior probability $P(Y_k \mid X)_{NB}$ output by the Bayesian model to obtain
The plagiarism type corresponding to the maximum resulting probability is the final classification output.
4. The paper duplicate checking method based on a mixed model of naive Bayes, decision tree and SVM according to claim 1, characterized in that step S4 comprises the following sub-steps:
S41: generate the training sample sets by actively selecting training samples; that is, from the various training articles, delineate the training article sets of $C$ classes, $I_1, I_2, \ldots, I_C$, and sample each of $I_1, I_2, \ldots, I_C$ uniformly to generate the training sample sets $I'_1, I'_2, \ldots, I'_C$; each sample set contains the same number of articles, and the plagiarism probability of an article is used as the sample vector;
S42: the class partition scheme of a node classifier is as follows:
Suppose the positive and negative class sets of a node classifier's partition are $S_1$ and $S_2$, and $N_1$ and $N_2$ are the numbers of classes in $S_1$ and $S_2$ respectively; $C = N_1 + N_2$ is the total number of classes the node must partition; $X_j$ denotes the sample set of class $j$, $j = 1, 2, \ldots, C$; $n_j$ is the number of samples in $X_j$; and $x$ is a sample vector;
1) Compute the center of each class
2) Let $i$ be the index of a class partition scheme; for every partition scheme, carry out the calculations in steps 3) and 4)
3) Compute the centers of the positive and negative class sets $S_1$ and $S_2$:
Compute the Euclidean distance between the centers of $S_1$ and $S_2$:
$d^{i}_{S1S2} = \lVert e^{i}_{1} - e^{i}_{2} \rVert$
4) Compute the average distance from the class centers in $S_1$ to the center of $S_1$, and the average distance from the class centers in $S_2$ to the center of $S_2$:
5) Compute $d^{i}$ according to the following formula; the scheme that maximizes it is the required scheme
$d^{i} = d^{i}_{S1S2} + d^{i}_{S1} + d^{i}_{S2}$
Using the node classifier class partition method above, design the class partition scheme of each node classifier from the top down, and finally build the complete decision tree;
S43: train each node classifier using the training sample sets $I'_1, I'_2, \ldots, I'_C$, finally forming the complete SVM decision tree classifier;
S44: take all pixels of the image to be classified as the test sample set, classify them with the SVM decision tree classifier, and map the classification results back to the image to achieve image classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811467956.7A CN109635254A (en) | 2018-12-03 | 2018-12-03 | Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109635254A true CN109635254A (en) | 2019-04-16 |
Family
ID=66070663
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811467956.7A Pending CN109635254A (en) | 2018-12-03 | 2018-12-03 | Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109635254A (en) |
-
2018
- 2018-12-03 CN CN201811467956.7A patent/CN109635254A/en active Pending
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1804829A (en) * | 2006-01-10 | 2006-07-19 | 西安交通大学 | Semantic classification method for Chinese question |
US20080195577A1 (en) * | 2007-02-09 | 2008-08-14 | Wei Fan | Automatically and adaptively determining execution plans for queries with parameter markers |
CN101441620A (en) * | 2008-11-27 | 2009-05-27 | 温州大学 | Electronic text document plagiarism recognition method based on similar string matching distance |
CN101826263A (en) * | 2009-03-04 | 2010-09-08 | 中国科学院自动化研究所 | Objective standard based automatic oral evaluation system |
CN101819601A (en) * | 2010-05-11 | 2010-09-01 | 同方知网(北京)技术有限公司 | Method for automatically classifying academic documents |
CN103514170A (en) * | 2012-06-20 | 2014-01-15 | ***通信集团安徽有限公司 | Speech-recognition text classification method and device |
US20140223284A1 (en) * | 2013-02-01 | 2014-08-07 | Brokersavant, Inc. | Machine learning data annotation apparatuses, methods and systems |
CN103544326A (en) * | 2013-11-14 | 2014-01-29 | 上海交通大学 | Chinese and English cross-language plagiarism recognition method based on characteristics and content of translations |
CN105045825A (en) * | 2015-06-29 | 2015-11-11 | 中国地质大学(武汉) | Structure extended polynomial naive Bayes text classification method |
CN105447505A (en) * | 2015-11-09 | 2016-03-30 | 成都数之联科技有限公司 | Multilevel important email detection method |
CN105468713A (en) * | 2015-11-19 | 2016-04-06 | 西安交通大学 | Multi-model fused short text classification method |
CN105956382A (en) * | 2016-04-26 | 2016-09-21 | 北京工商大学 | Traditional Chinese medicine constitution optimized classification method based on improved CART decision-making tree and fuzzy naive Bayes combined model |
US20180173847A1 (en) * | 2016-12-16 | 2018-06-21 | Jang-Jih Lu | Establishing a machine learning model for cancer anticipation and a method of detecting cancer by using multiple tumor markers in the machine learning model for cancer anticipation |
CN107145514A (en) * | 2017-04-01 | 2017-09-08 | 华南理工大学 | Chinese sentence pattern sorting technique based on decision tree and SVM mixed models |
CN107391772A (en) * | 2017-09-15 | 2017-11-24 | 国网四川省电力公司眉山供电公司 | A kind of file classification method based on naive Bayesian |
CN107977670A (en) * | 2017-10-09 | 2018-05-01 | 中国电子科技集团公司第二十八研究所 | Accident classification stage division, the apparatus and system of decision tree and bayesian algorithm |
CN107908715A (en) * | 2017-11-10 | 2018-04-13 | 中国民航大学 | Microblog emotional polarity discriminating method based on Adaboost and grader Weighted Fusion |
CN108763486A (en) * | 2018-05-30 | 2018-11-06 | 湖南写邦科技有限公司 | Paper duplicate checking method, terminal and storage medium based on terminal |
Non-Patent Citations (4)
Title |
---|
CHANCHANA SORNSOONTORN ET AL: "Using Document Classification to Improve the Performance of a Plagiarism Checker: a Case for Thai language documents", 《2017 21ST INTERNATIONAL COMPUTER SCIENCE AND ENGINEERING CONFERENCE (ICSEC)》 * |
HADJ AHMED BOUARARA: "Multi-Agents Machine Learning (MML) System for Plagiarism Detection", 《MULTI-AGENTS MACHINE LEARNING (MML) SYSTEM FOR PLAGIARISM DETECTION》 * |
PATIL SANGITAB ET AL: "Use of Support Vector Machine, Decision Tree and Naive Bayesian Techniques for Wind Speed Classification", 《2011 INTERNATIONAL CONFERENCE ON POWER AND ENERGY SYSTEMS》 * |
王素红: "Research on Plagiarism Detection Based on SVM", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111367874A (en) * | 2020-02-28 | 2020-07-03 | 北京神州绿盟信息安全科技股份有限公司 | Log processing method, device, medium and equipment |
CN111367874B (en) * | 2020-02-28 | 2023-11-14 | 绿盟科技集团股份有限公司 | Log processing method, device, medium and equipment |
CN111723208A (en) * | 2020-06-28 | 2020-09-29 | 西南财经大学 | Conditional classification tree-based legal decision document multi-classification method and device and terminal |
CN111723208B (en) * | 2020-06-28 | 2023-04-18 | 西南财经大学 | Conditional classification tree-based legal decision document multi-classification method and device and terminal |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107577785B (en) | Hierarchical multi-label classification method suitable for legal identification | |
Styawati et al. | Sentiment analysis on online transportation reviews using Word2Vec text embedding model feature extraction and support vector machine (SVM) algorithm | |
CN107798033B (en) | Case text classification method in public security field | |
Kuhkan | A method to improve the accuracy of k-nearest neighbor algorithm | |
CN110222744A (en) | Method for improving a naive Bayes classification model based on attribute weights | |
CN107861951A (en) | Session subject identifying method in intelligent customer service | |
CN107220365A (en) | Accurate recommendation system and method based on collaborative filtering and association rule parallel processing | |
CN107766418A (en) | Credit evaluation method based on fusion model, electronic equipment and storage medium | |
CN107391772A (en) | Text classification method based on naive Bayes | |
Manziuk et al. | Definition of information core for documents classification | |
CN112100512A (en) | Collaborative filtering recommendation method based on user clustering and project association analysis | |
CN106570076A (en) | Computer text classification system | |
CN109344227A (en) | Worksheet method, system and electronic equipment | |
CN109635254A (en) | Paper duplicate checking method based on naive Bayesian, decision tree and SVM mixed model | |
CN109783633A (en) | Data analysis service procedural model recommended method | |
Arbel et al. | Classifier evaluation under limited resources | |
KR20240023494A (en) | Monitoring method for laws and regulations based on artificial intelligence learning and program for this | |
WO2024131524A1 (en) | Depression diet management method based on food image segmentation | |
Zhang et al. | A classification performance measure considering the degree of classification difficulty | |
CN112417082B (en) | Scientific research achievement data disambiguation filing storage method | |
Řehůřek et al. | Automated classification and categorization of mathematical knowledge | |
CN105160358B (en) | Image classification method and system | |
CN111708865A (en) | Technology forecasting and patent early warning analysis method based on improved XGBoost algorithm | |
CN106775694A (en) | Hierarchical classification method of software merit rating code product | |
CN108268458A (en) | Semi-structured data classification method and device based on KNN algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190416 |