CN108920492A - Webpage classification method, system, terminal and storage medium - Google Patents


Info

Publication number: CN108920492A
Application number: CN201810465784.3A
Authority: CN (China)
Prior art keywords: classification model, training, text classification, sample, classification
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN108920492B (en)
Inventor: 张君晖
Current Assignee: Guangzhou Shun Fei Mdt Infotech Ltd
Original Assignee: Guangzhou Shun Fei Mdt Infotech Ltd
Application filed by Guangzhou Shun Fei Mdt Infotech Ltd
Publication of application: CN108920492A; publication of grant: CN108920492B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting


Abstract

The invention discloses a webpage classification method, system, terminal and storage medium. The method includes: acquiring webpage link information; inputting the acquired webpage link information into a text classification model for classification processing; and outputting the site classification result corresponding to that webpage link information. The text classification model is trained based on a Boosting integration method. The system includes an acquisition module and a processing module. The terminal includes a memory for storing a program and a processor for loading the program and executing the method steps. By using the present invention, web pages can be classified quickly and accurately. As a webpage classification method, system, terminal and storage medium, the invention can be widely applied in the field of text classification.

Description

Webpage classification method, system, terminal and storage medium
Technical Field
The present invention relates to data classification processing technologies, and in particular, to a method, a system, a terminal, and a storage medium for classifying web pages.
Background
Explanation of technical words:
F1 score: in statistics, the F1 score is a metric for measuring the accuracy of a binary classification model that takes both the model's precision and its recall into account; specifically, the F1 score can be viewed as the harmonic mean of precision and recall.
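As a concrete illustration of the definition above, the F1 computation can be sketched in a few lines of Python; the function name and the example precision/recall values are illustrative, not taken from the patent:

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A classifier with 0.8 precision but only 0.5 recall scores well below
# the arithmetic mean (0.65), because F1 penalizes the weaker of the two.
print(f1_score(0.8, 0.5))  # ≈ 0.615
```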
Boosting: a method used to improve the accuracy of weak classification algorithms; it works by constructing a series of prediction functions and then combining them, in a certain way, into a single prediction function.
An advertisement bidding system needs to process billions of requests every day. Each bid request contains page information, device information, user information, and so on; this information lands on a server in log form, after which an algorithm parses the logs, extracts the required data, and persists it in a database. Sometimes, however, a bid request lacks the keywords and related description of the requested page. To solve this problem, the method currently common in the industry is to extract the links of the requested pages and hand them to crawlers, which crawl those pages for information. With an advertisement bidding system, this approach still has the following problems: 1. under massive volumes of bidding requests, both the amount of extracted page-link data and the amount of data crawled from those pages are very large, making data storage and management difficult; 2. some categories of advertisement pages perform poorly when delivered, so pausing the crawlers for those pages requires a relatively complex workflow to remove the corresponding URLs; 3. different advertisement pages place different demands on crawler capacity, so without classifying the pages it is difficult to balance the crawlers' processing capacity and to make reasonable use of bandwidth.
Disclosure of Invention
In order to solve the above technical problems, an object of the present invention is to provide a method, a system, a terminal and a storage medium for classifying web pages, which have high classification processing rate and accuracy.
The first technical scheme adopted by the invention is as follows: a webpage classification method comprises the following steps:
acquiring webpage link information;
inputting the obtained webpage link information into a text classification model for classification processing, and outputting a site classification result corresponding to the webpage link information;
the text classification model is trained based on a Boosting integration method.
The second technical scheme adopted by the invention is as follows: a web page classification system comprising:
the acquisition module is used for acquiring webpage link information;
the processing module is used for inputting the acquired webpage link information into a text classification model for classification processing and then outputting a site classification result corresponding to the webpage link information;
the text classification model is trained based on a Boosting integration method.
The third technical scheme adopted by the invention is as follows: a terminal, comprising:
at least one processor;
at least one memory for storing at least one program;
when the at least one program is executed by the at least one processor, it causes the at least one processor to implement the webpage classification method of the first technical scheme.
The fourth technical scheme adopted by the invention is as follows: a storage medium having stored therein processor-executable instructions which, when executed by a processor, perform the webpage classification method of the first technical scheme.
The invention has the following advantages. The obtained webpage link information is input into a text classification model trained based on a Boosting integration method for classification, and the site classification result corresponding to the webpage link information is output. This facilitates classified storage of webpage data, which in turn eases management and subsequent query processing; web pages in categories whose advertisement pages perform poorly need not be crawled at all, avoiding wasted resources and improving processing efficiency; and the processing capacity of the crawlers can be balanced, bandwidth used reasonably, and downstream algorithm tasks supported. In addition, because the method uses a text classification model trained with the Boosting integration method, both the rate and the accuracy of classification are very high.
Drawings
FIG. 1 is a flowchart illustrating the steps of a method for classifying web pages according to the present invention;
FIG. 2 is a schematic diagram of a text classification model used in a web page classification method according to the present invention;
FIG. 3 is a schematic diagram of a web page classification system according to the present invention;
fig. 4 is a schematic structural diagram of a terminal according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and preferred embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
As shown in fig. 1, an embodiment of the present invention provides a method for classifying web pages, which includes the following steps:
s101, acquiring webpage link information;
specifically, in this embodiment, the web page is an advertisement page;
s102, inputting the acquired webpage link information into a text classification model for classification, and outputting a site classification result corresponding to the webpage link information;
the text classification model is trained based on a Boosting integration method.
In the method, the constructed text classification model trained based on the Boosting integration method is brought online, and the advertisement-page link information contained in the bidding requests of real-time logs is classified with it, performing site classification in real time. Advertisement-page links of different site categories can then be handed to crawlers of the corresponding categories for information crawling. This facilitates classified storage of webpage data, easing management and subsequent query processing; crawling can simply be skipped for web pages in categories whose advertisement pages perform poorly, avoiding wasted resources and improving processing efficiency; the load on the crawlers can be balanced, bandwidth used reasonably, and downstream algorithm tasks supported. In addition, because the method uses a text classification model trained with the Boosting integration method, both the rate and the accuracy of classification are very high.
In a preferred embodiment, the web page link information includes a website title keyword list corresponding to the web page link, where the keyword list includes at least one keyword.
In a preferred embodiment, the text classification model is a text classification model constructed by the following construction step S100:
and S1001, acquiring a training data set.
Specifically, step S1001 preferably includes:
S10011, data acquisition step: select the data stored during the last r days from the persistence-layer database as the original data set;
The feature components of the original data set are (site, title, manual_site_cat, predict_site_cat), namely the site link site, the site title title, the manually classified site category manual_site_cat, and the model-predicted site category predict_site_cat; that is, one record in the original data set contains four kinds of information: site link, site title, manually classified site category, and model-predicted site category;
S10012, data preprocessing step: remove original records whose site is null, whose title is null, or in which both predict_site_cat and manual_site_cat are null;
S10013, Chinese word segmentation step: perform Chinese word segmentation on title and then remove stop words, obtaining a processed data set A with feature components (cat: title_keywords); that is, each record (sample) in data set A contains the category cat of the site and the keyword list title_keywords (the site title keyword list) obtained from the site title after stop-word removal;
S10014, data set preparation step: divide the processed data set A obtained in step S10013 into a training data set X and a test data set TE according to a preset ratio, such as 3:1;
Specifically, 3/4 of the data in the processed data set A is taken as the training data set X and 1/4 as the test data set TE. In the text classification method, title_keywords serves as input data and cat as output data; that is, the title_keywords in training data set X are the training input data and the cat values in X are the training output data, and the text classification model is trained with this training input and output data; the title_keywords in test data set TE are the test input data and the cat values in TE are the test output data, which are used to test the text classification model obtained after training.
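Steps S10011 through S10014 can be sketched in Python as follows. This is an illustrative reconstruction rather than the patent's implementation: the field names follow the feature components above, `tokenize` stands in for a real Chinese word-segmentation step (in practice a library such as jieba would be used), the preference for manual_site_cat over predict_site_cat as the training label is an assumption, and the sample rows and stopword set are invented for the example:

```python
import random

def preprocess(rows, stopwords, tokenize):
    """S10012 + S10013: drop records with a null site/title or with no
    category in either cat field, then map each remaining record to
    (cat, title_keywords) with stop words removed."""
    samples = []
    for row in rows:
        if not row.get("site") or not row.get("title"):
            continue
        # Assumption: prefer the manual label, fall back to the predicted one.
        cat = row.get("manual_site_cat") or row.get("predict_site_cat")
        if not cat:
            continue
        keywords = [w for w in tokenize(row["title"]) if w not in stopwords]
        samples.append((cat, keywords))
    return samples

def split(samples, train_ratio=0.75, seed=0):
    """S10014: shuffle and split into training set X and test set TE (3:1)."""
    samples = samples[:]
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

rows = [
    {"site": "http://a.example", "title": "cheap running shoes sale",
     "manual_site_cat": "shopping", "predict_site_cat": None},
    {"site": "http://b.example", "title": "local election results",
     "manual_site_cat": None, "predict_site_cat": "news"},
    {"site": "http://c.example", "title": "new rpg game trailer",
     "manual_site_cat": "games", "predict_site_cat": "games"},
    {"site": "http://d.example", "title": "flight deals sale",
     "manual_site_cat": "travel", "predict_site_cat": None},
    {"site": None, "title": "dropped because site is null",
     "manual_site_cat": "news", "predict_site_cat": None},
]
data = preprocess(rows, stopwords={"sale"}, tokenize=str.split)
X, TE = split(data)  # 3 training samples, 1 test sample
```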
S1002, input the training data set X into the text classification model and train it with the Boosting integration method to obtain the trained text classification model H(x). Fig. 2 is a schematic diagram of the text classification model used in this embodiment: a number of basic classifiers H_1, H_2, …, H_n are iteratively trained, and the basic classifiers obtained after each round of iterative training are integrated to obtain the classification model; n is the total number of basic classifiers, that is, the number of basic classifiers adopted in the text classification model.
Specifically, as for the step S1002, it preferably includes:
s10021, training a current basic classifier by using a current sub-sample set, and calculating the error rate of the basic classifier obtained after training;
the sub-sample set is obtained by assigning corresponding weights to the samples contained in the training data set X; preferably and specifically, after corresponding weights are assigned to the samples in X, N_1 samples are randomly drawn from X as a sub-sample set, each sample carrying its corresponding weight. The initial weight of a sample is 1/m; that is, in the first iteration each sample in X is given weight 1/m, and the N_1 samples then drawn at random from X form the first sub-sample set, where m is the total number of samples contained in the training data set X;
specifically, each sample in the training data set X has a corresponding weight; the set of weight vectors is denoted D, where D_1, D_2, …, D_T are the weight vectors of rounds t = 1, 2, 3, …, T; for example, D_1 holds the weights of the samples in the 1st iteration;
in the 1st iteration there are m samples in the training data set X; after each sample's weight is initialized to 1/m, the first sub-sample set S_1 is obtained by random extraction; that is, in iteration 1 the i-th sample x_i in S_1 has the corresponding weight D_1(x_i) = 1/m;
the first sub-sample set S_1 is then input to the first basic classifier H_1 for training, i.e., H_1 is trained with S_1; after this round of training, the error rate of the trained H_1 is calculated. More generally, in the t-th iteration the current t-th sub-sample set S_t is input to the current j-th (j = 1, 2, 3, …, n) basic classifier H_j for training, i.e., H_j is trained with S_t; after the round of training finishes, the error rate of the trained H_j is calculated;
since the text classification model contains n basic classifiers and the number of iteration rounds T is usually greater than n, when the (n+1)-th round of iterative training begins, training returns to the 1st basic classifier; the (n+1)-th round thus trains the 1st classifier again. In general, the basic classifier H_j trained in the t-th round of iteration is referred to as the t-th round basic classifier h_t;
in a preferred embodiment, the error rate is calculated as follows:
ε(t) = Σ_{p=1}^{N} D_t(x_p)
where ε(t) is the error rate corresponding to the t-th round basic classifier after training; if t = n + 1, ε(t) is the error rate corresponding to the basic classifier trained in the (n+1)-th round;
D_t(x_p) is the weight, in the t-th iteration, of the p-th sample x_p in the sub-sample set whose classification result is wrong, that is, during the training of the t-th iteration, the weight of the p-th misclassified sample in the current sub-sample set; p ∈ [1, 2, …, N], where N is the number of samples in the sub-sample set whose classification result is wrong;
t ∈ [1, 2, …, T], where T is the number of iteration rounds;
h_t is the t-th round basic classifier;
h_t(x_p) is the classification result prediction value, specifically, the prediction output by h_t for the sample x_p during training;
y_p is the classification result true value;
[h_t(x_p) ≠ y_p] = -1.
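The error-rate computation above reduces to summing the weights of the misclassified samples in the current sub-sample set. A minimal sketch (the function name and toy data are illustrative, not from the patent):

```python
def error_rate(weights, predictions, labels):
    """epsilon(t): summed weight of the samples the round-t basic
    classifier got wrong; the weights are assumed to sum to 1."""
    return sum(w for w, h, y in zip(weights, predictions, labels) if h != y)

# Four samples with uniform weight 1/4; the last one is misclassified.
eps = error_rate([0.25, 0.25, 0.25, 0.25],
                 ["news", "news", "games", "games"],   # h_t's predictions
                 ["news", "news", "games", "news"])    # true labels
print(eps)  # 0.25
```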
s10022, when the calculated error rate converges to within the threshold range, end the training and execute step S10026;
s10023, when the calculated error rate does not converge to within the threshold range, execute step S10024;
s10024, update the weights of the samples contained in the current sub-sample set according to the calculated error rate, so as to increase the weights of the samples whose classification result is wrong;
specifically, the weight of each sample contained in the current sub-sample set is updated according to the calculated error rate, and the weights of the misclassified samples are increased;
in a preferred embodiment, the weights of the samples included in the current sub-sample set are updated, wherein the weight update formulas are as follows:
α_t = ln((1 - ε(t)) / ε(t)) + ln(k - 1)
D_{t+1}(x_i) = (D_t(x_i) / Z_t) · exp(-α_t · [h_t(x_i) = y_i])
where the bracket [h_t(x_i) = y_i] equals 1 when the classification result is correct and -1 when it is wrong;
α_t is the weight corresponding to the t-th round basic classifier after training;
k is the number of categories output by the text classification model (i.e., the number of classification classes); if the text classification model classifies 3 categories, then k = 3;
Z_t is a normalization factor that makes the updated weights D_{t+1} sum to 1;
D_t(x_i) is the weight of the i-th sample x_i in the sub-sample set in the t-th iteration, that is, during the training of the t-th iteration, the weight of the i-th sample x_i in the current sub-sample set; i ∈ [1, 2, …, N_1], where N_1 is the total number of samples in the sub-sample set;
D_{t+1}(x_i) is the weight of the i-th sample x_i in the sub-sample set in the (t+1)-th iteration, that is, the updated weight of x_i;
h_t(x_i) is the classification result prediction value, specifically, the prediction output by h_t for the sample x_i during training;
y_i is the classification result true value;
from the formulas above, for a sample with a correct classification result, the weight in the next iteration, namely the (t+1)-th iteration, is D_t(x_i) · exp(-α_t) / Z_t, since [h_t(x_i) = y_i] = 1; for a sample with a wrong classification result, the weight in the next iteration is D_t(x_i) · exp(α_t) / Z_t, since [h_t(x_i) ≠ y_i] = -1;
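One round of the weight update can be sketched as follows. This assumes the multi-class form α_t = ln((1 − ε)/ε) + ln(k − 1) used with k output categories; the function name and toy data are illustrative, not from the patent:

```python
import math

def update_weights(weights, predictions, labels, k):
    """One round of the weight update: compute alpha_t from the error
    rate, scale correct samples by exp(-alpha_t) and wrong samples by
    exp(alpha_t), then divide by the normalization factor Z_t."""
    eps = sum(w for w, h, y in zip(weights, predictions, labels) if h != y)
    alpha = math.log((1 - eps) / eps) + math.log(k - 1)  # ln(k-1) = 0 when k = 2
    scaled = [w * math.exp(-alpha if h == y else alpha)
              for w, h, y in zip(weights, predictions, labels)]
    z = sum(scaled)                       # normalization factor Z_t
    return [w / z for w in scaled], alpha

new_w, alpha = update_weights([0.25] * 4,
                              ["a", "a", "b", "b"],   # predictions
                              ["a", "a", "b", "a"],   # true labels
                              k=2)
# The misclassified sample's weight grows from 0.25 to 0.75, so the next
# round's sampling concentrates on the example this classifier got wrong.
```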
S10025, assign the updated weights to the samples in the training data set, obtain the next sub-sample set, input it to the next basic classifier, and return to execute step S10021;
specifically, the updated sample weights are assigned to the corresponding samples in the training data set, replacing those samples' current weights in X; N_1 samples are then again drawn at random from the updated X to form the next sub-sample set, which is input to the next basic classifier and used to train it, realizing the next round of iterative training;
s10026, integrating a plurality of basic classifiers to obtain a trained text classification model;
specifically, after T rounds of iteration, when the error rate converges to within the threshold range, the model training process ends. At this point the multiple rounds of iterative training of the n basic classifiers are complete, and the basic classifiers obtained in each round of iterative training, namely the basic classifiers of rounds 1, 2, 3, …, T, are integrated to obtain the final required text classification model H(x). The integration formula adopted is:
H(x) = argmax_{y ∈ Y} Σ_{t=1}^{T} α_t · [h_t(x) = y]
wherein y ∈ Y indicates that the prediction result belongs to the label set Y; each round's basic classifier h_t casts a vote of weight α_t for its predicted class, and the class receiving the largest total weight is output as the final prediction.
The n basic classifiers used in the text classification model may be the same classifier or different ones. To further improve classification accuracy, it is preferable to select basic classifiers whose own error rate falls within [0, 1/k] as the basic classifiers for the text classification model of this embodiment, where k is the number of categories output by the text classification model (i.e., the number of classification classes). For example, the classification performance of the SVM and TextGrocery basic classifiers, that is, their error rate ranges, satisfies the [0, 1/k] condition, so SVM and/or TextGrocery can be selected as basic classifiers. Experimental results show that the integrated classifier obtained by integrating SVM and/or TextGrocery weak classifiers classifies better and more accurately than SVM or TextGrocery alone, demonstrating that the integrated classifier outperforms any single basic classifier. The classification scheme of the invention therefore improves text classification accuracy over a single TextGrocery classifier through the Boosting integration method.
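The integration formula above amounts to weighted voting over the per-round classifiers. A minimal sketch of the final prediction step (the toy stand-in classifiers and α values are invented for illustration; real basic classifiers such as an SVM would be trained objects, not lambdas):

```python
from collections import defaultdict

def ensemble_predict(classifiers, alphas, x, label_set):
    """H(x) = argmax_{y in Y} sum_t alpha_t * [h_t(x) = y]: each round's
    basic classifier votes for its predicted class with weight alpha_t."""
    votes = defaultdict(float)
    for h_t, alpha_t in zip(classifiers, alphas):
        votes[h_t(x)] += alpha_t
    return max(label_set, key=lambda y: votes[y])

# Toy stand-ins for three trained rounds; two vote "news", one votes
# "shopping", but the dissenter carries the largest alpha.
hs = [lambda x: "news", lambda x: "news", lambda x: "shopping"]
label = ensemble_predict(hs, [0.4, 0.5, 1.2], "flight deals", ["news", "shopping"])
# "shopping" wins because 1.2 > 0.4 + 0.5.
```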
In a preferred embodiment, the constructing step S100 further includes:
s1003, perform ten-fold cross validation on the text classification model H(x) trained in step S1002, using the training data set X;
specifically, the training data set X is shuffled and split into ten folds, and ten-fold cross validation is performed on H(x) with these ten folds so as to verify the accuracy and stability of the model;
s1004, when the model H(x) passes the verification, test the verified text classification model with the test data set TE;
specifically, the test data set TE is input into the verified text classification model, and the model's precision, recall, and F1-score are calculated and compared with those of the individual basic classifiers described in step S10026 above, so as to test the accuracy of the model and ensure the accuracy of its classification.
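The fold construction behind step S1003 can be sketched as a generic k-fold index generator. This assumes a simple contiguous split after shuffling, which is one common convention rather than the patent's exact procedure:

```python
def ten_fold_indices(n_samples, n_folds=10):
    """Yield (train_idx, val_idx) index pairs for k-fold cross validation;
    the last fold absorbs any remainder when n_samples % n_folds != 0."""
    idx = list(range(n_samples))
    size = n_samples // n_folds
    for f in range(n_folds):
        start = f * size
        stop = start + size if f < n_folds - 1 else n_samples
        val = idx[start:stop]
        held_out = set(val)
        yield [i for i in idx if i not in held_out], val

# For each fold: retrain H(x) on the train indices and score it on the
# validation indices; a stable mean accuracy across all ten folds
# indicates the model generalizes rather than overfitting one split.
folds = list(ten_fold_indices(100))
```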
The webpage classification scheme can be applied to an advertisement bidding system for site classification of advertisement pages, can also be applied to other systems for site classification of webpages in other fields (such as games, shopping and the like), and has wide application range and high compatibility.
As shown in fig. 3, an embodiment of the present invention further provides a web page classification system, which includes:
an obtaining module 201, configured to obtain webpage link information;
the processing module 202 is configured to input the obtained webpage link information into a text classification model for classification processing, and output a site classification result corresponding to the webpage link information;
the text classification model is trained based on a Boosting integration method.
In a preferred embodiment, the system further comprises a construction module for constructing the text classification model.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
As shown in fig. 4, an embodiment of the present invention further provides a terminal, including:
at least one processor 301;
at least one memory 302 for storing at least one program;
when executed by the at least one processor 301, the at least one program causes the at least one processor 301 to implement the steps of the webpage classification method described in the above method embodiments.
The contents in the foregoing method embodiments are all applicable to this terminal embodiment, the functions specifically implemented by this terminal embodiment are the same as those in the foregoing method embodiments, and the beneficial effects achieved by this terminal embodiment are also the same as those achieved by the foregoing method embodiments.
Embodiments of the present invention further provide a storage medium having stored therein processor-executable instructions, which when executed by a processor, are configured to perform the steps of a method for classifying web pages as described in the above method embodiments.
The contents in the above method embodiments are all applicable to the present storage medium embodiment, the functions specifically implemented by the present storage medium embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present storage medium embodiment are also the same as those achieved by the above method embodiments.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A webpage classification method is characterized by comprising the following steps:
acquiring webpage link information;
inputting the obtained webpage link information into a text classification model for classification processing, and outputting a site classification result corresponding to the webpage link information;
the text classification model is trained based on a Boosting integration method.
2. The method as claimed in claim 1, wherein the web page link information comprises a list of keywords corresponding to the title of the web page.
3. The method for classifying web pages according to claim 1 or 2, wherein the text classification model is a text classification model constructed by the following construction steps:
acquiring a training data set;
and inputting the training data set into the text classification model by using a Boosting integration method for training so as to obtain a trained text classification model.
4. The method of claim 3, wherein the constructing step further comprises:
performing ten-fold cross validation on the trained text classification model by using a training data set;
and after the model passes the verification, testing the verified text classification model by using the test data set.
5. The method for classifying web pages according to claim 3, wherein the step of inputting the training data set to the text classification model by using the Boosting integration method for training to obtain the trained text classification model specifically comprises:
s1, training the current basic classifier by using the current sub-sample set, and calculating the error rate of the basic classifier obtained after training;
s2, when the calculated error rate is converged to the threshold value range, ending the training and executing the step S6;
s3, when the calculated error rate is not converged in the threshold value range, executing the step S4;
s4, updating the weights of the samples contained in the current sub-sample set according to the calculated error rate, so as to increase the weight of the sample with the classification result as error;
s5, distributing the updated weight to the sample in the training data set, obtaining the next sub-sample set, inputting the next sub-sample set to the next basic classifier, and returning to execute the step S1;
s6, integrating a plurality of basic classifiers to obtain a trained text classification model;
the basic classifier adopted in the text classification model is a basic classifier with the self error rate range of [0,1/k ], and k is the number of classes output by the text classification model.
6. The method for classifying web pages according to claim 5, wherein the error rate is calculated as follows:
ε(t) = Σ_{p=1}^{N} D_t(x_p), wherein ε(t) is the error rate corresponding to the t-th round basic classifier after training; D_t(x_p) is the weight, in the t-th iteration, of the p-th sample x_p in the sub-sample set whose classification result is wrong; p ∈ [1, 2, …, N], N being the number of samples in the sub-sample set whose classification result is wrong; t ∈ [1, 2, …, T], T being the number of iteration rounds; h_t is the t-th round basic classifier; h_t(x_p) is the classification result prediction value; y_p is the classification result true value; [h_t(x_p) ≠ y_p] = -1.
7. The method according to claim 5, wherein the weights of the samples included in the current sub-sample set are updated, and the weight update formula is as follows:
D_{t+1}(x_i) = (D_t(x_i) / Z_t) · exp(-α_t · [h_t(x_i) = y_i]), with α_t = ln((1 - ε(t)) / ε(t)) + ln(k - 1) and the bracket [h_t(x_i) = y_i] equal to 1 for a correct classification result and -1 for a wrong one; wherein Z_t is a normalization factor that makes the updated weights sum to 1; D_t(x_i) is the weight of the i-th sample x_i in the sub-sample set in the t-th iteration; i ∈ [1, 2, …, N_1], N_1 being the total number of samples in the sub-sample set; h_t(x_i) is the classification result prediction value; y_i is the classification result true value.
8. A system for classifying web pages, comprising:
the acquisition module is used for acquiring webpage link information;
the processing module is used for inputting the acquired webpage link information into a text classification model for classification processing and then outputting a site classification result corresponding to the webpage link information;
the text classification model is trained based on a Boosting integration method.
9. A terminal, comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method for classifying web pages as claimed in any one of claims 1 to 7.
10. A storage medium having stored therein processor-executable instructions which, when executed by a processor, cause the processor to perform the method for classifying web pages as claimed in any one of claims 1 to 7.
CN201810465784.3A 2018-05-16 2018-05-16 Webpage classification method, system, terminal and storage medium Active CN108920492B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810465784.3A CN108920492B (en) 2018-05-16 2018-05-16 Webpage classification method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN108920492A true CN108920492A (en) 2018-11-30
CN108920492B CN108920492B (en) 2021-04-09

Family

ID=64402649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810465784.3A Active CN108920492B (en) 2018-05-16 2018-05-16 Webpage classification method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN108920492B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447161A * 2015-11-26 2016-03-30 Guangdong University of Technology Intelligent information classification method based on data features
CN107360200A * 2017-09-20 2017-11-17 Guangdong University of Technology Phishing detection method based on classification confidence and website features
CN107560850A * 2017-08-26 2018-01-09 Central South University Shafting fault recognition method based on threshold denoising and AdaBoost
CN107909396A * 2017-11-11 2018-04-13 Horgos Puli Network Technology Co., Ltd. Anti-cheating monitoring method for Internet advertisement delivery

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353803A * 2018-12-24 2020-06-30 Beijing Qihoo Technology Co., Ltd. Advertiser classification method and device and computing equipment
CN111353803B * 2018-12-24 2024-04-05 360 Technology Group Co., Ltd. Advertiser classification method and device and computing equipment
CN110825998A * 2019-08-09 2020-02-21 National Computer Network and Information Security Management Center Website identification method and readable storage medium

Also Published As

Publication number Publication date
CN108920492B (en) 2021-04-09

Similar Documents

Publication Publication Date Title
US11669744B2 (en) Regularized neural network architecture search
CN108073568B (en) Keyword extraction method and device
US9449271B2 (en) Classifying resources using a deep network
CN110909164A (en) Text enhancement semantic classification method and system based on convolutional neural network
CN107346433A Text data classification method and server
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN113177700B (en) Risk assessment method, system, electronic equipment and storage medium
CN112785005A (en) Multi-target task assistant decision-making method and device, computer equipment and medium
CN108920492B (en) Webpage classification method, system, terminal and storage medium
CN112215629B (en) Multi-target advertisement generating system and method based on construction countermeasure sample
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
CN111709225A (en) Event cause and effect relationship judging method and device and computer readable storage medium
WO2023050143A1 (en) Recommendation model training method and apparatus
CN117593077A (en) Prop combination recommendation method and device based on hypergraph neural network and computing equipment
CN110262906B (en) Interface label recommendation method and device, storage medium and electronic equipment
CN112989182A (en) Information processing method, information processing apparatus, information processing device, and storage medium
Saha et al. A large scale study of SVM based methods for abstract screening in systematic reviews
CN116089886A (en) Information processing method, device, equipment and storage medium
CN117009621A (en) Information searching method, device, electronic equipment, storage medium and program product
CN114707068A (en) Method, device, equipment and medium for recommending intelligence base knowledge
US20230063686A1 (en) Fine-grained stochastic neural architecture search
CN115700550A (en) Label classification model training and object screening method, device and storage medium
CN112507189A (en) Financial user portrait information extraction method and system based on BilSTM-CRF model
CN114637921B (en) Item recommendation method, device and equipment based on modeling accidental uncertainty
CN116628236B (en) Method and device for delivering multimedia information, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510665 Room 401, No.3, East Tangdong Road, Tianhe District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU SUNTENG INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 510665 b-420, ocean Creative Park, No.5, Tangdong East Road, Tianhe District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU SUNTENG INFORMATION TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information

Address after: 510665 Room 401, No.3, East Tangdong Road, Tianhe District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU SUNTENG INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 510665 Room 401, 3 Tangdong East Road, Tianhe District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU SUNTENG INFORMATION TECHNOLOGY Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant