CN110147472B - Detection method and device for cheating sites and detection device for cheating sites - Google Patents

Info

Publication number
CN110147472B
Authority
CN
China
Prior art keywords
page
cheating
site
detected
sites
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710576240.XA
Other languages
Chinese (zh)
Other versions
CN110147472A (en)
Inventor
李健
李毅
许静芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sogou Technology Development Co Ltd
Original Assignee
Beijing Sogou Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sogou Technology Development Co Ltd
Priority to CN201710576240.XA
Publication of CN110147472A
Application granted
Publication of CN110147472B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9566 URL specific, e.g. using aliases, detecting broken or misspelled links

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application provides a method and a device for detecting cheating sites. The method comprises: extracting page features of pages under known cheating sites from retrieval logs and/or access logs of the known cheating sites; constructing a cheating detection model according to the cheating rules represented by the page features, wherein the cheating detection model is used for detecting whether a site is cheating; and detecting, according to the cheating detection model, whether a site to be detected is cheating. The method and the device can improve the accuracy of site cheating detection results.

Description

Detection method and device for cheating sites and detection device for cheating sites
Technical Field
The present application relates to the field of site detection technologies, and in particular, to a method and an apparatus for detecting a cheating site, and a computer-readable medium.
Background
At present, as users use the Internet more and more frequently, site cheating is becoming increasingly common. Site cheating refers to the practice whereby some sites cause web pages that do not belong in a user's query results to appear in those results. Site cheating is mainly classified into content-based cheating, link cheating, crawler cheating, and the like.
In the prior art, each webpage under a site is generally analyzed, and whether a site cheating condition exists is judged according to an analysis result.
Disclosure of Invention
During research, the inventors found that when analyzing web pages the prior art relies on the cheating means used by already-identified cheating sites; if the pages under a site use a cheating means that has not yet been analyzed, the prior art cannot accurately judge whether the site is cheating. In addition, the web pages to be analyzed are generally chosen by random sampling, so a large number of unrepresentative web pages may be used as analysis objects, and the accuracy and recall of the cheating web page models trained in the prior art are therefore insufficient.
The inventors also found that, for known cheating sites, the historical search records of a search engine contain retrieval logs of searches that returned pages under the site and access logs of visits to pages under the site. If these logs are used, that is, if information such as the retrieved result pages under the known cheating sites, the access frequency of the accessed pages, and the corresponding search terms is used to construct a cheating detection model, the model can reflect the cheating rules of the cheating sites and can therefore perform accurate cheating detection on other sites. Moreover, because the cheating detection model is built from users' retrieval logs and access logs in the search engine, that is, from the user's perspective, the pages it is built on are more uniform and representative.
Based on this, the present application provides a method for detecting a cheating site, which may include:
extracting page features of pages under the known cheating sites from retrieval logs and/or access logs of the known cheating sites;
constructing a cheating detection model according to the cheating rule represented by the page characteristics, wherein the cheating detection model is used for detecting whether the website is cheated;
and detecting whether the site to be detected is cheated according to the cheating detection model.
The extracting of the page features of the page under the known cheating site from the retrieval log and/or the access log of the known cheating site may include:
obtaining a retrieval log and/or an access log of the known cheating site, wherein the retrieval log comprises: the search term and the search result page corresponding to the search term, the access log comprises: the access page of the user and the access times of each access page;
and extracting text features and/or structural features of the retrieval result page and/or the access page as the page features.
Wherein, the extracting text features and/or structural features of the retrieval result page and/or the access page as the page features may include:
extracting text information and/or title text information of each page from the retrieval result page and/or the access page as the text features; and
extracting the text structural features and the title structural features of each page from the retrieval result page and/or the access page as the structural features.
Wherein, the constructing a cheating detection model according to the cheating rules represented by the page features may include:
respectively converting the page characteristics of the retrieval result page and/or the access page into retrieval characteristic vectors and/or access characteristic vectors;
and constructing a cheating detection model according to the retrieval feature vector and/or the access feature vector.
Wherein, the detecting whether the site to be detected is cheating according to the cheating detection model may include:
acquiring a page to be detected of the site to be detected;
extracting the page features to be detected of the page to be detected, and converting the page features to be detected into a feature vector to be detected of the site to be detected;
and detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rules.
Wherein the known cheating site may be determined by:
acquiring a site set to be determined whether to cheat;
clustering all the sites in the site set to obtain the clustered classes of sites; and
determining the sites whose manual labeling result is cheating among the classes of sites as the known cheating sites, wherein the manual labeling results indicate whether the sites are cheating sites.
Wherein the method may further comprise:
performing weight reduction (demotion) or deletion processing on any site to be detected whose detection result is cheating.
The application also provides a device for ensuring that the above method can be realized and applied in practice.
The device for detecting a cheating site provided by the embodiment of the present application includes:
the extraction unit is used for extracting the page characteristics of the page under the known cheating site from the retrieval log and/or the access log of the known cheating site;
the model construction unit is used for constructing a cheating detection model according to the cheating rule represented by the page characteristics, and the cheating detection model is used for detecting whether the website is cheated or not;
and the detection unit is used for detecting whether the site to be detected is cheated according to the cheating detection model.
Wherein the extracting unit may include:
an obtaining subunit, configured to obtain a retrieval log and/or an access log of the known cheating site, where the retrieval log includes the search term and the search result page corresponding to the search term, and the access log includes the pages accessed by the user and the number of accesses of each accessed page;
and the extraction subunit is used for extracting the text features and/or the structural features of the retrieval result page and/or the access page as the page features.
Wherein the extraction subunit may include:
the information extraction subunit is used for extracting the text information and/or the title text information of each page from the retrieval result page and/or the access page as the text features;
and the structure extraction subunit is used for extracting the text structure characteristics and the title structure characteristics of each page from the retrieval result page and/or the access page as the structure characteristics.
Wherein the model construction unit may include:
the conversion module is used for converting the page features of the retrieval result page and/or the access page into a retrieval feature vector and/or an access feature vector respectively;
and the construction subunit is used for constructing the cheating detection model according to the retrieval feature vector and/or the access feature vector.
Wherein the detection unit may include:
the acquisition subunit is used for acquiring a page to be detected of the site to be detected;
the extraction subunit is used for extracting the page features to be detected of the page to be detected and converting the page features to be detected into a feature vector to be detected of the site to be detected;
and the detection subunit is used for detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rules.
Wherein the known cheating site may be determined by:
acquiring a site set to be determined whether to cheat;
clustering all the sites in the site set to obtain various clustered sites;
and determining the sites with cheating manual marking results in the various sites as the known cheating sites, wherein the manual marking results are used for indicating whether the various sites are cheating sites.
Wherein the apparatus may further comprise:
and the cheating processing unit is used for performing weight reduction (demotion) or deletion processing on any site to be detected whose detection result is cheating.
The present application also provides a detection apparatus for a cheating site, comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs including instructions for:
extracting page features of pages under the known cheating sites from retrieval logs and/or access logs of the known cheating sites;
constructing a cheating detection model according to the cheating rule represented by the page characteristics, wherein the cheating detection model is used for detecting whether the website is cheated;
and detecting whether the site to be detected is cheated according to the cheating detection model.
The present application also provides a computer-readable medium having stored thereon instructions, which, when executed by one or more processors, cause an apparatus to perform a method of detecting a cheating site according to one or more of the preceding claims.
The extracting of the page features of the page under the known cheating site from the retrieval log and/or the access log of the known cheating site may specifically include:
obtaining a retrieval log and/or an access log of the known cheating site, wherein the retrieval log includes the search term and the search result page corresponding to the search term, and the access log includes the pages accessed by the user and the number of accesses of each accessed page;
and extracting text features and/or structural features of the retrieval result page and/or the access page as the page features.
The extracting text features and/or structural features of the retrieval result page and/or the access page as the page features may specifically include:
extracting text information and/or title text information of each page from the retrieval result page and/or the access page as the text features; and
extracting the text structural features and the title structural features of each page from the retrieval result page and/or the access page as the structural features.
Wherein, the constructing a cheating detection model according to the cheating rules represented by the page features may specifically include:
respectively converting the page characteristics of the retrieval result page and/or the access page into retrieval characteristic vectors and/or access characteristic vectors;
and constructing a cheating detection model according to the retrieval feature vector and/or the access feature vector.
Wherein, the detecting whether the site to be detected is cheating according to the cheating detection model may specifically include:
acquiring a page to be detected of the site to be detected;
extracting the page features to be detected of the page to be detected, and converting the page features to be detected into a feature vector to be detected of the site to be detected;
and detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rules.
Wherein the known cheating site may be determined by:
acquiring a site set to be determined whether to cheat;
clustering all the sites in the site set to obtain the clustered classes of sites; and
determining the sites whose manual labeling result is cheating among the classes of sites as the known cheating sites, wherein the manual labeling results indicate whether the sites are cheating sites.
Wherein the one or more programs, configured to be executed by the one or more processors, may further include instructions for:
performing weight reduction (demotion) or deletion processing on any site to be detected whose detection result is cheating.
In the embodiments of the present application, for known cheating sites, the page features of all or some of the pages under the known cheating sites are extracted from the retrieval logs and/or access logs stored by the search engine, and a cheating detection model is constructed according to the cheating rules represented by the extracted page features. When a site to be detected is checked with this model, the model reflects how cheating sites cheat on their pages, so more accurate cheating detection can be performed on other sites. Moreover, because the cheating detection model is built from users' retrieval logs and access logs in the search engine, that is, from the user's perspective, the pages it is built on are more uniform and representative.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is an exemplary flow chart of an embodiment of a method of detection of a cheating site of the present application;
FIG. 2 is a block diagram of an exemplary configuration of an embodiment of a detection apparatus for a cheating site according to the present application;
FIG. 3 is a block diagram of a detection apparatus 800 for a cheating site shown herein according to an exemplary embodiment;
FIG. 4 is a schematic structural diagram of a server in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application is operational with numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multi-processor apparatus, distributed computing environments that include any of the above devices or equipment, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Referring to FIG. 1, a flowchart of an embodiment of the method for detecting a cheating site of the present application is shown. In this embodiment, the detection process for a cheating site may include a construction process for the cheating detection model and a site detection process: the construction process includes steps 101 to 102, and the site detection process includes step 103. The overall process of this embodiment includes steps 101 to 104 as follows:
Step 101: Extract page features of pages under known cheating sites from retrieval logs and/or access logs of the known cheating sites.
In this embodiment, when a cheating site is known, the search log, the access log, and the like of the cheating site stored in the database can be used to extract the page features of each page or a part of pages of the cheating site, so that the page features of the cheating site can be subsequently used to construct a cheating detection model.
Specifically, assume that a search engine's database stores all retrieval logs and access logs for site "www.ABCD.com".
The retrieval log records the retrievals in which users retrieved pages under the site; the retrieval log may include the search terms entered by the user in each search and the corresponding search result pages.
The access log is used for representing information of clicking an access page after each retrieval by the user, or the information of the access page by the user in a page recommendation mode and the like; the access log may include: the access page information of the user and the corresponding access times of each access page, for example, the access times of the user to the page "www.ABCD.com/890/hty" at the site "www.ABCD.com" is 10, and the access times to the page "www.ABCD.com/855555/ef" at the site "www.ABCD.com" is 100, and so on.
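By way of illustration only, the two logs described above might be represented as follows. This is a minimal sketch: the record types `RetrievalRecord` and `AccessRecord`, the helper `pages_for_site`, and the search term are hypothetical and do not appear in the application; the page URLs and access counts are those of the example above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalRecord:
    """One retrieval-log entry: a search term and a result page it returned."""
    search_term: str
    result_page: str


@dataclass(frozen=True)
class AccessRecord:
    """One access-log entry: a page and how many times users accessed it."""
    page: str
    access_count: int


def pages_for_site(site, retrieval_log, access_log):
    """Collect the result pages and per-page access counts belonging to `site`."""
    prefix = site + "/"
    result_pages = sorted({r.result_page for r in retrieval_log
                           if r.result_page.startswith(prefix)})
    access_counts = Counter()
    for record in access_log:
        if record.page.startswith(prefix):
            access_counts[record.page] += record.access_count
    return result_pages, dict(access_counts)


# The search term is made up; the pages and counts follow the example above.
retrieval_log = [RetrievalRecord("hypothetical query", "www.ABCD.com/890/hty")]
access_log = [AccessRecord("www.ABCD.com/890/hty", 10),
              AccessRecord("www.ABCD.com/855555/ef", 100)]
result_pages, access_counts = pages_for_site("www.ABCD.com", retrieval_log, access_log)
```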
After the retrieval logs and the access logs of the known cheating sites are obtained, the retrieval result pages and the access pages are obtained from the retrieval logs and the access logs, and the page features of the retrieval result pages and the page features of the access pages are extracted. The page features may include text features and structural features in the page, the text features are used for characterizing text information in the page, and the structural features are used for characterizing distribution characteristics of various parts of the page on the structure. For example, the text features may include: whether the text information in the page includes illegal words, whether a large number of repeated words exist, whether the sentence is not smooth enough, whether the context is irrelevant, and the like. The structural features may indicate the distribution of each part in the page, for example, where the title in the page is located in the page, whether each part of the page is spliced, whether the length of the page exceeds a preset length threshold, whether the distribution between a main frame for displaying the text and other functional frames is reasonable, or whether the advertisement distribution is reasonable or covers the text or the title of the page.
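As a rough sketch of the kind of text and structural features listed above, the following function computes a few of them for a single page. The word list, the thresholds, and the function name `extract_page_features` are assumptions made for illustration; the application does not specify how any of these features is computed.

```python
from collections import Counter

# Hypothetical word list and thresholds; the application does not specify any.
ILLEGAL_WORDS = {"casino", "jackpot"}
MAX_PAGE_LENGTH = 5000      # assumed length threshold, in characters
REPEAT_RATIO_LIMIT = 0.5    # assumed limit on the share of the most frequent word


def extract_page_features(title, body):
    """Return a small dictionary of text and structural features for one page."""
    words = body.lower().split()
    counts = Counter(words)
    top_share = max(counts.values()) / len(words) if words else 0.0
    return {
        # Text features
        "has_illegal_word": any(w in ILLEGAL_WORDS for w in words),
        "heavy_repetition": top_share > REPEAT_RATIO_LIMIT,
        # Structural features
        "page_length": len(body),
        "too_long": len(body) > MAX_PAGE_LENGTH,
        "title_missing": not title.strip(),
    }


features = extract_page_features("", "win win win win jackpot")
```

Features such as sentence fluency, context relevance, or advertisement layout would need considerably more machinery and are left out of this sketch.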
Of course, in practical applications, only the text feature may be used as the page feature, only the structure feature may be used as the page feature, and both the text feature and the structure feature may be used as the page feature. In addition, in this embodiment, the page features may be extracted only for the search result page in the search log, or only for the access page in the access log, or both the search result page and the access page, and those skilled in the art may autonomously set the page features according to requirements in an actual scene. For the retrieval log or the access log, since there may be many retrieval result pages or access pages, in practical applications, a part of the retrieval result pages or a part of the access pages may also be randomly selected to perform the extraction of the page features, which is not limited in the embodiment of the present application.
In practical applications, the known cheating sites may be determined through the following exemplary steps A1 to A3:
Step A1: Acquire a set of sites for which it is to be determined whether they are cheating.
In this embodiment, to detect whether sites are cheating, the sites in each such site set may first be clustered, and the clustered sites are then manually labeled, that is, each site is labeled as a cheating site or a normal site, yielding a result for whether each site is cheating.
Step A2: Cluster all the sites in the site set to obtain the clustered classes of sites.
For each site acquired in step A1, the features of each web page of the site are extracted according to the web pages included in the site and the search terms corresponding to those pages, and are converted into corresponding feature vectors. For the process of extracting web page features and converting them into feature vectors, refer to the detailed description of step 102 below.
After the feature vectors of the web pages under each site are obtained, the feature vectors may be clustered. Clustering is the process of dividing a set of physical or abstract objects into multiple classes composed of similar objects. For example, in this step a k-means clustering algorithm may be used; in practical applications, the feature vectors of web pages under cheating sites generally cluster into one class, while the feature vectors of web pages under normal sites cluster into another.
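The clustering in step A2 can be sketched with a plain k-means loop. This is an illustrative sketch, not the application's implementation: the two-dimensional vectors, the choice of initial centroids, and the function names are all assumptions, and a production system would more likely use a library implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def k_means(vectors, initial_centroids, max_iters=100):
    """Cluster `vectors` around the given starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: squared_distance(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            new_centroids.append(mean(members) if members else old)
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Made-up page feature vectors: two near (1, 1) and two near (0, 0).
vectors = [(0.9, 0.8), (1.0, 0.9), (0.1, 0.2), (0.0, 0.1)]
labels, centroids = k_means(vectors, initial_centroids=[vectors[0], vectors[2]])
```

With k = 2 the result is two classes of vectors; which class corresponds to cheating sites is not known from the clustering alone and is settled by the manual labeling in step A3.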
Step A3: Determine the sites whose manual labeling result is cheating among the classes of sites as the known cheating sites, wherein the manual labeling results indicate whether the sites are cheating sites.
After the two classes of clustered sites are obtained, they can be manually labeled, that is, it is manually marked which class consists of cheating sites and which of normal sites; whether each site is a cheating site or a normal site is then determined from the manual labeling result, and the sites labeled as cheating sites are used as the known cheating sites in step 101.
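Step A3 amounts to a lookup from each site's cluster to the manual label assigned to that cluster. A minimal sketch, assuming hypothetical site names, a hypothetical `site_to_cluster` mapping produced by step A2, and a hypothetical `cluster_annotation` mapping supplied by a human annotator:

```python
def known_cheating_sites(site_to_cluster, cluster_annotation):
    """Return the sites whose cluster was manually labeled as cheating."""
    return sorted(site for site, cluster in site_to_cluster.items()
                  if cluster_annotation.get(cluster) == "cheating")


# Hypothetical clustering output and manual labels.
site_to_cluster = {"site-a.example": 0, "site-b.example": 1, "site-c.example": 0}
cluster_annotation = {0: "cheating", 1: "normal"}
known = known_cheating_sites(site_to_cluster, cluster_annotation)
```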
In addition to the above manner of determining known cheating sites by clustering unknown sites, in the embodiment of the present application, the cheating sites detected in the subsequent step 103 may also be used as known cheating sites, so that the cheating detection model established in the step 102 is further updated by performing page feature extraction on new known cheating sites; of course, the cheating sites detected in step 103 may also be subjected to manual annotation verification, and if the cheating sites detected in step 103 are also cheating sites as a result of the manual annotation, the cheating sites are further updated as known cheating sites to the established cheating detection model.
After the page features of the known cheating sites are extracted in step 101, the process proceeds to step 102:
Step 102: Construct a cheating detection model according to the cheating rules represented by the page features, wherein the cheating detection model is used for detecting whether a site is cheating.
According to the cheating rules of the known cheating sites represented by the page features obtained in the step 101, a supervised machine learning method can be adopted to train a classifier as a cheating detection model, and the cheating detection model is used for detecting whether other sites are cheating sites or not.
Specifically, step 102 may include the following steps B1 to B2:
Step B1: Convert the page features of the retrieval result pages and/or the access pages into retrieval feature vectors and/or access feature vectors respectively.
In this step, the page features of the retrieval result pages and the page features of the access pages may be converted into a retrieval feature vector and an access feature vector of the known cheating site, respectively. In practical applications, the model may be built using only the retrieval feature vectors, only the access feature vectors, or both.
Specifically, when converting to feature vectors, the feature value of each page feature in each web page of the site may be obtained first, and then, for each page feature, the range spanned by its feature values over all pages under the known cheating site is obtained by statistics. For example, for the structural feature of page length, the page length of web page A is taken as the feature value of page length for web page A, the page length of web page B as the feature value for web page B, and so on, until the feature values of page length for all web pages under the site have been obtained.
Then, statistics are performed on these feature values to determine the range of the page-length feature values over all web pages under the known cheating site. Suppose the statistics show a maximum of 1024 pixels and a minimum of 268 pixels; the feature value range of page length for the web pages of the known cheating site is then 268 pixels to 1024 pixels. This range is converted into binary values that a computer can process, yielding the corresponding vector values, for example a vector value range of 000100-111111.
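The range statistics described above reduce to taking a minimum and a maximum over the per-page values. A minimal sketch, in which the function name and the intermediate page length of 640 are made up for illustration:

```python
def feature_value_range(values):
    """Return the (minimum, maximum) of one feature over all pages of a site."""
    values = list(values)
    return min(values), max(values)


# Page lengths in pixels; 268 and 1024 follow the example, 640 is made up.
page_lengths = {"A": 268, "B": 640, "C": 1024}
low, high = feature_value_range(page_lengths.values())

# Plain base-2 strings of the two bounds, via Python's built-in format().
low_bits, high_bits = format(low, "b"), format(high, "b")
```

The base-2 strings here are ordinary binary representations of the bounds. How the application maps a range to a pattern such as 000100-111111 is not specified, so this sketch does not attempt to reproduce that encoding.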
Suppose 100 page features in total are found across all web pages of the known cheating site. For a numerical page feature, for example page length with values such as 268 or 1024, the sum of that feature's values over the web pages of the known cheating site may be used as the site's feature value for that page feature; this feature value is then converted into a binary value, giving the n-th vector value of the 1×N-dimensional feature vector, where N is an integer greater than zero. For example, if the first vector value of the feature vector of the known cheating site corresponds to page length, and the page lengths of all web pages sum to 8534, then the first vector value is the binary value converted from "8534".
For a non-numerical page feature, such as image definition, whose feature values are "clear", "normal", and "unclear", those skilled in the art may use the numeric codes "2", "1", and "0" to represent the three feature values "clear", "normal", and "unclear" respectively. For example, suppose the vector values starting from the second one represent image definition, and five web pages A to E exist under the known cheating site; the second to sixth vector values of the site's feature vector then represent the image definition of the five web pages respectively. For example, if the 2nd to 6th vector values are {0, 2, 1, 1, 2} and the preset page order is alphabetical from A to E, then the image of web page A is unclear, that of web page B is clear, those of web pages C and D are normal, and that of web page E is clear.
By analogy, according to whether the feature value of each page feature of each web page under the known cheating site is numerical, and according to the number of web pages under the known cheating site, a 1×N-dimensional feature vector corresponding to the known cheating site is finally obtained. Of course, the above manner is merely illustrative and should not be construed as limiting the embodiments of the present application.
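The conversion described above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the application: the page records, the field names `length` and `definition`, and the function names are assumptions made for the example, and the individual page lengths are invented so that they sum to 8534.

```python
# Hypothetical coding of the non-numerical "image definition" feature.
DEFINITION_CODES = {"clear": 2, "normal": 1, "unclear": 0}


def site_feature_vector(pages):
    """Build a 1xN site vector from pages given in the preset page order.

    Numerical feature (page length): summed over all pages of the site.
    Non-numerical feature (image definition): one coded value per page.
    """
    total_length = sum(page["length"] for page in pages)
    definitions = [DEFINITION_CODES[page["definition"]] for page in pages]
    return [total_length] + definitions


def to_binary(value):
    """Return the binary representation of a non-negative integer as text."""
    return format(value, "b")


# Pages A to E in alphabetical order (illustrative data only).
pages = [
    {"name": "A", "length": 268, "definition": "unclear"},
    {"name": "B", "length": 1024, "definition": "clear"},
    {"name": "C", "length": 2048, "definition": "normal"},
    {"name": "D", "length": 3000, "definition": "normal"},
    {"name": "E", "length": 2194, "definition": "clear"},
]

vector = site_feature_vector(pages)
print(vector)                # [8534, 0, 2, 1, 1, 2]
print(to_binary(vector[0]))  # 10000101010110
```

Here the first vector value carries the accumulated page length and the following five values carry the image definition of pages A to E; the binary form of each value is obtained on demand with `to_binary`.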
In practical applications, the number of page features in the feature vector corresponding to each web page directly affects the accuracy and speed of model training. The feature vector generated in the above manner can contain only the important page features, so that its lower dimensionality can effectively improve the efficiency of subsequent training and retrieval.
Step B2: constructing a cheating detection model according to the retrieval feature vector and/or the access feature vector.
After the page features of each web page under the known cheating site are converted into feature vectors, two groups of training data can be obtained: one is a retrieval feature vector set formed by the retrieval feature vectors of the known cheating site, and the other is an access feature vector set formed by the access feature vectors of the known cheating site. When constructing the cheating detection model, two cheating detection models can be constructed from the retrieval feature vector set and the access feature vector set, respectively; alternatively, a single cheating detection model can be constructed from both sets.
Taking the construction of a single cheating detection model as an example, because an access page is a page that a user has clicked and viewed, the weight of each access feature vector in the access feature vector set can be set larger. Each access feature vector and each retrieval feature vector is used as an input object, the detection result is used as the expected output value (also called the supervision signal), and a supervised machine learning method such as the k-nearest neighbor (k-NN) method or the Support Vector Machine (SVM) method is used for training to obtain the cheating detection model. The expected output value may be "the detection result is cheating" or "the probability that the detection result is cheating is 100%".
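As a rough sketch of the single-model case, the following pure-Python k-nearest-neighbor classifier lets each training vector carry a weight, with access feature vectors weighted more heavily than retrieval feature vectors. The concrete weights (2.0 and 1.0), the two-dimensional toy vectors, the label names, and the presence of non-cheating training examples are all assumptions made for illustration; the application itself does not fix these details.

```python
import math
from collections import defaultdict

ACCESS_WEIGHT = 2.0     # assumed: access pages were actually clicked by users
RETRIEVAL_WEIGHT = 1.0  # assumed baseline weight for retrieval vectors


def knn_predict(training, query, k=3):
    """Weighted k-NN vote.

    training: list of (feature_vector, label, weight) tuples.
    Returns the label with the largest total weight among the k nearest
    training vectors (Euclidean distance).
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = defaultdict(float)
    for _vector, label, weight in nearest:
        votes[label] += weight
    return max(votes, key=votes.get)


# Toy training data; the "normal" examples are an assumption of this sketch.
training = [
    ([0.9, 0.8], "cheat", ACCESS_WEIGHT),
    ([0.8, 0.9], "cheat", RETRIEVAL_WEIGHT),
    ([0.1, 0.2], "normal", ACCESS_WEIGHT),
    ([0.2, 0.1], "normal", RETRIEVAL_WEIGHT),
]

print(knn_predict(training, [0.85, 0.85]))  # cheat
print(knn_predict(training, [0.15, 0.15]))  # normal
```

An SVM or another supervised method could take the place of the k-NN vote; the weighting idea stays the same, with access feature vectors contributing more to the decision.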
The above steps 101 to 102 are the process of constructing the cheating detection model in the embodiment of the present application. After the cheating detection model has been constructed, the following step 103 is executed when it is necessary to detect whether another site is cheating.
Step 103: detecting whether the site to be detected is cheating according to the cheating detection model.
After the cheating detection model has been trained, it can be used to detect whether other sites to be detected are cheating.
Specifically, the process of detecting whether the site to be detected is cheating may include steps C1 to C3:
Step C1: acquiring a page to be detected of the site to be detected.
First, every page under the site to be detected is acquired, or a part of the pages under the site to be detected is randomly acquired, as the pages to be detected. In practical applications, all pages under the site to be detected can be acquired for detection; if the number of pages is too large, a part of the pages can be extracted, for example 60% of the pages, as the pages to be detected. When only a part of the pages is extracted, the percentage of pages to extract can be set by a person skilled in the art.
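The random extraction of a part of the pages in step C1 might look like the following sketch. The function name `sample_pages`, the `seed` parameter, and the example URLs are hypothetical; only the 60% figure comes from the description above.

```python
import random


def sample_pages(pages, fraction=0.6, seed=None):
    """Randomly pick a fraction of a site's pages as the pages to be detected."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not pages:
        return []
    # Always keep at least one page so that detection has some input.
    count = max(1, round(len(pages) * fraction))
    return random.Random(seed).sample(pages, count)


all_pages = [f"https://example.com/page{i}" for i in range(10)]
pages_to_detect = sample_pages(all_pages, fraction=0.6, seed=0)
print(len(pages_to_detect))  # 6
```

Setting `fraction=1.0` corresponds to the case where all pages under the site are used for detection.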
Step C2: extracting the page features to be detected of the page to be detected, and converting the page features to be detected into the feature vector to be detected of the site to be detected.
Then, the page features of the page to be detected, which may include text features, structural features, and the like, are extracted, and the extracted page features to be detected are converted into the feature vector to be detected of the site to be detected. For the extraction of the page features, refer to the description of step 101; for the conversion into the feature vector, refer to the description of step B1. Details are not repeated here.
Step C3: detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rule.
The feature vector to be detected obtained in step C2 is used as the input of the cheating detection model constructed in step 102, so as to obtain an output result indicating whether the site to be detected is a cheating site. In practical applications, depending on the supervised learning method adopted, the output of the cheating detection model may directly be a result of whether the site to be detected is a cheating site, or may be a predicted probability that the site to be detected is a cheating site, for example a probability of 80%. In the latter case, a person skilled in the art may preset a probability judgment threshold, for example 70%; if the probability value output by the cheating detection model is greater than the probability judgment threshold, the site to be detected is determined to be a cheating site.
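The two kinds of model output described for step C3 can be handled by a small decision helper such as the one below. This is only a sketch: the function name `is_cheating_site` and the convention that a direct result is passed as a `bool` are assumptions, while the 70% threshold and the 80% example probability come from the description.

```python
def is_cheating_site(model_output, threshold=0.7):
    """Interpret a model output as a cheating / non-cheating decision.

    A bool is treated as a direct classification result; a number is
    treated as a predicted cheating probability and must be strictly
    greater than the threshold for the site to count as cheating.
    """
    if isinstance(model_output, bool):
        return model_output
    return model_output > threshold


print(is_cheating_site(0.8))   # True: 80% exceeds the 70% threshold
print(is_cheating_site(0.65))  # False
print(is_cheating_site(True))  # True: direct classification result
```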
After it has been detected in step 103 whether other sites are cheating, the following step 104 may optionally be performed.
Step 104: performing weight reduction (demotion) or deletion processing on the site to be detected whose detection result is cheating.
In this embodiment, if the site to be detected is a cheating site, in order to reduce the possibility that the pages under the site to be detected are retrieved by a user, the ranking weight of the site to be detected may be reduced, or the pages under the site to be detected may be directly deleted.
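Step 104 could be sketched as follows against a toy in-memory index. The index layout (a dictionary from page URL to a record with `site` and `score` fields), the function name `handle_cheating_site`, and the multiplicative demotion factor are all hypothetical; the application does not specify how the reduction or deletion is carried out inside the search engine.

```python
def handle_cheating_site(index, site, action="demote", factor=0.5):
    """Demote or delete every indexed page belonging to a cheating site.

    index: dict mapping page URL -> {"site": str, "score": float}.
    Returns a new index; the input is left unchanged.
    """
    if action not in ("demote", "delete"):
        raise ValueError("action must be 'demote' or 'delete'")
    result = {}
    for url, entry in index.items():
        if entry["site"] != site:
            result[url] = dict(entry)
        elif action == "demote":
            result[url] = {**entry, "score": entry["score"] * factor}
        # action == "delete": the page is simply not copied over
    return result


index = {
    "https://spam.example/a": {"site": "spam.example", "score": 0.8},
    "https://good.example/b": {"site": "good.example", "score": 0.6},
}
demoted = handle_cheating_site(index, "spam.example", action="demote")
deleted = handle_cheating_site(index, "spam.example", action="delete")
print(demoted["https://spam.example/a"]["score"])  # 0.4
print(sorted(deleted))  # ['https://good.example/b']
```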
In the embodiment of the present application, for a known cheating site, the page features of each page or a part of the pages under the known cheating site are extracted from the retrieval log and/or the access log stored by the search engine, and a cheating detection model is constructed according to the cheating characteristics represented by the extracted page features. Therefore, when detecting whether a site to be detected is cheating based on the cheating detection model, the model can reflect the cheating characteristics that cheating sites exhibit on their pages, so that more accurate cheating detection can be performed on other sites. Moreover, because the cheating detection model is built from the retrieval logs and access logs of users in the search engine, the cheating sites used to build the model are, from the user's perspective, more uniform and representative.
For simplicity of explanation, the foregoing method embodiments are described as a series of acts or combinations, but those skilled in the art will appreciate that the present application is not limited by the order of acts, as some steps may occur in other orders or concurrently with other steps based on the disclosure herein. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Corresponding to the method provided by the embodiment of the method for detecting a cheating site in the present application, referring to fig. 2, the present application further provides an embodiment of a device for detecting a cheating site, and in this embodiment, the device may include:
the extracting unit 201 is configured to extract a page feature of a page under a known cheating site from a retrieval log and/or an access log of the known cheating site.
Wherein the extracting unit 201 may include:
an obtaining subunit, configured to obtain a retrieval log and/or an access log of the known cheating site, where the retrieval log includes: the search term and the search result page corresponding to the search term, the access log comprises: the access page of the user and the access times of each access page; and the extraction subunit is used for extracting the text features and/or the structural features of the retrieval result page and/or the access page as the page features.
Wherein the extraction subunit may include:
the information extraction subunit is used for extracting the text information and/or the title text information of each page from the retrieval result page and/or the access page as the text characteristics; and the structure extraction subunit is used for extracting the text structure characteristics and the title structure characteristics of each page from the retrieval result page and/or the access page as the structure characteristics.
Wherein the known cheating site may be determined by:
acquiring a set of sites for which it is to be determined whether they are cheating; clustering all the sites in the site set to obtain the clustered classes of sites; and determining, among the classes of sites, the sites whose manual marking result is cheating as the known cheating sites, wherein the manual marking results are used for indicating whether the sites of each class are cheating sites.
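The selection of known cheating sites from clustered and manually marked sites can be illustrated with the sketch below. It assumes that clustering has already produced a cluster id for every site and that a manual marking result is recorded once per cluster; the function name, the data layout, and the example domain names are hypothetical.

```python
from collections import defaultdict


def known_cheating_sites(site_to_cluster, cluster_labels):
    """Select the sites in clusters whose manual marking result is cheating.

    site_to_cluster: dict mapping site -> cluster id (clustering output).
    cluster_labels: dict mapping cluster id -> True if marked as cheating.
    Clusters without a manual marking result are treated as not cheating.
    """
    clusters = defaultdict(list)
    for site, cluster_id in site_to_cluster.items():
        clusters[cluster_id].append(site)
    return sorted(
        site
        for cluster_id, sites in clusters.items()
        if cluster_labels.get(cluster_id, False)
        for site in sites
    )


site_to_cluster = {"a.example": 0, "b.example": 0, "c.example": 1}
cluster_labels = {0: True, 1: False}
print(known_cheating_sites(site_to_cluster, cluster_labels))
# ['a.example', 'b.example']
```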
The model building unit 202 is configured to build a cheating detection model according to the cheating rule represented by the page feature, where the cheating detection model is used to detect whether a site is cheated.
Wherein the model building unit 202 may include:
the conversion subunit is used for converting the page features of the retrieval result page and/or the access page into a retrieval feature vector and/or an access feature vector, respectively; and the construction subunit is used for constructing the cheating detection model according to the retrieval feature vector and/or the access feature vector.
The detection unit 203 is used for detecting whether the site to be detected is cheating according to the cheating detection model.
Wherein the detecting unit 203 may include:
the acquisition subunit is used for acquiring a page to be detected of the site to be detected; the extraction subunit is used for extracting the page features to be detected of the page to be detected and converting the page features to be detected into the feature vector to be detected of the site to be detected; and the detection subunit is used for detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rule.
Wherein the apparatus may further comprise:
the cheating processing unit 204 is configured to perform weight reduction (demotion) or deletion processing on the site to be detected whose detection result is cheating.
It can be seen that, in the embodiment of the present application, for a known cheating site, the page features of each page or a part of the pages under the known cheating site are extracted from the retrieval log and/or the access log stored by the search engine, and a cheating detection model is constructed according to the cheating rule represented by the extracted page features. Therefore, when detecting whether a site to be detected is cheating based on the cheating detection model, the model can reflect the cheating rule of cheating sites, so that more accurate cheating detection is performed on other sites. In addition, because the cheating detection model is built from the retrieval logs and access logs of users in the search engine, the model built from the user's perspective is more uniform and representative, and unknown cheating types can also be accurately detected.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 3 is a block diagram illustrating a detection apparatus 800 for a cheating site according to an exemplary embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 3, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed state of the device 800 and the relative positioning of components, such as the display and keypad of the device 800. The sensor assembly 814 may also detect a change in position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast associated information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the device 800 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium having instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of detecting a cheating site, the method comprising: extracting page features of pages under a known cheating site from a retrieval log and/or an access log of the known cheating site; constructing a cheating detection model according to the cheating rule represented by the page features, wherein the cheating detection model is used for detecting whether a site is cheating; and detecting whether the site to be detected is cheating according to the cheating detection model.
The extracting of the page features of the page under the known cheating site from the retrieval log and/or the access log of the known cheating site may specifically include:
obtaining a retrieval log and/or an access log of the known cheating site, wherein the retrieval log comprises: the search term and the search result page corresponding to the search term, the access log comprises: the access page of the user and the access times of each access page; and
extracting text features and/or structural features of the retrieval result page and/or the access page as the page features.
The extracting of text features and/or structural features of the retrieval result page and/or the access page as the page features may specifically include:
extracting text information and/or title text information of each page from the retrieval result page and/or the access page as the text features; and
extracting the text structural features and the title structural features of each page from the retrieval result page and/or the access page as the structural features.
The constructing of a cheating detection model according to the cheating rule represented by the page features may specifically include:
respectively converting the page characteristics of the retrieval result page and/or the access page into retrieval characteristic vectors and/or access characteristic vectors;
and constructing a cheating detection model according to the retrieval feature vector and/or the access feature vector.
The detecting of whether the site to be detected is cheating according to the cheating detection model may specifically include:
acquiring a page to be detected of the site to be detected;
extracting the page features to be detected of the page to be detected, and converting the page features to be detected into the feature vector to be detected of the site to be detected; and
detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rule.
Wherein the known cheating site may be determined by:
acquiring a set of sites for which it is to be determined whether they are cheating;
clustering all the sites in the site set to obtain the clustered classes of sites; and
determining, among the classes of sites, the sites whose manual marking result is cheating as the known cheating sites, wherein the manual marking results are used for indicating whether the sites of each class are cheating sites.
Wherein the device 800 may also be configured such that the one or more programs executed by the one or more processors include instructions for:
performing weight reduction (demotion) or deletion processing on the site to be detected whose detection result is cheating.
Fig. 4 is a schematic structural diagram of a server in an embodiment of the present invention. The server 1900 may vary widely by configuration or performance and may include one or more Central Processing Units (CPUs) 1922 (e.g., one or more processors) and memory 1932, one or more storage media 1930 (e.g., one or more mass storage devices) storing applications 1942 or data 1944. Memory 1932 and storage medium 1930 can be, among other things, transient or persistent storage. The program stored in the storage medium 1930 may include one or more modules (not shown), each of which may include a series of instructions operating on a server. Still further, a central processor 1922 may be provided in communication with the storage medium 1930 to execute a series of instruction operations in the storage medium 1930 on the server 1900.
The server 1900 may also include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input-output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (16)

1. A method for detecting a cheating site, comprising:
extracting the page characteristics of each page in at least one page under the known cheating site from a retrieval log and/or an access log of the known cheating site;
converting the page characteristics of each page in the at least one page into the vector characteristics of the known cheating sites;
constructing a cheating detection model according to the vector characteristics of the cheating sites, wherein the cheating detection model is used for detecting whether the sites cheat or not;
and detecting whether the site to be detected is cheating according to the cheating detection model.
2. The method of claim 1, wherein the extracting the page features of the page below the known cheating site from the retrieval log and/or the access log of the known cheating site comprises:
obtaining a retrieval log and/or an access log of the known cheating site, wherein the retrieval log comprises: the search term and the search result page corresponding to the search term, the access log comprises: the access page of the user and the access times of each access page;
and extracting text features and/or structural features of the retrieval result page and/or the access page as the page features.
3. The method according to claim 2, wherein the extracting text features and/or structural features of the search result page and/or the access page as the page features comprises:
extracting text information and/or title text information of each page from the retrieval result page and/or the access page as the text features; and
extracting the text structural features and the title structural features of each page from the retrieval result page and/or the access page as the structural features.
4. The method of claim 3, wherein the constructing the cheat detection model based on the vector features of the cheating sites comprises:
respectively converting the page characteristics of the retrieval result page and/or the access page into retrieval characteristic vectors and/or access characteristic vectors;
and constructing a cheating detection model according to the retrieval feature vector and/or the access feature vector.
5. The method as claimed in claim 4, wherein the detecting whether the site to be detected is cheating according to the cheating detection model comprises:
acquiring a page to be detected of the site to be detected;
extracting the page features to be detected of the page to be detected, and converting the page features to be detected into the feature vector to be detected of the site to be detected;
and detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rule.
6. The method of claim 1, wherein the known cheating site is determined by:
acquiring a set of sites for which it is to be determined whether they are cheating;
clustering all the sites in the site set to obtain the clustered classes of sites;
and determining, among the classes of sites, the sites whose manual marking result is cheating as the known cheating sites, wherein the manual marking results are used for indicating whether the sites of each class are cheating sites.
7. The method of claim 1, further comprising:
performing weight reduction (demotion) or deletion processing on the site to be detected whose detection result is cheating.
8. A detection apparatus for a cheating site, comprising:
the extraction unit is used for extracting the page characteristics of each page in at least one page under the known cheating site from the retrieval log and/or the access log of the known cheating site;
the model building unit is used for converting the page characteristics of each page in the at least one page into the vector characteristics of the known cheating site; constructing a cheating detection model according to the vector characteristics of the cheating sites, wherein the cheating detection model is used for detecting whether the sites cheat or not;
and the detection unit is used for detecting whether the site to be detected is cheating according to the cheating detection model.
9. The apparatus of claim 8, wherein the extracting unit comprises:
an obtaining subunit, configured to obtain a retrieval log and/or an access log of the known cheating site, where the retrieval log includes: the search term and the search result page corresponding to the search term, the access log comprises: the access page of the user and the access times of each access page; and
the extraction subunit is used for extracting the text features and/or the structural features of the retrieval result page and/or the access page as the page features.
10. The apparatus of claim 9, wherein the extracting subunit comprises:
the information extraction subunit is used for extracting the text information and/or the title text information of each page from the retrieval result page and/or the access page as the text characteristics; and
the structure extraction subunit is used for extracting the text structure characteristics and the title structure characteristics of each page from the retrieval result page and/or the access page as the structure characteristics.
11. The apparatus of claim 10, wherein the model building unit comprises:
the conversion subunit is used for respectively converting the page characteristics of the retrieval result page and/or the access page into retrieval characteristic vectors and/or access characteristic vectors; and
the construction subunit is used for constructing the cheating detection model according to the retrieval feature vector and/or the access feature vector.
12. The apparatus of claim 11, wherein the detection unit comprises:
the acquisition subunit is used for acquiring a page to be detected of the site to be detected;
the extraction subunit is used for extracting the page features to be detected of the page to be detected and converting the page features to be detected into the feature vector to be detected of the site to be detected;
and the detection subunit is used for detecting whether the site to be detected is a cheating site according to whether the feature vector to be detected conforms to the page cheating rule.
13. The apparatus of claim 8, wherein the known cheating site can be determined by:
acquiring a set of sites for which it is to be determined whether they are cheating;
clustering all the sites in the site set to obtain the clustered classes of sites;
and determining, among the classes of sites, the sites whose manual marking result is cheating as the known cheating sites, wherein the manual marking results are used for indicating whether the sites of each class are cheating sites.
14. The apparatus of claim 8, wherein the apparatus further comprises:
the cheating processing unit is used for performing weight reduction (demotion) or deletion processing on the site to be detected whose detection result is cheating.
15. An apparatus for detection of a cheating site, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured for execution by one or more processors, the one or more programs including instructions for:
extracting page features of each of at least one page under a known cheating site from a retrieval log and/or an access log of the known cheating site;
converting the page features of each of the at least one page into vector features of the known cheating site;
constructing a cheating detection model according to the vector features of the known cheating site, wherein the cheating detection model is used for detecting whether a site cheats; and
detecting whether a site to be detected cheats according to the cheating detection model.
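The first instruction of claim 15, extracting per-page features from a log, could be sketched as follows. The log format (tab-separated `url`, `query`, `clicked` fields) and the two features computed (`impressions`, `clicks`) are hypothetical; the claim does not define the log layout or the features.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def extract_page_features(log_lines, site):
    """Aggregate per-page counts for one site from a hypothetical retrieval log.

    Each line is assumed to be "url<TAB>query<TAB>clicked", with clicked
    equal to "1" when the result was clicked. Malformed lines are skipped.
    """
    features = defaultdict(lambda: {"impressions": 0, "clicks": 0})
    for line in log_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue
        url, _query, clicked = parts
        if urlsplit(url).hostname != site:
            continue  # record belongs to a different site
        features[url]["impressions"] += 1
        features[url]["clicks"] += int(clicked == "1")
    return dict(features)
```

The resulting per-page dictionaries are the kind of input that a later vector conversion step would consume.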
16. A computer-readable medium having stored thereon instructions which, when executed by one or more processors, cause an apparatus to perform a method of detecting a cheating site as recited in one or more of claims 1-7.
CN201710576240.XA 2017-07-14 2017-07-14 Detection method and device for cheating sites and detection device for cheating sites Active CN110147472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710576240.XA CN110147472B (en) 2017-07-14 2017-07-14 Detection method and device for cheating sites and detection device for cheating sites

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710576240.XA CN110147472B (en) 2017-07-14 2017-07-14 Detection method and device for cheating sites and detection device for cheating sites

Publications (2)

Publication Number Publication Date
CN110147472A (en) 2019-08-20
CN110147472B (en) 2021-10-15

Family

ID=67588038

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710576240.XA Active CN110147472B (en) 2017-07-14 2017-07-14 Detection method and device for cheating sites and detection device for cheating sites

Country Status (1)

Country Link
CN (1) CN110147472B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093510A (en) * 2007-07-25 2007-12-26 北京搜狗科技发展有限公司 Anti cheating method and system for aiming at cheat on web page
CN101350011A (en) * 2007-07-18 2009-01-21 中国科学院自动化研究所 Method for detecting search engine cheat based on small sample set
CN101777053A (en) * 2009-01-08 2010-07-14 北京搜狗科技发展有限公司 Method and system for identifying cheating webpages
CN102243659A (en) * 2011-07-18 2011-11-16 南京邮电大学 Webpage junk detection method based on dynamic Bayesian model
CN103064984A (en) * 2013-01-25 2013-04-24 清华大学 Spam webpage identifying method and spam webpage identifying system
CN103150369A (en) * 2013-03-07 2013-06-12 人民搜索网络股份公司 Method and device for identifying cheat web-pages
WO2016101737A1 (en) * 2014-12-22 2016-06-30 北京奇虎科技有限公司 Search query method and apparatus
CN106326498A (en) * 2016-10-13 2017-01-11 合网络技术(北京)有限公司 Cheat video identification method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
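Research on Web Page Cheating Detection Technology Based on Ant Colony Optimization; Tang Shouhong (唐寿洪); China Master's Theses Full-text Database, Information Science and Technology; 20140915 (No. 09); I139-148 *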

Also Published As

Publication number Publication date
CN110147472A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN109800325B (en) Video recommendation method and device and computer-readable storage medium
CN107766426B (en) Text classification method and device and electronic equipment
US11394675B2 (en) Method and device for commenting on multimedia resource
EP3173948A1 (en) Method and apparatus for recommendation of reference documents
CN110008401B (en) Keyword extraction method, keyword extraction device, and computer-readable storage medium
CN111291069B (en) Data processing method and device and electronic equipment
CN108227950B (en) Input method and device
CN113792207B (en) Cross-modal retrieval method based on multi-level feature representation alignment
CN113590881B (en) Video clip retrieval method, training method and device for video clip retrieval model
CN107784034B (en) Page type identification method and device for page type identification
CN109815396B (en) Search term weight determination method and device
CN109918565B (en) Processing method and device for search data and electronic equipment
CN109471919B (en) Zero pronoun resolution method and device
CN112926310B (en) Keyword extraction method and device
CN112307281A (en) Entity recommendation method and device
CN112784142A (en) Information recommendation method and device
CN110929176A (en) Information recommendation method and device and electronic equipment
CN111553372A (en) Training image recognition network, image recognition searching method and related device
CN111368161B (en) Search intention recognition method, intention recognition model training method and device
CN107491453B (en) Method and device for identifying cheating web pages
CN112541110A (en) Information recommendation method and device and electronic equipment
CN111813932B (en) Text data processing method, text data classifying device and readable storage medium
CN110110046B (en) Method and device for recommending entities with same name
CN113033163A (en) Data processing method and device and electronic equipment
CN110147426B (en) Method for determining classification label of query text and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant