CN104050198B - Method and device for identifying webpage information - Google Patents

Method and device for identifying webpage information

Info

Publication number
CN104050198B
Authority
CN
China
Prior art keywords
information
webpage
classification
characteristic information
log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310084318.8A
Other languages
Chinese (zh)
Other versions
CN104050198A (en
Inventor
冯景华
陈超
杨宝春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201310084318.8A priority Critical patent/CN104050198B/en
Publication of CN104050198A publication Critical patent/CN104050198A/en
Priority to HK15101896.6A priority patent/HK1201360A1/en
Application granted granted Critical
Publication of CN104050198B publication Critical patent/CN104050198B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

An embodiment of the invention discloses a method and device for identifying webpage information. The method includes: obtaining webpage log information from a database; dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category; establishing a statistical model for each category from the collected webpage log information, and determining from the statistical model the characteristic information distribution of the described objects in each category; judging whether the characteristic information of the object described in the webpage information to be identified falls within the normal range of the characteristic information distribution of its category; if so, determining that the webpage information to be identified is real information, and otherwise determining that it is false information. According to the embodiments of the present application, false information about products can be identified automatically, which improves identification efficiency.

Description

Method and device for identifying webpage information
Technical field
The present invention relates to the field of computer application technology, and in particular to a method and device for identifying webpage information.
Background art
On a third-party shopping platform, seller users publish product webpages through the platform. Buyer users use the platform's search engine to find, among the webpages published by sellers, those that meet specific search conditions, and the search engine presents these webpages to the buyer as search results. The buyer browses the results and decides whether to click a result to view the product in detail. In addition, when a buyer searches for product webpages that meet specific search conditions, the search engine also sorts the result webpages based on their webpage information. Therefore, to make their webpages appear in search results, or to have them ranked near the top and so gain more exposure, some seller users publish webpages containing false webpage information on the platform. For example, of all webpage information, the product price is the factor buyers care about most, and search engines also provide price-based sorting, so some seller users deliberately publish false price information when publishing webpages.
Because of this false webpage information, during a search the search engine may, on the one hand, return webpages containing false webpage information to the buyer as search results and, on the other hand, rank such webpages near the top of the results. Both situations seriously degrade the search quality of the search engine and the user experience.
Information inconsistency also exists on other website platforms, such as video websites. A video website typically hosts films, music, TV series, animation and other videos, and each video has title information and attribute information in its webpage information. A film, for example, has title information and film recommendation information, the latter being the film's attribute information. A video website has users who upload film videos ("upload users") and users who search for, browse and download film videos ("download users"). To gain more exposure, an upload user may fill in title information and attribute information that are inconsistent with each other. Such inconsistent webpage information likewise affects the search quality of the video website's search engine and, in turn, the search experience of download users.
To improve the search quality of search engines, the prior art relies on manual spot checks to find published webpages suspected of containing false webpage information. However, the number of webpages published by seller users is huge and human resources are limited, so only a very small number of webpages can be spot-checked. Manual spot checking is therefore hard to apply widely and is inefficient. Given this technical problem in the prior art, there is an urgent need for a method that automatically identifies false webpage information on a third-party shopping platform, so as to improve identification efficiency.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a method and device for identifying webpage information, so as to identify false webpage information automatically, improve identification efficiency and, at the same time, improve the search quality of the search engine.
The embodiments of the present application disclose the following technical solutions:
A method for identifying webpage information, including:
Obtaining webpage log information from a database, the webpage log information including the characteristic information of the described object in the publishing log and any one or more of: its characteristic information in the exposure log, its characteristic information in the click log, and its characteristic information in the transaction log;
Dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category;
Establishing a statistical model for each category from the collected webpage log information in that category, and determining from the statistical model the characteristic information distribution of the described objects in each category;
Judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of its category;
If so, determining that the webpage information to be identified is real information; otherwise, determining that the webpage information to be identified is false information.
A method for identifying webpage information, including:
Obtaining webpage log information from a database, the webpage log information including the characteristic information of the described object in the publishing log and any one or more of: its characteristic information in the exposure log, its characteristic information in the click log, and its characteristic information in the transaction log;
Dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category;
Dividing the webpage log information of each category according to the subcategory to which the described object belongs, and collecting statistics on the webpage log information of each subcategory in each category;
Establishing a statistical model for each subcategory in each category from the collected webpage log information of that subcategory, and determining from the statistical model the characteristic information distribution of the described objects of each subcategory in each category;
Judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the subcategory to which the object belongs under its category;
If so, determining that the webpage information to be identified is real information; otherwise, determining that the webpage information to be identified is false information.
A device for identifying webpage information, including:
An acquisition module, for obtaining webpage log information from a database, the webpage log information including the characteristic information of the described object in the publishing log and any one or more of: its characteristic information in the exposure log, its characteristic information in the click log, and its characteristic information in the transaction log;
A statistical module, for dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category;
A first model-establishing module, for establishing a statistical model for each category from the collected webpage log information in that category, and determining from the statistical model the characteristic information distribution of the described objects in each category;
A first judgment module, for judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of its category;
A first determining module, for determining that the webpage information to be identified is real information when the result of the first judgment module is yes, and otherwise determining that the webpage information to be identified is false information.
A device for identifying webpage information, including:
An acquisition module, for obtaining webpage log information from a database, the webpage log information including the characteristic information of the described object in the publishing log and any one or more of: its characteristic information in the exposure log, its characteristic information in the click log, and its characteristic information in the transaction log;
An industry statistics module, for dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category;
A type statistics module, for dividing the webpage log information of each category according to the subcategory to which the described object belongs, and collecting statistics on the webpage log information of each subcategory in each category;
A second model-establishing module, for establishing a statistical model for each subcategory in each category from the collected webpage log information of that subcategory, and determining from the statistical model the characteristic information distribution of the described objects of each subcategory in each category;
A second judgment module, for judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the subcategory to which the object belongs under its category;
A second determining module, for determining that the webpage information to be identified is real information when the result of the second judgment module is yes, and otherwise determining that the webpage information to be identified is false information.
As can be seen from the above embodiments, the characteristic information distribution of the described objects of each category is established, or the characteristic information distribution of the described objects of each subcategory under each category is established, and whether a piece of webpage information is false information is identified automatically according to that distribution. Identifying webpage information automatically in this way improves identification efficiency.
In addition, after the search engine finds the search results, it can filter out the webpages that contain false webpage information, or it can sort the search results according to the probability, within the characteristic information distribution of the corresponding category, of the characteristic information of the object described in each result webpage. Either measure can improve the search quality of the search engine.
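The two search-side options just described (filtering and probability-based sorting) can be sketched as follows. This is only an illustration under stated assumptions: the mixture parameters, the normal range, the `price` and `id` fields and the function names are all hypothetical and do not come from the patent text.

```python
import math

def mixture_density(x, weights, means, variances):
    """Density of a one-dimensional Gaussian mixture at x."""
    return sum(
        w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        for w, m, v in zip(weights, means, variances)
    )

# Hypothetical fitted price model for one category (illustrative values only).
WEIGHTS = [0.75, 0.25]
MEANS = [3000.0, 3500.0]
VARIANCES = [100.0 ** 2, 80.0 ** 2]
NORMAL_RANGE = (2650.0, 3600.0)  # assumed normal price range for the category

def filter_false(results):
    """Drop results whose price lies outside the assumed normal range."""
    low, high = NORMAL_RANGE
    return [r for r in results if low <= r["price"] <= high]

def rank_by_density(results):
    """Sort results so that prices with higher model density come first."""
    return sorted(
        results,
        key=lambda r: mixture_density(r["price"], WEIGHTS, MEANS, VARIANCES),
        reverse=True,
    )

results = [
    {"id": "a", "price": 3050.0},
    {"id": "b", "price": 1.0},
    {"id": "c", "price": 3450.0},
]
ranked_ids = [r["id"] for r in rank_by_density(filter_false(results))]
print(ranked_ids)
```

The text presents filtering and sorting as alternatives; they are chained here only to show both in one short example.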
Description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are obviously only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for identifying webpage information disclosed in Embodiment One of the present application;
Fig. 2 is a flowchart of a method for identifying webpage information disclosed in Embodiment Two of the present application;
Fig. 3 is a flowchart of a method for identifying webpage information disclosed in Embodiment Three of the present application;
Fig. 4 is a structural diagram of a device for identifying webpage information disclosed in Embodiment Four of the present application;
Fig. 5 is a structural diagram of a device for identifying webpage information disclosed in Embodiment Five of the present application;
Fig. 6 shows the result of analyzing a product title with a semantic analysis tool, as disclosed in the present application.
Detailed description
To make the above objects, features and advantages of the present invention clearer and easier to understand, the embodiments of the present invention are described in detail below with reference to the drawings.
Embodiment One
Refer to Fig. 1, which is a flowchart of a method for identifying webpage information disclosed in Embodiment One of the present application. The method includes the following steps:
Step 101: Obtain webpage log information from a database, the webpage log information including the characteristic information of the described object in the publishing log and any one or more of: its characteristic information in the exposure log, its characteristic information in the click log, and its characteristic information in the transaction log;
A webpage of a website is a new kind of information carrier. It carries information about a particular object for website users to browse, and that object is the webpage's described object. Webpages of different websites have different described objects. For shopping websites such as Taobao, JD.com, Amazon and Dangdang, the described object may be a product (clothes, food, furniture, household appliances, books and so on); for video websites such as Youku, iQIYI and Tudou, the described object may be a video (in the form of a film, TV programme, animation, music and so on). Webpages of other websites, such as novel websites and recruitment websites, likewise have the described objects they are aimed at. In other words, the webpages of any kind of website have their own described objects.
Among the information carried on a webpage, the most critical is the characteristic information of the described object, that is, information that characterizes one aspect of the described object. For a product, for example, price is a feature, and price information is characteristic information of the product. The following uses only the described object of shopping-website webpages, the product, as an example. The database of a third-party shopping platform records historical information generated when each seller user publishes a webpage and saves it as a publishing log, which includes the product's characteristic information. The database may also record exposure logs, click logs and/or transaction logs, each of which also includes the product's characteristic information: its characteristic information in the exposure log, in the click log and in the transaction log.
"Publication of a product" means that when a seller user publishes a product webpage on the third-party shopping platform, the product described on that webpage is regarded as published. The publication price information of the product is recorded in the database accordingly.
"Exposure of a product" means that when a buyer user uses the search engine on the third-party shopping platform to search for webpages that meet specific search conditions, and the search engine shows those webpages to the buyer user as search results, the products described in the search results are regarded as exposed. Each time a product is exposed, the database records the number of times the product has been exposed and its exposure price information. For example, a buyer user searches the third-party shopping platform for webpages related to "mobile phone", and the search engine shows the related webpages to the buyer user. The "mobile phone" products in these webpages are then exposed, and the database records their exposure counts and exposure price information.
"Click on a product" means that when a buyer user clicks on a webpage in the search results to browse it, the product described in the clicked webpage is regarded as clicked. Each time a product is clicked, the database records the number of times the product has been clicked and its click price information. For example, among all the webpages related to "mobile phone" shown by the search engine, the buyer user clicks on and views the "iPhone" webpage. The "iPhone" product in the clicked webpage is then clicked, and the database records its click count and click price information.
"Transaction of a product" means that when a buyer user successfully purchases the product described in a clicked webpage, the purchased product is regarded as traded. Each time a product is traded, the database records the number of transactions, the product quantity in each transaction and the transaction price information.
A product may have characteristic information in the publishing log, the exposure log, the click log and the transaction log, and its characteristic information may differ between these logs.
For example, when the characteristic information of a product is price information, the price of a product may be 100 at publication and 100 at exposure, but 150 when clicked and 180 when traded. That is, the price of a product at publication may differ from its price at exposure, at click and at transaction.
Take the iPhone 4 product on a third-party shopping platform as an example. The database records that the publication price of the product is 3100, that the product belongs to the "mobile phone" industry, and that the product title is "Apple 4th-generation iPhone 4 mobile phone, official, unlocked, 16G, genuine, original, smart iPhone, wholesale". The exposure log information in the database records that the product was exposed 100 times, with an exposure price of 3500 for 30 exposures and 3000 for 70 exposures (the exposure price may be the same or different for each exposure). The product was clicked 40 times, with a click price of 3500 for 10 clicks and 3000 for 30 clicks (the click price may be the same or different for each click). The product was traded 20 times: in 15 transactions the quantity per transaction was 50 and the transaction price was 3000, and in 5 transactions the quantity per transaction was 40 and the transaction price was 3500 (the quantity and the transaction price may be the same or different for each transaction).
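To make the figures in this example concrete, the short calculation below computes a count-weighted average price for each log. The averaging itself is only an illustration and is not a step described in the patent; the variable and function names are assumptions.

```python
# Figures from the iPhone 4 example: (count, price) pairs per log.
exposures = [(30, 3500), (70, 3000)]
clicks = [(10, 3500), (30, 3000)]
# Transactions: (number of transactions, units per transaction, price).
transactions = [(15, 50, 3000), (5, 40, 3500)]

def weighted_mean(pairs):
    """Average price weighted by how often each price was observed."""
    total = sum(count for count, _ in pairs)
    return sum(count * price for count, price in pairs) / total

exposure_mean = weighted_mean(exposures)
click_mean = weighted_mean(clicks)
# Weight each transaction price by the number of units sold at that price.
unit_pairs = [(n * units, price) for n, units, price in transactions]
transaction_mean = weighted_mean(unit_pairs)

print(exposure_mean, click_mean, round(transaction_mean, 2))
```

All three averages differ from the publication price of 3100, which matches the point above that a product's price can vary between logs.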
Step 102: Divide the obtained webpage log information according to the category to which the described object belongs, and collect statistics on the webpage log information in each category;
All the webpage log information obtained above is divided according to the category to which the described object belongs. For example, products can be divided into at least the following categories: the mobile phone industry, the computer industry, the clothing industry and the household appliance industry. These are only examples, and other categories may be included. The categories of described objects can be divided coarsely or finely according to actual needs. Different types of described object also have different classification schemes and classification results. The present invention does not limit the classification scheme or the classification result for any described object. For the technical solution of the present invention, once the classification scheme for the described objects is decided, the classification result is also decided.
For example, after the categories of described objects are set to the mobile phone industry, the computer industry, the clothing industry and the household appliance industry, all the obtained webpage log information is divided into the webpage log information of the mobile phone industry, of the computer industry, of the clothing industry and of the household appliance industry, and statistics are then collected on the webpage log information in each category.
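A minimal sketch of this division step, assuming each log record is a hypothetical `(category, log_type, price)` tuple; the record layout and the sample values are illustrative and are not specified in the patent.

```python
from collections import defaultdict

# Hypothetical log records: (category, log type, price).
records = [
    ("mobile phone", "publish", 3100.0),
    ("mobile phone", "exposure", 3500.0),
    ("mobile phone", "transaction", 3000.0),
    ("computer", "click", 5200.0),
    ("clothing", "transaction", 120.0),
]

def partition_by_category(rows):
    """Group (log type, price) observations by the category of the described object."""
    groups = defaultdict(list)
    for category, log_type, price in rows:
        groups[category].append((log_type, price))
    return dict(groups)

by_category = partition_by_category(records)
# Simple per-category statistic: the number of log records in each category.
counts = {category: len(items) for category, items in by_category.items()}
print(counts)
```

The same grouping could be keyed on a `(category, subcategory)` pair if a finer-grained division is wanted.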
Step 103: Establish a statistical model for each category from the collected webpage log information in that category, and determine from the statistical model the characteristic information distribution of the described objects in each category;
When the described object is a product and the characteristic information is price information, step 103 is specifically: establish a statistical model of the products in each category from the collected webpage log information of those products, and use the statistical model to determine the price information distribution of the products in each category.
Assume that the price information of the products in each category follows a Gaussian mixture distribution. In this case, the statistical model is established from the collected webpage log information of each category as follows: the collected webpage log information of each category is parsed with the expectation-maximization (EM) algorithm, and the parsing result is used to establish a Gaussian mixture model of the described objects in each category; the characteristic information distribution of the described objects in each category is then determined from that Gaussian mixture model.
The Gaussian mixture model is established as follows. The data in the obtained webpage log information of each category is used as training data, and a machine learning method is used to train a Gaussian mixture model on the training data so as to fit the probability distribution of the characteristic information of the described object. If the number of training samples is N, the mixture model contains N single Gaussian functions in total, which have different means, different covariance matrices and different weights; they are combined and summed according to these parameter values to obtain the Gaussian mixture model. The EM algorithm iterates over the training data so that the likelihood function value reaches its maximum and then obtains the model parameters that correspond to that maximum. This fits the Gaussian mixture model, from which the characteristic information distribution of the described object is obtained. Of course, besides a Gaussian mixture distribution, the characteristic information of the described object may follow other distributions, such as a log-normal distribution, a χ² distribution, a t distribution, an F distribution or a Poisson distribution, and corresponding statistical models can be established for those distributions. The EM algorithm is a clustering algorithm based on the Gaussian mixture model. Besides the EM algorithm, other parameter estimation methods, such as the K-means algorithm, the least-squares algorithm and the maximum likelihood algorithm, can be used to parse the Gaussian mixture model and obtain the characteristic information distribution of the described object. When other statistical models are used, other algorithms can also be used for parsing.
It should be noted that the embodiments of the present application do not limit which algorithm is used to parse the statistical model. Any of the algorithms listed above may be used, and so may other algorithms disclosed in the prior art. Likewise, the embodiments of the present application do not limit which statistical model is used. Any of the statistical models listed above may be used, and so may other statistical models disclosed in the prior art.
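As a hedged illustration of the EM fitting described in this step, the following pure-Python sketch fits a one-dimensional Gaussian mixture to price data. It is not the patent's implementation: the number of components `k`, the quantile-based initialisation, the iteration count and the synthetic prices are all assumptions chosen for the example.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a single Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, k=2, iterations=200, min_var=1e-6):
    """Fit a k-component one-dimensional Gaussian mixture with plain EM."""
    ordered = sorted(data)
    n = len(ordered)
    # Initialise means at evenly spaced quantiles, with a shared variance
    # and equal weights.
    means = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    overall_mean = sum(ordered) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in ordered) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)
    return weights, means, variances

# Synthetic prices for one category: two price clusters (illustrative only).
random.seed(0)
prices = ([random.gauss(3000, 100) for _ in range(300)]
          + [random.gauss(3500, 80) for _ in range(100)])
weights, means, variances = fit_gmm_1d(prices, k=2)
print(weights, means, variances)
```

A fixed small `k` is used here for readability. In practice a library implementation of Gaussian mixtures would normally be preferred over hand-written EM.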
It should be noted that, to train the statistical model better so that it fits more closely and is more accurate, the training data must be highly authentic. Of the data in the database, transaction log information is the most authentic and best reflects user behaviour, followed in turn by click log information, exposure log information and product characteristic information. Therefore, when webpage log information is obtained, which information in the database is used as webpage log information can be decided by the amount of training data the statistical model needs. For example, suppose 100 items of training data are needed, and there are 30 items of characteristic information of the described object during transactions obtained from the transaction log, 40 items of characteristic information during clicks obtained from the click log, and 50 items of characteristic information during exposure obtained from the exposure log. To train the statistical model, all the information in the transaction log (30 items), all the information in the click log (40 items) and part of the information in the exposure log (30 items) are used. That is, webpage log information is extracted in descending order of log authenticity until the required amount of training data is reached. In addition, when the amount of information in the transaction log and/or the click log is small and cannot meet the required amount of training data, the characteristic information in the transaction log and/or the click log (that is, the characteristic information of the described object during transactions and/or during clicks) can be weighted before training. In other words, to meet both the amount and the accuracy requirements for training data, the more authentic information in the webpage log information is weighted and then used for training.
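The selection rule in this note can be sketched as below, using the 30/40/50 figures from the example. The function name, the dictionary layout and the price values are hypothetical, and placing the publishing log last in the order is an assumption about where "product characteristic information" comes from. The weighting step is not shown.

```python
# Log types ordered from most to least authentic, following the note above.
# Treating the publishing log as the last source is an assumption.
AUTHENTICITY_ORDER = ["transaction", "click", "exposure", "publish"]

def select_training_data(logs, required):
    """Take items in descending order of authenticity until `required` is reached."""
    selected = []
    for log_type in AUTHENTICITY_ORDER:
        remaining = required - len(selected)
        if remaining <= 0:
            break
        values = logs.get(log_type, [])[:remaining]
        selected.extend((log_type, value) for value in values)
    return selected

# Counts from the example: 30 transaction, 40 click and 50 exposure items.
logs = {
    "transaction": [3000.0] * 30,
    "click": [3100.0] * 40,
    "exposure": [3200.0] * 50,
}
training = select_training_data(logs, required=100)
used = {t: sum(1 for lt, _ in training if lt == t) for t in AUTHENTICITY_ORDER}
print(used)
```

With these inputs the selection uses every transaction and click item and only part of the exposure items, as in the example.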
Step 104: Judge whether the feature information of the object described in the webpage information to be identified falls within the normal range of the feature information distribution of the category to which the object belongs; if so, proceed to step 105; otherwise, proceed to step 106;
When the described object is a product and the feature information is price information, step 104 is specifically: judge whether the price information of the product in the webpage information to be identified falls within the normal range of the price information distribution of the product category to which it belongs.
When a Gaussian mixture model is used, the judgment is implemented as follows: calculate the two-standard-deviation range of the Gaussian mixture distribution according to the Gaussian mixture model of the category to which the described object of the webpage information to be identified belongs; then judge whether the feature information of the object described in the webpage information to be identified lies within the numerical range between the two standard deviations. If so, the feature information of the described object is within the normal range of the feature information distribution of its category; otherwise, it is not.
Assuming that the feature information of the described object follows a Gaussian mixture distribution, and because most of the data in a Gaussian distribution lie within two standard deviations, the present invention takes the numerical range between the two standard deviations as the normal range of the Gaussian distribution: feature information inside this range is determined to be genuine information, and feature information outside it is determined to be false information. Besides the above method, when the feature information is assumed to follow some other distribution, the normal numerical range of the feature information distribution can be determined from the characteristics of that distribution.
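The range check described above can be sketched as follows. The text does not spell out how the two-standard-deviation range of a mixture is computed, so this sketch assumes one plausible reading: the range is the overall mixture mean plus or minus two overall mixture standard deviations. The component values in `phone_prices` and all function names are hypothetical.

```python
import math
from typing import List, Tuple

# One mixture component: (weight, mean, standard deviation).
Component = Tuple[float, float, float]


def mixture_mean_std(components: List[Component]) -> Tuple[float, float]:
    """Overall mean and standard deviation of a 1-D Gaussian mixture."""
    mean = sum(w * mu for w, mu, _ in components)
    second_moment = sum(w * (sd * sd + mu * mu) for w, mu, sd in components)
    return mean, math.sqrt(second_moment - mean * mean)


def normal_range(components: List[Component], k: float = 2.0) -> Tuple[float, float]:
    """Assumed reading of the 'normal range': mean +/- k standard deviations."""
    mean, std = mixture_mean_std(components)
    return mean - k * std, mean + k * std


def is_genuine(value: float, components: List[Component]) -> bool:
    """True when the value lies inside the normal range (step 105), else False (step 106)."""
    low, high = normal_range(components)
    return low <= value <= high


# Hypothetical price model for one product category.
phone_prices = [(0.7, 3000.0, 200.0), (0.3, 3600.0, 150.0)]
mean, std = mixture_mean_std(phone_prices)
```

Under these assumed parameters a price of 3200 falls inside the range and would be treated as genuine, while a price of 99 falls outside it and would be treated as false.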
In practical applications, the type of statistical model can be selected as needed and the normal range of the feature information distribution determined accordingly; this application places no limitation on this.
Step 105: Determine that the webpage information to be identified is genuine information;
Step 106: Determine that the webpage information to be identified is false information.
In addition, to improve the fit of the statistical model, the method may further include, after step 103: removing the portions of the counted webpage log information whose values are lowest and highest;
Then, at step 104, the statistical model of each category is established using the webpage log information remaining in each category after the removal, and the feature information distribution of the described objects of each category is determined according to the statistical model.
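The removal of the lowest and highest values can be sketched as a simple symmetric trim. The function name `trim_extremes` and the use of an integer percentage are assumptions made for this sketch; the 5% default mirrors the percentage given as an example elsewhere in this description, and the actual amount removed would be chosen according to the situation.

```python
from typing import List


def trim_extremes(values: List[float], percent: int = 5) -> List[float]:
    """Drop the lowest and highest `percent` percent of the values."""
    ordered = sorted(values)
    # Integer arithmetic avoids floating-point rounding when computing the cut.
    cut = len(ordered) * percent // 100
    return ordered[cut:len(ordered) - cut]


prices = [float(p) for p in range(1, 101)]  # 100 hypothetical price records
trimmed = trim_extremes(prices, 5)
```

The trimmed list is what would then be passed to model fitting in place of the full set of counted records.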
A search engine can use the above recognition results to filter search results, screening out those that contain false webpage information. Alternatively, the search engine can rank the webpages in the search results according to the probability, within the feature information distribution of the category concerned, of the feature information of the object described in each webpage's webpage information. Here, the search engine may itself identify the webpage information and directly use the recognition results to filter or rank the search results. Of course, the identification may also be performed by another functional module on a third-party shopping platform, with the search engine calling the recognition results from that module. The present invention places no limitation on this.
Preferably, after webpage information is identified as false information, the method further includes: filtering out of the search results the webpages that contain false webpage information, and feeding the filtered search results back to the client.
Alternatively, and preferably, after the feature information distribution of each category is obtained, the method further includes: when ranking the webpages in the search results, calculating, for each webpage, the probability of the feature information of the object described in its webpage information within the feature information distribution of the category to which the object belongs; and ranking the webpages in the search results in descending order of that probability. Of course, other ranking schemes may also be used.
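The ranking step can be sketched as follows. The text speaks of the "probability" of a feature value in the distribution; this sketch assumes that the probability density of the fitted mixture at that value is an acceptable stand-in. The page records, the `url` and `price` keys, and the model parameters are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, standard deviation)


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, components: List[Component]) -> float:
    """Density of a 1-D Gaussian mixture at x."""
    return sum(w * gaussian_pdf(x, mu, sd) for w, mu, sd in components)


def rank_pages(pages: List[Dict], components: List[Component]) -> List[Dict]:
    """Sort pages so that the most typical price for the category comes first."""
    return sorted(pages, key=lambda p: mixture_pdf(p["price"], components), reverse=True)


model = [(1.0, 3000.0, 200.0)]  # hypothetical single-component model
pages = [
    {"url": "c", "price": 3900.0},
    {"url": "a", "price": 3050.0},
    {"url": "b", "price": 2500.0},
]
ranked = rank_pages(pages, model)
```

Filtering could be combined with this by dropping pages judged false before ranking the remainder.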
As can be seen from the above embodiment, webpage log information is obtained from the database, a statistical model is established for each category, and the feature information distribution of the described objects of each category is determined according to the statistical model. This distribution is used to identify whether the webpage information to be identified is false information, and can also be used to provide a ranking that gives consumers better choices.
In particular, the search engine can use the identified authenticity of webpage information to filter out false webpage information and feed the filtered search results back to the client, thereby improving search quality. The search engine can also rank the genuine webpage information in the search results in descending order of its probability in the distribution, thereby improving the user experience.
Embodiment two
Because the described objects under each category are numerous and their feature information varies widely, the accuracy of the judgment is not high. The second embodiment of the present invention therefore provides an information identification method that further identifies whether the webpage information of the described objects of each subcategory in each category is false information. Referring to Fig. 2, which is a flowchart of another webpage information identification method disclosed in Embodiment two of this application, the method includes the following steps:
Step 201: Obtain webpage log information from a database, the webpage log information including any one or more of: the feature information of the described object in the release log, the feature information in the exposure log, the feature information in the click log, and the feature information in the transaction log;
The feature information of the described object includes at least title information.
Step 202: Divide the obtained webpage log information according to the category to which the described object belongs, and count the webpage log information in each category;
Step 203: Divide the webpage log information of each category according to the subcategory to which the described object belongs, and count the webpage log information of each subcategory in each category;
When the feature information of the described object includes at least title information, dividing the webpage log information of each category according to the subcategory to which the described object belongs and counting the webpage log information of each subcategory in each category is specifically: performing semantic analysis on the title information using a semantic analysis tool (such as Termweight) to obtain the subcategory to which each described object in each category belongs; and counting the webpage log information of the described objects that share the same subcategory in each category.
For example, the product's category is the mobile phone industry and its title is "Apple 4th-generation iPhone4 mobile phone, official unlocked 16G, genuine original smart iPhone, wholesale". Analyzing the product title with the semantic analysis tool yields the result shown in Fig. 6, from which the product's subcategory is found to be the Apple 4th-generation mobile phone; the webpage log information of all Apple 4th-generation phones in the mobile phone industry is then counted. As another example, the product's category is the apparel industry and its title is "Nike/NIKE sports men's jacket, men's color-block jacket"; semantic analysis shows that the product's subcategory is Nike men's jackets, and the webpage log information of all items in the apparel industry whose subcategory is Nike men's jackets is then counted.
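The grouping in step 203 can be illustrated with a deliberately simple stand-in for the semantic analysis tool. The keyword table below is hypothetical and is not how Termweight or any real semantic analyzer works; it only shows how records end up grouped per (category, subcategory) once a subcategory has been inferred from the title. All names, titles, and prices are made up for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical keyword rules standing in for a real semantic analysis tool.
SUBCATEGORY_KEYWORDS: Dict[str, Dict[str, List[str]]] = {
    "mobile phones": {"iPhone 4": ["iphone4", "iphone 4"]},
    "apparel": {"Nike men's jacket": ["nike"]},
}


def infer_subcategory(category: str, title: str) -> Optional[str]:
    """Return the first subcategory whose keywords appear in the title."""
    lowered = title.lower()
    for subcategory, keywords in SUBCATEGORY_KEYWORDS.get(category, {}).items():
        if any(keyword in lowered for keyword in keywords):
            return subcategory
    return None


def group_by_subcategory(records: List[Dict]) -> Dict[Tuple[str, str], List[float]]:
    """Collect the price of every record under its (category, subcategory) key."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for record in records:
        subcategory = infer_subcategory(record["category"], record["title"])
        if subcategory is not None:
            groups[(record["category"], subcategory)].append(record["price"])
    return dict(groups)


records = [
    {"category": "mobile phones", "title": "Apple iPhone4 16G unlocked", "price": 3000.0},
    {"category": "mobile phones", "title": "iPhone 4 genuine wholesale", "price": 3100.0},
    {"category": "apparel", "title": "Nike/NIKE sports men's jacket", "price": 400.0},
]
groups = group_by_subcategory(records)
```

Each resulting list is the per-subcategory data from which a separate statistical model would be built in step 204.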
It should be noted that the subcategory to which a described object belongs can be divided at a coarse or a fine granularity according to actual needs. Moreover, the way subcategories are classified, and the resulting classification, differ for different types of described object. The present invention does not limit the subcategory classification method or classification result for any described object. For the technical solution of the present invention, once the subcategory classification method for the described objects is determined, the classification result is determined as well.
Step 204: Establish the statistical model of each subcategory in each category using the counted webpage log information of each subcategory in each category, and determine the feature information distribution of the described objects of each subcategory in each category according to the statistical model;
When the feature information of the product is price information, the above step is specifically: establishing the statistical model of each product type in each product industry using the counted webpage log information of each type of product in each product industry, and determining the price information distribution of each product type in each product industry according to the statistical model.
Step 205: Judge whether the feature information of the object described in the webpage information to be identified falls within the normal range of the feature information distribution of the subcategory to which the object belongs under its category; if so, proceed to step 206; otherwise, proceed to step 207;
In practical applications, different types of statistical model can be selected as needed, and the normal range of the feature information distribution determined according to the model chosen; this application places no limitation on this.
Step 206: Determine that the webpage information to be identified is genuine information;
Step 207: Determine that the webpage information to be identified is false information.
For the execution of steps 204 to 207 above, refer to steps 103 to 106 in Embodiment one; since that content has been described in detail in Embodiment one, it is not repeated here.
In addition, to make the established statistical model more accurate, after step 203 counts the webpage log information of the described objects of each subcategory in each category, the portions with the lowest and highest values can also be removed from the counted webpage log information; the removed portion may be 5%, 10%, or some other percentage of the data, with the amount decided according to the actual situation. Step 204 is then specifically: establishing the statistical model of the described objects of each subcategory in each category using the webpage log information remaining for each subcategory in each category after the removal, and determining the feature information distribution of the described objects of each subcategory in each category according to the statistical model.
A search engine can use the above recognition results to filter search results, screening out those that contain false webpage information. Alternatively, the search engine can rank the webpages in the search results according to the probability, within the feature information distribution of the subcategory concerned under its category, of the feature information of the object described in each webpage's webpage information. Here, the search engine may itself identify the webpage information and directly use the recognition results to filter or rank the search results. Of course, the identification may also be performed by another functional module on a third-party shopping platform, with the search engine calling the recognition results from that module. The present invention places no limitation on this.
Preferably, after webpage information is identified as false information, the method further includes: filtering out of the search results the webpages that contain false webpage information, and feeding the filtered search results back to the client.
Alternatively, and preferably, after the feature information distribution of each subcategory in each category is obtained (for products, after the distribution for each product type in each product industry is obtained), the method further includes: when ranking the webpages in the search results, calculating, for each webpage, the probability of the feature information of the object described in its webpage information within the feature information distribution of the subcategory to which the object belongs under its category; and ranking the webpages in descending order of that probability. Of course, the ranking may also be performed in another order.
As can be seen from the above embodiment, webpage log information is obtained from the database, a statistical model is established for each subcategory under each category, and the feature information distribution of the described objects of each subcategory under each category is determined according to the statistical model. This distribution is used to identify whether the webpage information to be identified is false information, which makes the identification more effective and more precise.
In particular, when the search engine uses the identified authenticity of product webpage information to feed genuine webpage information back to the client, it can also rank the search results according to the probability of the product webpage information in the distribution, which not only improves search quality but also gives users a better search experience.
Embodiment three
Below, a webpage information identification method provided by this application is described in more detail, taking as an example the case where the statistical model is a Gaussian mixture model, the described object is a product, the feature information includes price information and title information, product categories are divided by the industry to which a product belongs, and product subcategories are divided by the type to which a product belongs. Referring to Fig. 3, which is a flowchart of an information identification method disclosed in Embodiment three of this application, the method includes the following steps:
Step 301: Extract webpage log information from a database, the webpage log information including any one or more of: the price information of the product in the release log, the price information in the exposure log, the price information in the click log, and the price information in the transaction log;
Step 302: Divide the obtained webpage log information according to the industry to which the product belongs, and count the webpage log information of each product industry;
Step 303: Divide the webpage log information of each product industry according to the type to which the product belongs, and count the webpage log information of each product type in each product industry;
Dividing the webpage log information of each product industry according to the type to which the product belongs is specifically implemented as follows: performing semantic analysis on the title information using a semantic analysis tool (such as Termweight) to obtain the product types in each product industry; and then counting the webpage log information of the products that share the same product type in each product industry.
Step 304: Establish the statistical model of each product type in each product industry using the counted webpage log information of each type of product in each product industry, and determine the price information distribution of each product type in each product industry according to the statistical model;
Step 305: Judge whether the price information of the product in the webpage information to be identified falls within the normal range of the price information distribution of the product type to which the product belongs under its product industry; if so, proceed to step 306; otherwise, proceed to step 307;
When the statistical model is a Gaussian mixture model, one implementation of step 305 is: calculating the two-standard-deviation range of the Gaussian mixture distribution according to the Gaussian mixture model of the product type, under its product industry, to which the product webpage information to be identified belongs;
and judging whether the price information of the product in the webpage information to be identified lies within the two-standard-deviation range. If so, the price information of the product is within the normal range of the product price information distribution of its type under its product industry; otherwise, it is not.
Step 306: Determine that the webpage information to be identified is genuine information;
Step 307: Determine that the webpage information to be identified is false information;
Step 308: When ranking the webpages in the search results, calculate, for each webpage, the probability of the price information of the product in its webpage information within the product price information distribution of the product type to which the product belongs under its product industry;
Step 309: Rank the webpages in the search results in descending order of that probability.
In addition, to make the established Gaussian mixture model more accurate, after step 303 counts the webpage log information of each product type in each product industry, the portions with the lowest and highest values can also be removed from the counted webpage log information; for example, the removed portion may be 5%, 10%, or some other percentage of the data, with the amount decided according to the actual situation.
Step 304 is then specifically: establishing the Gaussian mixture model of each product type in each product industry using the webpage log information remaining for each product type in each product industry after the removal, and determining the price information distribution of each product type in each product industry according to the Gaussian mixture model.
As can be seen from the above embodiment, a Gaussian mixture model is established using the counted product webpage log information of each type of product in each product industry, the distribution of product webpage feature information for each type of product in each product industry is obtained, and the products are ranked. This not only makes it possible to identify accurately whether the product webpage information of each type of product in each industry is false information, improving the effectiveness and precision of identification, but also gives consumers more reliable search information and a more convenient search experience.
Embodiment four
Corresponding to the webpage information identification method of Embodiment one above, an embodiment of this application provides a webpage information identification device. Referring to Fig. 4, which is a structural diagram of a webpage information identification device disclosed in Embodiment four of this application, the device includes: an acquisition module 401, a statistical module 402, a first model-establishing module 403, a first judgment module 404, and a first determining module 405. Its internal structure and connections are described further below in conjunction with the working principle of the device.
The acquisition module 401 is used for obtaining webpage log information from a database, the webpage log information including any one or more of: the feature information of the described object in the release log, the feature information in the exposure log, the feature information in the click log, and the feature information in the transaction log;
The statistical module 402 is used for dividing the obtained webpage log information according to the category to which the described object belongs, and counting the webpage log information in each category;
The first model-establishing module 403 is used for establishing the statistical model of each category using the counted webpage log information in each category, and determining the feature information distribution of the described objects of each category according to the statistical model;
The first judgment module 404 is used for judging whether the feature information of the object described in the webpage information to be identified falls within the normal range of the feature information distribution of the category to which the object belongs;
The first determining module 405 is used for determining, when the result of the first judgment module is yes, that the webpage information to be identified is genuine information, and otherwise determining that the webpage information to be identified is false information.
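The division of work among modules 402 to 405 can be sketched as one small class. For brevity this sketch swaps in a single Gaussian per category (mean and population standard deviation) in place of the statistical model of the embodiment, and treats mean plus or minus two standard deviations as the normal range; the class name, method names, and sample data are all hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Tuple


class WebpageInfoRecognizer:
    """Illustrative stand-in for modules 402-405, using one Gaussian per category."""

    def __init__(self) -> None:
        self.models: Dict[str, Tuple[float, float]] = {}

    def group(self, records: List[Tuple[str, float]]) -> Dict[str, List[float]]:
        # Statistical module: divide log records by category.
        grouped: Dict[str, List[float]] = defaultdict(list)
        for category, value in records:
            grouped[category].append(value)
        return grouped

    def build_models(self, grouped: Dict[str, List[float]]) -> None:
        # Model-establishing module: fit (mean, std) for each category.
        for category, values in grouped.items():
            self.models[category] = (statistics.mean(values), statistics.pstdev(values))

    def judge(self, category: str, value: float) -> bool:
        # Judgment module: is the value within two standard deviations of the mean?
        mean, std = self.models[category]
        return abs(value - mean) <= 2 * std

    def determine(self, category: str, value: float) -> str:
        # Determining module: map the judgment to a label.
        return "genuine" if self.judge(category, value) else "false"


# Acquisition is simulated with an in-memory list of (category, price) records.
records = [("phones", p) for p in (2900.0, 3000.0, 3100.0, 3000.0, 3050.0, 2950.0)]
recognizer = WebpageInfoRecognizer()
recognizer.build_models(recognizer.group(records))
```

In this toy data set a listing priced at 3050 is labelled genuine and one priced at 99 is labelled false.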
Preferably, when the statistical model is a Gaussian mixture model, the first model-establishing module 403 includes an analysis submodule one and a determination submodule one. Analysis submodule one is used for analyzing the counted webpage log information of each category using the EM algorithm and establishing the Gaussian mixture model of each category from the analysis result; determination submodule one is used for determining the feature information distribution of each category according to the Gaussian mixture model of each category.
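A minimal sketch of fitting a one-dimensional Gaussian mixture with the EM algorithm is given below. It is not the implementation of the application: the number of components, the deterministic initialisation from sorted chunks, the fixed iteration count, and the floor on the standard deviation are all assumptions chosen to keep the example short and reproducible.

```python
import math
from typing import List, Tuple

Component = Tuple[float, float, float]  # (weight, mean, standard deviation)


def _pdf(x: float, mean: float, std: float) -> float:
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(data: List[float], k: int = 2, iterations: int = 50,
               min_std: float = 1e-3) -> List[Component]:
    """Fit a k-component 1-D Gaussian mixture by expectation-maximization."""
    ordered = sorted(data)
    n = len(ordered)
    chunk = n // k
    # Deterministic start: one mean per sorted chunk, shared overall spread.
    means = [sum(ordered[i * chunk:(i + 1) * chunk]) / chunk for i in range(k)]
    overall = sum(ordered) / n
    spread = math.sqrt(sum((x - overall) ** 2 for x in ordered) / n)
    stds = [max(spread, min_std)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in ordered:
            dens = [w * _pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and standard deviation per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, ordered)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, ordered)) / nj
            stds[j] = max(math.sqrt(var), min_std)
    return list(zip(weights, means, stds))


# Hypothetical prices forming two clearly separated clusters.
data = [98.0, 100.0, 102.0] * 10 + [198.0, 200.0, 202.0] * 10
components = sorted(fit_gmm_1d(data), key=lambda c: c[1])
```

On this toy data the two fitted components settle near means of 100 and 200 with roughly equal weights; real log data would need a considered choice of component count and a convergence test rather than a fixed number of iterations.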
Preferably, when the first model-establishing module 403 includes analysis submodule one and determination submodule one, the first judgment module 404 includes: a calculation submodule one, used for calculating the two-standard-deviation range of the Gaussian mixture distribution according to the Gaussian mixture model of the category to which the described object of the webpage information to be identified belongs; and a judgment submodule one, used for judging whether the feature information of the object described in the webpage information to be identified lies within the two-standard-deviation range. If so, the feature information of the described object is within the normal range of the feature information distribution of its category; otherwise, it is not.
Preferably, the device further includes: a first feedback module, used for filtering out of the search results the webpages that contain false webpage information and feeding the filtered search results back to the client.
Preferably, the device further includes: a first probability calculation module, used for calculating, when the webpages in the search results are ranked, the probability of the feature information of the object described in each webpage's webpage information within the feature information distribution of the category to which the object belongs;
and a first sorting module, used for ranking the webpages in the search results in descending order of that probability.
As can be seen from the above embodiment, webpage log information is obtained from the database, a statistical model is established for each category, and the feature information distribution of the described objects of each category is determined according to the statistical model. This distribution is used to identify whether the webpage information to be identified is false information, and can also be used to provide a ranking that gives consumers better choices.
In particular, the search engine can use the identified authenticity of webpage information to filter out false webpage information and feed the filtered search results back to the client, thereby improving search quality. The search engine can also rank the genuine webpage information in the search results in descending order of its probability in the distribution, thereby improving the user experience.
Embodiment five
Corresponding to the webpage information identification method of Embodiment two above, an embodiment of this application provides a webpage information identification device. Referring to Fig. 5, which is a schematic diagram of a webpage information identification device disclosed in Embodiment five of this application, the device includes: an acquisition module 501, an industry statistics module 502, a type statistics module 503, a second model-establishing module 504, a second judgment module 505, and a second determining module 506. Its internal structure and connections are described further below in conjunction with the working principle of the device.
The acquisition module 501 is used for obtaining webpage log information from a database, the webpage log information including any one or more of: the feature information of the described object in the release log, the feature information in the exposure log, the feature information in the click log, and the feature information in the transaction log;
The industry statistics module 502 is used for dividing the obtained webpage log information according to the category to which the described object belongs, and counting the webpage log information in each category;
The type statistics module 503 is used for dividing the webpage log information of each category according to the subcategory to which the described object belongs, and counting the webpage log information of each subcategory in each category;
The second model-establishing module 504 is used for establishing the statistical model of each subcategory in each category using the counted webpage log information of each subcategory in each category, and determining the feature information distribution of the described objects of each subcategory in each category according to the statistical model;
The second judgment module 505 is used for judging whether the feature information of the object described in the webpage information to be identified falls within the normal range of the feature information distribution of the subcategory to which the object belongs under its category;
The second determining module 506 is used for determining, when the judgment result of the second judgment module is yes, that the webpage information to be identified is genuine information, and otherwise determining that the webpage information to be identified is false information.
Preferably, the feature information of the described object includes at least title information, and the type statistics module specifically includes an analysis submodule and a statistics submodule. The analysis submodule is used for performing semantic analysis on the title information using a semantic analysis tool to obtain the subcategory to which each described object in each category belongs;
The statistics submodule is used for counting the webpage log information of the described objects that share the same subcategory in each category.
Preferably, the device further includes: a second feedback module, used for filtering out of the search results the webpages that contain false webpage information and feeding the filtered search results back to the client.
Preferably, the device further includes: a second probability calculation module and a second sorting module;
The second probability calculation module is used for calculating, when the webpages in the search results are ranked, the probability of the feature information of the object described in each webpage's webpage information within the feature information distribution of the subcategory to which the object belongs under its category;
The second sorting module is used for ranking the webpages in descending order of that probability.
As can be seen from the above embodiment, webpage log information is obtained from the database, a statistical model is established for each subcategory under each category, and the feature information distribution of the described objects of each subcategory under each category is determined according to the statistical model. This distribution is used to identify whether the webpage information to be identified is false information, which makes the identification more effective and more precise.
In particular, when the search engine uses the identified authenticity of product webpage information to feed genuine webpage information back to the client, it can also rank the search results according to the probability of the product webpage information in the distribution, which not only improves search quality but also gives users a better search experience.
It should be noted that those of ordinary skill in the art will understand that all or part of the processes in the above embodiment methods can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The webpage information identification method and device provided by the present invention have been described in detail above. Specific embodiments are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, the specific implementation and scope of application may change in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (18)

1. A method for identifying webpage information, characterized by comprising:
obtaining webpage log information from a database, the webpage log information comprising characteristic information of a described object in a publication log and any one or more of characteristic information in an exposure log, characteristic information in a click log, and characteristic information in a transaction log;
dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category;
establishing a statistical model for each category using the collected webpage log information in that category, and determining, from the statistical model of each category, the characteristic information distribution of the described objects in that category;
judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the category to which it belongs;
if so, determining that the webpage information to be identified is real information; otherwise, determining that the webpage information to be identified is false information.
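The steps of claim 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the record fields `category` and `price` are hypothetical, the only characteristic used is a numeric price, and a single Gaussian (mean and standard deviation) stands in for the per-category statistical model, with "normal range" taken to mean within `k` standard deviations of the mean.

```python
from collections import defaultdict
from statistics import mean, stdev

def group_by_category(log_records):
    """Divide log records by the category of the described object."""
    groups = defaultdict(list)
    for rec in log_records:
        groups[rec["category"]].append(rec["price"])
    return groups

def build_models(groups):
    """Per-category model: (mean, sample standard deviation) of the feature."""
    return {cat: (mean(vals), stdev(vals))
            for cat, vals in groups.items() if len(vals) >= 2}

def is_real(models, category, value, k=2.0):
    """True if the value lies within k standard deviations of its category mean."""
    mu, sigma = models[category]
    return abs(value - mu) <= k * sigma

# Hypothetical log records, for illustration only.
logs = [
    {"category": "phones", "price": 2900.0},
    {"category": "phones", "price": 3000.0},
    {"category": "phones", "price": 3100.0},
    {"category": "phones", "price": 3050.0},
    {"category": "socks", "price": 9.0},
    {"category": "socks", "price": 11.0},
    {"category": "socks", "price": 10.0},
]
models = build_models(group_by_category(logs))

print(is_real(models, "phones", 2950.0))  # price typical for the category
print(is_real(models, "phones", 1.0))     # price far outside the category's range
```

Keeping one model per category is what lets the same price be judged plausible in one category and implausible in another.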
2. The method according to claim 1, characterized in that, when the statistical model is a Gaussian mixture model, establishing a statistical model for each category using the collected webpage log information of each category, and determining the characteristic information distribution of the described objects in each category from the statistical model, comprises:
analyzing the collected webpage log information of each category using the expectation-maximization (EM) algorithm, and establishing a Gaussian mixture model of the described objects in each category from the analysis result;
determining the characteristic information distribution of the described objects in each category from the Gaussian mixture model of the described objects in that category.
3. The method according to claim 2, characterized in that judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the category to which it belongs comprises:
calculating the two-standard-deviation range of the Gaussian mixture distribution from the Gaussian mixture model of the category to which the described object of the webpage information to be identified belongs;
judging whether the characteristic information of the object described in the webpage information to be identified is within the two-standard-deviation range; if so, the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the category to which it belongs; otherwise, the characteristic information of the object described in the webpage information to be identified is not within the normal range of the characteristic information distribution of the category to which it belongs.
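As a rough illustration of claims 2 and 3, the following Python sketch fits a one-dimensional Gaussian mixture with the EM algorithm and then applies a two-standard-deviation test. Several points are assumptions rather than anything the claims specify: the feature is a single number, the number of components `k` is fixed in advance, the initialisation is a simple spread over the sorted data, and a value is treated as normal if it lies within two standard deviations of the mean of at least one mixture component.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_em(data, k=2, iterations=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, variances)."""
    data = sorted(data)
    n = len(data)
    # Deterministic initialisation: means spread evenly over the sorted data.
    means = [data[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300  # guard against underflow to zero
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, min_var)
    return weights, means, variances

def in_normal_range(value, model):
    """True if value is within two standard deviations of any component mean."""
    _, means, variances = model
    return any(abs(value - m) <= 2.0 * math.sqrt(v)
               for m, v in zip(means, variances))

# Hypothetical prices for one category with two typical price bands.
prices = [95.0, 98.0, 100.0, 102.0, 105.0, 195.0, 198.0, 200.0, 202.0, 205.0]
model = fit_gmm_em(prices, k=2)

print(in_normal_range(101.0, model))  # inside the lower band
print(in_normal_range(150.0, model))  # between the two bands
```

A mixture is useful here because a category can have several typical price bands; a value lying between the bands is flagged even though it is close to the overall average.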
4. The method according to any one of claims 1 to 3, characterized in that, after webpage information is identified as false information, the method further comprises:
filtering out, from the search results, the webpages containing false webpage information, and feeding the filtered search results back to the client.
5. The method according to any one of claims 1 to 3, characterized in that, after the characteristic information distribution of each category is obtained, the method further comprises:
when ranking the webpages in the search results, calculating, for each webpage, the probability of the characteristic information of the object described in its webpage information within the characteristic information distribution of the category to which it belongs;
ranking the webpages in the search results in descending order of the probability.
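The ranking step of claim 5 can be illustrated with a short Python sketch. It assumes, purely for illustration, that each category already has a fitted mixture given as `(weight, mean, standard deviation)` triples, that the feature is a numeric `price`, and that the mixture's probability density at the feature value serves as the "probability" used for ordering. The model parameters and result records below are hypothetical.

```python
import math

def mixture_density(x, components):
    """Density at x of a Gaussian mixture given as (weight, mean, std) triples."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in components
    )

# Hypothetical, already-fitted model for one category.
category_models = {
    "phones": [(0.6, 3000.0, 200.0), (0.4, 5000.0, 300.0)],
}

def rank_results(results, models):
    """Order search results by descending density under their category model."""
    return sorted(
        results,
        key=lambda r: mixture_density(r["price"], models[r["category"]]),
        reverse=True,
    )

results = [
    {"url": "a", "category": "phones", "price": 10.0},
    {"url": "b", "category": "phones", "price": 3000.0},
    {"url": "c", "category": "phones", "price": 4000.0},
]
ranked = [r["url"] for r in rank_results(results, category_models)]
print(ranked)
```

Unlike the filtering of claim 4, this keeps every result and simply pushes implausible listings toward the end of the list.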
6. A method for identifying webpage information, characterized by comprising:
obtaining webpage log information from a database, the webpage log information comprising characteristic information of a described object in a publication log and any one or more of characteristic information in an exposure log, characteristic information in a click log, and characteristic information in a transaction log;
dividing the obtained webpage log information according to the category to which the described object belongs, and collecting statistics on the webpage log information in each category;
dividing the webpage log information of each category according to the subcategory to which the described object belongs, and collecting statistics on the webpage log information of each subcategory in each category;
establishing a statistical model for each subcategory in each category using the collected webpage log information of that subcategory, and determining, from the statistical model, the characteristic information distribution of the described objects of each subcategory in each category;
judging whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the subcategory to which it belongs under its category;
if so, determining that the webpage information to be identified is real information; otherwise, determining that the webpage information to be identified is false information.
7. The method according to claim 6, characterized in that the characteristic information of the described object comprises at least title information; and dividing the webpage log information of each category according to the subcategory to which the described object belongs, and collecting statistics on the webpage log information of each subcategory in each category, specifically comprises:
performing semantic analysis on the title information using a semantic analysis tool to obtain the subcategory to which each described object in each category belongs;
collecting statistics on the webpage log information of the described objects having the same subcategory in each category.
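The subcategory step of claim 7 might look like the following Python sketch. The claim does not name the semantic analysis tool, so a plain keyword lookup on the title stands in for it here; the keyword table, the record fields `category` and `title`, and the fallback subcategory `"other"` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical keyword table standing in for a real semantic analysis tool.
SUBCATEGORY_KEYWORDS = {
    "phones": {"case": "phone cases", "charger": "chargers"},
}

def infer_subcategory(category, title):
    """Return the first subcategory whose keyword occurs in the title."""
    lowered = title.lower()
    for keyword, subcategory in SUBCATEGORY_KEYWORDS.get(category, {}).items():
        if keyword in lowered:
            return subcategory
    return "other"

def group_by_subcategory(records):
    """Group log records by (category, inferred subcategory)."""
    groups = defaultdict(list)
    for rec in records:
        subcategory = infer_subcategory(rec["category"], rec["title"])
        groups[(rec["category"], subcategory)].append(rec)
    return groups

records = [
    {"category": "phones", "title": "Leather Case for smartphone"},
    {"category": "phones", "title": "Fast charger 20W"},
    {"category": "phones", "title": "Silicone case, black"},
    {"category": "phones", "title": "Brand-new smartphone"},
]
groups = group_by_subcategory(records)
print({key: len(value) for key, value in groups.items()})
```

Each resulting group can then be given its own statistical model, so that, for example, phone cases are not judged against the price distribution of the phones themselves.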
8. The method according to claim 6 or 7, characterized in that, after webpage information is identified as false information, the method further comprises:
filtering out, from the search results, the webpages containing false webpage information, and feeding the filtered search results back to the client.
9. The method according to claim 6 or 7, characterized in that, after the characteristic information distribution of each subcategory in each category is obtained, the method further comprises:
when ranking the webpages in the search results, calculating, for each webpage, the probability of the characteristic information of the object described in its webpage information within the characteristic information distribution of the subcategory to which it belongs under its category;
ranking the webpages in descending order of the probability.
10. A device for identifying webpage information, characterized by comprising:
an acquisition module, configured to obtain webpage log information from a database, the webpage log information comprising characteristic information of a described object in a publication log and any one or more of characteristic information in an exposure log, characteristic information in a click log, and characteristic information in a transaction log;
a statistics module, configured to divide the obtained webpage log information according to the category to which the described object belongs, and to collect statistics on the webpage log information in each category;
a first model establishing module, configured to establish a statistical model for each category using the collected webpage log information in that category, and to determine, from the statistical model of each category, the characteristic information distribution of the described objects in that category;
a first judgment module, configured to judge whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the category to which it belongs;
a first determining module, configured to determine that the webpage information to be identified is real information when the result of the first judgment module is yes, and otherwise to determine that the webpage information to be identified is false information.
11. The device according to claim 10, characterized in that, when the statistical model is a Gaussian mixture model, the first model establishing module comprises:
a first analysis submodule, configured to analyze the collected webpage log information of each category using the expectation-maximization (EM) algorithm, and to establish a Gaussian mixture model for each category from the analysis result;
a first determination submodule, configured to determine the characteristic information distribution of each category from the Gaussian mixture model of that category.
12. The device according to claim 11, characterized in that the first judgment module comprises:
a first calculation submodule, configured to calculate the two-standard-deviation range of the Gaussian mixture distribution from the Gaussian mixture model of the category to which the described object of the webpage information to be identified belongs;
a first judgment submodule, configured to judge whether the characteristic information of the object described in the webpage information to be identified is within the two-standard-deviation range; if so, the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the category to which it belongs; otherwise, the characteristic information of the object described in the webpage information to be identified is not within the normal range of the characteristic information distribution of the category to which it belongs.
13. The device according to any one of claims 10 to 12, characterized by further comprising:
a first feedback module, configured to filter out, from the search results, the webpages containing false webpage information, and to feed the filtered search results back to the client.
14. The device according to any one of claims 10 to 12, characterized by further comprising:
a first probability calculation module, configured to calculate, when the webpages in the search results are ranked, the probability of the characteristic information of the object described in the webpage information of each webpage within the characteristic information distribution of the category to which it belongs;
a first sorting module, configured to rank the webpages in the search results in descending order of the probability.
15. A device for identifying webpage information, characterized by comprising:
an acquisition module, configured to obtain webpage log information from a database, the webpage log information comprising characteristic information of a described object in a publication log and any one or more of characteristic information in an exposure log, characteristic information in a click log, and characteristic information in a transaction log;
an industry statistics module, configured to divide the obtained webpage log information according to the category to which the described object belongs, and to collect statistics on the webpage log information in each category;
a type statistics module, configured to divide the webpage log information of each category according to the subcategory to which the described object belongs, and to collect statistics on the webpage log information of each subcategory in each category;
a second model establishing module, configured to establish a statistical model for each subcategory in each category using the collected webpage log information of that subcategory, and to determine, from the statistical model, the characteristic information distribution of the described objects of each subcategory in each category;
a second judgment module, configured to judge whether the characteristic information of the object described in the webpage information to be identified is within the normal range of the characteristic information distribution of the subcategory to which it belongs under its category;
a second determining module, configured to determine that the webpage information to be identified is real information when the judgment result of the second judgment module is yes, and otherwise to determine that the webpage information to be identified is false information.
16. The device according to claim 15, characterized in that the characteristic information of the described object comprises at least title information; and the type statistics module specifically comprises:
an analysis submodule, configured to perform semantic analysis on the title information using a semantic analysis tool to obtain the subcategory to which each described object in each category belongs;
a statistics submodule, configured to collect statistics on the webpage log information of the described objects having the same subcategory in each category.
17. The device according to claim 15 or 16, characterized by further comprising:
a second feedback module, configured to filter out, from the search results, the webpages containing false webpage information, and to feed the filtered search results back to the client.
18. The device according to claim 15 or 16, characterized by further comprising:
a second probability calculation module, configured to calculate, when the webpages in the search results are ranked, the probability of the characteristic information of the object described in the webpage information of each webpage within the characteristic information distribution of the subcategory to which it belongs under its category;
a second sorting module, configured to rank the webpages in descending order of the probability.
CN201310084318.8A 2013-03-15 2013-03-15 A kind of recognition methods of webpage information and device Active CN104050198B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310084318.8A CN104050198B (en) 2013-03-15 2013-03-15 A kind of recognition methods of webpage information and device
HK15101896.6A HK1201360A1 (en) 2013-03-15 2015-02-25 Method for recognizing webpage information and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310084318.8A CN104050198B (en) 2013-03-15 2013-03-15 A kind of recognition methods of webpage information and device

Publications (2)

Publication Number Publication Date
CN104050198A CN104050198A (en) 2014-09-17
CN104050198B true CN104050198B (en) 2018-08-24

Family

ID=51503049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310084318.8A Active CN104050198B (en) 2013-03-15 2013-03-15 A kind of recognition methods of webpage information and device

Country Status (2)

Country Link
CN (1) CN104050198B (en)
HK (1) HK1201360A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201848A (en) * 2016-06-30 2016-12-07 北京奇虎科技有限公司 The log processing method of a kind of real-time calculating platform and device
CN108600113B (en) * 2018-04-12 2022-05-31 北京五八信息技术有限公司 Preliminary auditing method and device for data to be issued and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093510A (en) * 2007-07-25 2007-12-26 北京搜狗科技发展有限公司 Anti cheating method and system for aiming at cheat on web page
CN102316081A (en) * 2010-06-30 2012-01-11 北京启明星辰信息技术股份有限公司 Method and device for identifying similar webpage
CN102890681A (en) * 2011-07-20 2013-01-23 阿里巴巴集团控股有限公司 Method and system for generating webpage structure template

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7596749B2 (en) * 2005-09-26 2009-09-29 Ricoh Company Limited Method and system for script processing in script implementation of HTTP to obtain information from devices

Also Published As

Publication number Publication date
HK1201360A1 (en) 2015-08-28
CN104050198A (en) 2014-09-17

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 1201360

Country of ref document: HK

GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211110

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou, Zhejiang

Patentee after: Alibaba (China) Network Technology Co., Ltd

Address before: P.O. Box 847, 4th floor, capital building, Grand Cayman, British Cayman Islands

Patentee before: Alibaba Group Holdings Limited
