CN107609106B - Similar article searching method, device, equipment and storage medium - Google Patents

Publication number
CN107609106B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710817664.0A
Other languages
Chinese (zh)
Other versions
CN107609106A (en)
Inventor
罗欢
权圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN201710817664.0A
Publication of CN107609106A
Application granted
Publication of CN107609106B
Legal status: Active

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for searching similar articles, which comprises the following steps: receiving a searching request aiming at similar articles of a target article, searching the similar articles of the target article by using the title of the target article in a title searching mode to obtain a first similar article set of the target article, searching the similar articles of the target article by using the content of the target article in a content searching mode to obtain a second similar article set of the target article, and combining the first similar article set and the second similar article set to obtain a similar article result of the target article. By applying the technical scheme provided by the embodiment of the invention, the similar articles of the target article are searched by respectively adopting proper searching modes for the title and the content of the target article, so that the searching accuracy rate of the similar articles aiming at the target article can be improved, and the searching speed is increased. The invention also discloses a device, equipment and a storage medium for searching similar articles, and the device, the equipment and the storage medium have corresponding technical effects.

Description

Similar article searching method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of computer application, in particular to a method, a device, equipment and a storage medium for searching similar articles.
Background
With the development of computer application technology, written articles published on the internet spread ever faster. For example, a large number of news articles are generated every day; people can read and forward them over the internet, and the more a news article is forwarded, the greater its influence. However, people no longer simply forward news articles: they may change the title to better attract readers' attention, or add some insights of their own to help readers understand. For these reasons, article similarity can no longer be judged by an exact match of titles or contents alone; similarity calculation methods are needed to raise the coverage of similarity detection and thereby better reflect the influence of an article.
In summary, how to effectively solve the problem of searching similar articles is a technical problem that needs to be solved urgently by those skilled in the art.
Disclosure of Invention
In order to solve the technical problems, the invention provides the following technical scheme:
a similar article searching method comprises the following steps:
receiving a search request aiming at similar articles of a target article;
searching similar articles of the target article by using the title of the target article in a title searching mode to obtain a first similar article set of the target article;
searching similar articles of the target article by using the content of the target article in a content searching mode to obtain a second similar article set of the target article;
and merging the first similar article set and the second similar article set to obtain a similar article result of the target article.
In a specific embodiment of the present invention, after receiving the search request for similar articles of the target article, and before searching for the similar articles of the target article by using the title of the target article in a title search manner to obtain the first similar article set of the target article, the method further includes:
performing a first preprocessing operation on the title of the target article to obtain the title of the target article after the first preprocessing operation.
In a specific embodiment of the present invention, the searching for the similar articles of the target article by using the title of the target article in a title searching manner to obtain the first similar article set of the target article includes:
searching the title of the target article through a search engine to search for a similar article of the target article, and obtaining a third similar article set of the target article;
and extracting, from the third similar article set, the articles whose title-length matching degree with the title of the target article is greater than a preset first threshold value, to obtain a first similar article set of the target article.
In a specific embodiment of the present invention, after receiving the search request for similar articles of the target article, and before searching for the similar articles of the target article by using the content of the target article in a content search manner to obtain the second similar article set of the target article, the method further includes:
performing a second preprocessing operation on the content of the target article to obtain the content of the target article after the second preprocessing operation.
In a specific embodiment of the present invention, the searching for the similar article of the target article by using the content of the target article in a content searching manner to obtain the second similar article set of the target article includes:
calculating a hash value corresponding to the content of the target article through a hash algorithm;
determining available sub-portions of the hash value;
searching the available sub-parts by utilizing a search engine to search for similar articles of the target article, and obtaining a fourth similar article set of the target article;
calculating a hamming distance between the available sub-portion of the target article and an available sub-portion corresponding to each article in the fourth set of similar articles, respectively;
and extracting the article with the Hamming distance smaller than a preset second threshold value to obtain a second similar article set of the target article.
A similar article searching apparatus, comprising:
the request receiving module is used for receiving a searching request of similar articles aiming at the target article;
a first set obtaining module, configured to search, in a title search manner, similar articles of the target article by using a title of the target article, and obtain a first similar article set of the target article;
a second set obtaining module, configured to search, in a content search manner, similar articles of the target article by using the content of the target article, and obtain a second similar article set of the target article;
and the result obtaining module is used for merging the first similar article set and the second similar article set to obtain a similar article result of the target article.
In an embodiment of the present invention, the apparatus further includes a first preprocessing module, configured to:
after the search request for similar articles of the target article is received, and before the similar articles of the target article are searched by using the title of the target article in a title search mode to obtain the first similar article set of the target article, perform a first preprocessing operation on the title of the target article, so as to obtain the title of the target article after the first preprocessing operation.
In an embodiment of the present invention, the first set obtaining module is specifically configured to:
searching the title of the target article through a search engine to search for a similar article of the target article, and obtaining a third similar article set of the target article;
and extracting, from the third similar article set, the articles whose title-length matching degree with the title of the target article is greater than a preset first threshold value, to obtain a first similar article set of the target article.
A similar article finding device comprising:
a memory for storing a computer program;
a processor for implementing the steps of the similar article search method as described above when executing the computer program.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the similar article searching method as described above.
By applying the technical scheme provided by the embodiment of the invention, the searching request for the similar article of the target article is received, the similar article of the target article is searched by using the title of the target article in a title searching mode to obtain the first similar article set of the target article, the similar article of the target article is searched by using the content of the target article in a content searching mode to obtain the second similar article set of the target article, and the first similar article set and the second similar article set are combined to obtain the similar article result of the target article. The similar articles of the target article are searched by respectively adopting proper searching modes for the title and the content of the target article, so that the searching accuracy rate of the similar articles of the target article can be improved, and the searching speed is increased.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart illustrating an implementation of a method for searching similar articles according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a similar article searching apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a similar article searching device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, which is a flowchart illustrating an implementation of a method for searching for a similar article according to an embodiment of the present invention, the method may include the following steps:
S101: A search request for similar articles of the target article is received.
In the embodiment of the invention, the whole process of searching for similar articles of the target article can be controlled by one search server. In practical applications, a user may need to search for similar articles of a target article, for example when the user wants to know how many times the target article has been forwarded. In this case, the user may send a search request for similar articles of the target article to the search server, and when the search server receives the search request, it may proceed to step S102.
The target article may be any article for which a similar article search is to be performed.
S102: Search for similar articles of the target article by using the title of the target article in a title search mode, to obtain a first similar article set of the target article.
After the search server receives the search request for similar articles of the target article sent by the user, it searches for similar articles by using the title of the target article in a title search mode. Because the title of the target article is a relatively short text, the title search mode is one suited to searching short texts. The result is the first similar article set of the target article.
In one embodiment of the present invention, step S102 may include the following steps:
Step one: searching the title of the target article through a search engine to find similar articles of the target article, and obtaining a third similar article set of the target article;
Step two: extracting, from the third similar article set, the articles whose title-length matching degree with the title of the target article is greater than a preset first threshold value, to obtain a first similar article set of the target article.
After receiving the search request for similar articles of the target article sent by the user, the search server may search the title of the target article through a search engine, for example the open-source search engine ES, to find similar articles, thereby obtaining a third similar article set of the target article. For similarity calculation on long texts a search engine suffers from excessively large word vectors, but it is well suited to similarity calculation on short texts.
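As a rough illustration of step one, a title lookup against a full-text search engine could be expressed as a `match` query body in the Elasticsearch query DSL. This is only a sketch: the field name `title`, the result size, and the helper name `build_title_query` are assumptions made for the example and do not come from the patent.

```python
import json


def build_title_query(title: str, size: int = 50) -> dict:
    """Build a full-text query body that matches the target title.

    The field name "title" and the default size are hypothetical; a real
    index would use whatever mapping it was created with.
    """
    return {
        "size": size,                          # maximum number of candidates to return
        "query": {"match": {"title": title}},  # analyzed full-text match on the title
    }


# The resulting dict can be serialized and sent as the body of a search request.
body = json.dumps(build_title_query("Example headline about interest rates"))
```

The candidates returned for such a query would play the role of the third similar article set, which is then narrowed by the title-length filter.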
After the third similar article set of the target article is obtained, dissimilar articles can be filtered out according to the difference in title length. For example, articles whose title-length matching degree with the title of the target article is greater than a preset first threshold value can be extracted from the third similar article set and taken as similar articles, thereby obtaining the first similar article set of the target article.
It should be noted that the first threshold may be set and adjusted according to actual situations, for example, the first threshold may be set to 80%, which is not limited in this embodiment of the present invention.
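The text does not define how the title-length matching degree is computed. One plausible reading, used here purely as an assumption, is the ratio of the shorter title length to the longer one. The sketch below applies that ratio with the 80% example threshold; the function names are hypothetical.

```python
def title_length_match(target_title: str, candidate_title: str) -> float:
    """Ratio of the shorter title length to the longer one (assumed definition)."""
    longer = max(len(target_title), len(candidate_title))
    if longer == 0:
        return 1.0  # two empty titles are treated as a perfect length match
    return min(len(target_title), len(candidate_title)) / longer


def filter_by_title_length(target_title, candidate_titles, first_threshold=0.8):
    """Keep candidates whose length matching degree is strictly above the threshold."""
    return [
        title
        for title in candidate_titles
        if title_length_match(target_title, title) > first_threshold
    ]
```

With this definition, a candidate title much shorter or much longer than the target title is dropped even if the search engine ranked it highly.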
It should be further noted that the title search mode is not limited to searching the title of the target article through a search engine and then comparing the title-length matching degree of the results. Other approaches may also be used, such as similarity algorithms that compare articles directly in pairs, for example cosine similarity, which is not limited in this embodiment of the present invention. Such pairwise algorithms are very slow when there are many articles or when the articles are long, but they work well for similarity calculation on short texts.
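For reference, a direct pairwise cosine comparison of two short titles might look like the following sketch. Representing each title as a bag of character bigrams is an assumption chosen for the example; the patent does not specify the vectorization.

```python
import math
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Count overlapping two-character substrings of the text."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bigram count vectors of a and b."""
    va, vb = char_bigrams(a), char_bigrams(b)
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0  # texts shorter than 2 characters score 0.0
```

Comparing every pair of articles this way costs time proportional to the square of the number of articles, which is consistent with the remark that the approach suits short texts rather than large collections of long ones.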
S103: Search for similar articles of the target article by using the content of the target article in a content search mode, to obtain a second similar article set of the target article.
After the search server receives the search request for similar articles of the target article sent by the user, it searches for similar articles by using the content of the target article in a content search mode. Because the content of the target article is a relatively long text, the content search mode is one suited to long texts. The result is the second similar article set of the target article.
In one embodiment of the present invention, step S103 may include the following steps:
Step one: calculating a hash value corresponding to the content of the target article through a hash algorithm;
Step two: determining available sub-portions of the hash value;
Step three: searching the available sub-portions by utilizing a search engine to find similar articles of the target article, and obtaining a fourth similar article set of the target article;
Step four: calculating the Hamming distance between the available sub-portion of the target article and the available sub-portion corresponding to each article in the fourth similar article set respectively;
Step five: extracting the articles with a Hamming distance smaller than a preset second threshold value to obtain a second similar article set of the target article.
For convenience of description, the above five steps may be combined for illustration.
After receiving the search request for similar articles of the target article sent by the user, the search server may calculate a hash value corresponding to the content of the target article through a hash algorithm, for example a fast similarity calculation method for long texts such as SimHash. The hash value may be divided into several sub-portions, and the available sub-portions of the hash value determined. The available sub-portions may then be searched using a search engine to find similar articles of the target article. For example, the hash value may be divided into 4 sub-portions, any one of which may be determined as an available sub-portion, and the available sub-portions may be searched using ES, thereby obtaining a fourth similar article set of the target article.
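A minimal SimHash sketch, followed by the split into 4 sub-portions, is shown below. Several choices here are assumptions not stated in the patent: a 64-bit fingerprint, whitespace tokenization (Chinese text would need a word segmenter), term frequency as the token weight, and MD5 as the per-token hash.

```python
import hashlib
from collections import Counter
from typing import List


def simhash(text: str, bits: int = 64) -> int:
    """Compute a SimHash fingerprint from whitespace-separated tokens."""
    weights = [0] * bits
    for token, count in Counter(text.split()).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Each token votes +count or -count on every bit position.
            weights[i] += count if (token_hash >> i) & 1 else -count
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def split_fingerprint(fingerprint: int, parts: int = 4, bits: int = 64) -> List[int]:
    """Split the fingerprint into equal-width sub-portions, lowest bits first."""
    width = bits // parts
    mask = (1 << width) - 1
    return [(fingerprint >> (width * i)) & mask for i in range(parts)]
```

Each sub-portion can be indexed as an exact-match key, so that articles sharing at least one sub-portion with the target article are retrieved as candidates before the Hamming distance check.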
After the fourth similar article set of the target article is obtained, the dissimilar articles can be filtered through the Hamming distance, that is, the Hamming distance between the available subsection of the target article and the available subsection corresponding to each article in the fourth similar article set is respectively calculated, the article with the Hamming distance smaller than the preset second threshold value is extracted, and the article with the Hamming distance smaller than the preset second threshold value is used as the similar article, so that the second similar article set of the target article is obtained.
It should be noted that the second threshold may be set and adjusted according to actual situations, for example, the second threshold may be set to 4, which is not limited in this embodiment of the present invention.
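The Hamming distance filter of steps four and five can be sketched as follows, using the example second threshold of 4. The functions operate on integer hash values of any width; the mapping from article identifiers to hash values is a hypothetical data layout chosen for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two non-negative integers differ."""
    return bin(a ^ b).count("1")


def filter_by_hamming(target_value: int, candidates: dict, second_threshold: int = 4) -> list:
    """Return the ids of candidates whose distance to the target is below the threshold.

    `candidates` maps an article id to the hash value stored for that article.
    """
    return [
        article_id
        for article_id, value in candidates.items()
        if hamming_distance(target_value, value) < second_threshold
    ]
```

Because the comparison is a strict "smaller than", a candidate at distance exactly 4 is excluded under the example threshold.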
It should be further noted that, because an available sub-portion of the hash value is a relatively short text, similar articles of the target article may also be found in ways other than a search engine, such as similarity algorithms that compare directly in pairs, for example cosine similarity, which is not limited in this embodiment of the present invention.
S104: Merge the first similar article set and the second similar article set to obtain the similar article result of the target article.
After a first similar article set of the target article is obtained in a title searching mode and a second similar article set of the target article is obtained in a content searching mode, the first similar article set and the second similar article set can be combined, and therefore a similar article result of the target article is obtained.
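The merge can be sketched as an order-preserving union. Removing articles that appear in both sets is an assumption here, since the text only says the two sets are combined; the identifiers are hypothetical.

```python
def merge_similar_sets(first_set, second_set):
    """Combine two collections of article ids, keeping each id once in first-seen order."""
    merged = []
    seen = set()
    for article_id in list(first_set) + list(second_set):
        if article_id not in seen:
            seen.add(article_id)
            merged.append(article_id)
    return merged
```

For example, merging `["a", "b"]` from the title search with `["b", "c"]` from the content search yields `["a", "b", "c"]`.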
It should be noted that, in the embodiment of the present invention, the execution sequence of step S102 and step S103 is not limited, and may be executed sequentially or simultaneously.
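Since the two searches are independent, simultaneous execution could be arranged with a small thread pool, as in the sketch below. The callables `title_search` and `content_search` are hypothetical stand-ins for steps S102 and S103, and the dictionary layout of `target` is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def find_similar(target: dict, title_search, content_search) -> set:
    """Run the title search and the content search concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(title_search, target["title"])
        second_future = pool.submit(content_search, target["content"])
        first_set = first_future.result()    # blocks until the title search finishes
        second_set = second_future.result()  # blocks until the content search finishes
    return set(first_set) | set(second_set)
```

Threads are a reasonable fit when both searches mostly wait on a remote search engine; running the two steps one after the other gives the same result.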
By applying the method provided by the embodiment of the invention, the search request for the similar article of the target article is received, the similar article of the target article is searched by using the title of the target article in a title search mode to obtain the first similar article set of the target article, the similar article of the target article is searched by using the content of the target article in a content search mode to obtain the second similar article set of the target article, and the first similar article set and the second similar article set are combined to obtain the similar article result of the target article. The similar articles of the target article are searched by respectively adopting proper searching modes for the title and the content of the target article, so that the searching accuracy rate of the similar articles of the target article can be improved, and the searching speed is increased.
In an embodiment of the present invention, after step S101 and before step S102, the method may further include the steps of:
and performing first preprocessing operation on the title of the target article to obtain the title of the target article after the first preprocessing operation.
After receiving a search request for a similar article of a target article sent by a user, a search server may perform a first preprocessing operation on a title of the target article, for example, may remove tag information such as websites and categories from the title of the target article, so as to obtain the title of the target article after the first preprocessing operation, so as to reduce interference of the tag information such as the websites and the categories on the search for the similar article of the target article, and improve the accuracy of the search.
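A possible form of the first preprocessing operation is sketched below. The tag formats it strips are assumptions: a category in square or corner brackets, and a trailing website name separated by an underscore or a vertical bar. Real titles would need patterns tuned to the sources being crawled.

```python
import re

# Bracketed labels such as "[Finance]" or "【财经】" (assumed category format).
_BRACKET_TAG = re.compile(r"[\[【][^\]】]*[\]】]")
# A trailing "_SiteName" or "| SiteName" segment (assumed website format).
_SITE_SUFFIX = re.compile(r"\s*[_|]\s*[^_|]+$")


def preprocess_title(title: str) -> str:
    """Strip bracketed category labels and a trailing site name from a title."""
    title = _BRACKET_TAG.sub("", title)
    title = _SITE_SUFFIX.sub("", title)
    return title.strip()
```

Under these assumptions, `"[Finance] Rates rise again | Example News"` is reduced to `"Rates rise again"`.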
In an embodiment of the present invention, after step S101 and before step S103, the method may further include the steps of:
and performing second preprocessing operation on the content of the target article to obtain the content of the target article after the second preprocessing operation.
After receiving the search request for similar articles of the target article sent by the user, the search server may perform a second preprocessing operation on the content of the target article. For example, it may remove from the content of the target article the opening and closing signatures that some websites add uniformly to their articles, so as to obtain the content of the target article after the second preprocessing operation. This reduces the interference of such signatures with the search for similar articles of the target article and improves the accuracy of the search.
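The second preprocessing operation could be sketched as stripping known signature strings from the start and end of the content. The lists of signatures are hypothetical inputs that would have to be maintained per website; the patent does not say how signatures are recognized.

```python
def preprocess_content(content: str, opening_signatures, closing_signatures) -> str:
    """Remove at most one known opening signature and one known closing signature."""
    text = content.strip()
    for signature in opening_signatures:
        if signature and text.startswith(signature):
            text = text[len(signature):].lstrip()
            break
    for signature in closing_signatures:
        if signature and text.endswith(signature):
            text = text[: -len(signature)].rstrip()
            break
    return text
```

Removing these fixed strings before hashing keeps a site-wide header or footer from influencing the content fingerprint.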
Corresponding to the above method embodiments, the embodiments of the present invention further provide a similar article searching apparatus, and the similar article searching apparatus described below and the similar article searching method described above may be referred to correspondingly.
Referring to fig. 2, the apparatus may include the following modules:
a request receiving module 201, configured to receive a search request for a similar article of a target article;
a first set obtaining module 202, configured to search, in a title search manner, similar articles of a target article by using a title of the target article, and obtain a first similar article set of the target article;
the second set obtaining module 203 is configured to search for similar articles of the target article by using the content of the target article in a content search manner, and obtain a second similar article set of the target article;
and the result obtaining module 204 is configured to merge the first similar article set and the second similar article set to obtain a similar article result of the target article.
The device provided by the embodiment of the invention receives a searching request for the similar article of the target article, searches the similar article of the target article by using the title of the target article in a title searching mode to obtain a first similar article set of the target article, searches the similar article of the target article by using the content of the target article in a content searching mode to obtain a second similar article set of the target article, and combines the first similar article set and the second similar article set to obtain the similar article result of the target article. The similar articles of the target article are searched by respectively adopting proper searching modes for the title and the content of the target article, so that the searching accuracy rate of the similar articles of the target article can be improved, and the searching speed is increased.
In an embodiment of the present invention, the apparatus further includes a first preprocessing module, configured to:
after the search request for similar articles of the target article is received, and before the similar articles of the target article are searched by using the title of the target article in a title search mode to obtain the first similar article set of the target article, perform a first preprocessing operation on the title of the target article to obtain the title of the target article after the first preprocessing operation.
In an embodiment of the present invention, the first set obtaining module 202 is specifically configured to:
searching the title of the target article through a search engine to find a similar article of the target article, and obtaining a third similar article set of the target article;
and extracting, from the third similar article set, the articles whose title-length matching degree with the title of the target article is greater than a preset first threshold value, to obtain a first similar article set of the target article.
In an embodiment of the present invention, the apparatus further includes a second preprocessing module, configured to:
after the search request for similar articles of the target article is received, and before the similar articles of the target article are searched by using the content of the target article in a content search mode to obtain the second similar article set of the target article, perform a second preprocessing operation on the content of the target article to obtain the content of the target article after the second preprocessing operation.
In an embodiment of the present invention, the second set obtaining module 203 is specifically configured to:
calculating a hash value corresponding to the content of the target article through a hash algorithm;
determining available sub-portions of the hash value;
searching the available sub-portions by utilizing a search engine to find similar articles of the target article, and obtaining a fourth similar article set of the target article;
calculating the Hamming distance between the available sub-part of the target article and the available sub-part corresponding to each article in the fourth similar article set respectively;
and extracting the article with the Hamming distance smaller than a preset second threshold value to obtain a second similar article set of the target article.
Corresponding to the above method embodiment, the embodiment of the present invention further provides a similar article searching device, and the similar article searching device described below and the similar article searching method described above may be referred to in correspondence.
Referring to fig. 3, the device may include:
a memory 301 for storing a computer program;
a processor 302 for implementing the steps of the method for finding similar articles in the method embodiments when executing the computer program.
Corresponding to the above method embodiment, an embodiment of the present invention further provides a computer-readable storage medium, and a computer-readable storage medium described below and a similar article searching method described above may be referred to correspondingly.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method for finding similar articles in the method embodiments.
The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The principle and the implementation of the present invention are explained in the present application by using specific examples, and the above description of the embodiments is only used to help understanding the technical solution and the core idea of the present invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (9)

1. A method for searching for similar articles, characterized by comprising the following steps:
receiving a search request aiming at similar articles of a target article;
searching similar articles of the target article by using the title of the target article in a title searching mode to obtain a first similar article set of the target article;
calculating a hash value corresponding to the content of the target article through a hash algorithm;
determining available sub-portions of the hash value;
searching for similar articles of the target article by querying a search engine with each available sub-portion, to obtain a fourth similar article set of the target article;
calculating a Hamming distance between the available sub-portion of the target article and the corresponding available sub-portion of each article in the fourth similar article set, respectively;
extracting the articles whose Hamming distance is smaller than a preset second threshold, to obtain a second similar article set of the target article;
and merging the first similar article set and the second similar article set to obtain a similar article result of the target article.
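The content branch of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claim names no particular hash algorithm, so a 64-bit SimHash built on MD5 token hashes is swapped in; the hash is assumed to be split into four 16-bit sub-portions; a sub-portion is assumed to be "available" when it is non-zero; and an in-memory dictionary stands in for the search engine. All function names and constants are hypothetical.

```python
import hashlib
import re
from typing import Dict, List, Set, Tuple

HASH_BITS = 64                      # assumed hash width
NUM_PARTS = 4                       # assumed number of sub-portions
PART_BITS = HASH_BITS // NUM_PARTS  # 16 bits per sub-portion


def simhash(text: str) -> int:
    """Illustrative 64-bit SimHash over word tokens (assumed hash algorithm)."""
    weights = [0] * HASH_BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        for bit in range(HASH_BITS):
            weights[bit] += 1 if (h >> bit) & 1 else -1
    return sum(1 << bit for bit in range(HASH_BITS) if weights[bit] > 0)


def sub_portions(value: int) -> List[int]:
    """Split a hash value into NUM_PARTS chunks, lowest bits first."""
    mask = (1 << PART_BITS) - 1
    return [(value >> (i * PART_BITS)) & mask for i in range(NUM_PARTS)]


def available_sub_portions(value: int) -> Dict[int, int]:
    """Assumption: a sub-portion is 'available' when it is non-zero."""
    return {pos: part for pos, part in enumerate(sub_portions(value)) if part != 0}


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integers."""
    return bin(a ^ b).count("1")


def build_index(hashes: Dict[str, int]) -> Dict[Tuple[int, int], Set[str]]:
    """In-memory stand-in for the search engine: (position, sub-portion) -> article ids."""
    index: Dict[Tuple[int, int], Set[str]] = {}
    for article_id, value in hashes.items():
        for pos, part in available_sub_portions(value).items():
            index.setdefault((pos, part), set()).add(article_id)
    return index


def content_search(target_hash: int, hashes: Dict[str, int],
                   index: Dict[Tuple[int, int], Set[str]], threshold: int) -> Set[str]:
    """Return the second similar article set for a target hash value."""
    target_parts = available_sub_portions(target_hash)
    # Fourth set: every article sharing at least one available sub-portion.
    fourth: Set[str] = set()
    for pos, part in target_parts.items():
        fourth |= index.get((pos, part), set())
    # Second set: candidates whose summed Hamming distance is below the threshold.
    second: Set[str] = set()
    for article_id in fourth:
        cand_parts = sub_portions(hashes[article_id])
        distance = sum(hamming(part, cand_parts[pos]) for pos, part in target_parts.items())
        if distance < threshold:
            second.add(article_id)
    return second


def merge_results(first: Set[str], second: Set[str]) -> Set[str]:
    """Merge the title-based and content-based sets into one result."""
    return first | second
```

In this sketch the index lookup keeps the candidate set small, and the Hamming distance check then discards candidates that share one sub-portion but differ widely elsewhere. A real deployment would presumably store the sub-portions as fields in the search engine rather than in a Python dictionary.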
2. The method of claim 1, wherein after receiving the search request for similar articles of the target article, and before searching for the similar articles of the target article by using the title of the target article in the title searching mode to obtain the first similar article set of the target article, the method further comprises:
performing a first preprocessing operation on the title of the target article to obtain the title of the target article after the first preprocessing operation.
3. The method according to claim 1 or 2, wherein the searching for the similar article of the target article by using the title of the target article in a title searching manner to obtain the first similar article set of the target article comprises:
searching for similar articles of the target article by querying a search engine with the title of the target article, to obtain a third similar article set of the target article;
and extracting, from the third similar article set, the articles whose degree of matching with the title length of the target article is greater than a preset first threshold, to obtain the first similar article set of the target article.
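The two-stage title search of claim 3 can be sketched as follows. The claim does not define how the degree of matching with the title length is computed, so this sketch assumes it is the ratio of the shorter title length to the longer one; the word-overlap lookup is only a stand-in for a real search engine, and the function names and the default threshold of 0.8 are hypothetical.

```python
from typing import Dict, Set


def title_candidates(target_title: str, titles: Dict[str, str]) -> Set[str]:
    """Stand-in for the search engine: ids whose title shares a word with the target."""
    target_words = set(target_title.lower().split())
    return {
        article_id
        for article_id, title in titles.items()
        if target_words & set(title.lower().split())
    }


def length_match(a: str, b: str) -> float:
    """Assumed matching degree: shorter title length divided by longer title length."""
    if not a or not b:
        return 0.0
    return min(len(a), len(b)) / max(len(a), len(b))


def title_search(target_title: str, titles: Dict[str, str],
                 first_threshold: float = 0.8) -> Set[str]:
    """Return the first similar article set for a target title."""
    # Third set: raw search-engine hits for the title.
    third = title_candidates(target_title, titles)
    # First set: hits whose title length is close enough to the target's.
    return {
        article_id
        for article_id in third
        if length_match(target_title, titles[article_id]) > first_threshold
    }
```

The length filter is a cheap way to drop hits that match a single word of the title but are clearly a different headline, such as a one-word title returned for a long query.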
4. The method of claim 1, wherein after receiving the search request for similar articles of the target article and before calculating the hash value corresponding to the content of the target article through the hash algorithm, the method further comprises:
performing a second preprocessing operation on the content of the target article to obtain the content of the target article after the second preprocessing operation.
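Claims 2 and 4 do not state what the first and second preprocessing operations consist of. As a rough illustration only, the sketch below assumes a typical text normalization (Unicode normalization, removal of HTML tags and punctuation, whitespace collapsing, lowercasing) and applies the same hypothetical `preprocess` function to both title and content; the actual operations may differ.

```python
import re
import unicodedata


def preprocess(text: str) -> str:
    """Hypothetical normalization applied before title search or content hashing."""
    text = unicodedata.normalize("NFKC", text)  # unify full-width and half-width forms
    text = re.sub(r"<[^>]+>", " ", text)        # drop HTML tags (assumed step)
    text = re.sub(r"[^\w\s]", " ", text)        # drop punctuation (assumed step)
    return re.sub(r"\s+", " ", text).strip().lower()
```

Normalizing before hashing matters because a content hash is sensitive to markup and punctuation noise: two copies of the same article that differ only in formatting should ideally produce the same or nearly the same hash value.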
5. A similar article searching apparatus, comprising:
a request receiving module, configured to receive a search request for similar articles of a target article;
a first set obtaining module, configured to search for similar articles of the target article by using the title of the target article in a title searching mode, to obtain a first similar article set of the target article;
a second set obtaining module, configured to: calculate a hash value corresponding to the content of the target article through a hash algorithm; determine available sub-portions of the hash value; search for similar articles of the target article by querying a search engine with each available sub-portion, to obtain a fourth similar article set of the target article; calculate a Hamming distance between the available sub-portion of the target article and the corresponding available sub-portion of each article in the fourth similar article set, respectively; and extract the articles whose Hamming distance is smaller than a preset second threshold, to obtain a second similar article set of the target article;
and a result obtaining module, configured to merge the first similar article set and the second similar article set to obtain a similar article result of the target article.
6. The apparatus of claim 5, further comprising a first preprocessing module configured to:
perform a first preprocessing operation on the title of the target article, after the search request for similar articles of the target article is received and before the similar articles of the target article are searched by using the title of the target article in the title searching mode to obtain the first similar article set of the target article, so as to obtain the title of the target article after the first preprocessing operation.
7. The apparatus according to claim 5 or 6, wherein the first set obtaining module is specifically configured to:
search for similar articles of the target article by querying a search engine with the title of the target article, to obtain a third similar article set of the target article;
and extract, from the third similar article set, the articles whose degree of matching with the title length of the target article is greater than a preset first threshold, to obtain the first similar article set of the target article.
8. A similar article search device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the similar article search method according to any one of claims 1 to 4 when executing the computer program.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the similar article searching method according to any one of claims 1 to 4.
CN201710817664.0A 2017-09-12 2017-09-12 Similar article searching method, device, equipment and storage medium Active CN107609106B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710817664.0A CN107609106B (en) 2017-09-12 2017-09-12 Similar article searching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710817664.0A CN107609106B (en) 2017-09-12 2017-09-12 Similar article searching method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN107609106A CN107609106A (en) 2018-01-19
CN107609106B true CN107609106B (en) 2020-10-30

Family

ID=61063567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710817664.0A Active CN107609106B (en) 2017-09-12 2017-09-12 Similar article searching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN107609106B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846117A (en) * 2018-06-26 2018-11-20 北京金堤科技有限公司 The duplicate removal screening technique and device of business news flash
CN109446301A (en) * 2018-09-18 2019-03-08 沈文策 A kind of lookup method and device of similar article
CN109615001B (en) * 2018-12-05 2020-03-10 上海恺英网络科技有限公司 Method and device for identifying similar articles
CN111046129A (en) * 2019-05-13 2020-04-21 国家计算机网络与信息安全管理中心 Public number information storage method and retrieval system based on text content characteristics
CN110363401B (en) * 2019-06-26 2022-05-03 北京百度网讯科技有限公司 Integrated viscosity evaluation method and device, computer equipment and storage medium
CN112084776B (en) * 2020-09-15 2023-11-10 腾讯科技(深圳)有限公司 Method, device, server and computer storage medium for detecting similar articles

Citations (1)

Publication number Priority date Publication date Assignee Title
CN101350032A (en) * 2008-09-23 2009-01-21 胡辉 Method for judging whether web page content is identical or not

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN103577462B (en) * 2012-08-02 2018-10-16 北京百度网讯科技有限公司 A kind of Document Classification Method and device
CN106202224B (en) * 2016-06-29 2022-01-07 北京百度网讯科技有限公司 Search processing method and device
CN106202057B (en) * 2016-08-30 2019-07-12 东软集团股份有限公司 The recognition methods of similar news information and device

Non-Patent Citations (1)

Title
Research on Similarity Search with Typical Locality-Sensitive Hashing; JasonDing1354; CSDN Blog, URL: https://blog.csdn.net/JasonDing1354/article/details/34451003; 20140625; Section 3.3 *

Also Published As

Publication number Publication date
CN107609106A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107609106B (en) Similar article searching method, device, equipment and storage medium
CN106570141B (en) Approximate repeated image detection method
US8452106B2 (en) Partition min-hash for partial-duplicate image determination
US20150220833A1 (en) Generating vector representations of documents
US8788925B1 (en) Authorized syndicated descriptions of linked web content displayed with links in user-generated content
US10216848B2 (en) Method and system for recommending cloud websites based on terminal access statistics
CN104866985B (en) The recognition methods of express delivery odd numbers, apparatus and system
EP3314461A1 (en) Learning entity and word embeddings for entity disambiguation
US10803380B2 (en) Generating vector representations of documents
CN108170650B (en) Text comparison method and text comparison device
US11100073B2 (en) Method and system for data assignment in a distributed system
CN107330079B (en) Method and device for presenting rumor splitting information based on artificial intelligence
JP6105599B2 (en) Search for information
CN104091164A (en) Face picture name recognition method and system
US20100036781A1 (en) Apparatus and method providing retrieval of illegal motion picture data
US10394838B2 (en) App store searching
CN108153728B (en) Keyword determination method and device
US20180276286A1 (en) Metadata Extraction and Management
CN115329048A (en) Statement retrieval method and device, electronic equipment and storage medium
CN108712486B (en) Workload proving method and device
CN109241360B (en) Matching method and device of combined character strings and electronic equipment
US8370390B1 (en) Method and apparatus for identifying near-duplicate documents
CN107633020B (en) Article similarity detection method and device
WO2016210203A1 (en) Learning entity and word embeddings for entity disambiguation
CN105069175A (en) Information retrieval method and server based on version control system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant