CN114943036A - Method and device for judging similar pushed articles, storage medium and electronic equipment - Google Patents


Info

Publication number
CN114943036A
Authority
CN
China
Prior art keywords
article
articles
candidate
historical
push
Legal status
Pending
Application number
CN202210658173.7A
Other languages
Chinese (zh)
Inventor
李国库
张鹏飞
Current Assignee
Yancheng Tianyanchawei Technology Co ltd
Original Assignee
Yancheng Jindi Technology Co Ltd
Application filed by Yancheng Jindi Technology Co Ltd
Priority to CN202210658173.7A
Publication of CN114943036A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for judging similar pushed articles, a storage medium and electronic equipment. The method comprises: acquiring the top N articles with the highest matching degree with a user as candidate articles; acquiring the user's historical pushed articles; calculating the article similarity between each candidate article and the historical pushed articles; and, in response to a comparison result that the article similarity is less than or equal to a preset threshold, removing candidate articles that duplicate the historical pushed articles. By calculating the article similarity between a candidate article and the historical pushed articles to decide whether they are similar, the method, device, storage medium and electronic equipment avoid repeatedly pushing similar articles to the same user and can reduce resource occupancy.

Description

Method and device for judging similar pushed articles, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of computers, and in particular to a method and a device for judging similar pushed articles, a storage medium and electronic equipment.
Background
At present there are many news media, and each of them reports on the hot topics that people are interested in. To increase user activity, personalized articles need to be pushed to different users at irregular intervals. However, reports from different media often share the same core content and differ only in expression, embellishment and the like; that is, the candidate article pool may contain similar articles (articles from different media may describe the same event). As a result, when articles are pushed to a user, similar articles are likely to be selected for the same user several times. It is therefore necessary to determine whether the currently selected article has already been pushed in past pushes, so as to avoid repeatedly pushing similar articles to the same user, which would degrade the user experience.
Disclosure of Invention
In view of the above, the present invention provides a method and a device for judging similar pushed articles, a storage medium and electronic equipment, so as to at least partially solve the above problem.
To solve the above problem, the present invention provides a method for judging similar pushed articles, comprising:
acquiring the top N articles with the highest matching degree with a user as candidate articles;
acquiring the user's historical pushed articles;
calculating the article similarity between each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles; and
in response to a comparison result that the article similarity is less than or equal to a preset threshold, removing candidate articles that duplicate the historical pushed articles.
Optionally, in the above method embodiments of the present invention, calculating the article similarity between each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles comprises:
segmenting the candidate articles and the historical pushed articles into words, respectively;
removing stop words from the segmented candidate articles and the segmented historical pushed articles to obtain candidate article keywords and historical pushed article keywords, respectively;
constructing a directed cyclic graph from the historical pushed article keywords;
obtaining a candidate article path based on the directed cyclic graph of the historical pushed articles and the candidate article keywords, so as to obtain a result vector; and
calculating the article similarity between the candidate articles and the historical pushed articles according to the result vector.
Optionally, in the above method embodiments of the present invention, removing stop words from the segmented candidate articles and the segmented historical pushed articles to obtain the candidate article keywords and the historical pushed article keywords, respectively, comprises:
presetting a stop word list; and
removing, according to the stop word list, the stop words from the segmented candidate articles and the segmented historical pushed articles to obtain the candidate article keywords and the historical pushed article keywords, respectively.
Optionally, in the above method embodiments of the present invention, constructing a directed cyclic graph from the historical pushed article keywords comprises:
collecting the historical pushed article keywords; and
constructing a directed cyclic graph with the historical pushed article keywords as nodes and the order in which those keywords occur as the direction of the edges between nodes.
Optionally, in the above method embodiments of the present invention, obtaining a candidate article path based on the directed cyclic graph of the historical pushed articles and the candidate article keywords, so as to obtain a result vector, comprises:
traversing each word of the candidate article keywords in the directed cyclic graph and determining whether a node identical to the word can be found;
if such a node is found, marking the current word with a first preset identifier, adding the identifier to the result vector, and searching for the next word of the candidate article keywords onward from the current node in the directed cyclic graph;
if no such node is found, marking the current word with a second preset identifier, adding the identifier to the result vector, and continuing to search for the current word onward from the current node in the directed cyclic graph; when the traversal along the out-degrees of the current node exceeds a preset step length without finding a node identical to the current word, searching for the next word of the candidate article keywords again in the whole directed cyclic graph; and
obtaining the final result vector once the traversal is complete.
Optionally, in the above method embodiments of the present invention, calculating the article similarity between the candidate articles and the historical pushed articles according to the result vector comprises:
calculating, based on the final result vector, the ratio of the number of first preset identifiers in the result vector to the length of the whole result vector.
Optionally, in the above method embodiments of the present invention, the method further comprises:
in response to a comparison result that the article similarity is greater than the preset threshold, determining that the candidate article is not similar to the historical pushed articles, and using the candidate article keywords to supplement the directed cyclic graph.
To solve the above problem, the present invention further provides a device for judging similar pushed articles, comprising:
a selection module, configured to acquire the top N articles with the highest matching degree with a user as candidate articles;
an acquisition module, configured to acquire historical pushed articles;
a similarity calculation module, configured to calculate the article similarity between each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles; and
a deduplication module, configured to remove, in response to a comparison result that the article similarity is less than or equal to a preset threshold, candidate articles that duplicate the historical pushed articles.
To solve the above problem, the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any embodiment of the method for judging similar pushed articles.
To solve the above problem, the present invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer-executable program and the processor runs the computer-executable program to implement any embodiment of the method for judging similar pushed articles.
With the method and device for judging similar pushed articles, the storage medium and the electronic equipment according to the invention, checking for similar articles effectively prevents similar articles from being repeatedly pushed to the same user, improves the user experience, and can reduce resource occupancy.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in more detail embodiments of the present invention with reference to the attached drawings. The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 is a schematic flowchart of a method for judging similar pushed articles according to an exemplary embodiment of the present invention;
Fig. 2 is a flowchart of article similarity calculation according to an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of stop word removal according to an exemplary embodiment of the present invention;
Fig. 4 is a flowchart of graph construction according to an exemplary embodiment of the present invention;
Fig. 5 is a schematic flowchart of a method for judging similar pushed articles according to another exemplary embodiment of the present invention;
Fig. 6 is a schematic diagram of the graph constructed from keywords B according to an exemplary embodiment of the present invention;
Fig. 7 is a schematic diagram of a device according to an exemplary embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present invention.
Detailed Description
Hereinafter, example embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of embodiments of the invention and not all embodiments of the invention, with the understanding that the invention is not limited to the example embodiments described herein.
It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
It will be understood by those of skill in the art that the terms "first," "second," and the like in the embodiments of the present invention are used merely to distinguish one element, step, device, module, or the like from another element, and do not denote any particular technical or logical order therebetween.
It should also be understood that in embodiments of the present invention, "a plurality" may refer to two or more and "at least one" may refer to one, two or more.
It is also to be understood that any reference to any component, data, or structure in the embodiments of the invention may be generally understood as one or more, unless explicitly defined otherwise or stated to the contrary hereinafter.
In addition, the term "and/or" in the present invention merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In the present invention, the character "/" generally indicates an "or" relationship between the preceding and following objects.
It should also be understood that the description of the embodiments of the present invention emphasizes the differences between the embodiments, and the same or similar parts may be referred to each other, so that the descriptions thereof are omitted for brevity.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Embodiments of the invention are operational with numerous other general purpose or special purpose computing system environments or configurations, and with numerous other electronic devices, such as terminal devices, computer systems, servers, etc. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with electronic devices, such as terminal devices, computer systems, servers, and the like, include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, minicomputer systems, mainframe computer systems, distributed cloud computing environments that include any of the above, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc. that perform particular tasks or implement particular abstract data types. The computer system/server may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.
Exemplary method
Fig. 1 is a flowchart of a method for judging similar pushed articles according to an exemplary embodiment of the present invention. This embodiment can be applied to an electronic device. As shown in Fig. 1, the method comprises the following steps:
step S101, acquiring the first N articles with the highest matching degree with the user as candidate articles;
where N is a positive integer. The way candidate articles are selected may be chosen by those skilled in the art from the prior art according to actual needs and is not limited herein.
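The text leaves the selection of candidates open. As a minimal sketch, assuming each article in the pool already carries a numeric matching score for the user (the article IDs, scores and the function name `top_n_candidates` below are hypothetical), the top N can be picked with Python's `heapq.nlargest`:

```python
import heapq
from typing import Dict, List


def top_n_candidates(match_scores: Dict[str, float], n: int) -> List[str]:
    """Return the IDs of the n articles with the highest matching degree."""
    if n <= 0:
        raise ValueError("N must be a positive integer")
    # nlargest keeps only n items at a time instead of sorting the whole pool.
    best = heapq.nlargest(n, match_scores.items(), key=lambda item: item[1])
    return [article_id for article_id, _ in best]


scores = {"a1": 0.91, "a2": 0.35, "a3": 0.78, "a4": 0.66}
print(top_n_candidates(scores, 2))  # ['a1', 'a3']
```

If N exceeds the pool size, `nlargest` simply returns every article, ordered by score.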
Step S102, acquiring the user's historical pushed articles;
the way historical pushed articles are acquired may be chosen by those skilled in the art from the prior art according to actual needs and is not limited herein.
Step S103, calculating article similarity of each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles;
optionally, as shown in fig. 2, step S103 specifically includes:
step S1031, performing word segmentation on the candidate articles and the historical pushed articles respectively;
step S1032, removing stop words from the segmented candidate articles and the segmented historical pushed articles to obtain candidate article keywords and historical pushed article keywords, respectively;
optionally, as shown in fig. 3, step S1032 specifically includes:
step S10321, presetting a stop word list;
step S10322, according to the stop word list, removing the stop words from the segmented candidate articles and the segmented historical pushed articles to obtain the candidate article keywords and the historical pushed article keywords, respectively.
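A minimal sketch of steps S10321 and S10322, assuming the articles have already been segmented into token lists (the slash-separated strings mirror the worked example later in this description, with the shared verb written "watch" in both articles). The stop-word set `{"i", "also"}` is hypothetical, chosen only because those are the tokens the worked example drops; `remove_stop_words` is an illustrative name.

```python
from typing import List, Set


def remove_stop_words(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Return an article's keywords: its tokens minus stop words, in order."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical preset stop-word list (step S10321).
STOP_WORDS = {"i", "also"}

candidate_a = "i/like/watch/novel".split("/")
history_b = "i/no/like/watch/tv/also/no/like/watch/movie".split("/")

print(remove_stop_words(candidate_a, STOP_WORDS))  # ['like', 'watch', 'novel']
print(remove_stop_words(history_b, STOP_WORDS))
# ['no', 'like', 'watch', 'tv', 'no', 'like', 'watch', 'movie']
```

Keeping the original order matters here, because the graph built in the next step takes its edge directions from the keyword order.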
Step S1033, constructing a directed cyclic graph from the historical pushed article keywords;
one or more directed cyclic graphs may be constructed from the historical pushed article keywords. Specifically, if there is one historical pushed article, a directed cyclic graph is constructed from its keywords; if there are multiple historical pushed articles, either a single directed cyclic graph may be constructed from the keywords of all of them, or a separate directed cyclic graph may be constructed for the keywords of each article, giving multiple directed cyclic graphs.
It should be noted that if the same historical pushed article keyword occurs in one or more historical pushed articles, it appears only once as a node when a directed cyclic graph is constructed; that is, identical keywords are merged into a single node.
Optionally, as shown in fig. 4, step S1033 specifically includes:
step S10331, collecting the historical pushed article keywords;
step S10332, constructing a directed cyclic graph with the historical pushed article keywords as nodes and the order in which those keywords occur as the direction of the edges between nodes.
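One way to realize steps S10331 and S10332, sketched under the assumption that the graph is stored as an adjacency list (a dict from each keyword to the keywords that directly follow it). Identical keywords share one node, as required above, and the result can contain cycles. The name `build_graph` is illustrative. Passing the keyword lists of several articles in one call gives a single merged graph; calling it once per article gives one graph per article.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def build_graph(keyword_lists: List[List[str]], graph: Optional[Graph] = None) -> Graph:
    """Build (or extend) a directed graph from ordered keyword lists.

    Each distinct keyword becomes exactly one node, and an edge u -> v is
    added whenever v directly follows u in some keyword list.
    """
    graph = {} if graph is None else graph
    for keywords in keyword_lists:
        for word in keywords:
            graph.setdefault(word, [])  # merge repeated keywords into one node
        for src, dst in zip(keywords, keywords[1:]):
            if dst not in graph[src]:
                graph[src].append(dst)  # edge direction follows keyword order
    return graph


keywords_b = ["no", "like", "watch", "tv", "no", "like", "watch", "movie"]
print(build_graph([keywords_b]))
# {'no': ['like'], 'like': ['watch'], 'watch': ['tv', 'movie'], 'tv': ['no'], 'movie': []}
```

The cycle no → like → watch → tv → no in this output is what makes the graph "cyclic".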
Step S1034, obtaining a candidate article path based on the directed cyclic graph of the historical pushed articles and the candidate article keywords, so as to obtain a result vector;
optionally, step S1034 specifically includes:
traversing each word of the candidate article keywords in the directed cyclic graph and determining whether a node identical to the word can be found;
if such a node is found, marking the current word with the first preset identifier, adding the identifier to the result vector, and searching for the next word of the candidate article keywords onward from the current node in the directed cyclic graph;
if no such node is found, marking the current word with the second preset identifier, adding the identifier to the result vector, and continuing to search for the current word onward from the current node in the directed cyclic graph; when the traversal along the out-degrees of the current node exceeds the preset step length without finding a node identical to the current word, searching for the next word of the candidate article keywords again in the whole directed cyclic graph;
obtaining the final result vector once the traversal is complete.
Furthermore, when multiple directed cyclic graphs are constructed from the historical pushed article keywords, each candidate article is traversed against every one of them; this allows the similarity between the candidate article and each historical pushed article to be calculated accurately, making deduplication more accurate. When a single directed cyclic graph is constructed from the historical pushed article keywords, each candidate article is traversed against that one graph only, which effectively reduces the amount of data to be computed and improves computational efficiency.
Optionally, when the first word of the candidate article keywords is traversed in the directed cyclic graph and is not found, the first word is directly marked with as many second preset identifiers as the preset step length, these identifiers are added to the result vector, and the next word of the candidate article keywords is then searched for again in the whole directed cyclic graph.
In addition, it should be noted that when the word preceding the current word of the candidate article keywords was not found in the directed cyclic graph, and was therefore marked with as many second preset identifiers as the preset step length and added to the result vector, the current word is no longer searched for along the candidate article path of the preceding word; instead, the whole directed cyclic graph is searched again.
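The traversal rules above leave some details open, for example how many identifiers a word found two levels away contributes. The sketch below is one reading of them, not the patented procedure: it searches breadth-first from the current node, appends one "0" for every depth level that does not contain the word, appends "1" on a match, and after `max_step` unsuccessful levels returns to searching the whole graph for the next word. A word looked up in the whole graph and absent from it contributes `max_step` zeros, following the rule for the first word. The adjacency-list graph and the name `result_vector` are assumptions; with the graph of keywords B and keywords A from the worked example later in this description, it yields 1100000.

```python
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def result_vector(graph: Graph, keywords: List[str], max_step: int = 5) -> str:
    """Mark each candidate keyword with '1' (found) or '0's (not found)."""
    marks: List[str] = []
    current: Optional[str] = None  # None: search the whole graph for the word
    for word in keywords:
        if current is None:
            if word in graph:
                marks.append("1")
                current = word
            else:
                marks.extend(["0"] * max_step)  # absent: preset-step-length zeros
            continue
        # Bounded breadth-first search onward from the current node.
        frontier = graph.get(current, [])
        found = False
        for _ in range(max_step):
            if word in frontier:
                marks.append("1")
                current = word
                found = True
                break
            marks.append("0")  # one '0' per depth level without a match
            # Next depth level, de-duplicated so cycles do not blow up the list.
            frontier = list(dict.fromkeys(
                succ for node in frontier for succ in graph.get(node, [])))
        if not found:
            current = None  # the next word restarts from the whole graph
    return "".join(marks)


graph_b = {"no": ["like"], "like": ["watch"], "watch": ["tv", "movie"],
           "tv": ["no"], "movie": []}
print(result_vector(graph_b, ["like", "watch", "novel"]))  # 1100000
```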
Step S1035, calculating the article similarity between the candidate article and the historical pushed articles according to the result vector.
Optionally, step S1035 specifically comprises: calculating, based on the final result vector, the ratio of the number of first preset identifiers in the result vector to the length of the whole result vector.
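Step S1035 reduces to a ratio. A minimal sketch, assuming the result vector is stored as a string of "1"/"0" marks; treating an empty vector as similarity 0 is an added assumption, since the text does not cover that case, and `article_similarity` is an illustrative name.

```python
def article_similarity(vector: str) -> float:
    """Share of '1' marks (first preset identifier) in the final result vector."""
    if not vector:
        return 0.0  # assumption: no marks at all counts as no overlap
    return vector.count("1") / len(vector)


print(round(article_similarity("1100000"), 4))  # 0.2857
```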
Step S104, in response to a comparison result that the article similarity is less than or equal to the preset threshold, removing candidate articles that duplicate the historical pushed articles.
In an optional embodiment of the invention, the method further comprises: in response to a comparison result that the article similarity is greater than the preset threshold, determining that the candidate article is not similar to the historical pushed articles, and using the candidate article keywords to supplement the directed cyclic graph.
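Step S104 and the optional supplement above can be combined into one update step. This sketch keeps the comparison direction exactly as written in step S104 (a similarity at or below the threshold leads to removal); the graph is assumed to be an adjacency-list dict, and the threshold value 0.2 and the name `judge_and_update` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]


def judge_and_update(graph: Graph, keywords: List[str],
                     similarity: float, threshold: float) -> bool:
    """Return True if the candidate is removed; otherwise add it to the graph."""
    if similarity <= threshold:
        return True  # removed as a duplicate, following step S104 as written
    # Not similar: supplement the graph with the candidate's keywords.
    for word in keywords:
        graph.setdefault(word, [])
    for src, dst in zip(keywords, keywords[1:]):
        if dst not in graph[src]:
            graph[src].append(dst)
    return False


graph_b = {"no": ["like"], "like": ["watch"], "watch": ["tv", "movie"],
           "tv": ["no"], "movie": []}
removed = judge_and_update(graph_b, ["like", "watch", "novel"], 2 / 7, 0.2)
print(removed, graph_b["watch"])  # False ['tv', 'movie', 'novel']
```

Supplementing the graph presumably lets later candidates in the same batch be compared against a candidate that has just been kept.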
The method for judging similar pushed articles provided by the invention effectively avoids repeatedly pushing similar articles to the same user, improves the user experience, and can reduce resource occupancy.
Fig. 5 is a flowchart of a method for judging similar pushed articles according to another exemplary embodiment of the present invention. As shown in Fig. 5, the method comprises the following steps:
Step S201, using a trained recommendation model to obtain, from the candidate article content pool, the top N articles with the highest matching degree with the user as candidate articles;
in one implementation of this embodiment, the recommendation model may be a prior-art model such as Deep & Cross, NeuralCF, PNN or Wide & Deep; the recommendation model used is not specifically limited. In addition, the number of candidate articles is determined by the articles that fall within a preset matching-degree range and is likewise not specifically limited. N is a positive integer.
In this embodiment, the candidate articles include candidate article A: "I like reading novels". Candidate article A is only an example used to explain the invention in detail; this embodiment is described with a single candidate article.
Step S202, acquiring the historical pushed articles, which include historical pushed article B: "I don't like watching TV, and I don't like watching movies either". Historical pushed article B is likewise only an example used to explain the invention in detail; this embodiment is described with a single historical pushed article.
Step S203, using the jieba word segmentation tool to segment candidate article A and historical pushed article B, respectively;
the segmented candidate article A is: I/like/watch/novel (the verb rendered "reading" above is the same word as "watching" in article B, so it is written "watch" in both token lists);
the segmented historical pushed article B is: I/no/like/watch/TV/also/no/like/watch/movie.
Step S204, removing stop words from the segmented candidate article A and the segmented historical pushed article B according to the preset stop word list, to obtain keywords A and keywords B;
the preset stop word list is not limited in any way and can be set flexibly by those skilled in the art according to actual needs; in this example, "I" and "also" are removed as stop words.
Keywords A: like/watch/novel;
Keywords B: no/like/watch/TV/no/like/watch/movie.
Step S205, constructing a directed cyclic graph from keywords B.
In this embodiment, each word of keywords B is taken as a node and the order of the words in keywords B gives the direction of the edges between nodes, which yields the directed cyclic graph for keywords B shown in Fig. 6.
Step S206, constructing the result vector: the words of keywords A are traversed in order and searched for in the directed cyclic graph, each word is marked "1" or "0" according to the search result, and the mark is appended to the result vector.
Step S206 may further include:
traversing each word of keywords A in the directed cyclic graph and determining whether a node identical to the word can be found;
if such a node is found, marking the current word as "1" (the first preset identifier), appending it to the result vector, and searching for the next word of keywords A onward from the current node in the directed cyclic graph;
if no such node is found, marking the current word as "0" (the second preset identifier), appending it to the result vector, and continuing to search for the current word onward from the current node in the directed cyclic graph; when the traversal along the out-degrees of the current node exceeds the preset step length without finding a node identical to the current word, the next word of keywords A is searched for again in the whole directed cyclic graph.
In this embodiment, the preset step length is preferably five, but those skilled in the art may choose it according to actual needs, and it is not limited herein.
Specifically, it is first determined whether the word "like" of keyword A is in the directed cyclic graph. A node identical to "like" is found, so "1" is appended and the result vector is 1. The next word, "see", is searched for onward from the "like" node; a node identical to "see" is found, so "1" is appended and the result vector is 11. The next word is "novel": the successors of the current node "see" are the "TV" and "movie" nodes, neither of which matches "novel", so "0" is appended and the result vector is 110. The search continues to the node after "TV", which is "not" and also does not match "novel". The search continues through the graph up to the maximum of five steps, appending "0" for each unsuccessful step, without finding "novel", so the result vector becomes 1100000.
Once all words have been traversed, the final result vector is 1100000.
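The step S206 traversal can be sketched as follows. The text leaves several details open, so this sketch assumes: the first word (or a word following a failed search) is looked up anywhere in the graph; every unsuccessful search step appends one "0", which is how the walkthrough's vector grows from 110 to 1100000 over five steps; each step expands one hop of out-edges from the previous frontier without a visited set, so cycles are followed until the step limit; and the search stops early at a dead end. The function names are hypothetical. With the example graph it reproduces the vector 1100000.

```python
from typing import Dict, List, Optional, Sequence

Graph = Dict[str, List[str]]


def build_keyword_graph(keywords: Sequence[str]) -> Graph:
    """Keyword order graph: one node per distinct word, one edge per adjacent pair."""
    graph: Graph = {word: [] for word in keywords}
    for src, dst in zip(keywords, keywords[1:]):
        if dst not in graph[src]:
            graph[src].append(dst)
    return graph


def result_vector(graph: Graph, keywords: Sequence[str], max_steps: int = 5) -> List[int]:
    """Mark 1 for each keyword found on the path, 0 for each unsuccessful search step."""
    vector: List[int] = []
    current: Optional[str] = None  # node matched by the previous keyword, if any
    for word in keywords:
        if current is None:
            # No anchor node: look the word up anywhere in the graph.
            found = word in graph
            vector.append(1 if found else 0)
            current = word if found else None
            continue
        # Search forward from the anchor node, one hop of out-edges per step.
        frontier = list(graph[current])
        found = False
        for _ in range(max_steps):
            if word in frontier:
                found = True
                break
            vector.append(0)  # this search step failed
            # Next hop; no visited set, so cycles are followed until max_steps.
            frontier = list(dict.fromkeys(nxt for node in frontier for nxt in graph[node]))
            if not frontier:
                break  # dead end: nothing left to search
        if found:
            vector.append(1)
            current = word
        else:
            current = None  # the next word is looked up in the whole graph again
    return vector


graph_b = build_keyword_graph(["not", "like", "see", "TV", "not", "like", "see", "movie"])
vec = result_vector(graph_b, ["like", "see", "novel"])
print("".join(map(str, vec)))  # 1100000
```

A word found after k failed steps contributes k zeros followed by a one, so the vector length grows with search effort as well as with the number of keywords.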
Step S207: calculate the proportion of "1" marks in the result vector relative to the length of the whole result vector to obtain the article similarity between candidate article A and historical pushed article B. Here two of the seven marks are "1", so the similarity of candidate article A is 2/7, approximately 28.6%.
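The step S207 ratio is the count of "1" marks divided by the vector length. A minimal sketch (the function name is hypothetical, and returning 0.0 for an empty vector is an assumption): for the final vector 1100000 it yields 2/7.

```python
from typing import Sequence


def article_similarity(vector: Sequence[int]) -> float:
    """Ratio of first identifiers (1) to the length of the whole result vector."""
    if not vector:
        return 0.0  # assumption: an empty vector counts as no overlap
    return vector.count(1) / len(vector)


vec = [1, 1, 0, 0, 0, 0, 0]  # final result vector from the example
sim = article_similarity(vec)
print(f"{sim:.1%}")  # 28.6%
```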
Step S208: if the article similarity is less than or equal to a preset threshold, candidate article A is judged similar to historical pushed article B and is removed as a duplicate; if the article similarity is greater than the preset threshold, candidate article A is judged not similar to historical pushed article B, and keyword A is used to supplement the directed cyclic graph.
For the preset threshold, those skilled in the art can flexibly select the threshold according to actual needs, which is not limited herein.
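Step S208 can be sketched as a decision that either drops the candidate or keeps it and merges its keywords into the existing graph. The comparison direction below follows step S208 as written (a similarity less than or equal to the threshold means the candidate is removed). The function names and the threshold value of 0.2 are hypothetical; the text leaves the threshold open.

```python
from typing import Dict, List, Sequence

Graph = Dict[str, List[str]]


def extend_graph(graph: Graph, keywords: Sequence[str]) -> None:
    """Add the keywords' nodes and order edges to an existing graph, in place."""
    for word in keywords:
        graph.setdefault(word, [])
    for src, dst in zip(keywords, keywords[1:]):
        if dst not in graph[src]:
            graph[src].append(dst)


def apply_s208(graph: Graph, keywords: Sequence[str], similarity: float,
               threshold: float) -> bool:
    """Return True if the candidate is kept; comparison direction as written in S208."""
    if similarity <= threshold:
        return False  # judged similar to the history: removed as a duplicate
    extend_graph(graph, keywords)  # judged not similar: supplement the graph
    return True


graph_b = {"not": ["like"], "like": ["see"], "see": ["TV", "movie"],
           "TV": ["not"], "movie": []}
kept = apply_s208(graph_b, ["like", "see", "novel"], similarity=2 / 7, threshold=0.2)
print(kept, graph_b["see"])  # True ['TV', 'movie', 'novel']
```

Supplementing in place means that later candidates are compared against both the history and any candidates already accepted.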
The push similar article judgment method provided by the invention avoids repeatedly pushing similar articles to the same user, which improves the user experience and can reduce resource usage.
Exemplary devices
Fig. 7 is a schematic structural diagram of an apparatus according to an exemplary embodiment of the present invention. As shown in fig. 7, the apparatus includes:
a selection module 31, configured to acquire the top N articles with the highest matching degree with the user as candidate articles;
an obtaining module 32, configured to obtain historical pushed articles;
a similarity calculation module 33, configured to calculate the article similarity between each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles;
in an embodiment, the similarity calculation module 33 may include a word segmentation unit 331, a stop word removal unit 332, a graph construction unit 333, a path obtaining unit 334, and a calculation unit 335.
The word segmentation unit 331 is configured to perform word segmentation on the historical pushed article and the candidate article, and to send the segmented historical pushed article and the segmented candidate article to the stop word removal unit 332.
The stop word removal unit 332 is configured to remove stop words from the segmented historical pushed article and the segmented candidate article to obtain historical pushed article keywords and candidate article keywords, respectively, and to send the historical pushed article keywords to the graph construction unit 333.
In an embodiment, the stop word removal unit 332 may further include a stop word list unit (not shown in the figure) configured to remove stop words from the segmented candidate article and the segmented historical pushed article according to the stop word list, obtaining the candidate article keywords and the historical pushed article keywords, respectively.
The graph construction unit 333 is configured to construct a directed cyclic graph from the historical pushed article keywords.
In an embodiment, the graph construction unit 333 is specifically configured to: aggregate the historical pushed article keywords; and construct the directed cyclic graph by taking the historical pushed article keywords as nodes and the order of those keywords as the direction of the edges between nodes.
The path obtaining unit 334 is configured to obtain a candidate article path based on the directed cyclic graph of the historical pushed articles and the candidate article keywords, and to obtain a result vector.
In an embodiment, the path obtaining unit 334 is specifically configured to:
traverse each word of the candidate article keywords in the directed cyclic graph and determine whether a node identical to the word can be found;
if such a node is found, mark the current word with a first preset identifier, append the identifier to the result vector, and continue searching for the next word of the candidate article keywords onward from the current node in the directed cyclic graph;
if no such node is found, mark the current word with a second preset identifier, append the identifier to the result vector, and continue searching for the current word onward from the current node in the directed cyclic graph; when the search along the out-edges of the current node exceeds the preset step length without finding a node identical to the current word, search the directed cyclic graph anew for the next word of the candidate article keywords;
and obtain the final result vector once the traversal is complete.
A calculation unit 335, configured to calculate, based on the final result vector of the candidate article, the ratio of the number of first preset identifiers in the result vector to the length of the whole result vector.
A deduplication module 34, configured to remove candidate articles that duplicate the historical pushed articles in response to a comparison result that the article similarity is less than or equal to a preset threshold.
In an embodiment, if the article similarity is greater than the preset threshold, the candidate article is not similar to the historical pushed articles, and the deduplication module 34 is further configured to send the candidate article keywords to the graph construction unit 333 so that they are added to the directed cyclic graph.
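The modules and units of the device can be wired together as in the following sketch. Everything here is illustrative: the class and method names are hypothetical, `str.split` stands in for a real Chinese word segmenter, the matching scores used by the selection module are assumed to be given, the threshold of 0.2 is arbitrary, and each historical article is added to the shared graph separately (no edge joins one article's last keyword to the next article's first). The traversal follows the same interpretation as the step S206 walkthrough, and the comparison in `deduplicate` follows module 34 as described.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence, Set

Graph = Dict[str, List[str]]


def add_to_graph(graph: Graph, keywords: Sequence[str]) -> None:
    """Unit 333: keywords become nodes; edges follow keyword order."""
    for word in keywords:
        graph.setdefault(word, [])
    for src, dst in zip(keywords, keywords[1:]):
        if dst not in graph[src]:
            graph[src].append(dst)


def result_vector(graph: Graph, keywords: Sequence[str], max_steps: int) -> List[int]:
    """Unit 334: 1 per keyword found on the path, 0 per unsuccessful search step."""
    vector: List[int] = []
    current: Optional[str] = None
    for word in keywords:
        if current is None:
            found = word in graph
            vector.append(1 if found else 0)
            current = word if found else None
            continue
        frontier, found = list(graph[current]), False
        for _ in range(max_steps):
            if word in frontier:
                found = True
                break
            vector.append(0)
            frontier = list(dict.fromkeys(n for node in frontier for n in graph[node]))
            if not frontier:
                break
        if found:
            vector.append(1)
        current = word if found else None
    return vector


@dataclass
class PushSimilarArticleJudge:
    segment: Callable[[str], List[str]]  # unit 331 (hypothetical segmenter)
    stop_words: Set[str]                 # used by unit 332
    threshold: float                     # preset threshold (no value given in the text)
    max_steps: int = 5                   # preset step length from the embodiment
    graph: Graph = field(default_factory=dict)

    def keywords(self, text: str) -> List[str]:
        """Units 331 and 332: segment, then drop stop words, keeping order."""
        return [t for t in self.segment(text) if t not in self.stop_words]

    def load_history(self, history: Sequence[str]) -> None:
        """Module 32 and unit 333: aggregate every historical article into one graph."""
        for article in history:
            add_to_graph(self.graph, self.keywords(article))

    @staticmethod
    def select(scores: Dict[str, float], n: int) -> List[str]:
        """Module 31: the top-N candidates by matching score."""
        return heapq.nlargest(n, scores, key=scores.__getitem__)

    def deduplicate(self, candidates: Sequence[str]) -> List[str]:
        """Modules 33 and 34: score each candidate, drop or keep it, update the graph."""
        kept: List[str] = []
        for text in candidates:
            kws = self.keywords(text)
            vec = result_vector(self.graph, kws, self.max_steps)
            similarity = vec.count(1) / len(vec) if vec else 0.0  # unit 335
            if similarity <= self.threshold:  # comparison as described for module 34
                continue
            add_to_graph(self.graph, kws)
            kept.append(text)
        return kept


judge = PushSimilarArticleJudge(segment=str.split, stop_words={"I", "also"}, threshold=0.2)
judge.load_history(["I not like see TV also not like see movie"])
top = judge.select({"I like see novel": 0.9, "weather report": 0.1}, n=1)
kept = judge.deduplicate(top)
print(kept)  # ['I like see novel']
```

Because kept candidates are merged into the graph, later candidates in the same batch are also compared against earlier kept ones.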
The push similar article judgment device provided by the invention avoids repeatedly pushing similar articles to the same user, which improves the user experience and can reduce resource usage.
It should be noted that the above apparatus corresponds to the push similar article judgment method provided by the present invention; for other details, refer to the description of that method, which is not repeated here.
Exemplary electronic device
Fig. 8 shows the structure of an electronic device provided by an exemplary embodiment of the present invention. The electronic device may be either or both of the first device and the second device, or a stand-alone device separate from them that can communicate with the first device and the second device to receive the acquired input signals from them. As shown in Fig. 8, the electronic device includes one or more processors 41 and a memory 42.
The processor 41 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
Memory 42 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 41 may execute the program instructions to implement the push similar article judgment method of the various embodiments of the present disclosure described above and/or other desired functions. In one example, the electronic device may further include an input device 43 and an output device 44, which are interconnected by a bus system and/or another form of connection mechanism (not shown).
The input device 43 may include, for example, a keyboard, a mouse, and the like.
The output device 44 can output various kinds of information to the outside and may include, for example, a display, speakers, a printer, and a communication network and the remote output devices connected to it.
Of course, for simplicity, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 8, omitting components such as buses, input/output interfaces, and so forth. In addition, the electronic device may include any other suitable components, depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the push similar article determination method according to various embodiments of the present disclosure described in the "exemplary methods" section of this specification above.
Program code in the computer program product for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the push similar article determination method according to various embodiments of the present disclosure described in the "exemplary method" section above in this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.
In the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts in the embodiments are referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The block diagrams of devices, apparatuses, equipment, and systems referred to in this disclosure are given only as illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will appreciate, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "and" as used herein mean, and are used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
The method and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices, and methods of the present disclosure, various components or steps may be broken down and/or re-combined. Such decomposition and/or recombination should be considered as equivalents of the present disclosure. The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims (10)

1. A push similar article judgment method, characterized by comprising the following steps:
acquiring the first N articles with the highest matching degree with the user as candidate articles;
acquiring a historical pushed article of a user;
calculating article similarity of each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles;
and removing candidate articles which are repeated with the historical pushed articles in response to the comparison result that the article similarity is smaller than or equal to a preset threshold value.
2. The push similar article judgment method according to claim 1, wherein calculating the article similarity between each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles comprises:
respectively segmenting the candidate article and the historical pushed article;
removing stop words from the segmented candidate article and the segmented historical pushed article to obtain candidate article keywords and historical pushed article keywords, respectively;
constructing a directed cyclic graph according to the key words of the historical pushed articles;
acquiring a candidate article path based on the directed cyclic graph of the historical pushed article and the candidate article key words to obtain a result vector;
and calculating the article similarity of the candidate article and the historical pushed article according to the result vector.
3. The push similar article judgment method according to claim 2, wherein removing the stop words from the segmented candidate article and the segmented historical pushed article to obtain the candidate article keywords and the historical pushed article keywords, respectively, comprises:
presetting a stop word list;
and removing, according to the stop word list, the stop words from the segmented candidate article and the segmented historical pushed article to obtain the candidate article keywords and the historical pushed article keywords, respectively.
4. The push similar article judgment method according to claim 2, wherein constructing the directed cyclic graph according to the historical pushed article keywords comprises:
summarizing the key words of the historical pushed articles;
and constructing the directed cyclic graph by taking the key words of the historical pushed articles as nodes and taking the sequence of the key words of the historical pushed articles as the direction of each node.
5. The push similar article judgment method according to claim 2, wherein obtaining the candidate article path based on the directed cyclic graph of the historical pushed article and the candidate article keywords comprises:
traversing each word in the candidate article key words in the directed cyclic graph respectively, and judging whether the nodes same as the words can be found or not;
if such a node is found, marking the current word with a first preset identifier, appending the first preset identifier to the result vector, and continuing to search for the next word of the candidate article keywords onward from the current node in the directed cyclic graph;
if no such node is found, marking the current word with a second preset identifier, appending the second preset identifier to the result vector, and continuing to search for the current word onward from the current node in the directed cyclic graph; and, when the search along the out-edges of the current node exceeds the preset step length without finding a node identical to the current word, searching the directed cyclic graph anew for the next word of the candidate article keywords;
and obtaining a final result vector after traversing.
6. The push similar article judgment method according to claim 5, wherein calculating the article similarity between the candidate article and the historical pushed article according to the result vector comprises:
and calculating the proportion of the number of the first preset identifiers in the result vector to the length of the whole result vector based on the final result vector.
7. The push similar article judgment method according to claim 2, further comprising:
in response to a comparison result that the article similarity is greater than the preset threshold, determining that the candidate article is not similar to the historical pushed article, and supplementing the directed cyclic graph with the candidate article keywords.
8. A push similar article judgment device, characterized by comprising:
the selecting module is used for acquiring the first N articles with the highest matching degree with the user as candidate articles;
the acquisition module is used for acquiring historical pushed articles;
the similarity calculation module is used for calculating the article similarity of each candidate article and the historical pushed articles based on the candidate articles and the historical pushed articles;
and the duplication removing module is used for responding to a comparison result that the article similarity is smaller than or equal to a preset threshold value and removing candidate articles which are duplicated with the historical pushed articles.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the push similar article judgment method according to any one of claims 1 to 7.
10. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a computer-executable program, and the processor is configured to run the computer-executable program to implement the push similar article judgment method according to any one of claims 1 to 7.
CN202210658173.7A 2022-06-10 2022-06-10 push similar article judgment method and device, storage medium and electronic equipment Pending CN114943036A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210658173.7A CN114943036A (en) 2022-06-10 2022-06-10 push similar article judgment method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210658173.7A CN114943036A (en) 2022-06-10 2022-06-10 push similar article judgment method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN114943036A true CN114943036A (en) 2022-08-26

Family

ID=82909431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210658173.7A Pending CN114943036A (en) 2022-06-10 2022-06-10 push similar article judgment method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN114943036A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292477A (en) * 2022-07-18 2022-11-04 Yancheng Jindi Technology Co., Ltd. Method and device for judging pushing similar articles, storage medium and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115292477A (en) * 2022-07-18 2022-11-04 Yancheng Jindi Technology Co., Ltd. Method and device for judging pushing similar articles, storage medium and electronic equipment
CN115292477B (en) * 2022-07-18 2024-04-16 Yancheng Tianyanchawei Technology Co., Ltd. Method and device for judging push similar articles, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
US20170116203A1 (en) Method of automated discovery of topic relatedness
CN111797210A (en) Information recommendation method, device and equipment based on user portrait and storage medium
US20130124437A1 (en) Social media user recommendation system and method
CN107766360B (en) Video heat prediction method and device
CN110909182A (en) Multimedia resource searching method and device, computer equipment and storage medium
CN111353862B (en) Commodity recommendation method and device, electronic equipment and storage medium
US11600267B2 (en) Event-based semantic search and retrieval
CN110069713B (en) Personalized recommendation method based on user context perception
Kılınç A spark‐based big data analysis framework for real‐time sentiment prediction on streaming data
CN114780861B (en) Clustering technology-based user multi-interest recommendation method, device, equipment and medium
CN111126060A (en) Method, device and equipment for extracting subject term and storage medium
CN110750615A (en) Text repeatability judgment method and device, electronic equipment and storage medium
CN112231598A (en) Webpage path navigation method and device, electronic equipment and storage medium
CN114943036A (en) push similar article judgment method and device, storage medium and electronic equipment
Chen et al. Graph Meets LLM: A Novel Approach to Collaborative Filtering for Robust Conversational Understanding
CN112395517B (en) House source searching and displaying method and device and computer readable storage medium
CN110750708A (en) Keyword recommendation method and device and electronic equipment
CN116955817A (en) Content recommendation method, device, electronic equipment and storage medium
CN112487181A (en) Keyword determination method and related equipment
CN115292477B (en) Method and device for judging push similar articles, storage medium and electronic equipment
CN111625605B (en) Information synchronization method and device, storage medium and electronic equipment
CN113127639B (en) Abnormal conversation text detection method and device
CN105447020B (en) A kind of method and device of determining business object keyword
CN111324707A (en) User interaction method and device, computer-readable storage medium and electronic equipment
CN112860626A (en) Document sorting method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20230731

Address after: Room 404-405, 504, Building B-17-1, Big data Industrial Park, Kecheng Street, Yannan High tech Zone, Yancheng, Jiangsu Province, 224000

Applicant after: Yancheng Tianyanchawei Technology Co.,Ltd.

Address before: 224000 room 501-503, building b-17-1, Xuehai road big data Industrial Park, Kecheng street, Yannan high tech Zone, Yancheng City, Jiangsu Province

Applicant before: Yancheng Jindi Technology Co.,Ltd.
