CN114297501A - Text recommendation method, device, equipment and medium - Google Patents

Text recommendation method, device, equipment and medium

Info

Publication number
CN114297501A
Authority
CN
China
Prior art keywords
text
texts
feature
determining
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111649278.8A
Other languages
Chinese (zh)
Inventor
鄢秋霞
李昱
张圳
李斌
安飞飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111649278.8A
Publication of CN114297501A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a text recommendation method, apparatus, device, and medium in the technical field of natural language processing. The method acquires the text features of each text in a plurality of texts; determines a feature vector for each text from its text features and assembles the feature vectors into a feature matrix of the plurality of texts; determines at least two sub-feature matrices of the plurality of texts from the feature matrix, and determines similarity information between the texts from the sub-feature matrices; and, according to the similarity information, determines a text to be recommended from the texts in a preset text library and recommends it to the user. With this technical scheme, similar texts can be recommended to the user accurately and quickly.

Description

Text recommendation method, device, equipment and medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text recommendation method, apparatus, device, and medium.
Background
At present, data on the network is vast and heterogeneous, and a user has to read through a large amount of irrelevant information to find what is needed. How to recommend information matching a user's interests and preferences is therefore a problem of wide concern.
Existing methods for recommending similar content to a user rely on content recommendation algorithms with a large computational cost, so a text recommendation method is urgently needed that can recommend similar texts to the user both accurately and quickly.
Disclosure of Invention
The application provides a text recommendation method, device, equipment and medium, which can accurately and quickly recommend similar texts to a user.
In a first aspect, the present application provides a text recommendation method, including:
acquiring text characteristics of each text in a plurality of texts;
determining a feature vector of each text according to the text features of each text, and generating feature matrixes of the plurality of texts according to the feature vector of each text;
determining sub-feature matrixes of the texts according to the feature matrixes of the texts, and determining similarity information between the texts according to the sub-feature matrixes of the texts; wherein the number of the sub-feature matrices is at least two;
and determining a text to be recommended in texts in a preset text library according to the similarity information, and recommending the text to be recommended to a user.
In one example, determining a sub-feature matrix of the plurality of texts from the feature matrices of the plurality of texts comprises:
determining a transposed matrix of the feature matrix according to the feature matrices of the plurality of texts;
determining a sub-feature matrix of the feature matrix according to the feature matrices of the texts;
determining a sub-feature matrix of the transposed matrix of the feature matrix according to the transposed matrix of the feature matrix of the plurality of texts;
and taking a sub-feature matrix of the feature matrix and a sub-feature matrix of a transposed matrix of the feature matrix as sub-feature matrices of the plurality of texts.
In one example, determining similarity information between the plurality of texts from the sub-feature matrices of the plurality of texts comprises:
determining first cosine similarity information between each sub-feature matrix of the feature matrix and each sub-feature matrix of the transposed matrix of the feature matrix;
determining second cosine similarity information among the texts according to the first cosine similarity information;
and determining similarity information among the texts according to the second cosine similarity information.
In one example, the text features include text content and text labels that characterize feature attributes of the text; if the text features are text contents, determining the feature vector of each text according to the text features of each text, wherein the determining comprises the following steps:
acquiring word information in the text content of each text;
determining word vectors of the word information according to the word information; wherein the word vector characterizes semantic information of the word information;
and determining the central vector of each text according to the word vector, and taking the central vector of each text as the feature vector of each text.
In one example, the text features include text content and text labels that characterize feature attributes of the text; if the text feature is a text label, determining a feature vector of each text according to the text feature of each text, including:
obtaining frequency information and category information in the text label of each text;
and determining the label vector of each text according to the frequency information and the category information, and taking the label vector of each text as the feature vector of each text.
In one example, determining a text to be recommended in texts in a preset text library according to the similarity information includes:
screening out a preset number of texts from a preset text library according to the similarity information;
and determining the text to be recommended from the texts in the preset text library according to the relationship between the similarity information of the preset number of texts and the similarity information of the remaining texts.
In a second aspect, the present application provides a text recommendation apparatus based on a text similarity value, the apparatus comprising:
the acquiring unit is used for acquiring text characteristics of each text in the plurality of texts;
the generating unit is used for determining the feature vector of each text according to the text feature of each text and generating the feature matrixes of the plurality of texts according to the feature vector of each text;
the determining unit is used for determining the sub-feature matrixes of the texts according to the feature matrixes of the texts, and determining the similarity information among the texts according to the sub-feature matrixes of the texts; wherein the number of the sub-feature matrices is at least two;
and the recommending unit is used for determining a text to be recommended in the texts in a preset text library according to the similarity information and recommending the text to be recommended to the user.
In one example, a determination unit includes:
a first determining module, configured to determine a transpose matrix of the feature matrix according to the feature matrix of the multiple texts;
the second determination module is used for determining a sub-feature matrix of the feature matrix according to the feature matrices of the texts;
a third determining module, configured to determine, according to a transpose matrix of the feature matrices of the multiple texts, a sub-feature matrix of the transpose matrix of the feature matrices;
a fourth determining module, configured to use a sub-feature matrix of the feature matrix and a sub-feature matrix of a transpose of the feature matrix as the sub-feature matrices of the multiple texts.
In one example, a determination unit includes:
the similarity information determining module is used for determining first cosine similarity information between each sub-feature matrix of the feature matrix and each sub-feature matrix of the transposed matrix of the feature matrix; determining second cosine similarity information among the plurality of texts according to the first cosine similarity information; and determining the similarity information among the plurality of texts according to the second cosine similarity information.
In one example, if the text feature is text content, the generating unit includes:
the first acquisition module is used for acquiring word information in the text content of each text;
the first determining module is used for determining word vectors of the word information according to the word information; wherein the word vector characterizes semantic information of the word information;
and the second determining module is used for determining the central vector of each text according to the word vector, and taking the central vector of each text as the feature vector of each text.
In one example, if the text feature is a text label, the generating unit includes:
the second acquisition module is used for acquiring frequency information and category information in the text label of each text;
and a third determining module, configured to determine, according to the frequency information and the category information, a tag vector of each text, and use the tag vector of each text as a feature vector of each text.
In one example, a recommendation unit includes:
the screening module is used for screening out a preset number of texts from a preset text library according to the similarity information;
and the determining module is used for determining the text to be recommended from the texts in the preset text library according to the relationship between the similarity information of the preset number of texts and the similarity information of the remaining texts.
In a third aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions; the processor executes computer-executable instructions stored by the memory to implement the method as described in the first aspect.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for implementing the method as set forth in the first aspect when executed by a processor.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
According to the text recommendation method, apparatus, device, and medium provided by the application, the text features of each text in a plurality of texts are obtained; a feature vector of each text is determined from its text features, and a feature matrix of the plurality of texts is generated from the feature vectors; at least two sub-feature matrices of the plurality of texts are determined from the feature matrix, and similarity information between the texts is determined from the sub-feature matrices; and a text to be recommended is determined from the texts in a preset text library according to the similarity information and recommended to the user. With this technical scheme, each text can be processed in blocks based on its text features and the feature vectors generated from them, so that similar texts can be recommended to the user accurately and quickly even when the volume of text data is large.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flowchart of a text recommendation method according to an embodiment of the present application;
FIG. 2 is a flowchart illustrating a text recommendation method according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of a text recommendation device based on text similarity according to a third embodiment of the present application;
FIG. 4 is a schematic diagram of a text recommendation device based on text similarity values according to a fourth embodiment of the present application;
fig. 5 is a block diagram illustrating a terminal device according to an example embodiment.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The application provides a text recommendation method, a text recommendation device, text recommendation equipment and a text recommendation medium, and aims to solve the technical problems in the prior art.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart illustrating a text recommendation method according to an embodiment of the present application. The first embodiment comprises the following steps:
s101, acquiring text characteristics of each text in the plurality of texts.
Illustratively, the text features include text content and text labels. The text content consists of the words in the text; after the text content is obtained, it is segmented into words and filtered. The filtering removes stop words, adverbs, auxiliary words, punctuation marks, prepositions, and some conjunctions; the effective words that remain after segmentation and filtering constitute the text feature. The text labels form a hierarchy: labels at different levels are related hierarchically, while labels within the same level are pairwise independent. For example, the hierarchy may have three levels: first-level labels such as finance, stocks, internet finance, trusts, entertainment, movies, TV series, and European and American stars; second-level labels such as high-income stocks, stable stocks, and movies adapted from novels; and third-level labels such as stock code 000000, stock code 000001, and Jane Eyre.
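As an illustrative sketch of the preprocessing step described above, segmentation and filtering might look as follows. The tokenizer and stop-word list are simplified assumptions for English text, not the actual implementation (which would use Chinese word segmentation and a fuller part-of-speech filter):

```python
import re

# Illustrative stop-word list; the method also removes adverbs, auxiliary
# words, prepositions and some conjunctions via part-of-speech filtering.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are"}

def effective_words(text):
    """Lowercase, split into letter runs, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(effective_words("The price of the stock is rising."))
```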
S102, determining a feature vector of each text according to the text features of each text, and generating feature matrixes of a plurality of texts according to the feature vector of each text.
In this embodiment, the feature vector of each text is determined differently depending on whether the text feature is the text content or the text label.
Specifically, when the text feature is the text content, the feature vector of each text can be determined with the Word2vec algorithm; when the text feature is a text label, the feature vector of each text can be determined with the tag2vec algorithm.
After the feature vector of each text is obtained, the feature vectors may be combined into a feature matrix of the plurality of texts. For example, assume the number of texts is N and the feature vector of text i is m_i, where i ranges from 1 to N. If the dimension of each m_i is d, then the feature matrix of the N texts is M ∈ R^(N×d).
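A minimal sketch of assembling M from the per-text vectors, using plain nested lists (the function name and data are illustrative assumptions):

```python
def feature_matrix(vectors):
    """Stack N per-text feature vectors m_i, each of dimension d,
    into an N x d matrix M represented as a list of rows."""
    d = len(vectors[0])
    assert all(len(v) == d for v in vectors), "all feature vectors must share dimension d"
    return [list(v) for v in vectors]

# N = 3 texts with d = 2 features each
M = feature_matrix([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
```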
S103, determining sub-feature matrixes of the texts according to the feature matrixes of the texts, and determining similarity information among the texts according to the sub-feature matrixes of the texts; wherein the number of sub-feature matrices is at least two.
In this embodiment, the sub-feature matrices of the plurality of texts are determined from the feature matrix of the plurality of texts and a preset dimension. For example, given the feature matrix M ∈ R^(N×d) of the N texts, the sub-feature matrices may be M_i, where i ranges from 1 to k1, so that

M = [M_1; M_2; …; M_k1]   (M split evenly into k1 row blocks)
After the sub-feature matrix is obtained, similarity information between the plurality of texts may be calculated according to the sub-feature matrix. Further, the number of sub-feature matrices is the same as the number of preset dimensions.
And S104, determining a text to be recommended in the texts in the preset text library according to the similarity information, and recommending the text to be recommended to the user.
In the embodiment, a first preset number of texts are screened out from a preset text library according to the similarity information;
determining a text to be recommended in the texts in the preset text library according to the relationship between the similarity information of the texts in the second preset number and the similarity information of the texts in the third preset number; wherein the sum of the second preset number and the third preset number is equal to the first preset number.
Exemplarily, when a text to be recommended is recommended to the user, a first preset number of texts is screened out of the preset text library according to the similarity information. If the preset text library contains N texts, the first preset number of screened texts is n*k, where n is a hyper-parameter; the second preset number is k, and the similarity information of these k texts is computed with the WMD algorithm. The similarity information of the third preset number of texts, i.e. the remaining n*k − k texts, is computed with the RWMD algorithm.
Further, WMD (Word Mover's Distance) is a method for measuring text similarity. The words of two texts D1 and D2 are mapped into an embedding space with the word2vec algorithm; each word in D1 is matched to words in D2 by their distance in the embedding space, and the WMD between D1 and D2 is the minimum of the summed transport distances over all word pairs.
The mathematical formulation is:

WMD(d, d') = min_{T ≥ 0} Σ_{i,j} T_ij · c(i, j)
subject to: Σ_j T_ij = d_i for every i, and Σ_i T_ij = d'_j for every j,

where d_i = c_i / Σ_j c_j is the weight of word i in text d, c_i is the word frequency of word i in text d, c(i, j) = ||x_i − x_j|| is the travel cost between words i and j, and x_i, x_j are the embedding vectors of words i and j. The time complexity of computing WMD is O(P^3 log P), where P is the number of distinct words in the text.
Specifically, the calculation process of the pruned WMD algorithm is as follows:
Because the time complexity of WMD is O(P^3 log P), where P is the total number of words, this embodiment uses the pruned WMD algorithm to filter out, for each text, the k most similar texts from the full set of texts. RWMD starts from the WMD objective function, removes one of the two constraints at a time, solves each relaxed problem for its minimum, and takes the maximum of the two minima as an approximate value of WMD; the relaxed problem therefore needs to be solved twice.
For example, removing the second constraint, the problem becomes:

min_{T ≥ 0} Σ_{i,j} T_ij · c(i, j)
subject to: Σ_j T_ij = d_i for every i.

Obviously, the optimal solution of this relaxed problem is: for each word in text D1, find the closest word in the other text D2 and transfer all of its weight there, i.e.

T*_ij = d_i if j = argmin_j c(i, j), and T*_ij = 0 otherwise.
use of1(d,d′),l2(d, d') respectively representing the minimum values calculated by removing different constraints, wherein the final minimum value of rwmd is lr(d,d′)=max(l1(d,d′),l2(d, d')), wherein rwmd is calculated with a temporal complexity of O (P)2). rwmd is closer wmd than the cosine distance of the text center vector.
The specific process is as follows:
according to the similarity information, screening N x k most similar texts from the full text N, wherein N is a hyper-parameter; wmd of the text content and wmd of the text label of the first k texts are respectively calculated for each text according to the formula
Figure BDA0003444515300000074
Figure BDA0003444515300000075
And obtaining the similarity information of the text content wmd and the similarity information of the text label wmd, obtaining the similarity information of the texts in a second preset number by weighted average according to the text content and the wmd similarity information of the text label, and taking the texts in the second preset number as the KNN list of each text.
For each text, the RWMD of the remaining n*k − k text contents and the RWMD of their text labels are computed using the RWMD formulation given above, yielding RWMD similarity information of the text content and of the text label; the similarity information of the third preset number of texts is obtained as a weighted average of the two.
And determining the text to be recommended in the texts in the preset text library according to the relationship between the similarity information of the texts in the second preset number and the similarity information of the texts in the third preset number.
In this embodiment, the similarity information of the third preset number of texts is compared with that of the second preset number of texts. If a candidate's similarity information is worse than that of every text in the KNN list, the candidate is excluded. Otherwise, the WMD of its text content and the WMD of its text label are computed, their weighted average gives the candidate's final similarity information, and if this final similarity information is better than that of some text in the KNN list, the KNN list is updated.
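The prune-and-refine loop just described can be sketched as follows, phrased in terms of distances (smaller means more similar); `exact_dist` and `lower_bound` are hypothetical stand-ins for the weighted WMD and RWMD computations:

```python
def prune_knn(knn, candidates, exact_dist, lower_bound, k):
    """knn: ascending list of (distance, text_id) pairs, at most k entries.
    A candidate is skipped when its cheap lower bound already exceeds the
    worst distance in the list; otherwise the exact distance is computed
    and the list is updated."""
    for cid in candidates:
        worst = knn[-1][0] if len(knn) == k else float("inf")
        if lower_bound(cid) >= worst:   # RWMD <= WMD, so this prune is safe
            continue
        d = exact_dist(cid)             # pay for the expensive WMD only now
        if d < worst:
            knn.append((d, cid))
            knn.sort()
            del knn[k:]                 # keep only the k best entries
    return knn
```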
According to the text recommendation method of this embodiment, the text features of each text in a plurality of texts are obtained; a feature vector of each text is determined from its text features, and a feature matrix of the plurality of texts is generated from the feature vectors; similarity information among the plurality of texts is determined from the feature matrix; and a text to be recommended is determined from the texts in the preset text library according to the similarity information and recommended to the user. With this technical scheme, the computation of similarity information among texts becomes efficient even with a large volume of text data and limited CPU resources, the original time complexity is reduced, and similar texts can then be recommended to the user accurately and quickly.
Fig. 2 is a flowchart illustrating a text recommendation method according to a second embodiment of the present application. The second embodiment comprises the following steps:
s201, acquiring text characteristics of each text in the plurality of texts.
For example, this step may refer to step S101 described above, and is not described again.
S202, determining a feature vector of each text according to the text features of each text, and generating a feature matrix of a plurality of texts according to the feature vector of each text.
In this embodiment, optionally, the text features include text content and text labels, and the text labels represent feature attributes of the text; if the text features are text contents, determining a feature vector of each text according to the text features of each text, wherein the determining comprises the following steps: acquiring word information in the text content of each text; determining word vectors of the word information according to the word information; the word vector represents semantic information of word information; and determining a central vector of each text according to the word vectors, and taking the central vector of each text as a feature vector of each text.
In this embodiment, suppose that after word segmentation and filtering, the remaining effective words of text d are w_1, w_2, …, w_n, with corresponding word frequencies c_1, c_2, …, c_n. Through a normalized bag-of-words model, text d can be represented as [d_1, d_2, …, d_n], where

d_i = c_i / Σ_j c_j

is the weight of word i in the text: c_i is the number of times word i appears in text d, and the denominator is the total number of effective words of the text. Word segmentation, filtering, and the normalized bag-of-words model thus give the weight of each word in text d. A language model built with the word2vec technique maps words into a mathematical space, forming word embeddings that carry rich semantic information. Combining the normalized bag-of-words model with the word embeddings vectorizes the text as follows: let x_i be the word vector of word i and d_i the normalized word frequency of word i obtained above; then the center vector of text d is

v_d = Σ_i d_i · x_i.

The pairwise cosine similarity over the full set of texts is then computed with the cosine similarity distance:

cos(d, d') = (v_d · v_d') / (||v_d|| · ||v_d'||).
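A minimal sketch of the center vector and cosine similarity just defined; the word frequencies and 2-d "embeddings" below are toy assumptions for illustration:

```python
def center_vector(freqs, vectors):
    """v_d = sum_i d_i * x_i, with d_i = c_i / sum_j c_j."""
    total = sum(freqs)
    dims = len(vectors[0])
    v = [0.0] * dims
    for c, x in zip(freqs, vectors):
        w = c / total                    # normalized bag-of-words weight d_i
        for j in range(dims):
            v[j] += w * x[j]
    return v

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)
```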
in this embodiment, optionally, the text features include text content and text labels, and the text labels represent feature attributes of the text; if the text features are text labels, determining a feature vector of each text according to the text features of each text, wherein the determining comprises the following steps:
obtaining frequency information and category information in a text label of each text;
and determining a label vector of each text according to the frequency information and the category information, and taking the label vector of each text as a feature vector of each text.
In this embodiment, the frequencies of all the labels of the text are counted and normalized; the method is analogous to the word-frequency normalization of the text content. Let the labels of text d be t_1, t_2, …, t_n, with corresponding frequencies c_1, c_2, …, c_n. Through label normalization, text d can be represented as [d_1, d_2, …, d_n], where

d_i = c_i / Σ_j c_j

is the weight of label i in text d: c_i is the frequency of label i in text d (frequencies differ mainly at the entity layer; at the subject layer and concept layer the label frequency is 1). The entity layer, subject layer, and concept layer are the three levels of the labels, with the hierarchical relationship subject layer > concept layer > entity layer. The denominator is the total number of labels of the text. The embeddings of all labels of the text are obtained from tag embedding and combined in a weighted average with the normalized label frequencies and the label weights to obtain the label vector of the text. Specifically, let x_i be the vector of label i, d_i its normalized frequency, and w_i its weight; the text label vector can then be represented as

v_d = Σ_i d_i · w_i · x_i.

The pairwise cosine similarity of the full set of text labels is computed with the cosine similarity distance:

cos(d, d') = (v_d · v_d') / (||v_d|| · ||v_d'||).
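The label-vector construction differs from the content center vector only in the extra per-level weight w_i; a hedged sketch where the frequencies, level weights, and embeddings are all toy assumptions:

```python
def tag_vector(freqs, level_weights, embeddings):
    """v_d = sum_i d_i * w_i * x_i, where d_i is the normalized label
    frequency and w_i a per-level weight (subject > concept > entity)."""
    total = sum(freqs)
    dims = len(embeddings[0])
    v = [0.0] * dims
    for c, w, x in zip(freqs, level_weights, embeddings):
        for j in range(dims):
            v[j] += (c / total) * w * x[j]
    return v
```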
s203, determining a transposed matrix of the feature matrix according to the feature matrices of the plurality of texts; determining a sub-feature matrix of the feature matrix according to the feature matrices of the plurality of texts; determining a sub-feature matrix of the feature matrix according to the feature matrix of the plurality of texts; and taking the sub-feature matrix of the feature matrix and the sub-feature matrix of the transposed matrix of the feature matrix as the sub-feature matrices of the plurality of texts.
In this embodiment, assume there are N texts, each with a feature vector m_i of dimension d; splicing m_1, …, m_N gives a large matrix M ∈ R^(N×d). If CPU resources are large enough, the pairwise cosine similarity of the N texts can be computed directly by multiplying the large matrix of the N texts by its transpose, as shown in the following formula:

simi = M · M^t

where simi ∈ R^(N×N) and simi(k, q) is the cosine similarity between text k and text q (with the rows of M normalized); looking up any simi(k, q) then takes O(1).
Considering a large volume of text data and limited CPU resources, block matrix multiplication is used to improve computational efficiency: the matrix M is evenly divided into k1 blocks in row order, giving a matrix composed of sub-feature matrices

M = [M_1; M_2; …; M_k1],

and the matrix M^t is evenly divided into k2 blocks in column order, giving the sub-feature matrices of the transposed matrix

M^t = [M^t_1, M^t_2, …, M^t_k2].
S204, determining similarity information among the plurality of texts according to the sub-feature matrices of the plurality of texts, which includes: determining first cosine similarity information between each sub-feature matrix of the feature matrix and each sub-feature matrix of the transposed matrix of the feature matrix; determining second cosine similarity information among the plurality of texts according to the first cosine similarity information; and determining the similarity information among the plurality of texts according to the second cosine similarity information.
In this embodiment, the pairwise cosine similarity computation over the whole text set is converted into multiplying every block of the M matrix by every block of the M^T matrix, with time complexity O(k1·k2), where k1 ≪ N and k2 ≪ N. The specific calculation is given by the following pseudo code: the first layer loops k1 times, the second layer loops k2 times, and each iteration produces the (i, j) block cosine similarity result:
for i = 1, …, k1:
    for j = 1, …, k2:
        simi(i, j) = M_i · (M^T)_j
where simi(i, j) = M_i · (M^T)_j ∈ R^{(N/k1)×(N/k2)}, and simi(i, j)(k, q) represents the cosine similarity between the k-th text of row block i and the q-th text of column block j. The block counts k1 and k2 are hyper-parameters and can be adjusted. simi(i, j) is the first cosine similarity information.
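The blocked scheme can be sketched as below. Block sizes, variable names, and the toy data are illustrative assumptions; the sketch splits a row-normalized M into k1 row blocks and M^T into k2 column blocks, multiplies each pair, and tiles the k1·k2 partial results into the full similarity matrix:

```python
import numpy as np

# Sketch of the block-wise similarity computation (illustrative shapes).
rng = np.random.default_rng(1)
N, d, k1, k2 = 8, 5, 2, 4
M = rng.normal(size=(N, d))
M = M / np.linalg.norm(M, axis=1, keepdims=True)  # normalize rows for cosine

row_blocks = np.split(M, k1, axis=0)    # M_1..M_k1, each (N/k1) x d
col_blocks = np.split(M.T, k2, axis=1)  # (M^T)_1..(M^T)_k2, each d x (N/k2)

simi = np.empty((N, N))
rb, cb = N // k1, N // k2
for i, Mi in enumerate(row_blocks):       # outer loop: k1 iterations
    for j, Mtj in enumerate(col_blocks):  # inner loop: k2 iterations
        # simi(i, j): similarities between block-i texts and block-j texts
        simi[i * rb:(i + 1) * rb, j * cb:(j + 1) * cb] = Mi @ Mtj

assert np.allclose(simi, M @ M.T)  # blocked result equals the direct product
```

Each inner multiplication only needs one (N/k1) × (N/k2) block in memory at a time, which is the point of the decomposition when resources are limited.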
S205, determining the text to be recommended in the texts in the preset text library according to the relationship between the similarity information of the preset number of texts and the similarity information of the remaining texts.
For example, this step may refer to step S104 described above, and is not described again.
According to the text recommendation method of this embodiment, the modified WMD (Word Mover's Distance) algorithm is used to calculate the text-label WMD similarity information and the text-content WMD similarity information; the semantic distance of the text content and the semantic distance of the text labels are fully utilized to measure text similarity, improving the accuracy of content-associated text recommendation.
Fig. 3 is a schematic diagram of a text recommendation device based on a text similarity value according to a third embodiment of the present application. The apparatus 30 in the third embodiment includes:
an obtaining unit 301, configured to obtain a text feature of each text in the plurality of texts.
The generating unit 302 is configured to determine a feature vector of each text according to the text feature of each text, and generate a feature matrix of a plurality of texts according to the feature vector of each text.
A determining unit 303, configured to determine sub-feature matrices of the multiple texts according to the feature matrices of the multiple texts, and determine similarity information between the multiple texts according to the sub-feature matrices of the multiple texts; wherein the number of sub-feature matrices is at least two.
And the recommending unit 304 is configured to determine a text to be recommended in the texts in the preset text library according to the similarity information, and recommend the text to be recommended to the user.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above-described apparatus may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Fig. 4 is a schematic diagram of a text recommendation device based on a text similarity value according to a fourth embodiment of the present application. The apparatus 40 in the fourth embodiment includes:
an obtaining unit 401 is configured to obtain a text feature of each text in the plurality of texts.
The generating unit 402 is configured to determine a feature vector of each text according to the text feature of each text, and generate a feature matrix of a plurality of texts according to the feature vector of each text.
A determining unit 403, configured to determine sub-feature matrices of the multiple texts according to the feature matrices of the multiple texts, and determine similarity information between the multiple texts according to the sub-feature matrices of the multiple texts; wherein the number of sub-feature matrices is at least two.
And the recommending unit 404 is configured to determine a text to be recommended in the texts in the preset text library according to the similarity information, and recommend the text to be recommended to the user.
In one example, the determining unit 403 includes:
a first determining module 4031, configured to determine a transpose matrix of a feature matrix according to the feature matrix of the multiple texts.
A second determining module 4032, configured to determine a sub-feature matrix of the feature matrix according to the feature matrices of the multiple texts.
A third determining module 4033, configured to determine a sub-feature matrix of the transpose matrix of the feature matrix according to the transpose matrix of the feature matrices of the multiple texts.
A fourth determining module 4034, configured to use the sub-feature matrix of the feature matrix and the sub-feature matrix of the transpose of the feature matrix as sub-feature matrices of the multiple texts.
In one example, the determining unit 403 includes:
a similarity information determining module 4035, configured to determine first cosine similarity information between each sub-feature matrix of the feature matrix and each sub-feature matrix of the transposed matrix of the feature matrix; determine second cosine similarity information among the texts according to the first cosine similarity information; and determine similarity information among the plurality of texts according to the second cosine similarity information.
In an example, if the text feature is text content, the generating unit 402 includes:
a first obtaining module 4021, configured to obtain word information in text content of each text;
the first determining module 4022 is configured to determine a word vector of the word information according to the word information; the word vector represents semantic information of word information;
the second determining module 4023 is configured to determine a center vector of each text according to the word vectors, and use the center vector of each text as a feature vector of each text.
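A minimal sketch of the center-vector step described by modules 4021–4023: average the word vectors of a text to obtain one feature vector per text. The toy 4-dimensional embeddings below are made up for demonstration and are not from the patent:

```python
import numpy as np

# Hypothetical toy word embeddings (dimension 4).
word_vectors = {
    "loan":   np.array([0.9, 0.1, 0.0, 0.2]),
    "credit": np.array([0.8, 0.2, 0.1, 0.1]),
    "apply":  np.array([0.1, 0.7, 0.3, 0.0]),
}

def center_vector(words):
    """Mean of the word vectors of a text: used as the text's feature vector."""
    return np.mean([word_vectors[w] for w in words], axis=0)

v = center_vector(["loan", "credit", "apply"])
assert v.shape == (4,)
```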
In an example, if the text feature is a text label, the generating unit 402 includes:
a second obtaining module 4024, configured to obtain frequency information and category information in a text tag of each text;
the third determining module 4025 is configured to determine a tag vector of each text according to the frequency information and the category information, and use the tag vector of each text as a feature vector of each text.
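One plausible encoding of the tag-vector step in modules 4024–4025 (the text does not fix a concrete formula, so this is an assumption): a fixed category vocabulary, with each slot holding that tag's frequency for the text and zero when the tag is absent:

```python
import numpy as np

# Hypothetical fixed category vocabulary.
CATEGORIES = ["finance", "sports", "tech", "travel"]

def tag_vector(tag_freqs):
    """tag_freqs: {category: frequency} observed in one text's labels."""
    return np.array([tag_freqs.get(c, 0.0) for c in CATEGORIES])

v = tag_vector({"finance": 3, "tech": 1})
assert v.tolist() == [3.0, 0.0, 1.0, 0.0]
```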
In one example, the recommending unit 404 includes:
the screening module 4041 is configured to screen a preset number of texts from a preset text library according to the similarity information;
the determining module 4042 is configured to determine a text to be recommended in the texts in the preset text library according to a relationship between the similarity information of the preset number of texts and the similarity information of the remaining number of texts.
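The screening step of module 4041 can be sketched as a top-k selection on one query text's similarity row; the follow-up comparison against the remaining texts (module 4042) is only indicated, since the patent does not give its formula. Function name and data are illustrative:

```python
import numpy as np

def screen(simi_row, first_k):
    """Return (shortlist, remaining) candidate indices, most similar first."""
    order = np.argsort(simi_row)[::-1]  # indices sorted by descending similarity
    return order[:first_k], order[first_k:]

row = np.array([0.1, 0.9, 0.4, 0.8, 0.2])  # similarities to 5 library texts
top, rest = screen(row, 2)
assert top.tolist() == [1, 3]  # the two most similar library texts
```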
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the above-described apparatus may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
Fig. 5 is a block diagram illustrating a terminal device, which may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, etc., according to one exemplary embodiment.
The apparatus 500 may include one or more of the following components: a processing component 502, a memory 504, a power component 506, a multimedia component 508, an audio component 510, an input/output (I/O) interface 512, a sensor component 514, and a communication component 516.
The processing component 502 generally controls overall operation of the device 500, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 502 may include one or more processors 520 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 502 can include one or more modules that facilitate interaction between the processing component 502 and other components. For example, the processing component 502 can include a multimedia module to facilitate interaction between the multimedia component 508 and the processing component 502.
The memory 504 is configured to store various types of data to support operations at the apparatus 500. Examples of such data include instructions for any application or method operating on device 500, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 504 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 506 provides power to the various components of the device 500. The power components 506 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 500.
The multimedia component 508 includes a screen that provides an output interface between the device 500 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 508 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 500 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 510 is configured to output and/or input audio signals. For example, audio component 510 includes a Microphone (MIC) configured to receive external audio signals when apparatus 500 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 504 or transmitted via the communication component 516. In some embodiments, audio component 510 further includes a speaker for outputting audio signals.
The I/O interface 512 provides an interface between the processing component 502 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 514 includes one or more sensors for providing various aspects of status assessment for the device 500. For example, the sensor assembly 514 may detect an open/closed state of the apparatus 500, the relative positioning of the components, such as a display and keypad of the apparatus 500, the sensor assembly 514 may also detect a change in the position of the apparatus 500 or a component of the apparatus 500, the presence or absence of user contact with the apparatus 500, orientation or acceleration/deceleration of the apparatus 500, and a change in the temperature of the apparatus 500. The sensor assembly 514 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 514 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 514 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 516 is configured to facilitate communication between the apparatus 500 and other devices in a wired or wireless manner. The apparatus 500 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 516 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 516 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 500 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 504 comprising instructions, executable by the processor 520 of the apparatus 500 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of a terminal device, enable the terminal device to perform a text recommendation method based on a text similarity value of the terminal device.
The application also discloses a computer program product comprising a computer program which, when executed by a processor, implements the method as described in the embodiments.
Various implementations of the systems and techniques described here above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present application may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or electronic device.
In the context of this application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and the server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service extensibility of a traditional physical host and VPS ("Virtual Private Server", "VPS" for short) service. The server may also be a server of a distributed system, or a server combined with a blockchain. It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for text recommendation, the method comprising:
acquiring text characteristics of each text in a plurality of texts;
determining a feature vector of each text according to the text features of each text, and generating feature matrixes of the plurality of texts according to the feature vector of each text;
determining sub-feature matrixes of the texts according to the feature matrixes of the texts, and determining similarity information between the texts according to the sub-feature matrixes of the texts; wherein the number of the sub-feature matrices is at least two;
and determining a text to be recommended in texts in a preset text library according to the similarity information, and recommending the text to be recommended to a user.
2. The method of claim 1, wherein determining the sub-feature matrices for the plurality of texts from the feature matrices for the plurality of texts comprises:
determining a transposed matrix of the feature matrix according to the feature matrices of the plurality of texts;
determining a sub-feature matrix of the feature matrix according to the feature matrices of the texts;
determining a sub-feature matrix of the transposed matrix of the feature matrix according to the transposed matrix of the feature matrix;
and taking a sub-feature matrix of the feature matrix and a sub-feature matrix of a transposed matrix of the feature matrix as sub-feature matrices of the plurality of texts.
3. The method of claim 2, wherein determining similarity information between the plurality of texts from the sub-feature matrices of the plurality of texts comprises:
determining first cosine similarity information between each sub-feature matrix of each text of the feature matrix and each sub-feature matrix of each text of the transposed matrix of the feature matrix;
determining second cosine similarity information among the texts according to the first cosine similarity information;
and determining similarity information among the texts according to the second cosine similarity information.
4. The method of claim 1, wherein the text features include text content and text labels, the text labels characterizing feature attributes of the text; if the text features are text contents, determining the feature vector of each text according to the text features of each text, wherein the determining comprises the following steps:
acquiring word information in the text content of each text;
determining word vectors of the word information according to the word information; wherein the word vector characterizes semantic information of the word information;
and determining the central vector of each text according to the word vector, and taking the central vector of each text as the feature vector of each text.
5. The method of claim 1, wherein the text features include text content and text labels, the text labels characterizing feature attributes of the text; if the text feature is a text label, determining a feature vector of each text according to the text feature of each text, including:
obtaining frequency information and category information in the text label of each text;
and determining the label vector of each text according to the frequency information and the category information, and taking the label vector of each text as the feature vector of each text.
6. The method according to claim 1, wherein determining a text to be recommended in the texts in a preset text library according to the similarity information comprises:
screening out a first preset number of texts from a preset text library according to the similarity information;
determining a text to be recommended in the texts in the preset text library according to the relationship between the similarity information of the texts in the second preset number and the similarity information of the texts in the third preset number; wherein a sum of the second preset number and the third preset number is equal to the first preset number.
7. A text recommendation apparatus, characterized in that the apparatus comprises:
the acquiring unit is used for acquiring text characteristics of each text in the plurality of texts;
the generating unit is used for determining the feature vector of each text according to the text feature of each text and generating the feature matrixes of the plurality of texts according to the feature vector of each text;
the determining unit is used for determining the sub-feature matrixes of the texts according to the feature matrixes of the texts, and determining the similarity information among the texts according to the sub-feature matrixes of the texts; wherein the number of the sub-feature matrices is at least two;
and the recommending unit is used for determining a text to be recommended in the texts in a preset text library according to the similarity information and recommending the text to be recommended to the user.
8. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-6.
9. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1-6.
10. A computer program product, comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
CN202111649278.8A 2021-12-29 2021-12-29 Text recommendation method, device, equipment and medium Pending CN114297501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111649278.8A CN114297501A (en) 2021-12-29 2021-12-29 Text recommendation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111649278.8A CN114297501A (en) 2021-12-29 2021-12-29 Text recommendation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114297501A true CN114297501A (en) 2022-04-08

Family

ID=80974119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111649278.8A Pending CN114297501A (en) 2021-12-29 2021-12-29 Text recommendation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114297501A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069121A (en) * 2015-08-12 2015-11-18 北京暴风科技股份有限公司 Video pushing method based on video theme similarity
CN108710613A (en) * 2018-05-22 2018-10-26 平安科技(深圳)有限公司 Acquisition methods, terminal device and the medium of text similarity
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine


Similar Documents

Publication Publication Date Title
KR102632647B1 (en) Methods and devices, electronic devices, and memory media for detecting face and hand relationships
US11120078B2 (en) Method and device for video processing, electronic device, and storage medium
EP3179379A1 (en) Method and apparatus for determining similarity and terminal therefor
CN107621886B (en) Input recommendation method and device and electronic equipment
CN109993627B (en) Recommendation method, recommendation model training device and storage medium
EP3958110B1 (en) Speech control method and apparatus, terminal device, and storage medium
CN111489155B (en) Data processing method and device for data processing
CN110232181B (en) Comment analysis method and device
CN114298227A (en) Text duplicate removal method, device, equipment and medium
CN111753539B (en) Method and device for identifying sensitive text
CN110297970B (en) Information recommendation model training method and device
CN112328809A (en) Entity classification method, device and computer readable storage medium
CN110147426B (en) Method for determining classification label of query text and related device
CN114297501A (en) Text recommendation method, device, equipment and medium
CN115687303A (en) Data information migration method, device, equipment and storage medium
CN115658063A (en) Page information generation method, device, equipment and storage medium
CN114090738A (en) Method, device and equipment for determining scene data information and storage medium
CN113256379A (en) Method for correlating shopping demands for commodities
CN108241438B (en) Input method, input device and input device
CN110019657B (en) Processing method, apparatus and machine-readable medium
CN111061633A (en) Method, device, terminal and medium for detecting first screen time of webpage
CN112651221A (en) Data processing method and device and data processing device
CN113157703B (en) Data query method and device, electronic equipment and storage medium
CN113741783B (en) Key identification method and device for identifying keys
CN112989172B (en) Content recommendation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination