CN111143510A - Searching method based on latent semantic analysis model - Google Patents

Searching method based on latent semantic analysis model

Info

Publication number
CN111143510A
Authority
CN
China
Prior art keywords
matrix
search
semantic analysis
analysis model
collaborative filtering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911270265.2A
Other languages
Chinese (zh)
Inventor
魏焱
杜斌
邓旭阳
易仕敏
陈东
陈海涵
廖鹏
何杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Original Assignee
Guangdong Power Grid Co Ltd
Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Power Grid Co Ltd and Electric Power Dispatch Control Center of Guangdong Power Grid Co Ltd
Priority to CN201911270265.2A
Publication of CN111143510A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/335Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a search method based on a latent semantic analysis model, comprising the following steps: text content to be searched is received in an input box, the search content being two or more sentences; adjectives are removed from the search content; the search content is then divided into features, with a GloVe model capturing features that may be related in context and juxtaposing the specified meanings; and the search content input by the user is divided into groups by collaborative filtering. By judging the context to assist the search, the method improves search accuracy.

Description

Searching method based on latent semantic analysis model
Technical Field
The invention belongs to the technical field of search, and in particular relates to a search method based on a latent semantic analysis model.
Background
Existing search algorithms decompose a sentence into a series of words, analyze the meanings of the important words, and return simple local search results accordingly. Because the model generates a single "word embedding" representation for each word in the vocabulary, with no auxiliary judgment based on context, the word "bank" may have the same representation in "bank destination" and in "river bank", which confuses the search results. A search method based on a latent semantic analysis model is therefore provided.
Disclosure of Invention
The invention aims to provide a search method based on a latent semantic analysis model, so as to solve the problems described in the Background.
To achieve this purpose, the invention provides the following technical solution: a search method based on a latent semantic analysis model, comprising the following steps:
S1, receiving the text content to be searched in an input box, wherein the search content is two or more sentences;
S2, removing adjectives from the search content;
S3, performing feature division on the search content, with the GloVe model capturing features that may be related in context and juxtaposing the specified meanings;
S4, dividing the search content input by the user into groups by collaborative filtering.
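Steps S1 and S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the patent does not say how adjectives are detected, so the small `ADJECTIVES` lexicon, the function names and the sentence-splitting rule are all assumptions made for the example. A real system would more likely use a part-of-speech tagger.

```python
import re

# Hypothetical adjective lexicon (an assumption for this sketch);
# a production system would use a part-of-speech tagger instead.
ADJECTIVES = {"big", "old", "nearest", "cheap", "fast"}


def split_sentences(text):
    """Split text on sentence-ending punctuation (English or Chinese)."""
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def remove_adjectives(sentence):
    """Step S2: tokenize a sentence and drop tokens found in the lexicon."""
    tokens = re.findall(r"\w+", sentence.lower())
    return [t for t in tokens if t not in ADJECTIVES]


def preprocess(query):
    """Steps S1-S2: require two or more sentences, then strip adjectives."""
    sentences = split_sentences(query)
    if len(sentences) < 2:
        raise ValueError("search content must contain two or more sentences")
    return [remove_adjectives(s) for s in sentences]
```

Calling `preprocess("Find the nearest bank. I want to open a cheap account.")` returns one token list per sentence with "nearest" and "cheap" removed; a single-sentence query is rejected because step S1 requires at least two sentences.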
Preferably, the GloVe model is a log-bilinear model with a weighted least-squares objective. The main intuition underlying the model is the simple observation that ratios of word-word co-occurrence probabilities can encode some form of meaning.
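For reference, the weighted least-squares objective mentioned above is usually written as follows in the published GloVe formulation. The patent does not reproduce it, so the notation here is an assumption taken from that formulation: $X_{ij}$ is the number of times word $j$ occurs in the context of word $i$, $w_i$ and $\tilde{w}_j$ are word and context vectors, $b_i$ and $\tilde{b}_j$ are biases, $N$ is the vocabulary size, and $f$ is a weighting function with cutoff $x_{\max}$ and exponent $\alpha$.

```latex
% GloVe weighted least-squares objective
J = \sum_{i,j=1}^{N} f(X_{ij}) \left( w_i^{T} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^{2}

% Weighting function that damps very frequent co-occurrences
f(x) =
\begin{cases}
  (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\
  1                        & \text{otherwise}
\end{cases}

% Co-occurrence probability and the ratio the model is built around
P_{ik} = \frac{X_{ik}}{\sum_{l} X_{il}}, \qquad \text{ratio} = \frac{P_{ik}}{P_{jk}}
```

The model is log-bilinear because the logarithm of the co-occurrence count is approximated by a form that is linear in each of $w_i$ and $\tilde{w}_j$.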
Preferably, in step S4, collaborative filtering discovers the preferences of users by mining their historical search behavior data, divides the users into groups based on their different preferences, and recommends content with similar tastes. Collaborative filtering recommendation algorithms fall into two categories: user-based collaborative filtering and item-based collaborative filtering.
Preferably, singular value decomposition (SVD) is performed after the above steps are completed.
Preferably, the SVD is a matrix decomposition of the search content. Assuming that matrix A is an m × n matrix, the SVD of A is defined as:
$$A = U \Sigma V^{T}$$
where U is an m × m matrix; Σ is an m × n matrix whose entries are all 0 except those on the main diagonal, each of which is called a singular value; and V is an n × n matrix. U and V are both unitary matrices, i.e. they satisfy $U^{T}U = I$ and $V^{T}V = I$.
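The definition above can be checked numerically. The following is a small sketch using NumPy's `numpy.linalg.svd`; the helper name `full_svd` and the example matrix are assumptions made for illustration and are not taken from the patent.

```python
import numpy as np


def full_svd(A):
    """Return U (m x m), Sigma (m x n) and V^T (n x n) with A = U @ Sigma @ V^T."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # full_matrices=True yields square U and V^T, matching the definition above.
    U, s, Vt = np.linalg.svd(A, full_matrices=True)
    # Place the singular values on the main diagonal of an m x n zero matrix.
    Sigma = np.zeros((m, n))
    Sigma[:len(s), :len(s)] = np.diag(s)
    return U, Sigma, Vt


# Hypothetical 4 x 3 example matrix.
A_example = np.array([[3, 1, 0],
                      [1, 2, 0],
                      [0, 0, 1],
                      [2, 0, 1]], dtype=float)
U, Sigma, Vt = full_svd(A_example)
reconstruction_ok = np.allclose(U @ Sigma @ Vt, A_example)
```

For a real-valued matrix the unitary condition reduces to orthogonality, so `U.T @ U` and `Vt @ Vt.T` are both identity matrices up to floating-point error.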
Compared with the prior art, the invention has the following beneficial effect:
by judging the context to assist the search, the method improves search accuracy.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic view of the present invention;
FIG. 2 is a graph of the results of an overview of the model calculations of the present invention;
FIG. 3 is a schematic diagram of the relationship of characters of the present invention;
FIG. 4 is a schematic diagram of the relationship between a city and a city zip code of the present invention;
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "vertical", "upper", "lower", "horizontal", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present invention.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "connected," and "connected" are to be construed broadly and may, for example, be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; either directly or indirectly through intervening media, or through the communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.
Referring to fig. 1-2, the present invention provides the following technical solution: a search method based on a latent semantic analysis model, comprising the following steps:
S1, receiving the text content to be searched in an input box, wherein the search content is two or more sentences;
S2, removing adjectives from the search content;
S3, performing feature division on the search content, with the GloVe model capturing features that may be related in context and juxtaposing the specified meanings for contextual understanding. For example, consider the co-occurrence probabilities of the target words "ice" and "steam" with various probe words in the vocabulary. Ice co-occurs more frequently with "solid" than with "gas", whereas steam co-occurs more frequently with "gas" than with "solid"; both words co-occur frequently with their shared property "water", and both co-occur infrequently with the unrelated word "fashion". As shown in fig. 2, only in the ratio of probabilities does the noise from non-discriminative words such as "water" and "fashion" cancel out, so that large values (much greater than 1) correlate well with properties specific to ice and small values (much less than 1) correlate well with properties specific to steam. In this way the probability ratio encodes a rough form of meaning associated with the abstract concept of thermodynamic phase. When related features, such as "ice" and "solid" or a city name and the corresponding city zip code, appear in the context, the search results can be narrowed further;
S4, dividing the search content input by the user into groups by collaborative filtering.
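The probability-ratio argument in step S3 can be illustrated with a toy calculation. The four-sentence corpus below is invented for the example and is not data from the patent. Treating a whole sentence as the context window and using add-one smoothing are likewise assumptions of this sketch.

```python
from collections import Counter

# Hypothetical toy corpus, invented for this illustration.
CORPUS = [
    "ice is a solid form of water",
    "solid ice floats on water",
    "steam is a gas form of water",
    "hot gas steam rises from water",
]


def context_counts(sentences, target):
    """Count words that share a sentence with the target word."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


def cooccurrence_prob(sentences, target, probe):
    """P(probe | target) with add-one smoothing over the corpus vocabulary."""
    vocab = {t for s in sentences for t in s.lower().split()}
    counts = context_counts(sentences, target)
    return (counts[probe] + 1) / (sum(counts.values()) + len(vocab))


def probability_ratio(sentences, word_a, word_b, probe):
    """Ratio P(probe | word_a) / P(probe | word_b)."""
    return (cooccurrence_prob(sentences, word_a, probe)
            / cooccurrence_prob(sentences, word_b, probe))
```

On this corpus, `probability_ratio(CORPUS, "ice", "steam", "solid")` is well above 1 and the ratio for "gas" is well below 1. The ratios for "water" and for the unseen word "fashion" stay close to 1, which is the cancellation effect described in step S3.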
In this embodiment, preferably, the GloVe model is a log-bilinear model with a weighted least-squares objective. The main intuition underlying the model is the simple observation that ratios of word-word co-occurrence probabilities can encode some form of meaning. GloVe aims to capture, in a quantitative way, the subtle differences that distinguish related words such as "man" and "woman". To do so, the model must associate more than a single number with a word pair; the vector difference between the two word vectors is a natural and simple candidate for such a larger set of discriminative numbers. GloVe is designed so that these vector differences capture as much as possible of the meaning specified by the juxtaposition of two words, as shown in fig. 3 and 4.
The underlying concept that distinguishes man from woman, i.e. sex or gender, can be equivalently specified by various other word pairs, such as king and queen or brother and sister. To state this observation mathematically, we can expect the vector differences man - woman, king - queen and brother - sister to be roughly equal, and the relationships between them can be used to make contextual judgments.
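The vector-difference observation can be made concrete with a small sketch. The three-dimensional vectors below are hand-made toy values chosen for illustration. They are not trained GloVe embeddings, and the names `VECTORS`, `difference` and `analogy` are assumptions of this example.

```python
from math import sqrt

# Hand-made toy vectors (NOT trained embeddings); dimensions loosely stand
# for "royalty", "gender" and "kinship".
VECTORS = {
    "man":     [0.1, 1.0, 0.2],
    "woman":   [0.1, -1.0, 0.2],
    "king":    [0.9, 0.95, 0.3],
    "queen":   [0.9, -1.0, 0.35],
    "brother": [0.0, 1.0, 0.8],
    "sister":  [0.0, -1.0, 0.8],
}


def difference(a, b):
    """Vector difference between two words."""
    return [x - y for x, y in zip(VECTORS[a], VECTORS[b])]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))


def analogy(a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b, c."""
    target = [y - x + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))
```

With these toy values the differences man - woman, king - queen and brother - sister point in almost the same direction, and `analogy("man", "king", "woman")` selects "queen".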
In this embodiment, preferably, in step S4, collaborative filtering discovers the preferences of users by mining their historical search behavior data, divides the users into groups based on their different preferences, and recommends content with similar tastes. Collaborative filtering recommendation algorithms fall into two categories: user-based collaborative filtering and item-based collaborative filtering.
The user-based collaborative filtering algorithm uses users' historical behavior data to discover their preferences for items (for example posting, liking, commenting on or sharing content), and measures and scores these preferences. The relationship between users is then calculated from the attitudes and degrees of preference of different users toward the same item, and recommendations are made among users with the same preferences.
The item-based collaborative filtering algorithm obtains the relationship between items by calculating the scores that different users give to different items, and recommends similar items to the user based on those relationships.
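Both variants can be sketched in a few lines. The patent does not specify a similarity measure, so cosine similarity is an assumption here, as are the function names and the sample ratings used in the usage note.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse score dictionaries."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def transpose(ratings):
    """Turn {user: {item: score}} into {item: {user: score}}."""
    items = {}
    for user, scores in ratings.items():
        for item, score in scores.items():
            items.setdefault(item, {})[user] = score
    return items


def recommend_user_based(user, ratings):
    """Rank unseen items by the scores of other users, weighted by user similarity."""
    scores = {}
    for other, other_scores in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_scores)
        for item, score in other_scores.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)


def recommend_item_based(user, ratings):
    """Rank unseen items by their similarity to items the user already scored."""
    items = transpose(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(vector, items[i]) * r for i, r in seen.items())
    return sorted(scores, key=scores.get, reverse=True)
```

As a usage example with hypothetical scores, take `ratings = {"u1": {"a": 5, "b": 4, "c": 1}, "u2": {"a": 4, "b": 5, "d": 4}, "u3": {"c": 5, "e": 4, "a": 1}}`. Both functions rank item "d" ahead of item "e" for user "u1", because "u1" resembles "u2" much more than "u3".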
In this embodiment, preferably, singular value decomposition (SVD) is performed after the above steps are completed.
In this embodiment, preferably, the SVD is a matrix decomposition of the search content. Assuming that matrix A is an m × n matrix, the SVD of A is defined as:
$$A = U \Sigma V^{T}$$
where U is an m × m matrix; Σ is an m × n matrix whose entries are all 0 except those on the main diagonal, each of which is called a singular value; and V is an n × n matrix. U and V are both unitary matrices, i.e. they satisfy $U^{T}U = I$ and $V^{T}V = I$.
The three matrices have clear physical meanings. Each row of the first matrix, U, represents a class of words with related meanings, and each non-zero element in it represents the importance (or relevance) of a word within that class; the larger the value, the more relevant the word. Each column of the last matrix, $V^{T}$, represents a class of articles on the same topic, and each element in it represents the relevance of an article within that class. The middle matrix, Σ, represents the correlation between the word classes and the article classes. Therefore, by performing a single singular value decomposition of the association matrix A, we can simultaneously complete the classification of synonyms and the classification of articles.
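The simultaneous grouping of words and articles can be illustrated with a truncated SVD. The word-by-article count matrix below is hypothetical, and two further choices are assumptions of this sketch rather than statements from the patent: rows stand for words and columns for articles, and only the k largest singular values are kept.

```python
import numpy as np

# Hypothetical counts: rows are words, columns are four articles.
TERMS = ["grid", "voltage", "dispatch", "river", "bank", "water"]
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [0, 0, 1, 1],
], dtype=float)


def latent_coordinates(A, k):
    """Project words and articles into a shared k-dimensional latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    word_coords = U[:, :k] * s[:k]    # one row per word
    doc_coords = Vt[:k, :].T * s[:k]  # one row per article
    return word_coords, doc_coords


def cosine(u, v):
    """Cosine similarity between two coordinate vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


word_coords, doc_coords = latent_coordinates(A, k=2)
```

In this example the first two articles share one latent direction and the last two share another, so their cosine similarity is high within a pair and near zero across pairs. The same holds for the words, for instance "grid" and "voltage" versus "grid" and "river".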
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (5)

1. A search method based on a latent semantic analysis model, characterized by comprising the following steps:
S1, receiving the text content to be searched in an input box, wherein the search content is two or more sentences;
S2, removing adjectives from the search content;
S3, performing feature division on the search content, with the GloVe model capturing features that may be related in context and juxtaposing the specified meanings;
S4, dividing the search content input by the user into groups by collaborative filtering.
2. The latent semantic analysis model-based search method of claim 1, wherein: the GloVe model is a log-bilinear model with a weighted least-squares objective, the main intuition of which is the simple observation that ratios of word-word co-occurrence probabilities can encode some form of meaning.
3. The latent semantic analysis model-based search method of claim 1, wherein: in step S4, collaborative filtering discovers the preferences of users by mining their historical search behavior data, divides the users into groups based on their different preferences, and recommends content with similar tastes; collaborative filtering recommendation algorithms fall into two categories, namely user-based collaborative filtering and item-based collaborative filtering.
4. The latent semantic analysis model-based search method of claim 1, wherein: singular value decomposition (SVD) is performed after the steps are completed.
5. The latent semantic analysis model-based search method of claim 4, wherein: the SVD is a matrix decomposition of the search content; assuming that matrix A is an m × n matrix, the SVD of A is defined as:
$$A = U \Sigma V^{T}$$
where U is an m × m matrix; Σ is an m × n matrix whose entries are all 0 except those on the main diagonal, each of which is called a singular value; and V is an n × n matrix. U and V are both unitary matrices, i.e. they satisfy $U^{T}U = I$ and $V^{T}V = I$.
CN201911270265.2A 2019-12-10 2019-12-10 Searching method based on latent semantic analysis model Pending CN111143510A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911270265.2A CN111143510A (en) 2019-12-10 2019-12-10 Searching method based on latent semantic analysis model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911270265.2A CN111143510A (en) 2019-12-10 2019-12-10 Searching method based on latent semantic analysis model

Publications (1)

Publication Number Publication Date
CN111143510A true CN111143510A (en) 2020-05-12

Family

ID=70518515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911270265.2A Pending CN111143510A (en) 2019-12-10 2019-12-10 Searching method based on latent semantic analysis model

Country Status (1)

Country Link
CN (1) CN111143510A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341188A (en) * 2017-06-08 2017-11-10 广州市呼百应网络技术股份有限公司 Efficient data screening technique based on semantic analysis
CN107577799A (en) * 2017-09-21 2018-01-12 合肥集知网知识产权运营有限公司 A kind of big data patent retrieval method based on potential applications retrieval model
CN108399163A (en) * 2018-03-21 2018-08-14 北京理工大学 Bluebeard compound polymerize the text similarity measure with word combination semantic feature
CN110348497A (en) * 2019-06-28 2019-10-18 西安理工大学 A kind of document representation method based on the building of WT-GloVe term vector
CN110516033A (en) * 2018-05-04 2019-11-29 北京京东尚科信息技术有限公司 A kind of method and apparatus calculating user preference

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
陈珍锐,等: "基于GloVe模型的词向量改进方法" *
黄雯媛: "基于情境信息和神经网络的个性化推荐算法" *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200512
