WO2024069941A1 - Information processing device, search method, and search program

Info

Publication number
WO2024069941A1
Authority
WO
WIPO (PCT)
Prior art keywords
replacement
information
search content
unit
search
Application number
PCT/JP2022/036728
Other languages
French (fr)
Japanese (ja)
Inventor
大幹 白藤
Original Assignee
三菱電機株式会社
Application filed by 三菱電機株式会社
Priority to PCT/JP2022/036728
Publication of WO2024069941A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/383 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Definitions

  • This disclosure relates to an information processing device, a search method, and a search program.
  • In the above technology, ambiguity is evaluated. Even if ambiguity is evaluated, there are cases where the search results desired by the user are not obtained. For example, suppose "machine learning" is input as a search keyword. "Machine learning" itself is not ambiguous. However, if the user wants search results related to "machine learning" in "image processing", search results related to "machine learning" in "natural language processing" may be output instead. In this way, there are cases where the search results desired by the user are not obtained.
  • The purpose of this disclosure is to output search results that the user desires.
  • The information processing device includes: an acquisition unit that acquires search content, a user category that is a category of a user, a replacement dictionary that indicates the correspondence among replacement target words, categories, and replacement information (information indicating a replacement), a vectorization model, a plurality of document data, and a plurality of vector data obtained by replacing and vectorizing the plurality of document data; a determination unit that uses the replacement dictionary to determine whether the search content includes a replacement target word; a replacement unit that, when the search content includes a replacement target word, uses the user category and the replacement dictionary to replace the replacement target word included in the search content with the replacement information; a vector calculation unit that uses the vectorization model to vectorize replacement search content, which is the search content including the replacement information, and calculates the similarity between the vectorized replacement search content and each of the plurality of vector data; and an output unit that arranges the plurality of document data corresponding to the plurality of vector data in descending order of similarity and outputs the plurality of document data as a search result.
  • FIG. 1 is a diagram showing a search system.
  • FIG. 2 is a diagram showing hardware included in an information processing device.
  • FIG. 3 is a block diagram showing the functions of a generating device.
  • FIG. 4 is a diagram showing an example of category information.
  • FIG. 5 is a diagram showing an example of user information.
  • FIG. 6 is a diagram showing an example of model information.
  • FIG. 7 is a diagram showing an example of replacement target word information.
  • FIG. 8 is a diagram showing an example of a replacement dictionary.
  • FIG. 9 is a diagram showing an example of a document table.
  • FIG. 10 is a diagram showing an example of vector information.
  • FIG. 11 is a flowchart showing an example of a process for generating a vectorization model.
  • FIG. 12 is a flowchart showing an example of a process executed by the generating device.
  • FIG. 13 is a block diagram showing the functions of a terminal device.
  • FIG. 14 is a block diagram showing the functions of the information processing device.
  • FIG. 15 is a diagram showing an example of search content information.
  • FIG. 16 is a flowchart showing an example of a process executed by the information processing device.
  • Embodiment. FIG. 1 is a diagram showing a search system.
  • the search system includes an information processing device 100, a generating device 200, and a terminal device 300.
  • the information processing device 100, the generating device 200, and the terminal device 300 communicate with each other via a network 10.
  • the network 10 is a wired network or a wireless network.
  • the information processing device 100 and the generating device 200 may be realized by a single device.
  • the information processing device 100 is a device that executes the search method.
  • the information processing device 100 is a personal computer (PC), a server, a smartphone, a tablet device, or the like.
  • the generating device 200 is a device that generates various information.
  • the terminal device 300 is a device that is used by a user.
  • The information processing device 100 includes a processor 101, a volatile storage device 102, a non-volatile storage device 103, a communication IF (Interface) 104, an input/output IF 105, and a media IF 106.
  • the processor 101 controls the entire information processing device 100.
  • the processor 101 is a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), a DSP (Digital Signal Processor), etc.
  • the processor 101 may be a multiprocessor.
  • the information processing device 100 may also have a processing circuit.
  • the information processing device 100 may have a microcomputer or a SoC (System on Chip).
  • the volatile storage device 102 is a main storage device of the information processing device 100.
  • the volatile storage device 102 is a random access memory (RAM).
  • the non-volatile storage device 103 is an auxiliary storage device of the information processing device 100.
  • the non-volatile storage device 103 is a hard disk drive (HDD) or a solid state drive (SSD).
  • the communication IF 104 communicates with the generating device 200 and the terminal device 300.
  • the input/output IF 105 connects to an input device (e.g., a keyboard) and an output device (e.g., a display).
  • the media IF 106 communicates with a recording medium.
  • the recording medium is a CD (Compact Disc), a DVD (Digital Versatile Disc), or a flash memory.
  • The generating device 200 and the terminal device 300, like the information processing device 100, each have a processor, a volatile storage device, a non-volatile storage device, a communication IF, an input/output IF, and a media IF.
  • the generating device 200 has a storage unit 210, a communication unit 220, and a control unit 230.
  • The control unit 230 includes an acquisition unit 231, a determination unit 232, a replacement unit 233, a preprocessing unit 234, a generation unit 235, and a vectorization unit 236.
  • The storage unit 210 may be realized as a storage area secured in a volatile storage device or a non-volatile storage device included in the generating device 200.
  • a part or all of the communication unit 220 and the control unit 230 may be realized by a processing circuit included in the generating device 200.
  • a part or all of the communication unit 220 and the control unit 230 may be realized as a program module executed by a processor included in the generating device 200.
  • the storage unit 210 stores category information 211, user information 212, model information 213, replacement target word information 214, replacement dictionary 215, document table 216, and vector information 217. Details of the category information 211, user information 212, model information 213, replacement target word information 214, replacement dictionary 215, document table 216, and vector information 217 will be described later.
  • The communication unit 220 communicates with the information processing device 100 and the terminal device 300.
  • the acquisition unit 231 acquires the category information 211 from the storage unit 210. An example of the category information 211 is shown below.
  • FIG. 4 is a diagram showing an example of category information.
  • Category information 211 has the fields of category ID (identifier) and category.
  • the acquisition unit 231 acquires the user information 212 from the storage unit 210.
  • FIG. 5 is a diagram showing an example of user information.
  • User information 212 has items such as a user ID and attribute information.
  • the attribute information indicates the user's field of expertise.
  • The determination unit 232 uses the category information 211 and the user information 212 to determine a category corresponding to the attribute information. For example, the determination unit 232 determines the category "image processing" corresponding to the attribute information based on "image" indicated by the attribute information. The determination unit 232 associates the determined category with the attribute information. For example, the determination unit 232 associates the category "image processing" with the attribute information "image".
  • the determination unit 232 may also estimate a category corresponding to the attribute information using a trained model.
  • the trained model is included in the model information 213.
  • An example of the model information 213 is shown below.
  • FIG. 6 is a diagram showing an example of model information.
  • the model information 213 has items of model ID, model name, and trained model.
  • the determination unit 232 may estimate a category corresponding to the attribute information using the trained model, which is an attribute category estimation model.
  • the determination unit 232 associates the estimated category with the attribute information.
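The rule-based category determination described above could be sketched as follows. This is only an illustration under stated assumptions: the source does not say how an attribute is matched to a category, so substring matching is assumed here, and the category IDs, the second category entry, and the function name `determine_category` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical table in the shape of the category information 211
# (category ID -> category). The IDs are illustrative only.
CATEGORY_INFO: Dict[str, str] = {
    "C001": "image processing",
    "C002": "natural language processing",
}


def determine_category(attribute: str, category_info: Dict[str, str]) -> Optional[str]:
    """Return the first category whose name contains the attribute text.

    Substring matching is an assumption of this sketch; the source only
    states that a category corresponding to the attribute is determined.
    """
    needle = attribute.strip().lower()
    if not needle:
        return None
    for category in category_info.values():
        if needle in category.lower():
            return category
    return None  # no category matched; a trained model could be used instead


# The example from the text: attribute "image" -> category "image processing".
user_category = determine_category("image", CATEGORY_INFO)
```

When no category matches, the function returns `None`, which is the point where the attribute category estimation model mentioned above might take over.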
  • the acquisition unit 231 acquires the replacement target word information 214 from the storage unit 210.
  • An example of the replacement target word information 214 is shown below.
  • FIG. 7 is a diagram showing an example of replacement target word information.
  • the replacement target word information 214 has fields for replacement target word ID and replacement target word.
  • the replacement target word may be called an ambiguous word.
  • the replacement unit 233 generates a replacement dictionary 215 using the categories of the user information 212 and the replacement target word information 214.
  • An example of the replacement dictionary 215 is shown below.
  • FIG. 8 is a diagram showing an example of a replacement dictionary.
  • the replacement dictionary 215 is information indicating the correspondence between replacement target words, categories, and replacement information. Specifically, the replacement dictionary 215 has items for replacement target words, categories, and replacement information.
  • The replacement information may be generated based on the replacement target words and categories. For example, the replacement unit 233 generates the replacement information "machine learning #image processing" based on the replacement target word "machine learning" and the category "image processing". The above describes a case in which replacement information is generated based on the replacement target word and category.
  • the replacement information may be represented by different information. In other words, the replacement information may be any information as long as it indicates a replacement.
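As a minimal sketch of how such a replacement dictionary might be built, the snippet below combines each replacement target word with each category using the "word #category" form from the example above. The function name `build_replacement_dictionary` and the choice of a Python dict keyed by (word, category) are assumptions of this sketch, not something the source specifies.

```python
from typing import Dict, Iterable, Tuple


def build_replacement_dictionary(
    target_words: Iterable[str], categories: Iterable[str]
) -> Dict[Tuple[str, str], str]:
    """Map (replacement target word, category) to replacement information."""
    categories = list(categories)  # allow the iterable to be reused per word
    return {
        (word, category): f"{word} #{category}"
        for word in target_words
        for category in categories
    }


replacement_dictionary = build_replacement_dictionary(
    ["machine learning"],
    ["image processing", "natural language processing"],
)
```

Keying on the (word, category) pair keeps the lookup a single dictionary access once the user or document category is known.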
  • the acquisition unit 231 acquires the document table 216 from the storage unit 210.
  • An example of the document table 216 is shown below.
  • FIG. 9 is a diagram showing an example of a document table.
  • the document table 216 has items such as a document ID and document data.
  • the preprocessing unit 234 performs morphological analysis on the document data.
  • the preprocessing unit 234 registers the document data after the morphological analysis in the document table 216.
  • the preprocessing unit 234 may also perform preprocessing such as normalization.
  • the preprocessing unit 234 estimates a category corresponding to the document data by using the document category estimation model registered in the model information 213.
  • the preprocessing unit 234 associates the estimated category with the document data.
  • the category is registered in the document table 216.
  • the user may also analyze a category corresponding to the document data, and the preprocessing unit 234 may associate the category obtained by the analysis with the document data.
  • The replacement unit 233 replaces the replacement target word included in the document data after morphological analysis with replacement information using the replacement dictionary 215. For example, the replacement unit 233 replaces the replacement target word "machine learning" included in the document data after morphological analysis with the replacement information "machine learning #natural language processing".
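The document-side replacement could look roughly like the sketch below. It assumes the document text is a plain string and that the document's category has already been estimated; morphological analysis is omitted. The function name `replace_target_words` and the sample sentence are hypothetical.

```python
import re

# Illustrative replacement dictionary keyed by (replacement target word, category).
replacement_dictionary = {
    ("machine learning", "image processing"): "machine learning #image processing",
    ("machine learning", "natural language processing"):
        "machine learning #natural language processing",
}


def replace_target_words(text, category, dictionary):
    """Replace every target word registered for `category` in a single pass."""
    by_word = {word: info for (word, cat), info in dictionary.items() if cat == category}
    if not by_word:
        return text  # nothing is registered for this category
    # Longest words first, so a longer target word wins over a shorter one it contains.
    words = sorted(by_word, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(word) for word in words))
    return pattern.sub(lambda match: by_word[match.group(0)], text)


replaced_document = replace_target_words(
    "a survey of machine learning for translation",
    "natural language processing",
    replacement_dictionary,
)
```

A single regular-expression pass is used so that text already inserted as replacement information is not scanned and replaced a second time.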
  • the generation unit 235 generates a trained model using the replaced document data (hereinafter, replaced document data) included in the document table 216.
  • the generation unit 235 registers the generated trained model in the model information 213.
  • This trained model is called the vectorization model.
  • the vectorization unit 236 vectorizes the replaced document data using the vectorization model registered in the model information 213.
  • the vectorization unit 236 registers the vector data obtained by vectorization in the vector information 217.
  • An example of the vector information 217 is shown below.
  • FIG. 10 is a diagram showing an example of vector information.
  • Vector information 217 has fields for document ID and vector data.
  • FIG. 11 is a flowchart illustrating an example of a process for generating a vectorization model.
  • The acquisition unit 231 acquires the document table 216 from the storage unit 210.
  • The preprocessing unit 234 performs morphological analysis on the document data.
  • The preprocessing unit 234 uses the document category estimation model to estimate a category corresponding to the document data.
  • The replacement unit 233 uses the replacement dictionary 215 to replace the replacement target words included in the document data after the morphological analysis with replacement information.
  • The generation unit 235 generates a vectorization model by using the replaced document data included in the document table 216.
  • FIG. 12 is a flowchart illustrating an example of a process executed by the generating device.
  • the acquisition unit 231 acquires the category information 211 and the user information 212 from the storage unit 210.
  • the determination unit 232 uses the category information 211 and the user information 212 to determine a category corresponding to the attribute information.
  • The determination unit 232 associates the determined category with the attribute information.
  • the acquisition unit 231 acquires the document table 216 from the storage unit 210.
  • Step S25: The vectorization unit 236 selects one piece of replaced document data.
  • Step S26: The vectorization unit 236 vectorizes the selected replaced document data by using the vectorization model.
  • Step S27: The vectorization unit 236 registers the vector data obtained by the vectorization in the vector information 217.
  • Step S28: The vectorization unit 236 determines whether or not all of the replaced document data has been selected. If all of the replaced document data has been selected, the process ends. If not all of the replaced document data has been selected, the process returns to step S25.
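The loop of steps S25 to S28 might be sketched as below. The source does not specify the vectorization model, so `toy_vectorize` is a stand-in (hashed bag-of-words counts) chosen only so the example runs; the document IDs and the function name `build_vector_info` are likewise hypothetical.

```python
import zlib
from typing import Callable, Dict, List

DIM = 8  # illustrative vector size


def toy_vectorize(text: str, dim: int = DIM) -> List[float]:
    """Stand-in for the vectorization model: hashed bag-of-words counts."""
    vector = [0.0] * dim
    for token in text.lower().split():
        # crc32 is used because Python's built-in hash() is salted per process.
        vector[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vector


def build_vector_info(
    replaced_documents: Dict[str, str],
    vectorize: Callable[[str], List[float]] = toy_vectorize,
) -> Dict[str, List[float]]:
    """Vectorize each replaced document and register it by document ID."""
    vector_info: Dict[str, List[float]] = {}
    for doc_id, text in replaced_documents.items():  # S25: select one document
        vector = vectorize(text)                      # S26: vectorize it
        vector_info[doc_id] = vector                  # S27: register the vector
    return vector_info                                # S28: all documents selected


vector_info = build_vector_info({
    "DOC1": "machine learning #image processing for edge detection",
    "DOC2": "machine learning #natural language processing for translation",
})
```

Passing the vectorizer as a parameter keeps the loop independent of whichever trained model is actually registered in the model information.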
  • the terminal device 300 has a communication unit 310, a control unit 320, an input unit 330, a display unit 340, and a storage unit 350.
  • the control unit 320 includes an acquisition unit 321, a search content receiving unit 322, a transmission unit 323, a result receiving unit 324, and a display control unit 325.
  • the communication unit 310 and the control unit 320 may be partly or entirely realized by a processing circuit included in the terminal device 300. Also, the communication unit 310 and the control unit 320 may be partly or entirely realized as a program module executed by a processor included in the terminal device 300.
  • the input unit 330 is realized by a touch panel or the like.
  • the display unit 340 is realized by a display.
  • the storage unit 350 may be realized as a storage area reserved in a volatile storage device or non-volatile storage device included in the terminal device 300.
  • The communication unit 310 communicates with the information processing device 100 and the generating device 200.
  • The acquisition unit 321 acquires a user ID. For example, the acquisition unit 321 acquires a user ID input by a user.
  • The acquisition unit 321 stores the user ID in the storage unit 350.
  • the search content receiving unit 322 receives the search content input by the user.
  • the search content includes the search keyword and the meaning of the query.
  • the search content receiving unit 322 stores the search content in the storage unit 350.
  • The transmission unit 323 transmits the user ID and the search content to the information processing device 100 via the communication unit 310.
  • the result receiving unit 324 receives the search results from the information processing device 100.
  • the result receiving unit 324 stores the search results in the storage unit 350.
  • The display control unit 325 displays the search results on the display unit 340.
  • The information processing device 100 includes a storage unit 110, a communication unit 120, an acquisition unit 130, a determination unit 140, a replacement unit 150, a vector calculation unit 160, and an output unit 170.
  • The storage unit 110 may be realized as a storage area secured in the volatile storage device 102 or the non-volatile storage device 103.
  • a part or all of the communication unit 120, the acquisition unit 130, the determination unit 140, the replacement unit 150, the vector calculation unit 160, and the output unit 170 may be realized by a processing circuit.
  • a part or all of the communication unit 120, the acquisition unit 130, the determination unit 140, the replacement unit 150, the vector calculation unit 160, and the output unit 170 may be realized as a module of a program executed by the processor 101.
  • the program executed by the processor 101 is also called a search program.
  • the search program is recorded on a recording medium.
  • the storage unit 110 may store the user information 212, the model information 213, the replacement dictionary 215, and the vector information 217.
  • the user information 212, the model information 213, the replacement dictionary 215, and the vector information 217 are acquired from the generating device 200 by the acquisition unit 130. Then, the acquisition unit 130 stores the user information 212, the model information 213, the replacement dictionary 215, and the vector information 217 in the storage unit 110.
  • the storage unit 110 may store search content information 111.
  • the search content information 111 will be described later.
  • The communication unit 120 communicates with the generating device 200 and the terminal device 300.
  • the acquisition unit 130 acquires the user ID and the search content from the terminal device 300.
  • the user may input the user ID and the search content to the information processing device 100.
  • the acquisition unit 130 may acquire the user ID and the search content through an input operation by the user.
  • the acquisition unit 130 registers the user ID and the search content in the search content information 111.
  • the acquisition unit 130 also registers the date and time when the search content was acquired in the search content information 111.
  • the acquisition unit 130 refers to the user information 212 and acquires a category corresponding to the user ID.
  • the acquisition unit 130 registers the acquired category in the search content information 111.
  • the category corresponding to the user ID is also called a user category.
  • a user category is a category of a user.
  • a user may also be expressed as a searcher.
  • the acquisition unit 130 may acquire a user category through an input operation by the user.
  • The acquisition unit 130 acquires the replacement dictionary 215 from the storage unit 110 or from an external device such as the generating device 200.
  • the determination unit 140 uses the replacement dictionary 215 to determine whether the search content contains the word to be replaced.
  • the determination unit 140 may also perform a morphological analysis on the search content and determine whether the search content after the morphological analysis contains the word to be replaced.
  • The replacement unit 150 replaces the replacement target word with replacement information using the category corresponding to the user ID and the replacement dictionary 215. For example, when the replacement target word is "machine learning" and the category corresponding to the user ID is "image processing", the replacement unit 150 replaces the replacement target word "machine learning" with the replacement information "machine learning #image processing".
  • the replacement unit 150 registers the replacement search content, which is the search content containing the replacement information, in the search content information 111.
  • An example of the search content information 111 is shown below.
  • FIG. 15 is a diagram showing an example of search content information.
  • Search content information 111 has the fields of search ID, user ID, category, search content, replacement search content, and date and time.
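One row of the search content information could be modeled as in the sketch below. The field names follow the fields listed above, but the class name `SearchContentRecord`, the Python types, and every sample value are assumptions made for illustration; the source does not show the contents of FIG. 15 here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SearchContentRecord:
    """One row of the search content information (hypothetical layout)."""
    search_id: str
    user_id: str
    category: str                 # user category looked up from the user ID
    search_content: str           # search content as entered by the user
    acquired_at: datetime         # date and time the search content was acquired
    # Stays None when the search content contains no replacement target word.
    replacement_search_content: Optional[str] = None


record = SearchContentRecord(
    search_id="S0001",
    user_id="U0001",
    category="image processing",
    search_content="machine learning",
    acquired_at=datetime(2022, 9, 30, 12, 0, 0),
    replacement_search_content="machine learning #image processing",
)
```

The replacement search content is optional in this sketch because it is registered only after the replacement unit has processed the search content.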
  • The acquisition unit 130 acquires the vectorization model from the storage unit 110 or from an external device such as the generating device 200.
  • The acquisition unit 130 acquires the vector information 217 from the storage unit 110 or from an external device such as the generating device 200. That is, the acquisition unit 130 acquires multiple vector data.
  • The multiple vector data are data obtained by replacing and vectorizing the multiple document data. In detail, they are obtained by replacing parts of the multiple document data and vectorizing the result.
  • The acquisition unit 130 acquires a plurality of document data corresponding to the plurality of vector data. For example, the acquisition unit 130 acquires the plurality of document data from an external device such as the generating device 200. The acquisition unit 130 may acquire the plurality of document data based on the document IDs included in the vector information 217.
  • the vector calculation unit 160 vectorizes the replacement search content using a vectorization model.
  • the vector calculation unit 160 calculates the similarity between the vectorized replacement search content and each of the multiple vector data included in the vector information 217. For example, when calculating the similarity, cosine similarity is used. Specifically, the vector calculation unit 160 calculates the similarity between the vectorized replacement search content and the vector data "DOCV1" using cosine similarity. The vector calculation unit 160 also calculates the similarity between the vectorized replacement search content and the vector data "DOCV2" using cosine similarity. In this way, the vector calculation unit 160 calculates the similarity corresponding to each of the multiple vector data.
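The cosine similarity calculation mentioned above can be sketched in plain Python as follows. The vector values are made up for illustration, and returning 0.0 for a zero vector is a convention chosen for this sketch rather than something the source states.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


# Illustrative values only; "DOCV1" and "DOCV2" are the names used in the text.
query_vector = [1.0, 0.0, 1.0]
vector_data = {"DOCV1": [1.0, 0.0, 1.0], "DOCV2": [0.0, 1.0, 0.0]}
similarities = {
    name: cosine_similarity(query_vector, vector)
    for name, vector in vector_data.items()
}
```

Because cosine similarity depends only on direction, documents of very different lengths can still score as similar to a short query.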
  • the output unit 170 arranges the multiple document data corresponding to the multiple vector data in descending order of similarity.
  • the output unit 170 outputs the multiple document data as search results.
  • For example, the output unit 170 outputs the multiple document data to the terminal device 300.
  • The output unit 170 may also output the multiple document data to the display of the information processing device 100.
  • The output unit 170 may output, as search results, only the document data corresponding to vector data whose similarity is equal to or greater than a predetermined threshold value, out of the multiple document data corresponding to the multiple vector data. This allows the information processing device 100 to output search results that are closer to what the user desires.
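The ordering and optional threshold filtering performed by the output unit could be sketched as below. The function name `rank_documents`, the document IDs, the scores, and the threshold value are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple


def rank_documents(
    similarities: Dict[str, float],
    documents: Dict[str, str],
    threshold: Optional[float] = None,
) -> List[Tuple[str, str, float]]:
    """Return (document ID, document data, similarity) in descending similarity.

    When `threshold` is given, only documents whose similarity is equal to
    or greater than the threshold are kept.
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    return [
        (doc_id, documents[doc_id], score)
        for doc_id, score in ranked
        if threshold is None or score >= threshold
    ]


documents = {"DOC1": "text one", "DOC2": "text two", "DOC3": "text three"}
scores = {"DOC1": 0.2, "DOC2": 0.9, "DOC3": 0.5}

all_results = rank_documents(scores, documents)
filtered_results = rank_documents(scores, documents, threshold=0.5)
```

Here `all_results` lists DOC2, DOC3, DOC1 in that order, and `filtered_results` drops DOC1 because its similarity is below 0.5.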
  • FIG. 16 is a flowchart illustrating an example of processing executed by the information processing device.
  • The acquisition unit 130 acquires a user ID and search content from the terminal device 300.
  • The acquisition unit 130 refers to the user information 212 and acquires a category corresponding to the user ID.
  • The determination unit 140 uses the replacement dictionary 215 to determine whether or not the search content contains a replacement target word. If the search content contains a replacement target word, the process proceeds to step S34. If it does not, the process proceeds to step S37.
  • Step S34: The replacement unit 150 replaces the replacement target word with replacement information by using the category corresponding to the user ID and the replacement dictionary 215.
  • Step S35: The vector calculation unit 160 vectorizes the replacement search content, which is the search content including replacement information, by using the vectorization model.
  • Step S36: The vector calculation unit 160 calculates the similarity between the vectorized replacement search content and each of the multiple vector data included in the vector information 217. Then, the process proceeds to step S39.
  • Step S37: The vector calculation unit 160 vectorizes the search content.
  • Step S38: The vector calculation unit 160 calculates the similarity between the vectorized search content and each of the multiple vector data included in the vector information 217.
  • Step S39: The output unit 170 arranges the multiple document data corresponding to the multiple vector data in descending order of similarity, and outputs the multiple document data to the terminal device 300 as a search result. This allows the terminal device 300 to display the search results.
  • the information processing device 100 outputs search results based on user categories. Therefore, the information processing device 100 can output search results that the user desires.
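The branching of the search flow (replace and vectorize when a replacement target word is present, otherwise vectorize the search content as is, then rank) might be sketched as follows. The vectorization model and the similarity measure are passed in as callables because the source does not fix them; `toy_vectorize`, `dot`, the document IDs, and the function name `search` are hypothetical stand-ins so that the example runs.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def search(
    search_content: str,
    user_category: str,
    replacement_dictionary: Dict[Tuple[str, str], str],
    vector_info: Dict[str, Vector],
    vectorize: Callable[[str], Vector],
    similarity: Callable[[Vector, Vector], float],
) -> List[str]:
    """Return document IDs in descending order of similarity."""
    # S33 (simplified): target words present in the search content that
    # have an entry for the user's category.
    applicable = {
        word: info
        for (word, category), info in replacement_dictionary.items()
        if category == user_category and word in search_content
    }
    query = search_content
    if applicable:
        # S34: replace the target words in one pass, longest word first.
        words = sorted(applicable, key=len, reverse=True)
        pattern = re.compile("|".join(re.escape(word) for word in words))
        query = pattern.sub(lambda match: applicable[match.group(0)], search_content)
    query_vector = vectorize(query)  # S35 (replaced) or S37 (unchanged)
    scores = {  # S36 or S38
        doc_id: similarity(query_vector, vector)
        for doc_id, vector in vector_info.items()
    }
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)  # S39


# Stand-ins so the sketch runs; a real system would use the trained model.
def toy_vectorize(text: str) -> Vector:
    return [1.0, 0.0] if "#image processing" in text else [0.0, 1.0]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


dictionary = {
    ("machine learning", "image processing"): "machine learning #image processing",
    ("machine learning", "natural language processing"):
        "machine learning #natural language processing",
}
vector_info = {"DOC_IMAGE": [1.0, 0.0], "DOC_NLP": [0.0, 1.0]}

results = search(
    "machine learning", "image processing", dictionary, vector_info, toy_vectorize, dot
)
```

With the same search content, changing only the user category to "natural language processing" reverses the order of the two documents, which is the behavior the disclosure aims for.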

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An information processing device (100) comprises: an acquisition unit (130) that acquires search content, a user category, a replacement dictionary (215) indicating a correspondence relationship between a replacement target word, a category, and replacement information that is information indicating a replacement, a vectorization model, a plurality of sets of document data, and a plurality of sets of vector data; a determination unit (140) that uses the replacement dictionary (215) to determine whether or not the search content contains a replacement target word; a replacement unit (150) that, when the search content contains a replacement target word, replaces the replacement target word with replacement information by using the user category and the replacement dictionary (215); a vector calculation unit (160) that uses the vectorization model to vectorize the replaced search content, which is search content containing the replacement information, and calculates the degree of similarity between the vectorized search content and each of the plurality of sets of vector data; and an output unit (170) that sorts the plurality of sets of document data corresponding to the plurality of sets of vector data in descending order of the degree of similarity, and outputs the plurality of sets of document data as a search result.

Description

情報処理装置、検索方法、及び検索プログラムInformation processing device, search method, and search program
 本開示は、情報処理装置、検索方法、及び検索プログラムに関する。 This disclosure relates to an information processing device, a search method, and a search program.
 多くの文書に対して検索を行う場合、分散表現が用いられる場合がある。分散表現では、検索キーワードが曖昧でない場合、正確な検索が行われる。しかし、検索キーワードはユーザにより生成される。そのため、検索キーワードが曖昧である場合がある。そこで、キーワードの曖昧性を評価する技術が提案されている(特許文献1を参照)。 When searching many documents, distributed representations may be used. With distributed representations, accurate searches can be performed if the search keywords are not ambiguous. However, search keywords are generated by the user. As a result, the search keywords may be ambiguous. Therefore, a technique for evaluating the ambiguity of keywords has been proposed (see Patent Document 1).
特開2017-045196号公報JP 2017-045196 A
 上記技術では、曖昧性が評価される。曖昧性が評価されても、ユーザが望む検索結果が得られない場合がある。例えば、検索キーワードとして、“機械学習”が入力される。“機械学習”自体には、曖昧性はない。しかし、ユーザが“画像処理”の“機械学習”に関する検索結果を望んでいる場合、検索結果として、“自然言語処理”の“機械学習”に関する検索結果が出力される場合がある。このように、ユーザが望む検索結果が得られない場合がある。 In the above technology, ambiguity is evaluated. Even if ambiguity is evaluated, there are cases where the search results desired by the user are not obtained. For example, "machine learning" is input as a search keyword. "Machine learning" itself is not ambiguous. However, if the user is looking for search results related to "machine learning" in "image processing", search results related to "machine learning" in "natural language processing" may be output as the search results. In this way, there are cases where the search results desired by the user are not obtained.
 本開示の目的は、ユーザが望む検索結果を出力することである。 The purpose of this disclosure is to output search results that the user desires.
 本開示の一態様に係る情報処理装置が提供される。情報処理装置は、検索内容、ユーザのカテゴリであるユーザカテゴリ、置換対象単語とカテゴリと置換を示す情報である置換情報との対応関係を示す置換辞書、ベクトル化モデル、複数の文書データ、及び前記複数の文書データを置換し、ベクトル化することにより得られた複数のベクトルデータを取得する取得部と、前記置換辞書を用いて、前記検索内容に置換対象単語が含まれているか否かを判定する判定部と、前記検索内容に置換対象単語が含まれている場合、前記ユーザカテゴリと前記置換辞書とを用いて、前記検索内容に含まれている置換対象単語を前記置換情報に置換する置換部と、前記ベクトル化モデルを用いて、前記置換情報を含む前記検索内容である置換検索内容をベクトル化し、ベクトル化された前記置換検索内容と、前記複数のベクトルデータのそれぞれとの類似度を算出するベクトル算出部と、前記類似度の高い順に、前記複数のベクトルデータに対応する前記複数の文書データを並べ、前記複数の文書データを、検索結果として出力する出力部と、を有する。  An information processing device according to one aspect of the present disclosure is provided. The information processing device includes: an acquisition unit that acquires search content, a user category that is a user's category, a replacement dictionary that indicates the correspondence between replacement target words and categories and replacement information that is information indicating replacement; a vectorization model; a plurality of document data; and a plurality of vector data obtained by replacing and vectorizing the plurality of document data; a determination unit that uses the replacement dictionary to determine whether the search content includes the replacement target words; a replacement unit that uses the user category and the replacement dictionary to replace the replacement target words included in the search content with the replacement information when the search content includes the replacement target words; a vector calculation unit that uses the vectorization model to vectorize replacement search content that is the search content including the replacement information and calculates the similarity between the vectorized replacement search content and each of the plurality of vector data; and an output unit that arranges the plurality of document data corresponding to the plurality of vector data in order of the similarity with the highest similarity, and outputs the plurality of document data as a search result.
According to this disclosure, it is possible to output search results that the user desires.
FIG. 1 is a diagram showing a search system.
FIG. 2 is a diagram showing the hardware of an information processing device.
FIG. 3 is a block diagram showing the functions of a generating device.
FIG. 4 is a diagram showing an example of category information.
FIG. 5 is a diagram showing an example of user information.
FIG. 6 is a diagram showing an example of model information.
FIG. 7 is a diagram showing an example of replacement target word information.
FIG. 8 is a diagram showing an example of a replacement dictionary.
FIG. 9 is a diagram showing an example of a document table.
FIG. 10 is a diagram showing an example of vector information.
FIG. 11 is a flowchart showing an example of a process for generating a vectorization model.
FIG. 12 is a flowchart showing an example of a process executed by the generating device.
FIG. 13 is a block diagram showing the functions of a terminal device.
FIG. 14 is a block diagram showing the functions of the information processing device.
FIG. 15 is a diagram showing an example of search content information.
FIG. 16 is a flowchart showing an example of a process executed by the information processing device.
Below, an embodiment will be described with reference to the drawings. The following embodiment is merely an example, and various modifications are possible within the scope of this disclosure.
Embodiment
FIG. 1 is a diagram showing a search system. The search system includes an information processing device 100, a generating device 200, and a terminal device 300. The information processing device 100, the generating device 200, and the terminal device 300 communicate with each other via a network 10. The network 10 is a wired network or a wireless network. The information processing device 100 and the generating device 200 may be realized by a single device.
The information processing device 100 is a device that executes the search method. For example, the information processing device 100 is a PC (Personal Computer), a server, a smartphone, or a tablet device.
The generating device 200 is a device that generates various information. The terminal device 300 is a device used by a user.
Next, the hardware of the information processing device 100 will be described.
FIG. 2 is a diagram showing the hardware of the information processing device. The information processing device 100 includes a processor 101, a volatile storage device 102, a non-volatile storage device 103, a communication IF (Interface) 104, an input/output IF 105, and a media IF 106.
The processor 101 controls the entire information processing device 100. For example, the processor 101 is a CPU (Central Processing Unit), an FPGA (Field Programmable Gate Array), or a DSP (Digital Signal Processor). The processor 101 may be a multiprocessor. The information processing device 100 may also have a processing circuit. Furthermore, the information processing device 100 may have a microcomputer or an SoC (System on Chip).
The volatile storage device 102 is the main storage device of the information processing device 100. For example, the volatile storage device 102 is a RAM (Random Access Memory). The non-volatile storage device 103 is the auxiliary storage device of the information processing device 100. For example, the non-volatile storage device 103 is an HDD (Hard Disk Drive) or an SSD (Solid State Drive).
The communication IF 104 communicates with the generating device 200 and the terminal device 300. The input/output IF 105 connects to an input device (e.g., a keyboard) and an output device (e.g., a display). The media IF 106 accesses a recording medium. For example, the recording medium is a CD (Compact Disc), a DVD (Digital Versatile Disc), or a flash memory.
Like the information processing device 100, the generating device 200 and the terminal device 300 each have a processor, a volatile storage device, a non-volatile storage device, a communication IF, an input/output IF, and a media IF.
Next, before describing the functions of the information processing device 100, the functions of the generating device 200 and the terminal device 300 will be described.
FIG. 3 is a block diagram showing the functions of the generating device. The generating device 200 has a storage unit 210, a communication unit 220, and a control unit 230. The control unit 230 includes an acquisition unit 231, a determination unit 232, a replacement unit 233, a preprocessing unit 234, a generation unit 235, and a vectorization unit 236.
The storage unit 210 may be realized as a storage area secured in the volatile storage device or the non-volatile storage device of the generating device 200.
Part or all of the communication unit 220 and the control unit 230 may be realized by a processing circuit of the generating device 200. Alternatively, part or all of the communication unit 220 and the control unit 230 may be realized as modules of a program executed by the processor of the generating device 200.
The storage unit 210 stores category information 211, user information 212, model information 213, replacement target word information 214, a replacement dictionary 215, a document table 216, and vector information 217. Details of each of these are described later.
The communication unit 220 communicates with the information processing device 100 and the terminal device 300.
The acquisition unit 231 acquires the category information 211 from the storage unit 210. An example of the category information 211 is shown below.
FIG. 4 is a diagram showing an example of category information. The category information 211 has the items category ID (identifier) and category.
The acquisition unit 231 acquires the user information 212 from the storage unit 210.
FIG. 5 is a diagram showing an example of user information. The user information 212 has items such as a user ID and attribute information. For example, the attribute information indicates the user's field of expertise.
The determination unit 232 uses the category information 211 and the user information 212 to determine the category corresponding to the attribute information. For example, based on "image" indicated by the attribute information, the determination unit 232 determines the category "image processing" as the category corresponding to that attribute information. The determination unit 232 then associates the determined category with the attribute information. For example, the determination unit 232 associates the category "image processing" with the attribute information "image".
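The determination step can be pictured with a minimal sketch. The disclosure does not state how the attribute "image" is matched to the category "image processing"; the substring rule below is an assumption made only for illustration, and the table contents and the function name `determine_category` are hypothetical.

```python
# Illustrative stand-ins for the category information 211 and user information 212.
# The IDs and values are hypothetical; they are not taken from FIG. 4 or FIG. 5.
category_info = {"C1": "image processing", "C2": "natural language processing"}
user_info = {"U1": {"attribute": "image"}, "U2": {"attribute": "language"}}


def determine_category(attribute, categories):
    """Return the first category whose name contains the attribute, or None.

    Substring matching is an assumed rule; the disclosure leaves the rule open.
    """
    for category in categories.values():
        if attribute in category:
            return category
    return None


# Associate the determined category with each user's attribute information.
for record in user_info.values():
    record["category"] = determine_category(record["attribute"], category_info)
```

With this rule, an attribute that matches no category yields `None`, which is one case where estimation by a trained model could be used instead.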
The determination unit 232 may also estimate the category corresponding to the attribute information using a trained model. The trained model is included in the model information 213. An example of the model information 213 is shown below.
FIG. 6 is a diagram showing an example of model information. The model information 213 has the items model ID, model name, and trained model. The determination unit 232 may estimate the category corresponding to the attribute information using the trained model that is the attribute category estimation model. The determination unit 232 associates the estimated category with the attribute information.
The acquisition unit 231 acquires the replacement target word information 214 from the storage unit 210. An example of the replacement target word information 214 is shown below.
FIG. 7 is a diagram showing an example of replacement target word information. The replacement target word information 214 has the items replacement target word ID and replacement target word. A replacement target word may also be called an ambiguous word.
The replacement unit 233 generates the replacement dictionary 215 using the categories in the user information 212 and the replacement target word information 214. An example of the replacement dictionary 215 is shown below.
FIG. 8 is a diagram showing an example of a replacement dictionary. The replacement dictionary 215 is information indicating the correspondence among replacement target words, categories, and replacement information. Specifically, the replacement dictionary 215 has the items replacement target word, category, and replacement information. The replacement information may be generated based on the replacement target word and the category. For example, the replacement unit 233 generates the replacement information "machine learning#image processing" based on the replacement target word "machine learning" and the category "image processing". Although the case where the replacement information is generated from the replacement target word and the category has been described, the replacement information may be expressed differently. In other words, the replacement information may be any information as long as it indicates a replacement.
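As a minimal sketch of how such a dictionary could be built, the example below joins each replacement target word and category with "#", following the "machine learning#image processing" example. The function name `build_replacement_dictionary`, the tuple key layout, and the choice to pair every word with every category are assumptions for illustration.

```python
from itertools import product


def build_replacement_dictionary(target_words, categories, sep="#"):
    """Map (replacement target word, category) to replacement information."""
    return {
        (word, category): f"{word}{sep}{category}"
        for word, category in product(target_words, categories)
    }


replacement_dictionary = build_replacement_dictionary(
    ["machine learning"],
    ["image processing", "natural language processing"],
)
```

Keying the dictionary by the (word, category) pair makes the later lookup by user category or document category a single dictionary access.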
The acquisition unit 231 acquires the document table 216 from the storage unit 210. An example of the document table 216 is shown below.
FIG. 9 is a diagram showing an example of a document table. The document table 216 has items such as a document ID and document data.
The preprocessing unit 234 performs morphological analysis on the document data. The preprocessing unit 234 registers the document data after morphological analysis in the document table 216. The preprocessing unit 234 may also perform other preprocessing such as normalization.
The preprocessing unit 234 estimates the category corresponding to the document data using the document category estimation model registered in the model information 213. The preprocessing unit 234 associates the estimated category with the document data. As a result, the category is registered in the document table 216.
Alternatively, a user may analyze the category corresponding to the document data. The preprocessing unit 234 may associate the category obtained by that analysis with the document data.
The replacement unit 233 uses the replacement dictionary 215 to replace the replacement target words included in the document data after morphological analysis with replacement information. For example, the replacement unit 233 replaces the replacement target word "machine learning" included in the document data after morphological analysis with the replacement information "machine learning#natural language processing".
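The document-side replacement can be sketched as follows. The sketch assumes that morphological analysis has already produced a token list and that the category used for lookup is the one associated with the document; `replace_tokens` and the sample tokens are hypothetical.

```python
def replace_tokens(tokens, category, dictionary):
    """Replace each replacement target word with the replacement information
    registered for `category`; all other tokens are left unchanged."""
    return [dictionary.get((token, category), token) for token in tokens]


dictionary = {
    ("machine learning", "natural language processing"):
        "machine learning#natural language processing",
}
tokens = ["machine learning", "is", "applied", "to", "translation"]
replaced = replace_tokens(tokens, "natural language processing", dictionary)
```

A document whose category has no entry for a given word simply keeps the original token.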
The generation unit 235 generates a trained model using the document data after replacement (hereinafter, replaced document data) included in the document table 216. The generation unit 235 registers the generated trained model in the model information 213. This trained model is referred to as the vectorization model.
The vectorization unit 236 vectorizes the replaced document data using the vectorization model registered in the model information 213. The vectorization unit 236 registers the vector data obtained by the vectorization in the vector information 217. An example of the vector information 217 is shown below.
FIG. 10 is a diagram showing an example of vector information. The vector information 217 has the items document ID and vector data.
Next, the process of generating the vectorization model will be described with reference to a flowchart.
FIG. 11 is a flowchart showing an example of the process for generating the vectorization model.
(Step S11) The acquisition unit 231 acquires the document table 216 from the storage unit 210.
(Step S12) The preprocessing unit 234 performs morphological analysis on the document data.
(Step S13) The preprocessing unit 234 uses the document category estimation model to estimate the category corresponding to the document data.
(Step S14) The replacement unit 233 uses the replacement dictionary 215 to replace the replacement target words included in the document data after morphological analysis with replacement information.
(Step S15) The generation unit 235 generates the vectorization model using the replaced document data included in the document table 216.
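The disclosure does not specify what kind of model the vectorization model is. The sketch below substitutes a toy bag-of-words vocabulary purely to illustrate one consequence of step S14: because replacement happens before the model is generated, "machine learning#image processing" and "machine learning#natural language processing" become separate vocabulary entries. The names `fit_vocabulary` and `vectorize` are hypothetical.

```python
from collections import Counter


def fit_vocabulary(replaced_documents):
    """Toy stand-in for step S15: assign one vector dimension per distinct token."""
    vocab = {}
    for tokens in replaced_documents:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def vectorize(tokens, vocab):
    """Count how often each vocabulary token occurs (dict order matches the indices)."""
    counts = Counter(tokens)
    return [float(counts[token]) for token in vocab]


docs = [
    ["machine learning#image processing", "camera"],
    ["machine learning#natural language processing", "text"],
]
vocab = fit_vocabulary(docs)
```

A real system would more plausibly train an embedding model on the replaced documents; the point carried over from the toy version is only that the replaced tokens are distinct inputs to training.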
Next, the process executed by the generating device 200 will be described with reference to a flowchart.
FIG. 12 is a flowchart showing an example of the process executed by the generating device.
(Step S21) The acquisition unit 231 acquires the category information 211 and the user information 212 from the storage unit 210.
(Step S22) The determination unit 232 uses the category information 211 and the user information 212 to determine the category corresponding to the attribute information.
(Step S23) The determination unit 232 associates the determined category with the attribute information.
(Step S24) The acquisition unit 231 acquires the document table 216 from the storage unit 210.
(Step S25) The vectorization unit 236 selects one piece of replaced document data.
(Step S26) The vectorization unit 236 vectorizes the selected replaced document data using the vectorization model.
(Step S27) The vectorization unit 236 registers the vector data obtained by the vectorization in the vector information 217.
(Step S28) The vectorization unit 236 determines whether all of the replaced document data have been selected. If all of the replaced document data have been selected, the process ends. If any replaced document data remain unselected, the process returns to step S25.
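The loop formed by steps S25 to S28 can be sketched as below. The vectorization model is passed in as a plain function, and the stand-in `toy_vectorize` is an assumption used only so that the sketch runs; `register_all` and the sample table are likewise hypothetical.

```python
def register_all(replaced_documents, vectorize):
    """Vectorize every replaced document and collect the results by document ID."""
    vector_info = {}
    pending = list(replaced_documents)      # document IDs not yet selected
    while pending:                          # S28: repeat until all are selected
        doc_id = pending.pop(0)             # S25: select one replaced document
        vector = vectorize(replaced_documents[doc_id])  # S26: vectorize it
        vector_info[doc_id] = vector        # S27: register in the vector information
    return vector_info


def toy_vectorize(tokens):
    """Hypothetical stand-in model: a one-dimensional vector holding the token count."""
    return [float(len(tokens))]


table = {"D1": ["a", "b"], "D2": ["c"]}
vector_info = register_all(table, toy_vectorize)
```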
Next, the functions of the terminal device 300 will be described.
FIG. 13 is a block diagram showing the functions of the terminal device. The terminal device 300 has a communication unit 310, a control unit 320, an input unit 330, a display unit 340, and a storage unit 350. The control unit 320 includes an acquisition unit 321, a search content receiving unit 322, a transmission unit 323, a result receiving unit 324, and a display control unit 325.
Part or all of the communication unit 310 and the control unit 320 may be realized by a processing circuit of the terminal device 300. Alternatively, part or all of the communication unit 310 and the control unit 320 may be realized as modules of a program executed by the processor of the terminal device 300. For example, the input unit 330 is realized by a touch panel or the like, and the display unit 340 is realized by a display. The storage unit 350 may be realized as a storage area secured in the volatile storage device or the non-volatile storage device of the terminal device 300.
The communication unit 310 communicates with the information processing device 100 and the generating device 200.
The acquisition unit 321 acquires a user ID. For example, the acquisition unit 321 acquires a user ID entered by the user. The acquisition unit 321 stores the user ID in the storage unit 350.
The search content receiving unit 322 receives the search content entered by the user. Here, the term search content encompasses both a search keyword and a query. The search content receiving unit 322 stores the search content in the storage unit 350.
The transmission unit 323 transmits the user ID and the search content to the information processing device 100 via the communication unit 310.
As described later, the result receiving unit 324 receives the search results from the information processing device 100. The result receiving unit 324 stores the search results in the storage unit 350.
The display control unit 325 displays the search results on the display unit 340.
Next, the functions of the information processing device 100 will be described.
FIG. 14 is a block diagram showing the functions of the information processing device. The information processing device 100 has a storage unit 110, a communication unit 120, an acquisition unit 130, a determination unit 140, a replacement unit 150, a vector calculation unit 160, and an output unit 170.
The storage unit 110 may be realized as a storage area secured in the volatile storage device 102 or the non-volatile storage device 103.
Part or all of the communication unit 120, the acquisition unit 130, the determination unit 140, the replacement unit 150, the vector calculation unit 160, and the output unit 170 may be realized by a processing circuit. Alternatively, part or all of these units may be realized as modules of a program executed by the processor 101. The program executed by the processor 101 is also called a search program. For example, the search program is recorded on a recording medium.
The storage unit 110 may store the user information 212, the model information 213, the replacement dictionary 215, and the vector information 217. For example, the acquisition unit 130 acquires the user information 212, the model information 213, the replacement dictionary 215, and the vector information 217 from the generating device 200 and stores them in the storage unit 110.
The storage unit 110 may store search content information 111. The search content information 111 is described later.
The communication unit 120 communicates with the generating device 200 and the terminal device 300.
The acquisition unit 130 acquires the user ID and the search content from the terminal device 300. Alternatively, the user may enter the user ID and the search content directly into the information processing device 100, in which case the acquisition unit 130 acquires them through the user's input operation.
The acquisition unit 130 registers the user ID and the search content in the search content information 111. The acquisition unit 130 also registers the date and time at which the search content was acquired in the search content information 111. The acquisition unit 130 refers to the user information 212 and acquires the category corresponding to the user ID. The acquisition unit 130 registers the acquired category in the search content information 111. The category corresponding to the user ID is also called the user category. In other words, the user category is the category of the user. The user may also be referred to as a searcher. The acquisition unit 130 may instead acquire the user category through an input operation by the user.
The acquisition unit 130 acquires the replacement dictionary 215 from an external device such as the generating device 200 or from the storage unit 110.
The determination unit 140 uses the replacement dictionary 215 to determine whether the search content includes a replacement target word. The determination unit 140 may also perform morphological analysis on the search content and determine whether the search content after morphological analysis includes a replacement target word.
When the search content includes a replacement target word, the replacement unit 150 replaces the replacement target word with replacement information using the category corresponding to the user ID and the replacement dictionary 215. For example, when the replacement target word is "machine learning" and the category corresponding to the user ID is "image processing", the replacement unit 150 replaces the replacement target word "machine learning" with the replacement information "machine learning#image processing". The replacement unit 150 registers the replacement search content, which is the search content including the replacement information, in the search content information 111. An example of the search content information 111 is shown below.
FIG. 15 is a diagram showing an example of search content information. The search content information 111 has the items search ID, user ID, category, search content, replacement search content, and date and time.
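The query-side replacement and the resulting search content record can be sketched as follows. Plain substring matching on the raw query string is an assumption (the disclosure also allows matching after morphological analysis), and the field names, IDs, and the function `replace_query` are hypothetical.

```python
from datetime import datetime


def replace_query(query, user_category, dictionary):
    """Replace each replacement target word found in the query with the
    replacement information registered for the user's category.

    Note: this simple version assumes no target word is a substring of
    another entry's replacement information.
    """
    for (word, category), replacement in dictionary.items():
        if category == user_category and word in query:
            query = query.replace(word, replacement)
    return query


dictionary = {
    ("machine learning", "image processing"): "machine learning#image processing",
    ("machine learning", "natural language processing"):
        "machine learning#natural language processing",
}
query = "machine learning camera"
record = {
    "search_id": "S1",
    "user_id": "U1",
    "category": "image processing",
    "search_content": query,
    "replacement_search_content": replace_query(query, "image processing", dictionary),
    "datetime": datetime.now().isoformat(),
}
```

Keeping both the original and the replaced search content in the record mirrors the two separate items listed for the search content information 111.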
The acquisition unit 130 acquires the vectorization model from an external device such as the generating device 200 or from the storage unit 110.
The acquisition unit 130 acquires the vector information 217 from an external device such as the generating device 200 or from the storage unit 110. That is, the acquisition unit 130 acquires a plurality of vector data. As described above, the plurality of vector data are data obtained by replacing and vectorizing a plurality of document data. More precisely, the plurality of vector data are data obtained by replacing parts of the plurality of document data and then vectorizing them.
The acquisition unit 130 acquires the plurality of document data corresponding to the plurality of vector data. For example, the acquisition unit 130 acquires the plurality of document data from an external device such as the generating device 200. The acquisition unit 130 may acquire the plurality of document data based on the document IDs included in the vector information 217.
The vector calculation unit 160 vectorizes the replacement search content using the vectorization model. The vector calculation unit 160 then calculates the similarity between the vectorized replacement search content and each of the plurality of vector data included in the vector information 217. For example, cosine similarity is used to calculate the similarity. Specifically, the vector calculation unit 160 uses cosine similarity to calculate the similarity between the vectorized replacement search content and the vector data "DOCVE1". Likewise, the vector calculation unit 160 uses cosine similarity to calculate the similarity between the vectorized replacement search content and the vector data "DOCVE2". In this way, the vector calculation unit 160 calculates a similarity for each of the plurality of vector data.
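For reference, cosine similarity between vectors a and b is their dot product divided by the product of their Euclidean norms. A minimal sketch follows; returning 0.0 for a zero vector is an assumed convention, and the sample vectors standing in for "DOCVE1" and "DOCVE2" are invented.

```python
import math


def cosine_similarity(a, b):
    """Return dot(a, b) / (|a| * |b|), or 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


query_vector = [1.0, 0.0]
vector_info = {"DOCVE1": [2.0, 0.0], "DOCVE2": [0.0, 3.0]}  # hypothetical values
similarities = {
    name: cosine_similarity(query_vector, vector)
    for name, vector in vector_info.items()
}
```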
The output unit 170 arranges the plurality of document data corresponding to the plurality of vector data in descending order of similarity. The output unit 170 outputs the plurality of document data as the search result. For example, the output unit 170 outputs the plurality of document data to the terminal device 300. As another example, the output unit 170 outputs the plurality of document data to the display of the information processing device 100. Of the plurality of document data corresponding to the plurality of vector data, the output unit 170 may output as the search result only the document data corresponding to vector data whose similarity is equal to or greater than a predetermined threshold. This allows the information processing device 100 to output search results that better match what the user wants.
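The ordering and the optional threshold can be sketched in a few lines. The function name `rank_documents`, the document IDs, and the scores are hypothetical.

```python
def rank_documents(similarities, threshold=None):
    """Return document IDs in descending order of similarity.

    If `threshold` is given, only documents whose similarity is equal to or
    greater than the threshold are kept.
    """
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(doc_id, score) for doc_id, score in ranked if score >= threshold]
    return [doc_id for doc_id, _ in ranked]


scores = {"D1": 0.2, "D2": 0.9, "D3": 0.5}
all_results = rank_documents(scores)
filtered_results = rank_documents(scores, threshold=0.5)
```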
 次に、情報処理装置100が実行する処理を、フローチャートを用いて、説明する。
 図16は、情報処理装置が実行する処理の例を示すフローチャートである。
 (ステップS31)取得部130は、ユーザIDと検索内容とを端末装置300から取得する。
 (ステップS32)取得部130は、ユーザ情報212を参照し、ユーザIDに対応するカテゴリを取得する。
 (ステップS33)判定部140は、置換辞書215を用いて、検索内容に置換対象単語が含まれているか否かを判定する。検索内容に置換対象単語が含まれている場合、処理は、ステップS34に進む。検索内容に置換対象単語が含まれていない場合、処理は、ステップS37に進む。
Next, the process executed by the information processing device 100 will be described with reference to a flowchart.
FIG. 16 is a flowchart illustrating an example of processing executed by the information processing device.
(Step S31) The acquiring unit 130 acquires a user ID and search details from the terminal device 300.
(Step S32) The acquiring unit 130 refers to the user information 212 and acquires a category corresponding to the user ID.
(Step S33) The determination unit 140 determines whether or not the search content contains the word to be replaced using the replacement dictionary 215. If the search content contains the word to be replaced, the process proceeds to step S34. If the search content does not contain the word to be replaced, the process proceeds to step S37.
(Step S34) The replacement unit 150 replaces the replacement target word with replacement information, using the category corresponding to the user ID and the replacement dictionary 215.
(Step S35) The vector calculation unit 160 uses the vectorization model to vectorize the replacement search content, that is, the search content containing the replacement information.
(Step S36) The vector calculation unit 160 calculates the similarity between the vectorized replacement search content and each of the plurality of vector data included in the vector information 217. The process then proceeds to step S39.
(Step S37) The vector calculation unit 160 vectorizes the search content.
(Step S38) The vector calculation unit 160 calculates the similarity between the vectorized search content and each of the plurality of vector data included in the vector information 217.
(Step S39) The output unit 170 arranges the plurality of document data corresponding to the plurality of vector data in descending order of similarity and outputs the document data to the terminal device 300 as search results.
This allows the terminal device 300 to display the search results.
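The flow of steps S31 to S39 can be sketched end to end as below. This is a hedged illustration rather than the embodiment itself. The specification does not fix a vectorization model or a similarity measure in this passage, so a bag-of-words count and cosine similarity are swapped in as stand-ins. The dictionary contents, user IDs, categories, document names, and the fallback of keeping a word when no entry exists for the user's category are all assumptions introduced for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical replacement dictionary 215:
# replacement target word -> {category -> replacement information}.
REPLACEMENT_DICT: Dict[str, Dict[str, str]] = {
    "bug": {"software": "software defect", "biology": "insect"},
}
# Hypothetical user information 212: user ID -> category.
USER_INFO: Dict[str, str] = {"u1": "software", "u2": "biology"}


def vectorize(text: str) -> Counter:
    """Toy stand-in for the vectorization model: a bag-of-words count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity (an assumed measure) between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    user_id: str, query: str, doc_vectors: Dict[str, Counter]
) -> List[Tuple[str, float]]:
    """Run steps S32 to S39 for a user ID and query received in step S31."""
    category = USER_INFO[user_id]                                # S32
    words = query.split()
    if any(w in REPLACEMENT_DICT for w in words):                # S33
        # S34: replace each target word with the entry for the user's
        # category; keeping the word when no entry exists is an assumption.
        words = [REPLACEMENT_DICT.get(w, {}).get(category, w) for w in words]
    query_vec = vectorize(" ".join(words))                       # S35 / S37
    sims = {d: cosine(query_vec, v) for d, v in doc_vectors.items()}  # S36 / S38
    return sorted(sims.items(), key=lambda p: p[1], reverse=True)     # S39


# Hypothetical precomputed vector information 217.
docs = {
    "doc_sw": vectorize("software defect report"),
    "doc_bio": vectorize("insect field report"),
}
top_for_u1 = search("u1", "bug report", docs)[0][0]
top_for_u2 = search("u2", "bug report", docs)[0][0]
```

The same query, "bug report", ranks a different document first for each user because the replacement in step S34 depends on the user's category. A query with no replacement target word skips S34 and is vectorized unchanged.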
 According to the embodiment, the information processing device 100 outputs search results based on the user category. The information processing device 100 can therefore output the search results that the user desires.
 10 network, 100 information processing device, 101 processor, 102 volatile storage device, 103 non-volatile storage device, 104 communication IF, 105 input/output IF, 106 media IF, 110 storage unit, 111 search content information, 120 communication unit, 130 acquisition unit, 140 determination unit, 150 replacement unit, 160 vector calculation unit, 170 output unit, 200 generation device, 210 storage unit, 211 category information, 212 user information, 213 model information, 214 replacement target word information, 215 replacement dictionary, 216 document table, 217 vector information, 220 communication unit, 230 control unit, 231 acquisition unit, 232 determination unit, 233 replacement unit, 234 preprocessing unit, 235 generation unit, 236 vectorization unit, 300 terminal device, 310 communication unit, 320 control unit, 321 acquisition unit, 322 search content reception unit, 323 transmission unit, 324 result reception unit, 325 display control unit, 330 input unit, 340 display unit, 350 storage unit.

Claims (4)

  1.  An information processing device comprising:
    an acquisition unit that acquires search content, a user category that is the category of a user, a replacement dictionary indicating a correspondence among a replacement target word, a category, and replacement information that is information indicating a replacement, a vectorization model, a plurality of document data, and a plurality of vector data obtained by applying replacement to and vectorizing the plurality of document data;
    a determination unit that uses the replacement dictionary to determine whether the search content includes a replacement target word;
    a replacement unit that, when the search content includes a replacement target word, replaces the replacement target word included in the search content with the replacement information by using the user category and the replacement dictionary;
    a vector calculation unit that uses the vectorization model to vectorize replacement search content, which is the search content including the replacement information, and calculates a similarity between the vectorized replacement search content and each of the plurality of vector data; and
    an output unit that arranges the plurality of document data corresponding to the plurality of vector data in descending order of the similarity and outputs the plurality of document data as search results.
  2.  The information processing device according to claim 1, wherein
    the output unit outputs, as search results, document data corresponding to vector data whose similarity is equal to or greater than a predetermined threshold, from among the plurality of document data corresponding to the plurality of vector data.
  3.  A search method in which an information processing device:
    acquires search content, a user category that is the category of a user, a replacement dictionary indicating a correspondence among a replacement target word, a category, and replacement information that is information indicating a replacement, a vectorization model, a plurality of document data, and a plurality of vector data obtained by applying replacement to and vectorizing the plurality of document data;
    determines, using the replacement dictionary, whether the search content includes a replacement target word;
    when the search content includes a replacement target word, replaces the replacement target word included in the search content with the replacement information by using the user category and the replacement dictionary;
    vectorizes, using the vectorization model, replacement search content, which is the search content including the replacement information;
    calculates a similarity between the vectorized replacement search content and each of the plurality of vector data;
    arranges the plurality of document data corresponding to the plurality of vector data in descending order of the similarity; and
    outputs the plurality of document data as search results.
  4.  A search program that causes an information processing device to execute processing comprising:
    acquiring search content, a user category that is the category of a user, a replacement dictionary indicating a correspondence among a replacement target word, a category, and replacement information that is information indicating a replacement, a vectorization model, a plurality of document data, and a plurality of vector data obtained by applying replacement to and vectorizing the plurality of document data;
    determining, using the replacement dictionary, whether the search content includes a replacement target word;
    when the search content includes a replacement target word, replacing the replacement target word included in the search content with the replacement information by using the user category and the replacement dictionary;
    vectorizing, using the vectorization model, replacement search content, which is the search content including the replacement information;
    calculating a similarity between the vectorized replacement search content and each of the plurality of vector data;
    arranging the plurality of document data corresponding to the plurality of vector data in descending order of the similarity; and
    outputting the plurality of document data as search results.
PCT/JP2022/036728 2022-09-30 2022-09-30 Information processing device, search method, and search program WO2024069941A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/036728 WO2024069941A1 (en) 2022-09-30 2022-09-30 Information processing device, search method, and search program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/036728 WO2024069941A1 (en) 2022-09-30 2022-09-30 Information processing device, search method, and search program

Publications (1)

Publication Number Publication Date
WO2024069941A1 (en) 2024-04-04

Family

ID=90476665

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/036728 WO2024069941A1 (en) 2022-09-30 2022-09-30 Information processing device, search method, and search program

Country Status (1)

Country Link
WO (1) WO2024069941A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008234519A (en) * 2007-03-23 2008-10-02 Toyota Central R&D Labs Inc Information retrieval system, information retrieval device, information retrieval method, and its program
JP2015106354A (en) * 2013-12-02 2015-06-08 古河インフォメーション・テクノロジー株式会社 Search suggestion device, search suggestion method, and program
JP2019159813A (en) * 2018-03-13 2019-09-19 日本電信電話株式会社 Substitution device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22961000

Country of ref document: EP

Kind code of ref document: A1