WO2019091026A1

WO2019091026A1 - Knowledge base document rapid search method, application server, and computer readable storage medium

Info

Publication number: WO2019091026A1
Application number: PCT/CN2018/077675
Authority: WO
Inventors: 张师琲; 侯丽
Original assignee: 平安科技（深圳）有限公司
Priority date: 2017-11-10
Filing date: 2018-02-28
Publication date: 2019-05-16
Also published as: CN108038096A

Abstract

A knowledge base document rapid search method, an application server, and a computer readable storage medium. The method comprises: receiving search information input by a user (S110); analysing and processing the search information to obtain query words (S120); searching the documents in a knowledge base on the basis of the query words, and ranking the search results by search match degree (S130); obtaining a summary and keywords of each document by means of a summary generation model and a keyword generation model (S140); and outputting the ranked search results together with the summary and keywords of a target document (S150). The method, application server, and storage medium allow the documents in a knowledge base to be searched rapidly and accurately, and allow the user to quickly grasp the main content of the retrieved documents.

Description

Knowledge base document rapid search method, application server, and computer readable storage medium

This application claims priority to Chinese Patent Application No. 201711106767.2, filed with the Chinese Patent Office on November 10, 2017 and entitled "Knowledge base document rapid search method, application server, and computer readable storage medium", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the technical field of data analysis, and in particular to a knowledge base document rapid search method, an application server, and a computer readable storage medium.

Background

With the development and maturation of the Internet and related technologies, people have entered an era of extremely abundant information. Many kinds of documents and files exist on networks, such as personnel files, financial files, technical files, contract files, and case files. Enterprises and institutions often build knowledge bases containing such files so that internal staff can consult them conveniently. How to search the files in a knowledge base quickly and accurately, and how to quickly grasp the main content of the retrieved files, are problems in urgent need of a solution.

Summary of the invention

In view of this, the present application proposes a knowledge base document rapid search method and an application server, to solve the problems of how to search the files in a knowledge base quickly and accurately and how to quickly grasp the main content of the retrieved files.

First, to achieve the above object, the present application proposes a knowledge base document rapid search method, the method comprising the steps of:

receiving search information input by a user;

analysing and processing the search information to obtain query words;

searching the documents in a knowledge base according to the query words, and ranking the search results according to search match degree;

obtaining a summary and keywords of each document by means of a summary generation model and a keyword generation model; and

outputting the ranked search results, and correspondingly outputting the summary and keywords of a target document.

In addition, to achieve the above object, the present application further provides an application server comprising a memory, a processor, and a knowledge base document rapid search system that is stored on the memory and executable on the processor, wherein the knowledge base document rapid search system, when executed by the processor, implements the steps of the knowledge base document rapid search method described above.

Further, to achieve the above object, the present application also provides a computer readable storage medium storing a knowledge base document rapid search system, the knowledge base document rapid search system being executable by at least one processor to cause the at least one processor to perform the steps of the knowledge base document rapid search method described above.

Compared with the prior art, the knowledge base document rapid search method, application server, and computer readable storage medium proposed by the present application first receive search information input by a user; second, analyse and process the search information to obtain query words; third, search the documents in the knowledge base according to the query words and rank the search results according to search match degree; then obtain a summary and keywords of each document by means of a summary generation model and a keyword generation model; and finally output the ranked search results and correspondingly output the summary and keywords of a target document. With the proposed method, application server, and storage medium, the files in a knowledge base can be searched quickly and accurately, and the main content of the retrieved files can be grasped quickly.

Brief description of the drawings

FIG. 1 is a schematic diagram of an optional hardware architecture of the application server of the present application;

FIG. 2 is a schematic diagram of the program modules of an embodiment of the knowledge base document rapid search system of the present application;

FIG. 3 is a schematic flowchart of a first embodiment of the knowledge base document rapid search method of the present application;

FIG. 4 is a schematic flowchart of a second embodiment of the knowledge base document rapid search method of the present application;

FIG. 5 is a schematic flowchart of a third embodiment of the knowledge base document rapid search method of the present application;

FIG. 6 is a schematic flowchart of a fourth embodiment of the knowledge base document rapid search method of the present application;

FIG. 7 is a schematic flowchart of a fifth embodiment of the knowledge base document rapid search method of the present application;

FIG. 8 is a schematic flowchart of a sixth embodiment of the knowledge base document rapid search method of the present application;

FIG. 9 is a schematic flowchart of a seventh embodiment of the knowledge base document rapid search method of the present application.

The realisation of the objects, the functional features, and the advantages of the present application are further described below in connection with the embodiments and with reference to the accompanying drawings.

Detailed description

To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present application without creative effort fall within the scope of protection of the present application.

It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or as implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only on the basis that a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, the combination should be regarded as non-existent and as falling outside the scope of protection claimed by the present application.

Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the application server 1 of the present application is shown.

In this embodiment, the application server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that are communicatively connected to one another through a system bus. Note that FIG. 1 shows only the application server 1 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

The application server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server. The application server 1 may be a standalone server or a server cluster composed of multiple servers.

The memory 11 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 11 may be an internal storage unit of the application server 1, such as a hard disk or main memory of the application server 1. In other embodiments, the memory 11 may be an external storage device of the application server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the application server 1. The memory 11 may also include both an internal storage unit of the application server 1 and an external storage device. In this embodiment, the memory 11 is typically used to store the operating system and the various application software installed on the application server 1, such as the program code of the knowledge base document rapid search system 200. The memory 11 may also be used to temporarily store various data that has been output or is to be output.

In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 12 is typically used to control the overall operation of the application server 1. In this embodiment, the processor 12 is used to run the program code or process the data stored in the memory 11, for example to run the knowledge base document rapid search system 200.

The network interface 13 may include a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the application server 1 and other electronic devices.

The hardware structure and functions of the devices related to the present application have now been described in detail. The various embodiments of the present application are presented below on the basis of this description.

First, the present application proposes a knowledge base document rapid search system 200.

Referring to FIG. 2, a program module diagram of the first embodiment of the knowledge base document rapid search system 200 of the present application is shown.

The knowledge base document rapid search system 200 comprises a series of computer program instructions stored on the memory 11. When these instructions are executed by the processor 12, the knowledge base document rapid search operations of the embodiments of the present application can be carried out. In some embodiments, the knowledge base document rapid search system 200 may be divided into one or more modules according to the particular operations implemented by the respective parts of the computer program instructions. For example, in FIG. 2 the system 200 may be divided into an acquisition module 21, an analysis processing module 22, a retrieval module 23, a ranking module 24, an establishing module 25, a calling module 26, and an output module 27. Specifically:

The acquisition module 21 is configured to receive search information input by a user.

Specifically, the search information may take different forms depending on the situation. For example, there may be three cases: in the first case, the search information is a sentence; in the second case, the search information is a word; in the third case, the search information includes both sentences and words.

The analysis processing module 22 is configured to analyse and process the search information to obtain query words.

First way: where the search information is a sentence, the input sentence is segmented into words by a combination of syntactic analysis and semantic analysis, meaningless words and symbols are removed, and several query words are extracted and passed to the retrieval module for searching. For example, if the user inputs "How is China's economic situation this year?", analysis yields the key query words "China" and "economy", while unimportant words and symbols such as particles, interrogatives, and punctuation are removed.

Second way: where the search information is a word, the word is conceptually expanded, according to preset rules, into its synonyms, near-synonyms, and hypernyms and hyponyms. Some of the expanded words are extracted by a synonym and near-synonym similarity algorithm, or expanded words selected by the user are received, and these serve as the query words. The expanded words used as query words may be chosen according to the priority level of each word. For example, if the user inputs "college student", expanded words such as "undergraduate", "graduate student", "vocational college student", "junior college student", and "technical secondary school student" can be obtained.

Third way: the two functions are combined. First, the search information is segmented into words by combined semantic and syntactic analysis. Next, the resulting query words are conceptually expanded into their synonyms, near-synonyms, or hypernyms and hyponyms. Some of the expanded words are then extracted by a similarity-priority algorithm, or expanded words selected by the user are received. Finally, the query words and the selected expanded words are passed together to the retrieval module as the query condition. For example, if the user inputs "How is China's economic situation this year?" and the system obtains the two query words "China" and "economy", analysis can then yield expanded words for "China", such as "mainland", "inland", and "domestic", and expanded words for "economy", such as "GDP", "trade", "commerce", "finance and economics", and "finance".

The retrieval module 23 is configured to search the documents in the knowledge base according to the query words.

Specifically, the documents in the knowledge base are of many types, including, for example, files in formats such as pdf, doc, docx, ppt, excel, txt, html, xml, zip, and tar. A full-text search can be performed according to the query words: an index library is built with the database as its source, the search match degree is obtained from weights calculated with TF-IDF, the search results are ranked according to the match degree, and the search terms are highlighted. The supported search facilities include cross-language information retrieval, spell checking, regular-expression search (for professional users), real-time search results, and the recording of entries, all of which assist retrieval. During a search, the search results can also be automatically completed on the basis of the search history and popular searches.
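As a non-limiting illustration of "building an index library with the database as its source", the following minimal Python sketch builds an inverted index that maps each word to the documents containing it. The function names `tokenize`, `build_index`, and `lookup`, the in-memory dictionary standing in for the database, and the simple regular-expression tokenizer are all assumptions made for this sketch; the application does not specify an index structure.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: word -> set of ids of documents containing it.

    `documents` is a dict of doc_id -> text, standing in for the database.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def lookup(index, query_words):
    """Return the ids of all documents that contain at least one query word."""
    matches = set()
    for word in query_words:
        matches |= index.get(word.lower(), set())
    return matches


documents = {1: "China economy report", 2: "Weather report"}
index = build_index(documents)
```

With this toy data, `lookup(index, ["economy"])` returns only document 1, while `lookup(index, ["report"])` returns both documents. Real document formats such as pdf or docx would first need text extraction, which is outside this sketch.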

The ranking module 24 is configured to rank the search results according to search match degree.

The establishing module 25 is configured to establish the summary generation model and the keyword generation model.

The calling module 26 is configured to call the summary generation model and the keyword generation model to obtain the summary and keywords of each document.

Specifically, obtaining the summary and keywords of each document includes the following steps:

First, the target document is split into sentences and segmented into words, so that its content is broken down into sentences and words.

Second, the summary is generated from the sentences whose weight value, obtained through the summary generation model, is greater than a preset value, and the keywords are generated from the words whose word frequency, as selected through the keyword generation model, is greater than a preset value.

The output module 27 is configured to output the ranked search results and to correspondingly output the summary and keywords of the target document.

Specifically, users usually click on a highly ranked document to view it. When the user clicks on a document, the display module displays the content, summary, keywords, and so on of that document.

In addition, the present application also proposes a knowledge base document rapid search method.

Referring to FIG. 3, a schematic flowchart of the first embodiment of the knowledge base document rapid search method of the present application is shown. In this embodiment, the order in which the steps of the flowchart in FIG. 3 are executed may be changed according to different requirements, and some steps may be omitted.

Step S110: receive search information input by a user.

Step S120: analyse and process the search information to obtain query words.

Step S130: search the documents in the knowledge base according to the query words, and rank the search results according to search match degree.

Step S140: obtain the summary and keywords of each document by means of the summary generation model and the keyword generation model.

Step S150: output the ranked search results, and correspondingly output the summary and keywords of the target document.

FIG. 4 is a schematic flowchart of the second embodiment of the knowledge base document rapid search method of the present application. Step S120 of the first embodiment, "analyse and process the search information to obtain query words", specifically includes the following steps:

S210: when the search information is a sentence, segment the input sentence into words by a combination of syntactic analysis and semantic analysis, remove meaningless words and symbols, and extract several query words.

For example, if the user inputs "How is China's economic situation this year?", analysis yields the key query words "China" and "economy", while unimportant words and symbols such as particles, interrogatives, and punctuation are removed.
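The removal of meaningless words in S210 can be sketched, under simplifying assumptions, as stop-word filtering. The Python sketch below replaces the combined syntactic and semantic analysis of the application with a plain regular-expression tokenizer and a hand-picked stop list; `STOP_WORDS` and `extract_query_words` are hypothetical names, and the input is an English paraphrase of the example sentence.

```python
import re

# Hypothetical stop list: particles, question words, and other low-content
# tokens. A real system would use a curated list or a linguistic analyser.
STOP_WORDS = {"how", "is", "the", "of", "this", "year", "what", "a", "an"}


def extract_query_words(sentence):
    """Tokenize a sentence, drop stop words and punctuation, keep first occurrences."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    query_words = []
    for token in tokens:
        if token not in STOP_WORDS and token not in query_words:
            query_words.append(token)
    return query_words


query_words = extract_query_words("How is the economy of China this year?")
# query_words == ['economy', 'china']
```

Chinese input would additionally require a word segmenter, since Chinese text has no spaces between words; that step is not shown here.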

S220: when the search information is a word, conceptually expand the word, according to preset rules, into its synonyms, near-synonyms, and hypernyms and hyponyms, and extract some of the expanded words by a synonym and near-synonym similarity algorithm, or receive expanded words selected by the user, to serve as the query words.

For example, if the user inputs "college student", expanded words such as "undergraduate", "graduate student", "vocational college student", "junior college student", and "technical secondary school student" can be obtained.
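A minimal sketch of this word expansion, assuming a small hand-written thesaurus in which each expanded word carries a priority level, might look as follows in Python. The table `THESAURUS`, the function `expand_word`, the priority values, and the decision to keep the original word in the result are illustrative assumptions, not details given in the application.

```python
# Hypothetical thesaurus: word -> list of (expanded word, priority).
# A lower number means a higher priority.
THESAURUS = {
    "college student": [
        ("undergraduate", 1),
        ("graduate student", 2),
        ("junior college student", 3),
        ("technical secondary school student", 4),
    ],
}


def expand_word(word, max_expansions=2, user_choice=None):
    """Return the word followed by its selected expanded words.

    If `user_choice` is given, only the expanded words the user picked are
    kept; otherwise the `max_expansions` highest-priority words are taken.
    """
    candidates = THESAURUS.get(word.lower(), [])
    if user_choice is not None:
        allowed = {candidate for candidate, _ in candidates}
        chosen = [c for c in user_choice if c in allowed]
    else:
        ranked = sorted(candidates, key=lambda pair: pair[1])
        chosen = [candidate for candidate, _ in ranked[:max_expansions]]
    return [word] + chosen
```

For instance, `expand_word("college student")` yields the input word plus "undergraduate" and "graduate student", whereas passing `user_choice=["junior college student"]` keeps only the word the user selected.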

FIG. 5 is a schematic flowchart of the third embodiment of the knowledge base document rapid search method of the present application. Step S120 of the first embodiment, "analyse and process the search information to obtain query words", specifically includes the following steps:

S310: segment the search information into words by combined semantic and syntactic analysis.

S320: conceptually expand the resulting query words into their synonyms, near-synonyms, or hypernyms and hyponyms.

S330: extract some of the expanded words by a similarity-priority algorithm, or receive expanded words selected by the user.

S340: use the query words together with the selected expanded words as the query words for the search.

Specifically, when the user inputs a sentence or a paragraph as the search information, the system first splits the paragraphs and sentences into words, obtains the more important words through analysis, and expands the meaning of those words. The expanded words include hypernyms, hyponyms, near-synonyms, synonyms, and so on. For example, if the user inputs "How is China's economic situation this year?" and the system obtains the two query words "China" and "economy", the system can then obtain expanded words for "China", such as "mainland", "inland", and "domestic", and expanded words for "economy", such as "GDP", "trade", "commerce", "finance and economics", and "finance".
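Steps S320 to S340 can be illustrated, assuming the segmentation of S310 has already produced the query words, with the following Python sketch. The similarity scores in `EXPANSIONS` are invented for illustration, and "similarity-priority" is read here as "take the highest-scoring expanded words first", which is only one possible interpretation of the algorithm named in the application.

```python
# Hypothetical expansion table: query word -> list of (expanded word, similarity).
# The similarity values are made up for this sketch.
EXPANSIONS = {
    "china": [("mainland", 0.9), ("domestic", 0.7), ("inland", 0.6)],
    "economy": [("gdp", 0.8), ("finance", 0.75), ("trade", 0.7), ("commerce", 0.65)],
}


def expand_query(query_words, top_k=2):
    """Append the top_k most similar expanded words of each query word (S320-S340)."""
    terms = list(query_words)
    for word in query_words:
        # S320: look up the candidate expanded words for this query word.
        candidates = EXPANSIONS.get(word, [])
        # S330: keep the candidates with the highest similarity first.
        ranked = sorted(candidates, key=lambda pair: pair[1], reverse=True)
        for candidate, _ in ranked[:top_k]:
            if candidate not in terms:
                # S340: the expanded word joins the final query terms.
                terms.append(candidate)
    return terms
```

Under these assumptions, `expand_query(["china", "economy"])` returns the two query words followed by "mainland", "domestic", "gdp", and "finance".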

FIG. 6 is a schematic flowchart of the fourth embodiment of the knowledge base document rapid search method of the present application. Step S130 of the first embodiment, "search the documents in the knowledge base according to the query words, and rank the search results according to search match degree", specifically includes:

S410: perform a full-text search according to the query words.

Specifically, the supported search facilities include cross-language information retrieval, spell checking, regular-expression search (for professional users), real-time search results, and the recording of entries, all of which assist retrieval.

S420: build an index library with the database as its source, and obtain the search match degree from weights calculated with TF-IDF.

Specifically, TF-IDF is a statistical method for evaluating how important a word is to one document in a document collection or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases in inverse proportion to how frequently it appears across the corpus. The main idea of TF-IDF is that if a word or phrase appears with a high frequency (TF) in one article and rarely appears in other articles, the word or phrase is considered to discriminate well between categories and to be suitable for classification. TF-IDF is the product TF*IDF, where TF is the term frequency, that is, how often the term appears in the document, and IDF is the inverse document frequency.
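To make the TF*IDF product concrete, the following Python sketch scores each document against the query words. It uses one common, unsmoothed variant, with TF taken as the term count divided by the document length and IDF as the natural logarithm of the number of documents divided by the number of documents containing the term. The application does not state which variant is used, so this choice, the function name `match_scores`, and the toy corpus are assumptions.

```python
import math
from collections import Counter


def match_scores(query_words, documents):
    """Return one TF-IDF match score per document.

    `documents` is a list of token lists. The score of a document is the sum
    of tf * idf over the query words.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        score = 0.0
        for word in query_words:
            if doc_freq[word] == 0 or not tokens:
                continue  # word absent from the corpus, or empty document
            tf = counts[word] / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    ["china", "economy", "growth"],
    ["china", "sports"],
    ["weather", "report"],
]
scores = match_scores(["china", "economy"], docs)
# Document indices ordered from best to worst match.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Here the first document ranks highest because it contains both query words and "economy" is rare in the corpus, while the third document scores zero. Note that in this unsmoothed variant a word that occurs in every document has an IDF of zero and contributes nothing to the score.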

S430: rank the search results according to the match degree, and highlight the search terms.

Specifically, a match degree threshold may be set, and the documents whose match degree exceeds the threshold are displayed. The user can also set the number of documents displayed on one page as needed, for example 20, 30, or 50.
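The thresholding, page-size limit, and highlighting described for S430 can be sketched in Python as follows. The function names, the use of `**` as the highlight marker, and the case-insensitive substring matching are illustrative assumptions; a real user interface would typically emit markup instead.

```python
import re


def select_results(scored_docs, threshold, page_size=20):
    """Keep documents scoring above the threshold, best first, up to page_size.

    `scored_docs` is a list of (doc_id, match_degree) pairs.
    """
    kept = [(doc_id, score) for doc_id, score in scored_docs if score > threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:page_size]


def highlight(text, query_words, mark="**"):
    """Wrap every case-insensitive occurrence of a query word in `mark`."""
    if not query_words:
        return text
    pattern = re.compile("|".join(re.escape(w) for w in query_words), re.IGNORECASE)
    return pattern.sub(lambda m: f"{mark}{m.group(0)}{mark}", text)
```

For example, `select_results([("a", 0.2), ("b", 0.9), ("c", 0.5)], threshold=0.3, page_size=2)` keeps documents "b" and "c" in that order, and `highlight("China economy report", ["economy"])` marks the single matching word.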

FIG. 7 is a schematic flowchart of the fifth embodiment of the knowledge base document rapid search method of the present application. After step S130 of the first embodiment, "search the documents in the knowledge base according to the query words, and rank the search results according to search match degree", the method further includes the step:

S510: automatically complete the search results on the basis of the search history and popular searches.

Specifically, the search history and popular searches can be used to supplement and optimise the search results, making them more complete and accurate. The search history is stored in a database or on a server, and the popular searches can likewise be obtained from the search-record statistics of the database or server.
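The application does not detail how completion works. One possible reading, sketched below in Python, is prefix-based suggestion: entries from the user's own search history are offered first, most recent first, followed by popular searches ordered by how often they occur in the overall search records. The function `suggest` and both of its data sources are hypothetical.

```python
from collections import Counter


def suggest(prefix, user_history, all_searches, limit=5):
    """Suggest completions for `prefix` from history first, then popular searches.

    `user_history` lists the user's past queries, oldest first.
    `all_searches` lists queries from all users; repeats indicate popularity.
    """
    prefix = prefix.lower()
    suggestions = []

    # The user's own history, most recent first.
    for query in reversed(user_history):
        query = query.lower()
        if query.startswith(prefix) and query not in suggestions:
            suggestions.append(query)

    # Popular searches, most frequent first.
    popularity = Counter(query.lower() for query in all_searches)
    for query, _ in popularity.most_common():
        if query.startswith(prefix) and query not in suggestions:
            suggestions.append(query)

    return suggestions[:limit]
```

With a history of "china trade" followed by "china economy", and overall records in which "china gdp" is searched twice, `suggest("chi", ...)` returns "china economy", "china trade", and then "china gdp".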

FIG. 8 is a schematic flowchart of the sixth embodiment of the knowledge base document rapid search method of the present application. Step S140 of the first embodiment, "obtain the summary and keywords of each document by means of the summary generation model and the keyword generation model", specifically includes:

S610: split the target document into sentences and segment it into words, so that the content of the target document is broken down into sentences and words;

S620: generate the summary from the sentences whose weight value, obtained through the summary generation model, is greater than a preset value, and generate the keywords from the words whose word frequency, as selected through the keyword generation model, is greater than a preset value.
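Steps S610 and S620 can be sketched in Python as follows. Sentence splitting on end-of-sentence punctuation and word splitting with a regular expression are simplifying assumptions suited to English text, and the sentence weights are taken as an input here rather than computed; all function names are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """S610: split text into sentences at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_keywords(text, min_count, stop_words=frozenset()):
    """S620: keep the words whose frequency is greater than `min_count`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return [word for word, count in counts.most_common() if count > min_count]


def select_summary(sentences, weights, min_weight):
    """S620: keep, in document order, the sentences weighted above `min_weight`."""
    return [s for s, w in zip(sentences, weights) if w > min_weight]


text = "Trade grew. Trade and exports rose. The weather was mild."
sentences = split_sentences(text)
keywords = extract_keywords(text, min_count=1)
summary = select_summary(sentences, [0.9, 0.4, 0.1], min_weight=0.3)
```

In this toy example the only word occurring more than once is "trade", and the summary consists of the first two sentences, whose illustrative weights exceed the preset value of 0.3.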

FIG. 9 is a schematic flowchart of the seventh embodiment of the knowledge base document rapid search method of the present application. Step S140 of the first embodiment, "obtain the summary and keywords of each document by means of the summary generation model and the keyword generation model", further includes:

S710: establish the summary generation model according to the following formulas:

Wi = a*WPi + b*WSi

[Formula images not reproduced: the remaining expressions distinguish the cases "m is odd" and "m is even".]

S720: Establish the keyword generation model on the basis of word frequency statistics;

where Wi is the weight value of each sentence; Wij is the weight of each sentence with respect to each keyword; WPi is the position weight value; WSi is the semantic weight value; a and b are weight coefficients; wp(ij) is the number of times the j-th keyword occurs in the i-th sentence; sp(j) is the number of sentences that contain the j-th keyword; m is the total number of sentences; and n is the total number of keywords.
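The combination Wi = a*WPi + b*WSi can be illustrated with a short Python sketch. Because the expressions for WPi and WSi are not reproduced in this text, the sketch takes both as precomputed inputs; the function name `sentence_weights` and the sample values are hypothetical.

```python
def sentence_weights(position_weights, semantic_weights, a, b):
    """Combine per-sentence weights as Wi = a*WPi + b*WSi.

    position_weights -- list of WPi values, one per sentence (assumed given)
    semantic_weights -- list of WSi values, one per sentence (assumed given)
    a, b             -- weight coefficients
    """
    if len(position_weights) != len(semantic_weights):
        raise ValueError("WPi and WSi must cover the same sentences")
    return [a * wp + b * ws
            for wp, ws in zip(position_weights, semantic_weights)]


# Made-up inputs for three sentences, with a = 0.25 and b = 0.75.
weights = sentence_weights([1.0, 0.5, 0.0], [0.0, 0.5, 1.0], a=0.25, b=0.75)
```

With these inputs the semantic weight dominates, so the third sentence scores highest; raising a relative to b would instead favor sentences by position.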

The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can of course also be implemented in hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of the present application.

The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A knowledge base document quick retrieval method, applied to an application server, characterized in that the method comprises the steps of:

receiving search information input by a user;

analyzing and processing the search information to obtain query words;

searching the documents in a knowledge base according to the query words, and ranking the search results according to search match degree;

obtaining an abstract and keywords of each document by means of an abstract generation model and a keyword generation model; and

outputting the ranked search results, and correspondingly outputting the abstract and keywords of a target document.
2. The knowledge base document quick retrieval method according to claim 1, characterized in that the step of analyzing and processing the search information to obtain query words further comprises:

when the search information is a sentence, performing word segmentation on the input sentence by combining syntactic analysis and semantic analysis, removing meaningless words and symbols, and extracting a plurality of the query words; and

when the search information is a single word, conceptually expanding the word into its corresponding synonyms, near-synonyms, and hypernyms and hyponyms according to preset rules, and extracting some of the expanded words according to a synonym and near-synonym similarity algorithm, or receiving expanded words selected by the user, as the query words.
3. The knowledge base quick retrieval method according to claim 2, characterized in that the step of analyzing and processing the search information to obtain query words further comprises:

performing, on the search information, word segmentation that combines semantic analysis with syntactic analysis, and taking the words obtained by the segmentation as the query words;

conceptually expanding the segmented query words into corresponding synonyms, near-synonyms, or hypernyms and hyponyms, and extracting some of the expanded words according to a similarity-priority algorithm or receiving expanded words selected by the user;

taking the query words together with the limited set of expanded words as the query words.
4. The knowledge base quick retrieval method according to claim 1, characterized in that the step of searching the documents in the knowledge base according to the query words and ranking the search results according to search match degree further comprises:

performing a full-text search operation according to the query words;

building an index library with a database as the source, and computing weights using TF-IDF to obtain the search match degree; and

intelligently ranking the search results according to the search match degree, and highlighting the search terms.
5. The knowledge base quick retrieval method according to claim 4, characterized in that the search operation comprises cross-language information retrieval, spell checking, and regular-expression retrieval.
6. The knowledge base quick retrieval method according to claim 4, characterized in that the step of searching the documents in the knowledge base according to the query words and ranking the search results according to search match degree further comprises:

automatically completing the search results according to the history records and hot searches.
7. The knowledge base quick retrieval method according to claim 1, characterized in that the step of obtaining the abstract and keywords of each document by means of the abstract generation model and the keyword generation model further comprises:

performing sentence segmentation and word segmentation on the target document, splitting the content of the target document into sentences and words; and

using the abstract generation model to select the sentences whose weight value is greater than a preset value and generating the abstract from them, and using the keyword generation model to select the words whose word frequency is greater than a preset value and generating the keywords from them.
8. The knowledge base quick retrieval method according to claim 7, characterized in that the step of obtaining the abstract and keywords of each document by means of the abstract generation model and the keyword generation model further comprises:

establishing the abstract generation model according to the following formula:

Wi = a*WPi + b*WSi

[Formula images not reproduced: the remaining expressions distinguish the cases "m is odd" and "m is even".]

and

establishing the keyword generation model on the basis of word frequency statistics;

where Wi is the weight value of each sentence; Wij is the weight of each sentence with respect to each keyword; WPi is the position weight value; WSi is the semantic weight value; a and b are weight coefficients; wp(ij) is the number of times the j-th keyword occurs in the i-th sentence; sp(j) is the number of sentences that contain the j-th keyword; m is the total number of sentences; and n is the total number of keywords.
9. An application server, characterized in that the application server comprises a memory, a processor, and a knowledge base document quick retrieval system stored in the memory and executable on the processor, wherein the knowledge base document quick retrieval system, when executed by the processor, implements the following steps:

receiving search information input by a user;

analyzing and processing the search information to obtain query words;

searching the documents in a knowledge base according to the query words, and ranking the search results according to search match degree;

obtaining an abstract and keywords of each document by means of an abstract generation model and a keyword generation model; and

outputting the ranked search results, and correspondingly outputting the abstract and keywords of a target document.
10. The application server according to claim 9, characterized in that the step of analyzing and processing the search information to obtain query words further comprises:

when the search information is a sentence, performing word segmentation on the input sentence by combining syntactic analysis and semantic analysis, removing meaningless words and symbols, and extracting a plurality of the query words; and

when the search information is a single word, conceptually expanding the word into its corresponding synonyms, near-synonyms, and hypernyms and hyponyms according to preset rules, and extracting some of the expanded words according to a synonym and near-synonym similarity algorithm, or receiving expanded words selected by the user, as the query words.
11. The application server according to claim 10, characterized in that the step of analyzing and processing the search information to obtain query words further comprises:

performing, on the search information, word segmentation that combines semantic analysis with syntactic analysis, and taking the words obtained by the segmentation as the query words;

conceptually expanding the segmented query words into corresponding synonyms, near-synonyms, or hypernyms and hyponyms, and extracting some of the expanded words according to a similarity-priority algorithm or receiving expanded words selected by the user;

taking the query words together with the limited set of expanded words as the query words.
12. The application server according to claim 9, characterized in that the step of searching the documents in the knowledge base according to the query words and ranking the search results according to search match degree further comprises:

performing a full-text search operation according to the query words;

building an index library with a database as the source, and computing weights using TF-IDF to obtain the search match degree; and

intelligently ranking the search results according to the search match degree, and highlighting the search terms.
13. The application server according to claim 9, characterized in that the step of obtaining the abstract and keywords of each document by means of the abstract generation model and the keyword generation model further comprises:

performing sentence segmentation and word segmentation on the target document, splitting the content of the target document into sentences and words; and

using the abstract generation model to select the sentences whose weight value is greater than a preset value and generating the abstract from them, and using the keyword generation model to select the words whose word frequency is greater than a preset value and generating the keywords from them.
14. The application server according to claim 13, characterized in that the step of obtaining the abstract and keywords of each document by means of the abstract generation model and the keyword generation model further comprises:

establishing the abstract generation model according to the following formula:

Wi = a*WPi + b*WSi

[Formula images not reproduced: the remaining expressions distinguish the cases "m is odd" and "m is even".]

and

establishing the keyword generation model on the basis of word frequency statistics;

where Wi is the weight value of each sentence; Wij is the weight of each sentence with respect to each keyword; WPi is the position weight value; WSi is the semantic weight value; a and b are weight coefficients; wp(ij) is the number of times the j-th keyword occurs in the i-th sentence; sp(j) is the number of sentences that contain the j-th keyword; m is the total number of sentences; and n is the total number of keywords.
15. A computer readable storage medium, characterized in that the computer readable storage medium stores a knowledge base document quick retrieval system, the knowledge base document quick retrieval system being executable by at least one processor to cause the at least one processor to perform the following steps:

receiving search information input by a user;

analyzing and processing the search information to obtain query words;

searching the documents in a knowledge base according to the query words, and ranking the search results according to search match degree;

obtaining an abstract and keywords of each document by means of an abstract generation model and a keyword generation model; and

outputting the ranked search results, and correspondingly outputting the abstract and keywords of a target document.
16. The computer readable storage medium according to claim 15, characterized in that the step of analyzing and processing the search information to obtain query words further comprises:

when the search information is a sentence, performing word segmentation on the input sentence by combining syntactic analysis and semantic analysis, removing meaningless words and symbols, and extracting a plurality of the query words; and

when the search information is a single word, conceptually expanding the word into its corresponding synonyms, near-synonyms, and hypernyms and hyponyms according to preset rules, and extracting some of the expanded words according to a synonym and near-synonym similarity algorithm, or receiving expanded words selected by the user, as the query words.
17. The computer readable storage medium according to claim 16, characterized in that the step of analyzing and processing the search information to obtain query words further comprises:

performing, on the search information, word segmentation that combines semantic analysis with syntactic analysis, and taking the words obtained by the segmentation as the query words;

conceptually expanding the segmented query words into corresponding synonyms, near-synonyms, or hypernyms and hyponyms, and extracting some of the expanded words according to a similarity-priority algorithm or receiving expanded words selected by the user;

taking the query words together with the limited set of expanded words as the query words.
18. The computer readable storage medium according to claim 15, characterized in that the step of searching the documents in the knowledge base according to the query words and ranking the search results according to search match degree further comprises:

performing a full-text search operation according to the query words;

building an index library with a database as the source, and computing weights using TF-IDF to obtain the search match degree; and

intelligently ranking the search results according to the search match degree, and highlighting the search terms.
19. The computer readable storage medium according to claim 15, characterized in that the step of obtaining the abstract and keywords of each document by means of the abstract generation model and the keyword generation model further comprises:

performing sentence segmentation and word segmentation on the target document, splitting the content of the target document into sentences and words; and

using the abstract generation model to select the sentences whose weight value is greater than a preset value and generating the abstract from them, and using the keyword generation model to select the words whose word frequency is greater than a preset value and generating the keywords from them.
20. The computer readable storage medium according to claim 19, characterized in that the step of obtaining the abstract and keywords of each document by means of the abstract generation model and the keyword generation model further comprises:

establishing the abstract generation model according to the following formula:

Wi = a*WPi + b*WSi

[Formula images not reproduced: the remaining expressions distinguish the cases "m is odd" and "m is even".]

and

establishing the keyword generation model on the basis of word frequency statistics;

where Wi is the weight value of each sentence; Wij is the weight of each sentence with respect to each keyword; WPi is the position weight value; WSi is the semantic weight value; a and b are weight coefficients; wp(ij) is the number of times the j-th keyword occurs in the i-th sentence; sp(j) is the number of sentences that contain the j-th keyword; m is the total number of sentences; and n is the total number of keywords.