CN117149951A - Intelligent retrieval method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN117149951A
Authority
CN
China
Prior art keywords
text
user
matching degree
document library
dimensional array
Prior art date
Legal status
Pending
Application number
CN202311093756.0A
Other languages
Chinese (zh)
Inventor
王宇琪
唐焱
张译
谷鹏
段海斌
朱占生
宋肖翔
白汶鑫
Current Assignee
Xinjiang Lianhai Ina Int Information Technology Ltd
Original Assignee
Xinjiang Lianhai Ina Int Information Technology Ltd
Priority date
Filing date
Publication date
Application filed by Xinjiang Lianhai Ina Int Information Technology Ltd
Priority to CN202311093756.0A
Publication of CN117149951A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G06F16/38 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an intelligent retrieval method, a system, an electronic device and a storage medium. According to the text input by a user, the text identifiers of the corresponding texts in a document library are read from a pre-established inverted index list and stored in a two-dimensional array. During retrieval, the matching degree is therefore calculated only for the texts whose identifiers are stored in the two-dimensional array, rather than for all texts in the document library, which greatly improves retrieval efficiency.

Description

Intelligent retrieval method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an intelligent retrieval method, an intelligent retrieval device, an electronic device, and a storage medium.
Background
With the development of intelligent search technology, search systems are becoming increasingly widespread. In public security systems in particular, they help police officers search laws and regulations, cases and related scenarios, improving case-handling efficiency.
However, when searching laws and regulations, cases and related scenarios, the conventional method acquires the text input by the user and compares it word by word with all texts stored in the search library until a text matching the input is found, and only then outputs the search result. A great deal of time is therefore spent searching during case handling, which reduces the case-handling efficiency of police officers.
Disclosure of Invention
In view of the above problems, the present application provides an intelligent retrieval method, an apparatus, an electronic device and a storage medium. The specific scheme is as follows:
An intelligent retrieval method, comprising:
acquiring a text input by a user;
reading, according to the text input by the user, the text identifiers of the texts in a document library that correspond to the text input by the user from a pre-established inverted index list, and storing the text identifiers read from the inverted index list in a two-dimensional array;
calculating, one by one, the matching degree between the text input by the user and each text corresponding to a text identifier stored in the two-dimensional array;
and determining a text whose matching degree meets a preset requirement as the retrieval result corresponding to the text input by the user, and outputting the retrieval result.
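The four steps above can be illustrated with a short Python sketch. It is not the applicant's implementation: the document library and inverted index are hypothetical in-memory dictionaries, the user's text is assumed to be already split into key terms, each row of the "two-dimensional array" is taken to be [text_id, text_length], and the matching degree is assumed to be length(lcs)² / (length(a) × length(b)), the form consistent with the worked examples given later in the description. The threshold check mirrors the optional claim that a search fails when the best matching degree does not exceed a set threshold.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest run of contiguous characters shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # Extend the common run ending at the previous pair of characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def retrieve(query, query_terms, index, library, threshold=0.0):
    """Return (text_id, degree) for the best candidate, or None on failure.

    index:   hypothetical inverted index, key term -> set of text ids
    library: hypothetical document library, text id -> text
    """
    # S20: read candidate ids from the inverted index; unknown terms add nothing.
    candidate_ids = set()
    for term in query_terms:
        candidate_ids |= set(index.get(term, ()))
    # Rows of the "two-dimensional array": [text_id, text_length].
    rows = [[tid, len(library[tid])] for tid in sorted(candidate_ids)]

    # S30: score only the candidate rows, never the whole library.
    scored = []
    for tid, length in rows:
        lcs = longest_common_substring(query, library[tid])
        # Assumed matching degree: length(lcs)^2 / (length(a) * length(b)).
        degree = lcs * lcs / (len(query) * length) if query and length else 0.0
        scored.append((degree, tid))
    if not scored:
        return None

    # Output: the best candidate, provided its degree exceeds the threshold.
    best_degree, best_id = max(scored)
    return (best_id, best_degree) if best_degree > threshold else None
```

With library = {1: "abcde", 2: "xxcde", 3: "zzzzz"} and index = {"ab": {1}, "cd": {1, 2}}, the call retrieve("abcd", ["ab", "cd"], index, library) scores only texts 1 and 2 and returns (1, 0.8); text 3 is never compared, which is the efficiency argument of the method.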
Optionally, the process of pre-establishing the inverted index list includes:
extracting key information of each text in a document library, wherein the key information comprises key characters or keywords;
and generating the inverted index list for the texts in the document library, wherein the inverted index list comprises the key information and the text identifiers corresponding to each item of key information.
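A minimal sketch of building the inverted index list, assuming the key information of each text can be extracted independently. The description does not say how key characters or keywords are obtained, so `extract_keys` below is a hypothetical stand-in that simply splits on whitespace; the result maps each item of key information to the sorted identifiers of the texts that contain it.

```python
from collections import defaultdict


def extract_keys(text: str) -> list[str]:
    # Hypothetical extractor: the description does not specify how key
    # characters or keywords are obtained; whitespace splitting stands in.
    return text.split()


def build_inverted_index(library: dict[int, str]) -> dict[str, list[int]]:
    """Map each item of key information to the ids of the texts containing it."""
    postings = defaultdict(set)
    for text_id, text in library.items():
        for key in extract_keys(text):
            postings[key].add(text_id)
    # Sorted id lists make the index deterministic and easy to inspect.
    return {key: sorted(ids) for key, ids in postings.items()}
```

For the toy library {1: "theft report", 2: "theft appeal", 3: "traffic report"}, the key "theft" maps to [1, 2] and "report" to [1, 3]. A real deployment would substitute a proper Chinese word segmenter for `extract_keys`.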
Optionally, storing the text identifiers in the document library corresponding to the text input by the user, read from the inverted index list, in a two-dimensional array includes:
storing the texts in the document library read from the inverted index in the rows of the two-dimensional array, wherein each row of the two-dimensional array represents one text in the document library;
and storing the length information of the texts in the document library read from the inverted index in the columns of the two-dimensional array, wherein each column of the two-dimensional array represents the length information of a text in the document library.
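One possible reading of this storage scheme, sketched under the assumption that the "two-dimensional array" is a plain Python list of rows: each row corresponds to one candidate text read from the inverted index, and its columns hold the text identifier and the text length that is later used as length(b). The function name and the index layout are illustrative, not taken from the description.

```python
def build_candidate_array(query_keys, index, library):
    """Rows of a two-dimensional array: [text_id, text_length] per candidate text.

    index:   key information -> iterable of text ids (hypothetical structure)
    library: text id -> text
    """
    hit_ids = set()
    for key in query_keys:
        # Keys absent from the inverted index contribute no candidates.
        hit_ids.update(index.get(key, ()))
    # One row per candidate text; the second column holds its length.
    return [[text_id, len(library[text_id])] for text_id in sorted(hit_ids)]
```

Only texts hit by at least one key of the user's input get a row, so the number of rows equals the number of candidate texts rather than the size of the whole library.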
Optionally, calculating, one by one, the matching degree between the text input by the user and each text corresponding to a text identifier stored in the two-dimensional array includes:
calculating, for each such text, the maximum common substring length between it and the text input by the user, wherein the maximum common substring length is the length of the longest run of identical, contiguous characters shared by the two texts;
and calculating the matching degree between the text input by the user and that text from the maximum common substring length, the length of the text input by the user and the length of that text.
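The maximum common substring length can be computed with the standard dynamic-programming recurrence: if the characters at positions i and j match, the common run ending there is one longer than the run ending at positions i−1 and j−1; otherwise it is zero. The answer is the largest value seen. The description does not say which algorithm the applicant uses, so the following Python function (`lcs_length` is an illustrative name) is only a sketch of this textbook approach, running in O(len(a) × len(b)) time and O(len(b)) memory.

```python
def lcs_length(a: str, b: str) -> int:
    """Maximum common substring length: longest contiguous run shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        # curr[j] = length of the common run ending at a[i-1] and b[j-1]
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
            # On a mismatch curr[j] stays 0: the run must be contiguous.
        prev = curr
    return best
```

For example, lcs_length("ascde", "axcxdde") is 2, since "de" is the longest contiguous run the two strings share.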
Optionally, calculating the matching degree between the text input by the user and a text corresponding to a text identifier stored in the two-dimensional array from the maximum common substring length, the length of the text input by the user and the length of that text includes:
calculating the matching degree by the following formula:
$$\text{matching degree} = \frac{\mathrm{length}(lcs)^2}{\mathrm{length}(a) \times \mathrm{length}(b)}$$
where length(lcs) is the maximum common substring length, length(a) is the length of the text input by the user, and length(b) is the length of the text corresponding to the text identifier stored in the two-dimensional array.
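A sketch of the matching-degree calculation in Python, assuming the formula takes the form length(lcs)² / (length(a) × length(b)); this form reproduces both numeric results given in the description (0.11 and 33.33%). The helper `lcs_length` is repeated so the block runs on its own; both function names are illustrative.

```python
def lcs_length(a: str, b: str) -> int:
    """Maximum common substring length (contiguous characters only)."""
    best, prev = 0, [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def matching_degree(a: str, b: str) -> float:
    """Assumed formula: length(lcs)^2 / (length(a) * length(b)), in [0, 1]."""
    if not a or not b:
        return 0.0  # avoid division by zero for empty texts
    lcs = lcs_length(a, b)
    return lcs * lcs / (len(a) * len(b))
```

Because length(lcs) can never exceed the shorter text, the value lies in [0, 1] and reaches 1 only for identical texts; for "ascde" and "axcxdde" it is 4/35, about 0.11.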
Optionally, determining the text whose matching degree meets the preset requirement includes:
determining the matching degrees between the text input by the user and all texts corresponding to the text identifiers stored in the two-dimensional array, and taking the text with the highest matching degree as the text whose matching degree meets the preset requirement.
Optionally, determining the text with the matching degree meeting the preset requirement includes:
judging whether the matching degree of the text with the highest matching degree exceeds a set threshold;
if it exceeds the set threshold, taking the text with the highest matching degree as the text whose matching degree meets the preset requirement;
if it does not exceed the set threshold, returning a search failure.
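The threshold rule can be sketched as follows. The description does not give a threshold value, so it is a parameter here; "exceeds" is read as strictly greater than, and `None` stands in for the failure result. The function name and the toy identifiers are hypothetical.

```python
from typing import Optional


def select_result(degrees: dict[int, float], threshold: float) -> Optional[int]:
    """Return the id of the best-matching text, or None to signal search failure."""
    if not degrees:
        return None  # nothing was read from the inverted index
    best_id = max(degrees, key=degrees.__getitem__)
    # Accept the best text only if its matching degree exceeds the threshold.
    return best_id if degrees[best_id] > threshold else None
```

With degrees {1: 0.3333, 2: 0.5} and a threshold of 0.3, text 2 is returned; with {1: 0.2, 2: 0.1} the same threshold yields a failure.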
Optionally, the method further comprises:
and acquiring the document name list information corresponding to the texts in the document library read from the inverted index, acquiring, for each row of the two-dimensional array corresponding to those texts, the position of the last text matched with the text input by the user, and storing this information and position relation in the two-dimensional array.
An intelligent retrieval system, comprising:
the acquisition module is used for acquiring the text input by the user;
the reading module is used for reading, according to the text input by the user, the text identifiers of the corresponding texts in a document library from a pre-established inverted index list, and storing the text identifiers read from the inverted index list in a two-dimensional array;
the computing module is used for calculating, one by one, the matching degree between the text input by the user and each text corresponding to a text identifier stored in the two-dimensional array;
and the output module is used for determining the text with the matching degree meeting the preset requirement, taking the text as a retrieval result corresponding to the text input by the user and outputting the retrieval result.
An electronic device comprising at least one processor and a memory coupled to the processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to enable the electronic device to implement the foregoing intelligent retrieval method.
A computer storage medium carrying one or more computer programs which, when executed by an electronic device, enable the electronic device to implement the foregoing intelligent retrieval method.
According to the above technical solution, the intelligent retrieval method reads, according to the text input by the user, the text identifiers of the corresponding texts in the document library from the pre-established inverted index list and stores them in a two-dimensional array. During retrieval, the matching degree is therefore calculated only for the texts whose identifiers are stored in the two-dimensional array rather than for all texts in the document library, which greatly improves retrieval efficiency.
Drawings
The above and other features, advantages and aspects of embodiments of the present application will become more apparent by reference to the following detailed description when taken in conjunction with the accompanying drawings. The same or similar reference numbers will be used throughout the drawings to refer to the same or like elements. It should be understood that the figures are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a flow chart of an intelligent retrieval method provided by the application;
FIG. 2 is an interface schematic diagram of an intelligent retrieval system according to the present application;
FIG. 3 is a schematic diagram of an interface for scene searching in an intelligent retrieval system according to the present application;
FIG. 4 is a schematic diagram of an intelligent search device according to the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to the present application.
Detailed Description
Embodiments of the present application will be described in more detail below with reference to the accompanying drawings. Although some embodiments of the application are shown in the drawings, it should be understood that the application may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the application. It should be understood that the drawings and embodiments of the application are for illustration purposes only and are not intended to limit the scope of the present application.
The term "including" and variations thereof as used herein are intended to be open-ended, i.e., "including, but not limited to". The term "based on" means "based at least in part on"; the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Related definitions of other terms will be given in the description below.
It should be noted that the terms "first," "second," and the like herein are merely used for distinguishing between different devices, modules, or units and not for limiting the order or interdependence of the functions performed by such devices, modules, or units.
It should be noted that the modifiers "a/one" and "a plurality of" in this disclosure are illustrative rather than limiting; those skilled in the art will understand that they should be read as "one or more" unless the context clearly indicates otherwise.
In the prior art, when searching laws and regulations, cases and related scenarios, the traditional method acquires the text input by the user and compares it word by word with all texts stored in the search library until a matching text is found, and only then outputs the search result. A great deal of time is therefore spent searching while police officers handle cases, which reduces their case-handling efficiency.
To solve the above problems, the present application provides an intelligent retrieval method, an apparatus, an electronic device and a storage medium, which are described in detail below with reference to specific embodiments.
Fig. 1 illustrates a flowchart of an intelligent retrieval method according to an embodiment of the present application. The method specifically comprises the following steps:
S10, acquiring a text input by a user.
FIG. 2 shows a schematic diagram of the interface of the intelligent retrieval system according to an embodiment of the present application. The user may select a search category by clicking the menu bar; the categories are laws and regulations search 202, case search 203 and scenario search 204, each corresponding to a different search scenario. Laws and regulations search 202 retrieves the laws and regulations applicable to various complex cases; case search 203 retrieves cases similar to the one an officer is handling, for the officer's reference; and scenario search 204 retrieves scenarios similar to the one the officer is handling, also for reference.
Illustratively, the user may search for related laws and regulations by clicking laws and regulations search 202 in the menu bar and entering keywords of those laws and regulations in the input field 201.
The laws and regulations library of the intelligent retrieval system provided by the embodiment of the application covers national laws and regulations, local laws and regulations, international treaties, international conventions, judicial interpretations and the like, spanning many fields of social life. The system can retrieve the relevant laws and regulations from this large library based on the keywords input by the user, and by clicking a search result the user can also view two to three cases related to that law or regulation, which improves the user's search efficiency.
Likewise, the user can search for related cases by clicking case search 203 in the menu bar and entering keywords of the related cases in the input field 201. Specifically, the case database of the intelligent retrieval system provided by the embodiment of the application covers the case types that commonly occur across the police's business departments, and by entering keywords the user can retrieve cases related to the input from this large case database.
It should be noted that the intelligent retrieval system provided by the embodiment of the present application also supports scenario search: by clicking scenario search 204 in the menu bar, the user may enter a brief description of the case scenario, of no more than 500 words, in the input field 201 to search for scenarios of related cases.
Specifically, a scenario search returns results in three parts. The first part contains the laws and regulations applicable to the case scenario searched by the user, from which the user can select those most relevant to the scenario encountered. The second part contains the contents of two to three cases similar to the scenario encountered by the user, from which the user can find relevant material to support selection and judgment. The third part contains law enforcement standardization prompts applicable to the scenario, according to which the user can take corresponding countermeasures.
Illustratively, FIG. 3 is a schematic diagram of the scenario search interface of the intelligent retrieval system according to the present application. After the user clicks scenario search 204 in the menu bar and enters a brief description of the case scenario (no more than 500 words) in the input field 201, the system displays the scenario search interface shown in FIG. 3.
As shown in FIG. 3, the scenario search interface of the intelligent retrieval system includes a search box 301, similar cases 302, an information prompt box 303, and applicable laws and regulations 304. The search box 301 lets the user continue searching by entering a more detailed description of the case scenario, so that the system can search for relevant case scenarios according to the more detailed information. Similar cases 302 presents brief descriptions of the case scenarios relevant to the user's search; the user can click a case to open its search result interface and view the detailed description. The information prompt box 303 presents officers with approaches for handling related cases; for example, the officer handling a case can click to view first-aid measures for various injuries and related practices. Applicable laws and regulations 304 displays the laws and regulations related to the searched case scenario, so that the user can handle the case reasonably and lawfully; the user can also click to open the search result interface of a law or regulation.
It should further be noted that the intelligent retrieval system provided by the embodiment of the application also allows users to enter cases or law enforcement standardization prompts themselves. Specifically, the user may open the user's repository interface by clicking my repository 205 in the intelligent retrieval system interface, where the user can view the case information or law enforcement standardization prompts the user has entered. Information a user enters into the repository can be set to be visible only to that user; users with advanced permissions can also enter related cases or law enforcement standardization prompts and can choose the visible range, i.e. whether the information is visible only to themselves or to all users.
In addition, the intelligent retrieval system provided by the embodiment of the application supports synchronization of the user's personal data: the user can log in to the same account on different terminal devices, and personal information is synchronized across those devices. The system also supports bookmarking frequently used laws and regulations, cases and the like, as well as a note function through which the user can view case information the user has entered.
S20, reading text identifiers in a document library corresponding to the text input by the user in the pre-established inverted index list according to the text input by the user, and storing the text identifiers in the document library read from the inverted index in a two-dimensional array.
When the intelligent retrieval system provided by the embodiment of the application processes the text input by the user, it reads, according to that text, the text identifiers of the corresponding texts in the document library from the pre-established inverted index list, and stores the read text identifiers in a two-dimensional array.
It should be noted that the core of the intelligent retrieval system provided by the embodiment of the present application is to process all texts in the document library with an inverted index. An inverted index is a data structure for full-text search: it splits each text in the document library into key characters or keywords, builds for each key character or keyword a list of all documents that contain it, and uses these lists in place of the full texts of the document library during search.
Specifically, the data structure of an inverted index typically consists of two parts: a dictionary and inverted lists. The dictionary stores all key characters and keywords obtained by splitting the texts in the document library; each dictionary entry points to a document list in the inverted lists, and the entries are usually stored in some order, for example by initial letter or by hash value. The inverted list is the core data structure of the inverted index: it records in which documents of the document library each key character or keyword appears, together with related statistics such as its frequency of occurrence and its position information within each document.
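The dictionary-plus-inverted-list layout can be sketched in Python as nested dictionaries, assuming the documents are already split into key terms. Each inverted list records, per document, the positions at which the term occurs, so the term frequency in a document is simply the number of positions. Sorting the dictionary by term is just one of the possible orderings mentioned above; the function name is illustrative.

```python
from collections import defaultdict


def build_postings(docs: dict[int, list[str]]) -> dict[str, dict[int, list[int]]]:
    """Dictionary of terms -> inverted list {doc_id: [positions in that doc]}.

    Term frequency in a document is len(positions).
    """
    inverted = defaultdict(dict)
    for doc_id, terms in docs.items():
        for position, term in enumerate(terms):
            inverted[term].setdefault(doc_id, []).append(position)
    # The "dictionary" part: terms stored in sorted order (one possible ordering).
    return {term: inverted[term] for term in sorted(inverted)}
```

For docs = {1: ["a", "b", "a"], 2: ["b"]}, the inverted list of "a" is {1: [0, 2]} (frequency 2 in document 1) and that of "b" is {1: [1], 2: [0]}.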
It should further be noted that, according to the text input by the user, the intelligent retrieval system provided by the embodiment of the application reads the text identifiers of the corresponding texts in the document library from the inverted lists and stores them in a two-dimensional array. Specifically, the system establishes an X×Y two-dimensional array, where X is the number of rows and Y is the number of columns. The rows store the document lists read from the inverted lists for the text, so the number of document lists read equals the number of rows X; the columns store the position information, recorded in the inverted index, of the texts read from the inverted lists.
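A sketch of filling the X×Y array from the inverted lists, assuming postings shaped as term -> {doc_id: [positions]}. Each row holds a candidate document's identifier followed by the positions of the query terms in it, so X is the number of candidate documents. Python lists of lists are ragged; a fixed Y would require padding, which the description does not detail. The function name and layout are illustrative assumptions.

```python
def build_position_array(query_terms, postings):
    """X x Y array: one row per candidate document, [doc_id, positions...].

    postings: term -> {doc_id: [positions]} (hypothetical inverted-list layout)
    """
    rows: dict[int, list[int]] = {}
    for term in query_terms:
        for doc_id, positions in postings.get(term, {}).items():
            rows.setdefault(doc_id, []).extend(positions)
    # X = number of candidate documents; rows are ragged unless padded to Y.
    return [[doc_id, *sorted(pos)] for doc_id, pos in sorted(rows.items())]
```

Terms that do not appear in the postings add no rows, so documents never hit by the query never enter the array.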
S30, calculating the matching degree of the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user one by one.
In the embodiment of the application, the matching degree of the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user is required to be calculated one by one.
It should be noted that, when the intelligent retrieval system provided by the embodiment of the application calculates the matching degree between the text input by the user and the texts in the document library, only the texts stored in the two-dimensional array need to be scored, rather than all texts in the document library, which greatly improves retrieval efficiency. Specifically, the matching degree between the text input by the user and a text in the document library is calculated as follows:
$$\text{matching degree} = \frac{\mathrm{length}(lcs)^2}{\mathrm{length}(a) \times \mathrm{length}(b)}$$
where length denotes the length of a text: length(a) is the length of text a, length(b) is the length of text b, and length(lcs) is the maximum common substring length of text a and text b. In the intelligent retrieval system provided by the embodiment of the application, length(a) corresponds to the length of the text input by the user, length(b) to the length of the text in the document library, and length(lcs) to the maximum common substring length of the two.
Note that lcs here refers to the maximum (longest) common substring, which differs from the longest common subsequence: a common subsequence does not require its characters to be contiguous, whereas a common substring does.
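The difference between the two notions can be checked directly on the strings "ascde" and "axcxdde". The sketch below uses the textbook dynamic-programming recurrences for both; the function names are illustrative. The only change between them is what happens on a mismatch: the substring run resets to zero, while the subsequence carries over the best value found so far.

```python
def longest_common_substring(a: str, b: str) -> int:
    """Contiguous: a mismatch resets the current run to zero."""
    best, prev = 0, [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def longest_common_subsequence(a: str, b: str) -> int:
    """Order-preserving but not contiguous: a mismatch keeps the best so far."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]
```

For "ascde" and "axcxdde" the first function returns 2 ("de") and the second returns 4 ("acde").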
Illustratively, the longest common substring of the strings "ascde" and "axcxdde" is "de", because "de" is contiguous in both strings, whereas their longest common subsequence is "acde", because a subsequence need not be contiguous; its characters only have to appear in the same order in both strings. Applying the above formula for the matching degree to "ascde" and "axcxdde" therefore gives a matching degree of approximately 0.11.
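Assuming the matching degree takes the form length(lcs)² / (length(a) × length(b)), the form consistent with both numeric examples in this description, the value 0.11 follows from length(lcs) = 2 ("de"), length(a) = 5 ("ascde") and length(b) = 7 ("axcxdde"):

```latex
\text{matching degree}
  = \frac{\mathrm{length}(lcs)^2}{\mathrm{length}(a)\times\mathrm{length}(b)}
  = \frac{2^2}{5 \times 7}
  = \frac{4}{35}
  \approx 0.11
```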
By calculating the matching degree between the text input by the user and each text in the document library stored in the two-dimensional array, the intelligent retrieval system provided by the embodiment of the application obtains the text in the document library with the highest matching degree.
It should be noted that, when the inverted index of the intelligent retrieval system provided by the embodiment of the application processes the texts in the document library, it splits them into key characters and keywords and forms, for each of these, a list of the documents containing it, i.e. the inverted list. Accordingly, when processing the text input by the user, the system likewise splits that text into key characters and keywords so that it can be matched against the texts in the document library.
By way of example, suppose the text input by the user is "no fighting". By analyzing this text, the entries "fighting" in document 250 and "fighting" in document 888, which match the text input by the user, can be found in the inverted index, and these two entries are stored in the two-dimensional array.
Specifically, the text input by the user can be split character by character into three key characters, or it can be split into a key character and a keyword, namely "no" and "fighting". When calculating the matching degree, the intelligent retrieval system provided by the embodiment of the application uses the latter split, i.e. "no" and "fighting".
First, the intelligent retrieval system processes the key character "no" in "no fighting". Of the documents stored in the inverted index, only the "fighting" entries of document 250 and document 888 were retrieved, and "no" appears in neither of them. Therefore no matching degree is calculated between "no" and the texts stored in the two-dimensional array, and the data in the two-dimensional array is not modified.
Secondly, the intelligent retrieval system can process the keyword 'fighting' in 'no fighting', and from the view of the document names stored in the inverted index, the 'fighting' with the document name of 250 and the 'fighting' with the document name of 888 in the inverted index hit the keyword 'fighting' in the text input by the user, so that the matching degree of the keyword 'fighting' in the text input by the user, the 'fighting' with the document name of 250 and the 'fighting' with the document name of 888 is required to be calculated.
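The lookup in the two paragraphs above can be pictured with a small sketch: each query token is looked up in the inverted index, tokens with no postings (like "no") leave the array untouched, and each hit document is appended once as a row. The row layout `[doc_id, text, length]` is one assumed reading of claim 3 (text in the rows, length in a column); the index and library contents are toy data and all names are hypothetical.

```python
# Toy inverted index (hypothetical data): key -> ids of documents containing it.
INDEX = {"F": [250, 888], "G": [250, 888], "H": [250], "K": [250], "FG": [250, 888]}
LIBRARY = {250: "FGHK", 888: "FG"}


def collect_candidates(tokens, index=INDEX, library=LIBRARY):
    """Return rows [doc_id, text, length] for documents hit by any token.
    Tokens with no postings (such as 'N') leave the array unchanged."""
    rows, seen = [], set()
    for token in tokens:
        for doc_id in index.get(token, []):
            if doc_id not in seen:  # store each hit document only once
                seen.add(doc_id)
                text = library[doc_id]
                rows.append([doc_id, text, len(text)])
    return rows


print(collect_candidates(["N", "FG"]))
# [[250, 'FGHK', 4], [888, 'FG', 2]]
```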
Specifically, when the keyword "fighting" in "no fighting" of the text input by the user is processed, the matching degree of the keyword "fighting" in "no fighting" of the text input by the user and the "fighting" of the document name 250 in the inverted index needs to be calculated, and the maximum common substring length of the "no fighting" of the text input by the user and the "fighting" of the document name 250 in the inverted index needs to be determined first. As can be seen from the definition of the maximum common substring length, the maximum common substring length requires that the text in the substring is continuous, so that the maximum common substring length of "fighting" of the text input by the user and the document name of 250 in the inverted index can be determined to be 2, that is, length (lcs) can be equal to 2, then the respective text lengths of "fighting" of the text input by the user and the document name of 250 in the inverted index can be determined, the text length of "fighting" of the text input by the user can be determined to be 3, the text length of "fighting" of the document name of 250 in the inverted index is determined to be 4, so that length (a) can be equal to 3, length (b) can be equal to 4, and the formula for calculating the matching degree of the text input by the user and the document name of 250 in the document library can be obtained by substituting the above values into the matching degree of "fighting" of the text input by the user and the document name of 250 in the inverted index "fighting" 33.33 ".
Since two documents matched with the text input by the user are stored in the inverted index, the matching degree of the keyword 'fighting' in the 'no fighting' of the text input by the user and the 'fighting' of the document name 888 is also required to be calculated, and the same is true, firstly, the maximum common substring length of the 'no fighting' of the text input by the user and the 'fighting' of the document name 888 in the inverted index is required to be determined to be 2, namely, length (lcs) can be equal to 2, secondly, the text length of the 'no fighting' of the text input by the user is determined to be 3, the text length of the 'fighting' of the document name 888 in the inverted index is determined to be 2, therefore, length (a) can be equal to 3, length (b) can be equal to 2, and the formula for calculating the matching degree of the text input by the user and the text in the document library can be obtained by substituting the above numerical values into the formula for obtaining the matching degree of the 'no fighting' of the text input by the user and the document name of the document name 888 in the inverted index to be 0.67.
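The calculation above can be reproduced with the sketch below. The matching-degree formula is not legible in the available text. The form used here, length(lcs)² / (length(a) × length(b)), is an assumption inferred from the worked values (2²/(3×4) ≈ 0.33 and 2²/(3×2) ≈ 0.67). The longest common substring is computed with standard dynamic programming, and the letter strings stand in for the Chinese texts.

```python
def longest_common_substring(a, b):
    """Length of the longest contiguous run shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run ending at a[i-1], b[j-1]
                best = max(best, curr[j])
        prev = curr
    return best


def matching_degree(query, text):
    """Assumed formula: length(lcs)^2 / (length(a) * length(b))."""
    if not query or not text:
        return 0.0
    lcs = longest_common_substring(query, text)
    return lcs * lcs / (len(query) * len(text))


print(round(matching_degree("NFG", "FGHK"), 2))  # 0.33, like document 250
print(round(matching_degree("NFG", "FG"), 2))    # 0.67, like document 888
```

Under this assumed form, identical texts score 1.0, and a longer document is penalized for content the query does not cover.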
S40, determining a text with the matching degree meeting the preset requirement as a retrieval result corresponding to the text input by the user, and outputting the retrieval result.
In the embodiments of the present application, after the matching degree between each document-library text stored in the two-dimensional array and the user's text has been calculated in step S30, the matching degrees may be sorted from high to low and the text with the highest matching degree output.
For example, in step S30 the matching degree between "no fighting" and document 250 was calculated as 0.33, and that between "no fighting" and document 888 as 0.67. Comparing the two shows that document 888 has the highest matching degree, so the text of document 888 is output.
In other embodiments, a threshold may be set before the text with the highest matching degree is output, and that text is output only if its matching degree exceeds the preset threshold. Otherwise, either no text in the document library matches the user's text or the best matching degree falls short of the threshold, and the user is prompted to search again with different key characters or keywords.
For example, the threshold may be set to 0.8, so that only text with a matching degree greater than 0.8 is output; otherwise the search is declared invalid and the user is prompted to search again with different key characters or keywords until a text meeting the threshold is found. Specifically, the matching degrees calculated in step S30 are 0.33 for document 250 and 0.67 for document 888. Neither of the two documents hit in the inverted index reaches the preset threshold, so this search fails and the user is prompted to search again with different key characters or keywords.
In the embodiments of the present application, the texts in the document library that correspond to the user's text are read from the inverted index and stored in the two-dimensional array. During retrieval, the matching degree is therefore calculated only for the texts stored in the two-dimensional array rather than for every text in the document library, which greatly improves retrieval efficiency.
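The selection rule described above, with and without a threshold, reduces to a few lines. In this minimal sketch, `pick_result` is a hypothetical name, and the scores are the example values from step S30. Following the text, a result is accepted only if its score strictly exceeds the threshold.

```python
def pick_result(scores, threshold=None):
    """scores: dict doc_id -> matching degree.
    Return (doc_id, score) for the best match, or None when the search fails."""
    if not scores:
        return None
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    best_id, best_score = ranked[0]
    if threshold is not None and best_score <= threshold:
        return None  # caller should prompt the user to change key characters/keywords
    return best_id, best_score


scores = {250: 0.33, 888: 0.67}
print(pick_result(scores))       # (888, 0.67)
print(pick_result(scores, 0.8))  # None
```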
It should be noted that the intelligent retrieval method provided by the embodiments of the present application must traverse the entire two-dimensional array during retrieval, and the ways of traversing the array are complex and varied. To improve the accuracy and speed of the intelligent retrieval system, some embodiments further optimize the retrieval algorithm as follows:
In the embodiments of the present application, two extra columns may be added to the two-dimensional array to store additional information about the texts in the document library. This speeds up traversal of the array when the matching degrees between the stored texts and the user's text are evaluated.
The first extra column stores a list of the document names of the document-library texts read from the inverted index. When traversing the two-dimensional array, the retrieval system then only needs to traverse this column instead of every row and column of the array, which greatly reduces retrieval time. The second extra column stores, for each row of the two-dimensional array, the position of the last text that matches the user's text. When traversing the array, the retrieval system therefore does not need to scan all the text information stored in every row. It only needs to read each row up to that last matching position and can skip the text stored after it, which again greatly reduces retrieval time.
It should be noted that the database used in the intelligent retrieval system provided by the embodiments of the present application is a distributed relational database. A traditional single-server database can only be expanded vertically in processing capacity, and once the data volume grows beyond a certain point it can no longer meet requirements. A distributed relational database, by contrast, is easy to expand: new data categories can be added after it has been created without modifying all existing application software. Application programs can operate on the distributed relational database transparently, while its data is stored in different local databases, managed by different database management systems, run on different machines, supported by different operating systems and connected by different communication networks.
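One possible reading of the two extra columns is sketched below. Each row holds the characters of one candidate document. One auxiliary column holds the document names read from the inverted index, and the other holds the index of the last character in that row that also occurs in the query, so that a scan can stop there. The layout, the class name `CandidateArray` and its methods are assumptions for illustration, not the patent's data structure.

```python
from dataclasses import dataclass, field


@dataclass
class CandidateArray:
    """Hypothetical layout: one row per candidate document plus two auxiliary columns."""
    rows: list = field(default_factory=list)        # row i: characters of candidate i
    doc_names: list = field(default_factory=list)   # extra column 1: document names
    last_match: list = field(default_factory=list)  # extra column 2: last matched index, -1 if none

    def add(self, name, text, query):
        chars = list(text)
        hits = [k for k, ch in enumerate(chars) if ch in query]
        self.rows.append(chars)
        self.doc_names.append(name)
        self.last_match.append(hits[-1] if hits else -1)

    def scan(self):
        """Visit each row only up to its last matched position."""
        for name, chars, stop in zip(self.doc_names, self.rows, self.last_match):
            yield name, chars[: stop + 1]


arr = CandidateArray()
arr.add(250, "FGHK", "NFG")
arr.add(888, "FG", "NFG")
for name, prefix in arr.scan():
    print(name, "".join(prefix))
# 250 FG
# 888 FG
```

For document 250 the trailing "HK" is never visited, which is the saving the text describes.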
As shown in Fig. 4, an intelligent retrieval device provided by an embodiment of the present application includes:
an obtaining module 401, configured to obtain text input by a user;
a reading module 402, configured to read, according to the text input by the user, the text identifiers of the document-library texts corresponding to that text from a pre-established inverted index list, and to store these text identifiers in a two-dimensional array;
a calculating module 403, configured to calculate, one by one, the matching degree between the user's text and each text corresponding to a text identifier stored in the two-dimensional array;
and an output module 404, configured to determine the text whose matching degree meets the preset requirement as the retrieval result corresponding to the user's text and to output the retrieval result.
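A compact sketch of how modules 401 to 404 could fit together follows. The class, method names and toy data are hypothetical. For brevity the index holds single characters only. The longest common substring comes from the standard library's `difflib.SequenceMatcher.find_longest_match`, a convenience swapped in here with `autojunk=False` so its junk heuristic does not affect long texts. The matching-degree formula is the same assumption inferred from the worked example, length(lcs)² / (length(a) × length(b)).

```python
from collections import defaultdict
from difflib import SequenceMatcher


class IntelligentRetrievalDevice:
    """Hypothetical sketch of modules 401-404; all names here are illustrative."""

    def __init__(self, library, threshold=None):
        self.library = library          # doc_id -> text
        self.threshold = threshold      # None means "always output the best match"
        self.index = defaultdict(list)  # key character -> doc ids (keywords omitted)
        for doc_id, text in library.items():
            for ch in set(text):
                self.index[ch].append(doc_id)

    def obtain(self, raw):
        """Obtaining module 401: acquire the user's text."""
        return raw.strip()

    def read(self, query):
        """Reading module 402: rows of [doc_id, text length] for documents hit by the query."""
        ids = []
        for ch in query:
            for doc_id in self.index.get(ch, []):
                if doc_id not in ids:
                    ids.append(doc_id)
        return [[doc_id, len(self.library[doc_id])] for doc_id in ids]

    def calculate(self, query, array):
        """Calculating module 403: assumed formula lcs^2 / (len(a) * len(b))."""
        scores = {}
        for doc_id, length in array:
            text = self.library[doc_id]
            matcher = SequenceMatcher(None, query, text, autojunk=False)
            lcs = matcher.find_longest_match(0, len(query), 0, len(text)).size
            scores[doc_id] = lcs * lcs / (len(query) * length)
        return scores

    def output(self, scores):
        """Output module 404: best match, or None if the search fails."""
        if not scores:
            return None
        best = max(scores, key=scores.get)
        if self.threshold is not None and scores[best] <= self.threshold:
            return None
        return best

    def search(self, raw):
        query = self.obtain(raw)
        if not query:
            return None
        return self.output(self.calculate(query, self.read(query)))


device = IntelligentRetrievalDevice({250: "FGHK", 888: "FG"})
print(device.search("NFG"))  # 888
```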
Fig. 5 is a schematic diagram of an electronic device according to an embodiment of the present application.
Referring to Fig. 5, a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present application is shown. The electronic device in the embodiments of the present application may include, but is not limited to, mobile terminals such as a mobile phone, a notebook computer, a PDA (personal digital assistant) and a PAD (tablet computer), and fixed terminals such as a desktop computer. The electronic device shown in Fig. 5 is only an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present application.
As shown in Fig. 5, the electronic device may include a processing device 501 (e.g., a central processing unit or a graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. While the electronic device is powered on, the RAM 503 also stores the various programs and data required for its operation. The processing device 501, the ROM 502 and the RAM 503 are connected to one another via a bus 504, to which an input/output (I/O) interface 505 is also connected.
In general, the following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 507 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 508 including, for example, memory cards, hard disks, etc.; and communication means 509. The communication means 509 may allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. While fig. 5 shows an electronic device having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are example forms of implementing the claims.
While several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the application. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
The above description is only illustrative of the preferred embodiments of the present application and of the technical principles employed. Persons skilled in the art will appreciate that the scope of the disclosure referred to in the present application is not limited to technical solutions formed by the specific combination of the above technical features. It also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.

Claims (10)

1. An intelligent retrieval method, comprising:
acquiring a text input by a user;
reading text identifiers in a document library corresponding to the text input by the user in a pre-established inverted index list according to the text input by the user, and storing the text identifiers in the document library corresponding to the text input by the user read from the inverted index list in a two-dimensional array;
calculating the matching degree of texts corresponding to text identifiers in a document library stored in the two-dimensional array and the texts input by the user one by one;
and determining the text with the matching degree meeting the preset requirement as a retrieval result corresponding to the text input by the user, and outputting the retrieval result.
2. The intelligent retrieval method according to claim 1, wherein the process of pre-creating the inverted index list comprises:
extracting key information of each text in the document library, wherein the key information comprises key characters or keywords;
and generating an inverted index list corresponding to each text in the document library, wherein the inverted index list comprises the key information and text identifiers corresponding to the key information.
3. The intelligent retrieval method according to claim 1, wherein storing text identifiers in a document library corresponding to the text input by the user, read from the inverted index list, in a two-dimensional array comprises:
storing the texts in the document library read from the inverted index in the row information of the two-dimensional array, wherein each row of the two-dimensional array represents one text in the document library;
and storing the length information of the texts in the document library read from the inverted index in the column information of the two-dimensional array, wherein the column information of the two-dimensional array represents the length of each text in the document library.
4. The intelligent retrieval method according to claim 1, wherein the calculating, one by one, the matching degree of the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user includes:
calculating, for each text corresponding to a text identifier in the document library stored in the two-dimensional array, the maximum common substring length between the text input by the user and that text, wherein the maximum common substring length is the length of the longest contiguous segment shared by the text input by the user and that text;
and calculating the matching degree between the text input by the user and each such text by using the maximum common substring length, the length of the text input by the user and the length of that text.
5. The intelligent retrieval method according to claim 4, wherein calculating the matching degree between the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user by using the maximum common substring length, the length of the text input by the user and the length of the text corresponding to the text identifier comprises:
calculating the matching degree by the following formula:
$$\text{matching degree} = \frac{\mathrm{length}(lcs)^{2}}{\mathrm{length}(a) \times \mathrm{length}(b)}$$
wherein length(lcs) is the maximum common substring length, length(a) is the length of the text input by the user, and length(b) is the length of the text corresponding to the text identifier in the document library stored in the two-dimensional array.
6. The intelligent retrieval method according to claim 1, wherein the determining the text that the matching degree satisfies a preset requirement includes:
determining the matching degrees between the text input by the user and the texts corresponding to the text identifiers in the document library stored in the two-dimensional array, and taking the text with the highest matching degree as the text whose matching degree meets the preset requirement;
or, alternatively:
judging whether the matching degree of the text with the highest matching degree exceeds a set threshold;
if that matching degree exceeds the set threshold, taking the text with the highest matching degree as the text whose matching degree meets the preset requirement;
if that matching degree does not exceed the set threshold, returning a search failure.
7. The intelligent retrieval method according to claim 1, further comprising:
acquiring document name list information corresponding to the texts in the document library read from the inverted index; acquiring, for each row of the two-dimensional array, the position of the last text matching the text input by the user; and storing the document name list information and the position information in the two-dimensional array.
8. An intelligent retrieval device, comprising:
the acquisition module is used for acquiring the text input by the user;
the reading module is used for reading text identifiers in a document library corresponding to the text input by the user in a pre-established inverted index list according to the text input by the user, and storing the text identifiers in the document library corresponding to the text input by the user read from the inverted index list in a two-dimensional array;
the calculating module is used for calculating the matching degree of the text corresponding to the text identifier in the document library stored in the two-dimensional array and the text input by the user one by one;
and the output module is used for determining the text with the matching degree meeting the preset requirement, taking the text as a retrieval result corresponding to the text input by the user and outputting the retrieval result.
9. An electronic device comprising at least one processor and a memory coupled to the processor, wherein:
the memory is used for storing a computer program;
the processor is configured to execute the computer program to enable the electronic device to implement the intelligent retrieval method according to any one of claims 1 to 7.
10. A computer storage medium carrying one or more computer programs which, when executed by an electronic device, enable the electronic device to implement the intelligent retrieval method of any one of claims 1 to 7.
CN202311093756.0A 2023-08-25 2023-08-25 Intelligent retrieval method and device, electronic equipment and storage medium Pending CN117149951A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311093756.0A CN117149951A (en) 2023-08-25 2023-08-25 Intelligent retrieval method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311093756.0A CN117149951A (en) 2023-08-25 2023-08-25 Intelligent retrieval method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117149951A true CN117149951A (en) 2023-12-01

Family

ID=88903777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311093756.0A Pending CN117149951A (en) 2023-08-25 2023-08-25 Intelligent retrieval method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117149951A (en)

Similar Documents

Publication Publication Date Title
CN110647614B (en) Intelligent question-answering method, device, medium and electronic equipment
CN109670163B (en) Information identification method, information recommendation method, template construction method and computing device
CN109885773B (en) Personalized article recommendation method, system, medium and equipment
CN107145571B (en) Searching method and device
CN112541338A (en) Similar text matching method and device, electronic equipment and computer storage medium
CN105988996B (en) Index file generation method and device
CN111626048A (en) Text error correction method, device, equipment and storage medium
CN111797214A (en) FAQ database-based problem screening method and device, computer equipment and medium
US8606779B2 (en) Search method, similarity calculation method, similarity calculation, same document matching system, and program thereof
CN113407785B (en) Data processing method and system based on distributed storage system
CN108268438B (en) Page content extraction method and device and client
CN114973351B (en) Face recognition method, device, equipment and storage medium
WO2019173085A1 (en) Intelligent knowledge-learning and question-answering
CN111767445A (en) Data searching method and device, computer equipment and storage medium
CN111666383A (en) Information processing method, information processing device, electronic equipment and computer readable storage medium
CN113596601A (en) Video picture positioning method, related device, equipment and storage medium
CN111586695B (en) Short message identification method and related equipment
CN113010484A (en) Log file management method and device
CN111078849B (en) Method and device for outputting information
CN114356968A (en) Query statement generation method and device, computer equipment and storage medium
CN110263121A (en) Table data processing method, device, electronic device and computer readable storage medium
CN112487159B (en) Search method, search device, and computer-readable storage medium
CN108875050B (en) Text-oriented digital evidence-obtaining analysis method and device and computer readable medium
CN117149951A (en) Intelligent retrieval method and device, electronic equipment and storage medium
CN113987496A (en) Malicious attack detection method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination