CN115774717A - Data searching method and device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115774717A
Authority
CN
China
Prior art keywords
frequency
searched
data
text
word segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211522829.9A
Other languages
Chinese (zh)
Inventor
黄敏
周伟杰
熊善良
蔡文笔
韦有朋
洪丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haizhuofei Network Technology Co ltd
Original Assignee
Beijing Haizhuofei Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haizhuofei Network Technology Co ltd filed Critical Beijing Haizhuofei Network Technology Co ltd
Priority to CN202211522829.9A priority Critical patent/CN115774717A/en
Publication of CN115774717A publication Critical patent/CN115774717A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to data search technology and discloses a data search method comprising: acquiring high-frequency search records of each data table in a preset database, and marking the position information of each high-frequency search record in its data table; performing a word segmentation operation on the high-frequency search records to obtain high-frequency participles; taking the high-frequency participles as indexes, creating a participle index table formed by the high-frequency participles and the position information of the corresponding high-frequency search records; receiving a text to be searched input by a user, and identifying keywords of the text to be searched; and querying, in the participle index table, the position information corresponding to the high-frequency participles that match the keywords, and acquiring the data at the queried position information as the search result. The invention also provides a data search device, electronic equipment, and a computer-readable storage medium. The invention can improve data search efficiency.

Description

Data searching method and device, electronic equipment and computer readable storage medium
Technical Field
The present invention relates to the field of data searching technologies, and in particular, to a data searching method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Data search capability is important for certain professional websites and systems. An enterprise information query system, for example, lets users query various types of information about a target enterprise through the data search function it provides.
Traversing the database according to the query conditions is a common data search method. However, when a system holds a large number of data tables, comparing every record in every table against the query conditions one by one and then returning the matching records causes a large number of disk I/O operations, which lowers query efficiency and degrades the user experience. How to improve data search capability is therefore a critical issue.
Disclosure of Invention
The invention provides a data search method, a data search device, electronic equipment and a computer-readable storage medium, and mainly aims to improve data search efficiency.
In order to achieve the above object, the present invention provides a data searching method, including:
acquiring high-frequency search records of each data table in a preset database, and marking position information of each high-frequency search record in a corresponding data table;
performing word segmentation operation on the high-frequency search record to obtain high-frequency word segmentation;
taking the high-frequency participles as indexes, and creating a participle index table formed by the high-frequency participles and the corresponding position information of the high-frequency search record;
receiving a text to be searched input by a user, and identifying keywords of the text to be searched;
and inquiring the position information corresponding to the high-frequency word segmentation matched with the keyword in the word segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
Optionally, the obtaining the high-frequency search record in each data table in the preset database includes:
obtaining an operation log of the preset database within a preset time period;
according to the operation log, sequentially counting the operation times of each data record in each data table;
and selecting the data record with the operation times larger than or equal to a preset operation threshold value as a high-frequency search record of the corresponding data table.
Optionally, the performing a word segmentation operation on the high-frequency search record to obtain a high-frequency word segmentation includes:
performing word segmentation on the high-frequency search record by using at least two word segmentation algorithms to obtain word segmentation results corresponding to each word segmentation algorithm;
taking the participles of the intersection part in different participle results as determined participles, and taking the participles of the non-intersection part in different participle results as undetermined participles;
taking the to-be-determined participles which contain the same characters and are adjacent to each other in the high-frequency search record as a comparison group;
sequentially calculating the information loss of each participle in each comparison group relative to the high-frequency search record;
and selecting the participle with the largest information loss as the determined participle of the corresponding comparison group, and collecting all the determined participles as the high-frequency participle.
Optionally, the sequentially calculating information loss of each participle in each comparison group with respect to the high-frequency search record includes:
sequentially taking each participle in each comparison group as a target participle, and removing the target participle from the high-frequency search record to obtain a comparison field;
carrying out vector conversion on the high-frequency search record to obtain a high-frequency search record vector matrix, and carrying out vector conversion on the comparison field to obtain a comparison field vector matrix;
and calculating the distance between the high-frequency search record vector matrix and the comparison field vector matrix, and taking the distance as the information loss of the corresponding target participle relative to the high-frequency search record.
Optionally, the identifying keywords of the text to be searched includes:
performing regular-expression matching on the text to be searched according to a preset service rule;
when the text to be searched matches, taking the output result of the regular-expression matching as the keyword of the text to be searched;
when the text to be searched does not match, performing word segmentation on the text to be searched to obtain one or more words to be searched;
generating a word vector of each word to be searched and a text vector matrix of the text to be searched;
sequentially calculating the key value of each word to be searched according to the word vector of each word to be searched and the text vector matrix of the text to be searched;
and selecting the word to be searched of which the key value meets the preset key value condition as the key word of the text to be searched.
Optionally, the sequentially calculating the key value of each word to be searched according to the word vector of each word to be searched and the text vector matrix of the text to be searched includes:
calculating the key value of each word to be searched by using the following key value algorithm:
[Formula image BDA0003971943140000031 is not reproduced in this text.]
where K is the key value, W is the text vector matrix of the text to be searched, T is the matrix transposition symbol, | | is the modulus symbol, and the symbol shown in image BDA0003971943140000032 (not reproduced in this text) denotes the word vector of the word to be searched.
In order to solve the above problem, the present invention further provides a data search device, where the device includes:
the high-frequency search record acquisition module is used for acquiring the high-frequency search record of each data table in a preset database and marking the position information of each high-frequency search record in the corresponding data table;
the word segmentation index table creation module is used for executing word segmentation operation on the high-frequency search records to obtain high-frequency word segmentation, and creating a word segmentation index table formed by the high-frequency word segmentation and the position information of the corresponding high-frequency search records by taking the high-frequency word segmentation as an index;
the data search module is used for receiving a text to be searched input by a user, identifying keywords of the text to be searched, inquiring the position information corresponding to the high-frequency segmentation matched with the keywords in the segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
Optionally, the word segmentation index table creation module performs a word segmentation operation on the high-frequency search record by:
performing word segmentation on the high-frequency search record by using at least two word segmentation algorithms to obtain word segmentation results corresponding to each word segmentation algorithm;
taking the participles of the intersection part in different participle results as determined participles, and taking the participles of the non-intersection part in different participle results as undetermined participles;
taking the to-be-determined participles which contain the same characters and are adjacent to each other in the high-frequency search record as a comparison group;
sequentially calculating the information loss of each participle in each comparison group relative to the high-frequency search record;
and selecting the participle with the largest information loss as the determined participle of the corresponding comparison group, and collecting all the determined participles as the high-frequency participle.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one computer program; and
a processor that executes the program stored in the memory to implement the data search method described above.
In order to solve the above problem, the present invention also provides a computer-readable storage medium having at least one computer program stored therein, the at least one computer program being executed by a processor in an electronic device to implement the data search method described above.
According to the embodiment of the invention, high-frequency word segmentation is obtained by segmenting the high-frequency search record, a word segmentation index table taking the high-frequency word segmentation as an index is created, the keywords corresponding to the text to be searched are matched with the high-frequency word segmentation in the word segmentation index table, the data position information pointed by the high-frequency word segmentation matched with the keywords is obtained, and the search result is obtained according to the obtained position information.
Drawings
Fig. 1 is a schematic flow chart of a data searching method according to an embodiment of the present invention;
Fig. 2 is a schematic detailed implementation flowchart of one step in the data searching method according to an embodiment of the present invention;
Fig. 3 is a schematic detailed implementation flowchart of another step in the data searching method according to an embodiment of the present invention;
Fig. 4 is a schematic detailed implementation flowchart of another step in the data searching method according to an embodiment of the present invention;
Fig. 5 is a schematic detailed implementation flowchart of another step in the data searching method according to an embodiment of the present invention;
Fig. 6 is a functional block diagram of a data search apparatus according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device for implementing the data search method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a data searching method. The execution subject of the data search method includes, but is not limited to, at least one of the electronic devices, such as a server or a terminal, that can be configured to execute the method provided by the embodiments of the present application. In other words, the data search method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server may be an independent server, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.
Fig. 1 is a schematic flow chart of a data searching method according to an embodiment of the present invention.
In this embodiment, the data search method includes:
S1, acquiring high-frequency search records of each data table in a preset database, and marking position information of each high-frequency search record in the corresponding data table;
In the embodiment of the present invention, an enterprise information query system is taken as an example for description. The enterprise information query system provides comprehensive, reliable and transparent data information on enterprises of a certain scale, such as enterprise microblogs, enterprise operation range, enterprise organization architecture, enterprise operation condition, enterprise news, enterprise credit information, office addresses, enterprise employee information, enterprise associated information and the like.
In the embodiment of the present invention, the preset database refers to a database that stores the various enterprise information data according to a certain organization form, and preferably, the preset database may be a relational database, such as Oracle, MySQL, DB2, and the like.
In the embodiment of the present invention, the high frequency search record refers to data content with relatively high access frequency in a data table.
In detail, referring to fig. 2, the acquiring a high frequency search record of each data table in the preset database includes:
S11, obtaining an operation log of the preset database within a preset time period;
S12, sequentially counting the operation times of each data record in each data table according to the operation log;
S13, selecting the data record with the operation times larger than or equal to a preset operation threshold value as a high-frequency search record of the corresponding data table.
In the embodiment of the present invention, the preset time period may be a duration of 1 month or 3 months, and the specific time period may be set according to an actual service operation condition.
In the embodiment of the present invention, the operations include, but are not limited to, reading, modifying, inserting, and deleting each data record.
Illustratively, each type of information about enterprise personnel has different attributes, such as a job position attribute, an equity attribute, and a year of joining, and these attributes are stored in the preset database as different data records. If analysis of the operation log shows that the CTO role of an enterprise is queried most often, the data record corresponding to the CTO role can be used as a high-frequency search record.
In this embodiment of the present invention, the preset operation threshold may be adjusted according to actual service conditions. For example, when the preset time period is 1 week, the preset operation threshold may be 200 operations on the data record per day.
In the embodiment of the invention, the position information of each high-frequency search record in the corresponding data table can be marked through the ID of the data table where the high-frequency search record is located and the row and column information of the high-frequency search record in the corresponding data table.
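Steps S11 to S13 and the position marking can be sketched as follows. This is a minimal illustration, not the patented implementation: the log format (one tuple per logged operation), the function names, and the use of the row number alone as position information are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical log format: one (table_id, row_number, operation) tuple
# per operation recorded in the preset time period.
def select_high_frequency_records(operation_log, threshold):
    """Return {(table_id, row): count} for records operated on at least `threshold` times."""
    counts = Counter((table_id, row) for table_id, row, _op in operation_log)
    return {pos: n for pos, n in counts.items() if n >= threshold}

def mark_position(table_id, row):
    """Encode a record's position as '<table>-<row>' (an assumed format)."""
    return f"{table_id}-{row}"

log = [
    ("Infotable", 123, "read"),
    ("Infotable", 123, "read"),
    ("Infotable", 123, "modify"),
    ("Infotable", 7, "read"),
]
hot = select_high_frequency_records(log, threshold=3)
positions = [mark_position(table_id, row) for (table_id, row) in hot]
print(positions)  # ['Infotable-123']
```

A production system would read the counts from the database's own operation log and would likely store column information as well; both are omitted here for brevity.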
In the embodiment of the invention, the high-frequency search record of each data table in the preset database is obtained and marked, so that the scanning range of the data table in the preset database is conveniently limited in the subsequent data search operation, and the improvement of the data search efficiency is facilitated.
S2, performing word segmentation operation on the high-frequency search records to obtain high-frequency word segmentation, and creating a word segmentation index table formed by the high-frequency word segmentation and the position information of the corresponding high-frequency search records by taking the high-frequency word segmentation as an index;
It can be understood that when a user searches for target data, because of information gaps and the flexibility with which information can be expressed, the text the user inputs is in many cases only related to the target data rather than being the complete target data. Performing word segmentation on the high-frequency search records therefore helps cover the various forms the user's search text may take and supports the accuracy of data search.
In detail, referring to fig. 3, the performing a word segmentation operation on the high-frequency search record to obtain a high-frequency word segmentation includes:
S21, performing word segmentation on the high-frequency search records by using at least two word segmentation algorithms to obtain word segmentation results corresponding to each word segmentation algorithm;
S22, taking the participles of the intersection part in different participle results as determined participles, and taking the participles of the non-intersection part in different participle results as undetermined participles;
S23, forming a comparison group from the undetermined participles which contain the same characters and are adjacent in position in the high-frequency search record;
S24, sequentially calculating the information loss of each participle in each comparison group relative to the high-frequency search record;
S25, selecting the participle with the largest information loss as the determined participle of the corresponding comparison group, and collecting all the determined participles as the high-frequency participle.
It can be understood that common word segmentation algorithms include a character string matching word segmentation algorithm, a word meaning word segmentation algorithm, a statistical word segmentation algorithm, and the like, wherein the character string matching word segmentation algorithms further include a forward maximum matching method, a reverse maximum matching method, and a shortest path word segmentation method. The conventional word segmentation algorithm is a mature technical means at present, and in the embodiment of the present invention, the detailed description is not repeated.
In the embodiment of the invention, the number of word segmentation algorithms and the specific algorithm can be set according to actual conditions.
Illustratively, suppose the text "Li Shengli says really" is segmented by a first word segmentation algorithm, giving the first word segmentation result "lie", "win", "say", "true", "reason", and by a second word segmentation algorithm, giving the second word segmentation result "li", "victory", "say", "real", "ideal". The two participles "lie" and "win" appear in the results of both algorithms and are therefore determined participles. "say", "true", "actual", "physical" and "say", "true", "ideal" are undetermined participles. "say" and "say" constitute one comparison group, "true" and "true" constitute another, and "real", "physical", and "ideal" constitute a third. The information loss of each participle in each comparison group relative to the high-frequency search record is then calculated in turn.
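The split into determined and undetermined participles (S21 and S22) can be sketched as below. The two token lists stand in for the outputs of two segmentation algorithms and are invented for illustration; comparing tokens by their character span rather than by their string alone is an assumption, as is the omission of the grouping step S23, whose exact rule the text does not spell out.

```python
def token_spans(tokens):
    """Map a token list to (start, end, token) spans over the concatenated text."""
    spans, pos = [], 0
    for tok in tokens:
        spans.append((pos, pos + len(tok), tok))
        pos += len(tok)
    return spans

def split_determined(result_a, result_b):
    """Tokens with identical spans in both results are determined; the rest are undetermined."""
    a, b = set(token_spans(result_a)), set(token_spans(result_b))
    determined = sorted(a & b)      # intersection part
    undetermined = sorted(a ^ b)    # non-intersection part
    return determined, undetermined

# Hypothetical outputs of two algorithms for the unspaced text "gonowhere".
first = ["go", "now", "here"]
second = ["go", "no", "where"]
determined, undetermined = split_determined(first, second)
print([tok for _, _, tok in determined])    # ['go']
print([tok for _, _, tok in undetermined])  # ['no', 'now', 'where', 'here']
```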
In detail, referring to fig. 4, said sequentially calculating information loss of each participle in each said ratio-pair group with respect to said high frequency search record includes:
S241, sequentially taking each participle in each comparison group as a target participle, and removing the target participle from the high-frequency search record to obtain a comparison field;
S242, carrying out vector conversion on the high-frequency search record to obtain a high-frequency record vector matrix, and carrying out vector conversion on the comparison field to obtain a comparison vector matrix;
S243, calculating the distance between the high-frequency record vector matrix and the comparison vector matrix, and taking the distance as the information loss of the corresponding target participle relative to the high-frequency search record.
In the embodiment of the present invention, models having a word vector conversion function, such as a word2vec model and an NLP (Natural Language Processing) model, may be used to perform vector conversion on the high frequency search record and the comparison field.
In the embodiment of the invention, the distance between the high-frequency record vector matrix and the comparison vector matrix can be calculated by adopting a Chebyshev distance formula.
It can be understood that a larger distance indicates that removing the target participle from the high-frequency search record causes a larger information loss, meaning that the target participle has a greater influence on the high-frequency search record and is closer to the correct segmentation. Therefore, the participle with the largest information loss is selected as the determined participle of the corresponding comparison group.
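Steps S241 to S243 and the selection in S25 can be sketched as follows. The vector conversion here is a toy character-count vector, used only so the example is self-contained; it is an assumption and stands in for a trained model such as word2vec. The Chebyshev distance is the largest absolute difference between corresponding components.

```python
from collections import Counter

def toy_vector(text, alphabet):
    """Toy stand-in for vector conversion: character counts over a fixed alphabet."""
    counts = Counter(text)
    return [counts[ch] for ch in alphabet]

def chebyshev(u, v):
    """Chebyshev distance: maximum absolute component-wise difference."""
    return max(abs(x - y) for x, y in zip(u, v))

def information_loss(record, target):
    """Remove the target participle from the record and measure how far the vector moves."""
    comparison_field = record.replace(target, "", 1)
    alphabet = sorted(set(record))
    return chebyshev(toy_vector(record, alphabet), toy_vector(comparison_field, alphabet))

def pick_determined(record, comparison_group):
    """Select the participle whose removal causes the largest information loss."""
    return max(comparison_group, key=lambda tok: information_loss(record, tok))

record = "gonowhere"          # hypothetical high-frequency search record
group = ["no", "where"]       # hypothetical comparison group
print(information_loss(record, "no"))     # 1
print(information_loss(record, "where"))  # 2
print(pick_determined(record, group))     # where
```

With real embeddings the distance would be taken between the two vector matrices rather than between count vectors, but the selection logic stays the same.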
In the embodiment of the present invention, when a high-frequency search record has a plurality of high-frequency participles, the high-frequency participles may be used as a joint index, and the following is a record in the participle index table:
Company ID (index 1) | Staff duty (index 2) | Location information
123456 | CEO | Infotable-123
The high-frequency participle 123456 and the high-frequency participle CEO form a joint index, and the location information of the corresponding high-frequency search record is Infotable-123. "Infotable" is the name of the data table containing the high-frequency search record; in practical applications, the ID of the data table may be used instead. "123" is the row number of the high-frequency search record in the "Infotable" data table.
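One way to picture the joint index is as a mapping whose key is the tuple of high-frequency participles and whose value is a list of position strings. The sketch below assumes an in-memory dictionary and the "table-row" position format from the example record; the function names are hypothetical, and a real system would more likely use a database table with a composite index.

```python
def add_entry(index_table, participles, position):
    """Register a position under the joint key formed by the participles."""
    index_table.setdefault(tuple(participles), []).append(position)

def parse_position(position):
    """Split 'Infotable-123' into the table name and the row number."""
    table, _, row = position.rpartition("-")
    return table, int(row)

index_table = {}
add_entry(index_table, ["123456", "CEO"], "Infotable-123")

print(index_table[("123456", "CEO")])   # ['Infotable-123']
print(parse_position("Infotable-123"))  # ('Infotable', 123)
```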
And S3, receiving a text to be searched input by a user, identifying keywords of the text to be searched, inquiring the position information corresponding to the high-frequency segmentation matched with the keywords in the segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
In the embodiment of the invention, the enterprise information query system uses an embedded browser, and the user inputs the information to be retrieved, namely the text to be searched, in the browser address bar.
In the embodiment of the invention, the keyword refers to the most direct information for representing the search object in the text to be searched. For example, the text to be searched input by the user is "a business listing time", and the keywords are "a business" and "listing time".
In the embodiment of the invention, the keywords in the text to be searched can be identified by using a semantic recognition technology or a regular-expression matching method.
In detail, referring to fig. 5, the identifying keywords of the text to be searched includes:
S31, performing regular-expression matching on the text to be searched according to a preset service rule;
when the text to be searched matches, executing S32, and taking the output result of the regular-expression matching as the keyword of the text to be searched;
when the text to be searched does not match, executing S33, and performing word segmentation on the text to be searched to obtain one or more words to be searched;
S34, generating a word vector of each word to be searched and a text vector matrix of the text to be searched;
S35, sequentially calculating a key value of each word to be searched according to the word vector of each word to be searched and the text vector matrix of the text to be searched;
S36, selecting the word to be searched of which the key value meets the preset key value condition as the keyword of the text to be searched.
Illustratively, the preset business rules include, but are not limited to, a company domain name rule, a company mailbox rule, a company ID rule, and the like. For example, a regular expression is used to judge whether the text to be searched conforms to the company domain name rule; if it does, the keyword corresponding to the text to be searched is the domain name of the company.
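The rule check in S31 and S32 can be sketched with Python's `re` module. The three patterns below are hypothetical: the text names the rule types but not their expressions, so the mailbox, domain name, and six-digit company ID patterns are assumptions chosen only to make the example runnable.

```python
import re

# Hypothetical patterns for the three rule types named in the text.
BUSINESS_RULES = [
    ("company_mailbox", re.compile(r"^[\w.+-]+@([\w-]+\.)+[A-Za-z]{2,}$")),
    ("company_domain", re.compile(r"^(?:https?://)?((?:[\w-]+\.)+[A-Za-z]{2,})/?$")),
    ("company_id", re.compile(r"^\d{6}$")),
]

def match_business_rule(text):
    """Return (rule_name, keyword) for the first matching rule, or None if no rule matches."""
    text = text.strip()
    for name, pattern in BUSINESS_RULES:
        m = pattern.match(text)
        if m:
            # For a domain rule the keyword is the bare domain name.
            keyword = m.group(1) if name == "company_domain" else m.group(0)
            return name, keyword
    return None

print(match_business_rule("https://www.example.com/"))  # ('company_domain', 'www.example.com')
print(match_business_rule("123456"))                    # ('company_id', '123456')
print(match_business_rule("a business listing time"))   # None
```

A `None` result corresponds to the branch where the text to be searched goes on to word segmentation and key value calculation.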
Further, if the text to be searched does not conform to any of the preset business rules, keywords corresponding to the text to be searched can be further mined by utilizing a method for calculating key values of the participles corresponding to the text to be searched.
It can be understood that when the text to be searched contains many words to be searched, not every word to be searched reflects the characteristics of the text to be searched, and therefore the words to be searched need to be screened.
Specifically, the sequentially calculating the key value of each word to be searched according to the word vector of each word to be searched and the text vector matrix of the text to be searched includes:
calculating the key value of each word to be searched by using the following key value algorithm:
[Formula image BDA0003971943140000091 is not reproduced in this text.]
where K is the key value, W is the text vector matrix of the text to be searched, T is the matrix transposition symbol, | | is the modulus symbol, and the symbol shown in image BDA0003971943140000101 (not reproduced in this text) denotes the word vector of each word to be searched.
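Because the formula itself is an image that did not survive in this text, the sketch below does not reproduce it. It shows one plausible score built only from the symbols the text defines (a text matrix W, a word vector, transposition, and a modulus): the norm of the product of W with the transposed word vector, divided by the product of the two norms. The choice of this form, the Frobenius norm for the matrix, and the function name are all assumptions.

```python
import math

def key_value(text_matrix, word_vector):
    """Assumed score |W * w^T| / (|W| * |w|); NOT the formula from the source image."""
    # W * w^T: one dot product per row of the text matrix.
    projection = [sum(a * b for a, b in zip(row, word_vector)) for row in text_matrix]
    norm_projection = math.sqrt(sum(p * p for p in projection))
    norm_matrix = math.sqrt(sum(a * a for row in text_matrix for a in row))  # Frobenius norm
    norm_word = math.sqrt(sum(b * b for b in word_vector))
    if norm_matrix == 0 or norm_word == 0:
        return 0.0
    return norm_projection / (norm_matrix * norm_word)

# Toy text matrix whose rows are the word vectors of the text's words.
W = [[1.0, 0.0], [1.0, 0.0]]
print(key_value(W, [1.0, 0.0]))  # 1.0
print(key_value(W, [0.0, 1.0]))  # 0.0
```

Under this assumed score, a word whose vector aligns with the rows of the text matrix scores close to 1, and an orthogonal word scores 0.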
In the embodiment of the present invention, the preset key value condition may be to sort the words to be searched in descending order of key value and select the top N as the keywords, where N may be 1 or 2 and may be set according to actual situations.
Illustratively, the text to be searched includes a participle A to be searched, a participle B to be searched, and a participle C to be searched, with key values of 80, 70, and 30 respectively. If N is 2, participle A and participle B are selected, in descending order of key value, as the keywords corresponding to the text to be searched.
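The top-N selection in this example can be sketched as a sort on the key values. The function name and the dictionary input are assumptions for illustration; the values 80, 70, and 30 are those of the example above.

```python
def select_keywords(key_values, n):
    """Return the n words with the highest key values, in descending order of key value."""
    ranked = sorted(key_values.items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]

key_values = {"A": 80, "B": 70, "C": 30}
print(select_keywords(key_values, n=2))  # ['A', 'B']
```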
In the embodiment of the invention, regular-expression matching is first performed on the text to be searched according to the preset business rules, and word segmentation and key value calculation are performed only when no rule matches, so that the calculation workload can be reduced, the time consumption of data retrieval is reduced, and the efficiency of data retrieval is improved.
In the embodiment of the invention, in the word segmentation index table, the keywords are matched with index fields, namely high-frequency words, in the word segmentation index table one by one, the position information corresponding to the matched high-frequency words is obtained, and finally, the corresponding data record is directly obtained according to the matched position information without traversing the data table in the preset database, so that the efficiency of data retrieval is improved.
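The lookup described here can be sketched as a dictionary access followed by a direct fetch at the stored position, with no scan over the data table. The in-memory structures, the single-participle keys, and the field names are assumptions for illustration.

```python
def search(index_table, data_tables, keywords):
    """Fetch the records whose positions are indexed under any matching keyword."""
    results = []
    for keyword in keywords:
        # A keyword with no matching high-frequency participle contributes nothing.
        for table_id, row in index_table.get(keyword, []):
            results.append(data_tables[table_id][row])
    return results

# Hypothetical index table and data table contents.
index_table = {"CEO": [("Infotable", 123)]}
data_tables = {"Infotable": {123: {"company_id": "123456", "staff_duty": "CEO"}}}

print(search(index_table, data_tables, ["CEO", "CTO"]))
# [{'company_id': '123456', 'staff_duty': 'CEO'}]
```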
According to the embodiment of the invention, high-frequency word segmentation is obtained by segmenting the high-frequency search record, a word segmentation index table with the high-frequency word segmentation as an index is created, the keywords corresponding to the text to be searched are matched with the high-frequency word segmentation in the word segmentation index table, the data position information pointed by the high-frequency word segmentation matched with the keywords is obtained, and the search result is obtained according to the obtained position information.
Fig. 6 is a functional block diagram of a data search apparatus according to an embodiment of the present invention.
The data search apparatus 100 of the present invention may be installed in an electronic device. According to the implemented functions, the data search apparatus 100 includes: a high-frequency search record acquisition module 101, a participle index table creation module 102 and an index table-based data search module 103. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the high-frequency search record acquisition module 101 is configured to acquire a high-frequency search record of each data table in a preset database, and mark position information of each high-frequency search record in a corresponding data table;
the word segmentation index table creating module 102 is configured to perform word segmentation on the high-frequency search record to obtain a high-frequency word segmentation, and create a word segmentation index table formed by the high-frequency word segmentation and position information of the corresponding high-frequency search record by using the high-frequency word segmentation as an index;
the data search module 103 based on the index table is configured to receive a text to be searched input by a user, identify a keyword of the text to be searched, query, in the word segmentation index table, location information corresponding to a high-frequency word segmentation matched with the keyword, and acquire data corresponding to the queried location information as a search result.
In detail, when in use, the modules of the data search apparatus 100 according to the embodiment of the present invention adopt the same technical means as the data search method described with reference to Figs. 1 to 5 and produce the same technical effect, which is not described herein again.
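The cooperation of the three modules can be illustrated with a minimal sketch. It is not the claimed implementation: the whitespace segmenter, the `(table_name, row_id)` form of the position information, and the names `build_index_table`, `search` and `DATABASE` are all assumptions introduced for illustration only.

```python
from collections import defaultdict


def whitespace_segment(text):
    # Stand-in segmenter (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_index_table(records, segment=whitespace_segment):
    """Map each high-frequency segment to the positions of the records containing it.

    `records` is an iterable of (text, position) pairs; a position is a
    hypothetical (table_name, row_id) tuple.
    """
    index_table = defaultdict(set)
    for text, position in records:
        for token in segment(text):
            index_table[token].add(position)
    return dict(index_table)


def search(index_table, keywords, fetch):
    """Collect the positions of every indexed segment matched by a keyword, then fetch the data."""
    positions = set()
    for keyword in keywords:
        positions |= index_table.get(keyword, set())
    return [fetch(position) for position in sorted(positions)]


# Toy database (assumption): table name -> {row id -> stored data}.
DATABASE = {
    "orders": {1: "red bicycle order", 2: "blue helmet order"},
    "users": {7: "bicycle club member"},
}
# Only records already identified as high-frequency are indexed.
high_frequency_records = [
    ("red bicycle order", ("orders", 1)),
    ("bicycle club member", ("users", 7)),
]
index_table = build_index_table(high_frequency_records)
results = search(index_table, ["bicycle"], lambda pos: DATABASE[pos[0]][pos[1]])
```

In this sketch a keyword that matches no indexed segment simply yields no positions; how such a miss is handled in practice (for example, by falling back to a full table scan) is not specified here.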
Fig. 7 is a schematic structural diagram of an electronic device implementing a data search method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a data search program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a data search program, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may be composed of an integrated circuit in some embodiments, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the Control Unit of the electronic device 1; it connects the components of the electronic device 1 through various interfaces and lines, and executes the functions of the electronic device 1 and processes its data by running or executing programs or modules (e.g., the data search program) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 7 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 7 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The data search program stored in the memory 11 of the electronic device 1 is a combination of a plurality of instructions, which when executed in the processor 10, can implement:
acquiring high-frequency search records of each data table in a preset database, and marking position information of each high-frequency search record in the corresponding data table;
performing word segmentation operation on the high-frequency search record to obtain high-frequency word segmentation;
taking the high-frequency participles as indexes, and creating a participle index table formed by the high-frequency participles and the corresponding position information of the high-frequency search record;
receiving a text to be searched input by a user, and identifying keywords of the text to be searched;
and inquiring the position information corresponding to the high-frequency segmentation matched with the keywords in the segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring high-frequency search records of each data table in a preset database, and marking position information of each high-frequency search record in the corresponding data table;
performing word segmentation operation on the high-frequency search record to obtain high-frequency word segmentation;
taking the high-frequency participles as indexes, and creating a participle index table formed by the high-frequency participles and the corresponding position information of the high-frequency search record;
receiving a text to be searched input by a user, and identifying keywords of the text to be searched;
and inquiring the position information corresponding to the high-frequency segmentation matched with the keywords in the segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The block chain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, a consensus mechanism, an encryption algorithm and the like. A block chain (Blockchain), which is essentially a decentralized database, is a string of data blocks associated by using a cryptographic method, and each data block contains information of a batch of network transactions, which is used for verifying the validity (anti-counterfeiting) of the information and generating a next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use that knowledge to obtain the best result.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, and do not denote any particular order.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention and not to limit the same, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A method of searching data, the method comprising:
acquiring high-frequency search records of each data table in a preset database, and marking position information of each high-frequency search record in the corresponding data table;
performing word segmentation operation on the high-frequency search record to obtain high-frequency word segmentation;
taking the high-frequency participles as indexes, and creating a participle index table formed by the high-frequency participles and the corresponding position information of the high-frequency search record;
receiving a text to be searched input by a user, and identifying keywords of the text to be searched;
and inquiring the position information corresponding to the high-frequency segmentation matched with the keywords in the segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
2. The data searching method of claim 1, wherein the obtaining of the high frequency search record in each data table in the preset database comprises:
obtaining an operation log of the preset database within a preset time period;
sequentially counting the operation times of each data record in each data table according to the operation log;
and selecting the data record with the operation times larger than or equal to a preset operation threshold value as a high-frequency search record of the corresponding data table.
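The counting and thresholding steps of claim 2 can be sketched as follows. This is an illustrative sketch only: the log entry layout (dictionaries with `table`, `record_id` and `time` keys) and the function name `select_high_frequency_records` are assumptions, and every log entry is treated as one operation.

```python
from collections import Counter
from datetime import datetime


def select_high_frequency_records(operation_log, start, end, threshold):
    """Return {table: [record ids]} for records operated on at least `threshold` times.

    Only log entries whose time falls inside the preset period [start, end] are counted.
    """
    counts = Counter(
        (entry["table"], entry["record_id"])
        for entry in operation_log
        if start <= entry["time"] <= end
    )
    selected = {}
    for (table, record_id), times in counts.items():
        if times >= threshold:
            selected.setdefault(table, []).append(record_id)
    return {table: sorted(ids) for table, ids in selected.items()}


# Hypothetical operation log.
log = [
    {"table": "orders", "record_id": 1, "time": datetime(2022, 11, 1)},
    {"table": "orders", "record_id": 1, "time": datetime(2022, 11, 2)},
    {"table": "orders", "record_id": 2, "time": datetime(2022, 11, 3)},
    {"table": "users", "record_id": 7, "time": datetime(2022, 11, 4)},
    {"table": "users", "record_id": 7, "time": datetime(2022, 11, 5)},
    {"table": "users", "record_id": 7, "time": datetime(2022, 10, 1)},  # outside the period
]
high_frequency = select_high_frequency_records(
    log, datetime(2022, 11, 1), datetime(2022, 11, 30), threshold=2
)
```

The returned record identifiers double as the position information to be marked for each high-frequency search record in its data table.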
3. The data searching method of claim 1, wherein the performing a word segmentation operation on the high frequency search record to obtain a high frequency word segmentation comprises:
performing word segmentation on the high-frequency search record by using at least two word segmentation algorithms to obtain word segmentation results corresponding to each word segmentation algorithm;
taking the participles of the intersection part in different participle results as determined participles, and taking the participles of the non-intersection part in different participle results as undetermined participles;
taking the undetermined participles which contain the same characters and are adjacent to each other in the high-frequency search record as a comparison group;
sequentially calculating the information loss of each participle in each comparison group relative to the high-frequency search record;
and selecting the participle with the largest information loss as the determined participle of the corresponding comparison group, and collecting all the determined participles as the high-frequency participle.
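The first three steps of claim 3 (intersecting the segmentation results and grouping the remaining candidates) can be sketched as below. The sketch makes several assumptions that are not in the claim: each segmentation result is given as a token list whose concatenation reproduces the record exactly, "containing the same characters and adjacent" is interpreted as overlapping character spans, and the names `to_spans` and `split_determined_and_groups` are hypothetical.

```python
def to_spans(tokens):
    """Convert a token list covering the whole record into (start, end, token) spans."""
    spans, offset = [], 0
    for token in tokens:
        spans.append((offset, offset + len(token), token))
        offset += len(token)
    return spans


def split_determined_and_groups(segmentations):
    """Return (determined spans, comparison groups of undetermined spans)."""
    span_sets = [set(to_spans(tokens)) for tokens in segmentations]
    # Spans produced by every algorithm are determined; the rest are undetermined.
    determined = set.intersection(*span_sets)
    pending = sorted(set.union(*span_sets) - determined)

    groups, current, current_end = [], [], -1
    for span in pending:
        if current and span[0] < current_end:
            # Overlaps the characters covered by the current group.
            current.append(span)
            current_end = max(current_end, span[1])
        else:
            if current:
                groups.append(current)
            current, current_end = [span], span[1]
    if current:
        groups.append(current)
    return sorted(determined), groups


# Two hypothetical segmentation results for the record "newyorktimessquare".
result_a = ["newyork", "times", "square"]
result_b = ["newyork", "timessquare"]
determined, groups = split_determined_and_groups([result_a, result_b])
```

Here "newyork" is produced by both algorithms and becomes a determined participle, while "times", "timessquare" and "square" cover overlapping characters and fall into one comparison group.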
4. The data searching method of claim 3, wherein said sequentially calculating the information loss of each participle in each said comparison group relative to said high-frequency search record comprises:
sequentially taking each participle in each comparison group as a target participle, and removing the target participle from the high-frequency search record to obtain a comparison field;
carrying out vector conversion on the high-frequency search records to obtain a high-frequency search record vector matrix, and carrying out vector conversion on the comparison fields to obtain a comparison field vector matrix;
and calculating the distance between the vector matrix of the high-frequency search records and the vector matrix of the contrast field, and taking the distance as the information loss of the corresponding target participles relative to the high-frequency search records.
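A minimal sketch of the information-loss calculation of claim 4, followed by the selection step of claim 3, is given below. The claim does not fix the vector conversion or the distance measure, so the character-bigram count vectors and the Euclidean distance used here are stand-ins chosen only for illustration; the function names and the `(start, end, token)` span layout are likewise assumptions.

```python
import math
from collections import Counter


def vectorize(text, n=2):
    # Stand-in vector conversion (assumption): counts of character n-grams.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def distance(vec_a, vec_b):
    # Euclidean distance between two sparse count vectors.
    keys = set(vec_a) | set(vec_b)
    return math.sqrt(sum((vec_a[k] - vec_b[k]) ** 2 for k in keys))


def information_loss(record, span):
    """Distance between the record and the comparison field left after removing the span."""
    start, end, _token = span
    comparison_field = record[:start] + record[end:]
    return distance(vectorize(record), vectorize(comparison_field))


def pick_from_group(record, group):
    """Select the participle whose removal loses the most information."""
    return max(group, key=lambda span: information_loss(record, span))


record = "newyorktimessquare"
# Hypothetical comparison group of overlapping candidate participles.
group = [(7, 12, "times"), (7, 18, "timessquare"), (12, 18, "square")]
chosen = pick_from_group(record, group)
```

With these stand-ins, removing a longer participle tends to change more n-grams and therefore scores a larger loss; a semantic embedding would weigh the candidates differently.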
5. The data searching method of claim 1, wherein the identifying the keyword of the text to be searched comprises:
according to a preset service rule, performing regular judgment on the text to be searched;
when the text to be searched accords with the regular judgment, taking an output result of the regular judgment as a keyword of the text to be searched;
when the text to be searched does not accord with the regular judgment, performing word segmentation on the text to be searched to obtain one or more words to be searched;
generating a word vector of each word to be searched and a text vector matrix of the text to be searched;
sequentially calculating the key value of each word to be searched according to the word vector of each word to be searched and the text vector matrix of the text to be searched;
and selecting the word to be searched of which the key value meets the preset key value condition as the key word of the text to be searched.
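The two branches of claim 5 can be sketched as follows. The business rule (an order number of the form `ORD-` plus six digits), the threshold form of the "preset key value condition", and the caller-supplied `segment` and `key_value` functions are all assumptions made for illustration; in particular, the toy `key_value` used in the usage lines is not the key value algorithm of claim 6.

```python
import re

# Hypothetical preset service rule: an order number such as "ORD-123456".
BUSINESS_RULES = [re.compile(r"\bORD-\d{6}\b")]


def identify_keywords(text, segment, key_value, threshold):
    """Return the keywords of the text to be searched.

    If a business rule matches, its output is the keyword. Otherwise the text
    is segmented and the words whose key value reaches the threshold are kept.
    """
    for rule in BUSINESS_RULES:
        match = rule.search(text)
        if match:
            return [match.group(0)]
    words = segment(text)
    return [word for word in words if key_value(word, words) >= threshold]


def toy_key_value(word, words):
    # Stand-in scoring (assumption): word length relative to the longest word.
    return len(word) / max(len(w) for w in words)


by_rule = identify_keywords("status of ORD-123456 please", str.split, toy_key_value, 0.8)
by_score = identify_keywords("cheap red bicycle", str.split, toy_key_value, 0.8)
```

Checking the rules first keeps structured queries such as identifiers away from the segmentation and scoring path, which only runs for free-form text.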
6. The data searching method of claim 5, wherein the sequentially calculating the key value of each word to be searched according to the word vector of each word to be searched and the text vector matrix of the text to be searched comprises:
calculating the key value of each word to be searched by using the following key value algorithm:
Figure FDA0003971943130000021
wherein K is the key value, W is the text vector matrix of the text to be searched, T is the matrix transposition symbol, |·| is the modulo symbol, and the symbol shown in Figure FDA0003971943130000022 is the word vector of the word to be searched.
7. A data search apparatus, characterized in that the apparatus comprises:
the high-frequency search record acquisition module is used for acquiring the high-frequency search record of each data table in a preset database and marking the position information of each high-frequency search record in the corresponding data table;
the word segmentation index table creation module is used for executing word segmentation operation on the high-frequency search records to obtain high-frequency word segmentation, and creating a word segmentation index table formed by the high-frequency word segmentation and the position information of the corresponding high-frequency search records by taking the high-frequency word segmentation as an index;
the data search module is used for receiving a text to be searched input by a user, identifying keywords of the text to be searched, inquiring the position information corresponding to the high-frequency segmentation matched with the keywords in the segmentation index table, and acquiring data corresponding to the inquired position information as a search result.
8. The data search apparatus of claim 7, wherein the index table creation module performs a word segmentation operation on the high frequency search record by:
performing word segmentation on the high-frequency search record by using at least two word segmentation algorithms to obtain word segmentation results corresponding to each word segmentation algorithm;
taking the participles of the intersection part in different participle results as determined participles, and taking the participles of the non-intersection part in different participle results as undetermined participles;
taking the undetermined participles which contain the same characters and are adjacent to each other in the high-frequency search record as a comparison group;
sequentially calculating the information loss of each participle in each comparison group relative to the high-frequency search record;
and selecting the participle with the largest information loss as a determined participle of the corresponding comparison group, and collecting all the determined participles as the high-frequency participle.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the data search method of any one of claims 1 to 6.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the data searching method according to any one of claims 1 to 6.
CN202211522829.9A 2022-11-30 2022-11-30 Data searching method and device, electronic equipment and computer readable storage medium Pending CN115774717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211522829.9A CN115774717A (en) 2022-11-30 2022-11-30 Data searching method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211522829.9A CN115774717A (en) 2022-11-30 2022-11-30 Data searching method and device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115774717A true CN115774717A (en) 2023-03-10

Family

ID=85390761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211522829.9A Pending CN115774717A (en) 2022-11-30 2022-11-30 Data searching method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115774717A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117910022A (en) * 2024-03-19 2024-04-19 深圳高灯计算机科技有限公司 Data searching method, device, computer equipment, storage medium and product

Similar Documents

Publication Publication Date Title
CN113836131B (en) Big data cleaning method and device, computer equipment and storage medium
CN113449187A (en) Product recommendation method, device and equipment based on double portraits and storage medium
CN114979120B (en) Data uploading method, device, equipment and storage medium
CN114138784B (en) Information tracing method and device based on storage library, electronic equipment and medium
CN115002200A (en) User portrait based message pushing method, device, equipment and storage medium
CN113886708A (en) Product recommendation method, device, equipment and storage medium based on user information
CN113946690A (en) Potential customer mining method and device, electronic equipment and storage medium
CN111159204B (en) Method and system for generating label in configuration mode
CN112560465A (en) Method and device for monitoring batch abnormal events, electronic equipment and storage medium
CN114386509A (en) Data fusion method and device, electronic equipment and storage medium
CN115774717A (en) Data searching method and device, electronic equipment and computer readable storage medium
CN114547696A (en) File desensitization method and device, electronic equipment and storage medium
CN114637811A (en) Data table entity relation graph generation method, device, equipment and storage medium
CN114398346A (en) Data migration method, device, equipment and storage medium
CN114416939A (en) Intelligent question and answer method, device, equipment and storage medium
CN113505273A (en) Data sorting method, device, equipment and medium based on repeated data screening
CN114708073B (en) Intelligent detection method and device for surrounding mark and serial mark, electronic equipment and storage medium
CN115114297A (en) Data lightweight storage and search method and device, electronic equipment and storage medium
CN114996386A (en) Business role identification method, device, equipment and storage medium
CN113505117A (en) Data quality evaluation method, device, equipment and medium based on data indexes
CN114610854A (en) Intelligent question and answer method, device, equipment and storage medium
CN114490667A (en) Multidimensional data analysis method and device, electronic equipment and medium
CN113449002A (en) Vehicle recommendation method and device, electronic equipment and storage medium
CN113344674A (en) Product recommendation method, device, equipment and storage medium based on user purchasing power
CN113987206A (en) Abnormal user identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination