CN113641780B - Search method, system, device, storage medium and computer program product - Google Patents

Publication number
CN113641780B
CN113641780B CN202111201085.6A CN202111201085A CN113641780B CN 113641780 B CN113641780 B CN 113641780B CN 202111201085 A CN202111201085 A CN 202111201085A CN 113641780 B CN113641780 B CN 113641780B
Authority
CN
China
Prior art keywords
array
index table
data
data segment
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111201085.6A
Other languages
Chinese (zh)
Other versions
CN113641780A (en)
Inventor
刘洋
李飞飞
沈春辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Cloud Computing Ltd
Original Assignee
Alibaba China Co Ltd
Alibaba Cloud Computing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd and Alibaba Cloud Computing Ltd
Priority to CN202111201085.6A
Publication of CN113641780A
Application granted
Publication of CN113641780B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/319 Inverted lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/325 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a search method, system, device, storage medium and computer program product. The method comprises: in response to a write operation, writing record data into the memory and updating a first index table stored in the memory; and in response to a search operation, determining a search result from the first index table and a second index table, which are stored in different locations, by means of a preset interface that supports different storage structures. In this method, record data written into the memory is updated into the first index table in real time, so the record data can be searched in real time by means of the first index table. Meanwhile, because written record data is continuously added to the existing index table by updating it, the number of index tables does not increase, that is, the number of first index tables is stable. Searching can therefore be performed over a small number of first index tables and second index tables, which improves the speed of real-time search.

Description

Search method, system, device, storage medium and computer program product
Technical Field
The present invention relates to the field of databases, and in particular, to a search method, system, device, storage medium, and computer program product.
Background
With the development of the internet and the internet of things, industries such as manufacturing and services can use equipment with network access capability to make their online services and production more intelligent.
In the course of providing services or carrying out production, a large amount of data is generated in real time, and this data reflects the service state or the operating condition of production equipment. Full-text retrieval by means of an index is required to find the data generated in real time, that is, to search the data in real time. From the real-time search results, the user can learn the service state and the operating condition of the production equipment in a timely manner.
Accordingly, how to guarantee search speed when searching data in real time has become an urgent problem.
Disclosure of Invention
Embodiments of the present invention provide a searching method, system, device, storage medium, and computer program product, which are used to ensure the searching speed of real-time searching.
In a first aspect, an embodiment of the present invention provides a search method, including:
in response to a write operation, writing record data into a memory;
in response to the write operation, updating a first index table in the memory according to the record data; and
in response to a search operation, determining a search result according to the first index table and a second index table in a disk by means of a preset interface supporting different storage structures, wherein the second index table and the first index table have different storage structures.
In a second aspect, an embodiment of the present invention provides a computer program product, which includes computer programs/instructions, wherein when the computer programs are executed by a processor, the processor is caused to implement the search method in the first aspect.
In a third aspect, an embodiment of the present invention provides a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to implement at least the search method according to the first aspect.
In a fourth aspect, an embodiment of the present invention provides an electronic device, including: a memory and a processor; the memory stores index data corresponding to the record data by using a preset storage structure, the data segments contained in the record data are text-type data segments, and the preset storage structure comprises a first array, a second array, a third array and a fourth array;
The elements of the first array are obtained according to the hash value of the identification information of the recorded data to which the data segment belongs and the hash value of the data segment;
the elements of the second array are obtained according to the elements of the first array and the length value of the encoding result of the data fragment;
the elements of the third array are obtained according to the elements of the second array and the coding result of the data fragment;
the fourth array records the incidence relation between the elements of the third array and the identification information of the record data to which the data fragments belong;
the memory further stores executable code which, when executed by the processor, causes the processor to perform the search method of the first aspect.
In a fifth aspect, an embodiment of the present invention provides another electronic device, including: a memory and a processor; the memory stores index data corresponding to the recording data by using a preset storage structure, data segments contained in the recording data are numerical data, and the preset storage structure comprises a first array, a second array and a third array;
the elements of the first array are obtained according to the identification information of the recorded data to which the target data segment belongs and the hash value of the data segment;
the elements of the second array are obtained according to the elements of the first array and the length value of the encoding result of the data fragment;
the elements of the third array are obtained according to the elements of the second array and the coding result of the data fragment;
the memory further stores executable code which, when executed by the processor, causes the processor to perform the search method of the first aspect.
In a sixth aspect, an embodiment of the present invention provides a search system, including: a magnetic disk, a memory and a processor;
the memory is used for storing a first index table;
the magnetic disk is used for storing a second index table, and the second index table and the first index table have different storage structures;
the processor is used for responding to the write-in operation and writing the recorded data into the memory; in response to the write operation, updating the first index table according to the record data; and responding to the search operation, and determining a search result according to the first index table and the second index table by means of a preset interface supporting different storage structures.
The searching method provided by the embodiment of the invention responds to the writing operation of the data, writes the recorded data into the memory, and updates the first index table stored in the memory according to the recorded data. And responding to the searching operation, and searching in a first index table stored in the memory and a second index table stored in the magnetic disk respectively by means of a preset interface to obtain a searching result. The two index tables stored in different positions have different storage structures, and the index tables with different storage structures can be read and searched by means of a preset interface. The storage structure of the first index table ensures that the first index table is readable in the memory, and the storage structure of the second index table ensures that the second index table is readable in the disk.
Therefore, in the above method, the record data written into the memory is updated into the first index table in real time, and real-time search of the record data can be realized by using the first index table. Meanwhile, because written record data is added to the existing index table by updating it, the number of index tables does not increase, that is, the number of first index tables is stable. Searching can therefore be performed over a small number of first index tables and second index tables, which improves the speed of real-time search.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a flowchart of a searching method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a search system;
FIG. 3 is a flow chart of another searching method provided by the embodiment of the invention;
FIG. 4 is a flowchart illustrating updating of an inverted index table for text-based data segments according to an embodiment of the present invention;
FIG. 5 is a comparison of the inverted index table before and after updating according to the embodiment shown in FIG. 4;
FIG. 6 is a diagram of an inverted index table corresponding to the embodiment shown in FIG. 4;
FIG. 7 is a flowchart illustrating updating a forward index table for a text-based data segment according to an embodiment of the present invention;
FIG. 8 is a comparison of a forward index table before and after updating according to the embodiment shown in FIG. 7;
FIG. 9 is a diagram of a forward index table corresponding to the embodiment shown in FIG. 7;
FIG. 10 is a comparison diagram of an inverted index table before and after updating according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating another forward index table according to an embodiment of the present invention;
FIG. 12 is a diagram of an alternative inverted index table according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of a search apparatus according to an embodiment of the present invention;
FIG. 14 is a schematic structural diagram of an electronic device corresponding to the embodiment shown in FIG. 13;
FIG. 15 is a schematic structural diagram of another electronic device according to an embodiment of the present invention;
FIG. 16 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise; "a plurality of" generally includes at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes a relationship between associated objects, meaning that three relationships may exist. For example, A and/or B may represent: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when" or "at the time of" or "in response to a determination" or "in response to an identification". Similarly, the phrases "if determined" or "if identified (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when identified (a stated condition or event)" or "in response to an identification (a stated condition or event)", depending on the context.
It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such product or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in the product or system that includes the element.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments. In addition, the sequence of steps in each method embodiment described below is only an example and is not strictly limited.
FIG. 1 is a flowchart of a searching method according to an embodiment of the present invention. The searching method provided by the embodiment of the invention can be executed by a server that receives the record data, and more specifically by a processor in the server. It will be appreciated that the server may be implemented as software, or as a combination of software and hardware.
Before the searching method provided by the embodiment shown in FIG. 1 is explained, the following should be noted:
As described in the background, data is generated in real time both when online services are provided and during actual production, and this data is the record data in the embodiments provided by the present invention. In online service or industrial production scenarios, real-time search of the record data often has to be performed quickly so that service states and the operating conditions of production equipment can be learned in time. Such scenarios may be, for example, order distribution scenarios, pipeline production scenarios, and the like. In line with the background, real-time search of the record data can be understood as follows: the record data can be searched as soon as it has been written into the database in the server.
In the order distribution scenario, the record data generated in real time indicates the transportation state of an order, and may include fields such as a user ID, an order number, an order price, and the logistics position of the commodity in the order. The record data may be written in real time into a database configured on the server.
When the transportation state of the order changes, specifically, the position of the commodity in the order changes from the place A to the place B, a piece of recorded data 1 can be generated in real time. For the recorded data 1 stored in the database, the processor in the server needs to be able to search it from the database in real time and quickly by means of the index, and further update the searched recorded data 1 to the terminal device of the user in real time, so that the user can know the delivery status of the goods in time. Compared with other types of orders, the takeaway orders have higher requirements on the efficiency of real-time searching of the recorded data, so that the user can know the delivery state of the takeaway orders in time.
Similarly, in a pipeline production scenario, the record data indicates the operating status of the various devices on the pipeline. The record data may include fields such as a device identification and working state information. The working state information reflects whether the device is working abnormally. The record data may be written in real time into a database configured on the server.
In the actual production process, each device on the production line can detect the working state of the device at regular time to generate the record data containing fields such as device identification, working state information and the like. The processor in the server needs to be able to search out the recorded data reflecting the abnormal operation of the equipment from the database in real time and quickly, and feed back the searched recorded data to the relevant personnel, so that the relevant personnel can know the operating state of each equipment on the production line in time according to the searched recorded data.
In addition to the production equipment on the production line, real-time search of record data also has to be performed quickly for the various network devices used to provide different online services, so that the operating states of those network devices can be learned. Of course, the invention does not limit the type of equipment: in any scenario where it must be learned in time whether equipment is operating normally, the searching method provided by the embodiments of the invention can be used to search for the record data reflecting abnormal operation of the equipment.
In addition, the invention does not limit the use scenes, and any scene needing data real-time search can also use the searching method provided by the embodiments of the invention, so that real-time search is realized and the searching speed can be ensured.
Based on the above description, the following embodiments are described by taking a take-away order distribution scenario as an example. As shown in FIG. 1, the searching method may specifically include the following steps:
S101, in response to a write operation, writing the record data into the memory.
The user may generate a take-away order via an application installed on the terminal device and order distribution is performed by the order distribution system. During the process of delivering the takeaway order by the delivery personnel, the position of the commodity in the order is changed in real time, and along with the real-time change of the position, the server in the order delivery system can generate the record data in real time according to the position information acquired in real time.
In response to the write operation, the processor in the server may write the generated recording data in the memory of the server in real time. Optionally, the recorded data may also be written into a disk of the server at the same time, for data recovery after the server is abnormally down, so as to ensure that the generated recorded data is not lost.
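The optional disk copy described above can be sketched as an append-only log that is replayed after an abnormal shutdown. This is only a minimal sketch under stated assumptions, not the method of the embodiment: the class `RecordStore`, its methods `write` and `recover`, and the JSON-lines file layout are all hypothetical names and choices introduced for illustration.

```python
import json
import os
import tempfile


class RecordStore:
    """Hypothetical store: record data lives in memory, with a copy appended to disk."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.records = []  # in-memory copy of the record data

    def write(self, record):
        # Append the record to the on-disk log, then keep it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
            log.flush()
            os.fsync(log.fileno())  # ask the OS to push the bytes to the device
        self.records.append(record)

    @classmethod
    def recover(cls, log_path):
        # Rebuild the in-memory state from the log after an abnormal shutdown.
        store = cls(log_path)
        if os.path.exists(log_path):
            with open(log_path, encoding="utf-8") as log:
                store.records = [json.loads(line) for line in log if line.strip()]
        return store


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.log")
        store = RecordStore(path)
        store.write({"doc_id": 1, "order_id": 100})
        store.write({"doc_id": 2, "order_id": 100})
        # Simulate a crash: discard the object and rebuild from the disk copy.
        return RecordStore.recover(path).records
```

Calling `demo()` writes two records, drops the in-memory object, and rebuilds the same two records from the log file.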
Optionally, the record data typically contains at least one field. For example, the generated record data 1 may have the following format: doc ID: 1; user ID: alice, anderson; Position: buildingA1, streetA2; order ID: 100; Time: T1.
The record data 1 includes 5 fields: doc ID, user ID, Position, order ID, and Time. doc ID represents the identification information of the record data, namely the record data number; user ID represents the user identifier; Position represents the commodity position; order ID represents the order number; and Time represents the generation time of the record data. The record data indicates that the take-away order of user alice, anderson, with order number 100, is at location A at time T1.
S102, responding to the write operation, and updating the first index table in the memory according to the recorded data.
Further, in response to the write operation of the record data, the processor may immediately update the first index table in the memory according to the record data, that is, supplement the content of the record data to the first index table. Optionally, the first index table may specifically include an inverted index table and a forward index table. Optionally, since the record data includes at least one field, the first index table is updated by supplementing each field in the record data into the first index table, that is, different fields included in the record data may have corresponding first index tables, and each field has a corresponding forward index table and an inverted index table.
The first index table may already include indexes corresponding to a plurality of pieces of record data generated before time T1. Continuing the above example, the process of updating the first index table is as follows: in response to the write operation, the record data 1 is added to the first index table. Each of the 5 fields contained in the record data 1 has a corresponding inverted index table and forward index table.
Optionally, the real-time update of the first index table may be performed by a data write thread established by the processor. After the record data is updated into the first index table, the processor may further generate a write success message to control the data write thread to continue writing the next piece of record data. In the same scenario, the record data generated in real time usually has the same fields, so as record data is written continuously, the content of the first index table keeps growing but the number of first index tables does not change. That is, multiple pieces of record data in the same scenario may reuse the same first index table.
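The in-place update described above can be sketched as follows. This is only an illustration under stated assumptions: the classes `FieldIndex` and `FirstIndexTable` are hypothetical, whole field values are indexed rather than data segments, and plain Python dictionaries stand in for the array-based storage structure of the embodiments. The first record follows the example of record data 1; the second record is made up.

```python
from collections import defaultdict


class FieldIndex:
    """Hypothetical pair of tables kept for one field."""

    def __init__(self):
        self.inverted = defaultdict(list)  # field value -> doc IDs holding it
        self.forward = {}                  # doc ID -> field value

    def update(self, doc_id, value):
        self.inverted[value].append(doc_id)
        self.forward[doc_id] = value


class FirstIndexTable:
    """In-memory index: one FieldIndex per field name, updated on every write."""

    def __init__(self):
        self.fields = {}

    def update(self, record):
        doc_id = record["doc ID"]
        for name, value in record.items():
            # Reuse the table that already exists for this field;
            # a new table is created only the first time a field is seen.
            self.fields.setdefault(name, FieldIndex()).update(doc_id, value)


index = FirstIndexTable()
index.update({"doc ID": 1, "user ID": "alice, anderson",
              "Position": "buildingA1, streetA2", "order ID": 100, "Time": "T1"})
tables_after_first_write = len(index.fields)

# Hypothetical second record for the same order at a later time.
index.update({"doc ID": 2, "user ID": "alice, anderson",
              "Position": "buildingB1, streetB2", "order ID": 100, "Time": "T2"})
tables_after_second_write = len(index.fields)
```

After both writes the index still holds one pair of tables per field: the content grows (order ID 100 now maps to doc IDs 1 and 2) while the number of tables stays the same.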
Optionally, with real-time update of the first index table, the content recorded in the first index table is more and more, and the memory resource occupied by the content is more and more, so that normal writing of the recorded data may be affected. In order to ensure normal writing of data, optionally, the processor may refresh the first index table in the memory to the disk by periodically invoking Flush operation, that is, a brand new second index table is generated according to the first index table stored in the memory, and the stored first index table is deleted at the same time, thereby realizing release of memory resources. Optionally, the processor may also periodically flush the first index table to disk by invoking Commit and Fsync operations.
Optionally, the periods for invoking the Flush, Commit, and Fsync operations may be set manually. However, to reduce the memory resources occupied by the first index table, these periods may be set short so that the first index table in the memory is persisted to the disk as soon as possible.
Optionally, the generation of the second index table may also be performed by the data writing thread. And since periodically refreshing the first index table to the disk indicates that the second index table is periodically generated, rather than being periodically updated, the number of second index tables in the disk increases as the first index table is periodically refreshed to the disk.
According to the above description, the processor can update the first index table in real time and generate the second index table periodically, and the number of the first index table in the memory is stable and the number of the second index table in the disk is increasing.
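The interplay between in-place updates and periodic refreshes can be sketched as below. This is a simplified illustration, not the Flush, Commit, or Fsync operations of any particular engine: the class `Indexer`, its `flush` method, and the `segment_N.json` file naming are hypothetical, and JSON merely stands in for the second storage structure.

```python
import json
import os
import tempfile


class Indexer:
    """Hypothetical indexer with one in-memory table and many on-disk tables."""

    def __init__(self, directory):
        self.directory = directory
        self.first_index = {}     # in-memory table: term -> list of doc IDs
        self.second_indexes = []  # paths of the tables persisted on disk

    def update(self, doc_id, terms):
        # Real-time update: add to the existing in-memory table.
        for term in terms:
            self.first_index.setdefault(term, []).append(doc_id)

    def flush(self):
        # Periodic refresh: write a brand-new on-disk table, then release memory.
        if not self.first_index:
            return None
        name = f"segment_{len(self.second_indexes)}.json"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.first_index, f, sort_keys=True)
        self.second_indexes.append(path)
        self.first_index = {}
        return path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        indexer = Indexer(tmp)
        indexer.update(1, ["buildingA1", "streetA2"])
        indexer.flush()
        indexer.update(2, ["buildingB1", "streetB2"])
        indexer.flush()
        return len(indexer.first_index), sorted(os.listdir(tmp))
```

In `demo()`, two refreshes leave the in-memory table empty and two separate files on disk, which mirrors the statement that the number of second index tables grows with each refresh.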
S103, in response to a search operation, determining a search result according to the first index table and a second index table in the disk by means of a preset interface supporting different storage structures, wherein the second index table and the first index table have different storage structures.
When the user wants to know the location of the goods contained in the take-away order, the search operation may be triggered by means of the terminal device. And the processor in the server responds to the search operation, searches the recorded data according to the first index table in the memory and the second index table in the magnetic disk by means of a preset interface so as to determine the recorded data generated when the search operation is triggered, and informs the user of the position of the commodity contained in the recorded data.
The first index table and the second index table may have different storage structures. The first index table has a first storage structure for ensuring that the first index table can be read in the memory; the second index table has a second storage structure to ensure that the second index table is readable on disk. And the preset interface can also support the storage structures of the two index tables, so that the first index table in the memory and the second index table in the disk can be read by the preset interface respectively, and real-time search of the recorded data is realized.
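One way to picture a preset interface that supports both storage structures is a common reader abstraction with one implementation per structure. The sketch below is an assumption made for illustration only: `IndexReader`, `MemoryIndex`, `DiskIndex`, `search`, and the one-JSON-object-per-line file layout are hypothetical and are not taken from the embodiments.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod


class IndexReader(ABC):
    """Hypothetical common interface over differently stored index tables."""

    @abstractmethod
    def lookup(self, term):
        """Return the doc IDs indexed under `term`."""


class MemoryIndex(IndexReader):
    """First index table: a dictionary held in memory."""

    def __init__(self):
        self.postings = {}

    def add(self, term, doc_id):
        self.postings.setdefault(term, []).append(doc_id)

    def lookup(self, term):
        return list(self.postings.get(term, []))


class DiskIndex(IndexReader):
    """Second index table: a file with one {"term", "docs"} JSON object per line."""

    def __init__(self, path):
        self.path = path

    def lookup(self, term):
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                if entry["term"] == term:
                    return entry["docs"]
        return []


def search(term, readers):
    # The caller sees one interface, whatever the underlying storage structure.
    result = set()
    for reader in readers:
        result.update(reader.lookup(term))
    return sorted(result)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "second_index.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write(json.dumps({"term": "100", "docs": [1, 2]}) + "\n")
        memory = MemoryIndex()
        memory.add("100", 3)  # written after the last refresh to disk
        readers = [memory, DiskIndex(path)]
        return search("100", readers), search("999", readers)
```

In `demo()`, doc IDs 1 and 2 come from the on-disk table and doc ID 3 from the in-memory table, so a record written after the last refresh is already visible to the search.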
Optionally, each time the first index table is refreshed to the disk, the processor may further establish a search thread and use it to search the record data. The search thread and the data write thread mentioned in step S102 belong to the same process.
In the above description, the real-time update of the first index table by the processor in the server, the periodic generation of the second index table, and the search process of the recorded data may be understood in conjunction with the schematic structural diagram of the processor shown in fig. 2.
In this embodiment, in response to the write operation of the data, the recorded data is written into the memory, and the first index table stored in the memory is updated. And responding to the searching operation, and respectively determining searching results in a first index table stored in the memory and a second index table stored in the magnetic disk by means of a preset interface. The two index tables stored in different positions have different storage structures, and the reading and searching of the index tables with different storage structures can be realized by means of a preset interface.
Therefore, in this method, the record data written into the memory is updated into the first index table in real time, and real-time search of the record data can be realized by means of the first index table. Meanwhile, because written record data is added to the existing index table by updating it, the number of index tables does not increase, that is, the number of first index tables is stable. Searching can therefore be performed over a small number of first index tables and second index tables, which improves the efficiency of real-time search.
In addition, for the generation of the index table, in addition to the generation manner of the first index table and the second index table provided in the embodiment shown in fig. 1, optionally, there is also another manner:
the processor responds to the operation of writing the record data into the memory, and can generate an index table (simply referred to as a third index table) which is readable by the processor and has a second storage structure by calling Flush operation in real time, wherein the third index table is stored in the memory. That is, for each record data written in real time, a corresponding third index table is generated for the record data in real time. This results in that the number of the third index tables in the memory is increased as the recorded data is written.
Meanwhile, for the third index table in the memory, the processor may also merge and refresh a plurality of third index tables in the memory to the disk by periodically invoking Flush operation or Commit operation to obtain an index table (referred to as a fourth index table) stored in the disk, that is, to implement persistence of the index table in the disk, and the fourth index table also has a second storage structure.
In the above process, on the one hand, after the Flush operation is called to generate the third index table, the processor may return a write success message so that it can continue to write the next piece of record data. Writing the next piece of record data only after the index table of the previous piece has been generated can greatly reduce the write speed of the record data. On the other hand, in response to the search operation, the processor may search the record data in real time according to the third index tables stored in the memory and the fourth index table stored in the disk. However, the memory then holds a plurality of index tables, which not only occupies memory resources but also requires the processor to open a plurality of index tables during a search, thereby greatly reducing search efficiency.
In view of the above problem, in the embodiment shown in fig. 1, the record data is updated in real time into the first index table stored in the memory, instead of a new index table being generated for each piece of record data. The number of first index tables in the memory therefore does not increase with each write, which greatly reduces the number of index tables stored in the memory. When the processor searches for record data by using the first index table and the second index table, the number of index tables to be opened is also greatly reduced, which improves search efficiency. Meanwhile, after the record data is updated into the first index table, a write success message can be returned so that the next piece of record data can be written. Because the time required to update an index table is shorter than the time required to generate one, the write speed of the record data is also increased.
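The difference between the two manners can be illustrated with a minimal sketch. It is not the storage structure of the embodiment: plain Python dictionaries stand in for the index tables, and the function names `index_per_record`, `index_in_place`, and `search_tables` are hypothetical. It only shows that generating one table per record makes the number of tables to open grow with the number of writes, whereas updating a single table in place does not.

```python
from collections import defaultdict


def index_per_record(records):
    """Alternative manner: generate one small index table per written record."""
    tables = []
    for doc_id, text in records:
        table = defaultdict(list)
        for term in text.split():
            table[term].append(doc_id)
        tables.append(dict(table))
    return tables


def index_in_place(records):
    """Manner of the fig. 1 embodiment: update one in-memory table per write."""
    table = defaultdict(list)
    for doc_id, text in records:
        for term in text.split():
            table[term].append(doc_id)
    return dict(table)


def search_tables(tables, term):
    """A search has to open every table that may contain the term."""
    hits = []
    for table in tables:
        hits.extend(table.get(term, []))
    return hits


# Toy records: (doc ID, already segmented user ID text).
records = [(0, "Bob"), (1, "Alice Anderson"), (2, "Alice Anderson")]
per_record_tables = index_per_record(records)  # one table per record
single_table = index_in_place(records)         # one table in total
```

With three records, `per_record_tables` holds three tables that all have to be opened for a search, while `single_table` answers the same query from one table.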
Fig. 3 is a flowchart of another searching method according to an embodiment of the present invention. As shown in fig. 3, the method may include the steps of:
S201, in response to the write operation, writing the record data into the memory, wherein the record data comprises at least one field.
The execution process of step S201 may refer to the related description in the embodiment shown in fig. 1, and is not described herein again.
S202, in response to the write operation, dividing the data contained in at least one field in the record data into at least one data fragment.
Each piece of record data comprises at least one field, and at least one data segment can be obtained by dividing the record data written into the memory. The data fragment can be regarded as the minimum unit which can be searched, that is, the index table stores the record data in units of data fragments. The relationship between data fragments, fields, and record data is as follows: a piece of record data may include at least one field, and a field may include at least one data fragment.
Taking the example shown in fig. 1, record data 1 includes five fields: doc ID, order ID, Time, user ID, and Position. For the fields user ID and Position, whose data type is text, dividing the field can be understood as word segmentation: Alice.Anderson is divided into the two data segments Alice and Anderson, and "BuildingA1, StreetA2" is divided into the two data segments BuildingA1 and StreetA2. For the fields doc ID, order ID, and Time, whose data type is numeric, the content of each field is itself a single data fragment.
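The division in step S202 can be sketched as follows. This is a minimal illustration under stated assumptions: the set `TEXT_FIELDS`, the function name `split_record`, and the use of periods, commas, and whitespace as word-segmentation delimiters are all hypothetical choices, since the embodiment does not specify a particular word segmenter.

```python
import re

# Assumption: which fields are text-type is known from the schema of the record.
TEXT_FIELDS = {"user ID", "Position"}


def split_record(record):
    """Divide every field of a record into its searchable data fragments."""
    fragments = {}
    for field, value in record.items():
        if field in TEXT_FIELDS:
            # Text-type field: word segmentation on '.', ',' and whitespace.
            fragments[field] = [t for t in re.split(r"[.,\s]+", value) if t]
        else:
            # Numeric-type field: the whole content is one data fragment.
            fragments[field] = [value]
    return fragments


record_1 = {
    "doc ID": 1,
    "user ID": "Alice.Anderson",
    "Position": "BuildingA1, StreetA2",
    "order ID": 100,
    "Time": "T1",  # placeholder for a numeric timestamp, as in the example
}
fragments = split_record(record_1)
```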
S203, updating the first index table corresponding to the field of the at least one data fragment according to the data type of the at least one data fragment.
Each field in the record data has a corresponding first index table, and the first index table may specifically include a forward index table and a reverse index table. Based on this, each data segment can be updated to the corresponding reverse index table and forward index table according to the respective data type of at least one data segment obtained by division, that is, the updating of the recording data to the first index table is completed.
S204, in response to the search operation, determining a search result according to the first index table and a second index table in the disk by means of preset interfaces supporting different storage structures, wherein the second index table and the first index table have different storage structures.
The execution process of step S204 may refer to the related description in the embodiment shown in fig. 1, which is not described herein again.
In this embodiment, the content included in the recorded data is divided to obtain at least one data segment that is the smallest searchable unit, and the first index table is updated based on the data segment. For the contents that are not described in detail in this embodiment and the technical effects that can be achieved, reference may be made to the relevant description in the embodiment shown in fig. 1, and details are not repeated here.
Based on the above-mentioned takeaway order scenario, the record data in the embodiment shown in fig. 3 may include at least one field, and different fields may have different data types. For example, the user ID and Position fields hold text-type data, while the doc ID, order ID, and Time fields hold numeric-type data, so the data fragments obtained by division also have different data types. Data segments of different data types may be updated in different manners into the first index table corresponding to the field to which they belong. The first index table corresponding to each field may further include a reverse index table and a forward index table. The storage structure of the reverse index table and the forward index table corresponding to a field to which text-type data fragments belong may be embodied as a multi-layer array. The storage structure of the reverse index table and the forward index table corresponding to a field to which numeric data fragments belong may be expressed as a mapping relationship.
After a piece of record data is divided, at least one text-type data segment may be obtained, and at least one numeric data segment may also be obtained. Optionally, a text-type data segment may specifically include a character string, a date, an array, and the like. Since every data segment of the same type is updated into the index table by the same process, the update process is described below by taking any one data segment of a given type, referred to as the target data segment, as an example.
Optionally, if the target data segment is a text-type data segment, the process of updating the target data segment into the multi-layer array constituting the inverted index table may be as shown in fig. 4. The update process of the inverted index table can also be understood in conjunction with fig. 5. The updating process of the inverted index table may specifically include the following steps:
S301, updating a first array in the inverted index table corresponding to the field to which the target data segment belongs according to the hash value of the identification information of the record data to which the target data segment belongs and the hash value of the target data segment.
Specifically, according to the hash value of the identification information of the record data to which the target data segment belongs, a first element corresponding to the target data segment in the first array is determined. Meanwhile, determining the subscript of the first element in the first array according to the hash value of the target data segment. Finally, the first array is updated according to the first element and the index of the first element.
Continuing with the example shown in fig. 1 and fig. 3, record data 1 is: doc ID = 1; user ID: Alice.Anderson; Position: BuildingA1, StreetA2; order ID: 100; Time: T1.
Dividing the fields of record data 1 yields several text-type data segments, namely Alice, Anderson, BuildingA1, and StreetA2. Alice and Anderson belong to the same field and are updated into the same inverted index table, while BuildingA1 and StreetA2 are updated into another inverted index table.
Then, taking the text-type data segment "Alice" as the target data segment, the updating process of the first array in the inverted index table is described with reference to (b) in fig. 5:
The identification information of the record data to which "Alice" belongs is determined to be 1, and the hash value of this identification information is also 1, so the hash value "1" is the first element corresponding to the target data segment "Alice" in the first array. The hash value of "Alice" is then calculated to be 3, so "3" is the subscript of the first element "1" in the first array, that is, it gives the position of the first element "1" in the first array. The element with subscript "3" in the first array is therefore updated to the first element "1". Optionally, the subscripts of the first array may start counting from 0.
S302, updating a second array in the inverted index table corresponding to the field to which the target data segment belongs according to the array elements contained in the first array and the length value of the encoding result of the target data segment.
Specifically, the first element obtained in step 301 is determined as the subscript of the second element corresponding to the target data segment in the second array. And determining a second element according to the length values of the coding results of the other data segments contained in the third array in the inverted index table and the number of the other data segments. And finally, updating the second array according to the second element and the subscript of the second element. Alternatively, the subscript of the second array may start counting from 0.
Continuing with the example of "Alice" as the target data segment, the update process of the second array in the inverted index table is described with reference to (b) in fig. 5:
The first element "1" of the target data segment "Alice" in the first array is determined as the subscript of the second element of the target data segment "Alice" in the second array.
Next, as can be seen from the inverted index table shown in fig. 5 (a), before the target data segment "Alice" is updated into the second array, the third array in the inverted index table already stores the encoding result of the one data segment (Bob) corresponding to the user ID field of record data 0, together with the length value 3 of that encoding result. That is, the positions with subscripts 0 to 3 in the third array are already filled with the length value "3" and the encoding result "Bob". Record data 0 was written before record data 1 and is: doc ID = 0; user ID: Bob; Position: BuildingB1, StreetB2; order ID: 80; Time: T0. As can also be seen from fig. 5 (a), the hash value of the data segment "Bob" is 7, the identification information of record data 0 is 0, and the hash value of that identification information is also 0.
At this time, the sum of the number of data segments corresponding to the user ID in record data 0 (the number is 1) and the length value of the encoding result of each of those data segments (the length value is 3), namely 4, may be determined as the second element of the target data segment "Alice" of record data 1 in the second array. That is, in the third array, the length value and the encoding result of "Alice" are filled in starting from the position with subscript 4.
Finally, the second element "4" is updated in the second array at the position with the subscript "1".
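The arithmetic behind the second element can be written out as a short sketch. The function name `next_start_offset` is hypothetical; the rule it implements is the one stated above, namely that each stored data segment occupies one position for its length value plus one position per unit of its encoding result.

```python
def next_start_offset(encoded_lengths):
    """Start subscript in the third array for the next data segment.

    encoded_lengths: length values of the encoding results already stored.
    Each stored segment occupies 1 position (its length value) plus `length`
    positions (its encoding result), so the next free subscript is the number
    of stored segments plus the sum of their length values.
    """
    return len(encoded_lengths) + sum(encoded_lengths)


# Only "Bob" (length value 3) is stored before "Alice" is written.
offset_for_alice = next_start_offset([3])
# After "Alice" (length value 5) has been stored as well.
offset_after_alice = next_start_offset([3, 5])
```

For the worked example this gives 1 + 3 = 4 as the second element of "Alice"; a further data segment would start at subscript 2 + 8 = 10.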
For ease of understanding, in the above description, the encoding result of the data segment contained in the third array may be the same as the original data segment, but in practice, the two may be different.
S303, updating a third array in the inverted index table corresponding to the field to which the target data segment belongs according to the array elements contained in the second array and the encoding result of the target data segment.
Specifically, the second element of the target data segment in the second array is determined as the subscript of the third element of the first type corresponding to the target data segment in the third array. Starting from that subscript, the third array is updated with the length value of the encoding result of the target data segment as the third element of the first type, and with the encoding result of the target data segment as the third element of the second type. That is, the third element of the first type indicates the length value of the encoding result of the target data segment, and the third element of the second type indicates the encoding result itself. Optionally, the subscripts of the third array may start counting from 0.
Continuing with the example of "Alice" as the target data segment, the update process of the third array in the inverted index table is described with reference to (b) in fig. 5:
The second element "4" of the target data segment "Alice" in the second array is determined as the subscript of the third element of the first type of "Alice" in the third array. The length value "5" of the encoding result of "Alice" is determined as the third element of the first type, and the encoding result "Alice" is determined as the third element of the second type. The two types of third elements are then filled into the third array in sequence, which completes the update of the third array.
S304, establishing an association relationship between array elements contained in the third array and the identification information of the record data to which the target data segment belongs, to obtain a fourth array in the inverted index table corresponding to the field to which the target data segment belongs.
Based on the updated third array, an association relationship between an array element in the third array and the identification information of the record data to which the target data segment belongs may be further established. Specifically, an association relationship between the third element of the first type and the identification information may be established, and this association relationship is referred to as the fourth array in the inverted index table.
Continuing to take "Alice" as an example of the target data segment, the updating process of the fourth array in the inverted index table is described with reference to (b) in fig. 5:
An association relationship is established between the third element "5" of the first type of the target data segment "Alice" in the third array and the identification information "1" of the record data to which "Alice" belongs, which completes the update of the fourth array.
After the target data segment "Alice" is updated to the inverted index table corresponding to the user ID according to the above steps, the inverted index table may be as shown in (b) of fig. 5.
Based on the above description, the multi-layer arrays constituting the inverted index table may have the following relationships: the first element of the target data segment in the first array is the subscript of the second element of that data segment in the second array, and the second element of the target data segment in the second array is the subscript of the third element of the first type of that data segment in the third array.
Optionally, in practice, several pieces of record data written in succession may yield identical data segments after division. For example, record data 2 written at time T2 is: doc ID = 2; user ID: Alice.Anderson; Position: BuildingA3, StreetA4; order ID: 100; Time: T2. The text-type data segment Alice is again obtained by division, and it can still be updated into the first to fourth arrays forming the inverted index table in the manner described above; the updated inverted index table may be as shown in fig. 6. However, because the hash value calculated for "Alice" in record data 2 is 3, which conflicts with the element already at subscript 3 in the first array, the identification information of record data 2 is updated into the fourth array of the inverted index table.
In this embodiment, for the data segment of the text type included in the recorded data, each array in the multi-layer arrays may be updated in sequence, thereby completing the update of the entire inverted index table. And the inverted index table with the structure can be directly read in the memory.
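Steps S301 to S304, together with the repeated-segment case of record data 2, can be sketched as follows. This is an illustration that follows the worked example literally, not a definitive implementation. Several points are assumptions: the class name `InvertedIndexSketch`, the array size of 8, the lookup table `TOY_HASH` (which simply reuses the hash values 7 and 3 quoted in the example), the hash of a doc ID being the doc ID itself, the encoding result being the original characters, and the fourth array being modelled as a mapping from the subscript of the length entry in the third array to a list of doc IDs. A conflict between two different data segments is not described for the inverted index table and is therefore left unhandled.

```python
# Hash values quoted in the worked example (illustrative, not a real hash).
TOY_HASH = {"Bob": 7, "Alice": 3}


class InvertedIndexSketch:
    def __init__(self, slots=8):
        self.first = [None] * slots   # subscript: segment hash, element: hash of doc ID
        self.second = [None] * slots  # subscript: first element, element: subscript in third
        self.third = []               # per segment: length value, then encoding result
        self.fourth = {}              # subscript of length entry -> associated doc IDs

    def _segment_at(self, offset):
        length = self.third[offset]
        return "".join(self.third[offset + 1: offset + 1 + length])

    def add(self, doc_id, segment):
        slot = TOY_HASH[segment]
        if self.first[slot] is not None:
            # Slot already taken: the same segment was written before, so only
            # the doc ID is added to the fourth array.
            offset = self.second[self.first[slot]]
            if self._segment_at(offset) != segment:
                raise NotImplementedError("conflict between different segments")
            self.fourth[offset].append(doc_id)
            return
        first_elem = doc_id                # example: hash(doc ID) equals doc ID
        self.first[slot] = first_elem      # S301
        offset = len(self.third)           # segments stored + their length values
        self.second[first_elem] = offset   # S302
        self.third.append(len(segment))    # S303: third element of the first type
        self.third.extend(segment)         # S303: third element of the second type
        self.fourth[offset] = [doc_id]     # S304

    def doc_ids(self, segment):
        slot = TOY_HASH[segment]
        if self.first[slot] is None:
            return []
        return list(self.fourth[self.second[self.first[slot]]])


index = InvertedIndexSketch()
for doc_id, segment in [(0, "Bob"), (1, "Alice"), (2, "Alice")]:
    index.add(doc_id, segment)
```

After the three writes, looking up "Alice" walks the first, second, and third arrays to subscript 4 and returns the doc IDs 1 and 2 from the fourth array, while the third array still holds "Alice" only once.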
Optionally, similar to the embodiment shown in fig. 4, for the text-type target data segment, it can also be updated into the forward index table, and the updating process can be as shown in fig. 7. The update process of the forward index table can also be understood in conjunction with fig. 8. The updating process may specifically include the steps of:
S401, updating a seventh array in the forward index table corresponding to the field to which the target data segment belongs according to the identification information of the record data to which the target data segment belongs and the hash value of the target data segment.
Specifically, according to the identification information of the record data to which the target data segment belongs, a seventh element corresponding to the target data segment in the seventh array is determined. The subscript of the seventh element in the seventh array is determined according to the hash value of the target data segment. Finally, the seventh array is updated according to the seventh element and the subscript of the seventh element. Optionally, the subscripts of the seventh array may start counting from 0.
Continuing with the example shown in fig. 1 and fig. 3, record data 1 is: doc ID = 1; user ID: Alice.Anderson; Position: BuildingA1, StreetA2; order ID: 100; Time: T1. Taking "Alice" as an example of the target data segment, the update process of the seventh array in the forward index table is described with reference to (b) in fig. 8:
The identification information of the record data to which "Alice" belongs is determined to be 1, so "1" is the seventh element corresponding to the target data segment "Alice" in the seventh array. Then, the hash value of "Alice" is calculated to be 3, so "3" is the position of the seventh element "1" in the seventh array. The position with the subscript "3" in the seventh array is updated with the seventh element "1".
S402, updating an eighth array in the forward index table corresponding to the field to which the target data segment belongs according to the array elements contained in the seventh array and the length value of the encoding result of the target data segment.
Specifically, the seventh element obtained in step S401 is determined as the subscript of the eighth element corresponding to the target data segment in the eighth array. The eighth element is determined according to the length values of the encoding results of the other data segments contained in the ninth array in the forward index table and the number of those other data segments. Finally, the eighth array is updated according to the eighth element and the subscript of the eighth element. Optionally, the subscripts of the eighth array may start counting from 0.
Continuing with the example of "Alice" as the target data segment, the update process of the eighth array in the forward index table is described with reference to (b) in fig. 8:
The seventh element "1" of the target data segment "Alice" in the seventh array is determined as the subscript of the eighth element of the target data segment "Alice" in the eighth array.
Next, as can be seen from the forward index table shown in fig. 8 (a), before the target data segment "Alice" is updated into the eighth array, the ninth array in the forward index table already stores the encoding result of the one data segment (Bob) corresponding to the user ID of record data 0, together with the length value 3 of that encoding result. That is, the positions with subscripts 0 to 3 in the ninth array are already filled with the length value "3" and the encoding result "Bob". Record data 0 was written before record data 1 and is: doc ID = 0; user ID: Bob; Position: BuildingB1, StreetB2; order ID: 80; Time: T0.
The sum of the number of data segments corresponding to the user ID in record data 0 (the number is 1) and the length value of the encoding result of each of those data segments (the length value is 3), namely 4, may be determined as the eighth element of the target data segment "Alice" of record data 1 in the eighth array. That is, in the ninth array, the length value and the encoding result of "Alice" are filled in starting from the position with subscript 4.
Finally, the eighth array is updated with the eighth element "4" at the position with the subscript "1".
For ease of understanding, in the above description, the encoding result of the data segment included in the ninth array may be the same as the original data segment, but in practice, the two may be different.
S403, updating a ninth array in the forward index table corresponding to the field to which the target data segment belongs according to the array elements contained in the eighth array and the encoding result of the target data segment.
Specifically, the eighth element of the target data segment in the eighth array is determined as the subscript of the ninth element of the first type corresponding to the target data segment in the ninth array. Starting from that subscript, the ninth array is updated with the length value of the encoding result of the target data segment as the ninth element of the first type, and with the encoding result of the target data segment as the ninth element of the second type. That is, the ninth element of the first type indicates the length value of the encoding result of the target data segment, and the ninth element of the second type indicates the encoding result itself.
Continuing to take "Alice" as an example of the target data segment, the updating process of the ninth array in the forward index table is described with reference to (b) in fig. 8:
The eighth element "4" of the target data segment "Alice" in the eighth array is determined as the subscript of the ninth element of the first type of "Alice" in the ninth array. The length value "5" of the encoding result of "Alice" is determined as the ninth element of the first type, and the encoding result "Alice" is determined as the ninth element of the second type. The two types of ninth elements are then filled into the ninth array in sequence, which completes the update of the ninth array.
According to the above-described updating method, the relationship between the multi-layer arrays constituting the forward index table is: the seventh element of the target data segment in the seventh array is the subscript of the eighth element of that data segment in the eighth array, and the eighth element of the target data segment in the eighth array is the subscript of the ninth element of the first type of that data segment in the ninth array.
Optionally, in practice, several pieces of record data written in succession may also yield identical data segments after division. For example, record data 2 written at time T2 is: doc ID = 2; user ID: Alice.Anderson; Position: BuildingA3, StreetA4; order ID: 100; Time: T2. The target data segment "Alice" can still be updated into the seventh to ninth arrays in the above manner, and the updated forward index table may be as shown in fig. 9. However, because the hash value calculated for "Alice" in record data 2 is 3, which conflicts with the element already at subscript 3 in the seventh array, the hash value of "Alice" is increased by 1 to become 4, and the target data segment "Alice" of record data 2 is then updated into the forward index table in the manner described above.
In this embodiment, for the data segment of the text type included in the record data, each array in the multi-layer arrays may be sequentially updated, so as to complete the updating of the entire forward index table. And the forward index table with the structure can be directly read in the memory.
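Steps S401 to S403 and the conflict handling described for record data 2 can be sketched in the same style. Again this is only an illustration under stated assumptions: the class name `ForwardIndexSketch`, the array size of 8, the lookup table `TOY_HASH` reusing the hash values from the example, the encoding result being the original characters, and the wrap-around at the end of the seventh array are hypothetical. The "add 1 on conflict" rule is taken from the description above.

```python
# Hash values quoted in the worked example (illustrative, not a real hash).
TOY_HASH = {"Bob": 7, "Alice": 3}


class ForwardIndexSketch:
    def __init__(self, slots=8):
        self.seventh = [None] * slots  # subscript: segment hash, element: doc ID
        self.eighth = [None] * slots   # subscript: doc ID, element: subscript in ninth
        self.ninth = []                # per segment: length value, then encoding result

    def add(self, doc_id, segment):
        slot = TOY_HASH[segment]
        # On a conflict, add 1 to the hash value until a free position is found.
        for _ in range(len(self.seventh)):
            if self.seventh[slot] is None:
                break
            slot = (slot + 1) % len(self.seventh)  # wrap-around is an assumption
        else:
            raise RuntimeError("seventh array is full")
        self.seventh[slot] = doc_id        # S401
        offset = len(self.ninth)           # segments stored + their length values
        self.eighth[doc_id] = offset       # S402
        self.ninth.append(len(segment))    # S403: length value
        self.ninth.extend(segment)         # S403: encoding result
        return slot

    def segment_of(self, doc_id):
        offset = self.eighth[doc_id]
        length = self.ninth[offset]
        return "".join(self.ninth[offset + 1: offset + 1 + length])


forward = ForwardIndexSketch()
slots_used = [forward.add(d, s) for d, s in [(0, "Bob"), (1, "Alice"), (2, "Alice")]]
```

Unlike the inverted index sketch, the repeated segment "Alice" of record data 2 receives its own position (subscript 4 in the seventh array) and its own copy in the ninth array, so that the segment can be read back per doc ID.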
As can be seen from the combination of the embodiments shown in fig. 4 and fig. 7, the forward index table and the reverse index table have similar updating processes. The difference is that the hash value of the identification information of the record data to which the target data segment belongs is used when the first array in the reverse index table is updated, whereas the identification information itself is used directly when the seventh array in the forward index table is updated.
Although the storage structures of the forward index table and the reverse index table are both multi-layer arrays, the relationships between the layers of arrays are different, and the specific structural difference between the forward index table and the reverse index table can be understood by comparing fig. 8 and fig. 5.
Optionally, if the target data segment is a numeric data segment, the process of updating the target data segment into the inverted index table may be described as follows: the association relationship between the identification information of the record data to which the target data segment belongs and the target data segment is updated into a fifth array in the inverted index table.
Continuing with the above example, record data 1 is: doc ID = 1; user ID: Alice.Anderson; Position: BuildingA1, StreetA2; order ID: 100; Time: T1.
Dividing the fields of record data 1 yields several numeric data segments, including 100 and T1. "100" belongs to one field and is updated into one inverted index table, while "T1" is updated into another inverted index table. Taking "100" as an example of the target data segment, the update process of the fifth array in the inverted index table is as follows: an association relationship between 100 and doc ID = 1 is established.
Optionally, the inverted index table corresponding to the field to which the numeric target data segment belongs may further include a sixth array, configured to store the extreme values (the maximum value and the minimum value) of the data segments of the same field in the record data written within a preset time period.
And if the numerical value of the target data segment is larger than the maximum value contained in the sixth array in the inverted index table, updating the maximum value in the sixth array according to the target data segment, otherwise, not updating the maximum value in the sixth array.
And if the numerical value of the target data segment is smaller than the minimum value contained in the sixth array in the inverted index table, updating the minimum value in the sixth array according to the target data segment, otherwise, not updating the minimum value in the sixth array.
For example, before record data 1 is written, the already written record data 0 is: doc ID = 0; user ID: Bob; Position: BuildingB1, StreetB2; order ID: 80; Time: T0. At this time, as shown in (a) of fig. 10, before record data 1 is written, the maximum value is 80 and the minimum value is 1 in the sixth array included in the inverted index table corresponding to the order ID. After record data 1 (doc ID = 1; user ID: Alice.Anderson; Position: BuildingA1, StreetA2; order ID: 100; Time: T1) is written, the maximum value included in the sixth array is updated to 100 and the minimum value is unchanged, as shown in (b) of fig. 10.
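The update of the fifth and sixth arrays for a numeric data segment can be sketched as follows. The function name `update_numeric_inverted` and the use of dictionaries for both arrays are hypothetical. The starting state (maximum 80, minimum 1) is taken as stated for (a) of fig. 10 rather than derived from record data 0.

```python
def update_numeric_inverted(fifth, sixth, doc_id, value):
    """Update the inverted index of one numeric field with one data segment."""
    # Fifth array: association between the data segment and the doc ID.
    fifth.setdefault(value, []).append(doc_id)
    # Sixth array: replace an extreme value only if the new value exceeds it.
    if value > sixth["max"]:
        sixth["max"] = value
    if value < sixth["min"]:
        sixth["min"] = value


# State of the order ID field before record data 1 is written.
fifth = {80: [0]}
sixth = {"min": 1, "max": 80}

# Writing record data 1 (doc ID = 1, order ID: 100).
update_numeric_inverted(fifth, sixth, doc_id=1, value=100)
```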
Optionally, for a numeric target data segment, the process of updating it into the forward index table may be described as follows: a key-value pair is established from the identification information of the record data to which the target data segment belongs and the target data segment, and the forward index table is updated according to the key-value pair. That is, a mapping relationship between the identification information of the record data to which the target data segment belongs and the target data segment is established.
Continuing with the above example, record data 1 is: doc ID = 1; user ID: Alice.Anderson; Position: BuildingA1, StreetA2; order ID: 100; Time: T1.
Dividing the fields of record data 1 yields several numeric data segments, including 100 and T1. "100" belongs to one field and is updated into one forward index table, while "T1" is updated into another forward index table. Taking "100" as an example of the target data segment, the updating process of the forward index table is as follows: "doc ID = 1" is taken as the Key and "100" as the Value, and the two constitute a Key-Value pair, that is, a mapping relationship between the two is established. The updating process of the forward index table can be understood in conjunction with fig. 11.
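A minimal sketch of this Key-Value update, assuming a plain dictionary per numeric field (the names `update_numeric_forward` and `order_id_forward` are hypothetical):

```python
def update_numeric_forward(forward, doc_id, value):
    """Store the numeric data segment under the doc ID of its record."""
    forward[doc_id] = value  # Key: doc ID, Value: the data segment


# Forward index of the order ID field after record data 0 was written.
order_id_forward = {0: 80}

# Writing record data 1 (doc ID = 1, order ID: 100).
update_numeric_forward(order_id_forward, doc_id=1, value=100)
```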
Based on the provided mode, different types of data fragments can be updated to the forward index table and the reverse index table corresponding to the fields to which the data fragments belong. And the real-time written record data can be updated to the forward index table and the reverse index table stored in the memory in real time. And then responding to the search operation, the processor can also realize real-time search of the recorded data based on the updated forward index table and reverse index table in the memory and the second index table in the disk.
Optionally, after the forward index table and the reverse index table corresponding to the text-type data segment are updated, the order of the elements in the first array in the reverse index table may be readjusted according to a preset rule, for example, in alphabetical order. Since Alice comes before Bob alphabetically, the inverted index table shown in fig. 5 (b) is adjusted to the inverted index table shown in fig. 12. Similarly, the forward index table may also be adjusted as described above. In the actual search process, binary search is often used, and adjusting the forward index table and the reverse index table in this way can improve search efficiency.
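The benefit of the reordering can be illustrated with Python's standard `bisect` module. The sketch below is a simplification: it sorts a plain list of data segments rather than the first array of fig. 12, and the function name `find_segment` is hypothetical.

```python
from bisect import bisect_left


def find_segment(sorted_segments, segment):
    """Binary search; returns the position of the segment, or -1 if absent."""
    i = bisect_left(sorted_segments, segment)
    if i < len(sorted_segments) and sorted_segments[i] == segment:
        return i
    return -1


# Segments in write order ("Bob" was written first), then reordered alphabetically.
segments_in_write_order = ["Bob", "Alice"]
sorted_segments = sorted(segments_in_write_order)
```

Binary search needs a number of comparisons that grows logarithmically with the number of entries, but it is only valid on the sorted order, which is why the adjustment is performed after the update.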
Optionally, based on the first index table and the second index table obtained above, in an actual search process, when performing a search with doc ID as a keyword, the forward index table may be directly used for a search, and corresponding record data may be obtained. When a certain text type data segment in other fields of non-doc ID is used as a keyword for searching, the doc ID containing the keyword can be found by using the reverse index table, and then the record data with the doc ID and the position of the keyword in the record data can be found by using the forward index table.
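The two search paths can be sketched as follows. Dictionaries stand in for the forward and reverse index tables, the names `FORWARD`, `INVERTED`, `search_by_doc_id`, and `search_by_keyword` are hypothetical, and locating the position of the keyword inside the record is omitted.

```python
# Forward index: doc ID -> record data (records 0 and 1 of the example).
FORWARD = {
    0: {"user ID": "Bob", "Position": "BuildingB1, StreetB2",
        "order ID": 80, "Time": "T0"},
    1: {"user ID": "Alice.Anderson", "Position": "BuildingA1, StreetA2",
        "order ID": 100, "Time": "T1"},
}

# Reverse index per field: text-type data segment -> doc IDs.
INVERTED = {
    "user ID": {"Bob": [0], "Alice": [1], "Anderson": [1]},
    "Position": {"BuildingB1": [0], "StreetB2": [0],
                 "BuildingA1": [1], "StreetA2": [1]},
}


def search_by_doc_id(doc_id):
    """Keyword is a doc ID: the forward index answers directly."""
    return FORWARD.get(doc_id)


def search_by_keyword(field, keyword):
    """Keyword is a text segment: reverse index first, then forward index."""
    doc_ids = INVERTED.get(field, {}).get(keyword, [])
    return [(doc_id, FORWARD[doc_id]) for doc_id in doc_ids]


hits = search_by_keyword("user ID", "Alice")
```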
The embodiment of the invention provides a structural schematic diagram of a search system. As shown in fig. 2, the system may specifically include: disk, memory, and processor. In practice, the corresponding hardware representation of the system may be a server.
The memory is used for storing the first index table. The disk is used for storing a second index table, and the second index table and the first index table have different storage structures.
The processor is used for writing the record data into the memory in response to the write operation; updating the first index table according to the record data in response to the write operation; and, in response to the search operation, determining a search result according to the first index table and the second index table by means of a preset interface supporting different storage structures.
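How a preset interface can hide two different storage structures from the search logic may be sketched as follows. Everything here is an assumption made for illustration: the abstract class `IndexReader`, the classes `MemoryIndex` and `DiskIndex`, and the function `search` are hypothetical names, the in-memory table is a dictionary, and the persisted table is simulated by an immutable sorted list instead of a real on-disk file.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left


class IndexReader(ABC):
    """Hypothetical preset interface shared by both storage structures."""

    @abstractmethod
    def lookup(self, term):
        """Return the doc IDs associated with the term."""


class MemoryIndex(IndexReader):
    """Stand-in for the first index table: updated in place on every write."""

    def __init__(self):
        self._postings = {}

    def update(self, doc_id, term):
        self._postings.setdefault(term, []).append(doc_id)

    def lookup(self, term):
        return list(self._postings.get(term, []))


class DiskIndex(IndexReader):
    """Stand-in for the second index table: immutable, sorted by term."""

    def __init__(self, entries):
        self._entries = sorted(entries)  # (term, doc IDs) pairs
        self._terms = [term for term, _ in self._entries]

    def lookup(self, term):
        i = bisect_left(self._terms, term)
        if i < len(self._terms) and self._terms[i] == term:
            return list(self._entries[i][1])
        return []


def search(readers, term):
    """Search logic sees only the interface, not the storage structure."""
    hits = []
    for reader in readers:
        hits.extend(reader.lookup(term))
    return sorted(set(hits))


disk = DiskIndex([("Bob", [0])])  # record data 0, already persisted
memory = MemoryIndex()
memory.update(1, "Alice")         # record data 1, just written
memory.update(2, "Alice")         # record data 2, just written
```

Because `search` depends only on `lookup`, newly written record data becomes searchable as soon as the in-memory table is updated, without waiting for persistence to the disk.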
The specific operation of each part of the system can be referred to the related description in the embodiment shown in fig. 1. Optionally, a data writing thread and a searching thread are further specifically established in the processor, and specific working processes and working timings of the two threads may refer to the related description in the embodiments shown in fig. 1 to 12, which is not described herein again.
In addition, other contents not described in detail in this embodiment may also refer to the related descriptions in the embodiments shown in fig. 1 to fig. 12, and are not described again here.
The search apparatus of one or more embodiments of the present invention will be described in detail below. Those skilled in the art will appreciate that these search apparatuses may be constructed from commercially available hardware components configured according to the steps taught in this solution.
Fig. 13 is a schematic structural diagram of a search apparatus according to an embodiment of the present invention. As shown in fig. 13, the search apparatus includes:
a writing module 11, configured to write the record data into the memory in response to a write operation;
an updating module 12, configured to update the first index table in the memory according to the record data in response to the write operation;
a search module 13, configured to determine, in response to a search operation, a search result according to the first index table and a second index table in the disk by means of a preset interface supporting different storage structures, where the second index table and the first index table have different storage structures.
Optionally, the record data comprises at least one field; the update module 12 specifically includes:
a dividing unit 121, configured to divide data included in at least one field of the record data into at least one data fragment.
An updating unit 122, configured to update the first index table corresponding to each field to which the at least one data fragment belongs according to the data type of each of the at least one data fragment.
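The division and per-type update performed by the dividing unit 121 and the updating unit 122 might look like the following sketch. The tokenization rule (lower-cased word characters), the record layout with a `doc_id` key, and the dictionary-based index tables are assumptions chosen for illustration; the embodiments do not prescribe them.

```python
import re
from typing import Any, Dict, List, Tuple


def split_field(value: Any) -> List[Tuple[str, Any]]:
    """Divide one field's data into (data type, data segment) pairs.

    Assumption: numbers form a single numerical segment, and text is
    split into lower-cased word tokens.
    """
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return [("numeric", value)]
    return [("text", token) for token in re.findall(r"\w+", str(value).lower())]


def update_index(record: Dict[str, Any],
                 text_index: Dict[str, Dict[str, List[int]]],
                 numeric_index: Dict[str, List[Tuple[Any, int]]]) -> None:
    """Update the per-field index table that matches each segment's data type."""
    doc_id = record["doc_id"]  # hypothetical name for the identification information
    for field, value in record.items():
        if field == "doc_id":
            continue
        for kind, segment in split_field(value):
            if kind == "text":
                text_index.setdefault(field, {}).setdefault(segment, []).append(doc_id)
            else:
                numeric_index.setdefault(field, []).append((segment, doc_id))
```

Each field keeps its own index table, so a text field and a numerical field of the same record are updated independently.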
Optionally, the data segment includes text-type data, and the first index table includes an inverted index table with a storage structure of a multi-layer array;
the updating unit 122 is specifically configured to: updating a first array in an inverted index table corresponding to a field to which the target data segment belongs according to the hash value of the identification information of the recorded data to which the target data segment belongs and the hash value of the target data segment, wherein the target data segment is any one of the at least one data segment;
updating a second array in the inverted index table corresponding to the field to which the target data fragment belongs according to array elements contained in the first array and the length value of the encoding result of the target data fragment;
updating a third array in the inverted index table corresponding to the field to which the target data segment belongs according to array elements contained in the second array and the encoding result of the target data segment;
and establishing an association between the array elements contained in the third array and the identification information of the record data to which the target data segment belongs, to obtain a fourth array in the inverted index table corresponding to the field to which the target data segment belongs.
Optionally, the updating unit 122 is specifically configured to: determining a first element corresponding to the target data segment in the first array according to the hash value of the identification information of the record data to which the target data segment belongs;
determining a subscript of the first element in the first array according to the hash value of the target data segment;
and updating the first array according to the first element and the subscript of the first element.
Optionally, the updating unit 122 is specifically configured to: determining the first element as a subscript of a corresponding second element of the target data segment in the second array;
determining the second element according to the length values of the coding results of other data segments already contained in the third array in the inverted index table and the number of the other data segments;
and updating the second array according to the second element and the subscript of the second element.
Optionally, the updating unit 122 is specifically configured to: determining the second element as a subscript of a third element of the first type corresponding to the target data segment in the third array;
updating the third array, according to the subscript of the third element of the first type, by taking the length value of the encoding result of the target data segment as the third element of the first type;
and taking the encoding result of the target data segment as a third element of a second type, and updating the third array.
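The chaining of the four arrays, in which an element of one array serves as the subscript into the next, can be sketched as below. Several points are assumptions made for the example rather than details of the embodiments: the first element is simplified to a sequential ordinal for the data segment, hash collisions are resolved by linear probing, CRC32 is used as the hash function, UTF-8 is used as the encoding, and the class and method names are hypothetical.

```python
import zlib
from typing import List


class InvertedIndexSketch:
    """Toy multi-layer array layout; assumes fewer distinct segments than buckets."""

    def __init__(self, buckets: int = 64) -> None:
        self.first: List[int] = [-1] * buckets  # subscript from segment hash -> first element
        self.second: List[int] = []             # first element -> offset into third array
        self.third: List[int] = []              # [length, byte, byte, ..., length, byte, ...]
        self.fourth: List[List[int]] = []       # per segment: associated doc IDs

    def _read(self, ordinal: int) -> bytes:
        offset = self.second[ordinal]
        length = self.third[offset]             # third element of the first type
        return bytes(self.third[offset + 1: offset + 1 + length])

    def _slot(self, encoded: bytes) -> int:
        # Subscript in the first array, derived from the hash of the segment.
        i = zlib.crc32(encoded) % len(self.first)
        while self.first[i] != -1 and self._read(self.first[i]) != encoded:
            i = (i + 1) % len(self.first)       # linear probing (assumption)
        return i

    def add(self, doc_id: int, segment: str) -> None:
        encoded = segment.encode("utf-8")
        slot = self._slot(encoded)
        if self.first[slot] == -1:
            self.first[slot] = len(self.second)
            # Offset equals the summed lengths of the segments already stored
            # plus their count (one length slot per stored segment).
            self.second.append(len(self.third))
            self.third.append(len(encoded))     # first type: length value
            self.third.extend(encoded)          # second type: encoding result
            self.fourth.append([])
        postings = self.fourth[self.first[slot]]
        if doc_id not in postings:
            postings.append(doc_id)

    def lookup(self, segment: str) -> List[int]:
        slot = self._slot(segment.encode("utf-8"))
        return [] if self.first[slot] == -1 else list(self.fourth[self.first[slot]])
```

After adding "apple" and then "pie", the second array holds the offsets 0 and 6: the second offset is the length of the first encoding (5) plus the number of segments already stored (1).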
Optionally, the data segment includes numerical data; the first index table comprises an inverted index table;
the updating unit 122 is specifically configured to: update the association between the identification information of the record data to which the target data segment belongs and the target data segment into a fifth array in the inverted index table, wherein the target data segment is any one of the at least one data segment.
Optionally, the updating unit 122 is specifically configured to: if the numerical value of the target data segment is larger than the maximum value contained in a sixth array in the inverted index table, updating the maximum value in the sixth array according to the target data segment;
or,
and if the value of the target data segment is smaller than the minimum value contained in a sixth array in the inverted index table, updating the minimum value in the sixth array according to the target data segment.
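For numerical data, the fifth and sixth arrays can be pictured as follows. The sketch assumes that the fifth array stores (value, doc ID) pairs and that the sixth array stores the minimum and maximum seen so far; the use of the sixth array to skip a range query early is likewise an assumption about how such bounds might be exploited, and the names are hypothetical.

```python
from typing import List, Optional, Tuple


class NumericIndexSketch:
    def __init__(self) -> None:
        # Fifth array: association between each value and its doc ID.
        self.fifth: List[Tuple[float, int]] = []
        # Sixth array: [minimum, maximum] of the values indexed so far.
        self.sixth: List[Optional[float]] = [None, None]

    def add(self, doc_id: int, value: float) -> None:
        self.fifth.append((value, doc_id))
        low, high = self.sixth
        if low is None or value < low:
            self.sixth[0] = value   # value below the current minimum
        if high is None or value > high:
            self.sixth[1] = value   # value above the current maximum

    def range_query(self, low: float, high: float) -> List[int]:
        lo, hi = self.sixth
        # Assumed use of the bounds: skip the scan if the range cannot match.
        if lo is None or high < lo or low > hi:
            return []
        return [doc_id for value, doc_id in self.fifth if low <= value <= high]
```

Keeping the bounds separately means a query whose range lies entirely outside them never needs to scan the fifth array.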
Optionally, the data segment includes text type data, and the first index table includes a forward index table with a multi-layer array storage structure;
the updating unit 122 is specifically configured to: updating a seventh array in a forward index table corresponding to a field to which the target data segment belongs according to the identification information of the recorded data to which the target data segment belongs and the hash value of the target data segment, wherein the target data segment is any one of the at least one data segment;
updating an eighth array in the forward index table corresponding to the field to which the target data segment belongs according to array elements contained in the seventh array and the length value of the encoding result of the target data segment;
and updating a ninth array in the forward index table corresponding to the field to which the target data segment belongs according to the array elements contained in the eighth array and the encoding result of the target data segment.
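The forward index table can be sketched with the same subscript chaining. The simplifications here are assumptions: one text segment per record, the doc ID used directly as the subscript into the seventh array, cumulative byte offsets in the eighth array, and UTF-8 as the encoding. The hash value of the data segment mentioned above is omitted from the sketch, and the class name is hypothetical.

```python
from typing import List, Optional


class ForwardIndexSketch:
    def __init__(self) -> None:
        self.seventh: List[int] = []   # doc ID -> subscript into the eighth array
        self.eighth: List[int] = [0]   # entry k starts at eighth[k], ends at eighth[k + 1]
        self.ninth = bytearray()       # concatenated encoding results

    def add(self, doc_id: int, segment: str) -> None:
        encoded = segment.encode("utf-8")
        while len(self.seventh) <= doc_id:
            self.seventh.append(-1)    # -1 marks "no entry for this doc ID"
        self.seventh[doc_id] = len(self.eighth) - 1
        self.ninth.extend(encoded)
        # The length of the encoding result advances the next offset.
        self.eighth.append(len(self.ninth))

    def get(self, doc_id: int) -> Optional[str]:
        if doc_id >= len(self.seventh) or self.seventh[doc_id] == -1:
            return None
        k = self.seventh[doc_id]
        return self.ninth[self.eighth[k]: self.eighth[k + 1]].decode("utf-8")
```

A lookup by doc ID therefore costs two array reads and one slice, with no hashing of the stored text.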
The apparatus shown in fig. 13 can perform the method of the embodiment shown in fig. 1 to 12, and reference may be made to the related description of the embodiment shown in fig. 1 to 12 for a part not described in detail in this embodiment. The implementation process and technical effect of the technical solution refer to the descriptions in the embodiments shown in fig. 1 to fig. 12, which are not described herein again.
The internal functions and structures of the search apparatus are described above, and in one possible design, the structure of the search apparatus may be implemented as an electronic device, which may include: a processor 21 and a memory 22. Wherein the memory 22 is used for storing a program for supporting the electronic device to execute the searching method provided in the embodiments shown in fig. 1 to 12, and the processor 21 is configured to execute the program stored in the memory 22.
The program comprises one or more computer instructions which, when executed by the processor 21, are capable of performing the steps of:
writing record data into the memory in response to a write operation;
updating a first index table in the memory according to the record data in response to the write operation;
and, in response to a search operation, determining a search result according to the first index table and a second index table in the disk by means of a preset interface supporting different storage structures, wherein the second index table and the first index table have different storage structures.
Optionally, the processor 21 is further configured to perform all or part of the steps in the embodiments shown in fig. 1 to 12.
Optionally, the electronic device may further include a communication interface 23 in the structure, so that the electronic device communicates with other devices or a communication network.
In addition, an embodiment of the present invention provides a computer storage medium for storing computer software instructions for the electronic device, which includes a program for executing the search method in the method embodiments shown in fig. 1 to 12.
Based on the foregoing embodiments, and as shown in fig. 15, an embodiment of the present invention further provides another electronic device for text-type data segments contained in record data. The electronic device includes a processor 31 and a memory 32; the memory 32 stores index data corresponding to the record data in a preset storage structure; the preset storage structure comprises a first array, a second array, a third array and a fourth array.
The elements of the first array are obtained according to the hash value of the identification information of the record data to which the data segment belongs and the hash value of the data segment.
The elements of the second array are obtained according to the elements of the first array and the length value of the encoding result of the data segment.
The elements of the third array are obtained according to the elements of the second array and the encoding result of the data segment.
The fourth array records the association between the elements of the third array and the identification information of the record data to which the data segment belongs.
The preset storage structure of the index data may be specifically an inverted index table shown in fig. 5 or fig. 6.
The memory 32 also stores executable code that, when executed by the processor, causes the processor to perform all or some of the steps described above in the embodiments of fig. 1-12.
Optionally, the electronic device may further include a communication interface 33 in the structure, so that the electronic device can communicate with other devices or a communication network.
For text-type data segments contained in record data, and as shown in fig. 16, an embodiment of the present invention further provides an electronic device that includes a processor 41 and a memory 42, where the memory 42 stores index data corresponding to the record data in a preset storage structure; the preset storage structure comprises a first array, a second array and a third array.
The elements of the first array are obtained according to the identification information of the record data to which the data segment belongs and the hash value of the data segment.
The elements of the second array are obtained according to the elements of the first array and the length value of the encoding result of the data segment.
The elements of the third array are obtained according to the elements of the second array and the encoding result of the data segment.
The first to third arrays in this embodiment are also the seventh to ninth arrays in the embodiment shown in fig. 7. The preset storage structure of the index data provided by this embodiment may specifically be the forward index table shown in fig. 8 or fig. 9.
The memory 42 also stores executable code that, when executed by the processor 41, causes the processor 41 to perform all or some of the steps described above in the embodiments of fig. 1-12.
Optionally, the electronic device may further include a communication interface 43 in the structure, so that the electronic device can communicate with other devices or a communication network.
An embodiment of the invention provides a further electronic device for numerical data segments contained in record data. The electronic device comprises a processor and a memory, wherein the memory stores index data corresponding to the record data in a preset storage structure; the preset storage structure of the index data may specifically be the forward index table and the inverted index table provided in the embodiment shown in fig. 10 or fig. 11.
In addition, an embodiment of the present invention further provides a computer program product, where the computer program product includes: computer program/instructions, which, when executed by a processor, cause the processor to carry out the search method as in the method embodiment shown in fig. 1 to 12.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (13)

1. A method of searching, comprising:
responding to the write operation, and writing the record data containing at least one field into the memory;
in response to the write operation, dividing data contained in the at least one field into at least one data fragment;
updating a first index table corresponding to each of the at least one data segment in the memory according to the data type of each of the at least one data segment in the recorded data;
responding to a search operation, determining a search result according to the first index table and a second index table in the disk by means of preset interfaces supporting different storage structures, wherein the second index table and the first index table have different storage structures;
the first index table corresponding to the text type data segment comprises an inverted index table and a forward index table, wherein the storage structures of the inverted index table and the forward index table are both multilayer arrays; in the inverted index table and the forward index table, the element of the text type data segment in the previous array is a subscript of the element of the text type data segment in the next array.
2. The method of claim 1, wherein the data segments comprise textual type data;
the updating, according to the data type of each of the at least one data fragment, the first index table corresponding to each of the fields to which the at least one data fragment belongs includes:
updating a first array in an inverted index table corresponding to a field to which a target data segment belongs according to a hash value of identification information of recorded data to which the target data segment belongs and the hash value of the target data segment, wherein the target data segment is any one of the at least one data segment;
updating a second array in the inverted index table corresponding to the field to which the target data segment belongs according to array elements contained in the first array and the length value of the encoding result of the target data segment;
updating a third array in the inverted index table corresponding to the field to which the target data segment belongs according to array elements contained in the second array and the encoding result of the target data segment;
and establishing an incidence relation between array elements contained in the third array and the identification information of the record data to which the target data segment belongs to obtain a fourth array in the inverted index table corresponding to the field to which the target data segment belongs.
3. The method according to claim 2, wherein the updating the first array in the inverted index table corresponding to the field to which the target data segment belongs according to the hash value of the identification information of the record data to which the target data segment belongs and the hash value of the target data segment comprises:
determining a first element corresponding to the target data segment in the first array according to the hash value of the identification information of the record data to which the target data segment belongs;
determining a subscript of the first element in the first array according to the hash value of the target data segment;
and updating the first array according to the first element and the subscript of the first element.
4. The method according to claim 3, wherein the updating, according to the array elements included in the first array and the length value of the encoding result of the target data segment, the second array in the inverted index table corresponding to the field to which the target data segment belongs includes:
determining the first element as a subscript of a corresponding second element of the target data segment in the second array;
determining the second element according to the length value of the encoding result of other data segments already contained in the third array in the inverted index table and the number of the other data segments;
and updating the second array according to the second element and the subscript of the second element.
5. The method according to claim 4, wherein the updating the third array in the inverted index table corresponding to the field to which the target data fragment belongs according to the array elements included in the second array and the encoding result of the target data fragment comprises:
determining the second element as a subscript of a third element of the first type corresponding to the target data segment in the third array;
according to the subscript of the third element of the first type, taking the length value of the encoding result of the target data segment as the third element of the first type, and updating the third array;
and updating the third array by taking the encoding result of the target data segment as a third element of a second type.
6. The method of claim 1, wherein the data segments comprise numerical data; the first index table comprises an inverted index table;
the updating, according to the data type of each of the at least one data fragment, the first index table corresponding to each of the fields to which the at least one data fragment belongs includes:
and updating the association relationship between the identification information of the record data to which the target data segment belongs and the target data segment to a fifth array in the inverted index table, wherein the target data segment is any data segment in the at least one data segment.
7. The method of claim 6, further comprising:
if the numerical value of the target data segment is larger than the maximum value contained in a sixth array in the inverted index table, updating the maximum value in the sixth array according to the target data segment;
or,
and if the numerical value of the target data segment is smaller than the minimum value contained in a sixth array in the inverted index table, updating the minimum value in the sixth array according to the target data segment.
8. The method of claim 1, wherein the data segments comprise textual type data;
the updating, according to the data type of each of the at least one data fragment, the first index table corresponding to each of the fields to which the at least one data fragment belongs includes:
updating a seventh array in a forward index table corresponding to a field to which a target data fragment belongs according to identification information of recorded data to which the target data fragment belongs and a hash value of the target data fragment, wherein the target data fragment is any one of the at least one data fragment;
updating an eighth array in the forward index table corresponding to the field to which the target data fragment belongs according to the array elements contained in the seventh array and the length value of the encoding result of the target data fragment;
and updating a ninth array in the forward index table corresponding to the field to which the target data segment belongs according to the array elements contained in the eighth array and the encoding result of the target data segment.
9. A search system, comprising: a magnetic disk, a memory and a processor;
the memory is used for storing a first index table;
the disk is used for storing a second index table, and the second index table and the first index table have different storage structures;
the processor is used for responding to a write operation and writing record data containing at least one field into the memory; in response to the write operation, dividing data contained in the at least one field into at least one data fragment; updating the first index table corresponding to the at least one data fragment in the memory according to the data type of the at least one data fragment in the recorded data; responding to the search operation, and determining a search result according to the first index table and the second index table by means of preset interfaces supporting different storage structures;
the first index table corresponding to the text type data segment comprises an inverted index table and a forward index table, wherein the storage structures of the inverted index table and the forward index table are both multilayer arrays; in the inverted index table and the forward index table, the element of the text type data segment in the previous array is a subscript of the element of the text type data segment in the next array.
10. An electronic device, comprising: a memory and a processor; the memory stores index data corresponding to recorded data by a preset storage structure, and data segments contained in the recorded data are text-type data segments;
the preset storage structure comprises a first array, a second array, a third array and a fourth array;
the elements of the first array are obtained according to the hash value of the identification information of the record data to which the data segment belongs and the hash value of the data segment;
the elements of the second array are obtained according to the elements of the first array and the length value of the encoding result of the data fragment;
the elements of the third array are obtained according to the elements of the second array and the coding result of the data fragment;
the fourth array records the incidence relation between the elements of the third array and the identification information of the record data to which the data fragments belong;
the memory further stores executable code that, when executed by the processor, causes the processor to perform the search method of any one of claims 1 to 8.
11. An electronic device, comprising: a memory and a processor; the memory stores index data corresponding to recorded data by using a preset storage structure, data segments contained in the recorded data are numerical data, and the preset storage structure comprises a first array, a second array and a third array;
the elements of the first array are obtained according to the identification information of the recorded data to which the target data segment belongs and the hash value of the data segment;
the elements of the second array are obtained according to the elements of the first array and the length value of the coding result of the data segment;
the elements of the third array are obtained according to the elements of the second array and the coding result of the data fragment;
the memory further stores executable code that, when executed by the processor, causes the processor to perform the search method of any one of claims 1 to 8.
12. A non-transitory machine-readable storage medium having executable code stored thereon, which when executed by a processor of an electronic device, causes the processor to perform the search method of any one of claims 1 to 8.
13. A computer program product, comprising: computer program/instructions, wherein the computer program, when executed by a processor, causes the processor to implement the search method of any one of claims 1 to 8.
CN202111201085.6A 2021-10-15 2021-10-15 Search method, system, device, storage medium and computer program product Active CN113641780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111201085.6A CN113641780B (en) 2021-10-15 2021-10-15 Search method, system, device, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111201085.6A CN113641780B (en) 2021-10-15 2021-10-15 Search method, system, device, storage medium and computer program product

Publications (2)

Publication Number Publication Date
CN113641780A CN113641780A (en) 2021-11-12
CN113641780B CN113641780B (en) 2023-02-03

Family

ID=78427075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111201085.6A Active CN113641780B (en) 2021-10-15 2021-10-15 Search method, system, device, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN113641780B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118069590B (en) * 2024-04-22 2024-06-21 极限数据(北京)科技有限公司 Forward index processing method, device, medium and equipment for searching database

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104133781A (en) * 2013-05-03 2014-11-05 富鸿康科技(深圳)有限公司 Network storage equipment and method thereof for improving data access speed
WO2016192057A1 (en) * 2015-06-03 2016-12-08 华为技术有限公司 Updating method and device for index table
CN110109868B (en) * 2018-01-18 2023-07-18 伊姆西Ip控股有限责任公司 Method, apparatus and computer program product for indexing files
CN109726264B (en) * 2019-01-16 2022-02-25 北京百度网讯科技有限公司 Method, apparatus, device and medium for index information update

Also Published As

Publication number Publication date
CN113641780A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
US10417265B2 (en) High performance parallel indexing for forensics and electronic discovery
US7702609B2 (en) Adapting to inexact user input
US11294875B2 (en) Data storage on tree nodes
US10255363B2 (en) Refining search query results
US10592508B2 (en) Organizing datasets for adaptive responses to queries
CN110188100A (en) Data processing method, device and computer storage medium
CN110134681B (en) Data storage and query method and device, computer equipment and storage medium
US10496656B2 (en) Compressing time stamp columns
CN112100182A (en) Data warehousing processing method and device and server
CN113641780B (en) Search method, system, device, storage medium and computer program product
US11954086B2 (en) Index data structures and graphical user interface
CN107622090B (en) Object acquisition method, device and system
KR102153259B1 (en) Data domain recommendation method and method for constructing integrated data repository management system using recommended domain
CN116955856A (en) Information display method, device, electronic equipment and storage medium
US20230153286A1 (en) Method and system for hybrid query based on cloud analysis scene, and storage medium
CN115080684B (en) Network disk document indexing method and device, network disk and storage medium
CN115469810A (en) Data acquisition method, device, equipment and storage medium
CN111221817B (en) Service information data storage method, device, computer equipment and storage medium
CN113761102A (en) Data processing method, device, server, system and storage medium
CN113515504B (en) Data management method, device, electronic equipment and storage medium
CN113886723B (en) Method and device for determining ordering stability, storage medium and electronic equipment
CN111095183A (en) Semantic dimensions in user interfaces
CN113987322A (en) Index data query method and device, computer equipment and computer program product
US20230325366A1 (en) System and method for entity disambiguation for customer relationship management
CN113111120B (en) Service data verification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240509

Address after: Room 1-2-A06, Yungu Park, No. 1008 Dengcai Street, Sandun Town, Xihu District, Hangzhou City, Zhejiang Province, 310030

Patentee after: Aliyun Computing Co.,Ltd.

Country or region after: China

Address before: No.12, Zhuantang science and technology economic block, Xihu District, Hangzhou City, Zhejiang Province, 310012

Patentee before: Aliyun Computing Co.,Ltd.

Country or region before: China

Patentee before: Alibaba (China) Co.,Ltd.