WO2023093607A1 - Offline data fuzzy search method, apparatus, device and medium - Google Patents

Offline data fuzzy search method, apparatus, device and medium

Info

Publication number
WO2023093607A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
fuzzy search
identifier
search
cut
Prior art date
Application number
PCT/CN2022/132523
Other languages
English (en)
French (fr)
Inventor
唐智强
Original Assignee
天翼数字生活科技有限公司
Priority date
Filing date
Publication date
Application filed by 天翼数字生活科技有限公司
Publication of WO2023093607A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2448 Query languages for particular applications; for extensibility, e.g. user defined types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Definitions

  • The present invention relates to the technical field of fuzzy search, and in particular to an offline data fuzzy search method, apparatus, device and medium.
  • In the prior art, data search typically either compares the source data to be searched one by one against pattern data that serves as the comparison standard and selects results by similarity, or searches at the traditional statement level.
  • The present invention provides an offline data fuzzy search method, apparatus, device and medium, which solve the technical problem that the prior art must iterate continuously when facing massive data, struggles to maintain system stability, and cannot search data flexibly and quickly.
  • An offline data fuzzy search method provided in the first aspect of the present invention includes:
  • Optionally, before the step of verifying the fuzzy search request information when the fuzzy search request information sent by any client is received, the method further includes:
  • parsing the configuration file, obtaining the data cutting specification information in the configuration file, and reading the data set to be cut from the target database corresponding to the data cutting specification information;
  • caching each paging identifier in a preset cache database, and caching each cut data set in a preset storage database.
  • the step of parsing the configuration file, obtaining the data cutting specification information in the configuration file and reading the data set to be cut from the target database corresponding to the data cutting specification information includes:
  • parsing the configuration file to obtain the data cutting specification information, which includes a timing task identifier;
  • if the timing task identifier is the first preset identifier, reading the target database corresponding to the data cutting specification information according to a preset cycle to obtain the data set to be cut;
  • if the timing task identifier is the second preset identifier, returning a read failure prompt.
  • Optionally, the data cutting specification information includes a segmented cutting specification, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier; the step of performing data cutting on the data set to be cut according to the data cutting specification information to obtain multiple cut data sets includes:
  • the step of verifying the fuzzy search request information includes:
  • the step of querying a preset storage database in turn according to each of the paging identifiers to obtain the data sets to be screened corresponding to each of the paging identifiers includes:
  • the preset storage database is queried by using the query statement, and the data sets to be screened corresponding to each paging identifier are sequentially obtained.
  • the data set to be screened includes multiple rows of data to be screened; the step of filtering each data set to be screened according to the fuzzy search field to obtain at least one target search data includes:
  • the data to be screened whose field similarity is greater than a preset similarity threshold is determined as target search data.
  • the second aspect of the present invention provides an off-line data fuzzy search device, including:
  • Information verification module for when receiving the fuzzy search request information sent by any client, verify the fuzzy search request information
  • a fuzzy query information extraction module used to extract fuzzy search fields and search parameters from the fuzzy search request information if the verification is passed;
  • a paging identifier acquisition module configured to acquire the paging identifiers corresponding to the search parameters one by one from a preset cache database
  • a paging identification query module configured to query a preset storage database in turn according to each of the paging identifications, to obtain data sets to be screened corresponding to each of the paging identifications;
  • a data screening module configured to filter each of the data sets to be screened according to the fuzzy search field to obtain at least one target search data
  • the data collection and return module is used for summarizing all the target search data, generating a target search data set and returning it to the client.
  • The third aspect of the present invention provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor performs the steps of the offline data fuzzy search method according to any one of the first aspect of the present invention.
  • The fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the offline data fuzzy search method according to any one of the first aspect of the present invention is implemented.
  • the present invention has the following advantages:
  • When fuzzy search request information sent by any client is received, the fuzzy search request information is verified; if the verification passes, the fuzzy search field and search parameters are extracted from the fuzzy search request information; the paging identifiers corresponding to the search parameters are obtained one by one from the preset cache database; the preset storage database is queried in turn according to each paging identifier to obtain the data set to be screened corresponding to each paging identifier; each data set to be screened is filtered according to the fuzzy search field to obtain at least one piece of target search data; and all target search data is summarized to generate a target search data set, which is returned to the client. The operation of querying a large table is thus divided into multiple small fuzzy search steps whose results are merged and returned at the end, which effectively reduces server load while meeting user needs and thereby effectively maintains system stability.
  • Fig. 1 is a flow chart of the steps of an off-line data fuzzy search method provided by an embodiment of the present invention
  • Fig. 2 is a flow chart of the steps of the database caching process provided by the embodiment of the present invention.
  • Fig. 3 is an implementation framework diagram of an offline data fuzzy search process according to an embodiment of the present invention.
  • Fig. 4 is a structural block diagram of an off-line data fuzzy search device provided by an embodiment of the present invention.
  • The embodiment of the present invention provides an offline data fuzzy search method, apparatus, device and medium, which are used to solve the technical problem that the prior art must iterate continuously when facing massive data, struggles to maintain system stability, and cannot search data flexibly and quickly.
  • FIG. 1 is a flow chart of steps of an off-line data fuzzy search method provided by an embodiment of the present invention.
  • An offline data fuzzy search method provided by the present invention includes:
  • Step 101 when receiving the fuzzy search request information sent by any client, verify the fuzzy search request information
  • step 101 may include the following sub-steps:
  • the fuzzy search request information refers to information that carries multiple fuzzy search parameters, such as search conditions, search types, search fields, etc., input by the user according to requirements.
  • In this embodiment, when the fuzzy search request information sent by any client is received, it is parsed to obtain the multiple fuzzy search parameters it contains.
  • To judge whether the subsequent fuzzy search can proceed normally, the parsed fuzzy search parameters are further checked for the fixed fields that fuzzy search requires, namely the fuzzy search field and the search parameters. If both are included, the verification passes and the next fuzzy search step follows. If either or both are missing, the verification fails, and verification failure information is returned or the fuzzy search is ended directly.
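The verification in step 101 can be sketched as follows. This is only an illustration: it assumes the request arrives as a dictionary, and the key names `search_field` and `search_params` are hypothetical, since the source does not name the request fields.

```python
from typing import Any, Mapping

# Hypothetical key names: the source only states that the request must carry
# a fuzzy search field and search parameters.
REQUIRED_KEYS = ("search_field", "search_params")


def validate_request(request: Mapping[str, Any]) -> bool:
    """Return True only if every required parameter is present and non-empty."""
    return all(
        request.get(key) not in (None, "", {}, [])
        for key in REQUIRED_KEYS
    )


ok = validate_request({"search_field": "invoice", "search_params": {"table": "docs"}})
missing = validate_request({"search_field": "invoice"})
```

A caller would return a verification failure message, or simply stop, when the function returns `False`.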
  • Before performing step 101, the method further includes the following steps S11-S15:
  • In a specific implementation, fuzzy search in a project usually requires constant redeployment. To switch project configuration within seconds, preset configuration software such as the Apollo Configuration Center can receive the user configuration information entered by the user and generate the corresponding configuration file.
  • Apollo is easy to deploy. It is developed on Spring Boot and Spring Cloud and can be run directly after packaging, without additionally installing an application container such as Tomcat. It can manage the configurations of different environments and different clusters in a unified manner, and configuration changes take effect in real time (hot release): after the user modifies and publishes a configuration in Apollo, the client receives the latest configuration in real time (within 1 second) and notifies the application.
  • the details are as follows:
  • PageSize: data cutting specification information, which can be set according to the actual situation
  • TableName: data table name; multiple tables are supported, separated by English commas ","
  • SearchTime: execution cycle of the timed task; the smaller the interval, the better the timeliness and the greater the performance cost
  • nextRunTime: running time interval, in seconds
  • resultSize: maximum number of records in the result set, 1000 by default
  • S12 may include the following substeps:
  • parsing the configuration file to obtain the data cutting specification information in the configuration file, where the data cutting specification information includes the timing task identifier;
  • if the timing task identifier is the first preset identifier, reading the target database corresponding to the data cutting specification information according to the preset cycle to obtain the data set to be cut;
  • when reading the target database fails, or if the timing task identifier is the second preset identifier, returning a read failure prompt.
  • In this embodiment, the data cutting specification information can be obtained by parsing the configuration file, and it includes the timing task identifier. Depending on the type of the timing task identifier, if it is the first preset identifier, the target database corresponding to the data cutting specification information is read according to the preset cycle and the data set to be cut is obtained from it, so that the data set to be cut stored in the target database is fetched once per execution cycle of the timed task.
  • If the timing task identifier is the second preset identifier, no data cutting timed task has been set; in that case a read failure prompt can be returned to end the fuzzy search process.
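The branching on the timing task identifier can be sketched as below. The mapping of 0 to the "first preset identifier" (task running) and 1 to the "second preset identifier" (task stopped) is an assumption, and `load_dataset` and `read_table` are hypothetical names introduced only for this illustration.

```python
from typing import Callable, List, Optional, Tuple

RUNNING = 0  # assumed value of the "first preset identifier"
STOPPED = 1  # assumed value of the "second preset identifier"

Rows = List[tuple]


def load_dataset(
    task_flag: int, read_table: Callable[[], Rows]
) -> Tuple[Optional[Rows], Optional[str]]:
    """Return (rows, None) on success or (None, failure prompt) otherwise."""
    if task_flag != RUNNING:
        # No cutting task is configured, so there is nothing to read.
        return None, "read failed: timed cutting task is not enabled"
    try:
        return read_table(), None
    except Exception as exc:  # a failed database read also yields the prompt
        return None, f"read failed: {exc}"


rows, error = load_dataset(RUNNING, lambda: [(1, "first row")])
```

In a real deployment the call would be repeated once per configured cycle by a scheduler; that part is omitted here.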
  • Optionally, the data cutting specification information includes a segmented cutting specification, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier.
  • S13 may include the following substeps:
  • obtaining the minimum value of the data identifiers as the start identifier;
  • reading the data identifiers sequentially from the start identifier, and recording the number of reads in real time;
  • when the number of reads equals the segmented cutting specification, determining the data to be cut corresponding to the read data identifiers as a cut data set and obtaining the last data identifier that has been read;
  • taking the data identifier that follows the last data identifier read as the new start identifier, and jumping back to the step of reading the data identifiers sequentially from the start identifier and recording the number of reads in real time, until all data identifiers have been read and multiple cut data sets are obtained.
  • In this embodiment, the data set to be cut may consist of multiple rows of data to be cut, and each row has a corresponding data identifier, for example the row ID of that row.
  • The commonly used query method is the MySQL syntax LIMIT [offset,] rows. Such a query is fast at first but slows down as the offset grows: once the offset reaches hundreds of thousands, MySQL has to scan hundreds of thousands of rows, which seriously degrades its performance.
  • In a specific implementation, the minimum value of the data identifiers in each data set to be cut can be taken as the start identifier. The data identifiers are then read sequentially from the start identifier while the number of reads is recorded in real time. When the number of reads equals the segmented cutting specification, the data to be cut corresponding to the read data identifiers is determined as a cut data set, and the data identifier following the last one read becomes the new start identifier. Reading then resumes until all data identifiers have been read, yielding multiple cut data sets.
  • By filtering on the ID, MySQL skips the rows it would otherwise scan, and the query then uses LIMIT rows, so data can be returned at the millisecond level. If a queried data set is empty, the next data set to be cut is read, until all data sets to be cut have been processed.
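The cutting loop described above can be sketched in plain Python over a list of row IDs. This is a simplified illustration rather than the patented implementation: it works on IDs held in memory, and the names `cut_by_id` and `page_markers` are hypothetical.

```python
from typing import Iterable, List


def cut_by_id(ids: Iterable[int], page_size: int) -> List[List[int]]:
    """Split row IDs into consecutive segments of at most page_size IDs."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    segments: List[List[int]] = []
    current: List[int] = []
    for row_id in sorted(ids):  # start from the minimum identifier
        current.append(row_id)
        if len(current) == page_size:  # read count reached the cutting size
            segments.append(current)
            current = []  # the next ID becomes the new start identifier
    if current:  # final, possibly shorter, segment
        segments.append(current)
    return segments


def page_markers(segments: List[List[int]]) -> List[int]:
    """The paging identifier of each segment is its minimum ID."""
    return [segment[0] for segment in segments]


segments = cut_by_id([10, 1, 2, 3, 5, 8, 9], page_size=3)
markers = page_markers(segments)  # [1, 5, 10]
```

Because the segments are defined by ID rather than by offset, gaps in the ID sequence (such as the missing 4, 6 and 7 here) do not affect the segment size.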
  • Redis can be used to cache the paging identifier to the cache database, and the cache format is as follows:
  • tableName-[tableName]-[pageNum]-minId: the key written to the cache; the values in square brackets are variables, namely the table name of the specific cut data set and the page number
  • [minId]: the stored minimum id value, that is, the paging identifier
  • Redis is an open source, in-memory data structure storage system that can be used as a database, cache, and message middleware. It supports multiple types of data structures, such as strings, hashes, lists, sets, sorted sets, etc.
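The key format can be illustrated with a short sketch. To stay self-contained it uses a plain dictionary as a stand-in for the Redis cache, and it assumes page numbers start at 1, which the source does not state; `cache_key` and `cache_markers` are hypothetical helper names.

```python
from typing import Dict, List


def cache_key(table_name: str, page_num: int) -> str:
    """Build the key in the tableName-[tableName]-[pageNum]-minId format."""
    return f"tableName-{table_name}-{page_num}-minId"


def cache_markers(cache: Dict[str, int], table_name: str, markers: List[int]) -> None:
    """Store the minimum ID of every page under its key."""
    for page_num, min_id in enumerate(markers, start=1):  # assumed 1-based pages
        cache[cache_key(table_name, page_num)] = min_id


cache: Dict[str, int] = {}  # stand-in for the Redis cache database
cache_markers(cache, "docs", [1, 5, 10])
second_page_marker = cache[cache_key("docs", 2)]  # 5
```

With a real Redis client the dictionary assignment would become a string `SET` on the same key.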
  • Step 102 if the verification is passed, extract the fuzzy search field and search parameters from the fuzzy search request information;
  • the fuzzy search fields and search parameters may be extracted from the fuzzy search request information.
  • the fuzzy search field may be a specific text fuzzy search field
  • the search parameters may include, but are not limited to, parameters such as table name and page number of the cut data set.
  • Step 103 obtaining the paging identifiers corresponding to the search parameters one by one from the preset cache database;
  • the associated page identifications can be obtained from the cache database one by one according to the search parameters.
  • In this embodiment, the minimum ID can be obtained page by page from the cache database; that is, the value stored under tableName-[tableName]-[pageNum]-minId is fetched for each page and serves as the basis for the subsequent query of the storage database.
  • Step 104 querying the preset storage database sequentially according to each paging identifier, to obtain the data sets to be screened corresponding to each paging identifier;
  • step 104 may include the following substeps:
  • a query statement is used to query the preset storage database, and the data sets to be screened corresponding to each paging identifier are sequentially obtained.
  • In this embodiment, a query statement can be constructed for each paging identifier by combining it with the preset statement rule; each query statement is then used to query the preset storage database, yielding the data set to be screened corresponding to each paging identifier.
  • the query statement can be a SQL statement: select * from TableName where id > [page identification] limit PageSize.
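A minimal sketch of such a per-page query follows, using an in-memory SQLite database in place of the storage database. Two points deviate from the statement quoted above and are assumptions of this sketch: it adds `ORDER BY id` so the page is deterministic, and it uses `id >=` so that the row holding the paging identifier itself is returned. The table `docs` and its columns are hypothetical.

```python
import sqlite3
from typing import List


def fetch_page(conn: sqlite3.Connection, table_name: str,
               min_id: int, page_size: int) -> List[tuple]:
    """Fetch one segment starting at the cached paging identifier."""
    # Table names cannot be bound as parameters, so validate before formatting.
    if not table_name.isidentifier():
        raise ValueError(f"invalid table name: {table_name!r}")
    sql = f"SELECT * FROM {table_name} WHERE id >= ? ORDER BY id LIMIT ?"
    return conn.execute(sql, (min_id, page_size)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(i, f"title {i}") for i in range(1, 8)])
page = fetch_page(conn, "docs", min_id=4, page_size=3)  # rows with id 4, 5, 6
```

Filtering on the indexed `id` column lets the database seek directly to the start of the segment instead of scanning and discarding an offset.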
  • Step 105 filtering each data set to be screened according to the fuzzy search field to obtain at least one target search data
  • the data set to be screened includes multiple rows of data to be screened, and step 105 may include the following substeps:
  • the data to be screened whose field similarity is greater than a preset similarity threshold is determined as the target search data.
  • In this embodiment, the field similarity between each row of data to be screened and the fuzzy search field can be calculated, and the rows whose field similarity is greater than the preset similarity threshold are determined as target search data.
  • In a specific implementation, the data queried from the database is fuzzily compared with the information the user wants to search for.
  • The scheme mainly targets fuzzy search of text data: records in the database that contain the search information are summarized and merged. At the start of each loop iteration, it is necessary to check whether the number of collected records has reached the total number of records; if so, the loop exits and the program ends.
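The screening in step 105 can be sketched as follows. The source does not name a similarity measure, so this illustration substitutes Python's `difflib.SequenceMatcher` ratio; the threshold value, the column index and the function names are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import List, Sequence


def field_similarity(text: str, query: str) -> float:
    """Similarity in [0, 1] between a stored field and the search field."""
    return SequenceMatcher(None, text.lower(), query.lower()).ratio()


def filter_rows(rows: Sequence[tuple], query: str,
                threshold: float = 0.6, column: int = 1) -> List[tuple]:
    """Keep rows whose chosen column is more similar than the threshold."""
    return [
        row for row in rows
        if field_similarity(str(row[column]), query) > threshold
    ]


rows = [(1, "fuzzy search"), (2, "weather report")]
hits = filter_rows(rows, "fuzzy serch", threshold=0.8)  # keeps only row 1
```

A containment test (`query in text`) would be a cheaper alternative when the goal is only to find records that contain the search text.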
  • Step 106 summarizing all target search data, generating a target search data set and returning it to the client.
  • FIG. 3 shows an implementation framework diagram of an offline data fuzzy search process according to an embodiment of the present invention.
  • offline data fuzzy search can be realized through the following process:
  • Apollo is a distributed configuration center that can centrally manage the configurations of different environments and different clusters. After a configuration is modified, it can be pushed to the application side in real time, and it provides standardized permissions and process governance. It is suitable for microservice configuration management scenarios.
  • the timing task of data cutting can be controlled through the Apollo configuration center to realize functions such as database connection, reading table information that needs to be accessed, cutting data size, operation cycle, start/stop, etc. It can be executed regularly to divide billion-level data into several parts according to the specified cutting size, obtain the start identifier of each part, and finally store the start identifier into the cache according to the cache naming rules.
  • The paging identifiers are fetched from the cache in a loop, the records of each segment are queried by paging identifier and then matched in memory against the field to be searched, and the matching records of each segment are summarized and merged. Finally, the merged data set is returned, completing the business logic of the whole search.
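The whole flow can be sketched end to end under simplifying assumptions: an in-memory SQLite table stands in for the storage database, a dictionary stands in for the cache, matching is a case-insensitive containment test, and the table `docs` with columns `id` and `title` is hypothetical. It uses `id >=` so that the row holding each paging identifier is included.

```python
import sqlite3
from typing import Dict, List


def build_markers(conn: sqlite3.Connection, table: str, page_size: int) -> List[int]:
    """Cut the table into segments and return the minimum ID of each one."""
    markers: List[int] = []
    last_id = -1  # assumes positive integer IDs
    while True:
        rows = conn.execute(
            f"SELECT id FROM {table} WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return markers
        markers.append(rows[0][0])
        last_id = rows[-1][0]


def fuzzy_search(conn: sqlite3.Connection, cache: Dict[str, int], table: str,
                 query: str, page_size: int, result_size: int = 1000) -> List[tuple]:
    """Walk the cached pages, filter each in memory, and merge the matches."""
    results: List[tuple] = []
    page_num = 1
    while len(results) < result_size:  # stop once enough records are collected
        min_id = cache.get(f"tableName-{table}-{page_num}-minId")
        if min_id is None:  # no more pages
            break
        rows = conn.execute(
            f"SELECT id, title FROM {table} WHERE id >= ? ORDER BY id LIMIT ?",
            (min_id, page_size),
        ).fetchall()
        results.extend(r for r in rows if query.lower() in r[1].lower())
        page_num += 1
    return results[:result_size]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?)",
    [(i, "alpha report" if i % 3 == 0 else "beta note") for i in range(1, 11)],
)
cache = {f"tableName-docs-{n}-minId": m
         for n, m in enumerate(build_markers(conn, "docs", 4), start=1)}
found = fuzzy_search(conn, cache, "docs", "ALPHA", page_size=4)
```

Each database round trip touches at most `page_size` rows, which is the property the description relies on to keep the load on the storage database bounded.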
  • In this embodiment of the present invention, when fuzzy search request information sent by any client is received, the fuzzy search request information is verified; if the verification passes, the fuzzy search field and search parameters are extracted from the fuzzy search request information; the paging identifiers corresponding to the search parameters are obtained one by one from the preset cache database; the preset storage database is queried in turn according to each paging identifier to obtain the data set to be screened corresponding to each paging identifier; each data set to be screened is filtered according to the fuzzy search field to obtain at least one piece of target search data; and all target search data is summarized to generate a target search data set, which is returned to the client. The operation of querying a large table is thus divided into multiple small fuzzy search steps whose results are merged and returned at the end, which effectively reduces server load while meeting user needs and thereby effectively maintains system stability.
  • FIG. 4 is a structural block diagram of an off-line data fuzzy search device provided by an embodiment of the present invention.
  • An embodiment of the present invention provides an off-line data fuzzy search device, including:
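  • an information verification module 401, configured to verify the fuzzy search request information when the fuzzy search request information sent by any client is received;
  • a fuzzy query information extraction module 402, configured to extract the fuzzy search field and search parameters from the fuzzy search request information if the verification passes;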
  • a paging identification acquisition module 403, configured to acquire the paging identifications corresponding to the search parameters one by one from the preset cache database;
  • the paging identification query module 404 is used to query the preset storage database in turn according to each paging identification, so as to obtain the corresponding data sets to be screened respectively for each paging identification;
  • a data screening module 405, configured to filter each data set to be screened according to the fuzzy search field to obtain at least one target search data
  • the data collection and return module 406 is used for summarizing all target search data, generating a target search data set and returning it to the client.
  • the device also includes:
  • the configuration file generation module is used to generate a corresponding configuration file in response to the received user configuration information
  • the configuration file parsing module is used to parse the configuration file, obtain the data cutting specification information in the configuration file and read the data set to be cut from the target database corresponding to the data cutting specification information;
  • the data cutting module is used to perform data cutting on the data set to be cut according to the data cutting specification information to obtain multiple cut data sets;
  • a page identification extraction module configured to extract the page identification corresponding to each cut data set
  • the data caching module is configured to cache each page identifier to a preset cache database, and cache each cut data set to a preset storage database.
  • the configuration file parsing module is specifically used for:
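  • parsing the configuration file to obtain the data cutting specification information in the configuration file, where the data cutting specification information includes the timing task identifier;
  • if the timing task identifier is the first preset identifier, reading the target database corresponding to the data cutting specification information according to the preset cycle to obtain the data set to be cut;
  • when reading the target database fails, or if the timing task identifier is the second preset identifier, returning a read failure prompt.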
  • the data cutting specification information includes segmentation cutting specifications, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier; the data cutting module is specifically used for:
  • obtaining the minimum value of the data identifiers as the start identifier;
  • reading the data identifiers sequentially from the start identifier, and recording the number of reads in real time;
  • when the number of reads equals the segmented cutting specification, determining the data to be cut corresponding to the read data identifiers as a cut data set and obtaining the last data identifier that has been read;
  • taking the data identifier that follows the last data identifier read as the new start identifier, and jumping back to the step of reading the data identifiers sequentially from the start identifier and recording the number of reads in real time, until all data identifiers have been read and multiple cut data sets are obtained.
  • the information verification module 401 is specifically used for:
  • the page identification query module 404 is specifically used for:
  • a query statement is used to query the preset storage database, and the data sets to be screened corresponding to each paging identifier are sequentially obtained.
  • the data set to be screened includes multiple rows of data to be screened; the data screening module 405 is specifically used for:
  • the data to be screened whose field similarity is greater than a preset similarity threshold is determined as the target search data.
  • The embodiment of the present invention also provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the processor performs the steps of the offline data fuzzy search method according to any embodiment of the present invention.
  • the embodiment of the present invention also provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed, the off-line data fuzzy search method as described in any embodiment of the present invention is realized.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present invention.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

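The present invention discloses an offline data fuzzy search method, apparatus, device and medium. The method includes: when fuzzy search request information sent by any client is received, verifying the fuzzy search request information; if the verification passes, extracting a fuzzy search field and search parameters from the fuzzy search request information; obtaining, one by one from a preset cache database, the paging identifiers corresponding to the search parameters; querying a preset storage database in turn according to each paging identifier to obtain the data set to be screened corresponding to each paging identifier; filtering each data set to be screened according to the fuzzy search field to obtain at least one piece of target search data; and summarizing all the target search data to generate a target search data set, which is returned to the client. This effectively reduces server load while meeting user needs, and thereby effectively maintains system stability.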

Description

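Offline data fuzzy search method, apparatus, device and medium
Technical Field
The present invention relates to the technical field of fuzzy search, and in particular to an offline data fuzzy search method, apparatus, device and medium.
Background
With the arrival of the information age, the amount of data to be processed has grown massively. Data processing often requires searching for the needed data within massive data sets, so searching for and locating data both quickly and accurately is essential to efficient data processing.
The prior art offers various data search methods. They typically either compare the source data to be searched one by one against pattern data that serves as the comparison standard and select results by similarity, or search at the traditional statement level.
However, when facing massive data, the above schemes must iterate continuously, struggle to maintain system stability, and cannot search data flexibly and quickly.
Summary of the Invention
The present invention provides an offline data fuzzy search method, apparatus, device and medium, which solve the technical problem that the prior art must iterate continuously when facing massive data, struggles to maintain system stability, and cannot search data flexibly and quickly.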
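An offline data fuzzy search method provided in the first aspect of the present invention includes:
when fuzzy search request information sent by any client is received, verifying the fuzzy search request information;
if the verification passes, extracting a fuzzy search field and search parameters from the fuzzy search request information;
obtaining, one by one from a preset cache database, the paging identifiers corresponding to the search parameters;
querying a preset storage database in turn according to each paging identifier to obtain the data set to be screened corresponding to each paging identifier;
filtering each data set to be screened according to the fuzzy search field to obtain at least one piece of target search data;
summarizing all the target search data, generating a target search data set and returning it to the client.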
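Optionally, before the step of verifying the fuzzy search request information when the fuzzy search request information sent by any client is received, the method further includes:
generating a corresponding configuration file in response to received user configuration information;
parsing the configuration file, obtaining the data cutting specification information in the configuration file, and reading the data set to be cut from the target database corresponding to the data cutting specification information;
performing data cutting on the data set to be cut according to the data cutting specification information to obtain multiple cut data sets;
extracting the paging identifier corresponding to each cut data set;
caching each paging identifier in a preset cache database, and caching each cut data set in a preset storage database.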
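Optionally, the step of parsing the configuration file, obtaining the data cutting specification information in the configuration file and reading the data set to be cut from the target database corresponding to the data cutting specification information includes:
parsing the configuration file to obtain the data cutting specification information in the configuration file, where the data cutting specification information includes a timing task identifier;
if the timing task identifier is the first preset identifier, reading the target database corresponding to the data cutting specification information according to a preset cycle to obtain the data set to be cut;
when reading the target database fails, returning a read failure prompt;
if the timing task identifier is the second preset identifier, returning the read failure prompt.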
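Optionally, the data cutting specification information includes a segmented cutting specification, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier; the step of performing data cutting on the data set to be cut according to the data cutting specification information to obtain multiple cut data sets includes:
obtaining the minimum value of the data identifiers as the start identifier;
reading the data identifiers sequentially from the start identifier, and recording the number of reads in real time;
when the number of reads equals the segmented cutting specification, determining the data to be cut corresponding to the read data identifiers as a cut data set and obtaining the last data identifier that has been read;
taking the data identifier that follows the last data identifier read as the new start identifier, and jumping back to the step of reading the data identifiers sequentially from the start identifier and recording the number of reads in real time, until all data identifiers have been read and multiple cut data sets are obtained.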
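Optionally, the step of verifying the fuzzy search request information when the fuzzy search request information sent by any client is received includes:
when the fuzzy search request information sent by any client is received, parsing the fuzzy search request information to obtain multiple fuzzy search parameters;
judging whether the fuzzy search parameters include the fuzzy search field and the search parameters;
if so, determining that the verification passes;
if not, determining that the verification fails.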
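Optionally, the step of querying a preset storage database in turn according to each paging identifier to obtain the data set to be screened corresponding to each paging identifier includes:
constructing a query statement for each paging identifier by combining it with a preset statement rule;
querying the preset storage database with the query statements to obtain, in turn, the data set to be screened corresponding to each paging identifier.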
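Optionally, the data set to be screened includes multiple rows of data to be screened; the step of filtering each data set to be screened according to the fuzzy search field to obtain at least one piece of target search data includes:
calculating the field similarity between each row of data to be screened and the fuzzy search field;
determining the data to be screened whose field similarity is greater than a preset similarity threshold as target search data.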
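The second aspect of the present invention provides an offline data fuzzy search apparatus, including:
an information verification module, configured to verify the fuzzy search request information when the fuzzy search request information sent by any client is received;
a fuzzy query information extraction module, configured to extract a fuzzy search field and search parameters from the fuzzy search request information if the verification passes;
a paging identifier acquisition module, configured to obtain, one by one from a preset cache database, the paging identifiers corresponding to the search parameters;
a paging identifier query module, configured to query a preset storage database in turn according to each paging identifier to obtain the data set to be screened corresponding to each paging identifier;
a data screening module, configured to filter each data set to be screened according to the fuzzy search field to obtain at least one piece of target search data;
a data summary and return module, configured to summarize all the target search data, generate a target search data set and return it to the client.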
本发明第三方面提供了一种电子设备,包括存储器及处理器,所述存储器中储存有计算机程序,所述计算机程序被所述处理器执行时,使得所述处理器执行如本发明第一方面任一项所述的离线数据模糊搜索方法的步骤。
本发明第四方面提供了一种计算机可读存储介质,其上存储有计算机程序,所述计算机程序被执行时实现如本发明第一方面任一项所述的离线数据模糊搜索方法。
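As can be seen from the above technical solutions, the present invention has the following advantages:
When fuzzy search request information sent by any user terminal is received, the fuzzy search request information is validated; if the validation passes, a fuzzy search field and search parameters are extracted from the fuzzy search request information; the paging identifiers corresponding to the search parameters are obtained one by one from a preset cache database; a preset storage database is queried according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier; each data set to be filtered is filtered according to the fuzzy search field, to obtain at least one piece of target search data; all the target search data are aggregated into a target search data set, which is returned to the user terminal. The query against a large table is thereby split into multiple small fuzzy search steps whose results are merged and returned at the end, which meets the user's needs while effectively reducing server load and thus helps maintain system stability.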
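Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the steps of an offline data fuzzy search method provided by an embodiment of the present invention;
FIG. 2 is a flowchart of the steps of the database caching process provided by an embodiment of the present invention;
FIG. 3 is an implementation framework diagram of an offline data fuzzy search process according to an embodiment of the present invention;
FIG. 4 is a structural block diagram of an offline data fuzzy search apparatus provided by an embodiment of the present invention.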
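Detailed Description
Embodiments of the present invention provide an offline data fuzzy search method, apparatus, device, and medium, which are used to solve the technical problem that, when facing massive data, the prior art requires continuous iteration, struggles to maintain system stability, and cannot search data flexibly and quickly.
To make the objectives, features, and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the embodiments described below are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of the steps of an offline data fuzzy search method provided by an embodiment of the present invention.
An offline data fuzzy search method provided by the present invention comprises: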
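Step 101: when fuzzy search request information sent by any user terminal is received, validate the fuzzy search request information;
Optionally, step 101 may include the following sub-steps:
when fuzzy search request information sent by any user terminal is received, parsing the fuzzy search request information to obtain a plurality of fuzzy search parameters;
determining whether the fuzzy search parameters include the fuzzy search field and the search parameters;
if so, determining that the validation passes;
if not, determining that the validation fails.
Fuzzy search request information refers to information entered by the user as needed that carries multiple fuzzy search parameters, such as search conditions, search type, and search fields.
In this embodiment, when fuzzy search request information sent by any user terminal is received, it is parsed to obtain the fuzzy search parameters it contains. To determine whether the subsequent fuzzy search can proceed normally, after the fuzzy search parameters are obtained it may be further checked whether they contain the fixed fields that fuzzy search requires. This can be done by checking whether the fuzzy search parameters include the fuzzy search field and the search parameters: if all of them are included, the validation passes and the request waits for the next fuzzy search step; if any one of them, or all of them, is missing, the validation fails, and either validation failure information is returned or the fuzzy search ends directly.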
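The validation in step 101 can be sketched as a presence check over the parsed parameters. This is a minimal illustration, not the applicant's implementation: the parameter names searchField and tableName are hypothetical, since the text only says that a fuzzy search field and search parameters must be present.

```python
from typing import Any, Mapping

# Hypothetical parameter names; the source only requires that a fuzzy search
# field and search parameters (e.g. a table name) are present in the request.
REQUIRED_KEYS = ("searchField", "tableName")


def validate_request(params: Mapping[str, Any]) -> bool:
    """Return True only if every required parameter is present and non-empty."""
    return all(params.get(key) not in (None, "") for key in REQUIRED_KEYS)
```

A caller would return a validation failure message, or simply stop, when the function returns False.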
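Referring to FIG. 2, before step 101 is performed, the method further includes the following steps S11 to S15:
S11: in response to received user configuration information, generate a corresponding configuration file;
In a specific implementation, a fuzzy search project usually has to be redeployed repeatedly. To switch project configuration within seconds, a preset configuration tool such as the Apollo configuration center may receive the user configuration information entered by the user and generate the corresponding configuration file.
Apollo is simple to deploy. It is built on Spring Boot and Spring Cloud and can be run directly after packaging, with no need to install an application container such as Tomcat. It manages the configuration of different environments and different clusters in one place, and configuration changes take effect in real time (hot release): after a user modifies and publishes a configuration in Apollo, the client receives the latest configuration in real time (within 1 second) and notifies the application. The configuration items are as follows: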
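PageSize: data cutting specification information; can be set according to actual conditions
TableName: data table name; multiple tables are supported, separated by an ASCII comma ","
ProgramDown: scheduled task identifier (1: off, 0: running normally)
SearchTime: execution period of the scheduled task; the shorter the interval, the fresher the data and the higher the performance cost
nextRunTime: run interval, in seconds
resultSize: maximum number of records in the result set, default 1000
spring.datasource.url: database connection URL
spring.datasource.username: database user name
spring.datasource.password: database password
spring.datasource.driver-class-name: database driver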
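To make the parameter list concrete, the following sketch parses a small key = value fragment into a typed object. The sample values (5000, t_user, t_order) and the CutConfig class are illustrative assumptions; only the key names and the resultSize default of 1000 come from the list above, and a real deployment would read these values through the Apollo client rather than from a string.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CutConfig:
    page_size: int          # PageSize: rows per cut data set
    table_names: List[str]  # TableName: comma-separated table names
    program_down: int       # ProgramDown: 1 = off, 0 = running normally
    result_size: int = 1000 # resultSize: maximum records in the result set


def parse_config(text: str) -> CutConfig:
    """Parse 'key = value' lines into a CutConfig; lines starting with '#' are comments."""
    raw: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        raw[key.strip()] = value.strip()
    return CutConfig(
        page_size=int(raw["PageSize"]),
        table_names=[t.strip() for t in raw["TableName"].split(",") if t.strip()],
        program_down=int(raw.get("ProgramDown", "0")),
        result_size=int(raw.get("resultSize", "1000")),
    )


# Illustrative values only.
SAMPLE = """
PageSize = 5000
TableName = t_user,t_order
ProgramDown = 0
"""
config = parse_config(SAMPLE)
```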
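S12: parse the configuration file, obtain the data cutting specification information from the configuration file, and read the data set to be cut from the target database corresponding to the data cutting specification information;
Further, S12 may include the following sub-steps:
parsing the configuration file and obtaining the data cutting specification information from the configuration file, the data cutting specification information including a scheduled task identifier;
if the scheduled task identifier is a first preset identifier, reading the target database corresponding to the data cutting specification information at a preset period, to obtain the data set to be cut;
when reading the target database fails, returning a read failure prompt;
if the scheduled task identifier is a second preset identifier, returning the read failure prompt.
In this embodiment, after the configuration file is obtained, it is parsed to obtain the data cutting specification information, which includes the scheduled task identifier. What happens next depends on the type of the scheduled task identifier. If it is the first preset identifier, the target database corresponding to the data cutting specification information is read at the preset period, so that the data set to be cut stored in the target database is obtained once per execution period of the scheduled task.
If the scheduled task identifier is the second preset identifier, no data cutting scheduled task has been set up; in that case a read failure prompt is returned and the fuzzy search process ends.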
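The branching on the scheduled task identifier might look like the sketch below. It assumes that the first preset identifier corresponds to ProgramDown = 0 (running normally) and the second to ProgramDown = 1 (off); the source does not state this mapping explicitly. The read_table callable and the prompt text are hypothetical, and the periodic scheduling itself is left out.

```python
from typing import Callable, List, Optional, Tuple

RUNNING = 0  # assumed to be the "first preset identifier"
STOPPED = 1  # assumed to be the "second preset identifier"

READ_FAILED = "read failed"  # hypothetical prompt text


def read_rows_to_cut(
    program_down: int,
    read_table: Callable[[], List[tuple]],
) -> Tuple[Optional[List[tuple]], Optional[str]]:
    """Return (rows, error). Rows are read only when the task is enabled."""
    if program_down != RUNNING:
        # No cutting task is configured, so report a read failure.
        return None, READ_FAILED
    try:
        return read_table(), None
    except Exception:
        # Any failure while reading the target database yields the same prompt.
        return None, READ_FAILED
```

A scheduler would call this function once per SearchTime period.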
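S13: cut the data set to be cut according to the data cutting specification information, to obtain a plurality of cut data sets;
Further, the data cutting specification information includes a segment cutting specification, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier. S13 may include the following sub-steps:
taking the minimum value of the data identifiers as a start identifier;
reading the data identifiers in order from the start identifier, and recording the number read in real time;
when the number read equals the segment cutting specification, determining the data to be cut corresponding to the data identifiers already read as a cut data set, and obtaining the last data identifier read;
taking the data identifier following the last data identifier read as a new start identifier, and jumping back to the step of reading the data identifiers in order from the start identifier and recording the number read in real time, until all the data identifiers have been read, thereby obtaining a plurality of cut data sets.
In one example of the present invention, the data set to be cut usually consists of multiple rows of data to be cut, and each row has a corresponding data identifier, for example its row ID. A common query approach uses the MySQL syntax LIMIT [offset,] rows. Such queries are fast at first but slow down as the offset grows; once the offset reaches several hundred thousand, MySQL has to scan several hundred thousand rows, which seriously degrades MySQL performance. To make fetching and cutting more efficient, after the data set to be cut is obtained, the minimum data identifier in each data set to be cut is taken as the start identifier, the data identifiers are read in order from the start identifier, and the number read is recorded in real time. When the number read equals the segment cutting specification, the data to be cut corresponding to the identifiers already read is determined as a cut data set, and the data identifier following the last one read is taken as the new start identifier. Reading then resumes, until all data identifiers have been read, which yields a plurality of cut data sets.
In a specific implementation, a lower bound on the ID is specified so that MySQL skips the rows below it instead of scanning them, and the query then uses LIMIT rows, so that data is returned within milliseconds. If the queried data set is empty, the next data set to be cut is read, until all data sets to be cut have been processed.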
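The ID-bounded cutting described above is commonly known as keyset pagination. The sketch below shows it under stated assumptions: it uses Python's built-in sqlite3 in place of MySQL so that it runs anywhere, the table t_user and its rows are made up, and the function cut_into_pages is a hypothetical name. It returns the minimum ID of each page, which is what the later steps cache.

```python
import sqlite3
from typing import List


def cut_into_pages(conn: sqlite3.Connection, table: str, page_size: int) -> List[int]:
    """Return the minimum id of each page, walking the table in id order.

    `table` is interpolated into the SQL text, so it must come from trusted
    configuration, never from user input.
    """
    min_ids: List[int] = []
    start = conn.execute(f"SELECT MIN(id) FROM {table}").fetchone()[0]
    while start is not None:
        # Bounded by id, so the database never scans the rows already cut.
        ids = [row[0] for row in conn.execute(
            f"SELECT id FROM {table} WHERE id >= ? ORDER BY id LIMIT ?",
            (start, page_size),
        )]
        if not ids:
            break
        min_ids.append(ids[0])
        # The next page starts at the first id after the last one read.
        start = conn.execute(
            f"SELECT MIN(id) FROM {table} WHERE id > ?", (ids[-1],)
        ).fetchone()[0]
    return min_ids


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t_user (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t_user (id, name) VALUES (?, ?)",
    [(i, f"user{i}") for i in (1, 2, 3, 5, 8, 13, 21)],
)
page_min_ids = cut_into_pages(conn, "t_user", 3)  # pages: 1-3, 5-13, 21
```

Because each query is bounded by the ID, gaps in the ID sequence do not matter, and the cost of fetching a page does not grow with its position in the table.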
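S14: extract the paging identifier corresponding to each cut data set;
S15: cache each paging identifier in the preset cache database, and cache each cut data set in the preset storage database.
After the cut data sets are obtained, the minimum ID in each cut data set may be selected as its paging identifier; for example, the data identifiers (that is, the row IDs) may be sorted directly during the query described above to obtain the minimum ID. Each paging identifier is then cached in the cache database, and each cut data set is cached in the preset storage database.
It should be noted that Redis may be used to cache the paging identifiers in the cache database, in the following format:
tableName-[tableName]-[pageNum]-minId:[minId]
tableName-[tableName]-[pageNum]-minId: the key written to the cache; the square brackets denote variables, namely the table name of the specific cut data set and the page number;
[minId]: the stored minimum ID value, that is, the paging identifier.
Redis is an open-source, in-memory data structure store that can be used as a database, cache, and message broker. It supports many types of data structures, such as strings, hashes, lists, sets, and sorted sets.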
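The key format can be illustrated as follows. To stay runnable without a Redis server, the sketch writes to a plain dict; with the redis-py client the same keys would be written with its set method. Numbering pages from 1 is an assumption, and the function names are hypothetical.

```python
from typing import Dict, List


def page_key(table_name: str, page_num: int) -> str:
    """Build the cache key 'tableName-[tableName]-[pageNum]-minId'."""
    return f"tableName-{table_name}-{page_num}-minId"


def cache_page_ids(cache: Dict[str, str], table_name: str, min_ids: List[int]) -> None:
    """Store each page's minimum id under its key (pages numbered from 1, an assumption)."""
    for page_num, min_id in enumerate(min_ids, start=1):
        cache[page_key(table_name, page_num)] = str(min_id)


cache: Dict[str, str] = {}  # stand-in for the Redis cache database
cache_page_ids(cache, "t_user", [1, 5, 21])
```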
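Step 102: if the validation passes, extract the fuzzy search field and the search parameters from the fuzzy search request information;
In this embodiment, when the validation passes, the fuzzy search field and the search parameters can be extracted from the fuzzy search request information.
In a specific implementation, the fuzzy search field may be a specific text field to search for fuzzily, and the search parameters may include, but are not limited to, the table name of the cut data set and the page number.
Step 103: obtain, one by one, the paging identifiers corresponding to the search parameters from the preset cache database;
Meanwhile, after the search parameters are obtained, the associated paging identifiers may be fetched one by one from the cache database according to the search parameters.
In a specific implementation, because page numbers exist, the minimum ID can be fetched from the cache database page by page; that is, the values stored under tableName-[tableName]-[pageNum]-minId can be fetched page by page, providing the basis for the subsequent queries against the storage database.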
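Fetching the paging identifiers page by page can be sketched as a generator that stops at the first missing key. The dict stands in for the cache database, the sample entries are made up, and the assumption that page numbers start at 1 and are contiguous is not stated in the source.

```python
from typing import Iterator, Mapping


def iter_page_ids(cache: Mapping[str, str], table_name: str) -> Iterator[int]:
    """Yield cached minimum ids page by page until a page key is missing."""
    page_num = 1  # assumption: page numbers start at 1 and are contiguous
    while True:
        value = cache.get(f"tableName-{table_name}-{page_num}-minId")
        if value is None:
            return
        yield int(value)
        page_num += 1


# Made-up cache contents for illustration.
cache = {
    "tableName-t_user-1-minId": "1",
    "tableName-t_user-2-minId": "5",
    "tableName-t_user-3-minId": "21",
}
page_ids = list(iter_page_ids(cache, "t_user"))
```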
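Step 104: query the preset storage database according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier;
Optionally, step 104 may include the following sub-steps:
constructing a query statement for each paging identifier in combination with a preset statement rule;
querying the preset storage database with the query statements, to obtain in turn the data set to be filtered corresponding to each paging identifier.
In this embodiment, a query statement may be constructed for each paging identifier in combination with the preset statement rule, and the preset storage database is then queried with these statements to obtain the data set to be filtered corresponding to each paging identifier.
Specifically, the query statement may be the SQL statement: SELECT * FROM TableName WHERE id > [paging identifier] LIMIT PageSize.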
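A query builder for this statement rule might look like the sketch below, with three assumptions labeled plainly. First, the statement in the text compares with a strict "greater than"; because the paging identifier is described as the minimum ID of its page, the sketch uses "greater than or equal to" so that the first row of each page is included, which is a guess at the intent rather than something the source states. Second, an ORDER BY id clause is added so that LIMIT returns a deterministic page. Third, the %s placeholder style is the one used by common Python MySQL drivers and may need adjusting for another driver.

```python
import re
from typing import Tuple

# Table names cannot be bound as parameters, so restrict them to plain identifiers.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_page_query(table_name: str, page_id: int, page_size: int) -> Tuple[str, Tuple[int, int]]:
    """Return (sql, params) for fetching one page starting at page_id."""
    if not _IDENTIFIER.match(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    sql = f"SELECT * FROM {table_name} WHERE id >= %s ORDER BY id LIMIT %s"
    return sql, (page_id, page_size)


sql, params = build_page_query("t_user", 5, 3)
```

The pair would then be passed to the driver's cursor, which binds the values instead of concatenating them into the SQL text.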
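Step 105: filter each data set to be filtered according to the fuzzy search field, to obtain at least one piece of target search data;
Optionally, the data set to be filtered includes multiple rows of data to be filtered, and step 105 may include the following sub-steps:
calculating the field similarity between each row of data to be filtered and the fuzzy search field;
determining the data to be filtered whose field similarity is greater than a preset similarity threshold as target search data.
In one example of the present invention, after the data sets to be filtered are obtained, the field similarity between each row of data to be filtered and the fuzzy search field is calculated, and the rows whose field similarity is greater than the preset similarity threshold are determined as target search data.
It is worth noting that the data queried from the database is fuzzily compared with the information the user wants to search for. This mainly addresses fuzzy search over text data: the records in the database that contain the search information are collected and merged. At the start of each loop iteration, it is checked whether the number of records collected so far has reached the total number of records; if so, the loop is exited and the program ends.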
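The filtering loop can be sketched as follows. The source does not say how field similarity is computed, so the measure here is an assumption: a substring hit counts as 1.0, and otherwise the ratio from Python's standard difflib.SequenceMatcher is used. The threshold of 0.6, the sample names, and the reading of "total number of records" as the resultSize limit are likewise illustrative.

```python
from difflib import SequenceMatcher
from typing import Iterable, List


def field_similarity(value: str, query: str) -> float:
    """Return 1.0 when query is a substring of value, else a SequenceMatcher ratio."""
    if query and query in value:
        return 1.0
    return SequenceMatcher(None, value, query).ratio()


def filter_pages(
    pages: Iterable[List[str]],
    query: str,
    threshold: float = 0.6,   # assumed similarity threshold
    result_size: int = 1000,  # resultSize default from the configuration list
) -> List[str]:
    """Collect rows whose similarity exceeds the threshold, page by page."""
    matches: List[str] = []
    for rows in pages:
        # Check the cap at the start of each round, as the text describes.
        if len(matches) >= result_size:
            break
        matches.extend(r for r in rows if field_similarity(r, query) > threshold)
    return matches[:result_size]


pages = [["Zhang Wei", "Li Na"], ["Zhang Min", "Wang Fang"]]
hits = filter_pages(pages, "Zhang")
```

Because pages is any iterable, it can be a generator that queries the storage database lazily, so later pages are never fetched once the cap is reached.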
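Step 106: aggregate all the target search data, generate a target search data set, and return it to the user terminal.
Referring to FIG. 3, FIG. 3 shows an implementation framework diagram of an offline data fuzzy search process according to an embodiment of the present invention.
In another example of the present invention, offline data fuzzy search may be implemented through the following process:
1. Set up a unified Apollo configuration center for flexibly configuring the parameters the program needs, so that project configuration can be switched within seconds without redeploying the project. Apollo is a distributed configuration center that centrally manages the configuration of an application across environments and clusters, pushes configuration changes to the application in real time, and provides standardized permission and process governance, which makes it suitable for microservice configuration management.
2. Develop a data cutting scheduled task that cuts the massive data and saves the segment identifiers in the cache. The task can be controlled through the Apollo configuration center, which covers the database connection, the tables to be read, the cutting size, the run period, and start/stop. On schedule, the task cuts data on the order of hundreds of millions of rows into a number of segments of the specified size, obtains the start identifier of each segment, and finally stores the start identifiers in the cache according to the cache naming rule.
3. Develop a fuzzy search service module that combines the cache and the database and finally returns the query results to the client. According to the table name being searched, the module fetches the paging identifiers from the cache in a loop, queries the records of each segment by paging identifier, compares them in memory against the field to be searched, merges the matching records from each segment, and finally returns the merged data set, which completes the search logic.
Without changing the database architecture and without introducing a search engine, one fuzzy query is dispatched as N smaller queries until the results are found. This avoids full-text retrieval over massive data, keeps the database stable, and satisfies the user's query requirements while keeping query performance high.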
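In this embodiment, when fuzzy search request information sent by any user terminal is received, the fuzzy search request information is validated; if the validation passes, a fuzzy search field and search parameters are extracted from the fuzzy search request information; the paging identifiers corresponding to the search parameters are obtained one by one from a preset cache database; a preset storage database is queried according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier; each data set to be filtered is filtered according to the fuzzy search field, to obtain at least one piece of target search data; all the target search data are aggregated into a target search data set, which is returned to the user terminal. The query against a large table is thereby split into multiple small fuzzy search steps whose results are merged and returned at the end, which meets the user's needs while effectively reducing server load and thus helps maintain system stability.
Referring to FIG. 4, FIG. 4 is a structural block diagram of an offline data fuzzy search apparatus provided by an embodiment of the present invention.
An embodiment of the present invention provides an offline data fuzzy search apparatus, comprising:
an information validation module 401, configured to validate fuzzy search request information when the fuzzy search request information sent by any user terminal is received;
a fuzzy query information extraction module 402, configured to extract a fuzzy search field and search parameters from the fuzzy search request information if the validation passes;
a paging identifier acquisition module 403, configured to obtain, one by one, the paging identifiers corresponding to the search parameters from a preset cache database;
a paging identifier query module 404, configured to query a preset storage database according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier;
a data filtering module 405, configured to filter each data set to be filtered according to the fuzzy search field, to obtain at least one piece of target search data;
a data aggregation and return module 406, configured to aggregate all the target search data, generate a target search data set, and return it to the user terminal.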
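Optionally, the apparatus further comprises:
a configuration file generation module, configured to generate a corresponding configuration file in response to received user configuration information;
a configuration file parsing module, configured to parse the configuration file, obtain data cutting specification information from the configuration file, and read a data set to be cut from the target database corresponding to the data cutting specification information;
a data cutting module, configured to cut the data set to be cut according to the data cutting specification information, to obtain a plurality of cut data sets;
a paging identifier extraction module, configured to extract the paging identifier corresponding to each cut data set;
a data caching module, configured to cache each paging identifier in the preset cache database and cache each cut data set in the preset storage database.
Optionally, the configuration file parsing module is specifically configured to:
parse the configuration file and obtain the data cutting specification information from the configuration file, the data cutting specification information including a scheduled task identifier;
if the scheduled task identifier is a first preset identifier, read the target database corresponding to the data cutting specification information at a preset period, to obtain the data set to be cut;
when reading the target database fails, return a read failure prompt;
if the scheduled task identifier is a second preset identifier, return the read failure prompt.
Optionally, the data cutting specification information includes a segment cutting specification, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier; the data cutting module is specifically configured to:
take the minimum value of the data identifiers as a start identifier;
read the data identifiers in order from the start identifier, and record the number read in real time;
when the number read equals the segment cutting specification, determine the data to be cut corresponding to the data identifiers already read as a cut data set, and obtain the last data identifier read;
take the data identifier following the last data identifier read as a new start identifier, and jump back to the step of reading the data identifiers in order from the start identifier and recording the number read in real time, until all the data identifiers have been read, thereby obtaining a plurality of cut data sets.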
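Optionally, the information validation module 401 is specifically configured to:
when fuzzy search request information sent by any user terminal is received, parse the fuzzy search request information to obtain a plurality of fuzzy search parameters;
determine whether the fuzzy search parameters include the fuzzy search field and the search parameters;
if so, determine that the validation passes;
if not, determine that the validation fails.
Optionally, the paging identifier query module 404 is specifically configured to:
construct a query statement for each paging identifier in combination with a preset statement rule;
query the preset storage database with the query statements, to obtain in turn the data set to be filtered corresponding to each paging identifier.
Optionally, the data set to be filtered includes multiple rows of data to be filtered; the data filtering module 405 is specifically configured to:
calculate the field similarity between each row of data to be filtered and the fuzzy search field;
determine the data to be filtered whose field similarity is greater than a preset similarity threshold as target search data.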
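An embodiment of the present invention further provides an electronic device, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the offline data fuzzy search method according to any embodiment of the present invention.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed, implements the offline data fuzzy search method according to any embodiment of the present invention.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.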

Claims (10)

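  1. An offline data fuzzy search method, characterized by comprising:
    when fuzzy search request information sent by any user terminal is received, validating the fuzzy search request information;
    if the validation passes, extracting a fuzzy search field and search parameters from the fuzzy search request information;
    obtaining, one by one, the paging identifiers corresponding to the search parameters from a preset cache database;
    querying a preset storage database according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier;
    filtering each data set to be filtered according to the fuzzy search field, to obtain at least one piece of target search data;
    aggregating all the target search data, generating a target search data set, and returning it to the user terminal.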
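  2. The method according to claim 1, characterized in that, before the step of validating the fuzzy search request information when fuzzy search request information sent by any user terminal is received, the method further comprises:
    in response to received user configuration information, generating a corresponding configuration file;
    parsing the configuration file, obtaining data cutting specification information from the configuration file, and reading a data set to be cut from the target database corresponding to the data cutting specification information;
    cutting the data set to be cut according to the data cutting specification information, to obtain a plurality of cut data sets;
    extracting the paging identifier corresponding to each cut data set;
    caching each paging identifier in the preset cache database, and caching each cut data set in the preset storage database.
  3. The method according to claim 2, characterized in that the step of parsing the configuration file, obtaining the data cutting specification information from the configuration file, and reading the data set to be cut from the target database corresponding to the data cutting specification information comprises:
    parsing the configuration file and obtaining the data cutting specification information from the configuration file, the data cutting specification information including a scheduled task identifier;
    if the scheduled task identifier is a first preset identifier, reading the target database corresponding to the data cutting specification information at a preset period, to obtain the data set to be cut;
    when reading the target database fails, returning a read failure prompt;
    if the scheduled task identifier is a second preset identifier, returning the read failure prompt.
  4. The method according to claim 2, characterized in that the data cutting specification information includes a segment cutting specification, the data set to be cut includes multiple rows of data to be cut, and each row of data to be cut has a corresponding data identifier; the step of cutting the data set to be cut according to the data cutting specification information, to obtain a plurality of cut data sets, comprises:
    taking the minimum value of the data identifiers as a start identifier;
    reading the data identifiers in order from the start identifier, and recording the number read in real time;
    when the number read equals the segment cutting specification, determining the data to be cut corresponding to the data identifiers already read as a cut data set, and obtaining the last data identifier read;
    taking the data identifier following the last data identifier read as a new start identifier, and jumping back to the step of reading the data identifiers in order from the start identifier and recording the number read in real time, until all the data identifiers have been read, thereby obtaining a plurality of cut data sets.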
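  5. The method according to claim 1, characterized in that the step of validating the fuzzy search request information when fuzzy search request information sent by any user terminal is received comprises:
    when fuzzy search request information sent by any user terminal is received, parsing the fuzzy search request information to obtain a plurality of fuzzy search parameters;
    determining whether the fuzzy search parameters include the fuzzy search field and the search parameters;
    if so, determining that the validation passes;
    if not, determining that the validation fails.
  6. The method according to claim 1, characterized in that the step of querying the preset storage database according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier, comprises:
    constructing a query statement for each paging identifier in combination with a preset statement rule;
    querying the preset storage database with the query statements, to obtain in turn the data set to be filtered corresponding to each paging identifier.
  7. The method according to claim 1, characterized in that the data set to be filtered includes multiple rows of data to be filtered; the step of filtering each data set to be filtered according to the fuzzy search field, to obtain at least one piece of target search data, comprises:
    calculating the field similarity between each row of data to be filtered and the fuzzy search field;
    determining the data to be filtered whose field similarity is greater than a preset similarity threshold as target search data.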
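  8. An offline data fuzzy search apparatus, characterized by comprising:
    an information validation module, configured to validate fuzzy search request information when the fuzzy search request information sent by any user terminal is received;
    a fuzzy query information extraction module, configured to extract a fuzzy search field and search parameters from the fuzzy search request information if the validation passes;
    a paging identifier acquisition module, configured to obtain, one by one, the paging identifiers corresponding to the search parameters from a preset cache database;
    a paging identifier query module, configured to query a preset storage database according to each paging identifier in turn, to obtain the data set to be filtered corresponding to each paging identifier;
    a data filtering module, configured to filter each data set to be filtered according to the fuzzy search field, to obtain at least one piece of target search data;
    a data aggregation and return module, configured to aggregate all the target search data, generate a target search data set, and return it to the user terminal.
  9. An electronic device, characterized by comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the offline data fuzzy search method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, characterized in that a computer program is stored thereon, and the computer program, when executed, implements the offline data fuzzy search method according to any one of claims 1 to 7.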
PCT/CN2022/132523 2021-11-23 2022-11-17 一种离线数据模糊搜索方法、装置、设备和介质 WO2023093607A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111396179.3 2021-11-23
CN202111396179.3A CN114116762A (zh) 2021-11-23 2021-11-23 一种离线数据模糊搜索方法、装置、设备和介质

Publications (1)

Publication Number Publication Date
WO2023093607A1 true WO2023093607A1 (zh) 2023-06-01

Family

ID=80440076

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132523 WO2023093607A1 (zh) 2021-11-23 2022-11-17 一种离线数据模糊搜索方法、装置、设备和介质

Country Status (2)

Country Link
CN (1) CN114116762A (zh)
WO (1) WO2023093607A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116762A (zh) * 2021-11-23 2022-03-01 天翼数字生活科技有限公司 一种离线数据模糊搜索方法、装置、设备和介质
CN115794892B (zh) * 2023-01-09 2023-05-23 北京创新乐知网络技术有限公司 基于分层缓存的搜索方法、装置、设备及介质

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11068537B1 (en) * 2018-12-11 2021-07-20 Amazon Technologies, Inc. Partition segmenting in a distributed time-series database
CN110569255A (zh) * 2019-08-16 2019-12-13 苏宁云计算有限公司 数据库分库分表的分页查询方法、装置和计算机设备
CN110597859A (zh) * 2019-09-06 2019-12-20 天津车之家数据信息技术有限公司 一种分页查询数据的方法和装置
CN111400315A (zh) * 2020-03-04 2020-07-10 深圳乐信软件技术有限公司 一种单表数据查询方法、装置、设备及储存介质
CN112148731A (zh) * 2020-08-13 2020-12-29 新华三大数据技术有限公司 一种数据分页查询方法、装置及存储介质
CN112199420A (zh) * 2020-10-16 2021-01-08 成都房联云码科技有限公司 一种房产隐私字段信息模糊搜索方法
CN114116762A (zh) * 2021-11-23 2022-03-01 天翼数字生活科技有限公司 一种离线数据模糊搜索方法、装置、设备和介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435509A (zh) * 2023-12-20 2024-01-23 深圳市智慧城市科技发展集团有限公司 接口数据的动态比对方法、动态比对设备和存储介质
CN117435509B (zh) * 2023-12-20 2024-04-02 深圳市智慧城市科技发展集团有限公司 接口数据的动态比对方法、动态比对设备和存储介质

Also Published As

Publication number Publication date
CN114116762A (zh) 2022-03-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897690

Country of ref document: EP

Kind code of ref document: A1