CN113051460A - Elasticsearch-based data retrieval method and system, electronic device and storage medium - Google Patents

Elasticsearch-based data retrieval method and system, electronic device and storage medium

Info

Publication number
CN113051460A
CN113051460A CN202110336591.XA CN202110336591A CN113051460A CN 113051460 A CN113051460 A CN 113051460A CN 202110336591 A CN202110336591 A CN 202110336591A CN 113051460 A CN113051460 A CN 113051460A
Authority
CN
China
Prior art keywords
information
index
retrieval
internet
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110336591.XA
Other languages
Chinese (zh)
Inventor
张裴裴
王雪峰
骆飞
李青龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Smart Starlight Information Technology Co ltd
Original Assignee
Beijing Smart Starlight Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Smart Starlight Information Technology Co ltd
Priority to CN202110336591.XA
Publication of CN113051460A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G06F16/9538 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an Elasticsearch-based data retrieval method and system, an electronic device and a storage medium. The method comprises the following steps: obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of internet data in the internet information; determining the index name corresponding to each piece of internet data in the Elasticsearch cluster according to the information classification and acquisition time; storing the internet data into the index corresponding to the index name in the Elasticsearch cluster; acquiring information to be retrieved, which comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; generating a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch; obtaining the index retrieval range corresponding to the retrieval statement; and searching in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result. Because the internet information is stored in the corresponding index in the Elasticsearch cluster according to its information classification and acquisition time, a search can be directed at designated indexes, so that multi-dimensional full-text retrieval is realized and retrieval efficiency is improved.

Description

Elasticsearch-based data retrieval method and system, electronic device and storage medium
Technical Field
The invention relates to the field of internet data processing, and in particular to an Elasticsearch-based data retrieval method and system, an electronic device and a storage medium.
Background
Current information retrieval is mainly keyword-based full-text retrieval, and mainstream search engines on the market can only search the text of web pages. This makes retrieval inconvenient: the retrieval range is large, the retrieval time is long, and the retrieval efficiency is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide an Elasticsearch-based data retrieval method, system, electronic device and storage medium, so as to overcome the low retrieval efficiency of the prior art.
Therefore, the embodiment of the invention provides the following technical scheme:
According to a first aspect, an embodiment of the present invention provides an Elasticsearch-based data retrieval method, including: acquiring internet information, wherein the internet information comprises a plurality of pieces of internet data; obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of internet data in the internet information, wherein the information classification characterizes the source location of the internet data; performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of internet data in the Elasticsearch cluster; storing the internet data into the index corresponding to the index name in the Elasticsearch cluster; acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; generating a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch; looking up the information index adaptation module according to the retrieval statement to obtain the index retrieval range corresponding to the retrieval statement; and searching in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
Optionally, after the step of searching in the Elasticsearch cluster according to the index retrieval range to obtain the retrieval result, the method further includes: displaying the retrieval result.
Optionally, the step of displaying the retrieval result includes: acquiring display requirement information; marking the retrieval result according to the display requirement information to obtain a marked retrieval result; and displaying the marked retrieval result.
Optionally, the display requirement information includes a keyword color and preset attribute extraction information.
Optionally, after the step of storing the internet data into the index corresponding to the index name in the Elasticsearch cluster, the method further includes: determining an index deletion time according to service requirements; and deleting the indexes with earlier index times at a preset deletion period, according to the index deletion time.
Optionally, before the step of performing adaptation through the information index adaptation module according to the information classification and acquisition time, the method further includes: establishing indexes in the Elasticsearch cluster in advance; and mapping the indexes one-to-one to the information classification and acquisition time.
According to a second aspect, an embodiment of the present invention provides an Elasticsearch-based data retrieval system, including: a first acquisition module, used for acquiring internet information, wherein the internet information comprises a plurality of pieces of internet data; a first processing module, used for obtaining, through the acquisition system, the information classification and acquisition time corresponding to each piece of internet data in the internet information, wherein the information classification characterizes the source location of the internet data; a second processing module, used for performing adaptation through the information index adaptation module according to the information classification and acquisition time and determining the index name corresponding to each piece of internet data in the Elasticsearch cluster; a third processing module, used for storing the internet data into the index corresponding to the index name in the Elasticsearch cluster; a second acquisition module, used for acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; a fourth processing module, used for generating a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch; a fifth processing module, used for looking up the information index adaptation module according to the retrieval statement to obtain the index retrieval range corresponding to the retrieval statement; and a sixth processing module, used for searching in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
Optionally, the system further comprises: a seventh processing module, used for displaying the retrieval result.
Optionally, the seventh processing module includes: a first acquisition unit, used for acquiring the display requirement information; and a first processing unit, used for marking the retrieval result according to the display requirement information to obtain a marked retrieval result, and for displaying the marked retrieval result.
Optionally, the display requirement information includes a keyword color and preset attribute extraction information.
Optionally, the system further comprises: an eighth processing module, used for determining the index deletion time according to the service requirement; and a ninth processing module, used for deleting the indexes with earlier index times at the preset deletion period, according to the index deletion time.
Optionally, the system further comprises: a tenth processing module, used for establishing indexes in the Elasticsearch cluster in advance; and an eleventh processing module, used for mapping the indexes one-to-one to the information classification and acquisition time.
According to a third aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executed by the at least one processor to cause the at least one processor to perform the Elasticsearch-based data retrieval method described in any implementation of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, where computer instructions are stored, and the computer instructions are configured to cause a computer to execute the method for retrieving data based on Elasticsearch described in any of the first aspect.
The technical scheme of the embodiment of the invention has the following advantages:
The embodiment of the invention provides an Elasticsearch-based data retrieval method and system, an electronic device and a storage medium, wherein the method comprises the following steps: acquiring internet information, wherein the internet information comprises a plurality of pieces of internet data; obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of internet data in the internet information, wherein the information classification characterizes the source location of the internet data; performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of internet data in the Elasticsearch cluster; storing the internet data into the index corresponding to the index name in the Elasticsearch cluster; acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range; generating a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch; looking up the information index adaptation module according to the retrieval statement to obtain the index retrieval range corresponding to the retrieval statement; and searching in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
In the above steps, the internet information is stored in the corresponding index in the Elasticsearch cluster according to its information classification and acquisition time, and a subsequent search can be restricted to the indexes that match the requested information classification and acquisition time, so that multi-dimensional full-text retrieval by information classification and acquisition time is realized and retrieval efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a specific example of an Elasticsearch-based data retrieval method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a specific example of the Elasticsearch cluster indexes of the Elasticsearch-based data retrieval method according to an embodiment of the present invention;
Fig. 3 is a block diagram of a specific example of an Elasticsearch-based data retrieval system according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an electronic device according to an embodiment of the invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides an Elasticsearch-based data retrieval method. As shown in Fig. 1, the method includes steps S1-S8.
Step S1: internet information is acquired, and the Internet information comprises a plurality of Internet data.
In this embodiment, the internet information includes a plurality of pieces of internet data and can be provided by the acquisition system, which is responsible for information collection; specifically, the acquisition system obtains the internet information through collaborative crawlers. This is only a schematic illustration and is not limiting. The acquired internet information specifically comprises the internet data together with the information classification and acquisition time corresponding to each piece of internet data.
Step S2: obtaining, through the acquisition system, the information classification and acquisition time corresponding to each piece of internet data in the internet information, wherein the information classification characterizes the source location of the internet data.
In this embodiment, the information classification characterizes the source location of the internet data, and the acquisition time is the time at which the acquisition system acquired the information. Specifically, the source location may be a website or platform such as WeChat, microblog, Baidu, headline, and surf, and different websites may correspond to different information classifications. In this embodiment, the information classifications mainly include: network media (web), microblog (weibo), WeChat (weixin), forum (forum), bar (***), headline (toutiao), newspaper (printmedia), video (video), etc. This is only a schematic description and is not limiting; in other embodiments, the information classification may also include other classifications, which may be set as needed.
Specifically, the acquisition system obtains the information classification by configuring a specific classification for each group of collaborative crawlers. For example, if one group of crawlers is responsible for collecting microblog data, an info_flag identifying the microblog classification is added to the information it collects; if another group of crawlers is responsible for collecting Baidu Tieba data, an info_flag identifying that classification is added; and so on, so that different data sources carry different identifiers. The acquisition time is the time at which the crawler collected the information, ctime = currentTime.
Step S3: performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of internet data in the Elasticsearch cluster.
In this embodiment, the information index adaptation module is mainly responsible for adapting the acquired information to the indexes in the Elasticsearch cluster. The module maps the information classification and acquisition time to the indexes in the Elasticsearch cluster. Its main purpose is to manage index names in one place so that collected data can be stored into the corresponding index in the Elasticsearch cluster; it also decouples the retrieval process from the index naming scheme.
Specifically, indexes are generated in the Elasticsearch cluster in advance, and the name of each index is determined by its information classification and acquisition time; the specific format may be <index acquisition time>_<index information classification>. In this embodiment, the index acquisition time is accurate to the day; in other embodiments it may be set to another granularity, for example one week or half a month, according to actual needs. This is only a schematic description and is not limiting. One specific example of the indexes established in the Elasticsearch cluster is shown in Fig. 2.
For example, if the index information classification is microblog and the index acquisition time is the day 20210315, then all data collected from microblog during the day 20210315 is stored under the index "20210315_microblog".
The acquisition system pushes the information to a message queue, and the information index adaptation module reads the information from the message queue. In the message queue the information is in JSON format and is referred to as data for short. In the data, info_flag is the information classification identifier, gtime is the information acquisition time, and ctime is the information publishing time. The module maps the information classification and acquisition time to an index name as follows: data.get("gtime") + "_" + data.get("info_flag"). For example, for information classified as microblog with an acquisition time of 20201219, the index name is 20201219_weibo.
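The mapping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name index_name_for is hypothetical, and it assumes that gtime already arrives as a day-precision string such as 20201219.

```python
import json


def index_name_for(message: str) -> str:
    """Map one JSON message read from the queue to its index name.

    Assumes (not stated in the source) that gtime is already a
    day-precision string such as "20201219".
    """
    data = json.loads(message)
    # gtime: acquisition time; info_flag: information classification identifier
    return f'{data["gtime"]}_{data["info_flag"]}'


example = json.dumps({"gtime": "20201219", "info_flag": "weibo", "title": "..."})
print(index_name_for(example))  # 20201219_weibo
```

Keeping this rule in a single function mirrors the stated purpose of the adaptation module: storage and retrieval both derive index names from the same place.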
Step S4: storing the internet data into the index corresponding to the index name in the Elasticsearch cluster.
In this embodiment, an index storage space is established for each index name in the Elasticsearch cluster, so that collected data is stored by index name and, in the subsequent retrieval process, the retrieval range can be determined from the retrieval statement. Once the index name corresponding to each piece of internet data has been obtained, the internet information can be stored into the Elasticsearch cluster index by index.
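As one possible way to realize this storage step, the sketch below builds a request body in the newline-delimited JSON format accepted by the Elasticsearch bulk API, where each document is preceded by an action line naming its target index. The source does not say whether bulk requests are used; the function name bulk_body and the sample records are assumptions made for illustration.

```python
import json
from typing import Iterable


def bulk_body(records: Iterable[dict]) -> str:
    """Build an NDJSON body that routes each record to its own index."""
    lines = []
    for data in records:
        # Same naming rule as the adaptation module: gtime + "_" + info_flag
        index_name = f'{data["gtime"]}_{data["info_flag"]}'
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(data, ensure_ascii=False))
    # The bulk format requires a trailing newline after the last line
    return "\n".join(lines) + "\n"


records = [
    {"gtime": "20201219", "info_flag": "weibo", "title": "a"},
    {"gtime": "20201219", "info_flag": "weixin", "title": "b"},
]
print(bulk_body(records))
```

The resulting string would be sent to the cluster's _bulk endpoint; sending it is left out so the sketch runs without a cluster.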
Step S5: acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information and a retrieval time range.
In this embodiment, the information to be retrieved is determined according to the retrieval requirement, and may specifically include the retrieval keywords, the information classification of the retrieval information, and the retrieval time range.
Step S6: generating a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch.
In this embodiment, the information and index adapter module is called with the specific retrieval keyword, the information classification of the retrieval information, and the retrieval time range; a specific retrieval statement is generated according to the query syntax of Elasticsearch, and the Elasticsearch cluster is then used for retrieval.
The retrieval keyword is the keyword that the user wants to search for. For example, if the user wants to retrieve information containing the keyword "two parties" on the "microblog" platform within the time range 20210321 to 20210322, the names of the indexes to search are 20210321_weibo,20210322_weibo, and the retrieval statement is:
{"query": {"bool": {"filter": {"bool": {"must_not": {"term": {"data_type": 3}}, "must": [{"range": {"public_time": {"gte": "1615824000000", "lte": "1616428740000"}}}, {"bool": {"should": {…}}}]}}, "must": {"query": "two parties", "fields": ["title", "content"]}}}}
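A statement of this kind is easier to keep well-formed when it is assembled programmatically. The following Python sketch builds a comparable bool query as a dictionary. Several points are assumptions rather than details from the source: the keyword clause is written as a multi_match query, the time field is taken to be public_time in epoch milliseconds, the day boundaries are computed in UTC+8, and the function name build_query is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_query(keyword: str, start_day: str, end_day: str, tz_hours: int = 8) -> dict:
    """Build a bool query: keyword match, time-range filter, type exclusion."""
    tz = timezone(timedelta(hours=tz_hours))  # assumed local time zone
    start = datetime.strptime(start_day, "%Y%m%d").replace(tzinfo=tz)
    # Exclusive upper bound: midnight after the last requested day
    end = datetime.strptime(end_day, "%Y%m%d").replace(tzinfo=tz) + timedelta(days=1)
    return {
        "query": {
            "bool": {
                # Full-text match on title and content (assumed query type)
                "must": [{"multi_match": {"query": keyword,
                                          "fields": ["title", "content"]}}],
                # Filters do not affect scoring and can be cached
                "filter": [{"range": {"public_time": {
                    "gte": int(start.timestamp()) * 1000,
                    "lt": int(end.timestamp()) * 1000}}}],
                # Exclude documents whose data_type is 3, as in the statement
                "must_not": [{"term": {"data_type": 3}}],
            }
        }
    }


query = build_query("two parties", "20210321", "20210322")
print(query["query"]["bool"]["filter"])
```

Placing the time range and the exclusion in filter context rather than in must is a design choice of this sketch: those clauses only include or exclude documents and need not contribute to relevance scoring.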
Step S7: looking up the information index adaptation module according to the retrieval statement to obtain the index retrieval range corresponding to the retrieval statement.
In this embodiment, the retrieval statement carries the information classification and the time range of the information to be retrieved, so the index retrieval range in the Elasticsearch cluster corresponding to the information to be retrieved can be determined.
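Under the day-precision naming scheme, the index retrieval range is simply the set of index names covering every requested day and classification. A minimal Python sketch follows; the function name index_range is hypothetical, and both ends of the time range are assumed to be inclusive.

```python
from datetime import datetime, timedelta
from typing import List


def index_range(info_flags: List[str], start_day: str, end_day: str) -> List[str]:
    """List the index names covering start_day..end_day for each classification."""
    day = datetime.strptime(start_day, "%Y%m%d")
    end = datetime.strptime(end_day, "%Y%m%d")
    names = []
    while day <= end:  # both ends inclusive (assumption)
        for flag in info_flags:
            names.append(f"{day:%Y%m%d}_{flag}")
        day += timedelta(days=1)
    return names


print(",".join(index_range(["weibo"], "20210321", "20210322")))
# 20210321_weibo,20210322_weibo
```

Only these indexes need to be searched, which is where the efficiency gain over searching the whole cluster comes from.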
Step S8: searching in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
In this embodiment, the index names are determined from the index retrieval range; the collected data stored in the indexes with those names is then located in the Elasticsearch cluster and searched to obtain the retrieval result.
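Elasticsearch accepts a comma-separated list of index names in the path of a search request, so the index retrieval range can be passed directly as the search target. The sketch below only assembles the request and does not send it. The function name search_request is hypothetical, and adding ignore_unavailable=true (so that a day with no index does not fail the whole request) is a choice made here, not something the source specifies.

```python
import json
from typing import List, Tuple


def search_request(index_names: List[str], query_body: dict) -> Tuple[str, str, str]:
    """Return (method, path, body) for a search limited to the given indexes."""
    target = ",".join(index_names)
    # ignore_unavailable skips indexes that do not exist (assumed to be desirable)
    path = f"/{target}/_search?ignore_unavailable=true"
    return "POST", path, json.dumps(query_body, ensure_ascii=False)


method, path, body = search_request(
    ["20210321_weibo", "20210322_weibo"],
    {"query": {"match_all": {}}},
)
print(method, path)
```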
In the above steps, the internet information is stored in the corresponding index in the Elasticsearch cluster according to its information classification and acquisition time, and a subsequent search can be restricted to the indexes that match the requested information classification and acquisition time, so that multi-dimensional full-text retrieval by information classification and acquisition time is realized and retrieval efficiency is improved.
As an exemplary embodiment, after step S8 of searching in the Elasticsearch cluster according to the index retrieval range to obtain the retrieval result, the method further includes step S9.
Step S9: displaying the retrieval result.
In the present embodiment, step S9 includes steps S91-S93.
Step S91: acquiring display requirement information.
In this embodiment, the display requirement information is determined according to the user's retrieval requirement. Specifically, the display requirement information comprises a keyword color and preset attribute extraction information. This is only a schematic illustration and is not limiting; in practical applications it may be configured as required.
Here, the keywords are the retrieval keywords input by the user. The preset attributes are key attributes, which depend on the service characteristics of the business system; for example, in the public opinion industry, the information publishing time, author profiles, information forwarding chains and the like are all key attributes, and the business system processes the information according to its own service characteristics.
Step S92: marking the retrieval result according to the display requirement information to obtain a marked retrieval result.
Specifically, the retrieval result is marked according to the display requirement information; for example, if the keyword color in the display requirement information is set to red, the keywords in the retrieval result are marked in red.
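The red-marking step can be illustrated with a small Python sketch that wraps each occurrence of the keyword in a colored HTML span. Rendering as HTML, matching case-insensitively, and the function name mark_keyword are all assumptions; the source only states that keywords are marked in the configured color.

```python
import html
import re


def mark_keyword(text: str, keyword: str, color: str = "red") -> str:
    """Escape the text for HTML and wrap each keyword occurrence in a colored span."""
    escaped = html.escape(text)
    if not keyword:
        return escaped  # nothing to mark
    # Escape the keyword the same way so it still matches the escaped text
    pattern = re.compile(re.escape(html.escape(keyword)), re.IGNORECASE)
    return pattern.sub(
        lambda m: f'<span style="color:{color}">{m.group(0)}</span>', escaped
    )


print(mark_keyword("The two parties met", "two parties"))
# The <span style="color:red">two parties</span> met
```

Escaping before marking keeps markup in the retrieved text from being interpreted by the browser.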
Step S93: displaying the marked retrieval result.
Specifically, the marked retrieval result is displayed to the user, so that the user can read the retrieval result more intuitively.
Through the above steps, the retrieval result is marked according to the display requirement information and the marked retrieval result is displayed, which makes the retrieval result more intuitive.
As an exemplary embodiment, after step S4 of storing the internet data into the index corresponding to the index name in the Elasticsearch cluster, the method further includes steps S10-S11.
Step S10: determining the index deletion time according to the service requirement.
In this embodiment, the service requirement includes a requirement on how far back retrieval must reach, and the index deletion time may be determined from that requirement. For example, if retrieval only needs to cover roughly the last 5 years or the last 10 years, data older than about five or ten years can be deleted to reduce storage space.
Specifically, the index deletion time may be one day, one week, one month, or the like, and may be determined according to the service requirement.
Step S11: deleting the indexes with earlier index times at the preset deletion period, according to the index deletion time.
In this embodiment, the preset deletion period may be set according to actual needs; specifically, it may be one day, one week, one month, and the like. This is only a schematic description and is not limiting.
For example, if the index deletion time is one week and the preset deletion period is one week, then every week the collected data in the indexes covering the earliest week of index times is deleted.
In this embodiment, the information classification is in practice a fixed dimension, while a new index is generated every day as time passes. A scheduled script creates the indexes for the following day at 1 a.m. each day, and earlier indexes can be deleted as a whole according to actual business requirements, which avoids the performance degradation that an Elasticsearch cluster suffers when data is deleted by condition.
Through the above steps, indexes with earlier times are deleted periodically according to actual service requirements, which makes it possible to manage and store massive amounts of data.
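Because the date is encoded in each index name, the indexes due for deletion can be selected by parsing the name alone. The Python sketch below shows one way to do that. The retention value, the function name expired_indexes and the decision to skip names that do not follow the naming scheme are assumptions; each selected index would then be removed as a whole with the Elasticsearch delete index API.

```python
from datetime import date, datetime, timedelta
from typing import Iterable, List


def expired_indexes(index_names: Iterable[str], today: date, keep_days: int) -> List[str]:
    """Return the indexes whose date prefix is more than keep_days before today."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        day_part = name.split("_", 1)[0]
        try:
            day = datetime.strptime(day_part, "%Y%m%d").date()
        except ValueError:
            continue  # not a <date>_<classification> index; leave it alone
        if day < cutoff:
            expired.append(name)
    return expired


names = ["20210301_weibo", "20210314_weibo", "20210315_weixin", ".kibana"]
print(expired_indexes(names, date(2021, 3, 15), keep_days=7))
# ['20210301_weibo']
```

Dropping a whole index avoids the per-document work of a conditional delete, which is the performance concern raised above.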
As an exemplary embodiment, before step S3 of performing adaptation through the information index adaptation module according to the information classification and acquisition time, the method further includes steps S12-S13.
Step S12: establishing indexes in the Elasticsearch cluster in advance.
In this embodiment, an index storage space is established for each index name in the Elasticsearch cluster, so that the collected data is stored according to the index name.
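The description states that a scheduled script creates the following day's indexes in advance. A minimal Python sketch of the name-generation part of such a script is shown below. The classification list is taken from the identifiers named in the description, while the function name next_day_indexes is hypothetical and the actual index creation call is omitted so that the sketch runs without a cluster.

```python
from datetime import date, timedelta
from typing import List, Sequence

# Classification identifiers listed in the description
INFO_FLAGS = ["web", "weibo", "weixin", "forum", "toutiao", "printmedia", "video"]


def next_day_indexes(today: date, info_flags: Sequence[str] = INFO_FLAGS) -> List[str]:
    """Names of the indexes to create in advance for the day after `today`."""
    tomorrow = today + timedelta(days=1)
    return [f"{tomorrow:%Y%m%d}_{flag}" for flag in info_flags]


print(next_day_indexes(date(2020, 12, 18), ["weibo", "weixin"]))
# ['20201219_weibo', '20201219_weixin']
```

Each returned name would then be created in the cluster, for instance with the create index API, before any data for that day arrives.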
Step S13: mapping the indexes one-to-one to the information classification and acquisition time.
In the present embodiment, a specific example of the mapping process is as follows.
For example:
[Figure BDA0002997941720000111]
For example, if the information classification is weibo and the acquisition time is 20201219, the index name is 20201219_weibo; likewise, if the information classification is weixin, the index name is 20201219_weixin.
Through the above steps, indexes are established in the Elasticsearch cluster in advance and mapped to the information classification and acquisition time, so that the collected data can be stored into the Elasticsearch cluster.
In this embodiment, the Elasticsearch index name is generated from the information classification and acquisition time of the internet information, and data is stored into the corresponding index. During retrieval, the search can be restricted to designated indexes according to information classification and acquisition time. During deletion, the index for a given classification and date can be deleted as a whole in one operation. Multi-dimensional full-text retrieval by information classification, acquisition time and the like is thereby realized, and massive data can be managed efficiently and quickly.
A detailed description is given below with a specific example.
a. Information acquisition system
The method mainly provides basic internet information for the embodiment, performs classification identification on the information, namely information classification, realizes interaction with the embodiment through a message queue, and comprises the steps of pushing the information to the message queue by an acquisition system and reading the information in the message queue by a processing system.
b. An information processing system (processing system) mainly comprises the following sub-modules
1) Information and index adapter module
This module is mainly responsible for the adaptation of information to the indexes in the Elasticsearch cluster. The module can map information classification and acquisition time with indexes in the Elasticissearch cluster, and the main purpose is to uniformly manage index names and play a decoupling role.
2) Elasticissearch cluster index management module
This module is mainly responsible for the management of the Elasticsearch cluster index. The module can call an information and index adapter module, and generate an index in the Elasticissearch cluster in advance, wherein the index name is as follows: the time of acquisition _ information class (acquisition time is accurate to days), for example, if the information class is weibo, the acquisition time is 20201219, the name of the index is 20201219_ weibo, if the information class is weixin, the name of the index is 20201219_ weixin, and so on. Meanwhile, the module can delete the index with earlier time regularly according to the actual service requirement so as to achieve the purpose of managing mass data.
3) Information warehousing management module
This module stores information in the Elasticsearch cluster. When the information processing system receives data pushed by the acquisition system, it calls the information and index adapter module to match the information to an index and then stores the data in that index.
c. Information retrieval system (retrieval system for short)
After the systems and modules above have classified and stored the information, the retrieval system exposes a standard external interface that serves each business system. Given the keywords to be retrieved, the information classification, and the time range, the retrieval system calls the information and index adapter module, generates a retrieval statement according to the Elasticsearch query syntax, and then performs the retrieval through the Elasticsearch cluster client.
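The warehousing step can be sketched as routing each record to the index chosen by the adapter. In this sketch an in-memory class stands in for the Elasticsearch cluster so that the example runs on its own; `adapt`, `InMemoryCluster`, `warehouse`, and the record fields are all hypothetical names.

```python
from collections import defaultdict


def adapt(record: dict) -> str:
    """Hypothetical adapter: map acquisition day and classification to an index name."""
    return f"{record['acquired']}_{record['classification']}"


class InMemoryCluster:
    """Stand-in for an Elasticsearch cluster: index name -> list of documents."""

    def __init__(self):
        self.indexes = defaultdict(list)

    def store(self, index: str, document: dict) -> None:
        self.indexes[index].append(document)


def warehouse(cluster: InMemoryCluster, records) -> None:
    """Store each record in the index selected by the adapter."""
    for record in records:
        cluster.store(adapt(record), record)
```

The warehousing code never builds an index name itself; it only asks the adapter, which is the decoupling role the adapter module is described as playing.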
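As a rough sketch of the retrieval step, the index range can be derived from the requested classifications and time range, and the keyword can be expressed as an Elasticsearch `match` query. The field name `content`, the function names, and the returned dictionary layout are assumptions; the patent does not give the document mapping or the exact query. The code only builds the request and does not contact a cluster.

```python
from datetime import date, timedelta


def index_range(classifications, start: date, end: date):
    """List one '<yyyymmdd>_<classification>' name per day and classification."""
    names = []
    day = start
    while day <= end:
        for classification in classifications:
            names.append(f"{day:%Y%m%d}_{classification}")
        day += timedelta(days=1)
    return names


def build_search(keyword: str, classifications, start: date, end: date) -> dict:
    """Build the target index list and a query body for a keyword search.

    'content' is a hypothetical full-text field name.
    """
    return {
        # Elasticsearch accepts several target indexes as a comma-separated list.
        "index": ",".join(index_range(classifications, start, end)),
        "body": {"query": {"match": {"content": keyword}}},
    }
```

Restricting the search to the named indexes means that data outside the requested classifications and dates is never touched by the query.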
d. Information display system (business system for short)
The business system is the user-facing system and provides convenient interactive operations. Users enter search keywords and select an information classification, a time range, or other search conditions; the business system sends the search request to the retrieval system; finally, the business system highlights the keywords in red, extracts the key attributes, and displays the results to the user.
This embodiment also provides an Elasticsearch-based data retrieval system that implements the above embodiments and preferred embodiments; details already described are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the system described in the embodiments below is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
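Keyword red marking and key attribute extraction in the business system might look like the sketch below. It assumes the results are rendered as HTML and that each hit is a plain dictionary; neither detail is stated in the patent, and the function names are hypothetical.

```python
import html
import re


def mark_keyword(text: str, keyword: str, color: str = "red") -> str:
    """Escape text for HTML and wrap each keyword occurrence in a colored span."""
    escaped_text = html.escape(text)
    if not keyword:
        return escaped_text
    # Escape the keyword the same way as the text so the two still match.
    pattern = re.compile(re.escape(html.escape(keyword)), re.IGNORECASE)
    return pattern.sub(
        lambda m: f'<span style="color:{color}">{m.group(0)}</span>',
        escaped_text,
    )


def extract_attributes(hit: dict, fields):
    """Keep only the attributes selected for display; missing ones become None."""
    return {field: hit.get(field) for field in fields}
```

Doing the marking in the business system keeps presentation concerns out of the retrieval system, which only returns raw hits.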
The embodiment also provides an Elasticsearch-based data retrieval system, as shown in fig. 3, including:
the first acquisition module 1 is configured to acquire Internet information, the Internet information comprising a plurality of pieces of Internet data;
the first processing module 2 is configured to obtain, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data in the Internet information, where the information classification represents the source position of the Internet data;
the second processing module 3 is configured to perform adaptation through the information index adaptation module according to the information classification and acquisition time, and to determine the index name corresponding to each piece of Internet data in the Elasticsearch cluster;
the third processing module 4 is configured to store the internet data into an index corresponding to the index name in the Elasticsearch cluster according to the index name;
the second acquisition module 5 is configured to acquire information to be retrieved, where the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information, and a retrieval time range;
the fourth processing module 6 is configured to generate a retrieval statement for the information to be retrieved according to the query syntax of the Elasticsearch;
the fifth processing module 7 is configured to search in the information index adaptation module according to the search statement to obtain an index search range corresponding to the search statement;
and the sixth processing module 8 is configured to perform retrieval in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
Optionally, the system further comprises a seventh processing module configured to display the retrieval result.
Optionally, the seventh processing module includes: a first acquisition unit configured to acquire display requirement information; and a first processing unit configured to identify the retrieval result according to the display requirement information to obtain the identified retrieval result, and to display the identified retrieval result.
Optionally, the display requirement information includes a keyword color and preset attribute extraction information.
Optionally, the system further comprises: an eighth processing module configured to determine the index deletion time according to service requirements; and a ninth processing module configured to delete indexes with earlier index times according to the index deletion time and a preset deletion period.
Optionally, the system further comprises: a tenth processing module configured to establish indexes in the Elasticsearch cluster in advance; and an eleventh processing module configured to map the indexes one-to-one with the information classification and acquisition time.
The Elasticsearch based data retrieval system in this embodiment is presented in the form of functional units, where a unit refers to an ASIC circuit, a processor and a memory executing one or more software or fixed programs, and/or other devices that can provide the above-described functionality.
Further functional descriptions of the modules are the same as those of the corresponding embodiments, and are not repeated herein.
An embodiment of the present invention further provides an electronic device, as shown in fig. 4, the electronic device includes one or more processors 71 and a memory 72, where one processor 71 is taken as an example in fig. 4.
The electronic device may further include: an input device 73 and an output device 74.
The processor 71, the memory 72, the input device 73 and the output device 74 may be connected by a bus or other means, as exemplified by the bus connection in fig. 4.
The processor 71 may be a Central Processing Unit (CPU). The processor 71 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 72 is a non-transitory computer readable storage medium, and can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the Elasticsearch-based data retrieval method in the embodiment of the present application. The processor 71 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 72, namely, implements the Elasticsearch-based data retrieval method of the above-described method embodiment.
The memory 72 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of a processing device operated by the server, and the like. Further, the memory 72 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 72 may optionally include memory located remotely from the processor 71, which may be connected to a network connection device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 73 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the processing device of the server. The output device 74 may include a display device such as a display screen.
One or more modules are stored in the memory 72, which when executed by the one or more processors 71 perform the method shown in FIG. 1.
It will be understood by those skilled in the art that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the Elasticsearch-based data retrieval method. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of the above kinds of memory.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (9)

1. An Elasticsearch-based data retrieval method, characterized by comprising the following steps:
acquiring Internet information, wherein the Internet information comprises a plurality of pieces of Internet data;
obtaining, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data in the Internet information, wherein the information classification represents the source position of the Internet data;
performing adaptation through an information index adaptation module according to the information classification and acquisition time, and determining the index name corresponding to each piece of Internet data in the Elasticsearch cluster;
storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster according to the index name;
acquiring information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information, and a retrieval time range;
generating a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch;
searching in the information index adaptation module according to the retrieval statement to obtain an index retrieval range corresponding to the retrieval statement;
and retrieving in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
2. The Elasticsearch-based data retrieval method according to claim 1, wherein after the step of retrieving in the Elasticsearch cluster according to the index retrieval range to obtain the retrieval result, the method further comprises:
and displaying the retrieval result.
3. The data retrieval method based on the Elasticsearch of claim 2, wherein the step of displaying the result of the retrieval comprises:
acquiring display requirement information;
identifying the retrieval result according to the display demand information to obtain the identified retrieval result;
and displaying the identified retrieval result.
4. The Elasticsearch-based data retrieval method according to claim 1,
the display requirement information comprises keyword colors and preset attribute extraction information.
5. The Elasticsearch-based data retrieval method as claimed in any one of claims 1-4, wherein after the step of storing the Internet data into the index corresponding to the index name in the Elasticsearch cluster according to the index name, the method further comprises:
determining index deletion time according to service requirements;
and deleting indexes with earlier index times according to the index deletion time and a preset deletion period.
6. The Elasticsearch-based data retrieval method as claimed in any one of claims 1-4, wherein before the step of performing adaptation through the information index adaptation module according to the information classification and acquisition time, the method further comprises:
establishing indexes in the Elasticsearch cluster in advance;
and mapping the indexes with the information classification and acquisition time one by one.
7. An Elasticsearch-based data retrieval system, comprising:
a first acquisition module, configured to acquire Internet information, wherein the Internet information comprises a plurality of pieces of Internet data;
a first processing module, configured to obtain, through an acquisition system, the information classification and acquisition time corresponding to each piece of Internet data in the Internet information, wherein the information classification represents the source position of the Internet data;
a second processing module, configured to perform adaptation through the information index adaptation module according to the information classification and acquisition time, and to determine the index name corresponding to each piece of Internet data in the Elasticsearch cluster;
a third processing module, configured to store the Internet data into the index corresponding to the index name in the Elasticsearch cluster according to the index name;
a second acquisition module, configured to acquire information to be retrieved, wherein the information to be retrieved comprises retrieval keywords, the information classification of the retrieval information, and a retrieval time range;
a fourth processing module, configured to generate a retrieval statement for the information to be retrieved according to the query syntax of Elasticsearch;
a fifth processing module, configured to search in the information index adaptation module according to the retrieval statement to obtain an index retrieval range corresponding to the retrieval statement;
and a sixth processing module, configured to retrieve in the Elasticsearch cluster according to the index retrieval range to obtain a retrieval result.
8. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to cause the at least one processor to perform the method of Elasticsearch based data retrieval as claimed in any of claims 1-6.
9. A computer-readable storage medium storing computer instructions for causing a computer to execute the method for Elasticsearch-based data retrieval according to any of claims 1-6.
CN202110336591.XA 2021-03-29 2021-03-29 Elasticissearch-based data retrieval method and system, electronic device and storage medium Pending CN113051460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110336591.XA CN113051460A (en) 2021-03-29 2021-03-29 Elasticissearch-based data retrieval method and system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113051460A true CN113051460A (en) 2021-06-29

Family

ID=76516243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110336591.XA Pending CN113051460A (en) 2021-03-29 2021-03-29 Elasticissearch-based data retrieval method and system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113051460A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination