CN105260399A - Method for acquiring and retrieving distributed log - Google Patents

Method for acquiring and retrieving distributed log

Info

Publication number
CN105260399A
Authority
CN
China
Prior art keywords
log
daily record
index
data
collection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510593536.3A
Other languages
Chinese (zh)
Inventor
杨剑
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Following International Information Ltd Co
Original Assignee
Xi'an Following International Information Ltd Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Following International Information Ltd Co
Priority to CN201510593536.3A
Publication of CN105260399A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for acquiring and retrieving distributed logs. The method comprises the following steps: a log collection network collects log messages and, through a universal interface, stores log messages of different types and from different sources in a storage system as log files in a unified character encoding; a log search system segments the log files stored in the storage system and indexes the segmented log messages, extracting data from the unstructured log data according to key attributes in the logs and reorganizing it into a log message index; and the resulting log message index is used to retrieve logs. The method gives logs a systematic structure and lets log messages of different types form a log collection network that can collect over different protocols, thereby achieving a distributed, high-performance, real-time and extensible log search system for a cloud computing data center.

Description

A method for collecting and retrieving distributed logs
Technical field
The invention belongs to the field of computer information technology and relates to a method for collecting and retrieving distributed logs.
Background art
A cloud computing environment hosts a large number of physical devices and business systems, together with platform monitoring and management systems for operations and maintenance, security management and similar tasks. Log messages are an important means of platform monitoring and management. Massive volumes of log messages from different devices and systems must be consumed by multiple systems, and once a device on the platform fails, log messages become an important means of troubleshooting and fault prevention.
A cloud computing data center contains many physical devices and business systems, and different devices and systems use different protocols to produce their own log messages. Traditional data processing methods are inadequate for collecting, processing and retrieving log data at this volume. In addition, the inherently unstructured nature of log messages makes collection, processing and retrieval even harder.
Summary of the invention
The object of the invention is to provide a method for collecting and retrieving distributed logs that solves the problem that traditional data processing methods have difficulty collecting, processing and retrieving large volumes of log data.
The technical solution of the invention is a method for collecting and retrieving distributed logs, implemented according to the following steps:
Step 1, log collection:
A log collection network collects log messages. Through a universal interface, log messages from different sources are stored in a storage system, and log messages of different types and from different sources are saved as log files in a unified character encoding.
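Step 1 can be illustrated with a minimal sketch. The text does not name the unified character encoding or the layout of the log file, so UTF-8, the tab-separated line layout and the names `RawLog`, `normalize` and `store` are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from pathlib import Path

# Assumption: UTF-8 stands in for the "unified character encoding" of step 1.
UNIFIED_ENCODING = "utf-8"


@dataclass
class RawLog:
    source: str     # hypothetical source name, e.g. "switch-01"
    payload: bytes  # message bytes as received from the source
    encoding: str   # encoding the source is assumed to declare


def normalize(raw: RawLog) -> str:
    """Decode a message from its source encoding into one text line."""
    text = raw.payload.decode(raw.encoding, errors="replace")
    # Collapse line breaks so that one message occupies exactly one line.
    return " ".join(text.split())


def store(raws, log_file: Path) -> int:
    """Append normalized messages to a log file in the unified encoding."""
    count = 0
    with log_file.open("a", encoding=UNIFIED_ENCODING) as fh:
        for raw in raws:
            fh.write(f"{raw.source}\t{normalize(raw)}\n")
            count += 1
    return count
```

Whatever protocol a source uses, its messages end up as lines of the same file format, which is what lets the later steps treat every log file uniformly.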
Step 2, log processing:
A log search system segments the log files saved in the storage system and indexes the segmented log messages. According to key attributes in the logs, the log search system extracts data from the unstructured log data and reorganizes it into a log message index. The structure of the log message index comprises five levels: "log index", "log index segment", "log index file", "log index field" and "log index term".
Step 3, log searching:
Log searching is performed on the log message index formed in step 2, completing the collection and retrieval of distributed logs.
The invention is further characterized in that:
In step 1, the universal interface comprises a log generation interface, a log orchestration interface and a log transmission interface.
The key attributes comprise log generation time, log type, log keyword, log content and log level. The unstructured log data comprises video data, audio data, picture data, image data, document data and text data.
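As an illustration of key attribute extraction, the sketch below pulls the five attributes out of a single text log line. The line layout, the regular expression and the function name `extract_attributes` are assumptions for this example; the text does not prescribe a log format or an extraction technique.

```python
import re
from datetime import datetime

# Assumed line layout (not from the source):
# "<date> <time> <LEVEL> <type> <content>"
LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<type>\S+) (?P<content>.*)"
)


def extract_attributes(line: str):
    """Return the key attributes of one log line, or None if it does not match."""
    m = LINE.fullmatch(line.strip())
    if m is None:
        return None
    content = m.group("content")
    return {
        "time": datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S"),
        "type": m.group("type"),
        "level": m.group("level"),
        "content": content,
        # Keywords: lowercase alphanumeric tokens of the content, de-duplicated.
        "keywords": sorted(set(re.findall(r"[a-z0-9]+", content.lower()))),
    }
```

For a line such as `2015-09-17 10:15:02 ERROR auth login failed for user admin`, the result carries the generation time, the type `auth`, the level `ERROR`, the raw content and its keyword tokens.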
The log message index comprises a sequence of "log index files"; a "log index file" is a sequence of "log index fields"; and a "log index field" is a named sequence of "log index terms".
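The nesting of the index levels can be pictured as plain data classes. This is only a sketch: the class names are hypothetical, and placing the "log index segment" between the index and its files is an assumption, since the text lists the five levels without spelling out where segments sit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IndexTerm:      # "log index term": the smallest searchable unit
    text: str


@dataclass
class IndexField:     # "log index field": a named sequence of terms
    name: str
    terms: List[IndexTerm] = field(default_factory=list)


@dataclass
class IndexFile:      # "log index file": a sequence of fields
    fields: List[IndexField] = field(default_factory=list)


@dataclass
class IndexSegment:   # "log index segment": assumed here to group files
    files: List[IndexFile] = field(default_factory=list)


@dataclass
class LogIndex:       # "log index": the top level
    segments: List[IndexSegment] = field(default_factory=list)

    def terms_of(self, field_name: str) -> List[str]:
        """Collect the term texts of every field with the given name."""
        return [
            term.text
            for segment in self.segments
            for file in segment.files
            for fld in file.fields
            if fld.name == field_name
            for term in fld.terms
        ]
```

Walking the structure from the top level down to the terms, as `terms_of` does, mirrors the sequence-of-sequences description above.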
In step 2, the log search system also caches the log message index.
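The text does not say how the index is cached. One common choice, shown here purely as an assumption, is a least-recently-used cache placed in front of a slower index lookup; `IndexCache` and its `loader` callback are hypothetical names.

```python
from collections import OrderedDict


class IndexCache:
    """LRU cache for index lookups (an assumed policy, not from the source)."""

    def __init__(self, capacity, loader):
        self.capacity = capacity
        self.loader = loader          # function that reads a term from the index
        self._entries = OrderedDict()
        self.misses = 0

    def get(self, term):
        if term in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(term)
            return self._entries[term]
        # Cache miss: load from the underlying index and remember the result.
        self.misses += 1
        value = self.loader(term)
        self._entries[term] = value
        if len(self._entries) > self.capacity:
            # Evict the least recently used entry.
            self._entries.popitem(last=False)
        return value
```

Repeated queries for the same keyword are then served from memory, and only the first lookup, or a lookup after eviction, reaches the stored index.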
In step 3, the method for log searching comprises search daily record index and cluster retrieval.
Search daily record index, is the log information index utilizing step 2, obtains the document chained list of each key word of the inquiry, carry out the filtration of document public content, document difference information filtering, document content merging treatment, obtain result document to document chained list.
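The filtering and merging steps are described only briefly, so the following sketch is one possible reading rather than the patented procedure: "common content" is taken as the documents that appear in every keyword's list, "differing content" as the documents that appear in only some of them, and merging as placing the common documents first. The function name `search` and the dictionary-based index are assumptions.

```python
def search(index, keywords):
    """Combine the document lists of several query keywords.

    `index` maps a keyword to the list of document ids that contain it.
    """
    postings = [set(index.get(k, ())) for k in keywords]
    if not postings:
        return {"common": [], "different": [], "merged": []}
    common = set.intersection(*postings)      # documents matching every keyword
    different = set.union(*postings) - common  # documents matching only some
    return {
        "common": sorted(common),
        "different": sorted(different),
        # Merged result: full matches first, partial matches after them.
        "merged": sorted(common) + sorted(different),
    }
```

With an index of `{"error": [1, 2, 3], "disk": [2, 3, 4]}` and the query keywords `error` and `disk`, documents 2 and 3 are common to both lists and documents 1 and 4 differ.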
Cluster retrieval splits the data using a sharding scheme and distributes it across the whole cluster, where each shard is a complete index; the indexes are then merged and the search queries of all shards are aggregated.
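Cluster retrieval can be sketched in a single process as follows. The hash-based shard assignment, the whitespace tokenization and the class names `Shard` and `Cluster` are assumptions; in a real deployment each shard would live on its own node and the per-shard queries would run in parallel.

```python
import zlib
from collections import defaultdict


class Shard:
    """One shard: a complete, independent index over its own documents."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for token in text.lower().split():
            self.postings[token].add(doc_id)

    def search(self, keyword):
        return sorted(self.postings.get(keyword.lower(), ()))


class Cluster:
    def __init__(self, shard_count):
        self.shards = [Shard() for _ in range(shard_count)]

    def add(self, doc_id, text):
        # Assumed routing rule: a stable hash of the id selects the shard.
        n = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        self.shards[n].add(doc_id, text)

    def search(self, keyword):
        # Query every shard, then aggregate the partial results.
        hits = []
        for shard in self.shards:
            hits.extend(shard.search(keyword))
        return sorted(hits)
```

Because every shard answers the query on its own, adding shards spreads both storage and query work, which is the scalability property the method aims for.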
The beneficial effect of the invention is that, by storing log messages from different sources in a storage system and indexing them, the method gives logs a systematic structure and lets log messages of different types form a log collection network that can collect over different protocols, thereby achieving a distributed, high-performance, real-time and scalable log search system for a cloud computing data center.
Embodiment
The invention is described in detail below with reference to an embodiment.
The technical solution of the invention is a method for collecting and retrieving distributed logs, implemented according to the following steps:
Step 1, log collection:
A log collection network collects log messages. Through universal interfaces such as a log generation interface, a log orchestration interface and a log transmission interface, log messages from different sources are stored in a storage system, and log messages of different types and from different sources are saved as log files in a unified character encoding.
Step 2, log processing:
A log search system segments the log files saved in the storage system and indexes the segmented log messages. According to key attributes in the logs, the log search system extracts data from the unstructured log data and reorganizes it into a log message index. The key attributes comprise information such as log generation time, log type, log keyword, log content and log level. The unstructured log data comprises video data, audio data, picture data, image data, document data and text data. The structure of the log message index comprises five levels: "log index", "log index segment", "log index file", "log index field" and "log index term". The log message index comprises a sequence of "log index files"; a "log index file" is a sequence of "log index fields"; and a "log index field" is a named sequence of "log index terms". The log search system also caches the log message index.
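The segmentation and indexing of step 2 might look like the sketch below for text logs. The segmentation rule (a message starts at a line beginning with an ISO-style date, and other lines continue the previous message), the tokenization and the function names `segment` and `build_index` are assumptions for illustration, not details given in the text.

```python
import re
from collections import defaultdict

# Assumption: a new message starts with an ISO-style date;
# any other non-blank line continues the previous message.
START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def segment(text):
    """Split the content of a log file into individual log messages."""
    messages = []
    for line in text.splitlines():
        if START.match(line) or (not messages and line.strip()):
            messages.append(line)
        elif line.strip():
            messages[-1] += "\n" + line
    return messages


def build_index(messages):
    """Map each lowercase token to the numbers of the messages containing it."""
    index = defaultdict(list)
    for number, message in enumerate(messages):
        for token in sorted(set(re.findall(r"\w+", message.lower()))):
            index[token].append(number)
    return dict(index)
```

Segmenting first keeps multi-line messages, such as a stack trace, together as one indexed unit instead of scattering their lines across unrelated entries.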
Step 3, log searching:
Log searching is performed on the log message index formed in step 2, completing the collection and retrieval of distributed logs. Log searching comprises searching the log index and cluster retrieval. Searching the log index uses the log message index of step 2 to obtain the document linked list of each query keyword, then applies common-content filtering, differing-content filtering and content merging to the document linked lists to obtain the result documents. Cluster retrieval splits the data using a sharding scheme and distributes it across the whole cluster, where each shard is a complete index; the indexes are then merged and the search queries of all shards are aggregated.

Claims (8)

1. A method for collecting and retrieving distributed logs, characterized in that it is implemented according to the following steps:
Step 1, log collection:
A log collection network collects log messages; through a universal interface, log messages from different sources are stored in a storage system, and log messages of different types and from different sources are saved as log files in a unified character encoding;
Step 2, log processing:
A log search system segments the log files saved in the storage system through step 1 and indexes the segmented log messages; according to key attributes in the logs, the log search system extracts data from the unstructured log data and reorganizes it into a log message index, the structure of which comprises five levels: "log index", "log index segment", "log index file", "log index field" and "log index term";
Step 3, log searching:
Log searching is performed on the log message index formed in step 2, completing the collection and retrieval of distributed logs.
2. The method for collecting and retrieving distributed logs according to claim 1, characterized in that, in said step 1, the universal interface comprises a log generation interface, a log orchestration interface and a log transmission interface.
3. The method for collecting and retrieving distributed logs according to claim 1, characterized in that, in said step 2, the key attributes comprise log generation time, log type, log keyword, log content and log level; and the unstructured log data comprises video data, audio data, picture data, image data, document data and text data.
4. The method for collecting and retrieving distributed logs according to claim 1, characterized in that said log message index comprises a sequence of "log index files", a "log index file" is a sequence of "log index fields", and a "log index field" is a named sequence of "log index terms".
5. The method for collecting and retrieving distributed logs according to claim 1, characterized in that, in step 2, the log search system also caches the log message index.
6. The method for collecting and retrieving distributed logs according to any one of claims 1-5, characterized in that, in said step 3, the log search methods comprise searching the log index and cluster retrieval.
7. The method for collecting and retrieving distributed logs according to claim 6, characterized in that said searching the log index uses the log message index of step 2 to obtain the document linked list of each query keyword, and applies common-content filtering, differing-content filtering and content merging to the document linked lists to obtain the result documents.
8. The method for collecting and retrieving distributed logs according to claim 6, characterized in that said cluster retrieval splits the data using a sharding scheme and distributes it across the whole cluster, each shard being a complete index, and then merges the indexes and aggregates the search queries of all shards.
CN201510593536.3A 2015-09-17 2015-09-17 Method for acquiring and retrieving distributed log Pending CN105260399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510593536.3A CN105260399A (en) 2015-09-17 2015-09-17 Method for acquiring and retrieving distributed log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510593536.3A CN105260399A (en) 2015-09-17 2015-09-17 Method for acquiring and retrieving distributed log

Publications (1)

Publication Number Publication Date
CN105260399A (en) 2016-01-20

Family

ID=55100091

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510593536.3A Pending CN105260399A (en) 2015-09-17 2015-09-17 Method for acquiring and retrieving distributed log

Country Status (1)

Country Link
CN (1) CN105260399A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101163265A (en) * 2007-11-20 2008-04-16 中兴通讯股份有限公司 Distributed database based on multimedia message log inquiring method and system
CN104301360A (en) * 2013-07-19 2015-01-21 阿里巴巴集团控股有限公司 Method, log server and system for recording log data
CN104778188A (en) * 2014-02-24 2015-07-15 贵州电网公司信息通信分公司 Distributed device log collection method

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105979273B (en) * 2016-05-06 2021-04-02 苏州清云网络科技有限公司 Cloud monitoring and cloud operation and maintenance of intelligent commercial television based on big data and cloud computing
CN105979273A (en) * 2016-05-06 2016-09-28 苏州清云网络科技有限公司 Cloud monitor and cloud operation of intelligent commercial TVs based on big data and cloud computation
CN106055697A (en) * 2016-06-15 2016-10-26 安徽天枢信息科技有限公司 Unstructured event log data classification and storage method and device
CN106227727A (en) * 2016-06-30 2016-12-14 乐视控股(北京)有限公司 Log update method, device and system for a distributed system
CN106294723A (en) * 2016-08-10 2017-01-04 成都广达新网科技股份有限公司 Fast system log checking method and system applied to embedded devices
CN106326370A (en) * 2016-08-12 2017-01-11 德基网络科技南京有限公司 Big data counting method based on electronic business platform
CN106992886A (en) * 2017-04-05 2017-07-28 国家电网公司 Log analysis method and device based on distributed storage
CN107291872A (en) * 2017-06-16 2017-10-24 郑州云海信息技术有限公司 Platform log search and presentation method based on massive data
CN107741956A (en) * 2017-09-18 2018-02-27 杭州安恒信息技术有限公司 Log search method based on web container configuration file
CN110362547A (en) * 2018-04-02 2019-10-22 阿里巴巴集团控股有限公司 Encoding, parsing and storage method and device for log files
CN110362547B (en) * 2018-04-02 2023-10-03 杭州阿里巴巴智融数字技术有限公司 Method and device for encoding, analyzing and storing log file
CN111061721A (en) * 2018-10-16 2020-04-24 成都鼎桥通信技术有限公司 Data processing method and device
CN110362549A (en) * 2019-06-17 2019-10-22 平安普惠企业管理有限公司 Log memory search method, electronic device and computer equipment
CN111177360A (en) * 2019-12-16 2020-05-19 中国电子科技网络信息安全有限公司 Self-adaptive filtering method and device based on user logs on cloud
CN111177360B (en) * 2019-12-16 2022-04-22 中国电子科技网络信息安全有限公司 Self-adaptive filtering method and device based on user logs on cloud
CN116701336A (en) * 2023-05-19 2023-09-05 国网物资有限公司 Power data log processing method, electronic device and computer readable medium

Similar Documents

Publication Publication Date Title
CN105260399A (en) Method for acquiring and retrieving distributed log
CN109299102B (en) HBase secondary index system and method based on Elastcissearch
KR102079752B1 (en) Natural language search results for intent queries
CN104182405B (en) Method and device for connection query
CN108268600B (en) AI-based unstructured data management method and device
EP2977916A1 (en) Search suggestion method and apparatus for map search, and computer storage medium and device
DE202015009777U1 (en) Transparent discovery of a semi-structured data scheme
CN104469832B (en) Mobile communications network accident analysis locating assist system
CN107092639A (en) A kind of search engine system
CN112084249B (en) Access record extraction method and device
CN111221791A (en) Method for importing multi-source heterogeneous data into data lake
CN104572976B (en) Website data update method and system
US20130191328A1 (en) Standardized framework for reporting archived legacy system data
JP2016525250A (en) Bilingual corpus data expansion method, apparatus and storage medium
US10289739B1 (en) System to recommend content based on trending social media topics
US10250550B2 (en) Social message monitoring method and apparatus
CN110570928A (en) HBase and ozone based medical image file access method
CN103034650B (en) A kind of data handling system and method
CN103366008A (en) Resource searching method and device
CN111221785A (en) Semantic data lake construction method of multi-source heterogeneous data
Vianna et al. A tool for personal data extraction
CN108650546A (en) Barrage processing method, computer readable storage medium and electronic equipment
JP6752547B2 (en) Database management method and database management system
CN107679097B (en) Distributed data processing method, system and storage medium
CN105245394A (en) Method and equipment for analyzing network access log based on layered approach

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160120

RJ01 Rejection of invention patent application after publication