WO2022253131A1 - Data parsing method and apparatus, computer device, and storage medium - Google Patents

Data parsing method and apparatus, computer device, and storage medium

Info

Publication number
WO2022253131A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
current business
processed
data
current
Prior art date
Application number
PCT/CN2022/095555
Other languages
English (en)
French (fr)
Inventor
刘海彬
谢永恒
万月亮
Original Assignee
北京锐安科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京锐安科技有限公司
Publication of WO2022253131A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files

Definitions

  • the embodiments of the present application relate to data processing technologies, for example, to a data parsing method, device, computer equipment, and storage medium.
  • the embodiments of the present application provide a data parsing method, device, computer equipment, and storage medium, so as to quickly parse, merge, and store batch data.
  • the embodiment of the present application provides a data parsing method, the method comprising:
  • parsing the file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed into the object collection; wherein the file to be processed contains multiple lines of data;
  • merging, in a multi-threaded manner, the plurality of current business objects in the object collection according to the index object; wherein the index object includes the key fields of the current business objects that need to be merged;
  • merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in the business database, and storing the plurality of current business objects.
  • the embodiment of the present application also provides a data parsing device, which includes:
  • the current business object generation module is set to parse the file to be processed under the thread corresponding to the file to be processed, generate a current business object for each line of data in the file to be processed, and save the multiple current business objects corresponding to the file to be processed into the object collection; wherein the file to be processed includes multiple lines of data;
  • the current business object merging module is set to adopt a multi-threaded method to merge multiple current business objects in the object collection according to the index object; wherein, the index object includes key fields of the current business object that need to be merged;
  • the historical business object merging module is configured to adopt a multi-threaded manner to merge multiple current business objects with historical business objects in the business database, and store the multiple current business objects.
  • the embodiment of the present application also provides a computer device, including a memory, a processor, and a computer program stored on the memory and operable on the processor; when the processor executes the program, it implements the data parsing method described in any one of the embodiments of the present application.
  • the embodiments of the present application also provide a storage medium containing computer-executable instructions, which are configured to execute the data parsing method described in any one of the embodiments of the present application when executed by a computer processor.
  • Fig. 1 is a flow chart of a data parsing method in an embodiment of the present application;
  • Fig. 2 is a flow chart of a data parsing method in another embodiment of the present application.
  • Fig. 3 is a schematic structural diagram of a data parsing device in an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a computer device in an embodiment of the present application.
  • Fig. 1 is a flow chart of a data parsing method provided by an embodiment of the present application. This embodiment is applicable to the situation of extracting associated business data from batch data, and the method can be executed by a data parsing device. It can be implemented by software and/or hardware, and is generally integrated in a computer device.
  • the file to be processed is a file requiring data analysis
  • the file to be processed corresponds to a thread one by one, and a thread is created for each file to be processed, which can improve the efficiency of data analysis of the file to be processed.
  • before parsing the file to be processed under the thread corresponding to the file to be processed, the method further includes: obtaining the current file compression package, decompressing the current file compression package, and using the files generated after decompression as the files to be processed; after storing the multiple current business objects, the method further includes: storing the current file compression package in the log database.
  • the obtained source file is in the form of a compressed package, and after decompressing the obtained compressed package of the current file, the decompressed file is analyzed as a file to be processed. After the files to be processed are processed, the current file compression package is stored in the log database for backup, so as to meet the needs of other subsequent business processing.
  • the current business object is an object generated by parsing each line of data in the file to be processed according to the log data format.
  • the file to be processed can correspond to multiple current business objects.
  • the object collection is used to store multiple current business objects generated after each thread parses each file to be processed.
  • each thread parses its corresponding file to be processed line by line, generates a current business object from each line of data according to the log data format, and saves the generated current business objects into the object collection.
  • all current business objects are saved in the object collection, and the number of interactions with the database can be reduced by merging the mergeable business objects in the object collection, thereby improving the capability and speed of data processing.
  • the index object can be generated in advance. After the object collection is obtained, the key fields required for merging are determined for the current business objects in the object collection that need to be merged, and the key fields are saved as field values in the index object. Through the index object, the current business objects whose key fields are associated can be determined. There may be one or more key fields, which is not limited in this embodiment.
  • multiple threads are used to analyze multiple files to be processed respectively to obtain the current business object, and threads correspond to files to be processed one by one. After the multiple threads respectively save the current business object in the object collection, multiple threads are also used to merge the current business objects in the object collection.
  • the number of threads for merging the current business objects is not limited in this embodiment and can be configured according to the data processing capability of the computer.
  • the Java multi-threading utility CountDownLatch may be used to process the current business objects in the object collection in parallel.
  • CountDownLatch is a synchronization tool class that allows one or more threads to wait until other threads finish executing, but this embodiment does not limit the synchronization tool used.
  • the ES (Elasticsearch) search engine may be used to query the current business objects in the object collection whose key fields are associated, but this embodiment does not limit the search engine used.
  • before storing the multiple current business objects in the business database, first determine whether a historical business object matching the current business object is stored in the business database.
  • when a historical business object exists, the field values of the historical business object are compared with the field values of the matching current business object, and the field values that the historical business object has but the current business object lacks are saved in the current business object.
  • This setting can ensure that the corresponding business information in the current business object is not only the latest updated information, but also can ensure the comprehensiveness of the business information.
  • this embodiment does not limit the number of threads for merging historical business objects and current business objects, and it can be configured according to the data processing capability of the computer.
  • the process of parsing the file to be processed to generate the current business object, merging multiple current business objects in the object collection, and merging the historical business object and the current business object are all carried out in a multi-threaded manner. It can improve the efficiency of data processing and prevent data accumulation and blocking when data is parsed, merged and stored in a massive data scenario.
  • the file to be processed is parsed under the thread corresponding to the file to be processed, a current business object is generated for each line of data in the file to be processed, and the business objects are saved to the object collection; the current business objects in the object collection are then merged according to the index object through multi-threading, the current business objects are merged with the historical business objects in the business database through multi-threading, and the current business objects and the index object are saved in the business database.
  • Fig. 2 is a flow chart of a data parsing method provided by another embodiment of the present application.
  • on the basis of the above embodiment, this embodiment further specifies the process of generating current business objects from each line of data in the file to be processed, the process of merging current business objects according to the index object, and the process of merging current business objects with historical business objects.
  • Satisfying the execution condition for obtaining compressed packages at regular intervals may be to obtain a batch of compressed packages at preset time intervals, and this embodiment does not limit the specific time intervals.
  • if the previously obtained batch of historical file compression packages has been decompressed and the files copied, the generated files to be processed have been processed according to S110-S130, and the files decompressed from the historical file compression packages have been stored in the log database, it is determined that the condition for completing the processing of the historical file compression packages is satisfied.
  • the obtained data is in the form of a file compression package; the obtained current file compression package is decompressed, and the decompressed files are processed as files to be processed.
  • a thread is created for each file to be processed, and the target thread implements parsing of the file to be processed, generation and preservation of the current business object.
  • the target thread parses each line of data in the file to be processed according to the log data format, obtains the field value, and generates the current business object according to the field value corresponding to each line of data.
  • the target thread generates a current business object for each row of data in the file to be processed, and saves each current business object in the object collection.
  • the object collection stores the current business objects generated by different threads after parsing the corresponding files to be processed.
  • the key field is the merging basis of the current business objects that need to be merged. According to the key field, each current business object associated with the key field can be obtained in the object collection.
  • the primary key information is the unique identifier of the current business object, and the primary key information can be used to indicate the sort order of the current business object, that is, to indicate the sort order of the row data corresponding to the current business object in the file to be processed.
  • because the primary key information uniquely identifies the current business object, the primary key information of each current business object associated through the key field is stored in the index object together with that key field. According to the index object, the complete data of each associated current business object can be queried, which improves the efficiency of data search.
  • S270 Correspondingly store the primary key information of the current business object and the key field.
  • Differential field values refer to field values that are included in the historical business object but not in the current business object.
  • according to the primary key information of the current business object, it can be determined whether a historical business object matching the current business object exists in the business database. If one exists, the field values of the historical business object are compared with those of the current business object. A difference field value that exists in the historical business object while the corresponding field of the current business object is empty is stored into the current business object; for fields that are non-empty in both objects, the field value of the current business object prevails. This makes the field values in the current business object both up to date and more comprehensive.
  • the difference field value is stored in the current business object at the field that is empty relative to the matching historical business object, which can ensure the comprehensiveness and accuracy of the data in the current business object.
  • the data of the current business object is more complete after being merged with the historical business object.
  • the current business object related to the key field can be quickly obtained, which improves the data value of the file to be processed, and provides convenience for subsequent business data analysis.
  • the current file compression package is stored in the log database, which is convenient for data backtracking, and can also be used for other subsequent business needs.
  • at regular intervals, once processing of the historical compressed packages is determined to be complete, the current compressed package is obtained and decompressed, and the files copied after decompression are used as files to be processed, each corresponding to one thread;
  • the target thread corresponding to the file to be processed generates a current business object for each line of data in the file to be processed and saves the business objects in the object collection; multiple threads then obtain, according to the key fields in the index object, the primary key information of the current business objects in the object collection that match the key fields, and store the primary key information and the key fields correspondingly;
  • multiple threads determine the matching historical business objects in the business database according to the primary key information of the current business objects, merge them with the current business objects, and save the current business objects and the index object to the business database. This improves on the related art, in which low data processing efficiency in massive data scenarios easily causes data accumulation and blocking and cannot meet the needs of business data analysis, and realizes the rapid parsing, merging and storage of batch data.
  • FIG. 3 is a schematic structural diagram of a data parsing device in an embodiment of the present application.
  • the device includes: a current business object generation module 310, a current business object merging module 320 and a historical business object merging module 330, wherein:
  • the current business object generation module 310 is configured to parse the file to be processed under the thread corresponding to the file to be processed, generate a current business object for each line of data in the file to be processed, and save the plurality of current business objects corresponding to the file to be processed into the object collection; wherein the file to be processed includes multiple lines of data;
  • the current business object merging module 320 is configured to adopt a multi-threaded manner to merge a plurality of current business objects in the object collection according to the index object; wherein, the index object includes key fields of the current business objects that need to be merged;
  • the historical business object merging module 330 is configured to merge multiple current business objects with historical business objects in the business database in a multi-threaded manner, and store the multiple current business objects.
  • the file to be processed is parsed under the thread corresponding to the file to be processed, a current business object is generated for each line of data in the file to be processed, and the business objects are saved to the object collection; the multiple current business objects in the object collection are then merged according to the index object through multi-threading, the current business objects are merged with the historical business objects in the business database through multi-threading, and the current business objects and the index object are saved to the business database.
  • the device further includes:
  • the current file compression package processing module is configured to obtain the current file compression package, decompress the current file compression package, and use the files generated after decompression as files to be processed;
  • the current file compression package storage module is set to store the current file compression package in the log database.
  • the current file compression package processing module includes:
  • the condition judging unit is configured to obtain the current compressed file package in response to determining that the compressed package timing acquisition execution condition is met and the historical file compressed package processing completion condition is satisfied.
  • the current business object generation module 310 includes:
  • the current business object generation unit is configured to analyze each line of data in the file to be processed through the target thread, obtain the field value corresponding to each line of data, and generate the current business object corresponding to each line of data according to the field value;
  • the current business object saving unit is configured to save a plurality of current business objects corresponding to the file to be processed into the object collection through the target thread.
  • the current business object merging module 320 includes:
  • the primary key information acquisition unit is configured to adopt a multi-threaded manner, according to the key field in the index object, obtain the current business object matching the key field in the object set, and obtain the primary key information of the current business object;
  • the primary key information storage unit is configured to store the primary key information and key fields of the current business object correspondingly.
  • the historical business object merging module 330 includes:
  • the difference field value acquisition unit is set to adopt multi-threading mode, according to the primary key information of the current business object, obtain the historical business object matching the current business object in the business database, and obtain the difference field of the historical business object compared with the current business object value;
  • Difference field value storage unit set to store the difference field value in the current business object that matches the historical business object.
  • the historical business object merging module 330 includes:
  • the object storage unit is configured to store multiple current business objects and index objects that have been merged with historical business objects into the business database.
  • the data parsing device provided in the embodiment of the present application can execute the data parsing method provided in any embodiment of the present application, and has corresponding functional modules and beneficial effects for executing the method.
  • Fig. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device includes a processor 70, a memory 71, an input device 72 and an output device 73; the number of processors 70 in the computer device can be one or more, and one processor 70 is taken as an example in Fig. 4;
  • the processor 70, memory 71, input device 72 and output device 73 in the computer device can be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 4.
  • the memory 71 can be set to store software programs, computer-executable programs and modules, such as the modules corresponding to the data parsing method in the embodiment of the present application (for example, the current business object generation module 310, current business object merging module 320 and historical business object merging module 330 in the data parsing device).
  • the processor 70 executes various functional applications and data processing of the computer device by running the software programs, instructions and modules stored in the memory 71, that is, implements the above-mentioned data parsing method.
  • the method includes:
  • parsing the file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed into the object collection; wherein the file to be processed contains multiple lines of data;
  • merging, in a multi-threaded manner, the plurality of current business objects in the object collection according to the index object; wherein the index object includes the key fields of the current business objects that need to be merged;
  • merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in the business database, and storing the plurality of current business objects.
  • the memory 71 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application required by a function; the data storage area may store data created according to the use of the terminal, and the like.
  • the memory 71 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices.
  • the memory 71 may further include memory located remotely relative to the processor 70, and these remote memories may be connected to the computer device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the input device 72 may be configured to receive input of numeric or character information, and generate key signal input related to user settings and function control of the computer device.
  • the output device 73 may include a display device such as a display screen.
  • the embodiment of the present application also provides a storage medium containing computer-executable instructions, the computer-executable instructions being configured to execute a data parsing method when executed by a computer processor, the method comprising:
  • parsing the file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed into the object collection; wherein the file to be processed includes multiple lines of data;
  • merging, in a multi-threaded manner, the plurality of current business objects in the object collection according to the index object; wherein the index object includes the key fields of the current business objects that need to be merged;
  • merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in the business database, and storing the plurality of current business objects.
  • in the storage medium containing computer-executable instructions provided in the embodiments of the present application, the computer-executable instructions are not limited to the method operations described above, and can also execute related operations in the data parsing method provided in any embodiment of the present application.
  • the present application can be realized by software and necessary general-purpose hardware, and of course it can also be realized by hardware.
  • the essence of the technical solution of this application or the part that contributes to the related technology can be embodied in the form of software products, and the computer software products can be stored in computer-readable storage media, such as computer floppy disks, Read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk or optical disc, etc., including several instructions to make a computer device (which can be a personal computer, A server, or a network device, etc.) executes the methods described in multiple embodiments of the present application.
  • the computer readable storage medium may be a non-transitory computer readable storage medium.
  • the multiple units and modules included are only divided according to functional logic, but are not limited to the above-mentioned division, as long as the corresponding functions can be realized; in addition, the specific names of the multiple functional units are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

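The present application discloses a data parsing method and apparatus, a computer device, and a storage medium. The method includes: parsing a file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed into an object collection, wherein the file to be processed includes multiple lines of data; merging, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection, wherein the index object includes the key fields of the current business objects that need to be merged; and merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and storing the plurality of current business objects.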

Description

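Data parsing method and apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 202110618394.7, filed with the China Patent Office on June 3, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present application relate to data processing technologies, for example, to a data parsing method and apparatus, a computer device, and a storage medium.
Background
In the Internet wave, the operation of various applications generates a large amount of data. This data carries a large amount of information, and part of the information carried by each kind of data may be identical or related.
As an application's business develops, its business data needs to be parsed and structured data generated from the business information required, so that the structured data can be analyzed and the application optimized. In the related art, however, in scenarios with massive business data, parsing large amounts of data and extracting and structuring the desired data places high demands on data processing capability. Data processing efficiency is low, data easily accumulates and blocks, and the analysis needs of business data cannot be met.
Summary
The embodiments of the present application provide a data parsing method and apparatus, a computer device, and a storage medium, so as to quickly parse, merge, and store batch data.
In a first aspect, an embodiment of the present application provides a data parsing method, the method including:
parsing a file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed into an object collection; wherein the file to be processed includes multiple lines of data;
merging, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection; wherein the index object includes the key fields of the current business objects that need to be merged;
merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and storing the plurality of current business objects.
In a second aspect, an embodiment of the present application further provides a data parsing apparatus, the apparatus including:
a current business object generation module, configured to parse a file to be processed under the thread corresponding to the file to be processed, generate a current business object for each line of data in the file to be processed, and save the plurality of current business objects corresponding to the file to be processed into an object collection; wherein the file to be processed includes multiple lines of data;
a current business object merging module, configured to merge, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection; wherein the index object includes the key fields of the current business objects that need to be merged;
a historical business object merging module, configured to merge, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and to store the plurality of current business objects.
In a third aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the data parsing method described in any embodiment of the present application.
In a fourth aspect, an embodiment of the present application further provides a storage medium containing computer-executable instructions, the computer-executable instructions being configured, when executed by a computer processor, to perform the data parsing method described in any embodiment of the present application.
Brief Description of the Drawings
Fig. 1 is a flow chart of a data parsing method in an embodiment of the present application;
Fig. 2 is a flow chart of a data parsing method in another embodiment of the present application;
Fig. 3 is a schematic structural diagram of a data parsing apparatus in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device in an embodiment of the present application.
Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than the entire structure.
Fig. 1 is a flow chart of a data parsing method provided by an embodiment of the present application. This embodiment is applicable to extracting associated business data from batch data. The method can be executed by a data parsing apparatus, which can be implemented in software and/or hardware and is generally integrated in a computer device.
As shown in Fig. 1, the technical solution of this embodiment includes the following steps:
S110. Parse the file to be processed under the thread corresponding to the file to be processed, generate a current business object for each line of data in the file to be processed, and save the plurality of current business objects corresponding to the file to be processed into the object collection, wherein the file to be processed includes multiple lines of data.
The file to be processed is a file that requires data parsing. Files to be processed correspond one-to-one with threads: one thread is created for each file to be processed, which can improve the efficiency of parsing the files.
For example, before parsing the file to be processed under the thread corresponding to the file to be processed, the method further includes: obtaining the current file compression package, decompressing the current file compression package, and using the files generated by decompression as the files to be processed. After storing the plurality of current business objects, the method further includes: storing the current file compression package in the log database.
In this embodiment, the obtained source files are in the form of a compression package. After the obtained current file compression package is decompressed, the resulting files are parsed as the files to be processed. After the files to be processed have been handled, the current file compression package is stored in the log database as a backup for other subsequent business processing.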
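The decompression step can be sketched as follows. This is only an illustration under stated assumptions: the text does not name an archive format, so ZIP is assumed here, and the class name `ArchiveUnpacker`, its methods, and the sample entry `a.log` are hypothetical. Files are held in memory to keep the sketch short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ArchiveUnpacker {
    /** Reads every regular entry of a ZIP stream (assumed format) into memory, keyed by entry name. */
    static Map<String, String> unpack(InputStream archive) {
        Map<String, String> filesToProcess = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only regular files become files to be processed
                }
                // readAllBytes stops at the end of the current entry
                filesToProcess.put(entry.getName(),
                        new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return filesToProcess;
    }

    /** Builds a tiny in-memory ZIP so the sketch can be tried without touching disk. */
    static byte[] sampleArchive() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry("a.log"));
            zip.write("line1\nline2\n".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

A production version would more likely stream entries to disk than buffer them, since the text targets massive data volumes.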
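A current business object is an object generated by parsing one line of data in the file to be processed according to the log data format. One file to be processed can correspond to multiple current business objects.
The object collection stores the current business objects that each thread generates after parsing its file to be processed. In this embodiment, files to be processed correspond one-to-one with threads. Each thread parses its own file line by line, generates a current business object from each line according to the log data format, and saves the generated current business objects into the object collection. Because all current business objects are saved in the object collection, mergeable business objects can be merged there first, which reduces the number of interactions with the database and so improves data processing capability and speed.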
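The one-thread-per-file arrangement with a shared object collection might look like the sketch below. It is an assumption-laden illustration, not the applicant's implementation: the class `PerFileParsing` is hypothetical, each "business object" is reduced to a string made of the file name and the raw line, and `ConcurrentLinkedQueue` is one possible thread-safe collection among several.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class PerFileParsing {
    /** Starts one thread per file; each thread adds one object per line to a shared collection. */
    static Queue<String> parseAll(Map<String, List<String>> filesToProcess) {
        Queue<String> objectCollection = new ConcurrentLinkedQueue<>(); // safe for concurrent adds
        List<Thread> threads = new ArrayList<>();
        for (Map.Entry<String, List<String>> file : filesToProcess.entrySet()) {
            Thread worker = new Thread(() -> {
                for (String line : file.getValue()) {
                    // Placeholder business object: file name plus the raw line.
                    objectCollection.add(file.getKey() + ":" + line);
                }
            });
            threads.add(worker);
            worker.start();
        }
        for (Thread worker : threads) {
            try {
                worker.join(); // wait until every file has been parsed
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while parsing", e);
            }
        }
        return objectCollection;
    }
}
```

The order of objects in the collection depends on thread scheduling, so later steps should not rely on it.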
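S120. Merge, in a multi-threaded manner and according to the index object, the plurality of current business objects in the object collection; wherein the index object includes the key fields of the current business objects that need to be merged.
The index object can be generated in advance. After the object collection is obtained, the key fields required for merging are determined for the current business objects in the object collection that need to be merged, and the key fields are saved as field values in the index object. Through the index object, the current business objects whose key fields are associated can be determined. There may be one or more key fields, which is not limited in this embodiment.
In this embodiment, multiple threads parse the multiple files to be processed and obtain the current business objects, with threads and files in one-to-one correspondence. After the threads have saved their current business objects into the object collection, multiple threads are likewise used to merge the current business objects in the object collection. This embodiment does not limit the number of threads used for merging current business objects; it can be configured according to the data processing capability of the computer.
For example, the Java multi-threading utility CountDownLatch can be used to process the current business objects in the object collection in parallel. CountDownLatch is a synchronization tool class that allows one or more threads to wait until other threads have finished executing, but this embodiment does not limit the synchronization tool used.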
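A minimal sketch of the CountDownLatch pattern mentioned above, assuming the object collection has already been split into partitions. The class `LatchedMerge` and the counting "work" are hypothetical stand-ins for the real merge logic; `CountDownLatch`, `ExecutorService`, and `AtomicInteger` are standard `java.util.concurrent` classes.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class LatchedMerge {
    /** Processes each partition on a worker thread and blocks until all workers have counted down. */
    static int processInParallel(List<List<String>> partitions) {
        CountDownLatch done = new CountDownLatch(partitions.size());
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            for (List<String> partition : partitions) {
                pool.submit(() -> {
                    try {
                        processed.addAndGet(partition.size()); // stand-in for the real merge work
                    } finally {
                        done.countDown(); // always signal, even if the work throws
                    }
                });
            }
            done.await(); // the caller waits here until every worker has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while merging", e);
        } finally {
            pool.shutdown();
        }
        return processed.get();
    }
}
```

Counting down in a `finally` block keeps the waiting thread from hanging if one worker fails.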
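For example, the ES (Elasticsearch) search engine can be used to query the current business objects in the object collection whose key fields are associated, but this embodiment does not limit the search engine used.
S130. Merge, in a multi-threaded manner, the plurality of current business objects with the historical business objects in the business database, and store the plurality of current business objects.
In this embodiment, before the plurality of current business objects are stored in the business database, it is first determined whether the business database holds a historical business object that matches the current business object. When such a historical business object exists, its field values are compared with the field values of the matching current business object, and the field values that the historical business object has but the current business object lacks are saved into the current business object. This ensures that the business information in the current business object is both the most recently updated and comprehensive.
Likewise, this embodiment does not limit the number of threads used for merging historical and current business objects; it can be configured according to the data processing capability of the computer.
In this embodiment, parsing the files to be processed to generate current business objects, merging the current business objects in the object collection, and merging historical and current business objects are all performed in a multi-threaded manner. This can improve data processing efficiency and prevent data accumulation and blocking when data is parsed, merged, and stored in massive data scenarios.
In the technical solution of this embodiment, each file to be processed corresponds to one thread, the file is parsed under that thread, a current business object is generated for each line of data in the file, and the business objects are saved into the object collection. The current business objects in the object collection are then merged according to the index object through multiple threads, the current business objects are merged with the historical business objects in the business database through multiple threads, and the current business objects and the index object are saved to the business database. This improves on the related art, in which data processing efficiency is low in massive data scenarios, data easily accumulates and blocks, and business data analysis needs cannot be met, and it achieves fast parsing, merging, and storage of batch data.
Fig. 2 is a flow chart of a data parsing method provided by another embodiment of the present application. On the basis of the above embodiment, this embodiment further specifies the process of generating current business objects from each line of data in the file to be processed, the process of merging current business objects according to the index object, and the process of merging current business objects with historical business objects.
Correspondingly, as shown in Fig. 2, the technical solution of this embodiment includes the following steps:
S210. Determine whether the execution condition for timed acquisition of compression packages is satisfied. If it is satisfied, execute S220; if it is not, return to S210.
Satisfying the execution condition for timed acquisition may mean obtaining a batch of compression packages at every preset time interval. This embodiment does not limit the specific interval.
S220. Determine whether the processing completion condition for historical file compression packages is satisfied. If it is satisfied, execute S230; if it is not, return to S210.
In this embodiment, the processing completion condition for historical file compression packages is determined to be satisfied if, for the previously obtained batch of historical file compression packages, decompression and file copying have been performed, the generated files to be processed have been fully handled according to S110-S130, and the files decompressed from the historical file compression packages have been stored in the log database.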
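The two checks in S210 and S220 can be sketched as a small gate that a timer calls on each tick. This is a hypothetical illustration: the class `BatchGate` and its method names are not from the text, and an `AtomicBoolean` is just one way to record whether the previous batch has finished.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class BatchGate {
    // True while no batch is in flight, i.e. the previous batch has been fully processed.
    private final AtomicBoolean previousBatchDone = new AtomicBoolean(true);

    /** Called on every timer tick (S210); succeeds only if the previous batch is complete (S220). */
    boolean tryStartBatch() {
        // Atomically flips true -> false, so two overlapping ticks cannot both start a batch.
        return previousBatchDone.compareAndSet(true, false);
    }

    /** Called once the batch has been parsed, merged, stored, and logged. */
    void markBatchDone() {
        previousBatchDone.set(true);
    }
}
```

In practice the ticks could come from `ScheduledExecutorService.scheduleWithFixedDelay`, with the preset interval supplied by configuration; a tick that finds the gate closed simply returns and waits for the next one.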
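S230. Obtain the current file compression package, decompress it, and use the files generated by decompression as the files to be processed.
In this embodiment, the data is obtained in the form of a file compression package. The obtained current file compression package is decompressed, and the decompressed files are handled as the files to be processed.
S240. Through the target thread, parse each line of data in the file to be processed, obtain the field values corresponding to each line, and generate the current business object corresponding to each line from those field values.
One thread is created for each file to be processed, and this target thread performs the parsing of the file and the generation and saving of the current business objects.
The target thread parses each line of data in the file to be processed according to the log data format, obtains the field values, and generates a current business object from the field values of each line.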
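As a rough illustration of turning one line into a current business object, the sketch below assumes a tab-separated log format with four columns. The format, the column names `id`, `account`, `phone`, `email`, and the class `LineParser` are all hypothetical; the text only says that lines are parsed "according to the log data format".

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LineParser {
    // Hypothetical log format: tab-separated values in this fixed column order.
    static final List<String> COLUMNS = List.of("id", "account", "phone", "email");

    /** Maps one line of the file to a field-name -> field-value business object. */
    static Map<String, String> parseLine(String line) {
        String[] values = line.split("\t", -1); // limit -1 keeps trailing empty columns
        Map<String, String> businessObject = new LinkedHashMap<>();
        for (int i = 0; i < COLUMNS.size(); i++) {
            // Columns missing from a short line are recorded as empty strings.
            String value = i < values.length ? values[i] : "";
            businessObject.put(COLUMNS.get(i), value);
        }
        return businessObject;
    }
}
```

Keeping empty fields as empty strings, rather than dropping them, makes the later comparison with historical objects straightforward.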
S250、通过目标线程,将待处理文件对应的多个当前业务对象保存到对象集合中。
目标线程对待处理文件的每行数据,分别生成当前业务对象,将每个当前业务对象保存到对象集合中。对象集合中存储的是不同的线程对其对应的待处理文件进行解析后,分别生成的当前业务对象。
S260、采用多线程方式,根据索引对象中的关键字段,在所述对象集合中获取与所述关键字段匹配的当前业务对象,并获取当前业务对象的主键信息。
关键字段是需要进行合并的当前业务对象的合并依据,根据关键字段,可以在对象集合中获取关键字段存在关联的每个当前业务对象。
主键信息是当前业务对象的唯一标识,主键信息可以用于表示当前业务对象的排序顺序,也即表示当前业务对象对应的行数据在待处理文件中的排序顺序。
由于主键信息可以唯一标识当前业务对象,因此,将关键字段存在关联的每个当前业务对象的主键信息,以及关键字段对应存储到索引对象中。根据索引对象,即可关联查询到每个关联的当前业务对象的完整数据,提高了数据搜索的效率。
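S270. Storing the primary key information of the current business objects in correspondence with the key field.
S280. Acquiring, in a multi-threaded manner and according to the primary key information of a current business object, the historical business object in the business database that matches the current business object, and acquiring the difference field values of the historical business object relative to the current business object.
A difference field value is a field value that is contained in the historical business object but not in the current business object.
According to the primary key information of the current business object, it can be determined whether a historical business object matching the current business object exists in the business database. If one exists, its field values are compared with those of the current business object. Difference field values, that is, values present in the historical business object for fields that are empty in the current business object, are stored into the current business object; for fields whose values are non-empty in both the historical business object and the current business object, the value in the current business object prevails. In this way, the field values contained in the current business object are both more up to date and more comprehensive.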
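The merge rule described above (fill empty fields from the historical object, let the current object win otherwise) can be sketched for a single pair of objects as follows. The class `HistoryMerger` and the representation of a business object as a field-name to value map are assumptions for illustration; in the embodiment this step would be run by multiple threads over many objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: copy difference field values from history into current.
final class HistoryMerger {
    static Map<String, String> merge(Map<String, String> current, Map<String, String> history) {
        Map<String, String> merged = new HashMap<>(current);
        for (Map.Entry<String, String> field : history.entrySet()) {
            String currentValue = merged.get(field.getKey());
            String historyValue = field.getValue();
            boolean currentEmpty = currentValue == null || currentValue.isEmpty();
            boolean historyPresent = historyValue != null && !historyValue.isEmpty();
            if (currentEmpty && historyPresent) {
                merged.put(field.getKey(), historyValue); // difference field value
            }
            // Otherwise the value in the current business object prevails.
        }
        return merged;
    }
}
```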
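S290. Storing the difference field values into the current business object that matches the historical business object.
Storing the difference field values into the current business object, in the fields that are empty in the current business object and have values in the matching historical business object, ensures the comprehensiveness and accuracy of the data in the current business object.
S2100. Storing the plurality of current business objects that have been merged with historical business objects, together with the index object, in the business database.
In this embodiment of the present application, after merging with the historical business objects, the data of the current business objects is more complete. Through the index object, the current business objects whose key fields are associated can be obtained quickly, which increases the data value of the files to be processed and facilitates subsequent business data analysis.
S2110. Storing the current compressed file package in the log database.
In this embodiment of the present application, storing the current compressed file package in the log database facilitates data backtracking and also makes the package available for other subsequent business needs.
In the technical solution of this embodiment, at every preset time interval and after it is determined that the historical compressed packages have been processed, the current compressed package is acquired and decompressed, and the files obtained by copying after decompression are used as the files to be processed. Each file to be processed corresponds to one thread, and the target thread corresponding to a file generates one current business object for each line of data in the file and saves the business objects to the object collection. Multiple threads then obtain, according to the key field in the index object, the primary key information of the current business objects in the object collection that match the key field, and the primary key information is stored in correspondence with the key field. Multiple threads determine, according to the primary key information of the current business objects, the matching historical business objects in the business database and merge them with the current business objects, and the current business objects and the index object are saved to the business database. This alleviates the problem in the related art that data processing efficiency is low in mass-data scenarios, which easily causes data accumulation and blocking and fails to meet the needs of business data analysis, and achieves fast parsing, merging, and storage of batch data.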
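FIG. 3 is a schematic structural diagram of a data parsing device in an embodiment of the present application. The device includes a current business object generation module 310, a current business object merging module 320, and a historical business object merging module 330, wherein:
the current business object generation module 310 is configured to parse a file to be processed under the thread corresponding to the file to be processed, generate a current business object for each line of data in the file to be processed, and save the plurality of current business objects corresponding to the file to be processed to an object collection, wherein the file to be processed includes multiple lines of data;
the current business object merging module 320 is configured to merge, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection, wherein the index object includes the key field of the current business objects to be merged;
the historical business object merging module 330 is configured to merge, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and to store the plurality of current business objects.
In the technical solution of this embodiment, each file to be processed corresponds to one thread; the file is parsed under that thread, one current business object is generated for each line of data in the file, and the business objects are saved to the object collection. The plurality of current business objects in the object collection are then merged by multiple threads according to the index object, the current business objects are merged by multiple threads with the historical business objects in the business database, and the current business objects and the index object are saved to the business database. This alleviates the problem in the related art that data processing efficiency is low in mass-data scenarios, which easily causes data accumulation and blocking and fails to meet the needs of business data analysis, and achieves fast parsing, merging, and storage of batch data.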
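On the basis of the above embodiment, the device further includes:
a current compressed file package processing module, configured to acquire a current compressed file package, decompress the current compressed file package, and use the files generated by decompression as the files to be processed;
a current compressed file package storage module, configured to store the current compressed file package in the log database.
On the basis of the above embodiment, the current compressed file package processing module includes:
a condition determination unit, configured to acquire the current compressed file package in response to determining that the timed compressed-package acquisition condition is met and that the historical compressed file package processing completion condition is met.
On the basis of the above embodiment, the current business object generation module 310 includes:
a current business object generation unit, configured to parse, by a target thread, each line of data in the file to be processed to obtain the field values corresponding to each line of data, and to generate, according to the field values, the current business object corresponding to each line of data;
a current business object saving unit, configured to save, by the target thread, the plurality of current business objects corresponding to the file to be processed to the object collection.
On the basis of the above embodiment, the current business object merging module 320 includes:
a primary key information acquisition unit, configured to acquire, in a multi-threaded manner and according to the key field in the index object, the current business objects in the object collection that match the key field, and to acquire the primary key information of those current business objects;
a primary key information storage unit, configured to store the primary key information of the current business objects in correspondence with the key field.
On the basis of the above embodiment, the historical business object merging module 330 includes:
a difference field value acquisition unit, configured to acquire, in a multi-threaded manner and according to the primary key information of a current business object, the historical business object in the business database that matches the current business object, and to acquire the difference field values of the historical business object relative to the current business object;
a difference field value storage unit, configured to store the difference field values into the current business object that matches the historical business object.
On the basis of the above embodiment, the historical business object merging module 330 includes:
an object storage unit, configured to store the plurality of current business objects that have been merged with historical business objects, together with the index object, in the business database.
The data parsing device provided by the embodiments of the present application can execute the data parsing method provided by any embodiment of the present application, and has the functional modules and beneficial effects corresponding to the executed method.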
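FIG. 4 is a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 4, the computer device includes a processor 70, a memory 71, an input device 72, and an output device 73. The number of processors 70 in the computer device may be one or more; one processor 70 is taken as an example in FIG. 4. The processor 70, the memory 71, the input device 72, and the output device 73 in the computer device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 4.
As a computer-readable storage medium, the memory 71 may be configured to store software programs, computer-executable programs, and modules, such as the modules corresponding to the data parsing method in the embodiments of the present application (for example, the current business object generation module 310, the current business object merging module 320, and the historical business object merging module 330 in the data parsing device). The processor 70 runs the software programs, instructions, and modules stored in the memory 71 to execute the various functional applications and data processing of the computer device, that is, to implement the above data parsing method. The method includes:
parsing a file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed to an object collection, wherein the file to be processed includes multiple lines of data;
merging, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection, wherein the index object includes the key field of the current business objects to be merged;
merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and storing the plurality of current business objects.
The memory 71 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 71 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 71 may further include memories located remotely from the processor 70, and these remote memories may be connected to the computer device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 72 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the computer device. The output device 73 may include a display device such as a display screen.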
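An embodiment of the present application further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to perform a data parsing method. The method includes:
parsing a file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed to an object collection, wherein the file to be processed includes multiple lines of data; merging, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection, wherein the index object includes the key field of the current business objects to be merged;
merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and storing the plurality of current business objects.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present application, the computer-executable instructions are not limited to the method operations described above and may also perform related operations in the data parsing method provided by any embodiment of the present application.
From the above description of the implementations, those skilled in the art can clearly understand that the present application may be implemented by software together with the necessary general-purpose hardware, and of course may also be implemented by hardware. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present application. The computer-readable storage medium may be a non-transitory computer-readable storage medium.
It is worth noting that, in the above embodiment of the data parsing device, the units and modules included are divided only according to functional logic; the division is not limited to the above, as long as the corresponding functions can be realized. In addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of the present application.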

Claims (10)

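  1. A data parsing method, comprising:
    parsing a file to be processed under a thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed to an object collection, wherein the file to be processed comprises multiple lines of data;
    merging, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection, wherein the index object comprises a key field of the current business objects to be merged;
    merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and storing the plurality of current business objects.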
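  2. The method according to claim 1, further comprising, before parsing the file to be processed under the thread corresponding to the file to be processed:
    acquiring a current compressed file package, decompressing the current compressed file package, and using the files generated by decompression as the files to be processed;
    and further comprising, after storing the plurality of current business objects:
    storing the current compressed file package in a log database.
  3. The method according to claim 2, wherein acquiring the current compressed file package comprises:
    acquiring the current compressed file package in response to determining that a timed compressed-package acquisition condition is met and that a historical compressed file package processing completion condition is met.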
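  4. The method according to claim 1, wherein parsing the file to be processed under the thread corresponding to the file to be processed, generating a current business object for each line of data in the file to be processed, and saving the plurality of current business objects corresponding to the file to be processed to the object collection comprises:
    parsing, by a target thread, each line of data in the file to be processed to obtain the field values corresponding to each line of data, and generating, according to the field values, the current business object corresponding to each line of data;
    saving, by the target thread, the plurality of current business objects corresponding to the file to be processed to the object collection.
  5. The method according to claim 1, wherein merging, in a multi-threaded manner and according to the index object, the plurality of current business objects in the object collection comprises:
    acquiring, in a multi-threaded manner and according to the key field in the index object, the current business objects in the object collection that match the key field, and acquiring the primary key information of the current business objects;
    storing the primary key information of the current business objects in correspondence with the key field.
  6. The method according to claim 5, wherein merging, in a multi-threaded manner, the plurality of current business objects with historical business objects in the business database comprises:
    acquiring, in a multi-threaded manner and according to the primary key information of the current business object, the historical business object in the business database that matches the current business object, and acquiring the difference field values of the historical business object relative to the current business object;
    storing the difference field values into the current business object that matches the historical business object.
  7. The method according to claim 1, wherein storing the plurality of current business objects comprises:
    storing the plurality of current business objects that have been merged with historical business objects, together with the index object, in the business database.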
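  8. A data parsing device, comprising:
    a current business object generation module, configured to parse a file to be processed under a thread corresponding to the file to be processed, generate a current business object for each line of data in the file to be processed, and save the plurality of current business objects corresponding to the file to be processed to an object collection, wherein the file to be processed comprises multiple lines of data;
    a current business object merging module, configured to merge, in a multi-threaded manner and according to an index object, the plurality of current business objects in the object collection, wherein the index object comprises a key field of the current business objects to be merged;
    a historical business object merging module, configured to merge, in a multi-threaded manner, the plurality of current business objects with historical business objects in a business database, and to store the plurality of current business objects.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the data parsing method according to any one of claims 1 to 7.
  10. A storage medium containing computer-executable instructions which, when executed by a computer processor, are configured to perform the data parsing method according to any one of claims 1 to 7.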
PCT/CN2022/095555 2021-06-03 2022-05-27 Data parsing method, device, computer equipment, and storage medium WO2022253131A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110618394.7 2021-06-03
CN202110618394.7A CN113220646A (zh) 2021-06-03 2021-06-03 A data parsing method, device, computer equipment, and storage medium

Publications (1)

Publication Number Publication Date
WO2022253131A1 true WO2022253131A1 (zh) 2022-12-08

Family

ID=77082494

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/095555 WO2022253131A1 (zh) 2021-06-03 2022-05-27 Data parsing method, device, computer equipment, and storage medium

Country Status (2)

Country Link
CN (1) CN113220646A (zh)
WO (1) WO2022253131A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113220646A (zh) * 2021-06-03 2021-08-06 北京锐安科技有限公司 A data parsing method, device, computer equipment, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6721749B1 (en) * 2000-07-06 2004-04-13 Microsoft Corporation Populating a data warehouse using a pipeline approach
CN105955970A (zh) * 2015-11-12 2016-09-21 ***股份有限公司 Database replication method and device based on log parsing
US20160299916A1 (en) * 2015-04-13 2016-10-13 Tactile, Inc. Merging data edits with intervening edits for data concurrency
US20210149905A1 (en) * 2019-11-14 2021-05-20 YScope Inc. Compression, searching, and decompression of log messages
CN113220646A (zh) * 2021-06-03 2021-08-06 北京锐安科技有限公司 A data parsing method, device, computer equipment, and storage medium

Also Published As

Publication number Publication date
CN113220646A (zh) 2021-08-06

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22815177

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22815177

Country of ref document: EP

Kind code of ref document: A1