CN113360490A - Data processing method, apparatus, device, medium, and program product - Google Patents

Data processing method, apparatus, device, medium, and program product

Publication number
CN113360490A
Authority
CN
China
Prior art keywords
data
processed
processing
configuration file
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110693569.0A
Other languages
Chinese (zh)
Other versions
CN113360490B (en)
Inventor
张瑞
许超
石晓坤
孟迪
吴家林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110693569.0A
Publication of CN113360490A
Application granted
Publication of CN113360490B
Current legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Stored Programmes (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a data processing method, apparatus, device, medium, and program product, and relates to artificial intelligence fields such as natural language processing and cloud computing. One embodiment of the method comprises: in response to receiving a data processing request, acquiring a configuration file corresponding to the data to be processed, wherein the data processing request indicates that the data to be processed is to be processed according to the configuration file; parsing the configuration file to obtain a parsing result; and processing the data to be processed according to the data processing measures indicated by the parsing result.

Description

Data processing method, apparatus, device, medium, and program product
Technical Field
The present disclosure relates to the field of computers, and in particular, to the field of artificial intelligence, such as natural language processing and cloud computing, and more particularly, to a data processing method, apparatus, device, medium, and program product.
Background
With the rapid development of the internet, most organizations now use computers for business processing. However, because organizations such as medical institutions, power grid operators, and network operators use different computer platforms, their data differs in version, statistical method, storage medium, resource allocation, data relationships, and so on. Taking medical institutions as an example, as big data has spread to the medical industry, many medical institutions have begun to archive their data to be processed in structured form. Because medical institutions store data in different ways, the logic for interfacing each medical institution with a supervisory institution is complex, and a single set of logic serves only a single supervision instance. As a result, development cost is high, much of the work is repetitive, and flexibility is poor. In addition, the number of medical institutions is large, so processing them within a short time is difficult, the data is highly diverse, and maintenance after processing is technically difficult.
At present, the following solutions are generally adopted for processing the data to be processed: (1) A processing and development scheme is formulated according to the actual situation of each medical institution, and the medical documents to be processed are developed one by one. Each processing logic or medical document is a relatively independent system, and data processing can be achieved for the same data version of the same medical institution. (2) A processing scheme is customized using database stored procedures: the medical documents to be processed are handled by writing corresponding stored procedures, and for the same data version at different medical institutions, fast data processing can be achieved by copying and pasting the stored procedures. (3) Database queries are used: the metadata required for processing is obtained using Structured Query Language (SQL) and then processed by a purpose-built processing program, so that one set of programs can process data that shares the same database data structure.
Disclosure of Invention
Embodiments of the disclosure provide a data processing method, apparatus, device, medium, and program product.
In a first aspect, an embodiment of the present disclosure provides a data processing method, including: responding to a received data processing request, and acquiring a configuration file corresponding to the data to be processed, wherein the data processing request is used for indicating that the data to be processed is processed according to the configuration file; analyzing the configuration file to obtain an analysis result; and processing the data to be processed according to the data processing measures of the analysis result.
In a second aspect, an embodiment of the present disclosure provides a data processing apparatus, including: the data acquisition module is configured to respond to a received data processing request and acquire a configuration file corresponding to the data to be processed, wherein the data processing request is used for indicating that the data to be processed is processed according to the configuration file; the result analysis module is configured to analyze the configuration file to obtain an analysis result; and the data processing module is configured to process the data to be processed according to the data processing measures of the analysis result.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the first aspect.
In a fourth aspect, the disclosed embodiments propose a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in the first aspect.
In a fifth aspect, the disclosed embodiments propose a computer program product comprising a computer program that, when executed by a processor, implements the method as described in the first aspect.
According to the data processing method, apparatus, device, medium, and program product provided by the embodiments of the disclosure, when a data processing request is received, a configuration file corresponding to the data to be processed in the data processing request is first obtained, where the data processing request indicates that the data to be processed is to be processed according to the configuration file; the configuration file is then parsed to obtain a parsing result; and finally the data to be processed is processed according to the data processing measures indicated by the parsing result. The data to be processed can thus be processed based on data processing measures obtained by parsing the configuration file.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
Other features, objects, and advantages of the disclosure will become apparent from a reading of the following detailed description of non-limiting embodiments which proceeds with reference to the accompanying drawings. The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram for one embodiment of a data processing method according to the present disclosure;
FIG. 3 is a schematic diagram of a combination of data mapping, text structuring, and data cleansing;
FIG. 4 is a schematic diagram of a combination of data mapping, text structuring, and data cleansing;
FIG. 5 is a schematic view of a drill-down process;
FIG. 6 is a schematic diagram of a data map;
FIG. 7 is a flow diagram for one embodiment of a data processing method according to the present disclosure;
FIG. 8 is a schematic block diagram of one embodiment of a data processing apparatus according to the present disclosure;
FIG. 9 is a block diagram of an electronic device used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the data processing method or data processing apparatus of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104, for example to send data to be processed. The terminal devices 101, 102, 103 may have various client applications and intelligent interactive applications installed, such as data processing applications, data screening software, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they may be electronic products that perform human-computer interaction with a user through one or more of a keyboard, a touch pad, a display screen, a touch screen, a remote controller, voice interaction, or handwriting equipment, such as a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a PPC (Pocket PC, palmtop), a tablet computer, a smart car, a smart television, a smart speaker, a laptop computer, a desktop computer, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the above-described electronic devices and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not specifically limited here.
The server 105 may provide various services. For example, the server 105 may obtain a configuration file corresponding to the data to be processed when receiving a data processing request sent by the terminal device 101, 102, 103, where the data processing request is used to instruct to process the data to be processed according to the configuration file; analyzing the configuration file to obtain an analysis result; and processing the data to be processed according to the data processing measures of the analysis result.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
It should be noted that the data processing method provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the data processing apparatus is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a data processing method according to the present disclosure is shown. The data processing method may include the steps of:
step 201, in response to receiving a data processing request, obtaining a configuration file corresponding to data to be processed.
In this embodiment, an executing body of the data processing method (for example, the server 105 shown in fig. 1) may obtain a configuration file corresponding to the data to be processed when receiving a data processing request sent by a terminal device (for example, the terminal devices 101, 102, 103 shown in fig. 1). The data to be processed may be data processed according to a configuration file, for example, medical data, logistics data, log data, power grid data, and the like.
Taking medical data as an example, the medical data may be data related to the medical information generated when a person seeks treatment at a hospital, such as patient identity information, time of the visit, disease diagnosis information, drug information, and the like.
Here, the data processing request is used to instruct to process the data to be processed according to the configuration file.
In one example, the data to be processed may be processed according to the configuration file and the attribute information of the data to be processed. Optionally, the attribute information may include at least one of: the data type of the data to be processed (e.g., an unstructured type such as text data, or a structured type), creator information of the data to be processed, creation time of the data to be processed, and the like.
The data to be processed may be acquired from a plurality of medical institutions. The initial data to be processed comprises the visit information generated when people seek treatment at the hospital, along with other related data.
In the technical solution of the disclosure, the acquisition, storage, application, and other handling of the data to be processed and the medical data all comply with relevant laws and regulations and do not violate public order and good customs.
Step 202, parsing the configuration file to obtain a parsing result.
In this embodiment, the execution subject may parse the configuration file to obtain a parsing result.
Step 203, processing the data to be processed according to the data processing measures of the parsing result.
In this embodiment, the execution subject may process the data to be processed according to the data processing measures of the parsing result. Processing the data to be processed may include: standardizing the data to be processed, storing the data to be processed, acquiring the data to be processed, performing data mapping on the data to be processed, performing text structuring on the data to be processed, and the like.
In the data processing method provided by the embodiment of the disclosure, when a data processing request is received, a configuration file corresponding to the data to be processed in the data processing request is first obtained, where the data processing request indicates that the data to be processed is to be processed according to the configuration file; the configuration file is then parsed to obtain a parsing result; and finally the data to be processed is processed according to the data processing measures of the parsing result. The data to be processed can thus be processed based on data processing measures obtained by parsing the configuration file.
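The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the disclosure: the in-memory `CONFIG_STORE`, the request layout, the `measures` key, and the two toy measures are all assumptions introduced for the example.

```python
import json

# Hypothetical store standing in for wherever configuration files are kept.
CONFIG_STORE = {
    "institution_1": '{"measures": ["mapping", "cleaning"]}',
}

# Toy data processing measures, for illustration only.
MEASURES = {
    "mapping": lambda rec: {"patient_name": rec["name"]},
    "cleaning": lambda rec: {k: v.strip() for k, v in rec.items()},
}

def handle_request(request, measures):
    """Fetch the configuration file, parse it, then apply the measures it names."""
    raw_config = CONFIG_STORE[request["config_id"]]   # step 201: obtain configuration file
    parsed = json.loads(raw_config)                   # step 202: parse it
    data = request["data"]
    for name in parsed["measures"]:                   # step 203: apply each measure in order
        data = measures[name](data)
    return data

result = handle_request(
    {"config_id": "institution_1", "data": {"name": "  Zhang  "}}, MEASURES
)
print(result)
```

Because the sequence of measures lives in the configuration file rather than in the code, a different institution would need only a different entry in the store.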
In some optional implementations of this embodiment, processing the data to be processed according to the data processing measures of the parsing result includes: in response to the parsing result being in a target format, processing the data to be processed according to the data processing measures of the parsing result in the target format.
In this implementation, the execution subject may parse the configuration file and return a parsing result in a target format, for example JSON, XML, or YAML. If the parsing succeeds, the parsing result is returned to the user, who can use it to determine whether the configuration file contained syntax errors before parsing, which helps with debugging and troubleshooting.
In this implementation manner, when the analysis result is the analysis result in the target format, the data to be processed may be processed according to the data processing measure of the analysis result in the target format.
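As a rough illustration of returning a parsing result that helps the user locate syntax errors, the sketch below assumes JSON as the target format and uses Python's standard `json` module. The `parse_config` helper and its return shape are hypothetical.

```python
import json

def parse_config(text):
    """Return (True, parsed_dict) on success, or (False, error_message) on a syntax error."""
    try:
        return True, json.loads(text)
    except json.JSONDecodeError as err:
        # lineno, colno and msg are attributes of json.JSONDecodeError.
        return False, f"line {err.lineno}, column {err.colno}: {err.msg}"

ok, parsed = parse_config('{"measures": ["mapping"]}')
bad_ok, message = parse_config('{"measures": ["mapping"')  # missing closing brackets
print(ok, parsed)
print(bad_ok, message)
```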
In some optional implementations of this embodiment, processing the data to be processed according to the data processing measures of the parsing result includes: processing the data to be processed according to a preset function system corresponding to the data processing measures of the parsing result.
In one example, the data to be processed is processed in an E (Extract) - T (Transform) - L (Load) manner. Driven by the configuration file, the E-T-L process extracts the corresponding data to be processed from each node to be processed, transforms it, and finally loads it into the destination data structure (i.e., the target field). The configuration file for each step handles multiple data formats and data sources using one set of functions together with the path description scheme in the configuration file. After the data to be processed is obtained, each data processing node has a corresponding data processing function that processes it.
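A configuration-driven E-T-L pass might look like the following sketch. The dotted-path scheme, the `nodes` layout, and the transform names are assumptions made for illustration. The disclosure does not specify the configuration syntax.

```python
def extract(source, path):
    """Follow a dotted path such as 'visit.diagnosis' through nested dicts (assumed path scheme)."""
    value = source
    for key in path.split("."):
        value = value[key]
    return value

# Hypothetical function system: transform names mapped to callables.
TRANSFORMS = {"upper": str.upper, "strip": str.strip}

def run_etl(source, config):
    target = {}
    for node in config["nodes"]:
        value = extract(source, node["path"])        # E: extract from the node's path
        for fn_name in node.get("transform", []):    # T: apply configured transforms in order
            value = TRANSFORMS[fn_name](value)
        target[node["target_field"]] = value         # L: load into the target field
    return target

config = {"nodes": [
    {"path": "visit.diagnosis", "transform": ["strip", "upper"], "target_field": "diagnosis"},
    {"path": "patient.name", "target_field": "patient_name"},
]}
source = {"visit": {"diagnosis": " flu "}, "patient": {"name": "Li"}}
etl_result = run_etl(source, config)
print(etl_result)
```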
In this implementation, the processing of the data to be processed may be implemented by a function system.
In some optional implementations of this embodiment, the data processing measure includes at least one of: data mapping, text structuring and data cleaning; the data mapping is used for mapping structured data in the data to be processed to a target field, the text structuring is used for extracting keywords and values of text data in the data to be processed to form structured data, and the data cleaning is used for cleaning non-standardized data in the data to be processed to obtain standardized data.
In this implementation, data mapping (Mapping) is used to map the structured data in the data to be processed (e.g., data from a database, an interface, etc.) onto the target field; text structuring (Structure) is used to extract keywords and values from the text data in the data to be processed to form structured data; and data cleaning (Value wash) is used to clean non-standardized data in the data to be processed to obtain standardized data, for example by adjusting non-standardized or redundant data or structures. The target field may be a field at the target end, such as a field on a target medical document of the target end (e.g., the execution subject that performs the data processing method).
It should be noted that the data mapping, the text structuring, and the data cleansing are independent of each other and have different functions. In this implementation, the corresponding data processing measures can be determined according to the configuration file specifying the context of data mapping, text structuring, and data cleansing.
Taking medical data as an example, in FIG. 3, data processing measures suited to each medical institution (e.g., medical institution 1, medical institution 2, and medical institution 3) may be configured. For medical institution 1, which has no long-text processing requirement, text structuring may be skipped: the medical data is acquired; data mapping is performed on the structured data in the medical data; and data cleaning is performed on the non-standardized data in the mapped medical data to obtain data processing result 1. For medical institution 2, the medical data is acquired first; data mapping is then performed on the structured data in the medical data to obtain mapped medical data; text structuring is then performed on the text data in the medical data to obtain structured data; and the non-standardized data (i.e., the mapped medical data and/or the structured data) is then cleaned to obtain standardized data, yielding data processing result 2. For medical institution 3, whose medical data does not include non-standardized data, data cleaning may be skipped: after the medical data is acquired, data mapping is performed on it to obtain data processing result 3.
It should be noted that neither the execution order of data mapping, text structuring, and data cleaning nor the number of deployed instances is limited; they can be flexibly combined to build a distinct set of data processing measures. Data mapping, text structuring, and data cleaning are driven by configuration files rather than hard-coded. Flexible use of configuration files lets a program process the data to be processed according to the corresponding configuration file, so the data to be processed of all medical institutions can be handled by maintaining only one set of configuration-file-based data processing measures.
In this implementation, a unified configuration file is used, which can be reused when processing objects (i.e., medical data) having the same content, without generating a new configuration file for each processing object.
In this implementation, a controller determines whether the data to be processed passes through at least one of data mapping, text structuring, and data cleaning. The controller stores which of data mapping (Mapping), text structuring (Structure), and data cleaning (Value wash) the processing flow should pass through, as well as their execution order. Because the data processing measures are flexibly configured, an accurate processing result can be obtained, and different data processing measures can be configured for the data to be processed of different medical institutions.
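A minimal sketch of such a controller is shown below. The three measure functions are deliberately simplistic stand-ins, and the field names (`xm`, `note`, `patient_name`) and the "key: value; key: value" note format are hypothetical. Only the idea of a stored, per-institution execution order comes from the text.

```python
def data_mapping(record):
    # Map a structured source field onto a target field (hypothetical field names).
    out = dict(record)
    if "xm" in out:
        out["patient_name"] = out.pop("xm")
    return out

def text_structuring(record):
    # Extract "key: value" pairs from free text into structured fields.
    out = dict(record)
    note = out.pop("note", "")
    for part in note.split(";"):
        if ":" in part:
            key, value = part.split(":", 1)
            out[key.strip()] = value.strip()
    return out

def data_cleaning(record):
    # Standardize string values by trimming surrounding whitespace.
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

STEPS = {"mapping": data_mapping, "structure": text_structuring, "value_wash": data_cleaning}

def controller(record, order):
    """Run only the configured measures, in the configured order."""
    for name in order:
        record = STEPS[name](record)
    return record

record = {"xm": " Wang ", "note": "diagnosis: flu; drug: ibuprofen"}
r1 = controller(record, ["mapping", "value_wash"])               # no text structuring
r2 = controller(record, ["mapping", "structure", "value_wash"])  # all three measures
print(r1)
print(r2)
```

Keeping the order in data rather than code is what allows one program to serve institutions with different requirements.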
Taking medical data as an example, in FIG. 4, the medical data is acquired from a medical institution, and data mapping is performed on the structured data in the medical data. Text structuring is then performed on the mapped medical data to obtain data processing result 1, and data cleaning is performed on the mapped data to obtain data processing result 2. In addition, text structuring is performed on the text data in the medical data to obtain structured data, natural language understanding is performed on the structured data, and data processing result 3 is obtained based on the data after natural language understanding.
In the implementation mode, the personalized processing of the data to be processed can be realized based on the free combination of data mapping, text structuring and data cleaning.
In some optional implementations of this embodiment, the data mapping is determined based on the following steps: extracting data to be processed at a preset data extraction position included in the configuration file according to a data extraction function in the function system; acquiring a screened data set according to a screening function in a function system and data to be processed; and mapping the data to be processed in the screened data set to a target field.
Specifically, according to a data extraction function in the function system, extracting the data to be processed at a preset data extraction position included in the configuration file may include:
For the first data extraction, the data to be processed is extracted at the preset data extraction position included in the configuration file according to a data extraction function in the function system; the data to be processed is then stored into a cache according to the ID of the data to be processed (namely, the preset ID specified in the configuration file) and the preset path included in the configuration file, and control returns to the processing flow of the calling function.
For non-first-time data extraction, the data to be processed may be extracted in the corresponding cache by using an ID preset in the configuration file (for example, an ID of the data to be processed) according to a data extraction function in the function system.
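The first-extraction and cached-extraction paths can be sketched as follows. The function names echo the data extraction functions named in this description, but the bodies, the `FILES` stand-in for on-disk data, and the `load_count` counter are assumptions for illustration only.

```python
import json

# Hypothetical stand-in for files on disk, keyed by path.
FILES = {"/data/patients.json": '[{"id": 1, "name": "Li"}]'}

_cache = {}
load_count = 0  # counts how often the underlying source is actually read

def get_content(path, global_name):
    """First extraction: read the JSON at `path` and cache it under `global_name`."""
    global load_count
    load_count += 1
    data = json.loads(FILES[path])
    _cache[global_name] = data
    return data

def get_data(global_name):
    """Later extractions: return the cached data for the preset ID."""
    return _cache[global_name]

first = get_content("/data/patients.json", "patients")
again = get_data("patients")
print(again, load_count)
```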
In this implementation, the data mapping includes the following steps:
first, data selection and extraction (before _ mapping): using a data extraction function (the data extraction function can be used to operate JSON, XML, and other structural data) at the data extraction position, and extracting the data to be processed at a preset path (i.e., the data extraction position) in the configuration file, where the functions used may include: get _ content (PATH, GLOBAL _ NAME) is used to obtain JSON data specifying the PATH; the XML _ load _ PATH (PATH, XPATH, GLOBAL _ NAME) is used for acquiring data corresponding to XPATH in the XML of the specified PATH; get _ data (GLOBAL _ NAME) is used to obtain the cached data content (i.e., not the first data extraction).
Second, data screening (filter): after data extraction, if part of the extracted data needs to be screened, a screening function can be used for screening in the part; for example, the data set that is drilled down is screened out by the value of the current pass (i.e., by this value, the corresponding data source is obtained, the screened data set is obtained from this data source according to the screening function corresponding to the drill down in the function system, according to the preset relationship between the fields in the configuration file), or the screened data set is screened out by a constant condition (e.g., screened out of the cache). The functions used may include: filter _ kill (GLOBAL _ NAME, DRILL _ pair) filters out other data by using the data meeting the current drilling condition value in the specified data set; the filter _ by _ dottype (GLOBAL _ NAME, QUERY) filters out data items that do not meet the QUERY condition using all data sets in the specified data set when the data item meets the QUERY condition. QUERY supports continin (present), exclude (absent), etc., as: patient name: continain (a) to screen the patient set with "a" in the patient name.
Third, data conversion (do _ mapping): the data to be processed is mapped to a target field of a target end, each data to be processed in the data to be processed is processed through a traversal method, a drill-down data relation (the data relation can be preset in a configuration file) can be generated through the exact value of the current traversal and other data in the cache, and the data mapping work can be completed through the drill-down relation. The functions used may include: the loop _ item (GLOBAL _ NAME) circularly traverses the data set and simply maps with the target field; and circularly traversing the data set by loop _ with _ DRILL (GLOBAL _ NAME, DRILL _ KEYS), storing the specified DRILL _ KEYS in a cache during the traversal, obtaining the effect of data joint drilling by matching with a filter _ DRILL function, screening out the data set meeting the conditions, and repeating the whole configuration process to obtain an accurate data result set.
Fourth, data loading (after_mapping): after the data has been extracted, screened, and converted, it must be mapped to the target fields. Because the processed data can come in various formats, the data loading step makes simple adjustments (i.e., adjusts the data structure) or cleans the data (i.e., cleans the data values) to obtain adjusted or cleaned data. The functions used may include join_list, join_dit_value, etc., which adjust or clean the non-normalized data in the manner specified in the configuration file.
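As a rough illustration of the adjusting and cleaning step, the sketch below applies a per-field function after mapping. The text does not define what its join functions do, so the behavior of `join_list` here (joining list values into one string while dropping empty entries) and the `after_mapping` helper are purely assumed.

```python
from typing import Any, Callable, Dict, Iterable


def join_list(values: Iterable[Any], sep: str = ",") -> str:
    """Adjust the structure: collapse a list into one string, skipping empty entries."""
    return sep.join(str(v).strip() for v in values if v not in (None, ""))


def after_mapping(record: Dict[str, Any],
                  steps: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Apply the configured adjust/clean function to each listed field."""
    cleaned = dict(record)  # leave the input record untouched
    for field, func in steps.items():
        cleaned[field] = func(cleaned.get(field))
    return cleaned
```

In a configuration-driven setup, the `steps` mapping would be built from the configuration file rather than written in code.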
In this implementation, data mapping of the data to be processed can be realized through the functions in the function system together with the configuration file.
In some optional implementations of this embodiment, the configuration file includes an ID or a path of each piece of data to be processed; and processing the data to be processed includes: searching a data source based on the ID or the path of the data, and obtaining a value corresponding to the ID or the path from the data source.
In this implementation, during processing, part of the data to be processed serves as a data source. The ID or path of each piece of data is matched and stored in the configuration file; when the data is processed, the data source is obtained according to the ID or path, and the value of the target processing field is then obtained from the data source according to the preset relationship between fields in the configuration file. Optionally, this is achieved by the screening (filter) process. The value of the target field is the value corresponding to the ID or path of the data.
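Looking up a value by path or by ID can be sketched as follows. The dotted path syntax, the numeric list indices, and the helper names `get_by_path` and `get_by_id` are assumptions chosen for the example; the disclosure does not specify a path notation.

```python
from typing import Any, Dict, List, Optional


def get_by_path(source: Any, path: str, sep: str = ".") -> Any:
    """Walk nested dicts and lists along a dotted path such as 'patient.visits.0.dept'."""
    current = source
    for part in path.split(sep):
        if isinstance(current, list):
            current = current[int(part)]  # list levels are addressed by index
        else:
            current = current[part]       # dict levels are addressed by key
    return current


def get_by_id(records: List[Dict[str, Any]], record_id: Any,
              id_field: str = "id") -> Optional[Dict[str, Any]]:
    """Return the first record whose id_field equals record_id, or None."""
    for record in records:
        if record.get(id_field) == record_id:
            return record
    return None
```

A missing key or index raises `KeyError` or `IndexError` in this sketch; a production version would likely decide explicitly whether a missing path is an error or an empty value.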
In one example, in FIG. 5, a loop_with_drill function is used in the data conversion step to traverse the data set and store the current value of the specified field (i.e., the field name corresponding to the ID or path described above is stored to the storage area), which includes the value of the target processing field. Then, at the field that needs a drill-down relationship, a configuration meta block is used to redefine the position of the data source: the before_mapping process specifies a new data source (get_content function) or a cached one (get_data function), and in the filter process the filter_drill function connects the drill-down fields (namely, field 1 and field 3) of the data source (i.e., the source table) and of the data source to be processed (i.e., the drill-down table) to obtain a drill-down result set. On each traversal, the filter_drill function screens out the data with the same field value, thereby acquiring the drill-down data values.
In one example, in FIG. 6, the next level of data may be data in data set 1: "value 1.1" is screened out of "data set 1" according to "data set 1, value 102", and "VALUE1.1.2" is screened out of data set 1. Data mapping is then performed on "field 1, field 2, and field 3" in "value 1.2"; that is, the data screened from "data set 2" by "match 1.2 = value 1.2", namely "field 3.1: match 1.2" and "field 3.2: value 2.1", is mapped. The final result is a field: "constant", "field 2: VALUE1.1.1", and "field 3: field 3.1 = VALUE1.2.1, field 3.2 = VALUE2.1.1".
It should be noted that, through the cooperation of the functions in the function system, data to be processed in JSON and XML formats can be processed; in cooperation with an interface for acquiring the data to be processed, this can be extended to more kinds of objects to be processed, such as a database (db) and a network interface (soap).
In this implementation manner, the configuration file may include an ID or a path of each piece of data to be processed, and a drill-down operation is performed to obtain a value corresponding to the ID or the path.
In some optional implementation manners of this embodiment, acquiring a filtered data set according to data to be processed according to a filtering function in a function system includes: acquiring a corresponding data source according to the ID or the path of the data to be processed; and acquiring a screened data set from a drill-down data source according to a screening function corresponding to drill-down in the function system and according to a preset relation between fields in the configuration file.
In this implementation, when the data to be processed includes data of a next layer (i.e., data to be processed corresponding to drill-down), the execution main body may obtain a corresponding data source (i.e., a drill-down table) according to the ID or path of the data to be processed, and then obtain a screened data set from the drill-down data source according to the screening function corresponding to drill-down in the function system and the preset relationship between fields in the configuration file. The preset relationship between fields may be, for example, "field 1 = field 3" in fig. 5.
In this implementation manner, for the next layer of data in the data to be processed, the filtered data set with the data relationship as the preset relationship between the fields may be screened from the data source obtained from the ID or the path of the data to be processed according to the screening function corresponding to the drill-down in the function system.
In some alternative implementations of the present embodiment, the target format comprises JSON, XML, or YAML.
In this implementation, the configuration file is parsed into JSON, XML, or YAML, and then recognized by the machine, thereby processing the data to be processed.
In some optional implementations of this embodiment, the format of the configuration file is HOCON.
In this implementation, because many fields are repeated when processing the data to be processed, repeated configuration sections can be extracted within the configuration file, which shortens the configuration and makes it easier to maintain and read.
In this implementation, a configuration file in the HOCON format is highly readable. HOCON is a JSON-like configuration format that keeps JSON's simple descriptive style without being bound by JSON's strict syntax checking, so debugging is easy both when the configuration is first written and during later maintenance. The differences between a HOCON configuration file and other configuration files are shown in the following table:
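[Table, provided as images BDA0003127553630000121 and BDA0003127553630000131 in the original publication: comparison of HOCON configuration files with other configuration files]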
In one example, the HOCON tooling also provides a way to check the syntax of the configuration file: the target configuration can be parsed with the pyhocon tool, which returns the result of the parsing conversion. HOCON supports parsing the configuration file into JSON, XML, YAML (i.e., the target format), and other formats. If parsing succeeds, the tool returns the parsing result to the user, and the user can determine whether the written configuration file contains syntax errors, which facilitates debugging.
In this implementation, parts of the configuration file can be quickly commented out during debugging, without the restrictions of a stricter syntax. In addition, HOCON supports internal variable references and external configuration references, which greatly reduces the cost of modifying the same configuration in several places and makes the format more intuitive, flexible, and easy to use.
With further reference to fig. 7, fig. 7 illustrates a flow 700 of one embodiment of a data processing method according to the present disclosure. The data processing method may include the steps of:
Step 701: in response to receiving a data processing request, obtain a configuration file corresponding to the data to be processed, where the data processing request instructs that the data to be processed be processed according to the configuration file.
Step 702: parse the configuration file to obtain a parsing result.
Step 703: in response to the parsing result being a parsing result in the target format, process the data to be processed according to a preset function system corresponding to the data processing measure of the parsing result in the target format.
In this embodiment, when the parsing result is the parsing result in the target format, an executing entity of the data processing method (for example, the server 105 shown in fig. 1) processes the data to be processed according to a preset function system corresponding to the data processing measure of the parsing result in the target format. The parsing result of the target format may be a parsing result in an XML, JSON, or YAML format.
In this embodiment, the specific operations of steps 701 and 702 have been described in detail in steps 201 and 202, respectively, in the embodiment shown in fig. 2, and are not described again here.
As can be seen from fig. 7, compared with the embodiment corresponding to fig. 2, the data processing method in this embodiment highlights the step of processing the data to be processed according to the preset function system. Therefore, in the solution described in this embodiment, when the analysis result is the analysis result in the target format, the data to be processed is processed according to the preset function system corresponding to the data processing measure of the analysis result in the target format. The data to be processed can be processed based on a preset function system in the analysis result.
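The three steps of flow 700 can be sketched end to end as follows. This is a simplified illustration under several assumptions: the configuration is parsed with Python's standard `json` module as a stand-in for a HOCON parser, and the configuration layout (`measures`, `measure`, `options`), the `FUNCTION_SYSTEM` registry, and the function names are hypothetical.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Tuple

Record = Dict[str, Any]
TARGET_FORMATS = {"json", "xml", "yaml"}


def data_mapping(data: List[Record], options: Dict[str, Any]) -> List[Record]:
    """Map source fields to target fields; options['fields'] is target -> source."""
    fields = options["fields"]
    return [{target: row.get(src) for target, src in fields.items()} for row in data]


def data_cleaning(data: List[Record], options: Dict[str, Any]) -> List[Record]:
    """Normalize string values by stripping surrounding whitespace."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
            for row in data]


# Preset function system: one handler per data processing measure (assumed names).
FUNCTION_SYSTEM: Dict[str, Callable[[List[Record], Dict[str, Any]], List[Record]]] = {
    "data_mapping": data_mapping,
    "data_cleaning": data_cleaning,
}


def parse_config(text: str) -> Tuple[Optional[str], Any]:
    """Step 702: parse the configuration; only JSON is attempted in this sketch."""
    try:
        return "json", json.loads(text)
    except json.JSONDecodeError:
        return None, None


def handle_request(config_text: str, data: List[Record]) -> List[Record]:
    """Steps 701 to 703: take the configuration, parse it, then dispatch each measure."""
    fmt, parsed = parse_config(config_text)
    if fmt not in TARGET_FORMATS:
        raise ValueError("configuration did not parse into a target format")
    for step in parsed["measures"]:
        handler = FUNCTION_SYSTEM[step["measure"]]
        data = handler(data, step.get("options", {}))
    return data
```

Keeping the handlers in a registry means a new data processing measure can be supported by adding one function and one entry, without changing the dispatch logic.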
With further reference to fig. 8, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a data processing apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 8, the data processing apparatus 800 of the present embodiment may include: a data acquisition module 801, a result analysis module 802 and a data processing module 803. The data obtaining module 801 is configured to, in response to receiving a data processing request, obtain a configuration file corresponding to data to be processed, where the data processing request is used to instruct to process the data to be processed according to the configuration file; a result analysis module 802 configured to analyze the configuration file to obtain an analysis result; and the data processing module 803 is configured to process the data to be processed according to the data processing measure of the analysis result.
In the present embodiment, for the detailed processing of the data obtaining module 801, the result analysis module 802, and the data processing module 803 in the data processing apparatus 800 and their technical effects, reference may be made to the descriptions of steps 201 to 203 in the embodiment corresponding to fig. 2, which are not repeated here.
In some optional implementations of this embodiment, the data processing module 803 is further configured to: in response to the analysis result being an analysis result in the target format, process the data to be processed according to the data processing measure of the analysis result in the target format.
In some optional implementations of this embodiment, the data processing module 803 is further configured to: process the data to be processed according to a preset function system corresponding to the data processing measure of the analysis result.
In some optional implementations of this embodiment, the data processing measure includes at least one of: data mapping, text structuring and data cleaning; the data mapping is used for mapping structured data in the data to be processed to a target field, the text structuring is used for extracting keywords and values of text data in the data to be processed to form structured data, and the data cleaning is used for cleaning non-standardized data in the data to be processed to obtain standardized data.
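Of the three measures, text structuring is the least self-explanatory, so a small sketch may help. It extracts keyword and value pairs from free text with a regular expression; the "keyword: value" layout, the separator characters, and the function name `structure_text` are assumptions for illustration, since the disclosure does not fix an extraction technique.

```python
import re
from typing import Dict, List


def structure_text(text: str, keywords: List[str]) -> Dict[str, str]:
    """Extract 'keyword: value' pairs from free text into a structured record."""
    result: Dict[str, str] = {}
    for keyword in keywords:
        # The value runs from the colon up to the next semicolon or line break.
        pattern = rf"{re.escape(keyword)}\s*[:：]\s*([^;；\n]+)"
        match = re.search(pattern, text)
        if match:
            result[keyword] = match.group(1).strip()
    return result
```

Keywords that do not occur in the text are simply omitted from the result, so the caller can tell a missing value apart from an empty one.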
In some optional implementations of this embodiment, the data processing apparatus further includes: the data storage module is configured to extract the data to be processed at a preset data extraction position included in the configuration file according to a data extraction function in the function system; the data screening module is configured to obtain a screened data set according to the data to be processed according to a screening function in the function system; and the data traversing module is configured to map the data to be processed in the screened data set to the target field.
In some optional implementations of this embodiment, the data filtering module is further configured to: acquiring a corresponding data source according to the ID or the path of the data to be processed; and acquiring a screened data set from a data source according to a screening function corresponding to the drill-down in the function system and according to a preset relation between fields in the configuration file.
In some alternative implementations of the present embodiment, the target format is JSON, XML, or YAML.
In some optional implementations of this embodiment, the format of the configuration file is HOCON.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the data processing method. For example, in some embodiments, the data processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 900 via ROM 902 and/or communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the data processing method described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the data processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions mentioned in this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (19)

1. A method of data processing, comprising:
responding to a received data processing request, and acquiring a configuration file corresponding to data to be processed, wherein the data processing request is used for indicating that the data to be processed is processed according to the configuration file;
analyzing the configuration file to obtain an analysis result;
and processing the data to be processed according to the data processing measure of the analysis result.
2. The method of claim 1, wherein the processing the data to be processed according to the data processing measure of the parsing result comprises:
and responding to the analysis result in the target format, and processing the data to be processed according to the data processing measure of the analysis result in the target format.
3. The method according to claim 1 or 2, wherein the processing the data to be processed according to the data processing measure of the analysis result comprises:
and processing the data to be processed according to a preset function system corresponding to the data processing measure of the analysis result.
4. The method of claim 3, wherein the data processing measures include at least one of:
data mapping, text structuring and data cleaning; the data mapping is used for mapping structured data in the data to be processed to a target field, the text structuring is used for extracting keywords and values of text data in the data to be processed to form structured data, and the data cleaning is used for cleaning non-standardized data in the data to be processed to obtain standardized data.
5. The method of claim 4, wherein the data mapping is determined based on:
extracting data to be processed at a preset data extraction position included in the configuration file according to a data extraction function in the function system;
acquiring a screened data set according to the data to be processed according to a screening function in the function system;
and mapping the data to be processed in the screened data set to a target field.
6. The method of claim 5, wherein the obtaining a filtered data set from the data to be processed according to a filtering function in the function system comprises:
acquiring a corresponding data source according to the ID or the path of the data to be processed;
and acquiring a screened data set from a drill-down data source according to a screening function corresponding to drill-down in the function system and according to a preset relation between fields in the configuration file.
7. The method of claim 2, wherein the target format is JSON, XML, or YAML.
8. The method according to claim 1 or 2, wherein the configuration file is in the format of HOCON.
9. A data processing apparatus comprising:
the data acquisition module is configured to respond to a received data processing request and acquire a configuration file corresponding to data to be processed, wherein the data processing request is used for indicating that the data to be processed is processed according to the configuration file;
the result analysis module is configured to analyze the configuration file to obtain an analysis result;
and the data processing module is configured to process the data to be processed according to the data processing measures of the analysis result.
10. The apparatus of claim 9, wherein the data processing module is further configured to:
and responding to the analysis result in the target format, and processing the data to be processed according to the data processing measure of the analysis result in the target format.
11. The apparatus of claim 9 or 10, wherein the data processing module is further configured to:
and processing the data to be processed according to a preset function system corresponding to the data processing measure of the analysis result.
12. The apparatus of claim 11, wherein the data processing measures comprise at least one of:
data mapping, text structuring and data cleaning; the data mapping is used for mapping structured data in the data to be processed to a target field, the text structuring is used for extracting keywords and values of text data in the data to be processed to form structured data, and the data cleaning is used for cleaning non-standardized data in the data to be processed to obtain standardized data.
13. The apparatus of claim 12, the apparatus further comprising:
the data storage module is configured to extract data to be processed at a preset data extraction position included in the configuration file according to a data extraction function in the function system;
the data screening module is configured to obtain a screened data set according to the data to be processed according to a screening function in the function system;
and the data traversing module is configured to map the data to be processed in the screened data set to the target field.
14. The apparatus of claim 13, wherein the data screening module is further configured to:
acquiring a corresponding data source according to the ID or the path of the data to be processed;
and acquiring a screened data set from a data source according to a screening function corresponding to the drill-down in the function system and according to a preset relation between fields in the configuration file.
15. The apparatus of claim 10, wherein the target format is JSON, XML, or YAML.
16. The apparatus according to claim 9 or 10, wherein the configuration file is in the format of HOCON.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
19. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-8.
CN202110693569.0A 2021-06-22 2021-06-22 Data processing method, device, apparatus, medium and program product Active CN113360490B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110693569.0A CN113360490B (en) 2021-06-22 2021-06-22 Data processing method, device, apparatus, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110693569.0A CN113360490B (en) 2021-06-22 2021-06-22 Data processing method, device, apparatus, medium and program product

Publications (2)

Publication Number Publication Date
CN113360490A true CN113360490A (en) 2021-09-07
CN113360490B CN113360490B (en) 2023-07-28

Family

ID=77535647

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110693569.0A Active CN113360490B (en) 2021-06-22 2021-06-22 Data processing method, device, apparatus, medium and program product

Country Status (1)

Country Link
CN (1) CN113360490B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579618A (en) * 2022-04-15 2022-06-03 中信百信银行股份有限公司 Configurable OCR recognition accuracy rate evaluation method and system, electronic device and readable storage medium
CN114613513A (en) * 2022-03-08 2022-06-10 医渡云(北京)技术有限公司 Data processing method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095449A (en) * 2015-07-27 2015-11-25 福州盈展信息技术有限公司 Method for converting HTML webpage to mobile terminal page
CN108388640A (en) * 2018-02-26 2018-08-10 北京环境特性研究所 A kind of data transfer device, device and data processing system
CN108509447A (en) * 2017-02-24 2018-09-07 北京国双科技有限公司 Data processing method and device
CN108664331A (en) * 2018-05-22 2018-10-16 腾讯大地通途(北京)科技有限公司 Distributed data processing method and device, electronic equipment, storage medium
US10789461B1 (en) * 2019-10-24 2020-09-29 Innovaccer Inc. Automated systems and methods for textual extraction of relevant data elements from an electronic clinical document
US20210064644A1 (en) * 2019-08-30 2021-03-04 Google Llc Yaml configuration modeling
CN112733199A (en) * 2020-12-28 2021-04-30 北京极豪科技有限公司 Data processing method and device, electronic equipment and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
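梁丽琴; 郑少明; 郑汉军; 罗佳: "Massive data governance using big data technology" (利用大数据技术进行海量数据治理), 网络安全技术与应用 (Network Security Technology & Application), no. 11, page 54 *
沈琦; 陈博: "Research and design of an ETL framework based on big data processing" (基于大数据处理的ETL框架的研究与设计), 电子设计工程 (Electronic Design Engineering), no. 02, pages 31-33 *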

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114613513A (en) * 2022-03-08 2022-06-10 医渡云(北京)技术有限公司 Data processing method and device, electronic equipment and storage medium
CN114579618A (en) * 2022-04-15 2022-06-03 中信百信银行股份有限公司 Configurable OCR recognition accuracy rate evaluation method and system, electronic device and readable storage medium

Also Published As

Publication number Publication date
CN113360490B (en) 2023-07-28

Similar Documents

Publication Publication Date Title
US10210240B2 (en) Systems and methods for code parsing and lineage detection
CA2684822C (en) Data transformation based on a technical design document
EP3671526B1 (en) Dependency graph based natural language processing
US20220092252A1 (en) Method for generating summary, electronic device and storage medium thereof
CN112966004B (en) Data query method, device, electronic equipment and computer readable medium
US20200356726A1 (en) Dependency graph based natural language processing
CN113656590B (en) Industry map construction method and device, electronic equipment and storage medium
CN113408299A (en) Training method, device, equipment and storage medium of semantic representation model
CN113360490B (en) Data processing method, device, apparatus, medium and program product
CN112711581A (en) Medical data verification method and device, electronic equipment and storage medium
US20220237376A1 (en) Method, apparatus, electronic device and storage medium for text classification
US11442930B2 (en) Method, apparatus, device and storage medium for data aggregation
CN114579104A (en) Data analysis scene generation method, device, equipment and storage medium
CN111221698A (en) Task data acquisition method and device
CN114861059A (en) Resource recommendation method and device, electronic equipment and storage medium
CN114064923A (en) Data processing method and device, electronic equipment and storage medium
US10223086B2 (en) Systems and methods for code parsing and lineage detection
CN113609100A (en) Data storage method, data query method, data storage device, data query device and electronic equipment
CN115186738B (en) Model training method, device and storage medium
CN111026916A (en) Text description conversion method and device, electronic equipment and storage medium
CN112860812B (en) Method and device for non-invasively determining data field level association relation in big data
CN114840507A (en) Data governance method and device, electronic equipment and storage medium
CN114168119B (en) Code file editing method, device, electronic equipment and storage medium
US20220129418A1 (en) Method for determining blood relationship of data, electronic device and storage medium
CN113138767B (en) Code language conversion method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant