CN112948637A - Rule-based data standardization system - Google Patents

Publication number
CN112948637A
Authority
CN
China
Prior art keywords
data
rule
conversion
component
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110339296.XA
Other languages
Chinese (zh)
Inventor
严春利
杨波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sailing Information Technology Co ltd
Original Assignee
Shanghai Sailing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sailing Information Technology Co ltd
Priority to CN202110339296.XA
Publication of CN112948637A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a rule-based data standardization system comprising a rule management component, a conversion management component and an output management component. The rule management component defines the conversion rules for data and comprises a rule parsing component and a rule matching component; the conversion management component converts source data into standardized data and comprises a data parsing component and a data conversion component; and the output management component persistently outputs the standardized data. By configuring the corresponding rules, the invention standardizes data in different formats according to those rules, thereby using service resources efficiently, simplifying deployment, and reducing development and operation and maintenance costs.

Description

Rule-based data standardization system
Technical Field
The invention relates to the field of data standardization, in particular to a rule-based data standardization system.
Background
A data networking system is one of the most important systems in the field of intelligent security. Building it is an important part of the overall engineering of city-level and district/county-level smart cities: it completes the core technology platform of such smart cities and lays a solid technical foundation for subsequent application development in various industries. As a basic platform under the overall smart city architecture, the data networking system processes, analyzes and mines the business data scattered across departments to form a unified, complete and ordered data asset system, and enables cross-industry, cross-department and cross-region comprehensive applications and data sharing through sharing and exchange. During data networking, however, the data are heterogeneous, which makes parsing harder; as the number of supported data types grows, the application platform must be developed separately each time, and parsed data cannot be reused. Development is therefore complex and costs are hard to reduce.
Several data standardization methods are currently common on the market:
1. Requirement customization.
A solution is customized for the data supplied by the user so that the data become usable.
2. Supporting different data formats by exposing code.
For example, languages such as C, Node.js and Java are built into the conversion platform so that an enterprise can standardize data in different formats.
The main disadvantage of requirement customization is that it cannot meet the standardization needs of data sources that are not known in advance; supporting different data formats by exposing code suffers from high operation and maintenance costs and is difficult to extend.
Therefore, those skilled in the art have sought to develop a rule-based data standardization system and method that can generate different rules according to the user's standardization requirements, so that accessing new data requires only adding rules and not changing the scheme.
Disclosure of Invention
In view of the above defects in the prior art, the technical problem to be solved by the present invention is how to generate different rules according to the user's standardization requirements, so that accessing new data requires only adding rules and not changing the scheme.
To achieve this purpose, the invention provides a rule-based data standardization system comprising a rule management component, a conversion management component and an output management component, wherein the rule management component defines the conversion rules for data and comprises a rule parsing component and a rule matching component, the conversion management component converts source data into standardized data and comprises a data parsing component and a data conversion component, and the output management component persistently outputs the standardized data.
Further, the conversion rules are stored in an XML file.
Further, the rule parsing component loads the XML file into memory according to the defined structure; the rule matching component provides the corresponding conversion rule to the conversion management component.
Further, the data parsing component parses data in different formats according to their original structure; the data conversion component finds the corresponding conversion rule in the rule management component by the data identifier, and converts the source data into the standardized data according to that conversion rule.
Further, the rule parsing component performs the steps of:
step a01, reading the XML file from the configured path;
step a02, parsing the XML file with an XML tool;
step a03, creating an element node (data field description information);
step a04, if consecutive start tags are encountered (indicating a new element node), creating a new model node (an element management node);
step a05, if the node is a Condition node (a node that adds a filter condition to element-node assignment), treating it as a special node and assigning its attributes to the Condition storage structure of the model node;
step a06, converting the value to the type given by the type attribute, the type attribute defaulting to string;
step a07, assigning the attributes to the Condition storage structure of the model node;
step a08, adjusting the start and end markers by special handling so that the next node and the Condition node share the same parent node;
step a09, if the element storage of the model node is empty, creating the storage space explicitly;
step a10, placing the newly created element node into the map of the model node;
step a11, taking the parent model node of the newly created element node as the current model;
step a12, if consecutive end tags are encountered, moving back up one level to the parent model node;
step a13, completing the structuring of the tags;
step a14, storing the structured rule in a hash table keyed by its ID.
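As a rough illustration of steps a01 to a14, the sketch below loads a rule file into nested model nodes and stores them in a hash table keyed by rule ID. The publication shows its rule file only as figures, so the tag names (`model`, `element`, `condition`) and attribute names (`id`, `src`, `dst`, `key`, `value`, `type`) are assumptions made for this example. The sketch also recurses over a parsed tree instead of tracking consecutive start and end tags as steps a04, a08, a11 and a12 describe; that is a simplification, not the method of the publication.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical rule file: tag and attribute names are assumptions.
RULE_XML = """
<rules>
  <model id="vehicle_v1">
    <condition key="kind" value="1" type="int"/>
    <element src="plateNo" dst="plate_number"/>
    <model name="location">
      <element src="lon" dst="longitude" type="float"/>
    </model>
  </model>
</rules>
"""

# Converters selected by the "type" attribute (step a06).
CASTS = {"string": str, "int": int, "float": float}


@dataclass
class ModelNode:
    elements: dict = field(default_factory=dict)    # source key -> (destination field, type name)
    children: dict = field(default_factory=dict)    # nested model nodes by name
    conditions: dict = field(default_factory=dict)  # filter conditions (steps a05 to a07)


def build_model(xml_node):
    """Turn one <model> element into a ModelNode, recursing into nested models."""
    model = ModelNode()
    for child in xml_node:
        if child.tag == "condition":
            # The type attribute defaults to string when absent.
            cast = CASTS[child.get("type", "string")]
            model.conditions[child.get("key")] = cast(child.get("value"))
        elif child.tag == "element":
            model.elements[child.get("src")] = (child.get("dst"), child.get("type", "string"))
        elif child.tag == "model":
            model.children[child.get("name")] = build_model(child)
    return model


def load_rules(xml_text):
    """Parse the rule text and return a hash table keyed by rule ID (step a14)."""
    root = ET.fromstring(xml_text)
    return {m.get("id"): build_model(m) for m in root.findall("model")}


rules = load_rules(RULE_XML)
```

Reading the text from a configured path (step a01) would replace `RULE_XML` with the contents of that file.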
Further, the rule matching component performs the steps of:
step b1, calculating a rule ID from the data definition model;
step b2, looking up the conversion rule corresponding to the data by the rule ID;
step b3, returning the matching result as a pointer to the structured rule.
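Steps b1 to b3 amount to a keyed lookup, which might be sketched as follows. The publication does not say how the rule ID is calculated, so the hashing scheme, the `data_type` and `version` parameters, and the table contents are all hypothetical.

```python
import hashlib


def rule_id(data_type, version):
    """Step b1: derive a rule ID from the data definition model (assumed scheme)."""
    return hashlib.sha1(f"{data_type}:{version}".encode("utf-8")).hexdigest()[:12]


# Hash table of structured rules, as produced by the rule parsing component.
# The single entry here is illustrative only.
RULE_TABLE = {rule_id("vehicle", "v1"): {"plateNo": "plate_number"}}


def match_rule(data_type, version, table=RULE_TABLE):
    """Steps b2 and b3: look up the rule and return a reference to it, or None."""
    return table.get(rule_id(data_type, version))
```

Returning the stored object itself, rather than a copy, is the Python counterpart of returning a pointer to the structured rule.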
Further, the data parsing component performs the steps of:
step c1, receiving the source data structure into memory;
step c2, acquiring the identification information from the source data header;
step c3, generating the rule ID from the identification information;
step c4, acquiring the conversion rule corresponding to the data by the rule ID.
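A minimal sketch of steps c1 to c4 for JSON input is given below. The `header` object with `type` and `version` fields, and the `type:version` rule ID format, are assumptions; the publication does not specify the header layout.

```python
import json

# Illustrative rule table keyed by rule ID (assumed "type:version" format).
RULE_TABLE = {"vehicle:v1": {"plateNo": "plate_number"}}


def parse_source(raw, table=RULE_TABLE):
    """Parse raw JSON text and return (data, conversion rule)."""
    data = json.loads(raw)                          # c1: load the source structure into memory
    header = data["header"]                         # c2: identification info (hypothetical layout)
    rid = f'{header["type"]}:{header["version"]}'   # c3: generate the rule ID
    rule = table.get(rid)                           # c4: fetch the matching conversion rule
    if rule is None:
        raise KeyError(f"no conversion rule for {rid}")
    return data, rule
```

Raising an error for an unknown rule ID is a design choice of this sketch; a deployment could equally skip or log such records.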
Further, the data conversion component performs the steps of:
step d1, acquiring a key of the conversion rule;
step d2, checking, by the key, whether the data has a corresponding field;
step d3, if the corresponding field is found, determining the next operation according to its data format;
step d4, if the field is an array, iterating over all elements of the array;
step d5, if the field is an object, splitting the object further;
step d6, if the field is a single element, taking out the data value, mapping it to the conversion destination field corresponding to the element key, and putting it into a cache;
step d7, repeating steps d5, d6, d7 and d8 recursively until all fields in the source data have been processed and the corresponding conversion results have been obtained according to the conversion rule;
step d8, outputting the conversion result.
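The recursive traversal in steps d1 to d8 can be sketched as below. The rule representation is an assumption: each source key maps either to a destination field name, or to a `(destination, nested_rule)` tuple for objects and arrays. The field names are invented for the example.

```python
def convert(source, rule):
    """Recursively map source fields to destination fields according to the rule."""
    if isinstance(source, list):                    # d4: iterate over array elements
        return [convert(item, rule) for item in source]
    result = {}                                     # cache for converted values
    for key, target in rule.items():                # d1: take each rule key
        if key not in source:                       # d2: no corresponding field, skip
            continue
        value = source[key]
        if isinstance(target, tuple):               # d5: object or array, descend further
            destination, nested_rule = target
            result[destination] = convert(value, nested_rule)
        else:                                       # d6: single element, copy to destination
            result[target] = value
    return result                                   # d8: output the conversion result


RULE = {
    "plateNo": "plate_number",
    "pos": ("location", {"lon": "longitude", "lat": "latitude"}),
    "tags": ("labels", {"name": "label"}),
}
SOURCE = {
    "plateNo": "A123",
    "pos": {"lon": 121.4, "lat": 31.2},
    "tags": [{"name": "red"}, {"name": "suv"}],
    "extra": 1,
}
converted = convert(SOURCE, RULE)
```

Fields without a rule entry, such as `extra` here, are dropped; adding support for a new field then only requires a new rule entry.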
Further, the output management component assembles the standardized structured data into JSON format and persists it.
Further, the output management component performs the steps of:
step e1, acquiring the internal standard data structure according to the data type;
step e2, acquiring the data type of a field according to its field name;
step e3, converting the external field type into the standardized type according to the data type;
step e4, repeating steps e2 and e3 until all data have been converted into standardized types;
step e5, packaging the standardized data into unified JSON output;
step e6, persisting the output according to the user's output requirements.
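Steps e1 to e6 might look like the following sketch. The `STANDARD_SCHEMA` table, its field names, and the choice of an append-only text file as the persistence target are assumptions made for illustration.

```python
import json

# Hypothetical internal standard structures: data type -> {field name: standardized type}.
STANDARD_SCHEMA = {"vehicle": {"plate_number": str, "speed": float, "lane": int}}


def package(data_type, fields):
    """Convert field values to their standardized types and wrap them as JSON text."""
    schema = STANDARD_SCHEMA[data_type]                       # e1: internal standard structure
    typed = {
        name: schema[name](value)                             # e2, e3: look up type, convert
        for name, value in fields.items() if name in schema   # e4: repeat for every field
    }
    return json.dumps({"type": data_type, "data": typed},     # e5: unified JSON output
                      ensure_ascii=False, sort_keys=True)


def persist(payload, path):
    """e6: append one JSON record per line to a file (one possible output target)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(payload + "\n")


packaged = package("vehicle", {"plate_number": "A123", "speed": "62.5", "lane": "2"})
```

Other persistence targets, such as a message queue or a database, would replace only `persist`.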
The invention can generate different rules according to the user's standardization requirements, so that accessing new data requires only adding rules and not changing the scheme. The invention can also support the standardization of different data simply by configuring the XML file, which is easy to understand.
The concept, specific structure and technical effects of the present invention are further described below with reference to the accompanying drawings, so that its objects, features and effects can be fully understood.
Drawings
FIG. 1 is a logic diagram of data interaction in accordance with a preferred embodiment of the present invention;
FIG. 2 is a data processing flow diagram of a preferred embodiment of the present invention;
FIG. 3 is a diagram of the rule file logic structure of a preferred embodiment of the present invention.
Detailed Description
The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
In the drawings, structurally identical elements are represented by the same reference numerals, and structurally or functionally similar elements are represented by like reference numerals throughout the several views. The size and thickness of each component shown in the drawings are arbitrary, and the present invention does not limit them. The thickness of components may be exaggerated in places to improve clarity.
As shown in fig. 1, the data interaction logic of a preferred embodiment of the present invention comprises several modules of the data standardization component: an input data module, a data model module, a rule management module, a conversion data module, a packaging data module and an output data module. First, source data are input through the input data module; a conversion rule is then determined through the data model module; the conversion data module converts the input source data using that rule; the packaging data module packages the data into unified JSON standardized data; and finally the output data module persistently outputs the data. The rule management module manages the rules according to the different requirements of users.
The following is an example of data to be converted, in JSON format:
[Figures BDA0002998885420000041, BDA0002998885420000051 and BDA0002998885420000061: source data example; images not reproduced here]
The data above are the input source data. After conversion, standardized data in JSON format are output, as follows:
[Figure BDA0002998885420000062: standardized output data; image not reproduced here]
The conversion refers to a specified conversion rule. The conversion rule file is in XML format, and its contents are as follows:
[Figures BDA0002998885420000063 and BDA0002998885420000071: conversion rule file; images not reproduced here]
The logical structure of the rule file is shown in fig. 3.
The specific conversion steps are shown in fig. 2 and are as follows:
step 1, initializing the conversion rules;
step 2, if rule initialization succeeds, continuing with the following steps; otherwise, ending;
step 3, judging whether there is data; if so, continuing with the following steps; otherwise, ending;
step 4, acquiring the data;
step 5, parsing the data;
step 6, converting the data;
step 7, packaging the data;
step 8, outputting the data;
step 9, judging whether the operation is finished; if not, returning to step 3.
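The control flow of steps 1 to 9 can be summarized in a short driver loop. This is a sketch only: the stage functions are passed in as parameters, and the lambdas in the demonstration are stand-ins invented for the example rather than the components described above.

```python
import json


def run(sources, init_rules, parse, convert, package, output):
    """Drive the pipeline and return the number of records processed."""
    rules = init_rules()                  # step 1: initialize the conversion rules
    if rules is None:                     # step 2: stop if initialization failed
        return 0
    processed = 0
    for raw in sources:                   # steps 3, 4 and 9: loop while data remain
        data, rule = parse(raw, rules)    # step 5: parse the data and find its rule
        converted = convert(data, rule)   # step 6: convert
        output(package(converted))        # steps 7 and 8: package and output
        processed += 1
    return processed


# Demonstration with stand-in stages (all hypothetical).
emitted = []
count = run(
    sources=[{"id": "r1", "a": 1}, {"id": "r1", "a": 2}],
    init_rules=lambda: {"r1": {"a": "alpha"}},
    parse=lambda raw, rules: (raw, rules[raw["id"]]),
    convert=lambda data, rule: {dst: data[src] for src, dst in rule.items()},
    package=json.dumps,
    output=emitted.append,
)
```

Passing the stages as callables keeps the loop independent of any particular input format or output target.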
The preferred embodiments of the invention have been described in detail above. It should be understood that those skilled in the art could devise numerous modifications and variations in light of the present teachings without departing from the inventive concept. Therefore, any technical solution that those skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art and according to the concept of the present invention falls within the scope of protection defined by the claims.

Claims (10)

1. A rule-based data standardization system, comprising a rule management component, a conversion management component and an output management component; wherein the rule management component defines conversion rules for data and comprises a rule parsing component and a rule matching component; the conversion management component converts source data into standardized data and comprises a data parsing component and a data conversion component; and the output management component persistently outputs the standardized data.
2. The rule-based data standardization system of claim 1, wherein the conversion rules are stored in an XML file.
3. The rule-based data standardization system of claim 2, wherein the rule parsing component loads the XML file into memory; and the rule matching component provides the corresponding conversion rule to the conversion management component.
4. The rule-based data standardization system of claim 1, wherein the data parsing component parses data in different formats according to their original structure; and the data conversion component finds the corresponding conversion rule in the rule management component by the data identifier, and converts the source data into the standardized data according to that conversion rule.
5. The rule-based data standardization system of claim 3, wherein the rule parsing component performs the steps of:
step a01, reading the XML file from the configured path;
step a02, parsing the XML file with an XML tool;
step a03, creating an element node;
step a04, if consecutive start tags are encountered, creating a new model node;
step a05, if the node is a Condition node, not creating an element node for this special node, and assigning its attributes to the Condition storage structure of the model node;
step a06, converting the value to the type given by the type attribute, the type attribute defaulting to string;
step a07, assigning the attributes to the Condition storage structure of the model node;
step a08, adjusting the start and end markers by special handling so that the next node and the Condition node share the same parent node;
step a09, if the element storage of the model node is empty, creating the storage space explicitly;
step a10, placing the newly created element node into the map of the model node;
step a11, taking the parent model node of the newly created element node as the current model;
step a12, if consecutive end tags are encountered, moving back up one level to the parent model node;
step a13, completing the structuring of the tags;
step a14, storing the structured rule in a hash table keyed by its ID.
6. The rule-based data standardization system of claim 3, wherein the rule matching component performs the steps of:
step b1, calculating a rule ID from the data definition model;
step b2, looking up the conversion rule corresponding to the data by the rule ID;
step b3, returning the matching result as a pointer to the structured rule.
7. The rule-based data standardization system of claim 6, wherein the data parsing component performs the steps of:
step c1, receiving the source data structure into memory;
step c2, acquiring the identification information from the source data header;
step c3, generating the rule ID from the identification information;
step c4, acquiring the conversion rule corresponding to the data by the rule ID.
8. The rule-based data standardization system of claim 4, wherein the data conversion component performs the steps of:
step d1, acquiring a key of the conversion rule;
step d2, checking, by the key, whether the data has a corresponding field;
step d3, if the corresponding field is found, determining the next operation according to its data format;
step d4, if the field is an array, iterating over all elements of the array;
step d5, if the field is an object, splitting the object further;
step d6, if the field is a single element, taking out the data value, mapping it to the conversion destination field corresponding to the element key, and putting it into a cache;
step d7, repeating steps d5, d6, d7 and d8 recursively until all fields in the source data have been processed and the corresponding conversion results have been obtained according to the conversion rule;
step d8, outputting the conversion result.
9. The rule-based data standardization system of claim 1, wherein the output management component assembles the standardized structured data into JSON format and persists it.
10. The rule-based data standardization system of claim 4, wherein the output management component performs the steps of:
step e1, acquiring the internal standard data structure according to the data type;
step e2, acquiring the data type of a field according to its field name;
step e3, converting the external field type into the standardized type according to the data type;
step e4, repeating steps e2 and e3 until all data have been converted into standardized types;
step e5, packaging the standardized data into unified JSON output;
step e6, persisting the output according to the user's output requirements.
CN202110339296.XA 2021-03-30 2021-03-30 Rule-based data standardization system Pending CN112948637A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110339296.XA CN112948637A (en) 2021-03-30 2021-03-30 Rule-based data standardization system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110339296.XA CN112948637A (en) 2021-03-30 2021-03-30 Rule-based data standardization system

Publications (1)

Publication Number Publication Date
CN112948637A true CN112948637A (en) 2021-06-11

Family

ID=76230502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110339296.XA Pending CN112948637A (en) 2021-03-30 2021-03-30 Rule-based data standardization system

Country Status (1)

Country Link
CN (1) CN112948637A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116016705A (en) * 2022-12-29 2023-04-25 浙江瑞瀛物联科技有限公司 ZigBee communication data conversion method and device, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20070015782A (en) * 2005-08-01 2007-02-06 인터컴 소프트웨어(주) System of Transforming Heterogeneous Log to Standard Form
CN104391730A (en) * 2014-08-03 2015-03-04 浙江网新恒天软件有限公司 Software source code language translation system and method
CN110347879A (en) * 2019-07-12 2019-10-18 上海熙菱信息技术有限公司 A kind of rule-based data normalization method and system

Similar Documents

Publication Publication Date Title
CN106202207B (en) HBase-ORM-based indexing and retrieval system
CN112949276A (en) Report generation method and device, electronic equipment and storage medium
CN107168695B (en) Excel data analysis method and system
CN110020358B (en) Method and device for generating dynamic page
CN110955714B (en) Method and device for converting unstructured text into structured text
CN111103635A (en) Meteorological data processing method, system, electronic equipment and storage medium
US20190171425A1 (en) Method and system to provide a generalized framework for dynamic creation of module analytic applications
CN109189395B (en) Data analysis method and device
CN114595199B (en) File analysis method and device, computer equipment and storage medium
CN112948637A (en) Rule-based data standardization system
CN114490641A (en) Industrial Internet data sharing method, equipment and medium
CN113704269B (en) Data processing method, system, storage medium and electronic equipment
CN114692532A (en) Chip system integration method and device and computer readable storage medium
CN117271478A (en) Data migration method and device, storage medium and electronic equipment
CN113918158A (en) Method, device and computer readable medium for automatic serialization of dictionary into service value
CN110764769B (en) Method and device for processing user request
CN107025233B (en) Data feature processing method and device
CN114064044A (en) Commodity information access control method and device, equipment, medium and product thereof
CN114490651A (en) Data storage method and device
CN114282895A (en) Data processing method and device, electronic equipment and storage medium
CN114064685A (en) Data standardized access method and device, equipment, medium and product thereof
CN111580799A (en) Domain specific language script assembling method and system
CN111930441A (en) Consul-based configuration file management system and method
CN115168365B (en) Data storage method and device, electronic equipment and storage medium
CN116011406A (en) Data extraction method and device, processor and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination