CN103198138A - Large-scale hot continuous rolling data scheme customizing system based on cloud computing - Google Patents

Large-scale hot continuous rolling data scheme customizing system based on cloud computing

Info

Publication number
CN103198138A
Authority
CN
China
Prior art keywords
data
theme
item
module
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101304423A
Other languages
Chinese (zh)
Inventor
邹丽晖
张德政
华镇
阿孜古丽
孙义
谢永红
刘宏岚
杜鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN2013101304423A priority Critical patent/CN103198138A/en
Publication of CN103198138A publication Critical patent/CN103198138A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a large-scale hot continuous rolling data theme customizing system based on cloud computing. The system comprises an ETL (extract, transform, load) module, a data persistence layer module and a theme customizing module. The ETL module parses the data of the original hot continuous rolling system, builds the data tables of the cloud data warehouse and extracts the data; the data persistence layer module uses the cloud data warehouse to organize and store the structured data extracted by the ETL module; and the theme customizing module, which addresses data theme analysis in the cloud data warehouse, offers users reasonable customization schemes through a theme library and an experience library, and offers a general data analysis function through a public data mining method library and a MapReduce template container, so that the working efficiency of data management staff can be improved. The theme customizing system is flexibly extensible, can be integrated with the data system of any original hot continuous rolling system, can clarify complicated data themes when user requirements are unclear, and can provide a reliable data set for hot continuous rolling data analysis.

Description

A large-scale hot continuous rolling data theme customization system based on cloud computing
Technical field
The present invention relates to the field of large-scale data processing in the iron and steel metallurgical industry, and in particular to data mining preprocessing for hot continuous rolling.
Background
In the daily operation of a hot strip rolling line, massive amounts of real-time data are produced, and these data hold considerable research value. For a long time, however, too little attention has been paid to them: poor management has left the data scattered and largely unused, which from a data mining point of view is a great waste. To some extent this has also held back the development of the hot continuous rolling process.
With the development of computer technology, nearly all hot strip mills now manage their information electronically. Such systems, however, only store, tabulate and display existing data, for example by directly displaying temperature, thickness, strip shape and certain parameters. As process requirements keep rising, direct display alone is unlikely to improve strip quality. It is therefore increasingly important to explore hot rolling data further and uncover their inherent relationships and regularities.
The conventional data mining preprocessing pattern is to fix a theme first, let the theme prescribe the data tables needed to build a cloud data warehouse for that theme, and have the cloud data warehouse extract the data the theme needs from the selected related tables of the corresponding database. The original hot rolling systems, however, have complex production processes and many data types, were not designed with the well-structured architectures available today, and are often old. The traditional database approach of defining the structure of the persistence layer first and then extracting information into it cannot accommodate designs under unknown requirements, and with massive data sets the storage, scalability and analysis capacity of a database are also very limited. In addition, because the real-time data of a hot rolling system are of complex types and even domain experts cannot exhaustively know the system and its domain knowledge, it is hard to state precise requirements for reworking the system. This makes the traditional "collaborative application development" approach, in which information technologists and business departments jointly sort the content and then identify the subject areas of the different data, very difficult.
Summary of the invention
The technical problem to be solved by the invention is to build, for an original hot rolling system, a cloud data warehouse that can be used for analysis and mining, and to provide an extensible theme customization function for flexibly customizing themes over complex data sets under unknown requirements, so that the data can be further mined and analysed.
A first object of the invention is to provide a hot continuous rolling data theme customization system based on cloud computing, characterized in that the system comprises an ETL (information extraction) module, a data persistence layer module and a theme customization module;
The ETL (information extraction) module is used to parse the data structure of the hot continuous rolling system, generate a data dictionary file and a table header file, send the data dictionary file and the table header file to the data persistence layer module, and periodically format the text data collected from the hot continuous rolling system;
The data persistence layer module is used to build the data dictionary and data tables of the cloud data warehouse from the data dictionary file and table header file received from the ETL module, and periodically to merge the formatted collected text data into the cloud data warehouse;
The theme customization module performs theme customization based on the cloud data warehouse.
Preferably, the ETL module comprises:
A data structure parsing unit, used to parse the data structure of the hot continuous rolling system and generate the data dictionary file and the table header file;
A structured template library generation unit, used to format the table header file generated by the data structure parsing unit into template files of the structured template library;
A text data formatting unit, used to load the template files of the structured template library periodically into the data parsing template library, format the text data collected from the hot continuous rolling system, and send the result to the data persistence layer module.
Preferably, the theme customization module comprises:
A theme library query unit, used to query the theme library by keyword and determine whether the theme item required by the user exists in it;
An experience library recommendation unit, used, when the required theme item does not exist in the theme library, to offer the attributes of the data table's data dictionary for selection, take the attributes selected by the user as the attributes of the required theme item, and obtain recommended theme items from the experience library based on the selected attributes;
A theme library registration unit, used to register the required theme item in the theme library when it is among the recommended theme items, and, when it is not, to accept a new user-defined theme item and register that new theme item in the theme library;
A communication unit, used to send the data request of a theme item to the cloud data warehouse when data are operated on.
Another object of the invention is to provide a hot continuous rolling data theme customization method based on cloud computing, characterized in that the method comprises the following steps:
Step 1: the ETL module parses the data structure of the hot continuous rolling system, generates a data dictionary file and a table header file, sends both files to the data persistence layer module, and periodically formats the text data collected from the hot continuous rolling system;
Step 2: the data persistence layer module creates the data dictionary and data tables of the cloud data warehouse from the received data dictionary file and table header file, and periodically merges in the collected text data formatted by the ETL module;
Step 3: the theme customization module performs theme customization based on the cloud data warehouse.
Preferably, step 1 comprises the following steps:
Step 1.1: the ETL module parses the data structure of the hot continuous rolling system and generates the data dictionary file and the table header file;
Step 1.2: the ETL module formats the table header file into template files of the structured template library;
Step 1.3: the ETL module periodically loads the template files of the structured template library into the data parsing template library, formats the text data collected from the hot continuous rolling system, and sends the result to the data persistence layer module.
Preferably, step 3 comprises the following steps:
Step 3.1: the theme customization module queries the theme library by keyword and determines whether the theme item required by the user exists in it;
Step 3.2: when the required theme item does not exist in the theme library, the theme customization module offers the attributes of the data table's data dictionary for selection, receives the user's selection of attributes from the data dictionary, and obtains recommended theme items from the experience library based on the selected attributes;
Step 3.3: when the required theme item is among the theme items recommended by the experience library, it is registered in the theme library; when it is not, the experience library accepts a new user-defined theme item, and the new theme item is registered in the theme library;
Step 3.4: when data are operated on, the theme library sends the data request of the theme item to the cloud data warehouse.
Preferably, in step 3.3 the user defines the new theme item by taking the maximum-matching theme item from the theme items recommended by the experience library and modifying its attributes to form the new theme item.
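Steps 3.1 to 3.3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the patent does not define how the matching degree is computed, so a Jaccard overlap of attribute sets is assumed here, and the class `ThemeItem`, the functions `match_degree`, `query_theme_library`, `recommend` and `modify`, and all sample attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThemeItem:
    key_name: str          # logical name of the theme item
    attributes: frozenset  # data dictionary attributes it uses


def match_degree(selected, item):
    """Jaccard overlap of attribute sets (an assumed matching measure)."""
    selected = frozenset(selected)
    union = selected | item.attributes
    return len(selected & item.attributes) / len(union) if union else 0.0


def query_theme_library(theme_library, keyword):
    """Step 3.1: find an existing theme item by keyword, or None."""
    return next((i for i in theme_library if keyword in i.key_name), None)


def recommend(experience_library, selected):
    """Step 3.2: items sharing a selected attribute, best match first."""
    wanted = frozenset(selected)
    hits = [i for i in experience_library if wanted & i.attributes]
    return sorted(hits, key=lambda i: match_degree(wanted, i), reverse=True)


def modify(item, key_name, add=(), remove=()):
    """Step 3.3: derive a new theme item from the maximum-matching one."""
    attrs = (item.attributes | frozenset(add)) - frozenset(remove)
    return ThemeItem(key_name, attrs)


# Hypothetical sample data, for illustration only.
experience_library = [
    ThemeItem("factors affecting strip thickness",
              frozenset({"chemical composition", "tapping temperature",
                         "rolling force"})),
    ThemeItem("coil tracking", frozenset({"coil number", "steel grade"})),
]
theme_library = []
keyword = "strip thickness vs chemical composition"
selected = {"chemical composition", "strip thickness"}

if query_theme_library(theme_library, keyword) is None:
    best = recommend(experience_library, selected)[0]
    new_item = modify(best, keyword, add={"strip thickness"},
                      remove={"tapping temperature", "rolling force"})
    theme_library.append(new_item)  # register the new item in the theme library
```

Keeping the query, the recommendation and the modification as separate functions mirrors the query, recommendation and registration units named above; any other similarity measure could replace `match_degree` without changing the flow.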
The advantage of the invention is that it departs from the usual data mining preprocessing procedure. It uses the data structure of the original system to generate the cloud data warehouse dynamically during data extraction, and then uses the large-scale parallelism of the cloud data warehouse to generate data subject areas dynamically, so that data preprocessing is carried out in the reverse order; this gives the system extensibility and flexibility. The system also lets users define and extend theme items freely, which makes it much easier to define theme items according to actual conditions when requirements are unknown, and data mining and analysis for other business themes can be added on top of it. The theme set thus grows from an incomplete set toward a complete one. The freely extensible cloud storage, with nearly unlimited capacity, is a great advantage for such a growing theme set and allows users to make better use of the system to discover latent regularities within the data.
The invention can automatically parse a complex hot rolling system and build the cloud data warehouse with the entire data set as its data resource, which greatly reduces the friction between domain experts and program developers when handling requirements. In particular, for hot rolling systems with unknown requirements it provides a free theme customization function that makes the system more flexible and versatile; convenient theme customization also gives the hot rolling field greater control over its data and makes it easier to discover regularities in the data that can guide production.
Description of the drawings
Fig. 1 is a schematic structural diagram of the hot continuous rolling data theme customization system of the invention.
Fig. 2 is the data processing flow chart of the ETL module in the hot continuous rolling data theme customization system of the invention.
Fig. 3 is a partial data structure tree of the hot continuous rolling data theme customization system of the invention.
Fig. 4 shows the dictionary file and table header file built by the data persistence layer module of the hot continuous rolling data theme customization system of the invention.
Fig. 5 shows the cloud data warehouse model of the data persistence layer module of the hot continuous rolling data theme customization system of the invention.
Fig. 6 is a flow chart of the interaction between the extensible theme customization module and the other modules of the hot continuous rolling data theme customization system of the invention.
Fig. 7 is an example of theme item customization in the hot continuous rolling data theme customization system of the invention.
Fig. 8 is an example of theme item customization with incomplete matching in the hot continuous rolling data theme customization system of the invention.
Detailed description
The invention provides a system built on cloud computing that is intended to handle theme customization for complex hot continuous rolling data sets under unknown requirements.
As shown in Fig. 1, the hot continuous rolling data set theme customization system comprises an ETL (information extraction) module, a data persistence layer module and an extensible theme customization module, which together carry out data preprocessing from data acquisition through data parsing and data loading to theme customization. The system parses the original hot continuous rolling system through the ETL module and dynamically builds data tables in the data persistence layer module according to the parsed data structure; real-time data collected on the hot rolling production line, as well as historical data, are extracted into a temporary folder of the ETL module; every day, at a scheduled time, the data collected by the ETL module are merged into the tables of the cloud data warehouse in the data persistence layer module. The extensible theme customization module uses the experience library to customize themes for the data set that has been built, and its MapReduce template container supports the public data mining method library, so that users can draw on experience to customize analysis and mining themes for the existing data set.
The ETL module has three functions: automatically parsing the system data structure, building the table structure of the cloud data warehouse, and periodically structuring the collected text data.
The ETL module works from the system's own structure: by parsing the header files of the original hot rolling system it generates a structure tree of the system data, in which each node holds a field name and a field explanation. For example, when parsing the header files (files with the .h suffix) of a hot rolling system developed in C, the content of each structure declared with the keyword struct is extracted; the structure name becomes the top node and the structure members become the next layer of nodes. Iterating in this way builds the structure tree of the system data.
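A minimal Python sketch of this header parsing is given below. It is an assumption-laden illustration rather than the patent's parser: it handles only flat `struct NAME { ... };` declarations whose members carry a trailing `/* ... */` comment as the field explanation, and the struct names `PRI` and `MILL`, the member `ExitThick` and the functions `parse_structs` and `build_tree` are hypothetical.

```python
import re

# Matches "struct NAME { ... };" and one member per "type name[N]; /* note */".
STRUCT_RE = re.compile(r"struct\s+(\w+)\s*\{(.*?)\}\s*;", re.S)
FIELD_RE = re.compile(
    r"(?:struct\s+)?(\w+)\s+(\w+)\s*(?:\[\d+\])?\s*;[ \t]*(?:/\*(.*?)\*/)?"
)

# Hypothetical header content, for illustration only.
HEADER = """
struct PRI {
    char  MatId[12];       /* coil number */
    char  SteelGrade[8];   /* steel grade */
};
struct MILL {
    struct PRI pri;        /* primary data */
    float ExitThick;       /* exit thickness */
};
"""


def parse_structs(header_text):
    """Return {struct name: [(C type, member name, explanation), ...]}."""
    structs = {}
    for name, body in STRUCT_RE.findall(header_text):
        structs[name] = [
            (ctype, member, note.strip())
            for ctype, member, note in FIELD_RE.findall(body)
        ]
    return structs


def build_tree(structs, root):
    """Struct name as top node, members as child nodes, recursing into structs."""
    children = []
    for ctype, member, note in structs[root]:
        node = {"name": member, "note": note, "children": []}
        if ctype in structs:  # the member is itself a struct: add its members
            node["children"] = build_tree(structs, ctype)["children"]
        children.append(node)
    return {"name": root, "note": "", "children": children}


tree = build_tree(parse_structs(HEADER), "MILL")
```

Each node keeps a field name and a field explanation, as the text requires; a production parser would also need to handle nested declarations, typedefs and preprocessor directives, which this sketch ignores.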
After obtaining the structure tree, the ETL module recursively traverses the whole data structure tree, splits out the node items of the tree, and generates the data dictionary file used to build the data dictionary of the cloud data warehouse and the table header file used to store the data. The ETL module sends the parsed data dictionary file and table header file to the data persistence layer module, which uses these two files to generate the data dictionary and data tables of the cloud data warehouse.
The ETL module formats the table header file into template files of the structured template library, so that collected data can conveniently be stored in the cloud data warehouse of the data persistence layer. The ETL module periodically loads the template files of the structured template library into the data parsing template library and, based on the template files, formats the text data from the production line and from historical data so that the formatted text data can be extracted into the cloud data warehouse of the data persistence layer module.
The data persistence layer module consists mainly of a cloud data warehouse based on cloud storage. The cloud data warehouse stores the data of the original rolling system in structured form; its table structure and data dictionary are generated by the ETL module when it parses the system structure, and its main task is to merge the data parsed by the ETL module into the warehouse periodically and store them. The cloud data warehouse of this system is built on the Hadoop Distributed File System (HDFS) of the Hadoop cloud computing model. It uses the multi-node distributed nature of HDFS to store the data resources, which addresses both the parallelization of data processing and the dynamic expansion of storage capacity. In the cloud data warehouse, the formatted collected text data from the ETL module are loaded directly into the constructed data tables by means of the warehouse's statements (similar to ordinary SQL statements). The data dictionary of a data table and the data items are mapped one to one by position; when data are operated on, the position in the data dictionary is used to locate the data at the same position. This is known as the "schema on read" mode of data operation. It gives users of the cloud data warehouse a way of working similar to that of a traditional database, which makes development considerably more convenient. The data dictionary in the cloud data warehouse is the main basis for customizing themes. It is the main link between theme item attributes and the data tables: theme item attributes reach the data in the data tables through the data dictionary, so that the relevant statements can be run on the data.
The extensible theme customization module comprises the theme library, the experience library, the MapReduce template container and the public data mining method library. On top of the cloud data warehouse that has been built, the user draws on his or her own experience and knowledge, consults the explanations in the data dictionary and the guidance of the experience library, and specifies the data items needed to build a theme. Because the cloud data warehouse stores the data of the entire system, building a theme amounts to dynamically partitioning the system's structure tables.
The theme library is the archived set of customized themes (for example a quality theme or a parameter theme). Each theme comprises several theme items, which hold related data such as the table name, the column names and the theme key name. A theme is in effect a table region (regions may overlap) that is marked out for parallel analysis in the cloud, so theme customization is also a process of dividing regions, and the region represents the amount of data the theme governs. The subject area is also responsible for the parallelization of the data set, and the MapReduce template container provides parallelization support for the public data mining method library, so that several themes can be mined and analysed in a single parallel run.
The experience library is an archived theme set stored in a database as data tables. It inherits the fairly complete archived theme libraries that existing data mining systems in related fields have already customized, and uses them to guide the unknown requirements of the present system; a maximum matching degree algorithm is used to draw as much help as possible from the experience library for the user. The experience library must be configured with a synonym vocabulary for the relevant fields in order to reconcile semantic differences between systems and keep the discrepancy between systems to a minimum.
The extensible theme customization flow is as follows. The user queries the theme library to find out whether the desired theme item already exists there. If it does not, the user asks to view the data dictionary of the data tables in the cloud data warehouse; when the user selects a data item of a data table, the system automatically recommends the related theme items in the experience library that contain this attribute, so that the user does not have to choose blindly, and the user then decides according to actual needs whether to use a recommended theme item. When the customization requirement has no counterpart in the experience library, the user-defined theme item can be registered in the experience library as a form of self-learning; the experience library periodically compiles statistics on the attribute lists of the theme library and treats the attributes that occur most frequently as sensitive attributes, which are recommended to users as key attributes with which to begin customizing a theme. The new theme item is registered in the theme library, and the region and scope required by the theme item are marked out in the dictionary. Once the subject area has been delimited, the MapReduce template container provides MapReduce parallel algorithm support for the public data mining and analysis methods; it acts as a parallelization integrator that can process, in one pass, several theme items of the same data set that use the same mining algorithm, which greatly improves the speed and efficiency of data analysis and mining.
The public data mining method library consists of common data mining methods implemented with MapReduce, such as association rules, neural networks, genetic algorithms and decision tree methods. They are loaded dynamically into the MapReduce template container when used, which greatly reduces the workload of data mining staff. In addition, data management staff can use the API provided by the MapReduce template container to write their own theme analysis programs, which fit into the system more easily and exploit the efficiency of parallel mining.
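The idea of a template container that runs one analysis method over several theme items in a single pass can be illustrated with a small in-memory Python sketch. It only imitates the map and reduce phases in one process and does not use the Hadoop API; the functions `run_container`, `column_mapper` and `mean_reducer`, the theme names and the use of a column mean as a stand-in for a real mining method are all assumptions made for illustration.

```python
from collections import defaultdict


def run_container(rows, theme_items, mapper, reducer):
    """One pass over the data set; every theme item gets the same method."""
    groups = defaultdict(list)
    for row in rows:  # map phase: emit (key, value) pairs per theme region
        for theme_key, columns in theme_items.items():
            region_row = {c: row[c] for c in columns}
            for key, value in mapper(theme_key, region_row):
                groups[key].append(value)
    # reduce phase: combine all values collected under each key
    return {key: reducer(values) for key, values in groups.items()}


def column_mapper(theme_key, region_row):
    """Emit one numeric value per column of the theme's table region."""
    for column, value in region_row.items():
        yield (theme_key, column), float(value)


def mean_reducer(values):
    """Stand-in analysis method: the mean of the collected values."""
    return sum(values) / len(values)


# Hypothetical rows and theme items, for illustration only.
rows = [
    {"slab thickness": "190", "slab width": "1500", "slab length": "11000"},
    {"slab thickness": "190", "slab width": "1500", "slab length": "8111"},
    {"slab thickness": "190", "slab width": "1500", "slab length": "8383"},
]
theme_items = {
    "slab cross-section": ["slab thickness", "slab width"],
    "slab length": ["slab length"],
}
result = run_container(rows, theme_items, column_mapper, mean_reducer)
```

Because every theme item is a set of columns over the same rows, the data set is scanned once no matter how many theme items are analysed, which is the benefit the text attributes to the parallelization integrator.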
Based on the hot continuous rolling data set theme customization system described above, the complex data sets of the hot continuous rolling industry are collected, classified and loaded into the warehouse, and effective theme customization is then achieved. The specific method is as follows:
Step 1: the ETL module parses the data structure of the hot rolling system, generates a data dictionary file and a table header file, sends both files to the data persistence layer module, and periodically formats the text data collected from the hot continuous rolling system.
The flow of the ETL module is shown in Fig. 2:
Step 1.1: the ETL module parses the data structure of the hot rolling system and generates the data dictionary file and the table header file.
This step is the initialization step. The ETL module analyses the header files of the original hot rolling system and generates the structure tree of the original system's data (each node holds a field name and a field explanation). It then recursively traverses the whole data structure tree, splits out the node items of the tree, and generates the data dictionary file used to build the data dictionary of the cloud data warehouse and the table header file used to store the data. Afterwards, the ETL module sends the parsed data dictionary file and table header file to the data persistence layer module, so that the data persistence layer module can generate the data dictionary and data tables of the cloud data warehouse from these two files.
The structure tree described above is a multi-layer tree made up of several layers of nodes. Because the full tree is complex, the sub-tree of one branch, "rolling line data", is taken as an example, as shown in Fig. 3. The top node is the name of the rolling line data structure and is also the name of the rolling line data table built in the cloud data warehouse; the second-layer nodes are the attributes that the rolling line data contain, and they likewise serve as fields of the rolling line data table. The third-layer nodes relate to the second-layer nodes as the second-layer nodes relate to the first: they explain the second-layer nodes and are also the fields of the tables built for the second-layer nodes.
After the structure tree has been generated, the ETL module traverses the whole data structure tree recursively in breadth-first order, splits out the node items of the tree, and generates the data dictionary file and the table header file used to build the table headers of the cloud data warehouse. Taking the "rolling line data" sub-tree of Fig. 3 as an example, the node "rolling line data" becomes the enclosing tag "<rolling line data>"; its child nodes, "coil attributes", "roughing preset result data" and so on, are then traversed and written as XML sub-tags of "<rolling line data>": "<1>coil attributes</1>", "<2>roughing preset result data</2>", and so on. Subsequent nodes are handled in the same way until the data dictionary file and the table header file are complete.
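The breadth-first split described above can be sketched in Python as follows. The node layout (dicts with `name`, `note` and `children`), the dotted key format and the functions `split_tree` and `to_xml` are assumptions for illustration; the numbered tags are kept only to mirror the patent's samples and are not well-formed XML element names.

```python
from collections import deque


def split_tree(root):
    """Breadth-first walk: each non-leaf node becomes one table.

    Returns {table name: (dictionary entries, header entries)}, where the
    dictionary entries are the children's explanations and the header
    entries are their dotted field keys, both in the same positional order.
    """
    tables = {}
    queue = deque([(root, root["name"])])
    while queue:
        node, path = queue.popleft()
        if not node["children"]:
            continue  # leaf nodes are fields, not tables
        notes, keys = [], []
        for child in node["children"]:
            child_path = f"{path}.{child['name']}"
            notes.append(child["note"])
            keys.append(child_path)
            queue.append((child, child_path))
        tables[node["name"]] = (notes, keys)
    return tables


def to_xml(table, entries):
    """Render one table's entries as numbered sub-tags, as in the samples."""
    lines = [f"<{table}>"]
    lines += [f"  <{pos}>{text}</{pos}>" for pos, text in enumerate(entries, start=1)]
    lines.append(f"</{table}>")
    return "\n".join(lines)


# Hypothetical structure tree, for illustration only.
tree = {"name": "rolling_line_data", "note": "", "children": [
    {"name": "coil_attributes", "note": "coil attributes", "children": [
        {"name": "MatId", "note": "coil number", "children": []},
        {"name": "SteelGrade", "note": "steel grade", "children": []},
    ]},
    {"name": "rough_preset", "note": "roughing preset result data", "children": []},
]}

tables = split_tree(tree)
dictionary_xml = to_xml("rolling_line_data", tables["rolling_line_data"][0])
header_xml = to_xml("rolling_line_data", tables["rolling_line_data"][1])
```

Generating both files from the same loop guarantees that entry n of the dictionary file and entry n of the table header file describe the same field, which is what the positional mapping later relies on.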
Samples of the data dictionary file and the table header file are given as the following XML files.
Data dictionary file sample:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<rolling line parameters>
<1>coil number</1>
<2>steel grade</2>
<3>slab number</3>
<4>material code</4>
</rolling line parameters>
<roughing parameters>
</roughing parameters>
 
Table header file sample:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<rolling line parameters>
<1>p.mill.pri.MatId</1>
<2>p.mill.pri.SteelGrade</2>
<3>p.mill.pri.SlabNo</3>
</rolling line parameters>
<roughing parameters>
</roughing parameters>
…
 
After the analysis of the original rolling system and the initialization of the production line's hot continuous rolling data set theme customization system have been completed, the text data collected on the production line must be written into the cloud data warehouse. To make the collected data easy to write, the collected text data must first be formatted.
Step 1.2: the ETL module formats the table header file into template files of the structured template library.
This makes it easier to store collected data in the cloud data warehouse of the data persistence layer. From the dictionary entries of the data dictionary and the header entries of the table header file described above, the ETL module generates template files of the structured template library in the corresponding format. The template library uses the XML format shown above, and the numeric positions are used to extract values from unstructured files or from collected binary files.
The template files of the structured template library define the encoding format used when extracting file data. They are used to format the collected text data (including text data from the production line and from historical data) into files in the format expected by the cloud data warehouse's extraction commands, so that the cloud data warehouse can store the data in the appropriate positions.
Step 1.3: the ETL module periodically loads the template files of the structured template library into the data parsing template library, formats the collected text data (including text data from the production line and from historical data), and sends the result to the data persistence layer module.
Once the structured template library has been formed, the data parsing template library in the ETL module can format the collected data periodically. The specific steps are: first, the data parsing template library obtains, in real time or periodically, the text data of the collected data and of the data sample set; second, at the scheduled time, the structured template library is loaded into the data parsing template library; finally, the data parsing template library formats the text data of the collected data and data sample set according to the loaded structured template library.
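The formatting step can be illustrated with a short Python sketch. The patent does not specify the template file format or the delimiter expected by the cloud data warehouse, so this sketch assumes that a template is simply the ordered list of table header keys and that a formatted record is a tab-delimited line; the functions `load_template`, `format_record` and `format_batch` are hypothetical.

```python
def load_template(header_entries):
    """Assumed template: the ordered list of header keys for one table."""
    return list(header_entries)


def format_record(raw_line, template, delimiter="\t"):
    """Split a collected text line and re-join it in the load format."""
    values = raw_line.split()
    if len(values) != len(template):
        raise ValueError(f"expected {len(template)} fields, got {len(values)}")
    return delimiter.join(values)


def format_batch(raw_lines, template):
    """Format a batch, setting aside lines that do not fit the template."""
    formatted, rejected = [], []
    for line in raw_lines:
        try:
            formatted.append(format_record(line, template))
        except ValueError:
            rejected.append(line)
    return formatted, rejected


template = load_template(
    ["p.mill.pri.MatId", "p.mill.pri.SteelGrade", "p.mill.pri.SlabNo"]
)
collected = ["H101007780 Q345B 01A95921D228", "H101000020 Q235B"]
formatted, rejected = format_batch(collected, template)
```

Checking the field count against the template before loading keeps malformed lines out of the warehouse, where positional mapping would otherwise silently assign values to the wrong fields. The scheduled loading described in the text would call `format_batch` from whatever timer or job scheduler the deployment uses.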
Step 2: the data persistence layer module builds the cloud data warehouse and creates the data dictionary and data tables from the received data dictionary file and table header file, and the cloud data warehouse in the data persistence layer module periodically merges in the collected text data formatted by the ETL module.
The data persistence layer module obtains the data dictionary file and the table header file from the ETL module and creates the data dictionary and data tables of the cloud data warehouse. Taking the "rolling line data" of step 1.1 as an example: the data dictionary table of the rolling line data takes "<1>coil number</1>" as its first field, "<2>steel grade</2>" as its second field, "<3>slab number</3>" as its third field, and so on; the data table associated with this data dictionary correspondingly takes "<1>p.mill.pri.MatId</1>" as its first field, "<2>p.mill.pri.SteelGrade</2>" as its second field and "<3>p.mill.pri.SlabNo</3>" as its third field. The tables generated from each file in the cloud data warehouse thus have fields in one-to-one positional correspondence, as shown in Fig. 4.
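The positional correspondence between the two tables can be shown with a few lines of Python. This is a sketch only; the function `pair_by_position` and the lookup `position_of` are hypothetical names, and the field values are the three entries quoted in the text.

```python
def pair_by_position(dictionary_entries, header_entries):
    """Pair the n-th dictionary field with the n-th data table field."""
    if len(dictionary_entries) != len(header_entries):
        raise ValueError("dictionary and header must list the same number of fields")
    return list(zip(dictionary_entries, header_entries))


dictionary = ["coil number", "steel grade", "slab number"]
header = ["p.mill.pri.MatId", "p.mill.pri.SteelGrade", "p.mill.pri.SlabNo"]

fields = pair_by_position(dictionary, header)
# 1-based position of each dictionary label, matching the numbered tags.
position_of = {label: pos for pos, (label, _) in enumerate(fields, start=1)}
```

Because the link between a label and its field is nothing more than a shared position, a length check like the one above is the minimum safeguard needed when the two files are generated or edited separately.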
The cloud data warehouse model is shown in Fig. 5. It is built on the Hadoop Distributed File System (HDFS) of the Hadoop cloud computing model, and the operations of the cloud data warehouse are implemented with MapReduce.
In the cloud data warehouse, the formatted collected text data from the ETL module are loaded directly into the constructed data tables by means of the warehouse's statements (similar to ordinary SQL statements). Within a data table, the data dictionary and the data items are mapped one to one by position; when data are operated on, the position in the dictionary is used to locate the data at the same position. This is known as the "schema on read" mode of data operation.
The following is an example of a data table. Table 1 shows the data stored in the cloud data warehouse as obtained from the ETL module; they are unstructured, stream-like data on which neither data analysis nor theme customization can operate directly. The cloud data warehouse is a data model established through data mapping: following a database-style model, it defines a positional read mapping that combines the data file, the table header file, and the data dictionary file into a single whole by position, and the data are then accessed in parallel through statements similar to those of a traditional database (non-standard SQL statements). Table 2 shows the parsed data, namely some of the rows returned from the cloud data warehouse by the query (select * from product), ready for subsequent data analysis and theme customization.
The data stored in the cloud data warehouse:
Table 1
File 1: H101007780 Q345B 01A95921D228 P02 0 190 1500 11000 ....
File 2: H101000020 Q235B 02B00201E011 P01 1 190 1500 8111
File 3: H101000030 Q235B 01A93722D011 P01 1 190 1500 8383
…
The data returned by the query:
Table 2
Steel coil number | Steel grade | Slab number | Material code | Cold/hot charging flag | Slab thickness | Slab width | Slab length
H101007780 | Q345B | 01A95921D228 | P02 | 0 | 190 | 1500 | 11000
H101000020 | Q235B | 02B00201E011 | P01 | 1 | 190 | 1500 | 8111
H101000030 | Q235B | 01A93722D011 | P01 | 1 | 190 | 1500 | 8383
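The step from Table 1 to Table 2 can be reproduced in miniature. The sketch below is not the warehouse's own query engine: it is plain Python, the function name `read_table` is hypothetical, and the English field labels are one possible rendering of the Table 2 column headings. It only shows how field names are attached to the raw values at read time.

```python
FIELDS = [
    "steel coil number", "steel grade", "slab number", "material code",
    "cold/hot charging flag", "slab thickness", "slab width", "slab length",
]

# Raw records as stored (Table 1), without any embedded field names.
RAW_FILES = [
    "H101007780 Q345B 01A95921D228 P02 0 190 1500 11000",
    "H101000020 Q235B 02B00201E011 P01 1 190 1500 8111",
    "H101000030 Q235B 01A93722D011 P01 1 190 1500 8383",
]


def read_table(raw_lines, fields=FIELDS):
    """Attach field names to raw values only when the data are read."""
    for line in raw_lines:
        yield dict(zip(fields, line.split()))


# Rough analogue of the query "select * from product".
rows = list(read_table(RAW_FILES))
```

The stored files are never rewritten; changing the dictionary changes how the same bytes are interpreted on the next read.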
Step 3, the theme customization module performs theme customization on the cloud data warehouse that has been built.
The theme library is the archived collection of themes that have already been customized. Each theme comprises several theme items, and each theme item carries related data such as a table name, column names, and a theme name. A theme is in effect a region of a table (regions may overlap); the regions are divided for parallel analysis on the cloud platform, so theme customization is itself a process of dividing regions, and a region represents the extent of data that the theme governs. The table name is the physical mapping table, in the cloud data warehouse, of a theme item in the theme set; this table contains the attribute fields and data needed to analyze the theme item. The column names are the names of the data attributes that the analysis under this theme item requires, that is, the field names stored in the table; they may also be called theme item attributes. In Fig. 7, the field names in the theme item table "factors influencing plate thickness", such as "slab chemical composition and entry thickness", "tapping temperature", "rolling force of each mill stand", and "rolling speed and roll gap", are all column names. The theme name is the logical name assigned to the analysis theme item; it is used for interaction with external users and corresponds internally to the physical mapping table. "Factors influencing plate thickness" in Fig. 7 and "correlation between plate thickness and chemical composition" in Fig. 8 are both theme names.
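A theme item as described above can be modelled as a small record. The following is a minimal sketch; the class name `ThemeItem`, the table name `rolling_line`, and the column list given for the Fig. 8 theme are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThemeItem:
    theme_name: str   # logical name shown to users
    table_name: str   # physical mapping table in the cloud data warehouse
    columns: tuple    # attribute (field) names used by the analysis

    def overlaps(self, other: "ThemeItem") -> bool:
        # Regions overlap when they share a table and at least one column.
        same_table = self.table_name == other.table_name
        return same_table and bool(set(self.columns) & set(other.columns))


thickness_factors = ThemeItem(
    theme_name="factors influencing plate thickness",
    table_name="rolling_line",
    columns=(
        "slab chemical composition and entry thickness",
        "tapping temperature",
        "rolling force of each mill stand",
        "rolling speed and roll gap",
    ),
)
thickness_chemistry = ThemeItem(
    theme_name="correlation between plate thickness and chemical composition",
    table_name="rolling_line",
    columns=("slab chemical composition and entry thickness",),
)
```

Here the two theme items overlap because both use the same column of the same table, which matches the remark that theme regions may overlap.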
Fig. 6 is a flowchart showing in detail how the extensible theme customization module interacts with the other modules.
Step 3.1, the theme customization module queries the theme library by keyword to determine whether the theme item required by the user exists in the theme library.
The theme customization module receives the keywords that the user enters for the theme item query, searches the theme library for related theme items according to those keywords, and generates a list of related theme items. The theme items in the list may be ordered by their relevance to the keywords or by how frequently they are accessed. The user can select the required theme item from the list.
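The keyword search and the two possible orderings can be sketched as follows. The relevance measure used here (the number of theme and column names that contain the keyword) is an assumption, since the description does not define how relevance is computed; the function name, the `access_count` field, and the sample library are likewise hypothetical.

```python
def search_theme_items(theme_library, keyword, order_by="relevance"):
    """Return the names of matching theme items, ordered as requested."""
    keyword = keyword.lower()
    hits = []
    for item in theme_library:
        texts = [item["theme_name"], *item["columns"]]
        # Assumed relevance: how many names contain the keyword.
        relevance = sum(keyword in text.lower() for text in texts)
        if relevance:
            hits.append((item, relevance))
    if order_by == "frequency":
        hits.sort(key=lambda hit: hit[0]["access_count"], reverse=True)
    else:
        hits.sort(key=lambda hit: hit[1], reverse=True)
    return [item["theme_name"] for item, _ in hits]


theme_library = [
    {"theme_name": "factors influencing plate thickness",
     "columns": ["entry thickness", "exit thickness"], "access_count": 3},
    {"theme_name": "plate thickness and chemical composition",
     "columns": ["entry thickness", "Fe"], "access_count": 9},
    {"theme_name": "coil temperature profile",
     "columns": ["tapping temperature"], "access_count": 5},
]

by_relevance = search_theme_items(theme_library, "thickness")
by_frequency = search_theme_items(theme_library, "thickness", order_by="frequency")
```

Theme items that do not mention the keyword at all are left out of the list in both orderings.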
Step 3.2, when the required theme item does not exist in the theme library, the theme customization module offers the attributes of the data table's data dictionary for selection, receives the user's selection of attributes from the data dictionary, and obtains recommended theme items from the experience library based on the selected attributes.
When the theme item required by the user does not appear in the list of related theme items, the theme customization module displays, at the user's request, the data dictionary corresponding to the data table. When the user selects the attributes of one or more fields of the data dictionary (the attribute items of the data dictionary, or data dictionary attributes), the system takes them as the required theme item attributes and automatically recommends the theme items in the experience library that contain these attributes, so that the user does not have to choose blindly. The user then decides, according to actual needs, whether to use a recommended theme item.
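A recommendation of this kind can be sketched as a lookup over experience records. The ranking by number of shared attributes is an assumption, as are the function name `recommend` and the two records marked hypothetical; the description only states that theme items containing the selected attributes are recommended.

```python
def recommend(experience_library, selected_attributes):
    """Return experience records sharing attributes with the selection, best first."""
    selected = set(selected_attributes)
    scored = []
    for record in experience_library:
        shared = selected & set(record["attributes"])
        if shared:
            scored.append((len(shared), record))
    # Sort on the count only, so records themselves are never compared.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored]


experience_library = [
    {"theme_name": "factors influencing plate thickness",
     "attributes": ["slab chemical composition and entry thickness",
                    "tapping temperature", "rolling speed and roll gap"]},
    {"theme_name": "hypothetical temperature theme",
     "attributes": ["tapping temperature"]},
    {"theme_name": "hypothetical width theme",
     "attributes": ["slab width"]},
]

recommended = recommend(
    experience_library,
    ["slab chemical composition and entry thickness", "tapping temperature"],
)
recommended_names = [record["theme_name"] for record in recommended]
```

The user remains free to ignore the recommendations, so the function returns candidates rather than making a choice.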
Step 3.3, when the required theme item exists in the experience library, the required theme item is registered into the theme library; when the required theme item does not exist in the experience library, a new user-defined theme item is accepted and registered into the theme library.
When the customization requirement does not exist in the experience library, the user-defined theme item can be registered into the experience library as a form of self-learning. The experience library periodically gathers statistics on the attribute lists in the theme library; attributes that occur frequently are treated as sensitive attributes and are recommended to users as key attributes from which to start customizing a theme. The new theme item is registered into the theme library, and the region and extent that it requires are marked out in the dictionary. Once the region of the theme item has been delimited, the MapReduce template container can provide MapReduce parallel-algorithm support for common data mining analysis methods. The container is a parallelizing integrator: when the same mining algorithm is applied to several theme items of the same data set, it can process them in a single pass, which greatly improves the speed and efficiency of data analysis and mining.
The common data mining method library consists of public data mining methods implemented with MapReduce, such as association rules, neural networks, genetic algorithms, and decision tree methods. They are loaded dynamically into the MapReduce template container when used, which significantly reduces the workload of data mining staff. In addition, data managers can use the API provided by the MapReduce template container to write their own theme analysis programs with little effort; such programs fit into the system more readily and can exploit parallel mining efficiency.
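The idea of serving several theme items of one data set in a single pass can be illustrated with a toy, in-memory imitation of the map and reduce phases. This is not Hadoop code and not the container's API: `run_container`, `column_means`, the record fields, and the theme names are all hypothetical, and the column mean merely stands in for a real mining method.

```python
from collections import defaultdict
from statistics import mean


def run_container(records, theme_items, reducer):
    """Scan the records once for all theme items, then reduce each theme."""
    grouped = defaultdict(list)
    for record in records:  # map phase: one pass over the data set
        for name, columns in theme_items.items():
            grouped[name].append(tuple(record[c] for c in columns))
    # reduce phase: the same method is applied to every theme item
    return {name: reducer(rows) for name, rows in grouped.items()}


def column_means(rows):
    # Placeholder "mining method": the mean of each projected column.
    return tuple(mean(col) for col in zip(*rows))


records = [
    {"thickness": 190, "width": 1500, "length": 11000},
    {"thickness": 190, "width": 1500, "length": 8111},
]
theme_items = {
    "slab cross-section": ["thickness", "width"],
    "slab length": ["length"],
}

result = run_container(records, theme_items, column_means)
```

Swapping `column_means` for another function changes the analysis without touching the scan, which is the kind of reuse the template container is described as offering.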
Step 3.4, when data are operated on, the theme library sends a data request to the cloud data warehouse. The requested data items are extracted from the cloud data warehouse for data mining and analysis.
When data mining and analysis are to be performed on a theme item, the theme library sends a data request to the cloud data warehouse in order to obtain the attribute data of that theme item stored there; the data request contains the selected theme item attributes. The cloud data warehouse matches the theme item attributes against the attributes of each field of the data dictionary and, using the mapping between the matched data dictionary attributes and the data items, obtains the data items of the required theme item stored in the data table of the cloud data warehouse. The cloud data warehouse then sends the obtained data items to an analysis module (not part of this system) for data analysis.
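The matching of theme item attributes against the data dictionary can be sketched as a position lookup followed by a projection. The function name `fetch_theme_data`, the English field labels, and the choice to raise an error for unknown attributes are assumptions made for this illustration.

```python
DICTIONARY = [
    "steel coil number", "steel grade", "slab number", "material code",
    "cold/hot charging flag", "slab thickness", "slab width", "slab length",
]
RAW_LINES = [
    "H101007780 Q345B 01A95921D228 P02 0 190 1500 11000",
    "H101000020 Q235B 02B00201E011 P01 1 190 1500 8111",
]


def fetch_theme_data(dictionary, raw_lines, theme_attributes):
    """Match requested attributes to dictionary positions and project the data."""
    positions = {label: index for index, label in enumerate(dictionary)}
    missing = [a for a in theme_attributes if a not in positions]
    if missing:
        # Assumed behaviour: reject requests for attributes the dictionary lacks.
        raise KeyError(f"attributes not in data dictionary: {missing}")
    wanted = [positions[a] for a in theme_attributes]
    return [[line.split()[i] for i in wanted] for line in raw_lines]


data_items = fetch_theme_data(DICTIONARY, RAW_LINES, ["slab thickness", "steel grade"])
```

The returned rows follow the order of the requested attributes, not the order of the dictionary, so the analysis module receives exactly the layout it asked for.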
Fig. 7 shows an example of effective recommendation during theme item customization. The user wants to find data analysis theme items related to plate thickness. The user first selects a theme item attribute related to plate thickness, such as slab chemical composition and entry thickness. Based on the field name, the theme library returns all of its theme items related to this attribute; at the same time, the theme library asks the experience library for theme items related to this attribute. The experience library scans its experience records and may find many records related to the attribute, for example one with the attributes slab chemical composition and entry thickness, tapping temperature, rolling force of each mill stand, rolling speed and roll gap, and the set value and self-learning value of each parameter, whose theme name is "factors influencing plate thickness". If this is exactly the theme item the user needs, the user can register it into the theme library. When data are operated on, the theme library sends a request to the cloud data warehouse, and the cloud data warehouse sends the requested data items to the analysis module, where they can be used for data mining and analysis.
Fig. 8 shows an example of an incomplete match, in which the user does not find the required theme item in the experience library. The extensible theme customization module first lets the user select the maximum matching item. For example, the user wants to know what regularity exists between slab thickness and the chemical composition of each element, and this theme item is incomplete or absent in the existing experience library. The user can then conveniently select the maximum matching theme item first (slab entry thickness, Fe, Gu, Mg, Ag, and so on) and add the user-defined attribute item Pd (an element symbol), an attribute that does not exist in other theme systems. The new theme item built in this way is registered into the theme library, and when the theme library is updated it is loaded into the experience library as one of the ways in which the experience library learns, which greatly facilitates later porting and makes the system more intelligent.
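The incomplete-match case can be sketched in two small functions: one picks the experience record with the largest attribute overlap, the other extends it with user-defined attributes. The function names, the overlap-count criterion, and the sample records are assumptions; only the attribute names slab entry thickness, Fe, Mg, Ag, and Pd come from the example above.

```python
def max_match(experience_library, wanted):
    """Return the experience record sharing the most attributes with the request."""
    wanted = set(wanted)
    return max(experience_library, key=lambda rec: len(wanted & set(rec["attributes"])))


def derive_theme_item(base, theme_name, extra_attributes):
    """Build a new theme item from a base record plus user-defined attributes."""
    attributes = list(base["attributes"])
    attributes += [a for a in extra_attributes if a not in attributes]
    return {"theme_name": theme_name, "attributes": attributes}


experience_library = [
    {"theme_name": "factors influencing plate thickness",
     "attributes": ["slab entry thickness", "tapping temperature"]},
    {"theme_name": "hypothetical composition theme",
     "attributes": ["slab entry thickness", "Fe", "Mg", "Ag"]},
]
theme_library = []

best = max_match(experience_library, ["slab entry thickness", "Fe", "Mg", "Pd"])
new_item = derive_theme_item(
    best, "correlation between plate thickness and chemical composition", ["Pd"]
)

theme_library.append(new_item)       # register into the theme library
experience_library.append(new_item)  # later learned by the experience library
```

The base record is copied rather than modified, so the existing experience record stays available for other users.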
This embodiment essentially completes the system integration of data mining preprocessing, covering data extraction, warehouse construction, and theme customization, so that data mining developers can readily use existing data resources to mine more valuable rules in a targeted way.
The above are merely preferred embodiments of the present invention, and the present invention is not limited to these embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (7)

1. A hot continuous rolling data theme customization system based on cloud computing, characterized in that the system comprises an ETL (information extraction) module, a data persistence layer module, and a theme customization module;
The ETL (information extraction) module is configured to parse the data structure of the hot continuous rolling system, generate a data dictionary file and a table header file, send the data dictionary file and the table header file to the data persistence layer module, and periodically format the text data collected by the hot continuous rolling system;
The data persistence layer module is configured to build a data dictionary and a data table for the cloud data warehouse from the data dictionary file and the table header file received from the ETL module, and to periodically merge the formatted collected text data into the cloud data warehouse;
The theme customization module is configured to perform theme customization based on the cloud data warehouse.
2. The hot continuous rolling data theme customization system according to claim 1, characterized in that the ETL module comprises:
A data structure parsing unit, configured to parse the data structure of the hot continuous rolling system and generate the data dictionary file and the table header file;
A structured template library generation unit, configured to format the table header file generated by the data structure parsing unit so as to generate the template files of the structured template library;
A text data formatting unit, configured to periodically load the template files of the structured template library into the data parsing template library, format the text data collected by the hot continuous rolling system, and send the formatted data to the data persistence layer module.
3. The hot continuous rolling data theme customization system according to claim 1, characterized in that the theme customization module comprises:
A theme library query unit, configured to query the theme library by keyword and determine whether the theme item required by the user exists in the theme library;
An experience library recommendation unit, configured, when the required theme item does not exist in the theme library, to offer the attributes of the data table's data dictionary for selection, to take the attributes selected by the user as the required theme item attributes, and to obtain recommended theme items from the experience library based on the attributes selected by the user;
A theme library registration unit, configured to register the required theme item into the theme library when the required theme item exists among the recommended theme items, and, when the required theme item does not exist among the recommended theme items, to accept a new user-defined theme item and register the new theme item into the theme library;
A communication unit, configured to send the data request of the theme item to the cloud data warehouse when data are operated on.
4. A hot continuous rolling data theme customization method based on cloud computing, characterized in that the customization method comprises the following steps:
Step 1, the ETL module parses the data structure of the hot continuous rolling system, generates a data dictionary file and a table header file, sends the data dictionary file and the table header file to the data persistence layer module, and periodically formats the text data collected by the hot continuous rolling system;
Step 2, the data persistence layer module creates a data dictionary and a data table for the cloud data warehouse from the received data dictionary file and table header file, and periodically merges in the collected text data formatted by the ETL module;
Step 3, the theme customization module performs theme customization based on the cloud data warehouse.
5. The hot continuous rolling data theme customization method according to claim 4, characterized in that step 1 specifically comprises the following steps:
Step 1.1, the ETL module parses the data structure of the hot continuous rolling system and generates the data dictionary file and the table header file;
Step 1.2, the ETL module formats the table header file to generate the template files of the structured template library;
Step 1.3, the ETL module periodically loads the template files of the structured template library into the data parsing template library, formats the text data collected by the hot continuous rolling system, and sends the formatted data to the data persistence layer module.
6. The hot continuous rolling data theme customization method according to claim 4, characterized in that step 3 specifically comprises the following steps:
Step 3.1, the theme customization module queries the theme library by keyword to determine whether the theme item required by the user exists in the theme library;
Step 3.2, when the required theme item does not exist in the theme library, the theme customization module offers the attributes of the data table's data dictionary for selection, receives the user's selection of attributes from the data dictionary, and obtains recommended theme items from the experience library based on the attributes selected by the user;
Step 3.3, when the required theme item exists among the theme items recommended by the experience library, the required theme item is registered into the theme library; when the required theme item does not exist among the theme items recommended by the experience library, the experience library accepts a new user-defined theme item, and the new theme item is registered into the theme library;
Step 3.4, when data are operated on, the theme library sends the data request of the theme item to the cloud data warehouse.
7. The hot continuous rolling data theme customization method according to claim 6, characterized in that in step 3.3 the user defines a new theme item as follows: the maximum matching theme item is obtained from the theme items recommended by the experience library, and the attributes of the maximum matching theme item are modified to form the new theme item.
CN2013101304423A 2013-04-16 2013-04-16 Large-scale hot continuous rolling data scheme customizing system based on cloud computing Pending CN103198138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101304423A CN103198138A (en) 2013-04-16 2013-04-16 Large-scale hot continuous rolling data scheme customizing system based on cloud computing

Publications (1)

Publication Number Publication Date
CN103198138A true CN103198138A (en) 2013-07-10

Family

ID=48720695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101304423A Pending CN103198138A (en) 2013-04-16 2013-04-16 Large-scale hot continuous rolling data scheme customizing system based on cloud computing

Country Status (1)

Country Link
CN (1) CN103198138A (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130710