CN103853821A - Method for constructing multiuser collaboration oriented data mining platform

Publication number
CN103853821A
Authority
CN
China
Prior art keywords
data
user
mining
data mining
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410059806.8A
Other languages
Chinese (zh)
Other versions
CN103853821B (en)
Inventor
叶枫
郭小成
李源畅
范仕良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU
Priority to CN201410059806.8A
Publication of CN103853821A
Application granted
Publication of CN103853821B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for constructing a multi-user collaboration oriented data mining platform. The method integrates a flexible workflow with a multi-user collaboration mechanism and provides a workspace in which three user roles, namely data collectors, data analysts, and result auditors, mine data collaboratively. The whole workflow is realized with components: a data acquisition component, a data preprocessing component, a data modeling component, a result visualization component, and a model evaluation component. The flexible workflow, formed by components and arrows, can be created and manipulated by different user roles in different user views by drag and drop. Given the complexity of data mining, namely its continual repetition, revision, and iteration, the method greatly simplifies data mining work and prevents data from leaking, thereby protecting data security.

Description

Method for constructing a multi-user collaboration oriented data mining platform
Technical field
The present invention relates to a method for constructing a data mining platform that integrates a flexible workflow and is oriented to multi-user collaboration, and belongs to the technical field of data mining.
Background
Data mining is the process of extracting the latent information contained in massive historical business data by means of mathematical analysis. It is a process of continual repetition, revision, and iteration, and mainly comprises the following stages: data acquisition, data preprocessing, data analysis, result visualization, and model evaluation. At present, data mining is widely used in commercial fields such as banking, telecommunications, insurance, transportation, and retail.
Existing data mining platforms have the following problems. They lack a flexible user workspace that supports undo, redo, and save, so a user must get a data mining task right in a single pass, which is inconvenient. They lack workflow components that can be modified, iterated on, and made to output intermediate results, so users cannot properly understand and control their own data analysis process. Their mining mechanism is oriented to a single user, so one user has to act as data collector, data analyst, and result auditor at once; the roles cannot collaborate across the analysis process, and data and analysis results can easily leak, which creates data security problems.
Summary of the invention
Object of the invention: in view of the problems in the prior art, the invention provides a method for constructing a data mining platform that combines a flexible workflow with multi-user collaboration.
The data mining platform built by this method provides a Web-based flexible user workspace that supports undo, redo, and save. In the user workspace, data collectors can upload, update, and delete data sets; data analysts can build and manipulate their own data analysis flows; and result auditors can review and respond to the mining results.
Technical solution: a method for constructing a multi-user collaboration oriented data mining platform, which provides a workspace in which three user roles, namely data collectors, data analysts, and result auditors, collaborate on data mining. The whole workflow is realized with components: a data acquisition component, a data preprocessing component, a data modeling component, a result visualization component, and a model evaluation component. Different user roles use different user views and can build and operate their own data analysis flows by drag and drop. Data collectors upload, update, and delete data through the data acquisition component. Data analysts use the data preprocessing, data modeling, result visualization, and model evaluation components in workflow order to carry out data analysis operations such as data acquisition, data preprocessing, modeling, and model evaluation. Result auditors review and respond to the mining results through the result visualization component in the user workspace.
The user workspace is a drag-and-drop graphical interface with two parts: a component selection area and a flow creation area. The component selection area displays a series of extensible data mining workflow components; the flow creation area is where users build and manipulate data analysis flows.
A data analysis flow is a flexible workflow made up of components and arrows. In any data analysis flow, the user can at any time adjust the execution parameters on a component node, change the execution direction of the flow, and export intermediate results.
The method for constructing the data mining platform comprises the following steps:
Step 1: Design and implement the data acquisition component. Data acquisition is carried out in two ways: acquisition from a database and acquisition by web upload.
For acquisition from a database, Java Database Connectivity (JDBC) is used, and each data access by the data mining platform is translated in real time into the corresponding query against the database.
For acquisition by web upload, the platform listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server.
In both implementations of the data acquisition component, the metadata of the data set must be saved to the system database, and a unified access interface is exposed externally.
Step 2: Design and implement the data preprocessing component. Use the R language to perform statistical analysis on the data set and present its basic descriptive information to the user graphically; encapsulate mathematical methods for interpolation filling, record removal, and data correction; and provide user interfaces for the preprocessing stages of handling missing values, duplicate data, noisy data, and abnormal data.
Step 3: Design and implement the data modeling component. Use the R language to encapsulate data mining models such as classification, clustering, association, and time series; provide a graphical interface through which users set the corresponding model analysis parameters.
Step 4: Design and implement the result visualization component. Use the R language to present data mining results and model evaluation results to the user as charts, lists, and similar forms; push results to result auditors in real time using Ajax polling.
Step 5: Design and implement the model evaluation component. Use the R language to provide several model evaluation methods, such as accuracy, error rate, and confusion matrix; provide a user interface for saving model analysis parameters and model metadata to the system database.
Step 6: Design and implement the user workspace. Use jQuery to implement a drag-and-drop graphical interface with two parts, the component selection area and the flow creation area; store the user operation log in a stack data structure, and provide user interfaces for undoing, redoing, and saving the workspace.
Step 7: Define and implement the data mining flow. Taking the data mining components designed in Steps 1 to 5 as nodes, define a workflow made up of a number of nodes and arrows; provide user interfaces for adjusting node execution parameters, changing the execution direction of the flow, and exporting intermediate results.
Step 8: Integrate and deploy the mining platform. Provide a JSON-format configuration interface for the data mining components designed in Steps 1 to 5, and provide a user interface for customizing the platform's functions by editing the configuration file.
Beneficial effects: by adopting the above technical solution, the invention addresses the complexity of data mining, namely its continual repetition, revision, and iteration, by providing a flexible data mining workspace oriented to multi-user collaboration. It not only greatly simplifies data mining work but can also prevent data from leaking, thereby protecting data security.
Brief description of the drawings
Fig. 1 is a structural block diagram of the multi-user oriented data mining platform according to an embodiment of the present invention.
Detailed description
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims.
In an embodiment of the present invention, the method for constructing the data mining platform comprises the following steps:
Step 1: Design and implement the data acquisition component. In view of the complex characteristics of data sets, namely volume, variety, and velocity, acquisition is implemented in two ways: acquisition from a database and acquisition by web upload.
For acquisition from a database, Java Database Connectivity (JDBC) is used, and each data access by the data mining platform is translated in real time into the corresponding SQL query against the database.
For acquisition by web upload, the platform listens for data upload requests from the web client, establishes a socket connection between the client and the data storage server, and then uses Java I/O streams to write the data set into the file system of the data storage server.
In both implementations of the data acquisition component, the metadata of the data set must be saved to the system database, and a unified access interface is exposed externally.
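The unified access interface of Step 1 can be pictured with a short sketch. It is written in Python purely for illustration and is not the patent's Java implementation: `sqlite3` stands in for a JDBC connection, an in-memory CSV string stands in for a file written by the upload path, a dict stands in for the system database, and the class names `DatabaseSource`, `UploadedFileSource`, and `DatasetCatalog` and the metadata fields are all hypothetical.

```python
import csv
import io
import sqlite3


class DatabaseSource:
    """Turns each data access into a SQL query at request time (no local copy)."""

    def __init__(self, conn, table):
        if not table.isidentifier():  # table names cannot be bound as SQL parameters
            raise ValueError(f"unsafe table name: {table!r}")
        self.conn, self.table = conn, table

    def rows(self):
        cur = self.conn.execute(f'SELECT * FROM "{self.table}"')
        cols = [d[0] for d in cur.description]
        return [dict(zip(cols, r)) for r in cur.fetchall()]


class UploadedFileSource:
    """Reads a data set that the upload path has already written to storage."""

    def __init__(self, csv_text):
        self.csv_text = csv_text  # stand-in for a file on the data storage server

    def rows(self):
        return list(csv.DictReader(io.StringIO(self.csv_text)))


class DatasetCatalog:
    """Unified access interface; the dict stands in for the system database."""

    def __init__(self):
        self._sources, self._metadata = {}, {}

    def register(self, name, source, kind):
        self._sources[name] = source
        self._metadata[name] = {"name": name, "kind": kind}  # hypothetical fields

    def metadata(self, name):
        return dict(self._metadata[name])

    def read(self, name):
        return self._sources[name].rows()


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("north", 10.0), ("south", 5.5)])

catalog = DatasetCatalog()
catalog.register("sales_db", DatabaseSource(conn, "sales"), kind="database")
catalog.register("sales_csv", UploadedFileSource("region,amount\nnorth,10\n"), kind="upload")
```

Callers use `catalog.read(name)` without knowing which acquisition path produced the data set. Note that the CSV path yields string values, so a real component would also need to record column types in the metadata.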
Step 2: Design and implement the data preprocessing component. Use the R language to perform statistical analysis on the data set and present its basic descriptive information to the user graphically; encapsulate mathematical methods for interpolation filling, record removal, and data correction; and provide user interfaces for the preprocessing stages of handling missing values, duplicate data, noisy data, and abnormal data.
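Two of the preprocessing operations named in Step 2, removing duplicate records and filling missing values, can be sketched as follows. The patent wraps such methods in R; this Python version is only an illustration, mean imputation is one assumed choice of filling method, and the function names are hypothetical.

```python
from statistics import mean


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent fingerprint of the record
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def fill_missing_with_mean(rows, column):
    """Replace None in `column` with the mean of the values that are present."""
    present = [row[column] for row in rows if row[column] is not None]
    if not present:  # nothing to estimate from, so return an unchanged copy
        return [dict(row) for row in rows]
    fill = mean(present)
    return [{**row, column: fill if row[column] is None else row[column]} for row in rows]


# Illustrative data only.
raw = [
    {"id": 1, "x": 2.0},
    {"id": 2, "x": None},
    {"id": 3, "x": 4.0},
    {"id": 3, "x": 4.0},  # exact duplicate of the previous record
]
cleaned = fill_missing_with_mean(drop_duplicates(raw), "x")
```

Duplicates are removed before imputation here so that a repeated record does not bias the mean; a platform could expose that ordering as a choice in the workflow.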
Step 3: Design and implement the data modeling component. Use the R language to encapsulate data mining models such as classification, clustering, association, and time series; provide a graphical interface through which users set the corresponding model analysis parameters.
Step 4: Design and implement the result visualization component. Use the R language to present data mining results and model evaluation results to the user as charts, lists, and similar forms; push results to result auditors in real time using Ajax polling.
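With Ajax polling, as named in Step 4, the auditor's browser repeatedly asks the server for anything newer than what it has already seen. The server side of that exchange might look like the sketch below, under the assumption that each poll carries a cursor; the class `ResultFeed` and the response fields are hypothetical, and a real deployment would sit behind an HTTP endpoint rather than be called directly.

```python
class ResultFeed:
    """In-memory list of published mining results, read by polling clients."""

    def __init__(self):
        self._items = []

    def publish(self, result):
        """Called when an analyst's flow produces a result; returns the new cursor."""
        self._items.append(result)
        return len(self._items)

    def poll(self, since=0):
        """Return results published after position `since`, plus the next cursor."""
        return {"cursor": len(self._items), "results": self._items[since:]}


# Illustrative exchange: two polls by the same auditor client.
feed = ResultFeed()
feed.publish({"flow": "churn-model", "status": "ready for review"})
first = feed.poll(since=0)
second = feed.poll(since=first["cursor"])
```

The client stores the returned cursor and sends it with the next request, so each result is delivered once even though the client polls on a fixed interval.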
Step 5: Design and implement the model evaluation component. Use the R language to evaluate the previously built model; provide a user interface for saving model analysis parameters and model metadata to the system database.
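The summary names accuracy, error rate, and the confusion matrix as evaluation methods. For a classifier these can be computed as in the sketch below, which is a Python illustration rather than the patent's R code; the function names are hypothetical. With $n$ samples of which $c$ are predicted correctly, accuracy is $c/n$ and error rate is $1 - c/n$.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count each (actual label, predicted label) pair."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")
    return Counter(zip(actual, predicted))


def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    matrix = confusion_matrix(actual, predicted)
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / len(actual)


def error_rate(actual, predicted):
    """Fraction of samples that were misclassified."""
    return 1 - accuracy(actual, predicted)


# Illustrative labels only.
actual = ["a", "a", "b", "b"]
predicted = ["a", "b", "b", "b"]
matrix = confusion_matrix(actual, predicted)
```

Here three of four samples are classified correctly, so the accuracy is 0.75 and the error rate 0.25; the matrix shows that the single error is an `a` predicted as `b`.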
Step 6: Design and implement the user workspace. Use jQuery to implement a drag-and-drop graphical interface with two parts, the component selection area and the flow creation area; store the user operation log in a stack data structure, and provide user interfaces for undoing, redoing, and saving the workspace.
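Step 6 stores the operation log in a stack to support undo and redo. A common way to do this, assumed here rather than taken from the patent, is to keep two stacks: one of earlier states and one of undone states. The class name `WorkspaceHistory` is hypothetical, and a workspace state is represented by a plain tuple of component names.

```python
class WorkspaceHistory:
    """Two-stack undo/redo over immutable workspace states."""

    def __init__(self, initial):
        self.state = initial
        self._undo = []  # earlier states, most recent on top
        self._redo = []  # states that were undone, most recent on top

    def apply(self, new_state):
        """Record a user operation; a new operation invalidates the redo stack."""
        self._undo.append(self.state)
        self.state = new_state
        self._redo.clear()

    def undo(self):
        if self._undo:
            self._redo.append(self.state)
            self.state = self._undo.pop()
        return self.state

    def redo(self):
        if self._redo:
            self._undo.append(self.state)
            self.state = self._redo.pop()
        return self.state


# Illustrative session: the analyst drops two components, then undoes and redoes one.
history = WorkspaceHistory(initial=())
history.apply(("acquire",))
history.apply(("acquire", "clean"))
after_undo = history.undo()
after_redo = history.redo()
```

Saving the workspace then amounts to persisting `history.state`; whether the stacks are saved as well is a design choice the patent does not specify.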
Step 7: Define and implement the data mining flow. Taking the data mining components designed in Steps 1 to 5 as nodes, define a workflow made up of a number of nodes and arrows; provide user interfaces for adjusting node execution parameters, changing the execution direction of the flow, and exporting intermediate results.
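A workflow of nodes and arrows, as defined in Step 7, can be executed by visiting the nodes in an order that respects the arrows and keeping every node's output. The sketch below assumes the flow is acyclic and uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later); the function `run_workflow`, the node names, and the toy node functions are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(nodes, arrows, initial):
    """Run every node once, in an order that respects the arrows.

    nodes:  name -> (function, params dict)
    arrows: list of (source, target) pairs
    Returns every node's output so that intermediate results can be exported.
    """
    preds = {name: [] for name in nodes}
    for src, dst in arrows:
        preds[dst].append(src)
    results = {}
    # static_order() raises graphlib.CycleError if the arrows form a cycle.
    for name in TopologicalSorter(preds).static_order():
        func, params = nodes[name]
        inputs = [results[p] for p in preds[name]] or [initial]
        results[name] = func(*inputs, **params)
    return results


# Illustrative three-node flow with one adjustable execution parameter.
nodes = {
    "acquire": (lambda data: list(data), {}),
    "clean": (lambda data: [x for x in data if x is not None], {}),
    "scale": (lambda data, factor: [x * factor for x in data], {"factor": 2}),
}
arrows = [("acquire", "clean"), ("clean", "scale")]
results = run_workflow(nodes, arrows, [1, None, 3])
```

Adjusting a node's execution parameter is then a matter of editing its params dict and running the flow again, and changing the direction of the flow means editing `arrows`.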
Step 8: Integrate and deploy the mining platform. Provide a JSON-format configuration interface for the data mining components designed in Steps 1 to 5, and provide a user interface for customizing the platform's functions by editing the configuration file.
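The patent does not give the schema of the JSON configuration in Step 8. As one assumed shape, each component could carry an identifier, an enabled flag, and the roles allowed to see it, so that editing the file changes what each user view offers. Every key and identifier below is hypothetical.

```python
import json

# Hypothetical configuration file content; the schema is an assumption.
CONFIG_TEXT = """
{
  "components": [
    {"id": "data_acquisition",     "enabled": true,  "roles": ["collector"]},
    {"id": "data_preprocessing",   "enabled": true,  "roles": ["analyst"]},
    {"id": "data_modeling",        "enabled": true,  "roles": ["analyst"]},
    {"id": "result_visualization", "enabled": true,  "roles": ["analyst", "auditor"]},
    {"id": "model_evaluation",     "enabled": false, "roles": ["analyst"]}
  ]
}
"""


def components_for_role(config_text, role):
    """List the enabled components that a given user role may use."""
    config = json.loads(config_text)
    return [
        component["id"]
        for component in config["components"]
        if component.get("enabled", True) and role in component["roles"]
    ]


analyst_view = components_for_role(CONFIG_TEXT, "analyst")
auditor_view = components_for_role(CONFIG_TEXT, "auditor")
```

In this example the model evaluation component is switched off for the deployment, so it is absent from the analyst's view until the flag is edited back to `true`.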
As shown in Fig. 1, the data mining platform of the present invention is oriented to collaborative data mining by three user roles, namely data collectors, data analysts, and result auditors, and provides a componentized user workspace comprising a data acquisition component, a data preprocessing component, a data modeling component, a result visualization component, and a model evaluation component.
Different user roles use different user views and can build and operate their own data analysis flows by drag and drop. Data collectors upload, update, and delete data through the data acquisition component. Data analysts use the data preprocessing, data modeling, result visualization, and model evaluation components in workflow order to carry out data analysis operations such as data acquisition, data preprocessing, modeling, and model evaluation. Result auditors review and respond to the mining results through the result visualization component in the user workspace.
A data analysis flow is a flexible workflow made up of components and arrows. In any data analysis flow, the user can at any time adjust the execution parameters on a component node, change the execution direction of the flow, and export intermediate results.

Claims (1)

1. A method for constructing a multi-user collaboration oriented data mining platform, characterized in that it provides a workspace in which three user roles, namely data collectors, data analysts, and result auditors, collaborate on data mining, and specifically comprises the following steps:
Step 1: design and implement the data acquisition component;
data acquisition is carried out in two ways: acquisition from a database and acquisition by web upload;
for acquisition from a database, Java Database Connectivity is used, and each data access by the data mining platform is translated in real time into the corresponding query against the database;
for acquisition by web upload, data upload requests from the web client are listened for, a socket connection is established between the client and the data storage server, and Java I/O streams are then used to write the data set into the file system of the data storage server;
in both implementations of the data acquisition component, the metadata of the data set must be saved to the system database, and a unified access interface is exposed externally;
Step 2: design and implement the data preprocessing component; use the R language to perform statistical analysis on the data set and present its basic descriptive information to the user graphically; encapsulate mathematical methods for interpolation filling, record removal, and data correction, and provide user interfaces for the preprocessing stages of handling missing values, duplicate data, noisy data, and abnormal data;
Step 3: design and implement the data modeling component; use the R language to encapsulate classification, clustering, association, and time-series data mining models; provide a graphical interface through which users set the corresponding model analysis parameters;
Step 4: design and implement the result visualization component; use the R language to present data mining results and model evaluation results to the user as charts, lists, and similar forms; push results to result auditors in real time using Ajax polling;
Step 5: design and implement the model evaluation component; use the R language to provide the model evaluation methods of accuracy, error rate, and confusion matrix; provide a user interface for saving model analysis parameters and model metadata to the system database;
Step 6: design and implement the user workspace; use jQuery to implement a drag-and-drop graphical interface with two parts, the component selection area and the flow creation area; store the user operation log in a stack data structure, and provide user interfaces for undoing, redoing, and saving the workspace;
Step 7: define and implement the data mining flow; taking the data mining components designed in Steps 1 to 5 as nodes, define a workflow made up of a number of nodes and arrows; provide user interfaces for adjusting node execution parameters, changing the execution direction of the flow, and exporting intermediate results;
Step 8: integrate and deploy the mining platform; provide a JSON-format configuration interface for the data mining components designed in Steps 1 to 5, and provide a user interface for customizing the platform's functions by editing the configuration file.
CN201410059806.8A 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform Active CN103853821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410059806.8A CN103853821B (en) 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410059806.8A CN103853821B (en) 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform

Publications (2)

Publication Number Publication Date
CN103853821A (en) 2014-06-11
CN103853821B (en) 2017-02-22

Family

ID=50861476

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410059806.8A Active CN103853821B (en) 2014-02-21 2014-02-21 Method for constructing multiuser collaboration oriented data mining platform

Country Status (1)

Country Link
CN (1) CN103853821B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100476819C (en) * 2006-12-27 2009-04-08 章毅 Data mining system based on Web and control method thereof
CN101324901A (en) * 2008-08-06 2008-12-17 中国电信股份有限公司 Method, platform and system for excavating data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
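He Qing et al.: "Big data mining platform based on cloud computing", ZTE Technology Journal (《中兴通讯技术》) *
Chen Huiping et al.: "The WEKA data mining platform and its secondary development", Computer Engineering and Applications (《计算机工程与应用》) *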

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104572929A (en) * 2014-12-26 2015-04-29 深圳市科漫达智能管理科技有限公司 Data mining method and device
CN104731953A (en) * 2015-03-31 2015-06-24 河海大学 R-based building method of data preprocessing system
CN105159688A (en) * 2015-10-14 2015-12-16 浙江大学 Programmable information visualization interaction type design method
CN105468736A (en) * 2015-11-23 2016-04-06 国云科技股份有限公司 Plug-in and component based data preprocessing system and realization method therefor
CN105550365A (en) * 2016-01-15 2016-05-04 中国科学院自动化研究所 Visualization analysis system based on text topic model
CN106446238A (en) * 2016-10-10 2017-02-22 合肥红珊瑚软件服务有限公司 Web data mining system based on XML
CN108228359B (en) * 2016-12-15 2020-11-03 北京京东尚科信息技术有限公司 Method and system for integrating web program and R program to process data
CN108228359A (en) * 2016-12-15 2018-06-29 北京京东尚科信息技术有限公司 Web programs integrate the method and system of processing data with R programs
CN106599325A (en) * 2017-01-18 2017-04-26 河海大学 Method for constructing data mining visualization platform based on R and HighCharts
WO2019033401A1 (en) * 2017-08-18 2019-02-21 深圳怡化电脑股份有限公司 Software development method and device
CN107944146A (en) * 2017-11-28 2018-04-20 河海大学 Polynary hydrology Time Series Matching model building method based on principal component analysis
CN108304557A (en) * 2018-02-07 2018-07-20 霍尔果斯智融未来信息科技有限公司 A kind of multiple person cooperational data digging method
CN108563706A (en) * 2018-03-27 2018-09-21 昆山和君纵达数据科技有限公司 A kind of collection big data intelligent service system and its operation method
CN108694448A (en) * 2018-05-08 2018-10-23 成都卡莱博尔信息技术股份有限公司 PHM platforms
CN109558395A (en) * 2018-10-17 2019-04-02 中国光大银行股份有限公司 Data processing system and data digging method
CN109491289A (en) * 2018-11-15 2019-03-19 国家计算机网络与信息安全管理中心 A kind of dynamic early-warning method and device for data center's dynamic environment monitoring
CN110909039A (en) * 2019-10-25 2020-03-24 北京华如科技股份有限公司 Big data mining tool and method based on drag type process
CN112069244A (en) * 2020-08-28 2020-12-11 福建博思软件股份有限公司 Visualization-based web page data mining method and storage device
CN112069244B (en) * 2020-08-28 2022-07-29 福建博思软件股份有限公司 Method and storage device based on visualization web page data mining
CN112148747A (en) * 2020-09-08 2020-12-29 银清科技有限公司 Transaction system log analysis method and device based on R language
CN112632146A (en) * 2020-12-03 2021-04-09 成都大数据产业技术研究院有限公司 Multi-person collaborative visual data mining system
CN112632146B (en) * 2020-12-03 2023-04-07 成都大数据产业技术研究院有限公司 Multi-person collaborative visual data mining system
CN114597890A (en) * 2022-01-27 2022-06-07 国网冀北电力有限公司经济技术研究院 Construction method of holographic data system of power transmission line
CN116737803A (en) * 2023-08-10 2023-09-12 天津神舟通用数据技术有限公司 Visual data mining arrangement method based on directed acyclic graph
CN116737803B (en) * 2023-08-10 2023-11-17 天津神舟通用数据技术有限公司 Visual data mining arrangement method based on directed acyclic graph

Also Published As

Publication number Publication date
CN103853821B (en) 2017-02-22

Similar Documents

Publication Publication Date Title
CN103853821A (en) Method for constructing multiuser collaboration oriented data mining platform
Karnitis et al. Migration of relational database to document-oriented database: structure denormalization and data transformation
Yang et al. A system architecture for manufacturing process analysis based on big data and process mining techniques
CN110413690A (en) Method of data synchronization, server, electronic equipment, the storage medium of database
US20140157417A1 (en) Methods and systems for architecture-centric threat modeling, analysis and visualization
CN109961204A (en) Quality of service analysis method and system under a kind of micro services framework
Bhardwaj et al. Implementation of ID3 algorithm
CN103679384A (en) Method for workflow cooperative office work
CN104298496B (en) data analysis type software development framework system
WO2012074516A1 (en) Systems and methods for reducing reservoir simulator model run time
CN106021260A (en) Method and system to search for at least one relationship pattern in a plurality of runtime artifacts
CN104951306B (en) Data processing method and system based on real-time Computational frame
SG10201702888XA (en) Platform for the integration of operational bim, operational intelligence, and user journeys for the simplified and unified management of smart cities
Hammad et al. Provenance as a service: A data-centric approach for real-time monitoring
CN109376153A (en) System and method for writing data into graph database based on NiFi
US10963963B2 (en) Rule based hierarchical configuration
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
KR20150058709A (en) Integrated system for research productivity and operation managment based on big date technology, and method thereof
Talib et al. A multi-agent framework for data extraction, transformation and loading in data warehouse
CN109033157A (en) A kind of complex data search method and system based on customized search condition tree
Brzychczy et al. New possibilities for process analysis in an underground mine
CN111552847B (en) Method and device for changing number of objects
Laksmiwati et al. Modeling unpredictable data and moving object in disaster management information system based on spatio-temporal data model
US20110295882A1 (en) System and method for providing a composite view object and sql bypass in a business intelligence server
Pourmasoumi et al. On Business Process Variants Generation.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant