CN106649599A - Knowledge service oriented scientific research data processing and predictive analysis platform - Google Patents
- Publication number
- CN106649599A (application number CN201611054928.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- platform
- analysis
- module
- data processing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/20—Education
- G06Q50/205—Education administration or guidance
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Educational Administration (AREA)
- Educational Technology (AREA)
- Strategic Management (AREA)
- Tourism & Hospitality (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- General Business, Economics & Management (AREA)
- Primary Health Care (AREA)
- Marketing (AREA)
- Human Resources & Organizations (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Economics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a knowledge-service-oriented scientific research data processing and predictive analysis platform. The platform comprises a data processing module, a data analysis module, and a platform application module. The data processing module extracts data according to specifications, identifies and processes abnormal data, and standardizes the data vocabulary. The data analysis module acquires the standardized data, establishes a data analysis model, extracts data from a relational model, and displays the data in visualized views. The platform application module provides the results of the data processing module and the data analysis module to a WEB application so that users can search, browse, and compute statistics on scientific research data. The platform can process data from different sources, output it in a uniform format, let users retrieve data according to their different needs, provide a reusable standard data vocabulary, and present results in the most direct and visual manner.
Description
Technical field
The invention relates to systems or methods for data processing operations, in particular to systems or methods specially adapted for educational institutions, and more particularly to a knowledge-service-oriented scientific research data processing and predictive analysis platform.
Background art
With the development of information technology and the globalization of scientific communication, an institution's research output has become the criterion by which universities and research institutes worldwide measure their basic research strength. Many universities link professional title evaluation, performance appraisal, department bonuses, and similar matters directly to research output and influence in order to raise research enthusiasm. In addition, quantified institutional research data provide an objective factual basis when a university and its relevant departments make very important decisions such as departmental restructuring, development adjustment, and planning.
In recent years, major universities have all treated the number of papers indexed by the three major abstract databases (SCI-E, CPCI-S, EI) as an important indicator of their research level and academic standing, and other abstract databases such as SCOPUS, as newcomers, are also steadily expanding their influence. However, these systems differ in indexing standards, coverage, and data formats, and the data themselves contain erroneous records, mis-entered records, and many other non-standard entries. This often makes it difficult for an institution to call up data across platforms and systems with arbitrary, multi-criteria selection, which hinders effective management and use of its research data. The main shortcomings are:
1. Database normalization is inconsistent, so recall and precision are low and comprehensively collecting an institution's output is costly. Examples include non-standard or inconsistent spellings of the institution's name, renamed institutions, different institutions with the same name, and missed or false detections caused by misspellings or by problems with the field in which data are recorded.
2. Because data are non-standard across platforms and some implicit data have not been effectively cleaned and refined, users find it difficult to obtain the data they need directly from the different platforms. For example, it is difficult to compile publication counts and contribution rates for different departments, or output statistics for papers where the university is the first institution or the corresponding-author institution.
3. Data are not effectively stored and converted, so the data reuse rate is very low.
Summary of the invention
In view of the shortcomings of the prior art, an object of the invention is to provide a knowledge-service-oriented scientific research data processing and predictive analysis platform. The platform provides one-stop management, querying, and predictive analysis of scientific research data, makes research data convenient to manage, raises the utilization rate and value of institutional data, and serves the institution's research development.
The technical solution adopted by the invention to solve its technical problem is a knowledge-service-oriented scientific research data processing and predictive analysis platform, characterized by comprising:
Data processing module: receives data in the specific formats of the mainstream international abstract databases; unifies the corresponding fields, description rules, and storage requirements of the different source databases, converting the source data into the data this platform requires; performs specification extraction on the data; identifies and screens abnormal data, extracting it for manual identification and re-identifying and re-extracting erroneous data; deduplicates the processed data and applies MD5 hashing; and makes the standardized data available for analysis and query operations through an ES index;
Data analysis module: obtains the standardized data, establishes a data analysis model, extracts data from the relational model, and displays it in visualized views;
Platform application module: provides the results of the data processing module and the data analysis module to a WEB application so that users can query, browse, and compute statistics on scientific research data.
Further, the data processing module comprises:
S1.1 Data conversion module: unifies the corresponding fields, description rules, and storage requirements of the different source databases so that data from each source database can be converted into the data required by the application platform;
S1.2 Data specification extraction module: extracts specific field data from the different source data according to the converted data format and requirements;
S1.3 Abnormal data identification module: judges abnormal conditions in the data. Data for which cleaning rules can be refined through machine learning are cleaned automatically by the machine; matches the machine cannot identify are submitted to a data processing interface, where abnormal data are extracted for manual identification and erroneous data are re-identified and re-extracted.
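The S1.3 split between machine cleaning and manual identification can be sketched in Python as follows. The sketch makes several assumptions. There is a reusable standard vocabulary of institution names, represented by the hypothetical `AUTHORITY` list. There is also a hypothetical abbreviation table. Finally, fuzzy string matching with the standard-library `difflib.get_close_matches` stands in for the learned cleaning rules. Names the matcher resolves are mapped to the standard name automatically. Anything below the similarity cutoff goes to a manual-review queue. This is an illustrative sketch, not the platform's actual rule set.

```python
import difflib
import re

# Hypothetical standard vocabulary of institution names (the reusable word list).
AUTHORITY = ["Tsinghua University", "Peking University", "Yanshan University"]

# Hypothetical abbreviation table; SCI-E style addresses abbreviate heavily.
ABBREVIATIONS = {"univ": "university", "inst": "institute", "dept": "department"}

# Hypothetical non-meaningful tokens ignored during matching.
NOISE_WORDS = {"the", "of"}


def normalize(name: str) -> str:
    """Lower-case, drop noise words and expand abbreviations before comparison."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens if t not in NOISE_WORDS]
    return " ".join(tokens)


def clean(raw_names, cutoff=0.85):
    """Split raw names into auto-cleaned matches and a manual-review queue."""
    lookup = {normalize(a): a for a in AUTHORITY}
    cleaned, manual_queue = {}, []
    for raw in raw_names:
        best = difflib.get_close_matches(normalize(raw), list(lookup), n=1, cutoff=cutoff)
        if best:
            cleaned[raw] = lookup[best[0]]  # machine-resolvable: map to standard name
        else:
            manual_queue.append(raw)  # machine cannot resolve: hand to a human
    return cleaned, manual_queue
```

For example, `clean(["Tsinghua Univ", "Tsinghau University", "Unknown Institute"])` maps the first two variants to "Tsinghua University" and leaves "Unknown Institute" for a human reviewer. The cutoff trades automatic coverage against false merges. A lower value cleans more names but risks merging distinct institutions.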
The data analysis module comprises:
S2.1 Data analysis model module: establishes the data analysis model and, based on the standardized data, extracts data analysis results and the internal relationships among the stored data;
S2.2 Visual analysis interface: presents data analysis results to users for browsing in a visual interface.
The beneficial effects of the invention are as follows:
1. Multi-source data are integrated and finally presented in a unified, standardized format. According to users' different needs, the WEB side provides browsing, retrieval, query, and statistics functions, resolving the difficulties universities face in compiling statistics on and querying institutional research data.
2. Data cleaning strategies and rules can be continuously improved and reused. Preliminary cleaning strategies and rules alone can complete institution-name cleaning for most institutional research data, greatly simplifying users' data processing workflow.
3. Advanced analysis and mining are performed on institutional research data, and the institution's research situation is revealed intuitively in visualized views.
Description of the drawings
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a system structure diagram of the invention.
Fig. 2 is a system diagram of the data processing module of the invention.
Fig. 3 is a system diagram of the data analysis module of the invention.
Detailed description of the embodiments
Referring to the drawings, an embodiment of the knowledge-service-oriented scientific research data processing and predictive analysis platform of the invention comprises:
Data processing module: see Fig. 2.
The data processing module covers data acquisition, data conversion, data specification extraction, abnormal data identification, data merging and deduplication, data storage, and indexing. To ensure the accuracy of the indexed data, in this embodiment the data sources mainly use the specific formats of the mainstream international abstract databases. Data conversion unifies the corresponding fields, description rules, and storage requirements of the different source databases so that data from each source database can be converted into the data required by the platform of the invention. Because some data relationships required by the application platform are implicit in certain fields, the invention also performs specification extraction on the data. For example, fields such as country, province, city, postal code, university, and second-level department are extracted from the author address field. Because the data themselves may be non-standard, or the extracted information may not match, some data cannot be extracted or identified. The embodiment therefore identifies and screens abnormal data, extracting it according to certain rules for manual identification, and re-identifies and re-extracts erroneous data. The processed data are deduplicated according to specific rules and MD5-hashed. The standardized data are then made available for analysis and query operations through an ES index.
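The merge-and-deduplicate step with MD5 can be illustrated with a short Python sketch. The patent says only that deduplication follows "specific rules". Here it is assumed, hypothetically, that two records are the same paper when their normalized title, year, and DOI agree. The MD5 digest of those fields then serves as the record's fingerprint and ID. MD5 is a hash function, so the result is a fingerprint rather than reversible encryption.

```python
import hashlib
import json

# Hypothetical choice of fields that identify "the same paper".
KEY_FIELDS = ("title", "year", "doi")


def fingerprint(record: dict) -> str:
    """MD5 hex digest of the normalized key fields (a fingerprint, not encryption)."""
    key = {f: str(record.get(f, "")).strip().lower() for f in KEY_FIELDS}
    payload = json.dumps(key, sort_keys=True, ensure_ascii=False)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def deduplicate(records):
    """Keep the first record per fingerprint and attach the digest as its ID."""
    seen = {}
    for rec in records:
        fp = fingerprint(rec)
        if fp not in seen:
            seen[fp] = {**rec, "md5": fp}
    return list(seen.values())
```

Records from SCI-E and EI that differ only in letter case or surrounding spaces collapse to one entry carrying a 32-character `md5` field. The sketch keeps the first-seen record, which is a simplifying assumption. A real merge might combine fields from both sources.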
Data analysis module: data analysis is the core of mining data value, and data visualization uses graphical means to express the implicit information in scientific research data as intuitive images. The flow of the data analysis module is shown in Fig. 3.
The data analysis module obtains the standardized data from the data processing module, establishes a data analysis model, extracts data from the relational model, and displays it in visualized views. Its data source is mainly the standardized data output by the data processing module. Establishing the data analysis model is the focus of the analysis and covers setting analysis rules, setting analysis thresholds, and determining analysis dimensions. The analyzed data objects and the relational model are stored in advance, and data relationships are displayed through visual charts that express those relationships.
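A small Python sketch can make the three model settings named above concrete: analysis rules, analysis thresholds, and analysis dimensions. Several elements are hypothetical and are not specified by the patent. These are the class name `AnalysisModel`, the field names `department`, `year`, and `is_first_institution`, and the use of a first-institution filter as the example rule.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AnalysisModel:
    """Hypothetical analysis-model settings: dimensions, threshold and one rule."""

    dimensions: tuple                     # analysis dimensions, e.g. ("department", "year")
    threshold: int = 1                    # analysis threshold: minimum count to report
    first_institution_only: bool = False  # example analysis rule

    def run(self, records):
        """Count records per combination of dimension values."""
        counts = Counter()
        for rec in records:
            # Analysis rule: optionally keep only papers whose first listed
            # institution is the home institution (assumed boolean field).
            if self.first_institution_only and not rec.get("is_first_institution"):
                continue
            counts[tuple(rec[d] for d in self.dimensions)] += 1
        # Analysis threshold: drop sparse cells before they reach the charts.
        return {key: n for key, n in counts.items() if n >= self.threshold}
```

Suppose the dimensions are `("department", "year")`, the threshold is 2, and the first-institution rule is enabled. The model then counts only first-institution papers per department and year, and suppresses cells with fewer than two papers.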
Platform application module: this functional module mainly provides the results of the data processing module and the data analysis module to the WEB application so that users can query, browse, and compute statistics on scientific research data. This includes browsing, filtering, and retrieving research data on the web side.
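Assuming the ES index is Elasticsearch, the browse, filter, and retrieve functions could be served by Query DSL requests like the one built below. The field names (`title`, `department`, `year`) and the `per_department` aggregation are hypothetical. Only the `bool`, `match`, `term`, and `range` query types and the `terms` aggregation are standard Elasticsearch constructs. The sketch only builds the request body. Sending it to the index's `_search` endpoint is left to whatever client the WEB application uses.

```python
import json


def build_search_body(keyword=None, department=None, year_from=None, year_to=None, size=20):
    """Build an Elasticsearch Query DSL body for browsing, filtering and retrieval."""
    must, filters = [], []
    if keyword:
        # Full-text retrieval on an assumed analyzed "title" field.
        must.append({"match": {"title": keyword}})
    if department:
        # Exact filtering on an assumed keyword-typed "department" field.
        filters.append({"term": {"department": department}})
    year_range = {}
    if year_from is not None:
        year_range["gte"] = year_from
    if year_to is not None:
        year_range["lte"] = year_to
    if year_range:
        filters.append({"range": {"year": year_range}})
    return {
        "query": {"bool": {"must": must or [{"match_all": {}}], "filter": filters}},
        "size": size,
        # Statistics for the result page: paper counts per department.
        "aggs": {"per_department": {"terms": {"field": "department"}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_search_body("deep learning", department="CS", year_from=2015), indent=2))
```

Filters sit in the `filter` clause rather than `must`, so they narrow the result set without affecting relevance scoring. Only the keyword match ranks the papers.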
In an embodiment of the present invention:
The data processing module includes:
S1.1 Data conversion module: unifies the corresponding fields, description rules, and storage requirements of the different source databases so that data from each source database can be converted into the data required by the application platform. For example, SCI-E author affiliations generally take the form university + second-level department + city and postal code, whereas EI author affiliations generally take the form second-level department + university + city and postal code. The data conversion module must therefore convert the relationship between the university and the second-level department accurately.
S1.2 Data specification extraction module: extracts specific field data from the different source data according to the converted data format and requirements. Because different sources record data in different forms, some hidden fields and data relationships must be refined from the data relationships. For example, fields such as country, province, city, postal code, university, and second-level department are extracted from the author address field.
S1.3 Abnormal data identification module: because the data themselves are non-standard or follow inconsistent rules, the raw data obtained directly have relatively low usability. Only after the data undergo in-depth mining and processing are the data used for queries, statistics, and analysis truly reliable and valid. Abnormal data fall into two cases.

The first case comprises abnormal conditions the machine can learn to handle. Examples include misspellings, different names for the same institution, different institutions written the same way, name changes of the same institution over time, and interference from non-meaningful data. The following techniques address these cases:
- word-frequency statistics over the data, combined with a prediction dictionary, can give a preliminary identification of most data;
- similarity judgment using syntactic analysis and fuzzy matching can identify the system's relatively common misspellings;
- analyzing associated data through combined multi-field judgment can identify different institutions that are written the same way;
- a list of non-meaningful words can isolate interfering data items.

Continuously improving the data cleaning strategies and rules, together with machine learning, further raises data cleaning efficiency.

The second case comprises problems the machine cannot identify at all, such as illegal values and missing data. For these, abnormal data identification and screening are added so that the data can be identified manually.
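A single Python converter can sketch the S1.1 and S1.2 examples above. In those examples, SCI-E lists the university before the department, EI lists the department before the university, and the author address also carries the city, postal code, and country. The sketch rests on several assumptions made for illustration: the comma-separated layout, the trailing country element, and a six-digit postal-code pattern (as used in China). Province extraction is omitted. Real address fields are messier. Anything this parser rejects would fall to the S1.3 abnormal-data path.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed order of the first two comma-separated elements per source.
FIELD_ORDER = {
    "SCI-E": ("university", "department"),
    "EI": ("department", "university"),
}

# Assumed "City 123456" element with a six-digit postal code.
CITY_POSTCODE = re.compile(r"^(?P<city>[A-Za-z .'-]+?)\s+(?P<postcode>\d{6})$")


@dataclass
class Affiliation:
    university: str
    department: str
    city: str
    postcode: Optional[str]
    country: Optional[str]


def convert(address: str, source: str) -> Affiliation:
    """Map a source-specific affiliation string onto one common structure."""
    parts = [p.strip() for p in address.split(",")]
    if len(parts) < 3:
        raise ValueError(f"cannot parse affiliation: {address!r}")  # route to S1.3
    first, second = FIELD_ORDER[source]
    named = {first: parts[0], second: parts[1]}
    match = CITY_POSTCODE.match(parts[2])
    city = match.group("city") if match else parts[2]
    postcode = match.group("postcode") if match else None
    country = parts[3] if len(parts) > 3 else None
    return Affiliation(named["university"], named["department"], city, postcode, country)
```

Consider `convert("Tsinghua Univ, Dept Comp Sci, Beijing 100084, Peoples R China", "SCI-E")` and `convert("Department of Computer Science, Tsinghua University, Beijing 100084, China", "EI")`. Both yield the same field layout, with university and department in their correct slots despite the reversed source order.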
The data analysis module comprises:
S2.1 Data analysis model: establishing the data analysis model is the focus of the analysis. The key to establishing it is to determine how to extract, from the standardized data, the data analysis results and the internal relationships among the stored data, and to refine the analysis dimensions;
S2.2 Visual analysis interface: presents data analysis results to users for browsing in a visual interface.
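The patent names no charting tool for the S2.2 visual analysis interface. One possible approach, assumed here, is for the back end to emit a declarative chart specification that the WEB front end renders. The sketch below turns per-department, per-year paper counts into a Vega-Lite bar-chart specification. The field names and the choice of Vega-Lite are illustrative only.

```python
import json


def bar_chart_spec(counts):
    """Turn {(department, year): papers} into a Vega-Lite bar-chart spec (sketch)."""
    rows = [
        {"department": dept, "year": year, "papers": n}
        for (dept, year), n in sorted(counts.items())
    ]
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"values": rows},
        "mark": "bar",
        "encoding": {
            "x": {"field": "department", "type": "nominal"},
            "y": {"field": "papers", "type": "quantitative"},
            "color": {"field": "year", "type": "ordinal"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(bar_chart_spec({("CS", 2016): 3, ("Math", 2016): 2}), indent=2))
```

Keeping the chart as data rather than an image lets the same analysis result be re-rendered, filtered, or exported by the front end without another round trip to the analysis module.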
Claims (3)
1. A knowledge-service-oriented scientific research data processing and predictive analysis platform, characterized by comprising:
Data processing module: receives data in the specific formats of the mainstream international abstract databases; unifies the corresponding fields, description rules, and storage requirements of the different source databases, converting the source data into the data required by this platform; performs specification extraction on the data; identifies and screens abnormal data, extracting it for manual identification and re-identifying and re-extracting erroneous data; deduplicates the processed data and applies MD5 hashing; and makes the standardized data available for analysis and query operations through an ES index;
Data analysis module: obtains the standardized data, establishes a data analysis model, extracts data from the relational model, and displays it in visualized views;
Platform application module: provides the results of the data processing module and the data analysis module to a WEB application so that users can query, browse, and compute statistics on scientific research data.
2. The knowledge-service-oriented scientific research data processing and predictive analysis platform according to claim 1, characterized in that the data processing module comprises:
S1.1 Data conversion module: unifies the corresponding fields, description rules, and storage requirements of the different source databases so that data from each source database can be converted into the data required by the application platform;
S1.2 Data specification extraction module: extracts specific field data from the different source data according to the converted data format and requirements;
S1.3 Abnormal data identification module: extracts abnormal data for manual identification, and re-identifies and re-extracts erroneous data.
3. The knowledge-service-oriented scientific research data processing and predictive analysis platform according to claim 1, characterized in that the data analysis module comprises:
S2.1 Data analysis model module: establishes the data analysis model and, based on the standardized data, extracts data analysis results and the internal relationships among the stored data;
S2.2 Visual analysis interface: presents data analysis results to users for browsing in a visual interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611054928.3A CN106649599A (en) | 2016-11-25 | 2016-11-25 | Knowledge service oriented scientific research data processing and predictive analysis platform |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611054928.3A CN106649599A (en) | 2016-11-25 | 2016-11-25 | Knowledge service oriented scientific research data processing and predictive analysis platform |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106649599A (en) | 2017-05-10 |
Family ID: 58812035
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611054928.3A Pending CN106649599A (en) | 2016-11-25 | 2016-11-25 | Knowledge service oriented scientific research data processing and predictive analysis platform |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649599A (en) |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104699849A (en) * | 2015-04-07 | 2015-06-10 | Tongfang CNKI Digital Publishing Technology Co., Ltd. | Digital library resource unified search system |
CN105740471A (en) * | 2016-03-14 | 2016-07-06 | Yanshan University | Intelligent method for dynamically querying paper indexing status |
Non-Patent Citations (1)
Title |
---|
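Kou Yuantao: "Research on the Construction of a Discipline-Oriented Scientific Research Information Environment", China Doctoral Dissertations Full-text Database (Information Science and Technology) * |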
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107463603A (en) * | 2017-06-16 | 2017-12-12 | Computer Network Information Center, Chinese Academy of Sciences | Customized control method and system for scientific research project life-cycle data management based on quantitative DMP |
CN107463603B (en) * | 2017-06-16 | 2021-01-12 | Computer Network Information Center, Chinese Academy of Sciences | Scientific research project life cycle data management customized control method and system based on quantitative DMP |
CN110990388A (en) * | 2019-11-29 | 2020-04-10 | Neusoft Reach Automotive Technology (Shenyang) Co., Ltd. | Data processing method and device |
CN111190972A (en) * | 2019-12-31 | 2020-05-22 | Wuhan Junchu Information Technology Co., Ltd. | Experiment data management system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170510 |