CN106649599A - Knowledge service oriented scientific research data processing and predictive analysis platform - Google Patents


Info

Publication number
CN106649599A
Authority
CN
China
Prior art keywords
data
platform
analysis
module
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611054928.3A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Latitude Mdt Infotech Ltd
Original Assignee
Hunan Latitude Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Latitude Mdt Infotech Ltd filed Critical Hunan Latitude Mdt Infotech Ltd
Priority to CN201611054928.3A priority Critical patent/CN106649599A/en
Publication of CN106649599A publication Critical patent/CN106649599A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/20Education
    • G06Q50/205Education administration or guidance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a knowledge service oriented scientific research data processing and predictive analysis platform. The platform comprises a data processing module, a data analysis module and a platform application module. The data processing module performs standardized extraction of data, identifies and processes abnormal data, and standardizes the data vocabulary. The data analysis module obtains the standardized data, establishes a data analysis model, extracts data from a relational model, and displays the data in visualization views. The platform application module provides the results of the data processing module and the data analysis module to a WEB application so that users can search, browse and compile statistics on scientific research data. The platform has the advantages of processing data from different sources, producing output in a uniform format, letting users retrieve data according to their different needs, providing a reusable standard data vocabulary, and presenting results in the most direct, visual manner.

Description

Knowledge service oriented scientific research data processing and predictive analysis platform
Technical Field
The present invention relates to systems or methods for data processing operations, in particular systems or methods specially adapted for the education sector, and more specifically to a knowledge service oriented scientific research data processing and predictive analysis platform.
Background
With the development of information technology and the globalization of scholarly communication, institutional research output has become the criterion by which universities, research institutes and similar organizations worldwide measure their basic research strength. Many universities link professional title evaluation, performance appraisal, departmental bonuses and the like directly to research output and impact in order to raise enthusiasm for research. In addition, quantified institutional research data provides an objective factual basis when a university and the relevant departments make major decisions such as departmental restructuring, development adjustment and planning.
In recent years, major universities in China have all taken the papers indexed by the three major abstract databases (SCI-E, CPCI-S, EI) as an important indicator of disciplinary level and academic standing, while other abstract databases such as SCOPUS, as newcomers, are steadily expanding their influence. Each system differs in indexing criteria, coverage and data format, and the data itself may contain erroneous records and many other irregularities. As a result, it is often difficult for an institution to access the data across platforms and systems, on demand and with multiple selection criteria, which hinders effective management and use of research data by the institution. The main shortcomings are:
1. Database normalization is inconsistent, recall and precision are low, and the overall cost of collecting an institution's output is high. Examples include missed and false detections caused by non-standard or inconsistent spellings of institution names, renamed institutions, institutions sharing the same name, misspellings, or field-setting problems.
2. Because data is not standardized across platforms and part of the implicit data has not been effectively cleaned and refined, users find it hard to obtain the data they need directly from the different platforms. For example, it is difficult to count the publications and contribution rate of individual departments, or the output for which the institution is the first affiliation or the corresponding-author affiliation.
3. Data is not effectively stored or converted, so the data reuse rate is very low.
Summary of the Invention
In view of the shortcomings of the prior art, the object of the present invention is to provide a knowledge service oriented scientific research data processing and predictive analysis platform: a one-stop platform for managing, querying and predictively analyzing research data that makes research data convenient to manage, raises the utilization and value of institutional data, and serves the institution's research development.
The technical solution adopted by the present invention to solve the technical problem is a knowledge service oriented scientific research data processing and predictive analysis platform, characterized in that it comprises:
Data processing module: receives data in the specific formats of the mainstream international abstract databases; unifies the corresponding fields, description rules and storage requirements of databases from different sources, converting them into the data required by the platform; performs standardized extraction on the data; identifies and screens abnormal data, extracting abnormal data for manual identification and re-identifying and re-extracting erroneous data; de-duplicates the processed data and applies MD5 encryption; the standardized data can then be analyzed and queried through an ES index;
Data analysis module: obtains the standardized data, establishes a data analysis model, extracts data from the relational model and displays it in visualization views;
Platform application module: provides the results of the data processing module and the data analysis module to a WEB application so that users can query, browse and compile statistics on research data.
Further:
The data processing module comprises:
S1.1 Data conversion module: data conversion unifies the corresponding fields, description rules and storage requirements of databases from different sources, so that databases from different sources can be converted into the data required by the application platform;
S1.2 Data standardization extraction module: extracts specific field data from data of different sources according to the converted data format and requirements;
S1.3 Abnormal data identification module: judges abnormal conditions in the data. Data whose cleaning rules can be refined through machine learning is cleaned by machine; data the machine cannot recognize or match is submitted to the data processing interface, which extracts the abnormal data for manual identification and re-identifies and re-extracts data that was processed incorrectly.
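The field unification performed by the data conversion module (S1.1) can be illustrated with a minimal sketch. The patent does not specify a schema, so the per-source field names, the unified field names and the `convert_record` helper below are all assumptions made for illustration only.

```python
from typing import Dict

# Hypothetical per-source field maps: source field name -> unified field name.
# The patent does not list field names; these are illustrative assumptions.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "SCI-E": {"TI": "title", "AU": "authors", "C1": "address", "PY": "year"},
    "EI": {
        "Title": "title",
        "Author": "authors",
        "Author affiliation": "address",
        "Publication year": "year",
    },
}


def convert_record(source: str, raw: Dict[str, str]) -> Dict[str, str]:
    """Rename source-specific fields to the unified schema and tag the origin."""
    mapping = FIELD_MAPS[source]
    # Fields that are not in the map are dropped; mapped values are trimmed.
    unified = {
        target: raw[field].strip()
        for field, target in mapping.items()
        if field in raw
    }
    unified["source"] = source
    return unified
```

Keeping the mapping in a table rather than in code means that supporting another abstract database would only require a new entry in `FIELD_MAPS`, which is one plausible way to read the requirement that different source databases be converted into the same platform data.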
The data analysis module comprises:
S2.1 Data analysis model module: establishes the data analysis model and, based on the standardized data, extracts data analysis results and stores the internal relationships among the data;
S2.2 Visual analysis interface: presents the data analysis results in a visual interface for users to browse.
The beneficial effects of the present invention are:
1. Multi-source data is integrated and finally presented in a unified, standardized format; browsing, retrieval, query and statistics functions are provided on the WEB side according to users' different needs, resolving the difficulties universities face in compiling statistics on and querying institutional research data.
2. Data cleaning strategies and rules can be continuously refined and reused; preliminary cleaning strategies and rule-based judgments can complete the institution cleaning of most institutional research data, greatly simplifying the user's data handling workflow.
3. Advanced analysis and mining is performed on institutional research data, and the institution's research situation is revealed intuitively in the form of visualization views.
Brief Description of the Drawings
The invention is further described below with reference to the accompanying drawings.
Fig. 1 is a system structure diagram of the present invention.
Fig. 2 is a system diagram of the data processing module of the present invention.
Fig. 3 is a system diagram of the data analysis module of the present invention.
Detailed Description
Referring to the drawings, an embodiment of the knowledge service oriented scientific research data processing and predictive analysis platform of the present invention comprises:
Data processing module: see Fig. 2.
The data processing module covers data acquisition, data conversion, data standardization extraction, abnormal data identification, data merging and de-duplication, data storage, and indexing. To ensure the accuracy of the indexed data, in an embodiment of the invention the data sources are mainly the specific formats of the mainstream international abstract databases. Data conversion unifies the corresponding fields, description rules and storage requirements of databases from different sources so that they can be converted into the data required by the platform. Because some data relationships required by the application platform are implicit in certain fields, the invention also performs standardized extraction on the data, for example extracting country, province, city, postal code, school and second-level department from the author address field. Because irregular data or mismatches in the extracted information can leave data that cannot be extracted or recognized, the embodiment includes identification and screening of abnormal data: abnormal data can be extracted according to certain rules for manual identification, and data that was processed incorrectly can be re-identified and re-extracted. The processed data is de-duplicated according to specific rules and MD5-encrypted; the standardized data is then analyzed and queried through an ES index;
Data analysis module: data analysis is the soul of mining the value of data, and data visualization uses graphical means to express the information implicit in research data in a vivid, intuitive way. The flow of the data analysis module is shown in Fig. 3;
The data analysis module obtains the standardized data from the data processing module, establishes a data analysis model, extracts data from the relational model and displays it in visualization views. The data source is mainly the data standardized by the data processing module. Establishing the data analysis model is the focus of the analysis and covers setting analysis rules, setting analysis thresholds and determining analysis dimensions. The analyzed data objects and the relational model are stored in a predetermined form, and the data relationships are displayed using visual charts suited to those relationships;
Platform application module: this functional module mainly provides the results of the data processing module and the data analysis module to a WEB application so that users can query, browse and compile statistics on research data, including browsing, filtering and retrieving research data on the web side.
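The de-duplication and MD5 step of the data processing module can be sketched as follows. The patent only says that duplicates are removed "according to specific rules", so the choice of title and year as the identifying fields, and the `fingerprint` and `deduplicate` names, are assumptions. Note also that MD5 is strictly a hash digest; the sketch uses it as a record fingerprint, which is one plausible reading of the text.

```python
import hashlib
from typing import Dict, Iterable, List


def fingerprint(record: Dict[str, str]) -> str:
    """Return an MD5 digest of the normalized identifying fields.

    Assumption: title and year identify a paper. Case and repeated
    whitespace are ignored so trivially different copies collide.
    """
    key = "|".join(
        " ".join(record.get(field, "").lower().split())
        for field in ("title", "year")
    )
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def deduplicate(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record seen for each fingerprint and attach the digest."""
    seen: Dict[str, Dict[str, str]] = {}
    for record in records:
        digest = fingerprint(record)
        if digest not in seen:
            seen[digest] = {**record, "md5": digest}
    return list(seen.values())
```

Storing the digest on each record gives a stable key that a downstream index could use as a document identifier, so re-importing the same paper from a second abstract database would not create a second entry.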
In an embodiment of the present invention:
The data processing module comprises:
S1.1 Data conversion module: data conversion unifies the corresponding fields, description rules and storage requirements of databases from different sources, so that databases from different sources can be converted into the data required by the application platform. For example, the general format of an SCI-E author affiliation is school + second-level department + city + postal code, whereas the general format of an EI author affiliation is second-level department + school + city + postal code; the data conversion module must accurately convert the relationship between the school and the second-level department;
S1.2 Data standardization extraction module: extracts specific field data from data of different sources according to the converted data format and requirements. Because record formats differ between data sources, some hidden fields and data relationships have to be refined from the data relationships, for example extracting country, province, city, postal code, school and second-level department from the author address field;
S1.3 Abnormal data identification module: because the data itself is irregular or its rules are inconsistent, first-hand data obtained directly has relatively low usability; only after deep mining and processing is the data used for querying, statistics and analysis truly reliable. Abnormal data falls into two cases. The first covers abnormal conditions that a machine can handle through learning (such as misspellings in the data, different naming styles for the same institution, identical spellings for different institutions, name changes of the same institution over time, and interference from non-content data). With a preset dictionary, word-frequency statistics over segmented data can preliminarily identify most data; syntactic analysis and fuzzy matching, which judge data similarity, can identify the more common misspellings in the system; analyzing associated data through combined multi-field judgment can identify cases where different institutions are written identically; non-content words can be used to isolate interfering data items; and by continuously refining the data cleaning strategies and rules together with machine learning, data cleaning efficiency can be further improved. The second covers problems the machine cannot recognize at all, such as illegal values and missing data, for which identification and screening of abnormal data is added so that they can be identified manually.
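Steps S1.1 to S1.3 can be sketched together for the author affiliation field. This is only an illustration under stated assumptions: affiliation parts are assumed to be comma-separated, postal codes are assumed to be six digits, and the dictionary entries, the 0.85 similarity cutoff and the function names are hypothetical. The fuzzy matching uses Python's standard `difflib`, which stands in for whatever matching technique the platform actually uses.

```python
import difflib
import re
from typing import Dict, Optional, Tuple

# Hypothetical preset dictionary of standardized institution names.
KNOWN_SCHOOLS = ["Example University", "Sample Institute of Technology"]
POSTCODE = re.compile(r"\b(\d{6})\b")  # assumption: six-digit postal codes

Record = Dict[str, Optional[str]]


def parse_affiliation(source: str, text: str) -> Record:
    """Split an affiliation string using the field order of its source."""
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if len(parts) < 3:
        return {"school": None, "department": None, "city": None, "postcode": None}
    if source == "SCI-E":  # school + second-level department + city + postcode
        school, department = parts[0], parts[1]
    else:  # EI: second-level department + school + city + postcode
        department, school = parts[0], parts[1]
    tail = " ".join(parts[2:])
    found = POSTCODE.search(tail)
    return {
        "school": school,
        "department": department,
        "city": POSTCODE.sub("", tail).strip() or None,
        "postcode": found.group(1) if found else None,
    }


def match_school(name: str, cutoff: float = 0.85) -> Optional[str]:
    """Map a possibly misspelled name to its dictionary entry, if close enough."""
    hits = difflib.get_close_matches(name, KNOWN_SCHOOLS, n=1, cutoff=cutoff)
    return hits[0] if hits else None


def clean(source: str, text: str) -> Tuple[Optional[Record], Optional[str]]:
    """Return (record, None) when cleaned by machine, else (None, raw text)."""
    record = parse_affiliation(source, text)
    standard = match_school(record["school"]) if record["school"] else None
    if standard is None:
        return None, text  # queue the raw value for manual identification
    record["school"] = standard
    return record, None
```

A value such as `"Exmaple University, School of Physics, Example City 100000"` from SCI-E would be corrected to the dictionary spelling, while a string with no close dictionary match is returned untouched for the manual identification queue, mirroring the two cases of abnormal data described above.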
The data analysis module comprises:
S2.1 Data analysis model: establishing the data analysis model is the focus of the analysis. How to extract data analysis results from the standardized data and store the internal relationships among the data, and how to refine the dimensions of analysis, are the keys to establishing the data analysis model;
S2.2 Visual analysis interface: presents the data analysis results in a visual interface for users to browse.

Claims (3)

1. A knowledge service oriented scientific research data processing and predictive analysis platform, characterized in that it comprises:
Data processing module: receives data in the specific formats of the mainstream international abstract databases; unifies the corresponding fields, description rules and storage requirements of databases from different sources, converting them into the data required by the platform; performs standardized extraction on the data; identifies and screens abnormal data, extracting abnormal data for manual identification and re-identifying and re-extracting erroneous data; de-duplicates the processed data and applies MD5 encryption; the standardized data can then be analyzed and queried through an ES index;
Data analysis module: obtains the standardized data, establishes a data analysis model, extracts data from the relational model and displays it in visualization views;
Platform application module: provides the results of the data processing module and the data analysis module to a WEB application so that users can query, browse and compile statistics on research data.
2. The knowledge service oriented scientific research data processing and predictive analysis platform according to claim 1, characterized in that the data processing module comprises:
S1.1 Data conversion module: data conversion unifies the corresponding fields, description rules and storage requirements of databases from different sources, so that databases from different sources can be converted into the data required by the application platform;
S1.2 Data standardization extraction module: extracts specific field data from data of different sources according to the converted data format and requirements;
S1.3 Abnormal data identification module: extracts abnormal data for manual identification, and re-identifies and re-extracts data that was processed incorrectly.
3. The knowledge service oriented scientific research data processing and predictive analysis platform according to claim 1, characterized in that the data analysis module comprises:
S2.1 Data analysis model module: establishes the data analysis model and, based on the standardized data, extracts data analysis results and stores the internal relationships among the data;
S2.2 Visual analysis interface: presents the data analysis results in a visual interface for users to browse.
CN201611054928.3A 2016-11-25 2016-11-25 Knowledge service oriented scientific research data processing and predictive analysis platform Pending CN106649599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611054928.3A CN106649599A (en) 2016-11-25 2016-11-25 Knowledge service oriented scientific research data processing and predictive analysis platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611054928.3A CN106649599A (en) 2016-11-25 2016-11-25 Knowledge service oriented scientific research data processing and predictive analysis platform

Publications (1)

Publication Number Publication Date
CN106649599A true CN106649599A (en) 2017-05-10

Family

ID=58812035

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611054928.3A Pending CN106649599A (en) 2016-11-25 2016-11-25 Knowledge service oriented scientific research data processing and predictive analysis platform

Country Status (1)

Country Link
CN (1) CN106649599A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463603A (en) * 2017-06-16 2017-12-12 中国科学院计算机网络信息中心 Scientific research project life cycle data management customized control method and system based on quantitative DMP
CN110990388A (en) * 2019-11-29 2020-04-10 东软睿驰汽车技术(沈阳)有限公司 Data processing method and device
CN111190972A (en) * 2019-12-31 2020-05-22 武汉俊楚信息科技有限公司 Experiment data management system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104699849A (en) * 2015-04-07 2015-06-10 同方知网数字出版技术股份有限公司 Digital library resource unified search system
CN105740471A (en) * 2016-03-14 2016-07-06 燕山大学 Intelligent method for dynamically querying paper collection states

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
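Kou Yuantao (寇远涛): "Research on the Construction of a Discipline-Oriented Scientific Research Information Environment", China Doctoral Dissertations Full-text Database, Information Science and Technology series *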

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107463603A (en) * 2017-06-16 2017-12-12 中国科学院计算机网络信息中心 Scientific research project life cycle data management customized control method and system based on quantitative DMP
CN107463603B (en) * 2017-06-16 2021-01-12 中国科学院计算机网络信息中心 Scientific research project life cycle data management customized control method and system based on quantitative DMP
CN110990388A (en) * 2019-11-29 2020-04-10 东软睿驰汽车技术(沈阳)有限公司 Data processing method and device
CN111190972A (en) * 2019-12-31 2020-05-22 武汉俊楚信息科技有限公司 Experiment data management system

Similar Documents

Publication Publication Date Title
CN111708773B (en) Multi-source scientific and creative resource data fusion method
CN109189901B (en) Method for automatically discovering new classification and corresponding corpus in intelligent customer service system
CN101566997B (en) Determining words related to given set of words
CN110334212A (en) Domain-specific audit knowledge graph construction method based on machine learning
CN106067094A (en) Dynamic assessment method and system
CN109190098A (en) Automatic document generation method and system based on natural language processing
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
CN106779581A (en) Human resource management system
CN111708774B (en) Industry analytic system based on big data
CN106682236A (en) Machine learning based patent data processing method and processing system adopting same
CN102542061A (en) Intelligent product classification method
CN106649599A (en) Knowledge service oriented scientific research data processing and predictive analysis platform
CN108228788A (en) Guide of action automatically extracts and associated method and electronic equipment
CN111737477A (en) Intellectual property big data-based intelligence investigation method, system and storage medium
CN115438199A (en) Knowledge platform system based on smart city scene data middling platform technology
CN115794803A (en) Engineering audit problem monitoring method and system based on big data AI technology
CN112396437A (en) Trade contract verification method and device based on knowledge graph
US8341170B2 (en) Apparatus and method for visualizing technology change
CN117519656A (en) Software development system based on intelligent manufacturing
CN112363996A (en) Method, system, and medium for building a physical model of a power grid knowledge graph
Khekare et al. Design of Automatic Key Finder for Search Engine Optimization in Internet of Everything
CN113792157B (en) Domain mechanism-oriented knowledge base construction method
CN115760495A (en) Method and device for realizing automatic labeling of legal cases
CN112256912A (en) Intelligent marking analysis and playing method for trial video
CN109542973A (en) Patent information localization method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170510