CN105512336A - Method and device for mass data processing based on Hadoop - Google Patents

Method and device for mass data processing based on Hadoop

Info

Publication number
CN105512336A
Authority
CN
China
Prior art keywords
data
indicator
module
specific statistics
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511009913.0A
Other languages
Chinese (zh)
Inventor
王明龙
王力
彭塨烨
谢潇宇
王伟
包辰明
赵金鑫
张舜华
陈暑生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN201511009913.0A
Publication of CN105512336A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and device for mass data processing based on Hadoop. The method comprises collecting data, integrating the collected data, storing the integrated data in an HBase database, computing indicator statistics separately according to the update cycles of the data in the HBase database, and storing the results of the indicator statistics in a relational database. The method and device relieve database management pressure in mass data processing and make the statistical results of mass data easier to display.

Description

A Hadoop-based mass data processing method and device
Technical field
The present invention relates to the field of data processing and, in particular, to a mass data processing method and device.
Background
In data processing for e-commerce websites, the asynchronous, discrete data from each primary business database, access logs, transaction logs and the like is usually processed in a unified manner, so that system indicators such as business traffic, page views, users and products can be monitored in near real time and at regular intervals. With the rapid development of e-commerce, the data produced by websites is growing explosively, and how to store and process mass data quickly and efficiently has become an important technical problem.
At present, relational databases are mainly used to process mass data. Traditional relational databases, however, all enforce transactional consistency, whereas data mining and data analysis do not need strict transactional guarantees or read consistency. Transaction processing in a relational database is therefore a heavy burden when the database is used for data computation and data mining. Designing a mass data processing scheme suited to data computation and mining has thus become a technical problem in urgent need of a solution.
Summary of the invention
To solve the above technical problem, the present invention provides a Hadoop-based mass data processing method and device.
According to a first aspect of embodiments of the present invention, a Hadoop-based mass data processing method is provided. The method may comprise: collecting data; integrating the collected data; storing the integrated data in an HBase database; computing indicator statistics separately according to the update cycles of the data in the HBase database; and storing the results of the indicator statistics in a relational database.
In some embodiments of the present invention, collecting data comprises: collecting log data asynchronously by means of a JavaScript script embedded in front-end pages and rsyslog, and/or collecting the business data of an application server synchronously by means of rsync.
In some embodiments of the present invention, the integration of the collected data is based on the Flume NG framework.
In some embodiments of the present invention, the collected data is buffered in a file-type queue in the Flume NG framework.
In some embodiments of the present invention, the method further comprises: saving the results of the indicator statistics as periodic snapshot files, and providing the periodic snapshot files externally through BDE.
In some embodiments of the present invention, the method further comprises: receiving a query condition entered by a user, accessing the relational database according to the query condition to obtain the results of the indicator statistics, and displaying the results of the indicator statistics to the user.
According to a second aspect of embodiments of the present invention, a Hadoop-based mass data processing device is provided. The device may comprise: an acquisition module for collecting data; an integration module for integrating the data collected by the acquisition module; a storage module for storing the data integrated by the integration module in an HBase database; and a processing module for computing indicator statistics separately according to the update cycles of the data in the HBase database, wherein the storage module is further used to store the results of the indicator statistics from the processing module in a relational database.
In some embodiments of the present invention, collecting data by the acquisition module comprises: collecting log data asynchronously by means of a JavaScript script embedded in front-end pages and rsyslog, and/or collecting the business data of an application server synchronously by means of rsync.
In some embodiments of the present invention, the integration module is based on the Flume NG framework.
In some embodiments of the present invention, the integration module buffers data in a file-type queue in the Flume NG framework.
In some embodiments of the present invention, the processing module is further used to save the results of the indicator statistics as periodic snapshot files and to provide the periodic snapshot files externally through BDE.
In some embodiments of the present invention, the device further comprises: a display module for receiving a query condition entered by a user, accessing the relational database according to the query condition to obtain the results of the indicator statistics, and displaying the results of the indicator statistics to the user.
The Hadoop-based mass data processing method and device provided by embodiments of the present invention store the collected and integrated mass data, and the statistics obtained by processing that data, in different types of database. This improves database management efficiency for mass data while making the statistics easier to query and display. In addition, data with different update cycles is supplied externally in a unified manner in the form of snapshots, which unifies the frequency of external data supply and facilitates the analysis and mining of mass data.
Brief description of the drawings
Fig. 1 is a schematic flowchart of a Hadoop-based mass data processing method according to one embodiment of the present invention;
Fig. 2 is an architecture diagram of Hadoop-based mass data processing according to one embodiment of the present invention;
Fig. 3 is a structural diagram of a Hadoop-based mass data processing device according to one embodiment of the present invention;
Fig. 4 is a structural diagram of a Hadoop-based mass data processing device according to one embodiment of the present invention.
Detailed description
Various aspects of the present invention are described in detail below with reference to the drawings and specific embodiments. Well-known modules, units and their mutual connections, links, communications or operations are not shown or described in detail. Furthermore, the described features, architectures or functions may be combined in any manner in one or more embodiments. Those skilled in the art will appreciate that the following embodiments are illustrative only and do not limit the scope of the invention. It will also be readily understood that the modules, units or processing manners of the embodiments described herein and shown in the drawings may be combined and designed in a variety of different configurations.
Referring to Fig. 1, which is a schematic flowchart of a Hadoop-based mass data processing method according to one embodiment of the present invention, the method may comprise:
S101: collecting data;
S102: integrating the collected data;
S103: storing the integrated data in an HBase database;
S104: computing indicator statistics separately according to the update cycles of the data in the HBase database;
S105: storing the results of the indicator statistics in a relational database.
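The five steps can be read as a simple staged pipeline. The following is a minimal sketch in Python, not the patented implementation: the sample records, the field names (`user`, `cycle` and so on) and the in-memory stand-ins for HBase and the relational database are all assumptions made for illustration.

```python
from collections import defaultdict


def collect():
    """S101: return raw records (hypothetical sample data)."""
    return [
        {"type": "access_log", "user": "u1", "page": "/home", "cycle": "10min"},
        {"type": "access_log", "user": "u2", "page": "/home", "cycle": "10min"},
        {"type": "business", "user": "u1", "order_id": "o1", "cycle": "1h"},
        {"type": "access_log", "user": "", "page": "/home", "cycle": "10min"},
    ]


def integrate(records):
    """S102: parse and filter; here we only drop records with no user."""
    return [r for r in records if r.get("user")]


def store_raw(records, raw_store):
    """S103: append integrated records to the raw store (stand-in for HBase)."""
    raw_store.extend(records)


def compute_indicators(raw_store):
    """S104: count records separately for each update cycle."""
    counts = defaultdict(int)
    for record in raw_store:
        counts[record["cycle"]] += 1
    return dict(counts)


def store_results(indicators, result_store):
    """S105: save the statistics (stand-in for the relational database)."""
    result_store.update(indicators)


def run_pipeline():
    raw_store, result_store = [], {}
    store_raw(integrate(collect()), raw_store)
    store_results(compute_indicators(raw_store), result_store)
    return raw_store, result_store
```

Keeping the raw store and the result store separate mirrors the split the method makes between bulk data and the much smaller statistical results.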
Hadoop, as referred to in the present invention, is the distributed system infrastructure developed by the Apache Software Foundation, which makes full use of the power of a cluster for high-speed computation and storage. Hadoop implements a distributed file system, HDFS (Hadoop Distributed File System), which can be deployed on inexpensive hardware and provides high-throughput access to application data. MapReduce is Hadoop's main execution framework, a programming model for distributed parallel processing that processes very large data sets in parallel in a reliable, fault-tolerant manner. HBase is a column-oriented NoSQL database built on HDFS.
The data processing method of the present invention may comprise step S101, collecting data, for example by means of an acquisition module arranged at the data source. Taking an online mall as an example, asynchronous, discrete data such as the business data, access logs and transaction logs of the mall website may be collected to provide source data for subsequent processing.
In some embodiments, collecting data in step S101 may include collecting log data (for example, access logs and transaction logs) asynchronously and in near real time by means of a JavaScript script embedded in the website's front-end pages and rsyslog, and collecting the business data of the application server of the mall's main site synchronously by means of the rsync backup tool. In other embodiments, only one of the log data and the business data may be collected, depending on the needs of the application.
Next, step S102 is performed: the data collected in step S101 is integrated. Integration may include parsing the collected business data and performing associated queries and filtering on the key fields obtained by parsing the raw data. Data integration in the present invention is based on the Apache Flume NG framework. The Flume framework mainly comprises three modules. The Source (receiving) module is responsible for data access and listening, and a different listening agent (Agent) is used for each of the collection modes above. For example, a LogAgent (log agent) listens for and integrates the log data transmitted asynchronously by rsyslog, and a DBAgent (database agent) integrates the business data of the relational database on the application server. The Channel (queue) module is responsible for data buffering: data received by the Source is sent uniformly to the Channel for processing by the downstream Sink module. In embodiments of the present invention, the collected data is buffered in a file-type queue, that is, the Channel is uniformly defined as a FileChannel. The Sink module is responsible for writing data.
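The point of a file-type queue between the receiving side and the writing side is that buffered events survive a process restart. The sketch below illustrates that idea in plain Python; it is not Flume's FileChannel or any part of the Flume API, and the class name `FileQueue` and its on-disk layout are assumptions chosen for illustration.

```python
import json
import os
import tempfile


class FileQueue:
    """Append-only, file-backed FIFO queue (illustrative stand-in for a channel)."""

    def __init__(self, path):
        self.path = path
        self.offset_path = path + ".offset"
        # Create the data file if it does not exist yet.
        open(self.path, "a").close()

    def put(self, event):
        """Source side: append one event as a JSON line."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    def _read_offset(self):
        try:
            with open(self.offset_path, encoding="utf-8") as f:
                return int(f.read() or 0)
        except FileNotFoundError:
            return 0

    def take(self):
        """Sink side: return the next unconsumed event, or None if empty."""
        offset = self._read_offset()
        with open(self.path, "rb") as f:
            f.seek(offset)
            line = f.readline()
            if not line:
                return None
            new_offset = f.tell()
        # Persist the read position so a restarted consumer resumes here.
        with open(self.offset_path, "w", encoding="utf-8") as f:
            f.write(str(new_offset))
        return json.loads(line)


def drain(queue, sink):
    """Move every buffered event into the sink (here, any list-like object)."""
    while True:
        event = queue.take()
        if event is None:
            break
        sink.append(event)
```

Because both the events and the read position live on disk, a new `FileQueue` object opened on the same path continues where the previous one stopped, at the cost of one small file write per consumed event.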
Then step S103 is performed: the data integrated in step S102 is stored in the HBase database. Data storage in the present invention may use the HBaseSink interface, so that the Sink module of the Flume NG framework saves the data to the HBase database of the storage layer. The data stored in HBase may include access logs, transaction logs, business data, and redundant data tables formed by associating the business data with business data tables; this data can serve as the basic data for report generation and data mining. Because data used for report generation, data analysis and data mining does not need strict transactional guarantees or read consistency, the present invention stores the collected mass data in a non-relational database such as HBase, which can significantly improve database management efficiency in mass data processing and facilitates the analysis and mining of mass data.
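A redundant data table of the kind mentioned above can be pictured as a wide, denormalized row per business record, so that later reporting does not need a join. The sketch below is an assumption-laden illustration: the order and product fields are hypothetical, the `family:qualifier` column names only follow HBase's naming convention, and a plain dictionary stands in for the HBase table.

```python
def build_redundant_rows(orders, products):
    """Join each order with its product record into one wide row.

    Returns a dict mapping row key -> {"family:qualifier": value}.
    """
    rows = {}
    for order in orders:
        # Look up the associated business table entry; tolerate missing ones.
        product = products.get(order["product_id"], {})
        rows[order["order_id"]] = {
            "order:user": order["user"],
            "order:product_id": order["product_id"],
            "product:name": product.get("name", ""),
            "product:category": product.get("category", ""),
        }
    return rows
```

The duplication of product attributes in every order row is deliberate: it trades storage for simpler, join-free reads during report generation.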
Then step S104 is performed: indicator statistics are computed separately according to the update cycles of the data in the HBase database. As described above, the data collected in step S101 may be asynchronous data with different update cycles (for example, 10 minutes or 1 hour). Indicator statistics may be computed separately according to the update cycle of the collected data; that is, for data with a given update cycle, the statistics are computed on that cycle. The statistics in step S104 may also include deduplication of the data, and both deduplication and statistics may be run as scheduled tasks. After the scheduled tasks have produced the indicator statistics for data with different update cycles, the results may be cached. For example, the results may be saved as periodic snapshot files (for example, daily snapshot files), and the saved snapshot files may be provided externally through BDE (Borland Database Engine) for data analysis, data mining and the like; for instance, the snapshot files may be transmitted externally through BDE.net. By means of snapshot files, the present invention unifies the external supply cycle of data sources that have different update cycles, which greatly facilitates the statistical processing of mass data of various types.
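Deduplication, per-cycle counting and snapshot writing can be sketched as follows. This is an illustrative outline only: the record fields (`user`, `page`, `ts`), the choice of a simple count as the indicator, and the JSON snapshot format are assumptions, and scheduling as well as delivery through BDE are left out.

```python
import json
from collections import defaultdict


def dedupe(records, key_fields=("user", "page", "ts")):
    """Keep the first record for each combination of key fields."""
    seen, unique = set(), []
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def bucket_start(ts, cycle_seconds):
    """Start of the update-cycle window that contains timestamp ts."""
    return ts - ts % cycle_seconds


def count_by_cycle(records, cycle_seconds):
    """Count deduplicated records per window of length cycle_seconds."""
    counts = defaultdict(int)
    for record in dedupe(records):
        counts[bucket_start(record["ts"], cycle_seconds)] += 1
    return dict(counts)


def write_snapshot(results, path):
    """Save one set of results as a snapshot file (JSON, keys become strings)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(results, f, sort_keys=True)
```

A scheduled task for 10-minute data would call `count_by_cycle(records, 600)` and one for hourly data `count_by_cycle(records, 3600)`; both can then be written out on the same daily snapshot schedule, which is what unifies the external supply frequency.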
Then step S105 is performed: the results of the indicator statistics from step S104 are stored in a relational database, for example an Oracle database. Storing the results in a relational database improves the efficiency of associated queries when the results are queried and displayed, reduces the query time for mass data statistics, and facilitates their visual presentation.
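Writing the results into a relational table might look like the sketch below. SQLite from the Python standard library is used here purely as a stand-in for the Oracle database named above, and the table name `indicator_stats` and its columns are hypothetical.

```python
import sqlite3


def save_indicators(conn, rows):
    """Upsert (indicator, period_start, value) rows into a results table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS indicator_stats (
               indicator    TEXT    NOT NULL,
               period_start TEXT    NOT NULL,
               value        INTEGER NOT NULL,
               PRIMARY KEY (indicator, period_start)
           )"""
    )
    # INSERT OR REPLACE is SQLite syntax; rerunning a task overwrites its row.
    conn.executemany(
        "INSERT OR REPLACE INTO indicator_stats "
        "(indicator, period_start, value) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
```

Keying the table on indicator and period makes a rerun of the same scheduled task idempotent, which is convenient when statistics are recomputed for a window.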
The Hadoop-based mass data processing method of the present invention may further comprise: receiving a query condition entered by a user (for example, a keyword to be searched), accessing the relational database according to the query condition to obtain the results of the indicator statistics, and displaying the results to the user. For example, the statistics in the relational database may be queried through standard Structured Query Language (SQL) or an HQL (Hibernate Query Language) interface, and presented to the client in document form. In other embodiments, a Hive interface may also be provided for querying the data in the mass database HBase. Hive is a data warehouse tool for Hadoop that can map structured data files to database tables, provides full SQL query functionality, and can convert SQL statements into MapReduce jobs.
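Turning a user's query condition into a SQL query over the stored statistics can be sketched as below. Again SQLite stands in for the relational database, and the table `indicator_stats`, its columns, the demo rows and the filter parameters are all hypothetical. Bound parameters are used so that user input is never pasted into the SQL text.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table with a few hypothetical statistics rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE indicator_stats "
        "(indicator TEXT, period_start TEXT, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO indicator_stats VALUES (?, ?, ?)",
        [
            ("page_views", "2015-12-28", 100),
            ("page_views", "2015-12-29", 120),
            ("orders", "2015-12-29", 7),
        ],
    )
    return conn


def query_indicators(conn, indicator=None, date_from=None, date_to=None):
    """Return matching rows; every filter is optional."""
    sql = "SELECT indicator, period_start, value FROM indicator_stats WHERE 1=1"
    params = []
    if indicator is not None:
        sql += " AND indicator = ?"
        params.append(indicator)
    if date_from is not None:
        sql += " AND period_start >= ?"
        params.append(date_from)
    if date_to is not None:
        sql += " AND period_start <= ?"
        params.append(date_to)
    sql += " ORDER BY period_start, indicator"
    return conn.execute(sql, params).fetchall()
```

For example, `query_indicators(make_demo_db(), indicator="page_views")` returns the two page-view rows in date order, ready to be rendered for the user.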
In one specific embodiment, the architecture of the Hadoop-based mass data processing of the present invention may be as shown in Fig. 2. In Fig. 2, data is collected; the data may include script data (for example, from JavaScript scripts), web log data and business data. The collected data is integrated through Flume data collection. The integrated data may be stored in HBase/HDFS/Oracle. The stored data may be exchanged with a search engine through an SQL API (Application Programming Interface) and HQL, and the search engine may supply search results through a Hive interface to the data query interface for users to view. The stored data may also be aggregated by scheduled tasks to generate report records and form embedded report controls. The stored data may also be supplied externally in the form of snapshot files, forming whole-day snapshot files.
The flow of the Hadoop-based mass data processing method has been described above with reference to specific embodiments; a Hadoop-based mass data processing device that applies the method is described below with reference to specific embodiments.
Referring to Fig. 3, which is a structural diagram of a Hadoop-based mass data processing device according to one embodiment of the present invention, the device 200 may comprise:
an acquisition module 201 for collecting data;
an integration module 202 for integrating the data collected by the acquisition module 201;
a storage module 203 for storing the data integrated by the integration module 202 in an HBase database;
a processing module 204 for computing indicator statistics separately according to the update cycles of the data in the HBase database, wherein
the storage module 203 is further used to store the results of the indicator statistics from the processing module 204 in a relational database.
The Hadoop-based mass data processing device 200 of the present invention may comprise the acquisition module 201, the integration module 202, the storage module 203 and the processing module 204, and the modules may be deployed on different servers.
The acquisition module 201 may be used to collect data and may, for example, be arranged at the data source (for example, the database of an application server that provides various business processing). Taking an online mall as an example, asynchronous, discrete data such as the business data, access logs and transaction logs of the mall website may be collected to provide source data for subsequent processing.
In some embodiments, collecting data by the acquisition module 201 may include collecting log data (for example, access logs and transaction logs) asynchronously and in near real time by means of a JavaScript script embedded in the website's front-end pages and rsyslog, and collecting the business data of the application server of the mall's main site synchronously by means of the rsync backup tool. In other embodiments, the acquisition module 201 may collect only one of the log data and the business data, depending on the needs of the application.
The integration module 202 integrates the data collected by the acquisition module 201. Integration may include parsing the collected business data and performing associated queries and filtering on the key fields obtained by parsing the raw data. Data integration in the present invention is based on the Apache Flume NG framework. The Flume framework mainly comprises three modules. The Source (receiving) module is responsible for data access and listening, and a different listening Agent is used for each of the collection modes above. For example, a LogAgent listens for and integrates the data transmitted asynchronously by rsyslog, and a DBAgent integrates the business data of the relational database on the application server. The Channel (queue) module is responsible for data buffering: data received by the Source is sent uniformly to the Channel for processing by the downstream Sink module. In embodiments of the present invention, the collected data is buffered in a file-type queue, that is, the Channel is uniformly defined as a FileChannel. The Sink module is responsible for writing data.
The storage module 203 stores the data integrated by the integration module 202 in the HBase database. Data storage in the present invention may use the HBaseSink interface, so that the Sink module of the Flume NG framework saves the data to the HBase database of the storage layer. The data stored in HBase may include access logs, transaction logs, business data, and redundant data tables formed by associating the business data with business data tables; this data can serve as the basic data for report generation and data mining. Because data used for report generation, data analysis and data mining does not need strict transactional guarantees or read consistency, the present invention stores the collected mass data in a non-relational database such as HBase, which can significantly improve database management efficiency in mass data processing and facilitates the analysis and mining of mass data.
The processing module 204 computes indicator statistics separately according to the update cycles of the data in the HBase database. As described above, the data collected by the acquisition module 201 may be asynchronous data with different update cycles (for example, 10 minutes or 1 hour). Indicator statistics may be computed separately according to the update cycle of the collected data; that is, for data with a given update cycle, the statistics are computed on that cycle. The statistics computed by the processing module 204 may also include deduplication of the data, and both deduplication and statistics may be run as scheduled tasks. After the scheduled tasks have produced the indicator statistics for data with different update cycles, the results may be cached. For example, the results may be saved as periodic snapshot files (for example, daily snapshot files), and the saved snapshot files may be provided externally through BDE; for instance, the snapshot files may be transmitted externally through BDE.net. By means of snapshot files, the present invention unifies the external supply cycle of data sources that have different update cycles, which greatly facilitates the statistical processing of mass data of various types.
The storage module 203 may also be used to store the results of the indicator statistics from the processing module 204 in a relational database, for example an Oracle database. Storing the results in a relational database improves the efficiency of associated queries when the results are queried and displayed, reduces the query time for mass data statistics, and facilitates their visual presentation.
The Hadoop-based mass data processing device of the present invention may further comprise a display module 205, as shown in Fig. 4. The display module 205 may receive a query condition entered by a user (for example, a keyword to be searched), access the relational database according to the query condition to obtain the results of the indicator statistics, and display the results to the user. For example, the statistics in the relational database may be queried through standard Structured Query Language (SQL) or an HQL interface, and presented to the client in document form. In other embodiments, a Hive interface may also be provided for querying the data in the mass database HBase. Hive is a data warehouse tool for Hadoop that can map structured data files to database tables, provides full SQL query functionality, and can convert SQL statements into MapReduce jobs.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software combined with a hardware platform. On this understanding, all or part of the contribution that the technical solution of the present invention makes over the background art may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a smartphone, a network device or the like) to perform the methods described in the embodiments of the present invention or in parts thereof.
The terms and wording used in this specification are for illustration only and are not intended to be limiting. Those skilled in the art will appreciate that various changes may be made to the details of the above embodiments without departing from the basic principles of the disclosed embodiments. The scope of the present invention is therefore determined only by the claims, in which, unless otherwise indicated, all terms are to be understood in their broadest reasonable sense.

Claims (12)

1. A Hadoop-based mass data processing method, characterized in that the method comprises:
collecting data;
integrating the collected data;
storing the integrated data in an HBase database;
computing indicator statistics separately according to the update cycles of the data in the HBase database; and
storing the results of the indicator statistics in a relational database.
2. The method according to claim 1, characterized in that collecting data comprises: collecting log data asynchronously by means of a JavaScript script embedded in front-end pages and rsyslog, and/or collecting the business data of an application server synchronously by means of rsync.
3. The method according to claim 1, characterized in that the integration of the collected data is based on the Flume NG framework.
4. The method according to claim 3, characterized in that the collected data is buffered in a file-type queue in the Flume NG framework.
5. The method according to claim 1, characterized in that the method further comprises:
saving the results of the indicator statistics as periodic snapshot files, and providing the periodic snapshot files externally through BDE.
6. The method according to claim 1, characterized in that the method further comprises:
receiving a query condition entered by a user, accessing the relational database according to the query condition to obtain the results of the indicator statistics, and displaying the results of the indicator statistics to the user.
7. A Hadoop-based mass data processing device, characterized in that the device comprises:
an acquisition module for collecting data;
an integration module for integrating the data collected by the acquisition module;
a storage module for storing the data integrated by the integration module in an HBase database; and
a processing module for computing indicator statistics separately according to the update cycles of the data in the HBase database, wherein
the storage module is further used to store the results of the indicator statistics from the processing module in a relational database.
8. The device according to claim 7, characterized in that collecting data by the acquisition module comprises: collecting log data asynchronously by means of a JavaScript script embedded in front-end pages and rsyslog, and/or collecting the business data of an application server synchronously by means of rsync.
9. The device according to claim 7, characterized in that the integration module is based on the Flume NG framework.
10. The device according to claim 9, characterized in that the integration module buffers data in a file-type queue in the Flume NG framework.
11. The device according to claim 7, characterized in that the processing module is further used to save the results of the indicator statistics as periodic snapshot files and to provide the periodic snapshot files externally through BDE.
12. The device according to claim 7, characterized in that the device further comprises:
a display module for receiving a query condition entered by a user, accessing the relational database according to the query condition to obtain the results of the indicator statistics, and displaying the results of the indicator statistics to the user.
CN201511009913.0A 2015-12-29 2015-12-29 Method and device for mass data processing based on Hadoop Pending CN105512336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511009913.0A CN105512336A (en) 2015-12-29 2015-12-29 Method and device for mass data processing based on Hadoop

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511009913.0A CN105512336A (en) 2015-12-29 2015-12-29 Method and device for mass data processing based on Hadoop

Publications (1)

Publication Number Publication Date
CN105512336A true CN105512336A (en) 2016-04-20

Family

ID=55720316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511009913.0A Pending CN105512336A (en) 2015-12-29 2015-12-29 Method and device for mass data processing based on Hadoop

Country Status (1)

Country Link
CN (1) CN105512336A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102208991A (en) * 2010-03-29 2011-10-05 腾讯科技(深圳)有限公司 Blog processing method, device and system
CN103678665A (en) * 2013-12-24 2014-03-26 焦点科技股份有限公司 Heterogeneous large data integration method and system based on data warehouses
US20140156638A1 (en) * 2012-11-30 2014-06-05 Orbis Technologies, Inc. Ontology harmonization and mediation systems and methods
CN103902838A (en) * 2014-04-17 2014-07-02 北京泰乐德信息技术有限公司 TMIS traffic flow determination method and system based on cloud computing

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
曾刚: "《实战Hadoop大数据处理》", 31 August 2015, 清华大学出版社 *
李可风等: "大数据环境下移动数字图书馆信息推送策略研究", 《图书馆学研究》 *
罗顿等: "《云计算架构 解决方案设计手册》", 31 August 2012, 机械工业出版社 *
辛晃等: "基于Hadoop+MPP架构的电信运营商网络数据共享平台研究", 《电信科学》 *
马延辉等: "《Storm企业级应用实战、运维和调优》", 30 June 2015, 机械工业出版社 *
鲍亮等: "《实战大数据》", 31 March 2014, 清华大学出版社 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106021574A (en) * 2016-05-27 2016-10-12 安徽四创电子股份有限公司 Data storage replication method and system
CN106095391B (en) * 2016-05-31 2019-03-26 携程计算机技术(上海)有限公司 Calculation method and system based on big data platform and algorithm model
CN106250410A (en) * 2016-07-21 2016-12-21 广州安望信息科技有限公司 A kind of data processing method based on flume system and device thereof
CN106250410B (en) * 2016-07-21 2020-01-07 深圳软通动力信息技术有限公司 Data processing method and device based on flash system
CN107818120A (en) * 2016-09-14 2018-03-20 博雅网络游戏开发(深圳)有限公司 Data processing method and device based on big data
CN107818120B (en) * 2016-09-14 2020-05-29 博雅网络游戏开发(深圳)有限公司 Data processing method and device based on big data
CN107979477A (en) * 2016-10-21 2018-05-01 苏宁云商集团股份有限公司 A kind of method and system of business monitoring
CN106570153A (en) * 2016-10-28 2017-04-19 上海斐讯数据通信技术有限公司 Data extraction method and system for mass URLs
CN106709029A (en) * 2016-12-28 2017-05-24 上海斐讯数据通信技术有限公司 File hierarchical processing method and processing system based on Hadoop and MySQL
CN106933971A (en) * 2017-02-13 2017-07-07 北京优炫软件股份有限公司 A kind of data analysis statistical system based on science service
CN106909117A (en) * 2017-03-28 2017-06-30 重庆市通信建设有限公司 data real-time monitoring system and method
WO2018196650A1 (en) * 2017-04-26 2018-11-01 平安科技(深圳)有限公司 User feature data acquisition method and device, server, and medium
CN107506476A (en) * 2017-09-08 2017-12-22 上海炫萌网络科技有限公司 User behavior data collects and surveys system and analysis method
CN108038181A (en) * 2017-12-08 2018-05-15 山东浪潮商用***有限公司 A kind of data handling system and data processing method
CN108492862B (en) * 2018-02-01 2019-10-22 西安大数据与人工智能研究院 The imaging of medical image cloud and interpretation method and system based on Distributed C T terminating machine
CN108492862A (en) * 2018-02-01 2018-09-04 西安大数据与人工智能研究院 Medical image cloud imaging based on Distributed C T terminating machines and interpretation method and system
CN108399231A (en) * 2018-02-13 2018-08-14 中体彩科技发展有限公司 A kind of collecting method and Flume data collection clients
CN108536810A (en) * 2018-03-30 2018-09-14 四川斐讯信息技术有限公司 Data visualization methods of exhibiting and system
CN108920659A (en) * 2018-07-03 2018-11-30 广州唯品会信息科技有限公司 Data processing system and its data processing method, computer readable storage medium
CN108920659B (en) * 2018-07-03 2022-06-07 广州唯品会信息科技有限公司 Data processing system, data processing method thereof, and computer-readable storage medium
CN108959608A (en) * 2018-07-13 2018-12-07 中国建设银行股份有限公司 Historical transactional information querying method and device
CN109299089A (en) * 2018-08-27 2019-02-01 广东电网有限责任公司信息中心 The calculating and storage method and calculating of a kind of label data of drawing a portrait and storage system
CN109299089B (en) * 2018-08-27 2020-05-26 广东电网有限责任公司信息中心 Calculation and storage method and calculation and storage system for portrait label data
CN109271437A (en) * 2018-09-27 2019-01-25 智庭(北京)智能科技有限公司 A kind of Query method in real time of magnanimity rent information
CN111966710A (en) * 2020-08-04 2020-11-20 中国建设银行股份有限公司 Bank system fund flow direction scene analysis method and device
CN115495499A (en) * 2022-09-22 2022-12-20 生态环境部南京环境科学研究所 Integration statistical method based on mass data of same medium in multiple batches in polluted site
CN115495499B (en) * 2022-09-22 2023-05-30 生态环境部南京环境科学研究所 Integrated statistical method based on contaminated site same-medium multi-batch mass data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160420