CN102214236B - Method and system for processing mass data - Google Patents

Info

Publication number
CN102214236B
Authority
CN
China
Prior art keywords
data
platform
module
calling
scheduler module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN 201110182296
Other languages
Chinese (zh)
Other versions
CN102214236A (en)
Inventor
祝博立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Feinno Communication Technology Co Ltd
Original Assignee
Beijing Feinno Communication Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Feinno Communication Technology Co Ltd filed Critical Beijing Feinno Communication Technology Co Ltd
Priority to CN 201110182296 priority Critical patent/CN102214236B/en
Publication of CN102214236A publication Critical patent/CN102214236A/en
Application granted granted Critical
Publication of CN102214236B publication Critical patent/CN102214236B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for processing mass data. The method comprises the following steps that: a scheduling module judges whether to call a data warehouse operation statement (HQL) according to acquired current service information and a predetermined scheduling strategy, acquires a calling sequence according to the acquired current service information and the predetermined scheduling strategy if the HQL is called, and calls the HQL to a data warehouse platform according to the calling sequence; and the data warehouse platform reads configuration information which corresponds to a data warehouse from a relational database, triggers the HQL to perform operation on data stored in a distributed platform according to the calling sequence, generates result data and stores the result data into the distributed platform. The invention also discloses a system for processing the mass data. By the method and the system provided by the invention, the flexibility of processing of the mass data can be improved.

Description

Method and system for processing mass data
Technical field
The present invention relates to data processing technology, and in particular to a method and system for processing mass data.
Background
With the rapid development of Internet technology, the number of Internet users has grown sharply, and with it the demand for processing Internet user data: collection, cleaning, statistics, analysis, and so on. At the same time, the volume of Internet user data is itself growing explosively, which further increases the pressure on such processing.
At present, mass Internet user data is processed by combining distributed platform (Hadoop) technology with data warehouse platform (Hive) technology. The mass data is stored on the distributed platform, and console commands call data warehouse operation statements (HQL) to run statistics, analysis, and similar processing on the stored data. Because every call is issued as a console command, this method is inflexible.
Summary of the invention
The present invention provides a mass data processing method that improves the flexibility of mass data processing.
The present invention also provides a mass data processing system that improves the flexibility of mass data processing.
To achieve the above objects, the technical solution of the present invention is implemented as follows:
The invention discloses a mass data processing method, comprising:
A scheduling module judges, according to acquired current service information and a preset scheduling strategy, whether to call a data warehouse operation statement (HQL); if so, it acquires a calling sequence according to the acquired current service information and the preset scheduling strategy;
The scheduling module calls the data warehouse operation statement to a data warehouse platform according to the calling sequence;
The data warehouse platform reads configuration information corresponding to the data warehouse operation statement from a relational database;
The data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to perform operations on data stored in a distributed platform, generates result data, and stores the result data in the distributed platform.
After the result data is generated and stored in the distributed platform, the method further comprises:
The scheduling module controls the distributed platform to import the result data into the relational database;
The scheduling module controls a cache module to extract frequently used result data from the relational database according to a preset presentation strategy;
A data presentation platform reads the frequently used result data from the cache module and presents it.
After the data presentation platform reads the frequently used result data from the cache module and presents it, the method further comprises:
The data presentation platform reads the result data from the relational database and presents it.
Before the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call the data warehouse operation statement, the method further comprises:
A data access platform transmits data to the distributed platform at least once;
Each time a transmission completes, the data access platform sends a data transmission completion message to a message interface module;
The scheduling module acquires the at least one data transmission completion message from the message interface module as the current service information.
The data access platform sending the data transmission completion message to the message interface module comprises:
The data access platform sends the data transmission completion message to the message interface module using Google's message transmission scheme protoBuffer as the communication mode.
The invention also discloses a mass data processing system, comprising:
A scheduling module, configured to judge, according to acquired current service information and a preset scheduling strategy, whether to call a data warehouse operation statement; if so, to acquire a calling sequence according to the acquired current service information and the preset scheduling strategy, and to call the data warehouse operation statement to a data warehouse platform according to the calling sequence;
The data warehouse platform, configured to read configuration information corresponding to the data warehouse operation statement from a relational database, to trigger, according to the calling sequence, the data warehouse operation statement to perform operations on data stored in a distributed platform, and to generate result data and store it in the distributed platform;
The relational database, configured to store the configuration information corresponding to the data warehouse operation statement;
The distributed platform, configured to store the data and the result data.
The scheduling module is further configured to control the distributed platform to import the result data into the relational database, and to control a cache module to extract frequently used result data from the relational database according to a preset presentation strategy;
The system further comprises:
The cache module, configured to cache the frequently used result data;
A data presentation platform, configured to read the frequently used result data from the cache module and present it.
The data presentation platform is further configured to read the result data from the relational database and present it.
The system further comprises:
A data access platform, configured to transmit data to the distributed platform at least once and, each time a transmission completes, to send a data transmission completion message to a message interface module;
The message interface module, configured to receive the data transmission completion message;
The scheduling module is further configured to acquire the at least one data transmission completion message from the message interface module as the current service information.
The data access platform is specifically configured to send the data transmission completion message to the message interface module using Google's message transmission scheme protoBuffer as the communication mode.
As can be seen from the above, a scheduling module is added to the mass data processing system. This module determines, according to the current service information and the preset scheduling strategy, whether to call data warehouse operation statements and in what calling sequence, and data processing is completed under its control. This avoids issuing commands one by one from a console, as existing mass data processing systems do. Because control rests with the scheduling module, the scheduling strategy and calling sequence can be configured flexibly according to the logic of the service to be implemented, which improves the flexibility of mass data processing.
Brief description of the drawings
Fig. 1 is a flowchart of the mass data processing method of Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the mass data processing method of Embodiment 2 of the present invention;
Fig. 3 is a schematic structural diagram of the mass data processing system of Embodiment 3 of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described below with reference to the drawings and specific embodiments.
The basic idea of the present invention is to add a scheduling module to the mass data processing system. This module determines, according to the current service information and a preset scheduling strategy, whether to call data warehouse operation statements and in what calling sequence, and data processing is completed under the control of the scheduling module.
Fig. 1 is a flowchart of the mass data processing method of Embodiment 1 of the present invention. As shown in Fig. 1, the method comprises at least the following steps.
Step 101: the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call a data warehouse operation statement; if so, it acquires a calling sequence according to the acquired current service information and the preset scheduling strategy.
Step 102: the scheduling module calls the data warehouse operation statement to the data warehouse platform according to the calling sequence.
Step 103: the data warehouse platform reads the configuration information corresponding to the data warehouse operation statement from the relational database (mysql).
Step 104: the data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to perform operations on the data stored in the distributed platform, generates result data, and stores it in the distributed platform.
Fig. 2 is a flowchart of the mass data processing method of Embodiment 2 of the present invention. As shown in Fig. 2, the method comprises the following steps.
Step 201: the data access platform transmits data to the distributed platform at least once.
In this step, in a preferred implementation, the data access platform periodically transfers the data it has received into the distributed platform. The distributed platform supports receiving data from peripheral systems, organizing it, computing on it, and distributing the computation results to a reporting system, among other functions. Specifically, the distributed platform is a data storage platform from the open-source Apache foundation, made up of components such as the distributed file system (HDFS) and distributed file processing. Of these, HDFS and distributed file processing are the two most basic and most important components. HDFS is an open-source version of the Google File System (GFS). It is a highly fault-tolerant distributed file system that provides high-throughput data access and is suited to storing massive large files, for example PB-scale data in files larger than 64 MB. It splits a large file into N small files distributed across different machines, and the number of backup copies can be configured, so the system keeps working when some machines fail. Distributed file processing is a powerful tool for large-scale computation, for example on TB-scale data, and comprises a distributed data extraction (Map) module and a distributed data processing (Reduce) module. The distributed data extraction module breaks the data up; the distributed data processing module aggregates it. A user only needs to implement the two interfaces, distributed data extraction and distributed data processing, to carry out a TB-scale computation. Distributed file processing can be applied to data analysis such as log analysis and data mining, and also to scientific computation, such as calculating the constant pi.
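The division of labour between the Map and Reduce interfaces described above can be sketched in a few lines. This is a single-process illustration only, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`), the tab-separated log format, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_phase(records: Iterable[str],
              mapper: Callable[[str], Iterable[Tuple[str, int]]]) -> Iterator[Tuple[str, int]]:
    """Break the input up: each record is turned into (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the values of all pairs by key (done by the platform in a real cluster)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]],
                 reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Aggregate: each key's values are combined into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Assumed log format: "user_id<TAB>action"; emit one count per line.
    user_id, _action = line.split("\t")
    yield user_id, 1


def sum_reducer(_key: str, values: List[int]) -> int:
    return sum(values)


# Hypothetical log lines, standing in for files stored on the distributed platform.
log_lines = ["u1\tlogin", "u2\tlogin", "u1\tsend"]
counts = reduce_phase(shuffle(map_phase(log_lines, count_mapper)), sum_reducer)
```

Only `count_mapper` and `sum_reducer` are specific to the job; the rest is generic plumbing, which mirrors the statement that a user only implements the two interfaces.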
Step 202: each time a transmission completes, the data access platform sends a data transmission completion message to the message interface module.
In this step, each time the data access platform finishes transmitting data to the distributed platform, it sends a data transmission completion message to the message interface module; this message synchronizes the fact that the transmission has completed to the application system of the data platform. In a preferred implementation, the data access platform sends the data transmission completion message to the message interface module using a Google message transmission scheme (protoBuffer) as the communication mode.
Step 203: the scheduling module acquires the at least one data transmission completion message from the message interface module as the current service information.
In this step, for example, if the data access platform has transmitted data to the distributed platform 3 times, the scheduling module acquires 3 data transmission completion messages from the message interface module and uses these 3 messages as the current service information.
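The exchange in steps 202 and 203 can be sketched as follows. The patent names protoBuffer as the preferred wire format but gives no message schema, so this sketch substitutes JSON from the standard library to stay self-contained; the `TransferComplete` fields, the `MessageInterface` class, and the paths are hypothetical.

```python
from __future__ import annotations

import json
from collections import deque
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TransferComplete:
    """One data transmission completion message (all field names are assumptions)."""
    dataset: str    # which kind of data was transferred
    hdfs_path: str  # where it landed on the distributed platform
    batch_id: str   # identifies the transfer batch


class MessageInterface:
    """In-memory stand-in for the message interface module."""

    def __init__(self) -> None:
        self._queue: deque[str] = deque()

    def send(self, message: TransferComplete) -> None:
        # JSON replaces the protoBuffer encoding here purely for illustration.
        self._queue.append(json.dumps(asdict(message)))

    def fetch_all(self) -> list[TransferComplete]:
        """Drain every pending message, oldest first."""
        messages = []
        while self._queue:
            messages.append(TransferComplete(**json.loads(self._queue.popleft())))
        return messages


# The data access platform reports three completed transfers ...
interface = MessageInterface()
for name in ("login", "message", "profile"):
    interface.send(TransferComplete(name, f"/data/{name}/2011-06-30", "b1"))

# ... and the scheduling module takes all three as its current service information.
current_service_info = interface.fetch_all()
```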
Step 204: the scheduling module judges, according to the acquired current service information and the preset scheduling strategy, whether to call a data warehouse operation statement. If so, step 205 is performed; if not, the method returns to step 201.
In this step, the scheduling strategy is preset in the scheduling module. The scheduling strategy specifies the trigger condition for calling a data warehouse operation statement: if the current service information satisfies the condition defined by the scheduling strategy, the scheduling module decides to call the data warehouse operation statement; otherwise, it decides not to call it. For example, suppose the data received by the data access platform covers several aspects and is imported into the distributed platform in several batches, so that the scheduling module acquires several data transmission completion messages from the message interface module and judges from them whether to call the data warehouse operation statement. Under the scheduling strategy, the statement is not called while completion messages have been received for only part of the data. Only after the data for all of these aspects has been fully imported into the distributed platform, and all the data transmission completion messages have been received, does the scheduling module decide to start calling the data warehouse operation statement to perform the computation.
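The trigger condition in this example amounts to a subset check: call HQL only once every required kind of data has reported completion. A minimal sketch, assuming the strategy is expressed as a set of required dataset names (the class name, method name, and dataset names are hypothetical):

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class SchedulingStrategy:
    """A preset trigger condition: the datasets that must all have arrived."""
    required_datasets: FrozenSet[str]

    def should_call_hql(self, completed_datasets: Iterable[str]) -> bool:
        # True only when every required dataset has a completion message.
        return self.required_datasets <= set(completed_datasets)


strategy = SchedulingStrategy(frozenset({"login", "message", "profile"}))

partial = ["login", "message"]              # only part of the data has arrived
complete = ["login", "message", "profile"]  # everything has arrived

decision_partial = strategy.should_call_hql(partial)    # do not call yet
decision_complete = strategy.should_call_hql(complete)  # start the computation
```

Other strategies, such as time-based triggers, would fit the same interface; the patent leaves the concrete form of the strategy open.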
Step 205: the scheduling module acquires a calling sequence according to the acquired current service information and the preset scheduling strategy.
In this step, the computation consists of many steps. Some steps have no logical dependency on one another, while others must be performed in a particular order, so the data warehouse operation statements are called in a defined calling sequence to carry out the computation. The calling sequence is preset in the scheduling module. Several calling sequences may be preset, and the scheduling module selects the appropriate one according to the acquired current service information and the preset scheduling strategy.
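One way to picture the preset calling sequences is as named, ordered lists from which the scheduling module picks one and submits the statements in order. The sketch below assumes exactly that; the service names, the `.hql` file names, and the `submit` callback are hypothetical and stand in for the real call to the data warehouse platform.

```python
from typing import Callable, Dict, List

# Preset calling sequences, keyed by a hypothetical service name.
# Each entry names an HQL statement; later entries depend on earlier ones.
PRESET_SEQUENCES: Dict[str, List[str]] = {
    "daily_active_users": ["clean_login.hql", "count_active_users.hql"],
    "message_volume": ["clean_message.hql", "sum_message_volume.hql"],
}


def select_sequence(service: str) -> List[str]:
    """Pick the preset calling sequence that matches the current service."""
    return PRESET_SEQUENCES[service]


def call_in_sequence(sequence: List[str], submit: Callable[[str], None]) -> None:
    """Submit each statement to the data warehouse platform, strictly in order."""
    for statement in sequence:
        submit(statement)


# Record the submissions instead of contacting a real data warehouse platform.
submitted: List[str] = []
call_in_sequence(select_sequence("daily_active_users"), submitted.append)
```

Keeping the sequences as data rather than code is what lets them be reconfigured per service without touching the scheduling logic.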
Step 206: the scheduling module calls the data warehouse operation statement to the data warehouse platform according to the calling sequence.
Step 207: the data warehouse platform reads the configuration information corresponding to the data warehouse operation statement from the relational database.
In this step, the data warehouse platform is a Structured Query Language (SQL) parsing engine: it translates SQL statements into distributed data extraction/distributed data processing jobs, which are then executed on the distributed platform, for the purpose of rapid development. A table stored in the data warehouse platform is a directory on the distributed platform. Specifically, by default the data warehouse platform stores tables under the data warehouse directory of the current working directory, with one folder per table name; if a table is partitioned, each partition value is a subfolder. This data can be used directly in other distributed data extraction/distributed data processing jobs. The data warehouse platform can be associated with a relational database. The file or directory that a data warehouse operation statement needs to operate on is mapped to table name information stored in the relational database, and the fields in the file are likewise mapped to field information of the table to be operated on, also stored in the relational database. The table name information and field information obtained by this mapping serve as the configuration information of the data warehouse operation statement. When the data warehouse platform receives a command calling a data warehouse operation statement for computation, it parses the received command, reads the configuration information for the called statement from the relational database, and, according to this configuration information, translates the statement into a distributed data extraction/distributed data processing procedure that performs the statistical computation.
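The configuration lookup described here, from table name to directory and ordered field list, can be sketched with an in-memory SQLite database standing in for the MySQL relational database. The two mapping tables, their columns, and the sample rows are assumptions for illustration; they are not the actual schema that Hive keeps in its metastore.

```python
import sqlite3
from typing import Dict, List, Union

conn = sqlite3.connect(":memory:")  # stand-in for the relational database
conn.executescript(
    """
    CREATE TABLE table_mapping (table_name TEXT PRIMARY KEY, path TEXT);
    CREATE TABLE field_mapping (table_name TEXT, position INTEGER, field_name TEXT);
    """
)
# A directory on the distributed platform is mapped to a table name ...
conn.execute(
    "INSERT INTO table_mapping VALUES (?, ?)",
    ("login_log", "/warehouse/login_log"),
)
# ... and the columns of the files in it are mapped to field names.
conn.executemany(
    "INSERT INTO field_mapping VALUES (?, ?, ?)",
    [("login_log", 0, "user_id"), ("login_log", 1, "login_time")],
)


def read_configuration(table_name: str) -> Dict[str, Union[str, List[str]]]:
    """Return the directory and ordered field names configured for a table."""
    row = conn.execute(
        "SELECT path FROM table_mapping WHERE table_name = ?", (table_name,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no configuration for table {table_name!r}")
    fields = [
        name
        for (name,) in conn.execute(
            "SELECT field_name FROM field_mapping "
            "WHERE table_name = ? ORDER BY position",
            (table_name,),
        )
    ]
    return {"path": row[0], "fields": fields}


config = read_configuration("login_log")
```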
Step 208: the data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to perform operations on the data stored in the distributed platform, generates result data, and stores it in the distributed platform.
Step 209: the scheduling module controls the distributed platform to import the result data into the relational database.
In this step, specifically, the scheduling module uses an import algorithm to read from the distributed platform the result data generated by the data warehouse computation. This result data may be stored in the form of result files. The scheduling module then imports the result data into several data tables in the relational database according to service requirements.
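The patent does not specify the import algorithm, so the following is only a sketch of one plausible shape for it: parse a tab-separated result file and insert its rows into a target table. SQLite stands in for the relational database and a `StringIO` object for a result file on the distributed platform; the table, columns, and figures are made up for the example.

```python
import csv
import io
import sqlite3
from typing import Dict, TextIO

# One fixed INSERT statement per known result type (hypothetical names),
# so that no SQL is ever built from file contents.
INSERT_STATEMENTS: Dict[str, str] = {
    "daily_active_users": "INSERT INTO daily_active_users (day, users) VALUES (?, ?)",
}


def import_result(conn: sqlite3.Connection, result_name: str, result_file: TextIO) -> int:
    """Load a tab-separated result file into its table; return the row count."""
    reader = csv.reader(result_file, delimiter="\t")
    rows = [(day, int(users)) for day, users in reader]
    conn.executemany(INSERT_STATEMENTS[result_name], rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_active_users (day TEXT PRIMARY KEY, users INTEGER)")

# Illustrative result file contents, as the data warehouse computation might write them.
result_file = io.StringIO("2011-06-29\t120\n2011-06-30\t135\n")
imported = import_result(conn, "daily_active_users", result_file)
```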
Step 210: the scheduling module controls the cache module to extract frequently used result data from the relational database according to the preset presentation strategy.
In this step, the presentation strategy is preset in the scheduling module and specifies which data the presentation platform uses frequently. According to this strategy, the scheduling module extracts into the cache module those result data stored in the relational database that belong to the presentation platform's frequently used data. Specifically, the cache module may use in-memory caching (memcache) technology, a high-performance distributed memory object caching system that stores data of various formats, including images, video, files, and database query results, by maintaining one large unified hash table in memory. The cache module is distributed, that is, it allows multiple users on different hosts to access it simultaneously. This not only overcomes the drawback that shared memory can only be used on a single machine, but also reduces the load of database queries and speeds up data access.
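A presentation strategy of this kind can be sketched as a mapping from cache keys to the queries that produce frequently used results. In the sketch below a plain dict stands in for the memcache-based cache module and SQLite for the relational database; the key name, query, and table are assumptions for illustration.

```python
import sqlite3
from typing import Dict, List, Tuple


def refresh_cache(conn: sqlite3.Connection,
                  cache: Dict[str, List[Tuple]],
                  presentation_strategy: Dict[str, str]) -> None:
    """Copy each frequently used result from the database into the cache."""
    for cache_key, query in presentation_strategy.items():
        cache[cache_key] = conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for the relational database
conn.execute("CREATE TABLE daily_active_users (day TEXT, users INTEGER)")
conn.executemany(
    "INSERT INTO daily_active_users VALUES (?, ?)",
    [("2011-06-29", 120), ("2011-06-30", 135)],
)

# Hypothetical strategy: the latest daily figure is what the presentation
# platform shows most often, so only that result is cached.
presentation_strategy = {
    "dau:latest": "SELECT day, users FROM daily_active_users ORDER BY day DESC LIMIT 1",
}

cache: Dict[str, List[Tuple]] = {}  # a dict standing in for the cache module
refresh_cache(conn, cache, presentation_strategy)
```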
Step 211: the data presentation platform reads the frequently used result data from the cache module and presents it.
In this step, for the data it uses frequently, the data presentation platform obtains the result data by reading it from the cache module and then presents it. Data that the data presentation platform uses infrequently cannot be read from the cache module, so the following step 212 is performed.
Step 212: the data presentation platform reads the result data from the relational database and presents it.
In this step, for data it uses infrequently, for example data that requires dynamic transformation and querying, the data presentation platform obtains the result data by reading it from the relational database and then presents it.
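Steps 211 and 212 together describe a two-tier read path: try the cache first, and fall back to the relational database for anything that is not cached. A minimal sketch, with dicts standing in for both stores and hypothetical keys and values:

```python
from typing import Any, Callable, Dict, Tuple


def read_result(key: str,
                cache: Dict[str, Any],
                load_from_database: Callable[[str], Any]) -> Tuple[Any, str]:
    """Return (result, source), preferring the cache over the database."""
    if key in cache:
        return cache[key], "cache"
    # Infrequently used data is not cached, so it is read from the database.
    return load_from_database(key), "database"


# Dicts standing in for the cache module and the relational database.
cache = {"dau:latest": 135}
database = {"dau:latest": 135, "dau:2011-06-01": 98}

common = read_result("dau:latest", cache, database.__getitem__)
uncommon = read_result("dau:2011-06-01", cache, database.__getitem__)
```

The sketch deliberately does not write database results back into the cache: in the described system the cache contents are decided by the presentation strategy in step 210, not by read misses.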
Fig. 3 is a schematic structural diagram of the mass data processing system of Embodiment 3 of the present invention. As shown in Fig. 3, the mass data processing system comprises at least: a scheduling module 31, a data warehouse platform 32, a relational database 33, and a distributed platform 34. On this basis, it may further comprise: a data access platform 35, a message interface module 36, a cache module 37, and a data presentation platform 38. The message interface module 36 and the scheduling module 31 may both be located in the application system. For the processing and flow performed by each component, refer to the descriptions of Embodiment 1 and Embodiment 2 of the present invention.
The scheduling module 31 judges, according to the acquired current service information and the preset scheduling strategy, whether to call a data warehouse operation statement; if so, it acquires a calling sequence according to the acquired current service information and the preset scheduling strategy, and calls the data warehouse operation statement to the data warehouse platform 32 according to the calling sequence.
The data warehouse platform 32 reads the configuration information corresponding to the data warehouse operation statement from the relational database 33, triggers, according to the calling sequence, the data warehouse operation statement to perform operations on the data stored in the distributed platform 34, generates result data, and stores it in the distributed platform 34.
The relational database 33 stores the configuration information corresponding to the data warehouse operation statement.
The distributed platform 34 stores the above data and the above result data.
On the basis of the above technical solution, where the system includes the data access platform 35 and the message interface module 36, the data access platform 35 transmits data to the distributed platform 34 at least once and, each time a transmission completes, sends a data transmission completion message to the message interface module 36. The message interface module 36 receives the data transmission completion message. The scheduling module 31 acquires the at least one data transmission completion message from the message interface module 36 as the current service information. Specifically, the data access platform 35 may send the data transmission completion message to the message interface module 36 using a Google message transmission scheme, for example the protoBuffer communication mode. The data access platform 35 handles data access for peripheral systems and supports access through real-time interfaces. It generates text files, for example files in txt format, from the data it receives according to a prescribed format, and periodically transfers these text files into the HDFS file system of the distributed platform 34.
On the basis of the above technical solution, where the system includes the cache module 37, the scheduling module 31 also controls the distributed platform 34 to import the result data into the relational database 33, and controls the cache module 37 to extract frequently used result data from the relational database 33 according to the preset presentation strategy. The cache module 37 caches the frequently used result data.
The data presentation platform 38 is the interface that presents the result data finally organized by this data processing system. The data presentation platform 38 has two data sources: first, the cache module 37; second, the relational database 33. Specifically, the data presentation platform 38 reads the frequently used result data from the cache module 37 and presents it, and it also reads result data from the relational database 33 and presents it.
As can be seen from the above embodiments, a scheduling module is added to the mass data processing system. This module determines, according to the current service information and the preset scheduling strategy, whether to call data warehouse operation statements and in what calling sequence, and data processing is completed under its control. This avoids issuing commands one by one from a console, as existing mass data processing systems do. Because control rests with the scheduling module, the scheduling strategy and calling sequence can be configured flexibly according to the logic of the service to be implemented, which improves the flexibility of mass data processing. In addition, the cache module stores the frequently used result data, and the data presentation platform reads result data from the cache module first and presents it; only when the required result data is not stored in the cache module does the data presentation platform read from the database. Adding the cache module thus reduces the pressure that heavy access places on the data presentation platform.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A mass data processing method, characterized in that a scheduling module is added to a mass data processing system, the method comprising:
A data access platform transmits data to a distributed platform at least once; each time a transmission completes, the data access platform sends a data transmission completion message to a message interface module; the scheduling module acquires the at least one data transmission completion message from the message interface module as current service information;
The scheduling module judges, according to the acquired current service information and a preset scheduling strategy, whether to call a data warehouse operation statement; if so, it acquires a calling sequence according to the acquired current service information and the preset scheduling strategy; the scheduling strategy is preset in the scheduling module; the acquired calling sequence is preset in the scheduling module, and a plurality of calling sequences are preset in the scheduling module;
The scheduling module calls the data warehouse operation statement to a data warehouse platform according to the calling sequence;
The data warehouse platform reads configuration information corresponding to the data warehouse operation statement from a relational database;
The data warehouse platform triggers, according to the calling sequence, the data warehouse operation statement to perform operations on the data stored in the distributed platform, generates result data, and stores the result data in the distributed platform.
2. The mass data processing method according to claim 1, characterized in that, after the result data is generated and stored in the distributed platform, the method further comprises:
The scheduling module controls the distributed platform to import the result data into the relational database;
The scheduling module controls a cache module to extract frequently used result data from the relational database according to a preset presentation strategy;
A data presentation platform reads the frequently used result data from the cache module and presents it.
3. The mass data processing method according to claim 2, characterized in that, after the data presentation platform reads the frequently used result data from the cache module and presents it, the method further comprises:
The data presentation platform reads the result data from the relational database and presents it.
4. The mass data processing method according to any one of claims 1 to 3, characterized in that the data access platform sending the data transmission completion message to the message interface module comprises:
The data access platform sends the data transmission completion message to the message interface module using Google's message transmission scheme protoBuffer as the communication mode.
5. A mass data processing system, characterized in that a scheduler module is added to the mass data processing system, and the mass data processing system comprises:
A data access platform, used to transmit data to a distributed platform at least once and, each time a transmission is completed, to send a data transmission completion message to a message interface module;
The message interface module, used to receive the data transmission completion message;
The scheduler module, used to obtain the data transmission completion message from the message interface module at least once as current business information; to judge, according to the obtained current business information and a preset scheduling strategy, whether to call a data warehouse action statement; and, when the judgment is yes, to obtain a call order according to the obtained current business information and the preset scheduling strategy and to call the data warehouse action statement on a Data Warehouse Platform according to the call order; wherein the scheduling strategy is set in advance in the scheduler module, the obtained call order is set in advance in the scheduler module, and a plurality of call orders are preset in the scheduler module;
The Data Warehouse Platform, used to read configuration information corresponding to the data warehouse action statement from a relational database, to trigger the data warehouse action statement according to the call order so that it performs computation on the data stored on the distributed platform, and to generate result data and store it on the distributed platform;
The relational database, used to store the configuration information corresponding to the data warehouse action statement;
The distributed platform, used to store the data and the result data.
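The scheduling flow recited in claim 5 can be illustrated with a minimal Python sketch. It is not the patented implementation. The class names, the dataset names, the statement names, and the idea that the preset strategy returns the name of a preset call order are all assumptions made for illustration; the claim only states that the strategy and several call orders are preset in the scheduler module.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class TransferCompleteMessage:
    """Hypothetical completion message sent by the data access platform."""
    dataset: str


class RecordingWarehouse:
    """Stand-in for the data warehouse platform; it only records calls."""

    def __init__(self) -> None:
        self.executed: List[str] = []

    def run(self, statement_name: str) -> None:
        # A real platform would trigger the named statement here.
        self.executed.append(statement_name)


@dataclass
class Scheduler:
    # Preset strategy: returns the name of a call order, or None to keep waiting.
    strategy: Callable[[List[TransferCompleteMessage]], Optional[str]]
    # Several preset call orders, each an ordered list of statement names.
    call_orders: Dict[str, List[str]]
    received: List[TransferCompleteMessage] = field(default_factory=list)

    def on_message(self, msg: TransferCompleteMessage,
                   warehouse: RecordingWarehouse) -> Optional[List[str]]:
        """Treat the message as current business information and act on it."""
        self.received.append(msg)
        order_name = self.strategy(self.received)
        if order_name is None:
            return None  # judgment is "no": do not call any statement yet
        order = self.call_orders[order_name]
        for statement_name in order:
            warehouse.run(statement_name)  # call statements in the preset order
        self.received.clear()
        return order


def daily_strategy(received: List[TransferCompleteMessage]) -> Optional[str]:
    """Assumed example policy: wait until both daily logs have arrived."""
    names = {m.dataset for m in received}
    return "daily_report" if {"login_log", "message_log"} <= names else None
```

With this sketch, the first completion message leaves the warehouse untouched, and the second one triggers the three statements of the hypothetical `daily_report` call order in sequence.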
6. The mass data processing system according to claim 5, characterized in that
The scheduler module is also used to control the distributed platform to import the result data into the relational database, and to control a cache module to extract commonly used result data from the relational database according to a preset presentation strategy;
The system further comprises:
The cache module, used to cache the commonly used result data;
A data exhibiting platform, used to read the commonly used result data from the cache module and present it.
7. The mass data processing system according to claim 6, characterized in that
The data exhibiting platform is also used to read the result data from the relational database and present it.
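The read path recited in claims 6 and 7 (commonly used result data served from the cache module, other result data read from the relational database) might look like the following Python sketch. The in-memory SQLite database, the `results` table, and the metric names are assumptions chosen only to keep the example self-contained; the claims do not name a database product or a schema.

```python
import sqlite3
from typing import Dict, Iterable, List, Tuple


def build_result_store(rows: List[Tuple[str, int]]) -> sqlite3.Connection:
    """Create a stand-in relational database holding imported result data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE results (name TEXT PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    conn.commit()
    return conn


def warm_cache(conn: sqlite3.Connection,
               common_names: Iterable[str]) -> Dict[str, int]:
    """Extract the commonly used results named by a preset presentation strategy."""
    cache: Dict[str, int] = {}
    for name in common_names:
        row = conn.execute(
            "SELECT value FROM results WHERE name = ?", (name,)
        ).fetchone()
        if row is not None:
            cache[name] = row[0]
    return cache


def read_result(name: str, cache: Dict[str, int],
                conn: sqlite3.Connection) -> Tuple[int, str]:
    """Return (value, source): the cache first, then the relational database."""
    if name in cache:
        return cache[name], "cache"
    row = conn.execute(
        "SELECT value FROM results WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0], "database"
```

Here the presentation strategy is reduced to a list of result names to preload; a real strategy could instead depend on access frequency or report type.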
8. The mass data processing system according to any one of claims 5 to 7, characterized in that
The data access platform is specifically used to send the data transmission completion message to the message interface module using Google's Protocol Buffers (protoBuffer) message transmission scheme.
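Claims 4 and 8 state that the completion message is sent with Protocol Buffers but do not give its layout. The Python sketch below hand-encodes a hypothetical two-field message (field 1 a batch identifier, field 2 a dataset name) in the Protocol Buffers wire format, using only the standard library. The field numbers and meanings are assumptions; a production system would normally use classes generated from a `.proto` file rather than manual encoding.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low group first."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow: set continuation bit
        else:
            out.append(low7)
            return bytes(out)


def encode_uint_field(field_number: int, value: int) -> bytes:
    """Encode a varint field: tag is (field_number << 3) | wire type 0."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode a length-delimited field: tag uses wire type 2, then length, then bytes."""
    payload = text.encode("utf-8")
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


def encode_transfer_complete(batch_id: int, dataset: str) -> bytes:
    """Hypothetical completion message: field 1 = batch id, field 2 = dataset name."""
    return encode_uint_field(1, batch_id) + encode_string_field(2, dataset)
```

The compact binary form is the main reason such a scheme suits frequent, small notifications between the data access platform and the message interface module.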
CN 201110182296 2011-06-30 2011-06-30 Method and system for processing mass data Active CN102214236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110182296 CN102214236B (en) 2011-06-30 2011-06-30 Method and system for processing mass data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110182296 CN102214236B (en) 2011-06-30 2011-06-30 Method and system for processing mass data

Publications (2)

Publication Number Publication Date
CN102214236A CN102214236A (en) 2011-10-12
CN102214236B true CN102214236B (en) 2013-10-23

Family

ID=44745544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110182296 Active CN102214236B (en) 2011-06-30 2011-06-30 Method and system for processing mass data

Country Status (1)

Country Link
CN (1) CN102214236B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750368B (en) * 2012-06-18 2014-03-26 天津神舟通用数据技术有限公司 High-speed importing method of cluster data in data base
CN102880503B (en) * 2012-08-24 2015-04-15 新浪网技术(中国)有限公司 Data analysis system and data analysis method
CN102929961B (en) * 2012-10-10 2016-12-21 北京锐安科技有限公司 Based on the data processing method and the device thereof that build rapid data classification passage
CN102904952B (en) * 2012-10-12 2015-07-01 北京锐安科技有限公司 Self-adapting system and method for efficiently processing input of mass data to database
CN104298671B (en) * 2013-07-16 2018-02-13 深圳中兴网信科技有限公司 data statistical analysis method and device
CN104090901B (en) * 2013-12-31 2017-06-13 腾讯数码(天津)有限公司 A kind of method that data are processed, device and server
CN104102701B (en) * 2014-07-07 2017-10-13 浪潮(北京)电子信息产业有限公司 A kind of historical data based on hive is achieved and querying method
CN106446168B (en) * 2016-09-26 2019-11-01 北京赛思信安技术股份有限公司 A kind of load client realization method of Based on Distributed data warehouse
CN106909641B (en) * 2017-02-16 2020-09-29 青岛高校信息产业股份有限公司 Real-time data memory
CN108153852A (en) * 2017-12-22 2018-06-12 中国平安人寿保险股份有限公司 A kind of data processing method, device, terminal device and storage medium
CN109408598A (en) * 2018-09-14 2019-03-01 深圳市新代信息技术研究院有限公司 A kind of mass data processing system for multimedia research and development training platform
CN111078770B (en) * 2019-11-28 2023-07-21 曙光信息产业股份有限公司 Data processing system, method and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1604042A (en) * 2003-09-30 2005-04-06 国际商业机器公司 Method for dispatching task, dispatcher and net computer system
CN101038559A (en) * 2006-09-11 2007-09-19 中国工商银行股份有限公司 Batch task scheduling engine and dispatching method
CN101127578A (en) * 2007-09-14 2008-02-20 广东威创日新电子有限公司 A method and system for processing a magnitude of data
CN101937524A (en) * 2009-06-30 2011-01-05 华中师范大学 Graduation design personalized guide system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101276364B (en) * 2007-03-30 2010-12-22 阿里巴巴集团控股有限公司 Method, system and apparatus for combining distributed computational data
CN101364891B (en) * 2007-08-10 2011-10-26 中兴通讯股份有限公司 System for collecting performance data by single point in distributed telecommunication network management and implementing method
US8145806B2 (en) * 2008-09-19 2012-03-27 Oracle International Corporation Storage-side storage request management
CN102033912A (en) * 2010-11-25 2011-04-27 北京北纬点易信息技术有限公司 Distributed-type database access method and system

Also Published As

Publication number Publication date
CN102214236A (en) 2011-10-12

Similar Documents

Publication Publication Date Title
CN102214236B (en) Method and system for processing mass data
US11372888B2 (en) Adaptive distribution for hash operations
CN107291948B (en) Access method of distributed newSQL database
US10698913B2 (en) System and methods for distributed database query engines
CN106897322B (en) A kind of access method and device of database and file system
US10545917B2 (en) Multi-range and runtime pruning
US20210117413A1 (en) Global Dictionary for Database Management Systems
US20120130963A1 (en) User defined function database processing
US20130282650A1 (en) OLAP Query Processing Method Oriented to Database and HADOOP Hybrid Platform
CN103646073A (en) Condition query optimizing method based on HBase table
CN109815234A (en) A kind of multiple cuckoo filter under streaming computing model
CN106776783A (en) Unstructured data memory management method, server and system
CN107784103A (en) A kind of standard interface of access HDFS distributed memory systems
CN105159845A (en) Memory reading method
CN104199978A (en) System and method for realizing metadata cache and analysis based on NoSQL and method
CN102253990A (en) Interactive application multimedia data query method and device
US20170068703A1 (en) Local database cache
CN116048817B (en) Data processing control method, device, computer equipment and storage medium
CN100395752C (en) Report data collection system and method
CN116049193A (en) Data storage method and device
CN112486996B (en) Object-oriented memory data storage system
CN105426489A (en) Memory calculation based distributed expandable data search system
CN114546274B (en) Big data processing dimension table calculation system and method based on cache
US11514070B2 (en) Seamless integration between object-based environments and database environments
US10282449B1 (en) Multiple aggregates in a single user-defined function

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 810, 8 / F, 34 Haidian Street, Haidian District, Beijing 100080

Patentee after: BEIJING D-MEDIA COMMUNICATION TECHNOLOGY Co.,Ltd.

Address before: 100089 Beijing city Haidian District wanquanzhuang Road No. 28 Wanliu new building A block 5 layer

Patentee before: BEIJING D-MEDIA COMMUNICATION TECHNOLOGY Co.,Ltd.
