CN105554069A - Big data processing distributed cache system and method thereof - Google Patents

Big data processing distributed cache system and method thereof

Info

Publication number
CN105554069A
Authority
CN
China
Prior art keywords
cache unit
big data
value
cloud computing
cache
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510891553.5A
Other languages
Chinese (zh)
Other versions
CN105554069B (en)
Inventor
马艳
陈玉峰
朱文兵
杜修明
郑建
袁海燕
任敬国
邹立达
苏东亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shandong Zhongshi Yitong Group Co Ltd
Original Assignee
State Grid Corp of China SGCC
Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd
Shandong Zhongshi Yitong Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Electric Power Research Institute of State Grid Shandong Electric Power Co Ltd, Shandong Zhongshi Yitong Group Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN201510891553.5A priority Critical patent/CN105554069B/en
Publication of CN105554069A publication Critical patent/CN105554069A/en
Application granted granted Critical
Publication of CN105554069B publication Critical patent/CN105554069B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/10: Protocols in which an application is distributed across nodes in the network
    • H04L 67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases
    • G06F 16/285: Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a big data processing distributed cache system and a method thereof. The method comprises the steps of: dividing a big data processing server into a plurality of cache units, each cache unit storing data in the form of key-value pairs; calculating the value of each cache unit according to how frequently it is accessed, sorting the cache units by value, and extracting all cache units within a preset value threshold range; and clustering the extracted cache units into a preset number of clusters and assigning each cluster of cache units to a cloud computing cache node for storage. With this method, network data transmission between nodes is reduced when data is accessed or processed, which shortens processing time and effectively improves the efficiency of big data processing.

Description

Big data processing distributed cache system and method thereof
Technical field
The invention belongs to the field of big data applications, and in particular relates to a big data processing distributed cache system and a method thereof.
Background art
The development of Internet technology has caused data volumes to grow sharply. With the rapid development of data technology, the amount of data that can be stored and processed has reached an unprecedented magnitude and keeps growing at a rate exceeding Moore's Law. The core value of big data lies precisely in storing and analyzing massive data. In commercial environments, data processing service providers package big data processing as a service and sell it to users.
For real-time data analysis, users place requirements on both processing performance and response time. The performance of big data processing therefore needs to be optimized to improve data processing efficiency, and caching is an important means of increasing big data processing speed.
Storing data in a high-speed cache significantly improves data I/O efficiency and thus speeds up data processing. However, compared with external storage devices such as disks, cache memory is expensive, and big data is by nature massive, so storing all the data in the cache is uneconomical and infeasible. Users access only a portion of the data frequently and in real time, so it is the frequently accessed, important data that should be placed in the cache.
Compared with traditional data caching, big data caching has its own characteristics:
Data is organized and stored in key-value (Key-Value) structures. The granularity, format, and replacement algorithm of the cache still need further study to suit the storage structure of big data.
Big data processing depends on cloud computing platforms. The data touched by big data processing is often related, and placing related data close together reduces the cost of data transfer. For example, suppose a processing task needs two pieces of data, A and B. If A and B are stored on two different nodes, one of them must be transferred to the other node before processing can proceed; if A and B are stored together on one node, network transmission is avoided and processing efficiency improves. Once the data to be cached has been determined, a method is needed to place that data on suitable nodes.
Summary of the invention
To overcome the shortcomings of the prior art, the invention provides a big data processing distributed caching method. The method clusters cache units and stores each cluster of cache units in a corresponding cloud computing cache node, thereby accelerating big data processing.
To achieve the above object, the invention adopts the following technical solution:
A big data processing distributed cache system, comprising: a big data memory and a distributed cloud computing server that communicate with each other;
the big data memory is divided into several cache units, and each cache unit stores data in the form of key-value pairs;
the distributed cloud computing server is provided with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;
the big data extraction module calculates the value of each cache unit according to its access frequency, sorts the cache units by value, and extracts all cache units within a preset value threshold range;
the cloud computing cache node distribution module clusters all the extracted cache units within the preset value threshold range into a preset number of clusters, and assigns each cluster of cache units to a cloud computing cache node for storage.
The big data memory comprises a RAM memory and a FLASH memory.
The data in the cache units of the big data memory is updated according to a predetermined period.
A caching method of the big data processing distributed cache system, comprising:
dividing the big data processing server into several cache units, each cache unit storing data in the form of key-value pairs;
calculating the value of each cache unit according to its access frequency, sorting the cache units by value, and extracting all cache units within a preset value threshold range; and
clustering all the extracted cache units within the preset value threshold range into a preset number of clusters, and assigning each cluster of cache units to a cloud computing cache node for storage.
Before the value of the cache units is calculated, the data in the cache units is updated according to the predetermined period.
The value of a cache unit is calculated as:
$p_i^j = \alpha \cdot p_i^{j-1} + (1-\alpha) \cdot n_i^j \cdot \beta$
where $p_i^j$ denotes the value of the $i$-th cache unit in the $j$-th period; $p_i^{j-1}$ denotes the value of the $i$-th cache unit in the $(j-1)$-th period; $\alpha$ is the period influence factor, a constant; $\beta$ is the data value factor of the $i$-th cache unit, a constant; $n_i^j$ is the number of times the $i$-th cache unit is accessed within the $j$-th period; $i$ and $j$ are positive integers greater than or equal to 1, and $n_i^j$ is an integer greater than or equal to 0.
In the cloud computing cache nodes, the Memcache mechanism is adopted to cache the big data.
The k-means algorithm is used to cluster all the extracted cache units within the preset value threshold range.
The big data memory comprises a RAM memory and a FLASH memory.
The beneficial effects of the invention are:
(1) The distributed cloud computing server of the invention is provided with several cloud computing cache nodes, and each cloud computing cache node stores a predetermined number of cache unit clusters, so that when data is accessed or processed, network data transmission between nodes is reduced, processing time is shortened, and the efficiency of big data processing is effectively improved;
(2) The cloud computing cache nodes of the distributed cloud computing server can adopt multiple storage mechanisms for big data, including the Memcache mechanism, and the multiple cloud computing cache nodes in the big data processing distributed cache system of the invention ensure that big data is cached and processed in a distributed manner.
Brief description of the drawings
Fig. 1 is a flow chart of the big data processing distributed caching method of the invention.
Detailed description of embodiments
The invention is further described below with reference to the accompanying drawing and embodiments:
The big data processing distributed cache system of the invention comprises a big data memory and a distributed cloud computing server, which communicate with each other.
The big data memory and the distributed cloud computing server are described in detail in turn below:
(1) Big data memory:
The big data memory is divided into several cache units, and each cache unit stores data in the form of key-value pairs. The big data memory comprises a RAM memory and a FLASH memory.
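Purely as an editorial illustration (not part of the patent text), a cache unit can be pictured as a small key-value store plus the bookkeeping needed later for the value calculation; the class and field names in this Python sketch are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class CacheUnit:
    """One cache unit of the big data memory: a key-value store plus
    the per-period access counter used by the value calculation."""
    unit_id: int
    data: dict = field(default_factory=dict)  # key-value pairs
    accesses: int = 0                         # n_i^j: accesses in the current period
    value: float = 0.0                        # p_i^j: value from the previous period

    def get(self, key):
        self.accesses += 1
        return self.data.get(key)

    def put(self, key, val):
        self.data[key] = val
```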
(2) Distributed cloud computing server:
The distributed cloud computing server is provided with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module.
The big data extraction module calculates the value of each cache unit according to its access frequency, sorts the cache units by value, and extracts all cache units within a preset value threshold range.
The cloud computing cache node distribution module clusters all the extracted cache units within the preset value threshold range into a preset number of clusters, and assigns each cluster of cache units to a cloud computing cache node for storage.
The data in the cache units of the big data memory is updated according to a predetermined period.
Fig. 1 shows the caching method of the big data processing distributed cache system of the invention; the method is described in detail below with reference to Fig. 1.
Specifically, the caching method comprises:
Step 1: the big data processing server is divided into several cache units, and each cache unit stores data in the form of key-value pairs;
Step 2: the value of each cache unit is calculated according to its access frequency, the cache units are sorted by value, and all cache units within a preset value threshold range are extracted;
Step 3: all the extracted cache units within the preset value threshold range are clustered, and each of the preset number of clusters of cache units is assigned to a cloud computing cache node for storage.
Before the value of the cache units is calculated, the data in the cache units is updated according to the predetermined period.
In step 2, the value of a cache unit is calculated as:
$p_i^j = \alpha \cdot p_i^{j-1} + (1-\alpha) \cdot n_i^j \cdot \beta$
where $p_i^j$ denotes the value of the $i$-th cache unit in the $j$-th period; $p_i^{j-1}$ denotes the value of the $i$-th cache unit in the $(j-1)$-th period; $\alpha$ is the period influence factor, a constant; $\beta$ is the data value factor of the $i$-th cache unit, a constant; $n_i^j$ is the number of times the $i$-th cache unit is accessed within the $j$-th period; $i$ and $j$ are positive integers greater than or equal to 1, and $n_i^j$ is an integer greater than or equal to 0.
The more urgent the required return time of accesses to the i-th cache unit, the higher its β value. Data accesses can be classified by the urgency of the required return time into three classes: real-time, general, and relaxed. The three classes correspond to different β values, with more urgent accesses receiving higher β values. The access frequency and access urgency of any cache unit within a period can be counted from the data access records.
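As a minimal sketch of this value update, with the β constants for the three urgency classes and the α value chosen purely for illustration (the patent does not specify concrete values):

```python
# Illustrative beta values for the three urgency classes; more urgent
# accesses carry a higher data value factor (assumed, not prescribed).
BETA = {"real-time": 1.0, "general": 0.5, "relaxed": 0.1}

def update_value(p_prev: float, n_accesses: int, alpha: float, beta: float) -> float:
    """p_i^j = alpha * p_i^{j-1} + (1 - alpha) * n_i^j * beta"""
    return alpha * p_prev + (1 - alpha) * n_accesses * beta

# Example: a unit valued 2.0 in the last period, accessed 30 times this
# period at "general" urgency, with period influence factor alpha = 0.6:
p = update_value(2.0, 30, alpha=0.6, beta=BETA["general"])
# 0.6 * 2.0 + 0.4 * 30 * 0.5 = 7.2
```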
In the cloud computing cache nodes, the Memcache mechanism is adopted to cache the big data.
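For illustration only, storing and reading one key-value pair on such a node could look as follows, here using the pymemcache client; the client library, node address, and key naming are assumptions, not anything the patent prescribes.

```python
from pymemcache.client.base import Client

# Hypothetical cloud computing cache node running memcached on port 11211.
node = Client(("cache-node-1", 11211))
node.set("unit42:sensor_reading", b"38.5")   # store one key-value pair
print(node.get("unit42:sensor_reading"))     # -> b'38.5'
```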
The k-means algorithm is used to cluster all the extracted cache units within the preset value threshold range, and each of the preset number of clusters of cache units is assigned to a cloud computing cache node for storage.
If a cluster is larger than the capacity of a node, the cluster is divided again using the k-means algorithm and stored on as few nodes as possible.
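A sketch of this clustering and capacity-splitting step, assuming each cache unit is described by a numeric feature vector (for example, its row of the co-access edge-weight matrix described below) and that node capacity is counted in cache units; scikit-learn's KMeans stands in for the k-means step.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_units(features: np.ndarray, k: int, node_capacity: int) -> list:
    """Cluster cache units (rows of `features`) into k groups with k-means;
    any cluster larger than a node's capacity is divided again with k-means
    into ceil(size / capacity) sub-clusters, i.e. as few nodes as possible."""
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(features)
    clusters = []
    for c in range(k):
        members = np.where(labels == c)[0]        # unit indices in cluster c
        if len(members) <= node_capacity:
            clusters.append(members.tolist())
            continue
        sub_k = -(-len(members) // node_capacity)  # ceil division
        sub = KMeans(n_clusters=sub_k, n_init=10).fit_predict(features[members])
        for s in range(sub_k):
            clusters.append(members[sub == s].tolist())
    return clusters
```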
Before all the extracted cache units within the preset value threshold range are clustered, a connected graph is built over them:
Each cache unit within the preset value threshold range is treated as a vertex. If two cache units are accessed by the same data processing task, an edge of weight 1 is added between the two vertices, and edge weights accumulate over repeated co-accesses. All cache units within the preset value threshold range thus form a connected graph.
The built graph is checked for emptiness. If it is not empty, all the extracted cache units within the preset value threshold range are clustered; otherwise no clustering is performed, since the number of extracted cache units is then one, and this single cache unit is stored directly in a cloud computing cache node.
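A sketch of the co-access graph construction, assuming the access records are available as a list of processing tasks, each listing the cache units it touched together (the record format is an assumption):

```python
from collections import defaultdict
from itertools import combinations

def build_coaccess_graph(task_logs) -> dict:
    """Each cache unit is a vertex; every pair of units accessed by the
    same data processing task gains an edge of weight 1, and weights of
    repeated co-accesses accumulate."""
    weights = defaultdict(int)  # (u, v) with u < v -> accumulated edge weight
    for units in task_logs:
        for u, v in combinations(sorted(set(units)), 2):
            weights[(u, v)] += 1
    return dict(weights)

# Example: units 1 and 2 are co-accessed by two tasks, so their edge has weight 2.
graph = build_coaccess_graph([[1, 2], [1, 2, 3], [3]])
# {(1, 2): 2, (1, 3): 1, (2, 3): 1}; an empty graph means only one cache
# unit was extracted, and it is stored directly without clustering.
```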
Although the specific embodiments of the invention have been described above with reference to the accompanying drawing, they do not limit the protection scope of the invention. Those of ordinary skill in the art should understand that various modifications or variations made on the basis of the technical solution of the invention without creative work still fall within the protection scope of the invention.

Claims (9)

1. A big data processing distributed cache system, characterized by comprising: a big data memory and a distributed cloud computing server that communicate with each other;
the big data memory is divided into several cache units, and each cache unit stores data in the form of key-value pairs;
the distributed cloud computing server is provided with several cloud computing cache nodes, a big data extraction module, and a cloud computing cache node distribution module;
the big data extraction module calculates the value of each cache unit according to its access frequency, sorts the cache units by value, and extracts all cache units within a preset value threshold range;
the cloud computing cache node distribution module clusters all the extracted cache units within the preset value threshold range into a preset number of clusters, and assigns each cluster of cache units to a cloud computing cache node for storage.
2. The big data processing distributed cache system as claimed in claim 1, characterized in that the big data memory comprises a RAM memory and a FLASH memory.
3. The big data processing distributed cache system as claimed in claim 1, characterized in that the data in the cache units of the big data memory is updated according to a predetermined period.
4. A caching method of the big data processing distributed cache system as claimed in claim 1, characterized by comprising:
dividing the big data processing server into several cache units, each cache unit storing data in the form of key-value pairs;
calculating the value of each cache unit according to its access frequency, sorting the cache units by value, and extracting all cache units within a preset value threshold range; and
clustering all the extracted cache units within the preset value threshold range into a preset number of clusters, and assigning each cluster of cache units to a cloud computing cache node for storage.
5. The caching method as claimed in claim 4, characterized in that before the value of the cache units is calculated, the data in the cache units is updated according to the predetermined period.
6. The caching method as claimed in claim 4, characterized in that the value of a cache unit is calculated as:
$p_i^j = \alpha \cdot p_i^{j-1} + (1-\alpha) \cdot n_i^j \cdot \beta$
where $p_i^j$ denotes the value of the $i$-th cache unit in the $j$-th period; $p_i^{j-1}$ denotes the value of the $i$-th cache unit in the $(j-1)$-th period; $\alpha$ is the period influence factor, a constant; $\beta$ is the data value factor of the $i$-th cache unit, a constant; $n_i^j$ is the number of times the $i$-th cache unit is accessed within the $j$-th period; $i$ and $j$ are positive integers greater than or equal to 1, and $n_i^j$ is an integer greater than or equal to 0.
7. The caching method as claimed in claim 4, characterized in that in the cloud computing cache nodes, the Memcache mechanism is adopted to cache the big data.
8. The caching method as claimed in claim 4, characterized in that the k-means algorithm is used to cluster all the extracted cache units within the preset value threshold range.
9. The caching method as claimed in claim 4, characterized in that the big data memory comprises a RAM memory and a FLASH memory.
CN201510891553.5A 2015-12-04 2015-12-04 A kind of big data processing distributed cache system and its method Active CN105554069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510891553.5A CN105554069B (en) 2015-12-04 2015-12-04 A kind of big data processing distributed cache system and its method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510891553.5A CN105554069B (en) 2015-12-04 2015-12-04 A kind of big data processing distributed cache system and its method

Publications (2)

Publication Number Publication Date
CN105554069A true CN105554069A (en) 2016-05-04
CN105554069B CN105554069B (en) 2018-09-11

Family

ID=55833001

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510891553.5A Active CN105554069B (en) 2015-12-04 2015-12-04 A kind of big data processing distributed cache system and its method

Country Status (1)

Country Link
CN (1) CN105554069B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528833A (en) * 2016-11-14 2017-03-22 天津南大通用数据技术股份有限公司 Method and device for dynamic redistribution of data of MPP (Massively Parallel Processing) database
CN107645541A (en) * 2017-08-24 2018-01-30 阿里巴巴集团控股有限公司 Date storage method, device and server
CN107704591A (en) * 2017-10-12 2018-02-16 西南财经大学 A kind of data processing method of the intelligent wearable device based on cloud computing non-database framework
CN107995020A (en) * 2017-10-23 2018-05-04 北京兰云科技有限公司 A kind of asset valuation method and apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984203A (en) * 2012-10-31 2013-03-20 深圳市深信服电子科技有限公司 Method and device and system for improving use ratio of high-cache device based on cloud computing
CN103051701A (en) * 2012-12-17 2013-04-17 北京网康科技有限公司 Cache admission method and system
CN103475690A (en) * 2013-06-17 2013-12-25 携程计算机技术(上海)有限公司 Memcached instance configuration method and Memcached instance configuration system
CN104050043A (en) * 2014-06-17 2014-09-17 华为技术有限公司 Share cache perception-based virtual machine scheduling method and device
CN104219327A (en) * 2014-09-27 2014-12-17 上海瀚之友信息技术服务有限公司 Distributed cache system
US20150106884A1 (en) * 2013-10-11 2015-04-16 Broadcom Corporation Memcached multi-tenancy offload

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984203A (en) * 2012-10-31 2013-03-20 深圳市深信服电子科技有限公司 Method and device and system for improving use ratio of high-cache device based on cloud computing
CN103051701A (en) * 2012-12-17 2013-04-17 北京网康科技有限公司 Cache admission method and system
CN103475690A (en) * 2013-06-17 2013-12-25 携程计算机技术(上海)有限公司 Memcached instance configuration method and Memcached instance configuration system
US20150106884A1 (en) * 2013-10-11 2015-04-16 Broadcom Corporation Memcached multi-tenancy offload
CN104050043A (en) * 2014-06-17 2014-09-17 华为技术有限公司 Share cache perception-based virtual machine scheduling method and device
CN104219327A (en) * 2014-09-27 2014-12-17 上海瀚之友信息技术服务有限公司 Distributed cache system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
罗建平 et al.: "大数据负载的体系结构特征分析" [Analysis of architectural characteristics of big data workloads], 《计算机科学》 [Computer Science] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528833A (en) * 2016-11-14 2017-03-22 天津南大通用数据技术股份有限公司 Method and device for dynamic redistribution of data of MPP (Massively Parallel Processing) database
CN107645541A (en) * 2017-08-24 2018-01-30 阿里巴巴集团控股有限公司 Date storage method, device and server
CN107645541B (en) * 2017-08-24 2021-03-02 创新先进技术有限公司 Data storage method and device and server
CN107704591A (en) * 2017-10-12 2018-02-16 西南财经大学 A kind of data processing method of the intelligent wearable device based on cloud computing non-database framework
CN107995020A (en) * 2017-10-23 2018-05-04 北京兰云科技有限公司 A kind of asset valuation method and apparatus

Also Published As

Publication number Publication date
CN105554069B (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN103678172B (en) Local data cache management method and device
CN105554069A (en) Big data processing distributed cache system and method thereof
US20110307685A1 (en) Processor for Large Graph Algorithm Computations and Matrix Operations
CN104407879B (en) A kind of power network sequential big data loaded in parallel method
CN105740424A (en) Spark platform based high efficiency text classification method
CN111913649B (en) Data processing method and device for solid state disk
US10817178B2 (en) Compressing and compacting memory on a memory device wherein compressed memory pages are organized by size
CN104199942B (en) A kind of Hadoop platform time series data incremental calculation method and system
CN110413776B (en) High-performance calculation method for LDA (text-based extension) of text topic model based on CPU-GPU (Central processing Unit-graphics processing Unit) collaborative parallel
CN105005585A (en) Log data processing method and device
CN111984400A (en) Memory allocation method and device of neural network
CN107851063A (en) The dynamic coding algorithm of intelligently encoding accumulator system
CN111210004B (en) Convolution calculation method, convolution calculation device and terminal equipment
Shahshahani et al. Memory optimization techniques for fpga based cnn implementations
CN109416688B (en) Method and system for flexible high performance structured data processing
CN105701861A (en) Point cloud sampling method and system
CN115730555A (en) Chip layout method, device, equipment and storage medium
CN106202152B (en) A kind of data processing method and system of cloud platform
CN106201918B (en) A kind of method and system based on big data quantity and extensive caching quick release
CN103543959B (en) The method and device of mass data cache
CN109213745B (en) Distributed file storage method, device, processor and storage medium
CN202093513U (en) Bulk data processing system
CN104050189B (en) The page shares processing method and processing device
CN103473368A (en) Virtual machine real-time migration method and system based on counting rank ordering
CN110413540A (en) A kind of method, system, equipment and the storage medium of FPGA data caching

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant