WO2016023372A1 - Data storage processing method and apparatus - Google Patents


Info

Publication number
WO2016023372A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
copies
copy
hbase
stored
Prior art date
Application number
PCT/CN2015/075302
Other languages
English (en)
French (fr)
Inventor
杨庆平
屠趁锋
黄震江
汪峰来
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2016023372A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • the present invention relates to the field of communications, and in particular to a data storage processing method and apparatus.
  • Hadoop, an open-source big data storage and analytics platform, has become the de facto industry standard for handling big data.
  • the Hadoop platform consists of two important subsystems: the Hadoop Distributed File System (HDFS) and MapReduce (a parallel computing framework).
  • Hadoop is a highly fault-tolerant, multi-copy distributed system suited to deployment on inexpensive machines, and it supports parallel data writes and reads across multiple hard disks on a machine.
  • HBASE is a distributed, column-oriented open-source database built on HDFS; it is a database system offering high reliability, high performance, column storage, scalability, and real-time reads and writes.
  • HBASE is an important part of the ecosystem of the Hadoop big data analytics platform and is widely used in the industry.
  • HBASE stores data on HDFS in a column-oriented mode, and each column corresponds to one or more storage files. The storage of data by HBASE is described below.
  • in the related art, HBASE handles table creation as follows: when an HBASE table is created, the system stores all column data with the same number of copies; the user cannot set the number of copies when creating the table and must rely on the HBASE system default of 3 copies. That is, every column of the table data is stored in 3 copies.
  • this scheme has the following disadvantages. High hardware cost: all table data stored in HBASE uses the same number of storage copies, so important and non-important data are replicated equally, which consumes a great deal of hardware. No differentiated handling: for hot data columns, more copies are desirable to increase read speed, but the number of storage copies cannot currently be set for an individual data column.
  • the present invention provides a data storage processing method and apparatus to at least solve the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency.
  • according to one aspect of the invention, a data storage processing method is provided, comprising: obtaining the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies; and generating stored copies of the data according to the obtained number of copies.
  • preferably, before the number of copies is obtained, the method further includes: when the HBASE table is created, creating a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value, and obtaining the number of copies from the copy-number attribute corresponding to that Ruby hash attribute value.
  • preferably, before the number of copies is obtained, the method further includes: receiving a dynamically input number of copies.
  • preferably, the number of copies is obtained in at least one of the following ways: receiving a command carrying the number of copies; receiving web page information carrying the number of copies.
  • preferably, generating the stored copies of the data according to the obtained number of copies comprises: when data is written, passing the number of copies to the HBASE data-write file class; and generating the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
  • preferably, after the stored copies are generated, the method further comprises: reading the stored copies, which are loaded separately according to the number of copies.
  • according to another aspect of the invention, a data storage processing apparatus is provided, comprising: an acquisition module configured to obtain the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies; and a generating module configured to generate stored copies of the data according to the obtained number of copies.
  • preferably, the apparatus further includes: a creating module configured to create, when the HBASE table is created, a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value, the number of copies being obtained from the copy-number attribute corresponding to that Ruby hash attribute value.
  • preferably, the apparatus further comprises: a receiving module configured to receive a dynamically input number of copies.
  • preferably, the acquisition module comprises at least one of the following: a first receiving unit configured to receive a command carrying the number of copies; and a second receiving unit configured to receive web page information carrying the number of copies.
  • preferably, the generating module includes: a transfer unit configured to pass the number of copies to the HBASE data-write file class when data is written; and a generating unit configured to generate the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
  • preferably, the apparatus further comprises: a reading module configured to read the stored copies, which are loaded separately according to the number of copies.
  • through the present invention, the number of copies of stored data is obtained for the column families in the HBASE table used for storing data, the column families having different numbers of copies, and stored copies of the data are generated according to the obtained number. This solves the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency. Different copy counts are thereby set for HBASE column families, data is stored in a differentiated manner, and storage cost is effectively reduced without degrading data writes and reads.
  • FIG. 1 is a flow chart of a data storage processing method according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the structure of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 3 is a first preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 4 is a second preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a preferred structural block diagram of the acquisition module 22 in the data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is a preferred structural block diagram of the generating module 24 in the data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a third preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of an HBASE storage structure according to an embodiment of the present invention.
  • FIG. 9 is a logical view of HBASE data according to an embodiment of the present invention.
  • FIG. 10 is a flowchart of dynamically creating multiple HBASE copies according to a preferred embodiment of the present invention.
  • FIG. 1 is a flowchart of a data storage processing method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
  • Step S102: obtain the number of copies of stored data for the column families in the HBASE table used for storing data, wherein the column families in the HBASE table store different numbers of data copies;
  • Step S104: generate stored copies of the data according to the obtained number of copies.
  • before the number of copies is obtained, the following processing may also be involved: when the HBASE table is created, a copy-number attribute is created for each column family by means of a Ruby hash attribute value, and the number of copies is obtained from the copy-number attribute corresponding to that value. It should be noted that creating this Ruby hash attribute value makes it possible to receive a dynamically input number of copies and to store data dynamically according to the number received.
  • the number of copies may be obtained in several ways, for example in at least one of the following: by command, that is, receiving a command carrying the number of copies; or by web page, that is, receiving web page information carrying the number of copies.
  • stored copies may likewise be generated in several ways. For example, when data is written, the number of copies is passed to the HBASE data-write file class, and the corresponding stored copies are generated according to the number passed to that class.
  • when data is read from an HBASE table whose columns have different copy counts, the copies of each column family are loaded and read separately; that is, the stored copies are read after being loaded separately according to the number of copies, and the column families do not affect one another.
  • this embodiment also provides a data storage processing apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated.
  • as used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function.
  • although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • FIG. 2 is a block diagram showing the structure of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an acquisition module 22 and a generating module 24. The apparatus is described below.
  • the acquisition module 22 is configured to obtain the number of copies of stored data for the column families in the HBASE table used for storing data, wherein the column families in the HBASE table store different numbers of data copies; the generating module 24 is connected to the acquisition module 22 and is configured to generate stored copies of the data according to the obtained number of copies.
  • FIG. 3 is a first preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a creating module 32 in addition to all the modules shown in FIG. 2. The creating module 32 is described below.
  • the creating module 32 is connected to the acquisition module 22 and is configured to create, when the HBASE table is created, a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value; the number of copies is obtained from the copy-number attribute corresponding to that value.
  • FIG. 4 is a second preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes a receiving module 42 in addition to all the modules shown in FIG. 2. The receiving module 42 is described below.
  • the receiving module 42 is connected to the acquisition module 22 and is configured to receive a dynamically input number of copies.
  • FIG. 5 is a preferred structural block diagram of the acquisition module 22 in the data storage processing apparatus according to an embodiment of the present invention.
  • as shown in FIG. 5, the acquisition module 22 includes at least one of the following: a first receiving unit 52 and a second receiving unit 54. The acquisition module 22 is described below.
  • the first receiving unit 52 is configured to receive a command carrying the number of copies; the second receiving unit 54 is configured to receive web page information carrying the number of copies.
  • FIG. 6 is a preferred structural block diagram of the generating module 24 in the data storage processing apparatus according to an embodiment of the present invention.
  • as shown in FIG. 6, the generating module 24 includes a transfer unit 62 and a generating unit 64. The generating module 24 is described below.
  • the transfer unit 62 is configured to pass the number of copies to the HBASE data-write file class when data is written; the generating unit 64 is connected to the transfer unit 62 and generates the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
  • FIG. 7 is a third preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, the apparatus includes a reading module 72 in addition to all the modules shown in FIG. 2. The reading module 72 is described below.
  • the reading module 72 is connected to the generating module 24 and is configured to read the stored copies, which are loaded separately according to the number of copies.
  • in the related art, the HBASE database cannot dynamically set the number of storage copies of each column when storing data.
  • this embodiment therefore provides a method for dynamically handling multiple copies in the HBASE database. It mainly includes the following processing: when a table is created in HBASE, the number of copies of each column can be set, and that number does not depend on a unified configuration. Table creation supports a different number of copies for each column; the numbers are stored in the HBASE table definition and take effect dynamically when table data is inserted and read, without restarting the HBASE database.
  • in this way, the multiple copies of column storage can be handled dynamically, independently of the default copy number set by the underlying storage, and the number of copies of each column can be changed dynamically.
  • Step 1: when the table is created, the number of copies for each column family is defined through a Ruby hash attribute key.
  • Step 2: when the system detects that the number of copies must be set separately for each column family, it dynamically adjusts the HBASE definition of the column family and sets the copy value in the HBASE column family class.
  • Step 3: when data is written, the system dynamically passes the number of copies of the column family to the HBASE data-write file class. When HBASE writes to the HDFS system, the dynamic number of copies is passed to HDFS, and HDFS generates the stored copies according to that number.
  • Step 4: when data is read from a table whose copy counts differ, the copies of each column family are loaded and handled separately and do not affect one another.
  • as shown in FIG. 8, the structure includes an HRegionServer (distributed storage server) and HDFS.
  • the HRegionServer includes one or more HRegions; each HRegion includes an HLog and one or more Stores; each Store includes a MemStore and one or more StoreFiles (storage files).
  • the HDFS includes one or more DataNodes (storage nodes).
  • as shown in FIG. 9, each column in the HBASE table corresponds to storage files of a storage region (Region); as the figure shows, the copy counts of the columns correspond to different storage files (StoreFile).
  • FIG. 10 is a flowchart of dynamically creating multiple HBASE copies according to a preferred embodiment of the present invention. As shown in FIG. 10, the flow includes the following steps:
  • Step S1002: create the table data;
  • Step S1004: judge whether the number of copies of the column families in the HBASE table being created has been defined;
  • Step S1006: determine whether the number of copies of the column families in the HBASE table has been defined;
  • Step S1008: parse the encapsulated parameters and obtain the number of copies of each column family in the HBASE table;
  • Step S1010: create the HBASE table according to the obtained numbers of copies, wherein the column families in the HBASE table have different numbers of copies;
  • Step S1012: create the corresponding copy files according to the differing numbers of copies of the column families in the HBASE table.
  • HBASE can create a table in the following ways:
  • table-creation parameters can also be passed in real time from a web page;
  • an HBASE table can also be created dynamically according to dynamically changing copy counts of its columns.
  • one way for HBASE to create the table dynamically is described below. Of course, other implementations are possible.
  • the HBASE column family description class supports the newly defined copy parameter;
  • the HBASE table-creation interface supports the copy parameter;
  • a copy parameter is added to the class with which HBASE generates Store files;
  • HBASE supports the copy parameter value when it calls StoreFile to write to the file system.
  • Step 1: the user defines the HBASE table structure and defines the number of copies for each column;
  • Step 2: the system parses the HBASE table definition parameters and obtains the number of copies;
  • Step 3: HBASE creates the storeFile files according to the table definition parameters;
  • Step 4: HBASE submits to the distributed file system, which creates the corresponding files according to the number of copies of each storeFile file.
  • obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module.
  • thus, the invention is not limited to any specific combination of hardware and software.
  • the above embodiments and preferred implementations solve the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency. Different copy counts are set for HBASE column families, data is stored in a differentiated manner, and storage cost is effectively reduced without degrading data writes and reads.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

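The present invention provides a data storage processing method and apparatus. The method includes: obtaining the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies; and generating stored copies of the data according to the obtained number of copies. The invention solves the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency. Different copy counts are thereby set for HBASE column families, data is stored in a differentiated manner, and storage cost is effectively reduced without degrading data writes and reads.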

Description

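Data storage processing method and apparatus
Technical Field
The present invention relates to the field of communications, and in particular to a data storage processing method and apparatus.
Background
Hadoop is an open-source big data storage and analysis platform that has become the de facto industry standard for handling big data. The Hadoop platform contains two important subsystems, the Hadoop Distributed File System (HDFS) and MapReduce (a parallel computing framework): HDFS provides storage for massive data, and MapReduce provides computation over it.
Hadoop storage
Hadoop is a highly fault-tolerant, multi-copy distributed system suited to deployment on inexpensive machines, and it supports parallel data writes and reads across multiple hard disks on a machine.
As big data has developed and data volumes have grown sharply, enterprises deploy the Hadoop platform on inexpensive PC servers to reduce cost. Hadoop stores files in multiple copies, which keeps files reliable on inexpensive equipment.
HBASE
HBASE is a distributed, column-oriented open-source database built on top of HDFS; it is a database system offering high reliability, high performance, column storage, scalability, and real-time reads and writes. HBASE is an important part of the ecosystem of the Hadoop big data analysis platform and is widely used in the industry. HBASE stores data on HDFS in a column-oriented mode, and each column corresponds to one or more storage files. The storage of data by HBASE is described below.
When a table is created in the HBASE database, HBASE handles it as follows: the system stores all column data with the same number of copies, and the user is not allowed to set the number of copies when creating the table; the table can only rely on the HBASE system default of 3 copies. That is, every column of the table data is stored in 3 copies.
As the above shows, the scheme by which HBASE stores table data in the related art has the following disadvantages. High hardware cost: all table data stored in HBASE uses the same number of storage copies, so important and non-important data are replicated equally, which consumes a great deal of hardware. No differentiated handling: for hot data columns, more copies are desirable to increase read speed, but the number of storage copies cannot currently be set for an individual data column.
Therefore, in the related art, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency.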
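The hardware-cost argument above can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the patent: the column family names, their sizes, and their copy counts are made-up illustration values, and `raw_storage_gb` is a hypothetical helper.

```python
# Hypothetical column families: name -> (logical size in GB, copy count).
# All sizes and copy counts are illustration values, not from the patent.
families = {
    "hot": (100, 3),
    "normal": (300, 2),
    "archive": (600, 1),
}


def raw_storage_gb(families, uniform_copies=None):
    """Return raw disk usage; uniform_copies overrides per-family counts."""
    total = 0
    for size_gb, copies in families.values():
        total += size_gb * (uniform_copies if uniform_copies is not None else copies)
    return total


uniform = raw_storage_gb(families, uniform_copies=3)  # every family stored 3 times
differentiated = raw_storage_gb(families)             # per-family copy counts
print(uniform, differentiated)
```

With these assumed numbers the uniform 3-copy layout needs twice the raw capacity of the differentiated one; the actual saving depends entirely on how much of the data can tolerate fewer copies.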
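Summary of the Invention
The present invention provides a data storage processing method and apparatus to at least solve the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency.
According to one aspect of the present invention, a data storage processing method is provided, comprising: obtaining the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies; and generating stored copies of the data according to the obtained number of copies.
Preferably, before the number of copies is obtained, the method further includes: when the HBASE table is created, creating a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value, and obtaining the number of copies from the copy-number attribute corresponding to that Ruby hash attribute value.
Preferably, before the number of copies is obtained, the method further includes: receiving a dynamically input number of copies.
Preferably, the number of copies is obtained in at least one of the following ways: receiving a command carrying the number of copies; receiving web page information carrying the number of copies.
Preferably, generating the stored copies of the data according to the obtained number of copies includes: when data is written, passing the number of copies to the HBASE data-write file class; and generating the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
Preferably, after the stored copies of the data are generated according to the obtained number of copies, the method further includes: reading the stored copies, which are loaded separately according to the number of copies.
According to another aspect of the present invention, a data storage processing apparatus is provided, comprising: an acquisition module configured to obtain the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies; and a generating module configured to generate stored copies of the data according to the obtained number of copies.
Preferably, the apparatus further includes: a creating module configured to create, when the HBASE table is created, a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value, the number of copies being obtained from the copy-number attribute corresponding to that Ruby hash attribute value.
Preferably, the apparatus further includes: a receiving module configured to receive a dynamically input number of copies.
Preferably, the acquisition module includes at least one of the following: a first receiving unit configured to receive a command carrying the number of copies; and a second receiving unit configured to receive web page information carrying the number of copies.
Preferably, the generating module includes: a transfer unit configured to pass the number of copies to the HBASE data-write file class when data is written; and a generating unit configured to generate the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
Preferably, the apparatus further includes: a reading module configured to read the stored copies, which are loaded separately according to the number of copies.
Through the present invention, the number of copies of stored data is obtained for the column families in the HBASE table used for storing data, the column families having different numbers of copies, and stored copies of the data are generated according to the obtained number. This solves the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency. Different copy counts are thereby set for HBASE column families, data is stored in a differentiated manner, and storage cost is effectively reduced without degrading data writes and reads.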
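Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the invention and their description explain the invention and do not unduly limit it. In the drawings:
FIG. 1 is a flowchart of a data storage processing method according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of a data storage processing apparatus according to an embodiment of the present invention;
FIG. 3 is a first preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention;
FIG. 4 is a second preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention;
FIG. 5 is a preferred structural block diagram of the acquisition module 22 in the data storage processing apparatus according to an embodiment of the present invention;
FIG. 6 is a preferred structural block diagram of the generating module 24 in the data storage processing apparatus according to an embodiment of the present invention;
FIG. 7 is a third preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of an HBASE storage structure according to an embodiment of the present invention;
FIG. 9 is a logical view of HBASE data according to an embodiment of the present invention;
FIG. 10 is a flowchart of dynamically creating multiple HBASE copies according to a preferred embodiment of the present invention.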
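Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. It should be noted that, where no conflict arises, the embodiments in this application and the features of those embodiments may be combined with one another.
This embodiment provides a data storage processing method. FIG. 1 is a flowchart of a data storage processing method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: obtain the number of copies of stored data for the column families in the HBASE table used for storing data, wherein the column families in the HBASE table store different numbers of data copies;
Step S104: generate stored copies of the data according to the obtained number of copies.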
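Steps S102 and S104 can be sketched as two small functions. This is an illustrative model only: the table definition is a plain dictionary, the `REPLICATION` key mirrors the attribute name used later in the description, and `get_copy_counts` and `generate_copies` are hypothetical names rather than HBASE APIs.

```python
def get_copy_counts(table_definition):
    """Step S102: read the copy count configured for each column family."""
    return {name: attrs["REPLICATION"] for name, attrs in table_definition.items()}


def generate_copies(family, data, copies):
    """Step S104: produce `copies` stored copies of `data` for one family."""
    return [{"family": family, "copy": i, "data": data} for i in range(1, copies + 1)]


# Assumed table definition with a different copy count per column family.
table_definition = {"colfam1": {"REPLICATION": 3}, "colfam2": {"REPLICATION": 1}}

counts = get_copy_counts(table_definition)
stored = {cf: generate_copies(cf, b"row-data", n) for cf, n in counts.items()}
```

In a real deployment the copies would be placed by the distributed file system rather than built in memory; the sketch only shows that the copy count is looked up per column family before any copy is produced.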
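Through the above steps, setting different numbers of stored data copies for the column families in the HBASE table provides the basis for handling important and non-important data differently. This solves the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency. Different copy counts are thereby set for HBASE column families, data is stored in a differentiated manner, and storage cost is effectively reduced without degrading data writes and reads.
Before the number of copies is obtained, the following processing may also be involved: when the HBASE table is created, a copy-number attribute is created for each column family in the HBASE table by means of a Ruby hash attribute value, and the number of copies is obtained from the copy-number attribute corresponding to that Ruby hash attribute value. It should be noted that creating this Ruby hash attribute value makes it possible to receive a dynamically input number of copies and to store data dynamically according to the number received.
The number of copies may be obtained in several ways, for example in at least one of the following: by command, that is, receiving a command carrying the number of copies; or by web page, that is, receiving web page information carrying the number of copies.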
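The two acquisition paths named above (a command, or web page information) could both reduce to the same validated pair of column family and copy count. The sketch below assumes a made-up `set_copies <family> <n>` command format and a URL-encoded form body with `family` and `copies` fields; neither format comes from the patent.

```python
from urllib.parse import parse_qs


def _validate(count):
    """Reject copy counts that cannot be stored."""
    if count < 1:
        raise ValueError("copy count must be at least 1")
    return count


def copies_from_command(command):
    """Read a copy count from a hypothetical 'set_copies <family> <n>' command."""
    verb, family, count = command.split()
    if verb != "set_copies":
        raise ValueError(f"unsupported command: {verb}")
    return family, _validate(int(count))


def copies_from_web_form(body):
    """Read a copy count from a URL-encoded form body posted by a web page."""
    fields = parse_qs(body)
    return fields["family"][0], _validate(int(fields["copies"][0]))
```

Either function can feed the same downstream step, which is one way to read the "at least one of" wording: the source of the number varies, the handling after acquisition does not.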
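Preferably, stored copies may likewise be generated in several ways. For example, when data is written, the number of copies is passed to the HBASE data-write file class, and the corresponding stored copies are generated according to the number passed to that class.
Preferably, after the stored copies are generated, when data is read from an HBASE table whose columns have different copy counts, the copies of each column family are loaded and read separately; that is, the stored copies are read after being loaded separately according to the number of copies, and the column families do not affect one another.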
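This embodiment also provides a data storage processing apparatus, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 2 is a structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes an acquisition module 22 and a generating module 24. The apparatus is described below.
The acquisition module 22 is configured to obtain the number of copies of stored data for the column families in the HBASE table used for storing data, wherein the column families in the HBASE table store different numbers of data copies. The generating module 24 is connected to the acquisition module 22 and is configured to generate stored copies of the data according to the obtained number of copies.
FIG. 3 is a first preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes a creating module 32 in addition to all the modules shown in FIG. 2. The creating module 32 is described below.
The creating module 32 is connected to the acquisition module 22 and is configured to create, when the HBASE table is created, a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value; the number of copies is obtained from the copy-number attribute corresponding to that value.
FIG. 4 is a second preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus includes a receiving module 42 in addition to all the modules shown in FIG. 2. The receiving module 42 is described below.
The receiving module 42 is connected to the acquisition module 22 and is configured to receive a dynamically input number of copies.
FIG. 5 is a preferred structural block diagram of the acquisition module 22 in the data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the acquisition module 22 includes at least one of the following: a first receiving unit 52 and a second receiving unit 54. The acquisition module 22 is described below.
The first receiving unit 52 is configured to receive a command carrying the number of copies; the second receiving unit 54 is configured to receive web page information carrying the number of copies.
FIG. 6 is a preferred structural block diagram of the generating module 24 in the data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 6, the generating module 24 includes a transfer unit 62 and a generating unit 64. The generating module 24 is described below.
The transfer unit 62 is configured to pass the number of copies to the HBASE data-write file class when data is written. The generating unit 64 is connected to the transfer unit 62 and generates the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
FIG. 7 is a third preferred structural block diagram of a data storage processing apparatus according to an embodiment of the present invention. As shown in FIG. 7, the apparatus includes a reading module 72 in addition to all the modules shown in FIG. 2. The reading module 72 is described below.
The reading module 72 is connected to the generating module 24 and is configured to read the stored copies, which are loaded separately according to the number of copies.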
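In the related art, the HBASE database cannot dynamically set the number of storage copies of each column when storing data. This embodiment therefore provides a method for dynamically handling multiple copies in the HBASE database. It mainly includes the following processing: when a table is created in HBASE, the number of copies of each column can be set, and that number does not depend on a unified configuration. Table creation supports a different number of copies for each column; the numbers are stored in the HBASE table definition and take effect dynamically when table data is inserted and read, without restarting the HBASE database. In this way, the multiple copies of column storage can be handled dynamically, independently of the default copy number set by the underlying storage, and the number of copies of each column can be changed dynamically.
The scheme can be implemented with the following processing steps:
Step 1, HBASE table data creation and definition: HBASE table creation supports Ruby hash attribute definitions of the form {'key1'=>'value','key2'=>'value2',…}.
For example: create 'testtable',{NAME=>'colfam1',VERSION=>1,…}. This creates the table testtable together with its column family; each column family has its own hash attribute definition.
When the table is created, the number of copies for each column family is defined through a Ruby hash attribute key.
Step 2: when the system detects that the number of copies must be set separately for each column family, it dynamically adjusts the HBASE definition of the column family and sets the copy value in the HBASE column family class.
Step 3: when data is written, the system dynamically passes the number of copies of the column family to the HBASE data-write file class. When HBASE writes to the HDFS system, it passes the dynamic number of copies to HDFS, and HDFS generates the stored copies according to that number.
Step 4: when data is read from a table whose copy counts differ, the copies of each column family are loaded and handled separately and do not affect one another.
This processing implements dynamic copy settings for multiple HBASE column families, allows important and non-important data to be handled differently, lowers storage cost, and greatly improves fine-grained management of table data. The method is reliable and effective and does not reduce write or read performance.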
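Steps 3 and 4 above can be illustrated with an in-memory stand-in for the distributed file system. This is a sketch under stated assumptions: `FakeDfs` is not the HDFS API, the placement rule (first N nodes in name order) is invented for the example, and the path layout and function names are hypothetical.

```python
class FakeDfs:
    """In-memory stand-in for a distributed file system (not the HDFS API)."""

    def __init__(self, datanodes):
        self.nodes = {name: {} for name in datanodes}  # node -> {path: bytes}

    def create(self, path, data, replication):
        """Store `data` on `replication` nodes and return the chosen nodes."""
        if replication > len(self.nodes):
            raise ValueError("not enough datanodes for requested copy count")
        targets = sorted(self.nodes)[:replication]  # toy placement rule
        for node in targets:
            self.nodes[node][path] = data
        return targets

    def read(self, path):
        """Return the data from the first node that holds a copy."""
        for node in sorted(self.nodes):
            if path in self.nodes[node]:
                return self.nodes[node][path]
        raise FileNotFoundError(path)


def write_store_file(dfs, table, family, copies, data):
    """Step 3: the family's copy count travels with the write request."""
    return dfs.create(f"/{table}/{family}/storefile-0", data, copies)


def read_family(dfs, table, family):
    """Step 4: each family is loaded from its own files only."""
    return dfs.read(f"/{table}/{family}/storefile-0")


dfs = FakeDfs(["dn1", "dn2", "dn3"])
hot_nodes = write_store_file(dfs, "testtable", "hot", 3, b"hot-data")
cold_nodes = write_store_file(dfs, "testtable", "cold", 1, b"cold-data")
```

Because each family writes to its own path with its own copy count, reading the `cold` family never touches the files of the `hot` family, which is the independence that Step 4 describes.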
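Preferred implementations of the present invention are described below with reference to the drawings.
FIG. 8 is a schematic diagram of an HBASE storage structure according to an embodiment of the present invention. As shown in FIG. 8, the structure includes an HRegionServer (distributed storage server) and HDFS. The HRegionServer includes one or more HRegions; each HRegion includes an HLog and one or more Stores; each Store includes a MemStore and one or more StoreFiles (storage files). The HDFS includes one or more DataNodes (storage nodes).
FIG. 9 is a logical view of HBASE data according to an embodiment of the present invention. As shown in FIG. 9, each column in the HBASE table (Table) corresponds to storage files of a storage region (Region); as the figure shows, the copy counts of the columns (Column) correspond to different storage files (StoreFile).
FIG. 10 is a flowchart of dynamically creating multiple HBASE copies according to a preferred embodiment of the present invention. As shown in FIG. 10, the flow includes the following steps:
Step S1002: create the table data;
Step S1004: judge whether the number of copies of the column families in the HBASE table being created has been defined;
Step S1006: determine whether the number of copies of the column families in the HBASE table has been defined;
Step S1008: parse the encapsulated parameters and obtain the number of copies of each column family in the HBASE table;
Step S1010: create the HBASE table according to the obtained numbers of copies, wherein the column families in the HBASE table have different numbers of copies;
Step S1012: create the corresponding copy files according to the differing numbers of copies of the column families in the HBASE table.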
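The decision in the flow of FIG. 10 can be sketched as a lookup with a fallback. The patent does not state what happens when no copy count is defined for a column family; falling back to the system default of 3 mentioned in the background is an assumption made here, and `resolve_copies` and `create_table` are hypothetical names.

```python
DEFAULT_COPIES = 3  # system-wide default described in the background section


def resolve_copies(family_attrs, default=DEFAULT_COPIES):
    """Return the family's own copy count if defined, else the default."""
    if "REPLICATION" not in family_attrs:    # S1004/S1006: is a count defined?
        return default                       # assumed fallback, not in the patent
    return int(family_attrs["REPLICATION"])  # S1008: parse the supplied value


def create_table(name, families):
    """S1010/S1012: build a table plan with one copy count per family."""
    return {
        "table": name,
        "families": {attrs["NAME"]: resolve_copies(attrs) for attrs in families},
    }


plan = create_table("testtable", [
    {"NAME": "colfam1", "REPLICATION": "2"},
    {"NAME": "colfam2"},
])
```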
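The steps above are elaborated below.
HBASE can create a table in the following ways:
1. Through the shell command line: building on the shell that HBASE provides, support is added for a new attribute {'REPLICATION'=>'2'}, so that the number of copies can be set for each column family;
2. The resulting table-creation command is: create 'testtable',{NAME=>'colfam1',VERSION=>1,REPLICATION=>2};
3. Table-creation parameters can also be passed in real time from a web page;
4. After the user's input parameters are parsed, they are passed to the HBASE table-creation interface.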
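Item 4 above, parsing the user's parameters before handing them to the table-creation interface, can be sketched as follows. The HBASE shell itself evaluates such hashes as Ruby; the regular-expression parser below is only a simplified Python illustration that handles single-quoted strings and bare integers, and `parse_family_hash` is a hypothetical helper.

```python
import re

# Matches KEY=>'text' or KEY=>123; other Ruby value forms are not handled.
PAIR = re.compile(r"(\w+)\s*=>\s*(?:'([^']*)'|(\d+))")


def parse_family_hash(text):
    """Parse "{NAME=>'colfam1', VERSION=>1}" into a Python dict."""
    attrs = {}
    for key, quoted, number in PAIR.findall(text):
        attrs[key] = int(number) if number else quoted
    return attrs


attrs = parse_family_hash("{NAME=>'colfam1', VERSION=>1, REPLICATION=>2}")
```

The parsed dictionary carries the copy count under the same `REPLICATION` key used in the command, so a table-creation routine can read it without knowing whether the request came from the shell or from a web page.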
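It should be noted that an HBASE table can also be created dynamically according to dynamically changing copy counts of its columns. One way for HBASE to create the table dynamically is described below; of course, other implementations are possible.
1. The HBASE column family description class supports the newly defined copy parameter;
2. The HBASE table-creation interface supports the copy parameter;
3. A copy parameter is added to the class with which HBASE generates Store files;
4. HBASE supports the copy parameter value when it calls StoreFile to write to the file system.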
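The four touchpoints listed above amount to threading one parameter from the column family description down to the file writer. The sketch below models that with plain dataclasses; the class and function names are hypothetical stand-ins chosen for illustration, not the actual HBASE classes, and the default of 3 reflects the system default mentioned in the background.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnFamilyDescriptor:
    """Hypothetical descriptor carrying a per-family copy count (touchpoint 1)."""
    name: str
    copies: int = 3


@dataclass
class StoreFileWriter:
    """Hypothetical writer that records the copy count it was given (touchpoint 4)."""
    path: str
    copies: int
    written: list = field(default_factory=list)

    def append(self, cell):
        self.written.append(cell)


def create_table(name, families):
    """Table-creation interface accepting descriptors with copy counts (touchpoint 2)."""
    return {family.name: family for family in families}


def open_writer(table_name, family):
    """Store layer forwarding the descriptor's copy count to the writer (touchpoint 3)."""
    return StoreFileWriter(f"/{table_name}/{family.name}/sf-0", family.copies)


table = create_table("testtable", [ColumnFamilyDescriptor("colfam1", 2),
                                   ColumnFamilyDescriptor("colfam2")])
writer = open_writer("testtable", table["colfam1"])
```

Keeping the copy count on the descriptor means no layer below it needs a global configuration lookup, which matches the stated goal of not depending on a unified setting.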
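In addition, table data with multiple column families may be created in several ways. For example, the following creation procedure may be used:
Step 1: the user defines the HBASE table structure and defines the number of copies for each column;
Step 2: the system parses the HBASE table definition parameters and obtains the number of copies;
Step 3: HBASE creates the storeFile files according to the table definition parameters;
Step 4: HBASE submits to the distributed file system, which creates the corresponding files according to the number of copies of each storeFile file.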
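Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps shown or described may be performed in an order different from the one given here, or the modules or steps may be fabricated as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it; those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Industrial Applicability
As described above, the foregoing embodiments and preferred implementations solve the problem in the related art that, when HBASE stores table data, the data cannot be stored in a differentiated manner, which wastes storage resources and lowers data reading efficiency. Different copy counts are set for HBASE column families, data is stored in a differentiated manner, and storage cost is effectively reduced without degrading data writes and reads.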

Claims (12)

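  1. A data storage processing method, comprising:
    obtaining the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies;
    generating stored copies of the data according to the obtained number of copies.
  2. The method according to claim 1, wherein, before obtaining the number of copies of stored data for the column families in the HBASE table used for storing data, the method further comprises:
    when the HBASE table is created, creating a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value, and obtaining the number of copies from the copy-number attribute corresponding to the Ruby hash attribute value.
  3. The method according to claim 1, wherein, before obtaining the number of copies of stored data for the column families in the HBASE table used for storing data, the method further comprises:
    receiving a dynamically input number of copies.
  4. The method according to claim 1, wherein the number of copies of stored data for the column families in the HBASE table used for storing data is obtained in at least one of the following ways:
    receiving a command carrying the number of copies;
    receiving web page information carrying the number of copies.
  5. The method according to claim 1, wherein generating the stored copies of the data according to the obtained number of copies comprises:
    when data is written, passing the number of copies to the HBASE data-write file class;
    generating the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
  6. The method according to any one of claims 1 to 5, wherein, after generating the stored copies of the data according to the obtained number of copies, the method further comprises:
    reading the stored copies, which are loaded separately according to the number of copies.
  7. A data storage processing apparatus, comprising:
    an acquisition module configured to obtain the number of copies of stored data for the column families in a table of the distributed database HBASE used for storing data, wherein the column families in the HBASE table store different numbers of data copies;
    a generating module configured to generate stored copies of the data according to the obtained number of copies.
  8. The apparatus according to claim 7, further comprising:
    a creating module configured to create, when the HBASE table is created, a copy-number attribute for each column family in the HBASE table by means of a Ruby hash attribute value, the number of copies being obtained from the copy-number attribute corresponding to the Ruby hash attribute value.
  9. The apparatus according to claim 7, further comprising:
    a receiving module configured to receive a dynamically input number of copies.
  10. The apparatus according to claim 7, wherein the acquisition module comprises at least one of the following:
    a first receiving unit configured to receive a command carrying the number of copies;
    a second receiving unit configured to receive web page information carrying the number of copies.
  11. The apparatus according to claim 7, wherein the generating module comprises:
    a transfer unit configured to pass the number of copies to the HBASE data-write file class when data is written;
    a generating unit configured to generate the corresponding stored copies according to the number of copies passed to the HBASE data-write file class.
  12. The apparatus according to any one of claims 7 to 11, further comprising:
    a reading module configured to read the stored copies, which are loaded separately according to the number of copies.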
PCT/CN2015/075302 2014-08-14 2015-03-27 Data storage processing method and apparatus WO2016023372A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410401504.4 2014-08-14
CN201410401504.4A CN105335450B (zh) 2014-08-14 2014-08-14 Data storage processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2016023372A1 true WO2016023372A1 (zh) 2016-02-18

Family

ID=55285978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/075302 WO2016023372A1 (zh) 2014-08-14 2015-03-27 Data storage processing method and apparatus

Country Status (2)

Country Link
CN (1) CN105335450B (zh)
WO (1) WO2016023372A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
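CN111159273A (zh) * 2019-12-31 2020-05-15 中国联合网络通信集团有限公司 Data stream processing method, apparatus, server, and storage medium
CN113704346A (zh) * 2020-05-20 2021-11-26 杭州海康威视数字技术股份有限公司 Method, apparatus, and electronic device for converting hot and cold data in an Hbase table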

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
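CN107122364B (zh) * 2016-02-25 2021-05-18 华为技术有限公司 Data operation method and data management server
CN111046074B (zh) * 2019-12-13 2023-09-01 北京百度网讯科技有限公司 Streaming data processing method, apparatus, device, and medium
CN112306421B (zh) * 2020-11-20 2021-04-30 昆易电子科技(上海)有限公司 Method and *** for storing and analyzing measurement data format (MDF) files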

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
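CN101187931A (zh) * 2007-12-12 2008-05-28 浙江大学 Method for managing multiple file copies in a distributed file ***
CN103838860A (zh) * 2014-03-19 2014-06-04 华存数据信息技术有限公司 File storage *** based on a dynamic copy strategy and storage method thereof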

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567495B (zh) * 2011-12-22 2013-08-21 国家电网公司 Mass information storage *** and implementation method
CN103905517A (zh) * 2012-12-28 2014-07-02 ***通信集团公司 Data storage method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
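CN111159273A (zh) * 2019-12-31 2020-05-15 中国联合网络通信集团有限公司 Data stream processing method, apparatus, server, and storage medium
CN113704346A (zh) * 2020-05-20 2021-11-26 杭州海康威视数字技术股份有限公司 Method, apparatus, and electronic device for converting hot and cold data in an Hbase table
CN113704346B (zh) * 2020-05-20 2024-06-04 杭州海康威视数字技术股份有限公司 Method, apparatus, and electronic device for converting hot and cold data in an Hbase table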

Also Published As

Publication number Publication date
CN105335450A (zh) 2016-02-17
CN105335450B (zh) 2020-06-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15831499

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15831499

Country of ref document: EP

Kind code of ref document: A1