CN116383167A - Method for solving insufficient disk space based on object storage - Google Patents

Method for solving insufficient disk space based on object storage

Info

Publication number
CN116383167A
CN116383167A (application CN202211682796.4A)
Authority
CN
China
Prior art keywords: nfs, data, configuring, node, hbase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211682796.4A
Other languages
Chinese (zh)
Inventor
周振磊
李华健
张艳芳
苏建辉
李宪英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
I Xinnuo Credit Co ltd
Original Assignee
I Xinnuo Credit Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by I Xinnuo Credit Co ltd filed Critical I Xinnuo Credit Co ltd
Priority to CN202211682796.4A
Publication of CN116383167A
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/184 Distributed file systems implemented as replicated file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for solving the problem of insufficient disk space based on object storage, which comprises the following steps: dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes; building an NFS system on a physical machine; interconnecting HBase and NFS; and synchronizing data between the two NFS servers. The invention has the following beneficial effects: cost is saved and expensive physical machines are avoided, since an old or inexpensive physical machine can host the NFS system; data security is high and is not reduced by this approach; capacity can be expanded dynamically; and cold standby of the data is simple, since data services can be provided externally as long as the consistency of the data directories is ensured.

Description

Method for solving insufficient disk space based on object storage
Technical Field
The invention belongs to the technical field of storage, and particularly relates to a method for solving the problem of insufficient disk space based on object storage.
Background
In the invention patent with application number 2018107335697, a Hadoop file system manages the objects to be stored by the object storage through a data protocol module interface, thereby realizing support of CephFS by Hadoop and the object storage service, so that users can access the data stored by the object storage service through the data protocol module interface. That object storage scheme focuses mainly on the implementation between CephFS and Hadoop. A Hadoop cluster can ensure data safety, but it inevitably incurs data redundancy, so the problem of insufficient disk space still needs to be addressed.
Disclosure of Invention
In view of the foregoing, the present invention aims to overcome the above-mentioned drawbacks of the prior art, and proposes a solution to the problem of insufficient disk space under object-based storage.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
the method for solving the problem of insufficient disk space based on object storage comprises the following steps:
S1, dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes;
S2, building an NFS system on a physical machine;
S3, interconnecting HBase and NFS;
S4, synchronizing data between the two NFS servers.
Further, in step S1, the dynamic capacity expansion process of the disk is as follows:
s11, environment configuration of the newly added machine, as follows:
s111, the operating system uses Cloudera Manager;
s112, adopting RAID as the storage of the NameNode to protect metadata;
s113, mounting the disks under different directories;
s114, deploying the nodes in the same network segment;
s115, modifying the host name and mapping;
s116, setting the startup run level to 3;
s117, starting the firewall and SELinux;
s118, configuring the maximum number of open system files and the maximum number of user processes;
s119, configuring NTP service to synchronize the newly added node clock with the cluster host clock;
s1110, configuring SSH for password-free login with the master host;
s1111, installing the JDK and configuring environment variables;
s1112, installing dependencies via yum;
s1113, copying the MySQL driver package to the newly added node;
s1114, creating a cloudera-scm user on the newly added node;
s1115, creating a Parcel package distribution directory on the newly added node;
s1116, disabling transparent huge page compaction and adding the command to the /etc/rc.local script;
s1117, setting the swappiness parameter to 0;
s1118, paying attention to the Python script issue to avoid exceptions during the distribution of Parcel packages;
s12, copying a cloudera-manager directory of the slave node to a new node;
s13, configuring and starting Cloudera Manager Agent, wherein the process is as follows,
s131, creating the working directory of the agent on all nodes;
s132, copying the Cloudera Manager Agent startup script to the /etc/init.d/ directory;
s133, configuring and starting the Cloudera Manager Agent;
s14, adding the new machine and services on the Cloudera Manager Web management page of the master node;
s15, rebalancing after the new cluster stabilizes.
Further, in step S3, the interconnection process between HBase and NFS is as follows: firstly, the NFS is mounted on the data nodes where HBase is deployed; secondly, the data is migrated to a directory on the NFS, and the directory address on the NFS is updated into the HBase table; then the data is checked and verified; finally, a client obtains the NFS path through the HBase API and obtains the real data according to the obtained path.
Further, in step S4, the two NFS servers serve as cold standbys for each other and synchronize data.
Compared with the prior art, the invention has the following advantages:
the method for solving the problem of insufficient disk space based on object storage saves cost and avoids expensive physical machines. An old or cheap physical machine can be adopted to carry the NFS system; the data security is high, and the data security cannot be reduced by adopting the mode; dynamic capacity expansion; the data cold standby is simple, and data service can be provided for the outside as long as the consistency of the data catalogue is ensured.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a diagram of a service architecture for object storage according to an embodiment of the present invention;
FIG. 2 is a logic diagram of an NFS system according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in the drawings, the method for solving the problem of insufficient disk space based on object storage comprises the following steps:
S1, dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes; for example, if a cluster has five data nodes, the number of newly added data nodes is 2N (2, 4, and so on);
S2, building an NFS system on a physical machine;
S3, interconnecting HBase and NFS;
S4, synchronizing data between the two NFS servers to ensure data security.
In step S1, the dynamic capacity expansion process of the disk is as follows:
s11, environment configuration of the newly added machine, as follows:
s111, the operating system uses Cloudera Manager;
s112. RAID (Redundant Array of Independent Disks) is adopted as the storage of the NameNode to protect the metadata; however, using RAID as the storage device of the DataNodes brings no benefit to HDFS, because the inter-node data replication provided by HDFS already meets the data backup requirement, so the redundancy mechanism of RAID is not needed. Furthermore, although RAID striping (RAID 0) is widely used to improve performance, it is still slower than the JBOD (Just a Bunch Of Disks) configuration used by HDFS;
JBOD schedules HDFS blocks cyclically among all disks; the read and write speed of RAID 0 is limited by the slowest disk in the array, whereas the disk operations of JBOD are independent, so the average read and write speed is higher than that of the slowest disk. It should be emphasized that in actual use there is always a considerable performance difference between individual disks, even disks of the same model. Finally, if a disk configured in JBOD fails, HDFS can ignore that disk and continue to work, whereas the failure of one disk in a RAID array may render the entire array unusable, thereby disabling the corresponding node.
S113, mounting the disk under different catalogues;
s114, deploying the nodes in the same network segment;
s115, modifying the host name and mapping;
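For example (an illustrative sketch; the host name node04 and the IP addresses are assumed):
# hostnamectl set-hostname node04
# vi /etc/hosts
192.168.1.101 node01
192.168.1.102 node02
192.168.1.103 node03
192.168.1.104 node04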
s116, setting the startup run level to 3;
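For example, on a systemd-based system the equivalent of run level 3 can be set as follows (an illustrative sketch; on older SysV-init systems the line id:3:initdefault: in /etc/inittab serves the same purpose):
# systemctl set-default multi-user.target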
s117, starting the firewall and SELinux;
s118, configuring the maximum number of open system files and the maximum number of user processes, for example:
vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 16384
* hard nproc 16384
s119, configuring NTP service to synchronize the newly added node clock with the cluster host clock;
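For example (an illustrative sketch; ntp-master is an assumed placeholder for the cluster's master clock source):
# vi /etc/ntp.conf
server ntp-master iburst
# systemctl enable ntpd
# systemctl restart ntpd
# ntpq -p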
s1110, configuring SSH for password-free login with the master host;
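For example (an illustrative sketch; node01 is an assumed master host name):
# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# ssh-copy-id root@node01
# ssh node01 hostname    # should print node01 without asking for a password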
s1111, installing jdk and configuring environment variables;
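For example (an illustrative sketch; the JDK archive name and installation path are assumed):
# mkdir -p /usr/java
# tar -zxf jdk-8u181-linux-x64.tar.gz -C /usr/java/
# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
# source /etc/profile
# java -version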
s1112, installing dependencies via yum;
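For example (an illustrative sketch; the package list is only a typical set of Cloudera Manager Agent dependencies and varies with the operating system version):
# yum install -y psmisc libxslt cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs bind-utils rpcbind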
s1113, copying the MySQL driver package to the newly added node;
s1114, creating a cloudera-scm user on the newly added node;
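For example (an illustrative sketch of a system-user creation command; the exact options may be adapted to the environment):
# useradd --system --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm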
s1115, creating a Parcel package distribution directory on the newly added node;
s1116, disabling transparent huge page compaction and adding the command to the /etc/rc.local script;
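For example (an illustrative sketch, interpreting this step as the usual recommendation of disabling transparent huge page defragmentation on Hadoop nodes):
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# vi /etc/rc.local    # append the same two echo commands so they also take effect after a reboot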
s1117, setting the swappiness parameter to 0;
The value of swappiness is modified; refer to the following operation example:
1) Temporary setting (lost after a reboot)
Query command: # sysctl -a | grep vm.swappiness
Result: vm.swappiness = 30
Temporary setting: # echo 10 > /proc/sys/vm/swappiness
Query command: # sysctl -a | grep vm.swappiness
Result: vm.swappiness = 10
Note: you must be logged in as the root user;
an alternative method is as follows:
# sysctl -w vm.swappiness=10
vm.swappiness = 10
# cat /proc/sys/vm/swappiness
Result: 10
2) Permanent setting
Edit /etc/sysctl.conf and add (or modify, if already present) the parameter vm.swappiness = 10;
then restart the machine, or run the command # sysctl -p to make it take effect. The value actually set online is vm.swappiness = 0;
s1118, paying attention to the Python script issue to avoid exceptions during the distribution of Parcel packages; the best approach is to directly replace the script on the new machine, via scp, with an already-fixed copy, so that no Parcel distribution exception occurs during subsequent installation;
s12, copying the cloudera-manager directory of a slave node to the new node, that is, copying the cloudera-manager directory on a slave machine to the /opt/ directory of the new machine via scp;
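For example (an illustrative sketch; node04 is the assumed name of the new machine):
# scp -r /opt/cloudera-manager root@node04:/opt/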
s13, configuring and starting Cloudera Manager Agent, wherein the process is as follows,
s131, creating the working directory of the agent on all nodes;
s132, copying the Cloudera Manager Agent startup script to the /etc/init.d/ directory;
s133, configuring and starting the Cloudera Manager Agent; starting the Agent may fail here for various reasons, so it is necessary to check the log file, locate the error information, and fix it;
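For example (an illustrative sketch; the configuration path and the server host name node01 are assumed):
# vi /opt/cloudera-manager/cm-*/etc/cloudera-scm-agent/config.ini
server_host=node01
# /etc/init.d/cloudera-scm-agent start
# tail -f /opt/cloudera-manager/cm-*/log/cloudera-scm-agent/cloudera-scm-agent.log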
s14, adding the new machine and services on the Cloudera Manager Web management page of the master node;
s15, rebalancing after the new cluster stabilizes: after the machine expansion and the deployment of component services succeed, observe the cluster for a period of time, and if it is stable, rebalance the cluster.
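For example, HDFS data can be rebalanced from the command line (an illustrative sketch; the 10 percent threshold is an assumed value):
# sudo -u hdfs hdfs balancer -threshold 10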
In step S3, the interconnection process between HBase and NFS is as follows: firstly, the NFS is mounted on the data nodes where HBase is deployed; secondly, the data is migrated to a directory on the NFS, and the directory address on the NFS is updated into the HBase table; then the data is checked and verified; finally, a client obtains the NFS path through the HBase API and obtains the real data according to the obtained path.
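For example (an illustrative sketch; the NFS server address, export path, mount point, table name, and column names are all assumed, not taken from the original):
(on the NFS server) # vi /etc/exports
/export/hbase_cold 192.168.1.0/24(rw,sync,no_root_squash)
(on the NFS server) # exportfs -r
(on each HBase data node) # mkdir -p /mnt/nfs && mount -t nfs 192.168.1.200:/export/hbase_cold /mnt/nfs
(record the NFS path in the HBase index table) # hbase shell
hbase> put 'file_index', 'row-20221227-0001', 'info:nfs_path', '/mnt/nfs/service_a/2022/12/27/row-20221227-0001.snappy'
hbase> get 'file_index', 'row-20221227-0001'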
In step S4, the two NFS servers are cold standbys of each other and synchronize data to ensure data security. In this embodiment, when one machine fails, the other can take over at any time without losing data. In addition, there is a storage-rule restriction: the data in NFS is simply a file stream after snappy compression, so the real data can only be obtained according to rules such as service, time, rowkey, and file type.
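For example, the directory synchronization between the two NFS servers can be scheduled with rsync (an illustrative sketch; the host name nfs-standby, the path, and the 10-minute interval are assumed):
# crontab -e
*/10 * * * * rsync -az --delete /export/hbase_cold/ root@nfs-standby:/export/hbase_cold/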
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. The method for solving the problem of insufficient disk space based on object storage is characterized by comprising the following steps:
S1, dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes;
S2, building an NFS system on a physical machine;
S3, interconnecting HBase and NFS;
S4, synchronizing data between the two NFS servers.
2. The method for solving the problem of insufficient disk space based on object storage according to claim 1, wherein in step S1, the dynamic expansion process of the disk is as follows:
s11, environment configuration of the newly added machine, as follows:
s111, the operating system uses Cloudera Manager;
s112, adopting RAID as the storage of the NameNode to protect metadata;
s113, mounting the disks under different directories;
s114, deploying the nodes in the same network segment;
s115, modifying the host name and mapping;
s116, setting the startup run level to 3;
s117, starting the firewall and SELinux;
s118, configuring the maximum number of open system files and the maximum number of user processes;
s119, configuring the NTP service to synchronize the clock of the newly added node with the cluster host clock;
s1110, configuring SSH for password-free login with the master host;
s1111, installing the JDK and configuring environment variables;
s1112, installing dependencies via yum;
s1113, copying the MySQL driver package to the newly added node;
s1114, creating a cloudera-scm user on the newly added node;
s1115, creating a Parcel package distribution directory on the newly added node;
s1116, disabling transparent huge page compaction and adding the command to the /etc/rc.local script;
s1117, setting the swappiness parameter to 0;
s1118, paying attention to the Python script issue to avoid exceptions during the distribution of Parcel packages;
s12, copying the cloudera-manager directory of a slave node to the new node;
s13, configuring and starting the Cloudera Manager Agent, as follows:
s131, creating the working directory of the agent on all nodes;
s132, copying the Cloudera Manager Agent startup script to the /etc/init.d/ directory;
s133, configuring and starting the Cloudera Manager Agent;
s14, adding the new machine and services on the Cloudera Manager Web management page of the master node;
s15, rebalancing after the new cluster stabilizes.
3. The method for solving the problem of insufficient disk space based on object storage according to claim 1, wherein in step S3, the interconnection process between HBase and NFS is as follows: firstly, the NFS is mounted on the data nodes where HBase is deployed; secondly, the data is migrated to a directory on the NFS, and the directory address on the NFS is updated into the HBase table; then the data is checked and verified; finally, a client obtains the NFS path through the HBase API and obtains the real data according to the obtained path.
4. The method for solving the problem of insufficient disk space based on object storage according to claim 1, wherein in step S4, the two NFS servers serve as cold standbys for each other and synchronize data.
CN202211682796.4A 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage Pending CN116383167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211682796.4A CN116383167A (en) 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211682796.4A CN116383167A (en) 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage

Publications (1)

Publication Number Publication Date
CN116383167A 2023-07-04

Family

ID=86968095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211682796.4A Pending CN116383167A (en) 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage

Country Status (1)

Country Link
CN (1) CN116383167A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910016A (en) * 2023-09-14 2023-10-20 交通运输部北海航海保障中心天津通信中心 AIS data processing method
CN116910016B (en) * 2023-09-14 2024-06-11 交通运输部北海航海保障中心天津通信中心 AIS data processing method

Similar Documents

Publication Publication Date Title
US20200301589A1 (en) Cluster configuration information replication
US10175910B2 (en) Method and apparatus for restoring an instance of a storage server
US11354336B2 (en) Fault-tolerant key management system
US8473596B2 (en) Method and apparatus for web based storage on-demand
WO2018040591A1 (en) Remote data replication method and system
US7007047B2 (en) Internally consistent file system image in distributed object-based data storage
US7036039B2 (en) Distributing manager failure-induced workload through the use of a manager-naming scheme
US8060776B1 (en) Mirror split brain avoidance
US20090222509A1 (en) System and Method for Sharing Storage Devices over a Network
US20070022138A1 (en) Client failure fencing mechanism for fencing network file system data in a host-cluster environment
JP5516575B2 (en) Data insertion system
US7702757B2 (en) Method, apparatus and program storage device for providing control to a networked storage architecture
US11409708B2 (en) Gransets for managing consistency groups of dispersed storage items
JP2003515813A5 (en)
CN116383167A (en) Method for solving insufficient disk space based on object storage
US20040210605A1 (en) Method and system for high-availability database
US7080197B2 (en) System and method of cache management for storage controllers
US11449398B2 (en) Embedded container-based control plane for clustered environment
CN116389233A (en) Container cloud management platform active-standby switching system, method and device and computer equipment
CN1567198A (en) Method for mirror backup of cluster platform cross parallel system
US8849763B1 (en) Using multiple clients for data backup
CN105844178A (en) JBOD mass storage data security method
Austin et al. Oracle Clusterware and RAC Administration and Deployment Guide, 10g Release 2 (10.2) B14197-02
Austin et al. Oracle® Clusterware and Oracle RAC Administration and Deployment Guide, 10g Release 2 (10.2) B14197-07
Austin et al. Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide, 10g Release 2 (10.2) B14197-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination