CN116383167A - Method for solving insufficient disk space based on object storage - Google Patents

Method for solving insufficient disk space based on object storage

Info

Publication number
CN116383167A
CN116383167A (application CN202211682796.4A)
Authority
CN
China
Prior art keywords: nfs, data, configuring, node, hbase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211682796.4A
Other languages
Chinese (zh)
Inventor
周振磊
李华健
张艳芳
苏建辉
李宪英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
I Xinnuo Credit Co ltd
Original Assignee
I Xinnuo Credit Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by I Xinnuo Credit Co ltd filed Critical I Xinnuo Credit Co ltd
Priority to CN202211682796.4A
Publication of CN116383167A
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G06F16/183 Provision of network file services by network file servers, e.g. by using NFS, CIFS
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/119 Details of migration of file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/178 Techniques for file synchronisation in file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/184 Distributed file systems implemented as replicated file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for solving the problem of insufficient disk space based on object storage, which comprises the following steps: dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes; building an NFS system on a physical machine; interconnecting HBase and NFS; and synchronizing data between the two NFS servers. The invention has the following beneficial effects: cost is saved and expensive physical machines are avoided, since an old or inexpensive physical machine can host the NFS system; data security is high and is not reduced by this approach; capacity can be expanded dynamically; and cold standby of the data is simple, since data services can be provided externally as long as the consistency of the data directories is ensured.

Description

Method for solving insufficient disk space based on object storage
Technical Field
The invention belongs to the technical field of storage, and particularly relates to a method for solving the problem of insufficient disk space based on object storage.
Background
In the invention patent with application number 2018107335697, a Hadoop file system manages the objects to be stored by the object storage through a data protocol module interface, thereby realizing support of CephFS by Hadoop and the object storage service, so that users can access the data stored by the object storage service through the data protocol module interface. That object storage scheme focuses mainly on the implementation between CephFS and Hadoop. A Hadoop cluster can ensure data safety, but it inevitably incurs data redundancy, so the problem of insufficient disk space still needs to be addressed.
Disclosure of Invention
In view of the foregoing, the present invention aims to overcome the above-mentioned drawbacks of the prior art, and proposes a solution to the problem of insufficient disk space under object-based storage.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
the method for solving the problem of insufficient disk space based on object storage comprises the following steps:
S1, dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes;
S2, building an NFS system on a physical machine;
S3, interconnecting HBase and NFS;
S4, synchronizing data between the two NFS servers.
Further, in step S1, the dynamic capacity expansion process of the disk is as follows:
s11, environment configuration of the newly added machine, as follows:
s111, the operating system uses Cloudera Manager;
s112, adopting RAID as the storage of the NameNode to protect metadata;
s113, mounting the disks under different directories;
s114, deploying the nodes in the same network segment;
s115, modifying the host name and mapping;
s116, setting the startup run level to 3;
s117, starting the firewall and SELinux;
s118, configuring the maximum number of open system files and the maximum number of user processes;
s119, configuring NTP service to synchronize the newly added node clock with the cluster host clock;
s1110, configuring SSH for password-free login with the master host;
s1111, installing the JDK and configuring environment variables;
s1112, installing dependencies via yum;
s1113, copying the MySQL driver package to the newly added node;
s1114, creating a cloudera-scm user on the newly added node;
s1115, creating a Parcel package distribution directory on the newly added node;
s1116, disabling transparent huge page compaction and adding the command to the /etc/rc.local script;
s1117, setting the swappiness parameter to 0;
s1118, paying attention to the Python script issue to avoid exceptions during the distribution of Parcel packages;
s12, copying a cloudera-manager directory of the slave node to a new node;
s13, configuring and starting Cloudera Manager Agent, wherein the process is as follows,
s131, creating the working directory of the agent on all nodes;
s132, copying the Cloudera Manager Agent startup script to the /etc/init.d/ directory;
s133, configuring and starting the Cloudera Manager Agent;
s14, adding the new machine and services on the Cloudera Manager Web management page of the master node;
s15, rebalancing after the new cluster stabilizes.
Further, in step S3, the interconnection process between HBase and NFS is as follows: firstly, the NFS is mounted on the data nodes where HBase is deployed; secondly, the data is migrated to a directory on the NFS, and the directory address on the NFS is updated into the HBase table; then the data is checked and verified; finally, a client obtains the NFS path through the HBase API and obtains the real data according to the obtained path.
Further, in step S4, the two NFS servers serve as cold standbys for each other and synchronize data.
Compared with the prior art, the invention has the following advantages:
the method for solving the problem of insufficient disk space based on object storage saves cost and avoids expensive physical machines. An old or cheap physical machine can be adopted to carry the NFS system; the data security is high, and the data security cannot be reduced by adopting the mode; dynamic capacity expansion; the data cold standby is simple, and data service can be provided for the outside as long as the consistency of the data catalogue is ensured.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a diagram of a service architecture for object storage according to an embodiment of the present invention;
FIG. 2 is a logic diagram of an NFS system according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
As shown in the drawings, the method for solving the problem of insufficient disk space based on object storage comprises the following steps:
S1, dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes; for example, if a cluster has five data nodes, the number of newly added data nodes is 2N (2, 4, and so on);
S2, building an NFS system on a physical machine;
S3, interconnecting HBase and NFS;
S4, synchronizing data between the two NFS servers to ensure data security.
In step S1, the dynamic capacity expansion process of the disk is as follows:
s11, environment configuration of the newly added machine, as follows:
s111, the operating system uses Cloudera Manager;
s112. RAID (Redundant Array of Independent Disks) is adopted as the storage of the NameNode to protect the metadata; however, using RAID as the storage device of the DataNodes brings no benefit to HDFS, because the inter-node data replication provided by HDFS already meets the data backup requirement, so the redundancy mechanism of RAID is not needed. Furthermore, although RAID striping (RAID 0) is widely used to improve performance, it is still slower than the JBOD (Just a Bunch Of Disks) configuration used by HDFS;
JBOD schedules HDFS blocks cyclically among all disks; the read and write speed of RAID 0 is limited by the slowest disk in the array, whereas the disk operations of JBOD are independent, so the average read and write speed is higher than that of the slowest disk. It should be emphasized that in actual use there is always a considerable performance difference between individual disks, even disks of the same model. Finally, if a disk configured in JBOD fails, HDFS can ignore that disk and continue to work, whereas the failure of one disk in a RAID array may render the entire array unusable, thereby disabling the corresponding node.
S113, mounting the disk under different catalogues;
s114, deploying the nodes in the same network segment;
s115, modifying the host name and mapping;
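For example (an illustrative sketch; the host name node04 and the IP addresses are assumed):
# hostnamectl set-hostname node04
# vi /etc/hosts
192.168.1.101 node01
192.168.1.102 node02
192.168.1.103 node03
192.168.1.104 node04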
s116, setting the startup run level to 3;
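For example, on a systemd-based system the equivalent of run level 3 can be set as follows (an illustrative sketch; on older SysV-init systems the line id:3:initdefault: in /etc/inittab serves the same purpose):
# systemctl set-default multi-user.target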
s117, starting the firewall and SELinux;
s118, configuring the maximum number of open system files and the maximum number of user processes, for example:
vi /etc/security/limits.conf
* soft nofile 65536
* hard nofile 65536
* soft nproc 16384
* hard nproc 16384
s119, configuring NTP service to synchronize the newly added node clock with the cluster host clock;
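For example (an illustrative sketch; ntp-master is an assumed placeholder for the cluster's master clock source):
# vi /etc/ntp.conf
server ntp-master iburst
# systemctl enable ntpd
# systemctl restart ntpd
# ntpq -p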
s1110, configuring SSH for password-free login with the master host;
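For example (an illustrative sketch; node01 is an assumed master host name):
# ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# ssh-copy-id root@node01
# ssh node01 hostname    # should print node01 without asking for a password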
s1111, installing jdk and configuring environment variables;
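For example (an illustrative sketch; the JDK archive name and installation path are assumed):
# mkdir -p /usr/java
# tar -zxf jdk-8u181-linux-x64.tar.gz -C /usr/java/
# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
# source /etc/profile
# java -version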
s1112, installing dependencies via yum;
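For example (an illustrative sketch; the package list is only a typical set of Cloudera Manager Agent dependencies and varies with the operating system version):
# yum install -y psmisc libxslt cyrus-sasl-plain cyrus-sasl-gssapi fuse fuse-libs bind-utils rpcbind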
s1113, copying the MySQL driver package to the newly added node;
s1114, creating a cloudera-scm user on the newly added node;
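For example (an illustrative sketch of a system-user creation command; the exact options may be adapted to the environment):
# useradd --system --no-create-home --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm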
s1115, creating a Parcel package distribution directory on the newly added node;
s1116, disabling transparent huge page compaction and adding the command to the /etc/rc.local script;
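For example (an illustrative sketch, interpreting this step as the usual recommendation of disabling transparent huge page defragmentation on Hadoop nodes):
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# vi /etc/rc.local    # append the same two echo commands so they also take effect after a reboot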
s1117, setting the swappiness parameter to 0;
The value of swappiness is modified; refer to the following operation example:
1) Temporary setting (lost after a reboot)
Query command: # sysctl -a | grep vm.swappiness
Result: vm.swappiness = 30
Temporary setting: # echo 10 > /proc/sys/vm/swappiness
Query command: # sysctl -a | grep vm.swappiness
Result: vm.swappiness = 10
Note: you must be logged in as the root user;
an alternative method is as follows:
# sysctl -w vm.swappiness=10
vm.swappiness = 10
# cat /proc/sys/vm/swappiness
Result: 10
2) Permanent setting
Edit /etc/sysctl.conf and add (or modify, if already present) the parameter vm.swappiness = 10;
then restart the machine, or run the command # sysctl -p to make it take effect. The value actually set online is vm.swappiness = 0;
s1118, paying attention to the Python script issue to avoid exceptions during the distribution of Parcel packages; the best approach is to directly replace the script on the new machine, via scp, with an already-fixed copy, so that no Parcel distribution exception occurs during subsequent installation;
s12, copying the cloudera-manager directory of a slave node to the new node, that is, copying the cloudera-manager directory on a slave machine to the /opt/ directory of the new machine via scp;
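For example (an illustrative sketch; node04 is the assumed name of the new machine):
# scp -r /opt/cloudera-manager root@node04:/opt/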
s13, configuring and starting Cloudera Manager Agent, wherein the process is as follows,
s131, creating the working directory of the agent on all nodes;
s132, copying the Cloudera Manager Agent startup script to the /etc/init.d/ directory;
s133, configuring and starting the Cloudera Manager Agent; starting the Agent may fail here for various reasons, so it is necessary to check the log file, locate the error information, and fix it;
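For example (an illustrative sketch; the configuration path and the server host name node01 are assumed):
# vi /opt/cloudera-manager/cm-*/etc/cloudera-scm-agent/config.ini
server_host=node01
# /etc/init.d/cloudera-scm-agent start
# tail -f /opt/cloudera-manager/cm-*/log/cloudera-scm-agent/cloudera-scm-agent.log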
s14, adding the new machine and services on the Cloudera Manager Web management page of the master node;
s15, rebalancing after the new cluster stabilizes: after the machine expansion and the deployment of component services succeed, observe the cluster for a period of time, and if it is stable, rebalance the cluster.
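For example, HDFS data can be rebalanced from the command line (an illustrative sketch; the 10 percent threshold is an assumed value):
# sudo -u hdfs hdfs balancer -threshold 10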
In step S3, the interconnection process between HBase and NFS is as follows: firstly, the NFS is mounted on the data nodes where HBase is deployed; secondly, the data is migrated to a directory on the NFS, and the directory address on the NFS is updated into the HBase table; then the data is checked and verified; finally, a client obtains the NFS path through the HBase API and obtains the real data according to the obtained path.
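For example (an illustrative sketch; the NFS server address, export path, mount point, table name, and column names are all assumed, not taken from the original):
(on the NFS server) # vi /etc/exports
/export/hbase_cold 192.168.1.0/24(rw,sync,no_root_squash)
(on the NFS server) # exportfs -r
(on each HBase data node) # mkdir -p /mnt/nfs && mount -t nfs 192.168.1.200:/export/hbase_cold /mnt/nfs
(record the NFS path in the HBase index table) # hbase shell
hbase> put 'file_index', 'row-20221227-0001', 'info:nfs_path', '/mnt/nfs/service_a/2022/12/27/row-20221227-0001.snappy'
hbase> get 'file_index', 'row-20221227-0001'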
In step S4, the two NFS servers are cold standbys of each other and synchronize data to ensure data security. In this embodiment, when one machine fails, the other can take over at any time without losing data. In addition, there is a storage-rule restriction: the data in NFS is simply a file stream after snappy compression, so the real data can only be obtained according to rules such as service, time, rowkey, and file type.
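For example, the directory synchronization between the two NFS servers can be scheduled with rsync (an illustrative sketch; the host name nfs-standby, the path, and the 10-minute interval are assumed):
# crontab -e
*/10 * * * * rsync -az --delete /export/hbase_cold/ root@nfs-standby:/export/hbase_cold/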
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (4)

1. The method for solving the problem of insufficient disk space based on object storage is characterized by comprising the following steps:
S1, dynamically expanding the disk capacity by adding data nodes until the cluster has an odd number of data nodes;
S2, building an NFS system on a physical machine;
S3, interconnecting HBase and NFS;
S4, synchronizing data between the two NFS servers.
2. The method for solving the problem of insufficient disk space based on object storage according to claim 1, wherein in step S1, the dynamic expansion process of the disk is as follows:
s11, environment configuration of the newly added machine, as follows:
s111, the operating system uses Cloudera Manager;
s112, adopting RAID as the storage of the NameNode to protect metadata;
s113, mounting the disks under different directories;
s114, deploying the nodes in the same network segment;
s115, modifying the host name and mapping;
s116, setting the startup run level to 3;
s117, starting the firewall and SELinux;
s118, configuring the maximum number of open system files and the maximum number of user processes;
s119, configuring the NTP service to synchronize the clock of the newly added node with the cluster host clock;
s1110, configuring SSH for password-free login with the master host;
s1111, installing the JDK and configuring environment variables;
s1112, installing dependencies via yum;
s1113, copying the MySQL driver package to the newly added node;
s1114, creating a cloudera-scm user on the newly added node;
s1115, creating a Parcel package distribution directory on the newly added node;
s1116, disabling transparent huge page compaction and adding the command to the /etc/rc.local script;
s1117, setting the swappiness parameter to 0;
s1118, paying attention to the Python script issue to avoid exceptions during the distribution of Parcel packages;
s12, copying the cloudera-manager directory of a slave node to the new node;
s13, configuring and starting the Cloudera Manager Agent, as follows:
s131, creating the working directory of the agent on all nodes;
s132, copying the Cloudera Manager Agent startup script to the /etc/init.d/ directory;
s133, configuring and starting the Cloudera Manager Agent;
s14, adding the new machine and services on the Cloudera Manager Web management page of the master node;
s15, rebalancing after the new cluster stabilizes.
3. The method for solving the problem of insufficient disk space based on object storage according to claim 1, wherein in step S3, the interconnection process between HBase and NFS is as follows: firstly, the NFS is mounted on the data nodes where HBase is deployed; secondly, the data is migrated to a directory on the NFS, and the directory address on the NFS is updated into the HBase table; then the data is checked and verified; finally, a client obtains the NFS path through the HBase API and obtains the real data according to the obtained path.
4. The method for solving the problem of insufficient disk space based on object storage according to claim 1, wherein in step S4, the two NFS servers serve as cold standbys for each other and synchronize data.
CN202211682796.4A 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage Pending CN116383167A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211682796.4A CN116383167A (en) 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211682796.4A CN116383167A (en) 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage

Publications (1)

Publication Number Publication Date
CN116383167A 2023-07-04

Family

ID=86968095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211682796.4A Pending CN116383167A (en) 2022-12-27 2022-12-27 Method for solving insufficient disk space based on object storage

Country Status (1)

Country Link
CN (1) CN116383167A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116910016A (en) * 2023-09-14 2023-10-20 交通运输部北海航海保障中心天津通信中心 AIS data processing method
CN116910016B (en) * 2023-09-14 2024-06-11 交通运输部北海航海保障中心天津通信中心 AIS data processing method

Similar Documents

Publication Publication Date Title
US20200301589A1 (en) Cluster configuration information replication
US10175910B2 (en) Method and apparatus for restoring an instance of a storage server
US11354336B2 (en) Fault-tolerant key management system
US8473596B2 (en) Method and apparatus for web based storage on-demand
WO2018040591A1 (en) Remote data replication method and system
US7007047B2 (en) Internally consistent file system image in distributed object-based data storage
US7036039B2 (en) Distributing manager failure-induced workload through the use of a manager-naming scheme
US8060776B1 (en) Mirror split brain avoidance
US20090222509A1 (en) System and Method for Sharing Storage Devices over a Network
US20070022138A1 (en) Client failure fencing mechanism for fencing network file system data in a host-cluster environment
JP5516575B2 (en) Data insertion system
US7702757B2 (en) Method, apparatus and program storage device for providing control to a networked storage architecture
US11409708B2 (en) Gransets for managing consistency groups of dispersed storage items
JP2003515813A5 (en)
CN116383167A (en) Method for solving insufficient disk space based on object storage
US20040210605A1 (en) Method and system for high-availability database
US7080197B2 (en) System and method of cache management for storage controllers
US11449398B2 (en) Embedded container-based control plane for clustered environment
CN116389233A (en) Container cloud management platform active-standby switching system, method and device and computer equipment
CN1567198A (en) Method for mirror backup of cluster platform cross parallel system
US8849763B1 (en) Using multiple clients for data backup
CN105844178A (en) JBOD mass storage data security method
Austin et al. Oracle Clusterware and RAC Administration and Deployment Guide, 10g Release 2 (10.2) B14197-02
Austin et al. Oracle® Clusterware and Oracle RAC Administration and Deployment Guide, 10g Release 2 (10.2) B14197-07
Austin et al. Oracle Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide, 10g Release 2 (10.2) B14197-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination