CN115623081A - Data downloading method, data uploading method and distributed storage system - Google Patents

Data downloading method, data uploading method and distributed storage system

Info

Publication number
CN115623081A
CN115623081A CN202110809233.6A CN202110809233A CN115623081A CN 115623081 A CN115623081 A CN 115623081A CN 202110809233 A CN202110809233 A CN 202110809233A CN 115623081 A CN115623081 A CN 115623081A
Authority
CN
China
Prior art keywords
cluster
file
gateway
object storage
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110809233.6A
Other languages
Chinese (zh)
Inventor
龙小斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Original Assignee
Guangzhou Shiyuan Electronics Thecnology Co Ltd
Guangzhou Shirui Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Shiyuan Electronics Thecnology Co Ltd and Guangzhou Shirui Electronics Co Ltd
Priority to CN202110809233.6A
Publication of CN115623081A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2457 Query processing with adaptation to user needs
    • G06F16/24573 Query processing with adaptation to user needs using data annotations, e.g. user-defined metadata
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Library & Information Science (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data downloading method, a data uploading method and a distributed storage system. The downloading method comprises the following steps: the first object storage gateway receives an access request; the first object storage gateway acquires a target file identifier from the access request; the first object storage gateway determines, from the relational database cluster, a target relational database corresponding to the target file identifier, and queries the target relational database for metadata of the target file corresponding to the target file identifier, wherein the metadata is description information of the target file; the first object storage gateway determines, from a plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster; the first object storage gateway forwards the access request to the second object storage gateway in the target distributed storage cluster to obtain the target file from the second object storage gateway.

Description

Data downloading method, data uploading method and distributed storage system
Technical Field
The invention relates to the field of data storage, in particular to a data downloading method, an uploading method and a distributed storage system.
Background
Ceph (a distributed file system) is a very popular open-source distributed storage system. It has a sound design, avoids a single point of failure for every component in its architecture, and provides good scalability and performance. As a PB-level object storage system, it can be used in a production environment with little change to its underlying architecture. However, the existing Ceph system has a scalability problem: when the number of files stored in the Ceph system is large, its performance is affected.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a data downloading method, a data uploading method and a distributed storage system, which at least solve the technical problem that, because the RGW stores the metadata of files, performance is affected when the bucket of each RGW stores too many files.
According to an aspect of an embodiment of the present invention, a data downloading method is provided, where the method is applied to a distributed storage system, and the distributed storage system includes: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the method comprising: the first object storage gateway receives an access request; the first object storage gateway acquires a target file identifier in the access request; the first object storage gateway determines a target relational database corresponding to the target file identifier from the relational database cluster, and inquires metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file; the first object storage gateway determines a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed cluster from a plurality of distributed storage clusters corresponding to the first object storage gateway; the first object storage gateway forwards the access request to a second object storage gateway in the target distributed cluster to obtain the target file from the second object storage gateway.
Optionally, the distributed storage system includes: a load balancing gateway; the first object storage gateway receives an access request, comprising: the first object storage gateway receives the access request forwarded from the load balancing gateway.
Optionally, before the first object storage gateway receives the access request forwarded from the load balancing gateway, the method further includes: the load balancing gateway receives a plurality of access requests; and the load balancing gateway selects the first object storage gateway from a preset gateway list according to preset weights, wherein each gateway in the preset gateway list has a preset weight, and the larger a gateway's weight, the more access requests are distributed to it for parallel processing.
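The weight-based selection described here can be sketched as follows. This is a minimal illustration in Python and not the implementation of the embodiment: the gateway names, the weight values and the helpers pick_gateway and distribute are hypothetical, and weighted random choice is only one assumed way of giving a gateway with a larger weight a larger share of the requests.

```python
import random
from collections import Counter

# Hypothetical preset gateway list: gateway name -> preset weight.
GATEWAY_WEIGHTS = {"gateway-a": 5, "gateway-b": 3, "gateway-c": 1}


def pick_gateway(weights, rng):
    """Pick one gateway; the chance of being picked is proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def distribute(num_requests, weights, seed=0):
    """Assign num_requests requests and count how many each gateway receives."""
    rng = random.Random(seed)
    return Counter(pick_gateway(weights, rng) for _ in range(num_requests))


if __name__ == "__main__":
    # A gateway with a larger weight ends up with more requests.
    print(distribute(9000, GATEWAY_WEIGHTS))
```

A deterministic scheme such as weighted round-robin would satisfy the same description; random choice is used here only because it keeps the sketch short.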
Optionally, the determining, by the first object storage gateway, of a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster from among the plurality of distributed storage clusters corresponding to the first object storage gateway includes: the first object storage gateway determines the target distributed storage cluster corresponding to the metadata from a cluster information table in the target relational database; and the first object storage gateway determines the second object storage gateway from a metadata information table in the target relational database.
Optionally, the cluster information table includes at least one of the following: the cluster identifier, the readable and writable states of the cluster, and the cluster address information; the metadata information table includes at least one of the following: the cluster identifier of the target file, the file information of the target file, the write state of the file, and the file size.
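The two tables and the lookup between them can be sketched as follows, using an in-memory SQLite database from the Python standard library as a stand-in for the target relational database. The table names, column names, sample rows and the helper locate are assumptions made for illustration; the text only lists which kinds of fields each table contains. The sketch follows one plausible reading, in which the metadata row supplies the cluster identifier and the cluster information row supplies the address used to reach the second object storage gateway.

```python
import sqlite3


def build_demo_db():
    """Create the two hypothetical tables and fill them with sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE cluster_info (
            cluster_id TEXT PRIMARY KEY,
            readable   INTEGER NOT NULL,
            writable   INTEGER NOT NULL,
            address    TEXT NOT NULL
        );
        CREATE TABLE file_metadata (
            file_id     TEXT PRIMARY KEY,
            cluster_id  TEXT NOT NULL REFERENCES cluster_info(cluster_id),
            file_info   TEXT,
            write_state TEXT,
            file_size   INTEGER
        );
        INSERT INTO cluster_info VALUES ('ceph-1', 1, 0, 'http://rgw.ceph-1.example');
        INSERT INTO cluster_info VALUES ('ceph-2', 1, 1, 'http://rgw.ceph-2.example');
        INSERT INTO file_metadata
            VALUES ('bucket/a.txt', 'ceph-2', 'text/plain', 'complete', 1024);
    """)
    return db


def locate(db, file_id):
    """Return (cluster_id, address) for a file, or None if it is unknown or unreadable."""
    return db.execute(
        "SELECT c.cluster_id, c.address "
        "FROM file_metadata AS m "
        "JOIN cluster_info AS c ON c.cluster_id = m.cluster_id "
        "WHERE m.file_id = ? AND c.readable = 1",
        (file_id,),
    ).fetchone()


if __name__ == "__main__":
    print(locate(build_demo_db(), "bucket/a.txt"))
```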
Optionally, the readable and writable states of the cluster and the write state of the file are determined as follows: a relational database in the relational database cluster, in response to a modification request from a target object, modifies the readable and writable states of the cluster and the write state of the file; and when a trigger event is detected, an operation corresponding to the modified readable and writable states of the cluster and the modified write state of the file is performed on the file stored by the second object storage gateway.
According to another aspect of the embodiments of the present invention, there is provided a data uploading method, where the method is applied to a distributed storage system, and the distributed storage system includes: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the method comprising: the first object storage gateway receives a file uploading request; the first object storage gateway acquires metadata of a file to be uploaded corresponding to the file uploading request; and the first object storage gateway sends the metadata to the relational database cluster for storage, and sends the data to be uploaded to a second object storage gateway in the distributed storage cluster.
Optionally, there are a plurality of file uploading requests; before the data to be uploaded is sent to the second object storage gateway in the distributed storage cluster, the method further includes: the load balancing gateway receives the plurality of file uploading requests; and the load balancing gateway selects the first object storage gateway from a preset gateway list according to preset weights, wherein each gateway in the preset gateway list has a preset weight, and the larger a gateway's weight, the more file uploading requests are distributed to it for parallel processing.
Optionally, the method further comprises: when the uploading type of the file to be uploaded is fragmented uploading, the first object storage gateway acquires first type metadata from the file uploading request; after all fragments of the file to be uploaded have been uploaded, the first object storage gateway sends a request message to the second object storage gateway and receives second type metadata fed back by the second object storage gateway in response to the request message; and when the uploading type of the file to be uploaded is form uploading, the first object storage gateway acquires the first type metadata from the data to be uploaded.
Optionally, the first type of metadata includes: the cluster identifier corresponding to the file to be uploaded, the file information of the file to be uploaded, and the write state of the file; the second type of metadata includes: the file size.
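The two ways of collecting metadata can be sketched as follows. This is a minimal sketch under stated assumptions: the dictionary keys, the write state values "writing" and "complete", and the callable query_size (a stand-in for the request message sent to the second object storage gateway) are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Metadata:
    cluster_id: str                  # first type metadata
    file_info: str                   # first type metadata
    write_state: str                 # first type metadata
    file_size: Optional[int] = None  # second type metadata


def first_type_metadata(source: dict) -> Metadata:
    """Build the first type metadata from a request or from the uploaded data."""
    return Metadata(source["cluster_id"], source["file_info"], "writing")


def collect_metadata(upload_type: str, request: dict, payload: Optional[dict],
                     query_size: Callable[[str], int]) -> Metadata:
    if upload_type == "fragmented":
        # First type metadata comes from the upload request itself.
        meta = first_type_metadata(request)
        # Assumed to run only after every fragment has been uploaded:
        # ask the second gateway for the second type metadata (file size).
        meta.file_size = query_size(request["file_id"])
        meta.write_state = "complete"
        return meta
    if upload_type == "form":
        # First type metadata comes from the data to be uploaded.
        return first_type_metadata(payload)
    raise ValueError(f"unknown upload type: {upload_type}")
```

The returned record is what the first object storage gateway would then send to the relational database cluster for storage.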
According to another aspect of the embodiments of the present invention, there is provided a distributed storage system, including: the system comprises a first object storage gateway, a distributed storage cluster and a relational database cluster; the first object storage gateway is used for receiving an access request; acquiring a target file identifier in the access request; determining a target relational database corresponding to the target file identifier from the relational database cluster, and inquiring metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file; determining a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster from a plurality of distributed storage clusters corresponding to the first object storage gateway; forwarding the access request to a second object storage gateway in the target distributed cluster so as to acquire a target file from the second object storage gateway; a relational database cluster for storing metadata; and the distributed storage cluster is used for storing the target file.
According to another aspect of the embodiments of the present invention, there is provided a data downloading apparatus, which is applied to a first object storage gateway in a distributed storage system, where the distributed storage system includes: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the apparatus comprising: the first processing module is used for receiving an access request; the acquisition module is used for acquiring the target file identifier in the access request; the first searching module is used for determining a target relational database corresponding to the target file identifier from the relational database cluster and inquiring metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file; the second searching module determines a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster from a plurality of distributed storage clusters corresponding to the first object storage gateway; the second processing module forwards the access request to a second object storage gateway in the target distributed cluster so as to obtain the target file from the second object storage gateway.
According to another aspect of the embodiments of the present invention, there is provided a data uploading apparatus, where the apparatus is applied to a first object storage gateway in a distributed storage system, and the distributed storage system includes: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the apparatus comprising: the processing module is used for receiving a file uploading request; the acquisition module is used for acquiring metadata of a file to be uploaded corresponding to the file uploading request; and the sending module is used for sending the metadata to the relational database cluster for storage and sending the data to be uploaded to a second object storage gateway in the distributed storage cluster.
According to another aspect of the embodiments of the present invention, there is provided a nonvolatile storage medium including a stored program, wherein a device in which the nonvolatile storage medium is located is controlled to execute a data download method when the program is executed.
In the embodiment of the invention, a first object storage gateway, a distributed storage cluster and a relational database cluster are adopted, and the method comprises the following steps: the first object storage gateway receives an access request; the first object storage gateway acquires a target file identifier from the access request; the first object storage gateway determines, from the relational database cluster, a target relational database corresponding to the target file identifier, and queries the target relational database for metadata of the target file corresponding to the target file identifier, wherein the metadata is description information of the target file; the first object storage gateway determines, from a plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster; the first object storage gateway forwards the access request to the second object storage gateway in the target distributed storage cluster to obtain the target file from the second object storage gateway. By storing the metadata in the relational database cluster, the embodiment avoids storing the metadata of files in the RGW, thereby achieving the technical effect of improving the performance of the database, and further solving the technical problem that performance is affected when the bucket of each RGW stores too many files because the RGW stores the metadata of the files.
In addition, capacity expansion in the embodiment of the invention is carried out in units of Ceph clusters, which avoids the data rebalancing problem caused by capacity expansion.
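Why expanding by whole clusters avoids rebalancing can be illustrated with a small sketch. It assumes, as an illustration only, that each file's metadata records the cluster that holds it and that new uploads go to the first cluster currently marked writable; the class StorageSystem and its methods are hypothetical.

```python
class StorageSystem:
    """Toy model: files stay where their metadata says they are."""

    def __init__(self):
        self.clusters = {}   # cluster_id -> {"writable": bool}
        self.metadata = {}   # file_id -> cluster_id

    def add_cluster(self, cluster_id, writable=True):
        self.clusters[cluster_id] = {"writable": writable}

    def set_writable(self, cluster_id, writable):
        self.clusters[cluster_id]["writable"] = writable

    def upload(self, file_id):
        # Pick the first writable cluster (dicts keep insertion order).
        for cluster_id, info in self.clusters.items():
            if info["writable"]:
                self.metadata[file_id] = cluster_id
                return cluster_id
        raise RuntimeError("no writable cluster available")


if __name__ == "__main__":
    system = StorageSystem()
    system.add_cluster("ceph-1")
    system.upload("a.txt")
    # Expansion: mark the full cluster read-only and add a new one.
    system.set_writable("ceph-1", False)
    system.add_cluster("ceph-2")
    system.upload("b.txt")
    print(system.metadata)
```

In this model, adding ceph-2 changes no existing metadata record, so no stored file has to move.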
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention and do not constitute a limitation of the invention. In the drawings:
FIG. 1 is an architecture diagram of a distributed storage system according to the related art;
FIG. 2 is an architecture diagram of a distributed storage system according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a data downloading method according to an embodiment of the present application;
FIG. 4 is a schematic flow chart of a data uploading method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a data downloading device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a data uploading device according to an embodiment of the present application;
FIG. 7 is a flowchart of the operation of a storage gateway in processing a request according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in other sequences than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For better understanding of the embodiments of the present application, the technical terms referred to in the embodiments of the present application are briefly described as follows:
Metadata, also called intermediary data or relay data, is data that describes data (data about data). It mainly describes data attributes (properties) and supports functions such as indicating storage location, recording history, resource search, and file recording. Metadata acts as an electronic catalog: to build the catalog, the contents or features of the data must be described and collected, which in turn assists data retrieval.
CRUSH is an algorithm for pseudo-random, controlled distribution and replication of data, designed to address balanced data distribution, load, maximized system performance, system expansion, and hardware fault tolerance. In a Ceph cluster, CRUSH requires a compact, hierarchical description of the devices, comprising the storage cluster and the replica placement policy, in order to map data objects efficiently onto storage devices at large scale; this process is performed in a distributed manner.
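The idea of computing placement pseudo-randomly, without a central lookup table, can be illustrated with a much simpler technique. The sketch below uses rendezvous (highest-random-weight) hashing, which is not CRUSH: it ignores the device hierarchy, weights and placement rules that CRUSH handles. The device names and the helper place are hypothetical.

```python
import hashlib


def _score(object_name, device):
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_name}:{device}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, devices, replicas=2):
    """Return the devices that hold the replicas of object_name.

    Any client that knows the device list computes the same answer,
    so no central directory has to be consulted.
    """
    ranked = sorted(devices, key=lambda d: _score(object_name, d), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    print(place("bucket/a.txt", ["osd.0", "osd.1", "osd.2", "osd.3"]))
```

With this scheme, removing a device that does not hold a given object leaves that object's placement unchanged, which is the kind of limited data movement that placement algorithms of this family aim for.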
Kubernetes, K8s for short, is an open-source system for managing containerized applications across multiple hosts in a cloud platform. Kubernetes provides mechanisms for deploying, planning, updating and maintaining applications in containers. Specifically, containers are isolated from one another, each container has its own file system, processes in different containers cannot affect each other, and computing resources can be partitioned. Compared with a virtual machine, a container can be deployed rapidly; in addition, because a container is decoupled from the underlying infrastructure and the host file system, it can be migrated between different clouds and different operating system versions.
OpenResty is a software platform based on nginx and various extension modules. Because OpenResty bundles Lua/LuaJIT, Lua scripts can be used on the OpenResty platform; with the help of nginx's asynchronous non-blocking model, Lua can asynchronously and concurrently access back-end services such as MySQL, PostgreSQL, Memcached and Redis. The number of HTTP connections from the browser can thereby be reduced, and background Java/PHP/Python interfaces and the like can be accessed asynchronously and concurrently.
Sharding-Sphere is an ecosystem of open-source distributed database middleware solutions consisting of three parts: Sharding-JDBC, Sharding-Proxy and Sharding-Sidecar. Sharding-Sphere provides standardized data sharding, distributed transaction and database governance functions, and suits a variety of application scenarios such as homogeneous Java environments, heterogeneous languages, containers and cloud native deployments. In addition, Sharding-Sphere is positioned as relational database middleware that makes full and reasonable use of the computing and storage capabilities of relational databases in distributed scenarios, rather than implementing an entirely new relational database; it coexists with NoSQL and NewSQL rather than excluding them.
Example 1
Fig. 1 is an architecture diagram of a distributed storage system according to the related art, in which a Rack in the crushmap corresponds to a rack in the physical environment; Node is a Ceph storage server; RGW is the object storage gateway component of Ceph; Mon is the monitor component of Ceph; OSD is the data storage component of Ceph; and Loadbalance is the entry point for external traffic, which proxies external HTTP requests to the back-end RGW. It should be noted that each physical server may deploy multiple components of different types simultaneously.
However, this architecture has scalability problems. There are generally two ways to expand capacity: 1. add nodes to each Rack and rebalance the data of the whole cluster according to the crushmap; 2. add nodes to each Rack, create a new Crush Rule, and then create a new Pool using the new Crush Rule.
The above two expansion schemes have the following problems:
With the first scheme, every expansion causes large-scale data migration, which affects cluster performance, so the scheme is not suitable for storage environments at PB level or above. The second scheme avoids data migration and achieves cluster expansion to a certain extent, but it does not expand the capacity of buckets the user created previously; the user must create a new bucket to use the space of the expanded part.
In addition, because of the read and write performance of file metadata, performance is greatly affected if the number of files stored in the bucket of each RGW exceeds 5000 thousand.
In order to solve the above technical problem, an embodiment of the present application provides a new distributed storage system architecture, in which a first object storage gateway is newly added, and is used for receiving an access request; acquiring a target file identifier in the access request; determining a target relational database corresponding to the target file identifier from the relational database cluster, and inquiring metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file; determining a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster from a plurality of distributed storage clusters corresponding to the first object storage gateway; and forwarding the access request to a second object storage gateway in the target distributed cluster to obtain the target file from the second object storage gateway.
Specifically, as shown in fig. 2, the distributed storage system provided in the embodiment of the present application includes: a first object storage gateway 20, a distributed storage cluster 22, and a relational database cluster 24; wherein, the first object storage gateway 20 is configured to receive an access request; acquiring a target file identifier in the access request; determining a target relational database 240 corresponding to the target file identifier from the relational database cluster 24, and querying metadata of a target file corresponding to the target file identifier from the target relational database 240, wherein the metadata is description information of the target file; determining a target distributed storage cluster 22 corresponding to the metadata and a second object storage gateway 220 in the target distributed storage cluster 22 from the plurality of distributed storage clusters 22 corresponding to the first object storage gateway 20; and forward the access request to the second object storage gateway 220 in the target distributed cluster 22 to obtain the target file from the second object storage gateway 220; a relational database cluster 24 for storing metadata; and the distributed storage cluster 22 is used for storing the target file.
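The download path through the first object storage gateway can be sketched end to end as follows. Plain dictionaries stand in for the relational databases and a Python callable stands in for each second object storage gateway; these stand-ins, the class FirstGateway and its methods are hypothetical. Selecting the target relational database by hashing the file identifier is an assumption, since the text does not state how that database is chosen.

```python
import zlib


def shard_index(file_id, num_shards):
    """Assumed rule: choose the target database by hashing the file identifier."""
    return zlib.crc32(file_id.encode("utf-8")) % num_shards


def make_second_gateway(objects):
    """Stand-in for a second object storage gateway serving files of one cluster."""
    def serve(request):
        body = objects.get(request["file_id"])
        return {"status": 200 if body is not None else 404, "body": body}
    return serve


class FirstGateway:
    def __init__(self, num_databases, clusters):
        self.databases = [dict() for _ in range(num_databases)]
        self.clusters = clusters  # cluster_id -> second gateway callable

    def record(self, file_id, cluster_id):
        """Store metadata for a file in the database its identifier maps to."""
        db = self.databases[shard_index(file_id, len(self.databases))]
        db[file_id] = {"cluster_id": cluster_id}

    def handle_access(self, request):
        file_id = request["file_id"]                        # target file identifier
        db = self.databases[shard_index(file_id, len(self.databases))]
        metadata = db.get(file_id)                          # query the metadata
        if metadata is None:
            return {"status": 404, "body": None}
        second_gateway = self.clusters[metadata["cluster_id"]]  # target cluster
        return second_gateway(request)                      # forward the request
```

A usage example: after gateway.record("bucket/a.txt", "ceph-1"), a call to gateway.handle_access({"file_id": "bucket/a.txt"}) is answered by the stand-in for the ceph-1 gateway.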
In some embodiments of the present application, the second object storage gateway 220 may be an RGW, the first object storage gateway 20 may be a load gateway built from OpenResty + Lua, the target relational database 240 may be a MySQL database, and the distributed storage cluster 22 may be a Ceph cluster.
In some embodiments of the present application, the workflow of the distributed storage system in processing a specific request is shown in fig. 7. The flowchart in fig. 7 describes the 11 stages involved in the operation of the first object storage gateway 20 and the corresponding directive for each stage (Order of Lua Nginx Module Directives).
Specifically, the init_by_lua and init_worker_by_lua phases implement the initialization of the first object storage gateway 20. The init_by_lua phase is generally used to initialize global configuration and preload Lua modules; the init_worker_by_lua phase is generally used to pull configuration and data periodically or to perform health checks on upstream services. In this phase, the first object storage gateway 20 periodically obtains Ceph cluster endpoint information and bucket-related information from MySQL.
The following phases are involved in accessing the distributed storage system through the first object storage gateway 20: the ssl_certificate_by_lua phase is typically used for special SSL processing, such as version restriction; the set_by_lua phase sets nginx variables and can implement complex assignment logic; the rewrite_by_lua phase implements complex forwarding, redirection, caching and similar functions (e.g., proxying a particular request to an external network); the access_by_lua phase centrally handles IP admission, interface permissions and similar checks.
When a target user operates on data in the distributed storage system through the first object storage gateway 20, the first object storage gateway 20 may involve the following working phases: in the content_by_lua phase, the first object storage gateway 20 acts as a content handler that receives a request, processes it, and outputs a response, and it implements its core functions in this phase (e.g., initiating an HTTP request to the second object storage gateway, and obtaining and updating file metadata in MySQL); the balancer_by_lua phase is used for dynamic load balancing; the header_filter_by_lua phase customizes the response headers and cookies of the HTTP request; the body_filter_by_lua phase customizes the response body of the HTTP request; the log_by_lua phase records the access volume and computes the average response time. Currently, the first object storage gateway writes the metadata of files smaller than 512 KB into Kafka in this phase, so that a merge program can merge small files on that basis, thereby improving Ceph storage space utilization.
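The small-file handling in the log phase can be sketched as follows. This sketch reads the 512 KB limit as applying to the size of the file that the metadata describes, which is an interpretation of the text. A plain Python list stands in for the Kafka topic, so no real Kafka client API is used, and the field names and the helper log_phase are hypothetical.

```python
SMALL_FILE_LIMIT = 512 * 1024  # 512 KB threshold taken from the text


def log_phase(metadata, queue):
    """Queue metadata of small files for the merge program.

    Returns True when the metadata was queued, False otherwise.
    """
    if metadata["file_size"] < SMALL_FILE_LIMIT:
        queue.append(metadata)  # stand-in for publishing to a Kafka topic
        return True
    return False


if __name__ == "__main__":
    merge_queue = []
    log_phase({"file_id": "bucket/small.txt", "file_size": 1024}, merge_queue)
    log_phase({"file_id": "bucket/large.bin", "file_size": 10 * 1024 * 1024}, merge_queue)
    print([m["file_id"] for m in merge_queue])
```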
In some embodiments of the present application, the first object storage gateway 20 embeds Lua code (a scripting language) mainly in two phases, init_worker_by_lua and content_by_lua, to extend its functions. The extended functions of these two phases are described below.
In the init_worker_by_lua phase, the storage gateway mainly obtains information about the back-end Ceph clusters from MySQL and passes it to the content_by_lua phase through a package variable. The package variable is worker-level and is shared by all requests processed by that worker.
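The worker-level sharing described above can be pictured with a small cache object that is refreshed at a fixed interval and read by every request handled in the same worker. The sketch below is in Python rather than Lua, and the class name, the `fetch` callback (standing in for a query against MySQL), and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional


class ClusterInfoCache:
    """Worker-level cache of Ceph cluster info, refreshed at a fixed interval."""

    def __init__(self, fetch: Callable[[], Dict[int, dict]], interval_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch            # hypothetical loader, e.g. a MySQL query
        self._interval_s = interval_s  # refresh period in seconds
        self._clock = clock            # injectable so the sketch can be tested
        self._clusters: Dict[int, dict] = {}
        self._last_refresh: Optional[float] = None

    def refresh_if_due(self) -> bool:
        """Reload cluster info when the interval has elapsed; return True if reloaded."""
        now = self._clock()
        if self._last_refresh is None or now - self._last_refresh >= self._interval_s:
            self._clusters = self._fetch()
            self._last_refresh = now
            return True
        return False

    def get(self, cluster_id: int) -> Optional[dict]:
        """Look up one cluster; every request in the worker reads the same data."""
        return self._clusters.get(cluster_id)
```

Requests never query the database for cluster information themselves in this sketch; they only read what the periodic refresh last stored.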
In the content_by_lua phase, the storage gateway mainly parses each HTTP request and assigns different access paths to different requests.
In some embodiments of the present application, to allow the number of instances of the first object storage gateway 20 to scale flexibly, the distributed storage system is deployed by means of a Deployment (which is itself a kind of object). Through its built-in controller, this object creates Pods (also objects, and the smallest unit of scheduling), thereby creating gateway instances on the deployment nodes.
In some embodiments of the present application, the distributed storage system uses the Ceph cluster as the unit of capacity expansion, that is, capacity is expanded one Ceph cluster at a time. When a bucket is created, a deletion policy for a bucket is defined, or other bucket-level operations such as setting access permissions and authentication handling are performed, the operation is broadcast to all back-end Ceph clusters.
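The broadcast of bucket operations can be sketched as a loop over all back-end clusters. This is a minimal Python illustration, not the patented implementation: the function name, the shape of the `op` dictionary, and the `send` callback (which would issue the real request to a cluster) are assumptions.

```python
from typing import Callable, Dict, List


def broadcast_bucket_op(rgw_urls: List[str], op: dict,
                        send: Callable[[str, dict], bool]) -> Dict[str, bool]:
    """Apply one bucket operation to every back-end cluster.

    send(url, op) is a hypothetical callback that performs the operation on
    one cluster and returns True on success. The result maps each cluster
    address to its outcome so that failures can be retried.
    """
    return {url: send(url, op) for url in rgw_urls}
```

Because every cluster receives the same bucket operations, a newly added cluster only needs the existing bucket definitions replayed to it before it can accept files.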
In some embodiments of the present application, the distributed storage system further includes a load balancing gateway 26, and the access request received by the first object storage gateway 20 is the access request forwarded from the load balancing gateway 26.
Load balancing may be performed at layer 4 (L4) or layer 7 (L7), that is, when load is balanced across back-end servers, the forwarding decision may be based on layer-4 or layer-7 information of the OSI (Open Systems Interconnection) model. For example, layer-4 load balancing publishes a layer-3 IP address (VIP) together with a layer-4 port number to determine which traffic needs to be load balanced, performs NAT (Network Address Translation) on that traffic, forwards it to a back-end server, and records which server handles the TCP or UDP flow so that all subsequent traffic on the same connection is forwarded to the same server. Layer-7 load balancing additionally considers application-layer characteristics. For example, when balancing load across Web servers, in addition to identifying the traffic to be handled by the VIP and port 80, it can decide whether to balance load according to the layer-7 URL, browser type, and language. For instance, if the Web servers are divided into two groups, one for Chinese and one for English, layer-7 load balancing can automatically identify the user's language when the user accesses the domain name and then select the corresponding server group for load balancing.
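The layer-4 behaviour described above (match on VIP and port, then pin each connection to one server) can be sketched as follows. This Python example is illustrative only: the class name, the flow tuple layout, and the simple rotation used to pick a server for a new connection are assumptions, and NAT itself is not modelled.

```python
from typing import Dict, List, Optional, Tuple

# (source ip, source port, destination ip, destination port, protocol)
Flow = Tuple[str, int, str, int, str]


class L4Balancer:
    """Toy layer-4 balancer: VIP/port matching plus a connection table."""

    def __init__(self, vip: str, port: int, backends: List[str]):
        self.vip = vip
        self.port = port
        self.backends = backends
        self._next = 0
        self._flows: Dict[Flow, str] = {}  # remembers which server owns a flow

    def route(self, flow: Flow) -> Optional[str]:
        _, _, dst_ip, dst_port, _ = flow
        if (dst_ip, dst_port) != (self.vip, self.port):
            return None  # traffic not addressed to the VIP is not balanced
        if flow not in self._flows:
            # New connection: pick the next server in rotation (assumed policy).
            self._flows[flow] = self.backends[self._next % len(self.backends)]
            self._next += 1
        # Known connection: always return the server recorded earlier.
        return self._flows[flow]
```

A layer-7 balancer would make the same kind of decision but could also inspect the URL or request headers before choosing a server group.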
In some embodiments of the present application, the load balancing gateway 26 may be an L4 load balancer (Loadbalancer), and the L4 load balancer can be scaled horizontally.
In some embodiments of the present application, as shown in FIG. 2, in contrast to the distributed storage system shown in FIG. 1, the distributed storage system described in the embodiments of the present application is managed through Kubernetes.
In the operating environment illustrated in fig. 2, the present application provides a method embodiment of a data downloading method. It should be noted that the steps illustrated in the flowchart of the figure may be performed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from that presented herein.
Fig. 3 is a flowchart of a data downloading method according to an embodiment of the present invention. As shown in fig. 3, the method is applied to a distributed storage system that includes a first object storage gateway, a distributed storage cluster, and a relational database cluster, and comprises the following steps:
Step S302, the first object storage gateway receives an access request;
Specifically, the first object storage gateway may be a load gateway built from OpenResty + Lua.
In some embodiments of the application, before the first object storage gateway receives the access request forwarded by the load balancing gateway, the load balancing gateway receives a plurality of access requests and selects the first object storage gateway from a preset gateway list according to preset weights, where the preset weights are the weights of the gateways in the preset gateway list, and the larger a gateway's weight, the more access requests are distributed to it for parallel processing. That is, the load balancing gateway may balance the access requests in the downloading process according to a weighted round-robin algorithm.
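The text names a weighted round-robin algorithm without specifying a variant. The Python sketch below uses the "smooth" weighted round-robin scheme as one common choice; that choice, the class name, and the gateway names in the usage note are assumptions for illustration, not details taken from the application.

```python
from typing import Dict


class WeightedRoundRobin:
    """Smooth weighted round-robin over a preset gateway list."""

    def __init__(self, weights: Dict[str, int]):
        self._weights = dict(weights)                # preset weight per gateway
        self._current = {name: 0 for name in weights}
        self._total = sum(weights.values())

    def pick(self) -> str:
        # Raise every gateway's running score by its weight.
        for name, weight in self._weights.items():
            self._current[name] += weight
        # Choose the gateway with the highest running score ...
        chosen = max(self._current, key=self._current.get)
        # ... and lower its score by the total so the others catch up.
        self._current[chosen] -= self._total
        return chosen
```

With weights such as `{"gw-a": 3, "gw-b": 1}`, every four consecutive picks send three requests to `gw-a` and one to `gw-b`, which matches the statement that a larger weight receives more requests.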
Step S304, the first object storage gateway obtains the target file identifier in the access request;
In some embodiments of the present application, after the OpenResty + Lua storage gateway receives the access request, the storage gateway parses the request, obtains the unique id of the file, and queries the metadata of the file from the MySQL cluster.
Step S306, the first object storage gateway determines a target relational database corresponding to the target file identifier from the relational database cluster, and queries metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file;
Specifically, when the first object storage gateway receives an access request forwarded from the load balancing gateway, the storage gateway forwards the request to the RGW component of the corresponding Ceph cluster for processing.
In some embodiments of the present application, in the distributed storage system, file metadata is stored in MySQL (a store for relational data), and MySQL is scaled out through Sharding-Sphere to support metadata storage for a large number of files.
It should be noted that Sharding-Sphere may interface with multiple MySQL instances or with ProxySQL.
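To make the idea of spreading metadata over several MySQL instances concrete, the Python sketch below maps a file identifier to one database with a stable hash. The hash-modulo rule and the function name are assumptions chosen for illustration; the sharding rules actually configured in Sharding-Sphere are not described in the text.

```python
import zlib
from typing import List


def pick_metadata_db(file_id: str, databases: List[str]) -> str:
    """Map a file identifier to one database using a stable hash (assumed rule)."""
    index = zlib.crc32(file_id.encode("utf-8")) % len(databases)
    return databases[index]
```

Because the mapping depends only on the file identifier, the gateway reaches the same database when it writes the metadata during upload and when it reads it back during download.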
In some embodiments of the present application, the MySQL database tables at the back end of the storage gateway mainly include a cluster_info table for storing Ceph cluster information and a coss_files table for storing file metadata. The key fields of the cluster_info table are as follows:
`cluster` int(10) NOT NULL COMMENT 'cluster id';
`cluster_name` varchar(64) NOT NULL COMMENT 'cluster name';
`rw_states` tinyint(2) NOT NULL DEFAULT '-1' COMMENT 'cluster read/write state: 1 is readable and writable, 0 is read-only, -1 is neither readable nor writable';
`rgw_url` varchar(255) NOT NULL COMMENT 'cluster rgw address, <ip>:<port>'.
The core fields of the coss_files table are as follows:
`cluster` int(10) unsigned NOT NULL COMMENT 'id of the cluster currently storing the file';
`bucket` int(10) unsigned NOT NULL COMMENT 'bucket id';
`file` varchar(255) NOT NULL COMMENT 'file unique identifier';
`file_type` varchar(128) NOT NULL COMMENT 'file type';
`file_size` bigint(20) unsigned NOT NULL COMMENT 'file size';
`w_status` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'file write state: 0 is marked for deletion; 1 is write complete; 2 is writing; 3 is marked for update; 4 is to be merged; 5 is fragment upload in progress'.
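The two tables can be mirrored in application code as typed records. The Python sketch below is only an illustration of the field list above: the class and member names are assumptions, while the numeric state values and defaults follow the column comments.

```python
from dataclasses import dataclass
from enum import IntEnum


class ClusterRWState(IntEnum):
    UNAVAILABLE = -1   # neither readable nor writable
    READ_ONLY = 0
    READ_WRITE = 1


class FileWriteState(IntEnum):
    MARKED_DELETED = 0
    WRITE_COMPLETE = 1
    WRITING = 2
    MARKED_UPDATE = 3
    TO_BE_MERGED = 4
    FRAGMENT_UPLOADING = 5


@dataclass
class ClusterInfo:
    """One row of the cluster_info table."""
    cluster: int
    cluster_name: str
    rgw_url: str                                          # "<ip>:<port>"
    rw_states: ClusterRWState = ClusterRWState.UNAVAILABLE  # column default -1


@dataclass
class FileRecord:
    """One row of the coss_files table."""
    cluster: int
    bucket: int
    file: str
    file_type: str
    file_size: int
    w_status: FileWriteState = FileWriteState.MARKED_DELETED  # column default 0
```

Using `IntEnum` keeps the values directly comparable with the integers stored in the `rw_states` and `w_status` columns.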
In some embodiments of the present application, when a file needs to be deleted in the distributed storage system, the file only needs to be marked as deleted by modifying the w_status field of the coss_files table stored in MySQL; the marked files are later deleted in bulk by a scheduled task.
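The mark-then-purge deletion described above can be sketched in two small functions. In this Python illustration a dictionary stands in for the coss_files table, and the function names and the `delete_object` callback (which would remove the object from the storage cluster) are assumptions.

```python
from typing import Callable, Dict

MARKED_DELETED = 0  # w_status value for "marked for deletion"


def mark_deleted(files: Dict[str, dict], file_id: str) -> None:
    """Logical delete: only the w_status field changes, the data stays in place."""
    files[file_id]["w_status"] = MARKED_DELETED


def purge_marked(files: Dict[str, dict],
                 delete_object: Callable[[dict], None]) -> int:
    """Scheduled task: physically remove every file marked for deletion."""
    doomed = [fid for fid, rec in files.items() if rec["w_status"] == MARKED_DELETED]
    for fid in doomed:
        delete_object(files[fid])  # hypothetical call that deletes the stored object
        del files[fid]
    return len(doomed)
```

The delete request itself therefore costs a single metadata update, and the expensive removal work is deferred to the scheduled task.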
Step S308, the first object storage gateway determines a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster from a plurality of distributed storage clusters corresponding to the first object storage gateway;
In some embodiments of the present application, the first object storage gateway determines, from the plurality of distributed storage clusters corresponding to the first object storage gateway, the target distributed storage cluster corresponding to the metadata and the second object storage gateway in the target distributed cluster as follows: the first object storage gateway determines the target distributed storage cluster corresponding to the metadata from a cluster information table in the target relational database; the second object storage gateway is determined from a metadata information table in the target relational database.
In some embodiments of the present application, the cluster information table includes: cluster identification, readable and writable states of the clusters and cluster address information; the metadata information table comprises a cluster identifier of the target file, file information of the target file and a writing state of the file. The file information of the target file comprises an identification, a name, a file type, a file size and a file writing state of the target file.
In some embodiments of the present application, the readable and writable states of the cluster and the writing state of the file are determined by: a relational database in a relational database cluster responds to a modification request of a target object and modifies the readable and writable states of the cluster and the write-in state of the file; when a trigger event is detected, triggering the operation corresponding to the read-write state of the modified cluster and the write-in state of the file to be executed on the file stored by the second object storage gateway.
Specifically, while a cluster is under maintenance, the cluster may be marked as neither readable nor writable; when the remaining storage space of a cluster is insufficient, the cluster may be set to the read-only state.
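How a gateway might interpret the cluster read/write state can be sketched as below. The numeric values follow the cluster_info column comment; the function names and the idea of filtering upload candidates in this way are assumptions made for illustration.

```python
from typing import Dict, List

READ_WRITE, READ_ONLY, UNAVAILABLE = 1, 0, -1


def is_readable(rw_states: int) -> bool:
    """Downloads are possible from read-write and read-only clusters."""
    return rw_states in (READ_WRITE, READ_ONLY)


def is_writable(rw_states: int) -> bool:
    """Uploads are possible only on read-write clusters."""
    return rw_states == READ_WRITE


def upload_candidates(clusters: List[Dict]) -> List[Dict]:
    """Keep only the clusters that may receive new files."""
    return [c for c in clusters if is_writable(c["rw_states"])]
```

Setting a nearly full cluster to read-only thus removes it from the upload candidates while existing files on it remain downloadable.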
Step S310, the first object storage gateway forwards the access request to a second object storage gateway in the target distributed cluster, so as to obtain the target file from the second object storage gateway.
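Steps S304 to S310 can be summarized in a short Python sketch. Dictionaries stand in for the coss_files and cluster_info tables, and `fetch_from_rgw` is a hypothetical callback that would issue the HTTP request to the second object storage gateway. Serving only files whose write state is "write complete" and refusing unreadable clusters are assumptions added for the example, not requirements stated in the text.

```python
from typing import Callable, Dict, Optional


def handle_download(file_id: str,
                    coss_files: Dict[str, dict],
                    cluster_info: Dict[int, dict],
                    fetch_from_rgw: Callable[[str, str], bytes]) -> Optional[bytes]:
    """Resolve a file id to its cluster and fetch the file from that cluster's RGW."""
    meta = coss_files.get(file_id)                 # S306: query the file metadata
    if meta is None or meta["w_status"] != 1:      # assumed: serve only complete files
        return None
    cluster = cluster_info.get(meta["cluster"])    # S308: locate the target cluster
    if cluster is None or cluster["rw_states"] not in (0, 1):
        return None                                # assumed: cluster must be readable
    return fetch_from_rgw(cluster["rgw_url"], file_id)  # S310: forward to the RGW
```

The gateway itself holds no file data in this sketch; it only maps the file identifier to the cluster address and relays the request.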
With the scheme provided by this embodiment, the first object storage gateway processes the access request in the file downloading process, and before that the load balancing gateway distributes the access request to the corresponding first object storage gateway according to the preset weights. As a result, the workload of the RGW can be reduced and the performance of the distributed storage system is improved.
Example 2
In accordance with an embodiment of the present invention, there is provided a method embodiment of a data upload method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer executable instructions and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than that described herein.
Fig. 4 is a flowchart of a data uploading method according to an embodiment of the present invention. As shown in fig. 4, the method is applied to a distributed storage system that includes a first object storage gateway, a distributed storage cluster, and a relational database cluster, and comprises the following steps:
Step S402, the first object storage gateway receives a file uploading request;
Step S404, the first object storage gateway obtains metadata of a file to be uploaded corresponding to the file uploading request;
Step S406, the first object storage gateway sends the metadata to the relational database cluster for storage, and sends the data to be uploaded to the second object storage gateway in the distributed storage cluster.
In some embodiments of the present application, there are a plurality of file upload requests.
In some embodiments of the application, before the data to be uploaded is sent to the second object storage gateway in the distributed storage cluster, the load balancing gateway receives the plurality of file upload requests and selects the first object storage gateway from a preset gateway list according to preset weights, where the preset weights are the weights of the gateways in the preset gateway list, and the larger a gateway's weight, the more file upload requests are distributed to it for parallel processing.
In some embodiments of the present application, when the upload type of the file to be uploaded is fragmented upload, the first object storage gateway acquires the first type of metadata from the file upload request; after all fragments corresponding to the file to be uploaded are uploaded, sending a request message to a second object storage gateway, and receiving second type metadata corresponding to the request message and fed back by the second object storage gateway; and when the uploading type of the file to be uploaded is uploading in a form mode, the first object storage gateway acquires the first type of metadata from the data to be uploaded.
The first type of metadata includes: the cluster identifier corresponding to the file to be uploaded, file information of the file to be uploaded, and the write state of the file; the second type of metadata includes: the file size.
Specifically, when the file is large and must be uploaded in fragments, the uploaded fragments contain only file content and no metadata. In this case, the file type and file identifier can be obtained from the HTTP request, while the file size must be obtained by sending a request to the RGW. When the file is uploaded as a form from a client or an app, the uploaded data includes both metadata and data content.
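The two ways of obtaining metadata can be contrasted in a short Python sketch. The function name, the request dictionary layout, the upload type labels, and the `stat_size` callback (standing in for the request sent to the RGW) are assumptions; taking the size of a form upload from the length of its content is also an assumption, since the text does not say how the size is obtained in that case.

```python
from typing import Callable, Dict


def collect_metadata(upload_type: str, request: Dict,
                     stat_size: Callable[[str], int]) -> Dict:
    """Gather file metadata for a form upload or a fragmented upload."""
    if upload_type == "form":
        # Form upload: metadata arrives together with the data content.
        form = request["form"]
        return {"file": form["file"],
                "file_type": form["file_type"],
                "file_size": len(form["content"])}  # assumed source of the size
    if upload_type == "fragment":
        # Fragments carry content only: identifier and type come from the HTTP
        # request, and the size is requested from the RGW after the last fragment.
        return {"file": request["file"],
                "file_type": request["file_type"],
                "file_size": stat_size(request["file"])}
    raise ValueError(f"unknown upload type: {upload_type}")
```

In the fragmented case the size lookup can only succeed once all fragments have been uploaded, which is why it is modelled as a separate call.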
To facilitate understanding, the file uploading steps are explained with a specific example. When uploading a file, an upper-layer application accesses the object storage gateway through the domain name bucketname.os.demo.com, which is resolved through DNS to one of the L4 load balancers; the request is then forwarded to an OpenResty + Lua storage gateway, and the storage gateway forwards the request to the RGW of one of the back-end Ceph clusters according to a weighted round-robin algorithm; finally, the OpenResty + Lua storage gateway writes the file metadata into the MySQL cluster and responds to the client.
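The gateway's part of this example (choose a cluster, send the data to its RGW, then record the metadata) can be sketched in Python as follows. The function and parameter names are hypothetical: `pick_cluster` stands in for the weighted round-robin choice, `put_to_rgw` for the HTTP request to the RGW, and a dictionary for the coss_files table in the MySQL cluster.

```python
from typing import Callable, Dict


def handle_upload(file_id: str, content: bytes, file_type: str,
                  pick_cluster: Callable[[], Dict],
                  put_to_rgw: Callable[[str, str, bytes], None],
                  coss_files: Dict[str, Dict]) -> Dict:
    """Store the data in one cluster, then record the file metadata."""
    cluster = pick_cluster()                          # weighted choice of a cluster
    put_to_rgw(cluster["rgw_url"], file_id, content)  # forward the data to the RGW
    record = {"cluster": cluster["cluster"],
              "file": file_id,
              "file_type": file_type,
              "file_size": len(content),
              "w_status": 1}                          # 1 = write complete
    coss_files[file_id] = record                      # write metadata, then respond
    return record
```

Writing the metadata only after the RGW call returns means a failed data write leaves no record behind in this sketch; how the actual gateway orders error handling is not described in the text.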
With the scheme provided by the embodiment, in the file uploading process, the load balancing gateway receives a plurality of file uploading requests, and then selects the first object storage gateway from the preset gateway list according to the preset weight, so that the workload of the RGW can be reduced, and the working efficiency of the distributed storage system is improved.
Example 3
According to an embodiment of the present invention, there is also provided a data downloading apparatus as shown in fig. 5, which is applied to a first object storage gateway in a distributed storage system and is configured to execute the data downloading method shown in fig. 3. The data downloading apparatus includes: a first processing module 50, configured to receive an access request; an obtaining module 52, configured to obtain a target file identifier in the access request; a first lookup module 54, configured to determine a target relational database corresponding to the target file identifier from the relational database cluster, and query metadata of a target file corresponding to the target file identifier from the target relational database, where the metadata is description information of the target file; a second lookup module 56, configured to determine, from the plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed cluster; and a second processing module 58, configured to forward the access request to the second object storage gateway in the target distributed cluster, so as to obtain the target file from the second object storage gateway.
Since the data download apparatus shown in fig. 5 is used to execute the data download method shown in fig. 3, the related explanation in embodiment 1 is also applicable to this embodiment.
Example 4
According to an embodiment of the present invention, there is further provided a data uploading apparatus as shown in fig. 6, which is applied to a first object storage gateway in a distributed storage system and is configured to perform the data uploading method shown in fig. 4. The data uploading apparatus includes: a processing module 60, configured to receive a file upload request; an obtaining module 62, configured to obtain metadata of a file to be uploaded corresponding to the file upload request; and a sending module 64, configured to send the metadata to the relational database cluster for storage, and send the data to be uploaded to a second object storage gateway in the distributed storage cluster.
Since the data uploading apparatus shown in fig. 6 is used to execute the data uploading method shown in fig. 4, the related explanation in embodiment 2 is also applicable to this embodiment.
Example 5
In some embodiments of the present application, there is also provided a non-volatile storage medium, where the non-volatile storage medium includes a stored program, and when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following data downloading method: the first object storage gateway receives an access request; the first object storage gateway obtains a target file identifier in the access request; the first object storage gateway determines a target relational database corresponding to the target file identifier from the relational database cluster, and queries metadata of a target file corresponding to the target file identifier from the target relational database, where the metadata is description information of the target file; the first object storage gateway determines, from a plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed cluster; the first object storage gateway forwards the access request to the second object storage gateway in the target distributed cluster to obtain the target file from the second object storage gateway.
Example 6
In some embodiments of the present application, a non-volatile storage medium is further provided, where the non-volatile storage medium includes a stored program, and when the program runs, the device in which the non-volatile storage medium is located is controlled to execute the following data uploading method: a first object storage gateway receives a file upload request; the first object storage gateway obtains metadata of a file to be uploaded corresponding to the file upload request; and the first object storage gateway sends the metadata to the relational database cluster for storage, and sends the data to be uploaded to a second object storage gateway in the distributed storage cluster.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described in detail in a certain embodiment.
In the embodiments provided in the present application, it should be understood that the disclosed technical content can be implemented in other manners. The above-described apparatus embodiments are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or may be integrated into another system, or some features may be omitted, or may not be executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (14)

1. A data downloading method is characterized in that the method is applied to a distributed storage system, and the distributed storage system comprises: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the method comprising:
the first object storage gateway receiving an access request;
the first object storage gateway acquires a target file identifier in the access request;
the first object storage gateway determines a target relational database corresponding to the target file identifier from the relational database cluster, and inquires metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file;
the first object storage gateway determining, from a plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster;
the first object storage gateway forwards the access request to the second object storage gateway in the target distributed cluster to obtain the target file from the second object storage gateway.
2. The method of claim 1, wherein the distributed storage system comprises: a load balancing gateway; the first object storage gateway receiving an access request, comprising:
the first object storage gateway receives the access request forwarded from the load balancing gateway.
3. The method of claim 2, wherein before the first object storage gateway receives the access request forwarded from the load balancing gateway, the method further comprises:
the load balancing gateway receives a plurality of the access requests; and selecting the first object storage gateway from a preset gateway list according to a preset weight, wherein the preset weight is the weight of each gateway in the preset gateway list, and the larger the weight is, the larger the number of distributed access requests for parallel processing is.
4. The method of claim 1, wherein the first object storage gateway determining, from a plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster, comprises:
the first object storage gateway determines a target distributed storage cluster corresponding to the metadata from a cluster information table in the target relational database; determining the second object storage gateway from a table of metadata information in the target relational database.
5. The method of claim 4, wherein the cluster information table comprises at least one of: a cluster identifier, the readable and writable state of the cluster, and cluster address information; the metadata information table comprises at least one of: the cluster identifier of the target file, file information of the target file, the write state of the file, and the file size.
6. The method of claim 5, wherein the readable and writable states of the cluster and the write state of the file are determined by: a relational database in the relational database cluster responds to a modification request of a target object and modifies the readable and writable states of the cluster and the writing state of the file; and when a trigger event is detected, triggering to execute an operation corresponding to the read-write state of the modified cluster and the write state of the file on the file stored by the second object storage gateway.
7. A data uploading method, applied to a distributed storage system, the distributed storage system comprising: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the method comprising:
the first object storage gateway receives a file uploading request;
the first object storage gateway acquires metadata of a file to be uploaded corresponding to the file uploading request;
and the first object storage gateway sends the metadata to a relational database cluster for storage, and sends the data to be uploaded to a second object storage gateway in a distributed storage cluster.
8. The method of claim 7, wherein there are a plurality of file upload requests; before sending the data to be uploaded to a second object storage gateway in the distributed storage cluster, the method further comprises:
a load balancing gateway receives a plurality of file uploading requests; and selecting the first object storage gateway from a preset gateway list according to a preset weight, wherein the preset weight is the weight of each gateway in the preset gateway list, and the larger the weight is, the larger the number of distributed file uploading requests subjected to parallel processing is.
9. The method of claim 7, further comprising:
when the uploading type of the file to be uploaded is fragmented uploading, the first object storage gateway acquires first type metadata from the file uploading request; after uploading of all fragments corresponding to the file to be uploaded is completed, sending a request message to the second object storage gateway, and receiving second type metadata corresponding to the request message and fed back by the second object storage gateway;
and when the uploading type of the file to be uploaded is uploading in a form mode, acquiring the first type of metadata from the data to be uploaded.
10. The method of claim 9, wherein the first type of metadata comprises: the cluster identifier corresponding to the file to be uploaded, file information of the file to be uploaded, and the write state of the file; the second type of metadata comprises: the file size.
11. A distributed storage system, comprising: a first object storage gateway, a distributed storage cluster, and a relational database cluster; wherein,
the first object storage gateway is used for receiving an access request; acquiring a target file identifier in the access request; determining a target relational database corresponding to the target file identifier from the relational database cluster, and querying metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file; determining, from a plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster; forwarding the access request to the second object storage gateway in the target distributed cluster so as to acquire the target file from the second object storage gateway;
the relational database cluster is used for storing the metadata;
the distributed storage cluster is used for storing the target file.
12. A data downloading apparatus, applied to a first object storage gateway in a distributed storage system, the distributed storage system comprising: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the apparatus comprising:
the first processing module is used for receiving an access request;
the acquisition module is used for acquiring the target file identifier in the access request;
the first searching module is used for determining a target relational database corresponding to the target file identifier from the relational database cluster and inquiring metadata of a target file corresponding to the target file identifier from the target relational database, wherein the metadata is description information of the target file;
a second lookup module, configured to determine, from among the plurality of distributed storage clusters corresponding to the first object storage gateway, a target distributed storage cluster corresponding to the metadata and a second object storage gateway in the target distributed storage cluster;
a second processing module, configured to forward the access request to the second object storage gateway in the target distributed cluster, so as to obtain the target file from the second object storage gateway.
13. A data uploading apparatus, applied to a first object storage gateway in a distributed storage system, the distributed storage system comprising: a first object storage gateway, a distributed storage cluster, and a relational database cluster, the apparatus comprising:
a processing module, configured to receive a file upload request;
an acquisition module, configured to acquire metadata of a file to be uploaded corresponding to the file upload request;
and a sending module, configured to send the metadata to the relational database cluster for storage and to send the file to be uploaded to a second object storage gateway in the distributed storage cluster.
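The upload path of the three modules above can be sketched in the same spirit. The function name `upload`, the request fields, the metadata layout, the CRC32 database selection, and the "fewest stored objects" placement rule are assumptions made only for illustration; the claim itself only requires that the metadata go to the relational database cluster and the file go to a second object storage gateway.

```python
import zlib


def upload(request: dict, databases: list, clusters: dict) -> dict:
    """Hypothetical upload handler for the first object storage gateway.

    databases: relational database cluster, modelled as a list of dicts.
    clusters:  cluster name -> dict standing in for that cluster's second gateway.
    """
    file_id, data = request["file_id"], request["data"]

    # Assumption: place the file in the cluster currently holding the fewest objects.
    cluster_name = min(clusters, key=lambda name: len(clusters[name]))

    # Acquisition module: build the metadata (description information) of the file.
    metadata = {"file_id": file_id, "size": len(data), "cluster": cluster_name}

    # Sending module, part 1: store the metadata in the relational database cluster.
    # Assumption: the database is selected by hashing the file identifier.
    databases[zlib.crc32(file_id.encode()) % len(databases)][file_id] = metadata

    # Sending module, part 2: hand the file to the second gateway of the chosen cluster.
    clusters[cluster_name][file_id] = data
    return metadata
```

Recording the chosen cluster in the metadata at upload time is what a later download would rely on to find the right second gateway.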
14. A non-volatile storage medium, comprising a stored program, wherein, when the program runs, a device in which the non-volatile storage medium is located is controlled to execute the data downloading method according to any one of claims 1 to 6.
CN202110809233.6A 2021-07-16 2021-07-16 Data downloading method, data uploading method and distributed storage system Pending CN115623081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110809233.6A CN115623081A (en) 2021-07-16 2021-07-16 Data downloading method, data uploading method and distributed storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110809233.6A CN115623081A (en) 2021-07-16 2021-07-16 Data downloading method, data uploading method and distributed storage system

Publications (1)

Publication Number Publication Date
CN115623081A true CN115623081A (en) 2023-01-17

Family

ID=84855042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110809233.6A Pending CN115623081A (en) 2021-07-16 2021-07-16 Data downloading method, data uploading method and distributed storage system

Country Status (1)

Country Link
CN (1) CN115623081A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116737466A (en) * 2023-08-15 2023-09-12 中移(苏州)软件技术有限公司 Backup processing method, device, system, electronic equipment and readable storage medium
CN116737466B (en) * 2023-08-15 2023-11-03 中移(苏州)软件技术有限公司 Backup processing method, device, system, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
US11061917B1 (en) Method and system for transparent database query caching
US9787780B1 (en) Method and apparatus for web based storage on-demand
CN106506587B (en) Docker mirror image downloading method based on distributed storage
US10659523B1 (en) Isolating compute clusters created for a customer
US8463867B2 (en) Distributed storage network
CN105579965B (en) Via the client guard station resources control of provider's defining interface
US8484242B1 (en) Method and system for transparent database connection pooling and query queuing
US10592475B1 (en) Consistent data storage in distributed computing systems
CA2890411C (en) System and method for managing dedicated caches
Krishnan et al. Google compute engine
JP2018518744A (en) Automatic scaling of resource instance groups within a compute cluster
US11388234B2 (en) Infrastructure for deploying a security information and event management application on a container platform
CN101924693A (en) Method and system for migrating processes between virtual machines
CN112256399B (en) Docker-based Jupyter Lab multi-user remote development method and system
CN106648838B (en) Resource pool management configuration method and device
US20220342707A1 (en) Infrastructure for deploying a security information and event management application on a container platform
US9172744B2 (en) Scalable storage with programmable networks
CN105071965A (en) Management system of network equipment
US11765244B1 (en) Latency-based service discovery and routing for multi-location service-oriented applications
CN115623081A (en) Data downloading method, data uploading method and distributed storage system
CN112583760A (en) Object storage access method, device, equipment and computer storage medium
CN113885797A (en) Data storage method, device, equipment and storage medium
CN106254411A (en) System, server system and method for providing a service
JP6607044B2 (en) Server device, distributed file system, distributed file system control method, and program
US11363113B1 (en) Dynamic micro-region formation for service provider network independent edge locations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination