CN113971162A

CN113971162A - Data access method and device

Info

Publication number: CN113971162A
Application number: CN202010710358.9A
Authority: CN
Inventors: 徐佳宏; 陈华兵; 黄金龙; 曾珂
Original assignee: Shenzhen Ipanel TV Inc
Current assignee: Shenzhen Ipanel TV Inc
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2022-01-25

Abstract

The application discloses a data access method and device applied to a storage server, where the storage server comprises at least two disk groups and each disk group comprises at least one disk. In the method, the storage server obtains a data access request sent by an electronic device, the data access request being used to request access to a data resource; generates, based on the data access request, an information digest characterizing the data resource; if a storage mark corresponding to the information digest is cached in memory, generates, based on the data access request, a hash value characterizing the data resource; determines, based on a consistent hash algorithm, the target disk group to which the hash value is mapped among the at least two disk groups; searches the target disk group for the data resource; and returns the found data resource to the electronic device. The scheme of the application can improve both the reliability and the performance of data storage.

Description

Data access method and device

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a data access method and apparatus.

Background

The electronic device can obtain various types of data resources such as pictures, texts, videos and the like by accessing the network server. However, the electronic device requests the data resource from the network server directly, and may need to go through multiple layers of routes, resulting in inefficient data access.

In order to improve data access efficiency, a storage server is separately arranged in many data access scenarios, for example, a storage server for caching data may be arranged in a local area network of some companies, or a storage server is deployed based on a content distribution network, and the like. Specifically, after the electronic device sends a data resource request to the storage server, if the storage server stores a data resource corresponding to the data resource request, the locally stored data resource can be directly returned to the electronic device without requesting the data resource from the network server; if the data resource does not exist in the storage server, the data resource is requested from the network server and returned to the electronic equipment, and meanwhile, the storage server stores the data resource locally.

Therefore, if the data processing and storage performance of the storage server is poor, not only the data access efficiency is affected, but also the number of times that the storage server requests the network server for the data resource may be increased, thereby causing the network resource consumption, and therefore, how to improve the data processing and storage performance of the storage server is a technical problem that needs to be solved by those skilled in the art.

Disclosure of Invention

In view of this, the present application provides a data access method and apparatus to improve data storage performance of a storage server.

In order to achieve the purpose, the application provides the following technical scheme:

In one aspect, the present application provides a data access method applied to a storage server, where the storage server includes at least two disk groups, each disk group including at least one disk, and the method includes:

obtaining a data access request sent by electronic equipment, wherein the data access request is used for requesting to access a data resource;

generating an information summary for characterizing the data resource based on the data access request;

if a storage mark corresponding to the information abstract is cached in the memory, generating a hash value for representing the data resource based on the data access request, wherein the storage mark indicates that the data resource is stored in the storage server;

determining a target disk group to which the hash value is mapped from the at least two disk groups based on a consistent hash algorithm;

searching the data resource from the target disk group;

and returning the searched data resource to the electronic equipment.

Preferably, the data access request carries a data resource name of the data resource requested to be accessed;

generating, based on the data access request, an information summary for characterizing the data resource, including:

generating an information abstract of the data access request;

or generating an information abstract of the data resource name in the data access request.

Preferably, the searching the data resource from the target disk group includes:

constructing a storage path of the data resource in the target disk group based on the information abstract and a set directory construction rule;

and searching the data resource from the target disk group according to the storage path.

Preferably, the constructing a storage path of the data resource in the target disk group based on the information summary and the set directory construction rule includes:

determining the name of the target disk group as a root directory;

constructing at least two layers of subdirectories under the root directory according to the information abstract and a multilevel directory construction rule;

and determining the information abstract as a file name in a storage path, and splicing the root directory, the at least two layers of subdirectories and the file name into the storage path.

Preferably, the disk group comprises a redundant array of independent disks consisting of at least two disks;

the searching the data resource from the target disk group comprises:

and obtaining the data resource from the redundant array of independent disks of the target disk group.

Preferably, the caching of the storage tag corresponding to the information digest in the memory includes:

and the memory caches the file updating time corresponding to the information abstract.

Preferably, the method further comprises the following steps:

if the memory does not cache the storage mark corresponding to the information abstract, requesting the data resource from a network server according to the data access request;

obtaining the data resource returned by the network server;

generating a hash value for characterizing the data resource based on the data access request;

determining, from the at least two disk groups, a disk group to which the hash value maps based on a consistent hash algorithm;

storing the data resource returned by the network server to the disk group mapped by the hash value;

caching a storage mark corresponding to the information abstract in a memory;

and sending the data resource returned by the network server to the electronic equipment.

In another aspect, the present application further provides a data access apparatus applied to a storage server, where the storage server includes at least two disk groups, each disk group includes at least one disk, and the apparatus includes:

the request obtaining unit is used for obtaining a data access request sent by the electronic equipment, and the data access request is used for requesting to access a data resource;

the abstract generating unit is used for generating an information abstract used for representing the data resource based on the data access request;

a first hash calculation unit, configured to generate, based on the data access request, a hash value used for characterizing the data resource, if a storage flag corresponding to the information digest is cached in a memory, where the storage flag indicates that the data resource is stored in the storage server;

a first disk positioning unit, configured to determine, based on a consistent hash algorithm, a target disk group to which the hash value is mapped from the at least two disk groups;

a resource searching unit, configured to search the data resource from the target disk group;

and the first resource returning unit is used for returning the searched data resource to the electronic equipment.

Preferably, the data access request obtained by the request obtaining unit carries a data resource name of the data resource requested to be accessed;

the summary generating unit is specifically configured to generate an information summary of the data access request; or generating an information abstract of the data resource name in the data access request.

Preferably, the apparatus further includes:

a resource request unit, configured to request the data resource from a network server according to the data access request if a storage tag corresponding to the information summary is not cached in a memory;

a resource obtaining unit, configured to obtain the data resource returned by the network server;

the second hash calculation unit is used for generating a hash value used for representing the data resource based on the data access request;

the second disk positioning unit is used for determining a disk group to which the hash value is mapped from the at least two disk groups based on a consistent hash algorithm;

a resource storage unit, configured to store the data resource returned by the network server to the disk group to which the hash value is mapped;

a storage tag cache unit, configured to cache the storage tag corresponding to the information digest in a memory;

and the second resource returning unit is used for sending the data resource returned by the network server to the electronic equipment.

As can be seen from the above, after the storage server generates the information summary corresponding to the data resource requested by the data access request based on the data access request, whether the data resource is stored locally can be analyzed by detecting whether the storage flag corresponding to the information summary is cached in the memory, which is beneficial to determining whether the data resource requested by the electronic device is stored locally with high efficiency, and improves the data processing performance of the storage server.

Moreover, the storage server stores data resources in at least two disk groups based on the consistent hash algorithm. Once the hash value characterizing the requested data resource is determined, the target disk group storing the data resource can be determined based on the consistent hash algorithm, so the data resource can be located quickly. Meanwhile, even if a disk group is added or some disk groups become abnormal, the disk groups in normal use are not affected. This avoids the situation in which all data access requests are sent to the network server because a disk group was added or became abnormal, and it improves both the reliability and the performance of data storage.

Drawings

To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 shows a schematic flow chart diagram of one embodiment of a data access method of the present application;

FIG. 2 shows a schematic flow chart diagram of yet another embodiment of the data access method of the present application;

FIG. 3 shows a schematic diagram of various points on a hash ring in the present application;

FIG. 4 is a schematic diagram illustrating a mapping relationship between a disk group and a picture and a point on a hash ring, respectively;

FIG. 5 is a schematic diagram illustrating another mapping relationship between a disk group and a picture and a point on a hash ring, respectively;

FIG. 6 is a schematic diagram illustrating another mapping relationship between a disk group and a picture and a point on a hash ring, respectively;

FIG. 7 is a schematic diagram illustrating a mapping relationship between a disk group, a virtual disk group, and a picture and a point on a hash ring in the present application;

FIG. 8 shows a schematic structural diagram of an embodiment of the data access apparatus of the present application.

Detailed Description

The embodiments of the present application are suitable for any scenario in which data access efficiency needs to be improved by means of a storage server. For example, to avoid the low data access efficiency caused by devices inside an enterprise accessing data on servers in an external network, a storage server may be provided in the enterprise's internal network, so that computer devices inside the enterprise can obtain data resources through the storage server. This is only an example; in practice, the application is equally applicable to other scenarios that require a storage server.

For example, referring to fig. 1, a flow chart of an embodiment of a data access method according to the present application is shown, and the method of the present embodiment may be applied to a storage server, where the storage server includes at least two disk groups, and each disk group includes at least one disk. The method of the embodiment may include:

S101, obtaining a data access request sent by the electronic equipment.

Wherein the data access request is for requesting access to a data resource. The data resource requested by the data access request may be a picture, a document, a video, and the like.

The data access request may indicate the data resource to be requested. For example, the data access request may carry the data resource name of the requested data resource, and the data resource name may be a file name.

It is understood that, in order to be able to specify to which web server the data access request requests the data resource, the data access request may also carry an address of the web server, and the like.

The specific format and form of the data access request may be various, and is not limited. For convenience of understanding, for example, taking the data access request as a URL request, the URL request may include: the address of the web server and the file name of the requested resource.
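As a rough illustration of the URL case, the following sketch splits a URL request into the web server address and the file name of the requested resource. The URL and the helper name `split_request` are hypothetical; the application does not prescribe any particular parsing method.

```python
import posixpath
from typing import Tuple
from urllib.parse import urlsplit


def split_request(url: str) -> Tuple[str, str]:
    """Return (web server address, file name) for a URL request."""
    parts = urlsplit(url)
    # netloc holds the host (and port, if any) of the web server.
    server = parts.netloc
    # The last path segment is treated here as the data resource name.
    file_name = posixpath.basename(parts.path)
    return server, file_name


# Hypothetical URL request used only for illustration.
server, file_name = split_request("http://example.com/images/cat.jpg")
```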

S102, generating an information summary for representing the data resource based on the data access request.

The information digest (also rendered as information summary or information abstract in this application) is a value that can uniquely characterize the data resource and is calculated using a message digest algorithm.

In one possible implementation, the application may generate a message digest of the data access request. It is understood that the data access request may include information uniquely identifying the data resource, for example, a name of the requested data resource, an address of a network server where the data resource is located, and the like, and therefore, the information digest of the data access request is directly calculated by using an information digest algorithm, and the obtained value may represent the data resource requested by the data access request.

In the embodiments of the present application, the information digest may be calculated in many ways, and the application does not limit them. Optionally, the message digest algorithm may be MD5 (Message-Digest Algorithm 5); if the data access request is a URL request, the MD5 value of the URL request may be calculated.

In yet another possible implementation, the information digest may be generated from the data resource name carried in the data access request. When the data resource name carried in the data access request can uniquely characterize the data resource, the digest of the data resource name can be calculated using a message digest algorithm. As above, the message digest algorithm here may be MD5 or another message digest algorithm, and the application does not limit it.
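A minimal sketch of both options, assuming MD5 via Python's standard `hashlib` module and UTF-8 encoding of the input text. The function name `information_digest` and the sample URL are illustrative only.

```python
import hashlib


def information_digest(text: str) -> bytes:
    """Return the 16-byte MD5 digest of a request URL or resource name."""
    return hashlib.md5(text.encode("utf-8")).digest()


url = "http://example.com/images/cat.jpg"      # hypothetical URL request

digest_of_request = information_digest(url)    # option 1: digest of the whole request
digest_of_name = information_digest("cat.jpg") # option 2: digest of the resource name only
```

Digesting only the resource name is appropriate only when the name alone identifies the resource; two servers hosting different files named `cat.jpg` would otherwise collide.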

S103, if the memory stores the storage flag corresponding to the information digest, generating a hash value for representing the data resource based on the data access request.

Wherein, the storage mark indicates that the storage server stores the data resource. For example, the storage tag may be an identification character, for example, the storage tag may be "Y". Correspondingly, if the storage mark corresponding to the information abstract is cached in the memory, it indicates that the data resource corresponding to the information abstract is stored in the local disk group of the storage server.

As an alternative, in order to update or delete locally stored data resources in time, the storage server needs to store the file update time of each data resource, which may be the time at which the data resource was last saved locally on the storage server. In this case, to avoid storing a separate storage mark, the file update time of a data resource may be used as its storage mark.

Correspondingly, if the data resource is stored in the storage server, the storage server has corresponding file updating time, and on the basis, the storage server can cache the file updating time corresponding to each of the information summaries representing different data resources. Therefore, if the file update time corresponding to the information abstract is cached in the memory, it indicates that the data resource corresponding to the information abstract is locally stored.

It can be understood that after the information abstract is calculated based on the data access request, according to whether the storage mark corresponding to the information abstract is cached in the memory, whether the storage server locally stores the corresponding data resource can be quickly inquired, so that the situation that whether the data resource is stored or not is judged by directly inquiring the disk group is avoided, and the data access efficiency is improved.

In addition, compared with the data access request or the data resource name in the data access request, the data volume of the generated information abstract is smaller, and the occupied memory space is also smaller.
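The lookup described above can be sketched with an in-memory table that maps each information digest to a file update time, so that the presence of an entry acts as the storage mark. The table and function names are hypothetical. A Python `dict` carries its own per-entry overhead, so this shows only the lookup logic, not a compact memory layout.

```python
import hashlib
import time
from typing import Dict, Optional

# Hypothetical in-memory table: 16-byte digest -> file update time (Unix seconds).
storage_marks: Dict[bytes, int] = {}


def digest(text: str) -> bytes:
    return hashlib.md5(text.encode("utf-8")).digest()


def mark_stored(request: str, update_time: Optional[int] = None) -> None:
    """Record that the resource for this request is stored locally."""
    if update_time is None:
        update_time = int(time.time())
    storage_marks[digest(request)] = update_time


def is_stored_locally(request: str) -> bool:
    """A cached update time doubles as the storage mark."""
    return digest(request) in storage_marks


mark_stored("http://example.com/a.jpg", update_time=1595376000)
```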

To determine efficiently whether the storage server locally stores a given data resource, the inventors also considered caching in memory an information digest characterizing the data resource together with attribute information of the data resource, such as its storage path. However, storing the attributes of data resources requires a large amount of space, and as the number of data resources stored by the storage server grows, the number of resource entries to be cached grows with it. If the memory of the storage server is insufficient, the storage server has to reduce the amount of data resources it stores, which inevitably increases the number of access requests that must be sent to the back-end network server and thus degrades data access efficiency.

For example, take the data access request to be a URL request, and cache each data resource as a key-value pair whose key identifies the URL request and whose value is the storage path and related information of the data resource. If each key-value pair occupies 256 bytes, then about 4,000 key-value pairs fit in 1 MB and about 4 million fit in 1 GB. Storing 100 million entries would then take 100 million × 256 bytes, or about 25 GB of memory, which the memory resources of the storage server cannot provide.

In the present application, by contrast, the storage mark occupies little space. For example, if the storage mark is the file update time, each storage mark occupies only 4 bytes, and an information digest generated from a URL request occupies only 16 bytes, so each key-value pair formed by an information digest and a storage mark occupies 20 bytes in total. Storing 100 million key-value pairs therefore takes only about 2 GB. This keeps the check for whether a data resource exists locally efficient, which improves access efficiency, and avoids the problem of the storage server storing fewer data resources because it cannot cache a large number of key-value pairs.
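The two memory estimates can be checked with a few lines of arithmetic. The figures come from the text (256 bytes per attribute entry, 16-byte digest plus 4-byte update time); decimal gigabytes are assumed, and the variable names are illustrative.

```python
ENTRIES = 100_000_000           # 100 million key-value pairs

# Caching attribute information: about 256 bytes per pair.
attr_bytes = ENTRIES * 256

# Caching only a storage mark: 16-byte MD5 digest + 4-byte update time.
mark_bytes = ENTRIES * (16 + 4)

print(attr_bytes / 10**9)       # 25.6 (GB, decimal)
print(mark_bytes / 10**9)       # 2.0 (GB, decimal)
```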

The hash value is a value obtained by performing a hash operation based on the data access request. Similar to the generated information abstract, the hash operation can be performed on the data access request to obtain a hash value for representing the data resource requested to be accessed by the data access request; or, performing hash operation on the data resource name carried in the data access request to obtain the hash value.

S104, based on the consistent hash algorithm, determining the target disk group to which the hash value is mapped from the at least two disk groups.

In an embodiment of the present application, the storage server stores data resources into at least two disk groups based on a consistent hash algorithm. On this basis, each disk group of the storage server corresponds to a value range; each value range includes at least one value, and each value is an integer. The value ranges of different disk groups differ, and the value ranges of the at least two disk groups together form a hash ring. The hash ring is a circle consisting of 2³² points, each of which corresponds to a value. The specific process by which the storage server stores data resources into disk groups based on the consistent hash algorithm is described in detail later and is not repeated here.

It can be understood that, since each disk group corresponds to one numerical range, after the hash value corresponding to the data resource to be accessed is calculated, the numerical range to which the hash value is mapped may be determined according to the numerical range corresponding to each disk group in the storage server, so as to obtain the disk group corresponding to the determined numerical range, and for convenience of distinguishing, the determined disk group is referred to as a target disk group.

In one implementation, the hash value characterizing the data resource may be taken modulo 2³² to obtain a remainder. The remainder is an integer from 0 to 2³² − 1, so it corresponds to a point on the hash ring. Since every point on the hash ring belongs to the value range of some disk group in the storage server, the disk group whose value range includes the remainder can be determined as the target disk group.
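The following is a minimal sketch of such a lookup under stated assumptions, not the application's exact scheme. It assumes that each disk group is placed on the ring at the hash of its name, that a group owns the points from just after the previous group's point up to and including its own, and that MD5 serves as the hash function. The group names and function names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

RING_SIZE = 2 ** 32


def ring_point(text: str) -> int:
    """Hash a string and reduce it modulo 2**32 to a point on the ring."""
    value = int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")
    return value % RING_SIZE


def build_ring(group_names: List[str]) -> List[Tuple[int, str]]:
    """Place each disk group on the ring (assumed: at the hash of its name)."""
    return sorted((ring_point(name), name) for name in group_names)


def target_disk_group(ring: List[Tuple[int, str]], request: str) -> str:
    """Return the group whose value range contains the request's ring point."""
    points = [point for point, _ in ring]
    index = bisect.bisect_left(points, ring_point(request))
    # Past the last group's point, wrap around to the first group.
    return ring[index % len(ring)][1]


ring = build_ring(["/mnt/hdisk1", "/mnt/hdisk2", "/mnt/hdisk3"])
group = target_disk_group(ring, "http://example.com/images/cat.jpg")
```

With this layout, removing one disk group changes the target only for requests that mapped to the removed group; requests mapped to the remaining groups keep their targets.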

S105, searching the data resource from the target disk group.

There are various ways to find the data resource.

For example, in one possible implementation, the data resource may be queried across the target disk group.

In order to improve the query efficiency, in another possible implementation manner, the storage path of the data resource may be determined first.

Specifically, a storage path of the data resource in the target disk group may be constructed based on the information summary generated in step S102 and a preset directory construction rule. Then, according to the constructed storage path, the data resource can be directly searched from the target disk group.

The directory construction rule is a rule for determining a storage path for storing the data resource in the disk group when the storage server stores the data resource in any disk group. Therefore, after the target disk group where the data resource to be accessed is located is determined, a storage path for characterizing the storage address of the data resource stored in the target disk group can be constructed based on the directory construction rule.

The directory construction rule may specify how the information digest is converted into the storage path. For example, the rule may set the information digest as the file name and set at least one specified character combination taken from the information digest as a subdirectory, so that the subdirectory and the file name form a storage path for locating the data resource in the target disk group.

In one possible implementation, multi-level directories may be used within a disk. This addresses two problems: the operating system may not support too many files in one directory (for example, the upper limit on the number of files stored in each directory in a distributed file system is 1048576), and when a single directory holds too many files, previewing and searching files becomes cumbersome and time-consuming. Accordingly, the directory construction rule may be a multi-level directory construction rule. On this basis, the name of the target disk group is determined as the root directory; at least two layers of subdirectories are then constructed under the root directory according to the information digest and the multi-level directory construction rule; and the information digest is determined as the file name in the storage path, with the root directory, the at least two layers of subdirectories, and the file name spliced into the storage path.

The multi-level directory construction rule may include which part of the information abstract is which layer of sub-directory, and the like, so that the sub-directory of each level can be determined according to the information abstract and the multi-level directory construction rule.

For ease of understanding, one scenario is illustrated:

Taking a three-level directory storage structure as an example, the directory construction rule is a three-level directory construction rule. Taking the information digest to be the MD5 value of the data access request, assume that the MD5 value is 00004178329.

The three-level directory construction rule may include: the MD5 value is used as the file name, the last character of the MD5 value is the first-layer subdirectory, the 9th to 10th characters of the MD5 value are the second-layer subdirectory, and the 6th to 8th characters of the MD5 value are the third-layer subdirectory.

Then, assuming the root directory corresponding to the target disk group located by the consistent hash algorithm is /mnt/hdisk1, the first-layer subdirectory is 9, the second-layer subdirectory is 32, the third-layer subdirectory is 178, and the file name is the MD5 value 00004178329. The storage path obtained by splicing these together is /mnt/hdisk1/9/32/178/00004178329.

Of course, the above is merely an example, and in practical applications, there may be other directory building rules and other possibilities, which are not limited to this.
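The example rule can be written as a short function. This is only a sketch of that one illustrative rule; the function name `build_storage_path` is hypothetical, and the root directory is taken from the example.

```python
import posixpath


def build_storage_path(root: str, digest: str) -> str:
    """Build root/<last char>/<chars 9-10>/<chars 6-8>/<digest>."""
    level1 = digest[-1]      # last character
    level2 = digest[8:10]    # 9th to 10th characters (1-based)
    level3 = digest[5:8]     # 6th to 8th characters (1-based)
    return posixpath.join(root, level1, level2, level3, digest)


path = build_storage_path("/mnt/hdisk1", "00004178329")
print(path)  # /mnt/hdisk1/9/32/178/00004178329
```

Because the path is derived entirely from the digest and the disk group name, no per-file path needs to be cached in memory.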

In the application, the data resources are stored by adopting a multilevel directory structure, so that the requirement of larger data storage can be met.

For example, suppose the directory structure for storing files has two layers, with the directory names of the two layers occupying 4 bits and 8 bits respectively. The first-level directory names 0-f then give 16 directories and the second-level directory names 00-ff give 256 directories, so a total of 16 × 256 = 4096 file directories can be generated.

As an optional manner, when the data resources required to be stored are more, the data resources may be stored by using a three-level directory structure, and correspondingly, the multi-level directory construction rule is a three-level directory construction rule.

When a three-level directory is used to store data resources, the first-level directory names 0-f give 16 directories, the second-level directory names 00-ff give 256 directories, and the third-level directory names 000-fff give 4096 directories. The total number of directories is 16 × 256 × 4096 = 16777216, that is, 16M file directories. When on the order of 100 million files (that is, data resources) are stored, the number of data resources in each directory is 10⁸ / 16M ≈ 6, which makes it much easier to confirm the content and related information of the data resources.
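The directory counts for the two-level and three-level layouts can be verified directly. The sketch assumes hexadecimal directory names of one, two, and three characters and an even spread of 100 million files; the variable names are illustrative.

```python
level1 = 16 ** 1      # names 0-f
level2 = 16 ** 2      # names 00-ff
level3 = 16 ** 3      # names 000-fff

two_level_dirs = level1 * level2               # 4096
three_level_dirs = level1 * level2 * level3    # 16777216 (16M)

files = 10 ** 8                                # 100 million data resources
files_per_dir = files / three_level_dirs       # just under 6 per directory
```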

As an alternative, to improve the security and reliability of the data resources stored in the storage server, a disk group in the present application may include a Redundant Array of Independent Disks (RAID) composed of at least two disks. The specific type of RAID in a disk group may be set as needed; for example, each disk group may include a RAID5 array. In addition to the RAID, a disk group may include at least one spare disk, so that a disk in the RAID that fails or becomes abnormal can be replaced with the spare disk.

Alternatively, each disk group may include at least 8 disks, where 7 disks form a 6+1 RAID5 array, and an extra disk serves as a spare disk.

RAID5 stores data and the corresponding parity information across the disks that make up the array, with the parity information and its corresponding data stored on different disks. On this basis, when one disk of the RAID5 array is damaged, the data on that disk can still be recovered from the other disks, so data integrity is not affected and data security is ensured. When the damaged disk is replaced, the RAID automatically uses the remaining data and parity information to rebuild the data on that disk, maintaining the high reliability of RAID5. RAID5 therefore provides data redundancy and ensures data security, and it can also improve data read and write performance.
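The recovery property can be illustrated with XOR parity, which is how RAID5 parity is commonly computed. The sketch below models a single stripe of a 6+1 array with made-up block contents. It ignores the rotation of parity across disks and is not a description of any particular RAID implementation.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Six data blocks of one stripe, as in a 6+1 array (contents are made up).
data = [bytes([i] * 4) for i in range(1, 7)]
parity = xor_blocks(data)

# Simulate losing the third disk and rebuilding its block from the rest.
lost = 2
survivors = data[:lost] + data[lost + 1:] + [parity]
rebuilt = xor_blocks(survivors)
```

XOR-ing the five surviving data blocks with the parity block cancels every term except the missing block, which is why a single failed disk can be rebuilt.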

On the basis that the disk group comprises RAID, the data resource is actually inquired and obtained from the redundant array of independent disks of the target disk group.

S106, returning the searched data resource to the electronic equipment.

Therefore, after the storage server generates, based on the data access request, the information digest corresponding to the requested data resource, it can determine whether the data resource is stored locally by checking whether a storage mark corresponding to the information digest is cached in memory. This makes it more efficient to determine whether the data resource requested by the electronic device is stored locally and improves the data processing performance of the storage server. Moreover, as described above, the information digest and the storage mark occupy little space, so storage marks for a larger number of data resources can be cached in memory, which helps avoid the situation in which the storage server cannot store more data resources because the memory cannot cache their resource entries.
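Putting steps S101 to S106 together, the hit path can be sketched end to end as follows. Everything here is an assumption made for illustration: temporary directories stand in for two disk groups, MD5 is used both as the information digest and as the hash for the ring, the ring placement is a simplified one, and the two-level path rule is hypothetical. A cache miss simply returns `None`.

```python
import hashlib
import os
import tempfile
from typing import Dict, List, Optional

RING_SIZE = 2 ** 32


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def ring_point(text: str) -> int:
    return int(md5_hex(text), 16) % RING_SIZE


def target_group(groups: List[str], request: str) -> str:
    """S103-S104: first group at or after the request's point, wrapping around."""
    point = ring_point(request)
    ordered = sorted(groups, key=ring_point)
    for group in ordered:
        if ring_point(group) >= point:
            return group
    return ordered[0]


def storage_path(group_root: str, digest: str) -> str:
    """S105: hypothetical two-level rule (last character, then first two)."""
    return os.path.join(group_root, digest[-1], digest[:2], digest)


def handle_request(request: str, marks: Dict[str, int],
                   groups: List[str]) -> Optional[bytes]:
    digest = md5_hex(request)                  # S102: information digest
    if digest not in marks:                    # no storage mark cached
        return None                            # miss: handled elsewhere
    path = storage_path(target_group(groups, request), digest)
    with open(path, "rb") as handle:           # S105: look up by path
        return handle.read()                   # S106: return the resource


# Demo: store one resource, cache its storage mark, then request it.
base = tempfile.mkdtemp()
groups = [os.path.join(base, "hdisk1"), os.path.join(base, "hdisk2")]
request = "http://example.com/images/cat.jpg"
path = storage_path(target_group(groups, request), md5_hex(request))
os.makedirs(os.path.dirname(path))
with open(path, "wb") as handle:
    handle.write(b"picture bytes")
marks = {md5_hex(request): 1595376000}
result = handle_request(request, marks, groups)
```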

Referring to fig. 2, which shows a schematic flow chart of another embodiment of the data access method of the present application, the method of the present embodiment may be applied to a storage server, where the storage server includes at least two disk groups, and each disk group includes at least one disk. The method of the embodiment may include:

S201, obtaining a data access request sent by the electronic equipment.

S202, based on the data access request, generating an information summary for representing the data resource.

S203, inquiring whether a storage mark corresponding to the information abstract is cached in a memory, and if not, executing the step S204; if so, step S211 is performed.

S204, if the memory does not cache the storage mark corresponding to the information abstract, requesting the data resource from the network server according to the data access request.

For example, the storage server may forward the data access request to the web server to which the data access request is directed, e.g., forward the data access request to the web server based on the address of the web server in the data access request.

Of course, the storage server may also repackage the access request for requesting the data resource according to the data access request and send the access request to the corresponding network server.

S205, the data resource returned by the network server is obtained.

S206, based on the data access request, generating a hash value for characterizing the data resource.

For the specific implementation of generating the hash value in step S206, for example by calculating the hash value of the data access request, refer to the related description of step S103 on generating the hash value corresponding to the data resource; details are not repeated here.

S207, based on the consistent hash algorithm, determining the disk group to which the hash value is mapped from the at least two disk groups.

Here, the storage server determines the disk group to which the hash value obtained in the step S206 is mapped based on the consistent hash algorithm, and actually determines the disk group to which the data resource needs to be stored.

In practical applications, for ease of distinction, the target disk group determined when a data resource is read, as described earlier, may be called the first disk group, and the disk group determined for storing a data resource may be called the second disk group. It can be understood that, for the same data resource requested from the network server (e.g., the same data resource belonging to the same network server), the determined first disk group and second disk group should be the same disk group.

The process of determining the disk group to which the hash value is mapped is the same as in the previous step S104. For example, after the hash value is obtained in step S206, the hash value may be taken modulo 2³² to obtain the remainder. Then, according to the value ranges on the hash ring corresponding to the different disk groups, the disk group whose value range includes the remainder is determined; this is the disk group to which the hash value is mapped.

S208, storing the data resource returned by the network server to the disk group mapped by the hash value.

For ease of understanding, the following takes a storage server with three disk groups as an example to describe the specific process by which the storage server stores data resources returned by the network server to the disk groups based on the consistent hash algorithm, and the benefits of doing so.

To facilitate understanding of the benefits of storing data resources into a disk group based on a consistent hashing algorithm, two common data resource storage methods are introduced first.

Suppose the three disk groups are labeled disk group A, disk group B, and disk group C, and that thirty thousand pictures currently need to be stored in these three disk groups.

The first possible way is to store the 30,000 pictures evenly across the 3 disk groups without any regularity. Then, when a certain picture needs to be accessed, all 3 disk groups have to be traversed to find it among the 30,000 pictures. This traversal is too inefficient and takes too long, so the storage server fails to improve the access efficiency of data resources.

There is also a second way: suppose the picture name is used as the key and picture names are unique. After a hash operation on the picture name yields its hash value, the hash value is taken modulo the total number of disk groups (3 in this example), so the remainder is necessarily 0, 1, or 2, and each remainder corresponds to one disk group. When any picture needs to be accessed, the same operation is performed on the picture name again to obtain the disk group in which the picture should be stored, and the picture can then be looked up in that disk group.

Although the second way avoids traversing all disk groups, when the total number of disk groups changes (e.g., from 3 to 4), the remainder of the hash value modulo the total number of disk groups changes for most picture names, so the disk groups in which those pictures should be stored (i.e., their storage positions) change as well. That is, after the number of disk groups changes, most or all of the data resources cached in the storage server are effectively invalid for a period of time. Cache failure for such a large number of data resources within the same period inevitably causes a large number of concurrent accesses to the back-end network server, which sharply increases its load in a short time and can easily cause a failure.
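The cost of the second way can be made concrete with a small sketch. It assumes MD5 as the hash function and hypothetical picture names of the form `picture_N.jpg`; neither is specified by the application. It counts how many of 30,000 pictures map to a different disk group when the group count goes from 3 to 4.

```python
import hashlib

def name_hash(name: str) -> int:
    # Assumption: MD5 of the picture name, interpreted as a big integer.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest(), "big")

def group_by_modulo(name: str, group_count: int) -> int:
    """Index of the disk group chosen by plain modulo hashing."""
    return name_hash(name) % group_count

names = [f"picture_{i}.jpg" for i in range(30000)]

# A picture keeps its group only if hash % 3 == hash % 4.
moved = sum(1 for n in names if group_by_modulo(n, 3) != group_by_modulo(n, 4))
moved_fraction = moved / len(names)
```

With a uniformly distributed hash, a picture keeps its disk group only when its hash modulo 12 is 0, 1, or 2, so roughly three quarters of the pictures are expected to move.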

To avoid a large number of, or even all, pictures stored in the storage server becoming invalid within a short time, the present application stores data resources in the three disk groups based on a consistent hash algorithm.

Storing data resources based on the consistent hash algorithm requires establishing a mapping relationship between the storage server and a hash ring. As shown in fig. 3, the point directly above the ring represents 0, the first point to the right of point 0 represents 1, and so on through 2, 3, 4, 5, 6, ..., up to 2³²-1; that is, the first point to the left of point 0 represents 2³²-1. The ring formed by these 2³² points is called a hash ring.

The present application may obtain the hash value corresponding to each disk group by performing a hash operation on the path name of the disk group, and then take the hash value of each disk group modulo 2³². The result of the modulo operation is an integer from 0 to 2³²-1, which necessarily corresponds to a point on the hash ring, thereby mapping the three disk groups onto the hash ring respectively. As shown in fig. 4, A represents the point to which the hash value of disk group A maps after the modulo operation; similarly, B and C represent the points to which disk groups B and C map on the hash ring. Of course, this is only one way of establishing the mapping relationship between the disk groups and the hash ring, and other ways are equally applicable.

After a picture to be stored is obtained from the back-end network server, a hash operation may be performed on the picture name to obtain the hash value corresponding to the picture, and the hash value of the picture is then taken modulo 2³². The result of the modulo operation also corresponds to a point on the hash ring. For example, the result of taking the hash value of picture 1 modulo 2³² is shown by circle 1 in fig. 4.

Based on the consistent hash algorithm, the principle for determining the disk group in which a picture should be stored is as follows: starting from the point on the hash ring corresponding to the picture, the first disk group encountered in the clockwise direction is the target disk group in which the picture needs to be stored. As shown in fig. 4, picture 1 needs to be stored in disk group A.

As can be seen from fig. 4, the value range corresponding to disk group A on the hash ring actually consists of every integer from the integer represented by point C plus 1 (that is, the result of taking the hash value of disk group C modulo 2³², plus 1) to the integer represented by point A, inclusive. Similarly, disk group B corresponds to the range of values from A+1 to B, and disk group C is similar. Since the modulo result of the hash value of picture 1 lies between point C and point A, that is, in the value range corresponding to disk group A, the picture needs to be stored in disk group A.

Similarly, if the result of taking the hash value of picture 2 modulo 2³² lies between points A and B, picture 2 needs to be stored in disk group B, as shown by circle 2 in fig. 4; correspondingly, if the result of taking the hash value of picture 3 modulo 2³² lies between points B and C, picture 3 is stored in disk group C, as shown by circle 3 in fig. 4.

Since the name of the picture is fixed, the hash value determined based on the name of the picture is also fixed, and the disk group to which the hash value is mapped is also fixed, when the picture is subsequently read from the storage server, the disk group to which the hash value corresponding to the picture is mapped can be determined based on a consistent hash algorithm, and the picture is read from the corresponding disk group.
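The clockwise lookup described above can be sketched as follows. This is a minimal illustration, assuming MD5 as the hash function and hypothetical disk group path names such as `/data/group_a`; the application does not fix either choice.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32

def ring_point(key: str) -> int:
    """Map a key (disk group path name or picture name) to a point on the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE

def build_ring(group_paths):
    """Return (point, disk group) pairs sorted by point."""
    return sorted((ring_point(p), p) for p in group_paths)

def locate_group(ring, resource_name: str) -> str:
    points = [point for point, _ in ring]
    # First disk group at or after the resource's point; the modulo wraps
    # a point beyond the last disk group back to the first one on the ring.
    index = bisect.bisect_left(points, ring_point(resource_name)) % len(ring)
    return ring[index][1]

ring = build_ring(["/data/group_a", "/data/group_b", "/data/group_c"])
target = locate_group(ring, "picture_1.jpg")
```

Since the lookup depends only on the picture name and the ring, calling `locate_group` again with the same name returns the same disk group, which is what allows the picture to be read back later.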

Referring to fig. 4, assume that disk group B fails and needs to be removed. The point B to which disk group B maps is then removed from the hash ring, giving fig. 5. Comparing fig. 4 and fig. 5 shows that, before disk group B is removed, picture 2, whose modulo result is the point corresponding to circle 2, needs to be stored in disk group B; after disk group B is removed, picture 2 needs to be stored in disk group C, while the storage positions of picture 1 and picture 3 are unchanged. Thus, even if the number of disk groups changes, only part of the data resources stored in the disk groups become invalid rather than all of them, so the access requests for all pictures are not forwarded to the back-end network server at once, which reduces the pressure on the back-end network server.

It can be understood that, in fig. 4, the points to which disk groups A, B, and C map on the hash ring are evenly distributed. In practical applications, however, the points of several disk groups may be concentrated on the hash ring, so that some disk groups store too many data resources and the data resources are unevenly distributed.
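The effect of removing disk group B can be traced with a small sketch. The ring positions below are hypothetical integers chosen only to mirror the layout of fig. 4 (picture 1 between C and A, picture 2 between A and B, picture 3 between B and C); they are not values from the application.

```python
import bisect

def owner(ring, point):
    """ring: sorted (point, group) pairs; return the first group clockwise."""
    points = [p for p, _ in ring]
    return ring[bisect.bisect_left(points, point) % len(ring)][1]

# Hypothetical positions of the disk groups and pictures on the ring.
ring_before = [(1000, "A"), (2000, "B"), (3000, "C")]
pictures = {"picture 1": 500, "picture 2": 1500, "picture 3": 2500}

# Remove disk group B from the ring, as in fig. 5.
ring_after = [(p, g) for p, g in ring_before if g != "B"]

before = {name: owner(ring_before, pt) for name, pt in pictures.items()}
after = {name: owner(ring_after, pt) for name, pt in pictures.items()}
moved = [name for name in pictures if before[name] != after[name]]
```

Only picture 2 changes owner, moving from disk group B to disk group C; pictures 1 and 3 stay where they were.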

For example, referring to fig. 6, the position points to which disk groups A, B, and C map on the hash ring are close together. On this basis, the points to which the hash values of pictures 1, 2, 3, 4, and 6 map on the hash ring after the modulo operation are the positions of circles 1, 2, 3, 4, and 6. As can be seen from fig. 6, based on the consistent hash algorithm, pictures 1, 2, 3, 4, and 6 should all be stored in disk group A. Accordingly, only picture 5, mapped to circle 5, would be stored in disk group B, so the pictures are not stored evenly across the three disk groups. In an extreme case, therefore, some disk groups may store too many data resources, so that once such a disk group fails, the number of data resources whose cache becomes invalid reaches a maximum, placing too much pressure on the back-end network server.

In order to improve the uniformity of data resource distribution, at least one virtual disk group can be constructed for each disk group in the process of storing the data resources based on the consistent hash algorithm. The more virtual disk groups are virtualized by each disk group, the greater the probability that data resources are uniformly distributed in each disk group is. The mapping position of the virtual disk group corresponding to each disk group on the hash ring is different from the mapping position of the disk group on the hash ring, and the data resource required to be stored to the virtual disk group is actually stored to the disk group corresponding to the virtual disk group.

After at least one virtual disk group is virtualized for each disk group, the value ranges corresponding to the virtual disk groups of a disk group can be merged into the value range of that disk group. Accordingly, the storage server may maintain the value range corresponding to each disk group, for example by maintaining a lookup table that includes the value range corresponding to each disk group. Correspondingly, the hash value corresponding to the data resource can be taken modulo 2³², and the disk group whose value range contains the modulo result is then determined as the disk group in which the data resource needs to be stored.

Fig. 7 shows a schematic diagram of the points on the hash ring to which the virtual disk groups are mapped.

In fig. 7, position point A is the point to which disk group A maps on the hash ring, and position point A1 is the point to which virtual disk group A1, corresponding to disk group A, maps on the hash ring. Similarly, position point B1 is the point to which virtual disk group B1, corresponding to disk group B, maps on the hash ring, and position point C1 is the point to which virtual disk group C1, corresponding to disk group C, maps on the hash ring.

On this basis, since picture 2 corresponding to circle 2 should be stored to virtual disk group C1, while virtual disk group C1 is the virtual disk group to which disk group C corresponds, picture 2 is actually stored to disk group C. Correspondingly, the picture 6 corresponding to the circle 6 is also stored in the disk group C, the picture 4 corresponding to the circle 4 is stored in the disk group B, and the picture 3 corresponding to the circle 3 is stored in the disk group a, so that the situation that a large number of pictures are stored in the same disk group is avoided, and the distribution uniformity of picture storage is improved.
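A minimal sketch of virtual disk groups follows. It assumes MD5 as the hash function and a hypothetical naming scheme `A#0`, `A#1`, ... for the virtual disk groups of disk group A; every virtual point resolves to its real disk group, as the description requires.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 32

def ring_point(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE

def build_ring(groups, virtual_per_group):
    """Place each disk group and its virtual disk groups on the ring."""
    ring = []
    for group in groups:
        ring.append((ring_point(group), group))
        for i in range(virtual_per_group):
            # A virtual point is recorded with the real disk group that owns it.
            ring.append((ring_point(f"{group}#{i}"), group))
    return sorted(ring)

def distribution(groups, virtual_per_group, names):
    """Count how many names land on each real disk group."""
    ring = build_ring(groups, virtual_per_group)
    points = [p for p, _ in ring]
    def locate(name):
        return ring[bisect.bisect_left(points, ring_point(name)) % len(ring)][1]
    return Counter(locate(n) for n in names)

groups = ["A", "B", "C"]
names = [f"picture_{i}.jpg" for i in range(30000)]
plain = distribution(groups, 0, names)
with_virtual = distribution(groups, 200, names)
```

Comparing `plain` and `with_virtual` shows how the per-group counts change once each disk group contributes many points to the ring; with more virtual disk groups the counts tend toward an even split.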

As an alternative, in a case that each disk group includes a RAID formed by at least two disks, after determining the disk group to which the hash value corresponding to the data resource is mapped, the storage server stores the data resource in the RAID of the disk group according to the RAID mode in the disk group. If the disk group includes a redundant array of independent disks using RAID5, 6 data bits and 1 check bit of the data resource are determined based on RAID5 and stored in the redundant array of independent disks, respectively.

To facilitate quickly querying the data resource from the disk group later, the storage path of the data resource in the disk group may also be determined based on the information summary corresponding to the data resource and a set directory construction rule; the data resource is then stored in the disk group according to that storage path. In this way, when the data resource is queried later, its storage path in the disk group can be determined quickly from the information summary and the set directory construction rule, and the data resource can be found.

The specific process of determining the storage path of the data resource in the disk group based on the information summary corresponding to the data resource and the set directory construction rule is similar to the process of determining the storage path when querying the data resource, and is not repeated here.

Meanwhile, as described above, to meet data storage requirements, the present application may store data resources using a multi-level directory structure. The directory construction rule may therefore be a multi-level directory construction rule, such as a three-level directory construction rule; refer to the foregoing related description.

S209, caching the storage mark corresponding to the information summary in the memory.

As before, the storage mark may take many forms.
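The path construction can be sketched as follows. The concrete rule is an assumption made for illustration: the information summary is taken to be the MD5 hex digest of the resource name, and each subdirectory level is named after the next two characters of the summary. Only the overall shape (disk group name as root directory, at least two layers of subdirectories, information summary as file name) comes from the description.

```python
import hashlib
import posixpath

def information_summary(resource_name: str) -> str:
    # Assumption: MD5 hex digest of the resource name.
    return hashlib.md5(resource_name.encode("utf-8")).hexdigest()

def storage_path(group_root: str, summary: str, levels: int = 2, width: int = 2) -> str:
    """Join root directory, `levels` subdirectories cut from the summary, and file name."""
    subdirs = [summary[i * width:(i + 1) * width] for i in range(levels)]
    return posixpath.join(group_root, *subdirs, summary)

summary = information_summary("picture_1.jpg")
path = storage_path("/data/group_a", summary)
```

Because the path is a pure function of the disk group name and the information summary, the same call rebuilds the path at query time without any directory scan.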

For example, the information summary and the storage time of the data resource to the disk group, i.e. the file update time of the data resource, are cached in the memory.

S210, returning the data resource to the electronic equipment.

The sequence of step S209 is not limited to that shown in fig. 2, and in practical applications, the data resource may be returned to the electronic device during the process of executing steps S204 to S209.

S211, if the memory caches the storage mark corresponding to the information summary, generating a hash value for representing the data resource based on the data access request.

S212, based on the consistent hash algorithm, determining a target disk group to which the hash value is mapped from the at least two disk groups.

S213, returning the data resource found in the target disk group to the electronic equipment.

For steps S211 to S213, refer to the related descriptions of the previous embodiments; details are not repeated here.

As can be seen from the embodiment of fig. 2, when a data resource is not stored locally in the storage server, the present application obtains the data resource from the network server, returns it to the electronic device, and stores it in a disk group based on the consistent hash algorithm. Different data resources are thus distributed evenly across the disk groups, which reduces the number of data resources whose storage positions become invalid when a disk group fails and improves the reliability of data resource storage.

It can be understood that, in the embodiments of the present application, both the disk group space and the memory space of the storage server are limited. To use the storage space more effectively, the present application may also delete stored data resources periodically or aperiodically; for example, data resources with a lower access frequency may be deleted when the remaining storage space falls below a space threshold.

Specifically, when the information summary and the file update time of each data resource are cached in the memory, a data resource for which the time elapsed between its file update time and the current time exceeds a duration threshold may be deleted from the disk group, and the correspondence between its information summary and file update time cached in the memory may be deleted as well.
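The flow of fig. 2 can be summarized in a short sketch. It is a simplification under stated assumptions: dictionaries stand in for disk groups, MD5 of the resource name serves as both the information summary and the hash value, and `fetch_from_origin` is a hypothetical callback representing the request to the network server.

```python
import bisect
import hashlib
import time

RING_SIZE = 2 ** 32

def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()

class StorageServer:
    def __init__(self, group_names, fetch_from_origin):
        self.ring = sorted((int(md5_hex(g), 16) % RING_SIZE, g) for g in group_names)
        self.points = [p for p, _ in self.ring]
        self.groups = {g: {} for g in group_names}  # dicts stand in for disk groups
        self.marks = {}                              # summary -> file update time
        self.fetch_from_origin = fetch_from_origin

    def _locate(self, resource_name):
        point = int(md5_hex(resource_name), 16) % RING_SIZE
        index = bisect.bisect_left(self.points, point) % len(self.ring)
        return self.ring[index][1]

    def handle(self, resource_name):
        summary = md5_hex(resource_name)              # S202
        group = self._locate(resource_name)           # S206-S207 or S211-S212
        if summary in self.marks:                     # S203: storage mark cached
            return self.groups[group][summary]        # S213
        data = self.fetch_from_origin(resource_name)  # S204-S205
        self.groups[group][summary] = data            # S208
        self.marks[summary] = time.time()             # S209
        return data                                   # S210

origin_calls = []

def fake_origin(name):
    # Hypothetical stand-in for the back-end network server.
    origin_calls.append(name)
    return b"bytes of " + name.encode("utf-8")

server = StorageServer(["A", "B", "C"], fake_origin)
first = server.handle("picture_1.jpg")   # miss: fetched from the origin and stored
second = server.handle("picture_1.jpg")  # hit: served from the disk group
```

The second request is answered from the disk group without contacting the origin, so `origin_calls` records a single fetch.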
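Age-based deletion can be sketched as follows. The data layout is an assumption: `marks` maps each information summary to its file update time, and `groups` maps each disk group name to a dictionary standing in for the files on that group. For simplicity the sketch removes the summary from whichever group holds it rather than recomputing the hash mapping.

```python
import time

def evict_expired(marks, groups, max_age_seconds, now=None):
    """Delete resources older than max_age_seconds; return the removed summaries."""
    now = time.time() if now is None else now
    expired = [s for s, updated in marks.items() if now - updated > max_age_seconds]
    for summary in expired:
        for store in groups.values():
            store.pop(summary, None)  # remove the stored data resource, if present
        del marks[summary]            # drop the cached summary/update-time entry
    return expired

# Hypothetical state: one stale resource and one recent resource.
marks = {"aaa": 100.0, "bbb": 900.0}
groups = {"A": {"aaa": b"old"}, "B": {"bbb": b"new"}}
removed = evict_expired(marks, groups, max_age_seconds=500, now=1000.0)
```

Removing the entry from `marks` together with the stored data keeps the memory cache consistent with the disk groups, so a later request for the deleted resource is treated as a miss.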

The application also provides a data access device corresponding to the data access method. As shown in fig. 8, which shows a schematic structural diagram of an embodiment of a data access apparatus according to the present application, the apparatus of the present embodiment may be applied to a storage server, where the storage server includes at least two disk groups, and each disk group includes at least one disk, and the apparatus includes:

a request obtaining unit 801, configured to obtain a data access request sent by an electronic device, where the data access request is used to request to access a data resource;

a summary generating unit 802, configured to generate an information summary for characterizing the data resource based on the data access request;

a first hash calculation unit 803, configured to generate, based on the data access request, a hash value used for characterizing the data resource, if a storage flag corresponding to the information digest is cached in a memory, where the storage flag indicates that the data resource is stored in the storage server;

a first disk locating unit 804, configured to determine, based on a consistent hash algorithm, a target disk group to which the hash value is mapped from the at least two disk groups;

a resource searching unit 805, configured to search the data resource from the target disk group;

a first resource returning unit 806, configured to return the found data resource to the electronic device.

In a possible case, the data access request obtained by the request obtaining unit carries a data resource name of the data resource requested to be accessed;

In yet another possible scenario, the resource lookup unit includes:

the path construction subunit is configured to construct a storage path of the data resource in the target disk group based on the information summary and a set directory construction rule;

and the resource searching subunit is used for searching the data resource from the target disk group according to the storage path.

Optionally, the path building subunit includes:

a root directory construction subunit, configured to determine the name of the target disk group as a root directory;

the subdirectory construction subunit is used for constructing at least two layers of subdirectories under the root directory according to the information abstract and the multilevel directory construction rule;

and the path splicing subunit is used for determining the information abstract as a file name in a storage path and splicing the root directory, the at least two layers of subdirectories and the file name into the storage path.

In one possible implementation, the disk group includes a redundant array of independent disks made up of at least two disks;

the resource searching unit is specifically configured to obtain the data resource from the redundant array of independent disks of the target disk group.

In one possible implementation, the apparatus may further include:

It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, the same and similar parts among the embodiments may be referred to each other, and different embodiments may be combined with each other. Since the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant points, refer to the corresponding description of the method embodiments.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that it is obvious to those skilled in the art that various modifications and improvements can be made without departing from the principle of the present invention, and these modifications and improvements should also be considered as the protection scope of the present invention.

Claims

1. A data access method applied to a storage server, the storage server including at least two disk groups, each disk group including at least one disk, the method comprising:

searching the data resource from the target disk group;

and returning the searched data resource to the electronic equipment.

2. The method of claim 1, wherein the data access request carries a data resource name of the data resource requested to be accessed;

generating an information abstract of the data access request;

3. The method of claim 1, wherein the searching the data resource from the target disk group comprises:

4. The method according to claim 3, wherein the constructing a storage path of the data resource in the target disk group based on the information summary and the set directory construction rule comprises:

determining the name of the target disk group as a root directory;

5. The method of claim 1, wherein the disk group comprises a redundant array of independent disks of at least two disks;

the searching the data resource from the target disk group comprises:

6. The method of claim 1, wherein caching the storage tag corresponding to the message digest in the memory comprises:

7. The method of any of claims 1 to 6, further comprising:

obtaining the data resource returned by the network server;

caching a storage mark corresponding to the information abstract in a memory;

8. A data access apparatus, applied to a storage server, the storage server including at least two disk groups, each disk group including at least one disk, the apparatus comprising:

9. The apparatus according to claim 8, wherein the data access request obtained by the request obtaining unit carries a data resource name of the data resource requested to be accessed;

10. The apparatus of claim 8 or 9, further comprising: