CN112486915A - Data storage method and device - Google Patents

Publication number
CN112486915A
Authority
CN
China
Prior art keywords
data
stored
storage
index
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011511668.4A
Other languages
Chinese (zh)
Other versions
CN112486915B (en)
Inventor
郑志升
李天烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd
Priority to CN202011511668.4A
Publication of CN112486915A
Application granted
Publication of CN112486915B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/137 Hash-based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application provides a data storage method, which comprises the following steps: acquiring data to be stored; judging the type of the data to be stored, and determining the index mode of the data to be stored according to the judged type; creating a storage index file of the data to be stored in a data bucket according to the determined index mode; and storing the data to be stored into a corresponding storage file according to the storage index file. The method and the device can reduce the time spent on updating the data.

Description

Data storage method and device
Technical Field
The embodiment of the application relates to the technical field of data processing, in particular to a data storage method and device.
Background
With the rapid development of network technology, the volume of data of all kinds has surged. To cope with this, different types of data need to be distributed in different ways; for example, the data is scattered and then stored in different data storage containers. In the prior art, when data is stored into a HUDI database, an index for the data is generally established according to the time of the data, and the data is then stored according to the established index. However, the inventor found that, when the index is created in the prior art, a message digest of the data is calculated, the message digest is hashed, and the index of the data is then stored into one of many data buckets (buckets) according to the hash value. For example, when comment data is stored in a comment scene and 1000 pieces of comment data from the latest 1 minute need to be stored, the indexes of the 1000 pieces of comment data are created first and are then distributed evenly across 1000 buckets. As a result, when the data is updated, all of the buckets have to be read, and the HFile (file) in each bucket has to be read to find the index of the data; only then can the data be located by its index and updated. With indexes created in this manner, updating multiple pieces of data requires reading multiple buckets and the HFiles in them to find each piece of data, so updating the data takes a long time.
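The cost described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each comment has a string key, that the digest is MD5, and that the bucket is chosen by a remainder operation; the names `bucket_for_record` and `comment_keys` are hypothetical and do not come from the application.

```python
import hashlib

NUM_BUCKETS = 1000  # hypothetical bucket count, matching the 1000-bucket example


def bucket_for_record(record_key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Prior-art style placement: digest the record itself, then take the remainder."""
    digest = hashlib.md5(record_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets


# 1000 comments created within the same minute (hypothetical keys).
comment_keys = [f"comment-{i}" for i in range(1000)]
buckets_touched = {bucket_for_record(key) for key in comment_keys}

# The indexes are spread over a large number of buckets, so updating this
# batch means opening many buckets and reading the HFiles inside each one.
print(f"buckets touched by one batch: {len(buckets_touched)}")
```

Because placement depends only on the digest of each record, records that are created together and updated together end up far apart in the index.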
Disclosure of Invention
An embodiment of the present application provides a data storage method, an apparatus, a computer device, and a computer-readable storage medium, which can solve the problem that it takes a long time to update data in the prior art.
One aspect of an embodiment of the present application provides a data storage method, where the method includes:
acquiring data to be stored;
judging the type of the data to be stored, and determining the index mode of the data to be stored according to the judged type;
creating a storage index file of the data to be stored in a data bucket according to the determined index mode;
and storing the data to be stored into a corresponding storage file according to the storage index file.
Optionally, the determining the type of the data to be stored and determining the index mode of the data to be stored according to the determined type includes:
judging the type of the data to be stored to determine a preset scene corresponding to the data to be stored;
and determining the index mode of the data to be stored according to the preset scene information.
Optionally, the determining, according to the preset scene information, an index manner of the data to be stored includes:
when the type of the data to be stored is judged to be data of a first preset scene, determining the indexing mode of the data to be stored as indexing based on the timestamp of the data to be stored;
the creating of the storage index file of the data to be stored in the data bucket according to the determined index mode comprises:
determining a data bucket for storing the storage index file according to the creation time stamp of the data to be stored;
and creating the storage index file in the determined data bucket, wherein the storage index file comprises a storage path of the data to be stored and file information of the data to be stored.
Optionally, the determining, according to the creation timestamp of the data to be stored, a data bucket storing the storage index file includes:
calculating a message digest of the creation timestamp;
and carrying out hash operation on the calculated message digest to obtain a data bucket for storing the index file.
Optionally, the determining, according to the preset scene information, an index manner of the data to be stored includes:
when the storage type of the data to be stored is judged to be data of a second preset scene, determining the indexing mode of the data to be stored as indexing based on user identification and a time interval;
the creating of the storage index file of the data to be stored in the data bucket according to the determined index mode comprises:
determining a data bucket for storing the storage index file according to the user identification corresponding to the data to be stored;
and creating the storage index file in the determined data bucket according to the time interval corresponding to the data to be stored, wherein the storage index file comprises a storage path of the data to be stored and file information stored by the data to be stored.
Optionally, the creating the storage index file in the determined data bucket according to the time interval corresponding to the data to be stored includes:
and creating different storage index files in the determined data bucket according to different time intervals corresponding to the data to be stored.
Optionally, the determining, according to the user identifier corresponding to the data to be stored, a data bucket storing the index file includes:
calculating the message digest of the user identification;
and carrying out hash operation on the calculated message digest to obtain a data bucket for storing the index file.
Optionally, the data storage method further includes:
when a data updating instruction is received, determining a storage index file of data to be updated according to the data updating instruction;
determining a storage path and storage file information of the data to be updated according to the determined storage index file;
and reading the data to be updated according to the storage path and the storage file information of the data to be updated, and updating the data to be updated.
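The update steps above can be sketched as follows. This is only an illustration under stated assumptions: the update instruction is modelled as a dictionary that already names the data bucket, the storage index file, and the record key, and the index and storage container are plain in-memory dictionaries. None of these names (`apply_update`, `bucket_id`, `index_file`, `new_payload`) appear in the application.

```python
from typing import Dict, Tuple

IndexFile = Dict[str, Tuple[str, str]]          # record key -> (storage path, storage file)
Bucket = Dict[str, IndexFile]                   # index file name -> index file
StorageContainer = Dict[str, Dict[str, bytes]]  # storage file -> {storage path: payload}


def apply_update(instruction: dict,
                 buckets: Dict[int, Bucket],
                 container: StorageContainer) -> bytes:
    """Locate the record through its storage index file and overwrite it."""
    # Step 1: determine the storage index file named by the update instruction.
    index_file = buckets[instruction["bucket_id"]][instruction["index_file"]]
    # Step 2: read the storage path and storage file information from the index.
    path, file = index_file[instruction["record_key"]]
    # Step 3: read the data to be updated and replace it.
    old_payload = container[file][path]
    container[file][path] = instruction["new_payload"]
    return old_payload


# Hypothetical state: one bucket, one index file, one stored comment.
buckets = {1: {"HFile1": {"comment-42": ("offset-0", "datafile-a")}}}
container = {"datafile-a": {"offset-0": b"old text"}}
previous = apply_update(
    {"bucket_id": 1, "index_file": "HFile1",
     "record_key": "comment-42", "new_payload": b"new text"},
    buckets, container,
)
```

Only one bucket and one index file are read in this sketch, which is the behaviour the method aims for.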
Yet another aspect of an embodiment of the present application provides a data storage apparatus including:
the acquisition module is used for acquiring data to be stored;
the judging module is used for judging the type of the data to be stored and determining the index mode of the data to be stored according to the judged type;
the creating module is used for creating a storage index file of the data to be stored in a data bucket according to the determined index mode;
and the storage module is used for storing the data to be stored into the corresponding storage file according to the storage index file.
Yet another aspect of embodiments of the present application provides a computer device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the data storage method as described in any one of the above when executing the computer program.
Yet another aspect of embodiments of the present application provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of the data storage method according to any one of the above.
According to the data storage method, the data storage apparatus, the computer device and the computer-readable storage medium of the embodiments of the present application, data to be stored is acquired; the type of the data to be stored is judged, and the index mode of the data to be stored is determined according to the judged type; a storage index file of the data to be stored is created in a data bucket according to the determined index mode; and the data to be stored is stored into a corresponding storage file according to the storage index file. Because the index is created in a different manner for each type of data, the created index matches the data type. The storage position of the data can therefore be found easily from the index, which makes subsequent updates easier and reduces the time spent on updating the data.
Drawings
FIG. 1 schematically shows a data transmission system implementing a data storage method of an embodiment of the present application;
FIG. 2 schematically illustrates a flow diagram of a data storage method according to an embodiment of the present application;
FIG. 3 is a flow chart that schematically illustrates an embodiment of a detailed process of creating a storage index file of the data to be stored in a data bucket according to a determined indexing manner;
FIG. 4 is a flowchart illustrating a detailed process of determining a data bucket for storing the storage index file according to the creation timestamp of the data to be stored;
FIG. 5 is a flowchart schematically illustrating a detailed step of creating a storage index file of the data to be stored in a data bucket according to a determined indexing manner according to another embodiment;
FIG. 6 is a flowchart illustrating a detailed process of determining a data bucket for storing the index file according to the user identifier corresponding to the data to be stored;
FIG. 7 schematically illustrates a flow chart of a data storage method of another embodiment;
FIG. 8 schematically illustrates a block diagram of a data storage device according to an embodiment of the present application; and
FIG. 9 schematically shows a hardware architecture diagram of a computer device suitable for implementing the data storage method according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that descriptions relating to "first", "second", etc. in the present invention are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, but only to the extent that a person skilled in the art can realize the combination. When a combination of technical solutions is contradictory or cannot be realized, the combination should be considered not to exist and is not within the protection scope of the present invention.
Fig. 1 schematically shows a schematic diagram of a data transmission system implementing an embodiment of the present application, which may be composed of the following parts: data source layer, network routing layer 2, data buffer layer 3, data distribution layer 4, data storage layer 3, etc.
The data source layer can comprise an internal data source and can also be a data interface connected with an external data source. The data source layer may have data in multiple formats, for example, the reported data of APP and Web are data in HTTP (HyperText Transfer Protocol), and the internal communication data of the server is data in RPC (Remote Procedure Call) format. As shown in fig. 1, the data of the data source layer may be Log data reported by the mobile terminal and received by one or more edge nodes, or may be data provided by various systems or devices, such as a database (e.g., Mysql), a Log Agent (Log Agent), and the like.
Via the gateway and messaging system, the data source layer may transmit data to Collector 2. Wherein:
The gateway is used to forward the data provided by the data source layer to the message system. The gateway may be adapted to a variety of service scenarios and data protocols; for example, it may be configured to parse APP and Web data in the HTTP protocol as well as internal communication data in the gRPC protocol.
The message system may be composed of one or more Kafka clusters and is used to publish the data from the data source layer to the corresponding topics. Data with different importance, priority and throughput can be distributed to different Kafka clusters, so that the value of each type of data is protected and a system fault does not affect the data as a whole.
The Collector 2 is a streaming distribution node based on Flink. The Collector 2 may consume data from the corresponding topic of the message system, then convert and distribute the data for storage; that is, it ensures that the data is obtained from the message system and written into the corresponding storage terminal in the data storage layer 3, for example HDFS, Kafka, HBase, ES (Elasticsearch), and the like.
The data storage layer 3, which is used to store data, may be composed of databases of different forms. The data storage layer 3 may be a HUDI (Apache Hudi) database. The HUDI database may be used to manage large analytical data sets stored on a DFS (HDFS or cloud storage) and supports update operations on the current data table.
Embodiment One
Fig. 2 schematically shows a flowchart of a data storage method according to a first embodiment of the present application. The data storage method may be applied to a HUDI database, and it should be understood that the flow chart in the embodiment of the method is not used to limit the order of executing the steps. The following description is made by taking a data storage device as an execution subject. As shown in fig. 2, the data storage method may include steps S20 to S23, in which:
step S20, data to be stored is acquired.
Specifically, the data to be stored may be a single piece of data or a batch of data. In this embodiment, the data to be stored is preferably a batch of data, for example the comment data published by all users within 1 minute. The data to be stored may be of various types, for example comment data in a comment scene, or data related to a UP master (for example, fan data of the UP master) in a relationship chain scene.
And step S21, judging the type of the data to be stored, and determining the index mode of the data to be stored according to the judged type.
Specifically, when storing the data to be stored, in order to facilitate subsequent updating or searching of the data to be stored, in this embodiment, before storing the data, an Index (Index) of the data to be stored needs to be created first, so that when storing the data, the data may be stored according to the created Index, and when subsequently updating or querying the stored data, the data may be quickly searched according to the Index.
In this embodiment, in order to improve the performance of subsequent updating or querying of data, when an index is established, indexes of data may be established in different indexing manners according to different data types. Specifically, the type of the data to be stored may be determined by using a preset determination rule, and then, after the type of the data to be stored is determined, the index manner corresponding to the current type of data to be stored is determined according to a pre-created mapping table of data types and index manners.
In an exemplary embodiment, the step S21 may include the following steps: judging the type of the data to be stored to determine a preset scene corresponding to the data to be stored; and determining the index mode of the data to be stored according to the preset scene information.
Specifically, the index modes corresponding to the data to be stored corresponding to different scenes are different, and in order to determine the index mode of the data to be stored, in this embodiment, a preset determination rule may be first adopted to determine the type of the data to be stored, so as to determine which preset scene the data to be stored belongs to, and after the preset scene is determined, the index mode of the current data to be stored may be determined according to a mapping table between the preset scene and the index mode.
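The two-stage lookup described above (data type to preset scene, then preset scene to index mode) can be sketched as follows. The application does not specify the judgment rule or the contents of the mapping table, so the `type` field, the enum names, and the table below are all assumptions made for illustration.

```python
from enum import Enum


class Scene(Enum):
    COMMENT = "first preset scene"
    RELATION_CHAIN = "second preset scene"


class IndexMode(Enum):
    TIMESTAMP = "index based on timestamp"
    USER_ID_AND_INTERVAL = "index based on user identifier and time interval"


# Hypothetical pre-created mapping table between preset scenes and index modes.
SCENE_TO_INDEX_MODE = {
    Scene.COMMENT: IndexMode.TIMESTAMP,
    Scene.RELATION_CHAIN: IndexMode.USER_ID_AND_INTERVAL,
}


def judge_scene(record: dict) -> Scene:
    """Hypothetical judgment rule: inspect a 'type' field carried by the record."""
    data_type = record.get("type")
    if data_type == "comment":
        return Scene.COMMENT
    if data_type == "fan":
        return Scene.RELATION_CHAIN
    raise ValueError(f"no preset scene for data type {data_type!r}")


def index_mode_for(record: dict) -> IndexMode:
    """Judge the scene first, then look the index mode up in the mapping table."""
    return SCENE_TO_INDEX_MODE[judge_scene(record)]
```

Keeping the mapping in a table means that a new scene can be supported by adding one entry rather than changing the storage logic.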
In an exemplary embodiment, the determining, according to the preset scene information, an index manner of the data to be stored may include the following steps: and when the type of the data to be stored is data of a first preset scene, determining the indexing mode of the data to be stored as indexing based on the timestamp of the data to be stored.
Specifically, the first preset scene is a scene in which the time of data generation follows a relatively obvious pattern, for example a comment data scene. Generally, most of the comment data for a manuscript or video is generated within one day or one week after it is published, and the longer it has been published, the less comment data is generated. That is, the comment data scene is a scene in which the time of data generation follows an obvious pattern.
As an example, when it is determined that the type of the data to be stored is comment data, the data to be stored may be indexed based on a time stamp of the comment data. For example, comment data posted in the last 1 minute is stored as one dimension in one data bucket, and comment data posted in the last 2 minutes to the last 1 minute is stored as another dimension in another data bucket.
In an exemplary embodiment, the determining, according to the preset scene information, an index manner of the data to be stored may further include: and when the storage type of the data to be stored is judged to be the data of a second preset scene, determining the indexing mode of the data to be stored as indexing based on the user identification and the time interval.
Specifically, the second preset scene is a scene in which data generation is strongly associated with a user's relationship chain and with a time interval, for example a UP master relationship chain scene. Generally, after a user becomes a UP master, other users follow that user and become his or her fans, so the fans are associated with the UP master. A user typically gains a large number of fans within a certain period of time, after which the number of fans grows more slowly or stops growing. That is, the UP master relationship chain scene is a scene in which data generation is strongly associated with both the user's relationship chain and a time interval.
As an example, when it is determined that the type of the data to be stored is fan data of a UP master, the data to be stored may be indexed based on the user identifier (i.e., the UP master identifier) corresponding to the fan data. For example, the fan data of UP master A is stored as one dimension into one data bucket, such as data bucket 1, and the fan data of UP master B is stored as one dimension into another data bucket, such as data bucket 2. Because the fan data is also strongly associated with the time interval, the time interval can additionally be used to index the data to be stored. For example, the fan data generated for UP master A on the 1st is stored as another dimension into storage index file 1 of data bucket 1, such as HFile1, and the fan data generated for UP master A on the 2nd is stored as another dimension into storage index file 2 of data bucket 1, such as HFile2; the fan data generated for UP master B on the 1st is stored as another dimension into storage index file 1 of data bucket 2, such as HFile1, and the fan data generated for UP master B on the 2nd is stored as another dimension into storage index file 2 of data bucket 2, such as HFile2.
And step S22, creating a storage index file of the data to be stored in the data bucket according to the determined index mode.
Specifically, the data bucket (bucket) is used for storing index files. One data bucket may include multiple HFiles, and each HFile is a storage index file that stores the index entry information of each piece of data. Each piece of index entry information is recorded in the form of a Recordkey, and each Recordkey includes the storage path information and the storage file information of the data it indexes.
As an example, the format of each index entry information is as follows:
Recordkey X -> path, file, where path represents the storage path of the data, which is the pointer offset address (offset) in the data storage container, and file represents the stored file information.
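A minimal sketch of one index entry in this format is shown below. The application gives only the `Recordkey X -> path, file` shape; the class name `IndexEntry`, the helper functions, and the choice to treat both fields as strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IndexEntry:
    path: str  # storage path: pointer offset address in the data storage container
    file: str  # information identifying the storage file that holds the data


def format_entry(record_key: str, entry: IndexEntry) -> str:
    """Serialize one entry in the 'Recordkey -> path, file' shape."""
    return f"{record_key} -> {entry.path}, {entry.file}"


def parse_entry(line: str) -> Tuple[str, IndexEntry]:
    """Parse a line produced by format_entry back into its parts."""
    record_key, rest = line.split(" -> ", 1)
    path, file = rest.split(", ", 1)
    return record_key, IndexEntry(path=path, file=file)


line = format_entry("X", IndexEntry(path="4096", file="datafile-a"))
key, entry = parse_entry(line)
```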
In an exemplary embodiment, as shown in fig. 3, when the type of the data to be stored is data of a first preset scene, the step S22 may further include steps S30-S31, wherein: step S30, determining a data bucket for storing the storage index file according to the creation time stamp of the data to be stored; step S31, creating the storage index file in the determined data bucket, where the storage index file includes a storage path of the data to be stored and file information of the data to be stored.
Specifically, when creating the storage index file, a creation timestamp of data to be stored may be extracted, then a specific data bucket in which the storage index file is created may be determined according to the extracted timestamp, and after determining a data bucket, the storage index file may be created directly in the determined data bucket. The creation time stamp may also be referred to as a generation time stamp of the data or a release time stamp of the data, and is used to indicate when the data is generated.
In an exemplary embodiment, as shown in fig. 4, step S30 may further include steps S40-S41, wherein: step S40, calculating the message digest of the creation timestamp; and step S41, performing hash operation on the calculated message digest to obtain a data bucket for storing the index file.
Specifically, a message digest, also called a digital digest, is a short fixed-length value obtained by transforming a message of arbitrary length. It is computed by a one-way Hash function that takes the message as its argument. Applying the one-way Hash function to a plaintext "digests" it into a fixed-length ciphertext (128 bits in the case of MD5), also called a digital fingerprint. Different plaintexts yield different digests (apart from rare collisions), whereas the same plaintext always yields the same digest.
In this embodiment, the message digest of the creation timestamp may be calculated through a message digest algorithm. The message digest algorithm may be MD2, MD4, MD5, SHA-1, SHA-256, RIPEMD-128, RIPEMD-160, and the like; in this embodiment, the MD5 algorithm is preferably used to calculate the message digest of the creation timestamp.
As an example, the message digest obtained by applying the MD5 algorithm to the creation timestamp is a 128-bit ciphertext. To determine the data bucket that stores the storage index file, a hash operation is then performed on this 128-bit ciphertext. In this embodiment, the hash operation may be a remainder operation: the number represented by the 128-bit ciphertext is divided by the total number of data buckets, and the remainder identifies the data bucket. For example, if the number corresponding to the 128-bit ciphertext is 505 and the preset total number of data buckets is 100, the hash operation gives 505 mod 100 = 5, so the storage index file is stored in data bucket 5.
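Steps S40 and S41 can be sketched with Python's standard `hashlib` module. Two details are assumptions that the application does not state: the timestamp is encoded as a decimal string before digesting, and it is first truncated to a fixed granularity (one minute here) so that data created in the same minute shares a digest. The function names are hypothetical.

```python
import hashlib


def bucket_from_digest_value(digest_value: int, num_buckets: int) -> int:
    """Step S41: remainder of the digest value divided by the bucket count."""
    return digest_value % num_buckets


def bucket_for_timestamp(creation_ts: int, num_buckets: int,
                         granularity_s: int = 60) -> int:
    """Steps S40-S41 for a creation timestamp given in whole seconds."""
    # Assumption: truncate to the start of the minute so that all data
    # created within that minute maps to the same data bucket.
    truncated = creation_ts - creation_ts % granularity_s
    # Step S40: 128-bit MD5 digest of the (string-encoded) timestamp.
    digest = hashlib.md5(str(truncated).encode("utf-8")).digest()
    # Step S41: interpret the 16 digest bytes as an integer and reduce it.
    return bucket_from_digest_value(int.from_bytes(digest, "big"), num_buckets)


# The worked example from the text: digest value 505 with 100 buckets.
example_bucket = bucket_from_digest_value(505, 100)
```

With this placement, a batch of comments from one minute produces index entries in a single data bucket rather than one entry per bucket.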
In an embodiment, the data bucket may also be determined directly and simply according to the creation timestamp, for example, the storage index file of the data to be stored with the creation timestamp of the latest one minute is stored in the data bucket 1, the storage index file of the data to be stored with the creation timestamp of the latest two minutes to one minute is stored in the data bucket 2, and the storage index file of the data to be stored with the creation timestamp of the latest three minutes to two minutes is stored in the data bucket 3.
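The direct mapping in this variant needs no digest at all. The sketch below assumes timestamps in seconds and a one-minute window; `bucket_by_age` is a hypothetical name.

```python
def bucket_by_age(creation_ts: float, now: float, window_s: int = 60) -> int:
    """Map the age of a record to a data bucket number, one bucket per window.

    Age in [0, 60) seconds -> data bucket 1, [60, 120) -> data bucket 2,
    [120, 180) -> data bucket 3, and so on.
    """
    age = now - creation_ts
    if age < 0:
        raise ValueError("creation timestamp lies in the future")
    return int(age // window_s) + 1


now = 1_600_000_000.0
recent = bucket_by_age(now - 30, now)    # created within the latest minute
older = bucket_by_age(now - 90, now)     # created one to two minutes ago
oldest = bucket_by_age(now - 150, now)   # created two to three minutes ago
```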
In an exemplary embodiment, as shown in fig. 5, when the type of the data to be stored is data of the second preset scene, the step S22 may further include steps S50-S51, wherein: step S50, determining a data bucket for storing the index file according to the user identification corresponding to the data to be stored; step S51, creating the storage index file in the determined data bucket according to the time interval corresponding to the data to be stored, where the storage index file includes a storage path of the data to be stored and file information of the data to be stored.
Specifically, when creating the storage index file, the user identifier of the data to be stored may be extracted first, and the specific data bucket in which the storage index file is to be created is then determined according to the user identifier. After the data bucket is determined, the storage index file may be created directly in that data bucket according to the time interval corresponding to the data to be stored. The user identifier is used to distinguish which user the data to be stored is associated with. For example, if the data to be stored is fan data of a UP master, the user identifier may be the account of the UP master, the nickname of the UP master, or any other information that uniquely distinguishes different UP masters.
For example, if the data to be stored was generated within the latest 1 day, the time interval corresponding to the data to be stored may be determined to be time interval 1, and if the data to be stored was generated between 1 and 2 days ago, the time interval corresponding to the data to be stored may be determined to be time interval 2. It can be understood that, in order to determine the time interval corresponding to the data to be stored, a determination rule for the time interval needs to be set in advance, so that after the data to be stored is acquired, its time interval can be determined according to that rule.
In an exemplary embodiment, when the storage index file is created in the determined data bucket according to the time interval corresponding to the data to be stored, different storage index files may be created in the determined data bucket according to different time intervals corresponding to the data to be stored.
As an example, when the determined data bucket is data bucket 1, and the time interval corresponding to the data to be stored is time interval 1, then storage index file1 may be created in the data bucket; when the determined data bucket is the data bucket 1 and the time interval corresponding to the data to be stored is the time interval 2, the storage index file2 may be created in the data bucket, that is, the storage index files created in the data bucket at different time intervals are different.
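The relationship between time intervals and storage index files inside one data bucket can be sketched as follows. The one-day interval follows the example above; the `DataBucket` class, the `HFile<n>` naming, and the in-memory dictionaries are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Tuple

SECONDS_PER_DAY = 86_400


def time_interval(creation_ts: float, now: float,
                  interval_s: int = SECONDS_PER_DAY) -> int:
    """Hypothetical determination rule: latest day -> 1, the day before -> 2, ..."""
    return int((now - creation_ts) // interval_s) + 1


class DataBucket:
    """One data bucket holding a separate storage index file per time interval."""

    def __init__(self, bucket_id: int) -> None:
        self.bucket_id = bucket_id
        # index file name -> {record key: (storage path, storage file)}
        self.index_files: Dict[str, Dict[str, Tuple[str, str]]] = defaultdict(dict)

    def add_entry(self, interval: int, record_key: str,
                  path: str, file: str) -> str:
        name = f"HFile{interval}"  # a different index file for each interval
        self.index_files[name][record_key] = (path, file)
        return name


now = 1_600_000_000.0
bucket1 = DataBucket(1)
first = bucket1.add_entry(time_interval(now - 3_600, now), "fan-1", "0", "datafile-a")
second = bucket1.add_entry(time_interval(now - 90_000, now), "fan-2", "64", "datafile-a")
```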
In an exemplary embodiment, as shown in fig. 6, step S50 may further include steps S60-S61, wherein: step S60, calculating the message digest of the user identifier; and step S61, performing a hash operation on the calculated message digest to obtain the data bucket for storing the storage index file.
Specifically, a message digest, also called a digital digest, is a short, fixed-length value obtained by transforming a message of arbitrary length; it is the output of a one-way hash function that takes the message as its argument. A digital digest, also called a digital fingerprint, is a fixed-length ciphertext (for example, 128 bits) produced by applying the one-way hash function to the plaintext. Digests of different plaintexts are, in practice, different, whereas the digest of the same plaintext is always the same.
In this embodiment, a message digest algorithm may be applied to the user identifier to obtain the message digest corresponding to the user identifier. The message digest algorithm may be MD2, MD4, MD5, SHA-1, SHA-256, RIPEMD-128, RIPEMD-160, or the like; in this embodiment, the message digest of the user identifier is preferably calculated using the MD5 algorithm.
As an example, the message digest obtained by applying the MD5 algorithm to the user identifier is a 128-bit ciphertext, and a hash calculation needs to be performed on this 128-bit ciphertext to determine the data bucket for storing the storage index file. In this embodiment, the hash algorithm used may be a remainder operation: the number corresponding to the 128-bit ciphertext is divided by the number of data buckets, and the remainder identifies the data bucket. For example, if the number corresponding to the 128-bit ciphertext is 615 and the preset total number of data buckets is 100, then 615 mod 100 = 15, so the data bucket storing the storage index file is data bucket 15.
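Steps S60-S61 can be sketched with Python's standard `hashlib` module. The function name `bucket_for`, the UTF-8 encoding of the user identifier, and the big-endian interpretation of the digest as an integer are assumptions made for illustration; the embodiment only requires a digest followed by a remainder operation.

```python
import hashlib


def bucket_for(user_id: str, num_buckets: int = 100) -> int:
    """Pick a data bucket from a user identifier (steps S60-S61)."""
    # S60: MD5 yields a 16-byte (128-bit) message digest.
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    # S61: treat the digest as a number and take the remainder.
    return int.from_bytes(digest, "big") % num_buckets
```

Because the digest of a given identifier never changes, all data belonging to one user lands in the same bucket as long as the number of buckets stays fixed.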
Step S23, storing the data to be stored into a corresponding storage file according to the storage index file.
Specifically, after the storage index file is created, the index information in the storage index file may be used to store the data to be stored into the corresponding storage file. The storage file is the file, actually located in the data storage container, that holds the data to be stored.
As an example, assume that the storage index file includes the index information "Recordkey 1 -> path X, file a", which indicates that the data to be stored needs to be stored in file a (the storage file) under path X.
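A minimal sketch of this index-driven write is shown below. Everything about the layout is assumed for illustration: the index is modelled as an in-memory dictionary from record key to a path and file name, the storage file holds one JSON object per line, and the name `store_record` is hypothetical.

```python
import json
from pathlib import Path


def store_record(root: Path, index: dict, record_key: str, payload: dict) -> Path:
    """Append a record to the storage file named by its index entry."""
    entry = index[record_key]  # e.g. {"path": "pathX", "file": "file_a"}
    target = root / entry["path"] / entry["file"]
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumed storage format: one JSON object per line.
    with target.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"key": record_key, "value": payload}) + "\n")
    return target
```

With an index of `{"Recordkey1": {"path": "pathX", "file": "file_a"}}`, a call for `Recordkey1` writes to `file_a` under `pathX`, mirroring the example above.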
With the data storage method, the data storage system, the computer device and the computer-readable storage medium described above, data to be stored is acquired; the type of the data to be stored is judged, and the index mode of the data to be stored is determined according to the judged type; a storage index file of the data to be stored is created in a data bucket according to the determined index mode; and the data to be stored is stored into a corresponding storage file according to the storage index file. Because the index is created in a mode that depends on the type of the data, the created index matches the data type, the storage position of the data can be found easily from the index, subsequent updates are convenient, and the time spent on updating the data is reduced.
In an exemplary embodiment, as shown in fig. 7, the data storage method may further include: steps S70-S72, wherein: step S70, when a data updating instruction is received, determining a storage index file of data to be updated according to the data updating instruction; step S71, determining the storage path and the storage file information of the data to be updated according to the determined storage index file; step S72, reading the data to be updated according to the storage path and the storage file information of the data to be updated, and updating the data to be updated.
As an example, assume that data A needs to be updated. After an update instruction for data A is received, the storage index file corresponding to data A is queried in the data buckets. Assuming that the storage index file found for data A is storage index file1 in data bucket 2, the storage path and storage file information of data A are then obtained from storage index file1. Finally, data A can be located according to this information, read into memory, updated, and written back to the file once the update is complete.
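The update flow of steps S70-S72 can be sketched as a read, modify, and write-back cycle. The sketch assumes a hypothetical layout in which the index is a dictionary from record key to a path and file name and the storage file holds one JSON object per line; the function name `update_record` is likewise an assumption.

```python
import json
from pathlib import Path
from typing import Any


def update_record(root: Path, index: dict, record_key: str, new_value: Any) -> bool:
    """Locate a record through its index entry, update it, write it back."""
    entry = index[record_key]                      # S70: find the index entry
    target = root / entry["path"] / entry["file"]  # S71: storage path and file
    # S72: read the storage file into memory.
    records = [
        json.loads(line)
        for line in target.read_text(encoding="utf-8").splitlines()
        if line.strip()
    ]
    found = False
    for rec in records:
        if rec["key"] == record_key:
            rec["value"] = new_value
            found = True
    if found:
        # Rewrite the file only when something actually changed.
        target.write_text(
            "".join(json.dumps(r) + "\n" for r in records), encoding="utf-8"
        )
    return found
```

Only the single storage file named by the index is read and rewritten, which is the source of the reduced update time described in this embodiment.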
Fig. 8 is a block diagram of a data storage device 800 according to an embodiment of the present application. The data storage device 800 may be partitioned into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the embodiments of the present application. The program modules referred to in the embodiments of the present application are series of computer program instruction segments capable of performing specific functions; the following description details the function of each program module in this embodiment. As shown in fig. 8, the data storage device 800 may include: an acquisition module 801, a judgment module 802, a creation module 803, and a storage module 804.
The acquisition module 801 is configured to acquire data to be stored.
The judgment module 802 is configured to judge the type of the data to be stored, and to determine the index mode of the data to be stored according to the judged type.
The creation module 803 is configured to create a storage index file of the data to be stored in a data bucket according to the determined index mode.
The storage module 804 is configured to store the data to be stored into a corresponding storage file according to the storage index file.
In an exemplary embodiment, the judgment module 802 is further configured to judge the type of the data to be stored so as to determine the preset scene corresponding to the data to be stored, and to determine the index mode of the data to be stored according to the preset scene information.
In an exemplary embodiment, the judgment module 802 is further configured to determine, when the type of the data to be stored is judged to be data of a first preset scene, that the index mode of the data to be stored is indexing based on the timestamp of the data to be stored.
The creation module 803 is further configured to determine, according to the creation timestamp of the data to be stored, the data bucket for storing the storage index file, and to create the storage index file in the determined data bucket, wherein the storage index file comprises a storage path of the data to be stored and file information of the data to be stored.
In an exemplary embodiment, the creation module 803 is further configured to calculate the message digest of the creation timestamp, and to perform a hash operation on the calculated message digest to obtain the data bucket for storing the storage index file.
In an exemplary embodiment, the judgment module 802 is further configured to determine, when the type of the data to be stored is judged to be data of a second preset scene, that the index mode of the data to be stored is indexing based on a user identifier and a time interval.
The creation module 803 is further configured to determine, according to the user identifier corresponding to the data to be stored, the data bucket for storing the storage index file, and to create the storage index file in the determined data bucket according to the time interval corresponding to the data to be stored, wherein the storage index file comprises a storage path of the data to be stored and file information of the data to be stored.
In an exemplary embodiment, the creation module 803 is further configured to create different storage index files in the determined data bucket according to different time intervals corresponding to the data to be stored.
In an exemplary embodiment, the creation module 803 is further configured to calculate the message digest of the user identifier, and to perform a hash operation on the calculated message digest to obtain the data bucket for storing the storage index file.
In an exemplary embodiment, the data storage device 800 may further include a receiving module, a determining module and an updating module.
The receiving module is configured to determine, when a data update instruction is received, the storage index file of the data to be updated according to the data update instruction.
The determining module is configured to determine the storage path and the storage file information of the data to be updated according to the determined storage index file.
The updating module is configured to read the data to be updated according to the storage path and the storage file information of the data to be updated, and to update the data to be updated.
According to the embodiment of the present application, data to be stored is acquired; the type of the data to be stored is judged, and the index mode of the data to be stored is determined according to the judged type; a storage index file of the data to be stored is created in a data bucket according to the determined index mode; and the data to be stored is stored into a corresponding storage file according to the storage index file. Because the index is created in a mode that depends on the type of the data, the created index matches the data type, the storage position of the data can be found easily from the index, subsequent updates are convenient, and the time spent on updating the data is reduced.
Fig. 9 schematically shows a hardware architecture diagram of a computer device suitable for implementing the data storage method according to an embodiment of the present application. In the present embodiment, the computer device 20 is a device capable of automatically performing numerical calculation and/or information processing in accordance with instructions set or stored in advance. For example, it may be a data forwarding device such as a gateway. As shown in fig. 9, the computer device 20 includes at least, but is not limited to, a memory 21, a processor 22, and a network interface 23, which may be communicatively coupled to each other by a system bus. Wherein:
The memory 21 includes at least one type of computer-readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 21 may be an internal storage module of the computer device 20, such as a hard disk or a memory of the computer device 20. In other embodiments, the memory 21 may also be an external storage device of the computer device 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device 20. Of course, the memory 21 may also include both internal and external storage modules of the computer device 20. In this embodiment, the memory 21 is generally used for storing an operating system installed in the computer device 20 and various types of application software, such as the program code of the data storage method. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 22 is generally configured to control the overall operation of the computer device 20, such as performing control and processing related to data interaction or communication of the computer device 20. In this embodiment, the processor 22 is configured to execute the program code stored in the memory 21 or to process data.
The network interface 23 may comprise a wireless network interface or a wired network interface, and is typically used to establish a communication connection between the computer device 20 and other computer devices. For example, the network interface 23 is used to connect the computer device 20 with an external terminal through a network, and to establish a data storage channel and a communication connection between the computer device 20 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It is noted that fig. 9 only shows a computer device with components 21-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.
In this embodiment, the data storage method stored in the memory 21 can be further divided into one or more program modules and executed by one or more processors (in this embodiment, the processor 22) to implement the present application.
The present embodiment also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data storage method in the embodiments.
In this embodiment, the computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the computer readable storage medium may be an internal storage unit of the computer device, such as a hard disk or a memory of the computer device. In other embodiments, the computer readable storage medium may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like provided on the computer device. Of course, the computer-readable storage medium may also include both internal and external storage devices of the computer device. In the present embodiment, the computer-readable storage medium is generally used for storing an operating system and various types of application software installed in the computer device, for example, the program codes of the data storage method in the embodiment, and the like. Further, the computer-readable storage medium may also be used to temporarily store various types of data that have been output or are to be output.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (11)

1. A method of data storage, the method comprising:
acquiring data to be stored;
judging the type of the data to be stored, and determining the index mode of the data to be stored according to the judged type;
creating a storage index file of the data to be stored in a data bucket according to the determined index mode;
and storing the data to be stored into a corresponding storage file according to the storage index file.
2. The data storage method according to claim 1, wherein judging the type of the data to be stored, and determining the index mode of the data to be stored according to the judged type comprises:
judging the type of the data to be stored to determine a preset scene corresponding to the data to be stored;
and determining the index mode of the data to be stored according to the preset scene information.
3. The data storage method according to claim 2, wherein the determining, according to the preset scene information, the index manner of the data to be stored comprises:
when the type of the data to be stored is data of a first preset scene, determining that the indexing mode of the data to be stored is indexing based on the timestamp of the data to be stored;
the creating of the storage index file of the data to be stored in the data bucket according to the determined index mode comprises:
determining a data bucket for storing the storage index file according to the creation time stamp of the data to be stored;
and creating the storage index file in the determined data bucket, wherein the storage index file comprises a storage path of the data to be stored and file information of the data to be stored.
4. The data storage method according to claim 3, wherein the determining a data bucket for storing the storage index file according to the creation timestamp of the data to be stored comprises:
calculating a message digest of the creation timestamp;
and carrying out hash operation on the calculated message digest to obtain a data bucket for storing the index file.
5. The data storage method according to claim 2, wherein the determining, according to the preset scene information, the index manner of the data to be stored comprises:
when the type of the data to be stored is data of a second preset scene, determining that the indexing mode of the data to be stored is indexing based on a user identifier and a time interval;
the creating of the storage index file of the data to be stored in the data bucket according to the determined index mode comprises:
determining a data bucket for storing the storage index file according to the user identification corresponding to the data to be stored;
and creating the storage index file in the determined data bucket according to the time interval corresponding to the data to be stored, wherein the storage index file comprises a storage path of the data to be stored and file information stored by the data to be stored.
6. The data storage method according to claim 5, wherein the creating the storage index file in the determined data bucket according to the time interval corresponding to the data to be stored comprises:
and creating different storage index files in the determined data bucket according to different time intervals corresponding to the data to be stored.
7. The data storage method of claim 5, wherein the determining, according to the user identifier corresponding to the data to be stored, the data bucket storing the storage index file comprises:
calculating the message digest of the user identification;
and carrying out hash operation on the calculated message digest to obtain a data bucket for storing the index file.
8. The data storage method of any of claims 1 to 7, further comprising:
when a data updating instruction is received, determining a storage index file of data to be updated according to the data updating instruction;
determining a storage path and storage file information of the data to be updated according to the determined storage index file;
and reading the data to be updated according to the storage path and the storage file information of the data to be updated, and updating the data to be updated.
9. A data storage device, comprising:
the acquisition module is used for acquiring data to be stored;
the judging module is used for judging the type of the data to be stored and determining the index mode of the data to be stored according to the judged type;
the creating module is used for creating a storage index file of the data to be stored in a data bucket according to the determined index mode;
and the storage module is used for storing the data to be stored into the corresponding storage file according to the storage index file.
10. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, is adapted to carry out the steps of the data storage method according to any of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, is adapted to carry out the steps of the data storage method according to any one of claims 1 to 8.
CN202011511668.4A 2020-12-18 2020-12-18 Data storage method and device Active CN112486915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011511668.4A CN112486915B (en) 2020-12-18 2020-12-18 Data storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011511668.4A CN112486915B (en) 2020-12-18 2020-12-18 Data storage method and device

Publications (2)

Publication Number Publication Date
CN112486915A true CN112486915A (en) 2021-03-12
CN112486915B CN112486915B (en) 2023-01-20

Family

ID=74915154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011511668.4A Active CN112486915B (en) 2020-12-18 2020-12-18 Data storage method and device

Country Status (1)

Country Link
CN (1) CN112486915B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010526A (en) * 2021-04-19 2021-06-22 星辰天合(北京)数据科技有限公司 Storage method and device based on object storage service
CN113094340A (en) * 2021-04-28 2021-07-09 杭州海康威视数字技术股份有限公司 Data query method, device and equipment based on Hudi and storage medium
CN115291812A (en) * 2022-09-30 2022-11-04 北京紫光青藤微***有限公司 Data storage method and device of communication chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101719141A (en) * 2009-12-24 2010-06-02 成都市华为赛门铁克科技有限公司 File processing method and system based on directory object
CN103218445A (en) * 2013-04-22 2013-07-24 亿赞普(北京)科技有限公司 Mobile terminal information pushing method and device
CN104657362A (en) * 2013-11-18 2015-05-27 深圳市腾讯计算机***有限公司 Method and device for storing and querying data
CN108776678A (en) * 2018-05-29 2018-11-09 阿里巴巴集团控股有限公司 Index creation method and device based on mobile terminal NoSQL databases
CN110427368A (en) * 2019-07-12 2019-11-08 深圳绿米联创科技有限公司 Data processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN112486915B (en) 2023-01-20

Similar Documents

Publication Publication Date Title
CN112486915B (en) Data storage method and device
WO2019000630A1 (en) Multi-task scheduling method and system, application server and computer-readable storage medium
CN110543448A (en) data synchronization method, device, equipment and computer readable storage medium
CN112751772B (en) Data transmission method and system
CN111831748A (en) Data synchronization method, device and storage medium
CN112559475B (en) Data real-time capturing and transmitting method and system
CN112019605B (en) Data distribution method and system for data stream
CN112214519B (en) Data query method, device, equipment and readable medium
WO2021164462A1 (en) Data encryption method, data decryption method, computer device, and medium
CN113704790A (en) Abnormal log information summarizing method and computer equipment
CN112181614B (en) Task timeout monitoring method, device, equipment, system and storage medium
CN110602165A (en) Government affair data synchronization method, device, system, computer equipment and storage medium
WO2015117309A1 (en) Method and apparatus for generating warning
CN112087530A (en) Method, device, equipment and medium for uploading data to block chain system
CN112148350A (en) Remote version management method for works, electronic device and computer storage medium
CN113407560B (en) Update message processing method, data synchronization method and configuration information configuration method
CN115129728A (en) File checking method and device
US20160004850A1 (en) Secure download from internet marketplace
CN111327680B (en) Authentication data synchronization method, device, system, computer equipment and storage medium
CN112559118A (en) Application data migration method and device, electronic equipment and storage medium
CN115801765A (en) File transmission method, device, system, electronic equipment and storage medium
CN112749142B (en) Handle management method and system
CN107832021A (en) A kind of electronic evidence fixing means, terminal device and storage medium
CN114328129A (en) Message sending method, device, equipment and storage medium
CN112199529A (en) Picture processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant