CN106708825B - A kind of data file processing method and system - Google Patents

Publication number
CN106708825B
Authority
CN
China
Prior art keywords
data
data file
shared memory
file
directory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510454768.0A
Other languages
Chinese (zh)
Other versions
CN106708825A (en)
Inventor
王刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201510454768.0A
Publication of CN106708825A
Application granted
Publication of CN106708825B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data file processing method and system. The method comprises: obtaining a data file on disk, wherein the data file is stored in a shared-memory data structure; loading the data file into shared memory by memory mapping, and initializing the shared memory; recording data update information based on the loading process, the data update information including the file name of the data file and first time information, the first time information being the time at which the loading process performed the load; and reading the data file according to the data update information. By storing the plain-text data file on disk in a shared-memory data structure and loading it into shared memory, embodiments of the invention improve loading efficiency. They also support loading once and using in many places: through shared memory, multiple processes use the same shared-memory data, which greatly reduces additional memory usage.

Description

A data file processing method and system
Technical field
The invention belongs to the field of communication technology, and in particular relates to a data file processing method and system.
Background
With the rapid development of Internet technology, a large number of services, such as query-string parsing and error correction in search services and recommendation in personalized recommendation services, require large amounts of data to support decisions. In a program, this means that an online service must load a large amount of data and perform many table lookups for each request. The data must also be updated at some frequency to keep up with change: in real time (seconds), in near real time (minutes), or on a schedule (days).
In the prior art, each service process is usually responsible for loading and updating its own data: the process reads the data file and builds its own in-memory data structure. To support updates without stopping the service, the usual approach is to start a separate thread that performs the update, keeps the old data unchanged while the update is in progress, and deletes the old data once the new data has been loaded.
In researching and practicing the prior art, the inventors found that in-memory data in the prior art can only be used within a single process. If multiple processes need the same data, each must load and update it separately, which causes additional memory usage. In addition, the data is in a plain-text data structure, so loading takes a long time and loading efficiency is low.
Summary of the invention
The object of the present invention is to provide a data file processing method and system, with the aim of improving the accuracy and recall of data file processing.
To solve the above technical problem, embodiments of the present invention provide the following technical solution:
A data file processing method, comprising:
obtaining a data file on disk, wherein the data file is stored in a shared-memory data structure;
loading the data file into the shared memory by memory mapping, and initializing the shared memory;
recording data update information based on the loading process, the data update information including the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
reading the data file according to the data update information.
To solve the above technical problem, embodiments of the present invention further provide the following technical solution:
A data file processing system, comprising:
a data management module, configured to obtain a data file on disk, wherein the data file is stored in a shared-memory data structure; to load the data file into the shared memory by memory mapping and initialize the shared memory; and to record data update information based on the loading process, the data update information including the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
a data read module, configured to read the data file according to the data update information.
Compared with the prior art, this embodiment first stores the data file on disk in a shared-memory data structure, loads the data file into the shared memory by memory mapping, and records update information, thereby loading the data file; the data file is then updated and loaded according to the data update information, so that processes share the data in shared memory. By storing the plain-text data file on disk in a shared-memory data structure and loading it into shared memory, embodiments of the invention improve loading efficiency. They also support loading once and using in many places: through shared memory, multiple processes use the same shared-memory data, which greatly reduces additional memory usage.
Brief description of the drawings
The technical solution and other beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments, taken in conjunction with the accompanying drawings.
Fig. 1a is a schematic diagram of a scenario of the data file processing method provided by an embodiment of the present invention;
Fig. 1b is a flow diagram of the data file processing method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of the data file processing method provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of the data structure of the data file provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of the data file processing system provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of the server provided by an embodiment of the present invention.
Detailed description of the embodiments
Referring to the drawings, in which like reference numerals denote like components, the principles of the present invention are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the invention and should not be regarded as limiting other specific embodiments not detailed herein.
In the following description, specific embodiments of the invention are described with reference to steps and symbols of operations performed by one or more computers, unless otherwise stated. These steps and operations, which are at times referred to as being computer-executed, include the manipulation by the computer's processing unit of electronic signals representing data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which reconfigures or otherwise alters the operation of the computer in a manner well understood by those skilled in the art. The data structures in which the data is maintained are physical locations of the memory that have particular properties defined by the format of the data. However, although the principles of the invention are described in the foregoing terms, this is not meant to be limiting, and those skilled in the art will appreciate that the steps and operations described below may also be implemented in hardware.
The principles of the present invention operate with many other general-purpose or special-purpose computing or communication environments or configurations. Well-known examples of computing systems, environments, and configurations suitable for the invention include, but are not limited to, handheld phones, personal computers, servers, multiprocessor systems, microcomputer-based systems, mainframe computers, and distributed computing environments that include any of the above systems or devices.
The term "module" as used herein may be regarded as a software object executed on the computing system. The different components, modules, engines, and services described herein may be regarded as objects implemented on the computing system. The apparatus and method described herein are preferably implemented in software, but may of course also be implemented in hardware, which is within the scope of the present invention.
Embodiments of the present invention provide a data file processing method and system.
Refer to Fig. 1a, which is a schematic diagram of a scenario of the data file processing system provided by an embodiment of the present invention. The data file processing system may be integrated in a device such as a server and may include a data management module, which is mainly used to obtain a data file on disk, wherein the data file is stored in a shared-memory data structure (which may include an array, a hash table, a double-array trie, and so on); then to load the data file into the shared memory by memory mapping and initialize the shared memory; and to record data update information based on the loading process, the data update information including the file name of the data file and first time information, the first time information being the time at which the loading process performed the load. The data management module may of course also be used to update the data file, and so on.
In addition, the data file processing system may include a data read module, mainly used to update and load the data file according to the data update information, so that processes share the data in shared memory. The system may also include a peripheral system, also called an update detection module, mainly used to detect updates to the data files on disk, so that the data management module performs update and loading processing and the data read module updates and loads according to the data update information.
Each is described in detail below.
First embodiment
This embodiment is described from the perspective of data file processing; the data file processing system may be integrated in a device such as a server.
A data file processing method comprises: obtaining a data file on disk, wherein the data file is stored in a shared-memory data structure; loading the data file into shared memory by memory mapping, and initializing the shared memory; recording data update information based on the loading process, the data update information including the file name of the data file and first time information, the first time information being the time at which the loading process performed the load; and reading the data file according to the data update information.
Refer to Fig. 1b, which is a flow diagram of the data file processing method provided by the first embodiment of the present invention. The method comprises:
In step S101, a data file on disk is obtained, wherein the data file is stored in a shared-memory data structure.
It can be understood that, before loading, the data file on disk may first be preprocessed, for example by storing the data file on disk in a shared-memory data structure. The shared-memory data structure includes, but is not limited to, an array, a hash table, or a double-array trie, and is not specifically limited here.
In step S102, the data file is loaded into the shared memory by memory mapping, and the shared memory is initialized.
For example, initialization here may refer to initializing the locks in the in-memory data. This mainly concerns in-memory data that must be read and written at the same time (for example, when part of the data is modified): locks (such as read-write locks or sequence locks) need to be designed into the in-memory data structure to guarantee correct access.
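Steps S101 and S102 can be sketched as follows. This is only an illustration under stated assumptions: the 12-byte header layout (magic, version, lock word) and the function name `load_and_init` are hypothetical and do not come from the patent, and a real lock would need a proper cross-process primitive rather than a plain integer.

```python
import mmap
import struct

# Hypothetical header layout (an assumption, not from the patent):
# 4-byte magic, 4-byte version, 4-byte lock word.
HEADER = struct.Struct("<4sII")
LOCK_OFFSET = 8  # byte offset of the lock word inside the header


def load_and_init(path):
    """Map the whole file as shared memory and reset its lock word."""
    with open(path, "r+b") as f:
        # ACCESS_WRITE creates a shared mapping: changes are visible to
        # every other process that maps the same file.
        shm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_WRITE)
    # Initialization step: clear the lock so that users of the data
    # start from a known state.
    struct.pack_into("<I", shm, LOCK_OFFSET, 0)
    return shm
```

Only the loading side performs this initialization; a module that later maps the same file would skip the `pack_into` call.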
Optionally, after the shared memory is initialized, the data files on disk may also be checked for updates in real time, for example as follows:
(1) when it is detected that a data file needing an update exists, the data file that needs updating is copied to the update directory;
(2) the data file under the update directory is loaded into shared memory, and the shared memory is initialized;
(3) the data file under the data directory is moved to the backup directory, and the data file under the update directory is moved to the data directory.
After the data file has been loaded and updated, the mapping of the original data file may also be removed.
It should be noted that, in embodiments of the present invention, the update directory, the data directory, and the backup directory are set up in advance, and the three directories are on the same file system. This ensures that the file system index node (inode) of a data file stays unchanged when the file is moved, so that the memory mapping relationship is preserved. The index node is used to store basic information about files and directories, including time, file name, user, and group.
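The directory rotation in step (3) can be illustrated with a short sketch. The function name `rotate` and the directory arguments are hypothetical. It relies on the fact that a rename within one file system only rewrites directory entries, so the inode of the promoted file is unchanged.

```python
import os


def rotate(name, update_dir, data_dir, backup_dir):
    """Move data/name to backup, then update/name to data.

    Returns the inode of the new file before and after the move.
    """
    new_path = os.path.join(update_dir, name)
    live_path = os.path.join(data_dir, name)
    inode_before = os.stat(new_path).st_ino
    if os.path.exists(live_path):
        # Retire the current file into the backup directory.
        os.replace(live_path, os.path.join(backup_dir, name))
    # Promote the updated file. All three directories must be on the
    # same file system, otherwise this is not a pure rename.
    os.replace(new_path, live_path)
    return inode_before, os.stat(live_path).st_ino
```

If the two returned inode numbers match, a mapping created from the file while it was still in the update directory continues to refer to the same file after the move.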
Further optionally, the data file may be checked for updates by means such as a cyclic redundancy check code or the MD5 message-digest algorithm (Message-Digest Algorithm 5); details are not given here.
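A minimal sketch of checksum-based update detection, using Python's standard `zlib.crc32` and `hashlib.md5`. The function names and the idea of comparing against a previously recorded pair are assumptions made for illustration; the patent does not specify how the checksums are stored or compared.

```python
import hashlib
import zlib


def file_checksums(path, chunk_size=1 << 20):
    """Return (CRC-32, MD5 hex digest) of a file, read in chunks."""
    crc = 0
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
            md5.update(chunk)
    return crc & 0xFFFFFFFF, md5.hexdigest()


def needs_update(path, recorded):
    """True if the file's checksums differ from the recorded pair."""
    return file_checksums(path) != recorded
```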
In step s 103, data more new information is recorded based on the loading procedure, the data more new packets include data text The filename and first time information of part, the first time information are the time when loading procedure loads.
It is changed by the data structure to data in magnetic disk file, which is loaded into shared drive, and Relevant data more new information is recorded, so that the module of data management also completes the process of data load.
In step S104, processing is read out to data file according to the data more new information.
Such as: it, can be according to pre- using the module of data after the module of data management has recorded data more new information If time interval, the data more new information is read out and is detected;If it is determined that data more new information middle finger is shown with purpose number According to the more new information of file, then data file is loaded into the shared drive by the way of memory mapping, and record this to make Time when mapping load with the module of data is the second temporal information.
It is understood that this is not required to using the module of data since step S102 has performed initialization operation Operation bidirectional is done again, and operating system can guarantee that the same data file is mapped in identical shared drive, to use The module of data also completes the process of data load.
As can be seen from the above, the data file processing method provided by this embodiment first stores the data file on disk in a shared-memory data structure, loads the data file into the shared memory by memory mapping, and records update information, thereby loading the data file; the data file is then updated and loaded according to the data update information, so that processes share the data in shared memory. By storing the plain-text data file on disk in a shared-memory data structure and loading it into shared memory, embodiments of the invention improve loading efficiency. They also support loading once and using in many places: through shared memory, multiple processes use the same shared-memory data, which greatly reduces additional memory usage.
Second embodiment
The method described in the first embodiment is described in further detail below by way of example.
The data file processing system includes a data management module, a data use module, and an update detection module. First, the data management module maps the data file on disk into shared memory and records the data update information for that process. Next, the data use module also maps the data file into shared memory according to the data update information. After the data has been loaded, the update detection module may further check the files on disk for updates in real time, so that the data management module and the data read module update and load the data accordingly.
The data involved in updating and loading includes the data file on disk, the shared-memory data, the recorded data update information, a cyclic redundancy check (CRC) file used to verify files, and a flag file that indicates an update.
Details are described below.
As shown in Fig. 2, the detailed flow of a data file processing method may be as follows:
In step S201, the update detection module checks the data files on disk for updates.
Update detection methods include, but are not limited to, CRC verification and md5sum verification. After verification succeeds, the step of "copying the data file that needs updating to the update directory" is triggered.
It should be noted that the data file stored on disk in a shared-memory data structure may also be called the stored-data disk file. The system needs to normalize the data into a format that can be used directly in shared memory, that is, the on-disk storage form of the shared-memory data.
For example, this may be as follows:
The data file is stored on disk as a binary file and mapped into shared memory by memory mapping. Its content may be any shared-memory data structure, including but not limited to an array, a hash table, or a double-array trie. Taking a hash table as an example, the data structure may be as shown in Fig. 3, and the hash table data structure may include:
(1) Header information (HEADER): stores the metadata of the hash table, such as the data type, the version number, the lock controlling multi-process access, and hash table statistics.
(2) Hash bucket (BUCKET): contains an index into the NODE array.
(3) Node (NODE): contains the index of the next node, the key, and an index into the CHUNK array; for a hash table with variable-length values, it also contains the length of the value.
(4) Data block (CHUNK): a fixed-length hash table stores the value directly; a variable-length hash table also stores an index pointing to the next CHUNK.
It should be understood that the hash table data structure is used here only as an example for explanation and does not limit the present invention.
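The HEADER/BUCKET/NODE/CHUNK layout above can be sketched as a flat byte buffer, which is the property that lets it be memory-mapped and used directly. Everything specific in this sketch is an assumption for illustration: the field widths, the 8-byte key limit, the use of CRC-32 as the hash function, and the omission of the lock and statistics fields from the header. It covers only the fixed-length-value case.

```python
import struct
import zlib

HEADER = struct.Struct("<4sIII")  # magic, version, bucket count, node count
BUCKET = struct.Struct("<i")      # index of first NODE in the chain, -1 if empty
NODE = struct.Struct("<i8si")     # next NODE index, key (NUL-padded), CHUNK index
CHUNK = struct.Struct("<q")       # fixed-length value stored directly


def _norm(key):
    if len(key) > 8:
        raise ValueError("this sketch supports keys of at most 8 bytes")
    return key.ljust(8, b"\0")


def build(items, bucket_count=8):
    """Serialize a dict of bytes -> int into the flat table layout."""
    buckets = [-1] * bucket_count
    nodes, chunks = [], []
    for i, (key, value) in enumerate(items.items()):
        k = _norm(key)
        b = zlib.crc32(k) % bucket_count
        nodes.append(NODE.pack(buckets[b], k, i))  # chain to the old head
        chunks.append(CHUNK.pack(value))
        buckets[b] = i
    header = HEADER.pack(b"HASH", 1, bucket_count, len(nodes))
    body = [BUCKET.pack(b) for b in buckets] + nodes + chunks
    return header + b"".join(body)


def lookup(buf, key):
    """Find a key using offsets only; works on bytes or an mmap object."""
    _magic, _version, bucket_count, node_count = HEADER.unpack_from(buf, 0)
    bucket_base = HEADER.size
    node_base = bucket_base + bucket_count * BUCKET.size
    chunk_base = node_base + node_count * NODE.size
    k = _norm(key)
    slot = zlib.crc32(k) % bucket_count
    (idx,) = BUCKET.unpack_from(buf, bucket_base + slot * BUCKET.size)
    while idx != -1:
        nxt, node_key, chunk_idx = NODE.unpack_from(buf, node_base + idx * NODE.size)
        if node_key == k:
            return CHUNK.unpack_from(buf, chunk_base + chunk_idx * CHUNK.size)[0]
        idx = nxt
    return None
```

Because every reference is an array index rather than a pointer, the buffer returned by `build` can be written to disk as-is and looked up after mapping, with no parsing step at load time.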
In step S202, when it is determined that a data file needing an update exists, the update detection module copies the data file that needs updating to the update directory.
At the same time, the update detection module may write the file name of the data file that needs updating into the flag file, so that the data management module can periodically check the flag file and read the file name from it.
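The flag-file handoff could look like the sketch below. The function names, the one-name-per-file format, and the choice to delete the flag after reading it are all assumptions; the patent only states that the file name is written to a flag file that the data management module checks periodically.

```python
import os


def write_flag(flag_path, filename):
    """Update detection side: publish the name of the file to update."""
    tmp = flag_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(filename + "\n")
    # Rename into place so a poller never sees a half-written flag.
    os.replace(tmp, flag_path)


def poll_flag(flag_path):
    """Data management side: return the pending file name, or None."""
    try:
        with open(flag_path, encoding="utf-8") as f:
            name = f.read().strip()
    except FileNotFoundError:
        return None
    os.remove(flag_path)  # assumption: the flag is consumed once read
    return name or None
```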
In step S203, the data management module loads the data file under the update directory into shared memory and initializes the shared memory.
In step S204, the data management module moves the data file under the data directory to the backup directory.
In step S205, the data management module moves the data file under the update directory to the data directory.
It can be understood that steps S203 to S205 are a preferred way for the data management module to update the data file.
It should be noted that, in embodiments of the present invention, the update directory, the data directory, and the backup directory are set up in advance, and the three directories are on the same file system. This ensures that the file system index node (inode) of a data file stays unchanged when the file is moved, so that the memory mapping relationship is preserved. The index node is used to store basic information about files and directories, including time, file name, user, and group.
Further, the data management module may load the data file under the update directory into shared memory by memory mapping. Specifically, memory mapping refers to mapping a file to a region of memory. Win32, for example, provides a function (CreateFileMapping) that lets an application map a file into a process. A memory-mapped file is somewhat similar to virtual memory: it reserves a region of address space and commits physical storage to that region, except that the physical storage comes from a file that already exists on disk, and the file must be mapped before it can be operated on. Once a file on disk is handled through a memory mapping, explicit I/O operations on the file are no longer needed, which makes memory mapping very useful when processing files that hold large amounts of data.
In addition, when the data management module loads the data file under the update directory into shared memory, it is updating the "shared-memory data". Shared-memory data is data stored in shared memory that can be used by multiple processes. The meaning of the data is interpreted by the processes; examples include fixed-length and variable-length hash tables, general key-value storage structures, and tries. It is generally loaded by mapping the "data file" directly into memory and performing the necessary initialization.
In step S206, the data management module removes the mapping of the data file from before the update, and updates the data update information.
The data update information includes the file name of the data file and first time information, the first time information being the current time when the data management module loads the data file into shared memory.
It can be understood that the data update information is stored in shared memory and records the update status of the "shared-memory data". It generally takes the form of the data file's file name plus the update time.
For example, when the data management module determines that a data file has been updated, it maps the data file into shared memory and records the time of loading as "TT". Because the data management module periodically checks the flag file, it can read the file name of the currently updated data file, "AA", and then records the data update information as "AA+TT".
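The "file name + update time" record can be sketched as a fixed-size structure written into a shared buffer. The 64-byte name field, the use of epoch seconds for the time, and the function names are assumptions; a `bytearray` stands in for the shared-memory region in the usage shown after the block.

```python
import struct

# Hypothetical record layout: 64-byte NUL-padded file name followed by
# the load time as a double (seconds since the epoch).
RECORD = struct.Struct("<64sd")


def write_update_info(buf, filename, load_time, offset=0):
    """Manager side: record which file was loaded and when."""
    name = filename.encode("utf-8")
    if len(name) > 64:
        raise ValueError("file name does not fit the 64-byte field")
    RECORD.pack_into(buf, offset, name, load_time)


def read_update_info(buf, offset=0):
    """Reader side: return (file name, load time) from the record."""
    raw, load_time = RECORD.unpack_from(buf, offset)
    return raw.rstrip(b"\0").decode("utf-8"), load_time
```

For example, with `buf = bytearray(RECORD.size)`, calling `write_update_info(buf, "AA", 1000.5)` and then `read_update_info(buf)` gives back the name "AA" and the recorded time.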
In step S207, the data read module reads the data update information at a preset time interval.
In step S208, if the time indicated by the first time information in the data update information that was read is later than the time in the second time information, the data read module determines that the data file has been updated.
In step S209, the data read module maps the updated data file under the data directory into the shared memory and waits.
In step S210, when the waiting time exceeds a preset time threshold, the data read module removes the mapping of the data file from before the update.
It can be understood that steps S207 to S210 are a preferred way for the data read module to update and load the data file according to the data update information recorded by the data management module.
For example, the data read module periodically checks the data update information and decides whether the data needs updating by reading the time information in it. If the time indicated by the first time information in the data update information is later than the time in the second time information, the data read module determines that the data file has been updated. That is, if the recorded first time information, written when the data management module updated the data file, is "8:00", and the second time information, recorded when the data read module last loaded the data file, is "7:00", then the data file can be judged to have been updated, and the data read module needs to update and load the updated data file.
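The comparison in steps S207 and S208 reduces to a few lines. The `Reader` class and its `poll` method are hypothetical names, and treating a reader that has never loaded anything as needing a load is an assumption added for completeness.

```python
from datetime import datetime


def needs_reload(first_time, second_time):
    """True if the manager's load time is later than the reader's."""
    return second_time is None or first_time > second_time


class Reader:
    def __init__(self):
        self.second_time = None  # nothing has been mapped yet

    def poll(self, first_time, now):
        """One polling cycle; returns True if a reload was performed."""
        if not needs_reload(first_time, self.second_time):
            return False
        # The updated data file would be mapped here (step S209).
        self.second_time = now
        return True


if __name__ == "__main__":
    # The example from the text: manager loaded at 8:00, reader at 7:00.
    day = (2015, 7, 29)  # arbitrary date, only the times matter
    print(needs_reload(datetime(*day, 8, 0), datetime(*day, 7, 0)))
```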
The data read module may update and load the data file as follows:
For example, the updated data file under the data directory is mapped into shared memory. At this point two mappings may exist at the same time, with the old data file and the new data file both in shared memory. This is because the old data file may still be in use, so its memory mapping cannot be released immediately. After loading, the module waits for a number of seconds, for example 2 to 30 s, to ensure that the old data file is no longer in use, and then releases the mapping of the old data file.
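The two-mapping handover can be sketched as below. The class name `MappedData`, the `swap` method, and the blocking `time.sleep` are assumptions made to keep the example short; a real reader would more likely schedule the release in the background rather than block.

```python
import mmap
import time


class MappedData:
    """Holds the reader's current mapping and swaps it on update."""

    def __init__(self, path):
        self.current = self._map(path)

    @staticmethod
    def _map(path):
        with open(path, "rb") as f:
            return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def swap(self, new_path, grace_seconds=2.0):
        old = self.current
        # From here on, both the old and the new mapping exist.
        self.current = self._map(new_path)
        # Grace period so that in-flight users of the old data finish.
        time.sleep(grace_seconds)
        old.close()  # release the old mapping
        return old
```

New lookups go to `self.current` as soon as it is reassigned, while the old mapping stays valid until the grace period ends.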
It should be noted that each functional module (such as the data management module and the data read module) manages its own shared memory; only after all functional modules have released their mappings of the old data file can the old data file be reclaimed by the operating system. In the processing system, the data management module is responsible for loading, initialization, and writing updates. These operations are exclusive and need to be performed by a single process. The data read module can run as another process that reads the data.
It can be understood that this embodiment mainly analyzes the update process of the data management module and the data read module. For parts not described in detail in this embodiment, such as how the data management module loads the data file into shared memory and initializes it, refer to the detailed description of the data file processing method in the first embodiment; they are not repeated here.
As can be seen from the above, in the data file processing method provided by this embodiment, the data management module first stores the data file on disk in a shared-memory data structure, loads the data file into the shared memory by memory mapping, and records update information, thereby updating and loading the data file; the data read module then updates and loads the data file according to the data update information, so that processes share the data in shared memory. By storing the plain-text data file on disk in a shared-memory data structure and loading it into shared memory, embodiments of the invention improve loading efficiency. They also support loading once and using in many places: through shared memory, multiple processes use the same shared-memory data, which greatly reduces additional memory usage.
Third embodiment
To better implement the data file processing method provided by the embodiments of the present invention, an embodiment of the present invention further provides a system based on the above data file processing method. The terms have the same meanings as in the data file processing method above, and implementation details can be found in the description of the method embodiments.
Refer to Fig. 4, which is a structural diagram of the data file processing system provided by an embodiment of the present invention. The system may include a data management module 401 and a data read module 402.
The data management module 401 is used to map the data file on disk into shared memory and initialize the shared memory, and to record data update information based on the loading process, so as to load the data file. Specifically:
The data management module 401 obtains a data file on disk, wherein the data file is stored in a shared-memory data structure; loads the data file into the shared memory by memory mapping and initializes the shared memory; and records data update information based on the loading process, the data update information including the file name of the data file and first time information, the first time information being the time at which the loading process performed the load.
For example, initialization here may refer to initializing the locks in the in-memory data. This mainly concerns in-memory data that must be read and written at the same time (for example, when part of the data is modified): locks (such as read-write locks or sequence locks) need to be designed into the in-memory data structure to guarantee correct access.
The data management module 401 is used to be read out processing to data file according to the data more new information, can be specific , data management module 401 is updated and loads to data file according to data more new information, so that process is used in conjunction with Enjoy the data in memory.
Such as: after the module of data management has recorded data more new information, data read module 402 can be according to pre- If time interval, the data more new information is read out and is detected;If it is determined that data more new information middle finger is shown with purpose number According to the more new information of file, then data file is loaded into the shared drive by the way of memory mapping, and record the number Time when loading is mapped according to read module 402 for the second temporal information.
It is understood that this uses the mould of data since data management module 401 has performed initialization operation Block does not need to do operation bidirectional again, and operating system can guarantee that the same data file is mapped in identical shared drive, from And the process of data load is also completed using the module of data.
Optionally, before obtaining the data file on disk, the data management module 401 may also be configured to store the data file on disk in a shared-memory data structure, where the shared-memory data structure includes an array, a hash table, or a double-array trie.
It can be understood that, when loading a data file, the data file on disk may first be preprocessed, for example by storing it on disk in a shared-memory data structure. Shared-memory data structures include, but are not limited to, arrays, hash tables, and double-array tries; no specific limitation is imposed here.
The data file is stored on disk as a binary file and is mapped into shared memory by memory mapping. Its content may be any shared-memory data structure, including but not limited to an array, a hash table, or a double-array trie. Taking a hash table as an example, the data structure may be as shown in Figure 3. The hash table data structure may include:
(1) Header information HEADER: stores the metadata of the hash table, such as the data type, the version number, the lock that controls multi-process access, and hash table statistics.
(2) Hash bucket BUCKET: its content is an index into the NODE array.
(3) Node NODE: its content includes the index of the next node, the key, and an index into the CHUNK array; for a hash table with variable-length values, it also includes the length of the value.
(4) Data block CHUNK: a fixed-length hash table stores the value directly; a variable-length hash table additionally stores an index pointing to the next CHUNK.
It should be understood that the hash table data structure is used here only as an example for explanation and does not limit the present invention.
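The HEADER / BUCKET / NODE / CHUNK layout listed above can be sketched over a flat byte buffer. This is an illustrative fixed-length variant only: the field widths, the 8-byte keys and values, the use of CRC32 as the hash function, and the names `create`, `put`, and `get` are assumptions for the example and are not taken from the patent. Because every link is an index rather than a pointer, the same bytes remain valid wherever the buffer is mapped.

```python
import struct
import zlib

HEADER = struct.Struct("<4sIII")  # magic, version, bucket count, nodes in use
BUCKET = struct.Struct("<i")      # index of first NODE in the chain, -1 if empty
NODE = struct.Struct("<i8si")     # next NODE index, key, CHUNK index
CHUNK = struct.Struct("<8s")      # fixed-length value stored directly

N_BUCKETS, CAPACITY = 8, 16
BUCKET_OFF = HEADER.size
NODE_OFF = BUCKET_OFF + N_BUCKETS * BUCKET.size
CHUNK_OFF = NODE_OFF + CAPACITY * NODE.size
TOTAL = CHUNK_OFF + CAPACITY * CHUNK.size


def create():
    """Return an empty table; a bytearray stands in for the mapped file."""
    buf = bytearray(TOTAL)
    HEADER.pack_into(buf, 0, b"HASH", 1, N_BUCKETS, 0)
    for b in range(N_BUCKETS):
        BUCKET.pack_into(buf, BUCKET_OFF + b * BUCKET.size, -1)
    return buf


def _bucket_offset(key):
    return BUCKET_OFF + (zlib.crc32(key) % N_BUCKETS) * BUCKET.size


def put(buf, key, value):
    """Insert an 8-byte key and 8-byte value at the front of the bucket chain."""
    magic, version, n_buckets, used = HEADER.unpack_from(buf, 0)
    if used >= CAPACITY:
        raise MemoryError("table full")
    boff = _bucket_offset(key)
    (head,) = BUCKET.unpack_from(buf, boff)
    CHUNK.pack_into(buf, CHUNK_OFF + used * CHUNK.size, value)
    NODE.pack_into(buf, NODE_OFF + used * NODE.size, head, key, used)
    BUCKET.pack_into(buf, boff, used)
    HEADER.pack_into(buf, 0, magic, version, n_buckets, used + 1)


def get(buf, key):
    """Follow the chain of NODE indexes and return the value, or None."""
    (idx,) = BUCKET.unpack_from(buf, _bucket_offset(key))
    while idx != -1:
        nxt, node_key, chunk_idx = NODE.unpack_from(buf, NODE_OFF + idx * NODE.size)
        if node_key == key:
            return CHUNK.unpack_from(buf, CHUNK_OFF + chunk_idx * CHUNK.size)[0]
        idx = nxt
    return None


table = create()
put(table, b"user0001", b"00000042")
put(table, b"user0002", b"00000007")
```

The sketch omits locking, deletion, and duplicate-key handling; a variable-length variant would add a value length to NODE and a next-CHUNK index to CHUNK, as items (3) and (4) describe.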
Optionally, after the shared memory is initialized, the data files on disk may also be checked for updates in real time, for example as follows:
As shown in Figure 4, the system may further include an update detection module 403, configured to perform update detection on the data files on disk. The update detection methods include, but are not limited to, CRC checks and md5sum checks. When it is determined that a data file needs to be updated, that data file is copied into the update directory;
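A minimal sketch of checksum-based update detection follows, assuming the candidate file is compared against the copy currently in the data directory. The function names `md5_of` and `stage_if_changed` and the directory arguments are hypothetical; the text names CRC and md5sum as possible checks but does not prescribe how they are applied.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of(path, block_size=1 << 20):
    """Return the hex MD5 digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def stage_if_changed(source_path, data_dir, update_dir):
    """Copy source_path into update_dir when it differs from the copy in data_dir."""
    name = os.path.basename(source_path)
    current = os.path.join(data_dir, name)
    if os.path.exists(current) and md5_of(current) == md5_of(source_path):
        return False  # identical content, nothing to update
    shutil.copy2(source_path, os.path.join(update_dir, name))
    return True
```

The same structure works with `zlib.crc32` in place of MD5 if a cheaper check is preferred.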
The data management module 401 may also be configured to load the data file in the update directory into shared memory and initialize the shared memory; move the data file in the data directory to the backup directory; and move the data file in the update directory into the data directory.
After the updated data file has been loaded, the mapping of the original data file may also be deleted.
For example, after moving the data file in the update directory into the data directory, the data management module 401 may also be configured to delete the mapping of the data file from before the update. After deleting that mapping, the data management module 401 may also be configured to update the data update information.
It should be noted that, in this embodiment of the present invention, the update directory, the data directory, and the backup directory are set up in advance, and all three directories are in the same file system. This ensures that the file system index node (inode) of a data file remains unchanged when the file is moved, so that the memory mapping relationship is preserved. The index node stores basic information about files and directories, including times, file names, users, and groups.
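The directory rotation and the reason for keeping the three directories on one file system can be sketched as follows. The sketch assumes a POSIX-like system, where a rename within one file system keeps the inode, so an existing mapping stays attached to the moved file; the function name `rotate` and the directory names are illustrative only.

```python
import mmap
import os
import tempfile


def rotate(name, update_dir, data_dir, backup_dir):
    """Move data -> backup, then update -> data, using renames only."""
    current = os.path.join(data_dir, name)
    if os.path.exists(current):
        os.replace(current, os.path.join(backup_dir, name))
    os.replace(os.path.join(update_dir, name), current)


# Usage: the three directories share one parent, hence one file system.
root = tempfile.mkdtemp()
dirs = {n: os.path.join(root, n) for n in ("update", "data", "backup")}
for d in dirs.values():
    os.mkdir(d)
with open(os.path.join(dirs["data"], "table.dat"), "wb") as f:
    f.write(b"old")
staged = os.path.join(dirs["update"], "table.dat")
with open(staged, "wb") as f:
    f.write(b"new")

# Map the staged file first, as the data management module would.
with open(staged, "r+b") as f:
    view = mmap.mmap(f.fileno(), 0)
inode_before = os.stat(staged).st_ino

rotate("table.dat", dirs["update"], dirs["data"], dirs["backup"])
inode_after = os.stat(os.path.join(dirs["data"], "table.dat")).st_ino
```

If the directories were on different file systems, the move would become a copy to a new inode, and a process that later maps the file from the data directory would no longer share memory with the earlier mapping.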
Further optionally, update detection on the data file may be performed using a cyclic redundancy check (CRC) code, the MD5 message digest algorithm, or similar methods, which are not described in detail here.
Further, the data management module 401 may load the data file in the update directory into shared memory by memory mapping. Specifically, memory mapping refers to mapping a file to a block of memory. Win32 provides a function (CreateFileMapping) that allows an application to map a file into a process. A memory-mapped file is somewhat similar to virtual memory: it reserves a region of address space and commits physical storage to that region, except that the physical storage of a memory mapping comes from a file that already exists on disk, and the file must be mapped before it can be operated on. When a file stored on disk is processed through memory mapping, explicit I/O operations on the file are no longer necessary, so memory mapping plays an important role when processing files with large amounts of data.
In addition, when the data management module 401 loads the data file in the update directory into shared memory, it is in effect updating the "shared memory data". Shared memory data is data that is stored in shared memory and can be used jointly by multiple processes. The meaning of the data is interpreted by the processes; examples include fixed-length or variable-length hash tables, general key-value storage structures, and trie trees. It is generally loaded by mapping the "data file" directly into memory and performing the necessary initialization.
At this point, the data read module 402 may also be configured to read the data update information at a preset time interval. If it determines that the data update information indicates an update for a target data file, it loads the data file into shared memory by memory mapping and records the load time as second time information.
Further, after reading the data update information, the data read module 402 is also configured to determine that the data file has been updated if the time indicated by the first time information in the data update information that was read is later than the time in the second time information; map the updated data file in the data directory into shared memory and wait; and, when the waiting time exceeds a preset time threshold, delete the mapping of the data file from before the update.
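The reader-side logic described above (compare the first time information with the second, remap, wait, then drop the old mapping) can be sketched as below. The class name `Reader`, the dictionary keys of the update information, and the injectable `map_file` and `now` parameters are assumptions for the example; `io.BytesIO` merely stands in for a real mapping object that has a `close()` method.

```python
import io
import time


class Reader:
    """Tracks which mapping a reader uses and when it was loaded."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds  # preset time threshold
        self.second_time = 0.0              # time of this reader's last load
        self.current = None                 # mapping currently in use
        self.retired = []                   # (old mapping, time it was retired)

    def poll(self, update_info, map_file, now=None):
        """Called once per preset interval with the recorded update information."""
        now = time.time() if now is None else now
        # The file has been updated if it was loaded after our last load.
        if update_info["first_time"] > self.second_time:
            if self.current is not None:
                self.retired.append((self.current, now))
            self.current = map_file(update_info["filename"])
            self.second_time = now
        # Delete old mappings whose waiting time exceeds the threshold.
        still_waiting = []
        for mapping, retired_at in self.retired:
            if now - retired_at > self.grace_seconds:
                mapping.close()
            else:
                still_waiting.append((mapping, retired_at))
        self.retired = still_waiting
        return self.current


def fake_map(name):
    """Stand-in for memory-mapping the named file."""
    return io.BytesIO(name.encode())


reader = Reader(grace_seconds=5)
first = reader.poll({"filename": "table.dat", "first_time": 10.0}, fake_map, now=11.0)
same = reader.poll({"filename": "table.dat", "first_time": 10.0}, fake_map, now=12.0)
second = reader.poll({"filename": "table.dat", "first_time": 20.0}, fake_map, now=21.0)
open_during_wait = not first.closed
reader.poll({"filename": "table.dat", "first_time": 20.0}, fake_map, now=27.0)
```

Keeping the old mapping alive for the waiting period gives any in-flight lookups on the old data time to finish before the mapping is released.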
It should be noted here that each functional module (such as the data management module 401 and the data read module 402) manages its own shared memory. Only after all functional modules have released their mappings of the old data file is the old data file reclaimed by the operating system. In this processing system, the data management module is responsible for loading, initialization, and update writes. These operations are all exclusive and need to be performed by a single process. The data read module can then run as another process that reads the data.
In specific implementations, the above modules may each be implemented as independent entities, or may be combined arbitrarily and implemented as one or several entities; see, for example, the second embodiment. For the specific implementation of each module, refer to the preceding method embodiments; details are not repeated here.
As can be seen from the above, the data file processing system provided in this embodiment first stores the data file on disk in a shared-memory data structure, loads the data file into shared memory by memory mapping, and records update information, thereby loading the data file; it then updates and loads the data file according to the data update information, so that processes can share the data in shared memory. In this embodiment of the present invention, a data file that has a plaintext data structure on disk is stored in a shared-memory data structure and loaded into shared memory, which improves loading efficiency. Loading once and using in many places is supported: through shared memory, multiple processes use the same shared memory data, which greatly reduces additional memory usage.
Fourth embodiment
This embodiment of the present invention further provides a server in which the data file processing system of the embodiments of the present invention may be integrated. Figure 5 shows a schematic structural diagram of the server involved in this embodiment. Specifically:
The server may include components such as a processor 501 with one or more processing cores, a memory 502 comprising one or more computer-readable storage media, a radio frequency (RF) circuit 503, a power supply 504, an input unit 505, and a display unit 506. Those skilled in the art will understand that the server structure shown in Figure 5 does not limit the server, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:
The processor 501 is the control center of the server. It connects the parts of the entire server through various interfaces and lines, and performs the various functions of the server and processes data by running or executing the software programs and/or modules stored in the memory 502 and calling the data stored in the memory 502, thereby monitoring the server as a whole. Optionally, the processor 501 may include one or more processing cores. Preferably, the processor 501 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 501.
The memory 502 can be used to store software programs and modules; the processor 501 performs various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function); the data storage area may store data created through use of the server. In addition, the memory 502 may include high-speed random access memory and may also include non-volatile memory, for example at least one disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The RF circuit 503 can be used to receive and send signals while receiving and sending information. In particular, after downlink information from a base station is received, it is handed to one or more processors 501 for processing; in addition, uplink data is sent to the base station. Typically, the RF circuit 503 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), and a duplexer. The RF circuit 503 can also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, and Short Messaging Service (SMS).
The server further includes a power supply 504 (such as a battery) that powers the components. Preferably, the power supply is logically connected to the processor 501 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 504 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The server may also include an input unit 505, which can be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The server may also include a display unit 506, which can be used to display information entered by the user, information provided to the user, and the various graphical user interfaces of the server. These graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. The display unit 506 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like.
Specifically, in this embodiment, the processor 501 in the server loads the executable files corresponding to the processes of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application programs stored in the memory 502 to implement various functions, as follows:
Obtain a data file on disk, where the data file is stored in a shared-memory data structure;
Load the data file into shared memory by memory mapping, and initialize the shared memory;
Record data update information based on the loading process, where the data update information includes the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
Read the data file according to the data update information.
Preferably, the processor 501 may also be used for the following, before obtaining the data file on disk:
Store the data file on disk in a shared-memory data structure, where the shared-memory data structure includes an array, a hash table, or a double-array trie.
Preferably, the processor 501 may also be used for the following, after loading the data file into shared memory and initializing the shared memory:
Perform update detection on the data files on disk; when it is determined that a data file needs to be updated, copy that data file into the update directory; load the data file in the update directory into shared memory and initialize the shared memory; move the data file in the data directory to the backup directory; and move the data file in the update directory into the data directory, where the update directory, the data directory, and the backup directory are in the same file system.
On this basis, the processor 501 may also be used to delete the mapping of the data file from before the update.
Preferably, the processor 501 may also be used for the following, after deleting the mapping of the data file from before the update:
Update the data update information; read the data update information at a preset time interval; and, if it is determined that the data update information indicates an update for a target data file, load the data file into shared memory by memory mapping and record the load time as second time information.
On this basis, the processor 501 may also be used for the following, after reading the data update information: if the time indicated by the first time information in the data update information that was read is later than the time in the second time information, determine that the data file has been updated; map the updated data file in the data directory into shared memory and wait; and, when the waiting time exceeds a preset time threshold, delete the mapping of the data file from before the update.
As can be seen from the above, the server provided in this embodiment first stores the data file on disk in a shared-memory data structure, loads the data file into shared memory by memory mapping, and records update information, thereby loading the data file; it then updates and loads the data file according to the data update information, so that processes can share the data in shared memory. In this embodiment of the present invention, a data file that has a plaintext data structure on disk is stored in a shared-memory data structure and loaded into shared memory, which improves loading efficiency. Loading once and using in many places is supported: through shared memory, multiple processes use the same shared memory data, which greatly reduces additional memory usage.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in a given embodiment, refer to the detailed description of the data file processing method above; details are not repeated here.
The data file processing system provided in the embodiments of the present invention is, for example, a computer, a tablet computer, or a mobile phone with a touch function. The data file processing system and the data file processing method in the foregoing embodiments belong to the same concept; any of the methods provided in the data file processing method embodiments can run on the data file processing system. For the specific implementation process, see the data file processing method embodiments; details are not repeated here.
It should be noted that, for the data file processing method of the present invention, a person of ordinary skill in the art can understand that all or part of the process of implementing the data file processing method described in the embodiments of the present invention can be completed by a computer program controlling the relevant hardware. The computer program can be stored in a computer-readable storage medium, for example in the memory of a terminal, and executed by at least one processor in the terminal; its execution may include the process of the embodiments of the data file processing method. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
For the data file processing system of the embodiments of the present invention, the functional modules may be integrated in one processing chip, each module may exist physically on its own, or two or more modules may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium, for example a read-only memory, a magnetic disk, or an optical disc.
The data file processing method and system provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, for those skilled in the art, the specific implementations and the scope of application may change according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (7)

1. A data file processing method, characterized by comprising:
obtaining a data file on disk, wherein the data file is stored in a shared-memory data structure;
loading the data file into shared memory by memory mapping, and initializing the shared memory; recording data update information based on the loading process, wherein the data update information includes the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
reading the data file according to the data update information;
performing update detection on the data files on disk; when it is determined that a data file needs to be updated, copying the data file that needs to be updated into an update directory; loading the data file in the update directory into shared memory, and initializing the shared memory; moving the data file in a data directory to a backup directory; moving the data file in the update directory into the data directory, wherein the update directory, the data directory, and the backup directory are in the same file system; deleting the mapping of the data file from before the update; updating the data update information;
reading the data update information at a preset time interval; and, if it is determined that the data update information indicates an update for a target data file, loading the data file into the shared memory by memory mapping and recording the load time as second time information.
2. The data file processing method according to claim 1, characterized in that, before the obtaining of the data file on disk, the method further comprises:
storing the data file on disk in a shared-memory data structure, wherein the shared-memory data structure includes an array, a hash table, or a double-array trie.
3. The data file processing method according to claim 1, characterized in that, after the reading of the data update information, the method further comprises:
if the time indicated by the first time information in the data update information that was read is later than the time in the second time information, determining that the data file has been updated;
mapping the updated data file in the data directory into the shared memory and waiting;
when the waiting time exceeds a preset time threshold, deleting the mapping of the data file from before the update.
4. A data file processing system, characterized by comprising:
a data management module, configured to obtain a data file on disk, wherein the data file is stored in a shared-memory data structure; load the data file into shared memory by memory mapping, and initialize the shared memory; and record data update information based on the loading process, wherein the data update information includes the file name of the data file and first time information, the first time information being the time at which the loading process performed the load;
a data read module, configured to read the data file according to the data update information;
an update detection module, configured to perform update detection on the data files on disk and, when it is determined that a data file needs to be updated, copy the data file that needs to be updated into an update directory;
the data management module being further configured to load the data file in the update directory into shared memory, and initialize the shared memory; move the data file in a data directory to a backup directory; move the data file in the update directory into the data directory, wherein the update directory, the data directory, and the backup directory are in the same file system; delete the mapping of the data file from before the update; and update the data update information;
the data read module being further configured to read the data update information at a preset time interval and, if it is determined that the data update information indicates an update for a target data file, load the data file into the shared memory by memory mapping and record the load time as second time information.
5. The data file processing system according to claim 4, characterized in that the data management module is further configured to, before obtaining the data file on disk, store the data file on disk in a shared-memory data structure, wherein the shared-memory data structure includes an array, a hash table, or a double-array trie.
6. The data file processing system according to claim 4, characterized in that the data read module is further configured to, after reading the data update information, determine that the data file has been updated if the time indicated by the first time information in the data update information that was read is later than the time in the second time information; map the updated data file in the data directory into the shared memory and wait; and, when the waiting time exceeds a preset time threshold, delete the mapping of the data file from before the update.
7. A computer-readable storage medium storing a computer program, wherein the computer program can be executed by a processor to implement the method according to any one of claims 1 to 3.
CN201510454768.0A 2015-07-29 2015-07-29 A kind of data file processing method and system Active CN106708825B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510454768.0A CN106708825B (en) 2015-07-29 2015-07-29 A kind of data file processing method and system

Publications (2)

Publication Number Publication Date
CN106708825A CN106708825A (en) 2017-05-24
CN106708825B true CN106708825B (en) 2019-09-27

Family

ID=58894947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510454768.0A Active CN106708825B (en) 2015-07-29 2015-07-29 A kind of data file processing method and system

Country Status (1)

Country Link
CN (1) CN106708825B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant