CN104573082A - Space small file data distribution storage method and system based on access log information - Google Patents

Info

Publication number
CN104573082A
Authority
CN
China
Prior art keywords
small documents
space small
access
documents data
frequent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510042456.9A
Other languages
Chinese (zh)
Other versions
CN104573082B (en)
Inventor
潘少明
徐正全
种衍文
李红
李明
汤戈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201510042456.9A
Publication of CN104573082A
Application granted
Publication of CN104573082B
Legal status: Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a distributed storage method and system for spatial small-file data based on access log information. The method comprises: dividing a spatial small-file data set into a frequently accessed subset and a non-frequently accessed subset; extracting the access sequence of the frequently accessed subset; calculating the association degree between the frequently accessed spatial small-file data and forming an association matrix from these values; performing magnitude conversion on each value in the association matrix and rearranging the matrix with the RCM sorting algorithm; searching the rearranged association matrix for optimal combinations with a local approximation search method; storing the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations; and storing the non-frequently accessed spatial small-file data separately according to spatial adjacency.

Description

Distributed storage method and system for spatial small-file data based on access log information
Technical field
The invention belongs to the technical field of distributed storage of spatial small-file data, and in particular relates to a new distributed storage method and system for spatial small-file data based on access log information.
Background art
Storing massive spatial information and accessing it quickly has always been a major problem for spatial information service systems. In a typical system such as NASA's Earth Observing System, the volume of data collected each day reaches 2 TB, so laying out these data sensibly to obtain fast parallel access becomes critical. One important class of solutions stores the data in a distributed manner so that parallel access improves data access efficiency.
Typical distributed file storage systems at present include GFS (Google File System), HDFS (Hadoop Distributed File System) and Lustre. However, the storage-performance improvements of these systems mainly concern the handling of large files. GFS, for example, divides a large file into fixed-length blocks (e.g. 64 MB) and stores the blocks on different storage devices to raise the parallel access rate of the data (reference: Ghemawat S, Gobioff H, Shun-Tak L. The Google file system. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP '03). Bolton Landing, New York: IEEE, 2003. 1–15). Another typical storage technology, RAID (Redundant Array of Independent Disks), likewise divides each large data file into several blocks and stores them on different disks to improve parallel access to that file.
Although the above distributed storage methods are effective for large files, small-file data cannot be divided further, so block-based storage is poorly suited to them. The common approach at present is simply to store each single file on a single storage server, which makes parallel access to multiple small files hard to achieve and keeps I/O efficiency low.
Research shows that most current systems hold large numbers of small files. For example, among the 13 million files at the US National Energy Research Scientific Computing Center, 99% are smaller than 64 MB, and files smaller than 64 KB account for 44% (reference: Carns P, Lang S, Ross R, et al. Small-file access in parallel file systems [C]. Parallel & Distributed Processing, 2009. IPDPS 2009. IEEE International Symposium on. IEEE, 2009: 1-11).
In fact, spatial information service systems based on the pyramid model, such as Google Earth and World Wind, also store spatial data as small files. World Wind divides the Earth into tile data of different resolutions according to the pyramid model. Each tile is saved as one file, the size of each tile is fixed at 512 × 512 pixels, and each tile file is no larger than 1 MB (reference: Boschetti L, Roy D P, Justice C O. Using NASA's World Wind virtual globe for interactive internet visualization of the global MODIS burned area product. Int J Remote Sens, 2008, 29(11): 3067–3072). Google Earth likewise stores spatial data using a multi-resolution model, and each data file is no larger than 64 MB (reference: Sample J T, Loup E. Tile-based geospatial information system: principle and practices. New York: Springer, 2010. 23–200).
In short, current distributed storage methods for large files are hard to apply to small-file data, and existing optimizations for small-file data concentrate on access optimization rather than storage optimization (access optimization targets the client side, whereas storage optimization targets the server side). Examples include reducing the execution time of data-intensive applications (reference: J. Kim, A. Chandra, and J.B. Weissman. Using Data Accessibility for Resource Selection in Large-Scale Distributed Systems. IEEE Trans. Parallel Distributed Systems, vol. 20, no. 6, pp. 788-801, June 2009) and reducing the overhead of small-file index information (reference: A.L. Chervenak, R. Schuler, M. Ripeanu, M.A. Amer, S. Bharathi, I. Foster, A. Iamnitchi, and C. Kesselman. The Globus Replica Location Service: Design and Experience. IEEE Trans. Parallel Distributed Systems, vol. 20, no. 9, pp. 1260-1272, Sept. 2009). In a distributed system, however, access latency depends not only on the access method but also on how the data are distributed across storage. The optimization problem for small-file data has therefore not yet been fundamentally solved.
Summary of the invention
To address the above problems, the invention provides a distributed storage method and system for spatial small-file data based on access log information. It uses the access log information of spatial small-file data to analyze the relationships among the individual files and stores them in a distributed manner accordingly, so as to raise the parallel access rate of spatial small-file data.
The technical solution adopted by the method and system of the invention is as follows.
A distributed storage method for spatial small-file data based on access log information performs, for any one type of spatial small-file data, the following steps:
Step 1: divide the spatial small-file data set into a frequently accessed subset and a non-frequently accessed subset according to differences in access frequency. This comprises the following sub-steps.
Step 1.1: obtain the access hotness of each spatial small-file datum, as follows.
Let the spatial small-file data set be $F = \{f_1, f_2, \ldots, f_N\}$, where $N$ is the total number of spatial small-file data and the $i$-th datum is denoted $f_i$, $i = 1, 2, \ldots, N$;
Let the access log information record that the spatial small-file data were accessed in the order given by the access log sequence $R = (f_{a_1}, f_{a_2}, \ldots, f_{a_M})$, and let $A = (a_1, a_2, \ldots, a_M)$ be the corresponding access sequence vector, with $a_t \in [1, N]$ and access index $t = 1, 2, \ldots, M$, where $M$ is the total number of accesses to all spatial small-file data in $F$;
Count the number of times $\lambda_i$ that each spatial small-file datum $f_i$ occurs in the access log sequence $R$, and take $\lambda_i$ as the access hotness of $f_i$;
Step 1.2: extract the frequently accessed spatial small-file data according to access hotness, as follows.
Input a preset discrimination parameter $\lambda$.
If the access hotness of a datum $f_i$ in the data set $F$ satisfies $\lambda_i > \lambda$, then $f_i$ is frequently accessed spatial small-file data; otherwise $f_i$ is non-frequently accessed spatial small-file data;
Step 1.3: form a subset of the spatial small-file data set from the frequently accessed data obtained in step 1.2, as follows.
Let the subset formed by all frequently accessed spatial small-file data be $F^1 = \{f^1_1, f^1_2, \ldots, f^1_{N_1}\}$, where $N_1$ is the total number of frequently accessed spatial small-file data, and the $i_1$-th and $j_1$-th frequently accessed data are denoted $f^1_{i_1}$ and $f^1_{j_1}$ respectively, with $i_1, j_1 \in [1, N_1]$;
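Steps 1.1 to 1.3 can be illustrated with a minimal Python sketch. The function name `split_by_hotness`, the toy file indices and the toy log are hypothetical and only serve the illustration; the threshold plays the role of the preset discrimination parameter λ.

```python
from collections import Counter

def split_by_hotness(access_log, all_files, threshold):
    """Split files into frequent and non-frequent subsets by access count."""
    # hotness[f] is lambda_i: how often file f occurs in the access log sequence R
    hotness = Counter(access_log)
    # Frequent if lambda_i > threshold, as in step 1.2; everything else is non-frequent
    frequent = [f for f in all_files if hotness[f] > threshold]
    infrequent = [f for f in all_files if hotness[f] <= threshold]
    return hotness, frequent, infrequent

# Hypothetical toy data: N = 6 files and an access log with M = 10 requests
files = [1, 2, 3, 4, 5, 6]
log = [1, 2, 1, 3, 2, 1, 4, 2, 3, 1]
hotness, frequent, infrequent = split_by_hotness(log, files, threshold=1)
print(frequent, infrequent)
```

Files that never appear in the log have hotness 0 and therefore always fall into the non-frequent subset.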
Step 2: extract the access sequence of the frequently accessed subset from the access log information, forming in chronological order the access sequence $R^1 = \{ f_{a^1_1}, f_{a^1_2}, \ldots, f_{a^1_{M_1}} \}$, with $A^1 = (a^1_1, a^1_2, \ldots, a^1_{M_1})$ the access sequence vector of the frequently accessed spatial small-file data and access index $t_1 = 1, 2, \ldots, M_1$, where $M_1$ is the total number of accesses to all frequently accessed spatial small-file data in $F^1$;
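A minimal sketch of step 2 in Python follows. It assumes the log is already a time-ordered list of file identifiers, and it re-indexes the frequent files to 1..N_1 so that the entries of the resulting vector lie in [1, N_1]; the re-indexing scheme and all names here are assumptions made for the illustration.

```python
def frequent_access_sequence(access_log, frequent_files):
    """Keep only accesses to frequent files, preserving time order."""
    # Assumption: frequent files are renumbered 1..N_1 in ascending identifier order
    index = {f: i for i, f in enumerate(sorted(frequent_files), start=1)}
    return [index[f] for f in access_log if f in index]

# Hypothetical identifiers: files 2, 5 and 9 are the frequent ones
frequent_files = [2, 5, 9]
log = [2, 7, 5, 2, 9, 1, 5, 2]
a1 = frequent_access_sequence(log, frequent_files)
print(a1)  # accesses to files 7 and 1 are dropped
```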
Step 3: segment the access sequence of the frequently accessed subset, calculate the association degree between the frequently accessed spatial small-file data, and form an association matrix from the pairwise association degree values. This comprises the following sub-steps.
Step 3.1: from the number of storage servers $m$ and the size $N_1$ of the frequently accessed subset, calculate the segment length of the frequent access sequence, $n = N_1 / m$;
Step 3.2: segment the frequent access sequence according to the segment length, as follows.
Following the access order, divide the access sequence vector $A^1$ into subvectors of $n$ elements each, written $A^1 = (S_1, S_2, \ldots, S_l)$, where the subvector $S_k = (a_{k1}, a_{k2}, \ldots, a_{kn})$, $a_{kj} \in [1, N_1]$, $1 \le k \le l$, $1 \le j \le n$; denote the set of all subvectors of $A^1$ by $S$, $S = \{S_k : k \in [1, l]\}$;
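The segmentation of steps 3.1 and 3.2 can be sketched as below. The source does not say how to treat a trailing group with fewer than n elements or a non-integer N_1/m; this sketch assumes integer division and drops the incomplete tail, and the names are hypothetical.

```python
def segment(sequence, n):
    """Split an access sequence vector into consecutive subvectors of n elements."""
    # Assumption: an incomplete trailing subvector is discarded
    l = len(sequence) // n
    return [sequence[k * n:(k + 1) * n] for k in range(l)]

n1, m = 6, 3          # hypothetical: 6 frequent files, 3 storage servers
n = n1 // m           # segment length n = N_1 / m (integer division assumed)
a1 = [1, 4, 2, 1, 5, 3, 6]
segments = segment(a1, n)
print(segments)
```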
Step 3.3: calculate the pairwise association degree values of the frequently accessed spatial small-file data, as follows.
Define the function $R_{S_k}(i_1, j_1)$ in terms of the set formed by all elements of $S_k$; the function indicates whether the frequently accessed spatial small-file data $f^1_{i_1}$ and $f^1_{j_1}$ are associated within an access period of length $n$;
Define the function $R_S(i_1, j_1)$,
$R_S(i_1, j_1) = \sum_{k=1}^{l} R_{S_k}(i_1, j_1), \quad 1 \le i_1 \le N_1, \; 1 \le j_1 \le N_1$
where $R_S(i_1, j_1)$ is the total association degree of $f^1_{i_1}$ and $f^1_{j_1}$ over $S$;
Step 3.4: form the association matrix $R_S$ from the pairwise association degree values of the frequently accessed spatial small-file data,
$R_S = \left( R_S(i_1, j_1) \right)_{N_1 \times N_1}, \quad 1 \le i_1 \le N_1, \; 1 \le j_1 \le N_1$
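As an illustration of steps 3.3 and 3.4, the sketch below builds an association matrix by summing a per-segment indicator over all segments. The exact per-segment function is not reproduced in this text, so the sketch assumes it equals 1 when two distinct files both occur in the same segment and 0 otherwise, leaving the diagonal at 0; that choice and the names are assumptions.

```python
from itertools import combinations

def association_matrix(segments, n1):
    """Sum a co-occurrence indicator over all segments (assumed indicator form)."""
    matrix = [[0] * n1 for _ in range(n1)]
    for seg in segments:
        members = sorted(set(seg))            # set of all elements of S_k
        for i, j in combinations(members, 2):
            # Files are numbered 1..N_1, list indices are 0-based
            matrix[i - 1][j - 1] += 1
            matrix[j - 1][i - 1] += 1         # keep the matrix symmetric
    return matrix

# Hypothetical segments over N_1 = 4 frequent files, segment length n = 3
segments = [[1, 2, 3], [1, 2, 4], [3, 4, 1]]
r_s = association_matrix(segments, 4)
for row in r_s:
    print(row)
```

Two files that are often requested within the same access period end up with a large entry, which is the signal the later steps use to keep them on different servers.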
Step 4: perform magnitude conversion on each element value of the association matrix, rearrange the matrix with the RCM sorting algorithm, and output the result;
Step 5: apply the local approximation search method to the rearranged association matrix, iterating until $m$ optimal combinations have been found, and output them;
Step 6: store the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations obtained in step 5, and store the non-frequently accessed spatial small-file data separately according to spatial adjacency.
Moreover, step 4 comprises the following sub-steps.
Step 4.1: obtain the maximum element of the association matrix by traversing all element values, giving the maximum $R_{max}$;
Step 4.2: perform magnitude conversion on the element values by traversing all elements of the association matrix and executing $R_S(i_1, j_1) = R_{max} - R_S(i_1, j_1)$;
Step 4.3: rearrange the association matrix with the standard RCM sorting algorithm.
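The sub-steps of step 4 can be sketched in plain Python as follows. The RCM routine is a textbook reverse Cuthill-McKee ordering written out for illustration (breadth-first traversal from a minimum-degree vertex, neighbours visited in ascending degree, final order reversed); it treats non-zero off-diagonal entries of the converted matrix as edges, which is an assumption about how the patent applies RCM. In practice a library routine such as `scipy.sparse.csgraph.reverse_cuthill_mckee` could be used instead.

```python
from collections import deque

def convert_magnitude(matrix):
    """Steps 4.1 and 4.2: replace every element v by R_max - v."""
    r_max = max(max(row) for row in matrix)
    return [[r_max - v for v in row] for row in matrix]

def rcm_order(matrix):
    """Reverse Cuthill-McKee ordering on the off-diagonal non-zero pattern."""
    size = len(matrix)
    adj = [[j for j in range(size) if j != i and matrix[i][j] != 0]
           for i in range(size)]
    degree = [len(a) for a in adj]
    visited = [False] * size
    order = []
    # Start each connected component from its lowest-degree unvisited vertex
    for start in sorted(range(size), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]  # reversing the Cuthill-McKee order gives RCM

def permute(matrix, order):
    """Apply the same permutation to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]

# Hypothetical 4 x 4 association matrix
r_s = [[0, 2, 0, 0], [2, 0, 1, 0], [0, 1, 0, 3], [0, 0, 3, 0]]
converted = convert_magnitude(r_s)
order = rcm_order(converted)
rearranged = permute(converted, order)
print(order)
```

After conversion, strongly associated pairs carry small values and weakly associated pairs carry large ones, so a block with large values groups files that are rarely requested together.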
Moreover, step 5 comprises the following sub-steps.
Step 5.1: initialize the current iteration count $d = 1$;
Step 5.2: use the local approximation search method to find one optimal combination, that is, find an $n \times n$ block in the current matrix such that the matrix element values within the block are maximal; the corresponding $n$ files form one optimal combination. The first time step 5.2 is executed, the current matrix is the rearranged association matrix obtained in step 4; on subsequent executions, the current matrix is the matrix obtained in the previous iteration;
Step 5.3: after the search of step 5.2 in the current iteration yields an optimal combination of $n$ files, delete the association matrix elements of those $n$ files from the association matrix, obtaining an $(N_1 - dn) \times (N_1 - dn)$ matrix;
Step 5.4: determine whether $d = m - 1$. If not, set $d = d + 1$, take the $(N_1 - dn) \times (N_1 - dn)$ matrix obtained in step 5.3 of the current iteration as the current matrix, and return to step 5.2 to search for the next optimal combination in the next iteration. If so, stop searching; $m$ optimal combinations have been obtained in total.
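The iteration of steps 5.1 to 5.4 can be sketched as a greedy loop. The source does not spell out the local approximation search itself; this sketch assumes that candidate blocks are the n × n diagonal blocks over n consecutive rows and columns of the rearranged matrix and that "maximal" means the largest sum of entries. The remaining files after m - 1 iterations form the m-th combination. All names are hypothetical.

```python
def find_groups(matrix, n, m):
    """Greedy search for m groups of n files each (assumed diagonal-block search)."""
    labels = list(range(len(matrix)))      # positions in the rearranged matrix
    current = [row[:] for row in matrix]
    groups = []
    for _ in range(m - 1):                 # iterations d = 1 .. m-1
        best_start, best_sum = 0, None
        for s in range(len(current) - n + 1):
            total = sum(current[i][j]
                        for i in range(s, s + n) for j in range(s, s + n))
            if best_sum is None or total > best_sum:
                best_start, best_sum = s, total
        groups.append(labels[best_start:best_start + n])
        # Step 5.3: delete the rows and columns of the chosen files
        keep = [i for i in range(len(current))
                if not best_start <= i < best_start + n]
        current = [[current[i][j] for j in keep] for i in keep]
        labels = [labels[i] for i in keep]
    groups.append(labels)                  # remaining files form the last group
    return groups

# Hypothetical converted and rearranged matrix, N_1 = 4, n = 2, m = 2
matrix = [[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 5, 4], [1, 1, 4, 5]]
groups = find_groups(matrix, n=2, m=2)
print(groups)
```

Each returned group lists the files of one combination, one combination per storage server.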
The invention correspondingly provides a distributed storage system for spatial small-file data based on access log information, comprising the following units.
A spatial small-file data set preprocessing unit (100), for dividing the spatial small-file data set of any one type of spatial small-file data into a frequently accessed subset and a non-frequently accessed subset according to differences in access frequency. It comprises the following modules. A spatial small-file data access frequency statistics module (101), for obtaining the access hotness of each spatial small-file datum, as follows.
Let the spatial small-file data set be $F = \{f_1, f_2, \ldots, f_N\}$, where $N$ is the total number of spatial small-file data and the $i$-th datum is denoted $f_i$, $i = 1, 2, \ldots, N$;
Let the access log information record that the spatial small-file data were accessed in the order given by the access log sequence $R = (f_{a_1}, f_{a_2}, \ldots, f_{a_M})$, and let $A = (a_1, a_2, \ldots, a_M)$ be the corresponding access sequence vector, with $a_t \in [1, N]$ and access index $t = 1, 2, \ldots, M$, where $M$ is the total number of accesses to all spatial small-file data in $F$;
Count the number of times $\lambda_i$ that each spatial small-file datum $f_i$ occurs in the access log sequence $R$, and take $\lambda_i$ as the access hotness of $f_i$;
A frequently accessed spatial small-file data extraction module (102), for extracting the frequently accessed spatial small-file data according to access hotness, as follows.
Input a preset discrimination parameter $\lambda$.
If the access hotness of a datum $f_i$ in the data set $F$ satisfies $\lambda_i > \lambda$, then $f_i$ is frequently accessed spatial small-file data; otherwise $f_i$ is non-frequently accessed spatial small-file data;
A frequently accessed spatial small-file subset construction module (103), for forming a subset of the spatial small-file data set from the frequently accessed data obtained by the extraction module (102), as follows.
Let the subset formed by all frequently accessed spatial small-file data be $F^1 = \{f^1_1, f^1_2, \ldots, f^1_{N_1}\}$, where $N_1$ is the total number of frequently accessed spatial small-file data, and the $i_1$-th and $j_1$-th frequently accessed data are denoted $f^1_{i_1}$ and $f^1_{j_1}$ respectively, with $i_1, j_1 \in [1, N_1]$;
A spatial small-file data access vector acquisition unit (200), for extracting the access sequence of the frequently accessed subset from the access log information, forming in chronological order the access sequence $R^1 = \{ f_{a^1_1}, f_{a^1_2}, \ldots, f_{a^1_{M_1}} \}$, with $A^1 = (a^1_1, a^1_2, \ldots, a^1_{M_1})$ the access sequence vector of the frequently accessed spatial small-file data and access index $t_1 = 1, 2, \ldots, M_1$, where $M_1$ is the total number of accesses to all frequently accessed spatial small-file data in $F^1$;
A spatial small-file data access association matrix calculation unit (300), for segmenting the access sequence of the frequently accessed subset, calculating the association degree between the frequently accessed spatial small-file data, and forming an association matrix from the pairwise association degree values. It comprises the following modules. A frequent access sequence segment length calculation module (301), for calculating the segment length $n = N_1 / m$ of the frequent access sequence from the number of storage servers $m$ and the size $N_1$ of the frequently accessed subset;
The storage server count $m$ is supplied as an external input.
A frequent access sequence segmentation module (302), for segmenting the frequent access sequence according to the segment length, as follows.
Following the access order, divide the access sequence vector $A^1$ into subvectors of $n$ elements each, written $A^1 = (S_1, S_2, \ldots, S_l)$, where the subvector $S_k = (a_{k1}, a_{k2}, \ldots, a_{kn})$, $a_{kj} \in [1, N_1]$, $1 \le k \le l$, $1 \le j \le n$; denote the set of all subvectors of $A^1$ by $S$, $S = \{S_k : k \in [1, l]\}$;
A spatial small-file data association degree calculation module (303), for calculating the pairwise association degree values of the frequently accessed spatial small-file data, as follows.
Define the function $R_{S_k}(i_1, j_1)$ in terms of the set formed by all elements of $S_k$; the function indicates whether the frequently accessed spatial small-file data $f^1_{i_1}$ and $f^1_{j_1}$ are associated within an access period of length $n$;
Define the function $R_S(i_1, j_1)$,
$R_S(i_1, j_1) = \sum_{k=1}^{l} R_{S_k}(i_1, j_1), \quad 1 \le i_1 \le N_1, \; 1 \le j_1 \le N_1$
where $R_S(i_1, j_1)$ is the total association degree of $f^1_{i_1}$ and $f^1_{j_1}$ over $S$;
A spatial small-file data association matrix generation module (304), for forming the association matrix $R_S$ from the pairwise association degree values of the frequently accessed spatial small-file data,
$R_S = \left( R_S(i_1, j_1) \right)_{N_1 \times N_1}, \quad 1 \le i_1 \le N_1, \; 1 \le j_1 \le N_1$
An association matrix conversion and rearrangement unit (400), for performing magnitude conversion on each element value of the association matrix, rearranging the matrix with the RCM sorting algorithm, and outputting the result;
An association matrix optimal combination search unit (500), for finding optimal combinations in the rearranged association matrix with the local approximation search method;
A spatial small-file data distributed storage unit (600), for storing the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations obtained by the search unit (500), and storing the non-frequently accessed spatial small-file data separately according to spatial adjacency.
Moreover, the association matrix conversion and rearrangement unit (400) comprises the following modules.
An association matrix maximum element acquisition module (401), for obtaining the maximum element of the association matrix by traversing all element values, giving the maximum $R_{max}$;
An association matrix element magnitude conversion module (402), for performing magnitude conversion on the element values by traversing all elements of the association matrix and executing $R_S(i_1, j_1) = R_{max} - R_S(i_1, j_1)$;
An association matrix rearrangement module (403), for rearranging the association matrix with the standard RCM sorting algorithm.
Moreover, the association matrix optimal combination search unit (500) comprises the following modules.
An initialization module, for initializing the current iteration count $d = 1$;
An optimal combination search module, for finding one optimal combination with the local approximation search method, that is, finding an $n \times n$ block in the current matrix such that the matrix element values within the block are maximal; the corresponding $n$ files form one optimal combination. The first time the optimal combination search module operates, the current matrix is the rearranged association matrix produced by the conversion and rearrangement unit (400); on subsequent operations, the current matrix is the matrix obtained in the previous iteration;
A matrix update module, for deleting from the association matrix the elements of the $n$ files of the optimal combination found by the optimal combination search module in the current iteration, obtaining an $(N_1 - dn) \times (N_1 - dn)$ matrix;
A judgment and output module, for determining whether $d = m - 1$. If not, it sets $d = d + 1$, takes the $(N_1 - dn) \times (N_1 - dn)$ matrix produced by the matrix update module in the current iteration as the current matrix, and instructs the optimal combination search module to search for the next optimal combination in the next iteration. If so, it stops the search; $m$ optimal combinations have been obtained in total.
The beneficial effects of the invention are as follows. Although spatial small-file data are enormous in number, user access behaviour is clustered: most requests concentrate on a small portion of the data. The invention therefore classifies spatial small-file data by access hotness. For the frequently accessed data it calculates pairwise association degrees from the access log information, forms an association matrix, and finds the optimal distributed storage combination scheme with the local approximation search method; data of different hotness are stored with different distributed storage schemes. With limited consumption of computing resources, this achieves optimized distributed storage of massive spatial small-file data, improves their parallel access performance, and improves the service capability of spatial information systems. The invention thus reduces the overlap of accesses to spatial small-file data within a single server and so obtains a high parallel access rate across servers, improves the service performance for spatial small-file data, and reduces the amount of data to be computed. It is efficient, practical for engineering use, and applicable to the distributed storage of spatial small-file data in large-scale distributed environments.
Brief description of the drawings
Fig. 1 is a schematic diagram of the system architecture in an embodiment of the invention.
Fig. 2 is a schematic diagram of the structure of the spatial small-file data set preprocessing unit 100 in an embodiment of the invention.
Fig. 3 is a schematic diagram of the structure of the spatial small-file data access association matrix calculation unit 300 in an embodiment of the invention.
Fig. 4 is a schematic diagram of the structure of the association matrix conversion and rearrangement unit 400 in an embodiment of the invention.
Fig. 5 is a flow chart of the method in an embodiment of the invention.
Detailed description of the embodiments
In a distributed environment, parallel access to spatial small-file data is difficult to achieve by storing the data in distributed blocks. The relationships among the individual spatial small-file data therefore need to be analyzed so that, when the data are accessed, the requested files are stored on different storage servers as far as possible. This maximizes parallel retrieval of spatial small-file data and so improves the performance of the spatial information service system.
Because the number of spatial small-file data is huge, the combinatorial optimization of storage for large-scale spatial small-file data has high computational complexity and a large search time overhead. The data therefore need to be classified by hotness, and different methods are used to obtain the optimal storage combination scheme for each hotness class.
The implementation of the technical solution of the invention is described in detail below.
The spatial small-file data of the invention comprise a spatial data type and a spatial coordinate position. Each spatial small-file datum is small and unsuitable for being further divided into parts stored on different servers to improve parallel access efficiency. The access log information is the log, recorded by the corresponding spatial information service system in order of occurrence, of each client application's accesses to spatial small-file data, including the type and coordinates of the data accessed. The access log information is recorded by the spatial information service system during operation, in formats including but not limited to files and databases.
The spatial small-file data comprise different types, including but not limited to SRTM30 (30 m global Shuttle Radar Topography Mission terrain data files) and SRTM90.
The distributed storage method and system for spatial small-file data based on access log information process each type of spatial small-file data separately, and the processing is identical for the different types.
As shown in Fig. 5, the technical solution adopted by the method of the invention is a distributed storage method for spatial small-file data based on access log information which, for any one type of spatial small-file data, performs the following steps:
(1) Extraction of the frequently accessed subset: divide the spatial small-file data set into a frequently accessed subset and a non-frequently accessed subset according to differences in access frequency. This comprises the following sub-steps.
1. Obtain the access hotness of each spatial small-file datum.
Let the spatial small-file data set be $F = \{f_1, f_2, \ldots, f_N\}$, where $N$ is the total number of spatial small-file data and the $i$-th datum is denoted $f_i$, $i = 1, 2, \ldots, N$.
Let the access log information record that the spatial small-file data were accessed in the order given by the access log sequence $R = (f_{a_1}, f_{a_2}, \ldots, f_{a_M})$, and call the corresponding $A = (a_1, a_2, \ldots, a_M)$ the access sequence vector, with $a_t \in [1, N]$ (access index $t = 1, 2, \ldots, M$), where $M$ is the total number of accesses to all spatial small-file data in $F$.
Count the number of times $\lambda_i$ that each $f_i$ ($f_i \in F$) occurs in the access log sequence $R$; $\lambda_i$ is then the access hotness of $f_i$.
2. Extract the frequently accessed spatial small-file data according to access hotness.
Input the preset discrimination parameter $\lambda$ for frequently accessed spatial small-file data.
If the access hotness of a datum $f_i$ in the data set $F$ satisfies $\lambda_i > \lambda$, then $f_i$ is frequently accessed spatial small-file data; otherwise $f_i$ is non-frequently accessed spatial small-file data.
3. Form a subset of the spatial small-file data set $F$ from the frequently accessed data obtained in 2.
Let the subset formed by all frequently accessed spatial small-file data be $F^1 = \{f^1_1, f^1_2, \ldots, f^1_{N_1}\}$, where $N_1$ is the total number of frequently accessed spatial small-file data, and the $i_1$-th and $j_1$-th frequently accessed data are denoted $f^1_{i_1}$ and $f^1_{j_1}$ respectively, with $i_1, j_1 \in [1, N_1]$.
Similarly, let the set of non-frequently accessed spatial small-file data be $F^2 = \{f^2_1, f^2_2, \ldots, f^2_{N_2}\}$, where $N_2$ is the total number of non-frequently accessed spatial small-file data, and $N_1 + N_2 = N$.
(2) frequent addressing space small documents data subset access sequence extracts: the access sequence extracting the space small documents data subset of frequent access from access log information;
Access log information have recorded the coordinate of spatial data, and different coordinates represents different data.Therefore can extract the coordinate information of the space small documents data of accessing according to access time sequencing from access log information.During concrete enforcement, specifying information extracting mode can be determined according to the record format of access log information.Coordinate information is the space latitude and longitude coordinates of space small documents data.
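Since the record format is left to the implementation, the following Python sketch assumes a purely hypothetical comma-separated log with the fields timestamp, data type, longitude and latitude. It selects the records of one data type and returns their coordinates in order of access time; none of the field names or sample values come from the source.

```python
import csv
from io import StringIO

# Hypothetical log format (an assumption): timestamp,data_type,longitude,latitude
SAMPLE_LOG = """\
2015-01-01T00:00:03,SRTM90,114.0,30.0
2015-01-01T00:00:01,SRTM90,115.0,30.0
2015-01-01T00:00:02,SRTM30,114.0,30.0
2015-01-01T00:00:04,SRTM90,115.0,30.0
"""

def access_sequence(log_text, data_type):
    """Return the (longitude, latitude) pairs accessed for one data type, by time."""
    rows = [r for r in csv.reader(StringIO(log_text)) if r and r[1] == data_type]
    # ISO 8601 timestamps of equal precision sort chronologically as strings
    rows.sort(key=lambda r: r[0])
    return [(float(r[2]), float(r[3])) for r in rows]

sequence = access_sequence(SAMPLE_LOG, "SRTM90")
print(sequence)
```

Because different coordinates denote different data, each distinct coordinate pair in the result can then be mapped to one file index.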
Extract access sequence subset according to frequent addressing space small documents data subset, realize as follows,
To small documents data in space in access log information according to access time sequencing, get the space small documents data of wherein frequent access, form the access sequence of the space small documents data subset of frequent access corresponding title for frequent addressing space small documents data access sequence vector, (access sequence number t 1=(1 1, 2 1..., M 1)), wherein M 1for to F 1in the access total degree of all frequent addressing space small documents data.
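A minimal sketch of this filtering step is given below. It assumes that the full log has already been reduced to a list of file indices in time order, and that the frequently accessed files are renumbered 1..$N_1$ so that the elements of $A_1$ fall in $[1, N_1]$; the name `extract_frequent_sequence` and the sample data are hypothetical.

```python
def extract_frequent_sequence(access_sequence, frequent):
    """Keep only accesses to frequent files, renumbered 1..N1 in the order of `frequent`."""
    new_index = {f: k for k, f in enumerate(frequent, start=1)}
    # Time order is preserved because the log is scanned front to back.
    return [new_index[a] for a in access_sequence if a in new_index]

# Hypothetical log over original file indices; files 3 and 7 are the frequent ones.
A = [7, 3, 7, 9, 3, 7, 4]
A1 = extract_frequent_sequence(A, frequent=[3, 7])
M1 = len(A1)
```

Here file 3 becomes index 1 and file 7 becomes index 2, and $M_1$ is simply the length of the filtered vector.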
(3) Association degree calculation and association matrix acquisition: segment the access sequence of the frequently accessed spatial small-file data subset, calculate the degree of association between the frequently accessed spatial small-file data, and compose the association matrix from these mutual association degree values. This comprises the following sub-steps.
1. Calculate the segment length $n$ of the frequent access sequence from the number of storage servers and the size $N_1$ of the frequently accessed spatial small-file data subset.
The number of storage servers $m$ can be supplied externally, for example through a system configuration file.
The segment length of the frequent access sequence is calculated by the formula $n = N_1 / m$.
2. Segment the frequent access sequence according to the segment length.
Following the access order of the frequently accessed spatial small-file data, the access sequence vector $A_1$ is divided into subvectors of $n$ elements each, expressed as $A_1 = (S_1, S_2, \ldots, S_l)$, where each subvector $S_k = (a_{k1}, a_{k2}, \ldots, a_{kn})$, $a_{kj} \in [1, N_1]$, $1 \le k \le l$, $1 \le j \le n$, is a subvector of $A_1$ of length $n$. The set of all access vectors of length $n$ in $A_1$ is denoted $S$, i.e. the set of all subvectors of $A_1$ is $S = \{S_k : k \in [1, l]\}$.
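The segmentation can be sketched as follows. Two points are assumptions of this sketch rather than statements of the source: integer division is used when $m$ does not divide $N_1$ exactly, and a trailing group with fewer than $n$ elements is dropped. The function names and sample values are hypothetical, and $N_1 \ge m$ is assumed so that $n \ge 1$.

```python
def segment_length(n1, m):
    """n = N1 / m; integer division is an assumption for the non-divisible case."""
    return n1 // m

def split_into_segments(a1, n):
    """Split A1 into consecutive subvectors S_k of exactly n elements each."""
    # Assumption: an incomplete trailing segment is discarded.
    return [a1[k:k + n] for k in range(0, len(a1) - n + 1, n)]

# Hypothetical values: N1 = 6 frequent files, m = 3 servers.
n = segment_length(6, 3)
S = split_into_segments([1, 2, 3, 1, 4, 5, 6], n)
```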
3. Calculate the mutual association degree values of the frequently accessed spatial small-file data.
First, the mutual association degree of the spatial small-file data within each segment is calculated. For each pair $(i_1, j_1)$, define the function $R_{S_k}(i_1, j_1)$ over the set formed by all elements of $S_k$.
The meaning of this function is whether the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are associated within an access period of length $n$.
On this basis, define the function:
$$R_S(i_1, j_1) = \sum_{k=1}^{l} R_{S_k}(i_1, j_1), \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1 \tag{2}$$
$R_S(i_1, j_1)$ then represents the total degree of association between the $i_1$-th and $j_1$-th frequently accessed spatial small-file data over $S$.
4. Compose the association matrix from the mutual association degree values of the frequently accessed spatial small-file data.
Expressing the mutual association degrees of all $N_1$ frequently accessed spatial small-file data as a matrix yields the following association matrix $R_S$.
$$R_S = \left( R_S(i_1, j_1) \right)_{N_1 \times N_1}, \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1 \tag{3}$$
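A minimal sketch of formulas (2) and (3) follows. It assumes that the per-segment function $R_{S_k}(i_1, j_1)$ equals 1 when both indices occur in segment $S_k$ and 0 otherwise, and that diagonal entries ($i_1 = j_1$) are left at zero; both points are assumptions of this sketch. The name `association_matrix` and the sample segments are hypothetical.

```python
def association_matrix(segments, n1):
    """Sum the per-segment co-occurrence indicator over all segments (formula (2))."""
    R = [[0] * n1 for _ in range(n1)]
    for seg in segments:
        present = sorted(set(seg))  # distinct file indices seen in this segment
        for i in present:
            for j in present:
                if i != j:  # assumption: a file is not associated with itself
                    R[i - 1][j - 1] += 1
    return R

# Hypothetical segments over N1 = 3 frequent files.
R_S = association_matrix([[1, 2], [2, 3], [1, 2]], n1=3)
```

Files 1 and 2 share two segments, files 2 and 3 share one, and files 1 and 3 never co-occur, which is reflected in the symmetric matrix.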
(4) Association matrix conversion and reordering: invert the magnitude of each element value in the association matrix, reorder the matrix with the RCM (Reverse Cuthill-McKee) ordering algorithm, and output the result. This comprises the following sub-steps.
1. Obtain the maximum element value in the association matrix.
Traverse all element values of the association matrix and obtain the maximum value $R_{max}$.
2. Invert the magnitude of the association matrix element values.
Traverse all element values of the association matrix and perform the operation $R_S(i_1, j_1) = R_{max} - R_S(i_1, j_1)$, so that large element values become small and small ones become large.
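The two traversals above amount to the following short sketch; the function name `invert_matrix` and the sample matrix are hypothetical.

```python
def invert_matrix(R):
    """Replace every element v by R_max - v, where R_max is the largest element."""
    r_max = max(max(row) for row in R)
    return [[r_max - v for v in row] for row in R]

# Hypothetical 3 x 3 matrix R_S before conversion.
R_S = [[0, 2, 0], [2, 0, 1], [0, 1, 0]]
converted = invert_matrix(R_S)
```

After conversion, a pair of files that was strongly associated has a small value, and a pair that never co-occurred has the value $R_{max}$.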
3. Reorder the association matrix with the standard RCM ordering algorithm.
The standard RCM ordering algorithm is used to reorder the association matrix, with the goal of concentrating the nonzero elements of the matrix near the diagonal. The new matrix after reordering is denoted $P$. The standard RCM ordering algorithm is prior art; for a specific implementation see Gibbs N E, Poole W G, Stockmeyer P K. An algorithm for reducing the bandwidth and profile of a sparse matrix. SIAM Journal on Numerical Analysis, 1976, 13(2): 236-250.
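For illustration, a compact pure-Python Reverse Cuthill-McKee ordering is sketched below. It is a textbook variant (breadth-first traversal from a minimum-degree node, neighbors visited in order of increasing degree, final order reversed) and is not claimed to match the cited reference in every detail. It assumes a symmetric matrix given as a list of lists; all function names and the sample matrix are hypothetical.

```python
from collections import deque

def rcm_order(matrix):
    """Return a Reverse Cuthill-McKee permutation of the row/column indices."""
    n = len(matrix)
    adj = [[j for j in range(n) if j != i and matrix[i][j] != 0] for i in range(n)]
    degree = [len(a) for a in adj]
    visited = [False] * n
    order = []
    # Handle every connected component, starting each from a low-degree node.
    for start in sorted(range(n), key=lambda v: degree[v]):
        if visited[start]:
            continue
        visited[start] = True
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(adj[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order[::-1]

def permute(matrix, order):
    """Apply the same permutation to rows and columns."""
    return [[matrix[i][j] for j in order] for i in order]

def bandwidth(matrix):
    """Largest distance of a nonzero element from the diagonal."""
    return max((abs(i - j) for i, row in enumerate(matrix)
                for j, v in enumerate(row) if v != 0), default=0)

# Hypothetical sparse symmetric matrix whose nonzeros lie far from the diagonal.
M = [[0, 0, 0, 1],
     [0, 0, 1, 1],
     [0, 1, 0, 0],
     [1, 1, 0, 0]]
order = rcm_order(M)
P = permute(M, order)
```

In this small example the nonzero elements of `M` reach three positions away from the diagonal, whereas in `P` they all sit directly next to it.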
(5) Search for and output the optimal distributed storage combinations.
A local approximate search method is applied to the reordered association matrix obtained in (4) to find the optimal combinations, so as to obtain the highest parallel access rate for the spatial small-file data of this subset. The local approximate search method is prior art; for a specific implementation see XIA Kai, Wen-zhan. Adaptive Genetic Algorithm Based on Local Search Mechanism Quickly Solving TSP. Journal of Zhejiang Institute of Science and Technology, 2014, 31 (3).
Starting from the reordered association matrix obtained in (4), the local approximate search method is applied iteratively. Each execution of the search yields one optimal combination containing $n$ files, and $m$ combinations are obtained in the end, to be stored subsequently on $m$ storage servers respectively. Each combination consists of $n$ files, and the mutual association degree values of these $n$ files correspond to an $n \times n$ block in the matrix. The implementation is as follows:
1. Initialize the current iteration count $d = 1$;
2. Use the local approximate search method to find one optimal combination: find an $n \times n$ block in the current matrix such that the matrix element values within this block are the largest; the corresponding $n$ files form one optimal combination;
The first time sub-step 2 is executed, the current matrix is the reordered association matrix obtained in (4), of size $N_1 \times N_1$; in subsequent executions of sub-step 2, the current matrix is the $(N_1-(d-1)n) \times (N_1-(d-1)n)$ matrix obtained in the previous iteration;
3. After sub-step 2 of the current iteration has found an optimal combination of $n$ files, delete the association matrix elements of those $n$ files from the association matrix, obtaining an $(N_1-dn) \times (N_1-dn)$ matrix; reducing the size of the matrix that remains to be searched saves search time;
4. Judge whether $d = m-1$. If not, set $d = d+1$, take the $(N_1-dn) \times (N_1-dn)$ matrix obtained in sub-step 3 of the current iteration (which is $(N_1-(d-1)n) \times (N_1-(d-1)n)$ after $d = d+1$) as the current matrix, and return to sub-step 2 to carry out the next iteration and continue searching for the next optimal combination. If so, stop searching: the current matrix is then $n \times n$ and directly yields the last optimal combination of $n$ files, which together with the $m-1$ optimal combinations obtained by the loop gives $m$ optimal combinations in total.
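The iteration in sub-steps 1 to 4 can be sketched as below, assuming $N_1 = n \cdot m$. The search itself is replaced here by a plain scan over contiguous diagonal $n \times n$ blocks that picks the block with the largest element sum; this is a simplifying stand-in chosen for the sketch and is not the cited local approximate search method. The names `best_block` and `search_combinations` and the sample matrix are hypothetical.

```python
def best_block(matrix, n):
    """Start offset of the contiguous n x n diagonal block with the largest element sum."""
    best_start, best_sum = 0, None
    for s in range(len(matrix) - n + 1):
        total = sum(matrix[i][j] for i in range(s, s + n) for j in range(s, s + n))
        if best_sum is None or total > best_sum:
            best_start, best_sum = s, total
    return best_start

def search_combinations(matrix, n, m):
    """Return m groups of n file positions each (positions refer to rows of `matrix`)."""
    labels = list(range(len(matrix)))    # which file each remaining row stands for
    current = [row[:] for row in matrix]
    groups = []
    for d in range(1, m):                # iterations d = 1 .. m-1
        s = best_block(current, n)
        groups.append(labels[s:s + n])
        # Delete the rows and columns of the n chosen files (sub-step 3).
        keep = [i for i in range(len(current)) if not (s <= i < s + n)]
        current = [[current[i][j] for j in keep] for i in keep]
        labels = [labels[i] for i in keep]
    groups.append(labels)                # the remaining n x n matrix is the last group
    return groups

# Hypothetical reordered 4 x 4 matrix, n = 2 files per group, m = 2 servers.
P = [[0, 5, 1, 1],
     [5, 0, 1, 1],
     [1, 1, 0, 9],
     [1, 1, 9, 0]]
groups = search_combinations(P, n=2, m=2)
```

The first iteration picks the lower-right block because its element sum is the largest; the remaining two rows then form the last group without a further search.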
(6) Distributed storage of the spatial small-file data: store the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations finally obtained in (5), and store the non-frequently accessed spatial small-file data separately according to their spatial adjacency.
In the embodiment, the frequently accessed spatial small-file data are stored in a distributed manner according to the optimal combinations obtained, so as to obtain the highest parallel access rate for these data.
The spatial small-file data within one optimal combination obtained in step (5) have a low degree of association with one another (that is, after the element value conversion, the corresponding elements of the reordered association matrix are large). All spatial small-file data of one optimal combination can therefore be stored on one server, which keeps the concurrent access demand among them low (that is, it achieves a high parallel access rate across different servers).
In the embodiment, the non-frequently accessed spatial small-file data are stored separately according to their spatial correlation, based on the coordinate information of the spatial small-file data.
For $F_2$ from step (1), the non-frequently accessed spatial small-file data are stored on the servers following the principle that data adjacent in position are placed on different servers.
Spatial data access is characterized by the continuity of the spatial access path: adjacent spatial data have a higher probability of being accessed at the same time. Storing them on different servers therefore reduces concurrent access to any single server and improves the parallel access rate.
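One simple way to realize "adjacent data on different servers" is sketched below. It assumes that each non-frequently accessed file has already been mapped from its longitude and latitude to an integer grid position (row, column), and it uses the rule `(row + col) % m`, which places horizontally and vertically adjacent grid cells on different servers whenever $m \ge 2$. The grid mapping, the rule itself, and the name `assign_infrequent` are assumptions of this sketch and are not taken from the source.

```python
def assign_infrequent(tiles, m):
    """Map each file id to a server index so that 4-neighbours land on different servers."""
    # Neighbouring cells differ by exactly 1 in row or column, so (row + col)
    # differs by 1 and the remainders modulo m (m >= 2) cannot coincide.
    return {fid: (row + col) % m for fid, (row, col) in tiles.items()}

# Hypothetical 2 x 2 patch of non-frequently accessed files, m = 2 servers.
tiles = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}
placement = assign_infrequent(tiles, m=2)
```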
In a specific implementation, the discriminant parameter for frequently accessed spatial small-file data, the parameters of the RCM ordering algorithm for the association matrix, and the number of storage servers can be supplied externally or preset by those skilled in the art.
See Fig. 1. The present invention correspondingly also provides a distributed storage system for spatial small-file data based on access log information, comprising the following units.
Spatial small-file data set preprocessing unit (100), for dividing the spatial small-file data set of any one type of spatial small-file data into a frequently accessed subset and a non-frequently accessed subset according to differences in access frequency; see Fig. 2, it comprises the following modules. Spatial small-file data access frequency statistics module (101), for obtaining the access heat of each spatial small-file data, implemented as follows,
Let the spatial small-file data set be $F = \{f_1, f_2, \ldots, f_N\}$, comprising spatial small-file data $f_1, f_2, \ldots, f_N$, where $N$ is the total number of spatial small-file data and the $i$-th spatial small-file data is denoted $f_i$, $i = 1, 2, \ldots, N$;
Suppose the access log information records the spatial small-file data accessed in succession, giving the access log sequence $R$ of the spatial small-file data, and let $A = (a_1, a_2, \ldots, a_M)$ be the access sequence vector of the spatial small-file data, $a_t \in [1, N]$, with access sequence number $t = 1, 2, \ldots, M$, where $M$ is the total number of accesses to all spatial small-file data in $F$;
Count the number of times $\lambda_i$ that each spatial small-file data $f_i$ occurs in the access log sequence $R$, and take $\lambda_i$ as the access heat of $f_i$;
Frequently accessed spatial small-file data set extraction module (102), for extracting the frequently accessed spatial small-file data according to their access heat, implemented as follows,
Input the preset discriminant parameter $\lambda$,
If the access heat of spatial small-file data $f_i$ in the data set $F$ satisfies $\lambda_i > \lambda$, then $f_i$ is frequently accessed spatial small-file data; otherwise $f_i$ is non-frequently accessed spatial small-file data;
Frequently accessed spatial small-file subset construction module (103), for forming a subset of the spatial small-file data set from the frequently accessed spatial small-file data obtained by the extraction module (102), implemented as follows,
Let the subset formed by all frequently accessed spatial small-file data be $F_1$, where $N_1$ is the total number of frequently accessed spatial small-file data, and the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are indexed by $i_1, j_1 \in [1, N_1]$;
Spatial small-file data access vector acquisition unit (200), for extracting from the access log information the access sequence of the frequently accessed spatial small-file data subset, including forming the access sequence in chronological order, with $A_1$ the access sequence vector of the frequently accessed spatial small-file data and access sequence numbers $t_1 = (1_1, 2_1, \ldots, M_1)$, where $M_1$ is the total number of accesses to all frequently accessed spatial small-file data in $F_1$;
Spatial small-file data access association matrix computing unit (300), for segmenting the access sequence of the frequently accessed spatial small-file data subset, calculating the degree of association between the frequently accessed spatial small-file data, and composing the association matrix from these mutual association degree values; see Fig. 3, it comprises the following modules. Frequent access sequence segment length computing module (301), for calculating the segment length $n = N_1 / m$ of the frequent access sequence from the number of storage servers $m$ and the size $N_1$ of the frequently accessed spatial small-file data subset;
Frequent access sequence segmentation module (302), for segmenting the frequent access sequence according to the segment length, implemented as follows,
In access order, the access sequence vector $A_1$ of the frequently accessed spatial small-file data is divided into subvectors of $n$ elements each, expressed as $A_1 = (S_1, S_2, \ldots, S_l)$, where subvector $S_k = (a_{k1}, a_{k2}, \ldots, a_{kn})$, $a_{kj} \in [1, N_1]$, $1 \le k \le l$, $1 \le j \le n$; the set of all subvectors of $A_1$ is denoted $S$, $S = \{S_k : k \in [1, l]\}$;
Spatial small-file data association degree computing module (303), for calculating the mutual association degree values of the frequently accessed spatial small-file data, implemented as follows,
Define the function $R_{S_k}(i_1, j_1)$
over the set formed by all elements of $S_k$; the function indicates whether the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are associated within an access period of length $n$;
Define the function $R_S(i_1, j_1)$,
$$R_S(i_1, j_1) = \sum_{k=1}^{l} R_{S_k}(i_1, j_1), \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1$$
where $R_S(i_1, j_1)$ represents the total degree of association between the $i_1$-th and $j_1$-th frequently accessed spatial small-file data over $S$;
Spatial small-file data association matrix generation module (304), for composing the association matrix $R_S$ from the mutual association degree values of the frequently accessed spatial small-file data,
$$R_S = \left( R_S(i_1, j_1) \right)_{N_1 \times N_1}, \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1$$
Association matrix conversion and reordering unit (400), for inverting the magnitude of each element value in the association matrix, reordering the matrix with the RCM ordering algorithm, and outputting the result;
Association matrix optimal combination search unit (500), for applying the local approximate search method to the reordered association matrix to find the optimal combinations;
Spatial small-file data distributed storage unit (600), for storing the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations obtained by the search unit (500), and storing the non-frequently accessed spatial small-file data separately according to their spatial adjacency.
See Fig. 4. The association matrix conversion and reordering unit (400) further comprises the following modules,
Association matrix maximum element acquisition module (401), for obtaining the maximum element value in the association matrix, including traversing all element values of the association matrix and obtaining the maximum value $R_{max}$;
Association matrix element value conversion module (402), for inverting the magnitude of the association matrix element values, including traversing all element values of the association matrix and performing the operation $R_S(i_1, j_1) = R_{max} - R_S(i_1, j_1)$;
Association matrix reordering module (403), for reordering the association matrix with the standard RCM ordering algorithm.
The association matrix optimal combination search unit (500) comprises the following modules,
Initialization module, for initializing the current iteration count $d = 1$;
Optimal combination search module, for finding one optimal combination with the local approximate search method, including finding an $n \times n$ block in the current matrix such that the matrix element values within this block are the largest, the corresponding $n$ files forming one optimal combination; when the optimal combination search module works for the first time, the current matrix is the reordered association matrix obtained by the conversion and reordering unit (400); in subsequent work of the optimal combination search module, the current matrix is the matrix obtained in the previous iteration;
Matrix update module, for deleting from the association matrix the elements of the $n$ corresponding files after the optimal combination search module has found an optimal combination of $n$ files in the current iteration, obtaining an $(N_1-dn) \times (N_1-dn)$ matrix;
Judgment and output module, for judging whether $d = m-1$; if not, setting $d = d+1$, taking the $(N_1-dn) \times (N_1-dn)$ matrix obtained by the matrix update module in the current iteration as the current matrix, and instructing the optimal combination search module to carry out the next iteration and continue searching for the next optimal combination; if so, stopping the search, $m$ optimal combinations having been obtained in total.
The specific implementation of each module corresponds to the respective method steps and is not repeated here.
The specific embodiments described herein merely illustrate the spirit of the present invention by way of example. Those skilled in the art to which the present invention belongs may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

Claims (6)

1. A distributed storage method for spatial small-file data based on access log information, characterized in that, for any one type of spatial small-file data, the following steps are performed:
Step 1, dividing the spatial small-file data set into a frequently accessed subset and a non-frequently accessed subset according to differences in access frequency; comprising the following sub-steps,
Step 1.1, obtaining the access heat of each spatial small-file data, implemented as follows,
Let the spatial small-file data set be $F = \{f_1, f_2, \ldots, f_N\}$, comprising spatial small-file data $f_1, f_2, \ldots, f_N$, where $N$ is the total number of spatial small-file data and the $i$-th spatial small-file data is denoted $f_i$, $i = 1, 2, \ldots, N$;
Suppose the access log information records the spatial small-file data accessed in succession, giving the access log sequence $R$ of the spatial small-file data, and let $A = (a_1, a_2, \ldots, a_M)$ be the access sequence vector of the spatial small-file data, $a_t \in [1, N]$, with access sequence number $t = 1, 2, \ldots, M$, where $M$ is the total number of accesses to all spatial small-file data in $F$;
Count the number of times $\lambda_i$ that each spatial small-file data $f_i$ occurs in the access log sequence $R$, and take $\lambda_i$ as the access heat of $f_i$;
Step 1.2, extracting the frequently accessed spatial small-file data according to their access heat, implemented as follows,
Input the preset discriminant parameter $\lambda$,
If the access heat of spatial small-file data $f_i$ in the data set $F$ satisfies $\lambda_i > \lambda$, then $f_i$ is frequently accessed spatial small-file data; otherwise $f_i$ is non-frequently accessed spatial small-file data;
Step 1.3, forming a subset of the spatial small-file data set from the frequently accessed spatial small-file data obtained in step 1.2, implemented as follows,
Let the subset formed by all frequently accessed spatial small-file data be $F_1$, where $N_1$ is the total number of frequently accessed spatial small-file data, and the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are indexed by $i_1, j_1 \in [1, N_1]$;
Step 2, extracting from the access log information the access sequence of the frequently accessed spatial small-file data subset, including forming the access sequence in chronological order, with $A_1$ the access sequence vector of the frequently accessed spatial small-file data and access sequence numbers $t_1 = (1_1, 2_1, \ldots, M_1)$, where $M_1$ is the total number of accesses to all frequently accessed spatial small-file data in $F_1$;
Step 3, segmenting the access sequence of the frequently accessed spatial small-file data subset, calculating the degree of association between the frequently accessed spatial small-file data, and composing the association matrix from these mutual association degree values; comprising the following sub-steps,
Step 3.1, calculating the segment length $n = N_1 / m$ of the frequent access sequence from the number of storage servers $m$ and the size $N_1$ of the frequently accessed spatial small-file data subset;
Step 3.2, segmenting the frequent access sequence according to the segment length, implemented as follows,
In access order, the access sequence vector $A_1$ of the frequently accessed spatial small-file data is divided into subvectors of $n$ elements each, expressed as $A_1 = (S_1, S_2, \ldots, S_l)$, where subvector $S_k = (a_{k1}, a_{k2}, \ldots, a_{kn})$, $a_{kj} \in [1, N_1]$, $1 \le k \le l$, $1 \le j \le n$; the set of all subvectors of $A_1$ is denoted $S$, $S = \{S_k : k \in [1, l]\}$;
Step 3.3, calculating the mutual association degree values of the frequently accessed spatial small-file data, implemented as follows,
Define the function $R_{S_k}(i_1, j_1)$
over the set formed by all elements of $S_k$; the function indicates whether the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are associated within an access period of length $n$;
Define the function $R_S(i_1, j_1)$,
$$R_S(i_1, j_1) = \sum_{k=1}^{l} R_{S_k}(i_1, j_1), \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1$$
where $R_S(i_1, j_1)$ represents the total degree of association between the $i_1$-th and $j_1$-th frequently accessed spatial small-file data over $S$;
Step 3.4, composing the association matrix $R_S$ from the mutual association degree values of the frequently accessed spatial small-file data,
$$R_S = \left( R_S(i_1, j_1) \right)_{N_1 \times N_1}, \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1$$
Step 4, inverting the magnitude of each element value in the association matrix, reordering the matrix with the RCM ordering algorithm, and outputting the result;
Step 5, applying the local approximate search method to the reordered association matrix to find the optimal combinations;
Step 6, storing the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations obtained in step 5, and storing the non-frequently accessed spatial small-file data separately according to their spatial adjacency.
2. The distributed storage method for spatial small-file data based on access log information according to claim 1, characterized in that step 4 comprises the following sub-steps,
Step 4.1, obtaining the maximum element value in the association matrix, including traversing all element values of the association matrix and obtaining the maximum value $R_{max}$;
Step 4.2, inverting the magnitude of the association matrix element values, including traversing all element values of the association matrix and performing the operation $R_S(i_1, j_1) = R_{max} - R_S(i_1, j_1)$;
Step 4.3, reordering the association matrix with the standard RCM ordering algorithm.
3. The distributed storage method for spatial small-file data based on access log information according to claim 1 or 2, characterized in that step 5 comprises the following sub-steps,
Step 5.1, initializing the current iteration count $d = 1$;
Step 5.2, finding one optimal combination with the local approximate search method, including finding an $n \times n$ block in the current matrix such that the matrix element values within this block are the largest, the corresponding $n$ files forming one optimal combination; the first time step 5.2 is executed, the current matrix is the reordered association matrix obtained in step 4; in subsequent executions of step 5.2, the current matrix is the matrix obtained in the previous iteration;
Step 5.3, after step 5.2 of the current iteration has found an optimal combination of $n$ files, deleting from the association matrix the elements of the $n$ corresponding files, obtaining an $(N_1-dn) \times (N_1-dn)$ matrix;
Step 5.4, judging whether $d = m-1$; if not, setting $d = d+1$, taking the $(N_1-dn) \times (N_1-dn)$ matrix obtained in step 5.3 of the current iteration as the current matrix, and returning to step 5.2 to carry out the next iteration and continue searching for the next optimal combination; if so, stopping the search, $m$ optimal combinations having been obtained in total.
4. A distributed storage system for spatial small-file data based on access log information, characterized by comprising the following units,
Spatial small-file data set preprocessing unit (100), for dividing the spatial small-file data set of any one type of spatial small-file data into a frequently accessed subset and a non-frequently accessed subset according to differences in access frequency; comprising the following modules. Spatial small-file data access frequency statistics module (101), for obtaining the access heat of each spatial small-file data, implemented as follows,
Let the spatial small-file data set be $F = \{f_1, f_2, \ldots, f_N\}$, comprising spatial small-file data $f_1, f_2, \ldots, f_N$, where $N$ is the total number of spatial small-file data and the $i$-th spatial small-file data is denoted $f_i$, $i = 1, 2, \ldots, N$;
Suppose the access log information records the spatial small-file data accessed in succession, giving the access log sequence $R$ of the spatial small-file data, and let $A = (a_1, a_2, \ldots, a_M)$ be the access sequence vector of the spatial small-file data, $a_t \in [1, N]$, with access sequence number $t = 1, 2, \ldots, M$, where $M$ is the total number of accesses to all spatial small-file data in $F$;
Count the number of times $\lambda_i$ that each spatial small-file data $f_i$ occurs in the access log sequence $R$, and take $\lambda_i$ as the access heat of $f_i$;
Frequently accessed spatial small-file data set extraction module (102), for extracting the frequently accessed spatial small-file data according to their access heat, implemented as follows,
Input the preset discriminant parameter $\lambda$,
If the access heat of spatial small-file data $f_i$ in the data set $F$ satisfies $\lambda_i > \lambda$, then $f_i$ is frequently accessed spatial small-file data; otherwise $f_i$ is non-frequently accessed spatial small-file data;
Frequently accessed spatial small-file subset construction module (103), for forming a subset of the spatial small-file data set from the frequently accessed spatial small-file data obtained by the extraction module (102), implemented as follows,
Let the subset formed by all frequently accessed spatial small-file data be $F_1$, where $N_1$ is the total number of frequently accessed spatial small-file data, and the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are indexed by $i_1, j_1 \in [1, N_1]$;
Spatial small-file data access vector acquisition unit (200), for extracting from the access log information the access sequence of the frequently accessed spatial small-file data subset, including forming the access sequence in chronological order, with $A_1$ the access sequence vector of the frequently accessed spatial small-file data and access sequence numbers $t_1 = (1_1, 2_1, \ldots, M_1)$, where $M_1$ is the total number of accesses to all frequently accessed spatial small-file data in $F_1$;
Spatial small-file data access association matrix computing unit (300), for segmenting the access sequence of the frequently accessed spatial small-file data subset, calculating the degree of association between the frequently accessed spatial small-file data, and composing the association matrix from these mutual association degree values; comprising the following modules,
Frequent access sequence segment length computing module (301), for calculating the segment length $n = N_1 / m$ of the frequent access sequence from the number of storage servers $m$ and the size $N_1$ of the frequently accessed spatial small-file data subset;
Frequent access sequence segmentation module (302), for segmenting the frequent access sequence according to the segment length, implemented as follows,
In access order, the access sequence vector $A_1$ of the frequently accessed spatial small-file data is divided into subvectors of $n$ elements each, expressed as $A_1 = (S_1, S_2, \ldots, S_l)$, where subvector $S_k = (a_{k1}, a_{k2}, \ldots, a_{kn})$, $a_{kj} \in [1, N_1]$, $1 \le k \le l$, $1 \le j \le n$; the set of all subvectors of $A_1$ is denoted $S$, $S = \{S_k : k \in [1, l]\}$;
Spatial small-file data association degree computing module (303), for calculating the mutual association degree values of the frequently accessed spatial small-file data, implemented as follows,
Define the function $R_{S_k}(i_1, j_1)$
over the set formed by all elements of $S_k$; the function indicates whether the $i_1$-th and $j_1$-th frequently accessed spatial small-file data are associated within an access period of length $n$;
Define the function $R_S(i_1, j_1)$,
$$R_S(i_1, j_1) = \sum_{k=1}^{l} R_{S_k}(i_1, j_1), \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1$$
where $R_S(i_1, j_1)$ represents the total degree of association between the $i_1$-th and $j_1$-th frequently accessed spatial small-file data over $S$;
Spatial small-file data association matrix generation module (304), for composing the association matrix $R_S$ from the mutual association degree values of the frequently accessed spatial small-file data,
$$R_S = \left( R_S(i_1, j_1) \right)_{N_1 \times N_1}, \quad 1 \le i_1 \le N_1,\ 1 \le j_1 \le N_1$$
Association matrix conversion and reordering unit (400), for inverting the magnitude of each element value in the association matrix, reordering the matrix with the RCM ordering algorithm, and outputting the result;
Association matrix optimal combination search unit (500), for applying the local approximate search method to the reordered association matrix to find the optimal combinations;
Spatial small-file data distributed storage unit (600), for storing the frequently accessed spatial small-file data in a distributed manner according to the optimal combinations obtained by the search unit (500), and storing the non-frequently accessed spatial small-file data separately according to their spatial adjacency.
5. The distributed storage system for spatial small-file data based on access log information according to claim 4, characterized in that the association matrix conversion and reordering unit (400) comprises the following modules,
Association matrix maximum element acquisition module (401), for obtaining the maximum element value in the association matrix, including traversing all element values of the association matrix and obtaining the maximum value $R_{max}$;
Association matrix element value conversion module (402), for inverting the magnitude of the association matrix element values, including traversing all element values of the association matrix and performing the operation $R_S(i_1, j_1) = R_{max} - R_S(i_1, j_1)$;
Association matrix reordering module (403), for reordering the association matrix with the standard RCM ordering algorithm.
6. The distributed storage system for spatial small-file data based on access log information according to claim 4 or 5, characterized in that the association matrix optimal combination search unit (500) comprises the following modules,
Initialization module, for initializing the current iteration count $d = 1$;
Optimal combination search module, for finding one optimal combination with the local approximate search method, including finding an $n \times n$ block in the current matrix such that the matrix element values within this block are the largest, the corresponding $n$ files forming one optimal combination; when the optimal combination search module works for the first time, the current matrix is the reordered association matrix obtained by the conversion and reordering unit (400); in subsequent work of the optimal combination search module, the current matrix is the matrix obtained in the previous iteration;
Matrix update module, for deleting from the association matrix the elements of the $n$ corresponding files after the optimal combination search module has found an optimal combination of $n$ files in the current iteration, obtaining an $(N_1-dn) \times (N_1-dn)$ matrix;
Judgment and output module, for judging whether $d = m-1$; if not, setting $d = d+1$, taking the $(N_1-dn) \times (N_1-dn)$ matrix obtained by the matrix update module in the current iteration as the current matrix, and instructing the optimal combination search module to carry out the next iteration and continue searching for the next optimal combination; if so, stopping the search, $m$ optimal combinations having been obtained in total.
CN201510042456.9A 2015-01-28 2015-01-28 Space small documents distributed data storage method and system based on access log information Expired - Fee Related CN104573082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510042456.9A CN104573082B (en) 2015-01-28 2015-01-28 Space small documents distributed data storage method and system based on access log information

Publications (2)

Publication Number Publication Date
CN104573082A (en) 2015-04-29
CN104573082B (en) 2017-11-14

Family

ID=53089144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510042456.9A Expired - Fee Related CN104573082B (en) 2015-01-28 2015-01-28 Space small documents distributed data storage method and system based on access log information

Country Status (1)

Country Link
CN (1) CN104573082B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101453688A (en) * 2007-12-04 2009-06-10 中兴通讯股份有限公司 Method for fast responding scene switching in mobile stream media service
US20130110915A1 (en) * 2008-07-24 2013-05-02 Alibaba Group Holding Limited Correlated information recommendation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
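YANG Jing et al.: "Automatic data collection algorithm based on a vector relation table" (基于向量关系表的自动数据收集算法), Computer Engineering and Applications (《计算机工程与应用》) *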

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885463A (en) * 2017-11-10 2018-04-06 下代互联网重大应用技术(北京)工程研究中心有限公司 The processing method and processing device of file destination
CN107885463B (en) * 2017-11-10 2021-08-31 下一代互联网重大应用技术(北京)工程研究中心有限公司 Target file processing method and device
CN109491594A (en) * 2018-09-28 2019-03-19 北京寄云鼎城科技有限公司 Optimize the method and apparatus of data space during matrix inversion
CN109491594B (en) * 2018-09-28 2021-12-03 北京寄云鼎城科技有限公司 Method and device for optimizing data storage space in matrix inversion process
CN109542857A (en) * 2018-11-26 2019-03-29 杭州迪普科技股份有限公司 Audit log storage method, querying method, device and relevant device
CN109542857B (en) * 2018-11-26 2021-06-29 杭州迪普科技股份有限公司 Audit log storage method, audit log query method, audit log storage device, audit log query device and related equipment
CN111104381A (en) * 2019-11-30 2020-05-05 北京浪潮数据技术有限公司 Log management method, device and equipment and computer readable storage medium
CN111966950A (en) * 2020-10-21 2020-11-20 北京每日优鲜电子商务有限公司 Log sending method and device, electronic equipment and computer readable medium
CN111966950B (en) * 2020-10-21 2021-01-15 北京每日优鲜电子商务有限公司 Log sending method and device, electronic equipment and computer readable medium

Also Published As

Publication number Publication date
CN104573082B (en) 2017-11-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20171114; termination date: 20190128)