CN106649890A - Data storage method and device - Google Patents
Data storage method and device
- Publication number
- CN106649890A, CN201710066733.9A
- Authority
- CN
- China
- Prior art keywords
- data
- vector
- input
- classification model
- type
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/12—Accounting
- G06Q40/125—Finance or payroll
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Accounting & Taxation (AREA)
- Finance (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Development Economics (AREA)
- Technology Law (AREA)
- Strategic Management (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data storage method and device. One embodiment of the data storage method includes: acquiring characteristic information of data to be stored, wherein the characteristic information includes at least one of the following: the name of a data table item in the data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords; converting the characteristic information into an input vector of a data classification model, inputting it into the data classification model, and acquiring an output vector indicating the type of the data, wherein the data classification model is generated by supervised training on training samples, and each training sample includes the characteristic information of stored data and the labeled type of that stored data; and storing the data in a storage area corresponding to the type. The data storage method can save storage space and store data quickly.
Description
Technical field
The present application relates to the field of computer technology, specifically to the field of Internet technology, and more particularly to a data storage method and device.
Background
Data storage covers the collection, storage, retrieval, processing, conversion, and transmission of data. In existing data storage, and especially in data storage for the finance and tax fields, data characteristics and the data types corresponding to them are usually defined manually in advance according to business needs, and the data is then stored accordingly to facilitate subsequent financial accounting.
However, existing data storage systems for the finance and tax fields have two shortcomings. First, they lack the ability to analyze and process unstructured data. Second, because financial accounting systems differ considerably from one another, the data characteristics and matching rules to be stored must be redefined for each accounting system, which makes data storage more tedious and costly, occupies a large amount of storage space, and reduces the utilization efficiency of the data.
Summary of the invention
The purpose of the present application is to propose an improved data storage method and device that solve the technical problems mentioned in the Background section above.
In a first aspect, the present application provides a data storage method, the method including: obtaining characteristic information of data to be stored, the characteristic information including at least one of the following: the name of a data table item in the data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords; converting the characteristic information into an input vector of a data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data, the data classification model being generated by supervised training in advance on training samples, each training sample including the characteristic information of stored data and the labeled type of that stored data; and storing the data in the storage region corresponding to the type.
In some embodiments, the data classification model is a decision tree model.
In some optional implementations of this embodiment, the data is data in a data table, and the characteristic information includes the name of a data table item in the data table to which the data belongs and statistical characteristic information; and converting the characteristic information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data includes: generating a data table feature vector corresponding to the characteristic information, the data table feature vector including a component representing the name of the data table item in the data table to which the data belongs and a component representing the statistical characteristic information; generating a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector; and inputting the first input vector into the data classification model to obtain the output vector indicating the type of the data.
In some embodiments, the statistical characteristic information includes: association information indicating association relationships between the data table items, the mean length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data.
In some optional implementations of this embodiment, the data is text data and the characteristic information is keywords; and converting the characteristic information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data includes: generating a keyword feature vector corresponding to the characteristic information, in which each keyword corresponds to one component; and generating a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector.
In some embodiments, the second input vector is then input into the data classification model to obtain the output vector indicating the type of the data.
In a second aspect, the present application provides a data storage device, the device including: an acquiring unit configured to obtain characteristic information of data to be stored, the characteristic information including at least one of the following: the name of a data table item in the data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords; an input unit configured to convert the characteristic information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, the data classification model being generated by supervised training in advance on training samples, each training sample including the characteristic information of stored data and the labeled type of that stored data; and a storage unit configured to store the data in the storage region corresponding to the type.
In some embodiments, the data classification model is a decision tree model.
In some embodiments, the data is data in a data table, the characteristic information includes the name of a data table item in the data table to which the data belongs and statistical characteristic information, and the input unit includes: a data table feature vector generation subunit configured to generate a data table feature vector corresponding to the characteristic information, the data table feature vector including a component representing the name of the data table item in the data table to which the data belongs and a component representing the statistical characteristic information; a first input vector generation subunit configured to generate a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector; and an output vector generation subunit configured to input the first input vector into the data classification model to obtain the output vector indicating the type of the data.
In some embodiments, the statistical characteristic information includes: association information indicating association relationships between the data table items, the mean length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data.
In some embodiments, the data is text data, the characteristic information is keywords, and the input unit includes: a keyword feature vector generation subunit configured to generate a keyword feature vector corresponding to the characteristic information, in which each keyword corresponds to one component; a second input vector generation subunit configured to generate a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector; and an output vector generation subunit configured to input the second input vector into the data classification model to obtain the output vector indicating the type of the data.
The data storage method and device provided by the present application obtain the characteristic information of the data to be stored, convert the characteristic information into an input vector, input it into a trained data classification model, and store the data in the storage region corresponding to the data type indicated by the vector output from the data classification model. The data is thereby classified effectively by data type, which saves storage space in the data storage area.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a diagram of an exemplary system architecture to which the present application can be applied;
Fig. 2 is a flowchart of one embodiment of the data storage method according to the present application;
Fig. 3 is a flowchart of another embodiment of the data storage method according to the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the data storage device according to the present application;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing a server of the embodiments of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments can be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the data storage method or data storage device of the present application can be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, data accounting applications, financial statement applications, search applications, instant messaging tools, mailbox clients, and social platform software.
The terminal devices 101, 102, 103 can be various electronic devices with display screens, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
The server 105 may be a server that provides various services, for example a back-end data processing server that provides data support for the applications running on the terminal devices 101, 102, 103, or a server that collects data from various data sources. The back-end data processing server can analyze and process the data obtained from the data sources, store the processing results, and feed them back to the terminal devices.
It should be noted that the data storage method provided by the embodiments of the present application is generally performed by the server 105; accordingly, the data storage device is generally located in the server 105.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are only illustrative. There can be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to Fig. 2, a flow 200 of one embodiment of the data storage method according to the present application is shown. The data storage method includes the following steps:
Step 201, obtain the characteristic information of the data to be stored.
In this embodiment, the electronic device on which the data storage method runs (for example, the server shown in Fig. 1) can obtain the data source information of the data to be stored through a wired or wireless connection, and obtain the data to be stored according to that data source information. Here, a data source is the original medium that provides the required data, or a database supported by a storage device. Data source information is the information needed to establish a database connection. When the data to be stored is obtained according to the data source information, it can be obtained from a network, a database, or an application related to the financial system.
When the data to be stored is obtained from a database, the electronic device can provide the correct data source name (DSN) to the server that supports the database, find the corresponding database connection, and then obtain the data to be stored from the corresponding data source.
When the data to be stored is obtained from an enterprise's financial system, the data source information can include internal and external financial information. The internal information can include various business processing data and various document data, and the external information can include various laws and regulations, market information, and so on.
In this embodiment, after the server obtains the data to be stored from the data source, it can further obtain the characteristic information of that data, which includes at least one of the following: the name of a data table item in the data table to which the data belongs, statistical characteristic information indicating statistical characteristics of the data, and keywords. Here, the data table can be set up in the database to hold the data to be stored. Each data table can be given a name, for example department name, funds, or employee. The statistical characteristics can be the quantity of the data, the length of the data, and so on. When the data to be stored is text data, the characteristic information can be keywords that indicate the content of the text. For example, when the text data is "research funding of department A", the keywords can be "department A" and "research funding".
In some optional implementations of this embodiment, the statistical characteristic information includes association information indicating association relationships between the data table items, the mean length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data.
As an example, the server first obtains data to be stored from multiple data sources. The server can then obtain the name of the data table item in the database table to which each piece of data belongs; for example, the data table item for one piece of data to be stored is named "department wage", and the data table item for another is named "performance pay". The server can also obtain the statistical characteristic information of the data to be stored; for example, it can obtain the mean data length of the "department wage" data, and the minimum and maximum data length of the "performance pay" data.
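The statistical characteristics named above (quantity, mean, minimum, and maximum length, and character types) can be computed directly from the values of one data table item. The following is a minimal Python sketch under stated assumptions: the function names, the four coarse character classes, and the sample values are illustrative and do not come from the application.

```python
from statistics import mean


def char_types(value: str) -> set:
    """Return the coarse character classes that occur in one value (assumed classes)."""
    kinds = set()
    for ch in value:
        if ch.isdigit():
            kinds.add("digit")
        elif ch.isalpha():
            kinds.add("letter")
        elif ch.isspace():
            kinds.add("space")
        else:
            kinds.add("symbol")
    return kinds


def statistical_features(values: list) -> dict:
    """Compute length statistics and character types for a non-empty list of strings."""
    if not values:
        raise ValueError("values must not be empty")
    lengths = [len(v) for v in values]
    kinds = set()
    for v in values:
        kinds |= char_types(v)
    return {
        "count": len(values),
        "mean_length": mean(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "char_types": sorted(kinds),
    }


# Hypothetical values of a "department wage" item.
wages = ["5200.00", "13800.50", "980"]
features = statistical_features(wages)
print(features)
```

Association information between data table items is not derivable from the values alone, so it is left out of this sketch.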
Step 202, convert the characteristic information into an input vector of the data classification model, input it into the data classification model, and obtain an output vector indicating the type of the data.
In this embodiment, based on the characteristic information obtained in step 201, the server can build a multidimensional vector that represents multiple features of the data to be stored, and use it as the input vector of the data classification model. The input vector includes a component representing the name of the data table item, statistical feature components representing the statistical characteristics of the data, and feature components representing the keywords. The input vector is then input into the data classification model to obtain an output vector indicating the type of the data to be stored. The output vector can include a component for each preset data type and a component for the matching degree between the data to be stored and each data type. The matching degree expresses the strength of the correspondence between the data to be stored and a data type. In general, the higher the matching degree, the greater the probability that the data to be stored belongs to that data type.
The data types can include a string data type for representing the names of things, such as department names and document titles; numeric data types such as integer, floating point, positive number, and negative number; data types for representing dates and times; data types for representing currency; and so on.
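One simple way to read such an output vector is to pick the data type with the highest matching degree. The sketch below assumes the output vector holds exactly one matching degree per preset type, in a fixed order; the type list, the function name, and the sample numbers are hypothetical.

```python
# Hypothetical preset data types, in the order of the output vector components.
DATA_TYPES = ["string", "number", "datetime", "currency"]


def most_likely_type(output_vector, type_names=DATA_TYPES):
    """Return (type name, matching degree) for the largest component."""
    if len(output_vector) != len(type_names):
        raise ValueError("output vector and type list must have the same length")
    best = max(range(len(output_vector)), key=lambda i: output_vector[i])
    return type_names[best], output_vector[best]


# Illustrative matching degrees, not results from the application.
predicted = most_likely_type([0.05, 0.10, 0.05, 0.80])
print(predicted)
```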
The data classification model can be used to describe the correspondence between the data to be stored (for example, data in a data table) and data types (for example, a numeric data type). The data classification model is trained by a machine learning method, in a supervised learning manner, using as training samples the characteristic information of stored data, the labeled type of the stored data that matches that characteristic information, and the matching degree between the characteristic information and the type of the stored data.
The supervised learning can be carried out as follows:
First, with the stored data as training samples, the server obtains the characteristic information of the stored data. For example, when the stored data is data in a database, which contains multiple data tables, the server can obtain the names of the data table items of the stored data, the types of characters in the stored data, and so on; when the stored data is text data, the server can obtain the keywords of the stored data as characteristic information.
Then, a data type label is set for the stored data; for example, the label can be a numeric data type, a data type representing dates, a data type representing text, and so on.
Next, based on the data type labels and the characteristic information of the stored data, the matching degree between the data type of the stored data and its characteristic information is established. Because each stored-data sample has at least one piece of characteristic information and corresponds to one data type label, the server can compute the matching degree between the type of the stored data and its characteristic information using a preset algorithm.
Finally, using a machine learning method, the data classification model is trained on the characteristic information of the stored data, the labeled type of the stored data that matches that characteristic information, and the matching degree between the two.
The machine learning method can include methods such as neural networks and genetic algorithms.
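The supervised procedure above can be illustrated with a deliberately small stand-in. The sketch below uses a nearest-centroid classifier, which is not one of the methods named in the application (decision tree, neural network, genetic algorithm); it is chosen only because it fits in a few lines of standard-library Python. The two features, the labels, the sample values, and the way a matching degree is derived from distance are all assumptions made for illustration.

```python
from collections import defaultdict
from math import dist


def train(samples):
    """samples: list of (feature_vector, type_label). Returns {label: centroid}."""
    grouped = defaultdict(list)
    for vector, label in samples:
        grouped[label].append(vector)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(model, vector):
    """Return {label: matching degree in (0, 1]}; closer centroids score higher."""
    return {
        label: 1.0 / (1.0 + dist(vector, centroid))
        for label, centroid in model.items()
    }


# Hypothetical labeled samples: [mean length, fraction of digit characters].
samples = [
    ([7.0, 1.0], "number"),
    ([6.0, 0.9], "number"),
    ([12.0, 0.0], "string"),
    ([10.0, 0.1], "string"),
]
model = train(samples)
scores = classify(model, [6.0, 1.0])
print(max(scores, key=scores.get))
```

In practice a library implementation of one of the named model families would replace `train` and `classify`; the shape of the data (feature vector in, one score per type out) stays the same.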
This step is illustrated with "department name" as the data to be stored. The department name is labeled differently in different application scenarios: one system may call it "department", another system may use a different term with the same meaning, and yet another may name it "depart", but all of them belong to the category "department name". Therefore, in a given system, when the data to be stored is any of the above, the characteristic information related to these names that was obtained in step 201 can be converted into an input vector of the data classification model and input into the model for matching, which yields an output vector indicating the type of the data to be stored. From this output vector, the server can determine that the type of the data to be stored is "department name".
Step 203, store the data in the storage region corresponding to the data type indicated by the output vector.
In this embodiment, the type to which the data belongs can be determined from the output vector of the data classification model obtained in step 202, and the data can then be stored in the storage region corresponding to that type. To make unified, effective management of data easier, storage regions on a server or client are usually set up by data type. After the server determines the type of the data to be stored from the output vector, it can first check whether a storage region for that data type exists among the preset storage regions. If so, the data to be stored can be stored directly in the storage region for that type; if not, the server can create a new storage region and store the data there.
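The lookup-then-create behavior of step 203 can be sketched as follows. This is an in-memory illustration only: the class name `TypedStore`, the use of a dictionary of lists as "storage regions", and the sample values are assumptions, and a real system would map each type to a table, file, or partition instead.

```python
class TypedStore:
    """Keeps one storage region (here, a list) per data type."""

    def __init__(self, preset_types=()):
        # Preset storage regions, one per known data type.
        self.regions = {data_type: [] for data_type in preset_types}

    def store(self, data, data_type):
        """Store data under data_type; return True if a new region was created."""
        created = data_type not in self.regions
        if created:
            self.regions[data_type] = []
        self.regions[data_type].append(data)
        return created


store = TypedStore(preset_types=["string"])
created_first = store.store("Sales department", "string")   # region already exists
created_second = store.store("2017-02-07", "datetime")      # region is created
print(store.regions)
```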
The data storage method provided by this embodiment of the present application obtains the characteristic information of the data to be stored, converts the characteristic information into an input vector of the pre-trained data classification model, inputs it into the model to obtain an output vector indicating the type of the data, and finally stores the data in the storage region corresponding to the data type indicated by the data classification model. The data to be stored is thereby classified effectively, which improves storage efficiency and saves storage space.
With further reference to Fig. 3, a flow 300 of another embodiment of the data storage method is shown. The flow 300 of the data storage method includes the following steps:
Step 301, obtain the characteristic information of the data to be stored.
Existing data can be divided into many different types. Depending on whether the data can be logically expressed with a two-dimensional table structure, data can be divided into structured data and unstructured data. Structured data, also called row data, can be represented with a unified structure, for example numbers, symbols, and traditional data models. Unstructured data is data whose field lengths are variable and in which each field's record can consist of repeatable or non-repeatable subfields; it includes video, audio, documents, text, pictures, forms of all kinds, images, office documents, and so on. A financial system contains a large amount of data in data tables, that is, structured data, whose characteristic information can be represented by data length values, the types of the character strings in the data, and so on; it also contains a large amount of text data, whose characteristic information can be represented by keywords.
In this embodiment, the electronic device on which the data storage method runs (for example, the server shown in Fig. 1) can obtain the characteristic information of the data to be stored through a wired or wireless connection. When the data to be stored is data in a data table, its characteristic information includes at least one of the following: the name of a data table item in the data table to which the data belongs, and statistical characteristic information indicating statistical characteristics of the data. The statistical characteristic information further includes association information indicating association relationships between data table items, the mean length of the data, the maximum length of the data, the minimum length of the data, and the types of characters in the data. When the data to be stored is text data, its characteristic information includes keywords.
In this embodiment, when the data to be stored is text data, a natural language processing method or a recurrent neural network model can be used to perform word segmentation on the text data and thereby determine the keywords in the text data.
Step 302, generate the data table feature vector corresponding to the characteristic information.
In this embodiment, based on the characteristic information obtained in step 301 for data to be stored that is in a data table, the server can generate a data table feature vector, which includes a component representing the name of the data table item in the data table to which the data belongs and a component representing the statistical characteristic information. As an example, in one system the data to be stored, "B", is "employee information". Employee information such as "gender" and "age" can be stored in the data table "basic employee information", or it can be stored by using a primary key-foreign key relationship to establish a relation with the data table "department information". The feature vector corresponding to the data "B" to be stored then consists of a component indicating the name of the item of the data table to which "employee information" belongs, a component indicating the association relationship with "department information", and a component indicating the average length of the employee information data.
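The three components in the "employee information" example can be assembled as in the sketch below. The application does not say how an item name or an association is encoded as a number, so the encoding here is an assumption: the name becomes its 1-based position in a hypothetical list of known item names (0 if unknown), and the association becomes a 0/1 flag.

```python
# Hypothetical vocabulary of known data table item names.
ITEM_NAMES = ["employee information", "department information", "department wage"]


def table_feature_vector(item_name, has_association, values, item_names=ITEM_NAMES):
    """Build [name component, association component, mean length component]."""
    if not values:
        raise ValueError("values must not be empty")
    # Assumed encoding: 1-based index of the item name, 0.0 when the name is unknown.
    name_component = float(item_names.index(item_name) + 1) if item_name in item_names else 0.0
    # Assumed encoding: 1.0 when the item is linked to another table, else 0.0.
    association_component = 1.0 if has_association else 0.0
    mean_length = sum(len(v) for v in values) / len(values)
    return [name_component, association_component, mean_length]


# "employee information" linked to "department information" by a foreign key.
vector_b = table_feature_vector("employee information", True, ["male", "female"])
print(vector_b)
```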
Step 303, generate a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector.
The input vector of the data classification model can include two parts: a feature vector for structured data and a feature vector for unstructured data. For a typical financial system specifically, the input vector mainly includes a data table feature vector and a keyword feature vector. When the data to be stored is data table data, that is, structured data, the keyword feature vector can be represented as a zero vector; when the data to be stored is text data, that is, unstructured data, the data table feature vector can be represented as a zero vector.
In this embodiment, having determined in step 301 that the data to be stored is data in a data table, and having determined the feature vector of that data in step 302, the server can further generate the first input vector of the data classification model, which contains, in order, the data table feature vector determined in step 302 and a zero vector.
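The two-part layout described above means every input vector has the same length regardless of the kind of data: the data table part comes first and the keyword part second, with zeros filling whichever part does not apply. A minimal sketch, assuming fixed part sizes chosen by the implementer (the function names and the sizes 3 and 2 are illustrative):

```python
def first_input_vector(table_vector, keyword_dim):
    """Structured data: data table feature vector followed by a zero vector."""
    return list(table_vector) + [0.0] * keyword_dim


def second_input_vector(keyword_vector, table_dim):
    """Text data: zero vector followed by the keyword feature vector."""
    return [0.0] * table_dim + list(keyword_vector)


TABLE_DIM, KEYWORD_DIM = 3, 2  # assumed sizes of the two parts

x_table = first_input_vector([1.0, 1.0, 5.0], KEYWORD_DIM)
x_text = second_input_vector([1.0, 1.0], TABLE_DIM)
print(x_table)
print(x_text)
```

Because both layouts share one length, a single model can accept either vector, and the all-zero half also signals which kind of data it is looking at.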
Step 304, generate the keyword feature vector corresponding to the feature information.
In this embodiment, when the data to be stored is text data, the feature information of the text data is its keywords, so in this step a keyword feature vector can be generated from the keyword information corresponding to the text data, where each keyword corresponds to one component of the keyword feature vector. In this embodiment, a vector space model can be used to generate the keyword feature vector; the vector space model is a well-known existing technique and is not described further here. As an example, a system may hold a large amount of unstructured text data such as documents and contracts. When the data to be stored is a "C company contract", the server takes the keywords obtained as the feature information of the "C company contract", such as "C company" and "contract", and generates a keyword component corresponding to "C company" and a keyword component corresponding to "contract".
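As a rough illustration of the "one component per keyword" idea, the following Python sketch builds a term-count vector over a fixed vocabulary, which is one simple form of a vector space model. The vocabulary and the function name `keyword_feature_vector` are hypothetical; the patent does not state which weighting scheme is used.

```python
from collections import Counter
from typing import List, Sequence

# Hypothetical fixed vocabulary; each entry owns one vector component.
VOCABULARY = ["C company", "contract", "invoice", "report"]


def keyword_feature_vector(keywords: Sequence[str],
                           vocabulary: Sequence[str] = VOCABULARY) -> List[float]:
    """Count how often each vocabulary keyword occurs in `keywords`."""
    counts = Counter(keywords)
    return [float(counts[term]) for term in vocabulary]


# Keywords extracted from the "C company contract" example in the text.
vec = keyword_feature_vector(["C company", "contract"])
print(vec)
```

Raw counts are used here only for brevity; a weighted scheme such as TF-IDF would fit the same interface.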
Step 305, generate a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector.
In this embodiment, given that the data to be stored was determined in step 301 to be text data, and given the keyword feature vector of the text data determined in step 304, the server can further generate the second input vector of the data classification model. The second input vector contains, in order, a zero vector and the keyword feature vector determined in step 304.
Step 306, input the input vector into the data classification model to obtain an output vector indicating the type of the data.
In this embodiment, the server can input the first input vector and the second input vector of the data classification model, determined in step 303 and step 305 respectively, into the data classification model to obtain an output vector indicating the type of the data. The output vector may include a component for each preset data type and a component for the degree of match between the data to be stored and each data type. Here, the data classification model can first determine from the input vector whether the data to be stored is data in a data table or text data, and can then process the two kinds of data separately, generating the output vector from the first input vector or the second input vector accordingly. For example, when the server inputs the input vector generated for the data to be stored "X" into the data classification model, the model can determine from the data table feature components and the zero vector of that input vector that "X" is data in a data table, and at the same time determine that its data type is "data type related to numbers"; the model therefore outputs the output vector corresponding to "data type related to numbers". As another example, when the server inputs the input vector generated for the data to be stored "Y" into the data classification model, the model can determine from the zero vector and the keyword feature components of that input vector that "Y" is text data, and at the same time determine that its data type is "character type"; the model therefore outputs the output component corresponding to "character type".
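How a model might tell the two kinds of input apart can be sketched as follows. This is only an illustration of the "which half is zero" idea; the dimensions `TABLE_DIM` and `KEYWORD_DIM` and the function `detect_kind` are assumptions, and the patent does not describe the check at this level of detail.

```python
from typing import List, Sequence, Tuple

TABLE_DIM = 3    # assumed length of the data table feature part
KEYWORD_DIM = 4  # assumed length of the keyword feature part


def split_input(vector: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split an input vector into its table part and its keyword part."""
    if len(vector) != TABLE_DIM + KEYWORD_DIM:
        raise ValueError("unexpected input vector length")
    return list(vector[:TABLE_DIM]), list(vector[TABLE_DIM:])


def detect_kind(vector: Sequence[float]) -> str:
    """Return 'table' or 'text' depending on which part is the zero vector."""
    table_part, keyword_part = split_input(vector)
    table_is_zero = not any(table_part)
    keyword_is_zero = not any(keyword_part)
    if keyword_is_zero and not table_is_zero:
        return "table"
    if table_is_zero and not keyword_is_zero:
        return "text"
    raise ValueError("expected exactly one non-zero part")


kind_x = detect_kind([3.0, 1.0, 12.5, 0.0, 0.0, 0.0, 0.0])  # data "X"
kind_y = detect_kind([0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0])  # data "Y"
print(kind_x, kind_y)
```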
In this embodiment, the data classification model is obtained by training in a supervised manner on training samples in advance. Optionally, the data classification model is a decision tree model. Note that machine learning with decision tree models is a well-known technique that is widely studied and applied, and is not described further here.
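Since the text leaves the decision tree unspecified, the following is a minimal pure-Python sketch of supervised training with a small Gini-based tree. It is not the patent's implementation; the training samples, the labels "numeric" and "character", and the four-component vectors (two table components followed by two keyword components) are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (input vector, labeled data type)


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a list of labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(samples: List[Sample]) -> Optional[Tuple[float, int, float]]:
    """Find the (score, feature, threshold) that most reduces impurity."""
    base = gini([label for _, label in samples])
    best = None
    for f in range(len(samples[0][0])):
        for thr in sorted({x[f] for x, _ in samples}):
            left = [label for x, label in samples if x[f] <= thr]
            right = [label for x, label in samples if x[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if score < base and (best is None or score < best[0]):
                best = (score, f, thr)
    return best


def train(samples: List[Sample]):
    """Return a leaf label (str) or a node (feature, threshold, left, right)."""
    split = best_split(samples)
    if split is None:  # no split improves purity: make a majority leaf
        return Counter(label for _, label in samples).most_common(1)[0][0]
    _, f, thr = split
    left = [(x, label) for x, label in samples if x[f] <= thr]
    right = [(x, label) for x, label in samples if x[f] > thr]
    return (f, thr, train(left), train(right))


def predict(tree, x: Sequence[float]) -> str:
    """Walk the tree until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


# Hypothetical labeled samples for stored data.
training_samples: List[Sample] = [
    ([5.0, 8.0, 0.0, 0.0], "numeric"),
    ([6.0, 9.0, 0.0, 0.0], "numeric"),
    ([0.0, 0.0, 1.0, 1.0], "character"),
    ([0.0, 0.0, 1.0, 0.0], "character"),
]
tree = train(training_samples)
print(predict(tree, [7.0, 10.0, 0.0, 0.0]))
```

In practice a library implementation would normally be used instead; this sketch only shows how labeled feature vectors drive the training.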
Step 307, store the data in the storage region corresponding to the type of the data indicated by the output vector.
In this embodiment, the type to which the data belongs can be determined from the output vector of the data classification model obtained in step 306, and the data is then stored in the storage region corresponding to that type.
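A minimal sketch of this storing step is shown below, assuming the output vector is represented as pairs of (preset type, match degree) and that storage regions are modeled as in-memory lists keyed by type. Both representations, and the name `store_by_type`, are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Assumed representation: (preset data type, match degree) pairs.
OutputVector = Sequence[Tuple[str, float]]


def store_by_type(data: str, output_vector: OutputVector,
                  regions: Dict[str, List[str]]) -> str:
    """Store `data` in the region of the best-matching type and return that type."""
    best_type, _ = max(output_vector, key=lambda pair: pair[1])
    regions.setdefault(best_type, []).append(data)
    return best_type


regions: Dict[str, List[str]] = {}
chosen = store_by_type("X", [("numeric", 0.9), ("character", 0.1)], regions)
print(chosen, regions)
```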
As can be seen from Fig. 3, compared with the embodiment corresponding to Fig. 2, the flow 300 of the data storage method in this embodiment divides the data to be stored into structured data and unstructured data, i.e. data in a data table and text data. The two kinds of data are each input into the data classification model for matching, and the model processes them separately, yielding an output vector indicating the type of the data in the data table and an output vector indicating the type of the text data. The data is thereby classified more quickly and effectively, which speeds up data storage and reduces the space used to store data.
With further reference to Fig. 4, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a data storage device. This device embodiment corresponds to the method embodiment shown in Fig. 2, and the device can be applied in various electronic devices.
As shown in Fig. 4, the data storage device 400 of this embodiment includes an acquisition unit 401, an input unit 402, and a storage unit 403. The acquisition unit 401 is configured to acquire feature information of data to be stored, the feature information including at least one of the following: the name of the data table item in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords. The input unit 402 is configured to convert the feature information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, where the data classification model is generated by training in a supervised manner on training samples in advance, and the training samples include the feature information of stored data and the labeled type of the stored data. The storage unit 403 is configured to store the data in the storage region corresponding to the type.
In this embodiment, for the specific processing of the acquisition unit 401, the input unit 402, and the storage unit 403 of the data storage device 400 and the technical effects they bring, refer to the descriptions of step 201, step 202, and step 203 in the embodiment corresponding to Fig. 2, respectively; they are not repeated here.
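The three-unit structure can be pictured with a short Python sketch in which each unit is a pluggable callable or container. The class `DataStorageDevice` and the stand-in `acquire` and `classify` functions are hypothetical; a real classifier would be the trained data classification model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataStorageDevice:
    acquire: Callable[[str], List[str]]    # role of acquisition unit 401
    classify: Callable[[List[str]], str]   # role of input unit 402
    regions: Dict[str, List[str]] = field(default_factory=dict)  # storage unit 403

    def handle(self, data: str) -> str:
        """Acquire features, classify them, and store the data by type."""
        features = self.acquire(data)
        data_type = self.classify(features)
        self.regions.setdefault(data_type, []).append(data)
        return data_type


# Stand-in units for illustration only.
device = DataStorageDevice(
    acquire=lambda data: data.split(),
    classify=lambda feats: "contract" if "contract" in feats else "other",
)
result = device.handle("C company contract")
print(result, device.regions)
```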
In some optional implementations of this embodiment, the data is data in a data table, the feature information includes the name of the data table item in the data table to which the data belongs and statistical feature information, and the input unit 402 includes: a data table feature vector generation subunit 4021, configured to generate the data table feature vector corresponding to the feature information, the data table feature vector including a component representing the name of the data table item in the data table to which the data belongs and a component representing the statistical feature information; a first input vector generation subunit 4022, configured to generate a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector; and an output vector generation subunit 4025, configured to input the first input vector into the data classification model to obtain an output vector indicating the type of the data.
In some optional implementations of this embodiment, the statistical feature information includes: association information indicating the association relationships between data tables, the mean length of the data, the maximum length of the data, the minimum length of the data, and the type of the characters in the data.
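The length and character-type statistics listed above could be computed along the following lines. This is a sketch under assumptions: the category names "digit", "alpha", and "mixed" and the function names are invented for illustration, and the association information between tables is omitted because the text does not say how it is encoded.

```python
from statistics import mean
from typing import Dict, Sequence, Union


def char_type(value: str) -> str:
    """Classify a value by its characters (hypothetical categories)."""
    if value.isdigit():
        return "digit"
    if value.isalpha():
        return "alpha"
    return "mixed"


def statistical_features(values: Sequence[str]) -> Dict[str, Union[float, str]]:
    """Compute mean, maximum, and minimum length plus the overall character type."""
    lengths = [len(v) for v in values]
    kinds = {char_type(v) for v in values}
    return {
        "mean_length": mean(lengths),
        "max_length": max(lengths),
        "min_length": min(lengths),
        "char_type": kinds.pop() if len(kinds) == 1 else "mixed",
    }


# Hypothetical column values from a data table.
stats = statistical_features(["100", "2000", "30000"])
print(stats)
```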
In some optional implementations of this embodiment, the data is text data, the feature information is keywords, and the input unit 402 includes: a keyword feature vector generation subunit 4023, configured to generate the keyword feature vector corresponding to the feature information, where each keyword corresponds to one component of the keyword feature vector; a second input vector generation subunit 4024, configured to generate a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector; and an output vector generation subunit 4025, configured to input the second input vector into the data classification model to obtain an output vector indicating the type of the data.
Referring now to Fig. 5, it shows a schematic structural diagram of a computer system 500 suitable for implementing the server of the embodiments of the present application.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores the various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a cathode ray tube (CRT), a liquid crystal display (LCD), or the like, and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program can be downloaded and installed from a network via the communication section 509, and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the method of the present application are performed.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, this may be described as: a processor including an acquisition unit, an input unit, and a storage unit. The names of these units do not, in certain cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring feature information of data to be stored".
In another aspect, the present application also provides a non-volatile computer storage medium. This non-volatile computer storage medium may be the one included in the device described in the above embodiments, or it may exist separately without being assembled into a terminal. The non-volatile computer storage medium stores one or more programs that, when executed by a device, cause the device to: acquire feature information of data to be stored, the feature information including at least one of the following: the name of the data table item in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords; convert the feature information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, where the data classification model is generated by training in a supervised manner on training samples in advance, and the training samples include the feature information of stored data and the labeled type of the stored data; and store the data in the storage region corresponding to the type.
The above description is only the preferred embodiments of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the particular combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present application.
Claims (10)
1. A data storage method, characterized in that the method comprises:
acquiring feature information of data to be stored, the feature information including at least one of the following: the name of the data table item in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords;
converting the feature information into an input vector of a data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data, wherein the data classification model is generated by training in a supervised manner on training samples in advance, and the training samples include: the feature information of stored data and the labeled type of the stored data;
storing the data in the storage region corresponding to the type.
2. The method according to claim 1, characterized in that the data classification model is a decision tree model.
3. The method according to claim 2, characterized in that the data is data in a data table, and the feature information includes: the name of the data table item in the data table to which the data belongs, and statistical feature information; and
the converting of the feature information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data comprises:
generating the data table feature vector corresponding to the feature information, the data table feature vector including: a component representing the name of the data table item in the data table to which the data belongs, and a component representing the statistical feature information;
generating a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector;
inputting the first input vector into the data classification model to obtain the output vector indicating the type of the data.
4. The method according to claim 3, characterized in that the statistical feature information includes: association information indicating the association relationships between data tables, the mean length of the data, the maximum length of the data, the minimum length of the data, and the type of the characters in the data.
5. The method according to claim 2, characterized in that the data is text data, and the feature information is keywords; and
the converting of the feature information into an input vector of the data classification model and inputting it into the data classification model to obtain an output vector indicating the type of the data comprises:
generating the keyword feature vector corresponding to the feature information, wherein each keyword corresponds to one component of the keyword feature vector;
generating a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector;
inputting the second input vector into the data classification model to obtain the output vector indicating the type of the data.
6. A data storage device, characterized in that the device comprises:
an acquisition unit, configured to acquire feature information of data to be stored, the feature information including at least one of the following: the name of the data table item in the data table to which the data belongs, statistical feature information indicating statistical features of the data, and keywords;
an input unit, configured to convert the feature information into an input vector of a data classification model and input it into the data classification model to obtain an output vector indicating the type of the data, wherein the data classification model is generated by training in a supervised manner on training samples in advance, and the training samples include: the feature information of stored data and the labeled type of the stored data;
a storage unit, configured to store the data in the storage region corresponding to the type.
7. The device according to claim 6, characterized in that the data classification model is a decision tree model.
8. The device according to claim 7, characterized in that the data is data in a data table, the feature information includes: the name of the data table item in the data table to which the data belongs, and statistical feature information, and the input unit comprises:
a data table feature vector generation subunit, configured to generate the data table feature vector corresponding to the feature information, the data table feature vector including: a component representing the name of the data table item in the data table to which the data belongs, and a component representing the statistical feature information;
a first input vector generation subunit, configured to generate a first input vector of the data classification model that contains, in order, the data table feature vector and a zero vector;
an output vector generation subunit, configured to input the first input vector into the data classification model to obtain the output vector indicating the type of the data.
9. The device according to claim 8, characterized in that the statistical feature information includes: association information indicating the association relationships between data tables, the mean length of the data, the maximum length of the data, the minimum length of the data, and the type of the characters in the data.
10. The device according to claim 7, characterized in that the data is text data, the feature information is keywords, and the input unit comprises:
a keyword feature vector generation subunit, configured to generate the keyword feature vector corresponding to the feature information, wherein each keyword corresponds to one component of the keyword feature vector;
a second input vector generation subunit, configured to generate a second input vector of the data classification model that contains, in order, a zero vector and the keyword feature vector;
an output vector generation subunit, configured to input the second input vector into the data classification model to obtain the output vector indicating the type of the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710066733.9A CN106649890B (en) | 2017-02-07 | 2017-02-07 | Data storage method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106649890A true CN106649890A (en) | 2017-05-10 |
CN106649890B CN106649890B (en) | 2020-07-14 |
Family
ID=58845975
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710066733.9A Expired - Fee Related CN106649890B (en) | 2017-02-07 | 2017-02-07 | Data storage method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106649890B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107578014A (en) * | 2017-09-06 | 2018-01-12 | 上海寒武纪信息科技有限公司 | Information processor and method |
CN108427725A (en) * | 2018-02-11 | 2018-08-21 | 华为技术有限公司 | Data processing method, device and system |
CN108563783A (en) * | 2018-04-25 | 2018-09-21 | 张艳 | A kind of financial analysis management system and method based on big data |
CN108763952A (en) * | 2018-05-03 | 2018-11-06 | 阿里巴巴集团控股有限公司 | A kind of data classification method, device and electronic equipment |
CN109144999A (en) * | 2018-08-02 | 2019-01-04 | 东软集团股份有限公司 | A kind of data positioning method, device and storage medium, program product |
CN109271356A (en) * | 2018-09-03 | 2019-01-25 | 中国平安人寿保险股份有限公司 | Log file formats processing method, device, computer equipment and storage medium |
WO2019024231A1 (en) * | 2017-08-04 | 2019-02-07 | 平安科技(深圳)有限公司 | Automatic data matching method, electronic device and computer-readable storage medium |
CN109951509A (en) * | 2017-12-21 | 2019-06-28 | 航天信息股份有限公司 | A kind of cloud storage dispatching method, device, electronic equipment and storage medium |
WO2019196210A1 (en) * | 2018-04-10 | 2019-10-17 | 平安科技(深圳)有限公司 | Data analysis method, computer readable storage medium, terminal device and apparatus |
CN111611418A (en) * | 2019-02-25 | 2020-09-01 | 阿里巴巴集团控股有限公司 | Data storage method and data query method |
CN111626057A (en) * | 2020-07-28 | 2020-09-04 | 南京中孚信息技术有限公司 | Official document judgment method and judgment system based on named entity |
CN111881869A (en) * | 2020-08-04 | 2020-11-03 | 浪潮云信息技术股份公司 | Hierarchical storage method and system based on gesture data |
CN112199694A (en) * | 2020-09-30 | 2021-01-08 | 杭州云链趣链数字科技有限公司 | Standardized bill processing method and device, electronic device and storage medium |
CN112732601A (en) * | 2018-08-28 | 2021-04-30 | 中科寒武纪科技股份有限公司 | Data preprocessing method and device, computer equipment and storage medium |
CN112988884A (en) * | 2019-12-17 | 2021-06-18 | ***通信集团陕西有限公司 | Big data platform data storage method and device |
CN113515680A (en) * | 2021-04-20 | 2021-10-19 | 建信金融科技有限责任公司 | Financial monitoring data processing method and device |
CN116432238A (en) * | 2023-06-05 | 2023-07-14 | 全中半导体(深圳)有限公司 | Data storage method and device and storage chip |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101866333A (en) * | 2009-12-24 | 2010-10-20 | 金蝶软件(中国)有限公司 | Worksheet self-defining method and adapter engine |
CN102033964A (en) * | 2011-01-13 | 2011-04-27 | 北京邮电大学 | Text classification method based on block partition and position weight |
CN102073704A (en) * | 2010-12-24 | 2011-05-25 | 华为终端有限公司 | Text classification processing method, system and equipment |
US8903182B1 (en) * | 2012-03-08 | 2014-12-02 | Google Inc. | Image classification |
CN104881424A (en) * | 2015-03-13 | 2015-09-02 | 国家电网公司 | Regular expression-based acquisition, storage and analysis method of power big data |
CN106126502A (en) * | 2016-07-07 | 2016-11-16 | 四川长虹电器股份有限公司 | A kind of emotional semantic classification system and method based on support vector machine |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019024231A1 (en) * | 2017-08-04 | 2019-02-07 | 平安科技(深圳)有限公司 | Automatic data matching method, electronic device and computer-readable storage medium |
CN107578014A (en) * | 2017-09-06 | 2018-01-12 | 上海寒武纪信息科技有限公司 | Information processor and method |
CN107578014B (en) * | 2017-09-06 | 2020-11-03 | 上海寒武纪信息科技有限公司 | Information processing apparatus and method |
CN109951509A (en) * | 2017-12-21 | 2019-06-28 | 航天信息股份有限公司 | A kind of cloud storage dispatching method, device, electronic equipment and storage medium |
CN108427725A (en) * | 2018-02-11 | 2018-08-21 | 华为技术有限公司 | Data processing method, device and system |
CN108427725B (en) * | 2018-02-11 | 2021-08-03 | 华为技术有限公司 | Data processing method, device and system |
WO2019153735A1 (en) * | 2018-02-11 | 2019-08-15 | 华为技术有限公司 | Data processing method, device and system |
WO2019196210A1 (en) * | 2018-04-10 | 2019-10-17 | 平安科技(深圳)有限公司 | Data analysis method, computer readable storage medium, terminal device and apparatus |
CN108563783A (en) * | 2018-04-25 | 2018-09-21 | 张艳 | A kind of financial analysis management system and method based on big data |
CN108763952B (en) * | 2018-05-03 | 2022-04-05 | 创新先进技术有限公司 | Data classification method and device and electronic equipment |
CN108763952A (en) * | 2018-05-03 | 2018-11-06 | 阿里巴巴集团控股有限公司 | A kind of data classification method, device and electronic equipment |
CN109144999B (en) * | 2018-08-02 | 2021-06-08 | 东软集团股份有限公司 | Data positioning method, device, storage medium and program product |
CN109144999A (en) * | 2018-08-02 | 2019-01-04 | 东软集团股份有限公司 | A kind of data positioning method, device and storage medium, program product |
CN112732601A (en) * | 2018-08-28 | 2021-04-30 | 中科寒武纪科技股份有限公司 | Data preprocessing method and device, computer equipment and storage medium |
CN109271356B (en) * | 2018-09-03 | 2024-05-24 | 中国平安人寿保险股份有限公司 | Log file format processing method, device, computer equipment and storage medium |
CN109271356A (en) * | 2018-09-03 | 2019-01-25 | 中国平安人寿保险股份有限公司 | Log file formats processing method, device, computer equipment and storage medium |
CN111611418A (en) * | 2019-02-25 | 2020-09-01 | 阿里巴巴集团控股有限公司 | Data storage method and data query method |
CN112988884A (en) * | 2019-12-17 | 2021-06-18 | ***通信集团陕西有限公司 | Big data platform data storage method and device |
CN112988884B (en) * | 2019-12-17 | 2024-04-12 | ***通信集团陕西有限公司 | Big data platform data storage method and device |
CN111626057B (en) * | 2020-07-28 | 2020-10-30 | 南京中孚信息技术有限公司 | Official document judgment method and judgment system based on named entity |
CN111626057A (en) * | 2020-07-28 | 2020-09-04 | 南京中孚信息技术有限公司 | Official document judgment method and judgment system based on named entity |
CN111881869A (en) * | 2020-08-04 | 2020-11-03 | 浪潮云信息技术股份公司 | Hierarchical storage method and system based on gesture data |
CN111881869B (en) * | 2020-08-04 | 2023-04-18 | 浪潮云信息技术股份公司 | Hierarchical storage method and system based on gesture data |
CN112199694A (en) * | 2020-09-30 | 2021-01-08 | 杭州云链趣链数字科技有限公司 | Standardized bill processing method and device, electronic device and storage medium |
CN113515680A (en) * | 2021-04-20 | 2021-10-19 | 建信金融科技有限责任公司 | Financial monitoring data processing method and device |
CN116432238A (en) * | 2023-06-05 | 2023-07-14 | 全中半导体(深圳)有限公司 | Data storage method and device and storage chip |
CN116432238B (en) * | 2023-06-05 | 2023-09-08 | 全中半导体(深圳)有限公司 | Data storage method and device and storage chip |
Also Published As
Publication number | Publication date |
---|---|
CN106649890B (en) | 2020-07-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106649890A (en) | Data storage method and device | |
US11663254B2 (en) | System and engine for seeded clustering of news events | |
Vysotska et al. | Web Content Support Method in Electronic Business Systems. | |
CN110807527B (en) | Credit adjustment method and device based on guest group screening and electronic equipment | |
CN107247786A (en) | Method, device and server for determining similar users | |
CN115002200B (en) | Message pushing method, device, equipment and storage medium based on user portrait | |
WO2021208685A1 (en) | Method and apparatus for executing automatic machine learning process, and device | |
CN109785064A (en) | A kind of mobile e-business recommended method and system based on Multi-source Information Fusion | |
Ahmed et al. | Exploring nested ensemble learners using overproduction and choose approach for churn prediction in telecom industry | |
CN104572775B (en) | Advertisement classification method, device and server | |
CN107145485A (en) | Method and apparatus for compressing topic model | |
CN111191825A (en) | User default prediction method and device and electronic equipment | |
CN112015562A (en) | Resource allocation method and device based on transfer learning and electronic equipment | |
CN111581193A (en) | Data processing method, device, computer system and storage medium | |
CN111582314A (en) | Target user determination method and device and electronic equipment | |
CN107346344A (en) | The method and apparatus of text matches | |
CN111429161A (en) | Feature extraction method, feature extraction device, storage medium, and electronic apparatus | |
CN113282623A (en) | Data processing method and device | |
CN107341685A (en) | Data analysing method and device | |
US20190228101A1 (en) | Transaction categorization system | |
CN111930944B (en) | File label classification method and device | |
CN112231299A (en) | Method and device for dynamically adjusting feature library | |
CN116662546A (en) | Complaint text labeling method, device, equipment and medium | |
CN116402546A (en) | Store risk attribution method and device, equipment, medium and product thereof | |
CN110062112A (en) | Data processing method, device, equipment and computer readable storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20200714; Termination date: 20220207 |