CN110287191A - Data alignment method and device, storage medium, electronic device - Google Patents


Info

Publication number
CN110287191A
Authority
CN
China
Legal status
Granted
Application number
CN201910557282.8A
Other languages
Chinese (zh)
Other versions
CN110287191B (en)
Inventor
接钧靖
张毅然
王建伟
Current Assignee
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910557282.8A
Publication of CN110287191A
Application granted
Publication of CN110287191B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a data alignment method and device, a storage medium, and an electronic device. The method includes: obtaining first field information in a received first data record table and first bag-of-words information corresponding to the field information, where the first bag-of-words information indicates the probability that the first field information occurs in the database; for each of a plurality of second data record tables in the database, determining the similarity probability between the first data record table and that second data record table according to the first field information and the first bag-of-words information, together with the second field information and second bag-of-words information of the second data record table; and aligning first description information of the first data record table with second description information of each second data record table whose similarity probability exceeds a first threshold.

Description

Data alignment method and device, storage medium, electronic device
Technical field
The present invention relates to the computer field, and in particular to a data alignment method and device, a storage medium, and an electronic device.
Background art
With the development of computer technology, more and more attention is being paid to the analysis and mining of relational data in order to obtain analysis results from it. However, inconsistent data standards across different data resources lead to data quality problems, which can seriously affect the reliability of the analysis results.
No effective technical solution has yet been proposed for the problem in the related art that inconsistent standards across data resources prevent data from being aligned effectively, which in turn affects the reliability of data analysis results.
Summary of the invention
Embodiments of the present invention provide a data alignment method and device, a storage medium, and an electronic device, so as to at least solve the problem in the related art that inconsistent standards across data resources prevent data from being aligned effectively and thereby affect the reliability of data analysis results.
According to one embodiment of the present invention, a data alignment method is provided, comprising: obtaining first field information in a received first data record table and first bag-of-words information corresponding to the field information, wherein the first bag-of-words information indicates the probability that the first field information occurs in the database; for each of a plurality of second data record tables in the database, determining the similarity probability between the first data record table and that second data record table according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table; and aligning first description information of the first data record table with second description information of each second data record table whose similarity probability exceeds a first threshold.
Optionally, determining the similarity probability between the first data record table and the plurality of second data record tables according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table, comprises:
multiplying the value of the first field information by the first bag-of-words information to obtain the similarity probability, wherein the value of the first field information is a second threshold when the first field information is present in the second field information of the second data record table, and a third threshold when it is not.
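Written as a formula, one reading of this calculation is given below, assuming (as the worked example in this description does) that the per-field products are added over all fields of the first table. The symbols are introduced here for illustration only: $F_1$ and $F_2$ are the field sets of the first and second data record tables $T_1$ and $T_2$, $p_1(f)$ is the first bag-of-words value of field $f$, and $\theta_1$, $\theta_2$, $\theta_3$ are the first, second, and third thresholds.

```latex
S(T_1, T_2) = \sum_{f \in F_1} v(f)\, p_1(f),
\qquad
v(f) =
\begin{cases}
  \theta_2, & f \in F_2,\\
  \theta_3, & f \notin F_2.
\end{cases}
```

Under this reading, $T_2$ is selected for alignment when $S(T_1, T_2) > \theta_1$; with $\theta_2 = 1$ and $\theta_3 = 0$, $S$ reduces to the sum of the bag-of-words values of the fields the two tables share.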
Optionally, aligning the first description information of the first data record table with the second description information of each second data record table whose similarity probability exceeds the first threshold comprises:
aligning at least one of the table name information, the field information, and the data format of the first data record table with at least one of the table name information, the field information, and the data format of each second data record table whose similarity probability exceeds the first threshold.
Optionally, before the first field information in the received first data record table and the first bag-of-words information corresponding to the field information are obtained, the method further comprises:
receiving the first data record table, wherein the first data record table contains the first field information;
establishing the first bag-of-words information corresponding to the first field information.
According to another embodiment of the present invention, a data alignment device is further provided, comprising:
an obtaining module, configured to obtain first field information in a received first data record table and first bag-of-words information corresponding to the field information, wherein the first bag-of-words information indicates the probability that the first field information occurs in the database;
a first processing module, configured to, for each of a plurality of second data record tables in the database, determine the similarity probability between the first data record table and that second data record table according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table;
a second processing module, configured to align first description information of the first data record table with second description information of each second data record table whose similarity probability exceeds a first threshold.
Optionally, the first processing module is further configured to multiply the value of the first field information by the first bag-of-words information to obtain the similarity probability, wherein the value of the first field information is a second threshold when the first field information is present in the second field information of the second data record table, and a third threshold when it is not.
Optionally, the second processing module is further configured to align at least one of the table name information, the field information, and the data format of the first data record table with at least one of the table name information, the field information, and the data format of each second data record table whose similarity probability exceeds the first threshold.
Optionally, the device further comprises: a receiving module, configured to receive the first data record table, wherein the first data record table contains the first field information;
a third processing module, configured to establish the first bag-of-words information corresponding to the first field information.
According to another embodiment of the present invention, a storage medium is further provided, in which a computer program is stored, wherein the computer program is configured to perform, when run, the data alignment method according to any one of the above.
According to another embodiment of the present invention, an electronic device is further provided, comprising a memory and a processor, wherein the memory stores a computer program and the processor is configured to run the computer program to perform the data alignment method according to any one of the above.
Through the present invention, the first field information of a received first data record table and its first bag-of-words information (the probability that the first field information occurs in the database) are obtained; for each second data record table in the database, the similarity probability between the two tables is determined from the field information and bag-of-words information of both; and the first description information of the first data record table is aligned with the second description information of each second data record table whose similarity probability exceeds the first threshold. By introducing bag-of-words information into the data alignment process, this technical solution solves the problem in the related art that inconsistent standards across data resources prevent data from being aligned effectively and thereby affect the reliability of data analysis results, and it also facilitates subsequent analysis of the data resources.
Description of the drawings
The drawings described here are provided for a further understanding of the present invention and constitute part of this application. The illustrative embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a flowchart of an optional data alignment method according to an embodiment of the present invention;
Fig. 2 is a structural block diagram of an optional data alignment device according to an embodiment of the present invention;
Fig. 3 is another structural block diagram of an optional data alignment device according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where there is no conflict, the embodiments of this application and the features in the embodiments may be combined with each other.
It should be noted that the terms "first", "second", and the like in the description, the claims, and the drawings of this specification are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence.
Embodiment 1
This embodiment of the present invention provides a data alignment method. Fig. 1 is a flowchart of the data alignment method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises:
Step S102: obtain first field information in a received first data record table and first bag-of-words information corresponding to the field information, wherein the first bag-of-words information indicates the probability that the first field information occurs in the database;
Step S104: for each of a plurality of second data record tables in the database, determine the similarity probability between the first data record table and that second data record table according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table;
Step S106: align first description information of the first data record table with second description information of each second data record table whose similarity probability exceeds a first threshold.
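Steps S104 and S106 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: a bag of words is modelled as a plain dict from field word to probability, the second and third thresholds default to 1.0 and 0.0 (one of the value pairs mentioned in the worked example of this description), per-field products are summed, and the second bag-of-words information is not used because the text does not say how it enters the calculation. The function names and the second-table field sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a bag of words maps each field word to its probability.
BagOfWords = Dict[str, float]


def similarity(first_bag: BagOfWords, second_fields: Set[str],
               second_threshold: float = 1.0,
               third_threshold: float = 0.0) -> float:
    """Step S104 for one second table: sum of value(f) * p(f).

    value(f) is the second threshold if the word also occurs among the
    second table's fields, otherwise the third threshold.
    """
    return sum(
        (second_threshold if word in second_fields else third_threshold) * p
        for word, p in first_bag.items()
    )


def select_tables(first_bag: BagOfWords,
                  second_tables: Dict[str, Set[str]],
                  first_threshold: float) -> List[Tuple[str, float]]:
    """Score every second table and keep those whose similarity exceeds
    the first threshold (the candidates for step S106), best first."""
    scores = {name: similarity(first_bag, fields)
              for name, fields in second_tables.items()}
    selected = [(name, s) for name, s in scores.items() if s > first_threshold]
    return sorted(selected, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Bag values follow the example in this description; the field sets
    # of the second tables are illustrative only.
    registration_bag = {"motor vehicle": 0.9, "vehicle": 0.8,
                        "registration": 0.3, "information": 0.6, "record": 0.2}
    tables = {
        "illegal information": {"motor vehicle", "vehicle", "illegal",
                                "information", "speeding"},
        "driver licence": {"driver", "licence", "information"},  # hypothetical
    }
    print(select_tables(registration_bag, tables, first_threshold=1.0))
```

Only the illegal information table clears the threshold here, because it shares the high-probability words "motor vehicle", "vehicle", and "information".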
With the above steps, bag-of-words information is introduced into the data alignment process. This solves the problem in the related art that inconsistent standards across data resources prevent data from being aligned effectively and thereby reduce the reliability of data analysis results, and it also facilitates subsequent analysis of the data resources.
In an optional embodiment of the present invention, determining the similarity probability between the first data record table and the plurality of second data record tables according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table, comprises:
multiplying the value of the first field information by the first bag-of-words information to obtain the similarity probability, wherein the value of the first field information is a second threshold when the first field information is present in the second field information of the second data record table, and a third threshold when it is not.
In an optional embodiment of the present invention, aligning the first description information of the first data record table with the second description information of each second data record table whose similarity probability exceeds the first threshold comprises:
aligning at least one of the table name information, the field information, and the data format of the first data record table with at least one of the table name information, the field information, and the data format of each second data record table whose similarity probability exceeds the first threshold.
In an optional embodiment of the present invention, before the first field information in the received first data record table and the first bag-of-words information corresponding to the field information are obtained, the method further comprises:
receiving the first data record table, wherein the first data record table contains the first field information;
establishing the first bag-of-words information corresponding to the first field information.
The above data alignment process is explained below with an example, which is not intended to limit the technical solutions of the embodiments of the present invention. The technical solution of the example is as follows:
Step 1: receive the first data record table, wherein the first data record table contains the first field information;
Step 2: establish the first bag-of-words information corresponding to the first field information.
Here, a dictionary is a set of words, and any document can be regarded as a combination of words from the dictionary. The bag-of-words model can be understood as follows. Unlike a dictionary, a bag of words contains not only all the words that make up the documents but also a probability for each word, indicating how likely that word is to appear in the documents generated from the bag. Documents on different topics therefore correspond to different bags of words. For example, the word "Yao Ming" has a high probability of appearing in documents on sports topics and a low probability in documents on entertainment topics. In other words, a bag of words is a concept attached to a topic.
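The difference between a dictionary and a topic-dependent bag of words can be illustrated with a short Python sketch. The text does not say how the per-word probabilities are estimated; here, as an assumption, each word's probability is the fraction of a topic's documents that contain it, so (like the example values given later) the probabilities need not sum to 1. The corpora and the function name `build_bag` are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def build_bag(documents: List[List[str]]) -> Dict[str, float]:
    """Return, for each word, the share of the topic's documents containing it.

    Unlike a dictionary (a bare set of words), the bag attaches a
    probability to every word, and that probability depends on the topic
    the documents were drawn from.
    """
    if not documents:
        return {}
    containing = Counter()
    for doc in documents:
        containing.update(set(doc))  # count each word once per document
    return {word: n / len(documents) for word, n in containing.items()}


# Illustrative corpora (hypothetical), one per topic.
sports = [["Yao Ming", "basketball", "game"], ["Yao Ming", "team"],
          ["football", "game"], ["Yao Ming", "score"]]
entertainment = [["film", "actor"], ["concert", "singer"],
                 ["Yao Ming", "show", "actor"], ["film", "premiere"]]

print(build_bag(sports)["Yao Ming"])         # 0.75
print(build_bag(entertainment)["Yao Ming"])  # 0.25
```

The same word receives a high value in the sports bag and a low one in the entertainment bag, which is the sense in which a bag of words is attached to a topic.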
Specifically, topic generation for documents can be implemented with a document topic generation model, Latent Dirichlet Allocation (LDA), which is also called a three-layer Bayesian probability model. The LDA model has a three-layer structure of words, topics, and documents. Here, a generative model means that each word of an article is obtained through the process of "selecting a topic with a certain probability, and then selecting a word from that topic with a certain probability". Documents follow a multinomial distribution over topics, and topics follow a multinomial distribution over words.
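The generative process described above can be simulated in a few lines of Python. This is an illustrative sketch of standard LDA generation, not of how the patent builds its bags of words: a document's topic mix is drawn from a symmetric Dirichlet distribution (sampled by normalising Gamma variates), then each word is produced by choosing a topic and then a word from that topic. For brevity the topic-word distributions are fixed by hand instead of being drawn from their own Dirichlet prior as in full LDA; the topics, words, and function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw one sample from Dirichlet(alpha) by normalising Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_words: Dict[str, Dict[str, float]],
                      alpha: float, length: int,
                      rng: random.Random) -> List[str]:
    """Generate one document following LDA's generative process."""
    topics = list(topic_words)
    # Document -> topic: a multinomial whose parameters come from a Dirichlet.
    theta = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(length):
        topic = rng.choices(topics, weights=theta)[0]  # pick a topic
        vocab = topic_words[topic]                     # topic -> word multinomial
        word = rng.choices(list(vocab), weights=list(vocab.values()))[0]
        words.append(word)
    return words


topic_words = {
    "sports": {"Yao Ming": 0.4, "basketball": 0.4, "game": 0.2},
    "entertainment": {"film": 0.5, "actor": 0.3, "Yao Ming": 0.2},
}
rng = random.Random(42)
print(generate_document(topic_words, alpha=0.5, length=10, rng=rng))
```

Fitting LDA runs this story in reverse, inferring the topic mixes and topic-word distributions from observed documents; that inference step is not shown here.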
Further, the first data record table can be regarded as a topic and its table name as a document, and first bag-of-words information is then established for each item of field information of the first data record table.
Step 3: for each of the plurality of second data record tables in the database, determine the similarity probability between the first data record table and that second data record table according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table.
Specifically, the similarity probability can be obtained by multiplying the value of the first field information by the first bag-of-words information, wherein the value of the first field information is the second threshold when the first field information is present in the second field information of the second data record table, and the third threshold when it is not.
The similarity probability is illustrated below, taking the first data record table to be a motor vehicle registration information table and the second data record table to be a motor vehicle illegal information table:
Motor vehicle registration information table: {motor vehicle 0.9, vehicle 0.8, registration 0.3, information 0.6, record 0.2, registration 0.2, ...};
Motor vehicle illegal information table: {motor vehicle 0.9, vehicle 0.8, illegal 0.5, violation 0.4, information 0.6, illegal parking 0.2, speeding 0.1, ...}.
Here, the number after each word is the bag-of-words value of that item of field information.
"Motor vehicle" in the motor vehicle registration information table also appears in the motor vehicle illegal information table, so the second threshold corresponding to "motor vehicle" may be 1 or 0.9; "registration" in the motor vehicle registration information table does not appear in the motor vehicle illegal information table, so the third threshold corresponding to "registration" may be 0 or 0.7 (i.e., 1 - 0.3 = 0.7). The thresholds corresponding to the other items of field information are determined in the same way and are not described one by one here.
Each item of field information in the motor vehicle registration information table is then compared with the field information of the motor vehicle illegal information table to obtain the threshold corresponding to each item of field information in the registration table, and these thresholds are added together to obtain the similarity probability.
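The worked example can be reproduced in Python. The values are those listed above, with the elided "..." entries left out and only the first "registration" entry kept, since a dict key cannot repeat. The two scoring functions are hypothetical names for the two readings the example allows: `product_score` follows the claimed form, multiplying each word's bag-of-words value by the second threshold (here 1) when the word occurs in the illegal information table and by the third threshold (here 0) when it does not; `complement_score` follows the alternative values in the example, where a shared word contributes its own value (e.g. 0.9) and a missing word contributes one minus its value (e.g. 1 - 0.3 = 0.7). Applying that second rule to every field is an assumption.

```python
from typing import Dict, Set

# First table (motor vehicle registration information); values from the example.
registration_bag: Dict[str, float] = {
    "motor vehicle": 0.9, "vehicle": 0.8, "registration": 0.3,
    "information": 0.6, "record": 0.2,
}
# Field words of the second table (motor vehicle illegal information).
illegal_fields: Set[str] = {
    "motor vehicle", "vehicle", "illegal", "violation",
    "information", "illegal parking", "speeding",
}


def product_score(bag: Dict[str, float], other: Set[str],
                  second_threshold: float = 1.0,
                  third_threshold: float = 0.0) -> float:
    """Sum of value * bag value, the value chosen by presence in `other`."""
    return sum((second_threshold if w in other else third_threshold) * p
               for w, p in bag.items())


def complement_score(bag: Dict[str, float], other: Set[str]) -> float:
    """A shared word contributes p, a missing word contributes 1 - p."""
    return sum(p if w in other else 1.0 - p for w, p in bag.items())


print(round(product_score(registration_bag, illegal_fields), 2))     # 2.3
print(round(complement_score(registration_bag, illegal_fields), 2))  # 3.8
```

Both sums exceed 1, so the "similarity probability" behaves as a score to be compared with the first threshold rather than as a normalised probability.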
Step 4: use the similarity probability to obtain the second data record table most similar to the first data record table.
There may be one or more second data record tables most similar to the first data record table; this is not limited here.
Specifically, at least one of the table name information, the field information, and the data format of the first data record table is aligned with at least one of the table name information, the field information, and the data format of each second data record table whose similarity probability exceeds the first threshold.
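The alignment of description information can be sketched as follows. This is a sketch under assumptions: the `TableDescription` and `Alignment` layouts, the matching of fields by identical names, and the field names, data formats, and similarity scores in the usage lines are all hypothetical, since the text does not specify how individual fields or data formats are paired. The function keeps every table above the first threshold, reflecting that more than one table may qualify.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TableDescription:
    """Description information of a data record table (hypothetical layout)."""
    name: str
    fields: List[str]
    data_format: Dict[str, str] = field(default_factory=dict)  # field -> type


@dataclass
class Alignment:
    table_names: Tuple[str, str]
    fields: List[str]
    data_formats: Dict[str, Tuple[Optional[str], Optional[str]]]


def align_descriptions(source: TableDescription,
                       scored: List[Tuple[TableDescription, float]],
                       first_threshold: float) -> List[Alignment]:
    """Align the source table with every table scoring above the threshold."""
    results = []
    for target, score in sorted(scored, key=lambda t: t[1], reverse=True):
        if score <= first_threshold:
            continue
        shared = [f for f in source.fields if f in target.fields]  # same-name fields
        results.append(Alignment(
            table_names=(source.name, target.name),
            fields=shared,
            data_formats={f: (source.data_format.get(f), target.data_format.get(f))
                          for f in shared},
        ))
    return results


violation = TableDescription("vehicle violation record table",
                             ["plate number", "violation time"],
                             {"plate number": "varchar", "violation time": "string"})
illegal = TableDescription("motor vehicle illegal information",
                           ["plate number", "violation time", "fine"],
                           {"plate number": "varchar", "violation time": "datetime"})
registration = TableDescription("motor vehicle registration information",
                                ["plate number", "owner"])
for a in align_descriptions(violation, [(illegal, 2.1), (registration, 0.7)], 1.0):
    print(a.table_names, a.fields, a.data_formats)
```

Each `Alignment` pairs the table names, the shared fields, and the data formats of those fields on both sides, so a mismatch such as "string" against "datetime" is visible for manual review.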
For example, a standard data specification (i.e., the above database) contains the definitions of two data resources (i.e., the above second data record tables): motor vehicle registration information and motor vehicle illegal information. An existing data resource is relational data provided by an institution whose table name is "vehicle violation record table" (i.e., the above first data record table). With the above technical solution, it can be determined that the vehicle violation record table should be aligned with the motor vehicle illegal information in the standard data specification (i.e., the second data record table whose similarity probability with the first data record table exceeds the first threshold), rather than with the motor vehicle registration information.
Although data calibration can be completed fairly accurately by hand, as the standard data specification becomes more complete and more complex, manual calibration consumes a great deal of labor and is very inefficient. With the above technical solution, the second data record table with the highest similarity probability to the first data record table can be determined quickly, helping people complete data standardization.
From the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
This embodiment further provides a data alignment device, which is used to implement the above embodiments and preferred implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 2 is a structural block diagram of a data alignment device according to an embodiment of the present invention. As shown in Fig. 2, the device comprises:
an obtaining module 20, configured to obtain first field information in a received first data record table and first bag-of-words information corresponding to the field information, wherein the first bag-of-words information indicates the probability that the first field information occurs in the database;
a first processing module 22, configured to, for each of a plurality of second data record tables in the database, determine the similarity probability between the first data record table and that second data record table according to the first field information and the first bag-of-words information, and the second field information and second bag-of-words information in the second data record table;
a second processing module 24, configured to align first description information of the first data record table with second description information of each second data record table whose similarity probability exceeds a first threshold.
With the above device, bag-of-words information is likewise introduced into data alignment, which addresses the inconsistent standards of different data resources that otherwise prevent effective alignment and undermine the reliability of data analysis results, and which facilitates subsequent analysis of the data resources.
In an optional embodiment of the present invention, the first processing module 22 is further configured to multiply the value of the first field information by the first bag-of-words information to obtain the similarity probability, wherein the value of the first field information is a second threshold when the first field information is present in the second field information of the second data record table, and a third threshold when it is not.
In an optional embodiment of the present invention, the second processing module 24 is further configured to align at least one of the table name information, the field information, and the data format of the first data record table with at least one of the table name information, the field information, and the data format of each second data record table whose similarity probability exceeds the first threshold.
In an optional embodiment of the present invention, as shown in Fig. 3, the device further comprises: a receiving module 26, configured to receive the first data record table, wherein the first data record table contains the first field information;
a third processing module 28, configured to establish the first bag-of-words information corresponding to the first field information.
The embodiments of the present invention also provide a kind of storage medium, which includes the program of storage, wherein above-mentioned The method of any of the above-described is executed when program is run.
Optionally, in the present embodiment, above-mentioned storage medium can be set to store the journey for executing following steps Sequence code:
S1: Obtain first field information in a received first data record sheet and first bag-of-words information corresponding to the field information, where the first bag-of-words information indicates the probability that the first field information occurs in the database;
S2: For each second data record sheet among a plurality of second data record sheets in the database, determine the likelihood probability between the first data record sheet and each of the plurality of second data record sheets according to the first field information, the first bag-of-words information, and the second field information and second bag-of-words information in the second data record sheet;
S3: Align first description information of the first data record sheet with second description information of each second data record sheet whose likelihood probability exceeds a first threshold.
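Steps S1 to S3 can be combined into a single illustrative pass, sketched below in Python. Everything named here is hypothetical: the `RecordSheet` container, the `align_sheets` function, the document-frequency estimate used for the bag-of-words information, the values 1.0 and 0.0 standing in for the second and third thresholds, the summation of per-field scores, and the representation of "alignment" as a record pairing the table names, shared fields, and data formats of the two sheets (the description items named in claim 3).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RecordSheet:
    """Hypothetical container for a data record sheet's description information."""
    name: str           # table name information
    fields: List[str]   # field information
    data_format: str    # data format (illustrative string such as "csv")


def align_sheets(first: RecordSheet, database: List[RecordSheet],
                 first_threshold: float, present_value: float = 1.0,
                 absent_value: float = 0.0) -> List[Dict[str, object]]:
    # S1: bag-of-words information for the first sheet's fields, estimated
    # as the fraction of database sheets that contain each field (assumption).
    counts = Counter(f for sheet in database for f in set(sheet.fields))
    total = len(database)
    bag = {f: (counts[f] / total if total else 0.0) for f in first.fields}

    aligned = []
    for second in database:
        # S2: likelihood = sum over first-sheet fields of
        # (present/absent value) * bag-of-words probability (summing is assumed).
        present = set(second.fields)
        score = sum((present_value if f in present else absent_value) * bag[f]
                    for f in first.fields)
        # S3: align description information with sheets above the threshold.
        if score > first_threshold:
            aligned.append({
                "table_name": (first.name, second.name),
                "fields": sorted(set(first.fields) & present),
                "data_format": (first.data_format, second.data_format),
                "likelihood": score,
            })
    return aligned


if __name__ == "__main__":
    database = [
        RecordSheet("user_info", ["id", "name", "phone"], "csv"),
        RecordSheet("orders", ["id", "amount"], "parquet"),
        RecordSheet("customer", ["id", "name", "phone", "addr"], "csv"),
        RecordSheet("log", ["ts", "msg"], "json"),
    ]
    incoming = RecordSheet("users", ["id", "name", "phone"], "csv")
    for match in align_sheets(incoming, database, first_threshold=1.0):
        print(match["table_name"], match["likelihood"])
```

Under these assumptions, `user_info` and `customer` both score 1.75 and are aligned, while `orders` (0.75) and `log` (0.0) stay below the threshold of 1.0.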
Optionally, in this embodiment, the storage medium may include, but is not limited to, any medium that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments and optional implementations; details are not repeated here.
An embodiment of the present invention further provides an electronic device, including a memory and a processor. The memory stores a computer program, and the processor is configured to run the computer program to perform the steps in any one of the foregoing method embodiments.
Optionally, the electronic device may further include a transmission device and an input/output device, both of which are connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform the following steps by means of the computer program:
S1: Obtain first field information in a received first data record sheet and first bag-of-words information corresponding to the field information, where the first bag-of-words information indicates the probability that the first field information occurs in the database;
S2: For each second data record sheet among a plurality of second data record sheets in the database, determine the likelihood probability between the first data record sheet and each of the plurality of second data record sheets according to the first field information, the first bag-of-words information, and the second field information and second bag-of-words information in the second data record sheet;
S3: Align first description information of the first data record sheet with second description information of each second data record sheet whose likelihood probability exceeds a first threshold.
Optionally, for specific examples in this embodiment, refer to the examples described in the foregoing embodiments and optional implementations; details are not repeated here.
Obviously, those skilled in the art should be understood that each module of the above invention or each step can be with general Computing device realize that they can be concentrated on a single computing device, or be distributed in multiple computing devices and formed Network on, optionally, they can be realized with the program code that computing device can perform, it is thus possible to which they are stored It is performed by computing device in the storage device, and in some cases, it can be to be different from shown in sequence execution herein Out or description the step of, perhaps they are fabricated to each integrated circuit modules or by them multiple modules or Step is fabricated to single integrated circuit module to realize.In this way, the present invention is not limited to any specific hardware and softwares to combine.
The foregoing is only a preferred embodiment of the present invention, is not intended to restrict the invention, for the skill of this field For art personnel, the invention may be variously modified and varied.It is all within principle of the invention, it is made it is any modification, etc. With replacement, improvement etc., should all be included in the protection scope of the present invention.

Claims (10)

1. A data alignment method, comprising:
obtaining first field information in a received first data record sheet and first bag-of-words information corresponding to the field information, wherein the first bag-of-words information indicates the probability that the first field information occurs in a database;
for each second data record sheet among a plurality of second data record sheets in the database, determining a likelihood probability between the first data record sheet and each of the plurality of second data record sheets according to the first field information, the first bag-of-words information, and second field information and second bag-of-words information in the second data record sheet; and
aligning first description information of the first data record sheet with second description information of each second data record sheet whose likelihood probability exceeds a first threshold.
2. The method according to claim 1, wherein determining the likelihood probability between the first data record sheet and the plurality of second data record sheets according to the first field information, the first bag-of-words information, and the second field information and second bag-of-words information in the second data record sheet comprises:
multiplying a value of the first field information by the first bag-of-words information to obtain the likelihood probability, wherein the value of the first field information is a second threshold when the first field information is present in the second field information of the second data record sheet, and is a third threshold when the second field information of the second data record sheet does not contain the first field information.
3. The method according to claim 1, wherein aligning the first description information of the first data record sheet with the second description information of the second data record sheet whose likelihood probability exceeds the first threshold comprises:
aligning at least one of the following items of the first data record sheet: table name information, field information, and data format, with at least one of the following items of the second data record sheet whose likelihood probability exceeds the first threshold: table name information, field information, and data format.
4. The method according to any one of claims 1 to 3, wherein before obtaining the first field information in the received first data record sheet and the first bag-of-words information corresponding to the field information, the method further comprises:
receiving the first data record sheet, wherein the first data record sheet includes the first field information; and
establishing the first bag-of-words information corresponding to the first field information.
5. A data alignment device, comprising:
an obtaining module, configured to obtain first field information in a received first data record sheet and first bag-of-words information corresponding to the field information, wherein the first bag-of-words information indicates the probability that the first field information occurs in a database;
a first processing module, configured to, for each second data record sheet among a plurality of second data record sheets in the database, determine a likelihood probability between the first data record sheet and each of the plurality of second data record sheets according to the first field information, the first bag-of-words information, and second field information and second bag-of-words information in the second data record sheet; and
a second processing module, configured to align first description information of the first data record sheet with second description information of each second data record sheet whose likelihood probability exceeds a first threshold.
6. The device according to claim 5, wherein the first processing module is further configured to multiply a value of the first field information by the first bag-of-words information to obtain the likelihood probability, wherein the value of the first field information is a second threshold when the first field information is present in the second field information of the second data record sheet, and is a third threshold when the second field information of the second data record sheet does not contain the first field information.
7. The device according to claim 5, wherein the second processing module is further configured to align at least one of the following items of the first data record sheet: table name information, field information, and data format, with at least one of the following items of the second data record sheet whose likelihood probability exceeds the first threshold: table name information, field information, and data format.
8. The device according to claim 5, wherein the device further comprises:
a receiving module, configured to receive the first data record sheet, wherein the first data record sheet includes the first field information; and
a third processing module, configured to establish the first bag-of-words information corresponding to the first field information.
9. A storage medium, wherein the storage medium stores a computer program, and the computer program is configured to perform, when run, the method according to any one of claims 1 to 4.
10. An electronic device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor is configured to run the computer program to perform the method according to any one of claims 1 to 4.
CN201910557282.8A 2019-06-25 2019-06-25 Data alignment method and device, storage medium and electronic device Active CN110287191B

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910557282.8A CN110287191B 2019-06-25 2019-06-25 Data alignment method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910557282.8A CN110287191B 2019-06-25 2019-06-25 Data alignment method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN110287191A 2019-09-27
CN110287191B 2021-07-27

Family

ID=68005654

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910557282.8A Active CN110287191B 2019-06-25 2019-06-25 Data alignment method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN110287191B

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017158058A1 * 2016-03-15 2017-09-21 Imra Europe Sas Method for classification of unique/rare cases by reinforcement learning in neural networks
CN107657062A * 2017-10-25 2018-02-02 Yidu Cloud (Beijing) Technology Co., Ltd. Similar case search method and device, storage medium, and electronic device
CN107844560A * 2017-10-30 2018-03-27 Beijing Ruian Technology Co., Ltd. Data access method and apparatus, computer device, and readable storage medium
CN109378080A * 2018-09-14 2019-02-22 Zhejiang University Similar traditional Chinese medicine search method based on feature bag-of-words
CN109634949A * 2018-12-28 2019-04-16 Zhejiang University Hybrid data cleaning method based on multi-version data
CN109766436A * 2018-12-04 2019-05-17 Beijing Mininglamp Software System Co., Ltd. Method and apparatus for matching data-table fields with knowledge-base data elements
CN109783611A * 2018-12-29 2019-05-21 Beijing Mininglamp Software System Co., Ltd. Field matching method and apparatus, computer storage medium, and terminal
CN109800215A * 2018-12-26 2019-05-24 Beijing Mininglamp Software System Co., Ltd. Method, apparatus, computer storage medium and the terminal of a kind of pair of mark processing
CN109871382A * 2019-02-13 2019-06-11 Beijing Mininglamp Software System Co., Ltd. A kind of implementation method and device of tables of data access java standard library

Also Published As

Publication number Publication date
CN110287191B 2021-07-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant