CN107247724B - Transition probability matrix update, information identifying method and device, computer equipment - Google Patents
- Publication number
- CN107247724B (application CN201710288225.5A)
- Authority
- CN
- China
- Prior art keywords
- information
- transition probability
- transferred
- identified
- probability matrix
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A transition probability matrix update method and device, an information identification method and device, and computer equipment. Each piece of first information to be transferred, whose probabilities are to be updated, is determined from an acquired transition probability matrix. Each piece of preset common information corresponding to the first information to be transferred is obtained, and each is appended after the first information to be transferred to obtain the corresponding merged information. A first search volume is obtained by searching for the first information to be transferred, and a second search volume is obtained by searching for each piece of merged information. The ratio of each second search volume to the first search volume is calculated, giving the second transition probability with which the first information to be transferred transfers to each piece of preset common information. The preset number of second transition probabilities with the largest values are updated into the transition probability matrix. This improves the accuracy of the transition probability matrix and, in turn, the accuracy of information identification performed with the updated matrix.
Description
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a transition probability matrix update method and device, an information identification method and device, and computer equipment.
Background
In fields of information identification such as character and audio recognition, different methods yield recognition results of different accuracy. At present, a common character recognition method scans an image containing characters and recognizes the characters from their features in the image; a common audio recognition method recognizes audio from its sound frequencies.
However, when an image is blurred so that the characters it contains are unclear, these methods often fail to recognize them, or the accuracy of the result is low. Likewise, when audio is disturbed by noise, its recognition accuracy is low. Blurred characters or disturbed audio can be identified using a transition probability matrix of the information, but because information is sparse, that is, the information samples cannot cover the full range of everyday information, the resulting transition probability matrix is inaccurate, and so are the results subsequently identified from it.
Summary of the invention
In view of this, it is necessary to provide a transition probability matrix update method and device, an information identification method and device, and computer equipment, to address the problem that sparse information makes the transition probability matrix inaccurate and hence makes subsequent information identification inaccurate.
Accordingly, the present embodiment adopts the following technical solutions:
A transition probability matrix update method, comprising the following steps:
Obtaining a transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
Determining, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
Obtaining each piece of preset common information corresponding to the first information to be transferred, and appending each piece of preset common information after the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
Obtaining a first search volume returned by a search engine for the first information to be transferred, and each second search volume returned by the search engine for each piece of merged information;
Calculating the ratio of each second search volume to the first search volume to obtain each second transition probability with which the first information to be transferred transfers to each piece of preset common information;
Updating the preset number of second transition probabilities with the largest values into the transition probability matrix to obtain an updated transition probability matrix.
The present invention also provides a transition probability matrix updating device, comprising:
An initial matrix acquisition module, configured to obtain a transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
An information acquisition module, configured to determine, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
A merging module, configured to obtain each piece of preset common information corresponding to the first information to be transferred, and to append each piece of preset common information after the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
A search result acquisition module, configured to obtain a first search volume returned by a search engine for the first information to be transferred, and each second search volume returned by the search engine for each piece of merged information;
A probability calculation module, configured to calculate the ratio of each second search volume to the first search volume to obtain each second transition probability with which the first information to be transferred transfers to each piece of preset common information;
A probability updating module, configured to update the preset number of second transition probabilities with the largest values into the transition probability matrix to obtain an updated transition probability matrix.
The present invention also provides an information identification method, comprising the following steps:
Performing feature analysis on an object to be identified to obtain characteristic information;
When it is determined from the characteristic information that an abnormal part exists, separating the abnormal part from the object to be identified to obtain the separated object to be identified;
Identifying the separated object to be identified to obtain an initial recognition result;
Obtaining first identification information corresponding to a preceding adjacent object in the initial recognition result, the preceding adjacent object being the sub-object in the object to be identified that immediately precedes the abnormal part;
Obtaining the updated transition probability matrix that corresponds to the type of the object to be identified and is determined using the above transition probability matrix update method, and determining, according to the updated transition probability matrix and the first identification information, second identification information corresponding to the abnormal part;
Combining the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
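The last three steps can be sketched as follows. This is a minimal illustration rather than the patented procedure: the steps above only say that the second identification information is determined "according to" the updated matrix and the first identification information, so choosing the highest-probability follower is an assumption, as are the hypothetical names `predict_abnormal_part` and `combine`, the nested-dict matrix layout, the use of `None` to mark the separated abnormal part, and the placeholder probabilities.

```python
def predict_abnormal_part(updated_matrix, first_identification):
    """Return the most probable follower of first_identification, or None.

    Assumption: the abnormal part is filled with the transfer target that has
    the largest transition probability in the updated matrix.
    """
    row = updated_matrix.get(first_identification, {})
    if not row:
        return None
    return max(row, key=row.get)


def combine(initial_result, second_identification):
    """Replace the None placeholder in initial_result with the prediction."""
    return [second_identification if item is None else item
            for item in initial_result]


# Illustrative values only: these probabilities are placeholders.
updated_matrix = {"电": {"视": 0.4, "话": 0.3}}
initial_result = ["电", None]  # None marks the separated abnormal part
first_identification = initial_result[0]
second_identification = predict_abnormal_part(updated_matrix, first_identification)
final_result = combine(initial_result, second_identification)
print("".join(final_result))
```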
The present invention also provides an information identification device, comprising the above transition probability matrix updating device and further comprising:
An initial analysis module, configured to perform feature analysis on an object to be identified to obtain characteristic information;
A separation module, configured to separate, when it is determined from the characteristic information that an abnormal part exists, the abnormal part from the object to be identified to obtain the separated object to be identified;
An initial identification module, configured to identify the separated object to be identified to obtain an initial recognition result;
A sub-object acquisition module, configured to obtain first identification information corresponding to a preceding adjacent object in the initial recognition result, the preceding adjacent object being the sub-object in the object to be identified that immediately precedes the abnormal part;
A sub-identification module, configured to determine second identification information corresponding to the abnormal part according to the first identification information and the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix updating device;
A recognition result update module, configured to combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
The present invention also provides computer equipment comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements any one of the above transition probability matrix update methods or information identification methods.
With the above transition probability matrix update method and device and computer equipment, even if the information samples are sparse and the corresponding transition probability matrix is inaccurate, the preset number of transition probabilities with the largest ratios of the second search volume of the merged information to the first search volume of the first information to be transferred can be selected and updated into the transition probability matrix. That is, the transition probabilities from the first information to be transferred to those preset common characters whose merged information has the larger search volumes are added to the transition probability matrix. This updates the matrix, improves its accuracy, and improves identification accuracy when identification is subsequently performed according to it.
The above information identification method and device first find the abnormal part, separate it from the object to be identified, and identify the separated object to obtain an initial recognition result; the abnormal part is then identified according to the initial recognition result and the transition probability matrix. The separated object and the abnormal part are thus identified separately. There is no need to identify the whole object, abnormal part included, then identify the abnormal part again and update the overall recognition result, so recognition time is reduced and recognition efficiency is improved. Moreover, in this scheme, when the object to be identified contains an abnormality, the abnormal part is identified using the initial recognition result and the updated transition probability matrix determined by the above update method. Because the updated matrix is more accurate, identification based on it is more accurate.
Description of the drawings
Fig. 1 is a schematic diagram of the working environment of one embodiment of the present invention;
Fig. 2 is a schematic diagram of the structure of the server in one embodiment;
Fig. 3 is a flow diagram of the transition probability matrix update method of one embodiment;
Fig. 4 is a flow diagram of the information identification method of one embodiment;
Fig. 5 is a sub-flow diagram of the information identification method of another embodiment;
Fig. 6 is an interface schematic diagram of the transition probability matrix of one embodiment;
Fig. 7 is a module diagram of the transition probability matrix updating device of one embodiment;
Fig. 8 is a module diagram of the information identification device of one embodiment;
Fig. 9 is a sub-module diagram of the information identification device of another embodiment.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit its scope of protection.
Fig. 1 is a schematic diagram of the working environment in one embodiment of the present invention. As shown in Fig. 1, the working environment involves a terminal 110, a server 120 and a network 130, and terminal 110 and server 120 can communicate through network 130. Terminal 110 can access the corresponding server 120 through network 130 to request an information recognition result, and server 120 can push the information recognition result to terminal 110, where the user of terminal 110 views it. Terminal 110 can be any device capable of intelligent input and output, for example a desktop computer or a mobile terminal; the mobile terminal can be a smartphone, a tablet computer, a vehicle-mounted computer, a wearable smart device, and so on. Server 120 can be the server hosting the platform that provides information recognition results, and there can be one or more servers 120. The present embodiment concerns the scheme by which server 120 updates the transition probability matrix and the scheme by which it performs information identification to obtain an information recognition result.
Fig. 2 shows the internal structure of server 120 in one embodiment. Server 120 includes a processor, a storage medium, a network interface and a memory connected through a system bus. The processor of server 120 provides computing and control capability and supports the operation of the entire server. The storage medium of server 120 stores an operating system, a local database, and the computer application programs of a transition probability matrix updating device and an information identification device. When executed by the processor, the computer application program of the transition probability matrix updating device implements a transition probability matrix update method, and the computer application program of the information identification device implements an information identification method. The memory of server 120 provides an environment for running the transition probability matrix updating device held in the non-volatile storage medium. The memory can store computer-readable instructions which, when executed by the processor, cause the processor to perform a transition probability matrix update method and an information identification method. The network interface of server 120 is used to connect to and communicate with network 130.
As shown in Fig. 3, the transition probability matrix update method of one embodiment includes steps S310 to S360:
S310: Obtain the transition probability matrix of information.
The information can be characters or sound frequencies. When the information is characters, the transition probability matrix is the transition probability matrix of characters, the information to be transferred is a character to be transferred, and the transfer target information is a transfer target character. When the information is sound frequencies, the transition probability matrix is the transition probability matrix of sound frequencies, the information to be transferred is a sound frequency to be transferred, and the transfer target information is a transfer target sound frequency. The transition probability matrix of the information can be a transition probability matrix calculated from information samples.
The elements of the transition probability matrix are transition probabilities. A first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information; the transition probabilities already in the matrix are called first transition probabilities here to distinguish them from the second transition probabilities that are later updated into the matrix. The probability that information to be transferred transfers to transfer target information is the probability that the transfer target information immediately follows the information to be transferred. Taking characters as an example, if the character to be transferred is "电" (electricity) and the transfer target characters include "视" and "话", the corresponding transition probabilities are the probability that "电" is followed by "视" (forming "电视", TV) and the probability that "电" is followed by "话" (forming "电话", phone). Taking sound frequencies as an example, the sound frequency to be transferred is the frequency corresponding to "电" and the transfer target sound frequencies include those corresponding to "视" and "话"; the corresponding transition probabilities are the probability that the sound frequency of "电" is followed by the sound frequency of "视" and the probability that it is followed by the sound frequency of "话".
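As a rough illustration of how a character transition probability matrix might be calculated from information samples, the sketch below counts adjacent character pairs and normalizes each row. The text does not specify the estimator, so the relative-frequency calculation, the hypothetical function name `build_transition_matrix`, the nested-dict layout and the tiny sample list are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_transition_matrix(samples):
    """Estimate first transition probabilities from sample strings.

    Returns a nested dict where matrix[source][target] is the fraction of
    occurrences of `source` (as the first of an adjacent pair) that are
    immediately followed by `target`.
    """
    pair_counts = defaultdict(Counter)
    for text in samples:
        # zip(text, text[1:]) walks over every adjacent character pair.
        for source, target in zip(text, text[1:]):
            pair_counts[source][target] += 1

    matrix = {}
    for source, targets in pair_counts.items():
        total = sum(targets.values())
        matrix[source] = {t: n / total for t, n in targets.items()}
    return matrix


# Hypothetical sample data: "电视" (TV) appears twice, "电话" (phone) once.
samples = ["电视", "电话", "电视"]
matrix = build_transition_matrix(samples)
print(matrix["电"])
```

With so few samples the row for "电" has only two targets, which is exactly the kind of sparsity the update method is meant to address.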
S320: According to the transition probability matrix, determine each piece of first information to be transferred whose probabilities are to be updated.
Sparse information, that is, too little information, can make the transition probability matrix sparse, which can lower the accuracy of subsequent identification. The transition probability matrix therefore needs to be updated to increase the information it holds, that is, to add transition probabilities. To do so, each piece of first information to be transferred whose probabilities are to be updated is first determined from the transition probability matrix; the transition probabilities of this first information to be transferred are the ones that need updating. Taking characters as an example, suppose the transition probability matrix contains a transition probability from the characters "water layer" to the character "shallow" but to no other character. The characters following "water layer" may then be sparse, and transition probabilities from "water layer" to other characters such as "deep" need to be added. "Water layer" is therefore taken as first information to be transferred, and transition probabilities from it to other characters are added, which updates the transition probabilities of "water layer".
S330: Obtain each piece of preset common information corresponding to the first information to be transferred, and append each piece of preset common information after the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred.
The preset common information correspondingly includes preset common characters or preset common sound frequencies. In general, some information is used frequently and some uncommon information is used rarely, so using preset common information as the basis for merging with the first information to be transferred covers most of the information people commonly use. For example, if the first information to be transferred is the character "电" (electricity) and the preset common characters include "视", "话", "流" and "压", each of these common characters can be appended after "电", giving the merged information "电视" (TV), "电话" (phone), "电流" (electric current) and "电压" (voltage).
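For character information, the merging in S330 amounts to string concatenation. The following minimal sketch assumes that case; the function name `merge_with_common` and the choice to key the result by the common character are hypothetical conveniences.

```python
def merge_with_common(first_info, common_infos):
    """Append each preset common character after first_info.

    Returns a dict mapping each common character to its merged string, so the
    later steps can relate a search volume back to the character it belongs to.
    """
    return {common: first_info + common for common in common_infos}


merged = merge_with_common("电", ["视", "话", "流", "压"])
for common, merged_info in merged.items():
    print(common, "->", merged_info)
```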
S340: Obtain a first search volume returned by a search engine for the first information to be transferred, and each second search volume returned by the search engine for each piece of merged information.
Once the merged information has been obtained, the first search volume for the first information to be transferred and the second search volume for each piece of merged information can be obtained through the search engine. In the present embodiment, the search engine can be called to search for the first information to be transferred and each piece of merged information, giving the first search volume and each second search volume. Alternatively, the first information to be transferred and each piece of merged information can be sent to the search engine, and the first search volume and each second search volume obtained from the search results the search engine returns.
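The text does not name a search engine or an interface for querying it, so the sketch below takes the lookup as a caller-supplied function. `collect_search_volumes`, `search_count` and `fake_search_count` are hypothetical names, and the counts in `FAKE_COUNTS` are made-up placeholders that stand in for real search results.

```python
def collect_search_volumes(first_info, merged_infos, search_count):
    """Query a search backend for the first and second search volumes.

    `search_count` is any callable that takes a query string and returns the
    number of results (the search volume) as an integer.
    """
    first_volume = search_count(first_info)
    second_volumes = {merged: search_count(merged) for merged in merged_infos}
    return first_volume, second_volumes


# Stand-in backend with invented numbers, used only so the sketch can run.
FAKE_COUNTS = {"电": 1000, "电视": 400, "电话": 300}


def fake_search_count(query):
    return FAKE_COUNTS.get(query, 0)


first_volume, second_volumes = collect_search_volumes(
    "电", ["电视", "电话", "电流"], fake_search_count
)
print(first_volume, second_volumes)
```

Passing the backend in as a function keeps the step independent of whether the engine is called directly or queried over a network, the two options the embodiment mentions.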
S350: Calculate the ratio of each second search volume to the first search volume to obtain each second transition probability with which the first information to be transferred transfers to each piece of preset common information.
S360: Update the preset number of second transition probabilities with the largest values into the transition probability matrix to obtain the updated transition probability matrix.
The first search volume is fixed, while the second search volume varies with the merged information. A larger second search volume indicates that the merged information is more common, and a larger ratio of the second search volume to the first search volume indicates a higher probability that the first information to be transferred is followed by that preset common information, that is, a higher likelihood that the first information to be transferred transfers to it. So that the transition probabilities in the updated matrix are closer to those of commonly used information, the ratio of each second search volume to the first search volume is calculated to obtain each second transition probability, and the preset number of second transition probabilities with the largest values are updated into the transition probability matrix. For example, suppose the first information to be transferred is the character "电" (electricity) and the ratio of the second search volume of the merged information "电视" (TV) to the first search volume of "电" is greater than the ratio of the second search volume of "电话" (phone) to the first search volume of "电", which indicates that "电视" is used more often. If the preset number is 1, the ratio for "电视" is the one chosen and updated into the transition probability matrix.
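Steps S350 and S360 can be sketched as a ratio calculation followed by a top-N selection. The function names, the nested-dict matrix layout and the search volumes below are illustrative assumptions. The text does not say whether an existing entry is overwritten or whether the row is renormalized afterwards; this sketch overwrites or adds entries and does not renormalize.

```python
def second_transition_probabilities(first_volume, second_volumes):
    """Ratio of each second search volume to the first search volume."""
    if first_volume <= 0:
        return {}  # no meaningful ratio without a positive first volume
    return {common: volume / first_volume
            for common, volume in second_volumes.items()}


def update_matrix(matrix, first_info, probabilities, preset_number):
    """Write the preset_number largest second transition probabilities
    into a copy of the matrix and return that copy."""
    top = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    top = top[:preset_number]
    updated = {source: dict(row) for source, row in matrix.items()}
    row = updated.setdefault(first_info, {})
    for common, probability in top:
        row[common] = probability
    return updated


# Placeholder search volumes keyed by the preset common character.
volumes = {"视": 400, "话": 300, "流": 100}
probabilities = second_transition_probabilities(1000, volumes)
matrix = {"电": {"压": 0.2}}
updated = update_matrix(matrix, "电", probabilities, preset_number=1)
print(updated)
```

With a preset number of 1, only the entry for "视" is written, mirroring the "电视" example above; the original matrix is left untouched because the update works on a copy.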
With the above transition probability matrix update method, even if the information samples are sparse and the corresponding transition probability matrix is inaccurate, the preset number of transition probabilities with the largest ratios of second search volume (for the merged information) to first search volume (for the first information to be transferred) can be selected and updated into the transition probability matrix. In other words, the transition probabilities from the first information to be transferred to those preset common characters whose merged information has the larger search volumes are added to the transition probability matrix. Updating the matrix in this way improves its accuracy and, in turn, the accuracy of subsequent identification based on it.
In one embodiment, the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
That is, when determining from the transition probability matrix each piece of first information to be transferred, it is checked whether the number of pieces of transfer target information associated with a piece of information to be transferred is less than the preset quantity. If it is, few pieces of transfer target information follow that information to be transferred in the matrix, that is, it is sparse, and it is determined to be first information to be transferred whose probabilities are to be updated. For example, if the preset quantity is 30 and the only transfer target information following the information to be transferred "water layer" is "shallow", the number of pieces of transfer target information is 1, which is less than the preset quantity 30, so "water layer" can be taken as first information to be transferred.
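The sparsity check of this embodiment can be sketched as a filter over the rows of the matrix. The nested-dict layout, the function name `find_sparse_sources`, the decision to count only targets with non-zero probability, and the example probabilities are assumptions for illustration.

```python
def find_sparse_sources(matrix, preset_quantity):
    """Return the information to be transferred whose number of associated
    transfer targets is below preset_quantity."""
    sparse = []
    for source, targets in matrix.items():
        # Count only targets that actually carry probability mass.
        target_count = sum(1 for p in targets.values() if p > 0)
        if target_count < preset_quantity:
            sparse.append(source)
    return sparse


# Placeholder matrix: "water layer" has one target, "电" has three.
matrix = {
    "water layer": {"shallow": 1.0},
    "电": {"视": 0.5, "话": 0.3, "流": 0.2},
}
first_infos = find_sparse_sources(matrix, preset_quantity=2)
print(first_infos)
```

The example uses a preset quantity of 2 only to keep the data small; the embodiment's own example uses 30.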
As shown in Fig. 4, the information identification method of one embodiment includes steps S410 to S460:
S410: Perform feature analysis on the object to be identified to obtain characteristic information.
The object to be identified may be a character image to be identified or audio to be identified. Each object to be identified has associated characteristic information, which correspondingly may be image pixel information or voiceprint information: a character image to be identified has associated image pixel information, and audio to be identified has associated voiceprint information. Performing feature analysis on the object to be identified yields its characteristic information.
S420: When it is determined from the characteristic information that an abnormal part exists, separate the abnormal part from the object to be identified to obtain the separated object to be identified.
Abnormality detection is performed on the object to be identified according to the characteristic information. When an abnormal part is found, it is separated from the object to be identified, yielding the separated object to be identified. When a character image to be identified is recognized, abnormality detection is based on the image pixel values: if the area of a region in which the pixel value is continuously 1 exceeds a preset area, an abnormality is considered to exist, and the abnormal part is that region. When audio to be identified is recognized, abnormality detection is based on the voiceprint information: if the voiceprint information in the audio is inconsistent, the audio is considered abnormal. Taking a character image as an example, when a shadow blocks the characters in part of the image, the pixel values of the shadowed part are all 1, which is abnormal. The abnormal part, that is, the shadowed part, can therefore be detected from the pixel values and removed from the character image, yielding a separated character image that does not include the shadowed part. Taking audio as an example, when other noise covers the user's voice while the user is speaking, the corresponding audio contains two different voiceprints. Because noise generally lasts for a shorter time, the voiceprint with the shorter duration is determined to be the abnormal part and is separated from the voiceprint information of the audio, yielding the separated audio to be identified.
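The pixel-based check in S420 can be sketched as follows for a character image. This is only an illustration under stated assumptions: the image is a binary grid given as a list of lists, "continuously 1" is read as a 4-connected region, and separated pixels are marked with None. The names find_abnormal_regions, separate, and min_area are hypothetical and do not come from the text.

```python
from collections import deque


def find_abnormal_regions(image, min_area):
    """Return connected regions of pixel value 1 whose area exceeds min_area."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours (assumption).
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > min_area:
                regions.append(region)
    return regions


def separate(image, regions):
    """Return a copy of the image with the abnormal regions marked as None."""
    out = [row[:] for row in image]
    for region in regions:
        for y, x in region:
            out[y][x] = None
    return out
```

A region is reported only when its area is strictly greater than min_area, matching the "greater than a preset area" condition in the description.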
S430: Identify the separated object to be identified to obtain an initial recognition result.
The separated object to be identified contains no abnormal part and can be recognized directly. Recognizing a separated character image yields an initial character recognition result, and recognizing separated audio yields an initial audio recognition result.
S440: Obtain the first identification information corresponding to the preceding adjacent object in the initial recognition result, where the preceding adjacent object is the sub-object in the object to be identified that immediately precedes the abnormal part.
Recognizing the separated object to be identified yields the initial recognition result, which includes the recognition result of the sub-object immediately preceding the abnormal part, that is, the first identification information. The first identification information corresponding to the preceding adjacent object is obtained from the initial recognition result. For example, the first identification information may be the character "electricity", meaning that the abnormal part directly follows the character "electricity". As another example, when the object to be identified is audio, the initial audio recognition result includes sound frequencies, and the first identification information is the sound frequency corresponding to "electricity", which is obtained from the initial recognition result.
S450: Obtain the updated transition probability matrix that corresponds to the type of the object to be identified and was determined by the above transition probability matrix update method, and determine the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information.
The type of the object to be identified may be a character image or audio, and different types correspond to different transition probability matrices. The above transition probability matrix update method updates the transition probability matrix of information, where the information may be characters or sound frequencies; that is, the method can produce an updated transition probability matrix for each type of information (for example, a transition probability matrix of characters or a transition probability matrix of sound frequencies). In this embodiment, the updated transition probability matrix corresponding to the type of the object to be identified can be obtained directly, and the second identification information corresponding to the abnormal part is determined from it together with the first identification information, which completes the recognition of the abnormal part. For example, if the first identification information is "electricity", the transition probability matrix contains the probabilities of "electricity" transferring to the target characters that may follow it, so the second identification information, that is, the recognition result of the abnormal part, can be determined from the updated matrix, which improves recognition accuracy. In another implementation, the updated transition probability matrix is first determined with the above update method, and the second identification information corresponding to the abnormal part is then determined from it and the first identification information.
S460: Combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
The initial recognition result comes from recognizing the separated object to be identified, so it contains no result for the abnormal part; the recognition of the object to be identified is therefore incomplete. To achieve complete recognition, the initial recognition result is combined with the second identification information to obtain the information recognition result of the object to be identified.
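Steps S440 to S460 can be sketched for a single abnormal position as follows. The sketch makes several assumptions not stated in the text: the initial recognition result is an ordered list of symbols, the position of the removed part is known as gap_index, the transition probability matrix is a nested dictionary, and the target with the largest probability is chosen. The function name recognize_with_gap and the probabilities in the usage note are hypothetical.

```python
def recognize_with_gap(initial_result, gap_index, transition_matrix):
    """Fill one abnormal position using the symbol that precedes it.

    initial_result: recognized symbols of the separated object, in order.
    gap_index: index at which the abnormal part was removed (must be >= 1).
    transition_matrix: dict mapping symbol -> {target symbol: probability}.
    Returns (combined result, second identification information or None).
    """
    if gap_index < 1:
        raise ValueError("the abnormal part needs a preceding adjacent object")
    # S440: first identification information is the preceding adjacent symbol.
    first_info = initial_result[gap_index - 1]
    targets = transition_matrix.get(first_info, {})
    if not targets:
        return initial_result, None  # nothing to infer from
    # S450: pick the transfer target with the largest transition probability.
    second_info = max(targets, key=targets.get)
    # S460: combine the initial result with the second identification information.
    combined = initial_result[:gap_index] + [second_info] + initial_result[gap_index:]
    return combined, second_info
```

With an illustrative matrix such as {"electricity": {"depending on": 0.6, "words": 0.4}}, a gap right after "electricity" would be filled with "depending on".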
The above information identification method first finds the abnormal part and separates it from the object to be identified, then recognizes the separated object to obtain the initial recognition result, and finally recognizes the abnormal part according to the initial recognition result and the transition probability matrix. The separated object and the abnormal part are thus recognized separately. There is no need to recognize the entire object including the abnormal part, then recognize the abnormal part, and then update the overall recognition result, which reduces recognition time and improves recognition efficiency. Moreover, when the object to be identified contains an abnormality, the abnormal part is recognized using the initial recognition result and the updated transition probability matrix determined by the above transition probability matrix update method. Because the updated matrix is more accurate, recognition based on it is more accurate as well.
As shown in Fig. 5, in one embodiment, the step of determining the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information includes:
S451: Determine, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information.
After the first identification information is determined, the transfer target information corresponding to it and the associated transition probabilities are determined from the updated transition probability matrix. For example, when the first identification information is the character "electricity", the corresponding transfer target information may include "depending on" and "words", and the transition probability from "electricity" to each of them can also be determined.
S452: Determine the transfer target information corresponding to a predetermined number of the largest transition probabilities as reference transfer target information.
Different transfer target information has different transition probabilities with respect to the first identification information. The transfer target information corresponding to the predetermined number of largest transition probabilities is therefore taken as the reference transfer target information, so as to improve accuracy.
S453: Determine the actual transfer information corresponding to the first identification information according to the reference transfer target information, and take the actual transfer information as the second identification information.
Specifically, the actual transfer information corresponding to the first identification information is selected from the reference transfer target information and taken as the second identification information. The reference transfer target information with the largest transition probability may be selected as the actual transfer information. Alternatively, the actual transfer information may be determined by the number of times each piece of reference transfer target information occurs in the initial recognition result: the reference transfer target information that occurs most often in the initial recognition result is selected as the actual transfer information corresponding to the first identification information.
In one embodiment, the transfer target information corresponding to the largest transition probability is determined to be the actual transfer information corresponding to the first identification information.
A larger transition probability indicates that the information to be transferred is more likely to transfer to the transfer target information, that is, the transfer target information more commonly follows the information to be transferred. The transfer target information corresponding to the largest transition probability is therefore determined to be the actual transfer information corresponding to the first identification information.
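The selection in S451 to S453 can be sketched as below, covering both options described above: choosing by the largest transition probability, or by how often a candidate occurs in the initial recognition result. The nested-dictionary matrix layout, the function names, and the tie-breaking rule (ties fall back to probability order) are assumptions made for illustration.

```python
from collections import Counter


def reference_targets(transition_matrix, first_info, k):
    """S451-S452: the k transfer targets with the largest transition probability."""
    targets = transition_matrix.get(first_info, {})
    return sorted(targets, key=targets.get, reverse=True)[:k]


def pick_by_probability(transition_matrix, first_info, k):
    """S453, option 1: the reference target with the largest probability."""
    refs = reference_targets(transition_matrix, first_info, k)
    return refs[0] if refs else None


def pick_by_frequency(transition_matrix, first_info, k, initial_result):
    """S453, option 2: the reference target seen most often in the initial result."""
    refs = reference_targets(transition_matrix, first_info, k)
    if not refs:
        return None
    counts = Counter(initial_result)
    # max() keeps the first maximal element, so ties follow probability order.
    return max(refs, key=lambda target: counts[target])
```

The two options can disagree: a candidate with a lower transition probability may still be selected when it already appears several times elsewhere in the recognized content.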
The above transition probability matrix update method is described below with a specific embodiment in which the information is characters, the object to be identified is a character image to be identified, and the transition probability matrix of information is a transition probability matrix of characters.
When a blurred part (the abnormal part) appears while a character image to be identified is scanned, OCR (optical character recognition) either cannot recognize that part or cannot recognize it accurately. The context of the abnormal part is therefore needed to estimate it so that it can be recognized better. However, because of the "text sparsity" of the character sample, the transition probability matrix computed from the sample contains errors, so the matrix is inaccurate and subsequent recognition is inaccurate as well.
For example, 13,000,000 words obtained by crawling or similar means (for instance, text from publications such as newspapers) are used as the character sample to compute the transition probability matrix of characters. The resulting transition probability matrix is shown in Fig. 6.
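The text does not spell out here how the initial matrix is computed from the character sample. One common way, shown only as an assumed sketch, is to count adjacent character pairs and normalize each row; the function name build_transition_matrix and the nested-dictionary layout are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_matrix(text):
    """Estimate P(next character | current character) from adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        pair_counts[current][following] += 1
    matrix = {}
    for current, followers in pair_counts.items():
        total = sum(followers.values())
        # Each row is normalized so that its probabilities sum to 1.
        matrix[current] = {ch: n / total for ch, n in followers.items()}
    return matrix
```

With a limited sample, a character that is observed with only one follower gets a probability of 1 for that follower, which is the sparsity problem the embodiment goes on to address.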
In that matrix, the transition probability of "water layer" being followed by "shallow" is 1, and the transition probability of "Chi Yuan" being followed by "economy" is 1. If the content after "water layer" is blurred and cannot be recognized, then according to the transition probability matrix only "shallow" can follow, so the recognition result of the blurred part can only be "shallow", which leads to low recognition accuracy. "Water layer" and "Chi Yuan" are each followed by only one character, which shows that the transition probability matrix is sparse. In practice, "water layer" cannot be followed only by "shallow", so the transition probability should not be 1. Characters to be transferred whose number of following transfer target characters is less than a preset quantity need to be supplemented; "water layer" and "Chi Yuan" are two of them, and the transition probability matrix must be optimized. The optimization scheme is as follows. When the OCR system finds sparsity in the transition probability matrix, "water layer" is merged with each preset common character to obtain the merged characters. The first search volume obtained by searching for "water layer" with a search engine is obtained, and the second search volume obtained by searching for each piece of merged information with the search engine is obtained. The ratio of each second search volume to the first search volume is calculated, giving the second transition probabilities of the first information to be transferred transferring to each piece of preset common information. The second transition probabilities with the largest values, up to a preset number, are updated into the transition probability matrix to obtain the updated transition probability matrix. The OCR system thus learns that "water layer" can be followed not only by "shallow" but also by other characters, such as the character "depth", so the accuracy of OCR recognition is greatly improved.
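The update scheme can be sketched as follows. The search engine is represented by a caller-supplied function search_volume, which is a stand-in and not a real API. The nested-dictionary matrix, the function names, and the choice to overwrite existing entries without renormalizing the row are assumptions; the text does not specify these details.

```python
def sparse_entries(matrix, preset_quantity):
    """Information to be transferred with fewer targets than preset_quantity."""
    return [src for src, targets in matrix.items() if len(targets) < preset_quantity]


def update_matrix(matrix, first_info, common_infos, search_volume, top_n):
    """Return a copy of matrix with search-based probabilities for first_info.

    search_volume: hypothetical callable mapping a query string to a hit count.
    """
    first_volume = search_volume(first_info)
    if first_volume <= 0:
        return matrix  # no basis for a ratio
    second = {}
    for common in common_infos:
        merged = first_info + common  # append common information to first_info
        second[common] = search_volume(merged) / first_volume
    # Keep only the top_n largest second transition probabilities.
    best = sorted(second, key=second.get, reverse=True)[:top_n]
    updated = {src: dict(targets) for src, targets in matrix.items()}
    row = updated.setdefault(first_info, {})
    for common in best:
        row[common] = second[common]  # assumption: existing entries are overwritten
    return updated
```

Because the ratios come from independent search counts, the updated row is not guaranteed to sum to 1; whether and how to renormalize is left open here.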
Referring to Fig. 7, based on the same idea as the above transition probability matrix update method, one embodiment provides a transition probability matrix updating device, including:
Initial matrix acquisition module 710, configured to obtain the transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
Information acquisition module 720, configured to determine, according to the transition probability matrix, each piece of first information to be transferred whose probability is to be updated;
Merging module 730, configured to obtain each piece of preset common information corresponding to the first information to be transferred, and to append each piece of preset common information to the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
Search result acquisition module 740, configured to obtain the first search volume obtained by searching for the first information to be transferred with a search engine, and each second search volume obtained by searching for each piece of merged information with the search engine;
Probability calculation module 750, configured to calculate the ratio of each second search volume to the first search volume to obtain each second transition probability of the first information to be transferred transferring to each piece of preset common information;
Probability update module 760, configured to update the second transition probabilities with the largest values, up to a preset number, into the transition probability matrix to obtain the updated transition probability matrix.
In one embodiment, the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
Referring to Fig. 8, based on the same idea as the above information identification method, one embodiment provides an information identification device, including: an initial analysis module 810, a separation module 820, an initial identification module 830, a sub-object acquisition module 840, a sub-identification module 850, a recognition result update module 860, and the above transition probability matrix updating device.
Initial analysis module 810, configured to perform feature analysis on the object to be identified to obtain characteristic information;
Separation module 820, configured to separate the abnormal part from the object to be identified when it is determined according to the characteristic information that an abnormal part exists, obtaining the separated object to be identified;
Initial identification module 830, configured to identify the separated object to be identified to obtain an initial recognition result;
Sub-object acquisition module 840, configured to obtain the first identification information corresponding to the preceding adjacent object in the initial recognition result, where the preceding adjacent object is the sub-object in the object to be identified that immediately precedes the abnormal part;
Sub-identification module 850, configured to obtain the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix updating device, and to determine the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information;
Recognition result update module 860, configured to combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
Referring to Fig. 9, in one embodiment, the sub-identification module 850 may include:
Probability acquisition module 851, configured to determine, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information;
Reference information determining module 852, configured to determine the transfer target information corresponding to a predetermined number of the largest transition probabilities as reference transfer target information;
Sub-recognition result determining module 853, configured to determine the actual transfer information corresponding to the first identification information according to the reference transfer target information, and to take the actual transfer information as the second identification information.
In one embodiment, the sub-recognition result determining module 853 determines the transfer target information corresponding to the largest transition probability to be the actual transfer information corresponding to the first identification information.
One embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above transition probability matrix update method or information identification method when executing the computer program.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium. In the embodiments of the present invention, the program can be stored in a storage medium of a computer system and executed by at least one processor in the computer system to implement the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be determined by the appended claims.
Claims (11)
1. A transition probability matrix update method, characterized by comprising the steps of:
obtaining a transition probability matrix of information, wherein a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
determining, according to the transition probability matrix, each piece of first information to be transferred whose probability is to be updated;
obtaining each piece of preset common information corresponding to the first information to be transferred, and appending each piece of preset common information to the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
obtaining a first search volume obtained by searching for the first information to be transferred with a search engine, and each second search volume obtained by searching for each piece of merged information with the search engine;
calculating the ratio of each second search volume to the first search volume to obtain each second transition probability of the first information to be transferred transferring to each piece of preset common information;
updating the second transition probabilities with the largest values, up to a preset number, into the transition probability matrix to obtain an updated transition probability matrix.
2. The transition probability matrix update method according to claim 1, characterized in that the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
3. An information identification method, characterized by comprising the following steps:
performing feature analysis on an object to be identified to obtain characteristic information;
when it is determined according to the characteristic information that an abnormal part exists, separating the abnormal part from the object to be identified to obtain a separated object to be identified;
identifying the separated object to be identified to obtain an initial recognition result;
obtaining first identification information corresponding to a preceding adjacent object in the initial recognition result, wherein the preceding adjacent object is the sub-object in the object to be identified that immediately precedes the abnormal part;
obtaining an updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix update method according to claim 1 or 2, and determining second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information;
combining the initial recognition result with the second identification information to obtain an information recognition result of the object to be identified.
4. The information identification method according to claim 3, characterized in that the step of determining the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information comprises:
determining, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information;
determining the transfer target information corresponding to a predetermined number of the largest transition probabilities as reference transfer target information;
determining actual transfer information corresponding to the first identification information according to the reference transfer target information, and taking the actual transfer information as the second identification information.
5. The information identification method according to claim 4, characterized in that the transfer target information corresponding to the largest transition probability is determined to be the actual transfer information corresponding to the first identification information.
6. A transition probability matrix updating device, characterized by comprising:
an initial matrix acquisition module, configured to obtain a transition probability matrix of information, wherein a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
an information acquisition module, configured to determine, according to the transition probability matrix, each piece of first information to be transferred whose probability is to be updated;
a merging module, configured to obtain each piece of preset common information corresponding to the first information to be transferred, and to append each piece of preset common information to the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
a search result acquisition module, configured to obtain a first search volume obtained by searching for the first information to be transferred with a search engine, and each second search volume obtained by searching for each piece of merged information with the search engine;
a probability calculation module, configured to calculate the ratio of each second search volume to the first search volume to obtain each second transition probability of the first information to be transferred transferring to each piece of preset common information;
a probability update module, configured to update the second transition probabilities with the largest values, up to a preset number, into the transition probability matrix to obtain an updated transition probability matrix.
7. The transition probability matrix updating device according to claim 6, characterized in that the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
8. An information identification device, characterized by comprising the transition probability matrix updating device according to claim 6 or 7, and further comprising:
an initial analysis module, configured to perform feature analysis on an object to be identified to obtain characteristic information;
a separation module, configured to separate an abnormal part from the object to be identified when it is determined according to the characteristic information that the abnormal part exists, obtaining a separated object to be identified;
an initial identification module, configured to identify the separated object to be identified to obtain an initial recognition result;
a sub-object acquisition module, configured to obtain first identification information corresponding to a preceding adjacent object in the initial recognition result, wherein the preceding adjacent object is the sub-object in the object to be identified that immediately precedes the abnormal part;
a sub-identification module, configured to obtain an updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix updating device, and to determine second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information;
a recognition result update module, configured to combine the initial recognition result with the second identification information to obtain an information recognition result of the object to be identified.
9. The information identification device according to claim 8, characterized in that the sub-identification module comprises:
a probability acquisition module, configured to determine, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information;
a reference information determining module, configured to determine the transfer target information corresponding to a predetermined number of the largest transition probabilities as reference transfer target information;
a sub-recognition result determining module, configured to determine actual transfer information corresponding to the first identification information according to the reference transfer target information, and to take the actual transfer information as the second identification information.
10. The information identification device according to claim 8, characterized in that the sub-recognition result determining module determines the transfer target information corresponding to the largest transition probability to be the actual transfer information corresponding to the first identification information.
11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 5 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710288225.5A CN107247724B (en) | 2017-04-27 | 2017-04-27 | Transition probability matrix update, information identifying method and device, computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107247724A (en) | 2017-10-13 |
CN107247724B (en) | 2018-07-20 |
Family
ID=60016419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710288225.5A Active CN107247724B (en) | 2017-04-27 | 2017-04-27 | Transition probability matrix update, information identifying method and device, computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107247724B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1173684A (en) * | 1996-05-21 | 1998-02-18 | 株式会社日立制作所 | Apparatus for recognizing input character strings by inference |
JP2009252044A (en) * | 2008-04-08 | 2009-10-29 | Canon Inc | Document management system, method, and program |
CN101652773A (en) * | 2007-03-30 | 2010-02-17 | 微软公司 | Look-ahead document ranking system |
CN102982330A (en) * | 2012-11-21 | 2013-03-20 | 新浪网技术(中国)有限公司 | Method and device recognizing characters in character images |
CN106156142A (en) * | 2015-04-13 | 2016-11-23 | 深圳市腾讯计算机***有限公司 | The processing method of a kind of text cluster, server and system |
Also Published As
Publication number | Publication date |
---|---|
CN107247724A (en) | 2017-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108121816B (en) | Picture classification method and device, storage medium and electronic equipment | |
EP3493101A1 (en) | Image recognition method, terminal, and nonvolatile storage medium | |
US10748007B2 (en) | Identifying objects in an image | |
CN107885430B (en) | Audio playing method and device, storage medium and electronic equipment | |
CN111222397B (en) | Drawing recognition method and device and robot | |
CN105046186A (en) | Two-dimensional code recognition method and device | |
CN107368550B (en) | Information acquisition method, device, medium, electronic device, server and system | |
CN110032510B (en) | Application testing method and device | |
CN109547393B (en) | Malicious number identification method, device, equipment and storage medium | |
CN105653171A (en) | Fingerprint identification based terminal control method, terminal control apparatus and terminal | |
US20030012440A1 (en) | Form recognition system, form recognition method, program and storage medium | |
CN111866392A (en) | Shooting prompting method and device, storage medium and electronic equipment | |
CN105069013A (en) | Control method and device for providing input interface in search interface | |
CN104615663A (en) | File sorting method and device and terminal | |
CN104994236A (en) | Information processing method and device | |
CN112801235A (en) | Model training method, prediction device, re-recognition model and electronic equipment | |
CN109325539A (en) | Insulator falls crosstalk detecting method and device | |
CN113255566B (en) | Form image recognition method and device | |
CN107247724B (en) | Transition probability matrix update, information identifying method and device, computer equipment | |
KR101483611B1 (en) | Method and Terminal for Extracting a Object from Image | |
CN105528428A (en) | Image display method and terminal | |
CN110929057A (en) | Image processing method, device and system, storage medium and electronic device | |
CN111178349A (en) | Image identification method, device, equipment and storage medium | |
CN110598027A (en) | Image processing effect display method and device, electronic equipment and storage medium | |
CN112183149B (en) | Graphic code processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |