CN107247724B - Transition probability matrix update, information identifying method and device, computer equipment - Google Patents

Info

Publication number
CN107247724B
Authority
CN
China
Prior art keywords
information
transition probability
transferred
identified
probability matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710288225.5A
Other languages
Chinese (zh)
Other versions
CN107247724A (en
Inventor
许利宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201710288225.5A
Publication of CN107247724A
Application granted
Publication of CN107247724B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468: Fuzzy queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A transition probability matrix update method and device, an information identifying method and device, and computer equipment. According to an acquired transition probability matrix, each first information to be transferred whose probabilities are to be updated is determined. Each preset common information corresponding to the first information to be transferred is obtained and appended to the first information to be transferred, yielding each corresponding merged information. A first search volume, obtained by searching for the first information to be transferred, and each second search volume, obtained by searching for each merged information, are acquired. The ratio of each second search volume to the first search volume is calculated, giving each second transition probability of the first information to be transferred transferring to each preset common information. The preset number of second transition probabilities with the largest values are updated into the transition probability matrix. This improves the accuracy of the transition probability matrix, and identification accuracy improves when information is identified according to the updated matrix.

Description

Transition probability matrix update, information identifying method and device, computer equipment
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a transition probability matrix update method and device, an information identifying method and device, and computer equipment.
Background
In information identification fields such as character and audio recognition, different methods yield recognition results of different accuracy. A commonly used character recognition method scans an image containing characters and recognizes the characters from their features in the image. A commonly used audio recognition method identifies audio by its sound frequency.
However, when an image is blurred so that the characters it contains are unclear, these methods struggle to recognize them, or the recognition results have low accuracy. Likewise, when audio is disturbed by noise, recognition accuracy is low. Blurred characters or disturbed audio can be identified using a transition probability matrix of the information, but because of sparsity (the information samples cannot cover everyday information widely), the resulting transition probability matrix is inaccurate, and so are the results subsequently identified from it.
Summary of the invention
In view of this, to address the problem that sparse information makes the transition probability matrix inaccurate and therefore makes subsequent information identification inaccurate, a transition probability matrix update method and device, an information identifying method and device, and computer equipment are proposed.
Accordingly, the embodiments adopt the following technical solutions:
A transition probability matrix update method includes the following steps:
Obtain the transition probability matrix of the information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
According to the transition probability matrix, determine each first information to be transferred whose probabilities are to be updated;
Obtain each preset common information corresponding to the first information to be transferred, and append each preset common information to the first information to be transferred, obtaining each merged information corresponding to the first information to be transferred;
Obtain a first search volume obtained by searching for the first information to be transferred through a search engine, and each second search volume obtained by searching for each merged information through the search engine;
Calculate the ratio of each second search volume to the first search volume, obtaining each second transition probability of the first information to be transferred transferring to each preset common information;
Update the preset number of second transition probabilities with the largest values into the transition probability matrix, obtaining the updated transition probability matrix.
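The ratio step can be written compactly. The symbols below are introduced here only for illustration and do not appear in the source: $w$ is a first information to be transferred, $c_i$ is the $i$-th preset common information, $V_1(w)$ is the first search volume, and $V_2(w c_i)$ is the second search volume of the merged information $w c_i$.

$$
P_2(c_i \mid w) = \frac{V_2(w\,c_i)}{V_1(w)}
$$

The preset number of largest $P_2(c_i \mid w)$ values are then written into the row of the matrix for $w$. The text does not say whether that row is renormalized afterwards, so no normalization is assumed here.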
The present invention also provides a transition probability matrix updating device, including:
An initial matrix acquisition module, for obtaining the transition probability matrix of the information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
An information acquisition module, for determining, according to the transition probability matrix, each first information to be transferred whose probabilities are to be updated;
A merging module, for obtaining each preset common information corresponding to the first information to be transferred and appending each preset common information to the first information to be transferred, obtaining each merged information corresponding to the first information to be transferred;
A search result acquisition module, for obtaining a first search volume obtained by searching for the first information to be transferred through a search engine, and each second search volume obtained by searching for each merged information through the search engine;
A probability calculation module, for calculating the ratio of each second search volume to the first search volume, obtaining each second transition probability of the first information to be transferred transferring to each preset common information;
A probability update module, for updating the preset number of second transition probabilities with the largest values into the transition probability matrix, obtaining the updated transition probability matrix.
The present invention also provides an information identifying method, including the following steps:
Perform feature analysis on an object to be identified, obtaining feature information;
When an abnormal part is determined to exist according to the feature information, separate the abnormal part from the object to be identified, obtaining the separated object to be identified;
Identify the separated object to be identified, obtaining an initial recognition result;
Obtain first identification information corresponding to the preceding adjacent object in the initial recognition result, where the preceding adjacent object is the sub-object in the object to be identified that immediately precedes and is adjacent to the abnormal part;
Obtain the updated transition probability matrix, determined by the above transition probability matrix update method, that corresponds to the type of the object to be identified, and determine second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information;
Combine the initial recognition result and the second identification information, obtaining the information recognition result of the object to be identified.
The present invention also provides an information recognition device, which includes the above transition probability matrix updating device and further includes:
An initial analysis module, for performing feature analysis on an object to be identified, obtaining feature information;
A separation module, for separating the abnormal part from the object to be identified when an abnormal part is determined to exist according to the feature information, obtaining the separated object to be identified;
An initial identification module, for identifying the separated object to be identified, obtaining an initial recognition result;
A sub-object acquisition module, for obtaining first identification information corresponding to the preceding adjacent object in the initial recognition result, where the preceding adjacent object is the sub-object in the object to be identified that immediately precedes and is adjacent to the abnormal part;
A sub-identification module, for determining second identification information corresponding to the abnormal part according to the first identification information and the updated transition probability matrix that is determined by the transition probability matrix updating device and corresponds to the type of the object to be identified;
A recognition result update module, for combining the initial recognition result and the second identification information, obtaining the information recognition result of the object to be identified.
The present invention also provides computer equipment, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, where the processor implements any one of the above transition probability matrix update methods or information identifying methods when executing the computer program.
With the above transition probability matrix update method and device and computer equipment, even if the information samples are sparse and the corresponding transition probability matrix is inaccurate, the matrix can be improved. For each merged information, the ratio of its second search volume to the first search volume of the first information to be transferred is taken, and the transition probabilities corresponding to the preset number of largest ratios are updated into the transition probability matrix. In other words, transition probabilities from the first information to be transferred to those preset common characters whose merged information has a relatively large search volume are added to the matrix. This improves the accuracy of the transition probability matrix, and therefore improves identification accuracy when identification is subsequently performed according to the matrix.
The above information identifying method and device first find the abnormal part, then separate it from the object to be identified and identify the separated object to obtain an initial recognition result, and then identify the abnormal part according to the initial recognition result and the transition probability matrix. That is, the separated object to be identified and the abnormal part are identified separately. There is no need to first identify the whole object including its abnormal part, then identify the abnormal part, and then update the overall recognition result, so recognition time is reduced and recognition efficiency is improved. Moreover, when the object to be identified contains an abnormality, the abnormal part is identified using the initial recognition result and the updated transition probability matrix determined by the above transition probability matrix update method. Because the updated matrix is more accurate, identification based on it is more accurate.
Description of the drawings
Fig. 1 is the working environment schematic diagram of one embodiment of the invention;
Fig. 2 is the composed structure schematic diagram of the server in one embodiment;
Fig. 3 is the flow diagram of the transition probability matrix update method of one embodiment;
Fig. 4 is the flow diagram of the information identifying method of one embodiment;
Fig. 5 is the sub-process schematic diagram in the information identifying method of another embodiment;
Fig. 6 is the interface schematic diagram of the transition probability matrix of an embodiment;
Fig. 7 is the module diagram of the transition probability matrix updating device of one embodiment;
Fig. 8 is the module diagram of the information recognition device of one embodiment;
Fig. 9 is the submodule schematic diagram of the information recognition device of another embodiment.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit its scope of protection.
Fig. 1 shows the working environment of one embodiment of the invention. As shown in Fig. 1, the working environment involves a terminal 110, a server 120 and a network 130; the terminal 110 and the server 120 communicate through the network 130. The terminal 110 can access the corresponding server 120 through the network 130 to request an information recognition result, and the server 120 can push that result to the terminal 110, where the user of the terminal 110 views it. The terminal 110 may be any device capable of intelligent input and output, for example a desktop computer or a mobile terminal; the mobile terminal may be a smartphone, a tablet computer, an in-vehicle computer, a wearable smart device, and so on. The server 120 may be the server hosting the platform that provides the information recognition result, and there may be one or more servers 120. This embodiment concerns the scheme by which the server 120 updates the transition probability matrix and the scheme by which it performs information identification to obtain the information recognition result.
The internal structure of the server 120 in one embodiment is shown in Fig. 2. The server 120 includes a processor, a storage medium, a network interface and a memory connected through a system bus. The processor of the server 120 provides computing and control capabilities and supports the operation of the entire server. The storage medium of the server 120 stores an operating system, a local database, and the computer application programs of a transition probability matrix updating device and of an information recognition device. When executed by the processor, the former implements a transition probability matrix update method and the latter implements an information identifying method. The memory of the server 120 provides an environment for running the transition probability matrix updating device stored in the non-volatile storage medium. The memory can store computer-readable instructions which, when executed by the processor, cause the processor to perform a transition probability matrix update method and an information identifying method. The network interface of the server 120 is used to connect to and communicate with the network 130.
As shown in Fig. 3, the transition probability matrix update method of one embodiment includes steps S310 to S360:
S310: Obtain the transition probability matrix of the information.
Here, the information may be characters or sound frequencies. When the information is characters, the transition probability matrix holds the transition probabilities of characters, the information to be transferred is a character to be transferred, and the transfer target information is a transfer target character. When the information is sound frequencies, the transition probability matrix holds the transition probabilities of sound frequencies, the information to be transferred is a sound frequency to be transferred, and the transfer target information is a transfer target sound frequency. The transition probability matrix of the information may be one calculated from information samples.
The elements of the transition probability matrix are the individual transition probabilities. A first transition probability in the matrix is the probability that information to be transferred transfers to transfer target information. The transition probabilities already in the matrix are called first transition probabilities here to distinguish them from the second transition probabilities that are later updated into the matrix. The probability that information to be transferred transfers to transfer target information is the probability that the transfer target information immediately follows the information to be transferred. Taking characters as an example, if the character to be transferred is "电" and the transfer target characters include "视" and "话", the corresponding transition probabilities are the probability that "电" transfers to "视" (the probability that "电" is followed by "视", as in "电视", television) and the probability that "电" transfers to "话" (the probability that "电" is followed by "话", as in "电话", telephone). Taking sound frequencies as an example, the sound frequency to be transferred is the frequency corresponding to "电", and the transfer target sound frequencies include those corresponding to "视" and "话". The corresponding transition probabilities are the probability that the sound frequency of "电" is followed by the sound frequency of "视" and the probability that the sound frequency of "电" is followed by the sound frequency of "话".
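The text says only that the matrix may be calculated from information samples. A minimal sketch of one way to do that for characters is shown below, assuming a plain count of adjacent character pairs; the function name `build_transition_matrix` and the nested-dict layout are illustrative choices, not taken from the source.

```python
from collections import Counter, defaultdict


def build_transition_matrix(samples):
    """Estimate first transition probabilities from sample strings.

    Returns a nested dict: matrix[source][target] is the fraction of times
    `target` immediately follows `source` in the samples.
    """
    pair_counts = defaultdict(Counter)
    for text in samples:
        # Count each adjacent (source, target) character pair.
        for source, target in zip(text, text[1:]):
            pair_counts[source][target] += 1

    matrix = {}
    for source, targets in pair_counts.items():
        total = sum(targets.values())
        matrix[source] = {t: n / total for t, n in targets.items()}
    return matrix


# Tiny made-up sample set: "电视" appears twice, "电话" once.
matrix = build_transition_matrix(["电视", "电话", "电视"])
print(matrix["电"])
```

With so few samples the row for "电" holds only two targets, which is the sparsity problem the update method is meant to address.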
S320: According to the transition probability matrix, determine each first information to be transferred whose probabilities are to be updated.
Information sparsity, that is, a small amount of information, can make the transition probability matrix sparse, which may lower accuracy in subsequent identification. The matrix therefore needs to be updated to increase the amount of information it holds, that is, to add transition probabilities. To do so, each first information to be transferred whose probabilities are to be updated is first determined according to the transition probability matrix; the transition probabilities of that first information to be transferred are the ones that need updating. Taking characters as an example, suppose the matrix contains a transition probability from the character "water layer" to the following character "shallow" but to no other following character. The characters following "water layer" may then be sparse, and transition probabilities from "water layer" to other characters such as "deep" need to be added. The character "water layer" is therefore taken as a first information to be transferred, and transition probabilities from it to other characters are added, which updates the transition probabilities of "water layer".
S330: Obtain each preset common information corresponding to the first information to be transferred, and append each preset common information to the first information to be transferred, obtaining each merged information corresponding to the first information to be transferred.
The preset common information correspondingly includes preset common characters or preset common sound frequencies. In general, some information is used frequently while uncommon information is used rarely, so using preset common information as the basis for merging with the first information to be transferred covers most of the information people commonly use. For example, if the first information to be transferred is the character "电" and the preset common characters include "视", "话", "流" and "压", each of these is appended to "电", giving the merged information "电视" (television), "电话" (telephone), "电流" (electric current) and "电压" (voltage).
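The merging in S330 is a simple concatenation. The sketch below illustrates it for characters; the function name `merge_with_common` and the returned dict (keyed by the common character) are assumptions made for illustration.

```python
def merge_with_common(first_info, common_infos):
    """Append each preset common item to the first information to be transferred.

    Returns a dict mapping each common item to its merged string.
    """
    return {common: first_info + common for common in common_infos}


merged = merge_with_common("电", ["视", "话", "流", "压"])
print(merged)
```

Keying the result by the common character keeps track of which transfer target each merged string stands for, which is convenient when the ratios are later written back into the matrix.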
S340: Obtain the first search volume obtained by searching for the first information to be transferred through a search engine, and each second search volume obtained by searching for each merged information through the search engine.
Once the merged information is available, the first search volume for the first information to be transferred and the second search volume for each merged information can be obtained through the search engine. In this embodiment, the search engine may be called directly to search for the first information to be transferred and each merged information, yielding the first search volume and each second search volume. Alternatively, the first information to be transferred and each merged information may be sent to the search engine, and the first search volume and each second search volume are then obtained from the search results it returns.
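The source does not name a search engine or an API, so the sketch below takes the lookup as a parameter: `search_volume` is a hypothetical callable that maps a query string to a result count. The counts in `FAKE_COUNTS` are made up so that the example runs offline.

```python
def collect_search_volumes(first_info, merged, search_volume):
    """Query the first information and every merged string.

    `search_volume` is any callable returning a count for a query string;
    it stands in for a real search-engine client, which the source does not name.
    """
    first_volume = search_volume(first_info)
    second_volumes = {common: search_volume(text) for common, text in merged.items()}
    return first_volume, second_volumes


# Made-up counts, for illustration only.
FAKE_COUNTS = {"电": 1000, "电视": 400, "电话": 300, "电流": 50}


def fake_search_volume(query):
    return FAKE_COUNTS.get(query, 0)


merged = {"视": "电视", "话": "电话", "流": "电流", "压": "电压"}
first_volume, second_volumes = collect_search_volumes("电", merged, fake_search_volume)
print(first_volume, second_volumes)
```

Passing the lookup in as a callable keeps the sketch independent of any particular search service.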
S350: Calculate the ratio of each second search volume to the first search volume, obtaining each second transition probability of the first information to be transferred transferring to each preset common information.
S360: Update the preset number of second transition probabilities with the largest values into the transition probability matrix, obtaining the updated transition probability matrix.
For a given first information to be transferred the first search volume is fixed, while the second search volume varies with the merged information. A larger second search volume indicates that the merged information is more common, and the larger the ratio of the second search volume to the first search volume, the more likely it is that the first information to be transferred is followed by the corresponding preset common information. So that the updated matrix holds transition probabilities closer to commonly used information, the ratio of each second search volume to the first search volume is calculated to obtain each second transition probability, and the preset number of second transition probabilities with the largest values are updated into the transition probability matrix. For example, suppose the first information to be transferred is the character "电" and the ratio of the second search volume of the merged information "电视" to the first search volume of "电" is greater than the corresponding ratio for "电话". This indicates that "电视" (television) is used more often. If the preset number is 1, the ratio chosen for updating into the transition probability matrix is that of "电视".
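Steps S350 and S360 can be sketched as follows. The function name `update_row` is illustrative. Two behaviours are assumptions, since the source does not specify them: an existing entry for the same target is overwritten, and the row is not renormalized.

```python
def update_row(matrix, first_info, first_volume, second_volumes, preset_number):
    """Write the top `preset_number` search-volume ratios into the matrix row.

    Assumptions (not stated in the source): existing entries are overwritten
    and the row is left unnormalized.
    """
    if first_volume <= 0:
        return matrix  # No meaningful ratio can be formed.

    # S350: second transition probability = second volume / first volume.
    ratios = {c: v / first_volume for c, v in second_volumes.items()}

    # S360: keep only the largest `preset_number` ratios.
    top = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:preset_number]

    row = matrix.setdefault(first_info, {})
    for common, probability in top:
        row[common] = probability
    return matrix


matrix = {"电": {"视": 0.1}}
update_row(matrix, "电", 1000, {"视": 400, "话": 300, "流": 50}, preset_number=2)
print(matrix)
```

With the made-up volumes above, the two largest ratios belong to "视" and "话", so only those two entries are written and "流" is left out.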
With the above transition probability matrix update method, even if the information samples are sparse and the corresponding transition probability matrix is inaccurate, the matrix can be improved. For each merged information, the ratio of its second search volume to the first search volume of the first information to be transferred is taken, and the transition probabilities corresponding to the preset number of largest ratios are updated into the transition probability matrix. In other words, transition probabilities from the first information to be transferred to those preset common characters whose merged information has a relatively large search volume are added to the matrix. This improves the accuracy of the transition probability matrix, and therefore improves identification accuracy when identification is subsequently performed according to the matrix.
In one of the embodiments, the first information to be transferred is information to be transferred whose number of associated transfer target information is less than a preset quantity.
That is, when each first information to be transferred is determined according to the transition probability matrix, the number of transfer target information associated with each information to be transferred is compared with a preset quantity. If the number is smaller, the information to be transferred is followed by few transfer targets in the matrix, that is, it is sparse, and it is determined to be a first information to be transferred whose probabilities are to be updated. For example, if the preset quantity is 30 and the information to be transferred "water layer" is followed only by the transfer target information "shallow", the number of following transfer targets is 1, which is less than the preset quantity of 30, so "water layer" can be taken as a first information to be transferred.
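The threshold test described above amounts to counting the entries in each row. A minimal sketch, assuming the nested-dict matrix layout used for illustration here (the name `select_sparse_sources` is hypothetical):

```python
def select_sparse_sources(matrix, preset_quantity):
    """Return the sources whose number of associated targets is below the threshold."""
    return [source for source, targets in matrix.items() if len(targets) < preset_quantity]


matrix = {
    "water layer": {"shallow": 1.0},
    "电": {"视": 0.5, "话": 0.3, "流": 0.2},
}
sparse = select_sparse_sources(matrix, preset_quantity=3)
print(sparse)
```

Here "water layer" has a single target and is selected, while "电" has three targets and is not.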
As shown in Fig. 4, an information identifying method of one embodiment includes steps S410 to S460:
S410: Perform feature analysis on the object to be identified, obtaining feature information.
The object to be identified may include a character image to be identified or audio to be identified. Each object to be identified has associated feature information, which correspondingly may include image pixel information or voiceprint information: a character image to be identified has associated image pixel information, and audio to be identified has associated voiceprint information. Feature analysis of the object to be identified yields its feature information.
S420: When an abnormal part is determined to exist according to the feature information, separate the abnormal part from the object to be identified, obtaining the separated object to be identified.
Abnormality detection is performed on the object to be identified according to the feature information, and when an abnormal part is found, it is separated from the object to be identified, giving the separated object to be identified. For a character image to be identified, abnormality detection is based on image pixel values: when the area of a region of the image in which the pixel value is continuously 1 exceeds a preset area, an abnormality is considered to exist, and the abnormal part is that region. For audio to be identified, abnormality detection is based on voiceprint information: when the voiceprint information in the audio is inconsistent, the audio is considered abnormal. Taking a character image as an example, when a shadow in the image blocks the characters in that part, the pixel values of the shadowed part are all 1 and an abnormality arises. The abnormal part, that is, the shadowed part, can be detected from the pixel values, and removing it from the character image yields the separated character image, which does not include the shadowed part. Taking audio as an example, when other noise covers the user's voice while the user is speaking, the audio to be identified contains two different voiceprints. Noise generally lasts a shorter time, so the voiceprint with the shorter duration is determined to be the abnormal part and is separated from the voiceprint information of the audio, giving the separated audio to be identified.
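For the image case, "a region in which the pixel value is continuously 1" can be read as a connected region of 1-valued pixels. The sketch below finds such regions with a breadth-first flood fill and keeps those larger than the preset area. The 4-neighbour connectivity, the list-of-lists image format and the name `find_abnormal_regions` are assumptions, not details from the source.

```python
from collections import deque


def find_abnormal_regions(image, preset_area):
    """Return connected regions of pixel value 1 whose area exceeds preset_area.

    `image` is a list of rows of 0/1 pixel values; each region is a list of
    (row, column) coordinates. 4-neighbour connectivity is assumed.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    regions = []

    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill the region containing (r, c).
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols:
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(region) > preset_area:
                regions.append(region)
    return regions


image = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]
regions = find_abnormal_regions(image, preset_area=4)
print(len(regions), len(regions[0]))
```

In this toy image the six-pixel block counts as abnormal, while the isolated pixel in the bottom-left corner is below the preset area and is ignored.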
S430: Identify the separated object to be identified to obtain an initial recognition result.
The separated object to be identified contains no abnormal part and can therefore be identified directly. Identifying a separated character picture yields an initial character recognition result; identifying separated audio yields an initial audio recognition result.
S440: Obtain the first identification information corresponding to the preceding adjacent object in the initial recognition result, the preceding adjacent object being the sub-object in the object to be identified that immediately precedes the abnormal part.
The initial recognition result obtained by identifying the separated object includes the recognition result of the sub-object that immediately precedes the abnormal part, i.e. the first identification information, which is extracted from the initial recognition result. For example, the first identification information may be the character "electricity", meaning that the abnormal part immediately follows the character "electricity". As another example, when the object to be identified is audio, the initial audio recognition result consists of sound frequencies; the first identification information is then the sound frequency corresponding to the word "electricity" that immediately precedes the abnormal part, and this sound frequency is obtained from the initial recognition result.
S450: Obtain the updated transition probability matrix that corresponds to the type of the object to be identified and was determined by the above transition probability matrix update method, and determine the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information.
The object to be identified may be a character picture or audio, and different types of object correspond to different transition probability matrices. The above transition probability matrix update method updates the transition probability matrix of information, where the information may be characters or sound frequencies; the method can therefore produce an updated transition probability matrix of characters or an updated transition probability matrix of sound frequencies. In this embodiment, the updated transition probability matrix corresponding to the type of the object to be identified is obtained directly, and the second identification information corresponding to the abnormal part is determined from it together with the first identification information, which completes the identification of the abnormal part. For example, if the first identification information is "electricity", the transition probability matrix contains the probabilities of the character "electricity" being followed by each target character, so the second identification information, i.e. the recognition result of the abnormal part, can be determined from the updated transition probability matrix, which improves recognition accuracy. In another implementation, the updated transition probability matrix may first be determined by running the above transition probability matrix update method, and the second identification information corresponding to the abnormal part is then determined from that matrix and the first identification information.
S460: Combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
The initial recognition result is obtained from the separated object to be identified and does not cover the abnormal part, so on its own the recognition of the object is incomplete. To obtain a complete recognition, the initial recognition result is combined with the second identification information, giving the information recognition result of the object to be identified.
The above information identification method first locates the abnormal part, separates it from the object to be identified, and identifies the separated object to obtain the initial recognition result. The abnormal part is then identified from the initial recognition result and the transition probability matrix. The separated object and the abnormal part are thus identified separately: there is no need to first identify the whole object, abnormal part included, then identify the abnormal part again and update the overall recognition result, which reduces recognition time and improves recognition efficiency. Moreover, when the object to be identified contains an abnormality, the abnormal part is identified using the initial recognition result and the updated transition probability matrix determined by the above transition probability matrix update method. Because the updated transition probability matrix is more accurate, information identification based on it is more accurate as well.
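Steps S440 to S460 can be sketched in a few lines. The sketch makes several assumptions that the text does not state: the initial recognition result is a list in which `None` marks the position of the separated abnormal part, the transition probability matrix is a nested dictionary, and the most probable follower is chosen. The function name `identify_abnormal` is hypothetical.

```python
def identify_abnormal(initial_result, matrix):
    """Fill each gap (None) from the item just before it and the matrix."""
    result = list(initial_result)
    for i, item in enumerate(result):
        if item is not None or i == 0:
            continue
        first_info = result[i - 1]       # S440: preceding adjacent sub-object
        row = matrix.get(first_info)     # S450: look up its transition probabilities
        if row:
            # Second identification information: the most probable follower.
            result[i] = max(row, key=row.get)
    return result                        # S460: combined recognition result

matrix = {"a": {"b": 0.7, "c": 0.3}}
combined = identify_abnormal(["x", "a", None, "d"], matrix)
```

A gap with no preceding item, or whose preceding item has no row in the matrix, is left as `None` rather than guessed.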
As shown in Fig. 5, in one embodiment the step of determining the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information includes:
S451: Determine, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information.
Once the first identification information is determined, the updated transition probability matrix gives each piece of transfer target information corresponding to it, together with the associated transition probability. For example, when the first identification information is the character "electricity", the corresponding transfer target information may include "depending on", "words" and so on, and the transition probability from "electricity" to each of them can be determined from the matrix.
S452: Determine the transfer target information corresponding to the predetermined number of largest transition probabilities as reference transfer target information.
Different pieces of transfer target information have different transition probabilities with respect to the first identification information. The transfer target information corresponding to the predetermined number of largest transition probabilities is therefore taken as the reference transfer target information, so as to improve accuracy.
S453: Determine the actual transfer information corresponding to the first identification information according to the reference transfer target information, and determine the actual transfer information as the second identification information.
Specifically, the actual transfer information corresponding to the first identification information is selected from the reference transfer target information and is determined as the second identification information. The reference transfer target information with the largest transition probability may be selected as the actual transfer information. Alternatively, the actual transfer information may be determined from the number of times each piece of reference transfer target information occurs in the initial recognition result: the piece that occurs most often in the initial recognition result is selected as the actual transfer information corresponding to the first identification information.
In one embodiment, the transfer target information corresponding to the largest transition probability is determined as the actual transfer information corresponding to the first identification information.
A larger transition probability indicates that the information to be transferred is more likely to be followed by the transfer target information, that is, the transfer target information commonly follows the information to be transferred. The transfer target information corresponding to the largest transition probability is therefore determined as the actual transfer information corresponding to the first identification information.
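The selection in S451 to S453 can be illustrated with a short sketch covering both variants described above: picking the reference target with the largest transition probability, or picking the one that occurs most often in the initial recognition result. The nested-dictionary matrix layout and the function names are assumptions made for the example.

```python
from collections import Counter

def reference_targets(matrix, first_info, predetermined_number):
    """S451/S452: the top-N (target, probability) pairs for first_info."""
    row = matrix.get(first_info, {})
    ranked = sorted(row.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:predetermined_number]

def pick_by_probability(refs):
    """S453, variant 1: the reference target with the largest probability."""
    return max(refs, key=lambda kv: kv[1])[0] if refs else None

def pick_by_occurrence(refs, initial_result):
    """S453, variant 2: the reference target seen most often in the initial result."""
    counts = Counter(initial_result)
    return max(refs, key=lambda kv: counts[kv[0]])[0] if refs else None

matrix = {"A": {"B": 0.5, "C": 0.3, "D": 0.2}}
refs = reference_targets(matrix, "A", predetermined_number=2)
by_probability = pick_by_probability(refs)
by_occurrence = pick_by_occurrence(refs, ["C", "A", "C", "B"])
```

The two variants can disagree, as they do here: "B" has the higher transition probability, while "C" appears more often in the surrounding recognized text.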
The above transition probability matrix update method is illustrated below with a specific embodiment, in which the information is characters, the object to be identified is a character picture, and the transition probability matrix of information is the transition probability matrix of characters.
When a scanned character picture contains a blurred part (the abnormal part), OCR (optical character recognition) either cannot recognize that part or cannot recognize it accurately. The part must then be estimated with the help of its context in order to be recognized well. However, because of the "text sparsity" of the character samples, the transition probability matrix computed from them contains errors, and this inaccuracy carries over into subsequent recognition.
For example, 13,000,000 words obtained by crawling or similar means (for instance, text from newspapers and other publications) are used as character samples to compute the transition probability matrix of characters. The resulting transition probability matrix is shown in Fig. 6.
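The text does not spell out how the initial matrix is computed from the character samples. A common approach, assumed here, is to count adjacent pairs and normalize each row; the function name `build_transition_matrix` and the toy samples are hypothetical.

```python
from collections import Counter, defaultdict

def build_transition_matrix(samples):
    """Estimate P(next | current) from adjacent pairs in each sample sequence."""
    counts = defaultdict(Counter)
    for seq in samples:
        # zip pairs every item with its immediate successor.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    matrix = {}
    for cur, followers in counts.items():
        total = sum(followers.values())
        matrix[cur] = {nxt: n / total for nxt, n in followers.items()}
    return matrix

# Toy samples; each string is treated as a sequence of single characters.
matrix = build_transition_matrix(["abab", "ac"])
```

With counts this small, "b" is followed only by "a" and gets a transition probability of 1, which is the sparsity problem the embodiment goes on to describe.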
In this matrix, the transition probability of "water layer" being followed by "shallow" is 1, and the transition probability of "Chi Yuan" being followed by "economy" is 1. Consequently, if the part after "water layer" is blurred and cannot be recognized, the transition probability matrix allows only "shallow" to follow, so the blurred part can only be recognized as "shallow", which lowers recognition accuracy. "Water layer" and "Chi Yuan" each have exactly one following character, which shows that the transition probability matrix is sparse: in practice "water layer" is not followed only by "shallow", so the true transition probability cannot be 1. Characters to be transferred whose number of following transfer target characters is less than a preset number need to be supplemented; "water layer" and "Chi Yuan" are two such characters, and the transition probability matrix must be optimized for them. The optimization proceeds as follows. When the OCR system finds that the transition probability matrix is sparse, it obtains "water layer" and the merged characters formed by appending each preset common character to "water layer". It obtains the first search volume returned by a search engine for "water layer" and the second search volume returned by the search engine for each piece of merged information. The ratio of each second search volume to the first search volume gives the second transition probability of the first information to be transferred being transferred to each piece of preset common information. The preset number of largest second transition probabilities are then updated into the transition probability matrix to obtain the updated transition probability matrix. In this way the OCR system learns that "water layer" can be followed not only by "shallow" but also by other characters, such as "deep", which substantially improves OCR recognition accuracy.
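The update described above can be sketched as follows. The search engine is represented by a caller-supplied function `search_volume(query)`, which is a stand-in and not a real API; the toy volumes are invented for the example. The text does not say whether the row is renormalized after the update, so the sketch simply overwrites or adds the selected entries and leaves that choice open.

```python
def update_row(matrix, source, common_infos, search_volume, preset_number):
    """Return a copy of matrix with source's row updated from search volumes."""
    first_volume = search_volume(source)
    if first_volume <= 0:
        return dict(matrix)  # nothing to estimate from
    # Second transition probability = volume(source + info) / volume(source).
    second = {info: search_volume(source + info) / first_volume
              for info in common_infos}
    # Keep only the preset number of largest second transition probabilities.
    best = sorted(second.items(), key=lambda kv: kv[1], reverse=True)[:preset_number]
    row = dict(matrix.get(source, {}))
    row.update(best)
    updated = dict(matrix)
    updated[source] = row
    return updated

# Hypothetical search volumes standing in for a real search engine.
volumes = {"X": 1000, "Xa": 300, "Xb": 500, "Xc": 100}
matrix = {"X": {"a": 1.0}}
updated = update_row(matrix, "X", ["a", "b", "c"],
                     lambda query: volumes.get(query, 0), preset_number=2)
```

Here the sparse row for "X", which originally allowed only "a", gains a second follower "b", and the probability of "a" is replaced by its search-based estimate.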
Referring to Fig. 7, based on the same idea as the above transition probability matrix update method, one embodiment provides a transition probability matrix updating device, including:
Initial matrix acquisition module 710, configured to obtain the transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred is transferred to transfer target information;
Information acquisition module 720, configured to determine, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
Merging module 730, configured to obtain each piece of preset common information corresponding to the first information to be transferred and to append each piece of preset common information to the first information to be transferred, obtaining each piece of merged information corresponding to the first information to be transferred;
Search result acquisition module 740, configured to obtain the first search volume returned by a search engine for the first information to be transferred, and the second search volume returned by the search engine for each piece of merged information;
Probability calculation module 750, configured to calculate the ratio of each second search volume to the first search volume, obtaining the second transition probabilities of the first information to be transferred being transferred to each piece of preset common information;
Probability updating module 760, configured to update the preset number of largest second transition probabilities into the transition probability matrix, obtaining the updated transition probability matrix.
In one embodiment, the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
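Selecting the first information to be transferred under this embodiment amounts to filtering rows by their number of targets. A minimal sketch, assuming the nested-dictionary matrix layout and using the hypothetical name `sparse_sources`:

```python
def sparse_sources(matrix, preset_quantity):
    """Sources whose number of associated targets is below preset_quantity."""
    return [source for source, row in matrix.items()
            if len(row) < preset_quantity]

matrix = {
    "p": {"q": 1.0},                      # one follower only: sparse
    "r": {"s": 0.6, "t": 0.3, "u": 0.1},  # three followers
}
to_update = sparse_sources(matrix, preset_quantity=2)
```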
Referring to Fig. 8, based on the same idea as the above information identification method, one embodiment provides an information recognition device, including: an initial analysis module 810, a separation module 820, an initial identification module 830, a sub-object acquisition module 840, a sub-identification module 850, a recognition result update module 860, and the above transition probability matrix updating device.
Initial analysis module 810, configured to perform feature analysis on the object to be identified to obtain characteristic information;
Separation module 820, configured to separate the abnormal part from the object to be identified when it is determined from the characteristic information that an abnormal part exists, obtaining the separated object to be identified;
Initial identification module 830, configured to identify the separated object to be identified to obtain an initial recognition result;
Sub-object acquisition module 840, configured to obtain the first identification information corresponding to the preceding adjacent object in the initial recognition result, the preceding adjacent object being the sub-object in the object to be identified that immediately precedes the abnormal part;
Sub-identification module 850, configured to determine the second identification information corresponding to the abnormal part according to the first identification information and the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix updating device;
Recognition result update module 860, configured to combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
Referring to Fig. 9, in one embodiment the sub-identification module 850 may include:
Probability acquisition module 851, configured to determine, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information;
Reference information determining module 852, configured to determine the transfer target information corresponding to the predetermined number of largest transition probabilities as reference transfer target information;
Sub-recognition result determining module 853, configured to determine the actual transfer information corresponding to the first identification information according to the reference transfer target information, and to determine the actual transfer information as the second identification information.
In one embodiment, the sub-recognition result determining module 853 determines the transfer target information corresponding to the largest transition probability as the actual transfer information corresponding to the first identification information.
One embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above transition probability matrix update method or information identification method when executing the computer program.
A person of ordinary skill in the art will understand that all or part of the flows in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium; in the embodiments of the present invention, the program can be stored in a storage medium of a computer system and executed by at least one processor of that computer system to implement the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the invention, and these all fall within the protection scope of the present invention. The protection scope of this patent shall therefore be determined by the appended claims.

Claims (11)

1. A transition probability matrix update method, characterized by comprising the steps of:
Obtaining a transition probability matrix of information, a first transition probability in the transition probability matrix being the probability that information to be transferred is transferred to transfer target information;
Determining, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
Obtaining each piece of preset common information corresponding to the first information to be transferred, and appending each piece of preset common information to the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
Obtaining a first search volume returned by a search engine for the first information to be transferred, and a second search volume returned by the search engine for each piece of merged information;
Calculating the ratio of each second search volume to the first search volume to obtain second transition probabilities of the first information to be transferred being transferred to each piece of preset common information;
Updating the preset number of largest second transition probabilities into the transition probability matrix to obtain an updated transition probability matrix.
2. The transition probability matrix update method according to claim 1, characterized in that the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
3. An information identification method, characterized by comprising the following steps:
Performing feature analysis on an object to be identified to obtain characteristic information;
When it is determined from the characteristic information that an abnormal part exists, separating the abnormal part from the object to be identified to obtain a separated object to be identified;
Identifying the separated object to be identified to obtain an initial recognition result;
Obtaining first identification information corresponding to a preceding adjacent object in the initial recognition result, the preceding adjacent object being the sub-object in the object to be identified that immediately precedes the abnormal part;
Obtaining the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix update method according to claim 1 or 2, and determining second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information;
Combining the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
4. The information identification method according to claim 3, characterized in that the step of determining the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information comprises:
Determining, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information;
Determining the transfer target information corresponding to the predetermined number of largest transition probabilities as reference transfer target information;
Determining actual transfer information corresponding to the first identification information according to the reference transfer target information, and determining the actual transfer information as the second identification information.
5. The information identification method according to claim 4, characterized in that
the transfer target information corresponding to the largest transition probability is determined as the actual transfer information corresponding to the first identification information.
6. A transition probability matrix updating device, characterized by comprising:
An initial matrix acquisition module, configured to obtain a transition probability matrix of information, a first transition probability in the transition probability matrix being the probability that information to be transferred is transferred to transfer target information;
An information acquisition module, configured to determine, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
A merging module, configured to obtain each piece of preset common information corresponding to the first information to be transferred, and to append each piece of preset common information to the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
A search result acquisition module, configured to obtain a first search volume returned by a search engine for the first information to be transferred, and a second search volume returned by the search engine for each piece of merged information;
A probability calculation module, configured to calculate the ratio of each second search volume to the first search volume to obtain second transition probabilities of the first information to be transferred being transferred to each piece of preset common information;
A probability updating module, configured to update the preset number of largest second transition probabilities into the transition probability matrix to obtain an updated transition probability matrix.
7. The transition probability matrix updating device according to claim 6, characterized in that the first information to be transferred is information to be transferred whose number of associated pieces of transfer target information is less than a preset quantity.
8. An information recognition device, characterized by comprising the transition probability matrix updating device according to claim 6 or 7, and further comprising:
An initial analysis module, configured to perform feature analysis on an object to be identified to obtain characteristic information;
A separation module, configured to separate the abnormal part from the object to be identified when it is determined from the characteristic information that an abnormal part exists, obtaining a separated object to be identified;
An initial identification module, configured to identify the separated object to be identified to obtain an initial recognition result;
A sub-object acquisition module, configured to obtain first identification information corresponding to a preceding adjacent object in the initial recognition result, the preceding adjacent object being the sub-object in the object to be identified that immediately precedes the abnormal part;
A sub-identification module, configured to determine second identification information corresponding to the abnormal part according to the first identification information and the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix updating device;
A recognition result update module, configured to combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
9. The information recognition device according to claim 8, characterized in that the sub-identification module comprises:
A probability acquisition module, configured to determine, according to the updated transition probability matrix, each piece of transfer target information corresponding to the first identification information and the transition probability of each piece of transfer target information;
A reference information determining module, configured to determine the transfer target information corresponding to the predetermined number of largest transition probabilities as reference transfer target information;
A sub-recognition result determining module, configured to determine actual transfer information corresponding to the first identification information according to the reference transfer target information, and to determine the actual transfer information as the second identification information.
10. The information recognition device according to claim 8, characterized in that the sub-recognition result determining module determines the transfer target information corresponding to the largest transition probability as the actual transfer information corresponding to the first identification information.
11. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 5 when executing the computer program.
CN201710288225.5A 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment Active CN107247724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710288225.5A CN107247724B (en) 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment

Publications (2)

Publication Number Publication Date
CN107247724A CN107247724A (en) 2017-10-13
CN107247724B true CN107247724B (en) 2018-07-20

Family

ID=60016419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710288225.5A Active CN107247724B (en) 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment

Country Status (1)

Country Link
CN (1) CN107247724B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1173684A (en) * 1996-05-21 1998-02-18 株式会社日立制作所 Apparatus for recognizing input character strings by inference
JP2009252044A (en) * 2008-04-08 2009-10-29 Canon Inc Document management system, method, and program
CN101652773A (en) * 2007-03-30 2010-02-17 微软公司 Look-ahead document ranking system
CN102982330A (en) * 2012-11-21 2013-03-20 新浪网技术(中国)有限公司 Method and device recognizing characters in character images
CN106156142A (en) * 2015-04-13 2016-11-23 深圳市腾讯计算机***有限公司 The processing method of a kind of text cluster, server and system

Also Published As

Publication number Publication date
CN107247724A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN108121816B (en) Picture classification method and device, storage medium and electronic equipment
EP3493101A1 (en) Image recognition method, terminal, and nonvolatile storage medium
US10748007B2 (en) Identifying objects in an image
CN107885430B (en) Audio playing method and device, storage medium and electronic equipment
CN111222397B (en) Drawing recognition method and device and robot
CN105046186A (en) Two-dimensional code recognition method and device
CN107368550B (en) Information acquisition method, device, medium, electronic device, server and system
CN110032510B (en) Application testing method and device
CN109547393B (en) Malicious number identification method, device, equipment and storage medium
CN105653171A (en) Fingerprint identification based terminal control method, terminal control apparatus and terminal
US20030012440A1 (en) Form recognition system, form recognition method, program and storage medium
CN111866392A (en) Shooting prompting method and device, storage medium and electronic equipment
CN105069013A (en) Control method and device for providing input interface in search interface
CN104615663A (en) File sorting method and device and terminal
CN104994236A (en) Information processing method and device
CN112801235A (en) Model training method, prediction device, re-recognition model and electronic equipment
CN109325539A (en) Insulator falls crosstalk detecting method and device
CN113255566B (en) Form image recognition method and device
CN107247724B (en) Transition probability matrix update, information identifying method and device, computer equipment
KR101483611B1 (en) Method and Terminal for Extracting a Object from Image
CN105528428A (en) Image display method and terminal
CN110929057A (en) Image processing method, device and system, storage medium and electronic device
CN111178349A (en) Image identification method, device, equipment and storage medium
CN110598027A (en) Image processing effect display method and device, electronic equipment and storage medium
CN112183149B (en) Graphic code processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant