CN107247724A - Transition probability matrix update method and device, information recognition method and device, and computer equipment - Google Patents

Info

Publication number
CN107247724A
Authority
CN
China
Prior art keywords
information
transition probability
transferred
identified
probability matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710288225.5A
Other languages
Chinese (zh)
Other versions
CN107247724B (en
Inventor
许利宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201710288225.5A priority Critical patent/CN107247724B/en
Publication of CN107247724A publication Critical patent/CN107247724A/en
Application granted granted Critical
Publication of CN107247724B publication Critical patent/CN107247724B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2468 Fuzzy queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A transition probability matrix update method and device, an information recognition method and device, and computer equipment. According to an acquired transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated is determined. Each piece of preset common information corresponding to the first information to be transferred is acquired and appended after the first information to be transferred, yielding each piece of merged information corresponding to the first information to be transferred. A first search volume obtained by searching for the first information to be transferred is acquired, together with the second search volumes obtained by searching for each piece of merged information. The ratio of each second search volume to the first search volume is calculated, giving the second transition probabilities with which the first information to be transferred transfers to each piece of preset common information. The preset number of second transition probabilities with the largest values are updated into the transition probability matrix, which improves the accuracy of the matrix; information recognition performed with the updated matrix is correspondingly more accurate.

Description

Transition probability matrix update method and device, information recognition method and device, and computer equipment
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a transition probability matrix update method and device, an information recognition method and device, and computer equipment.
Background
In information recognition fields such as character and audio recognition, the accuracy of the result varies with the recognition method used. At present, a common character recognition method scans an image containing characters and recognizes the characters from their features in the image; a common audio recognition method recognizes audio from its sound frequencies.
However, when the image is blurred so that the characters it contains are unclear, these methods often fail to recognize the characters, or the recognition result has low accuracy. Likewise, when audio is disturbed by noise, the accuracy of audio recognition is low. Blurred characters or disturbed audio can be recognized using a transition probability matrix of the information, but the samples are sparse, that is, the information in the samples cannot cover much of the information in everyday use. The resulting transition probability matrix is therefore inaccurate, and so are the results subsequently recognized with it.
Summary of the invention
In view of this, to address the problem that sparse information makes the transition probability matrix inaccurate and thereby makes subsequent information recognition inaccurate, a transition probability matrix update method and device, an information recognition method and device, and computer equipment are provided.
Accordingly, the embodiments adopt the following technical solutions:
A transition probability matrix update method, comprising the following steps:
Acquiring a transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
Determining, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
Acquiring each piece of preset common information corresponding to the first information to be transferred, and appending each piece of preset common information after the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
Acquiring a first search volume obtained by searching for the first information to be transferred through a search engine, and the second search volumes obtained by searching for each piece of merged information through the search engine;
Calculating the ratio of each second search volume to the first search volume to obtain the second transition probabilities with which the first information to be transferred transfers to each piece of preset common information;
Updating the preset number of second transition probabilities with the largest values into the transition probability matrix to obtain the updated transition probability matrix.
The present invention also provides a transition probability matrix update device, comprising:
An initial matrix acquisition module, configured to acquire a transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
An information acquisition module, configured to determine, according to the transition probability matrix, each piece of first information to be transferred whose probabilities are to be updated;
A merging module, configured to acquire each piece of preset common information corresponding to the first information to be transferred, and to append each piece of preset common information after the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred;
A search result acquisition module, configured to acquire a first search volume obtained by searching for the first information to be transferred through a search engine, and the second search volumes obtained by searching for each piece of merged information through the search engine;
A probability calculation module, configured to calculate the ratio of each second search volume to the first search volume to obtain the second transition probabilities with which the first information to be transferred transfers to each piece of preset common information;
A probability update module, configured to update the preset number of second transition probabilities with the largest values into the transition probability matrix to obtain the updated transition probability matrix.
The present invention also provides an information recognition method, comprising the following steps:
Performing feature analysis on an object to be recognized to obtain feature information;
When it is determined from the feature information that an abnormal part exists, separating the abnormal part from the object to be recognized to obtain a separated object to be recognized;
Recognizing the separated object to be recognized to obtain an initial recognition result;
Acquiring, from the initial recognition result, first recognition information corresponding to a previous adjacent object, where the previous adjacent object is the sub-object in the object to be recognized that immediately precedes the abnormal part;
Acquiring the updated transition probability matrix that corresponds to the type of the object to be recognized and was determined by the above transition probability matrix update method, and determining second recognition information corresponding to the abnormal part according to the updated transition probability matrix and the first recognition information;
Combining the initial recognition result with the second recognition information to obtain the information recognition result of the object to be recognized.
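The last three steps above can be illustrated with a minimal Python sketch. The text does not say how the second recognition information is derived from the matrix, so choosing the target with the highest transition probability is an assumption, as are the function names `most_likely_next` and `fill_abnormal_part`, the nested-dictionary matrix layout, and the use of `None` to mark the separated abnormal part.

```python
def most_likely_next(matrix, previous_item):
    """Return the target with the highest transition probability after
    previous_item, or None if the matrix has no row for it.
    (Picking the maximum is an assumption, not stated in the text.)"""
    row = matrix.get(previous_item)
    if not row:
        return None
    return max(row, key=row.get)


def fill_abnormal_part(initial_result, abnormal_index, matrix):
    """Combine the initial recognition result with the item inferred for
    the abnormal part, which is marked by None at abnormal_index."""
    result = list(initial_result)
    if abnormal_index > 0:
        # First recognition information: the item just before the abnormal part.
        previous_item = result[abnormal_index - 1]
        result[abnormal_index] = most_likely_next(matrix, previous_item)
    return result


# Hypothetical updated matrix and initial result, for illustration only.
updated_matrix = {"电": {"视": 0.4, "话": 0.3}}
print(fill_abnormal_part(["看", "电", None], 2, updated_matrix))
```

If the abnormal part is the first sub-object, there is no previous adjacent object, and this sketch leaves the placeholder unchanged.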
The present invention also provides an information recognition device, comprising the above transition probability matrix update device and further comprising:
An initial analysis module, configured to perform feature analysis on an object to be recognized to obtain feature information;
A separation module, configured to separate the abnormal part from the object to be recognized when it is determined from the feature information that an abnormal part exists, obtaining a separated object to be recognized;
An initial recognition module, configured to recognize the separated object to be recognized to obtain an initial recognition result;
A sub-object acquisition module, configured to acquire, from the initial recognition result, first recognition information corresponding to a previous adjacent object, where the previous adjacent object is the sub-object in the object to be recognized that immediately precedes the abnormal part;
A sub-recognition module, configured to determine second recognition information corresponding to the abnormal part according to the first recognition information and the updated transition probability matrix that corresponds to the type of the object to be recognized and was determined by the transition probability matrix update device;
A recognition result update module, configured to combine the initial recognition result with the second recognition information to obtain the information recognition result of the object to be recognized.
The present invention also provides computer equipment comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements any of the above transition probability matrix update methods or information recognition methods.
With the above transition probability matrix update method and device and computer equipment, even if the information samples are sparse and the corresponding transition probability matrix is inaccurate, the transition probabilities given by the preset number of largest ratios between the second search volume of the merged information and the first search volume of the first information to be transferred can be updated into the transition probability matrix. In other words, among the preset common characters, the transition probabilities of those characters whose merged information has the larger search volumes are added to the transition probability matrix. This updates the matrix and improves its accuracy, so subsequent recognition based on the matrix is more accurate.
The above information recognition method and device first find the abnormal part, separate it from the object to be recognized, recognize the separated object to obtain an initial recognition result, and then recognize the abnormal part according to the initial recognition result and the transition probability matrix. The separated object and the abnormal part are thus recognized separately. There is no need to recognize the whole object together with its abnormal part, recognize the abnormal part again, and then update the whole recognition result, which reduces recognition time and improves recognition efficiency. Moreover, when the object to be recognized contains an abnormality, the abnormal part is recognized using the initial recognition result and the updated transition probability matrix determined by the above update method. Because the updated matrix is more accurate, recognition accuracy improves.
Brief description of the drawings
Fig. 1 is a schematic diagram of the working environment of one embodiment of the invention;
Fig. 2 is a schematic diagram of the structure of the server in one embodiment;
Fig. 3 is a flowchart of the transition probability matrix update method of one embodiment;
Fig. 4 is a flowchart of the information recognition method of one embodiment;
Fig. 5 is a sub-flowchart of the information recognition method of another embodiment;
Fig. 6 is a schematic diagram of the interface of the transition probability matrix of one embodiment;
Fig. 7 is a module diagram of the transition probability matrix update device of one embodiment;
Fig. 8 is a module diagram of the information recognition device of one embodiment;
Fig. 9 is a sub-module diagram of the information recognition device of another embodiment.
Detailed description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the embodiments described here only explain the invention and do not limit its scope of protection.
Fig. 1 shows a schematic diagram of the working environment in one embodiment of the invention. As shown in Fig. 1, the working environment involves a terminal 110, a server 120 and a network 130; terminal 110 and server 120 can communicate through network 130. Terminal 110 can access the corresponding server 120 through network 130 to request an information recognition result, and server 120 can push that result to terminal 110, where the user of terminal 110 views it. Terminal 110 can be any device capable of intelligent input and output, for example a desktop computer or a mobile terminal; the mobile terminal can be a smartphone, a tablet computer, an in-vehicle computer, a wearable smart device, or the like. Server 120 can be the server hosting the platform that provides information recognition results, and there can be one or more servers 120. This embodiment concerns the schemes by which server 120 updates the transition probability matrix and performs information recognition to obtain the information recognition result.
The internal structure of server 120 in one embodiment is shown in Fig. 2. Server 120 includes a processor, a storage medium, a network interface and an internal memory connected through a system bus. The processor of server 120 provides computing and control capabilities and supports the operation of the whole server. The storage medium of server 120 stores an operating system, a local database, and the computer applications of a transition probability matrix update device and an information recognition device. When executed by the processor, the application of the transition probability matrix update device implements a transition probability matrix update method, and the application of the information recognition device implements an information recognition method. The internal memory of server 120 provides a running environment for the transition probability matrix update device in the non-volatile storage medium. The internal memory can store computer-readable instructions which, when executed by the processor, cause the processor to perform a transition probability matrix update method and an information recognition method. The network interface of server 120 is used to connect to and communicate with network 130.
As shown in Fig. 3, the transition probability matrix update method of one embodiment includes steps S310 to S360:
S310: Acquire the transition probability matrix of information.
The information can be characters or sound frequencies. When the information is characters, the transition probability matrix holds character transition probabilities, the information to be transferred is a character to be transferred, and the transfer target information is a transfer target character. When the information is sound frequencies, the transition probability matrix holds sound-frequency transition probabilities, the information to be transferred is a sound frequency to be transferred, and the transfer target information is a transfer target sound frequency. The transition probability matrix of the information can be calculated from information samples.
The elements of the transition probability matrix are transition probabilities. A first transition probability in the matrix is the probability that information to be transferred transfers to transfer target information; to distinguish it from the second transition probabilities that are later updated into the matrix, the transition probabilities already in the matrix are called first transition probabilities here. The probability that information to be transferred transfers to transfer target information is the probability that the transfer target information immediately follows the information to be transferred. Taking characters as an example, if the character to be transferred is "电" ("electric"), the transfer target characters include "视" and "话" (forming "电视", "TV", and "电话", "phone"). The corresponding transition probabilities are the probability that "电" transfers to "视" (the probability that "电" is followed by "视") and the probability that "电" transfers to "话" (the probability that "电" is followed by "话"). Taking sound frequencies as an example, the sound frequency to be transferred is the frequency corresponding to "电", and the transfer target sound frequencies include those corresponding to "视" and "话". The corresponding transition probabilities are the probability that the sound frequency of "电" is followed by the sound frequency of "视" and the probability that the sound frequency of "电" is followed by the sound frequency of "话".
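As an illustration of how such a matrix might be calculated from information samples, here is a minimal Python sketch. The text only states that the matrix can be calculated from samples; estimating each first transition probability as a relative frequency of adjacent pairs is an assumption, as are the function name `build_transition_matrix` and the nested-dictionary layout.

```python
from collections import Counter, defaultdict


def build_transition_matrix(samples):
    """Estimate first transition probabilities from sample sequences.

    Returns {source: {target: P(target immediately follows source)}}.
    Relative-frequency estimation is an assumption of this sketch.
    """
    pair_counts = defaultdict(Counter)
    for sample in samples:
        # Count every adjacent (source, target) pair in the sample.
        for source, target in zip(sample, sample[1:]):
            pair_counts[source][target] += 1

    matrix = {}
    for source, targets in pair_counts.items():
        total = sum(targets.values())
        matrix[source] = {t: n / total for t, n in targets.items()}
    return matrix


# Tiny made-up sample set: "电视" appears twice, "电话" once.
matrix = build_transition_matrix(["电视", "电话", "电视"])
print(matrix["电"])
```

With so few samples, the row for "电" contains only two targets, which is the kind of sparsity the update method is meant to address.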
S320: According to the transition probability matrix, determine each piece of first information to be transferred whose probabilities are to be updated.
Because the information is sparse, that is, its quantity is small, the transition probability matrix may also be sparse, which may lower the accuracy of subsequent recognition. The transition probability matrix therefore needs to be updated to increase the amount of information it holds and so add transition probabilities. To do this, each piece of first information to be transferred whose probabilities are to be updated is first determined from the transition probability matrix; that is, the transition probabilities corresponding to the first information to be transferred need to be updated. Taking characters as an example, suppose the matrix contains a transition probability for the character "water layer" being followed by the character "shallow", and no other character follows it. The characters following "water layer" may then be sparse, and transition probabilities from "water layer" to other characters such as "depth" need to be added. The character "water layer" can therefore be taken as first information to be transferred, and transition probabilities from it to other characters are added, which updates the transition probabilities of "water layer".
S330: Acquire each piece of preset common information corresponding to the first information to be transferred, and append each piece of preset common information after the first information to be transferred to obtain each piece of merged information corresponding to the first information to be transferred.
The preset common information correspondingly includes preset common characters or preset common sound frequencies. In general, some information is used frequently while uncommon information is used rarely, so using preset common information as the basis for merging with the first information to be transferred covers, to a large extent, the information people commonly use. For example, if the first information to be transferred is the character "电" ("electric") and the preset common characters include "视", "话", "流" and "压", appending each of these common characters after "电" gives the merged information "电视" ("TV"), "电话" ("phone"), "电流" ("electric current") and "电压" ("voltage").
S340: Acquire the first search volume obtained by searching for the first information to be transferred through a search engine, and the second search volumes obtained by searching for each piece of merged information through the search engine.
After the merged information is obtained, the first search volume for the first information to be transferred and the second search volume for each piece of merged information can be acquired through the search engine. In this embodiment, the search engine can be called to search for the first information to be transferred and each piece of merged information, giving the first search volume and each second search volume. Alternatively, the first information to be transferred and each piece of merged information can be sent to the search engine, and the search results it returns are then used to obtain the first search volume and each second search volume.
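Steps S330 and S340 can be sketched in Python as follows. No particular search engine or API is named in the text, so the search is represented by a hypothetical callable `search_volume` that maps a query string to a count; the function names and the made-up counts in `FAKE_INDEX` are also assumptions for illustration.

```python
def merged_candidates(source, common_items):
    """S330: append each preset common item after the source."""
    return {item: source + item for item in common_items}


def collect_search_volumes(source, common_items, search_volume):
    """S340: get the first search volume for the source and one second
    search volume per merged item.

    search_volume is any callable mapping a query string to a count;
    it stands in for a real search engine call.
    """
    first_volume = search_volume(source)
    second_volumes = {
        item: search_volume(merged)
        for item, merged in merged_candidates(source, common_items).items()
    }
    return first_volume, second_volumes


# Stand-in for a search engine: made-up counts, for illustration only.
FAKE_INDEX = {"电": 1000, "电视": 400, "电话": 300, "电流": 50, "电压": 40}

first, seconds = collect_search_volumes(
    "电", ["视", "话", "流", "压"], lambda query: FAKE_INDEX.get(query, 0)
)
print(first, seconds)
```

Passing the search function in as a parameter keeps the sketch independent of how the search volumes are actually obtained.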
S350: Calculate the ratio of each second search volume to the first search volume to obtain the second transition probabilities with which the first information to be transferred transfers to each piece of preset common information.
S360: Update the preset number of second transition probabilities with the largest values into the transition probability matrix to obtain the updated transition probability matrix.
The first search volume is fixed, whereas the second search volume varies with the merged information. The larger the second search volume, the more commonly used the merged information; the larger the ratio of the second search volume to the first, the more likely the first information to be transferred is followed by the corresponding preset common information, that is, the more likely it transfers to that preset common information. So that the transition probabilities in the updated matrix are closer to those of commonly used information, the ratio of each second search volume to the first search volume is calculated to obtain the second transition probabilities, and the preset number of second transition probabilities with the largest values are updated into the transition probability matrix. For example, suppose the first information to be transferred is the character "电" ("electric") and the ratio of the second search volume of the merged information "电视" ("TV") to the first search volume of "电" is greater than the corresponding ratio for "电话" ("phone"), indicating that "电视" is used more often. If the preset number is 1, the ratio for "电视" is the one chosen and updated into the transition probability matrix.
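Steps S350 and S360 amount to a ratio followed by a top-k selection, sketched below in Python. The text does not say whether an existing entry for the same target is overwritten or whether the row is renormalized afterwards; this sketch overwrites and does not renormalize, and both choices are assumptions, as are the function names and the nested-dictionary matrix layout.

```python
import heapq


def second_transition_probabilities(first_volume, second_volumes):
    """S350: ratio of each second search volume to the first search volume."""
    if first_volume <= 0:
        return {}  # no usable denominator, so nothing to update
    return {item: vol / first_volume for item, vol in second_volumes.items()}


def update_matrix(matrix, source, second_probs, preset_number):
    """S360: write the preset_number largest second transition
    probabilities into the row of `source`.

    Existing entries for the same target are overwritten and the row is
    not renormalized (assumptions; the text leaves both open).
    """
    top = heapq.nlargest(preset_number, second_probs.items(), key=lambda kv: kv[1])
    row = matrix.setdefault(source, {})
    for item, prob in top:
        row[item] = prob
    return matrix


# Made-up search volumes, for illustration only.
probs = second_transition_probabilities(1000, {"视": 400, "话": 300, "流": 50})
matrix = update_matrix({"电": {"视": 1.0}}, "电", probs, preset_number=2)
print(matrix)
```

With a preset number of 2, only the entries for "视" and "话" are written; "流" has the smallest ratio and is left out.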
With the above transition probability matrix update method, even if the information samples are sparse and the corresponding transition probability matrix is inaccurate, the transition probabilities given by the preset number of largest ratios between the second search volume of the merged information and the first search volume of the first information to be transferred can be updated into the transition probability matrix. In other words, among the preset common characters, the transition probabilities of those characters whose merged information has the larger search volumes are added to the transition probability matrix. This updates the matrix and improves its accuracy, so subsequent recognition based on the matrix is more accurate.
In one embodiment, the first information to be transferred is information to be transferred for which the number of associated transfer target information is less than a preset quantity.
That is, when each piece of first information to be transferred is determined from the transition probability matrix, the number of transfer target information associated with each piece of information to be transferred is checked against the preset quantity. If the number is less than the preset quantity, the transfer targets following that information in the matrix are sparse, that is, few transfer targets follow it, and the information is determined to be first information to be transferred whose probabilities are to be updated. For example, with a preset quantity of 30, if the only transfer target following the information to be transferred "water layer" is "shallow", the number of transfer targets is 1, which is less than 30, so "water layer" is taken as first information to be transferred.
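The selection rule in this embodiment is a simple count comparison, sketched below in Python. The nested-dictionary matrix layout and the function name `select_sources_to_update` are assumptions for illustration; the example reuses the "water layer" case and the preset quantity of 30 from the text.

```python
def select_sources_to_update(matrix, preset_quantity):
    """Return every source whose number of associated transfer targets
    is less than preset_quantity."""
    return [
        source
        for source, targets in matrix.items()
        if len(targets) < preset_quantity
    ]


# One sparse row (a single target) and one row with 30 made-up targets.
matrix = {
    "water layer": {"shallow": 1.0},
    "dense source": {f"target{i}": 1 / 30 for i in range(30)},
}
print(select_sources_to_update(matrix, preset_quantity=30))
```

Only "water layer" is selected, because the other row already has 30 targets, which is not less than the preset quantity.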
As shown in Fig. 4, an information recognition method of one embodiment includes steps S410 to S460:
S410: Perform feature analysis on the object to be recognized to obtain feature information.
The object to be recognized can include a character image to be recognized or audio to be recognized. Each object to be recognized has associated feature information, which correspondingly includes image pixel information or voiceprint information: a character image to be recognized has associated image pixel information, and audio to be recognized has associated voiceprint information. Feature analysis of the object to be recognized yields its feature information.
S420:When determining to have unusual part according to characteristic information, unusual part is separated from object to be identified, obtained Object to be identified after must separating.
Abnormality detection is performed on the object to be identified according to the feature information, and when an abnormal part is found, it is separated from the object to be identified to obtain the separated object to be identified. When a character image is identified, abnormality detection is based on image pixel values: if the area of a region of the character image whose pixel values are continuously 1 is greater than a preset area, an abnormality is considered to exist, and the abnormal part is that region of continuous 1-valued pixels. When audio is identified, abnormality detection is based on voiceprint information: if the voiceprint information within the audio is inconsistent, the audio is considered abnormal. Taking a character image as an example, when a shadow in the image covers the characters in some region, the pixel values of the shadowed region are all 1, which is abnormal. The shadowed region can thus be detected as the abnormal part from the pixel values, and removing it from the character image yields the separated character image, which no longer contains the shadow. Taking audio as an example, when other noise covers a user's voice while the user is speaking, the corresponding audio contains two different voiceprints. Because noise generally lasts a shorter time, the voiceprint with the shorter duration is determined to be the abnormal part and is separated from the voiceprint information of the audio, yielding the separated audio.
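For the character-image case, the pixel-based check can be sketched as below. The text does not say how "continuous" is defined or how separation is performed, so this sketch assumes 4-connected regions in a binary image and models separation as clearing the abnormal pixels to 0; the function names are hypothetical.

```python
from collections import deque


def find_abnormal_regions(image, preset_area):
    """Return 4-connected regions of 1-valued pixels whose area exceeds
    preset_area, each as a list of (row, col) coordinates."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill over neighbouring 1-valued pixels.
            queue = deque([(r, c)])
            seen[r][c] = True
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > preset_area:
                regions.append(region)
    return regions


def separate(image, regions):
    """Return a copy of image with the abnormal pixels cleared to 0."""
    cleaned = [row[:] for row in image]
    for region in regions:
        for y, x in region:
            cleaned[y][x] = 0
    return cleaned


image = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
regions = find_abnormal_regions(image, preset_area=4)
cleaned = separate(image, regions)
```

Here the 2x3 block of 1-valued pixels exceeds the assumed preset area of 4 and is treated as the shadow, while the isolated pixel in the top-left corner is kept.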
S430: Identify the separated object to be identified to obtain an initial recognition result.
The separated object to be identified contains no abnormal part and can therefore be identified. Identifying a separated character image yields an initial character recognition result; identifying separated audio yields an initial audio recognition result.
S440: Obtain, from the initial recognition result, the first identification information corresponding to the preceding adjacent object, where the preceding adjacent object is the sub-object in the object to be identified that immediately precedes, and is adjacent to, the abnormal part.
The separated object to be identified is identified to obtain the initial recognition result. The initial recognition result contains the recognition result of the sub-object that immediately precedes the abnormal part in the object to be identified, namely the first identification information, which is obtained from the initial recognition result. For example, the first identification information of the sub-object immediately preceding the abnormal part is the character "electricity"; that is, the character "electricity" is immediately followed by the abnormal part. As another example, when the object to be identified is audio, the initial audio recognition result includes sound frequencies, and the first identification information of the word immediately preceding the abnormal part is the sound frequency corresponding to "electricity", which is obtained from the initial recognition result.
S450: Obtain the updated transition probability matrix that corresponds to the type of the object to be identified and was determined by the above transition probability matrix update method, and determine the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information.
The type of the object to be identified may be a character image or audio, and different types correspond to different transition probability matrices. The above transition probability matrix update method updates the transition probability matrix of information, where the information may be characters or sound frequencies; that is, the transition probability matrix may be of different types (for example, a transition probability matrix of characters or a transition probability matrix of sound frequencies), and the update method can determine an updated transition probability matrix corresponding to the type of the information. In this embodiment, the object to be identified may be a character image or audio, and the update method can correspondingly update the transition probability matrix of characters or of sound frequencies. The updated transition probability matrix corresponding to the type of the object to be identified, as determined by the update method, can therefore be obtained directly, and the second identification information corresponding to the abnormal part can be determined from it together with the first identification information, which completes the identification of the abnormal part. For example, if the first identification information is "electricity", the transition probability matrix contains the probabilities of the character "electricity" being followed by each transfer target character, so the second identification information corresponding to the abnormal part, i.e., the recognition result of the abnormal part, can be determined from the updated transition probability matrix, which improves identification accuracy. In another implementation, the above transition probability matrix update method may first be used to determine the updated transition probability matrix, and the second identification information corresponding to the abnormal part is then determined from the updated matrix and the first identification information.
S460: Combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
Because the initial recognition result is obtained by identifying the separated object to be identified, it does not include a recognition result for the abnormal part; that is, the identification of the object to be identified is incomplete. To achieve complete identification, the initial recognition result is therefore combined with the second identification information to obtain the information recognition result of the object to be identified.
In the above information identification method, the abnormal part is first found and separated from the object to be identified; the separated object to be identified is then identified to obtain the initial recognition result, and the abnormal part is identified according to the initial recognition result and the transition probability matrix. The separated object and the abnormal part are thus identified separately. There is no need to first identify the entire object including the abnormal part, then identify the abnormal part, and then update the overall recognition result, which reduces identification time and improves efficiency. Moreover, when the object to be identified contains an abnormality, the abnormal part is identified using the initial recognition result and the updated transition probability matrix determined by the above update method. Because the updated matrix is more accurate, identification based on it is more accurate as well.
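Steps S440 to S460 can be sketched for the character case as follows. The representation is an assumption made for illustration: the initial recognition result is a list of recognized items, `gap_index` marks where the abnormal part was removed, the matrix is a nested dictionary, and the probabilities and the following item "machine" are made up.

```python
def fill_abnormal_part(initial_result, gap_index, updated_matrix):
    """Splice a predicted item into the gap left by the abnormal part.

    Returns a new list; the input list is not modified. If there is no
    preceding adjacent item or no matrix row for it, nothing is inserted.
    """
    if gap_index < 1:
        return list(initial_result)
    # S440: first identification information = item just before the gap.
    first_info = initial_result[gap_index - 1]
    row = updated_matrix.get(first_info, {})
    if not row:
        return list(initial_result)
    # S450: take the transfer target with the largest transition probability.
    second_info = max(row, key=row.get)
    # S460: combine the initial result with the second identification information.
    return initial_result[:gap_index] + [second_info] + initial_result[gap_index:]


# Hypothetical probabilities for the row of "electricity".
updated_matrix = {"electricity": {"depending on": 0.6, "words": 0.4}}
initial_result = ["electricity", "machine"]
result = fill_abnormal_part(initial_result, 1, updated_matrix)
print(result)
```

Choosing the single most probable target is only one of the selection strategies the text allows; it is used here to keep the sketch short.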
As shown in Fig. 5, in one embodiment, the step of determining the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information includes:
S451: Determine, according to the updated transition probability matrix, each transfer target information item corresponding to the first identification information and the transition probability of each transfer target information item.
After the first identification information is determined, each transfer target information item corresponding to it and the transition probability of each are determined according to the updated transition probability matrix. For example, when the first identification information is the character "electricity", the corresponding transfer target information can be determined to include "depending on", "words", and so on, together with the respective transition probabilities; that is, the transition probability from "electricity" to "depending on" and the transition probability from "electricity" to "words" can both be determined.
S452: Determine the transfer target information corresponding to the preset number of largest transition probabilities as reference transfer target information.
However, different transfer target information items have different transition probabilities with respect to the first identification information. The transfer target information corresponding to the preset number of largest transition probabilities is therefore determined as the reference transfer target information, to improve accuracy.
S453: Determine the actual transfer information corresponding to the first identification information according to the reference transfer target information, and determine the actual transfer information as the second identification information.
Specifically, the actual transfer information corresponding to the first identification information is selected from the reference transfer target information and determined as the second identification information. The reference transfer target information with the largest transition probability may be selected as the actual transfer information. It will be understood that the actual transfer information may instead be determined according to the number of times each reference transfer target information item occurs in the initial recognition result; specifically, the item that occurs most often in the initial recognition result may be selected as the actual transfer information corresponding to the first identification information.
In one embodiment, the transfer target information corresponding to the largest transition probability is determined as the actual transfer information corresponding to the first identification information.
A larger transition probability means that the information to be transferred is more likely to transfer to the transfer target information, i.e., that the transfer target information more commonly follows the information to be transferred. The transfer target information corresponding to the largest transition probability is therefore determined as the actual transfer information corresponding to the first identification information.
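Steps S451 to S453, including both selection strategies mentioned above, can be sketched as follows. The function names, the row layout, and the probabilities are assumptions made for the example; the third target "other" is made up.

```python
from collections import Counter


def reference_targets(row, preset_number):
    """S452: the preset_number transfer targets with the largest probabilities."""
    return sorted(row, key=row.get, reverse=True)[:preset_number]


def pick_by_probability(row, refs):
    """S453, first strategy: the reference target with the largest probability."""
    return max(refs, key=row.get)


def pick_by_frequency(refs, initial_result):
    """S453, alternative strategy: the reference target that occurs most
    often in the initial recognition result."""
    counts = Counter(initial_result)
    return max(refs, key=lambda target: counts[target])


# S451: row of the updated matrix for the first identification information.
row = {"depending on": 0.5, "words": 0.3, "other": 0.2}
refs = reference_targets(row, preset_number=2)

by_probability = pick_by_probability(row, refs)
by_frequency = pick_by_frequency(refs, ["words", "electricity", "words"])
```

The two strategies can disagree: with these made-up values the probability-based choice is "depending on", while the frequency-based choice is "words" because it already appears twice in the initial recognition result.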
The above transition probability matrix update method is illustrated below with a specific embodiment in which the information is characters, the object to be identified is a character image, and the transition probability matrix of the information is a transition probability matrix of characters.
When part of a scanned character image is blurred (the abnormal part), OCR (optical character recognition) cannot identify that part, or cannot identify it accurately. The context of the abnormal part must therefore be used to estimate it so that it can be identified better. However, because of the "text sparsity" of the character samples, the transition probability matrix computed for estimating the abnormal part contains errors, which makes the matrix inaccurate and the subsequent identification inaccurate.
For example, 13 million characters obtained by crawling or similar means (for example, text from publications such as newspapers) are used as character samples to compute the transition probability matrix of characters; the resulting transition probability matrix is shown in Fig. 6.
In that matrix, the transition probability of the character "water layer" being followed by "shallow" is 1, and the transition probability of "Chi Yuan" being followed by "economy" is 1. Consequently, if the part after "water layer" is blurred and cannot be identified, the transition probability matrix allows only "shallow" to follow, so the recognition result of the blurred part can only be "shallow", and identification accuracy is low. "Water layer" and "Chi Yuan" are each followed by only one character, which indicates that the transition probability matrix is sparse. In general, "water layer" cannot be followed only by "shallow", i.e., the transition probability cannot really be 1. The characters to be transferred whose number of following transfer target characters is less than the preset number, of which "water layer" and "Chi Yuan" are two, need to be supplemented, so the transition probability matrix must be optimized. The optimization scheme is as follows. When the OCR system finds that the transition probability matrix is sparse, it obtains "water layer" and the merged characters formed by appending each preset common character to "water layer". It obtains the first search volume returned by a search engine for "water layer" and the second search volumes returned by the search engine for each merged item. It computes the ratio of each second search volume to the first search volume to obtain the second transition probabilities of the first information to be transferred transferring to each preset common information item. The preset number of largest second transition probabilities are then updated into the transition probability matrix to obtain the updated transition probability matrix. The OCR system thereby learns that the characters following "water layer" are not only "shallow" but may include others, such as "deep", which greatly increases the accuracy of OCR identification.
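The optimization scheme can be sketched as follows. The search engine is replaced by a hypothetical `search_volume` callable backed by made-up numbers, since the text names no specific search API. Two further points are assumptions of this sketch, not statements of the patent: merging is modelled as plain string concatenation, and "updating into the matrix" is modelled as overwriting or adding entries in the row without renormalizing it.

```python
def update_row(matrix, first_info, common_infos, search_volume, top_n):
    """Update the row of first_info with search-volume ratios.

    second transition probability = volume(first_info + common) / volume(first_info)
    Only the top_n largest ratios are written into the matrix.
    """
    first_volume = search_volume(first_info)
    if first_volume <= 0:
        return matrix  # no usable denominator; leave the row unchanged
    second = {
        common: search_volume(first_info + common) / first_volume
        for common in common_infos
    }
    best = sorted(second, key=second.get, reverse=True)[:top_n]
    row = matrix.setdefault(first_info, {})
    for common in best:
        row[common] = second[common]
    return matrix


# Made-up search volumes standing in for search-engine results.
fake_volumes = {
    "water layer": 1000,
    "water layershallow": 400,
    "water layerdeep": 300,
    "water layerhot": 50,
}

matrix = {"water layer": {"shallow": 1.0}}
update_row(
    matrix,
    first_info="water layer",
    common_infos=["shallow", "deep", "hot"],
    search_volume=lambda text: fake_volumes.get(text, 0),
    top_n=2,
)
print(matrix["water layer"])
```

After the update, the row for "water layer" contains both "shallow" and "deep" with probabilities below 1, so "shallow" is no longer the only possible continuation.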
Referring to Fig. 7, based on the same idea as the above transition probability matrix update method, an embodiment provides a transition probability matrix update device, including:
Initial matrix acquisition module 710, configured to obtain the transition probability matrix of information, where a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
Information acquisition module 720, configured to determine, according to the transition probability matrix, each first information to be transferred that is intended for probability updating;
Merging module 730, configured to obtain each preset common information item corresponding to the first information to be transferred, and to append each preset common information item to the first information to be transferred to obtain each merged information item corresponding to the first information to be transferred;
Search result acquisition module 740, configured to obtain the first search volume returned by a search engine for the first information to be transferred, and each second search volume returned by the search engine for each merged information item;
Probability calculation module 750, configured to compute the ratio of each second search volume to the first search volume to obtain each second transition probability of the first information to be transferred transferring to each preset common information item;
Probability update module 760, configured to update the preset number of largest second transition probabilities into the transition probability matrix to obtain the updated transition probability matrix.
In one embodiment, the first information to be transferred is information to be transferred whose number of associated transfer target information items is less than the preset number.
Referring to Fig. 8, based on the same idea as the above information identification method, an embodiment provides an information identification device, including: an initial analysis module 810, a separation module 820, an initial identification module 830, a sub-object acquisition module 840, a sub-identification module 850, a recognition result update module 860, and the above transition probability matrix update device.
Initial analysis module 810, configured to perform feature analysis on the object to be identified to obtain feature information;
Separation module 820, configured to separate the abnormal part from the object to be identified when it is determined from the feature information that an abnormal part exists, to obtain the separated object to be identified;
Initial identification module 830, configured to identify the separated object to be identified to obtain an initial recognition result;
Sub-object acquisition module 840, configured to obtain, from the initial recognition result, the first identification information corresponding to the preceding adjacent object, where the preceding adjacent object is the sub-object in the object to be identified that immediately precedes, and is adjacent to, the abnormal part;
Sub-identification module 850, configured to determine the second identification information corresponding to the abnormal part according to the first identification information and the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix update device;
Recognition result update module 860, configured to combine the initial recognition result with the second identification information to obtain the information recognition result of the object to be identified.
Referring to Fig. 9, in one embodiment, the sub-identification module 850 may include:
Probability acquisition module 851, configured to determine, according to the updated transition probability matrix, each transfer target information item corresponding to the first identification information and the transition probability of each transfer target information item;
Reference information determination module 852, configured to determine the transfer target information corresponding to the preset number of largest transition probabilities as reference transfer target information;
Sub-recognition result determination module 853, configured to determine the actual transfer information corresponding to the first identification information according to the reference transfer target information, and to determine the actual transfer information as the second identification information.
In one embodiment, the sub-recognition result determination module 853 determines the transfer target information corresponding to the largest transition probability as the actual transfer information corresponding to the first identification information.
An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the above transition probability matrix update method or information identification method when executing the computer program.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a non-volatile computer-readable storage medium; in embodiments of the present invention, the program may be stored in a storage medium of a computer system and executed by at least one processor of that computer system to implement the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of these features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and although they are described specifically and in detail, they should not be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art may make various modifications and improvements without departing from the inventive concept, all of which fall within the protection scope of the present invention. The protection scope of this patent is therefore determined by the appended claims.

Claims (11)

1. A transition probability matrix update method, characterized by comprising the steps of:
obtaining a transition probability matrix of information, wherein a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
determining, according to the transition probability matrix, each first information to be transferred that is intended for probability updating;
obtaining each preset common information item corresponding to the first information to be transferred, and appending each preset common information item to the first information to be transferred to obtain each merged information item corresponding to the first information to be transferred;
obtaining a first search volume returned by a search engine for the first information to be transferred, and each second search volume returned by the search engine for each merged information item;
computing the ratio of each second search volume to the first search volume to obtain each second transition probability of the first information to be transferred transferring to each preset common information item;
updating the preset number of largest second transition probabilities into the transition probability matrix to obtain an updated transition probability matrix.
2. The transition probability matrix update method according to claim 1, characterized in that the first information to be transferred is information to be transferred whose number of associated transfer target information items is less than a preset number.
3. An information identification method, characterized by comprising the following steps:
performing feature analysis on an object to be identified to obtain feature information;
when it is determined from the feature information that an abnormal part exists, separating the abnormal part from the object to be identified to obtain a separated object to be identified;
identifying the separated object to be identified to obtain an initial recognition result;
obtaining, from the initial recognition result, first identification information corresponding to a preceding adjacent object, wherein the preceding adjacent object is the sub-object in the object to be identified that immediately precedes, and is adjacent to, the abnormal part;
obtaining the updated transition probability matrix that corresponds to the type of the object to be identified and is determined by the transition probability matrix update method according to claim 1 or 2, and determining second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information;
combining the initial recognition result with the second identification information to obtain an information recognition result of the object to be identified.
4. The character identification method according to claim 3, characterized in that the step of determining the second identification information corresponding to the abnormal part according to the updated transition probability matrix and the first identification information comprises:
determining, according to the updated transition probability matrix, each transfer target information item corresponding to the first identification information and the transition probability of each transfer target information item;
determining the transfer target information corresponding to the preset number of largest transition probabilities as reference transfer target information;
determining actual transfer information corresponding to the first identification information according to the reference transfer target information, and determining the actual transfer information as the second identification information.
5. The character identification method according to claim 4, characterized in that
the transfer target information corresponding to the largest transition probability is determined as the actual transfer information corresponding to the first identification information.
6. A transition probability matrix update device, characterized by comprising:
an initial matrix acquisition module, configured to obtain a transition probability matrix of information, wherein a first transition probability in the transition probability matrix is the probability that information to be transferred transfers to transfer target information;
an information acquisition module, configured to determine, according to the transition probability matrix, each first information to be transferred that is intended for probability updating;
a merging module, configured to obtain each preset common information item corresponding to the first information to be transferred, and to append each preset common information item to the first information to be transferred to obtain each merged information item corresponding to the first information to be transferred;
a search result acquisition module, configured to obtain a first search volume returned by a search engine for the first information to be transferred, and each second search volume returned by the search engine for each merged information item;
a probability calculation module, configured to compute the ratio of each second search volume to the first search volume to obtain each second transition probability of the first information to be transferred transferring to each preset common information item;
a probability update module, configured to update the preset number of largest second transition probabilities into the transition probability matrix to obtain an updated transition probability matrix.
7. The transition probability matrix update device according to claim 6, characterized in that the first information to be transferred is information to be transferred whose number of associated transfer target information items is less than a preset number.
8. a kind of information recognition device, it is characterised in that updated including the transition probability matrix described in the claims 6 or 7 Device, in addition to;
Initial analysis module, for carrying out signature analysis to object to be identified, obtains characteristic information;
Separation module, for when determining to have unusual part according to characteristic information, by the unusual part from described to be identified Separated in object, the object to be identified after being separated;
Initial identification module, for the object to be identified after the separation to be identified, obtains initial recognition result;
Subobject acquisition module, for obtaining the first identification information corresponding with preceding adjacent object in the initial recognition result, The preceding adjacent object be the object to be identified in, the previous subobject adjacent with the unusual part;
Sub- identification module, for the type pair with the object to be identified determined according to the transition probability matrix updating device Transition probability matrix after the renewal answered, transition probability matrix and first identification information after the renewal are determined and institute State corresponding second identification information in unusual part;
Recognition result update module, for the initial recognition result to be combined with second identification information, obtains institute State the information recognition result of object to be identified.
9. The character recognition device according to claim 8, characterized in that the sub-identification module comprises:
A probability acquisition module, configured to determine, according to the updated transition probability matrix, each item of transfer target information corresponding to the first identification information and the transition probability of each item of transfer target information;
A reference information determining module, configured to determine, as reference transfer target information, the transfer target information corresponding to a predetermined number of transition probabilities with the largest values;
A sub-recognition result determining module, configured to determine actual transfer information corresponding to the first identification information according to each item of reference transfer target information, and to determine the actual transfer information as the second identification information.
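The selection of reference transfer target information can be sketched as a top-k lookup over one row of the matrix. The claim does not state how the actual transfer information is picked from the reference candidates, so the second function below assumes a hypothetical secondary score (for example, a visual classifier's score for the unusual part). The names reference_targets, actual_transfer and secondary_scores are illustrative only.

```python
import heapq
from typing import Dict, List


def reference_targets(row: Dict[str, float], k: int) -> List[str]:
    """Return the k transfer targets with the largest transition probabilities."""
    return heapq.nlargest(k, row, key=row.get)


def actual_transfer(
    row: Dict[str, float],
    k: int,
    secondary_scores: Dict[str, float],
) -> str:
    """Pick one reference target using a secondary score (an assumption here)."""
    refs = reference_targets(row, k)
    if not refs:
        raise ValueError("no transfer targets for this identification information")
    # Candidates outside the top k are ignored even if their secondary score is high.
    return max(refs, key=lambda target: secondary_scores.get(target, 0.0))


# Row of the matrix for the first identification information "q".
row = {"u": 0.6, "a": 0.3, "i": 0.08, "x": 0.02}
print(reference_targets(row, 2))                                  # prints ['u', 'a']
print(actual_transfer(row, 2, {"a": 0.7, "u": 0.2, "x": 0.9}))    # prints a
```

With k = 1 the secondary score has no effect and the result is simply the transfer target with the highest transition probability.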
10. The character recognition device according to claim 8, characterized in that the sub-recognition result determining module determines the transfer target information corresponding to the transition probability with the largest value as the actual transfer information corresponding to the first identification information.
11. A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the method according to any one of claims 1 to 5.
CN201710288225.5A 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment Active CN107247724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710288225.5A CN107247724B (en) 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710288225.5A CN107247724B (en) 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment

Publications (2)

Publication Number Publication Date
CN107247724A CN107247724A (en) 2017-10-13
CN107247724B CN107247724B (en) 2018-07-20

Family

ID=60016419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710288225.5A Active CN107247724B (en) 2017-04-27 2017-04-27 Transition probability matrix update, information identifying method and device, computer equipment

Country Status (1)

Country Link
CN (1) CN107247724B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1173684A (en) * 1996-05-21 1998-02-18 株式会社日立制作所 Apparatus for recognizing input character strings by inference
JP2009252044A (en) * 2008-04-08 2009-10-29 Canon Inc Document management system, method, and program
CN101652773A (en) * 2007-03-30 2010-02-17 微软公司 Look-ahead document ranking system
CN102982330A (en) * 2012-11-21 2013-03-20 新浪网技术(中国)有限公司 Method and device recognizing characters in character images
CN106156142A (en) * 2015-04-13 2016-11-23 深圳市腾讯计算机系统有限公司 Text clustering processing method, server and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1173684A (en) * 1996-05-21 1998-02-18 株式会社日立制作所 Apparatus for recognizing input character strings by inference
CN101652773A (en) * 2007-03-30 2010-02-17 微软公司 Look-ahead document ranking system
JP2009252044A (en) * 2008-04-08 2009-10-29 Canon Inc Document management system, method, and program
CN102982330A (en) * 2012-11-21 2013-03-20 新浪网技术(中国)有限公司 Method and device recognizing characters in character images
CN106156142A (en) * 2015-04-13 2016-11-23 深圳市腾讯计算机系统有限公司 Text clustering processing method, server and system

Also Published As

Publication number Publication date
CN107247724B (en) 2018-07-20

Similar Documents

Publication Publication Date Title
CN108875787B (en) Image recognition method and device, computer equipment and storage medium
CN111523470B (en) Pedestrian re-identification method, device, equipment and medium
CN104268603B (en) Intelligent marking method and system for text objective questions
US9696873B2 (en) System and method for processing sliding operations on portable terminal devices
US6864809B2 (en) Korean language predictive mechanism for text entry by a user
EP3493101A1 (en) Image recognition method, terminal, and nonvolatile storage medium
CN109918987A (en) A kind of video caption keyword recognition method and device
CN104298982A (en) Text recognition method and device
US20070244844A1 (en) Methods and systems for data analysis and feature recognition
CN111414888A (en) Low-resolution face recognition method, system, device and storage medium
CN105069013A (en) Control method and device for providing input interface in search interface
CN106682092A (en) Target retrieval method and terminal
CN104572717A (en) Information searching method and device
CN109086276A (en) Data translating method, device, terminal and storage medium
CN115100739B (en) Man-machine behavior detection method, system, terminal device and storage medium
CN102243708B (en) Handwriting recognition method, handwriting recognition system and handwriting recognition terminal
CN111125327A (en) Short-session-based new word discovery method, storage medium and electronic device
CN114529910A (en) Handwritten character recognition method and device, storage medium and electronic equipment
CN107247724A (en) Transition probability matrix renewal, information identifying method and device, computer equipment
CN112101135A (en) Moving target detection method and device and terminal equipment
CN116912881A (en) Animal species identification method, computer equipment and identification system
CN113378902B (en) Video plagiarism detection method based on optimized video features
CN105956633B (en) Method and device for identifying search engine category
CN113221718B (en) Formula identification method, device, storage medium and electronic equipment
JPS60153574A (en) Character reading system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant