CN108549850A - Image recognition method and electronic device - Google Patents
Image recognition method and electronic device
- Publication number
- CN108549850A CN108549850A CN201810260038.0A CN201810260038A CN108549850A CN 108549850 A CN108549850 A CN 108549850A CN 201810260038 A CN201810260038 A CN 201810260038A CN 108549850 A CN108549850 A CN 108549850A
- Authority
- CN
- China
- Prior art keywords
- text information
- image
- visual feature
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Abstract
The invention discloses an image recognition method, including: obtaining image information and first text information; and generating, based on the image information and the first text information, second text information that characterizes the content of the image information and the text information. The invention also discloses an electronic device.
Description
Technical field
The present invention relates to image recognition technology, and in particular to an image recognition method and an electronic device.
Background technology
In the prior art, image recognition can only make simple judgments about the composition of an image, or relies on personnel operating on the image to judge it; recognition efficiency is low and the recognition error rate is high.
Invention content
Embodiments of the present invention provide an image recognition method and an electronic device that, while recognizing an image, can encode and decode the visual features of the image together with acquired text information to obtain second text information that describes and fuses the visual features and the text information.
The technical solutions of the embodiments of the present invention are realized as follows:
An embodiment of the present invention provides an image recognition method, including:
obtaining image information and first text information;
generating second text information based on the image information and the first text information, the second text information characterizing the content of the image information and the text information.
In the above solution, obtaining the image information and the first text information includes:
extracting visual features from the image;
encoding at least two different types of text information of the image to obtain an encoding result characterizing the semantics of the text information.
In the above solution, generating the second text information based on the image information and the first text information includes:
decoding based on the visual features and the encoding result to obtain second text information that fuses the visual features and the text information to describe the image.
In the above solution, extracting visual features from the image includes:
processing the image alternately through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image;
processing the down-sampled result through an average pooling layer of the convolutional neural network model to obtain the visual features of the image.
In the above solution, the method further includes:
processing the visual features of the image through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the image.
In the above solution, encoding the at least two different types of text information of the image includes:
performing word-level encoding of the at least two types of text information of the image through neural network models corresponding to the different types of text information;
performing sentence-level encoding of the word-level encoding results.
In the above solution, decoding based on the visual features and the encoding result includes:
performing sentence-level decoding of the encoding result through a first decoder model;
performing word-level decoding of the sentence-level decoding result through a second decoder model.
In the above solution, the method further includes:
assigning corresponding weights to the visual features and the encoding result through an attention model;
inputting the first weight matrix, the second weight matrix, the encoding result and the visual features into the first decoder for decoding.
In the above solution, the method further includes:
training the convolutional neural network model used to extract visual features from the image, based on image samples and the classification labels of the image samples;
training the first decoder model based on sentence samples and corresponding decoding results;
training the second decoder model based on word samples and corresponding decoding results.
In the above solution, when the image is a medical image of an affected body part, the first text information includes the indication and clinical report of the affected part, and the second text information includes a diagnosis result for the affected part.
An embodiment of the present invention further provides an electronic device, the electronic device including:
an information obtaining module, configured to obtain an image and first text information;
an information processing module, configured to generate second text information based on the image information and the first text information, the second text information characterizing the content of the image information and the text information.
In the above solution,
the information obtaining module is configured to extract visual features from the image; and
the information processing module is configured to encode at least two different types of text information of the image to obtain an encoding result characterizing the semantics of the text information.
In the above solution,
the information processing module is configured to decode based on the visual features and the encoding result to obtain second text information that fuses the visual features and the text information to describe the image.
In the above solution,
the information obtaining module is configured to process the image alternately through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image; and
the information obtaining module is configured to process the down-sampled result through an average pooling layer of the convolutional neural network model to obtain the visual features of the image.
In the above solution,
the information obtaining module is configured to process the visual features of the image through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the image.
In the above solution,
the information processing module is configured to perform word-level encoding of the at least two types of text information of the image through neural network models corresponding to the different types of text information; and
the information processing module is configured to perform sentence-level encoding of the word-level encoding results.
In the above solution,
the information processing module is configured to perform sentence-level decoding of the encoding result through a first decoder model; and
the information processing module is configured to perform word-level decoding of the sentence-level decoding result through a second decoder model.
In the above solution,
the information processing module is further configured to assign corresponding weights to the visual features and the encoding result through an attention model; and
the information processing module is further configured to input the first weight matrix, the second weight matrix, the encoding result and the visual features into the first decoder for decoding.
In the above solution, the electronic device further includes:
a training module, configured to train the convolutional neural network model used to extract visual features from the image, based on image samples and the classification labels of the image samples;
the training module is configured to train the first decoder model based on sentence samples and corresponding decoding results; and
the training module is configured to train the second decoder model based on word samples and corresponding decoding results.
In the above solution, when the image is a medical image of an affected body part, the text information includes the indication and clinical report of the affected part, and the second text information includes a diagnosis result for the affected part.
The present invention further provides an electronic device, the electronic device including:
a memory, configured to store executable instructions; and
a processor, configured, when running the executable instructions stored in the memory, to execute:
obtaining image information and first text information;
generating second text information based on the image information and the first text information, the second text information characterizing the content of the image information and the text information.
Obtaining the image information and the first text information includes:
extracting visual features from the image;
encoding at least two different types of text information of the image to obtain an encoding result characterizing the semantics of the text information.
Generating the second text information based on the image information and the first text information includes:
decoding based on the visual features and the encoding result to obtain second text information that fuses the visual features and the text information to describe the image.
Extracting visual features from the image includes:
processing the image alternately through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image;
processing the down-sampled result through an average pooling layer of the convolutional neural network model to obtain the visual features of the image.
The method further includes:
processing the visual features of the image through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the image.
Encoding the at least two different types of text information of the image includes:
performing word-level encoding of the at least two types of text information of the image through neural network models corresponding to the different types of text information;
performing sentence-level encoding of the word-level encoding results.
Decoding based on the visual features and the encoding result includes:
performing sentence-level decoding of the encoding result through a first decoder model;
performing word-level decoding of the sentence-level decoding result through a second decoder model.
The method further includes:
assigning corresponding weights to the visual features and the encoding result through an attention model;
inputting the first weight matrix, the second weight matrix, the encoding result and the visual features into the first decoder for decoding.
The method further includes:
training the convolutional neural network model used to extract visual features from the image, based on image samples and the classification labels of the image samples;
training the first decoder model based on sentence samples and corresponding decoding results;
training the second decoder model based on word samples and corresponding decoding results.
When the image is a medical image of an affected body part, the first text information includes the indication and clinical report of the affected part, and the second text information includes a diagnosis result for the affected part.
In the embodiments of the present invention, second text information that characterizes the content of the acquired image information and first text information is generated from them, realizing automatic recognition of the image; and because the output second text information characterizes the content of both the image and the text information, a reader of the second text information can clearly understand the image and the first text information and form an intuitive visual impression.
Description of the drawings
Fig. 1 is an optional flow diagram of an image recognition method provided by an embodiment of the present invention;
Fig. 2 is an optional structural diagram of an electronic device provided by an embodiment of the present invention;
Fig. 3 is an optional structural diagram of an electronic device provided by an embodiment of the present invention;
Fig. 4 is an optional structural diagram of an electronic device provided by an embodiment of the present invention;
Fig. 5 is a diagram of processing an image in the convolutional and pooling layers through an activation function;
Fig. 6 is an optional flow diagram of an image recognition method provided by an embodiment of the present invention;
Fig. 7 is an optional structural diagram of an electronic device provided by an embodiment of the present invention.
Specific implementation modes
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.
Fig. 1 is an optional flow diagram of an image recognition method provided by an embodiment of the present invention; the steps shown in Fig. 1 are described below.
Step 101: obtain image information and first text information.
Step 102: generate second text information based on the image information and the first text information;
wherein the second text information characterizes the content of the image information and the text information.
In one embodiment of the invention, obtaining the image information and the first text information includes: extracting visual features from the image; and encoding at least two different types of text information of the image to obtain an encoding result characterizing the semantics of the text information. The technical solution of this embodiment enables accurate extraction of image information and text information; specifically, the image information may be a photo or a medical image, and the first text information may be text information from at least two different sources.
In one embodiment of the invention, generating the second text information based on the image information and the first text information includes: decoding based on the visual features and the encoding result to obtain second text information that fuses the visual features and the text information to describe the image. The technical solution of this embodiment decodes the visual features and the encoding result and obtains second text information fusing them, realizing fusion of the extracted information; specifically, the second text information may be a natural-language description of the condition of the user to whom the image to be recognized belongs, or a natural-language description of the image features of the image to be recognized.
In one embodiment of the invention, extracting visual features from the image includes: processing the image through alternating convolutional layers and pooling layers of a convolutional neural network model to obtain a down-sampled result of the image; and processing the down-sampled result through an average pooling layer of the convolutional neural network model to obtain the visual features of the image. In the technical solution of this embodiment, the alternating processing by convolutional layers and max pooling layers lets the convolutional layers analyze each small patch of the input in greater depth to obtain more abstract features while reducing the size of the feature matrices; this further reduces the number of nodes in the final fully connected layer, thereby reducing the number of parameters in the whole neural network.
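As an illustrative sketch only (not the patent's actual implementation), the two pooling operations described above can be shown on a tiny hypothetical feature map: max pooling halves the spatial size while keeping the strongest activation per block, and average pooling then condenses the result into a compact feature value.

```python
def max_pool2x2(fmap):
    """2x2 max pooling with stride 2: keep the strongest activation per block."""
    return [[max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
             for j in range(0, len(fmap[0]), 2)]
            for i in range(0, len(fmap), 2)]

def global_avg_pool(fmap):
    """Global average pooling: collapse a feature map to a single feature value."""
    return sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))

# Hypothetical 4x4 activation map after a convolutional layer.
fmap = [[1, 3, 2, 0],
        [5, 2, 1, 1],
        [0, 1, 4, 2],
        [2, 2, 3, 3]]
pooled = max_pool2x2(fmap)        # [[5, 2], [2, 4]] -- the down-sampled result
feature = global_avg_pool(pooled)  # 3.25 -- one component of the visual feature
```

In a real CNN these operations run per channel over learned convolution outputs; the point here is only how alternating pooling shrinks the matrices and hence the parameter count downstream.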
In one embodiment of the invention, the visual features of the image may further be processed through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the image. The technical solution of this embodiment obtains such category labels to enable classification processing of multiple images, or classification of different features of the same image.
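A minimal sketch of how a pooled visual feature vector could be mapped to a category label (the class names and weight values below are hypothetical, not from the patent):

```python
def classify(feature_vec, weights, labels):
    """Score each class as the dot product of the pooled feature vector with
    that class's weight vector, and return the best-scoring label."""
    scores = [sum(f * w for f, w in zip(feature_vec, wv)) for wv in weights]
    return labels[scores.index(max(scores))]

feature_vec = [0.2, 0.9, 0.1]        # pooled visual feature (hypothetical)
weights = [[1.0, 0.0, 0.0],          # weights for "photo"
           [0.0, 1.0, 0.0],          # weights for "medical image"
           [0.0, 0.0, 1.0]]          # weights for "other"
label = classify(feature_vec, weights, ["photo", "medical image", "other"])
# label == "medical image"
```

In practice the weights would be the learned parameters of the network's final layer rather than hand-set values.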
In one embodiment of the invention, encoding the at least two different types of text information of the image includes: performing word-level encoding of the at least two types of text information of the image through neural network models corresponding to the different types of text information; and performing sentence-level encoding of the word-level encoding results. In the technical solution of this embodiment, a bi-directional long short-term memory recurrent neural network (Bi-directional LSTM RNN) may perform the word-level encoding and the sentence-level encoding of the at least two types of text information of the image, respectively; the word-level or sentence-level encoding of the different types of text information may use the same encoder model.
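The two-stage word-then-sentence encoding can be illustrated with a deliberately simplified stand-in: here each "encoder" is just an average of toy deterministic word vectors, where a Bi-LSTM would instead read the sequence in both directions. Everything below (the embedding scheme, the example report) is a hypothetical sketch.

```python
def word_vec(word, dim=4):
    """Deterministic toy word embedding (stands in for learned embeddings)."""
    return [((sum(map(ord, word)) * (i + 1)) % 7) / 7.0 for i in range(dim)]

def encode_sentence(sentence):
    """Word-level encoding: combine the word vectors of one sentence
    (a Bi-LSTM would process the words sequentially in both directions)."""
    vecs = [word_vec(w) for w in sentence.split()]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]

def encode_document(sentences):
    """Sentence-level encoding: combine the per-sentence vectors into one
    semantic code for the whole piece of text information."""
    svecs = [encode_sentence(s) for s in sentences]
    return [sum(v[i] for v in svecs) / len(svecs) for i in range(len(svecs[0]))]

report = ["patient reports chest pain", "x ray shows opacity"]  # hypothetical
doc_code = encode_document(report)  # one vector summarizing the whole report
```

The same two functions could be applied to each type of text information separately, matching the patent's note that the different types may share one encoder model.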
In one embodiment of the invention, decoding based on the visual features and the encoding result includes: performing sentence-level decoding of the encoding result through a first decoder model; and performing word-level decoding of the sentence-level decoding result through a second decoder model. In the technical solution of this embodiment, when the first decoder model is a sentence decoder, the second decoder model may be a Long Short-Term Memory (LSTM) network.
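The hierarchical decoding idea can be sketched with toy integer vectors: a sentence-level decoder first emits one topic vector per output sentence, and a word-level decoder then expands each topic into words. The vocabulary, context values, and mapping rule below are all hypothetical stand-ins for the learned decoders.

```python
def sentence_decoder(context, n_sentences=2):
    """Sentence-level decoding: produce one topic vector per output sentence
    (stands in for the patent's first decoder model)."""
    return [[c + k for c in context] for k in range(n_sentences)]

def word_decoder(topic, vocab):
    """Word-level decoding: map each topic component to a word
    (an LSTM word decoder would instead generate words step by step)."""
    return [vocab[t % len(vocab)] for t in topic]

vocab = ["no", "acute", "finding", "opacity", "noted"]  # hypothetical vocabulary
context = [1, 3]  # fused visual + text code (toy integer values)
sentences = [" ".join(word_decoder(t, vocab)) for t in sentence_decoder(context)]
# sentences == ["acute opacity", "finding noted"]
```

The structural point matches the text: the coarse sentence plan and the fine word choices are produced by two separate models.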
In one embodiment of the invention, corresponding weights may further be assigned to the visual features and the encoding result through an attention model; the first weight matrix, the second weight matrix, the encoding result and the visual features are then input into the first decoder for decoding.
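A common way to realize such weight assignment, shown here as a hedged sketch rather than the patent's exact attention model, is a softmax over relevance scores followed by a weighted sum of the feature vectors; the scores and feature values are hypothetical.

```python
import math

def attention_weights(scores):
    """Softmax over relevance scores: higher-scoring inputs get more weight,
    and the weights sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Weighted sum of feature vectors under the attention distribution."""
    w = attention_weights(scores)
    return [sum(wi, ) if False else sum(wi * f[i] for wi, f in zip(w, features))
            for i in range(len(features[0]))]

features = [[1.0, 0.0],    # toy visual feature
            [0.0, 1.0]]    # toy text encoding result
context = attend(features, [2.0, 0.0])  # visual feature dominates the mix
```

The resulting context vector is what would be fed, together with the weight matrices, into the first decoder.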
In one embodiment of the invention, the convolutional neural network model used to extract visual features from the image is trained based on image samples and the classification labels of the image samples; the first decoder model is trained based on sentence samples and corresponding decoding results; and the second decoder model is trained based on word samples and corresponding decoding results. The technical solution of this embodiment enables dedicated training of the neural network model and of the different decoders.
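The shape of such supervised training, reduced to its simplest possible form (a linear scorer with squared-error loss and gradient descent, with made-up sample values), is sketched below; each of the three models would follow this loop on its own (sample, target) pairs.

```python
def train_step(weights, sample, label, lr=0.1):
    """One gradient-descent step for a linear scorer:
    prediction = w . x, loss = (prediction - label)^2."""
    pred = sum(w * x for w, x in zip(weights, sample))
    grad = [2 * (pred - label) * x for x in sample]  # d(loss)/d(w)
    return [w - lr * g for w, g in zip(weights, grad)]

weights = [0.0, 0.0]
for _ in range(50):
    weights = train_step(weights, [1.0, 2.0], 1.0)  # toy labeled sample
pred = sum(w * x for w, x in zip(weights, [1.0, 2.0]))
# pred converges to the target label 1.0
```

Real CNNs and LSTM decoders replace the linear scorer with deep networks and the squared error with classification or sequence losses, but the sample/label/gradient loop is the same.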
In one embodiment of the invention, when the image is a medical image of an affected body part, the text information includes the indication and clinical report of the affected part, and the second text information includes a diagnosis result for the affected part. The technical solution of this embodiment can output, in natural language, a diagnosis result for the affected part that fuses the medical image of the affected part with its indication and clinical report.
Fig. 2 is an optional structural diagram of an electronic device provided by an embodiment of the present invention; the modules involved in Fig. 2 are described below.
Information obtaining module 201, configured to obtain an image and first text information;
Information processing module 202, configured to generate second text information based on the image information and the first text information, the second text information characterizing the content of the image information and the text information.
In one embodiment of the invention, the information obtaining module 201 is configured to extract visual features from the image, and the information processing module 202 is configured to encode at least two different types of text information of the image to obtain an encoding result characterizing the semantics of the text information. The technical solution of this embodiment enables accurate extraction of image information and text information; specifically, the image information may be a photo or a medical image, and the first text information may be text information from at least two different sources.
In one embodiment of the invention, the information processing module 202 is configured to decode based on the visual features and the encoding result to obtain second text information that fuses the visual features and the text information to describe the image. The technical solution of this embodiment decodes the visual features and the encoding result and obtains second text information fusing them, realizing fusion of the extracted information; specifically, the second text information may be a natural-language description of the condition of the user to whom the image to be recognized belongs, or a natural-language description of the image features of the image to be recognized.
In one embodiment of the invention, the information obtaining module 201 is configured to process the image alternately through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image, and to process the down-sampled result through the average pooling layer of the convolutional neural network model to obtain the visual features of the image. In the technical solution of this embodiment, the alternating processing by convolutional layers and max pooling layers lets the convolutional layers analyze each small patch of the input in greater depth to obtain more abstract features while reducing the size of the feature matrices; this further reduces the number of nodes in the final fully connected layer, thereby reducing the number of parameters in the whole neural network.
In one embodiment of the invention, the information obtaining module 201 is configured to process the visual features of the image through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the image. The technical solution of this embodiment obtains such category labels to enable classification processing of multiple images, or classification of different features of the same image.
In one embodiment of the invention, the information processing module 202 is configured to perform word-level encoding of the at least two types of text information of the image through neural network models corresponding to the different types of text information, and to perform sentence-level encoding of the word-level encoding results. In the technical solution of this embodiment, a bi-directional long short-term memory recurrent neural network (Bi-directional LSTM RNN) may perform the word-level encoding and the sentence-level encoding of the at least two types of text information of the image, respectively; the word-level or sentence-level encoding of the different types of text information may use the same encoder model.
In one embodiment of the invention, the information processing module 202 is configured to perform sentence-level decoding of the encoding result through a first decoder model, and to perform word-level decoding of the sentence-level decoding result through a second decoder model. In the technical solution of this embodiment, when the first decoder model is a sentence decoder, the second decoder model may be a Long Short-Term Memory (LSTM) network.
In one embodiment of the invention, the information processing module 202 is further configured to assign corresponding weights to the visual features and the encoding result through an attention model, and to input the first weight matrix, the second weight matrix, the encoding result and the visual features into the first decoder for decoding.
In one embodiment of the invention, the electronic device further includes a training module, configured to train the convolutional neural network model used to extract visual features from the image, based on image samples and the classification labels of the image samples; the training module is also configured to train the first decoder model based on sentence samples and corresponding decoding results, and to train the second decoder model based on word samples and corresponding decoding results. The technical solution of this embodiment enables dedicated training of the neural network model and of the different decoders.
In one embodiment of the invention, when the image is a medical image of an affected body part, the text information includes the indication and clinical report of the affected part, and the second text information includes a diagnosis result for the affected part. The technical solution of this embodiment can output, in natural language, a diagnosis result for the affected part that fuses the medical image of the affected part with its indication and clinical report.
Fig. 3 is an optional structural diagram of an electronic device provided by an embodiment of the present invention; the modules involved in Fig. 3 are described below.
Image encoder 301, configured to process the image alternately through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image, and to process the down-sampled result through the average pooling layer of the convolutional neural network model to obtain the visual features of the image.
Text encoder 302, configured to obtain first text information and encode the acquired first text information.
Text decoder 303, configured to generate second text information based on the visual features of the image and the first text information, the second text information characterizing the content of the visual features of the image and the text information. The information processing flow of the image encoder 301, text encoder 302 and text decoder 303 is shown in Fig. 4.
Fig. 4 is an optional structural diagram of an electronic device provided by an embodiment of the present invention; the modules involved in Fig. 4 are described below.
First neural network 401, configured to encode the first-type text information in the first text information to obtain an encoding result characterizing the semantics of the first-type text information.
First text decoder 402, configured to decode the encoding result of the first neural network 401 so as to output the first-type text information in the first text information in natural language.
Second neural network 403, configured to encode the second-type text information in the first text information to obtain an encoding result characterizing the semantics of the second-type text information; specifically, the second-type text information in the first text information includes at least two sentences.
Second text decoder 404, configured to decode the encoding result of the second neural network 403 so as to output the second-type text information in the first text information in natural language.
Third neural network 405, configured to encode the second-type text information in the first text information to obtain an encoding result characterizing the semantics of the second-type text information; specifically, the second-type text information in the first text information includes at least two sentences.
Third text decoder 406, configured to decode the encoding result of the third neural network 405 so as to output the second-type text information in the first text information in natural language.
In one embodiment of the invention, the first neural network 401, second neural network 403 and third neural network 405 may use bi-directional long short-term memory recurrent neural networks (Bi-directional LSTM RNN); the encoder models corresponding to the different types of text information may be identical, that is, the first text decoder 402, second text decoder 404 and third text decoder 406 may be decoders of the same type.
In one embodiment of the invention, the word-level and sentence-level encoding performed by the first neural network 401, second neural network 403 and third neural network 405 can produce a sentence-level encoding result that fuses the first text information from at least two separate sources.
Convolutional neural network 407, configured to extract visual features from the image.
In one embodiment of the invention, the technical solution shown can support images of any format, including but not limited to JPG, PNG, TIF, and BMP. Of course, in order to ensure uniformity of image processing and processing speed, a received sample image may first be converted into a single format supported by the system before further processing. Likewise, to adapt to the processing capability of the system, sample images of different sizes may first be cropped into images of a fixed size supported by the system before further processing.
In one embodiment of the invention, extracting visual features from the image includes: cross-processing the image through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image; and processing the down-sampled result through the average pooling layer of the convolutional neural network model to obtain the visual features of the image. In the optional structure of the electronic device shown in Fig. 4, what the convolutional neural network identifies may be a medical image of an affected body part of a patient; in this case the first text information includes the symptoms of the affected part and a clinical report, the clinical report including at least two sentences.
In one embodiment of the invention, the visual features of the image may further be processed through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the image. Through the technical solution shown in this embodiment, the label characterizing the category of the image can be obtained, so as to classify multiple images or to classify different features of the same image.
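A minimal sketch of the pooling pipeline described above, assuming a 2 × 2 max-pooling window and a tiny 4 × 4 feature map (both illustrative choices the patent does not specify):

```python
# 2x2 max pooling with stride 2 (down-sampling), then global average
# pooling (one summary value), mirroring the max-pool -> average-pool
# pipeline described for the convolutional neural network model.

def max_pool_2x2(m):
    rows, cols = len(m), len(m[0])
    return [[max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

def global_average_pool(m):
    flat = [v for row in m for v in row]
    return sum(flat) / len(flat)

feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 5, 6, 2],
    [3, 1, 2, 8],
]

downsampled = max_pool_2x2(feature_map)            # -> [[4, 2], [5, 8]]
visual_feature = global_average_pool(downsampled)  # -> 4.75
```

In a real model the average pooling would produce a feature vector (one value per channel) rather than a single scalar; one channel is shown for brevity.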
In one embodiment of the invention, the visual processing of the image by the convolutional neural network is shown in Fig. 5: through the processing of an activation function, an image with an original size of 256 × 256 pixels is processed.
Attention model 408, configured to distribute corresponding weights for the visual features and the encoding results to obtain a first weight matrix and a second weight matrix, the weight matrices being used to characterize the saliency of the target features.
Fourth text decoder 409, configured to receive the first weight matrix, the second weight matrix, the encoding results, and the visual features, and to decode accordingly.
Information generator 410, configured to send the processing result of the fourth text decoder to the second neural network.
The second neural network includes:
First decoder model 411, configured to perform sentence-level decoding on the encoding result;
Second decoder model 412, configured to perform word-level decoding on the sentence-level decoding result, so as to obtain second text information that fuses the visual features and the text information to describe the image.
In one embodiment of the invention, the electronic device further includes a training module, configured to train the convolutional neural network model used to extract visual features from the image, based on image samples and the classification labels of the image samples; the training module is further configured to train the first decoder model based on sentence samples and corresponding decoding results, and to train the second decoder model based on word samples and corresponding decoding results. Through the technical solution shown in this embodiment, dedicated training of the neural network models and the different decoders can be implemented.
In one embodiment of the invention, when the precision of the repeatedly trained neural network models and different decoders is judged to have stabilized and no longer fluctuates abruptly, the neural network models and decoders trained at that point have reached a stable state and no further training is needed. This judgment mode can both effectively control training so as to obtain stable neural network models and decoders, and save sample training time as far as possible.
In one embodiment of the invention, the number of training iterations may be preset, for example to 2000; then, when the number of training iterations of the model reaches 2000, it can be assumed that the currently trained neural network models and different decoders have reached a stable state, and training can be stopped.
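Both stopping rules described above can be combined into one guard in the training loop: stop when precision stabilizes with no abrupt change, or when a preset cap such as 2000 iterations is reached. The plateau window and tolerance below are illustrative assumptions:

```python
def should_stop(precisions, max_iters=2000, window=5, tol=1e-3):
    """Return True when training can stop: either the precision history
    has stabilized (spread of the last `window` values below `tol`),
    or the iteration cap has been reached."""
    if len(precisions) >= max_iters:
        return True                       # preset iteration cap reached
    if len(precisions) >= window:
        recent = precisions[-window:]
        if max(recent) - min(recent) < tol:
            return True                   # precision stable, no abrupt change
    return False

history = [0.61, 0.74, 0.80, 0.831, 0.8312, 0.8311, 0.8313, 0.8312]
stop = should_stop(history)
```

The plateau check saves sample training time; the iteration cap guarantees termination even if precision keeps oscillating.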
Fig. 5 is a schematic diagram of processing an image in the convolutional layers and pooling layers through an activation function (Activation Function). As shown in Fig. 5, an image with an original size of 256 × 256 pixels is convolved and pooled through the activation function in the convolutional layers and pooling layers respectively, obtaining the visual features of the image. Through the technical solution shown in this embodiment, by the cross-processing of the convolutional layers and max pooling layers of the convolutional neural network model, each small block of the image is analyzed in greater depth by the convolutional layers to obtain features of a higher level of abstraction, while the size of the matrix is reduced, which further reduces the number of nodes in the final fully connected layer and thereby reduces the number of parameters in the entire neural network.
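The size reductions described above follow from the standard output-size formula for convolution and pooling layers, floor((n + 2p − k) / s) + 1; the kernel sizes, strides, and padding below are illustrative assumptions, since the patent does not specify the layer configuration:

```python
def out_size(n, kernel, stride=1, padding=0):
    # Standard spatial output size for a convolution or pooling layer.
    return (n + 2 * padding - kernel) // stride + 1

n = 256                                  # original image: 256 x 256 pixels
n = out_size(n, kernel=3, padding=1)     # 3x3 conv, 'same' padding -> 256
n = out_size(n, kernel=2, stride=2)      # 2x2 max pool -> 128
n = out_size(n, kernel=3, padding=1)     # 3x3 conv -> 128
n = out_size(n, kernel=2, stride=2)      # 2x2 max pool -> 64
```

After two pooling stages the 256 × 256 matrix has shrunk to 64 × 64, a 16-fold reduction in the number of values feeding the final fully connected layer.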
Fig. 6 is an optional flow diagram of the image recognition method provided in an embodiment of the present invention. As shown in Fig. 6, the steps are described below.
Step 601: Cross-process the image through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image.
Here, the image is a facial feature image of a person.
Step 602: Process the down-sampled result through the average pooling layer of the convolutional neural network model to obtain the visual features of the image.
Thus, extraction of the visual features corresponding to the facial features of the person in the image can be achieved by processing the down-sampled result through the average pooling layer of the convolutional neural network model. Through the technical solution shown in this embodiment, by the cross-processing of the convolutional layers and max pooling layers of the convolutional neural network model, each small block of the image is analyzed in greater depth by the convolutional layers to obtain features of a higher level of abstraction, while the size of the matrix is reduced, which further reduces the number of nodes in the final fully connected layer and thereby reduces the number of parameters in the entire neural network, making the method better suited to situations with a large number of face images.
In one embodiment of the invention, the visual features of the facial features may further be processed through the average pooling layer of the convolutional neural network model to obtain a label characterizing the category of the facial feature image. Through the technical solution shown in this embodiment, the label characterizing the category of the face image is obtained, so as to classify multiple face images or to classify different features of the same face image.
Step 603: Encode, through neural network models corresponding to the different types of the first text information, the at least two types of text information of the image at the word level, and encode the word-level encoding results at the sentence level.
Step 604: Distribute, through the attention model, corresponding weights for the visual features and the encoding results, and input the first weight matrix, the second weight matrix, the encoding results, and the visual features into the first decoder for decoding.
In one embodiment of the invention, encoding the at least two different types of text information of the image includes: encoding the at least two types of text information of the image at the word level through neural network models corresponding to the different types of text information; and encoding the word-level encoding results at the sentence level. Through the technical solution shown in this embodiment, the at least two types of text information of the image can be encoded at the word level and at the sentence level respectively through bi-directional long short-term memory recurrent neural networks (Bi-directional LSTM RNN), where the word-level encoding or sentence-level encoding of the at least two types of text information of the image may use the same encoder model.
Step 605: Perform sentence-level decoding on the encoding result through the first decoder model, and perform word-level decoding on the sentence-level decoding result through the second decoder model, so as to obtain second text information that fuses the visual features and the text information to describe the image.
Fig. 7 is an optional structural schematic diagram of the electronic device provided in an embodiment of the present invention. As shown in Fig. 7, the electronic device 700 with an image recognition function may include a mobile phone, a computer, a digital broadcast terminal, an information transceiver device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like. The electronic device 700 shown in Fig. 7 includes: at least one processor 701, a memory 702, at least one network interface 704, and a user interface 703. The various components in the electronic device 700 are coupled through a bus system 705. It can be understood that the bus system 705 is used to implement connection and communication between these components. In addition to a data bus, the bus system 705 further includes a power bus, a control bus, and a status signal bus. However, for the sake of clarity, the various buses are all designated as the bus system 705 in Fig. 7.
The user interface 703 may include a display, a keyboard, a mouse, a trackball, a click wheel, keys, buttons, a touch pad, a touch screen, or the like.
It can be understood that the memory 702 may be a volatile memory or a non-volatile memory, and may also include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM, Read-Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a magnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory may be a magnetic disk memory or a magnetic tape memory. The volatile memory may be a random access memory (RAM, Random Access Memory), which serves as an external cache. By way of example but not limitation, many forms of RAM are available, such as static random access memory (SRAM, Static Random Access Memory), synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), dynamic random access memory (DRAM, Dynamic Random Access Memory), synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), synclink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory), and direct rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 702 described in the embodiments of the present invention is intended to include these and any other suitable types of memory.
The memory 702 in the embodiments of the present invention includes but is not limited to a ternary content-addressable memory or a static random access memory, and can store image data, text data, image recognition program data, and other types of data to support the operation of the electronic device 700. Examples of such data include: any computer program used to operate on the electronic device 700, such as an operating system 7021 and application programs 7022, as well as image data, text data, image recognition program data, and the like. The operating system 7021 includes various system programs, such as a framework layer, a core library layer, and a driver layer, for implementing various basic services and processing hardware-based tasks. The application programs 7022 may include various application programs, such as a client or application program with an image recognition function, for implementing various application services including obtaining image information and first text information and generating second text information based on the image information and the first text information. The program implementing the image recognition method of the embodiments of the present invention may be contained in the application programs 7022.
The methods disclosed in the embodiments of the present invention may be applied in the processor 701, or implemented by the processor 701. The processor 701 may be an integrated circuit chip having signal processing capability. In the implementation process, each step of the above methods may be completed through an integrated logic circuit of hardware in the processor 701 or through instructions in software form. The above processor 701 may be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 701 may implement or execute each method, step, and logic diagram disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of the present invention may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, the storage medium being located in the memory 702; the processor 701 reads the information in the memory 702 and completes the steps of the foregoing methods in combination with its hardware.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), general-purpose processors, controllers, microcontrollers (MCU, Micro Controller Unit), microprocessors (Microprocessor), or other electronic components, for executing the image recognition method.
In an exemplary embodiment, an embodiment of the present invention further provides a computer-readable storage medium, for example the memory 702 including a computer program; the above computer program can be executed by the processor 701 of the electronic device 700 to complete the steps described in the foregoing methods. The computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, or CD-ROM; it may also be a device including one of the above memories or any combination thereof, such as a mobile phone, a computer, a tablet device, or a personal digital assistant.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, it executes:
obtaining image information and first text information; and
generating second text information based on the image information and the first text information, the second text information being used to characterize the content of the image information and the text information.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the embodiments of the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including magnetic disk storage, optical storage, and the like) containing computer-usable program code.
The embodiments of the present invention are described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce a device for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can guide a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device, the instruction device realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operation steps are executed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit the scope of the present invention; any modification, equivalent replacement, and improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. An image recognition method, characterized in that the method comprises:
obtaining image information and first text information;
generating second text information based on the image information and the first text information, the second text information being used to characterize the content of the image information and the text information.
2. The method according to claim 1, characterized in that the method further comprises:
extracting visual features from the image;
encoding at least two different types of text information of the image to obtain semantic encoding results characterizing the text information.
3. The method according to claim 2, characterized in that generating the second text information based on the image information and the first text information comprises:
decoding based on the visual features and the encoding results to obtain second text information that fuses the visual features and the text information to describe the image.
4. The method according to claim 2, characterized in that extracting visual features from the image comprises:
cross-processing the image through the convolutional layers and max pooling layers of a convolutional neural network model to obtain a down-sampled result of the image;
processing the down-sampled result through the average pooling layer of the convolutional neural network model to obtain the visual features of the image.
5. The method according to claim 2, characterized in that encoding the at least two different types of text information of the image comprises:
encoding the at least two types of text information of the image at the word level through neural network models corresponding to the different types of text information;
encoding the word-level encoding results at the sentence level.
6. The method according to claim 3, characterized in that decoding based on the visual features and the encoding results comprises:
performing sentence-level decoding on the encoding results through a first decoder model;
performing word-level decoding on the sentence-level decoding results through a second decoder model.
7. The method according to claim 6, characterized in that the method further comprises:
distributing corresponding weights for the visual features and the encoding results through an attention model;
inputting the first weight matrix, the second weight matrix, the encoding results, and the visual features into the first decoder model for decoding.
8. The method according to claim 1, characterized in that the method further comprises:
training the convolutional neural network model used to extract visual features from the image, based on image samples and the classification labels of the image samples;
training the first decoder model for sentence-level decoding based on sentence samples and corresponding decoding results;
training the second decoder model for word-level decoding based on word samples and corresponding decoding results.
9. An electronic device, characterized in that the electronic device comprises:
an information obtaining module, configured to obtain an image and first text information;
an information processing module, configured to generate second text information based on the image information and the first text information, the second text information being used to characterize the content of the image information and the text information.
10. An electronic device, characterized in that the electronic device comprises:
a memory, configured to store executable instructions;
a processor, configured to execute the image recognition method according to any one of claims 1 to 8 when running the executable instructions stored in the memory.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810260038.0A CN108549850B (en) | 2018-03-27 | 2018-03-27 | Image identification method and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108549850A (en) | 2018-09-18 |
CN108549850B CN108549850B (en) | 2021-07-16 |
Family
ID=63517392
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810260038.0A Active CN108549850B (en) | 2018-03-27 | 2018-03-27 | Image identification method and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108549850B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106126596A (en) * | 2016-06-20 | 2016-11-16 | 中国科学院自动化研究所 | A kind of answering method based on stratification memory network |
CN106227721A (en) * | 2016-08-08 | 2016-12-14 | 中国科学院自动化研究所 | Chinese Prosodic Hierarchy prognoses system |
CN106339591A (en) * | 2016-08-25 | 2017-01-18 | 汤平 | Breast cancer prevention self-service health cloud service system based on deep convolutional neural network |
CN106897573A (en) * | 2016-08-01 | 2017-06-27 | 12西格玛控股有限公司 | Use the computer-aided diagnosis system for medical image of depth convolutional neural networks |
CN106980683A (en) * | 2017-03-30 | 2017-07-25 | 中国科学技术大学苏州研究院 | Blog text snippet generation method based on deep learning |
WO2017163230A1 (en) * | 2016-03-24 | 2017-09-28 | Ramot At Tel-Aviv University Ltd. | Method and system for converting an image to text |
CN107665736A (en) * | 2017-09-30 | 2018-02-06 | 百度在线网络技术(北京)有限公司 | Method and apparatus for generating information |
CN107818306A (en) * | 2017-10-31 | 2018-03-20 | 天津大学 | A kind of video answering method based on attention model |
- 2018-03-27 — Application CN201810260038.0A filed (CN); granted as CN108549850B, status: Active
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543714A (en) * | 2018-10-16 | 2019-03-29 | 北京达佳互联信息技术有限公司 | Acquisition methods, device, electronic equipment and the storage medium of data characteristics |
TWI728564B (en) * | 2018-11-30 | 2021-05-21 | 大陸商北京市商湯科技開發有限公司 | Method, device and electronic equipment for image description statement positioning and storage medium thereof |
WO2020107813A1 (en) * | 2018-11-30 | 2020-06-04 | 北京市商汤科技开发有限公司 | Method and apparatus for positioning descriptive statement in image, electronic device and storage medium |
US11455788B2 (en) | 2018-11-30 | 2022-09-27 | Beijing Sensetime Technology Development Co., Ltd. | Method and apparatus for positioning description statement in image, electronic device, and storage medium |
CN109933320A (en) * | 2018-12-28 | 2019-06-25 | 联想(北京)有限公司 | A kind of image generating method and server |
CN109933320B (en) * | 2018-12-28 | 2021-05-18 | 联想(北京)有限公司 | Image generation method and server |
CN112231275A (en) * | 2019-07-14 | 2021-01-15 | 阿里巴巴集团控股有限公司 | Multimedia file classification, information processing and model training method, system and equipment |
CN112231275B (en) * | 2019-07-14 | 2024-02-27 | 阿里巴巴集团控股有限公司 | Method, system and equipment for classifying multimedia files, processing information and training models |
WO2021042904A1 (en) * | 2019-09-06 | 2021-03-11 | 平安国际智慧城市科技股份有限公司 | Conversation intention recognition method, apparatus, computer device, and storage medium |
CN110717514A (en) * | 2019-09-06 | 2020-01-21 | 平安国际智慧城市科技股份有限公司 | Session intention identification method and device, computer equipment and storage medium |
GB2605052A (en) * | 2019-10-22 | 2022-09-21 | Ibm | Automatic delineation and extraction of tabular data using machine learning |
GB2605052B (en) * | 2019-10-22 | 2024-01-03 | Ibm | Automatic delineation and extraction of tabular data using machine learning |
WO2021079262A1 (en) * | 2019-10-22 | 2021-04-29 | International Business Machines Corporation | Automatic delineation and extraction of tabular data using machine learning |
US11380116B2 (en) | 2019-10-22 | 2022-07-05 | International Business Machines Corporation | Automatic delineation and extraction of tabular data using machine learning |
CN111191715A (en) * | 2019-12-27 | 2020-05-22 | 深圳市商汤科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN111755118A (en) * | 2020-03-16 | 2020-10-09 | 腾讯科技(深圳)有限公司 | Medical information processing method, medical information processing device, electronic equipment and storage medium |
CN111755118B (en) * | 2020-03-16 | 2024-03-08 | 腾讯科技(深圳)有限公司 | Medical information processing method, device, electronic equipment and storage medium |
CN111460791B (en) * | 2020-03-30 | 2023-12-01 | 北京百度网讯科技有限公司 | Text classification method, device, equipment and storage medium |
CN111460791A (en) * | 2020-03-30 | 2020-07-28 | 北京百度网讯科技有限公司 | Text classification method, device, equipment and storage medium |
CN111861949B (en) * | 2020-04-21 | 2023-07-04 | 北京联合大学 | Multi-exposure image fusion method and system based on generation countermeasure network |
CN111861949A (en) * | 2020-04-21 | 2020-10-30 | 北京联合大学 | Multi-exposure image fusion method and system based on generation countermeasure network |
CN112686263A (en) * | 2020-12-29 | 2021-04-20 | 科大讯飞股份有限公司 | Character recognition method and device, electronic equipment and storage medium |
CN112686263B (en) * | 2020-12-29 | 2024-04-16 | 科大讯飞股份有限公司 | Character recognition method, character recognition device, electronic equipment and storage medium |
CN117710510A (en) * | 2024-02-04 | 2024-03-15 | 支付宝(杭州)信息技术有限公司 | Image generation method and device |
CN117710510B (en) * | 2024-02-04 | 2024-06-11 | 支付宝(杭州)信息技术有限公司 | Image generation method and device |
Also Published As
Publication number | Publication date |
---|---|
CN108549850B (en) | 2021-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108549850A (en) | Image recognition method and electronic device | |
CN108446667A (en) | Based on the facial expression recognizing method and device for generating confrontation network data enhancing | |
CN110162779A (en) | Method, device and equipment for assessing medical record quality | |
CN108780378A (en) | Web interface based on machine learning generates and test system | |
CN110334357A (en) | Method, apparatus, storage medium and electronic device for named entity recognition | |
CN110516577A (en) | Image processing method, device, electronic equipment and storage medium | |
CN111598979B (en) | Method, device and equipment for generating facial animation of virtual character and storage medium | |
JP2023509405A (en) | Translation method, device, electronic device and computer program | |
CN113723530B (en) | Intelligent psychological assessment system based on video analysis and electronic psychological sand table | |
CN110427486A (en) | Method, device and equipment for classifying patient-condition text | |
CN108475264A (en) | Machine translation method and device | |
CN113870395A (en) | Animation video generation method, device, equipment and storage medium | |
CN110750987A (en) | Text processing method, device and storage medium | |
CN108509833A (en) | Face recognition method, device and equipment based on a structured analysis dictionary | |
CN110427946A (en) | Document image binarization method, device and computing equipment | |
CN112988963A (en) | User intention prediction method, device, equipment and medium based on multi-process node | |
CN112070310A (en) | Loss user prediction method and device based on artificial intelligence and electronic equipment | |
CN109635260A (en) | For generating the method, apparatus, equipment and storage medium of article template | |
CN115392237B (en) | Emotion analysis model training method, device, equipment and storage medium | |
CN113468946A (en) | Semantically consistent enhanced training data for traffic light detection | |
CN109597987A (en) | Text restoration method, device and electronic equipment | |
CN115757725A (en) | Question and answer processing method and device, computer equipment and storage medium | |
CN110472664A (en) | Certificate image recognition method, device and equipment based on deep learning | |
CN117541749A (en) | Human face optimization method for human body 3D reconstruction | |
CN117437365A (en) | Medical three-dimensional model generation method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
GR01 | Patent grant ||