CN109657079A - Image description method and terminal device - Google Patents
Image description method and terminal device
- Publication number
- CN109657079A (application CN201811343846.XA)
- Authority
- CN
- China
- Prior art keywords
- keyword
- image
- entity
- emotion
- adjective
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides an image description method and a terminal device, applicable to the technical field of data processing. The method comprises: identifying the entities contained in an image, and generating a keyword corresponding to each entity; identifying the emotion category and emotion grade corresponding to the image; inputting the emotion category, the emotion grade and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs; and generating a description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order. In embodiments of the present invention, the final description sentence thus contains a description of the emotional state of the image, achieving emotion-bearing image description.
Description
Technical field
The invention belongs to the technical field of data processing, and in particular relates to an image description method and a terminal device.
Background technique
Image description refers to automatically generating natural-language sentences by computer to describe the content of a given image. Existing image description methods often merely identify the entities contained in an image and then arrange them into a natural sentence according to simple grammar rules, for example "Blue sky and white clouds; two people play football on the grass." Although this achieves a mechanical description of the image, in practice the natural language we use carries emotion, as in "Blue sky and white clouds; two people happily play football on the grass!", which conveys a cheerful mood. A merely mechanical description of an image is therefore often unable to meet people's actual needs.
Summary of the invention
In view of this, embodiments of the present invention provide an image description method and a terminal device, to solve the problem in the prior art that image description with emotion cannot be performed.
A first aspect of the embodiments of the present invention provides an image description method, comprising:
identifying the entities contained in an image, and generating a keyword corresponding to each entity;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
generating a description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
A second aspect of the embodiments of the present invention provides a terminal device, comprising a memory and a processor, the memory storing a computer program executable on the processor, the processor implementing the following steps when executing the computer program:
identifying the entities contained in an image, and generating a keyword corresponding to each entity;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
generating a description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
A third aspect of the embodiments of the present invention provides an image description apparatus, comprising:
a keyword identification module, configured to identify the entities contained in an image and generate a keyword corresponding to each entity;
an emotion recognition module, configured to identify the emotion category and emotion grade corresponding to the image;
a word analysis module, configured to input the emotion category, the emotion grade and the keywords into a preset analysis model and generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
a sentence generation module, configured to generate a description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the image description method described above.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: by analysing the emotion category and emotion grade of an image, generating the corresponding adjectives, adverbs and the arrangement order of these words according to that category and grade, and then generating the final description sentence for the image from the entity keywords, the generated adjectives and adverbs, and their arrangement order, the final description sentence contains a description of the emotional state of the image, achieving emotion-bearing image description.
Detailed description of the invention
To explain the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the image description method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the image description method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of the image description method provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of the image description method provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic flowchart of the image description method provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the image description apparatus provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of the terminal device provided by Embodiment 7 of the present invention.
Specific embodiment
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so that the embodiments of the present invention may be thoroughly understood. However, it will be clear to those skilled in the art that the present invention may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted, lest unnecessary detail obscure the description of the present invention.
To illustrate the technical solutions of the present invention, specific embodiments are described below.
Fig. 1 shows a flowchart of the image description method provided by Embodiment 1 of the present invention, detailed as follows:
S101: identify the entities contained in the image, and generate the keyword corresponding to each entity.
Since existing entity recognition algorithms are relatively mature, the entity recognition algorithm used is not limited here and may be chosen by the skilled person as required. The keyword corresponding to an entity is a word or phrase of the form entity attribute + entity name, such as "running dog" or "white teacup". To generate these keywords, embodiments of the present invention further identify the attributes of each entity after the entity itself has been recognised, and then generate the corresponding keyword from each entity and its attributes. Attribute recognition may likewise be realised with existing algorithms, such as existing colour recognition, action recognition and emotion recognition algorithms; no limitation is imposed here, and the skilled person may choose freely.
S102: identify the emotion category and emotion grade corresponding to the image.
To describe an image with emotion, the main emotion conveyed by the image must first be determined, so embodiments of the present invention identify the emotion category corresponding to the image. The emotion grade refers to the intensity of the image's emotion; for example, the emotion "happy" can be divided by degree into the two grades "happy" and "very happy". The grading rules for each emotion may be set by the skilled person according to actual needs. The image emotion recognition algorithm obtained differs depending on the object of recognition. If the image as a whole is the recognition object, the images used as training samples are manually labelled with emotion category and emotion grade when training the emotion recognition model, and the trained model associates images with emotion categories and grades. If the entities in the image (or their attributes) are the recognition objects, the sample images are likewise manually labelled with emotion category and grade, the entities or entity attributes in the images are identified, and emotion analysis is then performed; the trained model associates entities or entity attributes with emotion categories and grades. Accordingly, the emotion category and emotion grade recognition model used is not limited here; the skilled person may select a model, or design and train one, according to actual needs.
S103: input the emotion category, the emotion grade and the keywords into a preset analysis model, and generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs.
To obtain an emotion-bearing description sentence for the image, after identifying the image's emotion category and emotion grade it is also necessary to generate the description words corresponding to each keyword and the arrangement order between these words and the keywords — that is, the adjectives corresponding to the emotion category, the adverbs corresponding to the emotion grade, and the arrangement order between these words and the keywords — so that the description sentence can be generated.
The description words corresponding to each emotion category are finite and known; for example, the emotion "happy" can be described with adjectives such as "happy", "glad" and "joyful". Likewise, once the grading rules are fixed, the description words corresponding to each grade are finite and known; for example, "very", "extremely", or no adverb at all can express different grades. Therefore, to generate the description words for each keyword under a given emotion category and grade, description words may be assigned to every emotion category and every grade in advance, and then selected according to the specific category and grade of the image. Furthermore, considering that natural language describes different entities in different ways, and that the emotion adjectives applied to different entities in the same sentence generally differ, when generating the description words for different keywords, embodiments of the present invention preferably set different word correspondences for the different entities involved, so that description words are chosen adaptively per keyword. For example, for the emotion "happy", "cheerful" may be chosen for the entity "person" and "bright" for the entity "sun"; similarly, different correspondences may be set for different grades of happiness.
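The per-entity selection of description words just described amounts to two lookup tables: adjectives keyed by (emotion category, entity) and adverbs keyed by emotion grade. A minimal sketch, with all table contents invented for illustration:

```python
# Hedged sketch of the description-word lookup described above: the
# adjective depends on (emotion category, entity) and the adverb on
# the emotion grade. All table contents are illustrative.

ADJECTIVES = {
    ("happy", "person"): "happy",
    ("happy", "sun"): "bright",
}
ADVERBS = {1: "", 2: "very"}  # grade 1: no adverb; grade 2: "very"

def describe(keyword, entity, category, grade):
    """Attach an adjective (and possibly an adverb) to a keyword."""
    adjective = ADJECTIVES.get((category, entity), "")
    adverb = ADVERBS.get(grade, "")
    words = [w for w in (adverb, adjective, keyword) if w]
    return " ".join(words)

print(describe("person on the grass", "person", "happy", 2))
# -> "very happy person on the grass"
print(describe("sun", "sun", "happy", 1))
# -> "bright sun"
```

Keying the adjective table on the entity as well as the emotion is what realises the adaptive word choice the paragraph above calls for.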
After the description words corresponding to each keyword are determined, the arrangement order between these description words and the keywords must be further determined, so that the description sentence can subsequently be assembled. Arranging the description words and keywords is in essence arranging them into a sentence that conforms to natural-language grammar, so existing natural-language analysis algorithms can be used directly, or the skilled person may design an algorithm for ordering the words in a sentence — for example, by analysing a large corpus of natural language, extracting the ordering rules between words, and ordering the description words and keywords according to those rules.
S104: generate the description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
Once the arrangement order between the keywords and the description words is determined, an embryonic form of the image description sentence has been obtained; to reach the final description sentence it only remains to add and arrange a few words according to natural-language grammar, for example adding prepositions to make the sentence fluent. To generate the final description sentence, embodiments of the present invention may directly adopt existing natural-language analysis algorithms, such as the TextRank algorithm, to arrange the final image description sentence; no limitation is imposed here.
In embodiments of the present invention, the emotion category and emotion grade of the image are analysed, the corresponding adjectives, adverbs and arrangement order are generated from them, and the final description sentence for the image is generated from the entity keywords, the generated adjectives and adverbs, and their arrangement order, so that the final description sentence contains a description of the emotional state of the image, achieving emotion-bearing image description.
A specific implementation of entity recognition in Embodiment 1 of the present invention comprises: performing entity recognition according to a preset entity recognition model, and determining the entities contained in the image.
The training process of the entity recognition model is as follows:
Step 1: input a sample image, and label the entities it contains.
Step 2: convolutional layer 1: perform a convolution with a 10×10 kernel and stride 1, obtaining shallow features.
Step 3: pooling layer: perform max-pooling downsampling with a 10×10 kernel and stride 10, obtaining deeper features.
Step 4: convolutional layer 2: perform a convolution with a 2×2 kernel and stride 2.
Step 5: fully connected layer 1: 5 nodes, vector dimension 256, with ReLU activation.
Step 6: fully connected layer 2: 1 node, vector dimension 128, with ReLU activation.
Step 7: output layer: output whether the specific target is present, i.e. binary classification of the entity.
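The feature-map sizes produced by steps 2–4 can be sanity-checked with the standard convolution/pooling output-size formula. The 100×100 input resolution below is an assumption for illustration; the patent does not state one.

```python
# Hedged sketch: output-size arithmetic for the layer stack in steps
# 2-4 (conv 10x10 stride 1, max-pool 10x10 stride 10, conv 2x2 stride
# 2). The 100x100 input side length is an assumption, not from the patent.

def out_size(n, kernel, stride, padding=0):
    """Output size along one axis for a convolution or pooling layer."""
    return (n + 2 * padding - kernel) // stride + 1

side = 100                       # assumed input side length
side = out_size(side, 10, 1)     # convolutional layer 1: 100 -> 91
side = out_size(side, 10, 10)    # pooling layer: 91 -> 9
side = out_size(side, 2, 2)      # convolutional layer 2: 9 -> 4
print(side)  # -> 4
```

Under this assumed input, a 4×4 feature map reaches the fully connected layers; the same arithmetic applies to the identically structured image emotion model later in the document.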
As a specific implementation of generating the keyword corresponding to an entity in Embodiment 1, Embodiment 2 of the present invention, shown in Fig. 2, comprises:
S201: identify the entity class of the entity, and read the preset attribute labels corresponding to that entity class.
The preset attribute labels record the attributes to be parsed for each entity class. Different entities have different attributes: for example, a person has ethnic group, gender, quantity and age group, whereas a cloud has colour and quantity. The recognition algorithms required also differ per attribute — the algorithm needed for gender recognition is entirely different from that for colour recognition — so to recognise the attributes of different entities accurately, the attributes to be parsed for each entity must be determined so that the corresponding recognition algorithm can be chosen. In embodiments of the present invention, entities may be divided into classes in advance, with the attributes to be parsed set for each class; when parsing entity attributes, it is then only necessary to determine the required attributes from the entity class and select the corresponding recognition algorithms. The class-division rules, and the attributes to be parsed for each class, can be set by the skilled person according to actual needs; for example, entities may be divided into humans, animals, plants and objects, each with its own set of attributes to parse.
S202: parse the attribute values of the preset attribute labels from the image, and generate the attribute text corresponding to each obtained attribute value.
After the attributes to be parsed for each entity in the image are determined, their attribute values are parsed. Since different attributes may require different recognition algorithms — a person's gender requires a gender recognition algorithm, an action requires an action recognition algorithm, and an object's colour requires a colour recognition algorithm — the appropriate algorithm is determined per attribute and used to obtain the specific attribute value, e.g. whether the gender is male or female, whether the action is running or jumping, and what the colour is. Since the prior art already contains many recognition algorithms for attributes such as gender, action and colour, the specific algorithm used is not limited here and may be selected by the skilled person.
Once the attribute values are determined, the corresponding attribute texts are generated: for example, a female gender yields the text "female", and the colour blue yields the text "blue".
S203: generate the keyword corresponding to the entity based on the entity name and the attribute text.
After each entity's attribute texts are obtained, combining them with the entity name directly produces the corresponding keyword. To keep the keyword readable, the words may be arranged during combination — for example, combining "running" with "dog" yields the keyword "running dog" (in the original Chinese, a connecting particle is inserted between them); the specific arrangement rules can be set by the skilled person according to grammar rules.
In embodiments of the present invention, attribute texts are adaptively selected and generated for each entity according to its specific class, and the corresponding keywords are obtained, achieving adaptive extraction of entity keywords.
As an implementation of determining the preset attribute labels corresponding to each entity class in Embodiment 2: in practice each entity has many attributes — a person, for instance, has ethnic group, gender, quantity and age group — and describing every attribute of every entity would make the description sentence overly cumbersome. In real life, an image description generally does not cover all attributes of the entities; attributes are selected according to the actual scene of the image. For example, in a scene unrelated to politics, ethnicity is generally not emphasised, so the ethnic-group attribute of the people in the image would not be described. Therefore, to keep the image description sentence accurate and concise, Embodiment 3 of the present invention, shown in Fig. 3, comprises:
S301: perform scene category recognition on the image.
Since different scenes have different description emphases, the scene category of the image must be determined before the entity attributes are determined. Existing scene recognition algorithms can be used directly here, or the skilled person may design a corresponding scene recognition algorithm.
S302: determine the preset attribute labels corresponding to each entity class according to the scene category.
To simplify the description sentence for the image scene, the skilled person sets in advance, for each kind of scene, the correspondence between entity classes and attribute labels. For example, assuming the scene categories include a non-political scene, the attributes corresponding to a person in that scene may be set to gender, quantity and age group.
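The scene-dependent attribute selection of S301–S302 is essentially a two-level lookup: scene category selects a table of entity class to attribute labels. A minimal sketch with illustrative table contents (the scene names and attribute sets are invented, following the paragraph's own example):

```python
# Hedged sketch of the per-scene attribute-label table (S301-S302):
# the scene category selects which attributes are parsed for each
# entity class. Table contents are illustrative only.

SCENE_ATTRIBUTE_LABELS = {
    "non_political": {
        "person": ["gender", "quantity", "age_group"],
        "cloud": ["color", "quantity"],
    },
    "political": {
        "person": ["ethnic_group", "gender", "quantity", "age_group"],
    },
}

def attributes_for(scene, entity_class):
    """Look up the attributes to parse for an entity class in a scene."""
    return SCENE_ATTRIBUTE_LABELS.get(scene, {}).get(entity_class, [])

print(attributes_for("non_political", "person"))
# -> ['gender', 'quantity', 'age_group']  (ethnic group omitted)
```

Note how the non-political entry omits the ethnic-group attribute, realising the selective description the embodiment describes.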
As a specific implementation of identifying the emotion category and emotion grade of the image, an embodiment of the present invention comprises:
extracting the face images contained in the image, and performing emotion recognition on the face images to obtain the emotion category and emotion grade.
In this embodiment of the present invention, the emotion in the image is identified by means of face-image emotion recognition. The specific emotion recognition algorithm used may be set freely by the skilled person, including but not limited to using an existing emotion recognition algorithm, or labelling sample face images with emotion category and grade and then training a model, such as a deep convolutional neural network, to perform the above analysis.
As another specific implementation of identifying the emotion category and emotion grade of the image, an embodiment of the present invention comprises: performing recognition according to a preset image emotion model, and determining the emotion category and emotion grade of the image.
The training process of the image emotion model is as follows:
Step 1: input a sample image, and label it with the corresponding emotion category and emotion grade.
Step 2: convolutional layer 1: perform a convolution with a 10×10 kernel and stride 1, obtaining shallow features.
Step 3: pooling layer: perform max-pooling downsampling with a 10×10 kernel and stride 10, obtaining deeper features.
Step 4: convolutional layer 2: perform a convolution with a 2×2 kernel and stride 2.
Step 5: fully connected layer 1: 5 nodes, vector dimension 256, with ReLU activation.
Step 6: fully connected layer 2: 1 node, vector dimension 128, with ReLU activation.
Step 7: output layer: output the emotion category and grade, i.e. a multi-class classification model.
As Embodiment 4 of the present invention: to generate the description words from the emotion category, emotion grade and keywords in the foregoing embodiments, and to determine the arrangement order between the description words and the keywords, embodiments of the present invention build the corresponding analysis model in advance for use in the analysis. As shown in Fig. 4, the method of constructing the analysis model comprises:
S401: acquire a preset training data set containing multiple sample images and, in one-to-one correspondence with them, emotion category identifiers, emotion grade identifiers and sample description sentences.
The emotion category identifier and emotion grade identifier identify the emotion category and emotion grade of the image, and may be labelled on each sample image by the skilled person in advance. The sample description sentences may likewise be obtained by the skilled person describing each sample image in natural language in advance.
S402: perform keyword extraction on the sample description sentences to obtain the sample keywords they contain and the arrangement order of the sample keywords.
A sample keyword consists of the name of an entity in the sample description sentence plus its entity attribute, such as "running dog"; keywords can be extracted by parsing the nouns in the sentence together with the qualifiers preceding them. The arrangement order of the sample keywords is obtained directly from the order of extraction.
S403: create a sample vector datum in one-to-one correspondence with each sample image, based on the emotion category identifier, the emotion grade identifier, the sample keywords and their arrangement order.
Since the number of keywords may differ between sample description sentences, and the number of keywords in an image cannot be determined in advance in practical applications, to guarantee the generality of the final trained model — so that the preset analysis model trained here can be used normally in the foregoing embodiments — embodiments of the present invention create, from each sample description sentence's emotion category identifier, emotion grade identifier and keywords, a corresponding one-dimensional sample vector of a fixed length. For example, with the length set to 13, if the count of emotion category identifier + emotion grade identifier + keywords falls short of 13, the missing positions are set to 0, giving the sample vector [emotion category identifier, emotion grade identifier, keyword 1, keyword 2, keyword 3, keyword 4, keyword 5, keyword 6, keyword 7, keyword 8, keyword 9, keyword 10, keyword 11]. Analysing multiple sample description sentences yields multiple sample vector data. The specific vector length can be set by the skilled person.
S404: train the preset analysis model on the obtained sample vector data until the similarity between the description sentences the model generates for the sample images and the sample description sentences exceeds a preset threshold, obtaining the model parameters of the preset analysis model.
After the sample vector data corresponding to the sample images are obtained, the preset analysis model is trained on them. The preset analysis model may be a text sequence model built from an LSTM (long short-term memory) network, and contains a large number of preset adjectives for describing emotion and adverbs for describing emotion grade. During training, these preset adjectives and adverbs are combined with the keywords in the sample vector data, a description sentence is constructed for each combined sample vector, and the generated description sentence is matched against the corresponding sample description sentence, with the constraint that the similarity between the output description sentence and the sample description sentence exceed a preset threshold. When it does, the model can generate the description sentences corresponding to the keywords for the image's emotion category and grade, completing the construction of the preset analysis model. Thereafter, directly analysing an image's emotion category, emotion grade and keywords with the trained preset analysis model yields the required keyword description words and their arrangement order. The specific threshold value can be set by the skilled person.
As a specific implementation of generating the image description sentence from the keywords, adjectives, adverbs and arrangement order in the foregoing embodiments, Embodiment 5 of the present invention, shown in Fig. 5, comprises:
S501: sort and combine the keywords, adjectives and adverbs based on the arrangement order, obtaining a basic sentence.
S502: perform semantic-analysis error correction on the basic sentence, obtaining the description sentence.
Since the combination obtained by directly placing the keywords, adjectives, and adverbs in the arrangement order is merely a base sentence formed from these words, it usually cannot be read normally and needs to be adjusted into a sentence that satisfies the grammar rules of natural language. In the embodiment of the present invention, to guarantee the validity of the final output descriptive statement, semantic analysis and error correction are performed on the sorted base sentence: the parts that do not satisfy the grammar rules are identified and corrected, so as to obtain the final readable natural language. The specific semantic analysis and error correction method can be selected by the technician according to actual needs; for example, an existing algorithm can be chosen, or a method can be designed according to the grammar rules.
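Steps S501 and S502 can be illustrated with the following minimal sketch. The ordering labels and the correction pass are placeholder assumptions, since the source leaves the concrete error-correction method to the technician.

```python
# Illustrative sketch (not the patent's implementation) of S501/S502:
# words are placed according to the arrangement order to form a base
# sentence, then a simple correction pass adjusts its surface form.

def build_base_sentence(words, order):
    """words: dict label -> word; order: list of labels,
    e.g. ["adverb", "adjective", "keyword"] (labels are assumed)."""
    return " ".join(words[label] for label in order)

def simple_error_correction(sentence):
    # Placeholder for the semantic-analysis error correction: here we
    # only normalise whitespace, capitalise, and add a final period.
    s = " ".join(sentence.split())
    return s[:1].upper() + s[1:] + "."

words = {"adverb": "very", "adjective": "happy", "keyword": "child"}
base = build_base_sentence(words, ["adverb", "adjective", "keyword"])
description = simple_error_correction(base)
```

A production version of `simple_error_correction` would apply grammar rules or an existing correction algorithm, as the text notes.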
In the embodiments of the present invention, the emotion type and emotion grade of the image are analyzed, and through the training and construction of the preset analysis model, the corresponding adjectives, adverbs, and word arrangement order are generated according to the image's emotion type and emotion grade. The analysis model then processes the keywords corresponding to the entities, the generated adjectives and adverbs, and the arrangement order of the three, to generate the final descriptive statement for the image. As a result, the final descriptive statement contains a description of the image's affective state, realizing image description with emotion.
Corresponding to the methods of the foregoing embodiments, Fig. 6 shows a structural block diagram of the image description device provided in an embodiment of the present invention; for ease of description, only the parts related to the embodiments of the present invention are shown. The image description device illustrated in Fig. 6 can be the executing subject of the image description method provided in the foregoing Embodiment One.
Referring to Fig. 6, the image description device includes:
a keyword identification module 61, for identifying the entities included in the image and generating the keywords corresponding to the entities;
an emotion recognition module 62, for identifying the emotion category and emotion grade corresponding to the image;
a word analysis module 63, for inputting the emotion category, the emotion grade, and the keywords into the preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, adjectives, and adverbs;
a sentence generation module 64, for generating the descriptive statement corresponding to the image based on the keywords, the adjectives, the adverbs, and the arrangement order.
Further, the keyword identification module 61 is configured to:
identify the entity class of the entity, and read the preset attribute labels corresponding to the entity class;
parse the image for the attribute values of the preset attribute labels, and generate the attribute text corresponding to the obtained attribute values;
generate the keyword corresponding to the entity based on the entity name and the attribute text.
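As an illustration of this keyword construction, the following sketch assumes hypothetical entity classes and attribute labels (`person`, `age_group`, and so on); the source does not fix a concrete attribute scheme.

```python
# Minimal sketch: assemble a keyword from an entity's name plus the
# attribute text parsed from the image. The attribute-label table and
# all values below are assumed for illustration.

PRESET_ATTRIBUTES = {            # entity class -> preset attribute labels
    "person": ["age_group", "clothing_color"],
    "dog": ["size", "fur_color"],
}

def make_keyword(entity_name, entity_class, attribute_values):
    """attribute_values: dict label -> parsed attribute text."""
    labels = PRESET_ATTRIBUTES.get(entity_class, [])
    attr_text = " ".join(attribute_values[l] for l in labels
                         if l in attribute_values)
    return f"{attr_text} {entity_name}".strip()

kw = make_keyword("girl", "person",
                  {"age_group": "young", "clothing_color": "red-dressed"})
```

Missing attributes simply drop out of the keyword, so an entity with no parsed attribute values keeps its bare entity name.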
Further, the emotion recognition module 62 is configured to:
extract the face image included in the image, and perform emotion recognition on the face image to obtain the emotion category and the emotion grade.
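The recognition step can be sketched as follows. The feature name `smile_score` and the rule-based classifier are stand-in assumptions for the trained emotion model, which the source does not specify.

```python
# Hedged sketch: a cropped face region is mapped to an emotion category
# plus an intensity grade. The classifier is a stub; a real system would
# use a learned model over the extracted face image.

from dataclasses import dataclass

@dataclass
class EmotionResult:
    category: str   # e.g. "happy", "sad", "neutral"
    grade: int      # intensity grade, e.g. 1 (slight) .. 3 (strong)

def recognise_emotion(face_features):
    """face_features: dict of measurements from the face image
    (hypothetical features standing in for a trained classifier)."""
    smile = face_features.get("smile_score", 0.0)
    if smile > 0.5:
        grade = 3 if smile > 0.8 else 2
        return EmotionResult("happy", grade)
    return EmotionResult("neutral", 1)

result = recognise_emotion({"smile_score": 0.9})
```

The category/grade pair produced here is exactly the input the word analysis module consumes alongside the keywords.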
Further, the image description device is also configured to:
obtain a preset training data set, the preset training data set containing multiple sample images together with emotion category identifiers, emotion grade identifiers, and sample descriptive statements in one-to-one correspondence with the sample images;
perform keyword extraction on the sample descriptive statements to obtain the sample keywords contained therein and the arrangement order of the sample keywords;
create sample vector data in one-to-one correspondence with the sample images based on the emotion category identifiers, the emotion grade identifiers, the sample keywords, and the sample keyword arrangement order;
train the preset analysis model on the obtained sample vector data until the similarity between the descriptive statement generated by the preset analysis model for a sample image and the corresponding sample descriptive statement is higher than the preset threshold, obtaining the model parameters of the preset analysis model.
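A possible construction of the per-sample vector data is sketched below, under an assumed encoding scheme (the source does not specify the vector format): the emotion identifiers are prefixed, then each sample keyword is paired with its position in the sample descriptive statement.

```python
# Illustrative per-sample vector: [emotion_id, grade_id, (pos, keyword)...],
# one vector per sample image. The flat-list encoding is an assumption.

def build_sample_vector(emotion_id, grade_id, keywords, order):
    """keywords: sample keywords extracted from the sample descriptive
    statement; order: each keyword's position in that statement."""
    vec = [emotion_id, grade_id]
    for pos, kw in sorted(zip(order, keywords)):
        vec.append((pos, kw))       # keyword tagged with its position
    return vec

vec = build_sample_vector(emotion_id=2, grade_id=1,
                          keywords=["child", "park"], order=[0, 1])
```

One such vector is created per sample image, giving the one-to-one correspondence the training step relies on.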
Further, the sentence generation module 64 is configured to:
sort and combine the keywords, the adjectives, and the adverbs based on the arrangement order to obtain a base sentence;
perform semantic analysis and error correction on the base sentence to obtain the descriptive statement.
For the process by which each module of the image description device provided in the embodiment of the present invention realizes its function, reference may be made to the description of the foregoing Embodiment One illustrated in Fig. 1; details are not repeated here.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of each process should be determined by its function and internal logic, and does not constitute any limitation on the implementation process of the embodiments of the present invention.
It will also be appreciated that although the terms "first", "second", etc. are used in some embodiments of the present invention to describe various elements, these elements should not be limited by these terms; the terms are only used to distinguish one element from another. For example, a first table could be named a second table and, similarly, a second table could be named a first table, without departing from the scope of the various described embodiments. The first table and the second table are both tables, but they are not the same table.
Fig. 7 is a schematic diagram of the terminal device provided by an embodiment of the present invention. As shown in Fig. 7, the terminal device 7 of this embodiment includes a processor 70 and a memory 71, the memory 71 storing a computer program 72 runnable on the processor 70. When executing the computer program 72, the processor 70 realizes the steps in each of the above image description method embodiments, such as steps 101 to 104 shown in Fig. 1; alternatively, the processor 70 realizes the functions of each module/unit in each of the above device embodiments, such as the functions of modules 61 to 64 shown in Fig. 6.
The terminal device 7 can be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device may also include input and output devices, a network access device, a bus, etc.
The processor 70 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor, or the processor can be any conventional processor.
The memory 71 can be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 can also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the terminal device 7. Further, the memory 71 can include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store the computer program and the other programs and data required by the terminal device; the memory 71 can also be used to temporarily store data that has been sent or is to be sent.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above integrated unit can be realized in the form of hardware, or in the form of a software functional unit.
If the integrated module/unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention can also be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor the computer program can realize the steps of each of the above method embodiments. The computer program includes computer program code, which can be in source code form, object code form, an executable file, certain intermediate forms, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
The embodiments described above are merely illustrative of the technical solutions of the present invention, not limiting; although the present invention has been explained in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and such modifications or replacements, which do not depart the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the various embodiments of the present invention, should all be included within the protection scope of the present invention.
Claims (10)
1. An image description method, characterized by comprising:
identifying the entities included in an image, and generating the keywords corresponding to the entities;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade, and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives, and the adverbs;
based on the keywords, the adjectives, the adverbs, and the arrangement order, generating the descriptive statement corresponding to the image.
2. The image description method according to claim 1, characterized in that generating the keywords corresponding to the entities comprises:
identifying the entity class of the entity, and reading the preset attribute labels corresponding to the entity class;
parsing the image for the attribute values of the preset attribute labels, and generating the attribute text corresponding to the obtained attribute values;
generating the keyword corresponding to the entity based on the entity name and the attribute text.
3. The image description method according to claim 1, characterized in that identifying the emotion category and emotion grade corresponding to the image comprises:
extracting the face image included in the image, and performing emotion recognition on the face image to obtain the emotion category and the emotion grade.
4. The image description method according to claim 1, characterized in that, before identifying the entities included in the image, the method further comprises:
obtaining a preset training data set, the preset training data set containing multiple sample images together with emotion category identifiers, emotion grade identifiers, and sample descriptive statements in one-to-one correspondence with the sample images;
performing keyword extraction on the sample descriptive statements to obtain the sample keywords contained therein and the arrangement order of the sample keywords;
creating sample vector data in one-to-one correspondence with the sample images based on the emotion category identifiers, the emotion grade identifiers, the sample keywords, and the sample keyword arrangement order;
training the preset analysis model on the obtained sample vector data until the similarity between the descriptive statement generated by the preset analysis model for a sample image and the corresponding sample descriptive statement is higher than a preset threshold, obtaining the model parameters of the preset analysis model.
5. The image description method according to claim 1, characterized in that generating the descriptive statement corresponding to the image based on the keywords, the adjectives, the adverbs, and the arrangement order comprises:
sorting and combining the keywords, the adjectives, and the adverbs based on the arrangement order to obtain a base sentence;
performing semantic analysis and error correction on the base sentence to obtain the descriptive statement.
6. A terminal device, characterized in that the terminal device includes a memory and a processor, the memory storing a computer program runnable on the processor, and the processor realizing the following steps when executing the computer program:
identifying the entities included in an image, and generating the keywords corresponding to the entities;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade, and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives, and the adverbs;
based on the keywords, the adjectives, the adverbs, and the arrangement order, generating the descriptive statement corresponding to the image.
7. The terminal device according to claim 6, characterized in that generating the keywords corresponding to the entities comprises:
identifying the entity class of the entity, and reading the preset attribute labels corresponding to the entity class;
parsing the image for the attribute values of the preset attribute labels, and generating the attribute text corresponding to the obtained attribute values;
generating the keyword corresponding to the entity based on the entity name and the attribute text.
8. The terminal device according to claim 6, characterized in that the processor also realizes the following steps when executing the computer program:
obtaining a preset training data set, the preset training data set containing multiple sample images together with emotion category identifiers, emotion grade identifiers, and sample descriptive statements in one-to-one correspondence with the sample images;
performing keyword extraction on the sample descriptive statements to obtain the sample keywords contained therein and the arrangement order of the sample keywords;
creating sample vector data in one-to-one correspondence with the sample images based on the emotion category identifiers, the emotion grade identifiers, the sample keywords, and the sample keyword arrangement order;
training the preset analysis model on the obtained sample vector data until the similarity between the descriptive statement generated by the preset analysis model for a sample image and the corresponding sample descriptive statement is higher than a preset threshold, obtaining the model parameters of the preset analysis model.
9. An image description device, characterized by comprising:
a keyword identification module, for identifying the entities included in an image and generating the keywords corresponding to the entities;
an emotion recognition module, for identifying the emotion category and emotion grade corresponding to the image;
a word analysis module, for inputting the emotion category, the emotion grade, and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives, and the adverbs;
a sentence generation module, for generating the descriptive statement corresponding to the image based on the keywords, the adjectives, the adverbs, and the arrangement order.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are realized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811343846.XA CN109657079A (en) | 2018-11-13 | 2018-11-13 | A kind of Image Description Methods and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109657079A true CN109657079A (en) | 2019-04-19 |
Family
ID=66110869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811343846.XA (Pending) | A kind of Image Description Methods and terminal device | 2018-11-13 | 2018-11-13 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109657079A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111275110A (en) * | 2020-01-20 | 2020-06-12 | 北京百度网讯科技有限公司 | Image description method and device, electronic equipment and storage medium |
CN113536009A (en) * | 2021-07-14 | 2021-10-22 | Oppo广东移动通信有限公司 | Data description method and device, computer readable medium and electronic device |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106446782A (en) * | 2016-08-29 | 2017-02-22 | 北京小米移动软件有限公司 | Image identification method and device |
CN106503055A (en) * | 2016-09-27 | 2017-03-15 | 天津大学 | A kind of generation method from structured text to iamge description |
CN107169409A (en) * | 2017-03-31 | 2017-09-15 | 北京奇艺世纪科技有限公司 | A kind of emotion identification method and device |
CN107679580A (en) * | 2017-10-21 | 2018-02-09 | 桂林电子科技大学 | A kind of isomery shift image feeling polarities analysis method based on the potential association of multi-modal depth |
CN107766853A (en) * | 2016-08-16 | 2018-03-06 | 阿里巴巴集团控股有限公司 | A kind of generation, display methods and the electronic equipment of the text message of image |
CN108241682A (en) * | 2016-12-26 | 2018-07-03 | 北京国双科技有限公司 | Determine the method and device of text emotion |
CN108268629A (en) * | 2018-01-15 | 2018-07-10 | 北京市商汤科技开发有限公司 | Image Description Methods and device, equipment, medium, program based on keyword |
CN108764141A (en) * | 2018-05-25 | 2018-11-06 | 广州虎牙信息科技有限公司 | A kind of scene of game describes method, apparatus, equipment and its storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |