CN109657079A - An image description method and terminal device - Google Patents


Info

Publication number: CN109657079A
Application number: CN201811343846.XA
Authority: CN (China)
Prior art keywords: keyword, image, entity, emotion, adjective
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventor: 吴壮伟
Current assignee: Ping An Technology Shenzhen Co Ltd
Original assignee: Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811343846.XA
Publication of CN109657079A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an image description method and a terminal device, applicable to the technical field of data processing. The method comprises: identifying the entities contained in an image, and generating a keyword corresponding to each entity; identifying the emotion category and emotion grade corresponding to the image; inputting the emotion category, emotion grade and keywords into a preset analysis model to generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, adjectives and adverbs; and generating a descriptive sentence corresponding to the image based on the keywords, adjectives, adverbs and arrangement order. In the embodiments of the present invention, the final descriptive sentence thereby contains a description of the emotional state of the image, realizing emotionally expressive image description.

Description

An image description method and terminal device
Technical field
The invention belongs to the technical field of data processing, and in particular relates to an image description method and a terminal device.
Background technique
Image description refers to automatically generating natural-language sentences by computer to describe the content of a given image. Existing image description methods often merely identify the entities contained in an image and then simply arrange them into a corresponding natural sentence according to the syntax rules of natural language, such as "Blue sky and white clouds; two people play football on the grass." Although this achieves a mechanical description of the image, in practice the natural language we use always carries a certain emotion, such as "Blue sky and white clouds; two people happily play football on the grass!", a sentence filled with joy. A merely mechanical description of an image is therefore often insufficient to meet people's actual needs.
Summary of the invention
In view of this, embodiments of the present invention provide an image description method and a terminal device to solve the problem that the prior art cannot produce image descriptions with emotion.
A first aspect of the embodiments of the present invention provides an image description method, comprising:
identifying the entities contained in an image, and generating a keyword corresponding to each entity;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade and the keywords into a preset analysis model to generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
generating a descriptive sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
A second aspect of the embodiments of the present invention provides a terminal device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor implements the following steps when executing the computer program:
identifying the entities contained in an image, and generating a keyword corresponding to each entity;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade and the keywords into a preset analysis model to generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
generating a descriptive sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
A third aspect of the embodiments of the present invention provides an image description apparatus, comprising:
a keyword identification module, configured to identify the entities contained in an image and generate a keyword corresponding to each entity;
an emotion recognition module, configured to identify the emotion category and emotion grade corresponding to the image;
a word analysis module, configured to input the emotion category, the emotion grade and the keywords into a preset analysis model to generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
a sentence generation module, configured to generate a descriptive sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the image description method described above.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: by analyzing the emotion category and emotion grade of an image, generating the corresponding adjectives, adverbs and the arrangement order of these words according to the emotion category and emotion grade, and then generating the final descriptive sentence of the image based on the keywords corresponding to the entities, the generated adjectives and adverbs, and the arrangement order of the three, the final descriptive sentence contains a description of the emotional state of the image, realizing emotionally expressive image description.
Detailed description of the invention
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required in the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of the image description method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of the image description method provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of the image description method provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of the image description method provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic flowchart of the image description method provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the image description apparatus provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic diagram of the terminal device provided by Embodiment 7 of the present invention.
Specific embodiment
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
In order to illustrate the technical solutions of the present invention, specific embodiments are described below.
Fig. 1 shows the implementation flow of the image description method provided by Embodiment 1 of the present invention, detailed as follows:
S101: identify the entities contained in the image, and generate a keyword corresponding to each entity.
Considering that existing entity recognition algorithms are relatively mature, the entity recognition algorithm used is not limited here and may be chosen by the technician according to demand. The keyword corresponding to an entity is a word or phrase of the form "attribute + name", such as "the running dog" or "the white teacup". In order to generate such keywords, after recognizing an entity, the embodiment of the present invention further identifies its attributes, determines each entity attribute in the image, and then generates the corresponding keyword. Entity attribute recognition can likewise be realized with existing attribute recognition algorithms, such as existing color recognition algorithms, action recognition algorithms and emotion recognition algorithms; no limitation is imposed here, and the technician may choose freely.
S102: identify the emotion category and emotion grade corresponding to the image.
To produce an emotionally expressive description of an image, the dominant emotion contained in the image must first be determined; the embodiment of the present invention therefore identifies the emotion category corresponding to the image. The emotion grade refers to the intensity of the image's emotion: a happy emotion, for example, can be divided into the two grades "happy" and "very happy" according to degree, and the specific grading rules for each emotion can be set by the technician according to actual needs. Since the image emotion recognition algorithm obtained differs with the object referenced for emotion recognition, two cases arise. If the image as a whole is the object of emotion recognition, the images used as training samples must be manually labeled with emotion category and emotion grade when training the emotion recognition model; the model obtained then associates images with emotion categories and emotion grades. If, instead, the entities or entity attributes in the image are the objects of emotion recognition, the images must likewise be manually labeled with emotion category and emotion grade, and the entities or entity attributes in the images are identified before sentiment analysis is performed; the model obtained then associates entities or entity attributes with emotion categories and emotion grades. Therefore, the specific emotion category and emotion grade recognition models used are not limited here either; the technician may select a model, or design and train one, according to actual needs.
S103: input the emotion category, emotion grade and keywords into the preset analysis model to generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, adjectives and adverbs.
To obtain a descriptive sentence that carries emotion, after identifying the emotion category and emotion grade of the image it is also necessary to generate the description words corresponding to each keyword and the arrangement order between these words and the keywords — that is, the adjectives corresponding to the emotion category, the adverbs corresponding to the emotion grade, and the arrangement order between these words and the keywords — so that the descriptive sentence can be generated.
The description words corresponding to each kind of emotion are limited and known: a happy emotion, for example, can be described with adjectives such as "happy", "glad" and "joyful". Likewise, once the emotion grading rules are fixed, the description words corresponding to each grade are limited and known: degrees can be distinguished with "very", "extremely", or no adverb at all. Therefore, to generate the description words corresponding to each keyword under a given emotion category and emotion grade, description words can be assigned in advance to each emotion category and each grade, and then selected according to the specific emotion category and grade of the image. Furthermore, considering that natural language describes different entities with different habits, and that the emotion adjectives used for different entities within the same sentence are generally also distinguished, when generating the description words corresponding to the emotion category and grade for different keywords, the embodiment of the present invention preferably sets different description-word correspondences according to the entity involved, so that description words are chosen adaptively for different keywords. For example, for a happy emotion, "joyful" may be chosen for the entity "person" and "bright" for the entity "sun"; similarly, different correspondences may be set for different grades of happiness.
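The pre-assigned description-word correspondences above can be sketched as simple lookup tables. This is a minimal illustration under stated assumptions: the table contents, the English word choices, and the integer grade encoding are all hypothetical, not values given in the patent.

```python
# Hypothetical lookup tables, set in advance by the technician:
# adjectives keyed by (emotion category, entity), adverbs keyed by emotion grade.
ADJECTIVES = {
    ("happy", "person"): "joyful",
    ("happy", "sun"): "bright",
}
ADVERBS = {1: "", 2: "very", 3: "extremely"}  # grade 1 adds no adverb

def description_words(emotion, grade, entity):
    """Select the adjective for this emotion/entity pair and the adverb
    for this grade, following the preset correspondences."""
    adjective = ADJECTIVES.get((emotion, entity), "")
    adverb = ADVERBS.get(grade, "")
    return " ".join(word for word in (adverb, adjective) if word)

print(description_words("happy", 2, "person"))  # very joyful
print(description_words("happy", 1, "sun"))     # bright
```

In practice the entity-specific table is what realizes the adaptive choice described above: the same emotion category maps to different adjectives depending on which entity the keyword refers to.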
After determining the description words corresponding to each keyword, the arrangement order between these description words and the keywords must be further determined so that the descriptive sentence can subsequently be assembled. Arranging the description words and keywords is, in effect, arranging them into a sentence that conforms to the grammar rules of natural language. Existing natural language analysis algorithms can therefore be used directly here, or the technician may design an algorithm capable of ordering the words in a sentence — for example, by analyzing a large amount of natural language, extracting the ordering rules between words, and ordering the description words and keywords according to these rules.
S104: generate the descriptive sentence corresponding to the image based on the keywords, adjectives, adverbs and arrangement order.
Once the arrangement order between the keywords and the description words has been determined, a prototype of the image's descriptive sentence is obtained. The final descriptive sentence then only requires adding and arranging a few words according to the syntax rules of natural language, such as adding prepositions to make the sentence fluent. To generate the final descriptive sentence, the embodiment of the present invention can directly adopt existing natural language analysis algorithms, such as the TextRank algorithm, to arrange the final image descriptive sentence; no limitation is imposed here.
By analyzing the emotion category and emotion grade of the image, generating the corresponding adjectives, adverbs and the arrangement order of these words according to the emotion category and grade, and then generating the final descriptive sentence based on the keywords corresponding to the entities, the generated adjectives and adverbs, and the arrangement order of the three, the embodiment of the present invention ensures that the final descriptive sentence contains a description of the emotional state of the image, realizing emotionally expressive image description.
As a specific implementation of entity recognition in Embodiment 1 of the present invention: entity recognition is performed according to a preset entity recognition model to determine the entities contained in the image.
The training process of the entity recognition model is as follows:
Step 1: input sample images, and label the entities they contain.
Step 2: convolutional layer 1: perform a convolution operation with a 10×10 kernel and a stride of 1 to obtain shallow features.
Step 3: pooling layer: use max pooling with a 10×10 window and a stride of 10 to downsample and obtain deeper features.
Step 4: convolutional layer 2: perform a convolution operation with a 2×2 kernel and a stride of 2.
Step 5: fully connected layer 1: generate 5 nodes with a vector dimension of 256; the activation function is the ReLU function.
Step 6: fully connected layer 2: generate 1 node with a vector dimension of 128; the activation function is the ReLU function.
Step 7: output layer: use the presence or absence of the specific target as the output value, performing two-class entity classification.
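As a sanity check on the convolution and pooling sizes in steps 2–4, the standard valid-convolution output formula can be applied. The 100×100 input resolution below is an assumption for illustration; the patent does not state the input size.

```python
def conv_out(size, kernel, stride):
    """Spatial output size of a valid (no-padding) convolution or pooling layer."""
    return (size - kernel) // stride + 1

side = 100                    # assumed square input; not specified in the patent
side = conv_out(side, 10, 1)  # convolutional layer 1: 10x10 kernel, stride 1 -> 91
side = conv_out(side, 10, 10) # max pooling: 10x10 window, stride 10 -> 9
side = conv_out(side, 2, 2)   # convolutional layer 2: 2x2 kernel, stride 2 -> 4
print(side)  # 4
```

The feature map entering the fully connected layers would then be 4×4 per channel under this assumed input size; a different input resolution simply shifts these numbers.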
As a specific implementation of generating the keyword corresponding to an entity in Embodiment 1 of the present invention, as shown in Fig. 2, Embodiment 2 of the present invention comprises:
S201: identify the entity class of the entity, and read the preset attribute labels corresponding to the entity class.
The preset attribute labels record the attributes that need to be parsed for each entity class. Different entities have different attributes: for example, a person has the attributes ethnicity, gender, quantity and age group, while a cloud has color, quantity and so on. The recognition algorithms required may also differ with the attribute — the algorithms needed for gender recognition and color recognition, for example, are entirely different. Therefore, to recognize different entity attributes accurately, it is necessary to determine the attributes to be parsed for each entity so that the corresponding recognition algorithm can be chosen correctly. In the embodiment of the present invention, entity types can be divided in advance, and the attributes to be parsed can be set for each entity class; when parsing entity attributes, it is then only necessary to determine the required attributes according to the entity class and select the corresponding recognition algorithm. The specific class-division rules and the attributes to be parsed for each entity class can be set by the technician according to actual needs; for example, entities may be divided into the classes human, animal, plant and object, each with its own set of attributes to parse.
S202: parse the attribute values of the preset attribute labels from the image, and generate the attribute text corresponding to the obtained attribute values.
After the attributes to be parsed for each entity in the image have been determined, the attribute values of these attributes are parsed. Different attributes may require different recognition algorithms — a person's gender requires a gender recognition algorithm, actions require an action recognition algorithm, and the color of an object requires a color recognition algorithm — so the corresponding recognition algorithm must be determined according to the specific attribute and used to determine the specific attribute value, such as whether the gender is male or female, whether the action is running or jumping, and what the color is. Since the prior art already contains many recognition algorithms for attributes such as gender, action and color, the specific recognition algorithm used is not limited here and may be chosen freely by the technician.
After the corresponding attribute value is determined, the corresponding attribute text is generated: for example, "female" for a female gender, or "blue" for the color blue.
S203: generate the keyword corresponding to the entity based on the entity name and the attribute text.
After the attribute text of each entity is obtained, it is directly combined with the entity name to produce the corresponding keyword. To ensure the readability of the keyword, the words may be arranged during combination — for example, combining "running" + "dog" with a connective added yields the keyword "the running dog". The specific arrangement rules can be set by the technician according to syntax rules.
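The combination step in S203 amounts to a small string template. This is a hypothetical sketch: the connective and English word order stand in for the Chinese originals (where the connective is 的), and the template itself is an illustrative assumption.

```python
def make_keyword(entity_name, attribute_text):
    """Combine the attribute text with the entity name into a keyword,
    e.g. 'running' + 'dog' -> 'the running dog'. The 'the ... ' template
    is an assumed English stand-in for the patent's Chinese connective."""
    return f"the {attribute_text} {entity_name}"

print(make_keyword("dog", "running"))   # the running dog
print(make_keyword("teacup", "white"))  # the white teacup
```

A real implementation would apply per-language arrangement rules here rather than a single template, as the text notes.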
In the embodiment of the present invention, the corresponding attribute text is adaptively selected and generated for each entity according to its specific type, and the corresponding keyword is obtained, realizing adaptive extraction of entity keywords.
As an implementation of determining the preset attribute labels corresponding to each entity class in Embodiment 2 of the present invention: in practice each entity has a large number of attributes — a person, for example, has ethnicity, gender, quantity, age group and so on — and describing all the attributes of every entity would make the descriptive sentence excessively cumbersome. In real life, image descriptions generally do not cover all the attributes of the entities, but rather select attributes according to the actual scene of the image. For a scene unrelated to politics, for example, ethnicity is generally not emphasized, so the ethnicity attribute of the people in the image would not be described. Therefore, to ensure that the image descriptive sentence is accurate and concise, as shown in Fig. 3, Embodiment 3 of the present invention comprises:
S301: perform scene category recognition on the image.
Since different scenes have different description emphases, the scene category of the image must be determined before the entity attributes are determined. Existing scene recognition algorithms can be used directly here, or the technician may design a corresponding scene recognition algorithm.
S302: determine the preset attribute labels corresponding to each entity class according to the scene category.
To simplify the descriptive sentence for the image scene, the technician sets in advance the correspondence between entity classes and attribute labels for each kind of scene. For example, assuming the scene category covers non-political scenes, the attributes corresponding to a person may be set as gender, quantity and age group.
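The per-scene correspondence in S302 can be sketched as a nested mapping. All scene names, entity classes and attribute lists below are hypothetical examples consistent with the non-political-scene example above, not values from the patent.

```python
# Hypothetical per-scene attribute-label table, set in advance by the technician.
ATTRIBUTE_LABELS = {
    "non_political": {"person": ["gender", "quantity", "age_group"]},
    "political":     {"person": ["ethnicity", "gender", "quantity", "age_group"]},
}

def preset_attribute_labels(scene_category, entity_class):
    """Look up which attributes should be parsed for this entity class
    in this scene category (step S302)."""
    return ATTRIBUTE_LABELS.get(scene_category, {}).get(entity_class, [])

print(preset_attribute_labels("non_political", "person"))
# ['gender', 'quantity', 'age_group']
```

The lookup returns an empty list for unknown scenes or entity classes, so downstream attribute parsing simply skips entities with no configured labels.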
As a specific implementation of identifying the emotion category and emotion grade of the image, the embodiment of the present invention comprises:
extracting the face images contained in the image, and performing emotion recognition on the face images to obtain the emotion category and emotion grade.
In this embodiment of the present invention, the emotion of the image is identified by means of facial emotion recognition. The specific emotion recognition algorithm can be freely set by the technician, including but not limited to using an existing emotion recognition algorithm, or labeling sample face images with emotion category and grade and then training a model capable of performing the above sentiment analysis, for example using a deep convolutional neural network.
As another specific implementation of identifying the emotion category and emotion grade of the image, the embodiment of the present invention comprises: performing emotion recognition according to a preset image emotion model to determine the emotion category and emotion grade of the image.
The training process of the image emotion model is as follows:
Step 1: input sample images, and label each sample image with its corresponding emotion category and emotion grade.
Step 2: convolutional layer 1: perform a convolution operation with a 10×10 kernel and a stride of 1 to obtain shallow features.
Step 3: pooling layer: use max pooling with a 10×10 window and a stride of 10 to downsample and obtain deeper features.
Step 4: convolutional layer 2: perform a convolution operation with a 2×2 kernel and a stride of 2.
Step 5: fully connected layer 1: generate 5 nodes with a vector dimension of 256; the activation function is the ReLU function.
Step 6: fully connected layer 2: generate 1 node with a vector dimension of 128; the activation function is the ReLU function.
Step 7: output layer: use the emotion category and grade as the output values; this is a multi-class classification model.
As Embodiment 4 of the present invention, in order to generate the corresponding description words according to the emotion category, emotion grade and keywords in the embodiments of the present invention, and to determine the arrangement order between the description words and the keywords, the embodiment of the present invention builds the corresponding analysis model in advance for use in analysis. As shown in Fig. 4, the specific analysis model construction method comprises:
S401: obtain a preset training data set containing multiple sample images, together with the emotion category identifier, emotion grade identifier and sample descriptive sentence corresponding one-to-one to each sample image.
The emotion category identifier and emotion grade identifier identify the emotion category and emotion grade of the image, and can be labeled in advance by the technician for each sample image. The sample descriptive sentence can likewise be obtained in advance by the technician describing each sample image in natural language.
S402: perform keyword extraction on the sample descriptive sentences to obtain the sample keywords they contain and the arrangement order of the sample keywords.
A sample keyword consists of the name of an entity plus its attribute in the sample descriptive sentence, such as "the running dog"; keywords can be extracted by parsing the nouns in the sentence and the modifiers preceding them. The arrangement order of the sample keywords can be obtained directly from the order of extraction.
S403: create a sample vector datum corresponding one-to-one to each sample image, based on the emotion category identifier, emotion grade identifier, sample keywords and arrangement order of the sample keywords.
The number of keywords may differ between sample descriptive sentences, and the number of keywords contained in an image cannot be determined in advance in practical applications. Therefore, to guarantee the generality of the final trained model — and thus the normal use of the trained preset analysis model in the foregoing embodiments of the present invention — the embodiment of the present invention creates a one-dimensional sample vector from the emotion category identifier, emotion grade identifier and keywords corresponding to each sample descriptive sentence, with a fixed vector length. For example, the length may be set to 13; if the number of elements of a sample descriptive sentence (emotion category identifier + emotion grade identifier + keywords) is less than 13, the missing positions are set to 0. The sample vector obtained is then [emotion category identifier, emotion grade identifier, keyword 1, keyword 2, keyword 3, keyword 4, keyword 5, keyword 6, keyword 7, keyword 8, keyword 9, keyword 10, keyword 11]. By analyzing multiple sample descriptive sentences, multiple sample vector data are obtained. The specific vector length can be set by the technician.
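The fixed-length padding of S403 can be sketched as follows. The integer encoding of identifiers and keywords is an assumption for illustration; the patent does not specify how keywords are represented numerically.

```python
def build_sample_vector(category_id, grade_id, keyword_ids, length=13):
    """Create the fixed-length one-dimensional sample vector of S403:
    [category id, grade id, keyword 1 .. keyword 11], zero-padded.
    Integer ids are an assumed encoding, not specified in the patent."""
    vector = [category_id, grade_id] + list(keyword_ids)
    if len(vector) > length:
        raise ValueError("more keywords than the fixed vector length allows")
    return vector + [0] * (length - len(vector))  # pad missing slots with 0

v = build_sample_vector(2, 1, [101, 102, 103])
print(v)       # [2, 1, 101, 102, 103, 0, 0, 0, 0, 0, 0, 0, 0]
print(len(v))  # 13
```

Fixing the length up front is what lets one model consume sentences with any keyword count up to the slot limit, as the paragraph above explains.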
S404: train the preset analysis model based on the obtained sample vector data until the similarity between the descriptive sentences generated by the preset analysis model for the sample images and the sample descriptive sentences exceeds a preset threshold, obtaining the model parameters of the preset analysis model.
After the sample vector data corresponding to the sample images are obtained, the preset analysis model is trained with them. The preset analysis model may be a text sequence model built from an LSTM (long short-term memory) network, containing a large number of preset adjectives for describing emotion and adverbs for describing emotion grade. During training, these preset adjectives and adverbs are combined with the keywords in the sample vector data, a corresponding descriptive sentence is constructed for each combined sample vector, and the generated descriptive sentence is matched against the corresponding sample descriptive sentence. The constraint condition is that the similarity between the output descriptive sentence and the sample descriptive sentence exceeds a preset threshold; once it does, the model can generate the descriptive sentences corresponding to the keywords for the image's emotion category and emotion grade, and the construction of the preset analysis model is complete. Thereafter, it is only necessary to analyze the emotion category, emotion grade and keywords of an image directly with the trained preset analysis model to obtain the required keyword description words and arrangement order. The specific value of the threshold can be set by the technician.
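The similarity-threshold stopping criterion of S404 can be illustrated independently of the LSTM itself. `difflib`'s sequence ratio is used here purely as a stand-in metric, since the patent does not specify how sentence similarity is computed.

```python
import difflib

def similarity(generated, reference):
    """Stand-in similarity between a generated descriptive sentence and its
    sample descriptive sentence; the actual metric is not specified."""
    return difflib.SequenceMatcher(None, generated, reference).ratio()

def training_converged(sentence_pairs, threshold=0.9):
    """S404 stop condition: every generated description must match its
    sample description with similarity above the preset threshold."""
    return all(similarity(g, r) >= threshold for g, r in sentence_pairs)

pairs = [("two people happily play football on the grass",
          "two people happily play football on the grass")]
print(training_converged(pairs))                      # True
print(training_converged([("blue sky", "red car")]))  # False
```

In a real training loop this check would run against held-out sample sentences each epoch, and training would stop once it returns true.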
As a specific implementation, in the embodiments of the present invention, of generating the corresponding image description sentence from the keywords, adjectives, adverbs and arrangement order, Embodiment Five of the present invention, shown in Fig. 5, comprises:
S501: sort and combine the keywords, adjectives and adverbs according to the arrangement order to obtain a base sentence.
S502: perform semantic analysis and error correction on the base sentence to obtain the description sentence.
Because the base sentence obtained by simply placing the keywords, adjectives and adverbs in the arrangement order usually cannot be read normally, it must be adjusted into a sentence that satisfies the grammar rules of natural language. To guarantee the validity of the final output description sentence, this embodiment of the present invention performs semantic analysis and error correction on the sorted base sentence: the parts that do not satisfy the grammar rules are identified and corrected, yielding the final readable natural-language sentence. The specific semantic-analysis error-correction method may be chosen by the technician according to actual needs; for example, an existing algorithm may be selected, or one may be designed from grammar rules.
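Steps S501 and S502 can be sketched as follows. The "error correction" here is a deliberately trivial placeholder (spacing and capitalisation only) for the grammar-rule-based correction that the text leaves to the technician; the function names and the index-based arrangement order are assumptions for illustration:

```python
import re

def assemble_base_sentence(words, order):
    """S501: place the keywords, adjectives and adverbs in the predicted
    arrangement order; `order` holds indices into `words`."""
    return " ".join(words[i] for i in order)

def naive_error_correction(sentence):
    """S502 placeholder: normalise whitespace, capitalise, add a period.
    A real implementation would apply grammar rules or an existing
    correction algorithm, as the text notes."""
    cleaned = re.sub(r"\s+", " ", sentence).strip()
    return cleaned[:1].upper() + cleaned[1:] + "."
```

For example, words `["dog", "happy", "very"]` with order `[2, 1, 0]` yield the base sentence "very happy dog", corrected to "Very happy dog."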
In the embodiments of the present invention, the emotion type and emotion grade of the image are analyzed, and the preset analysis model is trained and constructed so that it generates, from the emotion type and emotion grade of the image, the corresponding adjectives and adverbs and the arrangement order of the several words. The analysis model then processes the keywords corresponding to the entities, the generated adjectives and adverbs, and the arrangement order of the three, to produce the final description sentence for the image. The final description sentence thus contains a description of the emotional state of the image, achieving image description with emotion.
Corresponding to the methods of the foregoing embodiments, Fig. 6 shows a structural block diagram of the image description device provided in an embodiment of the present invention; for ease of description, only the parts related to the embodiments of the present invention are shown. The image description device illustrated in Fig. 6 may be the executing subject of the image description method provided in Embodiment One.
Referring to Fig. 6, the image description device includes:
a keyword identification module 61, configured to identify the entities contained in an image and generate the keywords corresponding to the entities;
an emotion recognition module 62, configured to identify the emotion category and emotion grade corresponding to the image;
a word analysis module 63, configured to input the emotion category, the emotion grade and the keywords into a preset analysis model and generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
a sentence generation module 64, configured to generate the description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
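The four modules of Fig. 6 form a simple pipeline from image to description sentence. The following sketch is illustrative only; the callables passed in are hypothetical stand-ins for modules 61 to 64, not implementations of them:

```python
class ImageDescriptionDevice:
    """Sketch of the device of Fig. 6: four modules chained in order."""

    def __init__(self, keyword_module, emotion_module,
                 word_module, sentence_module):
        self.keyword_module = keyword_module    # module 61
        self.emotion_module = emotion_module    # module 62
        self.word_module = word_module          # module 63
        self.sentence_module = sentence_module  # module 64

    def describe(self, image):
        keywords = self.keyword_module(image)
        category, grade = self.emotion_module(image)
        adjectives, adverbs, order = self.word_module(category, grade, keywords)
        return self.sentence_module(keywords, adjectives, adverbs, order)
```

With stub modules, `describe` simply threads an image through identification, emotion recognition, word analysis and sentence generation, mirroring steps 101 to 104 of the method.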
Further, the keyword identification module 61 is configured to:
identify the entity class of the entity and read the preset attribute labels corresponding to the entity class;
parse the image for the attribute values of the preset attribute labels and generate the attribute text corresponding to the obtained attribute values;
generate the keyword corresponding to the entity based on the entity name and the attribute text.
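A minimal illustration of the last step, combining the entity name with its parsed attribute text into the entity's keyword. The joining convention shown (attribute text before the name) is an assumption; the patent only states that the keyword is generated from both parts:

```python
def make_keyword(entity_name, attribute_text):
    """Combine the parsed attribute text with the entity name to form the
    entity's keyword, e.g. attribute text 'brown' + entity 'dog' gives
    'brown dog'. If no attribute text was parsed, the name alone is used."""
    return f"{attribute_text} {entity_name}".strip()
```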
Further, the emotion recognition module 62 is configured to:
extract the face image contained in the image and perform emotion recognition on the face image to obtain the emotion category and the emotion grade.
Further, the image description device is configured to:
obtain a preset training data set containing multiple sample images and, in one-to-one correspondence with the sample images, emotion category identifiers, emotion grade identifiers and sample description sentences;
perform keyword extraction on the sample description sentences to obtain the sample keywords they contain and the arrangement order of the sample keywords;
create sample vector data in one-to-one correspondence with the sample images based on the emotion category identifiers, the emotion grade identifiers, the sample keywords and the arrangement order of the sample keywords;
train the preset analysis model on the obtained sample vector data until the similarity between the description sentences it generates for the sample images and the sample description sentences exceeds a preset threshold, yielding the model parameters of the preset analysis model.
Further, the sentence generation module 64 is configured to:
sort and combine the keywords, the adjectives and the adverbs according to the arrangement order to obtain a base sentence;
perform semantic analysis and error correction on the base sentence to obtain the description sentence.
For the process by which each module of the image description device provided in this embodiment of the present invention realizes its function, reference may be made to the description of Embodiment One illustrated in Fig. 1; details are not repeated here.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and constitutes no limitation on the implementation of the embodiments of the present invention.
It should also be understood that although the terms "first", "second", etc. are used in some embodiments of the present invention to describe various elements, these elements should not be limited by these terms, which serve only to distinguish one element from another. For example, a first table could be named a second table and, similarly, a second table could be named a first table without departing from the scope of the various described embodiments: the first table and the second table are both tables, but they are not the same table.
Fig. 7 is a schematic diagram of the terminal device provided by an embodiment of the present invention. As shown in Fig. 7, the terminal device 7 of this embodiment includes a processor 70 and a memory 71 in which a computer program 72 executable on the processor 70 is stored. When executing the computer program 72, the processor 70 implements the steps of each of the above image description method embodiments, such as steps 101 to 104 shown in Fig. 1; alternatively, when executing the computer program 72, the processor 70 implements the functions of the modules/units of each of the above device embodiments, such as the functions of modules 61 to 64 shown in Fig. 6.
The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is merely an example of the terminal device 7 and does not limit it: the terminal device may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device may also include input and sending devices, a network access device, a bus, etc.
The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 7. Further, the memory 71 may include both an internal storage unit of the terminal device 7 and an external storage device. The memory 71 is used to store the computer program and the other programs and data required by the terminal device, and may also be used to temporarily store data that has been sent or is to be sent.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, etc.
The embodiments described above are merely illustrative of the technical solution of the present invention and do not limit it. Although the present invention has been explained in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions documented in the foregoing embodiments may still be modified, or some of their technical features replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention; they should all be included within the protection scope of the present invention.

Claims (10)

1. An image description method, comprising:
identifying the entities contained in an image and generating the keywords corresponding to the entities;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
generating the description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
2. The image description method according to claim 1, wherein generating the keywords corresponding to the entities comprises:
identifying the entity class of the entity and reading the preset attribute labels corresponding to the entity class;
parsing the image for the attribute values of the preset attribute labels and generating the attribute text corresponding to the obtained attribute values;
generating the keyword corresponding to the entity based on the entity name and the attribute text.
3. The image description method according to claim 1, wherein identifying the emotion category and emotion grade corresponding to the image comprises:
extracting the face image contained in the image and performing emotion recognition on the face image to obtain the emotion category and the emotion grade.
4. The image description method according to claim 1, further comprising, before identifying the entities contained in an image:
obtaining a preset training data set containing multiple sample images and, in one-to-one correspondence with the sample images, emotion category identifiers, emotion grade identifiers and sample description sentences;
performing keyword extraction on the sample description sentences to obtain the sample keywords they contain and the arrangement order of the sample keywords;
creating sample vector data in one-to-one correspondence with the sample images based on the emotion category identifiers, the emotion grade identifiers, the sample keywords and the arrangement order of the sample keywords;
training the preset analysis model on the obtained sample vector data until the similarity between the description sentences it generates for the sample images and the sample description sentences exceeds a preset threshold, obtaining the model parameters of the preset analysis model.
5. The image description method according to claim 1, wherein generating the description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order comprises:
sorting and combining the keywords, the adjectives and the adverbs according to the arrangement order to obtain a base sentence;
performing semantic analysis and error correction on the base sentence to obtain the description sentence.
6. A terminal device, comprising a memory and a processor, wherein a computer program executable on the processor is stored in the memory, and the processor, when executing the computer program, implements the following steps:
identifying the entities contained in an image and generating the keywords corresponding to the entities;
identifying the emotion category and emotion grade corresponding to the image;
inputting the emotion category, the emotion grade and the keywords into a preset analysis model, and generating the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
generating the description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
7. The terminal device according to claim 6, wherein generating the keywords corresponding to the entities comprises:
identifying the entity class of the entity and reading the preset attribute labels corresponding to the entity class;
parsing the image for the attribute values of the preset attribute labels and generating the attribute text corresponding to the obtained attribute values;
generating the keyword corresponding to the entity based on the entity name and the attribute text.
8. The terminal device according to claim 6, wherein the processor, when executing the computer program, further implements the following steps:
obtaining a preset training data set containing multiple sample images and, in one-to-one correspondence with the sample images, emotion category identifiers, emotion grade identifiers and sample description sentences;
performing keyword extraction on the sample description sentences to obtain the sample keywords they contain and the arrangement order of the sample keywords;
creating sample vector data in one-to-one correspondence with the sample images based on the emotion category identifiers, the emotion grade identifiers, the sample keywords and the arrangement order of the sample keywords;
training the preset analysis model on the obtained sample vector data until the similarity between the description sentences it generates for the sample images and the sample description sentences exceeds a preset threshold, obtaining the model parameters of the preset analysis model.
9. An image description device, comprising:
a keyword identification module, configured to identify the entities contained in an image and generate the keywords corresponding to the entities;
an emotion recognition module, configured to identify the emotion category and emotion grade corresponding to the image;
a word analysis module, configured to input the emotion category, the emotion grade and the keywords into a preset analysis model and generate the adjectives corresponding to the keywords, the adverbs corresponding to the adjectives, and the arrangement order of the keywords, the adjectives and the adverbs;
a sentence generation module, configured to generate the description sentence corresponding to the image based on the keywords, the adjectives, the adverbs and the arrangement order.
10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201811343846.XA 2018-11-13 2018-11-13 A kind of Image Description Methods and terminal device Pending CN109657079A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811343846.XA CN109657079A (en) 2018-11-13 2018-11-13 A kind of Image Description Methods and terminal device


Publications (1)

Publication Number Publication Date
CN109657079A true CN109657079A (en) 2019-04-19

Family

ID=66110869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811343846.XA Pending CN109657079A (en) 2018-11-13 2018-11-13 A kind of Image Description Methods and terminal device

Country Status (1)

Country Link
CN (1) CN109657079A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111275110A (en) * 2020-01-20 2020-06-12 北京百度网讯科技有限公司 Image description method and device, electronic equipment and storage medium
CN113536009A (en) * 2021-07-14 2021-10-22 Oppo广东移动通信有限公司 Data description method and device, computer readable medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106446782A (en) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image identification method and device
CN106503055A (en) * 2016-09-27 2017-03-15 天津大学 A kind of generation method from structured text to iamge description
CN107169409A (en) * 2017-03-31 2017-09-15 北京奇艺世纪科技有限公司 A kind of emotion identification method and device
CN107679580A (en) * 2017-10-21 2018-02-09 桂林电子科技大学 A kind of isomery shift image feeling polarities analysis method based on the potential association of multi-modal depth
CN107766853A (en) * 2016-08-16 2018-03-06 阿里巴巴集团控股有限公司 A kind of generation, display methods and the electronic equipment of the text message of image
CN108241682A (en) * 2016-12-26 2018-07-03 北京国双科技有限公司 Determine the method and device of text emotion
CN108268629A (en) * 2018-01-15 2018-07-10 北京市商汤科技开发有限公司 Image Description Methods and device, equipment, medium, program based on keyword
CN108764141A (en) * 2018-05-25 2018-11-06 广州虎牙信息科技有限公司 A kind of scene of game describes method, apparatus, equipment and its storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination