CN113220834A - Multimedia intelligent image matching method based on news content semantic analysis - Google Patents

Multimedia intelligent image matching method based on news content semantic analysis

Info

Publication number
CN113220834A
CN113220834A (application CN202110496811.5A; granted as CN113220834B)
Authority
CN
China
Prior art keywords
news
matching
picture
matched
pictures
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110496811.5A
Other languages
Chinese (zh)
Other versions
CN113220834B (en)
Inventor
朱迦榕
马利庄
杨太海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Finance Union Financial Technology Co ltd
Original Assignee
Shanghai Finance Union Financial Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Finance Union Financial Technology Co ltd
Priority to CN202110496811.5A
Publication of CN113220834A
Application granted
Publication of CN113220834B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multimedia intelligent image matching method based on news content semantic analysis, which comprises the following steps: S1, extracting the titles of news with existing matched pictures and classifying them; S2, classifying the pictures of the matched-picture news according to the classification result of S1 and obtaining a first vector α; S3, reading the title and content of news without a matching picture and obtaining a second vector β; S4, determining the matching degree from α, β and the frequency of shared words, and selecting candidate pictures according to the matching degree; S5, constructing a background picture library and a style picture library, feeding the candidate pictures into a semantic segmentation network, and processing them to obtain background-updated pictures; S6, performing style transfer and smoothing on the background-updated pictures. The method realizes automatic matching, scene change and style transfer of news pictures, improves the news relevance of intelligently matched pictures, saves labor, expands the picture library, ensures the legality of picture sources and, to a certain extent, solves the problem of a monotonous picture library.

Description

Multimedia intelligent image matching method based on news content semantic analysis
Technical Field
The invention belongs to the field of combination of natural language processing, information retrieval and image generation technologies, and particularly relates to a multimedia intelligent image matching method based on news content semantic analysis.
Background
In recent years, artificial intelligence technology and new internet media have become increasingly closely integrated, and the efficiency of news production and distribution has improved markedly. In the field of news picture matching, for example, traditional workflows mostly require editors to select pictures from a gallery by hand, which is time-consuming and labor-intensive; deep learning methods can instead describe pictures and news content with semantic features and match them automatically, greatly improving picture-matching efficiency. However, as awareness of picture copyright protection grows across society, news picture matching must consider not only the fit between the picture and the news content but also the legal safety of the pictures used: the picture library material must come from legitimate sources, and the matched result must be visually harmonious and consistent in style with the article. Existing matching methods focus mainly on the consistency between picture content and news content and do not adequately address the legal safety of picture sources. In addition, matching by deep learning alone still suffers from low relevance between pictures and news content and insufficiently coordinated styles.
Disclosure of Invention
The invention aims to provide a multimedia intelligent picture-matching method based on news content semantic analysis. It uses the pictures of existing matched-picture news as material, automatically classifies and describes those pictures, and realizes automatic matching, scene change and style transfer of news pictures. This improves the news relevance of intelligently matched pictures, allows automatic matching for media news without changing the original matching criteria, saves labor, expands the picture library, ensures the legality of picture sources and, to a certain extent, solves the problem of a monotonous picture library.
In order to solve the technical problems, the invention specifically adopts the following technical scheme:
A multimedia intelligent picture-matching method based on news content semantic analysis comprises the following steps:
S1: extracting the titles of news with existing matched pictures, and automatically classifying the titles with a classifier;
S2: classifying the pictures of the matched-picture news according to the classification result of S1, generating a first text description of each picture from the corresponding title(s), and vectorizing the first text description to obtain a first vector α;
S3: reading the title and content of news without a matching picture, classifying its title according to the classification result of S1, generating a second text description from the title or content, and vectorizing the second text description to obtain a second vector β;
S4: determining the matching degree between the first text description and the second text description from the first vector α, the second vector β and the frequency of words shared by the first text description and the title or content of the news without a matching picture, and selecting several pictures of matched-picture news as candidate pictures according to the matching degree;
S5: building a background picture library and a style picture library, feeding the candidate pictures into a semantic segmentation network, and performing background switching based on the mask output as the semantic segmentation result to obtain a background-updated picture;
S6: performing style transfer and smoothing on the background-updated picture.
Preferably, the classification in S1 has two categories: one is fixed-mode matching and the other is variable-mode matching.
Preferably, in S2 the first text description of a picture of matched-picture news is generated as follows: the title of the matched-picture news is taken as the initial description of its picture, and for titles of matched-picture news that share the same picture, the union of those titles is taken as the initial description of the picture; content extraction is then performed on the initial description according to the picture classification result to generate the first text description of the picture.
Preferably, in S3 the second text description of the news without a matching picture is generated as follows: for fixed-mode matching, the news title is used as the initial description of the news; for variable-mode matching, a summary of the news is extracted with the TextRank algorithm and its union with the news title is taken as the initial description; content extraction is then performed on the initial description to generate the second text description of the news.
Preferably, in S2 the first vector α is obtained by vectorizing the first text description with a pre-trained BERT model, in S3 the second vector β is obtained by vectorizing the second text description with a pre-trained BERT model, and the cosine distance between the first vector α and the second vector β is calculated as:
dis(α, β) = 1 - (α · β) / (||α|| · ||β||)
and then the number of matches of words shared by the first text description and the second text description is calculated:
matches(a, b) = match(a, b) / max_match
wherein a denotes the words extracted from the first text description, b denotes the words extracted from the second text description, match(a, b) is the number of shared-word matches between a and b, and max_match is the maximum value of match(a, b) over all data;
then the matching degree between a picture of matched-picture news and the news without a matching picture is calculated:
l = matches(a, b) - k * dis(α, β);
wherein k is a hyper-parameter preset by the system;
and the pictures whose matching degree lies within a preset interval of the maximum matching degree over all pictures of matched-picture news are selected as candidate pictures.
Preferably, in S5 the pictures in the background library and the candidate picture are scaled to a uniform size by bilinear interpolation, and the parts whose background semantics are fixed are then switched.
Preferably, an auto-encoder with image reconstruction as the loss function is trained, the features extracted by the auto-encoder are transformed by a WCT (whitening and coloring transform) operator and then decoded back into the corresponding RGB-space picture; a Gram matrix is used to define the similar style of similar pixels, a target loss function is defined by combining the style difference with the pixel similarity matrix, and a picture with a uniform style is obtained by minimizing the target loss.
Preferably, fixed-mode matching targets column-type news with fixed picture-matching logic, while variable-mode matching targets news whose matched picture relates to both the headline and the news content.
Preferably, content extraction is performed on the initial description of a picture of matched-picture news as follows to generate its first text description: for fixed-mode matching, rule matching of keywords against a candidate word bank is used to generate the first text description; for variable-mode matching, keywords are extracted with TF-IDF (term frequency-inverse document frequency), where the TF scope is the initial description of the picture of the current matched-picture news and the IDF scope is the set of initial descriptions of all pictures of matched-picture news.
The invention has the following beneficial effects. First, the existing matched-picture news is used as material: the news titles are automatically classified, the pictures are then classified, and a first text description is generated for each picture. A news provider typically holds a large amount of matched-picture news whose pictures were selected by human editors, so the image-text consistency is good and the copyright sources are compliant; picture descriptions can therefore be generated from the news titles, providing a basis for later matching. Second, a second text description is generated from the title and content of the news without a matching picture, and the matching degree between each picture and that news is computed from the vectorized first and second text descriptions together with their word-frequency matches, so that the candidate pictures fit the content of the news to be illustrated. Performing background switching and style transfer on the pictures makes the picture and the news content more consistent, increases the number of pictures with different styles, and expands the material library. The method enables automatic picture matching for news, greatly saves labor and achieves a good matching effect.
Detailed Description
For the purpose of facilitating an understanding of the present invention, the following detailed description is given with reference to the accompanying drawings and examples.
The multimedia intelligent image matching method based on news content semantic analysis comprises the following steps:
S1, automatically classify the titles of the news with existing matched pictures; the specific steps are as follows:
S1-1, input a certain number of titles of matched-picture news as the data set;
S1-2, manually label the titles into 2 classes, namely fixed-mode matching and variable-mode matching;
S1-3, train and test a classifier on the manually labelled titles with the fastText algorithm to obtain a classifier model;
S1-4, classify the whole title data set of matched-picture news with the trained classifier model into the two classes of fixed-mode matching and variable-mode matching (a minimal classifier sketch is given below).
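The following is a minimal sketch of the S1 title classifier using the open-source fastText library with jieba tokenization; the file paths, label names and hyper-parameters are illustrative assumptions rather than values fixed by the method:

    import fasttext
    import jieba

    # Training file: one title per line, prefixed with its manual label, e.g.
    # "__label__fixed 每日 早间 财经 快讯" (tokens separated by spaces).
    def tokenize(title):
        return " ".join(jieba.lcut(title))

    # S1-3: train and evaluate the supervised classifier on the labelled titles.
    model = fasttext.train_supervised(
        input="titles.train",          # assumed path to the labelled training set
        lr=0.5, epoch=25, wordNgrams=2)
    print(model.test("titles.valid"))  # (n_samples, precision, recall)

    # S1-4: classify remaining titles into fixed-mode / variable-mode matching.
    labels, probs = model.predict(tokenize("某专栏：本周财经要闻回顾"))
    print(labels[0], probs[0])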
S2, build the picture classification library and the corresponding picture descriptions; the specific steps are as follows:
S2-1, classify the pictures of the matched-picture news according to the classification result of their titles in S1, i.e. into the two classes of fixed-mode matching and variable-mode matching;
S2-2, take the title of the matched-picture news as the initial description of the corresponding picture (the original matched picture); for titles that share the same original picture, take the union of those titles as the initial description;
S2-3, perform content extraction on the initial description according to the picture classification result to generate the first text description (i.e. the final picture description) of the picture, as sketched below.
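A minimal sketch of S2-2 and the variable-mode branch of S2-3, grouping titles that share a picture and extracting TF-IDF keywords with scikit-learn and jieba; the data layout (picture_id/title fields) and the top_k value are illustrative assumptions, and the fixed-mode rule-matching branch is omitted:

    from collections import defaultdict
    import jieba
    from sklearn.feature_extraction.text import TfidfVectorizer

    # S2-2: the initial description of a picture is the union of the titles of all
    # matched-picture news items that use that picture.
    def initial_descriptions(news_items):
        desc = defaultdict(set)
        for item in news_items:                 # assumed schema: {"picture_id", "title"}
            desc[item["picture_id"]].add(item["title"])
        return {pid: " ".join(sorted(titles)) for pid, titles in desc.items()}

    # S2-3 (variable mode): TF-IDF keywords, with TF computed inside each initial
    # description and IDF over the set of all initial descriptions.
    def first_text_descriptions(init_desc, top_k=10):
        pids = list(init_desc)
        vec = TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None)
        tfidf = vec.fit_transform([init_desc[p] for p in pids])
        terms = vec.get_feature_names_out()
        result = {}
        for i, pid in enumerate(pids):
            row = tfidf[i].toarray().ravel()
            top = row.argsort()[::-1][:top_k]
            result[pid] = [terms[j] for j in top if row[j] > 0]
        return result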
S3, analyzing the news content without matching pictures to generate a key content description, namely a second text description, and the specific steps are as follows:
s3-1, reading the content and title of the news article without the matching picture;
s3-2, classifying the undepicted news headlines, wherein if the images are matched in a fixed mode, the undepicted news headlines are used as initial descriptions of the undepicted news articles; if the variable mode map is matched, abstract extraction is carried out on the content of the non-matched map news article by using a textrank algorithm, and meanwhile, a union set of the abstract extraction and the title of the non-matched map news article is taken as an initial description of the non-matched map news article;
s3-3, extracting the content of the initial description of the unpatterned news article to generate a second text description (namely the final article description of the unpatterned news);
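A compact TextRank sentence-extraction sketch for the S3-2 variable-mode branch, building a sentence similarity graph from TF-IDF vectors and ranking it with PageRank; the sentence-splitting rule and the number of summary sentences are simplifying assumptions:

    import re
    import jieba
    import networkx as nx
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def textrank_summary(text, n_sentences=3):
        # Split the article into sentences on common Chinese/Western delimiters.
        sentences = [s.strip() for s in re.split(r"[。！？!?\n]", text) if s.strip()]
        if len(sentences) <= n_sentences:
            return sentences
        # Sentence similarity graph from TF-IDF vectors of the tokenized sentences.
        vec = TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None)
        sim = cosine_similarity(vec.fit_transform(sentences))
        np.fill_diagonal(sim, 0.0)
        # PageRank scores the sentences; keep the top ones in their original order.
        scores = nx.pagerank(nx.from_numpy_array(sim))
        top = sorted(sorted(scores, key=scores.get, reverse=True)[:n_sentences])
        return [sentences[i] for i in top]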
S4, match pictures through text-matching retrieval; the specific steps are as follows:
S4-1, vectorize the text description (i.e. the first text description) corresponding to the picture database of the matched-picture news with a pre-trained BERT model to obtain the first vector α (in this step, the first text description obtained in S2-3 can be vectorized directly);
S4-2, vectorize the second text description extracted in S3 with a BERT model pre-trained on the same corpus as in S4-1 to obtain the second vector β (this can also be done right after the second text description is obtained in S3-3). The matching response degree between a picture's text description and the news content is defined jointly by the cosine distance and the word-frequency matching. The cosine distance is defined as follows, where α and β denote the BERT vectors of the picture's text description and of the news key content, respectively:
dis(α, β) = 1 - (α · β) / (||α|| · ||β||)
Word-frequency matching is defined as follows: a and b denote the words of the picture's text description and the words extracted from the news content, respectively, match(a, b) is the number of shared-word matches between a and b, and max_match is the maximum value of match over all data:
matches(a, b) = match(a, b) / max_match
The matching degree between a picture and the news content is then defined as follows, where k is a hyper-parameter balancing the contributions of the two distance terms; the system may preset k, for example to 0.7:
l = matches(a, b) - k * dis(α, β)
By computing the matching degree of every picture in the picture data set with the news content, the 10 pictures with the largest matching degree can be selected as candidates (a sketch of this matching computation is given below).
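A minimal sketch of the S4 matching computation, assuming the Hugging Face transformers library with the bert-base-chinese checkpoint; the [CLS] pooling choice, the data layout and k = 0.7 are illustrative assumptions:

    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    bert = BertModel.from_pretrained("bert-base-chinese")

    def embed(text):
        # Use the [CLS] token embedding as a simple sentence vector.
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        with torch.no_grad():
            out = bert(**inputs)
        return out.last_hidden_state[0, 0]

    def cosine_distance(alpha, beta):
        # dis(α, β) = 1 - cos(α, β)
        return 1.0 - torch.nn.functional.cosine_similarity(alpha, beta, dim=0).item()

    def select_candidates(picture_descs, news_words, news_text, k=0.7, top_n=10):
        # picture_descs: {picture_id: (keyword_list, description_text)} -- assumed layout
        beta = embed(news_text)
        raw = {pid: len(set(words) & set(news_words))
               for pid, (words, _) in picture_descs.items()}
        max_match = max(raw.values()) or 1
        scores = {}
        for pid, (words, desc) in picture_descs.items():
            alpha = embed(desc)
            # l = matches(a, b) - k * dis(α, β)
            scores[pid] = raw[pid] / max_match - k * cosine_distance(alpha, beta)
        return sorted(scores, key=scores.get, reverse=True)[:top_n]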
S5, switch picture backgrounds; the specific steps are as follows:
S5-1, construct a background picture library and a style picture library according to the needs and the news content.
S5-2, feed the candidate pictures obtained in S4 into a semantic segmentation network and perform background switching based on the mask output as the segmentation result: scale the background picture and the candidate picture to a uniform size by bilinear interpolation, and then switch the parts whose background semantics are fixed. For example, for an outdoor scene the sky background can be switched, i.e. the picture pixels that fall in the sky region of the mask are replaced with the corresponding pixels of the sky background picture. A sketch of this mask-based switch is given below.
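A minimal sketch of the S5-2 mask-based background switch with OpenCV and NumPy; the semantic segmentation network itself is not shown, and the sky class index is an illustrative assumption that depends on the segmentation model's label palette:

    import cv2
    import numpy as np

    SKY_CLASS_ID = 2  # assumed label index of "sky" in the segmentation palette

    def switch_background(candidate_bgr, seg_mask, background_bgr, class_id=SKY_CLASS_ID):
        # candidate_bgr:  H x W x 3 candidate picture
        # seg_mask:       H x W integer mask output by the semantic segmentation network
        # background_bgr: background picture of arbitrary size
        h, w = candidate_bgr.shape[:2]
        # Bilinear interpolation scales the background to the candidate's size.
        bg = cv2.resize(background_bgr, (w, h), interpolation=cv2.INTER_LINEAR)
        out = candidate_bgr.copy()
        region = seg_mask == class_id
        out[region] = bg[region]  # replace only the pixels of the chosen background class
        return out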
S6, transfer the picture style; the specific steps are as follows:
S6-1, style transfer: train an auto-encoder with image reconstruction as the loss function, feed the features extracted by the encoder into a WCT (whitening and coloring transform) operator, and decode the transformed features back into the corresponding RGB-space picture. A picture with a small style difference is obtained by driving the image reconstruction loss low. A sketch of the WCT operator is given below.
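A NumPy sketch of the WCT operator applied to encoder feature maps (the training of the reconstruction auto-encoder and the decoder are not shown); the regularization constant eps is an illustrative assumption:

    import numpy as np

    def wct(content_feat, style_feat, eps=1e-5):
        # content_feat, style_feat: C x H x W feature maps from the trained encoder.
        c, hc, wc = content_feat.shape
        fc = content_feat.reshape(c, -1)
        fs = style_feat.reshape(c, -1)
        mc, ms = fc.mean(1, keepdims=True), fs.mean(1, keepdims=True)
        fc, fs = fc - mc, fs - ms

        # Whitening: remove the content features' own covariance structure.
        ec, vc = np.linalg.eigh(fc @ fc.T / (fc.shape[1] - 1) + eps * np.eye(c))
        whitened = vc @ np.diag(ec.clip(min=eps) ** -0.5) @ vc.T @ fc

        # Coloring: impose the style features' covariance and mean.
        es, vs = np.linalg.eigh(fs @ fs.T / (fs.shape[1] - 1) + eps * np.eye(c))
        colored = vs @ np.diag(es.clip(min=eps) ** 0.5) @ vs.T @ whitened + ms
        return colored.reshape(c, hc, wc)  # decoded back to RGB by the decoder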
S6-2, image smoothing: a picture with a uniform style can be defined as one in which similar pixels within a local region share a similar style. A Gram matrix is used to define the similar style of similar pixels; a target loss function is then defined by combining the style difference with the pixel similarity matrix, and a picture consistent in style is obtained by minimizing this target loss. In the definition below, y_i denotes the pixel output of the previous step in the optimization process and r_i denotes the pixels of the picture R produced by the current step; W and d are defined below, followed by an illustrative sketch.
argmin_y (1/2) * ( Σ_{i,j} w_ij * || y_i / sqrt(d_ii) - y_j / sqrt(d_jj) ||^2 + λ * Σ_i || y_i - r_i ||^2 )
W = {w_ij} ∈ R^{n·n} is the pixel affinity matrix, and d_ii = Σ_j w_ij is the degree of pixel i.
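As an illustration only (the patent states only that the target loss is minimized), the following sketch computes a Gram matrix for style statistics and solves a smoothing objective of the above form with the standard manifold-ranking closed form; a dense affinity matrix is assumed, which is only practical for small pixel counts:

    import numpy as np

    def gram_matrix(feat):
        # feat: C x H x W feature map; the Gram matrix captures style statistics.
        c = feat.shape[0]
        f = feat.reshape(c, -1)
        return f @ f.T / f.shape[1]

    def smooth_closed_form(r_pixels, affinity, lam=1e-2):
        # r_pixels: N x 3 pixels of the stylized picture R (one row per pixel).
        # affinity: N x N symmetric, non-negative pixel affinity matrix W.
        n = affinity.shape[0]
        d = affinity.sum(axis=1) + 1e-8
        s = affinity / np.sqrt(np.outer(d, d))          # D^(-1/2) W D^(-1/2)
        a = 1.0 / (1.0 + lam)
        # Minimizer of the smoothing objective: y = (1 - a) (I - a S)^(-1) R
        return (1.0 - a) * np.linalg.solve(np.eye(n) - a * s, r_pixels)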
The invention selects pictures from existing matched-picture news as matching material, which addresses the picture copyright problem while realizing automatic matching; background switching and style transfer increase the diversity of the pictures to a certain extent and expand the picture set. The method can be applied in scenarios with media news picture-matching demand, saves labor and provides an objective, criterion-based matching mode.
The above embodiments merely illustrate the concept and implementation of the present invention and are not restrictive; technical solutions that do not substantially depart from the concept of the present invention remain within the scope of protection.

Claims (9)

1. A multimedia intelligent picture-matching method based on news content semantic analysis, characterized by comprising the following steps:
S1: extracting the titles of news with existing matched pictures, and automatically classifying the titles with a classifier;
S2: classifying the pictures of the matched-picture news according to the classification result of S1, generating a first text description of each picture from the corresponding title(s), and vectorizing the first text description to obtain a first vector α;
S3: reading the title and content of news without a matching picture, classifying its title according to the classification result of S1, generating a second text description from the title or content, and vectorizing the second text description to obtain a second vector β;
S4: determining the matching degree between the first text description and the second text description from the first vector α, the second vector β and the frequency of words shared by the first text description and the title or content of the news without a matching picture, and selecting several pictures of matched-picture news as candidate pictures according to the matching degree;
S5: building a background picture library and a style picture library, feeding the candidate pictures into a semantic segmentation network, and performing background switching based on the mask output as the semantic segmentation result to obtain a background-updated picture;
S6: performing style transfer and smoothing on the background-updated picture.
2. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 1, wherein the classification in S1 has two categories, one being fixed-mode matching and the other being variable-mode matching.
3. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 1, wherein in S2 the first text description of a picture of matched-picture news is generated as follows: the title of the matched-picture news is taken as the initial description of its picture, and for titles of matched-picture news that share the same picture, the union of those titles is taken as the initial description of the picture; content extraction is then performed on the initial description according to the picture classification result to generate the first text description of the picture.
4. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 2, wherein in S3 the second text description of the news without a matching picture is generated as follows: for fixed-mode matching, the news title is used as the initial description of the news; for variable-mode matching, a summary of the news is extracted with the TextRank algorithm and its union with the news title is taken as the initial description; content extraction is then performed on the initial description to generate the second text description of the news.
5. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 1, wherein in S2 the first vector α is obtained by vectorizing the first text description with a pre-trained BERT model, in S3 the second vector β is obtained by vectorizing the second text description with a pre-trained BERT model, and the cosine distance between the first vector α and the second vector β is calculated as:
dis(α, β) = 1 - (α · β) / (||α|| · ||β||)
and then the number of matches of words shared by the first text description and the second text description is calculated:
matches(a, b) = match(a, b) / max_match
wherein a denotes the words extracted from the first text description, b denotes the words extracted from the second text description, match(a, b) is the number of shared-word matches between a and b, and max_match is the maximum value of match(a, b) over all data;
then the matching degree between a picture of matched-picture news and the news without a matching picture is calculated:
l = matches(a, b) - k * dis(α, β);
wherein k is a hyper-parameter preset by the system;
and the pictures whose matching degree lies within a preset interval of the maximum matching degree over all pictures of matched-picture news are selected as candidate pictures.
6. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 1, wherein in S5 the pictures in the background library and the candidate pictures are scaled to a uniform size by bilinear interpolation, and the parts whose background semantics are fixed are then switched.
7. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 1, wherein the style transfer and smoothing are performed as follows:
an auto-encoder with image reconstruction as the loss function is trained, the features extracted by the auto-encoder are transformed by a WCT operator and then decoded back into the corresponding RGB-space picture; a Gram matrix is used to define the similar style of similar pixels, a target loss function is defined by combining the style difference with the pixel similarity matrix, and a picture with a uniform style is obtained by minimizing the target loss.
8. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 2, wherein fixed-mode matching targets column-type news with fixed picture-matching logic, and variable-mode matching targets news whose matched picture relates to both the headline and the news content.
9. The multimedia intelligent picture-matching method based on news content semantic analysis as claimed in claim 2, wherein content extraction is performed on the initial description of a picture of matched-picture news as follows to generate its first text description: for fixed-mode matching, rule matching of keywords against a candidate word bank is used to generate the first text description; for variable-mode matching, keywords are extracted with TF-IDF, where the TF scope is the initial description of the picture of the current matched-picture news and the IDF scope is the set of initial descriptions of all pictures of matched-picture news.
CN202110496811.5A 2021-05-07 2021-05-07 Multimedia intelligent picture allocation method based on news content semantic analysis Active CN113220834B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110496811.5A CN113220834B (en) 2021-05-07 2021-05-07 Multimedia intelligent picture allocation method based on news content semantic analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110496811.5A CN113220834B (en) 2021-05-07 2021-05-07 Multimedia intelligent picture allocation method based on news content semantic analysis

Publications (2)

Publication Number Publication Date
CN113220834A true CN113220834A (en) 2021-08-06
CN113220834B CN113220834B (en) 2023-08-29

Family

ID=77091678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110496811.5A Active CN113220834B (en) 2021-05-07 2021-05-07 Multimedia intelligent picture allocation method based on news content semantic analysis

Country Status (1)

Country Link
CN (1) CN113220834B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090322943A1 (en) * 2008-06-30 2009-12-31 Kabushiki Kaisha Toshiba Telop collecting apparatus and telop collecting method
CN110136226A (en) * 2019-04-08 2019-08-16 华南理工大学 It is a kind of to cooperate with the news of description generation to match drawing method automatically based on image group
CN112241481A (en) * 2020-10-09 2021-01-19 中国人民解放军国防科技大学 Cross-modal news event classification method and system based on graph neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090322943A1 (en) * 2008-06-30 2009-12-31 Kabushiki Kaisha Toshiba Telop collecting apparatus and telop collecting method
CN110136226A (en) * 2019-04-08 2019-08-16 华南理工大学 It is a kind of to cooperate with the news of description generation to match drawing method automatically based on image group
CN112241481A (en) * 2020-10-09 2021-01-19 中国人民解放军国防科技大学 Cross-modal news event classification method and system based on graph neural network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
任海平: "Design and Practice of an Intelligent Writing-Assistance Service under Deep Learning" (深度学习下智能写稿辅助服务的设计与实践), 传媒论坛 (Media Forum), no. 04

Also Published As

Publication number Publication date
CN113220834B (en) 2023-08-29

Similar Documents

Publication Publication Date Title
CN111858954B (en) Task-oriented text-generated image network model
CN101573705B (en) Media material analysis of continuing article portions
CN110275964A (en) The recommended models of knowledge based map and Recognition with Recurrent Neural Network
CN109271537B (en) Text-to-image generation method and system based on distillation learning
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN108984642B (en) Printed fabric image retrieval method based on Hash coding
Li et al. Residual attention-based LSTM for video captioning
CN107862561A (en) A kind of method and apparatus that user-interest library is established based on picture attribute extraction
CN114996488A (en) Skynet big data decision-level fusion method
CN112633431B (en) Tibetan-Chinese bilingual scene character recognition method based on CRNN and CTC
CN112541347B (en) Machine reading understanding method based on pre-training model
CN116975615A (en) Task prediction method and device based on video multi-mode information
CN116010552A (en) Engineering cost data analysis system and method based on keyword word library
Fei et al. Learning user interest with improved triplet deep ranking and web-image priors for topic-related video summarization
CN117556067A (en) Data retrieval method, device, computer equipment and storage medium
CN113254688A (en) Trademark retrieval method based on deep hash
Liu et al. Bi-lstm sequence modeling for on-the-fly fine-grained sketch-based image retrieval
CN113220834B (en) Multimedia intelligent picture allocation method based on news content semantic analysis
KR102586580B1 (en) News editing supoort system using natural language processing artificial intelligence language model
Cheu Construction of a digital platform for Nuo opera based on artificial intelligence
Liu et al. Entity representation learning with multimodal neighbors for link prediction in knowledge graph
Pang et al. Chinese Calligraphy Character Image Recognition and Its Applications in Web and Wechat Applet Platform
CN109062995B (en) Personalized recommendation algorithm for drawing Board (Board) cover on social strategy exhibition network
Jia et al. Printed score detection based on deep learning
Aigrain et al. A connection graph for user navigation in a large image bank.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant