CN112597765A - Automatic movie and television topic generation method based on multi-mode features - Google Patents

Automatic movie and television topic generation method based on multi-mode features

Info

Publication number
CN112597765A
Authority
CN
China
Prior art keywords
movie
television
text
word
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011565739.9A
Other languages
Chinese (zh)
Inventor
吴上波 (Wu Shangbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN202011565739.9A priority Critical patent/CN112597765A/en
Publication of CN112597765A publication Critical patent/CN112597765A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/284 - Lexical analysis, e.g. tokenisation or collocates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/35 - Clustering; Classification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of movie and television topics, and in particular to a method for automatically generating movie and television topics based on multi-modal features.

Description

Automatic movie and television topic generation method based on multi-modal features
Technical Field
The invention relates to the field of movie and television topics, and in particular to a method for automatically generating movie and television topics based on multi-modal features.
Background
Existing methods for automatically generating movie and television topics generally adopt traditional machine learning algorithms: unsupervised clustering is performed on textual attributes of the titles, such as director, actors, subject, year and region, to group them into topics.
Disclosure of Invention
The technical problem solved by the invention is as follows: a method for automatically generating movie and television topics based on multi-modal features is provided, which addresses the inconsistent visual style of the posters within a movie and television topic and the weak association among the titles grouped into a topic.
The technical solution adopted by the invention to solve this problem is as follows: the method for automatically generating movie and television topics based on multi-modal features comprises the following steps:
S01, preprocessing the poster pictures of the movies to a uniform size, and extracting a graphic feature vector for each poster by using the representation learning capability of a convolutional neural network;
S02, segmenting the text information of the movies to build a keyword lexicon, computing the inverse document frequency (IDF) of each keyword, one-hot encoding the keywords appearing in each movie's information to obtain its text vector, and weighting that vector by tf-idf to obtain a weighted text feature vector;
S03, organizing user behaviors into segmented corpora, merging the corpora of all users into word2vec word vector model training samples, building a word vector model, training it with the skip-gram method, and generating a word feature vector for each movie;
S04, concatenating the graphic feature vector, the text feature vector and the word feature vector into a multi-modal movie vector, and performing cluster analysis with an unsupervised clustering algorithm;
and S05, obtaining the movie and television topics with multi-modal features.
Further, in step S02, the text information of a movie includes its name, synopsis, subject, comments, year, region, director and actors.
Further, in step S03, the user behaviors include continuous movie browsing, searching and watching behaviors.
Further, in step S04, the unsupervised clustering algorithm is the K-Means algorithm.
The beneficial effects of the invention are as follows: the method learns poster representations with a convolutional neural network, which unifies the visual style of each topic; by combining the graphic, text and word feature vectors and clustering them with an unsupervised algorithm, it improves the degree of association within each movie and television topic and enhances the user's experience when browsing topic content.
Drawings
FIG. 1 is a flow chart of the method for automatically generating movie and television topics based on multi-modal features.
FIG. 2 shows the neuron parameters of the convolutional layers of the convolutional neural network used by the method.
Detailed Description
The invention provides a method for automatically generating movie and television topics based on multi-modal features. It learns poster representations with a convolutional neural network to unify the visual style of each topic, and combines the graphic feature vector, the text feature vector and the word feature vector of each movie through an unsupervised clustering algorithm to improve the degree of association within each topic. The method comprises the following steps:
S01, preprocessing the poster pictures of the movies to a uniform size, and extracting a graphic feature vector for each poster by using the representation learning capability of a convolutional neural network;
S02, segmenting the text information of the movies to build a keyword lexicon, computing the inverse document frequency (IDF) of each keyword, one-hot encoding the keywords appearing in each movie's information to obtain its text vector, and weighting that vector by tf-idf to obtain a weighted text feature vector;
S03, organizing user behaviors into segmented corpora, merging the corpora of all users into word2vec word vector model training samples, building a word vector model, training it with the skip-gram method, and generating a word feature vector for each movie;
S04, concatenating the graphic feature vector, the text feature vector and the word feature vector into a multi-modal movie vector, and performing cluster analysis with an unsupervised clustering algorithm;
and S05, obtaining the movie and television topics with multi-modal features.
Further, in step S02, the text information of a movie includes its name, synopsis, subject, comments, year, region, director and actors.
Further, in step S03, the user behaviors include continuous movie browsing, searching and watching behaviors.
Further, in step S04, the unsupervised clustering algorithm is the K-Means algorithm.
Specifically, the overall flow is shown in FIG. 1.
Movie graphic feature vector: the original movie posters are first downloaded from open-source websites and preprocessed into square 224 x 224 images to obtain standard posters. Each standard poster is then fed into a convolutional neural network consisting of 13 convolutional layers and 3 fully connected layers (the convolutional-layer neuron parameters are shown in FIG. 2), which outputs the graphic feature vector of the movie.
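As an illustration of this step, the following sketch extracts a graphic feature vector for one standard poster with a pretrained convolutional network. The description only fixes the layout (13 convolutional layers and 3 fully connected layers, which matches VGG16); using torchvision's pretrained VGG16 and taking its 4096-dimensional penultimate output as the feature vector are assumptions made for the sketch, not details stated in the patent.

    # Sketch of step S01 (VGG16 backbone is an assumption; the patent names no specific network).
    import torch
    from PIL import Image
    from torchvision import models, transforms

    preprocess = transforms.Compose([
        transforms.Resize((224, 224)),              # unify poster size to 224 x 224
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
    cnn.classifier = cnn.classifier[:-1]            # drop the final classification layer, keep the 4096-d output
    cnn.eval()

    def poster_feature(path: str) -> torch.Tensor:
        """Return the graphic feature vector of one standard poster image."""
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            return cnn(img).squeeze(0)              # shape: (4096,)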
Movie text feature vector: the text information of a movie includes its name, synopsis, subject, comments, year, region, director, actors, and so on. The name, synopsis and comment texts are segmented to extract keywords, which are combined with the subject, region, director and actor fields to build a complete movie text keyword dictionary. The inverse document frequency (idf) of every word in the dictionary is computed; the keywords appearing in a movie's text information are then one-hot encoded into a 0/1 vector, which is weighted by the tf-idf values of those keywords to produce the weighted text feature vector of the movie.
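A minimal sketch of this step is given below, assuming jieba as the Chinese tokenizer and scikit-learn's TfidfVectorizer for the tf-idf weighting; neither library is named in the patent, and the field names of the movie record are hypothetical.

    # Sketch of step S02 (jieba and scikit-learn are assumptions; field names are illustrative).
    import jieba
    from sklearn.feature_extraction.text import TfidfVectorizer

    def movie_document(info: dict) -> str:
        """Segment the free text of one movie and append its structured keyword fields."""
        free_text = " ".join(jieba.cut(info["name"] + info["synopsis"] + info["comments"]))
        structured = " ".join(info["subjects"] + info["regions"] + info["directors"] + info["actors"])
        return free_text + " " + structured

    def text_feature_matrix(movies):
        """Return the (n_movies, lexicon_size) weighted text feature matrix and the fitted vectorizer."""
        docs = [movie_document(m) for m in movies]
        # binary=True mirrors the 0/1 one-hot vector of the description,
        # which the vectorizer then weights by tf-idf.
        vectorizer = TfidfVectorizer(binary=True, token_pattern=r"(?u)\S+")
        return vectorizer.fit_transform(docs).toarray(), vectorizer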
Movie word feature vector: user behaviors include browsing, searching, watching and so on. The behaviors of a user within one continuous session are first arranged into an ordered sequence in which the movie object of each action is represented by its unique ID, producing a space-separated text corpus. Users whose corpus contains fewer than 10 IDs are filtered out. The qualifying per-user corpora are merged into word2vec word vector model training samples, a word vector model is built and trained with skip-gram, and the resulting embedding of each movie ID is taken as the word feature vector of that movie.
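The sketch below trains such word vectors with gensim's Word2Vec in skip-gram mode over per-user sequences of movie IDs; gensim, the vector size and the window width are assumptions, and the ID format in the usage note is only illustrative.

    # Sketch of step S03 (gensim is an assumption; the patent only specifies skip-gram training).
    from gensim.models import Word2Vec

    def movie_word_vectors(sessions, dim: int = 128) -> Word2Vec:
        """sessions: time-ordered lists of movie IDs, one list per user,
        built from continuous browsing, searching and watching behaviors."""
        corpus = [s for s in sessions if len(s) >= 10]   # filter users with fewer than 10 IDs
        return Word2Vec(sentences=corpus, vector_size=dim, window=5,
                        min_count=1, sg=1, workers=4)    # sg=1 selects the skip-gram model

    # Usage: model = movie_word_vectors(sessions); model.wv["movie_123"] is that movie's word feature vector.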
Finally, the graphic feature vector, the text feature vector and the word feature vector of each movie are concatenated into a multi-modal movie vector, and cluster analysis with the K-Means algorithm yields the movie and television topics with multi-modal features.
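A sketch of steps S04 and S05 follows, assuming NumPy and scikit-learn; the per-modality L2 normalization before concatenation and the number of clusters are choices made for the sketch, while the patent itself only requires concatenation and K-Means clustering.

    # Sketch of steps S04-S05 (normalization and n_topics are assumptions; K-Means is as specified).
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import normalize

    def build_topics(graphic, text, word, n_topics: int = 20):
        """graphic, text, word: (n_movies, d_i) feature matrices in the same movie order.
        Returns one cluster label per movie; each cluster is a candidate multi-modal topic."""
        multimodal = np.hstack([normalize(graphic), normalize(text), normalize(word)])
        return KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(multimodal)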

Claims (4)

1. A method for automatically generating movie and television topics based on multi-modal features, characterized by comprising the following steps:
S01, preprocessing the poster pictures of the movies to a uniform size, and extracting a graphic feature vector for each poster by using the representation learning capability of a convolutional neural network;
S02, segmenting the text information of the movies to build a keyword lexicon, computing the inverse document frequency (IDF) of each keyword, one-hot encoding the keywords appearing in each movie's information to obtain its text vector, and weighting that vector by tf-idf to obtain a weighted text feature vector;
S03, organizing user behaviors into segmented corpora, merging the corpora of all users into word2vec word vector model training samples, building a word vector model, training it with the skip-gram method, and generating a word feature vector for each movie;
S04, concatenating the graphic feature vector, the text feature vector and the word feature vector into a multi-modal movie vector, and performing cluster analysis with an unsupervised clustering algorithm;
and S05, obtaining the movie and television topics with multi-modal features.
2. The method for automatically generating movie and television topics based on multi-modal features according to claim 1, wherein in step S02 the text information of a movie comprises its name, synopsis, subject, comments, year, region, director and actors.
3. The method for automatically generating movie and television topics based on multi-modal features according to claim 1, wherein in step S03 the user behaviors comprise continuous movie browsing, searching and watching behaviors.
4. The method for automatically generating movie and television topics based on multi-modal features according to claim 1, wherein in step S04 the unsupervised clustering algorithm is the K-Means algorithm.
CN202011565739.9A 2020-12-25 2020-12-25 Automatic movie and television topic generation method based on multi-mode features Pending CN112597765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011565739.9A CN112597765A (en) 2020-12-25 2020-12-25 Automatic movie and television topic generation method based on multi-mode features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011565739.9A CN112597765A (en) 2020-12-25 2020-12-25 Automatic movie and television topic generation method based on multi-mode features

Publications (1)

Publication Number Publication Date
CN112597765A true CN112597765A (en) 2021-04-02

Family

ID=75202234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011565739.9A Pending CN112597765A (en) 2020-12-25 2020-12-25 Automatic movie and television topic generation method based on multi-mode features

Country Status (1)

Country Link
CN (1) CN112597765A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104123363A (en) * 2014-07-21 2014-10-29 北京奇虎科技有限公司 Method and device for extracting main image of webpage
US20170300782A1 (en) * 2016-04-18 2017-10-19 International Business Machines Corporation Methods and systems of personalized photo albums based on social media data
US10803318B1 (en) * 2016-05-18 2020-10-13 Educational Testing Service Automated scoring of video clips using extracted physiological features
CN109062995A (en) * 2018-07-05 2018-12-21 北京工业大学 A kind of social activity plan opens up the personalized recommendation algorithm of drawing board (Board) cover on network
US20200089802A1 (en) * 2018-09-13 2020-03-19 Microsoft Technology Licensing, Llc Inferring topics with entity linking and ontological data
CN110321473A (en) * 2019-05-21 2019-10-11 山东省计算中心(国家超级计算济南中心) Diversity preference information method for pushing, system, medium and equipment based on multi-modal attention
CN110401873A (en) * 2019-06-17 2019-11-01 北京奇艺世纪科技有限公司 Video clipping method, device, electronic equipment and computer-readable medium
CN111698573A (en) * 2020-06-24 2020-09-22 四川长虹电器股份有限公司 Movie and television special topic creating method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
吴兴宇 (Wu Xingyu): "Research on Personalized Movie Recommendation Methods Based on Deep Learning", China Excellent Master's and Doctoral Dissertations Full-text Database (Master's), Information Science and Technology Series *

Similar Documents

Publication Publication Date Title
Li et al. Visual to text: Survey of image and video captioning
Torabi et al. Learning language-visual embedding for movie understanding with natural-language
CN109508400B (en) Method for generating image-text abstract
US12001474B2 (en) Information determining method and apparatus, computer device, and storage medium
CN112104919B (en) Content title generation method, device, equipment and computer readable storage medium based on neural network
CN109697239B (en) Method for generating teletext information
Li et al. Residual attention-based LSTM for video captioning
CN112163122A (en) Method and device for determining label of target video, computing equipment and storage medium
CN111258995B (en) Data processing method, device, storage medium and equipment
CN111488931A (en) Article quality evaluation method, article recommendation method and corresponding devices
EP4310695A1 (en) Data processing method and apparatus, computer device, and storage medium
CN111985243B (en) Emotion model training method, emotion analysis device and storage medium
CN114461804B (en) Text classification method, classifier and system based on key information and dynamic routing
CN112836702B (en) Text recognition method based on multi-scale feature extraction
CN115964560A (en) Information recommendation method and equipment based on multi-mode pre-training model
CN111680190A (en) Video thumbnail recommendation method fusing visual semantic information
CN112528989B (en) Description generation method for semantic fine granularity of image
CN117746143A (en) AIGC-based image description text generation method and device and storage medium
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN113407766A (en) Visual animation display method and related equipment
CN115984842A (en) Multi-mode-based video open tag extraction method
CN113297485B (en) Method for generating cross-modal representation vector and cross-modal recommendation method
CN112597765A (en) Automatic movie and television topic generation method based on multi-mode features
CN112749553B (en) Text information processing method and device for video file and server
CN114547435A (en) Content quality identification method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210402