CN108985244B - Television program type identification method and device - Google Patents


Info

Publication number: CN108985244B
Authority: CN (China)
Legal status: Active
Application number: CN201810821306.1A
Other languages: Chinese (zh)
Other versions: CN108985244A
Inventors: 王月岭, 黄利
Current Assignee: Hisense Co Ltd
Original Assignee: Hisense Co Ltd
Events:
  • Application filed by Hisense Co Ltd
  • Priority to CN201810821306.1A
  • Publication of CN108985244A
  • Application granted
  • Publication of CN108985244B
  • Status: Active

Classifications

    • G06V20/40: Scenes; scene-specific elements in video content
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/635: Overlay text, e.g. embedded captions in a TV program


Abstract

The application provides a television program type identification method and device. N consecutive frames of video images are acquired from the current television program and input into a pre-trained convolutional neural network, which outputs a program type for each of the N frames. The program types corresponding to the N frames are then tallied according to a preset strategy to obtain the program type of the current television program. The invention avoids the misjudgments that recognition errors on a small number of frames would otherwise cause, thereby improving the identification accuracy of the television program type.

Description

Television program type identification method and device
Technical Field
The present application relates to the field of television technologies, and in particular, to a method and an apparatus for identifying a television program type.
Background
Most existing television program type identification algorithms identify programs according to broadcast time slots and on-screen identification information. This approach has clear limitations: if the picture carries no time-slot or identification information, recognition quality drops sharply.
Another common approach is deep learning, most notably the convolutional neural network algorithm. For television program type identification, many individual pictures in a broadcast are ambiguous, resembling one program type while also resembling another, so per-frame recognition rates are often low. When the available sample is not large enough, the overall recognition accuracy is therefore poor.
Disclosure of Invention
In view of this, to solve the low accuracy of existing program type identification, the present invention provides a method and an apparatus for identifying a television program type. A convolutional neural network performs program type identification on multiple input frames of video images, and the per-frame program types are then judged in combination to finally determine the program type of the television program, thereby improving identification accuracy.
Specifically, the method is realized through the following technical scheme:
according to a first aspect of embodiments of the present application, there is provided a method for identifying a television program type, the method including:
acquiring continuous N frames of video images in a current television program;
inputting the continuous N frames of video images into a pre-trained convolutional neural network, and acquiring a program type corresponding to each frame of video image in the N frames of video images output by the convolutional neural network;
and counting the program type corresponding to each frame of video image in the continuous N frames of video images according to a preset strategy to obtain the program type of the current television program.
As an embodiment, the training method of the convolutional neural network includes:
dividing a television program into a plurality of program types;
acquiring a video sample corresponding to each program type;
extracting image characteristic data of each frame of image in the video sample as training data;
and inputting the training data into a convolutional neural network for training to obtain a convolutional neural network model.
As an embodiment, the step of counting the program types corresponding to each frame of video image in the consecutive N frames according to a preset policy to obtain the program type of the current television program includes:
counting, among the program types output by the convolutional neural network for the consecutive N frames, how many frames are assigned to each program type; if one program type has the largest count and that count is greater than or equal to a first preset number, outputting that program type as a prediction type;
counting, among a plurality of consecutive prediction types, how many times each prediction type occurs; and if one prediction type has the largest count and that count is greater than or equal to a second preset number, taking that prediction type as the program type of the current television program.
As an embodiment, the method further comprises:
if N in the continuous N frames of video images is larger than or equal to a first threshold value, and the program type corresponding to the current television program is not determined, acquiring the next frame of video image;
and if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a second threshold value, stopping acquiring the video images, and after a first time interval, starting to acquire the video images.
As an embodiment, the method further comprises:
and if the program type of the current television program is the same as the program type obtained the last time, stopping acquiring video images, and after a second time interval, starting to acquire video images again.
According to a second aspect of the embodiments of the present application, there is provided a television program type identification apparatus, the apparatus including:
the acquisition unit is used for acquiring continuous N frames of video images in the current television program;
the input unit is used for inputting the continuous N frames of video images into a pre-trained convolutional neural network and acquiring the program type corresponding to each frame of video image in the continuous N frames of video images output by the convolutional neural network;
and the determining unit is used for determining the program type of the current television program according to the program type corresponding to each frame of video image in the continuous N frames of video images.
As an embodiment, the apparatus further comprises:
the training unit is used for dividing the television program into a plurality of program types; acquiring a video sample corresponding to each program type; extracting image characteristic data of each frame of image in the video sample as training data; and inputting the training data into a convolutional neural network for training to obtain a convolutional neural network model.
As an embodiment, the determining unit is specifically configured to count, among the program types output by the convolutional neural network for the consecutive N frames, how many frames are assigned to each program type; if one program type has the largest count and that count is greater than or equal to a first preset number, to output that program type as a prediction type; to count, among a plurality of consecutive prediction types, how many times each prediction type occurs; and if one prediction type has the largest count and that count is greater than or equal to a second preset number, to take that prediction type as the program type of the current television program.
As an embodiment, the apparatus further comprises:
the first stopping unit is used for acquiring a next frame of video image if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a first threshold value; and if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a second threshold value, stopping acquiring the video images, and after a first time interval, starting to acquire the video images.
As an embodiment, the apparatus further comprises:
and the second stopping unit is used for stopping acquiring the video image if the program type of the current television program is the same as the program type of the television program acquired next time, and then starting acquiring the video image after waiting for a second time interval.
As can be seen from the above embodiments, the present application can obtain consecutive N frames of video images in the current television program; inputting the continuous N frames of video images into a pre-trained convolutional neural network to obtain a program type corresponding to each frame of video image in the output continuous N frames of video images; and then, counting the program type corresponding to each frame of video image in the continuous N frames of video images according to a preset strategy to obtain the program type of the current television program. Compared with the prior art, the method can utilize the pre-trained convolutional neural network to identify the program types of the input multi-frame video images, preliminarily predict the identification results of the program types, perform statistical judgment on the program types corresponding to the identified multi-frame video images according to a multi-frame strategy, and finally determine the program types corresponding to the television programs.
Drawings
Fig. 1 is a flowchart illustrating an exemplary method for identifying tv program types according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an exemplary training scenario of the present application;
FIG. 3-1 is a schematic diagram of an exemplary first image feature extraction of the present application;
FIG. 3-2 is a schematic diagram of an exemplary second image feature extraction of the present application;
FIG. 3-3 is a schematic diagram of an exemplary third image feature extraction of the present application;
FIG. 4 is a diagram illustrating an exemplary multi-frame strategy of the present application;
FIG. 5 is a block diagram of an embodiment of a television program type identification apparatus of the present application;
FIG. 6 is a block diagram of one embodiment of a computer device of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
For television program type identification, many individual pictures in a broadcast are ambiguous: a frame may resemble one program type while also resembling another, so per-frame recognition rates are often low. For example, sports news belongs to news, but so many sports pictures are interspersed in the program that a classifier may mistake it for sports; in a story-driven game program, scenes in which the plot unfolds look very similar to a cartoon, so a classifier has difficulty deciding between game and cartoon. At present, television programs are usually identified by matching against program samples, but television pictures are numerous, structurally complex, and updated very quickly. Since it is impossible to enumerate every possible picture for each program type, the program samples can never be exhaustive; program types are therefore often misjudged, identification efficiency is low, and the downstream adjustment of picture quality according to program type suffers.
In order to solve the above problems, the present application provides a method for identifying a television program type, which can identify a television program type by acquiring consecutive N frames of video images in a current television program; inputting the continuous N frames of video images into a pre-trained convolutional neural network to obtain a program type corresponding to each frame of video image in the output N frames of video images; and then, counting the program type corresponding to each frame of video image in the N frames of video images according to a preset strategy to obtain the program type of the current television program. Compared with the prior art, the method can utilize the pre-trained convolutional neural network to identify the program types of the input multi-frame video images, preliminarily predict the identification results of the program types, perform statistical judgment on the program types corresponding to the identified multi-frame video images according to a multi-frame strategy, and finally determine the program types corresponding to the television programs.
As follows, the following embodiments are shown to explain the television program type identification method provided by the present application.
The first embodiment,
Referring to fig. 1, a flowchart of an exemplary embodiment of a television program type identification method according to the present application is shown, where the method includes the following steps:
step 101, acquiring continuous N frames of video images in a current television program;
in this embodiment, the television program type identification method may be used for a television set or a set-top box or a computer device. When a television program is played, N consecutive frames of video images can be obtained from the current television program, where N is a positive integer greater than or equal to 2.
102, inputting the continuous N frames of video images into a pre-trained convolutional neural network, and acquiring a program type corresponding to each frame of video image in the continuous N frames of video images output by the convolutional neural network;
in this embodiment, the acquired continuous N frames of video images may be input to a pre-trained convolutional neural network, and according to a training result of the convolutional neural network, a program type may be determined for each frame of video image, and then a program type corresponding to each frame of video image in the continuous N frames of video images output by the convolutional neural network is acquired.
The training process for the convolutional neural network is specifically illustrated by the following examples.
Example II,
Please refer to fig. 2, which is a schematic diagram of an exemplary training scheme of the present application, wherein the training process specifically includes:
step 201, dividing a plurality of program types for a television program;
in this embodiment, the present application mainly divides the tv program types into the following 7 types, such as sports, news, games, animations, fantasy, movies, and other program types, and the present application can perform targeted identification for several main program categories in the tv programs.
Step 202, obtaining a video sample corresponding to each program type;
in this embodiment, by learning the salient features of the video images in each program in a targeted way, typical video images of each program type can be collected, together with video images that experience shows to be disputable yet consistently labeled. For example, a news picture may contain image content from sports, games, entertainment, movies and other programs; if it clearly carries news features, the program can still be considered news. Such images are highly disputable between types, yet in practice belong to one definite program type. Obtaining a video sample for each program type in this way provides more comprehensive training samples for the convolutional neural network and thus improves its identification accuracy.
Step 203, extracting image characteristic data of each frame of image in the video sample as training data;
in this embodiment, the image feature data of each frame of image in the video sample is extracted as training data. Image feature data is typically image content that is strongly indicative of a particular program type.
For example, see fig. 3-1, 3-2, and 3-3. The program type of fig. 3-1 is movie: because movies characteristically show black letterbox bars above and below the picture, those bars (the areas in the black boxes in fig. 3-1) can be extracted as image feature data. The program type of fig. 3-2 is news: because news scenes characteristically show a station logo in the upper left corner and a caption strip at the bottom, the logo and caption (the areas in the black boxes in fig. 3-2) can be extracted as image feature data. The program type of fig. 3-3 is sports: because sports scenes characteristically show green grass, the grass (the area in the black box in fig. 3-3) can be extracted as image feature data. By extracting image features from such typical scenes as training data, the convolutional neural network can recognize these scenes when judging the program type, achieving the purpose of type identification.
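The region-of-interest extraction described above can be sketched in Python as follows. This is an illustrative reconstruction, not the patent's implementation: the function name, the fractional region coordinates, and the 1080p frame size are all assumptions.

```python
import numpy as np

# Fractional crop regions (assumed values for illustration; the patent
# does not give exact coordinates). Each entry is (top, bottom, left,
# right) as fractions of the frame height/width.
FEATURE_REGIONS = {
    "movie_letterbox_top":    (0.00, 0.12, 0.00, 1.00),  # black bar above picture
    "movie_letterbox_bottom": (0.88, 1.00, 0.00, 1.00),  # black bar below picture
    "news_logo":              (0.00, 0.15, 0.00, 0.20),  # station logo, top-left
    "news_caption":           (0.85, 1.00, 0.00, 1.00),  # caption strip, bottom
}

def extract_region(frame: np.ndarray, region: str) -> np.ndarray:
    """Crop one candidate feature region out of an H x W x 3 frame."""
    h, w = frame.shape[:2]
    t, b, l, r = FEATURE_REGIONS[region]
    return frame[round(t * h):round(b * h), round(l * w):round(r * w)]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # dummy black frame
bar = extract_region(frame, "movie_letterbox_top")
```

Crops like `bar` would then be fed to the network as the image feature data for their program type.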
And step 204, inputting the training data into a convolutional neural network for training to obtain a convolutional neural network model.
The training data are input into the convolutional neural network, where targeted processing and parameter training of the relevant learning algorithm yield a trained convolutional neural network model. The trained model can then detect whether the scenes played on a television carry the characteristic attributes of each scene type and, backed by high-confidence prediction results, pre-judge the types of scene programs that have the preset typical characteristics. In this embodiment, the structure of the convolutional neural network may include 5 convolutional layers and 3 fully connected layers, or may be another combined structure; this application does not limit the structure.
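The 5-convolution, 3-fully-connected layout mentioned above matches an AlexNet-style network. The sketch below only traces the spatial size of the feature map through such a stack; the kernel sizes, strides, pooling steps, channel counts and the 224x224 input are assumptions for illustration (the patent fixes only the 5 conv + 3 FC layout).

```python
def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Output spatial size of one convolution or pooling step."""
    return (size + 2 * pad - kernel) // stride + 1

def trace_feature_map(size: int = 224) -> int:
    """Trace one spatial dimension through 5 conv layers (AlexNet-style,
    assumed hyperparameters) before the 3 fully connected layers."""
    size = conv_out(size, kernel=11, stride=4, pad=2)  # conv1
    size = conv_out(size, kernel=3, stride=2)          # max-pool after conv1
    size = conv_out(size, kernel=5, pad=2)             # conv2
    size = conv_out(size, kernel=3, stride=2)          # max-pool after conv2
    size = conv_out(size, kernel=3, pad=1)             # conv3
    size = conv_out(size, kernel=3, pad=1)             # conv4
    size = conv_out(size, kernel=3, pad=1)             # conv5
    size = conv_out(size, kernel=3, stride=2)          # max-pool after conv5
    return size

# The resulting map (6 x 6 here, with an assumed channel count) would be
# flattened into the first of the 3 fully connected layers; the last FC
# layer would output scores for the 7 program types.
```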
The description of the second embodiment is completed so far.
And 103, counting the program types corresponding to each frame of video image in the continuous N frames of video images according to a preset strategy to obtain the program types of the current television programs.
In this embodiment, the program type corresponding to each frame of video image in consecutive N frames of video images output by the convolutional neural network is obtained, and then the program type corresponding to each frame of video image in the consecutive N frames of video images is counted according to a preset policy to obtain the final program type of the current television program.
The specific type determination method is specifically described by the following examples.
Example III,
In this embodiment, the prediction policy used is a multi-frame policy with two layers. The first-layer multi-frame policy counts, after the convolutional neural network outputs a program type for each of the consecutive N frames, how many frames are assigned to each program type; if one program type has the largest count and that count is greater than or equal to a first preset number, that program type is output as a prediction type. The second-layer multi-frame policy counts, among several consecutive prediction types, how many times each prediction type occurs; if one prediction type has the largest count and that count is greater than or equal to a second preset number, that prediction type is taken as the program type of the current television program.
For example, please refer to fig. 4, an exemplary multi-frame strategy diagram of the present application. Assume the trained convolutional neural network model outputs a program type for each of N consecutive frames; in this embodiment N may be 10, with an interval of 1 second between frames. First, the program types of 10 consecutive frames (frames numbered 1-10 in fig. 4) are obtained, and the number of frames assigned to each program type is counted. If one program type has the largest count and that count is greater than or equal to 9, it is output as the first prediction type (prediction type I); otherwise nothing is output. The program type of the next frame (frame 11) is then obtained, and frame 11 together with the preceding 9 frames forms a new prediction group (frames 2-11), which is evaluated in the same way to yield the second prediction type (prediction type II). Repeating this through frame 14 yields 5 consecutive prediction types, whose occurrences are counted in turn: if one prediction type has the largest count and that count is greater than or equal to 4, it is output as the program type of the television program; otherwise nothing is output.
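The two-layer vote just described can be sketched in Python as follows. The function and variable names are illustrative; the defaults use the values from the example (window of 10 frames, first threshold 9, 5 prediction types, second threshold 4).

```python
from collections import Counter, deque

def majority(types, threshold):
    """Return the most common type if its count reaches threshold, else None."""
    winner, count = Counter(types).most_common(1)[0]
    return winner if count >= threshold else None

def classify_program(frame_types, n=10, first_thresh=9,
                     groups=5, second_thresh=4):
    """Two-layer multi-frame vote over per-frame CNN outputs.

    Layer 1: slide a window of n frames; emit a prediction type when at
    least first_thresh frames in the window agree.
    Layer 2: over groups consecutive prediction types, emit the final
    program type when at least second_thresh of them agree.
    """
    window = deque(maxlen=n)
    predictions = deque(maxlen=groups)
    for frame_type in frame_types:
        window.append(frame_type)
        if len(window) < n:
            continue                      # not enough frames yet
        pred = majority(window, first_thresh)
        if pred is None:
            continue                      # layer 1 produced no output
        predictions.append(pred)
        if len(predictions) == groups:
            final = majority(predictions, second_thresh)
            if final is not None:
                return final              # stable program type found
    return None  # no stable decision from this frame sequence
```

With all 14 frames classified as the same type, the vote resolves at frame 14; a single outlier frame (as in the sports-news example) is absorbed by the 9-of-10 threshold.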
Without the multi-frame strategy, keeping the image quality display stable requires the image quality parameter not to change over 10 consecutive frames; the probability of this is p^10, where p < 1 is the probability of correct single-frame identification.
With the first-layer multi-frame strategy, the probability that the image quality parameter stays stable is p^10 + 10*p^9*(1-p) = p^9*(10-9p). Since p^9*(10-9p) > p^10, the first-layer multi-frame strategy improves stability.
Similarly, writing p1 = p^9*(10-9p), the probability with the second-layer multi-frame strategy is p1^5 + 5*p1^4*(1-p1) = p1^4*(5-4p1) > p1^5, so the image quality parameter is even more likely to stay stable.
As for accuracy, with a single-frame identification accuracy of p = 93%, the multi-frame strategy raises the accuracy to 99%.
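The stability expressions above can be checked numerically with a small sketch (function names are illustrative):

```python
def layer1_stable(p: float) -> float:
    """P(at least 9 of 10 single-frame results agree):
    p^10 + 10*p^9*(1-p), which simplifies to p^9*(10 - 9p)."""
    return p ** 9 * (10 - 9 * p)

def layer2_stable(p: float) -> float:
    """Apply the >= 4-of-5 vote to the layer-1 outputs:
    p1^5 + 5*p1^4*(1-p1) = p1^4*(5 - 4*p1), with p1 = layer1_stable(p)."""
    p1 = layer1_stable(p)
    return p1 ** 4 * (5 - 4 * p1)

p = 0.93  # single-frame identification accuracy from the description
```

Both inequalities from the description (p^9*(10-9p) > p^10 and p1^4*(5-4p1) > p1^5) hold for any 0 < p < 1.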
The first preset number of frames can be selected to be 9 frames in the first layer, and the second preset number of frames can be selected to be 4 frames in the second layer, so that the accuracy and the stability can be better.
Because television program scenes are highly complex yet continuous over long stretches of time, the multi-frame strategy combines individual recognition results, improving identification accuracy while avoiding frequent switching.
The description of the third embodiment is completed so far.
As an embodiment, if N in the consecutive N frames of video images is greater than or equal to a first threshold value, and the program type corresponding to the current television program is not determined, obtaining a next frame of video image; and if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a second threshold value, stopping acquiring the video images, and after a first time interval, starting to acquire the video images.
For example, if the first threshold is 15 frames and the second threshold is 30 frames, the above scheme is:
when 15 frames of video images have been identified without an output result, the next frame (frame 16) is acquired and its program type obtained; frame 16 together with the preceding 9 frames (frames 7-15) forms a new prediction group of 10 frames (frames 7-16), which is evaluated by the method above. This continues frame by frame up to frame 30. If frame 30 still yields no recognition result, the recognition is considered abnormal; since the television program type is now unclear, acquisition pauses and resumes after the first time interval (for example, 30 s), repeating until a result can be recognized.
As an embodiment, if the program type of the current television program is the same as the program type obtained the last time, acquisition of video images stops and resumes after a second time interval; if the currently obtained program type matches the previous two results, acquisition stops and resumes after a third, longer time interval. For example, once the multi-frame strategy identifies the program type, recognition stops and video images are acquired again after the second time interval (for example, 30 s); if the second result matches the previous one, the interval is extended to 1 min, and so on, gradually lengthening the interval. Whenever two consecutive results disagree, the interval starts growing again from 30 s.
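The lengthening-interval mechanism can be sketched as follows. The description fixes only the first steps (30 s, then 1 min when two consecutive results agree, and a reset to 30 s on disagreement); the doubling beyond 1 min and the function name are assumptions for illustration.

```python
def next_check_interval(prev_interval: int, same_as_last: bool) -> int:
    """Seconds to wait before the next recognition pass (assumed
    doubling schedule)."""
    if not same_as_last:
        return 30                # results disagree: restart from 30 s
    return prev_interval * 2     # results agree: 30 s -> 1 min -> ...
```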
Because the convolutional neural network's computation load in identifying the television program type is large, running identification continuously would occupy a large amount of television memory; the interval mechanism above therefore reduces this continuous memory occupation.
At present, the image quality parameters of television programs are adjusted mainly according to the station logo and EPG (electronic program guide) information, which cannot guarantee high accuracy; for some live programs, either a standard image quality setting is used throughout or the user adjusts the parameters manually, so the error rate is high or manual operation is required.
According to the method and device of the present application, once the television program type is determined, the image quality can be adjusted automatically according to that type. This improves the stability of image display, prevents the image quality parameters from being changed frequently due to occasional recognition errors, and improves the viewing experience of the audience.
Therefore, the method and device of the present application can acquire N consecutive frames of video images from the current television program; input the N consecutive frames into a pre-trained convolutional neural network to obtain the program type corresponding to each of the N frames; and then apply a preset strategy to the per-frame program types to obtain the program type of the current television program. Compared with the prior art, the method uses a pre-trained convolutional neural network to identify the program type of each input frame as a preliminary prediction, performs a statistical judgment on the per-frame results according to a multi-frame strategy, and finally determines the program type of the television program.
Corresponding to the embodiment of the television program type identification method described above, the present application also provides an embodiment of a television program type identification device.
Referring to fig. 5, which is a block diagram of an embodiment of a television program type identification apparatus of the present application, the apparatus 50 may include:
an obtaining unit 51, configured to obtain consecutive N frames of video images in a current television program;
an input unit 52, configured to input the consecutive N frames of video images into a pre-trained convolutional neural network, and obtain a program type corresponding to each frame of video image in the consecutive N frames of video images output by the convolutional neural network;
and the determining unit 53 is configured to count the program type corresponding to each frame of video image in the consecutive N frames of video images according to a preset policy to obtain the program type of the current television program.
As an embodiment, the apparatus further comprises:
a training unit 54 for classifying a plurality of program types for a television program; acquiring a video sample corresponding to each program type; extracting image characteristic data of each frame of image in the video sample as training data; and inputting the training data into a convolutional neural network for training to obtain a convolutional neural network model.
As an embodiment, the determining unit 53 is specifically configured to count, among the program types output by the convolutional neural network for the N consecutive frames, the number of frames having each program type; if one program type has the largest count and that count is greater than or equal to a first preset number, output that program type as a prediction type; then count, among a plurality of consecutive prediction types, the number of occurrences of each prediction type; and if one prediction type has the largest count and that count is greater than or equal to a second preset number, take that prediction type as the program type of the current television program.
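The two-stage statistical judgment performed by the determining unit can be sketched as two nested majority votes. This is an illustrative reconstruction: the threshold values (`first_preset`, `second_preset`) and the function names are assumptions, since the patent only requires that the winning count be the maximum and reach a preset number.

```python
from collections import Counter

def majority(labels, min_count):
    """Return the most frequent label if it occurs at least min_count
    times, otherwise None."""
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_count else None

def program_type(groups, first_preset=6, second_preset=3):
    """groups: list of prediction groups, each a list of per-frame CNN
    outputs for N consecutive frames.

    Stage 1: each group votes; the winner becomes that group's
    prediction type only if it reaches first_preset.
    Stage 2: the consecutive prediction types vote again; the winner
    becomes the final program type only if it reaches second_preset.
    Returns None if neither vote produces a result.
    """
    predictions = [majority(g, first_preset) for g in groups]
    predictions = [p for p in predictions if p is not None]
    return majority(predictions, second_preset)
```

A split 5-vs-5 group fails the first vote, so no prediction type is emitted for it; this is what triggers the sliding-window fallback described in the method embodiment.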
As an embodiment, the apparatus further comprises:
a first stopping unit 55, configured to acquire the next frame of video image if N in the N consecutive frames of video images reaches a first threshold without the program type of the current television program having been determined; and, if N reaches a second threshold without the program type having been determined, to stop acquiring video images and resume acquisition after a first time interval.
As an embodiment, the apparatus further comprises:
a second stopping unit 56, configured to stop acquiring video images if the program type of the current television program obtained by the obtaining unit is the same as the program type obtained last time, and to resume acquisition after a second time interval; and, if the currently obtained program type is the same as the program types of the last two acquisitions, to stop acquiring video images and resume acquisition after waiting a third time interval.
In summary, the present application can acquire N consecutive frames of video images from the current television program; input the N consecutive frames into a pre-trained convolutional neural network to obtain the program type corresponding to each of the N frames; and then apply a preset strategy to the per-frame program types to obtain the program type of the current television program. Compared with the prior art, the method uses a pre-trained convolutional neural network to identify the program type of each input frame as a preliminary prediction, performs a statistical judgment on the per-frame results according to a multi-frame strategy, and finally determines the program type of the television program.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Corresponding to the embodiments of the television program type identification method, the present application also provides embodiments of a computer device for executing the method.
Referring to fig. 6, as one embodiment, a computer device includes a processor 61, a communication interface 62, a memory 63, and a communication bus 64;
the processor 61, the communication interface 62 and the memory 63 are in communication with each other through the communication bus 64;
the memory 63 is used for storing computer programs;
the processor 61 is configured to execute the computer program stored in the memory 63, and when the processor 61 executes the computer program, any step of the above television program type identification method is implemented.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the embodiment of the computer device, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the scope of protection of the present application.

Claims (8)

1. A method for identifying a type of a television program, the method comprising:
acquiring continuous N frames of video images in a current television program;
inputting the continuous N frames of video images into a pre-trained convolutional neural network, and acquiring a program type corresponding to each frame of video image in the continuous N frames of video images output by the convolutional neural network;
counting, for each program type among the program types corresponding to the N consecutive frames of video images output from the convolutional neural network, the number of frames having that program type; if the number of frames having one program type is the largest and is greater than or equal to a first preset number, outputting that program type as a prediction type;
counting, for each prediction type among a plurality of consecutive output prediction types, the number of occurrences of that prediction type; and if the number of occurrences of one prediction type is the largest and is greater than or equal to a second preset number, taking that prediction type as the program type of the current television program.
2. The method of claim 1, wherein the method of training the convolutional neural network comprises:
dividing a television program into a plurality of program types;
acquiring a video sample corresponding to each program type;
extracting image characteristic data of each frame of image in the video sample as training data;
and inputting the training data into a convolutional neural network for training to obtain a convolutional neural network model.
3. The method of claim 1, further comprising:
if N in the continuous N frames of video images is larger than or equal to a first threshold value, and the program type corresponding to the current television program is not determined, acquiring the next frame of video image;
and if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a second threshold value, stopping acquiring the video images, and after a first time interval, starting to acquire the video images.
4. The method of claim 1, further comprising:
if the program type of the current television program is the same as the program type of the television program obtained last time, stopping obtaining the video image, and after a second time interval, starting obtaining the video image; and if the currently acquired program type is the same as the program types of the last two times, stopping acquiring the video image, and after waiting for a third time interval, starting to acquire the video image.
5. An apparatus for identifying a type of a television program, the apparatus comprising:
the acquisition unit is used for acquiring continuous N frames of video images in the current television program;
the input unit is used for inputting the continuous N frames of video images into a pre-trained convolutional neural network and acquiring the program type corresponding to each frame of video image in the continuous N frames of video images output by the convolutional neural network;
the determining unit is used for counting, for each program type among the program types corresponding to the N consecutive frames of video images output from the convolutional neural network, the number of frames having that program type; if the number of frames having one program type is the largest and is greater than or equal to a first preset number, outputting that program type as a prediction type; counting, for each prediction type among a plurality of consecutive output prediction types, the number of occurrences of that prediction type; and if the number of occurrences of one prediction type is the largest and is greater than or equal to a second preset number, taking that prediction type as the program type of the current television program.
6. The apparatus of claim 5, further comprising:
the training unit is used for dividing the television program into a plurality of program types; acquiring a video sample corresponding to each program type; extracting image characteristic data of each frame of image in the video sample as training data; and inputting the training data into a convolutional neural network for training to obtain a convolutional neural network model.
7. The apparatus of claim 5, further comprising:
the first stopping unit is used for acquiring a next frame of video image if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a first threshold value; and if the program type corresponding to the current television program is not determined when N in the continuous N frames of video images is larger than or equal to a second threshold value, stopping acquiring the video images, and after a first time interval, starting to acquire the video images.
8. The apparatus of claim 5, further comprising:
the second stopping unit is used for stopping acquiring the video image if the program type of the current television program is the same as the program type of the television program acquired last time, and then starting acquiring the video image after a second time interval; and if the currently acquired program type is the same as the program types of the last two times, stopping acquiring the video image, and after waiting for a third time interval, starting to acquire the video image.
CN201810821306.1A 2018-07-24 2018-07-24 Television program type identification method and device Active CN108985244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810821306.1A CN108985244B (en) 2018-07-24 2018-07-24 Television program type identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810821306.1A CN108985244B (en) 2018-07-24 2018-07-24 Television program type identification method and device

Publications (2)

Publication Number Publication Date
CN108985244A CN108985244A (en) 2018-12-11
CN108985244B true CN108985244B (en) 2021-10-15

Family

ID=64550039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810821306.1A Active CN108985244B (en) 2018-07-24 2018-07-24 Television program type identification method and device

Country Status (1)

Country Link
CN (1) CN108985244B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112800919A (en) * 2021-01-21 2021-05-14 百度在线网络技术(北京)有限公司 Method, device and equipment for detecting target type video and storage medium
CN115996300A (en) * 2021-10-19 2023-04-21 海信集团控股股份有限公司 Video playing method and electronic display device
CN115119013B (en) * 2022-03-26 2023-05-05 浙江九鑫智能科技有限公司 Multi-level data machine control application system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807284A (en) * 2010-03-16 2010-08-18 许祥鸿 Service data retrieval method of Internet television
CN104866843A (en) * 2015-06-05 2015-08-26 中国人民解放军国防科学技术大学 Monitoring-video-oriented masked face detection method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160094812A1 (en) * 2014-09-30 2016-03-31 Kai Chen Method And System For Mobile Surveillance And Mobile Infant Surveillance Platform
US10572735B2 (en) * 2015-03-31 2020-02-25 Beijing Shunyuan Kaihua Technology Limited Detect sports video highlights for mobile computing devices
CN106228580B (en) * 2016-07-29 2019-03-05 李铮 A kind of material detection, power-economizing method and system based on video analysis
CN106297331B (en) * 2016-08-29 2019-05-14 安徽科力信息产业有限责任公司 The method and system of crossing motor vehicles parking number is reduced using plane cognition technology
CN107194419A (en) * 2017-05-10 2017-09-22 百度在线网络技术(北京)有限公司 Video classification methods and device, computer equipment and computer-readable recording medium
CN107798313A (en) * 2017-11-22 2018-03-13 杨晓艳 A kind of human posture recognition method, device, terminal and storage medium
CN108280406A (en) * 2017-12-30 2018-07-13 广州海昇计算机科技有限公司 A kind of Activity recognition method, system and device based on segmentation double-stream digestion
CN108259990B (en) * 2018-01-26 2020-08-04 腾讯科技(深圳)有限公司 Video editing method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101807284A (en) * 2010-03-16 2010-08-18 许祥鸿 Service data retrieval method of Internet television
CN104866843A (en) * 2015-06-05 2015-08-26 中国人民解放军国防科学技术大学 Monitoring-video-oriented masked face detection method

Also Published As

Publication number Publication date
CN108985244A (en) 2018-12-11

Similar Documents

Publication Publication Date Title
US10643074B1 (en) Automated video ratings
US9137562B2 (en) Method of viewing audiovisual documents on a receiver, and receiver for viewing such documents
CN112312231B (en) Video image coding method and device, electronic equipment and medium
CN108985244B (en) Television program type identification method and device
CN110839129A (en) Image processing method and device and mobile terminal
EP2445205B1 (en) Detection of transitions between text and non-text frames in a videostream
US9224048B2 (en) Scene-based people metering for audience measurement
CN111861572B (en) Advertisement putting method and device, electronic equipment and computer readable storage medium
CN112445935B (en) Automatic generation method of video selection collection based on content analysis
CN109698957B (en) Image coding method and device, computing equipment and storage medium
CN109922334A (en) A kind of recognition methods and system of video quality
CN105704559A (en) Poster generation method and apparatus thereof
CN111372116B (en) Video playing prompt information processing method and device, electronic equipment and storage medium
WO2022087826A1 (en) Video processing method and apparatus, mobile device, and readable storage medium
CN112653918B (en) Preview video generation method and device, electronic equipment and storage medium
CN104320670A (en) Summary information extracting method and system for network video
CN111405339A (en) Split screen display method, electronic equipment and storage medium
CN115396705A (en) Screen projection operation verification method, platform and system
CN114302226B (en) Intelligent cutting method for video picture
CN110099298B (en) Multimedia content processing method and terminal equipment
CN112528748B (en) Method for identifying and intercepting static slide from video
CN109788311B (en) Character replacement method, electronic device, and storage medium
CN116682035A (en) Method, device, equipment and program product for detecting high-frame-rate video defects
CN113542909A (en) Video processing method and device, electronic equipment and computer storage medium
CN111444822A (en) Object recognition method and apparatus, storage medium, and electronic apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant