CN111797820A - Video data processing method and device, electronic device and storage medium


Info

Publication number: CN111797820A
Application number: CN202010938303.3A
Granted publication: CN111797820B
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 王志慧, 李晓宇, 李明, 张月鹏, 姜秋宇, 裴广超
Applicant and current assignee: Beijing Ultrapower Intelligent Data Technology Co., Ltd.
Legal status: Granted; active

Classifications

    • G06V 20/40 — Scenes; scene-specific elements in video content
    • G06F 40/279, 40/289 — Natural language analysis; recognition of textual entities; phrasal analysis, e.g. finite-state techniques or chunking
    • G06N 3/044 — Neural networks; recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Neural networks; combinations of networks
    • G06N 3/08 — Neural networks; learning methods
    • G06Q 30/0601 — Commerce; electronic shopping [e-shopping]
    • G10L 15/26 — Speech recognition; speech-to-text systems
    • G06V 30/10 — Character recognition


Abstract

The application provides a video data processing method, a video data processing apparatus, an electronic device and a storage medium. The method comprises: acquiring video data, where the video data is obtained by live-broadcasting a plurality of commodities; identifying target media data in the video data to obtain commodity information and comment information for each of the plurality of commodities, where the target media data comprises at least one of image data, voice data and text data; associating the commodity information with the comment information to obtain comment information associated with the commodity information; and analyzing the comment information associated with the commodity information to obtain an analysis result. In this process, the commodity information and the comment information are analyzed in association with each other, so that the comment information in the video data is effectively extracted, utilized and analyzed, and the utilization rate of the comment information in the video data is improved.

Description

Video data processing method and device, electronic device and storage medium
Technical Field
The present application relates to the technical fields of video processing, image recognition, speech recognition and text recognition, and in particular to a video data processing method, apparatus, electronic device and storage medium.
Background
In current video live broadcasts, viewing users can comment on a merchant's activities such as food tasting, product trials and product recommendations. After the merchant interacts with these users through the live video, the resulting video data contains a large amount of user comment information. Some of this comment information is embedded in the video file as hard subtitles, which makes the comment information in the video data difficult to analyze and use effectively.
Disclosure of Invention
An object of the embodiments of the present application is to provide a video data processing method, apparatus, electronic device and storage medium for improving the utilization rate of comment information in video data.
An embodiment of the application provides a video data processing method, comprising: acquiring video data, where the video data is obtained by live-broadcasting a plurality of commodities; identifying target media data in the video data to obtain commodity information and comment information for each of the plurality of commodities, where the target media data comprises at least one of image data, voice data and text data; associating the commodity information with the comment information to obtain comment information associated with the commodity information; and analyzing the comment information associated with the commodity information to obtain an analysis result. In this process, commodity information and comment information corresponding to a plurality of commodities are extracted from video data obtained by live broadcast and analyzed in association, so that the comment information in the video data is effectively extracted, utilized and analyzed, and its utilization rate is improved.
Optionally, in an embodiment of the present application, identifying target media data in the video data to obtain the commodity information and comment information of each of the plurality of commodities includes: extracting voice data from the video data and performing voiceprint recognition on the voice data to obtain a current voiceprint feature; determining whether the current voiceprint feature is the voiceprint feature of the target merchant; if so, performing speech recognition on the voice data to obtain a first text and extracting the commodity information from the first text; if not, performing speech recognition on the voice data to obtain a second text and extracting the comment information from the second text. In this process, whether a speech segment carries commodity information or comment information is decided by whether the current voiceprint feature is that of the target merchant; based on this voiceprint judgment, the commodity information introduced by the merchant and the spoken comments of the viewing users can be effectively separated, which effectively improves the accuracy of the obtained commodity information and comment information.
Optionally, in an embodiment of the present application, identifying target media data in the video data to obtain the commodity information and comment information of each of the plurality of commodities includes: extracting a video image from the video data, performing target recognition on the video image, and then cropping the subtitle-area image and the bullet-screen-area image from the video image; performing text recognition on the subtitle-area image to obtain a subtitle text and extracting the commodity information from the subtitle text; and performing text recognition on the bullet-screen-area image to obtain a bullet-screen text and extracting the comment information from the bullet-screen text. In this process, target recognition is performed on a video image extracted from the video data, the subtitle-area and bullet-screen-area images are cropped out, and the commodity information in the subtitle text and the comment information in the bullet-screen text are extracted respectively; this overcomes the difficulty of extracting commodity or comment information from video with embedded captions and effectively improves the accuracy of the obtained commodity information and comment information.
Optionally, in an embodiment of the present application, identifying target media data in the video data to obtain the commodity information and comment information of each of the plurality of commodities includes: extracting text data from the video data and breaking the text data into a plurality of text sentences; identifying the appearance position of each of the plurality of text sentences; if a text sentence appears in the subtitle area, extracting the commodity information from it; and if a text sentence appears in the bullet-screen area, extracting the comment information from it. In this process, text data extracted from the video data is broken into text sentences, and commodity or comment information is extracted from each sentence according to its appearance position; this overcomes the difficulty of deciding whether a text sentence in the video data is commodity information or comment information and effectively improves the accuracy of the obtained commodity information and comment information.
Optionally, in an embodiment of the present application, identifying target media data in the video data to obtain the commodity information and comment information of each of the plurality of commodities includes: recognizing the commodity information from the voice data or text data in the video data and recognizing the comment information from the video images; or recognizing the commodity information from the video images or voice data and recognizing the comment information from the text data; or recognizing the commodity information from the text data or video images and recognizing the comment information from the voice data. In this process, commodity information is obtained from one kind of data among the voice data, text data and video images, and comment information is obtained from another; this effectively increases the ways of acquiring commodity and comment information, reduces the probability that they cannot be acquired at all, and thus ensures that the subsequent analysis can proceed normally.
Optionally, in an embodiment of the present application, associating the commodity information with the comment information includes: if the duration between the commodity appearance time and the comment appearance time is less than a preset duration, associating the commodity information with the comment information, where the commodity appearance time is the time at which the commodity information appears in the video data and the comment appearance time is the time at which the comment information appears in the video data; or, if the commodity information and the comment information both appear within a preset time range, associating them; or, if the correlation value between the commodity information and the comment information exceeds a preset threshold, associating them. In this process, whether to associate commodity information with comment information is decided by the duration between the two appearance times, by whether both appear within a preset time range, or by whether their correlation value exceeds a preset threshold; this remedies the situation in which the relation between commodity information and comment information is unknown and effectively improves the accuracy of the association.
Optionally, in an embodiment of the present application, the analysis result includes commodity ranking information, and analyzing the comment information associated with the commodity information to obtain the analysis result includes: performing emotional-tendency analysis on the comment information associated with the commodity information to obtain the number of positive comments and the number of negative comments in the comment information; and ranking the commodity information according to the number of times the commodity information appears in the comment information, the number of positive comments and the number of negative comments, to obtain the commodity ranking information. In this process, sentiment analysis of the associated comment information yields the positive and negative comment counts, and the commodity information is ranked by appearance count, positive count and negative count; the most praised or most popular commodities can thus be reported to the live-broadcast merchant, helping the merchant arrange the subsequent live-broadcast commodities and their order, and since the analysis result is determined from the positive and negative counts, the accuracy of identifying the most praised or popular commodities is effectively improved.
An embodiment of the present application further provides a video data processing apparatus, including: a video data acquisition module for acquiring video data obtained by live-broadcasting a plurality of commodities; a target data recognition module for identifying target media data in the video data to obtain commodity information and comment information of each of the plurality of commodities, the target media data comprising at least one of image data, voice data and text data; a commodity-comment association module for associating the commodity information with the comment information to obtain comment information associated with the commodity information; and an analysis result obtaining module for analyzing the comment information associated with the commodity information to obtain an analysis result.
Optionally, in an embodiment of the present application, the target data recognition module includes: a voice extraction and recognition module for extracting voice data from the video data and performing voiceprint recognition on it to obtain a current voiceprint feature; a voiceprint feature judgment module for determining whether the current voiceprint feature is the voiceprint feature of the target merchant; a first text extraction module for performing speech recognition on the voice data to obtain a first text and extracting the commodity information from it if the current voiceprint feature is that of the target merchant; and a second text extraction module for performing speech recognition on the voice data to obtain a second text and extracting the comment information from it if the current voiceprint feature is not that of the target merchant.
Optionally, in an embodiment of the present application, the target data recognition module includes: an image recognition and cropping module for extracting a video image from the video data, performing target recognition on it, and then cropping the subtitle-area image and the bullet-screen-area image from the video image; a first text recognition module for performing text recognition on the subtitle-area image to obtain a subtitle text and extracting the commodity information from it; and a second text recognition module for performing text recognition on the bullet-screen-area image to obtain a bullet-screen text and extracting the comment information from it.
Optionally, in an embodiment of the present application, the target data recognition module includes: a text extraction and sentence-breaking module for extracting text data from the video data and breaking it into a plurality of text sentences; an appearance position identification module for identifying the appearance position of each of the plurality of text sentences; a first information extraction module for extracting the commodity information from a text sentence if its appearance position is in the subtitle area; and a second information extraction module for extracting the comment information from a text sentence if its appearance position is in the bullet-screen area.
Optionally, in an embodiment of the present application, the target data recognition module includes: an information recognition module for recognizing the commodity information from the voice data or text data in the video data and recognizing the comment information from the video images; or recognizing the commodity information from the video images or voice data and recognizing the comment information from the text data; or recognizing the commodity information from the text data or video images and recognizing the comment information from the voice data.
Optionally, in an embodiment of the present application, the commodity-comment association module includes: an information association module for associating the commodity information with the comment information if the duration between the commodity appearance time and the comment appearance time is less than a preset duration, where the commodity appearance time is the time at which the commodity information appears in the video data and the comment appearance time is the time at which the comment information appears in the video data; or for associating them if both appear within a preset time range; or for associating them if their correlation value exceeds a preset threshold.
Optionally, in an embodiment of the present application, the analysis result includes commodity ranking information, and the analysis result obtaining module includes: an emotional-tendency analysis module for performing sentiment analysis on the comment information associated with the commodity information to obtain the number of positive comments and the number of negative comments in the comment information; and a ranking information obtaining module for ranking the commodity information according to the number of times the commodity information appears in the comment information, the number of positive comments and the number of negative comments, to obtain the commodity ranking information.
An embodiment of the present application further provides an electronic device, including: a processor and a memory, the memory storing processor-executable machine-readable instructions, the machine-readable instructions when executed by the processor performing the method as described above.
Embodiments of the present application also provide a storage medium having a computer program stored thereon, where the computer program is executed by a processor to perform the method as described above.
Drawings
In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be regarded as limiting the scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.
Fig. 1 is a schematic flow chart of a video data processing method provided by an embodiment of the present application;
Fig. 2 is a schematic diagram of a speech recognition process provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a video data processing apparatus provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
The technical solution in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
Before describing the video data processing method provided in the embodiment of the present application, some concepts related to the embodiment of the present application are described:
Text classification (also known as text categorization) refers to the process, technique and method of automatically judging the category of text content and assigning a category label to a text under a given classification system. For example, under a given classification system, a computer automatically judges the category of a text according to its content; this specifically includes judging the emotional color of the text, such as whether a piece of comment information is a positive comment or a negative comment.
Speech recognition refers to the technologies and disciplines concerned with voice communication between humans and machines, i.e. enabling a machine to understand human speech. The fields involved include signal processing, pattern recognition, probability and information theory, the mechanisms of sound production and hearing, front-end processing, and so on.
A target detection network is a network for detecting target objects in an image: it detects a target object in the image and outputs its position range, class and probability, where the position range is typically marked in the form of a detection box, the class is the specific category of the target object, and the probability is the probability that the object in the detection box belongs to that category.
Text recognition refers to the process of analyzing an image containing characters, obtaining its layout information and recognizing the characters in it, thereby converting a text image into electronic text.
It should be noted that the video data processing method provided by the embodiments of the present application may be executed by an electronic device, where the electronic device is a device terminal or a server capable of executing a computer program; the device terminal includes, for example: a smart phone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), a network switch or a network router, and the like.
Before the video data processing method itself is described, its applicable scenarios are introduced. These include, but are not limited to: analyzing video data that contains comment information in scenarios such as live network videos, recorded network videos, live television programs or recorded television programs. More generally, the method can analyze any video containing text or comment information, and the resulting analysis can help an anchor, a live-broadcast merchant or a program producer plan the next broadcast or program.
Please refer to Fig. 1, a schematic flow chart of the video data processing method provided by an embodiment of the present application. The main idea of the method is to extract commodity information and comment information corresponding to a plurality of commodities from video data obtained by live broadcast, and to analyze the two in association with each other, so that the comment information in the video data is effectively extracted, utilized and analyzed and its utilization rate is improved. The method may include the following steps:
Step S110: obtain video data, where the video data is obtained by live-broadcasting a plurality of commodities.
The video data in step S110 can be obtained in several ways. In a first way, acquisition equipment such as a video camera, a video recorder or a color camera shoots the target commodities to obtain video data; the shooting may cover one commodity and then another after the live explanation of the first commodity finishes, or two mutually related commodities at the same time; the terminal device then sends the video data to the electronic device, which receives it. In a second way, the video data is obtained from a video server: specifically, from the server's file system, from its database, or from its mobile storage device. In a third way, the video data sent by the video server to the terminal device is intercepted directly, or the video data intercepted by an interception device is received. In a fourth way, the video data is obtained from the Internet using software such as a browser, or by accessing the Internet with another application; for example, by accessing a live-video interface, a recorded-video interface or an on-demand video interface provided by a live-broadcast server.
After step S110, step S120 is performed: identify target media data in the video data to obtain the commodity information and the comment information of each of the plurality of commodities.
The target media data includes at least one of image data, voice data and text data; it may be any one of the three, any two, or all three.
Because step S120 has many possible implementations, three cases are distinguished. In the first case, the commodity information and the comment information are both obtained from the same one of the three kinds of data; this case has three embodiments. In the second case, the commodity information is obtained from one kind of data and the comment information from another; this case has six embodiments. In the third case, the commodity information is obtained jointly from two of the three kinds of data and the comment information from the remaining one, or vice versa; this case also has six embodiments. The detailed embodiments of step S120 are therefore deferred until after the description of obtaining the analysis result, where the three cases are explained in detail.
After step S120, step S130 is performed: associate the commodity information with the comment information to obtain comment information associated with the commodity information.
Step S130 has many possible implementations, including but not limited to the following:
In a first implementation, the commodity information and the comment information are associated according to the duration between the commodity appearance time and the comment appearance time. This implementation includes:
Step S131: if the duration between the commodity appearance time and the comment appearance time is less than a preset duration, associate the commodity information with the comment information.
The commodity appearance time is the time at which the commodity information appears in the video data, and the comment appearance time is the time at which the comment information appears in the video data.
An example of step S131: assume the preset duration is 3 minutes, the commodity information appears at minute 1 and the comment information appears at minute 2; the duration between the commodity appearance time and the comment appearance time is then 1 minute, which is clearly less than the preset duration, so the commodity information and the comment information should be associated. Conversely, if the duration between the two appearance times exceeds the preset duration, the commodity information is not associated with the comment information, and no operation need be performed.
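For illustration only, the following minimal Python sketch shows this time-threshold association; the function and data-structure names and the representation of appearance times in seconds are assumptions made for the example, not part of the disclosure:

    from dataclasses import dataclass

    @dataclass
    class Event:
        text: str
        time_s: float  # appearance time in the video, in seconds

    def associate_by_threshold(commodities, comments, max_gap_s=180.0):
        # Associate each comment with every commodity whose appearance
        # time lies within max_gap_s (the "preset duration") of it.
        return [(c, m) for c in commodities for m in comments
                if abs(c.time_s - m.time_s) < max_gap_s]

    # Mirrors the example above: preset duration 3 min, commodity at
    # minute 1, comment at minute 2 -> gap of 1 min, so they associate.
    pairs = associate_by_threshold([Event("toothbrush", 60.0)],
                                   [Event("looks great!", 120.0)])
    print(len(pairs))  # 1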
In a second implementation, the commodity information and the comment information are associated according to whether both appear within a preset time range. This implementation includes:
Step S132: if the commodity information and the comment information both appear within the preset time range, associate the commodity information with the comment information.
Step S132 covers two cases. In the first case, the live-broadcast commodity corresponding to each time period (i.e. each preset time range) is not known in advance. Taking a 10-minute range as an example: if toothbrush information appears at 9:03 and the comment information of user A appears at 9:06, both within the range from 9:00 to 9:10, then the toothbrush information can be associated with user A's comment information. In the second case, the live-broadcast commodity of each time period is known in advance: if the merchant knows beforehand that a toothbrush is introduced in one interval (say 9:00 to 9:30) and toothpaste in the following interval (say 9:30 to 10:00), then comment information from the first interval is directly associated with the toothbrush and comment information from the second interval with the toothpaste.
In a third implementation, the commodity information and the comment information are associated according to the correlation value between them: the correlation value of the commodity information and the comment information is calculated, and the commodity information is associated with the comment information whose correlation value is greater than a preset threshold.
Step S133: if the correlation value between the commodity information and the comment information exceeds the preset threshold, associate the commodity information with the comment information.
An example of step S133: the correlation value of the commodity information and the comment information is calculated by multiplying factors such as the similarity between the commodity information and the comment information, the duration between the commodity appearance time and the comment appearance time, and/or whether the two were extracted from the same kind of data, by corresponding weights and summing. Assume the preset threshold is 2, the similarity score is 1, the duration score is 1, and the weights of similarity and duration are 1.1 and 1.2 respectively; the correlation value calculated from the similarity and duration is then 1 × 1.1 + 1 × 1.2 = 2.3. Since the correlation value 2.3 is greater than the preset threshold 2, the commodity information and the comment information should be associated.
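A minimal sketch of this weighted correlation score, using the factor scores and weights of the worked example; how each factor is scored is an assumption made for illustration:

    def correlation_value(similarity, duration_score, same_source=0.0,
                          w_sim=1.1, w_dur=1.2, w_src=1.0):
        # Weighted sum over the factors named above (similarity, duration
        # between appearance times, same-data-source indicator).
        return similarity * w_sim + duration_score * w_dur + same_source * w_src

    # Example above: similarity 1, duration score 1 -> 1*1.1 + 1*1.2 = 2.3
    value = correlation_value(similarity=1.0, duration_score=1.0)
    print(value > 2.0)  # True: associate commodity and comment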
After step S130, step S140 is performed: analyze the comment information associated with the commodity information to obtain an analysis result.
Step S140 has many possible implementations, including but not limited to the following:
In a first implementation, the commodities are ranked according to their appearance counts, positive-comment counts and negative-comment counts in the comment information, yielding commodity ranking information as the analysis result. This implementation includes:
Step S141: perform emotional-tendency analysis and statistics on the comment information associated with the commodity information to obtain the number of positive comments and the number of negative comments in the comment information.
An example of step S141: a neural network model is used to analyze the emotional tendency of the comment information associated with the commodity information and count the positive and negative comments. Usable neural network models include, but are not limited to: a text classification model, a Long Short-Term Memory (LSTM) network and a Bidirectional Long Short-Term Memory (Bi-LSTM) network. The text classification model, also called a text classification neural network model, is a neural network trained for text classification: a text corpus is given as input and a probability list is obtained as output. Networks usable for this purpose include, for example, Convolutional Neural Networks (CNN), Deep Neural Networks (DNN), and so on.
Step S142: rank the commodity information according to the number of times the commodity information appears in the comment information, the number of positive comments and the number of negative comments, to obtain the commodity ranking information.
An example of step S142: the appearance count of the commodity name in the comment information, the positive-comment count and the negative-comment count are given corresponding weights, here 1 : 1 : -1, although the weights can be set according to the specific situation. If the name of a first commodity appears 3 times, with 10 positive comments and 0 negative comments, its ranking coefficient is 3 × 1 + 10 × 1 + 0 × (-1) = 13; if the name of a second commodity appears 5 times, with 3 positive comments and 10 negative comments, its ranking coefficient is 5 × 1 + 3 × 1 + 10 × (-1) = -2. The commodity information is then ranked by coefficient from large to small, giving the order: first commodity, second commodity. A piece of comment information may be a positive or a negative comment, and the commodity name may or may not appear in it.
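The ranking coefficient of the worked example can be sketched as follows; the sentiment counts are assumed to come from the upstream analysis of step S141:

    def ranking_coefficient(appearances, positives, negatives,
                            w_app=1.0, w_pos=1.0, w_neg=-1.0):
        # Weights 1 : 1 : -1, as in the example; adjustable per scenario.
        return appearances * w_app + positives * w_pos + negatives * w_neg

    scores = {
        "first commodity": ranking_coefficient(3, 10, 0),   # 13
        "second commodity": ranking_coefficient(5, 3, 10),  # -2
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    print(ranking)  # ['first commodity', 'second commodity']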
In this implementation, emotional-tendency analysis of the comment information associated with the commodity information yields the numbers of positive and negative comments; the commodity information is then ranked by the appearance count of the commodity information (e.g. the commodity name) in the comment information, the positive count and the negative count, to obtain the commodity ranking information. The most praised or most popular commodities can thus be reported to the live-broadcast merchant, helping the merchant arrange the subsequent live-broadcast commodities and their order; and since the analysis result is determined from the positive and negative counts, the accuracy of identifying the most praised or popular commodities is effectively improved.
In a second implementation, a preset number of the most popular commodities are screened out of the information of the plurality of commodities. This implementation includes:
Step S143: count the number of times each of the plurality of commodity information items appears in the comment information.
An example of step S143: assume there are three commodity information items: first commodity information, second commodity information and third commodity information; their counted numbers of appearances in the comment information are 89, 93 and 99 respectively.
Step S144: sort the commodity information items by their number of appearances in the comment information to obtain the sorted commodity information.
An example of step S144: with appearance counts of 89, 93 and 99 for the first, second and third commodity information respectively, sorting by appearance count gives, in order: third commodity information (99 appearances), second commodity information (93) and first commodity information (89).
Step S145: screen a preset number of commodity information items from the sorted commodity information.
An example of step S145: the preset number can be set according to the specific situation, e.g. 2, 3, 5 or 10. Taking 2 as an example for ease of understanding and explanation, the screened commodity information is, in order, the third commodity information and the second commodity information. It is understood that the first and second implementations above may refer to and be combined with each other, and the embodiments of the individual steps may also refer to each other.
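A minimal sketch of steps S143 to S145 (counting mentions and screening the top commodities); matching a commodity by substring search in the comment text is an illustrative assumption:

    from collections import Counter

    def top_commodities(comments, commodity_names, preset_number=2):
        counts = Counter({name: 0 for name in commodity_names})
        for comment in comments:
            for name in commodity_names:
                if name in comment:           # step S143: count appearances
                    counts[name] += 1
        ranked = counts.most_common()          # step S144: sort by count
        return [n for n, _ in ranked[:preset_number]]  # step S145: screen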
In the above process, at least one of the image data, voice data and text data in the video data is recognized to obtain commodity information and comment information; the commodity information is then associated with the comment information to obtain comment information associated with the commodity information; finally, the associated comment information is analyzed to obtain the analysis result. That is, commodity information and comment information corresponding to a plurality of commodities are extracted from video data obtained by live broadcast and analyzed in association, so that the comment information in the video data is effectively extracted, utilized and analyzed, and its utilization rate is improved.
The implementation of step S120 falls into the following three cases:
In the first case, the commodity information and the comment information are both obtained from the same one of the three kinds of data. This case may include the following three embodiments.
In a first embodiment, the commodity information and the comment information are extracted from the voice data in the video data alone. In a live video, the information exchanged by voice between the merchant and the users is stored in the audio of the video data; that is, the audio contains both the commodity information introduced by the merchant and the users' spoken comments on the commodities. Both can be extracted by extracting the audio and performing voiceprint recognition and speech recognition on it. This embodiment may include the following steps:
Step S1211: extract voice data from the video data, and perform voiceprint recognition on the voice data to obtain the current voiceprint feature.
An example of step S1211: the voice data is extracted from the video data using an adaptive speech-enhancement algorithm based on correlation, or a WaveNet neural network model, and voiceprint recognition is performed on the voice data to obtain the current voiceprint feature.
Step S1212: determine whether the current voiceprint feature is the voiceprint feature of the target merchant.
An example of step S1212: calculate the similarity between the current voiceprint feature and the voiceprint feature of the target merchant; if the similarity exceeds a preset threshold, the current voiceprint feature is determined to be that of the target merchant; if it does not exceed the preset threshold, the current voiceprint feature is determined not to be that of the target merchant.
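A minimal sketch of this similarity check, assuming the voiceprint features are fixed-length embedding vectors produced by an upstream voiceprint model (an assumption; the disclosure does not fix the feature form) and using cosine similarity:

    import numpy as np

    def is_target_merchant(current_feat, merchant_feat, threshold=0.8):
        # Cosine similarity between the two voiceprint feature vectors;
        # exceeding the preset threshold identifies the target merchant.
        cos = float(np.dot(current_feat, merchant_feat) /
                    (np.linalg.norm(current_feat) * np.linalg.norm(merchant_feat)))
        return cos > threshold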
Step S1213: if the current voiceprint feature is the voiceprint feature of the target merchant, perform speech recognition on the voice data to obtain a first text, and extract the commodity information from the first text.
Please refer to Fig. 2, a schematic diagram of the speech recognition process provided by an embodiment of the present application. An example of step S1213: if the current voiceprint feature is that of the target merchant, speech recognition can be performed on the voice data using neural network models such as an LSTM, Bi-LSTM, VGG, ResNet, Wide ResNet or Inception network model to obtain the first text, and the commodity information in the first text is extracted using regular expressions.
Of course, in the speech recognition process, the recognition result, i.e. the content of the first text, may also be obtained through speech slicing, data processing, Mel Frequency Cepstrum Coefficient (MFCC) extraction and WaveNet neural network processing. MFCCs are cepstral coefficients extracted in the Mel-scale frequency domain, the Mel scale describing the nonlinear characteristic of the human ear's perception of frequency. The data processing stage includes data preprocessing, noise removal, a moving window function and high-frequency signal enhancement. Data preprocessing here means cutting a speech slice into successive short frames, with a certain overlap between adjacent frames; typically a 25 ms frame is taken and the window is then moved by 10 ms to take the next frame, i.e. every two adjacent frames overlap by 15 ms. The frames overlap because the speech signal is time-varying: within a short time range its features change little, so it can be treated as stationary, but beyond this short range the signal changes. The moving window function may specifically be a Hamming window, which is nonzero in a certain interval and 0 elsewhere, so that each application effectively takes only the speech in the nonzero interval. High-frequency signal enhancement pre-emphasizes the input digital speech signal in order to boost the high-frequency part of the speech, remove the influence of lip radiation and increase the high-frequency resolution of the speech.
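The framing and windowing recipe above can be sketched as follows (the pre-emphasis coefficient 0.97 is a common choice, assumed here rather than taken from the disclosure):

    import numpy as np

    def preprocess_frames(signal, sample_rate, frame_ms=25, hop_ms=10,
                          pre_emph=0.97):
        # High-frequency enhancement (pre-emphasis): y[t] = x[t] - a*x[t-1]
        y = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
        frame_len = int(sample_rate * frame_ms / 1000)   # 25 ms frame
        hop_len = int(sample_rate * hop_ms / 1000)       # moved by 10 ms
        window = np.hamming(frame_len)                   # Hamming window
        return np.array([y[s:s + frame_len] * window
                         for s in range(0, len(y) - frame_len + 1, hop_len)])

MFCCs can then be computed from the windowed frames (for example with librosa.feature.mfcc) before being fed to a WaveNet-style recognizer.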
Step S1214: if the current voiceprint feature is not the voiceprint feature of the target merchant, perform speech recognition on the voice data to obtain a second text, and extract the comment information from the second text.
The implementation principle and manner of step S1214 are similar to those of step S1213, the only difference being that in this step, when the voiceprint feature is not that of the target merchant, the comment information in the second text is extracted; they are therefore not repeated here, and the description of step S1213 may be consulted where anything is unclear.
In this process, whether the voice data carries commodity information or comment information is determined by judging whether the current voiceprint feature is that of the target merchant; according to the result of this voiceprint judgment, the commodity information introduced by the merchant and the spoken comments of the viewing users can be extracted, which effectively improves the accuracy of the obtained commodity information and comment information.
In a second embodiment, the commodity information and the comment information are extracted from the video images in the video data alone. For hard subtitles or hard bullet screens embedded in the video, the commodity information and comment information may be obtained by image recognition and text recognition. This embodiment may include:
Step S1221: extract a video image from the video data.
An example of step S1221: the video data consists of many frames of video images; a frame is extracted using a program or software, and each frame can then be subjected to the following processing.
Step S1222: perform target recognition on the video image, and crop the subtitle-area image and the bullet-screen-area image from the video image according to the target recognition result.
It can be understood that the subtitle area and the bullet-screen area differ mainly in two respects. First, they appear in different regions of the picture: in a commodity live video the subtitle area is usually at the bottom of the video image, while the bullet-screen area is usually in the upper part. Second, their animation differs: to make the subtitles easy for viewers to read, subtitles are usually played at a fixed position, whereas bullet screens are not fixed in position and slide from the upper right of the video picture to the upper left, or from the upper left to the upper right.
An example of step S1222: target recognition is performed on the video image using a support vector machine or a target detection network model; the subtitle-area image and the bullet-screen-area image in the video image are first recognized and then cropped out according to the recognition result. Target detection network models usable here include, but are not limited to: R-CNN, Fast R-CNN, and so on.
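A minimal sketch of this region cropping; `detect` stands in for any target detection model (e.g. a Fast R-CNN) returning labeled boxes, and the label names are assumptions made for the example:

    def crop_regions(frame, detect):
        # frame: H x W x 3 numpy array; detect(frame) yields (label, box)
        # pairs, where box = (x, y, w, h) in pixels.
        crops = {}
        for label, (x, y, w, h) in detect(frame):
            if label in ("subtitle", "bullet_screen"):
                crops[label] = frame[y:y + h, x:x + w]  # array slice = crop
        return crops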
Step S1223: perform text recognition on the subtitle-area image to obtain a subtitle text, and extract the commodity information from the subtitle text.
An example of step S1223: text recognition is performed on the subtitle-area image using a text convolutional network (Text-CNN) to obtain the subtitle text, and the commodity information in the subtitle text is extracted using regular expressions or a neural network model; Text-CNN is an algorithm that classifies text using a convolutional neural network.
Step S1224: perform text recognition on the bullet-screen-area image to obtain a bullet-screen text, and extract the comment information from the bullet-screen text.
The implementation principle and manner of step S1224 are similar to those of step S1223, the only difference being that in this step the bullet-screen text is recognized from the bullet-screen-area image and the comment information in it is then extracted using regular expressions or a neural network model; they are therefore not repeated here, and the description of step S1223 may be consulted where anything is unclear.
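For illustration, text recognition on the cropped regions could be sketched with an off-the-shelf OCR engine (Tesseract via pytesseract here, standing in for the Text-CNN named above); the regular expression is a hypothetical pattern, not one from the disclosure:

    import re
    import pytesseract  # assumes the Tesseract OCR engine is installed

    def read_regions(subtitle_img, bullet_img):
        subtitle_text = pytesseract.image_to_string(subtitle_img, lang="chi_sim")
        bullet_text = pytesseract.image_to_string(bullet_img, lang="chi_sim")
        # Hypothetical marker pattern for commodity mentions in subtitles.
        commodity_info = re.findall(r"商品[:：]\s*(\S+)", subtitle_text)
        comment_info = [ln for ln in bullet_text.splitlines() if ln.strip()]
        return commodity_info, comment_info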
In this process, target recognition is performed on the video image extracted from the video data, the subtitle-area and bullet-screen-area images are then cropped out, and the commodity information in the subtitle text and the comment information in the bullet-screen text are extracted respectively; this overcomes the difficulty of extracting commodity or comment information from video with embedded captions and effectively improves the accuracy of the obtained commodity information and comment information.
In a third embodiment, the commodity information and the comment information are extracted from the text data in the video data alone. Some external subtitle files are packaged together with the video file, so the text data can be obtained simply by reading the subtitle file according to the video file format. This embodiment may include:
Step S1231: extract text data from the video data, and break the text data into a plurality of text sentences.
An example of step S1231: the text data is extracted from the video data according to the video file format and broken into a plurality of text sentences. The sentence breaking can be done in several ways: first, by preset tags, e.g. line-break tags, tab characters, newline characters and the like; second, by preset punctuation marks, e.g. periods, commas, exclamation marks, semicolons and the like; third, by using Natural Language Processing (NLP) techniques.
Step S1232: identify the appearance position of each of the plurality of text sentences.
Step S1233: if the appearance position of a text sentence is in the subtitle area, extract the commodity information from the text sentence.
Step S1234: if the appearance position of a text sentence is in the bullet-screen area, extract the comment information from the text sentence.
Examples of steps S1232 to S1234: first, if the text data is marked as a subtitle file, the appearance position of every text sentence in it can be taken to be the subtitle area; second, if the text data is marked as a bullet-screen file, the appearance position of every text sentence can be taken to be the bullet-screen area; third, if each text sentence carries a format tag marking its appearance position and animation form, the appearance position can be extracted from the format tag and steps S1233 to S1234 performed accordingly.
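A minimal sketch of steps S1231 to S1234; `position_of` stands in for whichever of the three position sources above applies (file-level marking or per-sentence format tags):

    import re

    def split_and_route(text_data, position_of):
        # Sentence breaking on common punctuation and line breaks (S1231).
        sentences = [s for s in re.split(r"[。，！？；!?;\n]", text_data) if s.strip()]
        commodity_info, comment_info = [], []
        for sentence in sentences:
            if position_of(sentence) == "subtitle":      # S1232/S1233
                commodity_info.append(sentence)
            else:                                        # bullet-screen area, S1234
                comment_info.append(sentence)
        return commodity_info, comment_info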
In the implementation process, the character data extracted from the video data is broken into a plurality of text sentences, and the commodity information or the comment information in each text sentence is extracted according to its appearance position. This solves the problem that it is difficult to determine whether a text sentence in the video data is commodity information or comment information, and effectively improves the accuracy of obtaining the commodity information and the comment information.
In the second case, the commodity information is obtained from one of the three kinds of data and the comment information is obtained from another of them. The manner of obtaining information from each kind of data is similar to that in the first case; where anything is unclear, refer to the description of the first case above, and the details are not repeated below.
Only six different embodiments of the second case are described below. In the first embodiment, commodity information is identified from the voice data in the video data, and comment information is identified from the video images in the video data; in the second embodiment, commodity information is identified from the character data, and comment information from the video images; in the third embodiment, commodity information is identified from the video images, and comment information from the character data; in the fourth embodiment, commodity information is identified from the voice data, and comment information from the character data; in the fifth embodiment, commodity information is identified from the character data, and comment information from the voice data; in the sixth embodiment, commodity information is identified from the video images, and comment information from the voice data.
In the third case, the commodity information is obtained jointly from two of the three kinds of data and the comment information is obtained from the remaining one, or, conversely, the comment information is obtained jointly from two of the three kinds of data and the commodity information is obtained from the remaining one. The manner of obtaining information from each kind of data is similar to that in the first case; where anything is unclear, refer to the description of the first case above, and the details are not repeated below.
Only six different embodiments of the third case are described below. In the first embodiment, commodity information is identified and summarized from the voice data and the character data in the video data, and comment information is identified from the video images; in the second embodiment, commodity information is identified and summarized from the video images and the voice data, and comment information is identified from the character data; in the third embodiment, commodity information is identified and summarized from the character data and the video images, and comment information is identified from the voice data; in the fourth embodiment, comment information is identified and summarized from the voice data and the character data, and commodity information is identified from the video images; in the fifth embodiment, comment information is identified and summarized from the video images and the voice data, and commodity information is identified from the character data; in the sixth embodiment, comment information is identified and summarized from the character data and the video images, and commodity information is identified from the voice data.
in the implementation process, commodity information is obtained from one data of voice data, character data and video images in the video data, and comment information is obtained from the other data; therefore, the method for acquiring the commodity information and the comment information is effectively increased, the probability that the commodity information and the comment information cannot be acquired is reduced, and the follow-up analysis of the commodity information and the comment information can be effectively guaranteed to be normally carried out.
Please refer to fig. 3, which is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application. An embodiment of the present application provides a video data processing apparatus 200, including:
the video data obtaining module 210 is configured to obtain video data, where the video data is obtained by performing live video broadcast on a plurality of commodities.
The target data identification module 220 is configured to identify target media data in the video data, to obtain commodity information of each commodity in the plurality of commodities and comment information of each commodity, where the target media data includes at least one of image data, voice data, and text data.
The commodity comment associating module 230 is configured to associate the commodity information with the comment information to obtain the comment information associated with the commodity information.
The analysis result obtaining module 240 is configured to analyze the comment information associated with the commodity information to obtain an analysis result.
Optionally, in this embodiment of the present application, the target data identification module may include:
and the voice extraction and identification module is used for extracting voice data from the video data and carrying out voiceprint identification on the voice data to obtain the current voiceprint characteristics.
And the voiceprint characteristic judging module is used for judging whether the current voiceprint characteristic is the voiceprint characteristic of the target merchant.
And the first text extraction module is used for carrying out voice recognition on the voice data to obtain a first text and extracting commodity information in the first text if the current voiceprint feature is the voiceprint feature of the target merchant.
And the second text extraction module is used for carrying out voice recognition on the voice data to obtain a second text and extracting comment information in the second text if the current voiceprint feature is not the voiceprint feature of the target merchant.
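A minimal sketch of how these four modules cooperate, assuming voiceprint features are embedding vectors from some speaker-verification model; the cosine-similarity test and the 0.75 threshold are illustrative assumptions, not values given in this application:

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def route_transcript(current_print: np.ndarray, merchant_print: np.ndarray,
                         transcript: str, threshold: float = 0.75) -> tuple:
        """Send the merchant's speech to commodity extraction and all other
        speech to comment extraction, mirroring the two text extraction modules."""
        if cosine_similarity(current_print, merchant_print) >= threshold:
            return ("commodity", transcript)
        return ("comment", transcript)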
Optionally, in an embodiment of the present application, the target data identification module further includes:
and the image identification and interception module is used for extracting a video image from the video data, identifying a target of the video image and intercepting a subtitle area image and a bullet screen area image in the video image.
And the first character recognition module is used for carrying out character recognition on the caption area image to obtain a caption text and extracting commodity information in the caption text.
And the second character recognition module is used for carrying out character recognition on the bullet screen region image, obtaining a bullet screen text and extracting comment information in the bullet screen text.
Optionally, in this embodiment of the present application, the target data identification module may further include:
and the character extraction sentence-breaking module is used for extracting character data from the video data and breaking sentences of the character data to obtain a plurality of text sentences.
And the appearance position identification module is used for identifying the appearance position of each text sentence in the plurality of text sentences.
And the first information extraction module is used for extracting the commodity information in the text sentence if the appearance position of the text sentence is in the subtitle area.
And the second information extraction module is used for extracting comment information in the text sentence if the appearance position of the text sentence is in the bullet screen area.
Optionally, in this embodiment of the present application, the target data identification module may further include:
the information identification module is used for identifying commodity information from voice data or character data in the video data and identifying comment information from video images in the video data; or identifying commodity information from video images or voice data in the video data, and identifying comment information from character data in the video data; alternatively, the commodity information is recognized from character data or video images in the video data, and the comment information is recognized from voice data in the video data.
Optionally, in an embodiment of the present application, the commodity comment associating module includes:
the information association module is used for associating the commodity information with the comment information if the time length between the commodity appearance time and the comment appearance time is less than the preset time length, wherein the commodity appearance time is the time length of the commodity information appearing in the video data, and the comment appearance time is the time length of the comment information appearing in the video data; or if the commodity information and the comment information are both in the preset time range, associating the commodity information with the comment information; or if the correlation value of the commodity information and the comment information exceeds a preset threshold value, associating the commodity information with the comment information.
Optionally, in an embodiment of the present application, the analysis result includes commodity ranking information, and the analysis result obtaining module includes:
and the emotional tendency analysis module is used for carrying out emotional tendency analysis on the comment information related to the commodity information to obtain the good comment times and the bad comment times in the comment information.
And the ranking information obtaining module is used for ranking the commodity information according to the times of appearance of the commodity information in the comment information, the times of good evaluation and the times of poor evaluation to obtain commodity ranking information.
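A minimal sketch of these two modules, using a keyword lexicon as a stand-in for whatever sentiment model is actually used; the lexicons and the "mentions + good - bad" ranking weight are illustrative assumptions:

    POSITIVE = {"great", "love", "good", "nice"}
    NEGATIVE = {"bad", "slow", "broken", "poor"}

    def score_commodity(comments: list) -> tuple:
        """Return (mention count, good-comment count, bad-comment count)."""
        good = sum(any(w in c.lower() for w in POSITIVE) for c in comments)
        bad = sum(any(w in c.lower() for w in NEGATIVE) for c in comments)
        return (len(comments), good, bad)

    def rank_commodities(comments_by_commodity: dict) -> list:
        def key(name):
            mentions, good, bad = score_commodity(comments_by_commodity[name])
            return mentions + good - bad  # illustrative weighting
        return sorted(comments_by_commodity, key=key, reverse=True)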
It should be understood that the apparatus corresponds to the video data processing method embodiment described above and can perform the steps of that method embodiment. For the specific functions of the apparatus, refer to the description above; the detailed description is omitted here to avoid redundancy. The apparatus includes at least one software functional module that can be stored in a memory as software or firmware, or solidified in the operating system (OS) of the apparatus.
Please refer to fig. 4 for a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 300 provided in an embodiment of the present application includes a processor 310 and a memory 320, the memory 320 storing machine-readable instructions executable by the processor 310; when executed by the processor 310, the machine-readable instructions perform the method described above.
An embodiment of the present application further provides a storage medium 330, where the storage medium 330 stores a computer program that, when executed by the processor 310, performs the method described above.
The storage medium 330 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code that comprises one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules of the embodiments in the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an alternative embodiment of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present application, and all the changes or substitutions should be covered by the scope of the embodiments of the present application.

Claims (10)

1. A method of processing video data, comprising:
the method comprises the steps of obtaining video data, wherein the video data are obtained by carrying out video live broadcast on a plurality of commodities;
identifying target media data in the video data to obtain commodity information of each commodity in the plurality of commodities and comment information of each commodity, wherein the target media data comprises at least one of image data, voice data and character data;
associating the commodity information with the comment information to obtain comment information associated with the commodity information;
and analyzing the comment information associated with the commodity information to obtain an analysis result.
2. The method of claim 1, wherein the identifying target media data in the video data to obtain commodity information for each commodity in the plurality of commodities and comment information for each commodity comprises:
extracting voice data from the video data, and carrying out voiceprint recognition on the voice data to obtain current voiceprint characteristics;
judging whether the current voiceprint feature is the voiceprint feature of the target merchant;
if yes, carrying out voice recognition on the voice data to obtain a first text, and extracting commodity information in the first text;
and if not, carrying out voice recognition on the voice data to obtain a second text, and extracting comment information in the second text.
3. The method of claim 1, wherein the identifying target media data in the video data to obtain commodity information for each commodity in the plurality of commodities and comment information for each commodity comprises:
extracting a video image from the video data, performing target identification on the video image, and then capturing a subtitle area image and a bullet screen area image in the video image;
performing character recognition on the caption area image to obtain a caption text, and extracting commodity information in the caption text;
and performing character recognition on the bullet screen area image to obtain a bullet screen text, and extracting comment information in the bullet screen text.
4. The method of claim 1, wherein the identifying target media data in the video data to obtain commodity information for each commodity in the plurality of commodities and comment information for each commodity comprises:
extracting character data from the video data, and performing sentence breaking on the character data to obtain a plurality of text sentences;
identifying a location of occurrence of each of the plurality of textual statements;
if the appearance position of the text sentence is in the caption area, extracting commodity information in the text sentence;
and if the appearance position of the text statement is in the bullet screen area, extracting comment information in the text statement.
5. The method of claim 1, wherein the identifying target media data in the video data to obtain commodity information for each commodity in the plurality of commodities and comment information for each commodity comprises:
recognizing the commodity information from voice data or text data in the video data, and recognizing the comment information from a video image in the video data; or
Recognizing the commodity information from video images or voice data in the video data, and recognizing the comment information from character data in the video data; or
The commodity information is recognized from text data or video images in the video data, and the comment information is recognized from voice data in the video data.
6. The method of claim 1, wherein said associating said merchandise information with said review information comprises:
if the time length between the commodity appearance time and the comment appearance time is less than a preset time length, associating the commodity information with the comment information, wherein the commodity appearance time is the time at which the commodity information appears in the video data, and the comment appearance time is the time at which the comment information appears in the video data; or
If the commodity information and the comment information are both in a preset time range, associating the commodity information with the comment information; or
And if the correlation value of the commodity information and the comment information exceeds a preset threshold value, associating the commodity information with the comment information.
7. The method of claim 1, wherein the analysis results comprise: commodity ordering information; the analyzing the comment information related to the commodity information to obtain an analysis result comprises:
carrying out emotional tendency analysis on the comment information associated with the commodity information to obtain good comment times and poor comment times in the comment information;
and sorting the commodity information according to the times of the commodity information appearing in the comment information, the times of good evaluation and the times of bad evaluation to obtain commodity sorting information.
8. A video data processing apparatus, comprising:
the video data acquisition module is used for acquiring video data, and the video data is obtained by carrying out video live broadcast on a plurality of commodities;
a target data identification module, configured to identify target media data in the video data, to obtain commodity information of each commodity in the multiple commodities and comment information of each commodity, where the target media data includes at least one of image data, voice data, and text data;
the commodity comment associating module is used for associating the commodity information with the comment information to obtain comment information associated with the commodity information;
and the analysis result obtaining module is used for analyzing the comment information related to the commodity information to obtain an analysis result.
9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the machine-readable instructions, when executed by the processor, performing the method of any of claims 1 to 7.
10. A storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1 to 7.
CN202010938303.3A 2020-09-09 2020-09-09 Video data processing method and device, electronic equipment and storage medium Active CN111797820B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010938303.3A CN111797820B (en) 2020-09-09 2020-09-09 Video data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111797820A true CN111797820A (en) 2020-10-20
CN111797820B CN111797820B (en) 2021-02-19

Family

ID=72834167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010938303.3A Active CN111797820B (en) 2020-09-09 2020-09-09 Video data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111797820B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090973A (en) * 2014-07-18 2014-10-08 百度在线网络技术(北京)有限公司 Information presentation method and device
CN104103024A (en) * 2014-07-18 2014-10-15 百度在线网络技术(北京)有限公司 User evaluation information acquisition method and device
US20160057092A1 (en) * 2014-08-20 2016-02-25 Google Inc. Event-Based Comment Grouping for Content Items
CN106200956A (en) * 2016-07-07 2016-12-07 北京时代拓灵科技有限公司 A kind of field of virtual reality multimedia presents and mutual method
CN108573393A (en) * 2017-03-09 2018-09-25 百度在线网络技术(北京)有限公司 Comment information processing method, device, server and storage medium
CN108182211A (en) * 2017-12-19 2018-06-19 百度在线网络技术(北京)有限公司 Video public sentiment acquisition methods, device, computer equipment and storage medium
CN111629270A (en) * 2019-02-27 2020-09-04 北京搜狗科技发展有限公司 Candidate item determination method and device and machine-readable medium
CN111432225A (en) * 2020-03-11 2020-07-17 北京意匠文枢科技有限公司 Method and equipment for commodity video cutting

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927034A (en) * 2021-01-28 2021-06-08 长沙市到家悠享网络科技有限公司 Information processing method and device
CN113420167A (en) * 2021-05-14 2021-09-21 北京达佳互联信息技术有限公司 Multimedia resource processing method and device, electronic equipment and storage medium
CN113315986A (en) * 2021-05-25 2021-08-27 北京达佳互联信息技术有限公司 Live broadcast interaction method and device, product evaluation method and device, electronic equipment and storage medium
CN114745560A (en) * 2022-02-23 2022-07-12 南京小灿灿网络科技有限公司 AI (Artificial intelligence) director assistant system for implementing intelligent interactive effect
CN114727138A (en) * 2022-03-31 2022-07-08 大众问问(北京)信息科技有限公司 Commodity information processing method and device and computer equipment
CN114727138B (en) * 2022-03-31 2023-12-19 大众问问(北京)信息科技有限公司 Commodity information processing method, commodity information processing device and computer equipment

Also Published As

Publication number Publication date
CN111797820B (en) 2021-02-19

Similar Documents

Publication Publication Date Title
CN111797820B (en) Video data processing method and device, electronic equipment and storage medium
CN110020437B (en) Emotion analysis and visualization method combining video and barrage
US9230547B2 (en) Metadata extraction of non-transcribed video and audio streams
US10108709B1 (en) Systems and methods for queryable graph representations of videos
Albanie et al. Bbc-oxford british sign language dataset
CN107870896B (en) Conversation analysis method and device
CN106778241B (en) Malicious file identification method and device
CN109117777A (en) The method and apparatus for generating information
CN107491435B (en) Method and device for automatically identifying user emotion based on computer
CN111061915B (en) Video character relation identification method
US20150019206A1 (en) Metadata extraction of non-transcribed video and audio streams
CN111694984A (en) Video searching method and device, electronic equipment and readable storage medium
JP7394809B2 (en) Methods, devices, electronic devices, media and computer programs for processing video
KR102070197B1 (en) Topic modeling multimedia search system based on multimedia analysis and method thereof
Ellis et al. Why we watch the news: a dataset for exploring sentiment in broadcast video news
CN108985813A (en) Advertisement is incorporated into device and advertisement is incorporated into method
CN108229285B (en) Object classification method, object classifier training method and device and electronic equipment
CN114598933B (en) Video content processing method, system, terminal and storage medium
CN116645624A (en) Video content understanding method and system, computer device, and storage medium
CN111488813A (en) Video emotion marking method and device, electronic equipment and storage medium
CN114880496A (en) Multimedia information topic analysis method, device, equipment and storage medium
CN116567351B (en) Video processing method, device, equipment and medium
KR20170048736A (en) Evnet information extraciton method for extracing the event information for text relay data, and user apparatus for perfromign the method
CN117173608A (en) Video content auditing method and system
US20230054330A1 (en) Methods, systems, and media for generating video classifications using multimodal video analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant