CN116320621A

CN116320621A - NLP-based streaming media content analysis method and system

Info

Publication number: CN116320621A
Application number: CN202310554226.5A
Authority: CN
Inventors: 潘春霞; 姜凤龙; 朱亚辉
Original assignee: Suzhou Jiyi Technology Co ltd
Current assignee: Suzhou Jiyi Technology Co ltd
Priority date: 2023-05-17
Filing date: 2023-05-17
Publication date: 2023-06-23
Anticipated expiration: 2043-05-17
Also published as: CN116320621B

Abstract

The invention belongs to the technical field of information processing and provides an NLP-based streaming media content analysis method and system. The method comprises the following steps: receiving a search keyword input by a user, and determining matched streaming media videos according to the search keyword; screening the streaming media videos according to a heat value, processing the screened streaming media videos, and determining the text information corresponding to each streaming media video; receiving a function keyword input by the user, and generalizing the function keyword and the search keyword into nouns; extracting the adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information; and analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information. The invention obtains the streaming media evaluation information automatically, and this information can accurately reflect the overall direction of public opinion.

Description

NLP-based streaming media content analysis method and system

Technical Field

The invention relates to the technical field of information processing, in particular to a streaming media content analysis method and system based on NLP.

Background

When a new product is released or brought to market, understanding the direction of opinion in streaming media content is important for adjusting the product's strategy. With the rise of short videos, the content of streaming media videos needs to be analyzed accurately so that manufacturers can learn the public opinion of a new product in time. At present, however, it is difficult to automatically perform reasonably accurate public opinion analysis on a large volume of streaming media video content. Therefore, an NLP-based streaming media content analysis method and system are needed to solve the above problem.

Disclosure of Invention

Aiming at the defects existing in the prior art, the invention aims to provide a streaming media content analysis method and a streaming media content analysis system based on NLP, so as to solve the problems existing in the background art.

The invention is realized in such a way that a streaming media content analysis method based on NLP comprises the following steps:

receiving a search keyword input by a user, and determining a matched streaming media video according to the search keyword;

screening the streaming media videos according to the heat value, processing the screened streaming media videos, and determining text information corresponding to each streaming media video;

receiving a function keyword input by the user, and generalizing the function keyword and the search keyword into nouns;

extracting adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information;

and analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information.

As a further scheme of the invention: the step of processing the screened streaming media videos and determining text information corresponding to each streaming media video specifically comprises the following steps:

judging whether the screened streaming media video has subtitle information or not;

when subtitle information exists, performing text recognition on the subtitle information in the streaming media video to obtain text information;

when the subtitle information does not exist, acquiring the audio information of the streaming media video, and performing speech-to-text conversion on the audio information to obtain text information.

As a further scheme of the invention: the step of extracting adjectives and nouns in each piece of text information based on NLP specifically comprises the following steps:

determining the influence degree of a streaming media video author corresponding to the text information;

when the influence degree is smaller than or equal to the set influence value, extracting adjectives and nouns in the text information by using a word segmentation tool, and carrying out position marking on the extracted adjectives and nouns;

when the influence degree is greater than the set influence value, receiving training corpus information, performing feature learning on the training corpus information based on a CNN-LSTM model to obtain an exclusive neural network model, processing the text information through the exclusive neural network model to obtain adjectives and nouns, and position-marking the obtained adjectives and nouns.

As a further scheme of the invention: the step of binding a noun for each adjective and determining the content evaluation information of the text information specifically comprises the following steps:

binding a noun to each adjective according to the position marks, and determining the sentiment polarity of each adjective, wherein each adjective is classified as a positive (commendatory) word, a negative (derogatory) word or a neutral word;

classifying all adjectives according to nouns to obtain a plurality of categories, wherein the adjectives in each category are bound to the same noun;

and determining a text evaluation value of the text information, wherein the text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, and the categories and the text evaluation value form the content evaluation information.

As a further scheme of the invention: the step of analyzing and integrating all the content evaluation information to obtain the streaming media evaluation information specifically comprises the following steps:

integrating the categories in all the content evaluation information, and merging the categories corresponding to the same noun;

retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;

and determining an overall evaluation value, wherein the overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.
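The two evaluation formulas above can be written compactly as follows. The symbols are introduced here for illustration only and do not appear in the original text: $n_i^{+}$, $n_i^{-}$ and $n_i^{0}$ are the numbers of positive, negative and neutral adjectives in the $i$-th piece of text information, $I_i$ is the influence degree of the corresponding video author, and $K$ is the number of screened videos.

```latex
\[
  V_i = a\,n_i^{+} + b\,n_i^{-} + c\,n_i^{0},
  \qquad
  V_{\text{overall}} = \sum_{i=1}^{K} V_i \, I_i
\]
```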

Another object of the present invention is to provide an NLP-based streaming content analysis system, the system comprising:

the streaming media video determining module is used for receiving a search keyword input by a user and determining matched streaming media videos according to the search keyword;

the text information acquisition module is used for screening the streaming media videos according to the heat value, processing the screened streaming media videos and determining text information corresponding to each streaming media video;

a function keyword input module, used for receiving a function keyword input by the user and generalizing the function keyword and the search keyword into nouns;

the adjective noun determining module is used for extracting adjectives and nouns in each piece of text information based on NLP, binding a noun for each adjective, and determining content evaluation information of the text information;

and the streaming media evaluation information module is used for analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the functional keywords in the streaming media evaluation information.

As a further scheme of the invention: the text information acquisition module comprises:

the subtitle information judging unit is used for judging whether the screened streaming media video has subtitle information or not;

the first text information unit is used for carrying out text recognition on the subtitle information in the streaming media video to obtain text information when the subtitle information exists;

and the second text information unit is used for acquiring the audio information of the streaming media video when the subtitle information does not exist, and carrying out voice conversion on the audio information to obtain text information.

As a further scheme of the invention: the adjective noun determination module includes:

the influence degree determining unit is used for determining the influence degree of the streaming media video author corresponding to the text information;

a first adjective noun unit, used for extracting adjectives and nouns in the text information by using a word segmentation tool when the influence degree is less than or equal to the set influence value, and position-marking the extracted adjectives and nouns;

and the second adjective noun unit is used for receiving training corpus information when the influence degree is greater than the set influence value, performing feature learning on the training corpus information based on a CNN-LSTM model to obtain an exclusive neural network model, processing the text information through the exclusive neural network model to obtain adjectives and nouns, and position-marking the obtained adjectives and nouns.

As a further scheme of the invention: the adjective noun determination module further includes:

an adjective noun binding unit, configured to bind a noun to each adjective according to the position marks and determine the sentiment polarity of each adjective, where each adjective is classified as a positive (commendatory) word, a negative (derogatory) word or a neutral word;

the adjective classification unit is used for classifying all adjectives according to nouns to obtain a plurality of categories, and nouns corresponding to each category are the same;

and a text evaluation value unit for determining a text evaluation value of the text information, wherein the text evaluation value=a×the number of the positive words+b×the number of the negative words+c×the number of the neutral words, and the category and the text evaluation value form content evaluation information.

As a further scheme of the invention: the streaming media evaluation information module comprises:

the category integrating unit is used for integrating the categories in all the content evaluation information and combining the categories corresponding to the same noun;

the influence degree calling unit is used for calling the influence degree of the streaming media video author corresponding to each text evaluation value;

and the overall evaluation value unit is used for determining an overall evaluation value, wherein the overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.

Compared with the prior art, the invention has the beneficial effects that:

The invention processes the screened streaming media videos to determine the text information corresponding to each streaming media video; generalizes the function keyword and the search keyword input by the user into nouns; extracts the adjectives and nouns in each piece of text information based on NLP, binds a noun to each adjective, and determines the content evaluation information of the text information; and analyzes and integrates all the content evaluation information to obtain the streaming media evaluation information. The streaming media evaluation information is thus obtained automatically, and it can accurately reflect the overall direction of public opinion.

Drawings

Fig. 1 is a flowchart of a method for analyzing a streaming media content based on NLP.

Fig. 2 is a flowchart of determining text information of a streaming video in an NLP-based streaming content analysis method.

Fig. 3 is a flowchart of extracting adjectives and nouns in each text message in an NLP-based streaming media content analysis method.

Fig. 4 is a flowchart of binding a noun to each adjective in an NLP-based streaming media content analysis method.

Fig. 5 is a flowchart of obtaining streaming media evaluation information in an NLP-based streaming media content analysis method.

Fig. 6 is a schematic structural diagram of an NLP-based streaming media content analysis system.

Fig. 7 is a schematic structural diagram of a text information acquisition module in an NLP-based streaming media content analysis system.

Fig. 8 is a schematic structural diagram of an adjective noun determining module in an NLP-based streaming media content analysis system.

Fig. 9 is a schematic structural diagram of a streaming media evaluation information module in an NLP-based streaming media content analysis system.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more clear, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Specific implementations of the invention are described in detail below in connection with specific embodiments.

As shown in fig. 1, an embodiment of the present invention provides a method for analyzing streaming media content based on NLP, which includes the following steps:

S100, receiving a search keyword input by a user, and determining matched streaming media videos according to the search keyword;

S200, screening the streaming media videos according to the heat value, processing the screened streaming media videos, and determining text information corresponding to each streaming media video;

S300, receiving a function keyword input by the user, and generalizing the function keyword and the search keyword into nouns;

S400, extracting adjectives and nouns in each piece of text information based on NLP, binding a noun to each adjective, and determining content evaluation information of the text information;

S500, analyzing and integrating all the content evaluation information to obtain streaming media evaluation information, and specially marking the evaluation content of the function keyword in the streaming media evaluation information.

In the embodiment of the invention, when a manufacturer needs to know the public opinion of a new product, a search keyword is input; the search keyword may be the name of the new product, and the streaming media video platform determines a plurality of matched streaming media videos according to the search keyword. The embodiment of the invention then screens the streaming media videos according to a heat value, which is related to the number of likes, comments and shares of a streaming media video; the streaming media videos with higher heat values are retained, and the screened streaming media videos are processed to determine the text information corresponding to each of them. Next, the user inputs a function keyword, which names a newly introduced function of the new product, that is, a selling point the manufacturer particularly cares about, and the embodiment of the invention generalizes the function keyword and the search keyword into nouns. The adjectives and nouns in each piece of text information are then extracted based on natural language processing (NLP), and a noun is bound to each adjective, indicating that the adjective describes that noun, so as to obtain the content evaluation information of the text information. Finally, all the content evaluation information is analyzed and integrated to obtain the streaming media evaluation information, which reflects the overall direction of public opinion. The evaluation content of the function keyword in the streaming media evaluation information is specially marked, for example in bold, so that the manufacturer's staff can see the market response to the new function at a glance. It is readily understood that the evaluation content of a function keyword consists of the adjectives bound to the corresponding noun.
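The screening step can be sketched as follows. The text only states that the heat value is related to the number of likes, comments and shares; the weighted sum, the weights, the threshold and every name used here (`Video`, `heat_value`, `screen_by_heat`) are assumptions made for illustration, not the patented implementation.

```python
from dataclasses import dataclass


@dataclass
class Video:
    """Hypothetical record for one matched streaming media video."""
    title: str
    likes: int
    comments: int
    shares: int


def heat_value(v: Video, w_like: float = 1.0, w_comment: float = 2.0,
               w_share: float = 3.0) -> float:
    # Assumed form: a weighted sum of the three engagement counts.
    return w_like * v.likes + w_comment * v.comments + w_share * v.shares


def screen_by_heat(videos: list[Video], min_heat: float) -> list[Video]:
    # Keep only videos whose heat value reaches the threshold, hottest first.
    kept = [v for v in videos if heat_value(v) >= min_heat]
    return sorted(kept, key=heat_value, reverse=True)


videos = [
    Video("unboxing", likes=1200, comments=300, shares=50),
    Video("quick look", likes=40, comments=5, shares=1),
    Video("long review", likes=900, comments=700, shares=200),
]
screened = screen_by_heat(videos, min_heat=1000.0)
print([v.title for v in screened])
```

A fixed threshold is only one option; keeping the top-k videos by heat value would fit the description equally well.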

As shown in fig. 2, as a preferred embodiment of the present invention, the step of processing the screened streaming media videos to determine text information corresponding to each streaming media video specifically includes:

S201, judging whether subtitle information exists in the screened streaming media video;

S202, when subtitle information exists, performing text recognition on the subtitle information in the streaming media video to obtain text information;

S203, when the subtitle information does not exist, acquiring the audio information of the streaming media video, and performing speech-to-text conversion on the audio information to obtain text information.

In the embodiment of the invention, in order to obtain the text information, it is first judged whether the screened streaming media video contains subtitle information. If it does, the text information is obtained directly by performing text recognition on the subtitle information in the streaming media video. If it does not, the audio information of the streaming media video is retrieved, noise reduction is applied to it, and speech-to-text conversion is then performed to obtain the text information.
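A minimal sketch of this branch is shown below. The recognition, noise reduction and transcription back ends are passed in as callables because the text does not name any particular OCR or speech recognition engine; the function name `extract_text` and the stand-in back ends are hypothetical.

```python
from typing import Callable, Optional


def extract_text(
    subtitle_frames: Optional[list[str]],
    audio: bytes,
    recognize_subtitles: Callable[[list[str]], str],
    denoise: Callable[[bytes], bytes],
    speech_to_text: Callable[[bytes], str],
) -> str:
    """Return the text of one video, preferring subtitles over audio."""
    if subtitle_frames:  # S201/S202: subtitles exist, recognize them directly
        return recognize_subtitles(subtitle_frames)
    # S203: no subtitles, so denoise the audio and transcribe it
    return speech_to_text(denoise(audio))


# Stand-in back ends so the sketch runs without OCR or ASR libraries.
def fake_ocr(frames: list[str]) -> str:
    return " ".join(frames)


def fake_denoise(pcm: bytes) -> bytes:
    return pcm.strip(b"\x00")


def fake_asr(pcm: bytes) -> str:
    return f"<transcript of {len(pcm)} bytes>"


with_subs = extract_text(["great screen", "weak battery"], b"",
                         fake_ocr, fake_denoise, fake_asr)
without_subs = extract_text(None, b"\x00\x00abc\x00",
                            fake_ocr, fake_denoise, fake_asr)
print(with_subs)
print(without_subs)
```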

As shown in fig. 3, as a preferred embodiment of the present invention, the step of extracting adjectives and nouns in each text message based on NLP specifically includes:

S401, determining the influence degree of the streaming media video author corresponding to the text information;

S402, when the influence degree is less than or equal to the set influence value, extracting adjectives and nouns in the text information by using a word segmentation tool, and position-marking the extracted adjectives and nouns;

S403, when the influence degree is greater than the set influence value, receiving training corpus information, performing feature learning on the training corpus information based on a CNN-LSTM model to obtain an exclusive neural network model, processing the text information through the exclusive neural network model to obtain adjectives and nouns, and position-marking the obtained adjectives and nouns.

In the embodiment of the invention, the influence degree of the streaming media video author corresponding to each piece of text information needs to be determined. The influence degree is determined from the author's number of likes and number of followers: influence degree = M × number of likes + N × number of followers, where M and N are fixed values. When the influence degree is less than or equal to the set influence value, the adjectives and nouns in the text information are extracted directly with a word segmentation tool, and the extracted adjectives and nouns are position-marked; the position mark indicates where the word occurs in the text information, and the word segmentation tool may be jieba, hanlp, ansj or standby. When the influence degree is greater than the set influence value, an exclusive neural network model needs to be built for that streaming media video author so that the analysis is more accurate. Because the number of highly influential video authors in each field is limited, only a limited number of exclusive neural network models need to be built, and once a model has been built for an author it can be reused for that author afterwards. To build the model, the user uploads training corpus information, which is obtained from the author's previous videos; feature learning is then performed on the training corpus information based on a CNN-LSTM model to obtain the exclusive neural network model, which can perform better semantic analysis on that author's video content.
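The influence formula, the routing between the two extraction paths, and the build-once reuse of an author's model can be sketched as follows. The values of M, N and the threshold are illustrative only, the training function is a stand-in that trains nothing, and all function names are hypothetical.

```python
from typing import Callable


def influence_degree(likes: int, followers: int, m: float, n: float) -> float:
    # influence degree = M x number of likes + N x number of followers
    return m * likes + n * followers


def choose_extractor(influence: float, threshold: float) -> str:
    # S402: at or below the threshold, use the generic word segmentation tool.
    # S403: above it, use the author's exclusive CNN-LSTM model.
    return "segmentation_tool" if influence <= threshold else "exclusive_model"


# Cache of exclusive models, keyed by author: built once, reused afterwards.
_models: dict[str, str] = {}


def exclusive_model_for(author: str, train: Callable[[str], str]) -> str:
    if author not in _models:
        _models[author] = train(author)
    return _models[author]


M, N, THRESHOLD = 1, 10, 1_000_000  # illustrative values only

small = influence_degree(likes=20_000, followers=5_000, m=M, n=N)
large = influence_degree(likes=500_000, followers=80_000, m=M, n=N)
routes = [choose_extractor(small, THRESHOLD), choose_extractor(large, THRESHOLD)]

training_runs: list[str] = []


def fake_train(author: str) -> str:
    # Stand-in for CNN-LSTM feature learning on the author's corpus.
    training_runs.append(author)
    return f"cnn_lstm_for_{author}"


first = exclusive_model_for("author_a", fake_train)
second = exclusive_model_for("author_a", fake_train)
print(routes, first, len(training_runs))
```

The cache reflects the statement that a model built for an author can be reused afterwards; a real system would presumably persist the trained model rather than keep it in memory.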

As shown in fig. 4, as a preferred embodiment of the present invention, the step of binding a noun for each adjective and determining content rating information of the text information specifically includes:

S404, binding a noun to each adjective according to the position marks, and determining the sentiment polarity of each adjective, wherein each adjective is classified as a positive (commendatory) word, a negative (derogatory) word or a neutral word;

S405, classifying all adjectives according to nouns to obtain a plurality of categories, wherein the adjectives in each category are bound to the same noun;

S406, determining a text evaluation value of the text information, wherein the text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, and the categories and the text evaluation value form the content evaluation information.

In the embodiment of the invention, a noun is bound to each adjective according to the position marks; the bound noun is the noun closest to the adjective within the same sentence. The sentiment polarity of each adjective is then determined, for example by looking the adjective up in an electronic dictionary. All adjectives are classified by noun so that every category corresponds to a single noun, forming a table whose first column holds the noun and whose second column holds the adjectives bound to that noun. Finally, the text evaluation value of the text information is determined: text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, where a, b and c are fixed values.
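Steps S404 to S406 can be sketched on pre-tagged tokens as follows. The `Token` structure, the small polarity lexicon standing in for the electronic dictionary, and the weights a = 1, b = -1, c = 0 are all assumptions for illustration; the text only says that a, b and c are fixed values.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    word: str
    pos: str        # "adj" or "noun"
    sentence: int   # index of the sentence containing the word
    position: int   # position mark: word offset within the text


# Stand-in for the electronic dictionary that returns an adjective's polarity.
POLARITY = {"sharp": "positive", "fast": "positive",
            "weak": "negative", "average": "neutral"}


def bind_nouns(tokens):
    """S404: bind each adjective to the nearest noun in the same sentence."""
    pairs = []
    for adj in (t for t in tokens if t.pos == "adj"):
        nouns = [t for t in tokens
                 if t.pos == "noun" and t.sentence == adj.sentence]
        if nouns:
            nearest = min(nouns, key=lambda n: abs(n.position - adj.position))
            pairs.append((nearest.word, adj.word))
    return pairs


def categorize(pairs):
    """S405: group adjectives by the noun they are bound to."""
    table = defaultdict(list)
    for noun, adj in pairs:
        table[noun].append(adj)
    return dict(table)


def text_evaluation_value(pairs, a=1.0, b=-1.0, c=0.0):
    """S406: a x positive count + b x negative count + c x neutral count."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for _, adj in pairs:
        counts[POLARITY.get(adj, "neutral")] += 1
    return a * counts["positive"] + b * counts["negative"] + c * counts["neutral"]


# "The screen is sharp and fast. The battery is weak. The camera is average."
tokens = [
    Token("screen", "noun", 0, 1), Token("sharp", "adj", 0, 3),
    Token("fast", "adj", 0, 5),
    Token("battery", "noun", 1, 7), Token("weak", "adj", 1, 9),
    Token("camera", "noun", 2, 11), Token("average", "adj", 2, 13),
]
pairs = bind_nouns(tokens)
categories = categorize(pairs)
score = text_evaluation_value(pairs)
print(categories, score)
```

When two nouns are equally close to an adjective, this sketch keeps the one that appears first; the text does not specify a tie-breaking rule.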

As shown in fig. 5, as a preferred embodiment of the present invention, the step of analyzing and integrating all content evaluation information to obtain streaming media evaluation information specifically includes:

S501, integrating the categories in all the content evaluation information, and merging the categories corresponding to the same noun;

S502, retrieving the influence degree of the streaming media video author corresponding to each text evaluation value;

S503, determining an overall evaluation value, wherein the overall evaluation value = Σ (text evaluation value × influence degree), and the integrated categories and the overall evaluation value form the streaming media evaluation information.

In the embodiment of the invention, the content evaluation information corresponding to the screened streaming media videos is integrated and the overall evaluation value is determined. The overall evaluation value is obtained by multiplying each text evaluation value by the influence degree of the corresponding author and summing the products; it reflects how favorable or unfavorable the overall public opinion is.
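The integration step can be sketched as below. The dictionary layout of each content evaluation record and the sample numbers are hypothetical; only the merge-by-noun rule and the influence-weighted sum come from the text.

```python
from collections import defaultdict


def merge_categories(per_video):
    """S501: merge categories that share the same noun across all videos."""
    merged = defaultdict(list)
    for categories in per_video:
        for noun, adjectives in categories.items():
            merged[noun].extend(adjectives)
    return dict(merged)


def overall_evaluation_value(text_values, influences):
    """S503: sum of text evaluation value x author influence degree."""
    return sum(v * w for v, w in zip(text_values, influences))


# One hypothetical content evaluation record per screened video.
content_evaluations = [
    {"categories": {"screen": ["sharp"], "battery": ["weak"]},
     "value": 0.0, "influence": 70.0},
    {"categories": {"screen": ["bright"], "camera": ["average"]},
     "value": 1.0, "influence": 1300.0},
    {"categories": {"battery": ["poor"]},
     "value": -1.0, "influence": 200.0},
]

merged = merge_categories(e["categories"] for e in content_evaluations)
overall = overall_evaluation_value(
    [e["value"] for e in content_evaluations],
    [e["influence"] for e in content_evaluations],
)
print(merged, overall)
```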

As shown in fig. 6, the embodiment of the present invention further provides a streaming media content analysis system based on NLP, where the system includes:

the streaming media video determining module 100 is configured to receive a search keyword input by a user, and determine a matched streaming media video according to the search keyword;

the text information acquisition module 200 is configured to screen the streaming media video according to the hotness value, process the screened streaming media video, and determine text information corresponding to each streaming media video;

a function keyword input module 300, configured to receive a function keyword input by the user and generalize the function keyword and the search keyword into nouns;

an adjective noun determining module 400, configured to extract adjectives and nouns in each piece of text information based on NLP, bind a noun to each adjective, and determine content evaluation information of the text information;

the streaming media evaluation information module 500 is configured to analyze and integrate all the content evaluation information to obtain streaming media evaluation information, and specially mark the evaluation content of the functional keywords in the streaming media evaluation information.

In the embodiment of the invention, when a manufacturer needs to know the public opinion of a new product, a search keyword is input; the search keyword may be the name of the new product, and the streaming media video platform determines a plurality of matched streaming media videos according to the search keyword. The embodiment of the invention then screens the streaming media videos according to a heat value, which is related to the number of likes, comments and shares of a streaming media video; the streaming media videos with higher heat values are retained, and the screened streaming media videos are processed to determine the text information corresponding to each of them. The embodiment of the invention generalizes the function keyword and the search keyword into nouns, extracts the adjectives and nouns in each piece of text information based on natural language processing (NLP), and binds a noun to each adjective, indicating that the adjective describes that noun, so as to obtain the content evaluation information of the text information. Finally, all the content evaluation information is analyzed and integrated to obtain the streaming media evaluation information, which reflects the overall direction of public opinion, and the evaluation content of the function keyword in the streaming media evaluation information is specially marked so that the manufacturer can see the market response to the new function at a glance. It is readily understood that the evaluation content of a function keyword consists of the adjectives bound to the corresponding noun.
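The special marking performed by the streaming media evaluation information module can be illustrated with a short sketch. The report layout and the use of Markdown-style asterisks for bold are assumptions; the text requires only that the evaluation content of the function keyword stands out.

```python
def render_report(merged, function_keywords):
    """Render one line per noun, bolding the lines for function keywords."""
    lines = []
    for noun, adjectives in merged.items():
        line = f"{noun}: {', '.join(adjectives)}"
        if noun in function_keywords:
            line = f"**{line}**"  # assumed marking: Markdown-style bold
        lines.append(line)
    return "\n".join(lines)


# Hypothetical merged categories; "fast charging" is the function keyword.
merged_categories = {
    "screen": ["sharp", "bright"],
    "fast charging": ["quick"],
    "camera": ["average"],
}
report = render_report(merged_categories, function_keywords={"fast charging"})
print(report)
```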

As shown in fig. 7, as a preferred embodiment of the present invention, the text information acquiring module 200 includes:

a subtitle information judging unit 201, configured to judge whether the screened streaming media video has subtitle information;

a first text information unit 202, configured to perform text recognition on the subtitle information in the streaming media video to obtain text information when the subtitle information exists;

and the second text information unit 203 is configured to obtain audio information of the streaming video when the subtitle information does not exist, and perform voice-to-text conversion on the audio information to obtain text information.

As shown in fig. 8, as a preferred embodiment of the present invention, the adjective noun determining module 400 includes:

an influence degree determining unit 401, configured to determine an influence degree of a streaming media video author corresponding to the text information;

a first adjective noun unit 402 that extracts adjectives and nouns in the text information using the word segmentation tool and position-marks the extracted adjectives and nouns when the influence degree is less than or equal to a set influence value;

the second adjective noun unit 403 is configured to receive training corpus information when the influence degree is greater than the set influence value, perform feature learning on the training corpus information based on the CNN-LSTM model to obtain an exclusive neural network model, process text information through the exclusive neural network model to obtain adjectives and nouns, and perform position marking on the obtained adjectives and nouns.

As shown in fig. 8, as a preferred embodiment of the present invention, the adjective noun determining module 400 further includes:

an adjective noun binding unit 404, configured to bind a noun to each adjective according to the position marks and determine the sentiment polarity of each adjective, where each adjective is classified as a positive (commendatory) word, a negative (derogatory) word or a neutral word;

an adjective classification unit 405, configured to classify all adjectives according to nouns, so as to obtain a plurality of categories, where nouns corresponding to each category are the same;

a text evaluation value unit 406, configured to determine a text evaluation value of the text information, where the text evaluation value=a×the number of positive words+b×the number of negative words+c×the number of neutral words, and the category and the text evaluation value form content evaluation information.

As shown in fig. 9, as a preferred embodiment of the present invention, the streaming media evaluation information module 500 includes:

a category integrating unit 501, configured to integrate categories in all content evaluation information, and combine categories corresponding to the same noun;

the influence degree retrieving unit 502 is configured to retrieve the influence degree of the streaming media video author corresponding to each text evaluation value;

the overall evaluation value unit 503 is configured to determine an overall evaluation value, where the overall evaluation value= Σtext evaluation value×influence degree, and the integrated category and the overall evaluation value form streaming media evaluation information.

The foregoing description of the preferred embodiments of the present invention should not be taken as limiting the invention, but rather should be understood to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed in sequence, but they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments described above. Any reference to memory, storage, a database, or another medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.

Other embodiments of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims

1. A method for analyzing streaming media content based on NLP, the method comprising the steps of:

2. The method for analyzing the NLP-based streaming media content according to claim 1, wherein the step of processing the filtered streaming media videos to determine text information corresponding to each streaming media video specifically comprises:

3. The method for analyzing the streaming media content based on the NLP according to claim 1, wherein the step of extracting adjectives and nouns in each text message based on the NLP comprises the following steps:

4. The NLP-based streaming media content analysis method of claim 3, wherein the step of binding a noun to each adjective and determining content evaluation information of the text information comprises:

and determining a text evaluation value of the text information, wherein the text evaluation value = a × number of positive words + b × number of negative words + c × number of neutral words, the categories and the text evaluation value form the content evaluation information, and a, b and c are all constant values.

5. The NLP-based streaming media content analysis method according to claim 4, wherein the step of analyzing and integrating all the content evaluation information to obtain the streaming media evaluation information specifically comprises:

6. A NLP-based streaming media content analysis system, the system comprising:

7. The NLP-based streaming media content analysis system of claim 6, wherein the text information acquisition module comprises:

8. The NLP-based streaming media content analysis system of claim 6, wherein the adjective noun determination module comprises:

9. The NLP-based streaming media content analysis system of claim 8, wherein the adjective noun determination module further comprises:

and the text evaluation value unit is used for determining a text evaluation value of the text information, wherein the text evaluation value=a×the number of the positive words+b×the number of the negative words+c×the number of the neutral words, the category and the text evaluation value form content evaluation information, and a, b and c are all constant values.

10. The NLP-based streaming media content analysis system of claim 9, wherein the streaming media evaluation information module comprises: