CN114254202A - Intelligent media recommendation system, method and storage medium based on big data - Google Patents

Info

Publication number
CN114254202A
Authority
CN
China
Prior art keywords
data
user
media asset
asset database
labels
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111591656.1A
Other languages
Chinese (zh)
Inventor
申明
查万能
吴广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Huachuangyun Technology Co ltd
Original Assignee
Guizhou Huachuangyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Huachuangyun Technology Co ltd
Priority to CN202111591656.1A
Publication of CN114254202A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2291 User-Defined Types; Storage management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of big data, and in particular to a big data-based intelligent media recommendation system, method and storage medium. The system comprises a label subsystem, a recommendation subsystem and a storage subsystem. The label subsystem is used for marking the user data and the media asset database data. The recommendation subsystem is used for acquiring user data and media asset database data, cleaning the data and sending the data to the storage subsystem; it is also used for extracting labels from the user data and the media asset database data and calculating the similarity of the extraction results by adopting a similarity algorithm model; it is further used for generating recommendation data according to the calculation result, sorting the recommendation data by adopting a sorting algorithm and outputting a sorting result. The storage subsystem is used for storing the cleaned user data and media asset database data as well as the marked user data and media asset database data. The scheme can make personalized recommendations to users and make the recommendations more targeted.

Description

Intelligent media recommendation system, method and storage medium based on big data
Technical Field
The invention relates to the technical field of big data, and in particular to a big data-based intelligent media recommendation system, method and storage medium.
Background
With the development of information technology and the internet, people have gradually moved from an era of information scarcity to an era of information overload. In this era, both information consumers and information producers face great challenges: for information consumers, it is very difficult to find the information they are interested in among a large amount of information; for information producers, it is very difficult to make the information they produce stand out and attract the attention of a wide range of users.
The recommendation system is an important tool for resolving this contradiction. Based on a user's historical data, such as viewing records, an existing recommendation system searches for content similar to what the user has viewed and recommends it to the user. However, such a system is poorly targeted and cannot make personalized recommendations to the user.
Disclosure of Invention
One of the purposes of the invention is to provide a big data-based intelligent media recommendation system, which can make personalized recommendations to users and make the recommendations more targeted.
The invention provides basic scheme one: a big data-based intelligent media recommendation system, comprising: a label subsystem, a recommendation subsystem and a storage subsystem;
the label subsystem is used for marking the user data and the media asset database data;
the recommendation subsystem is used for acquiring user data and media asset database data, cleaning the data and sending the data to the storage subsystem; it is also used for extracting labels from the user data and the media asset database data and calculating the similarity of the extraction results by adopting a similarity algorithm model; it is also used for generating recommendation data according to the calculation result, sorting the recommendation data by adopting a sorting algorithm and outputting a sorting result;
and the storage subsystem is used for storing the cleaned user data and media asset database data as well as the marked user data and media asset database data.
The beneficial effects of basic scheme one are as follows: user data and media asset database data are collected and cleaned, so that unreasonable data are removed, which facilitates subsequent processing and improves its accuracy. The user data and the media asset database data are marked, that is, labels are added, so that a portrait of the user is drawn. Labels are extracted from the user data and the media asset database data, similarity calculation is carried out on the extraction results by adopting a similarity algorithm model, recommendation data are generated according to the calculation results, the recommendation data are sorted by adopting a sorting algorithm, and the sorting results are output. The sorting results are determined according to the user's portrait, that is, the labels. Different users receive different labels according to their own user data and the media asset database data, so different user portraits are obtained; sorting results can therefore be generated in a targeted manner, personalized recommendation is made to each user, and the system can be continuously optimized as service scenarios change.
Further, the recommendation subsystem includes: a data acquisition module;
the data acquisition module is used for acquiring user data and media asset database data and cleaning the user data and the media asset database data; the cleaning comprises the following steps: judging whether the user data and the media asset database data accord with a preset range or not, if so, retaining the user data and the media asset database data, and if not, clearing the user data and the media asset database data;
the label subsystem includes: a label setting module;
the label setting module is used for judging whether the user data and the media asset database data meet the preset label condition, and if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data; and judging whether the labels of the data meet the preset row label condition, and if so, combining the labels to set the labels as row labels.
Beneficial effects: cleaning the data removes unreasonable data, which reduces the subsequent calculation amount and improves calculation accuracy; after corresponding labels are added to the user data and the media asset database data, if several labels have been added, they can be combined and set as a row label, thereby forming a new label.
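The cleaning rule described above (retain data that fall within a preset range, clear data that do not) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `PRESET_RANGES` table, the record layout and the decision to drop records that lack a checked field are assumptions; the age bounds of 2 and 100 follow the example given in the embodiment.

```python
from typing import Iterable

# Hypothetical preset ranges: field name -> (exclusive lower bound, exclusive upper bound).
PRESET_RANGES = {"age": (2, 100)}


def clean(records: Iterable[dict], ranges: dict = PRESET_RANGES) -> list[dict]:
    """Retain records whose checked fields lie inside the preset range; clear the rest."""
    kept = []
    for record in records:
        # Assumption: a record that lacks a checked field is treated as out of range.
        in_range = all(
            field in record and low < record[field] < high
            for field, (low, high) in ranges.items()
        )
        if in_range:
            kept.append(record)
    return kept


users = [
    {"user_id": "u1", "age": 34},
    {"user_id": "u2", "age": 150},  # implausible age, cleared
    {"user_id": "u3", "age": 1},    # implausible age, cleared
]
cleaned = clean(users)
```

Keeping the ranges in a table rather than in code makes it possible to adjust the preset range per deployment without touching the cleaning routine.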
Further, the label setting module comprises an intelligent label setting submodule, a conditional label setting submodule and a combined label setting submodule;
the intelligent label setting submodule is used for carrying out automatic label setting on the user data and the media asset database data;
the condition label setting submodule is used for judging whether the user data and the media asset database data meet the preset label condition, and if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data;
the combined label setting submodule is used for judging whether the labels of the data meet the preset row label condition or not, and if so, combining the labels to set the labels as row labels;
the label subsystem further comprises a label management module and a permission management module;
the label management module is used for adding, deleting, modifying and viewing labels;
and the permission management module is used for setting the permissions of the operation and maintenance personnel.
Beneficial effects: cleaning the data removes unreasonable data, which reduces the subsequent calculation amount and improves calculation accuracy; after corresponding labels are added to the user data and the media asset database data, if several labels have been added, they can be combined and set as a row label, thereby forming a new label.
Furthermore, the recommendation subsystem further comprises a feature extraction module, an algorithm library and an external data interface;
the external data interface includes: a real-time data stream interface and an offline data stream interface;
the characteristic extraction module is used for extracting labels of the user data and the media asset database data;
the algorithm library is used for storing a similarity algorithm model and a sorting algorithm, and similarity calculation is carried out on the extracted result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommended data according to the calculation result, and sorting the recommended data by adopting a sorting algorithm; judging whether the current media intelligent recommendation system is in an off-line state, if not, pushing a sequencing result to a user side through a real-time data stream interface; if yes, sending the sequencing result to a storage subsystem for storage;
and the storage subsystem is also used for storing the sequencing result and calling the recommended data according to the user side requirement and pushing the recommended data to the user side through the off-line data flow interface.
Beneficial effects: when the user side is offline, or when the user is using the system for the first time and little user data is available, recommendation data are stored in the database in advance; the recommendation data in the database are then called according to the user side requirement and pushed to the user side, which ensures that the user can still obtain recommendation data while offline.
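The online/offline branch described above can be sketched as a small dispatcher. This is a hedged illustration only: the class name `RecommendationDispatcher`, the in-memory `store` standing in for the storage subsystem, and the `pushed` log standing in for the two data stream interfaces are all hypothetical.

```python
class RecommendationDispatcher:
    """Routes a sorting result to the real-time interface or to storage."""

    def __init__(self):
        self.store = {}   # stands in for the storage subsystem (assumption)
        self.pushed = []  # log of (interface, user_id, ranked) pushes (assumption)

    def dispatch(self, user_id, ranked, offline):
        if offline:
            # System is offline: keep the sorting result for later retrieval.
            self.store[user_id] = list(ranked)
        else:
            # System is online: push through the real-time data stream interface.
            self.pushed.append(("realtime", user_id, list(ranked)))

    def fetch_offline(self, user_id):
        # On a user-side request, push stored data through the offline interface.
        ranked = self.store.get(user_id, [])
        self.pushed.append(("offline", user_id, list(ranked)))
        return ranked


dispatcher = RecommendationDispatcher()
dispatcher.dispatch("u1", ["v1", "v2"], offline=False)
dispatcher.dispatch("u2", ["v3"], offline=True)
stored = dispatcher.fetch_offline("u2")
```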
Further, the user data includes: user attribute data and user behavior data; wherein the user attribute data includes: age and sex; the user behavior data includes: user playing data, user clicking data, user product and service purchasing data and user searching data;
the media asset database data comprises IPTV data, mobile terminal television data and cache video data.
Beneficial effects: the user data and the media asset database data comprise multiple kinds of data, so similarity can be calculated from multiple aspects, yielding more accurate and more targeted recommendation data.
Another purpose of the invention is to provide a big data-based intelligent media recommendation method, which makes personalized recommendations to users and makes the recommendations more targeted.
The invention provides basic scheme two: a big data-based intelligent media recommendation method, comprising the following steps:
collecting user data and media asset database data;
cleaning and marking the user data and the media asset database data;
extracting labels of the user data and the media asset database data;
performing similarity calculation on the extraction result by adopting a similarity algorithm model;
generating recommendation data according to the calculation result;
and sorting the recommended data by adopting a sorting algorithm, and outputting a sorting result.
The beneficial effects of basic scheme two are as follows: user data and media asset database data are collected and cleaned, so that unreasonable data are removed, which facilitates subsequent processing and improves its accuracy. The user data and the media asset database data are marked, that is, labels are added, so that a portrait of the user is drawn. Labels are extracted from the user data and the media asset database data, similarity calculation is carried out on the extraction results by adopting a similarity algorithm model, recommendation data are generated according to the calculation results, the recommendation data are sorted by adopting a sorting algorithm, and the sorting results are output. The sorting results are determined according to the user's portrait, that is, the labels. Different users receive different labels according to their own user data and the media asset database data, so different user portraits are obtained; sorting results can therefore be generated in a targeted manner, personalized recommendation is made to each user, and the method can be continuously optimized as service scenarios change.
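The later steps of the method (label extraction, similarity calculation, generation of recommendation data and sorting) can be chained as in the sketch below. It assumes records that have already been cleaned and marked. The shared-label count is a deliberately simple stand-in for the similarity algorithm model, and sorting by that count stands in for the sorting algorithm; neither is taken from the patent, and all function and field names are hypothetical.

```python
def extract_labels(record: dict) -> set:
    """Label extraction: read the labels attached to a marked record."""
    return set(record.get("labels", ()))


def similarity(user_labels: set, asset_labels: set) -> int:
    """Stand-in similarity: number of shared labels (an assumption)."""
    return len(user_labels & asset_labels)


def recommend(user: dict, assets: list, min_similarity: int = 1) -> list:
    """Return (asset_id, score) pairs that reach the threshold, best first."""
    user_labels = extract_labels(user)
    scored = [
        (asset["asset_id"], similarity(user_labels, extract_labels(asset)))
        for asset in assets
    ]
    # Generate recommendation data: keep assets that reach the preset similarity.
    candidates = [pair for pair in scored if pair[1] >= min_similarity]
    # Sort: highest score first, asset id as a deterministic tie-breaker.
    return sorted(candidates, key=lambda pair: (-pair[1], pair[0]))


user = {"user_id": "u1", "labels": ["adult", "comedy", "sports"]}
assets = [
    {"asset_id": "v1", "labels": ["comedy", "adult"]},
    {"asset_id": "v2", "labels": ["sports", "news"]},
    {"asset_id": "v3", "labels": ["cartoon", "minor"]},
]
ranked = recommend(user, assets)
```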
Further, the cleaning includes: judging whether the user data and the media asset database data accord with a preset range or not, if so, retaining the user data and the media asset database data, and if not, clearing the user data and the media asset database data;
the marking, comprising: judging whether the user data and the media asset database data meet preset label conditions, if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data;
and judging whether the labels of the data meet the preset row label condition, and if so, combining the labels to set the labels as row labels.
Beneficial effects: cleaning the data removes unreasonable data, which reduces the subsequent calculation amount and improves calculation accuracy; after corresponding labels are added to the user data and the media asset database data, if several labels have been added, they can be combined and set as a row label, thereby forming a new label.
Further, the similarity calculation comprises near-line calculation and off-line calculation;
near-line calculation: performing similarity calculation on the extraction result by adopting a similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommendation data according to the calculation result of the near-line calculation, sorting the recommendation data by adopting a sorting algorithm, and pushing the sorting result to the user side;
off-line calculation: performing similarity calculation on the extraction result by adopting a similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommendation data according to the calculation result of the off-line calculation, sorting the recommendation data by adopting a sorting algorithm, and sending the sorting result to a database for storage; when the user side is offline, calling the recommendation data in the database according to the requirement of the user side and pushing the data to the user side.
Beneficial effects: near-line calculation targets data collected in real time, so content can be recommended to the user side promptly; off-line calculation targets historical data. When the user side is offline or the user is using the system for the first time, the results of off-line calculation have been stored in a database in advance, and the recommendation data in the database are called and pushed to the user side according to the user side requirement.
Further, the user data includes: user attribute data and user behavior data; wherein the user attribute data includes: age and sex; the user behavior data includes: user playing data, user clicking data, user product and service purchasing data and user searching data;
the media asset database data comprises IPTV data, mobile terminal television data and cache video data.
Beneficial effects: the user data and the media asset database data comprise multiple kinds of data, so similarity can be calculated from multiple aspects, yielding more accurate and more targeted recommendation data.
A further purpose of the invention is to provide a big data-based intelligent media recommendation storage medium, which can be used for making personalized recommendations to users and making the recommendations more targeted.
The invention provides basic scheme three: a big data-based intelligent media recommendation storage medium, wherein the storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of any one of the above big data-based intelligent media recommendation methods.
The beneficial effects of basic scheme three are as follows: a computer program is stored on the readable storage medium, and when executed by a processor, the computer program implements the steps of any one of the above big data-based intelligent media recommendation methods, so that the method is applied to make personalized recommendations to users and make the recommendations more targeted.
Drawings
FIG. 1 is a logic block diagram of an embodiment of a big data based media intelligent recommendation system according to the present invention;
FIG. 2 is a flowchart illustrating an embodiment of a big data-based intelligent media recommendation method according to the present invention.
Detailed Description
The following is further detailed by way of specific embodiments:
Example one
This embodiment is basically as shown in FIG. 1: a big data-based intelligent media recommendation system, comprising: a label subsystem, a recommendation subsystem and a storage subsystem;
the label subsystem is used for marking the user data and the media asset database data; marking means adding a label. Specifically, the label subsystem comprises: a label setting module, a label management module and a permission management module;
the label setting module is used for judging whether the user data and the media asset database data meet the preset label condition, and if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data; it is also used for judging whether the labels of the data meet the preset row label condition, and if so, combining the labels to set them as row labels; specifically, the label setting module comprises an intelligent label setting submodule, a conditional label setting submodule and a combined label setting submodule;
the intelligent label setting submodule is used for automatically setting labels for the user data and the media asset database data. Using artificial intelligence, the submodule automatically sets preset label conditions according to the specific contents of the user data and the media asset database data and automatically adds the corresponding labels; it can also automatically mark each user and resource according to the results of word segmentation, word frequency calculation and topic recognition of text content. For example: the user data includes user attribute data, and the user attribute data includes age, so the intelligent label setting submodule automatically sets preset label conditions according to age: users aged 18 or older are adults, and users under 18 are minors; it then judges whether the age in the user data meets a preset label condition, and if so, adds the corresponding adult label or minor label to the user data; if not, no corresponding label is added to the user data;
the condition label setting submodule is used for judging whether the user data and the media asset database data meet the preset label condition, and if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data; the preset label condition can be set according to an actual use scene;
the combined label setting submodule is used for judging whether the labels of the data meet the preset row label condition, and if so, combining the labels to set them as row labels; that is, the preset row label condition is set using several specific items of data in the user data and the media asset database data, for example: the user data comprises user attribute data, the user attribute data comprises age and gender, and the labels of the user data comprise two types of labels, one with age as the preset label condition and one with gender as the preset label condition; the preset row label condition is that the labels comprise a child label and a female label, in which case the labels are combined and set as a girl label, thereby forming a new label.
The label management module is used for adding, deleting, modifying and viewing labels;
and the permission management module is used for setting the permissions of the operation and maintenance personnel; personnel with different permissions can use different functions of the label management module.
The recommendation subsystem is used for acquiring user data and media asset database data, cleaning the data and sending the data to the storage subsystem; it is also used for extracting labels from the user data and the media asset database data and calculating the similarity of the extraction results by adopting a similarity algorithm model; it is also used for generating recommendation data according to the calculation result, sorting the recommendation data by adopting a sorting algorithm and outputting a sorting result; specifically, the recommendation subsystem includes: a data acquisition module, a feature extraction module, an algorithm library and an external data interface;
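The conditional label and row label logic of the label setting module can be sketched as rule tables. This is an illustrative sketch under stated assumptions: the rule tables, the field names `age` and `gender`, and the age cutoff of 12 for the child label are hypothetical (the embodiment gives only the 18-year adult/minor boundary and the child + female = girl combination). The sketch also keeps the original labels alongside the row label, which the text does not specify.

```python
# Hypothetical conditional-label rules: (label, predicate over a record).
LABEL_RULES = [
    ("adult", lambda r: r.get("age") is not None and r["age"] >= 18),
    ("minor", lambda r: r.get("age") is not None and r["age"] < 18),
    ("child", lambda r: r.get("age") is not None and r["age"] < 12),  # assumed cutoff
    ("female", lambda r: r.get("gender") == "female"),
]

# Hypothetical row-label rules: required label set -> combined row label.
ROW_LABEL_RULES = [({"child", "female"}, "girl")]


def set_labels(record: dict) -> set[str]:
    """Add every label whose condition holds, then add matching row labels."""
    labels = {name for name, condition in LABEL_RULES if condition(record)}
    for required, row_label in ROW_LABEL_RULES:
        if required <= labels:  # all required labels are present
            labels.add(row_label)
    return labels


girl_labels = set_labels({"age": 8, "gender": "female"})
adult_labels = set_labels({"age": 30, "gender": "male"})
```

A record that meets no condition simply receives an empty label set, which mirrors the "if not, no label is added" branch in the text.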
the data acquisition module is used for acquiring user data and media asset database data and cleaning the user data and the media asset database data; the cleaning comprises: judging whether the user data and the media asset database data fall within a preset range, if so, retaining them, and if not, clearing them. For example: for the age in the user data, some users enter an arbitrary age, so the collected age may be unreasonable; the preset range is therefore set to an age greater than 2 years and less than 100 years, and age data outside the preset range are cleared. Reducing unreasonable data both reduces the subsequent calculation amount and improves calculation accuracy. In this embodiment, the data acquisition module adopts ETL data warehouse technology; it is connected to the user side to acquire user data from the user side, and is connected to the media asset database to acquire media asset database data. Data collection can be divided into offline data collection and real-time data collection: if the current media intelligent recommendation system is offline, the user data and the media asset database data stored in the storage subsystem are collected; if the current system can perform network communication, the user data in the user side and the media asset database data in the media asset database are collected;
the external data interface includes: a real-time data stream interface and an offline data stream interface; the real-time data stream interface is used for transmitting real-time data; the off-line data flow interface is used for off-line data transmission.
The characteristic extraction module is used for extracting labels of the user data and the media asset database data so as to respectively obtain a characteristic vector of the user and a characteristic vector of the content;
the algorithm library is used for storing the similarity algorithm model and the sorting algorithm, and similarity calculation is carried out on the extraction result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; specifically: a similarity matrix between users is calculated according to the feature vectors of the users, and a similarity matrix of the content of the media asset database data is calculated according to the feature vectors of the content; the Jaccard similarity coefficient, cosine of the included angle, Euclidean distance and Manhattan distance of the user data and the media asset database data are calculated respectively, wherein the larger the similarity parameter is, the higher the similarity is; the algorithm library is further used for updating the generated recommendation data according to the calculation result, selecting, as the recommendation data, media asset database data whose similarity is higher than a preset similarity, or user data whose similarity is higher than the preset similarity together with the media asset database data corresponding to that user data, and sorting the recommendation data by adopting a sorting algorithm, where the sorting algorithms in this embodiment include: LR, GBDT and XGBoost; judging whether the current media intelligent recommendation system is in an off-line state, if not, pushing the sorting result to the user side through the real-time data stream interface; if yes, sending the sorting result to the storage subsystem for storage;
the storage subsystem adopts a database and is used for storing the cleaned user data and media asset database data as well as the marked user data and media asset database data; and the system is also used for storing the sequencing result and calling the recommended data according to the requirement of the user side and pushing the recommended data to the user side through the offline data stream interface.
The user data includes: user attribute data and user behavior data; wherein the user attribute data includes: age and sex; the user behavior data includes: user playing data, user clicking data, user product and service purchasing data and user searching data; the media asset database data comprises IPTV data, mobile terminal television data and cache video data, and can also be set according to actual requirements.
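The four measures named in the description of the algorithm library (Jaccard similarity coefficient, cosine of the included angle, Euclidean distance and Manhattan distance) have standard definitions, sketched below on label sets and equal-length feature vectors. The `distance_to_similarity` mapping is an assumption added here so that a larger value means higher similarity for the two distances as well; the patent does not state how distances are converted.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean(u, v) -> float:
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan(u, v) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(u, v))


def distance_to_similarity(d: float) -> float:
    """Assumed mapping: distance 0 gives 1.0, larger distances approach 0."""
    return 1.0 / (1.0 + d)


user_vec, content_vec = [0.0, 0.0], [3.0, 4.0]
dist = euclidean(user_vec, content_vec)
```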
Example two
This embodiment is substantially the same as the above embodiment except that: the recommendation subsystem further comprises: the system comprises a scene information acquisition module, a scene analysis module and a scene matching recommendation module;
the scene information acquisition module is used for acquiring scene information of the current environment of the user; wherein the scene information includes but is not limited to: user side network information, environmental sound information, position information and audio output end information;
the user terminal network information refers to information representing the network speed of the user terminal;
the environmental sound information refers to the sound information of the environment where the user is currently located;
the position information refers to the position of the current environment of the user;
the audio output end information refers to the audio output end of the current user end, and the audio output end comprises: a speaker and an earphone;
the scene analysis module is used for analyzing the scene information and acquiring the scene requirement information of the user; the user scene demand information comprises: network state information, environment information, position information and user side playing state information;
specifically: according to the user side network information, evaluating the current network condition of the user to obtain network state information; in this embodiment, it is determined whether the network speed indicated by the user side network information is greater than the preset network speed, and if so, the network state information is qualified; if not, the network state information is unqualified;
according to the environmental sound information, a convolutional neural network is adopted to identify the current environment of the user, and the environmental information of the user is obtained;
judging whether the current position of the user meets the preset quiet area condition according to the position information, and if so, marking the position information as a quiet area; if not, marking the position information as a non-quiet area; in this embodiment, the quiet areas are schools, hospitals and cinemas; alternatively, the quiet area condition is set according to actual requirements, and the position information is marked as a quiet area if the current position belongs to an area in the quiet area condition;
judging the audio output end information and acquiring the playing state information of the user side; specifically, whether the audio output end is a loudspeaker or an earphone is judged, and if it is the loudspeaker, the playing state information of the user side is played outwards; if it is the earphone, the playing state information of the user side is not played outwards.
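The three rule-based judgments of the scene analysis module can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class names, field names and the 5 Mbps threshold are assumptions (the source only says "preset network speed"), and the convolutional-network recognition of the environment from sound is left out.

```python
from dataclasses import dataclass

# Hypothetical presets; the source only states that these values are preset.
PRESET_NETWORK_SPEED_MBPS = 5.0
QUIET_AREA_TYPES = {"school", "hospital", "cinema"}


@dataclass
class SceneInfo:
    network_speed_mbps: float  # user side network information
    place_type: str            # category of the user's current position
    audio_output: str          # "speaker" or "earphone"


@dataclass
class SceneDemand:
    network_qualified: bool    # network state information
    quiet_area: bool           # mark placed on the position information
    played_outwards: bool      # user side playing state information


def analyze_scene(info: SceneInfo) -> SceneDemand:
    """Apply the threshold, quiet-area and audio-output rules."""
    return SceneDemand(
        # Qualified only if strictly greater than the preset speed.
        network_qualified=info.network_speed_mbps > PRESET_NETWORK_SPEED_MBPS,
        quiet_area=info.place_type in QUIET_AREA_TYPES,
        played_outwards=info.audio_output == "speaker",
    )
```

For example, `analyze_scene(SceneInfo(8.0, "hospital", "speaker"))` yields a qualified network, a quiet area and outward playback.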
The scene matching recommendation module is used for recommending according to the user scene demand information, a preset content recommendation strategy and an advertisement playing strategy; specifically: judging whether the network state information in the user scene demand information is qualified; if so, triggering the recommendation subsystem and sending the environment information to the algorithm library in the recommendation subsystem, wherein, when the algorithm library sorts the recommendation data by adopting a sorting algorithm, if items of recommendation data have the same similarity, the matching degree between each item and the environment information is judged, and the item with the higher matching degree is arranged in front, for example: the recommendation data comprises a plurality of crosstalk (xiangsheng) videos, two of which have the same similarity, one being related to a station and the other to a school; if the environment information is a station, the crosstalk video whose subject is related to the station is arranged in front; the advertisement data included in each item of recommendation data is selected according to the environment information, for example: if the environment information is a station, advertisement data for station-related activities is acquired as the advertisement data in the recommendation data, wherein the advertisement data for station-related activities includes but is not limited to: a travel APP, a taxi-hailing APP and a ticket-grabbing APP; the user experience is thereby improved and the advertising is targeted;
judging whether the position information is marked as a quiet area; if so, judging whether the playing state information of the user side is played outwards, and if so, deleting the recommendation data that does not contain subtitles; because a user in a quiet area without earphones will turn the sound off, or turn it down to a volume that does not disturb others, and therefore cannot hear the audio, the recommendation data without subtitles is deleted, so that whichever recommendation data the user selects to play contains subtitles and the viewing experience is not affected even without sound;
if the network state information is unqualified, the storage subsystem is triggered to retrieve the stored recommendation data and push it to the user side through the offline data stream interface.
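The tie-breaking and subtitle-filtering rules of the scene matching recommendation module can be sketched as below. This is only an illustration under stated assumptions: `Item`, its fields and `match_scene` are hypothetical names, and the "matching degree" is reduced to whether the environment appears among an item's topics, which the source does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Item:
    title: str
    similarity: float
    topics: Set[str] = field(default_factory=set)
    has_subtitles: bool = True


def match_scene(items: List[Item], environment: str,
                quiet_area: bool, played_outwards: bool) -> List[Item]:
    # In a quiet area with loudspeaker playback, drop items without subtitles.
    if quiet_area and played_outwards:
        items = [it for it in items if it.has_subtitles]
    # Higher similarity first; among equal similarity, items whose topics
    # match the environment come first (False sorts before True).
    return sorted(items, key=lambda it: (-it.similarity,
                                         environment not in it.topics))
```

With two equally similar videos, one about a school and one about a station, and the environment `"station"`, the station video is ranked ahead of the school video.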
Example three
This embodiment is substantially as shown in figure 2: the intelligent media recommendation method based on big data comprises the following steps:
collecting user data and media asset database data;
cleaning and marking the user data and the media asset database data; wherein the cleaning includes: judging whether the user data and the media asset database data fall within a preset range, if so, retaining the user data and the media asset database data, and if not, clearing the user data and the media asset database data;
marking, comprising: judging whether the user data and the media asset database data meet preset label conditions, if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data; specifically, marking includes automatic marking and user-set marking: in automatic marking, the preset label conditions are set automatically according to the specific contents of the user data and the media asset database data and the labels are then added automatically, and each user and each resource can also be marked automatically according to the results of word segmentation, word frequency calculation and topic recognition on text content, for example: the user data includes user attribute data, and the user attribute data includes age, so the intelligent label setting submodule automatically sets the preset label condition according to age: an age of 18 or above is an adult and an age below 18 is a minor; whether the age in the user data meets the preset label condition is judged, and if so, the corresponding adult label or minor label is added to the user data; if not, no label is added to the user data; in user-set marking, the preset label condition is set by the user according to the actual use scenario;
judging whether the labels of the data meet the preset row label condition, and if so, combining the labels and setting them as a row label, for example: the user data comprises user attribute data, the user attribute data comprises age and gender, and the labels of the user data comprise two types of labels, one with age as the preset label condition and one with gender as the preset label condition; the preset row label condition is that the labels comprise a child label and a female label, in which case the labels are combined and set as a girl label, thereby forming a new label.
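The cleaning, marking and row-label steps above can be sketched as follows. This is a minimal illustration under assumptions: records are plain dictionaries, the range 0 to 120 used in the usage note is hypothetical, and the "child" label of the row-label example is equated here with the "minor" label from the age example.

```python
def clean(records, field_name, low, high):
    """Retain records whose value lies in the preset range [low, high]."""
    return [r for r in records
            if field_name in r and low <= r[field_name] <= high]


# Preset label conditions as (label, predicate) pairs.
LABEL_CONDITIONS = [
    ("adult", lambda r: r["age"] >= 18),
    ("minor", lambda r: r["age"] < 18),
    ("female", lambda r: r.get("gender") == "female"),
    ("male", lambda r: r.get("gender") == "male"),
]

# Preset row-label conditions: a set of required labels and the combined label.
ROW_LABEL_CONDITIONS = [({"minor", "female"}, "girl")]


def mark(record):
    """Add every label whose condition holds, then add combined row labels."""
    labels = {name for name, cond in LABEL_CONDITIONS if cond(record)}
    for required, combined in ROW_LABEL_CONDITIONS:
        if required <= labels:  # all required labels are present
            labels.add(combined)
    return labels
```

Calling `clean(users, "age", 0, 120)` before `mark` drops out-of-range records first, so labels are only computed for retained data.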
Label extraction is carried out on the user data and the media asset database data, so that a characteristic vector of a user and a characteristic vector of content are obtained respectively;
performing similarity calculation on the extraction result by adopting a similarity algorithm model; the similarity calculation comprises near-line calculation and off-line calculation;
near-line calculation: performing similarity calculation on the extraction result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; specifically: calculating a similarity matrix between users according to the feature vectors of the users, and calculating a similarity matrix of the content of each item of media asset database data according to the feature vectors of the content; calculating the Jaccard similarity coefficient, the cosine of the included angle, the Euclidean distance and the Manhattan distance of the user data and the media asset database data respectively, wherein the larger the similarity parameters are, the higher the similarity is; the generated recommendation data is then updated according to the calculation result: media asset database data whose similarity is higher than a preset similarity, or user data and the media asset database data corresponding to that user data, is selected as the recommendation data, and the recommendation data is sorted by a sorting algorithm, where the sorting algorithms in this embodiment include: LR, GBDT, and XGBoost. The generated recommendation data is updated according to the calculation result of the near-line calculation, the recommendation data is sorted by adopting a sorting algorithm, and the sorting result is pushed to the user side;
off-line calculation: performing similarity calculation on the extraction result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommendation data according to the calculation result of the off-line calculation, sorting the recommendation data by adopting a sorting algorithm, sending the recommendation data to a database for storage, and, when the user side is off-line, retrieving the recommendation data from the database according to the requirement of the user side and pushing it to the user side;
generating recommendation data according to the calculation result;
and sorting the recommended data by adopting a sorting algorithm, and outputting a sorting result.
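The four measures named in the near-line calculation, together with the threshold selection, can be sketched in plain Python as below. This is an illustration, not the patent's model: the function names are hypothetical, and the final ordering is a plain sort by cosine similarity standing in for the LR, GBDT or XGBoost ranking mentioned in the text. Note that the Jaccard coefficient and the cosine grow with similarity, whereas the Euclidean and Manhattan distances shrink as two vectors get closer, so a distance has to be inverted or negated before it is compared against a "higher is more similar" threshold.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity coefficient of two label sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v) -> float:
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def euclidean(u, v) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def manhattan(u, v) -> float:
    return sum(abs(x - y) for x, y in zip(u, v))


def recommend(user_vec, content_vecs, preset_similarity):
    """Select content above the preset similarity, most similar first."""
    scored = [(cid, cosine(user_vec, vec)) for cid, vec in content_vecs.items()]
    selected = [(cid, s) for cid, s in scored if s > preset_similarity]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)
```

For a user vector `[1, 0]` and a preset similarity of 0.5, content with vector `[0, 1]` is excluded because its cosine is 0.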
The user data includes: user attribute data and user behavior data; wherein the user attribute data includes: age and sex; the user behavior data includes: user playing data, user clicking data, user product and service purchasing data and user searching data; the media asset database data comprises IPTV data, mobile terminal television data and cache video data.
The intelligent media recommendation method based on big data, if implemented in the form of a software functional unit and sold or used as an independent product, can be stored in a storage medium. Based on such understanding, all or part of the flow in the method according to the above embodiments may be implemented by a computer program, which may be stored in a readable storage medium and executed by a processor to implement the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like.
Example four
This embodiment is substantially the same as the above embodiment except that the method further comprises:
acquiring scene information of the current environment of the user; wherein the scene information includes but is not limited to: user side network information, environmental sound information, position information and audio output end information;
the user side network information refers to information representing the network speed of the user side;
the environmental sound information refers to the sound information of the environment where the user is currently located;
the position information refers to the position of the current environment of the user;
the audio output end information refers to the audio output end of the current user end, and the audio output end comprises: a speaker and an earphone;
analyzing the scene information to obtain the user scene demand information; the user scene demand information comprises: network state information, environment information, position information and user side playing state information; specifically: according to the user side network information, evaluating the current network condition of the user to obtain network state information; in this embodiment, it is determined whether the network speed indicated by the user side network information is greater than the preset network speed, and if so, the network state information is qualified; if not, the network state information is unqualified;
according to the environmental sound information, a convolutional neural network is adopted to identify the current environment of the user, and the environmental information of the user is obtained;
judging whether the current position of the user meets the preset quiet area condition according to the position information, and if so, marking the position information as a quiet area; if not, marking the position information as a non-quiet area; in this embodiment, the quiet areas are schools, hospitals and cinemas; alternatively, the quiet area condition is set according to actual requirements, and the position information is marked as a quiet area if the current position belongs to an area in the quiet area condition;
judging the audio output end information and acquiring the playing state information of the user side; specifically, whether the audio output end is a loudspeaker or an earphone is judged, and if it is the loudspeaker, the playing state information of the user side is played outwards; if it is the earphone, the playing state information of the user side is not played outwards.
Recommending according to the user scene demand information, a preset content recommendation strategy and an advertisement playing strategy; specifically: judging whether the network state information in the user scene demand information is qualified; if so, performing near-line calculation, wherein, when the recommendation data is sorted by adopting a sorting algorithm, if items of recommendation data have the same similarity, the matching degree between each item and the environment information is judged, and the item with the higher matching degree is arranged in front, for example: the recommendation data comprises a plurality of crosstalk (xiangsheng) videos, two of which have the same similarity, one being related to a station and the other to a school; if the environment information is a station, the crosstalk video whose subject is related to the station is arranged in front; the advertisement data included in each item of recommendation data is selected according to the environment information, for example: if the environment information is a station, advertisement data for station-related activities is acquired as the advertisement data in the recommendation data, wherein the advertisement data for station-related activities includes but is not limited to: a travel APP, a taxi-hailing APP and a ticket-grabbing APP; the user experience is thereby improved and the advertising is targeted;
judging whether the position information is marked as a quiet area; if so, judging whether the playing state information of the user side is played outwards, and if so, deleting the recommendation data that does not contain subtitles; because a user in a quiet area without earphones will turn the sound off, or turn it down to a volume that does not disturb others, and therefore cannot hear the audio, the recommendation data without subtitles is deleted, so that whichever recommendation data the user selects to play contains subtitles and the viewing experience is not affected even without sound;
if the network state information is unqualified, performing off-line calculation.
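The choice between the two paths in this embodiment can be sketched as a small dispatcher. This is an assumption-laden illustration: `RecommendationStore`, `serve` and the `near_line` callback are hypothetical names, and an in-memory dictionary stands in for the database that holds the off-line sorting results.

```python
from typing import Callable, Dict, List


class RecommendationStore:
    """In-memory stand-in for the database of off-line sorting results."""

    def __init__(self) -> None:
        self._results: Dict[str, List[str]] = {}

    def save(self, user_id: str, ranked: List[str]) -> None:
        self._results[user_id] = list(ranked)

    def load(self, user_id: str) -> List[str]:
        return list(self._results.get(user_id, []))


def serve(user_id: str, network_qualified: bool,
          near_line: Callable[[str], List[str]],
          store: RecommendationStore) -> List[str]:
    """Use near-line results on a qualified network, stored results otherwise."""
    if network_qualified:
        return near_line(user_id)
    return store.load(user_id)
```

Keeping the stored results behind a separate object means the off-line calculation can refresh them on its own schedule without touching the near-line path.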
The foregoing is merely an embodiment of the present invention. Common general knowledge, such as well-known specific structures and characteristics, is not described here in detail. A person skilled in the art knows all of the common technical knowledge in this field as of the filing date or the priority date, has access to all of the prior art in this field, and is able to apply routine experimentation as practiced before that date; such a person can therefore complete and implement the present invention by combining the teachings given in this application with their own abilities, and certain typical known structures or known methods should not become an obstacle to implementing the present invention. It should be noted that a person skilled in the art can make several changes and modifications without departing from the structure of the present invention; these also fall within the protection scope of the present invention and do not affect the effect of implementing the present invention or the practicability of the patent. The scope of protection claimed by this application shall be determined by the contents of the claims, and the description of the embodiments and the like in the specification may be used to explain the contents of the claims.

Claims (10)

1. The intelligent media recommendation system based on big data is characterized by comprising: a label subsystem, a recommendation subsystem and a storage subsystem;
the label subsystem is used for marking the user data and the media asset database data;
the recommendation subsystem is used for acquiring user data and media asset database data, cleaning the data and sending the data to the storage subsystem; the system is also used for extracting labels from the user data and the media asset database data and calculating the similarity of the extraction results by adopting a similarity algorithm model; the system is also used for generating recommended data according to the calculation result, sorting the recommended data by adopting a sorting algorithm and outputting a sorting result;
and the storage subsystem is used for storing the cleaned user data and media asset database data as well as the marked user data and media asset database data.
2. The big data based media intelligent recommendation system according to claim 1, wherein: the recommendation subsystem includes: a data acquisition module;
the data acquisition module is used for acquiring user data and media asset database data and cleaning the user data and the media asset database data; the cleaning comprises the following steps: judging whether the user data and the media asset database data accord with a preset range or not, if so, retaining the user data and the media asset database data, and if not, clearing the user data and the media asset database data;
the label subsystem comprises: a label setting module;
the label setting module is used for judging whether the user data and the media asset database data meet the preset label condition, and if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data; and judging whether the labels of the data meet the preset row label condition, and if so, combining the labels to set the labels as row labels.
3. The big data based media intelligent recommendation system according to claim 2, wherein: the label setting module comprises an intelligent label setting submodule, a conditional label setting submodule and a combined label setting submodule;
the intelligent label setting submodule is used for carrying out automatic label setting on the user data and the media asset database data;
the condition label setting submodule is used for judging whether the user data and the media asset database data meet the preset label condition, and if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data;
the combined label setting submodule is used for judging whether the labels of the data meet the preset row label condition or not, and if so, combining the labels to set the labels as row labels;
the label subsystem further comprises a label management module and a permission management module;
the label management module is used for adding, deleting, modifying and viewing labels;
and the authority management module is used for setting the authority of the operation and maintenance personnel.
4. The big data based media intelligent recommendation system according to claim 3, wherein: the recommendation subsystem further comprises a feature extraction module, an algorithm library and an external data interface;
the external data interface includes: a real-time data stream interface and an offline data stream interface;
the characteristic extraction module is used for extracting labels of the user data and the media asset database data;
the algorithm library is used for storing a similarity algorithm model and a sorting algorithm, and similarity calculation is carried out on the extracted result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommended data according to the calculation result, and sorting the recommended data by adopting a sorting algorithm; judging whether the current media intelligent recommendation system is in an off-line state, if not, pushing a sequencing result to a user side through a real-time data stream interface; if yes, sending the sequencing result to a storage subsystem for storage;
and the storage subsystem is also used for storing the sequencing result and calling the recommended data according to the user side requirement and pushing the recommended data to the user side through the off-line data flow interface.
5. The big data based media intelligent recommendation system according to claim 1, wherein: the user data includes: user attribute data and user behavior data; wherein the user attribute data includes: age and sex; the user behavior data includes: user playing data, user clicking data, user product and service purchasing data and user searching data;
the media asset database data comprises IPTV data, mobile terminal television data and cache video data.
6. The intelligent media recommendation method based on big data is characterized by comprising the following steps:
collecting user data and media asset database data;
cleaning and marking the user data and the media asset database data;
extracting labels of the user data and the media asset database data;
performing similarity calculation on the extraction result by adopting a similarity algorithm model;
generating recommendation data according to the calculation result;
and sorting the recommended data by adopting a sorting algorithm, and outputting a sorting result.
7. The intelligent big data-based media recommendation method according to claim 6, wherein: the cleaning comprises the following steps: judging whether the user data and the media asset database data accord with a preset range or not, if so, retaining the user data and the media asset database data, and if not, clearing the user data and the media asset database data;
the marking, comprising: judging whether the user data and the media asset database data meet preset label conditions, if so, adding corresponding labels for the user data and the media asset database data; if not, not adding corresponding labels for the user data and the media asset database data;
and judging whether the labels of the data meet the preset row label condition, and if so, combining the labels to set the labels as row labels.
8. The intelligent big data-based media recommendation method according to claim 7, wherein: the similarity calculation comprises near-line calculation and off-line calculation;
near-line calculation: performing similarity calculation on the extraction result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommendation data according to the calculation result of the near-line calculation, sorting the recommendation data by adopting a sorting algorithm, and pushing the sorting result to the user side;
off-line calculation: performing similarity calculation on the extraction result by adopting the similarity algorithm model to obtain a calculation result comprising a feature vector, a sorting vector and a recall index; updating the generated recommendation data according to the calculation result of the off-line calculation, sorting the recommendation data by adopting a sorting algorithm, sending the recommendation data to a database for storage, and, when the user side is off-line, retrieving the recommendation data from the database according to the requirement of the user side and pushing it to the user side.
9. The intelligent big data-based media recommendation method according to claim 8, wherein: the user data includes: user attribute data and user behavior data; wherein the user attribute data includes: age and sex; the user behavior data includes: user playing data, user clicking data, user product and service purchasing data and user searching data;
the media asset database data comprises IPTV data, mobile terminal television data and cache video data.
10. The intelligent media recommendation storage medium based on big data, the storage medium storing thereon a computer program, characterized in that: the computer program, when executed by a processor, performs the steps of the intelligent big data-based media recommendation method of any of claims 6 to 9.
CN202111591656.1A 2021-12-23 2021-12-23 Intelligent media recommendation system, method and storage medium based on big data Pending CN114254202A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111591656.1A CN114254202A (en) 2021-12-23 2021-12-23 Intelligent media recommendation system, method and storage medium based on big data

Publications (1)

Publication Number Publication Date
CN114254202A true CN114254202A (en) 2022-03-29

Family

ID=80797269

Country Status (1)

Country Link
CN (1) CN114254202A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination