CN111026906A - Recommendation system for streaming listening audio content in vehicle-mounted scene - Google Patents

Publication number
CN111026906A
Authority
CN
China
Prior art keywords
user
content
data
subsystem
behavior
Legal status
Granted
Application number
CN201911235384.4A
Other languages
Chinese (zh)
Other versions
CN111026906B (en)
Inventor
俞清木
Current Assignee
Zhongguang Intelligent Connected Vehicle Digital Media Shanghai Co ltd
Original Assignee
Internet Beijing Technology Co Ltd
Application filed by Internet Beijing Technology Co Ltd
Priority to CN201911235384.4A
Publication of CN111026906A
Application granted
Publication of CN111026906B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63 Querying
    • G06F16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F16/637 Administration of user profiles, e.g. generation, initialization, adaptation or distribution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation system for streaming audio content in a vehicle-mounted scene, comprising a real-time data collection subsystem, an offline model training subsystem and an online content delivery subsystem. The real-time data collection subsystem collects relevant information and records it into the storage system; the offline model training subsystem computes offline model data from the raw data recorded in the storage system; and finally the online content delivery subsystem delivers content based on the offline model data. The system addresses the sparsity of active user behavior data in the vehicle-mounted scene; it adopts a radio-station mode in which content is streamed continuously, reducing distraction to the driver; and it fuses vehicle information and scene information so that the recommended audio content better suits the vehicle-mounted setting.

Description

Recommendation system for streaming listening audio content in vehicle-mounted scene
Technical Field
The invention relates to a system for providing personalized, uninterrupted audio content to different users in a vehicle-mounted scene, and in particular to a recommendation system for streaming audio content in the vehicle-mounted scene.
Background
With the rapid development of the internet, information overload has become increasingly serious, and recommendation systems are one of the important means of addressing it. Current recommendation technology is largely designed for products with strong interaction, such as mobile phones and PCs. The vehicle-mounted scene has unique characteristics, so applying current recommendation technology there raises several problems:
1. On mobile phones and PCs, the user is focused and actively gives explicit or implicit feedback on recommendation results, such as ratings, likes and click-to-play. In a vehicle-mounted scene, the user's attention is on driving and content is listened to in the background, so user behavior data are sparse.
2. Existing recommendation technology mainly serves on-demand products, whereas the vehicle-mounted scene needs continuous streaming playback to reduce the impact on driving.
3. Existing recommendation techniques rely on user information and behavior data. In the vehicle-mounted scene, vehicle information (such as road conditions and vehicle speed) and scene information (such as commuting, travel and long late-night drives) also need to be fused.
Disclosure of Invention
The object of the invention is to provide a system that delivers personalized, uninterrupted audio content to different users in a vehicle-mounted scene, which solves the above problems and uses big data and expert knowledge to overcome the sparsity of active user behavior data in that scene.
To achieve this object, the invention provides a recommendation system for streaming audio content in a vehicle-mounted scene, used in conjunction with a client, a server, a local file system and a storage system. The recommendation system comprises a real-time data collection subsystem, an offline model training subsystem and an online content delivery subsystem. The real-time data collection subsystem collects related information and records it into the storage system; the offline model training subsystem computes offline model data from the raw data recorded in the storage system; and finally the online content delivery subsystem delivers content based on the offline model data. The related information includes user behavior data, vehicle information and scene information.
In the recommendation system for streaming audio content in the vehicle-mounted scene, the operation of the offline model training subsystem comprises two stages: candidate set generation and candidate set ranking. Candidate set generation draws on user active behavior and offline model calculation; candidate set ranking computes the user's degree of preference for each candidate set.
In the recommendation system, user active behavior means that the user actively fills in preferred content tags through the corresponding product forms, namely a customized playlist and interest selection. The customized playlist is displayed on the product interface and its content is defined by the user in terms of content categories, content tags and content keywords. Interest selection is an interface shown when the user registers, in which the user selects the content tags they are interested in.
In the recommendation system, offline model calculation means that the offline model training subsystem analyzes data algorithmically to obtain the content tags a user likes; the data include user information, user behavior, vehicle information and scene information. Offline model calculation consists of four parts: episode tracking, user portrait, user attribute recommendation and popular content.
In the recommendation system, episode tracking is performed by the offline model training subsystem analyzing the user listening history stored in the storage system, as follows: group records by each user's unique identifier; keep the listening records of serialized programs; keep the programs listened to in the last three months, in reverse chronological order; look up the next episode of each program; and finally store the result.
In the recommendation system, the user portrait is computed as follows: user behavior data and audio information are obtained first; the offline model training subsystem then joins the two on the unique audio identifier, groups the result by user, and computes each user's portrait, in which the weight of each tag is derived from the behavior-type weight, the time decay, the behavior count and TF-IDF. The user behavior data include audio listening duration, subscriptions, playlist clicks, search clicks, album on-demand plays, skip-to-next and negative feedback; the audio information includes duration, the album the audio belongs to, the album's tags and the audio's category. The user portrait tag weight is

$$w = \mathrm{norm}\left(W_{behavior} \cdot F_t \cdot C \cdot TF \cdot IDF\right)$$

where the behavior-type weight is $W_{behavior} = \{\text{subscription}: 5,\ \text{playlist click}: 1.4R,\ \text{search}: 1.3R,\ \text{album on demand}: 1.2R,\ \text{next}: 1R,\ \text{negative feedback}: 0.1\}$; the album completion rate is $R = \sum PlayTime_{audio} / \sum Duration_{audio}$; the time decay is $F_t = \max\left(1,\ 1 \cdot e^{-0.8 \cdot \max\left(0,\ (now - playtime)/(24 \cdot 3600)\right)}\right)$, where $now$ is the current time and $playtime$ is the time the behavior occurred, in ms; and the behavior count $C$ is computed per day as the number of occurrences of the same behavior type on the same album. The importance of a tag is

$$TF\text{-}IDF = \frac{n_{u,t}}{\sum_{k} n_{u,k}} \times \log\frac{|U|}{|\{u : t \in u\}| + 1}$$

In the TF term, the numerator $n_{u,t}$ is the number of times tag $t$ appears on user $u$ and the denominator is the total number of the user's tags; in the IDF term, the numerator inside the logarithm is the total number of users $|U|$ and the denominator is the number of users containing the tag plus 1.
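The tag-importance step can be illustrated with a minimal Python sketch. It assumes each user's tags arrive as a list of occurrences (one entry per time the tag appears on that user) and uses the natural logarithm, since the text does not state the base. The function and argument names are illustrative, not taken from the patent.

```python
import math
from collections import Counter


def label_importance(user_labels: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Return {user: {tag: TF * IDF}} for every tag occurring on every user.

    TF  = occurrences of the tag on the user / total tag occurrences on the user
    IDF = log(total users / (users whose tags include the tag + 1))
    """
    total_users = len(user_labels)
    # Document frequency: how many users carry each tag at least once.
    users_with_label: Counter = Counter()
    for labels in user_labels.values():
        users_with_label.update(set(labels))

    scores: dict[str, dict[str, float]] = {}
    for user, labels in user_labels.items():
        counts = Counter(labels)
        total = sum(counts.values())
        scores[user] = {
            label: (n / total) * math.log(total_users / (users_with_label[label] + 1))
            for label, n in counts.items()
        }
    return scores
```

Note the "+1" in the IDF denominator: a tag carried by every user gets a slightly negative IDF rather than zero.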
In the recommendation system, user attribute recommendation uses the collected attributes of seed users, their customized playlists and operational experience: the offline model training subsystem computes how strongly users with different attributes prefer playlist content, and recommends using the following formula:
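(Formula image BDA0002304745260000041, not reproduced in the text: the attribute-based score for label l, whose terms are defined below.)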
Here, given user attributes $u_1, u_2, \ldots, u_n$, the formula computes the relative probability that the user likes label $l$. $N$ is the total number of records and $N_l$ the number of records in which label $l$ is liked; $N_i$ and $N_{i,l}$ are the corresponding counts under attribute $i$. By analogy with TF-IDF, the first term is a penalty term: the more popular the label, the lower its value (the IDF analogue). The second term is a sum over the per-attribute conditional probabilities: the more likely the label is under an attribute, the higher its value (the TF analogue). $(n - \alpha)$ is the penalty coefficient; α defaults to 1, with a recommended range of $0 \le \alpha \le 1$. β is the weight that weakens popular labels within each attribute; it defaults to 1, with a recommended range of $1 \le \beta \le 2$. The larger α, the smaller the penalty and the more popular the results; the smaller α, the larger the penalty and the more personalized the results. The larger β, the stronger the weakening of popularity and the more personalized the score; the smaller β, the weaker the weakening and the more popular the score.
In the recommendation system, popular content is derived from statistics of users' album-click behavior data: the offline model training subsystem computes the importance of each content category in each hour using the following formula:

$$TF\text{-}IDF_{h,c} = \frac{n_{h,c}}{\sum_{k} n_{h,k}} \times \log_{10}\frac{24}{|\{h' : n_{h',c} > 0\}| + 1}$$

In the TF term, the numerator $n_{h,c}$ is the number of times content category $c$ appears in hour $h$ and the denominator is the total count of all content categories in that hour; in the IDF term, the numerator inside the logarithm is the total number of hours in a day, 24, and the denominator is the number of hours containing the category plus 1.
In the recommendation system, candidate set ranking is performed by the offline model training subsystem. In the early stage, when the user has little positive feedback, the content tag weights from the user portrait serve as the basis for overall ranking; later, as positive feedback data accumulate, a click-through rate (CTR) prediction model can learn the proportion of each candidate set and the final ranking automatically.
In the recommendation system, the online content delivery subsystem delivers content according to the results of the offline model training subsystem, in two stages: recall and ranking. Recall fetches from the storage system the candidate sets computed by the offline models and sets the proportion of each candidate set according to statistics from the offline data. Ranking obtains the current user's information and the intermediate data of the offline computation, extracts features, computes through a model the ordering of content the user is most likely to enjoy, and delivers the final result.
The recommendation system for streaming audio content in the vehicle-mounted scene has the following advantages:
1. The system uses streaming playback, which reduces the driver's interactive operations while driving and thus the risk of traffic accidents.
2. It addresses the sparsity of users' active behavior data when streaming in a vehicle-mounted scene. On the product side, users are guided to customize playlists and to select preferred content tags at registration, and user behavior data are collected along multiple dimensions, including subscriptions, clicks, searches and negative feedback. On the algorithm side, the attributes and customized playlists of seed users are collected and modeled to compute the preference between user attributes and content tags, enabling user attribute recommendation; and the importance of each content category is computed per hour, enabling popular-content recommendation.
3. Positive feedback data are sparse in the early stage, so the content tag weights of the user portrait can be used as the ranking criterion. Once the positive feedback users generate on system-recommended content reaches a sufficient volume (usually about 10 times the number of features), a supervised learning model, namely CTR prediction, can be adopted to optimize the ranking of recommendation results.
4. For algorithm modeling, vehicle and scene information are fused in addition to user and content information, so that the recommended content better suits the vehicle-mounted scene.
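Advantage 3 describes a switch from portrait-based ranking to CTR-based ranking once positive feedback is plentiful. The sketch below reads "about 10 times the number of features" as a simple threshold on the positive-feedback count; the threshold rule, the function names and the `predict_ctr` callable are assumptions for illustration, not the patent's implementation.

```python
from typing import Callable


def choose_ranker(positive_feedback_count: int, num_features: int, factor: int = 10) -> str:
    """Return "portrait" while positive feedback is sparse, "ctr" once it reaches
    roughly `factor` times the number of features (assumed reading of the text)."""
    if num_features <= 0:
        raise ValueError("num_features must be positive")
    return "ctr" if positive_feedback_count >= factor * num_features else "portrait"


def rank_candidates(
    candidates: list[str],
    label_of: dict[str, str],
    portrait_weight: dict[str, float],
    predict_ctr: Callable[[str], float],
    positive_feedback_count: int,
    num_features: int,
) -> list[str]:
    """Sort candidates by portrait tag weight (cold start) or by predicted CTR."""
    if choose_ranker(positive_feedback_count, num_features) == "portrait":
        def key(item: str) -> float:
            return portrait_weight.get(label_of[item], 0.0)
    else:
        key = predict_ctr
    return sorted(candidates, key=key, reverse=True)
```

Keeping the switch as a single predicate makes it easy to replace the fixed multiplier with whatever criterion an operator prefers.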
Drawings
Fig. 1 is a schematic architecture diagram of a recommendation system for streaming listening to audio content in a vehicle-mounted scene according to the present invention.
Fig. 2 is a diagram of the click-through rate (CTR) prediction ranking model of the recommendation system for streaming audio content in a vehicle-mounted scene according to the present invention.
Fig. 3 is a recall flowchart of the recommendation system for streaming audio content in an in-vehicle scenario of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
The invention provides a recommendation system for streaming audio content in a vehicle-mounted scene, used in conjunction with a client, a server, a local file system and a storage system. The storage system comprises a distributed cache subsystem, an inverted index subsystem, a relational database subsystem and a distributed file subsystem. The recommendation system also relies on a middleware service system comprising an asynchronous communication subsystem based on the Actor model, a distributed real-time processing subsystem, a distributed computing subsystem and a real-time log collection subsystem, as shown in Fig. 1.
The real-time data collection subsystem collects the related information through a client program and reports it to the HTTP web server, which writes it into the local file system. The real-time log collection subsystem then completes, splits and cleans the information and writes it into the distributed file subsystem of the storage system. The offline model training subsystem computes offline model data from the raw data in the storage system, and finally the online content delivery subsystem delivers content based on the offline model data. The related information includes user behavior data, vehicle information, scene information and the like.
The operation of the offline model training subsystem comprises two important stages: candidate set generation and candidate set ranking. Candidate set generation draws on user active behavior and offline model calculation; candidate set ranking computes the user's degree of preference for each candidate set.
User active behavior means that the user actively fills in preferred content tags through the corresponding product forms, namely a customized playlist and interest selection. The customized playlist is displayed on the product interface and its content is defined by the user in terms of content categories, content tags and content keywords. Interest selection is an interface shown when the user registers, in which the user selects the content tags they are interested in.
Offline model calculation means that the offline model training subsystem analyzes data algorithmically to obtain the content tags a user likes; the data include user information, user behavior, vehicle information, scene information and the like. Offline model calculation consists of four parts: episode tracking, user portrait, user attribute recommendation and popular content.
Episode tracking is implemented by the distributed computing subsystem and the offline model training subsystem analyzing the user listening history stored in the storage system, as follows: group records by each user's unique identifier; keep the listening records of serialized programs (such as novels); keep the programs listened to in the last three months, in reverse chronological order; look up the next episode of each program; and finally store the result in the inverted index subsystem.
The user portrait is computed as follows: user behavior data are first obtained from the distributed file subsystem and audio information from the relational database subsystem; the offline model training subsystem then joins the two on the unique audio identifier, groups the result by user, and computes each user's portrait, in which the weight of each tag is derived from the behavior-type weight, the time decay, the behavior count and TF-IDF. The user behavior data include audio listening duration, subscriptions, playlist clicks, search clicks, album on-demand plays, skip-to-next, negative feedback and the like; the audio information includes duration, the album the audio belongs to, the album's tags, the audio's category and the like. The user portrait tag weight is

$$w = \mathrm{norm}\left(W_{behavior} \cdot F_t \cdot C \cdot TF \cdot IDF\right)$$

where the behavior-type weight is $W_{behavior} = \{\text{subscription}: 5,\ \text{playlist click}: 1.4R,\ \text{search}: 1.3R,\ \text{album on demand}: 1.2R,\ \text{next}: 1R,\ \text{negative feedback}: 0.1\}$; the album completion rate is $R = \sum PlayTime_{audio} / \sum Duration_{audio}$; the time decay is $F_t = \max\left(1,\ 1 \cdot e^{-0.8 \cdot \max\left(0,\ (now - playtime)/(24 \cdot 3600)\right)}\right)$, where $now$ is the current time and $playtime$ is the time the behavior occurred, in ms; and the behavior count $C$ is computed per day as the number of occurrences of the same behavior type on the same album. The importance of a tag is

$$TF\text{-}IDF = \frac{n_{u,t}}{\sum_{k} n_{u,k}} \times \log\frac{|U|}{|\{u : t \in u\}| + 1}$$

In the TF term, the numerator $n_{u,t}$ is the number of times tag $t$ appears on user $u$ and the denominator is the total number of the user's tags; in the IDF term, the numerator inside the logarithm is the total number of users $|U|$ and the denominator is the number of users containing the tag plus 1.
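The per-record factors of the tag weight (behavior-type weight, completion rate, time decay, count) can be sketched in Python as follows. As printed, the time-decay factor wraps the exponential in max(1, ·); because the exponential never exceeds 1, that expression always evaluates to 1. This sketch therefore assumes the intended factor is the exponential itself, and it takes timestamps in seconds so that dividing by 24·3600 yields days (the text says ms). Behavior identifiers such as `"playlist_click"` are hypothetical names, and the TF-IDF value is passed in rather than recomputed.

```python
import math

# Behavior-type weights W_behavior as listed in the text (names are hypothetical).
# Entries in R_SCALED are multiplied by the album completion rate R; FIXED are not.
FIXED = {"subscribe": 5.0, "negative_feedback": 0.1}
R_SCALED = {"playlist_click": 1.4, "search": 1.3, "album_on_demand": 1.2, "next": 1.0}


def completion_rate(play_times: list[float], durations: list[float]) -> float:
    """Album completion rate R = sum(PlayTime_audio) / sum(Duration_audio)."""
    total = sum(durations)
    return sum(play_times) / total if total > 0 else 0.0


def behavior_weight(behavior: str, r: float) -> float:
    """W_behavior for one behavior type, given the album completion rate r."""
    if behavior in FIXED:
        return FIXED[behavior]
    return R_SCALED[behavior] * r


def time_decay(now_s: float, playtime_s: float) -> float:
    """Assumed decay factor exp(-0.8 * age_in_days); future times count as age 0."""
    age_days = max(0.0, (now_s - playtime_s) / (24 * 3600))
    return math.exp(-0.8 * age_days)


def raw_label_weight(behavior: str, r: float, now_s: float, playtime_s: float,
                     count: int, tfidf: float) -> float:
    """Un-normalized W_behavior * F_t * C * TF-IDF for one behavior record."""
    return behavior_weight(behavior, r) * time_decay(now_s, playtime_s) * count * tfidf
```

Under this reading, a one-day-old event keeps about 45% of its weight (e^-0.8 ≈ 0.449), and the per-user normalization is applied afterwards.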
User attribute recommendation is based on the collected attributes of seed users and their customized playlists (collected, for example, through a WeChat mini program), together with operational experience. The offline model training subsystem computes how strongly users with different attributes prefer playlist content, and recommends using the following formula:
(Formula image BDA0002304745260000082, not reproduced in the text: the attribute-based score for label l derived below.)
That is, given user attributes $u_1, u_2, \ldots, u_n$, compute the relative probability that the user likes label $l$. The derivation is as follows:
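$$\frac{P(l=1 \mid u_1 u_2 \ldots u_n)}{P(l=0 \mid u_1 u_2 \ldots u_n)} = \frac{P(u_1 u_2 \ldots u_n \mid l=1)\, P(l=1)}{P(u_1 u_2 \ldots u_n \mid l=0)\, P(l=0)}$$
Independence assumption (naive Bayes):
$$P(u_1 u_2 \ldots u_n \mid l=1) = P(u_1 \mid l=1)\, P(u_2 \mid l=1) \cdots P(u_n \mid l=1)$$
and likewise for $l=0$. Bayes' theorem:
$$P(u_i \mid l=1) = \frac{P(l=1 \mid u_i)\, P(u_i)}{P(l=1)}, \qquad P(u_i \mid l=0) = \frac{P(l=0 \mid u_i)\, P(u_i)}{P(l=0)}$$
This gives
$$\frac{P(l=1 \mid u_1 \ldots u_n)}{P(l=0 \mid u_1 \ldots u_n)} = \left(\frac{P(l=0)}{P(l=1)}\right)^{n-1} \prod_{i=1}^{n} \frac{P(l=1 \mid u_i)}{P(l=0 \mid u_i)}$$
Setting
$$P(l=1) = p,\quad P(l=0) = 1-p,\quad P(l=1 \mid u_i) = q_i,\quad P(l=0 \mid u_i) = 1-q_i$$
gives
$$\frac{P(l=1 \mid u_1 \ldots u_n)}{P(l=0 \mid u_1 \ldots u_n)} = \left(\frac{1-p}{p}\right)^{n-1} \prod_{i=1}^{n} \frac{q_i}{1-q_i}$$
and, taking logarithms,
$$\log\frac{P(l=1 \mid u_1 \ldots u_n)}{P(l=0 \mid u_1 \ldots u_n)} = (n-1)\log\frac{1-p}{p} + \sum_{i=1}^{n}\log\frac{q_i}{1-q_i}$$
Finally obtain (formula image BDA0002304745260000094, not reproduced in the text): the scoring formula expressed in record counts, with the penalty coefficient α and the hot-label weight β defined below.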
Here $N$ is the total number of records and $N_l$ the number of records in which label $l$ is liked; $N_i$ and $N_{i,l}$ are the corresponding counts under attribute $i$, so that $p = N_l/N$ and $q_i = N_{i,l}/N_i$. By analogy with TF-IDF, the first term is a penalty term: the more popular the label, the lower its value (the IDF analogue). The second term is a sum over the per-attribute conditional probabilities: the more likely the label is under an attribute, the higher its value (the TF analogue). $(n - \alpha)$ is the penalty coefficient; α defaults to 1 (no additional penalty), with a recommended range of $0 \le \alpha \le 1$. β is the weight that weakens popular labels within each attribute; it defaults to 1 (no weakening), with a recommended range of $1 \le \beta \le 2$. The larger α, the smaller the penalty on popular labels and the more popular the recommendations; the smaller α, the larger the penalty and the more personalized the recommendations. The larger β, the stronger the weakening of popularity and the more personalized the score; the smaller β, the weaker the weakening and the more popular the score.
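A minimal Python sketch of the attribute score, assuming the final formula is the log-odds from the naive Bayes derivation with the coefficient (n − 1) generalized to (n − α). With α = 1 it equals log[((1−p)/p)^(n−1) · ∏ q_i/(1−q_i)] with p = N_l/N and q_i = N_{i,l}/N_i. The exact placement of β in the published formula is not recoverable from the text, so β is left out (equivalent to its default of 1). The optional `smoothing` pseudo-count is an addition of this sketch, not part of the patent.

```python
import math


def attribute_label_score(
    n_total: int,
    n_liked: int,
    per_attribute: list[tuple[int, int]],
    alpha: float = 1.0,
    smoothing: float = 0.0,
) -> float:
    """Log relative probability that a user with attributes u_1..u_n likes label l.

    n_total, n_liked: N and N_l over all records.
    per_attribute: one (N_i, N_{i,l}) pair per user attribute u_i.
    alpha: penalty parameter, 0 <= alpha <= 1; alpha = 1 gives plain naive Bayes.
    smoothing: optional pseudo-count (assumption) so zero counts do not hit log(0).
    """
    s = smoothing
    n = len(per_attribute)
    # Penalty term (n - alpha) * log((N - N_l) / N_l): smaller for popular labels.
    penalty = (n - alpha) * math.log((n_total - n_liked + s) / (n_liked + s))
    # Evidence term: per-attribute log-odds log(q_i / (1 - q_i)).
    evidence = sum(
        math.log((liked_i + s) / (total_i - liked_i + s))
        for total_i, liked_i in per_attribute
    )
    return penalty + evidence
```

Lowering α raises the coefficient (n − α), which widens the gap between niche and popular labels, matching the text's "smaller α, more personalized".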
Popular content is derived from statistics of users' album-click behavior data: the offline model training subsystem computes the importance of each content category in each hour using the following formula:

$$TF\text{-}IDF_{h,c} = \frac{n_{h,c}}{\sum_{k} n_{h,k}} \times \log_{10}\frac{24}{|\{h' : n_{h',c} > 0\}| + 1}$$

In the TF term, the numerator $n_{h,c}$ is the number of times content category $c$ appears in hour $h$ and the denominator is the total count of all content categories in that hour; in the IDF term, the numerator inside the logarithm is the total number of hours in a day, 24, and the denominator is the number of hours containing the category plus 1.
Candidate set ranking is performed by the offline model training subsystem. In the early stage, when the user has little positive feedback, the content tag weights from the user portrait can serve as the basis for overall ranking; later, as positive feedback data accumulate, the CTR prediction model can learn the proportion of each candidate set and the final ranking automatically. The CTR prediction ranking process is as follows: user behavior data and service content data are collected; the offline model training subsystem extracts features, including scene, vehicle, user and content features, discretizes them, one-hot encodes them and writes them into the storage system; it then trains a logistic regression model on these data, incorporating the behavior data on recommended results, and writes the model data into the storage system. At serving time, the features and model data are read from the storage system, the click-through rate of each recommended candidate is computed in real time, and the results are sorted by predicted click-through rate, as shown in Fig. 2.
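The CTR ranking pipeline (discretize, one-hot encode, logistic regression, sort by predicted CTR) can be sketched in plain Python. This is an illustrative stand-in, not the patent's code: it trains logistic regression with simple stochastic gradient descent on sparse one-hot tokens, and the feature names and bucket boundaries in the usage are hypothetical.

```python
import bisect
import math


def discretize(raw: dict, boundaries: dict[str, list[float]]) -> list[str]:
    """Turn raw features into one-hot tokens such as 'speed_kmh=2' or 'scene=commute'.

    Continuous features listed in `boundaries` are replaced by their bucket index;
    each token then stands for one active column of a one-hot vector.
    """
    tokens = []
    for name, value in raw.items():
        if name in boundaries:
            value = bisect.bisect_right(boundaries[name], value)
        tokens.append(f"{name}={value}")
    return tokens


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_lr(samples, epochs: int = 200, lr: float = 0.1, l2: float = 1e-4):
    """Logistic regression by SGD on one-hot tokens; samples are (tokens, clicked)."""
    weights: dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for tokens, clicked in samples:
            p = sigmoid(bias + sum(weights.get(t, 0.0) for t in tokens))
            grad = p - clicked  # derivative of log loss with respect to the logit
            bias -= lr * grad
            for t in tokens:
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (grad + l2 * w)
    return weights, bias


def predict_ctr(model, tokens: list[str]) -> float:
    weights, bias = model
    return sigmoid(bias + sum(weights.get(t, 0.0) for t in tokens))


def rank_by_ctr(model, candidates: dict[str, list[str]]) -> list[str]:
    """Return candidate ids sorted by predicted click-through rate, highest first."""
    return sorted(candidates, key=lambda item: predict_ctr(model, candidates[item]), reverse=True)
```

For example, `discretize({"speed_kmh": 65, "scene": "commute"}, {"speed_kmh": [30, 60, 90]})` yields `["speed_kmh=2", "scene=commute"]`; representing one-hot columns as string tokens keeps the weight table sparse.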
The online content delivery subsystem is built on the Actor-model asynchronous communication subsystem as a high-performance, high-availability distributed application. It delivers content according to the results of the offline model training subsystem and is divided into two stages: recall and ranking. Recall fetches the candidate sets computed by the offline models from the distributed cache subsystem, the inverted index subsystem and the relational database subsystem of the storage system, and then sets the proportion of each candidate set according to statistics from the offline data. Specifically, the system first looks up the user's custom playlist. If one exists, the user's episode-tracking results are added, taking at most 50% of the slots in reverse chronological order, and combined with the custom playlist. If not, the system falls back to the other strategies and weights the candidate sets (self-selected content tags, episode tracking, user portrait, user attributes, default playlist and so on), with initial weights of 4 for self-selected content tags, 2 for episode tracking, 2 for the user portrait, 1 for user attributes and 1 for the default playlist. The weight of each candidate set is set manually, taking all of these sources into account, and changes take effect immediately, as shown in Fig. 3.
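The recall branching and weight-based allocation can be sketched in Python. The candidate-set keys are hypothetical names, and the largest-remainder rounding used to turn weights into whole slot counts is an assumption, since the text does not say how fractional shares are rounded.

```python
from typing import Optional

# Initial candidate-set weights from the text; the keys are hypothetical names.
DEFAULT_WEIGHTS = {
    "self_selected_tags": 4,
    "episode_tracking": 2,
    "user_portrait": 2,
    "user_attributes": 1,
    "default_playlist": 1,
}


def allocate_slots(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total` recall slots across candidate sets in proportion to their weights,
    using largest-remainder rounding so the counts add up to `total`."""
    weight_sum = sum(weights.values())
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    slots = {name: int(x) for name, x in exact.items()}
    leftover = total - sum(slots.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - slots[k], reverse=True)
    for name in by_remainder[:leftover]:
        slots[name] += 1
    return slots


def recall(
    total: int,
    custom_playlist: list[str],
    episodes: list[str],
    candidate_sets: dict[str, list[str]],
    weights: Optional[dict[str, float]] = None,
) -> list[str]:
    """Build the recall list for one request; `episodes` must be newest first."""
    if custom_playlist:
        # Episode tracking takes at most half of the slots; the playlist fills the rest.
        picked = episodes[: total // 2]
        return picked + custom_playlist[: total - len(picked)]
    sets = {"episode_tracking": episodes, **candidate_sets}
    result: list[str] = []
    for name, k in allocate_slots(total, weights or DEFAULT_WEIGHTS).items():
        result.extend(sets.get(name, [])[:k])
    return result
```

Because the weights are read per call, changing the weight table takes effect on the next request, in line with the text's "immediately effective" weights.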
Ranking obtains the current user's related information and the intermediate data of the offline computation from the distributed real-time processing subsystem, the distributed cache subsystem, the inverted index subsystem and the relational database subsystem, extracts features, computes through a model the ordering of content the user is most likely to enjoy, and delivers the final result.
The recommendation system for streaming audio content in a vehicle-mounted scene according to the invention is further described below with reference to an embodiment.
Example 1
A recommendation system for streaming audio content in a vehicle-mounted scene is used in conjunction with a client, a server, a local file system and a storage system. The recommendation system comprises a real-time data collection subsystem, an offline model training subsystem and an online content delivery subsystem.
1. Real-time data collection subsystem. The client collects audio playback behavior data and reports them to the nginx web server. The log collection subsystem Flume collects and aggregates the data, supplements album information, and stores the data in the distributed storage subsystem HDFS, partitioned by time. Nginx (engine x) is a high-performance HTTP and reverse proxy web server that also provides IMAP/POP3/SMTP proxy services. Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating and transmitting large volumes of log data. HDFS (Hadoop Distributed File System) is a distributed file system designed to run on commodity hardware.
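The per-event work of the collection path (clean, complete with album information, partition by time) can be sketched in Python. This illustrates the logic only; it is not Flume configuration or any Flume API, and the event fields (`uid`, `album_id`, `ts` in epoch seconds) and the `dt=.../hour=...` partition layout are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def enrich_and_partition(raw_line: str, albums: dict[str, dict]) -> Optional[tuple[str, dict]]:
    """Clean one reported playback event, add album info, and choose a time partition.

    Returns (partition_path, record), or None when the line is malformed or refers to
    an unknown album (such events are dropped during cleaning).
    """
    try:
        event = json.loads(raw_line)
        album = albums[event["album_id"]]
        ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    except (ValueError, KeyError, TypeError, OverflowError, OSError):
        return None
    # Completion step: attach album metadata to the raw event.
    record = {**event, "album_tags": album["tags"], "category": album["category"]}
    # Time-based partition path (hypothetical layout), e.g. "dt=2019-12-05/hour=08".
    return ts.strftime("dt=%Y-%m-%d/hour=%H"), record
```

Partitioning by date and hour lets the hourly popular-content job read only the slices it needs.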
2. Offline model training subsystem.
(1) Episode tracking.
Write a distributed MapReduce program. In task 1, the map phase reads users' listening records from the last three months, keeps those classified as novels, and groups them by the unique user identifier for the reduce phase; the reduce phase sorts each user's records in descending order of time and keeps the most recent listening record for each album. In task 2, the map phase reads the output of task 1, attaches all audio information of each album, and groups the data by the unique album identifier; the reduce phase computes the next audio item after the one in the listening history. In task 3, the map phase reads the output of task 2 and groups it by the unique user identifier; the reduce phase stores the grouped data in the inverted index subsystem Elasticsearch. MapReduce is a programming model for parallel processing of large-scale data sets (larger than 1 TB). Elasticsearch is a search server based on Lucene.
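The three MapReduce tasks can be simulated in memory to show the data flow. This sketch is not Hadoop code: `map_reduce` is a hypothetical stand-in for one job (map, shuffle by key, reduce), "last three months" is approximated as 90 days, the record field names are assumptions, and the final Elasticsearch write is replaced by returning a dictionary.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def map_reduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


def episode_tracking(listens, album_audios, now: datetime, serial_category: str = "novel"):
    """listens: dicts with uid, album_id, audio_id, category, time (datetime).
    album_audios: album_id -> ordered list of audio ids.
    Returns {uid: [(album_id, next_audio_id), ...]}, most recent album first."""
    cutoff = now - timedelta(days=90)

    # Task 1: keep recent serial listens, group by user, keep latest listen per album.
    def map1(r):
        if r["category"] == serial_category and r["time"] >= cutoff:
            yield r["uid"], r

    def reduce1(uid, rs):
        latest = {}
        for r in sorted(rs, key=lambda x: x["time"], reverse=True):
            latest.setdefault(r["album_id"], r)
        return list(latest.values())

    # Task 2: group by album and look up the audio after the last one heard.
    def map2(r):
        yield r["album_id"], r

    def reduce2(album_id, rs):
        order = album_audios.get(album_id, [])
        for r in rs:
            if r["audio_id"] in order:
                idx = order.index(r["audio_id"])
                if idx < len(order) - 1:
                    yield {**r, "next_audio_id": order[idx + 1]}

    # Task 3: group by user; the patent stores this in Elasticsearch, here a dict.
    def map3(r):
        yield r["uid"], r

    def reduce3(uid, rs):
        rs = sorted(rs, key=lambda x: x["time"], reverse=True)
        yield uid, [(r["album_id"], r["next_audio_id"]) for r in rs]

    task1 = map_reduce(listens, map1, reduce1)
    task2 = map_reduce(task1, map2, reduce2)
    return dict(map_reduce(task2, map3, reduce3))
```

Albums whose last heard episode is already the final one produce no entry, so they drop out of the episode-tracking candidate set.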
(2) User portrait calculation.
First, clean the raw data: join the playback-end event data with the audio information in the service database on the unique audio identifier; merge new and historical data on the unique user identifier; for the merged data, compute the decay weight of each audio playback duration per unique user identifier and unique album identifier and accumulate it; finally, sort by decay weight in descending order.
Calculate user tags: obtain the tag blacklist and the album information (including content category and tags), filter album tags against the blacklist, and reject albums containing blacklisted tags. Join the data cleaned in the previous step with this album information on the unique album identifier. Accumulate the decay weights for each album tag under each user, then compute the final weights with the normalization formula.
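The tag-calculation step (blacklist filtering, join on album, per-user accumulation, normalization) can be sketched in Python. The input `album_decay` stands for the accumulated decay weights produced by the cleaning step; the names are hypothetical, and dividing by each user's maximum tag weight is an assumed choice, since the text does not spell out its normalization formula.

```python
from collections import defaultdict


def user_label_weights(
    album_decay: dict[tuple[str, str], float],
    album_tags: dict[str, list[str]],
    blacklist: set[str],
) -> dict[str, dict[str, float]]:
    """album_decay: (uid, album_id) -> accumulated decay weight from the cleaning step.
    Returns {uid: {tag: weight in (0, 1]}}, normalized by each user's largest weight."""
    # Reject every album that carries at least one blacklisted tag.
    allowed = {album: tags for album, tags in album_tags.items()
               if not blacklist.intersection(tags)}
    raw: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for (uid, album_id), weight in album_decay.items():
        # Join on album id; rejected or unknown albums contribute nothing.
        for tag in allowed.get(album_id, []):
            raw[uid][tag] += weight
    result: dict[str, dict[str, float]] = {}
    for uid, tags in raw.items():
        top = max(tags.values())
        result[uid] = {t: w / top for t, w in tags.items()} if top > 0 else dict(tags)
    return result
```

Rejecting whole albums (rather than just the blacklisted tag) follows the text; it also drops the album's other, non-blacklisted tags.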
(3) Popular recommendations, i.e., user attribute recommendation and popular content.
Album-click data are collected, and the number of album clicks in each category is counted for every hour; dividing this count by the total clicks across all categories in that hour gives tf. idf is the base-10 logarithm of 24 divided by (the number of hours in which the category appears + 1). The product tf × idf is the importance of a category in a given hour. The categories are then grouped into the major classes entertainment, knowledge, lifestyle, and information, and the importance of each major class in each hour is calculated. The data are saved in the inverted indexing subsystem, Elasticsearch. Every hour, when the online delivery subsystem recalls content for recommendation, it recalls the major class with the highest importance and then matches categories after normalizing by category importance, which improves the recall rate.
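The hourly TF-IDF described above can be sketched as follows, assuming click events arrive as `(hour, category)` pairs and a hypothetical `major_of` mapping from each fine-grained category to one of the four major classes. Summing category importance into a major-class score is an assumption; the patent does not say how the major-class importance is aggregated.

```python
import math
from collections import Counter, defaultdict


def hourly_category_importance(clicks, hours_per_day=24):
    """clicks: iterable of (hour, category) album-click events.
    Returns {(hour, category): tf * idf} where
      tf  = clicks on the category in that hour / all clicks in that hour
      idf = log10(24 / (hours in which the category appears + 1))"""
    per_hour = defaultdict(Counter)
    for hour, category in clicks:
        per_hour[hour][category] += 1
    hours_with = Counter()  # number of distinct hours each category appears in
    for counts in per_hour.values():
        hours_with.update(counts.keys())
    importance = {}
    for hour, counts in per_hour.items():
        total = sum(counts.values())
        for category, n in counts.items():
            idf = math.log10(hours_per_day / (hours_with[category] + 1))
            importance[(hour, category)] = (n / total) * idf
    return importance


def top_major_class(importance, hour, major_of):
    """Sum category importance into major classes for one hour (assumed
    aggregation) and return the class with the highest total."""
    totals = defaultdict(float)
    for (h, category), score in importance.items():
        if h == hour:
            totals[major_of[category]] += score
    return max(totals, key=totals.get) if totals else None
```

Taken literally, the idf term becomes slightly negative (log10(24/25)) for a category clicked in every hour of the day, so such always-on categories never rank as "important" for any particular hour.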
3. Online content delivery subsystem. The client calls the subsystem's service interface, passing the user unique identifier uid, and the system uses the uid to obtain the user's custom playlist, series-tracking results, self-selected content tags, user portrait, and user attributes. If a custom playlist exists, related albums are retrieved from the inverted indexing subsystem, Elasticsearch, using the album tags stored in the playlist, and combined with the series-tracking results to form candidate sets. If there is no custom playlist, album tags the user is likely to like are obtained from the user attributes through the user attribute recommendation model; the self-selected content tags, the user portrait, and popular tags are added; and related albums are retrieved from Elasticsearch to form candidate sets. Each candidate set is then allocated a number of slots according to its weight. Finally, the results are sorted by the tag weights of the user portrait and recommended.
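A sketch of the slot allocation and final sorting, assuming hypothetical candidate sets keyed by name with items `(album_id, tags)`, per-set weights, and the user's portrait tag weights. Proportional allocation with largest-remainder rounding, de-duplication across sets, and scoring an album by the sum of the user's weights for its tags are all assumed choices; the text above only says that quantities are distributed by weight and the final order follows the portrait's tag weights.

```python
import math


def allocate_slots(set_weights, total_slots):
    """Split total_slots across candidate sets in proportion to their weights,
    using largest-remainder rounding (an assumed choice)."""
    weight_sum = sum(set_weights.values())
    if weight_sum <= 0:
        return {name: 0 for name in set_weights}
    exact = {n: total_slots * w / weight_sum for n, w in set_weights.items()}
    slots = {n: math.floor(x) for n, x in exact.items()}
    leftover = total_slots - sum(slots.values())
    # hand the remaining slots to the sets with the largest fractional parts
    for n in sorted(exact, key=lambda n: exact[n] - slots[n], reverse=True)[:leftover]:
        slots[n] += 1
    return slots


def build_feed(candidate_sets, set_weights, tag_weights, total_slots):
    """candidate_sets: {set_name: [(album_id, [tags]), ...]}, each list pre-ordered.
    tag_weights: the user's portrait tag weights {tag: weight}."""
    slots = allocate_slots(set_weights, total_slots)
    picked, seen = [], set()
    for name, items in candidate_sets.items():
        quota = slots.get(name, 0)
        for album_id, tags in items:
            if quota == 0:
                break
            if album_id in seen:  # skip albums already taken from another set
                continue
            seen.add(album_id)
            picked.append((album_id, tags))
            quota -= 1

    def score(item):
        # final order: sum of the user's portrait weights over the album's tags
        return sum(tag_weights.get(t, 0.0) for t in item[1])

    return [album_id for album_id, _ in sorted(picked, key=score, reverse=True)]
```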
The invention provides a recommendation system for streaming audio content in a vehicle-mounted scene: a system and method that delivers personalized, uninterrupted audio content to different users in the car, using big data and expert knowledge to overcome the sparsity of users' active behavior data in vehicle-mounted scenarios. It adopts a radio-station mode, in which streaming playback reduces distraction to the driver. It also fuses vehicle information and scene information so that the recommended audio content better matches in-vehicle characteristics.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (10)

1. A recommendation system for streaming audio content in a vehicle-mounted scene, used together with a client, a server, a local file system, and a storage system, characterized in that the recommendation system comprises a real-time data collection subsystem, an offline model training subsystem, and an online content delivery subsystem; the real-time data collection subsystem collects related information and records it into the storage system; the offline model training subsystem calculates offline model data from the raw data recorded in the storage system; and the online content delivery subsystem finally delivers content based on the offline model data; the related information comprises user behavior data, vehicle information, and scene information.
2. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 1, wherein the offline model training subsystem comprises two stages, candidate set generation and candidate set ranking; candidate set generation is divided into the user's active behavior and offline model calculation, and candidate set ranking calculates the user's degree of preference for the candidate sets.
3. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 2, wherein the user's active behavior means that the user actively fills in favorite content tags through corresponding product features, including a customized playlist and interest selection; the customized playlist is displayed on the product interface and its content is defined by the user based on content categories, content tags, and content keywords; interest selection is an interface activated when the user registers, in which the user selects content tags of interest.
4. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 2, wherein in the offline model calculation, the offline model training subsystem analyzes data algorithmically to obtain the content tags the user likes, the data comprising user information, user behavior, vehicle information, and scene information; the offline model calculation consists of four parts: series tracking, user portrait, user attribute recommendation, and popular content.
5. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 4, wherein the series tracking is performed by the offline model training subsystem analyzing the user listening history stored in the storage system as follows: grouping by each user's unique identifier, keeping the listening records of serialized (continuous-listening) programs, keeping the programs listened to in the last three months in reverse chronological order, looking up the next item of each program, and finally storing the result.
6. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 4, wherein the user portrait is generated as follows: user behavior data and audio information are first obtained; the offline model training subsystem associates the two types of data by the audio unique identifier, groups them by user, and calculates a portrait for each user, in which the weight of each tag on each user is computed from the behavior type weight, the time decay, TF-IDF, and the behavior count. The user behavior data comprise audio listening duration, subscription, playlist click, search click, album on demand, next, and negative feedback; the audio information comprises duration, the album to which the audio belongs, the album's tags, and the category to which the audio belongs. The user portrait tag weight is:

$$W_{tag} = \mathrm{norm}\left(W_{behavior} \cdot F_t \cdot C \cdot TF \cdot IDF\right)$$

where the behavior type weight is $W_{behavior}$ = {subscription: 5, playlist click: 1.4R, search: 1.3R, album on demand: 1.2R, next: 1R, negative feedback: 0.1}, and the album completion rate is

$$R = \frac{\sum PlayTime_{audio}}{\sum Duration_{audio}}$$

the time decay is

$$F_t = \max\left(1,\ 1 \cdot e^{-0.8 \cdot \max\left(0,\ \frac{now - playtime}{24 \cdot 3600}\right)}\right)$$

where $now$ is the current time and $playtime$ is the time at which the behavior occurred, in ms; the behavior count $C$ is counted per day as the number of occurrences of the same behavior type on the same album; and the tag importance is

$$TF\text{-}IDF = \frac{n_{u,l}}{\sum_{k} n_{u,k}} \times \log\frac{|U|}{|U_l| + 1}$$

where the numerator of TF, $n_{u,l}$, is the number of times tag $l$ appears on user $u$ and the denominator is the total number of that user's tags; inside the IDF logarithm, the numerator $|U|$ is the total number of users and the denominator $|U_l| + 1$ is the number of users having tag $l$, plus 1.
7. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 4, wherein, for the user attribute recommendation, the offline model training subsystem calculates, from the collected attributes of seed users, their customized playlist information, and operational experience, the degree of preference of users with different attributes for playlist content, according to the following formula:
[Formula images FDA0002304745250000032 and FDA0002304745250000033 are not reproduced in the text.]
That is, given known user attributes u1, u2, …, un, the formula calculates the relative preference probability of the user for tag l, where N and n are the total number of data records and the number of times tag l is liked, and N_i and n_i are the total number of data records under attribute i and the number of times tag l is liked under attribute i. Similar to tf-idf, the first term is a penalty term whose value decreases as the popularity of the tag increases (the idf part), and the second term is a sum of conditional probabilities whose value increases with the probability of the tag occurring under the attributes (the tf part). (N − α) is the penalty coefficient; α defaults to 1 (no penalty), with a recommended range of 0 ≤ α ≤ 1. β is the weight of popular tags within each attribute; it defaults to 1 (no adjustment), with a recommended range of 1 ≤ β ≤ 2. The smaller α is, the more strongly popular tags are penalized and the more personalized the result; β likewise adjusts the balance between popularity and personalization.
8. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 4, wherein the popular content is obtained by collecting statistics on users' album-click behavior data, after which the offline model training subsystem calculates the importance of each content category per hour by the following formula:
$$\mathrm{importance}(c, h) = TF \times IDF = \frac{n_{c,h}}{\sum_{k} n_{k,h}} \times \log_{10}\frac{24}{H_c + 1}$$
where the numerator of TF is the number of times a content category appears in a given hour and the denominator is the total count of all content categories in that hour; inside the IDF logarithm, the numerator is the total number of hours in a day (24) and the denominator is the number of hours containing the content category, plus 1.
9. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 2, wherein the candidate set ranking is performed by the offline model training subsystem: in the early stage, when the user has little positive feedback behavior, the content tag weights obtained from the user portrait serve as the basis for the overall ranking; in the later stage, as positive feedback data accumulate, a click-through rate prediction model automatically learns the proportion of each candidate set and the final ranking.
10. The recommendation system for streaming audio content in a vehicle-mounted scene according to claim 1, wherein the online content delivery subsystem delivers content online according to the calculation results of the offline model training subsystem, in two stages, recall and ranking; recall obtains from the storage system the candidate sets calculated by the offline models of the offline model training subsystem and then calculates the proportion of each candidate set from the offline data statistics; ranking obtains the related information of the current user and the intermediate data of the offline calculation, extracts features, computes through a model the content ordering the user is most likely to like, and delivers the final result.
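The tag weighting of claim 6 and the two-stage ranking of claim 9 can be sketched together. Event tuples `(user_id, tag, kind, completion_rate, count_that_day, event_ms)`, the behavior-kind keys, the feature names, the feedback threshold `min_feedback=50`, and the pre-trained `ctr_weights` dictionary are all hypothetical. Further assumptions: the time decay uses only the exponential term, because the printed outer max(1, ·) would always evaluate to 1; elapsed milliseconds are converted to days (the printed formula divides by 24 × 3600, i.e. seconds); norm(·) is min-max scaling within each user; IDF uses a natural logarithm since no base is given; and logistic regression stands in for the unnamed click-through-rate model.

```python
import math
from collections import Counter, defaultdict

MS_PER_DAY = 24 * 3600 * 1000
# Behavior-type weights from claim 6; kinds in SCALED_BY_R are multiplied by
# the album completion rate R. The key names are hypothetical.
BASE_WEIGHT = {"subscribe": 5.0, "playlist_click": 1.4, "search": 1.3,
               "album_on_demand": 1.2, "next": 1.0, "negative": 0.1}
SCALED_BY_R = {"playlist_click", "search", "album_on_demand", "next"}


def behavior_weight(kind, completion_rate):
    w = BASE_WEIGHT[kind]
    return w * completion_rate if kind in SCALED_BY_R else w


def time_decay(now_ms, event_ms):
    # Assumption: exponential term only; the printed max(1, .) would always give 1.
    days = max(0.0, (now_ms - event_ms) / MS_PER_DAY)
    return math.exp(-0.8 * days)


def portrait_tag_weights(events, now_ms):
    """events: (user_id, tag, kind, completion_rate, count_that_day, event_ms).
    Returns {user_id: {tag: weight in [0, 1]}}."""
    behavior = defaultdict(Counter)     # sum of W_behavior * F_t * C per user and tag
    occurrences = defaultdict(Counter)  # TF numerators: tag occurrences per user
    for user_id, tag, kind, rate, count, event_ms in events:
        behavior[user_id][tag] += behavior_weight(kind, rate) * time_decay(now_ms, event_ms) * count
        occurrences[user_id][tag] += 1
    n_users = len(occurrences)
    users_with_tag = Counter(tag for tags in occurrences.values() for tag in tags)
    weights = {}
    for user_id, counts in occurrences.items():
        total = sum(counts.values())
        raw = {tag: behavior[user_id][tag] * (n / total)
               * math.log(n_users / (users_with_tag[tag] + 1))
               for tag, n in counts.items()}
        lo, hi = min(raw.values()), max(raw.values())  # assumed norm(): min-max
        weights[user_id] = {t: (v - lo) / (hi - lo) if hi > lo else 1.0
                            for t, v in raw.items()}
    return weights


def rank_candidates(candidates, tag_weights, positive_feedback, ctr_weights=None,
                    min_feedback=50):
    """Claim 9 sketch. candidates: (album_id, tags, features) with features a dict.
    Few positive feedbacks: score by summed portrait tag weights.
    Otherwise: logistic CTR score using assumed pre-trained ctr_weights."""
    if positive_feedback < min_feedback or not ctr_weights:
        def score(c):
            return sum(tag_weights.get(t, 0.0) for t in c[1])
    else:
        bias = ctr_weights.get("bias", 0.0)

        def score(c):
            z = bias + sum(ctr_weights.get(f, 0.0) * v for f, v in c[2].items())
            return 1.0 / (1.0 + math.exp(-z))
    return [c[0] for c in sorted(candidates, key=score, reverse=True)]
```

Features that flag which candidate set an album came from (for example a hypothetical `from_series` indicator) are one way for the CTR model to learn candidate set proportions, as claim 9 describes.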
CN201911235384.4A 2019-12-05 2019-12-05 Recommendation system for streaming audio content listening in vehicle-mounted scene Active CN111026906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911235384.4A CN111026906B (en) 2019-12-05 2019-12-05 Recommendation system for streaming audio content listening in vehicle-mounted scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911235384.4A CN111026906B (en) 2019-12-05 2019-12-05 Recommendation system for streaming audio content listening in vehicle-mounted scene

Publications (2)

Publication Number Publication Date
CN111026906A true CN111026906A (en) 2020-04-17
CN111026906B CN111026906B (en) 2023-12-08

Family

ID=70207681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911235384.4A Active CN111026906B (en) 2019-12-05 2019-12-05 Recommendation system for streaming audio content listening in vehicle-mounted scene

Country Status (1)

Country Link
CN (1) CN111026906B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723234A (en) * 2020-06-15 2020-09-29 中国第一汽车股份有限公司 Audio providing method, device, equipment and storage medium
CN111767430A (en) * 2020-06-30 2020-10-13 平安国际智慧城市科技股份有限公司 Video resource pushing method, video resource pushing device and storage medium
CN113535700A (en) * 2021-07-19 2021-10-22 福建凯米网络科技有限公司 User information updating method for digital audio-visual place and computer readable storage medium
CN113626539A (en) * 2021-08-13 2021-11-09 深圳墨世科技有限公司 User behavior data statistical method, server and client

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160371589A1 (en) * 2015-06-17 2016-12-22 Yahoo! Inc. Systems and methods for online content recommendation
CN106326277A (en) * 2015-06-30 2017-01-11 上海证大喜马拉雅网络科技有限公司 User behavior-based personalized audio recommendation method and system
CN106953887A (en) * 2017-01-05 2017-07-14 北京中瑞鸿程科技开发有限公司 A fine-grained personalized organization and recommendation method for radio station audio content
CN108763362A (en) * 2018-05-17 2018-11-06 浙江工业大学 Top-N movie recommendation method based on weighted fusion of local models selected via random anchor points

Also Published As

Publication number Publication date
CN111026906B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US10579646B2 (en) Systems and methods for classifying electronic documents
CN111026906B (en) Recommendation system for streaming audio content listening in vehicle-mounted scene
US11570512B2 (en) Watch-time clustering for video searches
US11989213B2 (en) Character based media analytics
US9703783B2 (en) Customized news stream utilizing dwelltime-based machine learning
CN108875022B (en) Video recommendation method and device
US7921069B2 (en) Granular data for behavioral targeting using predictive models
US9706008B2 (en) Method and system for efficient matching of user profiles with audience segments
US20130262966A1 (en) Digital content reordering method and digital content aggregator
US9760907B2 (en) Granular data for behavioral targeting
US20210311955A1 (en) Optimizing digital video distribution
CA2610038A1 (en) Providing community-based media item ratings to users
CN110717093B (en) Movie recommendation system and method based on Spark
CN102693252A (en) System and method for effectively providing entertainment recommendations to device users
US11494811B1 (en) Artificial intelligence prediction of high-value social media audience behavior for marketing campaigns
WO2010127150A2 (en) Targeting advertisements to videos predicted to develop a large audience
CN117540093A (en) User behavior analysis method and system based on big data
CN113852864A (en) User customized service recommendation method and system for IPTV terminal application
EP3114846B1 (en) Character based media analytics
CN113836422B (en) Information searching method and device
WO2014014473A1 (en) Method and system for predicting association item affinities using second order user item associations
CN117993978A (en) Internet big data information processing system
CN114254202A (en) Intelligent media recommendation system, method and storage medium based on big data
CN114282103A (en) Intelligent recommendation engine system based on user behavior habit algorithm
Langer et al. Content-based Recommendations for Radio Stations with Deep Learned Audio Fingerprints

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210616

Address after: Room 316, building 4, 2 Fuxingmenwai street, Xicheng District, Beijing

Applicant after: CCTV new media culture media (Beijing) Co.,Ltd.

Address before: 100089 0900, 9th floor, No.65, North Fourth Ring Road West, Haidian District, Beijing

Applicant before: Internet (Beijing) Technology Co.,Ltd.

CB02 Change of applicant information

Address after: Room 168, Floor 1, Building 3, No. 20 Yong'an Road, Shilong Economic Development Zone, Mentougou District, Beijing, 102308

Applicant after: Yangguang Yunting Cultural Media Co.,Ltd.

Address before: Room 316, building 4, 2 Fuxingmenwai street, Xicheng District, Beijing

Applicant before: CCTV new media culture media (Beijing) Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20231008

Address after: 201203, 2nd Floor, Building 13, No. 27 Xinjinqiao Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant after: Zhongguang Intelligent Connected Vehicle Digital Media (Shanghai) Co.,Ltd.

Address before: Room 168, Floor 1, Building 3, No. 20 Yong'an Road, Shilong Economic Development Zone, Mentougou District, Beijing, 102308

Applicant before: Yangguang Yunting Cultural Media Co.,Ltd.

GR01 Patent grant