CN108170845A - Multimedia data processing method, device and storage medium - Google Patents


Info

Publication number
CN108170845A
Authority
CN
China
Prior art keywords
pending
medium data
list
labels
multimedia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810044934.3A
Other languages
Chinese (zh)
Other versions
CN108170845B (en)
Inventor
张龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN201810044934.3A
Publication of CN108170845A
Application granted
Publication of CN108170845B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/40 Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
              • G06F16/48 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
              • G06F16/43 Querying
                • G06F16/432 Query formulation
                  • G06F16/434 Query formulation using image data, e.g. images, photos, pictures taken by a user
                • G06F16/435 Filtering based on additional data, e.g. user or group profiles
                  • G06F16/436 Filtering based on additional data, e.g. user or group profiles using biological or physiological data of a human being, e.g. blood pressure, facial expression, gestures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Physiology (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multimedia data processing method, device and storage medium. The method includes: receiving to-be-processed multimedia data and obtaining its attribute information, where the attribute information includes multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers, and each tag list labels one type of the to-be-processed multimedia data; when the multiple tag lists are consistent, generating a popularity score of the to-be-processed multimedia data from the number of multimedia items and the play counts, and generating a novelty score from the multimedia identifiers; when the popularity score is less than a preset popularity threshold and the novelty score is greater than a preset novelty threshold, obtaining user information of the to-be-processed multimedia data; and processing the to-be-processed multimedia data according to the historical multimedia data information corresponding to the user information together with the tag lists, popularity score and novelty score of the to-be-processed multimedia data. The invention improves the accuracy of multimedia data processing.

Description

Multimedia data processing method, device and storage medium
Technical field
The present invention relates to the field of multimedia technology, and in particular to a multimedia data processing method, device and storage medium.
Background
As terminals continue to spread, users can obtain more and more information through them. Taking music as an example, music is an important item of consumption in people's lives, and the popularity of terminals in particular lets users obtain music information easily.
To let users quickly obtain the music data they want, more and more music applications provide a music data push service. For example, referring to Fig. 1, which shows a prior-art music playing interface, the music application service provider offers a playlist push button a; by clicking button a, a user can push a playlist they have compiled to the music application service provider.
After receiving these playlists, the music application service provider processes them in one of two ways: either back-office staff review the playlists and then push them to other users, or a machine reviews them, checking single variables such as the playlist title text and the playlist picture.
With manual review by back-office staff, playlists easily pile up, and some high-quality playlists are mistakenly judged to be ordinary ones. With machine review, only single variables are checked at a shallow level, so the accuracy of playlist processing is too low.
Summary of the invention
Embodiments of the present invention provide a multimedia data processing method, device and storage medium that can improve the accuracy of multimedia data processing.
An embodiment of the present invention provides a multimedia data processing method, including:
receiving to-be-processed multimedia data and obtaining attribute information of the to-be-processed multimedia data, where the attribute information includes multiple tag lists, a number of multimedia items, multimedia play counts and multimedia identifiers, and each tag list labels one type of the to-be-processed multimedia data;
determining whether the multiple tag lists are consistent and, if so, generating a popularity score of the to-be-processed multimedia data according to the number of multimedia items and the multimedia play counts, and generating a novelty score of the to-be-processed multimedia data according to the multimedia identifiers;
analyzing the popularity score and the novelty score and, if the popularity score is less than a preset popularity threshold and the novelty score is greater than a preset novelty threshold, obtaining user information of the to-be-processed multimedia data; and
processing the to-be-processed multimedia data according to the historical multimedia data information corresponding to the user information and the multiple tag lists, the popularity score and the novelty score of the to-be-processed multimedia data.
An embodiment of the present invention further provides a multimedia data processing device, including:
a receiving module, configured to receive to-be-processed multimedia data and obtain attribute information of the to-be-processed multimedia data, where the attribute information includes multiple tag lists, a number of multimedia items, multimedia play counts and multimedia identifiers, and each tag list labels one type of the to-be-processed multimedia data;
a judgment module, configured to determine whether the multiple tag lists are consistent and, if so, generate a popularity score of the to-be-processed multimedia data according to the number of multimedia items and the multimedia play counts, and generate a novelty score of the to-be-processed multimedia data according to the multimedia identifiers;
an analysis module, configured to analyze the popularity score and the novelty score and, if the popularity score is less than a preset popularity threshold and the novelty score is greater than a preset novelty threshold, obtain user information of the to-be-processed multimedia data; and
a processing module, configured to process the to-be-processed multimedia data according to the historical multimedia data information corresponding to the user information and the multiple tag lists, the popularity score and the novelty score of the to-be-processed multimedia data.
An embodiment of the present invention further provides a storage medium storing processor-executable instructions; by executing the instructions, a processor performs the multimedia data processing method described above.
In the multimedia data processing method, device and storage medium of the embodiments of the present invention, the attribute information of the to-be-processed multimedia data is obtained first; it includes multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers. The consistency of the tag lists is then analyzed; if they are consistent, a popularity score is generated from the number of items and the play counts, and a novelty score is generated from the multimedia identifiers. When the popularity score is less than the preset popularity threshold and the novelty score is greater than the preset novelty threshold, the historical multimedia data information corresponding to the user information of the to-be-processed multimedia data is obtained. Finally, the to-be-processed multimedia data is processed according to this historical information, the tag lists, the popularity score and the novelty score. This scheme not only checks the consistency of multiple detection variables of the to-be-processed multimedia data but also analyzes the data in depth, which effectively improves the accuracy of processing it.
Brief description of the drawings
The technical solutions and other beneficial effects of the present invention will become apparent from the following detailed description of specific embodiments with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an existing music playing interface.
Fig. 2 is a schematic diagram of a scenario of the multimedia data processing method provided in an embodiment of the present invention.
Fig. 3 is a schematic flowchart of the multimedia data processing method provided in an embodiment of the present invention.
Fig. 4 is a schematic diagram of forming a tag list provided in an embodiment of the present invention.
Fig. 5 is a schematic diagram of the mapping relationship between tags and subject terms provided in an embodiment of the present invention.
Fig. 6 is another schematic diagram of forming tag lists provided in an embodiment of the present invention.
Fig. 7 is another schematic flowchart of the multimedia data processing method provided in an embodiment of the present invention.
Fig. 8 is a schematic diagram of a scenario of processing a to-be-processed picture provided in an embodiment of the present invention.
Fig. 9 is yet another schematic diagram of forming a tag list provided in an embodiment of the present invention.
Fig. 10 is a structural diagram of the multimedia data processing device provided in an embodiment of the present invention.
Fig. 11 is a structural diagram of the receiving module provided in an embodiment of the present invention.
Fig. 12 is a structural diagram of the judgment module provided in an embodiment of the present invention.
Fig. 13 is a structural diagram of the processing module provided in an embodiment of the present invention.
Fig. 14 is a structural diagram of the electronic device provided in an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Referring to Fig. 2, which is a schematic diagram of a scenario of the multimedia data processing method provided in an embodiment of the present invention: in this scenario, the multimedia data processing device may be implemented as a standalone entity, or integrated in an electronic device such as a terminal or a server; the terminal may include a smartphone, a tablet computer, a personal computer and the like.
As shown in Fig. 2, the scenario may include a terminal a, a server b and a terminal c, where terminals a and c may be smartphones, personal computers and the like. The author of the to-be-processed multimedia data can use terminal a to upload it to server b. Server b first receives the to-be-processed multimedia data and obtains its attribute information, which includes multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers; each tag list labels one type of the to-be-processed multimedia data. Server b then determines whether the tag lists are consistent and, if they are, generates a popularity score of the data from the number of items and the play counts, and a novelty score from the multimedia identifiers. It then analyzes the two scores, and when the popularity score is less than the preset popularity threshold and the novelty score is greater than the preset novelty threshold, it obtains the user information of the to-be-processed multimedia data. Finally, it processes the data, for example by deleting or retaining it, according to the historical multimedia data information corresponding to the user information together with the tag lists, popularity score and novelty score of the data. Further, server b may send the retained multimedia data to terminal c, where professional staff use terminal c to process it further.
Embodiments of the present invention provide a multimedia data processing method, device and storage medium, each of which is described in detail below.
This embodiment is described from the perspective of the multimedia data processing device, which may specifically be integrated in an electronic device.
The multimedia data processing method includes: receiving to-be-processed multimedia data and obtaining its attribute information, where the attribute information includes multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers, and each tag list labels one type of the to-be-processed multimedia data; determining whether the tag lists are consistent and, if so, generating the popularity score of the data from the number of items and the play counts, and the novelty score from the multimedia identifiers; analyzing the popularity score and the novelty score and, if the popularity score is less than the preset popularity threshold and the novelty score is greater than the preset novelty threshold, obtaining the user information of the data; and processing the data according to the historical multimedia data information corresponding to the user information together with its tag lists, popularity score and novelty score.
Referring to Fig. 3, which is a flowchart of the multimedia data processing method provided in an embodiment of the present invention, the method may include:
Step S101: Receive the to-be-processed multimedia data and obtain its attribute information, where the attribute information includes multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers.
By file type, multimedia data can be divided into types such as audio, video, pictures and text. As shown in Fig. 4, the multimedia data takes the form of a playlist containing several songs. The playlist includes not only audio, video and pictures but also text such as the playlist title and the playlist description. The attribute information of multimedia data marks the essential features or characteristics of the various multimedia data, for example a tag marking the style of an audio track, or the size of the audio. In this preferred embodiment, the obtained attribute information of the to-be-processed multimedia data includes the multiple tag lists, the number of multimedia items, the multimedia play counts, the multimedia identifiers and so on.
In this embodiment of the present invention, each type of to-be-processed multimedia data corresponds to one tag list; that is, a tag list labels one type of the to-be-processed multimedia data, and each tag list contains one or more tags. For example, for the playlist in Fig. 4, the tags "Cantonese" and "classic" can be extracted from the playlist title "Revisiting great Cantonese classics", and the tags "melancholy", "vicissitudes" and "classic" from the playlist description; these tags form the tag list {Cantonese, classic, melancholy, vicissitudes, classic}, which labels the text in the playlist. Similarly, analyzing the audio track "Friends" yields the tags "Cantonese" and "pop", and analyzing the audio track "Half Moon Serenade" also yields "Cantonese" and "pop". All audio tracks in the playlist are analyzed in turn to obtain all their tags, which then form the tag list {Cantonese, pop} labeling the audio in the playlist.
Step S102: Determine whether the multiple tag lists are consistent; if they are, generate the popularity score of the to-be-processed multimedia data from the number of multimedia items and the multimedia play counts, and generate its novelty score from the multimedia identifiers.
In a specific implementation, whether the multiple tag lists are consistent is determined as follows:
Based on a preset mapping relationship, map each tag in the tag lists to the corresponding subject term in a preset subject-term library, so that each tag list forms a corresponding subject-term list.
Determine whether every pair of subject-term lists shares at least one subject term.
If every pair of subject-term lists shares at least one subject term, determine that the multiple tag lists are consistent.
Assume the preset subject-term library is {healing, ACG (Animation Comic Game), exciting}. The mapping between tags and the subject terms in the library is shown in Fig. 5: different tags surround their corresponding subject terms; for example, the subject term "healing" is mapped from tags such as "warm", "instrumental" and "lonely". As shown in Fig. 6, the to-be-processed multimedia data includes four file types: to-be-processed text, pictures, audio and video. If the tag list a1 of the text is {fresh, love song, epic}, the corresponding subject-term list b1 is {healing, exciting}. If the tag list a2 of the picture is {animation, warm}, the subject-term list b2 is {ACG, healing}. If the tag list a3 of the audio is {instrumental, lyrical, passion}, the subject-term list b3 is {healing, exciting}. If the tag list a4 of the video is {instrumental, love song}, the subject-term list b4 is {healing}.
The subject-term lists b1, b2, b3 and b4 are compared pairwise. If every pair shares a subject term, the tag lists a1, a2, a3 and a4 are consistent; if any two subject-term lists share no subject term, the tag lists are inconsistent. Since b1, b2, b3 and b4 all contain the subject term "healing", tag lists a1, a2, a3 and a4 are consistent, which also shows that the styles of the various kinds of data in the to-be-processed multimedia data are unified.
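The consistency test above can be sketched as follows. The entries "warm", "instrumental" and "lonely" mapping to "healing" come from the text; the other mapping entries are assumptions chosen only so that the Fig. 6 example (a1 to a4 giving b1 to b4) is reproduced, and ignoring unmapped tags is also an assumption. The check is the pairwise one described in the steps: every pair of subject-term lists must share at least one term.

```python
from itertools import combinations
from typing import Iterable, Mapping, Sequence, Set

# Tag -> subject-term mapping. Only "warm", "instrumental" and "lonely" are
# from the text; the remaining entries are assumed to match the Fig. 6 example.
TAG_TO_SUBJECT: Mapping[str, str] = {
    "warm": "healing", "instrumental": "healing", "lonely": "healing",
    "fresh": "healing", "love song": "healing", "lyrical": "healing",
    "epic": "exciting", "passion": "exciting",
    "animation": "ACG",
}


def to_subject_terms(tags: Iterable[str],
                     mapping: Mapping[str, str] = TAG_TO_SUBJECT) -> Set[str]:
    """Map one tag list to its subject-term list (unmapped tags are skipped)."""
    return {mapping[t] for t in tags if t in mapping}


def tag_lists_consistent(tag_lists: Sequence[Iterable[str]],
                         mapping: Mapping[str, str] = TAG_TO_SUBJECT) -> bool:
    """True if every pair of subject-term lists shares at least one term."""
    subject_lists = [to_subject_terms(tags, mapping) for tags in tag_lists]
    # An empty intersection is falsy, so all() fails on the first disjoint pair.
    return all(a & b for a, b in combinations(subject_lists, 2))


if __name__ == "__main__":
    a1 = ["fresh", "love song", "epic"]        # text
    a2 = ["animation", "warm"]                 # picture
    a3 = ["instrumental", "lyrical", "passion"]  # audio
    a4 = ["instrumental", "love song"]         # video
    print(tag_lists_consistent([a1, a2, a3, a4]))
```

Checking pairs rather than a single term common to all lists follows the steps as written; the two coincide in this example because every list contains "healing".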
If the tag lists are inconsistent, the styles of the different file types in the to-be-processed multimedia data are not unified, so the data can be moved to another storage device or deleted directly. If the tag lists are consistent, the popularity score and novelty score of the to-be-processed multimedia data can be checked next.
The popularity score of the to-be-processed multimedia data can be generated from the number of multimedia items and the multimedia play counts as follows:
Generate the average multimedia play count from the number of multimedia items and their play counts.
Determine the popularity of the to-be-processed multimedia data from the average play count.
Specifically, the average play count h can be calculated according to the following formula:
$$h = \frac{1}{N}\sum_{i=1}^{N} P_i$$
where P_i is the play count of the i-th multimedia item and N is the number of multimedia items, N being a positive integer. The popularity can be determined from the play-count grade into which the average play count falls. For example, if the multimedia data contains several songs, multimedia data whose average song play count exceeds 1,000,000 plays/month can be assigned the first play-count grade, corresponding to the first popularity grade, and multimedia data whose average song play count is between 500,000 and 1,000,000 plays/month can be assigned the second play-count grade, corresponding to the second popularity grade.
Suppose the to-be-processed multimedia data includes three songs with play counts of 1,000,000, 500,000 and 300,000 plays/month. The formula gives an average song play count of (1,000,000 + 500,000 + 300,000) / 3 = 600,000 plays/month, which falls in the second play-count grade; that is, the to-be-processed multimedia data is in the second popularity grade.
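A short sketch of the average-play-count formula and the two-grade example. The text does not say which grade an average of exactly 1,000,000 falls into, nor what happens below 500,000, so the boundary handling and the `None` result for lower averages are assumptions; the function names are hypothetical.

```python
from typing import Optional, Sequence


def average_play_count(play_counts: Sequence[float]) -> float:
    """h = (1/N) * sum of P_i over the N multimedia items."""
    if not play_counts:
        raise ValueError("N must be a positive integer (at least one item)")
    return sum(play_counts) / len(play_counts)


def popularity_grade(h: float) -> Optional[int]:
    """Map an average monthly play count to the two grades in the text.

    Assumed boundaries: above 1,000,000 -> grade 1; 500,000 to 1,000,000
    inclusive -> grade 2; anything lower -> None (no grade is defined).
    """
    if h > 1_000_000:
        return 1
    if h >= 500_000:
        return 2
    return None


if __name__ == "__main__":
    h = average_play_count([1_000_000, 500_000, 300_000])
    print(h, popularity_grade(h))  # 600000.0 2
```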
In some embodiments, the popularity score hot_score can also be calculated directly according to the following formula:
Based on statistics of the average play counts of existing multimedia data, expressing P_i in millions of plays per month keeps hot_score within [0, 10].
Statistical analysis of a large amount of multimedia data shows that when the hot_score of to-be-processed multimedia data is less than 0.69, its average play count P is less than 5 plays/month. Because such data gives the multimedia data processing device too little prior knowledge to judge it, it can be retained for the next stage of detection. For to-be-processed multimedia data whose hot_score is greater than 6, the average play count P is 1,000,000 plays/month or more; statistics show that such data tends simply to collect popular videos and audio, often with no clear theme and an inconsistent style, so it can be deleted.
In summary, the upper popularity value 6 can be used as the preset popularity threshold, although the preset threshold is not specifically limited here. When the popularity score is not less than this threshold, the to-be-processed multimedia data can be deleted or moved to another storage device.
In a specific implementation, the novelty score of the to-be-processed multimedia data can also be generated from the multimedia identifiers, as follows:
Obtain the multimedia identifiers of the retained multimedia data.
Determine the novelty score of the to-be-processed multimedia data from the multimedia identifiers of the retained multimedia data and those of the to-be-processed multimedia data.
Specifically, the novelty score N can be calculated according to the following formula:
where S is the to-be-processed multimedia data and S_i is the multimedia identifier of the i-th item in it, with 0 < i ≤ n, i a positive integer, and n the number of items in the to-be-processed multimedia data; K is the retained multimedia data and K_j is the multimedia identifier of the j-th item in it, with 0 < j ≤ m, j a positive integer, and m the number of items in the retained multimedia data.
Preferably, a preset novelty threshold can be set. When the novelty score is greater than this threshold, the similarity between the to-be-processed multimedia data and the retained multimedia data is low. When it is not greater than the threshold, the similarity is high; to keep the multimedia data diverse, the to-be-processed multimedia data can then be deleted or moved to another storage device.
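The novelty formula itself is not reproduced in this text, so the following is only an illustrative stand-in, not the patent's formula: it takes the novelty as the share of the n pending identifiers S_i that do not occur among the m retained identifiers K_j, which at least moves in the direction described (less overlap, higher novelty). The threshold value 0.5 is hypothetical, since no value is given.

```python
from typing import Iterable, Sequence

NOVELTY_THRESHOLD = 0.5  # hypothetical: the text gives no value


def novelty_score(pending_ids: Sequence[str], retained_ids: Iterable[str]) -> float:
    """Hypothetical stand-in: share of pending identifiers S_i not in the retained set K."""
    if not pending_ids:
        raise ValueError("the pending data must contain at least one item (n > 0)")
    retained = set(retained_ids)
    unseen = sum(1 for s in pending_ids if s not in retained)
    return unseen / len(pending_ids)


def is_novel(score: float, threshold: float = NOVELTY_THRESHOLD) -> bool:
    # Strictly greater than, matching "greater than the preset novelty threshold".
    return score > threshold


if __name__ == "__main__":
    score = novelty_score(["song-a", "song-b", "song-c", "song-d"], ["song-a", "song-x"])
    print(score, is_novel(score))  # 0.75 True
```

Any overlap measure between the two identifier sets (for example a Jaccard-style ratio) could be substituted here; the choice above is for illustration only.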
Step S103: Analyze the popularity score and the novelty score; if the popularity score is less than the preset popularity threshold and the novelty score is greater than the preset novelty threshold, obtain the user information of the to-be-processed multimedia data.
If the popularity score is less than the preset popularity threshold and the novelty score is greater than the preset novelty threshold, the to-be-processed multimedia data is not hot data and has low similarity to the retained multimedia data, so its user information can be obtained for further analysis. The user information includes the author of the to-be-processed multimedia data; for the playlist shown in Fig. 4, for example, the author is "a goldfish Ji", which is the user information corresponding to that playlist.
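Taken together, the checks in steps S102 and S103 act as a screening gate in front of step S104. A minimal sketch, assuming the consistency result, popularity score and novelty score have already been computed. The popularity threshold 6 is the example value from the text (which leaves it "not specifically limited"); the novelty threshold 0.5 and the `Decision` names are hypothetical.

```python
from enum import Enum

POPULARITY_THRESHOLD = 6.0  # example value from the text
NOVELTY_THRESHOLD = 0.5     # hypothetical: the text gives no value


class Decision(Enum):
    DIVERT_INCONSISTENT = "tag lists inconsistent: delete or move to other storage"
    DIVERT_POPULAR = "popularity not below threshold: delete or move"
    DIVERT_SIMILAR = "novelty not above threshold: delete or move"
    FETCH_USER_INFO = "obtain user information and continue to step S104"


def screen(tags_consistent: bool, popularity: float, novelty: float) -> Decision:
    """Apply the three checks in the order the text describes them."""
    if not tags_consistent:
        return Decision.DIVERT_INCONSISTENT
    if popularity >= POPULARITY_THRESHOLD:
        return Decision.DIVERT_POPULAR
    if novelty <= NOVELTY_THRESHOLD:
        return Decision.DIVERT_SIMILAR
    return Decision.FETCH_USER_INFO


if __name__ == "__main__":
    print(screen(True, 0.5, 0.9).name)  # FETCH_USER_INFO
```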
Step S104: Process the to-be-processed multimedia data according to the historical multimedia data information corresponding to the user information and the multiple tag lists, the popularity score and the novelty score of the to-be-processed multimedia data.
The historical multimedia data information includes information such as the scores of the historical multimedia data and the number of historical multimedia data items. Specifically, the retained multimedia data can be used as the training set, with the historical multimedia data information and the tag lists, popularity score and novelty score of the to-be-processed multimedia data as feature values, to train a logistic regression model that judges whether the to-be-processed multimedia data meets the requirements.
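To make the logistic regression step concrete, here is a small dependency-free sketch trained by batch gradient descent on the mean log-loss. The feature layout ([user's mean historical score, hot_score / 10, novelty]) and the six training rows are made-up toy values for illustration only. How the tag lists would be encoded as features is not specified in the text, so they are left out. In practice a library implementation such as scikit-learn's `LogisticRegression` would normally be used instead.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Branch on the sign of z so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000
                              ) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Probability that an item meets the requirements."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy, made-up training rows: [mean historical score, hot_score / 10, novelty].
X_toy = [[0.9, 0.10, 0.8], [0.8, 0.20, 0.9], [0.7, 0.05, 0.7],
         [0.2, 0.50, 0.3], [0.3, 0.40, 0.2], [0.1, 0.60, 0.4]]
y_toy = [1, 1, 1, 0, 0, 0]  # 1 = retained (meets requirements)

if __name__ == "__main__":
    w, b = train_logistic_regression(X_toy, y_toy)
    print(predict_proba(w, b, [0.85, 0.1, 0.85]) > 0.5)
```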
As can be seen from the above, the multimedia data processing method provided in this embodiment of the present invention first obtains the attribute information of the to-be-processed multimedia data, including multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers. It then analyzes the consistency of the tag lists and, if they are consistent, generates a popularity score from the number of items and the play counts and a novelty score from the multimedia identifiers. When the popularity score is less than the preset popularity threshold and the novelty score is greater than the preset novelty threshold, it obtains the historical multimedia data information corresponding to the user information of the to-be-processed multimedia data. Finally, it processes the to-be-processed multimedia data according to this historical information, the tag lists, the popularity score and the novelty score. This scheme not only checks the consistency of multiple detection variables of the to-be-processed multimedia data but also analyzes the data in depth, which effectively improves the accuracy of processing it.
The multimedia data processing method described in the above embodiment is further illustrated below with an example. This embodiment is described from the perspective of the multimedia data processing device, which may specifically be integrated in an electronic device.
Referring to Fig. 7, which is another flowchart of the multimedia data processing method provided in an embodiment of the present invention, the method may include:
Step S201: Receive the to-be-processed multimedia data and obtain its attribute information, where the attribute information includes multiple tag lists, the number of multimedia items, multimedia play counts and multimedia identifiers.
As shown in Fig. 8, when the to-be-processed multimedia data includes a to-be-processed picture, obtaining the attribute information in step S201 includes: extracting the noise value, blur and exposure of the picture; computing a score for the picture according to a preset formula; determining whether the score is less than a preset score threshold; and, if it is not, extracting the tags of the picture to form the tag list.
Specifically, the following preset formula may be used:
$$\text{score} = 1 - 0.8\,\text{blur} - 0.1\,\text{noise} - 0.1\,\operatorname{abs}(\text{exposure})$$
Here, score is the quality score, blur is the degree of blur, noise is the noise value, exposure is the exposure value, and abs() is the absolute-value function. If the score is below the preset score threshold, the picture quality is substandard, so the pending multimedia data may be deleted or moved to another storage device. If the score is not below the preset score threshold, the pending picture can be analyzed further to obtain its tags and form the tag list.
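The scoring rule and threshold check can be sketched as two small functions. The formula is the one given above; the assumed value ranges (blur and noise normalized to [0, 1], exposure as a signed deviation) and the threshold of 0.6 are hypothetical, since the text only refers to a "preset score threshold".

```python
def picture_quality_score(blur: float, noise: float, exposure: float) -> float:
    """score = 1 - 0.8*blur - 0.1*noise - 0.1*abs(exposure), as in the text.

    Assumes blur and noise are normalized to [0, 1] and exposure is a signed
    deviation from correct exposure (an assumption; no ranges are given).
    """
    return 1 - 0.8 * blur - 0.1 * noise - 0.1 * abs(exposure)


def passes_quality_check(blur: float, noise: float, exposure: float,
                         threshold: float = 0.6) -> bool:
    """True if the picture should go on to tag extraction.

    "Not below the threshold" means >=; the default 0.6 is hypothetical.
    """
    return picture_quality_score(blur, noise, exposure) >= threshold
```

Because blur carries a weight of 0.8, a moderately blurry picture (blur 0.5) already loses 0.4 points, while noise and exposure only fine-tune the score.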
Specifically, network models such as a CNN (convolutional neural network), a DBN (deep belief network), an RNN (recurrent neural network), or a recursive neural tensor network may be used to perform image recognition and face detection and thereby extract the tags of the pending picture. For example, analyzing the picture in Fig. 8 yields emotion parameter values for the person in the picture of 0.012 for negative and 0.988 for positive, so the picture can be tagged "joy".
In some embodiments, the extracted tags of the pending picture can also be analyzed to filter out pending multimedia data containing inappropriate content. For example, a preset tag list containing undesirable tags such as "violence" and "bloody" is set up. When the tag list of the pending picture shares a tag with this preset list, the pending multimedia data is regarded as containing inappropriate content and can be deleted. Note that in this embodiment the tags of the pending picture may also be extracted and analyzed first, with the noise value, blur, and exposure extracted afterwards to assess picture quality.
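The inappropriate-content filter amounts to a set-intersection test. A minimal sketch, assuming tags are plain strings; the blacklist contents beyond the two examples named in the text are up to the implementer.

```python
# Hypothetical blacklist; the text names "violence" and "bloody" as examples.
UNDESIRABLE_TAGS = frozenset({"violence", "bloody"})


def has_inappropriate_content(picture_tags, blacklist=UNDESIRABLE_TAGS) -> bool:
    """True if the picture's tag list shares at least one tag with the blacklist."""
    return not set(picture_tags).isdisjoint(blacklist)
```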
As shown in Fig. 9, when the pending multimedia data includes pending audio, obtaining the attribute information of the pending multimedia data in step S201 includes: obtaining the multiple preset tags corresponding to the pending audio, and clustering these preset tags to obtain cluster labels that form the tag list.
Specifically, the K-means method may be used to cluster the preset tags into K cluster labels that form the tag list, where K is a positive integer. As shown in Fig. 9, in the pending multimedia data "My Playlist", each pending audio track has one or more preset tags. These preset tags are first collected and then clustered, yielding two cluster labels, "animation" and "healing", which form the tag list {animation, healing} for the pending audio.
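K-means needs numeric points, and the text does not say how tags are turned into vectors. The sketch below therefore assumes a hypothetical tag-to-vector mapping (for example, tag embeddings), uses a deterministic farthest-first initialization instead of the usual random or k-means++ seeding so that results are reproducible, and names each cluster after its most frequent tag, which is also an assumption.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain K-means with farthest-first initialization; returns cluster indices."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def cluster_tags(tag_occurrences, embeddings, k):
    """Cluster the preset tags of all audio tracks into k cluster labels.

    tag_occurrences: every preset tag of every track (repeats allowed).
    embeddings: hypothetical mapping from tag to coordinate tuple.
    """
    counts = Counter(tag_occurrences)
    tags = sorted(counts)
    assignment = kmeans([embeddings[t] for t in tags], k)
    labels = []
    for c in range(k):
        members = [t for t, a in zip(tags, assignment) if a == c]
        if members:
            # Name the cluster after its most frequent tag (ties broken by name).
            labels.append(max(members, key=lambda t: (counts[t], t)))
    return labels
```

With toy two-dimensional vectors that place "animation", "anime", and "comic" near one another and "healing", "soothing", and "warm" near one another, `cluster_tags(..., k=2)` returns the two labels "animation" and "healing", mirroring the Fig. 9 example.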
When the pending multimedia data includes pending text, obtaining the attribute information of the pending multimedia data in step S201 includes: based on a preset phrase template, extracting noun phrases as tags from pending text within a preset character-count range to form the tag list; and/or, based on the TextRank algorithm, extracting tags from pending text beyond the preset character-count range to form the tag list.
The TextRank algorithm generates keywords and summaries for text. Specifically, assume the preset character-count range is 0 to 10. In the playlist shown in Fig. 4, the playlist title "Revisiting classic Cantonese hits" falls within this range, so the keyword "Cantonese" can be extracted from the title as a tag based on the preset phrase template. The description text of the playlist, by contrast, exceeds the preset range, so the keyword "classic" can be extracted from it based on the TextRank algorithm. These tags then form the tag list {Cantonese, classic}.
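The length-based routing can be sketched as follows. The preset phrase template is not specified, so a hypothetical lexicon of known noun phrases stands in for it. The TextRank part is a bare-bones version (PageRank over a word co-occurrence graph) working on lowercase English tokens with a tiny illustrative stopword list. The 10-character limit in the example refers to Chinese characters, so `max_chars` is left as a parameter.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = frozenset({"a", "an", "and", "of", "the", "to", "with", "in", "for"})


def tokenize(text):
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def textrank_keywords(text, top_k=3, window=2, damping=0.85, iterations=50):
    """Rank words with TextRank: PageRank over a word co-occurrence graph."""
    words = tokenize(text)
    graph = defaultdict(set)
    # Link words that co-occur within `window` consecutive tokens.
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != word:
                graph[word].add(words[j])
                graph[words[j]].add(word)
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(graph[u]) for u in graph[w])
            for w in graph
        }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


def extract_text_tags(text, max_chars=10, phrase_lexicon=frozenset(), top_k=1):
    """Route short text to a phrase lookup and long text to TextRank.

    phrase_lexicon is a hypothetical stand-in for the preset phrase template.
    """
    if len(text) <= max_chars:
        return [w for w in tokenize(text) if w in phrase_lexicon]
    return textrank_keywords(text, top_k=top_k)
```

A word that co-occurs with many different neighbours, such as "classic" in a description that keeps returning to it, accumulates the highest rank and becomes the tag.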
Step S202: based on a preset mapping relationship, map each tag in a tag list to the corresponding topic word in a preset topic word library, so that each tag list forms a corresponding topic word list.
Assume the preset topic word library is {healing, ACG (Animation, Comic, Game), exciting}. The mapping between tags and the topic words in the library is shown in Fig. 5, where different tags surround their corresponding topic word; for example, the topic word "healing" is mapped from tags such as "warm", "instrumental", and "lonely". As shown in Fig. 6, the pending multimedia data includes multimedia data of four file types: pending text, pending pictures, pending audio, and pending video. If the tag list a1 of the pending text is {fresh, love song, epic}, the corresponding topic word list b1 is {healing, exciting}. If the tag list a2 of the pending picture is {animation, warm}, the corresponding topic word list b2 is {ACG, healing}. If the tag list a3 of the pending audio is {instrumental, lyrical, passionate}, the corresponding topic word list b3 is {healing, exciting}. If the tag list a4 of the pending video is {instrumental, love song}, the corresponding topic word list b4 is {healing}.
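The mapping step is a dictionary lookup per tag. The full preset mapping of Fig. 5 is not given in the text, so the table below is a hypothetical one chosen only to be consistent with the a1 to a4 and b1 to b4 example; ignoring unmapped tags is likewise an assumption.

```python
# Hypothetical tag -> topic word mapping, consistent with the example above.
TAG_TO_TOPIC = {
    "fresh": "healing",
    "love song": "healing",
    "warm": "healing",
    "instrumental": "healing",
    "lonely": "healing",
    "lyrical": "healing",
    "animation": "ACG",
    "epic": "exciting",
    "passionate": "exciting",
}


def to_topic_words(tag_list, mapping=TAG_TO_TOPIC):
    """Map a tag list to its set of topic words; unmapped tags are ignored."""
    return {mapping[tag] for tag in tag_list if tag in mapping}
```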
Step S203: determine whether every pair of topic word lists shares at least one topic word.
Topic word lists b1, b2, b3, and b4 are compared pairwise. If every pair shares at least one topic word, tag lists a1, a2, a3, and a4 are determined to be consistent and the method proceeds to step S204. If any two topic word lists share no topic word, tag lists a1, a2, a3, and a4 are determined to be inconsistent and the method proceeds to step S211.
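The pairwise comparison of step S203 can be written with `itertools.combinations`. A minimal sketch, assuming each topic word list is given as a set of strings:

```python
from itertools import combinations


def tag_lists_consistent(topic_word_lists):
    """True if every pair of topic word lists shares at least one topic word."""
    return all(set(a) & set(b) for a, b in combinations(topic_word_lists, 2))


# b1 to b4 from the example all contain "healing", so they are consistent.
b = [{"healing", "exciting"}, {"ACG", "healing"}, {"healing", "exciting"}, {"healing"}]
print(tag_lists_consistent(b))
```

Note that a pairwise test is weaker than requiring one topic word common to all lists: {a, b}, {b, c}, {c, a} pass it even though no single word appears in all three. The example happens to satisfy both conditions.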
Step S204: if every pair of topic word lists shares a topic word, generate the popularity score of the pending multimedia data from the number of multimedia items and their play counts, and generate its novelty score from the multimedia identifiers.
Because topic word lists b1, b2, b3, and b4 all contain the topic word "healing", tag lists a1, a2, a3, and a4 are determined to be consistent. This also indicates that the various types of multimedia data within the pending multimedia data share a unified style, so the popularity and novelty of the pending multimedia data are examined next.
In a specific implementation, the popularity score of the pending multimedia data can be generated from the number of multimedia items and their play counts as follows:
Generate the average play count of the multimedia items from the number of items and their play counts.
Determine the popularity score of the pending multimedia data from the average play count.
Specifically, the average play count h can be calculated as follows:
$$h = \frac{1}{N}\sum_{i=1}^{N} P_i$$
Here, P_i denotes the play count of the i-th multimedia item and N denotes the number of multimedia items, N being a positive integer. The popularity score can be determined from the play-count grade into which the average play count falls. For example, if the multimedia data contains songs, multimedia data whose average song play count exceeds 1,000,000 plays per month may be assigned the first play-count grade, corresponding to the first popularity grade, and song multimedia data whose average play count is between 500,000 and 1,000,000 plays per month may be assigned the second play-count grade, corresponding to the second popularity grade.
Suppose the pending multimedia data contains three songs with play counts of 1,000,000, 500,000, and 300,000 plays per month. The formula above gives an average song play count of 600,000 plays per month, which falls in the second play-count grade; that is, the pending multimedia data is in the second popularity grade.
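The average and the grade lookup can be checked against the three-song example. The grade boundaries follow the song example in the text; how boundary values (exactly 1,000,000) are treated and the absence of lower grades are assumptions of this sketch.

```python
def average_play_count(play_counts):
    """h = (P_1 + ... + P_N) / N over the N items in the pending data."""
    if not play_counts:
        raise ValueError("pending multimedia data contains no items")
    return sum(play_counts) / len(play_counts)


def popularity_grade(avg_monthly_plays):
    """1 above 1,000,000 plays/month, 2 for 500,000 to 1,000,000, else None.

    Lower grades are not specified in the text, so None is returned.
    """
    if avg_monthly_plays > 1_000_000:
        return 1
    if avg_monthly_plays >= 500_000:
        return 2
    return None


h = average_play_count([1_000_000, 500_000, 300_000])
print(h, popularity_grade(h))
```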
In some embodiments, the popularity score hot_score can also be calculated directly from the play counts P_i by a formula. Statistics on the average play counts of existing multimedia data show that, when P_i is expressed in millions of plays per month, this formula keeps hot_score within [0, 10].
Statistical analysis of a large amount of multimedia data shows that when the hot_score of pending multimedia data is below 0.69, the corresponding average play count P is below 5 plays per month. Such pending multimedia data leaves the multimedia data processing apparatus without enough prior knowledge to make a judgment, so it can be retained for the next detection step. For pending multimedia data with a hot_score above 6, the average play count P is 1,000,000 plays per month or more. Statistics show that such data usually just aggregates popular videos and popular audio, often with an unclear theme and an inconsistent style, so it can be deleted.
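The exact hot_score formula is not given in this text. Purely as an illustrative assumption, a base-10 logarithm of the average monthly play count roughly reproduces the two figures quoted (log10(5) is about 0.699, and log10(1,000,000) is exactly 6), so the sketch below uses it to show how the two cut-offs divide pending data into three groups. It should not be read as the formula of the embodiment.

```python
import math

LOW_PRIOR_CUTOFF = 0.69  # below this, too little prior knowledge (from the text)
POPULARITY_CAP = 6.0     # at or above this, treated as a hit compilation (from the text)


def hypothetical_hot_score(avg_monthly_plays):
    """ASSUMPTION: log10 of the average monthly play count, floored at 0."""
    return math.log10(max(avg_monthly_plays, 1.0))


def popularity_bucket(hot_score):
    """Classify a popularity score using the two cut-offs quoted in the text."""
    if hot_score < LOW_PRIOR_CUTOFF:
        return "defer"    # keep for the next detection step
    if hot_score >= POPULARITY_CAP:
        return "discard"  # delete or move to other storage (step S211)
    return "normal"
```

The `>=` at the cap follows step S205, where a score greater than or equal to the preset popularity threshold leads to deletion.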
In a specific implementation, the novelty score of the pending multimedia data can also be generated from the multimedia identifiers, as follows:
Obtain the multimedia identifiers of the retained multimedia data.
Determine the novelty score of the pending multimedia data from the multimedia identifiers of the retained multimedia data and those of the pending multimedia data.
Specifically, the novelty score N can be calculated by a formula over the following quantities. S is the pending multimedia data, and S_i is the multimedia identifier of the i-th multimedia item in it, where 0 < i ≤ n, i is a positive integer, and n is the number of multimedia items in the pending multimedia data. K is the retained multimedia data, and K_j is the multimedia identifier of the j-th multimedia item in it, where 0 < j ≤ m, j is a positive integer, and m is the number of multimedia items in the retained multimedia data.
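The text defines N over the identifiers S_i and K_j without spelling out the formula here. As an illustrative stand-in only, the sketch below measures novelty as the share of pending identifiers that do not already occur among the retained identifiers, so 1.0 means entirely new items and 0.0 means complete overlap. The threshold value is hypothetical.

```python
def hypothetical_novelty(pending_ids, retained_ids):
    """ASSUMPTION: fraction of pending identifiers S_i not found among K_j."""
    pending = set(pending_ids)
    if not pending:
        return 0.0
    return len(pending - set(retained_ids)) / len(pending)


def is_novel_enough(novelty, threshold=0.5):
    # Strictly greater, as in step S205; 0.5 is a hypothetical threshold.
    return novelty > threshold
```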
Step S205: determine whether the popularity score is below the preset popularity threshold and whether the novelty score is above the preset novelty threshold.
In summary, the popularity upper limit of 6 can be used as the preset popularity threshold; note that this embodiment does not specifically limit the preset popularity threshold. When the popularity score is greater than or equal to this threshold, the method can proceed to step S211 to delete the pending multimedia data or move it to another storage device.
A preset novelty threshold can also be set. A novelty score above this threshold indicates low similarity between the pending multimedia data and the retained multimedia data; a score less than or equal to it indicates high similarity. In the latter case, to increase the diversity of the multimedia data, the method can proceed to step S211 to delete the pending multimedia data.
If the popularity score is below the preset popularity threshold and the novelty score is above the preset novelty threshold, the pending multimedia data is neither overly popular nor highly similar to the retained multimedia data, so the method can proceed to step S206 to obtain and analyze the user information of the pending multimedia data.
Step S206: if the popularity score is below the preset popularity threshold and the novelty score is above the preset novelty threshold, obtain the user information of the pending multimedia data.
The user information includes the author of the pending multimedia data. For the playlist shown in Fig. 4, the author "Goldfish Ji" is the user information corresponding to that playlist.
Step S207: obtain the historical multimedia data information corresponding to the user information.
The quality of the pending multimedia data can be predicted from statistics on the historical multimedia data information corresponding to the user information. Specifically, the historical multimedia data rating and the amount of historical multimedia data corresponding to the user information can be obtained and analyzed. If the historical rating is below a preset multimedia data rating and the amount of historical multimedia data exceeds a preset quantity, the pending multimedia data is likely to be of poor quality and can be deleted or moved to other storage.
Take a user who edits and submits playlists as an example. If the user's average monthly submission volume is twice that of other users, but none of the playlists the user has submitted has been retained, the user's pending playlist can be predicted to be of low quality, so the playlist submitted by the user can be deleted or moved to another storage device.
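The rule of step S207 can be sketched as a predicate over a user's history. The `UserHistory` record, its field names, and both default thresholds are hypothetical; the text only requires a rating below a preset rating and a count above a preset quantity.

```python
from dataclasses import dataclass


@dataclass
class UserHistory:
    avg_rating: float  # average rating of the user's earlier multimedia data
    item_count: int    # number of items the user has submitted before


def predicted_low_quality(history: UserHistory,
                          min_rating: float = 3.0,
                          max_items: int = 20) -> bool:
    """Flag a prolific user whose past submissions rated poorly.

    Both conditions must hold; the default thresholds are hypothetical.
    """
    return history.avg_rating < min_rating and history.item_count > max_items
```

Requiring both conditions avoids penalizing a new user on the strength of one or two poorly rated items.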
Step S208: use the retained multimedia data as the training set, input the historical multimedia data information, the multiple tag lists, the popularity score, and the novelty score as feature values into a logistic regression model for training, and obtain a training result.
In the detection of steps S201 to S207, some detection data cannot be assessed accurately. For example, in step S204, when the hot_score of pending multimedia data is below 0.69, the multimedia data processing apparatus lacks enough prior knowledge to make a judgment. For such hard-to-assess detection data, a logistic regression model can additionally be trained to improve the accuracy of processing the pending multimedia data.
Specifically, the retained multimedia data can be used as the training set, and the historical multimedia data information together with the tag lists, popularity score, and novelty score of the pending multimedia data can be used as feature values input to a logistic regression model for training, so as to determine whether the pending multimedia data meets the preset condition.
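A dependency-free logistic regression trained by batch gradient descent illustrates step S208. Several choices here are assumptions rather than details from the text: previously rejected items are used as negative examples (label 0) alongside retained items (label 1), the tag lists are omitted because the text does not say how they are encoded numerically, and the "preset condition" is taken to be a predicted probability of at least 0.5.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def meets_preset_condition(w, b, x, threshold=0.5):
    """Hypothetical preset condition: predicted retain probability >= threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


# Hypothetical features per item: [history rating, hot_score / 10, novelty].
X = [[0.9, 0.2, 0.8], [0.8, 0.3, 0.9], [0.2, 0.9, 0.1], [0.1, 0.8, 0.2]]
y = [1, 1, 0, 0]  # 1 = retained, 0 = rejected
w, b = train_logistic_regression(X, y)
print(meets_preset_condition(w, b, [0.85, 0.25, 0.85]))
```

In production a library implementation with regularization would normally replace this loop; the hand-written version only makes the training objective explicit.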
Step S209: determine whether the training result meets the preset condition.
If it does, proceed to step S210; otherwise, proceed to step S211.
Step S210: if the training result meets the preset condition, retain the pending multimedia data.
If the training result meets the preset condition, the pending multimedia data meets the preset quality requirements, so it can be retained and sent to professional staff for further processing.
Step S211: if two topic word lists share no topic word, or the popularity score is greater than or equal to the preset popularity threshold, or the novelty score is less than or equal to the preset novelty threshold, or the training result does not meet the preset condition, delete the pending multimedia data.
If step S203 determines that the multiple tag lists are inconsistent, the multimedia data of different file types within the pending multimedia data does not share a unified style, so the pending multimedia data can be deleted or moved to another storage device.
Similarly, if the popularity score is greater than or equal to the preset popularity threshold or the novelty score is less than or equal to the preset novelty threshold, the pending multimedia data is overly popular or too similar to other multimedia data; and if the training result does not meet the preset condition, the pending multimedia data does not meet the preset quality requirements. In either case, the pending multimedia data can be deleted or moved to another storage device.
As can be seen from the above, the multimedia data processing method of this embodiment applies a tag extraction method suited to each file type of pending multimedia data, so more accurate tag lists can be obtained. Furthermore, the quality of the pending multimedia data is predicted from the historical multimedia data information corresponding to the user information, which further improves the accuracy of processing pending multimedia data.
Based on the method described in the above embodiment, this embodiment is further described from the perspective of the multimedia data processing apparatus, which may be integrated in an electronic device.
Referring to Fig. 10, which is a structural diagram of a multimedia data processing apparatus provided by an embodiment of the present invention, the apparatus may include a receiving module 301, a judgment module 302, an analysis module 303, and a processing module 304.
(1) receiving module 301
The receiving module 301 is configured to receive pending multimedia data and obtain its attribute information, where the attribute information includes multiple tag lists, the number of multimedia items, the play counts of the multimedia items, and the multimedia identifiers, and each tag list marks one type of pending multimedia data.
In terms of file type, multimedia data can be divided into different types such as music-related audio, video, pictures, and text. As shown in Fig. 4, the multimedia data is presented as a playlist containing several songs. The playlist includes not only the audio, video, and pictures corresponding to the songs but also text such as the playlist title and playlist description. The attribute information of multimedia data marks the essential features or characteristics of the various multimedia data, for example a tag marking the style of an audio track, or the size of the audio file. In a preferred embodiment, the attribute information of the pending multimedia data obtained by the receiving module 301 includes multiple tag lists, the number of multimedia items, their play counts, the multimedia identifiers, and so on.
In this embodiment of the present invention, each type of pending multimedia data corresponds to one tag list; that is, a tag list marks one type of pending multimedia data and contains one or more tags. For example, for the playlist in Fig. 4, the receiving module 301 can extract the tags "Cantonese" and "classic" from the playlist title "Revisiting classic Cantonese hits", and the tags "melancholy", "vicissitudes", and "classic" from the playlist description. These tags form the tag list {Cantonese, classic, melancholy, vicissitudes, classic}, which marks the text in the playlist. Similarly, the receiving module 301 analyzes the audio track "Friend" to obtain the tags "Cantonese" and "pop", and the audio track "Half Moon Serenade" to obtain the tags "Cantonese" and "pop". Analyzing every audio track in the playlist in turn yields all the tags, which form the tag list {Cantonese, pop} marking the audio in the playlist.
Specifically, as shown in Fig. 11, the receiving module 301 includes a scoring submodule 3011, a first judgment submodule 3012, and an extraction submodule 3013.
When the pending multimedia data includes a pending picture, the scoring submodule 3011 extracts the noise value, blur, and exposure of the pending picture and computes a score for it according to a preset formula. The first judgment submodule 3012 determines whether the score is below a preset score threshold. The extraction submodule 3013 extracts the tags of the pending picture to form the tag list when the score is not below the preset score threshold.
Specifically, marking submodule 3011 may be used following preset formula and be calculated:
Score=1-0.8*blur-0.1*noise-0.1*abs (exposure)
Wherein, score be marking as a result, blur is fuzziness, noise is noise figure, and exposure is exposure, abs () is ABS function.If the first judging submodule 3012 judges marking result, score is less than preset fraction threshold value, and explanation is treated It is not up to standard to handle picture quality, therefore extracting sub-module 3013 can delete pending multi-medium data or this is pending Multi-medium data is transferred in other storage devices.If the first judging submodule 3012 judges marking result score not less than pre- If score threshold, then extracting sub-module 3013 can also further analyze pending picture, to obtain its corresponding mark Label form list of labels.
Specifically, the extraction submodule 3013 may use network models such as a CNN (convolutional neural network), a DBN (deep belief network), an RNN (recurrent neural network), or a recursive neural tensor network to perform image recognition and face detection and thereby extract the tags of the pending picture. For example, the extraction submodule 3013 analyzes the picture in Fig. 8 and obtains emotion parameter values for the person in the picture of 0.012 for negative and 0.988 for positive, so the picture can be tagged "joy".
In some embodiments, the extraction submodule 3013 can also analyze the extracted tags of the pending picture to filter out pending multimedia data containing inappropriate content. For example, a preset tag list containing undesirable tags such as "violence" and "bloody" is set up. When the tag list of the pending picture shares a tag with this preset list, the pending multimedia data is regarded as containing inappropriate content and can be deleted. Note that in this embodiment the extraction submodule 3013 may also extract and analyze the tags of the pending picture first, with the noise value, blur, and exposure extracted afterwards to assess picture quality.
As shown in Fig. 11, the receiving module 301 further includes an acquisition submodule 3014 and a clustering submodule 3015. When the pending multimedia data includes pending audio, the acquisition submodule 3014 obtains the multiple preset tags corresponding to the pending audio, and the clustering submodule 3015 clusters these preset tags to obtain cluster labels that form the tag list.
Specifically, the clustering submodule 3015 may use the K-means method to cluster the preset tags into K cluster labels that form the tag list, where K is a positive integer. As shown in Fig. 9, in the pending multimedia data "My Playlist", each pending audio track has one or more preset tags. These preset tags are first collected and then clustered, yielding two cluster labels, "animation" and "healing", which form the tag list {animation, healing} for the pending audio.
As shown in Fig. 11, the receiving module 301 further includes a first extraction submodule 3016 and a second extraction submodule 3017. When the pending multimedia data includes pending text, the first extraction submodule 3016 extracts, based on a preset phrase template, noun phrases as tags from pending text within the preset character-count range to form the tag list; and/or the second extraction submodule 3017 extracts, based on the TextRank algorithm, tags from pending text beyond the preset character-count range to form the tag list.
Assume the preset character-count range is 0 to 10. In the playlist shown in Fig. 4, the playlist title "Revisiting classic Cantonese hits" falls within this range, so the first extraction submodule 3016 can extract "Cantonese" from the title as a tag based on the preset phrase template. The description text of the playlist, by contrast, exceeds the preset range, so the second extraction submodule 3017 can extract "classic" from it based on the TextRank algorithm. These tags then form the tag list {Cantonese, classic}.
(2) judgment module 302
The judgment module 302 is configured to determine whether the multiple tag lists are consistent and, if they are, to generate the popularity score of the pending multimedia data from the number of multimedia items and their play counts, and its novelty score from the multimedia identifiers.
As shown in Fig. 12, the judgment module 302 may specifically include a mapping submodule 3021, a judgment submodule 3022, and a first determination submodule 3023.
The mapping submodule 3021 maps, based on the preset mapping relationship, the tags in each tag list to the corresponding topic words in the preset topic word library, so that each tag list forms a corresponding topic word list. The judgment submodule 3022 determines whether every pair of topic word lists shares a topic word. The first determination submodule 3023 determines that the multiple tag lists are consistent when every pair of topic word lists shares a topic word.
Assume the preset topic word library is {healing, ACG (Animation, Comic, Game), exciting}. The mapping between tags and the topic words in the library is shown in Fig. 5, where different tags surround their corresponding topic word; for example, the topic word "healing" is mapped from tags such as "warm", "instrumental", and "lonely". As shown in Fig. 6, the pending multimedia data includes multimedia data of four file types: pending text, pending pictures, pending audio, and pending video. If the tag list a1 of the pending text is {fresh, love song, epic}, the mapping submodule 3021 maps the tags in a1 into the preset topic word library and obtains the topic word list b1 {healing, exciting}. If the tag list a2 of the pending picture is {animation, warm}, the mapping submodule 3021 maps the tags in a2 and obtains the topic word list b2 {ACG, healing}. If the tag list a3 of the pending audio is {instrumental, lyrical, passionate}, the mapping submodule 3021 maps the tags in a3 and obtains the topic word list b3 {healing, exciting}. If the tag list a4 of the pending video is {instrumental, love song}, the mapping submodule 3021 maps the tags in a4 and obtains the topic word list b4 {healing}.
The judgment submodule 3022 compares topic word lists b1, b2, b3, and b4 pairwise. If every pair shares at least one topic word, the first determination submodule 3023 determines that tag lists a1, a2, a3, and a4 are consistent; if any two topic word lists share no topic word, the tag lists are determined to be inconsistent. Because b1, b2, b3, and b4 all contain the topic word "healing", tag lists a1, a2, a3, and a4 are determined to be consistent, which indicates that the various types of multimedia data within the pending multimedia data share a unified style.
If the tag lists are inconsistent, the multimedia data of different file types within the pending multimedia data does not share a unified style, so the pending multimedia data can be deleted or moved to another storage device. If the tag lists are consistent, the judgment module 302 can further examine the popularity and novelty of the pending multimedia data.
The judgment module 302 can generate the popularity score of the pending multimedia data from the number of multimedia items and their play counts as follows:
Generate the average play count of the multimedia items from the number of items and their play counts.
Determine the popularity score of the pending multimedia data from the average play count.
Specifically, the judgment module 302 can calculate the average play count h as follows:
$$h = \frac{1}{N}\sum_{i=1}^{N} P_i$$
Here, P_i denotes the play count of the i-th multimedia item and N denotes the number of multimedia items, N being a positive integer. The popularity score can be determined from the play-count grade into which the average play count falls. For example, if the multimedia data contains songs, multimedia data whose average song play count exceeds 1,000,000 plays per month may be assigned the first play-count grade, corresponding to the first popularity grade, and song multimedia data whose average play count is between 500,000 and 1,000,000 plays per month may be assigned the second play-count grade, corresponding to the second popularity grade.
Suppose the pending multimedia data contains three songs with play counts of 1,000,000, 500,000, and 300,000 plays per month. The formula above gives an average song play count of 600,000 plays per month, which falls in the second play-count grade; that is, the pending multimedia data is in the second popularity grade.
In some embodiments, the judgment module 302 can also calculate the popularity score hot_score directly from the play counts P_i by a formula. Statistics on the average play counts of existing multimedia data show that, when P_i is expressed in millions of plays per month, this formula keeps hot_score within [0, 10].
Statistical analysis of a large amount of multimedia data shows that when the hot_score of pending multimedia data is below 0.69, the corresponding average play count P is below 5 plays per month. Such pending multimedia data leaves the multimedia data processing apparatus without enough prior knowledge to make a judgment, so it can be retained for the next detection step. For pending multimedia data with a hot_score above 6, the average play count P is 1,000,000 plays per month or more. Statistics show that such data usually just aggregates popular videos and popular audio, often with an unclear theme and an inconsistent style, so it can be deleted or moved to another storage device.
To sum up, popular upper limit value 6 can be set as to default cold threshold.When popular degree is more than or equal to the cold threshold When, which can be deleted or be transferred to other storage devices.
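The threshold logic above can be sketched as a small gate. This is a minimal sketch that takes hot_score as an input (its formula is not reproduced in this text); the constants 6 and 0.69 come from the description, while the enum and function names are hypothetical.

```python
from enum import Enum
from typing import Tuple

POPULARITY_THRESHOLD = 6.0  # preset popularity threshold from the text
LOW_PRIOR_BOUND = 0.69      # below this, too few plays for prior knowledge to help


class Decision(Enum):
    CONTINUE = "continue"              # keep for the next detection step
    DELETE_OR_MOVE = "delete_or_move"  # delete, or move to another storage device


def gate_on_popularity(hot_score: float) -> Tuple[Decision, bool]:
    """Return the decision and whether the item lacks prior knowledge.

    hot_score is taken as given; its formula is not reproduced here.
    """
    if not 0.0 <= hot_score <= 10.0:
        raise ValueError("hot_score is expected to lie in [0, 10]")
    if hot_score >= POPULARITY_THRESHOLD:
        # Typically a loose collection of already-popular items with no clear theme.
        return Decision.DELETE_OR_MOVE, False
    # Low-score items are kept: rejecting them would need prior knowledge
    # that the device does not have.
    return Decision.CONTINUE, hot_score < LOW_PRIOR_BOUND
```

The second element of the returned tuple only flags items in the low-prior range; they still proceed to the next step, as the text describes.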
As shown in FIG. 12, the judgment module 302 further includes an identifier acquisition submodule 3024 and a second determination submodule 3025. The identifier acquisition submodule 3024 is configured to obtain the multimedia tags of the retained multimedia data. The second determination submodule 3025 is configured to determine the novelty of the pending multimedia data according to the multimedia tags of the retained multimedia data and the multimedia tags of the pending multimedia data.
Specifically, the novelty N can be calculated with the following formula:
Here, S is the pending multimedia data and S_i is the multimedia tag of the i-th multimedia item in the pending multimedia data, where n ≥ i > 0, i is a positive integer, and n is the number of multimedia items in the pending multimedia data. K is the retained multimedia data and K_j is the multimedia tag of the j-th multimedia item in the retained multimedia data, where m ≥ j > 0, j is a positive integer, and m is the number of multimedia items in the retained multimedia data.
A preset novelty threshold can be set. When the novelty exceeds this threshold, the similarity between the pending multimedia data and the retained multimedia data is low. When the novelty does not exceed the threshold, the similarity is high, and to improve the diversity of the multimedia data, the pending multimedia data can be deleted.
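The novelty formula itself is missing from this text. As an illustration only, the sketch below swaps in a common set-overlap measure, 1 minus the Jaccard similarity between the union of the pending tags S_1..S_n and the union of the retained tags K_1..K_m; this choice of measure and all names are assumptions, not the patent's formula.

```python
from typing import Iterable, Set


def tag_set(tags_per_item: Iterable[Iterable[str]]) -> Set[str]:
    """Union of the per-item multimedia tags (S_1..S_n or K_1..K_m)."""
    result: Set[str] = set()
    for tags in tags_per_item:
        result.update(tags)
    return result


def novelty(pending: Iterable[Iterable[str]],
            retained: Iterable[Iterable[str]]) -> float:
    """Assumed measure: 1 - Jaccard similarity of the two tag sets, in [0, 1]."""
    s, k = tag_set(pending), tag_set(retained)
    union = s | k
    if not union:
        return 0.0  # nothing to compare
    return 1.0 - len(s & k) / len(union)


def keep_for_diversity(novelty_value: float, novelty_threshold: float) -> bool:
    """True when the pending data is dissimilar enough to the retained data."""
    return novelty_value > novelty_threshold
```

For example, pending tags {rock, indie} against retained tags {rock, jazz} share one of three distinct tags, giving a novelty of 2/3.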
(3) analysis module 303
The analysis module 303 is configured to analyze the popularity and the novelty; if the popularity is below the preset popularity threshold and the novelty exceeds the preset novelty threshold, it obtains the user information of the pending multimedia data.
If the popularity is below the preset popularity threshold and the novelty exceeds the preset novelty threshold, the pending multimedia data is not hot data and has low similarity to the retained multimedia data, so the user information of the pending multimedia data is obtained for further analysis. The user information includes the author of the pending multimedia data. For example, in the song list shown in FIG. 4, the author is "a goldfish Ji", which is the user information corresponding to that song list.
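The analysis step reduces to a two-condition gate that decides whose history to fetch. A minimal sketch, assuming popularity and novelty have already been computed; the `PendingItem` record and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingItem:
    item_id: str
    author: str        # user information, e.g. the song-list author
    popularity: float
    novelty: float


def user_to_analyze(item: PendingItem, popularity_threshold: float,
                    novelty_threshold: float) -> Optional[str]:
    """Return the author whose history should be fetched, or None."""
    not_hot = item.popularity < popularity_threshold
    novel = item.novelty > novelty_threshold
    return item.author if not_hot and novel else None
```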
(4) processing module 304
The processing module 304 is configured to process the pending multimedia data according to the historical multimedia data information corresponding to the user information, the multiple tag lists of the pending multimedia data, the popularity and the novelty.
Preferably, the processing module 304 can compile statistics on the historical multimedia data information corresponding to the user information to predict the quality of the pending multimedia data. Specifically, it can obtain and analyze the historical multimedia data score and the historical multimedia data count corresponding to the user information. If the historical multimedia data score is below a preset multimedia data score and the historical multimedia data count exceeds a preset multimedia data count, the pending multimedia data is likely of poor quality, so the processing module 304 can delete the pending multimedia data or move it to other storage.
Take a user submitting an edited song list as an example. If the user's average monthly submission count is twice that of other users, yet none of the song lists the user has submitted were retained, the user's pending song list can be predicted to be of poor quality, so the processing module 304 can delete the song list submitted by this user or move it to another storage device.
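The history-based prediction can be sketched as follows. The text does not define the historical score, so this sketch assumes it is the user's retention rate (retained submissions divided by total submissions); that definition and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserHistory:
    submitted: int  # historical multimedia data count (submissions)
    retained: int   # how many of those submissions were kept

    @property
    def score(self) -> float:
        """Hypothetical historical score: the user's retention rate."""
        return self.retained / self.submitted if self.submitted else 0.0


def likely_poor_quality(history: UserHistory, min_score: float, max_count: int) -> bool:
    """Low historical score AND an unusually large submission count."""
    return history.score < min_score and history.submitted > max_count
```

In the song-list example, a user who submits 20 lists a month when others submit about 10, with none retained, has a score of 0.0 and a count above 10, so `likely_poor_quality(UserHistory(20, 0), 0.3, 10)` is true.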
As shown in FIG. 13, the processing module 304 includes an information acquisition submodule 3041, a training submodule 3042, a second judgment submodule 3043 and a retention submodule 3044.
The information acquisition submodule 3041 is configured to obtain the historical multimedia data information corresponding to the user information. The training submodule 3042 is configured to use the retained multimedia data as a training set and to input the historical multimedia data information, the multiple tag lists, the popularity and the novelty as feature values into a logistic regression model for training, obtaining a training result. The second judgment submodule 3043 is configured to judge whether the training result satisfies a preset condition. The retention submodule 3044 is configured to retain the pending multimedia data when the training result satisfies the preset condition.
Specifically, the training submodule 3042 can use the retained multimedia data as a training set, take the above historical multimedia data score and historical multimedia data count, together with the multiple tag lists, popularity and novelty of the pending multimedia data, as feature values, and input them into a logistic regression model (Logistic Regression) for training. The second judgment submodule 3043 then judges whether the pending multimedia data satisfies the preset condition, and when it does, the retention submodule 3044 retains the pending multimedia data.
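A minimal logistic-regression sketch of this step, in pure Python with batch gradient descent. Several details are assumptions not stated in the text: the training set contains both retained (label 1) and non-retained (label 0) examples, the tag lists are reduced to a distinct-tag count, and the "preset condition" is a predicted retention probability of at least 0.5. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_features(hist_score: float, hist_count: float,
                  tag_lists: Sequence[Sequence[str]],
                  popularity: float, novelty: float) -> List[float]:
    """Feature vector; tag lists are reduced to a distinct-tag count (an assumption)."""
    distinct_tags = len({tag for tags in tag_lists for tag in tags})
    return [hist_score, hist_count, float(distinct_tags), popularity, novelty]


def train(X: Sequence[Sequence[float]], y: Sequence[int],
          lr: float = 0.05, epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def should_retain(w: Sequence[float], b: float, x: Sequence[float],
                  threshold: float = 0.5) -> bool:
    """Assumed 'preset condition': predicted retention probability >= threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


if __name__ == "__main__":
    X = [
        make_features(0.9, 2, [["a", "b"]], 3.0, 0.8),    # retained
        make_features(0.8, 3, [["a"], ["c"]], 2.5, 0.7),  # retained
        make_features(0.1, 8, [["a"]], 5.5, 0.1),         # not retained
        make_features(0.2, 9, [["b"]], 5.0, 0.2),         # not retained
    ]
    w, b = train(X, [1, 1, 0, 0])
    print([should_retain(w, b, x) for x in X])
```

In practice the features would likely be standardized and a library implementation used; the hand-written loop only keeps the example self-contained.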
The multimedia data processing device of this embodiment first obtains the attribute information of the pending multimedia data, which includes multiple tag lists, the number of multimedia items, their playback volume and multimedia tags. It then analyzes the consistency of the tag lists; if they are consistent, it generates the popularity from the number of multimedia items and their playback volume, and generates the novelty from the multimedia tags. When the popularity is below the preset popularity threshold and the novelty exceeds the preset novelty threshold, it obtains the historical multimedia data information corresponding to the user information of the pending multimedia data. Finally, it processes the pending multimedia data according to that historical multimedia data information, the tag lists, the popularity and the novelty. This scheme not only performs a consistency analysis on multiple detection variables of the pending multimedia data but also analyzes the pending multimedia data at a deeper level, effectively improving the accuracy of processing pending multimedia data.
Correspondingly, an embodiment of the present invention further provides an electronic device. FIG. 14 shows a schematic structural diagram of the electronic device involved in the embodiment of the present invention. Specifically:
The electronic device may include components such as a processor 401 with one or more processing cores, a memory 402 comprising one or more computer-readable storage media, a power supply 403 and an input unit 404. A person skilled in the art will understand that the structure shown in FIG. 14 does not limit the electronic device, which may include more or fewer components than illustrated, combine certain components, or arrange the components differently. Specifically:
The processor 401 is the control center of the electronic device. It connects all parts of the electronic device through various interfaces and lines, and performs the device's functions and processes data by running or executing the software programs and/or modules stored in the memory 402 and calling the data stored in the memory 402, thereby monitoring the electronic device as a whole. Optionally, the processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface and applications, and the modem processor mainly handles wireless communication. It will be understood that the modem processor may also not be integrated into the processor 401.
The memory 402 can store software programs and modules, and the processor 401 executes various functional applications and data processing by running the software programs and modules stored in the memory 402. The memory 402 may mainly include a program storage area and a data storage area: the program storage area can store the operating system and the applications required by at least one function (such as a sound playback function or an image playback function), and the data storage area can store data created through use of the electronic device. In addition, the memory 402 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or another volatile solid-state storage device. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The electronic device further includes a power supply 403 that powers all components. Preferably, the power supply 403 can be logically connected to the processor 401 through a power management system, so that charging, discharging and power consumption management are handled by the power management system. The power supply 403 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
The electronic device may also include an input unit 404, which can receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the electronic device may also include a display unit and the like, which are not described here. Specifically, in this embodiment, the processor 401 in the electronic device loads the executable files corresponding to the processes of one or more applications into the memory 402 according to the following instructions, and runs the applications stored in the memory 402, thereby implementing the following functions:
Receiving pending multimedia data and obtaining attribute information of the pending multimedia data, where the attribute information includes multiple tag lists, the number of multimedia items, the playback volume of the multimedia items and multimedia tags, and each tag list is used to label one category of the pending multimedia data;
Judging whether the multiple tag lists are consistent, and if so, generating the popularity of the pending multimedia data from the number of multimedia items and their playback volume, and generating the novelty of the pending multimedia data from the multimedia tags;
Analyzing the popularity and the novelty, and if the popularity is below a preset popularity threshold and the novelty exceeds a preset novelty threshold, obtaining the user information of the pending multimedia data; and
Processing the pending multimedia data according to the historical multimedia data information corresponding to the user information, the multiple tag lists of the pending multimedia data, the popularity and the novelty.
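The consistency judgment in the second step is spelled out in claim 2: each tag is mapped onto a topic term, and every pair of the resulting topic-term lists must share at least one term. A minimal sketch, assuming the preset mapping relation is a plain tag-to-topic-term dictionary and that unmapped tags are ignored (both assumptions, as are all names below):

```python
from itertools import combinations
from typing import Dict, List, Set


def to_topic_terms(tag_list: List[str], mapping: Dict[str, str]) -> Set[str]:
    """Map each tag to its topic term; unmapped tags are dropped (assumption)."""
    return {mapping[tag] for tag in tag_list if tag in mapping}


def tag_lists_consistent(tag_lists: List[List[str]], mapping: Dict[str, str]) -> bool:
    """True if every pair of topic-term lists shares at least one topic term."""
    topic_lists = [to_topic_terms(tags, mapping) for tags in tag_lists]
    return all(a & b for a, b in combinations(topic_lists, 2))
```

With `{"lofi": "chill", "ambient": "chill", "study": "focus"}`, the tag lists `["lofi", "study"]` and `["ambient"]` both map onto "chill" and are therefore consistent.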
The electronic device can achieve the beneficial effects achievable by any multimedia data processing device provided by the embodiments of the present invention; see the preceding embodiments, which are not repeated here.
The electronic device of this embodiment first obtains the attribute information of the pending multimedia data, which includes multiple tag lists, the number of multimedia items, their playback volume and multimedia tags. It then analyzes the consistency of the tag lists; if they are consistent, it generates the popularity from the number of multimedia items and their playback volume, and generates the novelty from the multimedia tags. When the popularity is below the preset popularity threshold and the novelty exceeds the preset novelty threshold, it obtains the historical multimedia data information corresponding to the user information of the pending multimedia data. Finally, it processes the pending multimedia data according to that historical multimedia data information, the tag lists, the popularity and the novelty. This scheme not only performs a consistency analysis on multiple detection variables of the pending multimedia data but also analyzes the pending multimedia data at a deeper level, effectively improving the accuracy of processing pending multimedia data.
Various operations of the embodiments are provided herein. In one embodiment, one or more of the operations may constitute computer-readable instructions stored on one or more computer-readable media which, when executed by an electronic device, cause a computing device to perform the operations. The order in which some or all of the operations are described should not be construed to imply that these operations are necessarily order-dependent. A person skilled in the art having the benefit of this specification will appreciate alternative orderings. Furthermore, it should be understood that not all operations are necessarily present in every embodiment provided herein.
Moreover, although the disclosure has been shown and described with respect to one or more implementations, equivalent variations and modifications will occur to those skilled in the art upon reading and understanding this specification and the drawings. The disclosure includes all such modifications and variations and is limited only by the scope of the following claims. In particular, with regard to the various functions performed by the above components (e.g., elements, resources), the terms used to describe such components are intended to correspond, unless otherwise indicated, to any component that performs the specified function of the described component (e.g., that is functionally equivalent), even if it is not structurally equivalent to the disclosed structure that performs the function in the exemplary implementations of the disclosure illustrated herein. In addition, although a particular feature of the disclosure may have been disclosed with respect to only one of several implementations, such a feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "includes", "having", "has", "with" or variants thereof are used in the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprising".
The functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like. Each of the above devices or systems can perform the method in the corresponding method embodiment.
In summary, although the present invention has been disclosed above by way of embodiments, the sequence numbers preceding the embodiments are used only for convenience of description and do not limit the order of the embodiments. Moreover, the above embodiments are not intended to limit the present invention; a person of ordinary skill in the art can make various changes and refinements without departing from the spirit and scope of the present invention, so the protection scope of the present invention is defined by the claims.

Claims (15)

1. A multimedia data processing method, characterized by comprising:
Receiving pending multimedia data and obtaining attribute information of the pending multimedia data, wherein the attribute information comprises multiple tag lists, a number of multimedia items, a playback volume of the multimedia items and multimedia tags, and each tag list is used to label one category of the pending multimedia data;
Judging whether the multiple tag lists are consistent, and if so, generating a popularity of the pending multimedia data according to the number of multimedia items and the playback volume, and generating a novelty of the pending multimedia data according to the multimedia tags;
Analyzing the popularity and the novelty, and if the popularity is below a preset popularity threshold and the novelty exceeds a preset novelty threshold, obtaining user information of the pending multimedia data; and
Processing the pending multimedia data according to historical multimedia data information corresponding to the user information, the multiple tag lists of the pending multimedia data, the popularity and the novelty.
2. The multimedia data processing method according to claim 1, characterized in that the step of judging whether the multiple tag lists are consistent comprises:
Mapping, based on a preset mapping relation, the tags in each tag list onto corresponding topic terms in a preset topic-term library, so that each tag list forms a corresponding topic-term list;
Judging whether every two topic-term lists share an identical topic term;
If every two topic-term lists share an identical topic term, determining that the multiple tag lists are consistent.
3. The multimedia data processing method according to claim 1, characterized in that the pending multimedia data comprises pending text, and the step of obtaining the attribute information of the pending multimedia data comprises:
Extracting, based on a preset phrase template, noun phrases as tags from pending text within a preset word-count range, to form the tag list; and/or
Extracting, based on the TextRank algorithm, tags from pending text exceeding the preset word-count range, to form the tag list.
4. The multimedia data processing method according to claim 1, characterized in that the pending multimedia data comprises a pending picture, and the step of obtaining the attribute information of the pending multimedia data comprises:
Extracting the noise level, blurriness and exposure of the pending picture, and computing a score for the pending picture according to a preset formula;
Judging whether the score is below a preset score threshold;
If the score is not below the preset score threshold, extracting tags of the pending picture to form the tag list.
5. The multimedia data processing method according to claim 1, characterized in that the pending multimedia data comprises pending audio, and the step of obtaining the attribute information of the pending multimedia data comprises:
Obtaining multiple preset tags corresponding to the pending audio;
Clustering the multiple preset tags to obtain cluster tags, to form the tag list.
6. The multimedia data processing method according to any one of claims 1 to 5, characterized in that the step of processing the pending multimedia data according to the historical multimedia data information corresponding to the user information, the multiple tag lists of the pending multimedia data, the popularity and the novelty comprises:
Obtaining the historical multimedia data information corresponding to the user information;
Using retained multimedia data as a training set, and inputting the historical multimedia data information, the multiple tag lists, the popularity and the novelty as feature values into a logistic regression model for training, to obtain a training result;
Judging whether the training result satisfies a preset condition;
If the training result satisfies the preset condition, retaining the pending multimedia data.
7. The multimedia data processing method according to any one of claims 1 to 5, characterized in that the step of generating the novelty of the pending multimedia data according to the multimedia tags comprises:
Obtaining the multimedia tags of retained multimedia data;
Determining the novelty of the pending multimedia data according to the multimedia tags of the retained multimedia data and the multimedia tags of the pending multimedia data.
8. A multimedia data processing device, characterized by comprising:
A receiving module, configured to receive pending multimedia data and obtain attribute information of the pending multimedia data, wherein the attribute information comprises multiple tag lists, a number of multimedia items, a playback volume of the multimedia items and multimedia tags, and each tag list is used to label one category of the pending multimedia data;
A judgment module, configured to judge whether the multiple tag lists are consistent, and if so, to generate a popularity of the pending multimedia data according to the number of multimedia items and the playback volume, and to generate a novelty of the pending multimedia data according to the multimedia tags;
An analysis module, configured to analyze the popularity and the novelty, and to obtain user information of the pending multimedia data if the popularity is below a preset popularity threshold and the novelty exceeds a preset novelty threshold; and
A processing module, configured to process the pending multimedia data according to historical multimedia data information corresponding to the user information, the multiple tag lists of the pending multimedia data, the popularity and the novelty.
9. The multimedia data processing device according to claim 8, characterized in that the judgment module comprises:
A mapping submodule, configured to map, based on a preset mapping relation, the tags in each tag list onto corresponding topic terms in a preset topic-term library, so that each tag list forms a corresponding topic-term list;
A judging submodule, configured to judge whether every two topic-term lists share an identical topic term;
A first determination submodule, configured to determine that the multiple tag lists are consistent when every two topic-term lists share an identical topic term.
10. The multimedia data processing device according to claim 8, characterized in that the pending multimedia data comprises pending text, and the receiving module comprises:
A first extraction submodule, configured to extract, based on a preset phrase template, noun phrases as tags from pending text within a preset word-count range, to form the tag list; and/or
A second extraction submodule, configured to extract, based on the TextRank algorithm, tags from pending text exceeding the preset word-count range, to form the tag list.
11. The multimedia data processing device according to claim 8, characterized in that the pending multimedia data comprises a pending picture, and the receiving module comprises:
A scoring submodule, configured to extract the noise level, blurriness and exposure of the pending picture and compute a score for the pending picture according to a preset formula;
A first judgment submodule, configured to judge whether the score is below a preset score threshold;
An extraction submodule, configured to extract tags of the pending picture when the score is not below the preset score threshold, to form the tag list.
12. The multimedia data processing device according to claim 8, characterized in that the pending multimedia data comprises pending audio, and the receiving module comprises:
An acquisition submodule, configured to obtain multiple preset tags corresponding to the pending audio;
A clustering submodule, configured to cluster the multiple preset tags to obtain cluster tags, to form the tag list.
13. The multimedia data processing device according to any one of claims 8 to 12, characterized in that the processing module comprises:
An information acquisition submodule, configured to obtain the historical multimedia data information corresponding to the user information;
A training submodule, configured to use retained multimedia data as a training set and input the historical multimedia data information, the multiple tag lists, the popularity and the novelty as feature values into a logistic regression model for training, to obtain a training result;
A second judgment submodule, configured to judge whether the training result satisfies a preset condition;
A retention submodule, configured to retain the pending multimedia data when the training result satisfies the preset condition.
14. The multimedia data processing device according to any one of claims 8 to 12, characterized in that the judgment module comprises:
An identifier acquisition submodule, configured to obtain the multimedia tags of retained multimedia data;
A second determination submodule, configured to determine the novelty of the pending multimedia data according to the multimedia tags of the retained multimedia data and the multimedia tags of the pending multimedia data.
15. A storage medium storing processor-executable instructions, wherein a processor provides the multimedia data processing method according to any one of claims 1 to 7 by executing the instructions.
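Claims 3 and 10 name the TextRank algorithm for long pending text. The sketch below covers only that branch (the phrase template and the word-count range are not specified, so the short-text branch is not sketched). It assumes the text has already been segmented into tokens with stop words removed, which Chinese text would need before this step; the window size, damping factor 0.85 and iteration count are conventional choices, not values from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set


def textrank_keywords(tokens: List[str], window: int = 2, top_k: int = 3,
                      damping: float = 0.85, iterations: int = 50) -> List[str]:
    """Rank tokens by a PageRank-style score on a co-occurrence graph."""
    # Undirected edges between distinct tokens fewer than `window` positions apart.
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)
    if not neighbors:
        return []
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each node shares its score equally among its neighbours.
        scores = {
            w: (1 - damping) + damping * sum(scores[v] / len(neighbors[v])
                                             for v in neighbors[w])
            for w in neighbors
        }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

For the token sequence "music relax music study music sleep", "music" sits at the centre of the co-occurrence graph and is ranked first.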
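Claims 4 and 11 score a pending picture from its noise level, blurriness and exposure using a preset formula that is not given here. The sketch below is hypothetical throughout: blur is estimated with the variance-of-Laplacian heuristic, exposure as the distance of mean brightness from mid-grey, noise is supplied by the caller in [0, 1], and the weights and normalization are illustrative only.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels in [0, 1]


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian; low values suggest blur."""
    vals = []
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            vals.append(img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                        + img[y][x + 1] - 4 * img[y][x])
    if not vals:
        return 0.0
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def exposure_error(img: Image) -> float:
    """Distance of mean brightness from mid-grey (0 = balanced, 0.5 = extreme)."""
    pixels = [p for row in img for p in row]
    return abs(sum(pixels) / len(pixels) - 0.5)


def picture_score(img: Image, noise: float, w_blur: float = 50.0,
                  w_exposure: float = 100.0, w_noise: float = 100.0) -> float:
    """Hypothetical 'preset formula': start at 100 and subtract weighted penalties."""
    blur_penalty = 1.0 - min(laplacian_variance(img), 1.0)
    return 100.0 - w_blur * blur_penalty - w_exposure * exposure_error(img) - w_noise * noise


def passes_quality_gate(score: float, threshold: float) -> bool:
    """Tags are extracted only when the score is not below the threshold."""
    return score >= threshold
```

A sharp, balanced checkerboard with no noise scores 100, while a flat mid-grey image loses the full blur penalty and scores 50.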
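Claims 5 and 12 cluster the preset tags of pending audio into cluster tags without naming a clustering method. As one hypothetical choice, the sketch below groups tags greedily by string similarity (`difflib.SequenceMatcher.ratio`) and uses each cluster's most frequent tag as the cluster tag; the similarity measure, the 0.8 threshold and the names are assumptions.

```python
from collections import Counter
from difflib import SequenceMatcher
from typing import List


def similar(a: str, b: str, threshold: float) -> bool:
    """Case-insensitive string similarity test."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster_tags(preset_tags: List[str], threshold: float = 0.8) -> List[str]:
    """Return one representative (cluster) tag per group of similar tags."""
    counts = Counter(preset_tags)
    clusters: List[List[str]] = []
    # Most frequent tags first, so each cluster's first member is its most common tag.
    for tag, _ in counts.most_common():
        for cluster in clusters:
            if similar(tag, cluster[0], threshold):
                cluster.append(tag)
                break
        else:
            clusters.append([tag])
    return [cluster[0] for cluster in clusters]
```

For the tags "hip-hop", "hip hop", "Hip-Hop", "jazz", "jazz" and "piano", the three spellings of hip-hop collapse into one cluster, giving the cluster tags jazz, hip-hop and piano.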
CN201810044934.3A 2018-01-17 2018-01-17 Multimedia data processing method, device and storage medium Active CN108170845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810044934.3A CN108170845B (en) 2018-01-17 2018-01-17 Multimedia data processing method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810044934.3A CN108170845B (en) 2018-01-17 2018-01-17 Multimedia data processing method, device and storage medium

Publications (2)

Publication Number Publication Date
CN108170845A CN108170845A (en) 2018-06-15
CN108170845B CN108170845B (en) 2020-10-13

Family

ID=62514530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810044934.3A Active CN108170845B (en) 2018-01-17 2018-01-17 Multimedia data processing method, device and storage medium

Country Status (1)

Country Link
CN (1) CN108170845B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528174A (en) * 2015-12-10 2016-04-27 广东欧珀移动通信有限公司 Song sharing method and user terminal
CN106227816A (en) * 2016-07-22 2016-12-14 北京小米移动软件有限公司 Push the method and device that song is single
US20170323277A1 (en) * 1999-08-27 2017-11-09 Zarbaña Digital Fund Llc Music distribution systems
CN107544963A (en) * 2016-06-23 2018-01-05 南京中兴软件有限责任公司 Multimedia file storage method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199564A (en) * 2019-07-08 2021-01-08 Tcl集团股份有限公司 Information filtering method and device and terminal equipment
CN110688517A (en) * 2019-09-02 2020-01-14 平安科技(深圳)有限公司 Audio distribution method, device and storage medium
WO2021043101A1 (en) * 2019-09-02 2021-03-11 平安科技(深圳)有限公司 Audio assignment method and device, and storage medium
CN110688517B (en) * 2019-09-02 2023-05-30 平安科技(深圳)有限公司 Audio distribution method, device and storage medium
CN110704648A (en) * 2019-09-27 2020-01-17 北京达佳互联信息技术有限公司 Method, device, server and storage medium for determining user behavior attribute
CN115412375A (en) * 2022-11-01 2022-11-29 山东省电子信息产品检验院(中国赛宝(山东)实验室) Industrial Internet data protection system
CN115412375B (en) * 2022-11-01 2023-04-18 山东省信息技术产业发展研究院(中国赛宝(山东)实验室) Industrial Internet data protection system

Also Published As

Publication number Publication date
CN108170845B (en) 2020-10-13

Similar Documents

Publication Publication Date Title
US10558674B2 (en) Methods and apparatus for determining a mood profile associated with media data
Kaminskas et al. Location-aware music recommendation using auto-tagging and hybrid matching
CN110209844B (en) Multimedia data matching method, device and storage medium
WO2022078102A1 (en) Entity identification method and apparatus, device and storage medium
Shah et al. Advisor: Personalized video soundtrack recommendation by late fusion with heuristic rankings
CN109710841B (en) Comment recommendation method and device
CN108769772A (en) Direct broadcasting room display methods, device, equipment and storage medium
CN108170845A (en) Multimedia data processing method, device and storage medium
CN108304379A (en) A kind of article recognition methods, device and storage medium
CN108009228A (en) A kind of method to set up of content tab, device and storage medium
CN110209843A (en) Multimedia resource playback method, device, equipment and storage medium
CN106874279A (en) Generate the method and device of applicating category label
CN106789543A (en) The method and apparatus that facial expression image sends are realized in session
CN108304373A (en) Construction method, device, storage medium and the electronic device of semantic dictionary
CN104836720A (en) Method for performing information recommendation in interactive communication, and device
CN106919575A (en) application program searching method and device
CN108334601A (en) Song recommendations method, apparatus and storage medium based on label topic model
CN106202073A (en) Music recommends method and system
CN106528538A (en) Method and device for intelligent emotion recognition
CN110852047A (en) Text score method, device and computer storage medium
CN108133058A (en) A kind of video retrieval method
CN108920649A (en) A kind of information recommendation method, device, equipment and medium
CN110198482A (en) A kind of video emphasis bridge section mask method, terminal and storage medium
CN114328799A (en) Data processing method, device and computer readable storage medium
Kaushal et al. How good is a video summary? A new benchmarking dataset and evaluation framework towards realistic video summarization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant