CN104809114A - Video big data oriented parallel data mining method - Google Patents

Video big data oriented parallel data mining method

Info

Publication number
CN104809114A
Authority
CN
China
Prior art keywords
video
data
big data
data mining
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410035192.XA
Other languages
Chinese (zh)
Inventor
宫夏屹
柴旭东
王恒
谢晓丹
曲慧杨
谷牧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simulation Center
Original Assignee
Beijing Simulation Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simulation Center
Priority to CN201410035192.XA
Publication of CN104809114A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a parallel data mining method for video big data. The method comprises: 1) establishing a video big data mining system; 2) using a big data indexing and description module to build a video big data index; 3) using a feature extraction and video summarization acceleration module to accelerate the extraction of key information from the video big data; and 4) using a data mining algorithm and strategy module to mine the key video information. The method optimizes the video data mining process, improves the applicability of the algorithm, and allows video big data to be mined quickly and efficiently.

Description

A parallel data mining method for video big data
Technical field
The present invention relates to a data mining method, and in particular to a parallel data mining method for video big data.
Background
Big data refers to data sets whose content cannot be captured, managed, and processed with traditional database software tools within an acceptable time. Big data has four characteristic features: large volume, variety, low value density, and high velocity. Massive public safety video is a kind of unstructured data that exhibits these features, and data mining on video big data is both an important research direction and a technical difficulty in big data mining. A survey of domestic research shows that work on big data is still fragmented. Most big data processing platform architectures are based on Hadoop, and most research concentrates on mining and analysis methods; no body of related technology has yet formed to support the development of big data processing platforms. The research and application of data mining in public security work is also at an early stage. Many public security information systems remain at the level of basic processing and lack integrated development and application, intelligent analysis and assessment, and scientific early warning. A standards system for public security services has likewise not been fully established.
In practical public safety applications, a big data mining system usually involves massive video data, and describing and indexing video big data is difficult. Common data mining algorithms do not take the multiple categories of data into account, which makes them hard to apply to unstructured data. Traditional parallel mining methods also incur high runtime overhead and adapt poorly. A method is therefore needed that can effectively build indexes over video big data and mine it in parallel, so that video big data can be analyzed and processed efficiently in support of public safety applications.
Summary of the invention
Video data in applications such as public security is diverse and has low value density, and it must be processed quickly. To meet these requirements, the invention studies parallel mining techniques for big data and proposes an integrated solution covering big data description, feature extraction, data mining, and intelligent association analysis. In general terms, a parallel data mining method for video big data is provided that addresses association relationship mining and efficient, intelligent analysis of video big data.
The object of the present invention is achieved through the following technical solution:
A parallel data mining method for video big data, the method comprising:
1) building a video big data mining system;
2) building a video big data index with the big data indexing and description module;
3) accelerating the extraction of key information from the video big data with the feature extraction and video summarization acceleration module;
4) mining the key video information with the parallel data mining algorithm and strategy module.
The video big data mining system comprises:
a big data indexing and description module, for building the index of the video big data;
a feature extraction and video summarization acceleration module, for supporting intelligent analysis of the video big data and for accelerating key video feature extraction and video summarization based on CUDA;
a parallel data mining algorithm and strategy module, for classification and association analysis of the video data.
The index of the video big data comprises a hierarchical index, an R-tree index, and a category index, which together support access to all kinds of video data.
The parallel data mining algorithm and strategy module mines the video big data with an improved Apriori algorithm based on the MapReduce programming model. The specific steps are as follows:
401) the MapReduce library partitions the transaction database horizontally into n data subsets of comparable size and sends the n data subsets to m nodes that execute Map tasks;
402) the n data subsets are formatted to produce <key1, value1> pairs, specifically <Tid, list>, where Tid is the identifier of a transaction in the transaction database and list is the list of values corresponding to that transaction;
403) the Map function scans each record <Tid, list> of its input data subset and produces a set of local candidate itemsets, denoted Cp, in which each candidate has a support count of 1;
404) an optional Combiner function is added on each machine that executes a Map task; the Combiner function first merges the output of the Map function locally and outputs <itemsets, sup>, where sup is the support count of itemsets within the data subset; the partition function hash(key) mod R then divides the intermediate key-value pairs produced by the Combiner function into R different partitions, and each partition is assigned to a designated Reduce function;
405) a node assigned a Reduce task reads the <itemsets, sup> data submitted by the Combiner function; because many different candidate itemsets are mapped to the same Reduce function, the data is sorted by the key itemsets so that records with the same candidate itemset are grouped together, forming <itemsets, list(sup)>;
406) the outputs Lp of the r Reduce functions, obtained after the comparison, are merged to give the final set of frequent itemsets, denoted L.
The advantages of the invention are as follows:
The method establishes a unified index for video big data, which supports fast retrieval of and access to video data. By introducing the CUDA architecture, parallel techniques further accelerate video feature extraction and video summarization. By introducing the improved Apriori algorithm based on the MapReduce programming model, the video data mining process is optimized and the applicability of the algorithm is improved, so that video big data can be mined quickly and efficiently. The method suits data mining of video big data in large-scale systems with large volumes of video data held in distributed storage, and it is applicable to the public safety field.
Brief description of the drawings
Fig. 1: flowchart of the method of the invention.
Detailed description
The parallel data mining method for video big data of the present invention is described in detail below with reference to Fig. 1. The specific steps of the method are as follows.
First step: build the video big data mining system
The video big data mining system comprises a big data indexing and description module, a feature extraction and video summarization acceleration module, and a parallel data mining algorithm and strategy module. The big data indexing and description module builds the index of the video big data, including a hierarchical index, an R-tree index, and a category index, to support access to all kinds of video data. The feature extraction and video summarization acceleration module supports intelligent analysis of the video big data and accelerates key video feature extraction and video summarization based on CUDA. The parallel data mining algorithm and strategy module performs classification and association analysis on the video data.
Second step: the big data indexing and description module builds the video big data index
The big data indexing and description module adopts a storage index model. A hierarchical index tree, an R-tree index, and a category index are built and together exposed through one unified access interface, through which users interact with the system and access the big data.
Video big data is multi-category. To reflect this, a category index whose content is the category is built, and the required thematic data is queried comprehensively through it. The R-tree index is a dynamic indexing algorithm over a hierarchical data structure. It approximates complex spatial objects with minimum bounding rectangles (MBRs) and does not need to know the index range of the whole study area in advance, which makes it suitable for regional spatial data. Spatial data can therefore be indexed with an R-tree, which provides a simple and fast query interface. A relation must then be established between the contents of the two indexes. Because MBRs and the category index cannot be related directly, a third index, the storage index model, is built separately to link the two and to provide users with an interface for accessing public safety data; this interface can access both kinds of data at once. The storage index model contains the MBR content corresponding to the R-tree index, and it also links that content to the category index.
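The linking role of the storage index model can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `StorageIndex`, the record ids, and the category labels are hypothetical, and a linear scan over the stored MBRs stands in for a real R-tree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    """Minimum bounding rectangle given by its two corner points."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def intersects(self, other: "MBR") -> bool:
        # Two rectangles overlap unless one lies entirely to one side of the other.
        return not (self.max_x < other.min_x or other.max_x < self.min_x
                    or self.max_y < other.min_y or other.max_y < self.min_y)


class StorageIndex:
    """Hypothetical third index: ties each record id to an MBR and a category."""

    def __init__(self):
        self._mbr_by_id = {}                      # stand-in for R-tree leaf entries
        self._ids_by_category = defaultdict(set)  # stand-in for the category index

    def add(self, record_id, mbr, category):
        self._mbr_by_id[record_id] = mbr
        self._ids_by_category[category].add(record_id)

    def query(self, window=None, category=None):
        """Return sorted record ids matching the spatial window and/or the category."""
        ids = set(self._mbr_by_id)
        if category is not None:
            ids &= self._ids_by_category.get(category, set())
        if window is not None:
            # Assumption: linear scan instead of an R-tree descent.
            ids = {i for i in ids if self._mbr_by_id[i].intersects(window)}
        return sorted(ids)


index = StorageIndex()
index.add("cam-01", MBR(0, 0, 10, 10), "traffic")
index.add("cam-02", MBR(20, 20, 30, 30), "traffic")
index.add("cam-03", MBR(5, 5, 15, 15), "entrance")
```

A single `query` call can filter by region, by category, or by both, which is the behaviour the text attributes to the unified interface.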
Third step: the feature extraction and video summarization acceleration module accelerates the extraction of key information from the video big data
After the video big data index is built in the second step, the feature extraction and video summarization system can extract information from the video big data. The acceleration module is based on the CUDA architecture and uses parallel processing to accelerate feature extraction and video summarization. CUDA provides a convenient and powerful GPU processing platform that can deliver speedups ranging from several times to hundreds of times in video processing. Under the CUDA architecture, the processing for feature extraction and video summarization is divided into a host side and a device side. The host side is the part executed on the CPU; the device side is the part executed on the display chip (GPU), which can process video data in parallel. A program on the device side is also called a "kernel". Typically the host program prepares the data and copies it into the graphics card's memory, the display chip executes the device program, and the host program then copies the result back from the graphics card's memory.
Under the CUDA architecture, the smallest unit of execution on the display chip is the thread. Several threads form a block. Threads in the same block can access the same shared memory and can synchronize quickly. The number of threads a block may contain is limited, but blocks that execute the same program can form a grid. Threads in different blocks cannot access the same shared memory, so they cannot communicate or synchronize directly, and the degree to which they can cooperate is low. This model does, however, free the program from worrying about how many threads the display chip can actually execute at once. For example, a display chip with only a few execution units may execute the threads of each block sequentially rather than simultaneously. Different grids can execute different programs (that is, different kernels). This is how grids, blocks, and threads relate.
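The grid, block, and thread hierarchy comes down to simple index arithmetic, shown here as a plain Python emulation rather than real CUDA code. The function names and the brightness-threshold "kernel" are hypothetical; only the expression `blockIdx * blockDim + threadIdx` is the usual CUDA convention for a one-dimensional launch.

```python
import math


def launch_config(n_items: int, threads_per_block: int):
    """Return (blocks_per_grid, threads_per_block) so that every item gets one thread."""
    blocks = math.ceil(n_items / threads_per_block)
    return blocks, threads_per_block


def global_thread_index(block_idx: int, block_dim: int, thread_idx: int) -> int:
    """Mirror of the usual CUDA expression blockIdx.x * blockDim.x + threadIdx.x."""
    return block_idx * block_dim + thread_idx


def simulate_kernel(frame_brightness, threads_per_block=4):
    """Sequentially emulate a 1-D kernel that flags frames brighter than 128."""
    n = len(frame_brightness)
    blocks, tpb = launch_config(n, threads_per_block)
    out = [None] * n
    for b in range(blocks):      # on a real GPU, blocks may run in any order
        for t in range(tpb):     # threads of one block can share memory and sync
            i = global_thread_index(b, tpb, t)
            if i < n:            # guard: the last block may be only partly used
                out[i] = frame_brightness[i] > 128
    return out
```

Because each thread touches only its own element, the result does not depend on whether the blocks run one after another or at the same time, which is why the model scales across chips with different numbers of execution units.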
Each thread has its own registers and local memory. The threads in one block share a block of shared memory. In addition, all threads, including those in different blocks, share one global memory, constant memory, and texture memory. Different grids have their own global memory, constant memory, and texture memory. Organizing the work this way can greatly increase the speed of feature extraction and video summarization on video data.
Fourth step: the parallel data mining algorithm and strategy module mines the key video information
The parallel data mining algorithm and strategy module mines the video big data with an improved Apriori algorithm based on the MapReduce programming model. The MapReduce programming model abstracts the complex business logic of parallel programming: it exposes the computation through a simple interface and hides the complex details of parallelization, fault tolerance, data distribution, and load balancing.
The improved Apriori algorithm under the MapReduce programming model executes as follows:
Step 1: the MapReduce library horizontally partitions the transaction database that stores the big data studied here into n data subsets of comparable size, and sends the n data subsets to m nodes that execute Map tasks.
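A minimal Python sketch of this partitioning step follows. The text only requires subsets of comparable size sent to m nodes; the contiguous split and the round-robin assignment used here are assumptions, and both function names are hypothetical.

```python
def horizontal_partition(transactions, n):
    """Split transactions into n contiguous subsets whose sizes differ by at most 1."""
    base, extra = divmod(len(transactions), n)
    chunks, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)   # the first `extra` chunks get one more
        chunks.append(transactions[start:start + size])
        start += size
    return chunks


def assign_to_nodes(chunks, m):
    """Assumed policy: hand subset indices to the m map nodes in round-robin order."""
    assignment = {node: [] for node in range(m)}
    for idx in range(len(chunks)):
        assignment[idx % m].append(idx)
    return assignment
```

With ten transactions, `horizontal_partition(list(range(10)), 3)` gives subsets of sizes 4, 3, and 3, and `assign_to_nodes` spreads those three subsets over two nodes.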
Step 2: the n data subsets are formatted to produce <key1, value1> pairs, specifically <Tid, list>, where Tid is the identifier of a transaction in the transaction database and list is the list of values corresponding to that transaction.
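The formatting step can be sketched in Python as below. The raw line layout `"Tid: item item ..."` and the item labels are assumptions made for illustration; the source does not say how transactions are stored before formatting.

```python
def format_subset(raw_lines):
    """Turn raw 'Tid: item item ...' lines into (Tid, sorted unique item list) pairs."""
    pairs = []
    for line in raw_lines:
        tid, _, items = line.partition(":")
        # Deduplicate and sort so that equal transactions yield equal lists.
        pairs.append((tid.strip(), sorted(set(items.split()))))
    return pairs
```

For example, the line `"T2: car bike car"` becomes the pair `("T2", ["bike", "car"])`.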
Step 3: the Map function scans each record <Tid, list> of its input data subset and produces a set of local candidate itemsets, denoted Cp, in which each candidate has a support count of 1. The Map function generates and outputs intermediate <key2, value2> pairs, defined here as <itemsets, 1> pairs, where itemsets is a candidate itemset in Cp. The pseudocode of the Map function appears at this point in the original filing and is not reproduced in this text.
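Since the map pseudocode is not available here, the following Python sketch shows one plausible reading of the step and should not be taken as the patent's own pseudocode. It assumes that candidates are all k-item subsets of a transaction for a fixed size k; the parameter `k` and the function name are hypothetical.

```python
from itertools import combinations


def map_candidates(record, k):
    """Emit (itemset, 1) for every k-item subset of one <Tid, list> record."""
    _tid, items = record
    # Sorting gives each itemset a canonical tuple form, so equal itemsets compare equal.
    for itemset in combinations(sorted(set(items)), k):
        yield itemset, 1
```

Each emitted pair carries a support count of 1, matching the <itemsets, 1> output described in the text.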
Step 4: an optional Combiner function is added on each machine that executes a Map task. The Combiner function first merges the output of the Map function locally and outputs <itemsets, sup>, where sup is the support count of itemsets within the data subset. The partition function hash(key) mod R then divides the intermediate key-value pairs produced by the Combiner function into R different partitions, and each partition is assigned to a designated Reduce function.
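A Python sketch of the combiner and the partition function follows. The choice of CRC32 as the hash is an assumption: the text only specifies hash(key) mod R, and Python's built-in `hash` for strings changes between runs, so a stable hash is swapped in. All function names are hypothetical.

```python
import zlib
from collections import Counter


def combine(map_output):
    """Locally merge (itemset, 1) pairs from one map task into {itemset: sup}."""
    local = Counter()
    for itemset, count in map_output:
        local[itemset] += count
    return dict(local)


def partition(itemset, R):
    """hash(key) mod R, with CRC32 as a run-to-run stable hash (an assumption)."""
    key = "\x1f".join(itemset).encode("utf-8")
    return zlib.crc32(key) % R


def shuffle(combined, R):
    """Place each (itemset, sup) pair into the bucket of its target reducer."""
    buckets = [dict() for _ in range(R)]
    for itemset, sup in combined.items():
        buckets[partition(itemset, R)][itemset] = sup
    return buckets
```

Combining before the shuffle means each map node sends one pair per distinct itemset instead of one pair per occurrence, which reduces the data moved to the reducers.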
Step 5: a node assigned a Reduce task reads the <itemsets, sup> data submitted by the Combiner function. Because many different candidate itemsets are mapped to the same Reduce function, the data is sorted by the key itemsets so that records with the same candidate itemset are grouped together, forming <itemsets, list(sup)>. The worker node traverses the sorted intermediate data and passes each <itemsets, list(sup)> to the Reduce function. The Reduce function sums the support counts of the same candidate itemset to obtain the candidate's actual support count over the whole transaction database, then compares it with the minimum support count min_sup to determine the set of local frequent itemsets, denoted Lp.
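The sort, group, and sum logic of the reduce side can be sketched in Python as follows. The function names are hypothetical, and treating an itemset as frequent when its total is greater than or equal to min_sup is the usual Apriori convention, assumed here because the text only says the count is compared with min_sup.

```python
from collections import defaultdict


def group_by_itemset(pairs):
    """Sort (itemset, sup) pairs by key and group them into {itemset: [sup, ...]}."""
    grouped = defaultdict(list)
    for itemset, sup in sorted(pairs):
        grouped[itemset].append(sup)
    return grouped


def reduce_partition(pairs, min_sup):
    """Sum the supports per itemset and keep the frequent ones (the local set Lp)."""
    frequent = {}
    for itemset, sups in group_by_itemset(pairs).items():
        total = sum(sups)          # support over the whole transaction database
        if total >= min_sup:       # assumed convention: >= min_sup counts as frequent
            frequent[itemset] = total
    return frequent
```

Here `pairs` is everything one reducer receives from all combiners, so the summed value is the global support of each itemset in that partition.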
Step 6: the outputs Lp of the r Reduce functions, obtained after the comparison, are merged to give the final set of frequent itemsets, denoted L.
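The final merge is a plain union, sketched below in Python with a hypothetical function name. The sketch relies on one observation: because partitioning is by hash of the itemset, each itemset reaches exactly one reducer, so the Lp sets should be disjoint. The check that raises on a duplicate is an added safeguard, not something the source describes.

```python
def merge_local_frequent(local_sets):
    """Union the Lp dictionaries from the reducers into the final frequent-itemset map L."""
    final = {}
    for lp in local_sets:
        for itemset, sup in lp.items():
            if itemset in final:
                # Would indicate a partitioning bug: one key reached two reducers.
                raise ValueError(f"itemset {itemset} appeared in two partitions")
            final[itemset] = sup
    return final
```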
The algorithm then terminates.
It should be understood that the above detailed description of the technical solution of the present invention by way of preferred embodiments is illustrative and not restrictive. After reading this specification, a person of ordinary skill in the art may modify the technical solutions described in the embodiments or make equivalent substitutions for some of their technical features; such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (4)

1. A parallel data mining method for video big data, characterized in that the method comprises:
1) building a video big data mining system;
2) building a video big data index with the big data indexing and description module;
3) accelerating the extraction of key information from the video big data with the feature extraction and video summarization acceleration module;
4) mining the key video information with the parallel data mining algorithm and strategy module.
2. The parallel data mining method for video big data according to claim 1, characterized in that the video big data mining system comprises:
a big data indexing and description module, for building the index of the video big data;
a feature extraction and video summarization acceleration module, for supporting intelligent analysis of the video big data and for accelerating key video feature extraction and video summarization based on CUDA;
a parallel data mining algorithm and strategy module, for classification and association analysis of the video data.
3. The parallel data mining method for video big data according to claim 2, characterized in that the index of the video big data comprises a hierarchical index, an R-tree index, and a category index, which together support access to all kinds of video data.
4. The parallel data mining method for video big data according to claim 1, characterized in that the parallel data mining algorithm and strategy module mines the video big data with an improved Apriori algorithm based on the MapReduce programming model, with the following specific steps:
401) the MapReduce library partitions the transaction database horizontally into n data subsets of comparable size and sends the n data subsets to m nodes that execute Map tasks;
402) the n data subsets are formatted to produce <key1, value1> pairs, specifically <Tid, list>, where Tid is the identifier of a transaction in the transaction database and list is the list of values corresponding to that transaction;
403) the Map function scans each record <Tid, list> of its input data subset and produces a set of local candidate itemsets, denoted Cp, in which each candidate has a support count of 1;
404) an optional Combiner function is added on each machine that executes a Map task; the Combiner function first merges the output of the Map function locally and outputs <itemsets, sup>, where sup is the support count of itemsets within the data subset; the partition function hash(key) mod R then divides the intermediate key-value pairs produced by the Combiner function into R different partitions, and each partition is assigned to a designated Reduce function;
405) a node assigned a Reduce task reads the <itemsets, sup> data submitted by the Combiner function; because many different candidate itemsets are mapped to the same Reduce function, the data is sorted by the key itemsets so that records with the same candidate itemset are grouped together, forming <itemsets, list(sup)>;
406) the outputs Lp of the r Reduce functions, obtained after the comparison, are merged to give the final set of frequent itemsets, denoted L.
CN201410035192.XA 2014-01-24 2014-01-24 Video big data oriented parallel data mining method Pending CN104809114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410035192.XA CN104809114A (en) 2014-01-24 2014-01-24 Video big data oriented parallel data mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410035192.XA CN104809114A (en) 2014-01-24 2014-01-24 Video big data oriented parallel data mining method

Publications (1)

Publication Number Publication Date
CN104809114A true CN104809114A (en) 2015-07-29

Family

ID=53693945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410035192.XA Pending CN104809114A (en) 2014-01-24 2014-01-24 Video big data oriented parallel data mining method

Country Status (1)

Country Link
CN (1) CN104809114A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975544A (en) * 2016-04-28 2016-09-28 天津贝德曼科技有限公司 Big data mining-based ''special technique library'' construction technology
CN106126341A (en) * 2016-06-23 2016-11-16 成都信息工程大学 It is applied to many Computational frames processing system and the association rule mining method of big data
CN106708620A (en) * 2015-11-13 2017-05-24 苏宁云商集团股份有限公司 Data processing method and system
CN107273435A (en) * 2017-05-23 2017-10-20 北京环境特性研究所 Video personnel's fuzzy search parallel method based on MapReduce
CN107707328A (en) * 2016-08-08 2018-02-16 北京京东尚科信息技术有限公司 Summary info transmission method and device
CN110399397A (en) * 2018-04-19 2019-11-01 北京京东尚科信息技术有限公司 A kind of data query method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663005A (en) * 2012-03-19 2012-09-12 杭州海康威视***技术有限公司 Mass video file storage system based on cloud computation, analysis method and system thereof
US20120275363A1 (en) * 2009-10-23 2012-11-01 Zte Corporation Method and system for realizing carrier control

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120275363A1 (en) * 2009-10-23 2012-11-01 Zte Corporation Method and system for realizing carrier control
CN102663005A (en) * 2012-03-19 2012-09-12 杭州海康威视***技术有限公司 Mass video file storage system based on cloud computation, analysis method and system thereof

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
佚名: "大数据技术引领视频监控发展", 《HTTP://WWW.CTIFORUM.COM/NEWS/GUANDIAN/379338.HTML》 *
劳定雄: "视频监控大数据的关键技术和应用", 《HTTP://WWW.CSPMAG.CN/JSCX/JCJS/201401/673.HTML》 *
孙元成: "基于Hadoop的视频监控数据中心关键支撑技术研究与应用", 《中国优秀硕士学位论文全文数据库》 *
张敏: "云计算环境下的并行数据挖掘策略研究", 《中国优秀硕士学位论文全文数据库》 *
韩海雯: "基于云计算的广域级视频监控综合业务平台", 《计算机工程与设计》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708620A (en) * 2015-11-13 2017-05-24 苏宁云商集团股份有限公司 Data processing method and system
CN105975544A (en) * 2016-04-28 2016-09-28 天津贝德曼科技有限公司 Big data mining-based ''special technique library'' construction technology
CN106126341A (en) * 2016-06-23 2016-11-16 成都信息工程大学 It is applied to many Computational frames processing system and the association rule mining method of big data
CN107707328A (en) * 2016-08-08 2018-02-16 北京京东尚科信息技术有限公司 Summary info transmission method and device
CN107707328B (en) * 2016-08-08 2020-11-24 北京京东尚科信息技术有限公司 Abstract information transmission method and device
CN107273435A (en) * 2017-05-23 2017-10-20 北京环境特性研究所 Video personnel's fuzzy search parallel method based on MapReduce
CN110399397A (en) * 2018-04-19 2019-11-01 北京京东尚科信息技术有限公司 A kind of data query method and system

Similar Documents

Publication Publication Date Title
Malicevic et al. Everything you always wanted to know about multicore graph processing but were afraid to ask
CN104809114A (en) Video big data oriented parallel data mining method
Breß et al. Why it is time for a HyPE: A hybrid query processing engine for efficient GPU coprocessing in DBMS
Zhang et al. Spatial queries evaluation with mapreduce
Wen et al. Exploiting GPUs for efficient gradient boosting decision tree training
CN104933095A (en) Heterogeneous information universality correlation analysis system and analysis method thereof
Li et al. Research on clustering algorithm and its parallelization strategy
CN104820708B (en) A kind of big data clustering method and device based on cloud computing platform
CN104834557B (en) A kind of data analysing method based on Hadoop
Zhang et al. Large-scale spatial data processing on GPUs and GPU-accelerated clusters
Orakzai et al. k/2-hop: fast mining of convoy patterns with effective pruning
CN107341210B (en) C-DBSCAN-K clustering algorithm under Hadoop platform
You et al. Spatial join query processing in cloud: Analyzing design choices and performance comparisons
Yan et al. A parallel algorithm for mining constrained frequent patterns using MapReduce
CN111475837B (en) Network big data privacy protection method
CN103995827A (en) High-performance ordering method for MapReduce calculation frame
Chen et al. HiClus: Highly scalable density-based clustering with heterogeneous cloud
Güvenoglu et al. A qualitative survey on frequent subgraph mining
CN105302551A (en) Orthogonal decomposition construction and optimization method and system for big data processing system
Zoraghchian et al. Parallel frequent itemsets mining using distributed graphic processing units
CN104834733A (en) Big data mining and analyzing method
CN114138679A (en) Test data construction method and device, computer readable medium and electronic equipment
Chong et al. A Multi-GPU framework for in-memory text data analytics
Gao et al. Construction and Optimization of Co-occurrence-attribute-interaction Model for Column Semantic Recognition.
Zhang et al. Data Parallel Quadtree Indexing and Spatial Query Processing of Complex Polygon Data on GPUs.

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150729