WO2020095408A1 - Information processing apparatus, system, method and program

Info

Publication number
WO2020095408A1
Authority
WO
WIPO (PCT)
Prior art keywords
hierarchy
labels
classifier
classifiers
hierarchy structure
Application number
PCT/JP2018/041485
Other languages
French (fr)
Inventor
Salita SOMBATSIRI
Original Assignee
Nec Corporation
Application filed by Nec Corporation filed Critical Nec Corporation
Priority to JP2021523089A priority Critical patent/JP7156521B2/en
Priority to PCT/JP2018/041485 priority patent/WO2020095408A1/en
Publication of WO2020095408A1 publication Critical patent/WO2020095408A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/285Clustering or classification

Definitions

  • the present disclosure relates to an information processing apparatus, system, method and program.
  • the present disclosure relates to an information processing apparatus, system, method and program for multi-label classification.
  • a multi-label classification assigns more than one label, each of which represents a characteristic, to each input instance. Its applications include image labelling, music classification, text labelling, object detection, human tracking, scene labelling, etc. In the following, a multi-label classification problem that labels an input instance with L c labels is assumed.
  • BR: Binary Relevance
  • CC: Classifier Chain
  • RNN: Recurrent Neural Networks
  • CNN: Convolutional Neural Networks
  • Non-Patent Literature 1 discloses the BR, which is a multi-label classification method using multiple binary classifiers.
  • a BR system 220 includes L c binary classifiers C 1 , C 2 , ..., C Lc to assign L c labels l 1 , l 2 , ..., l Lc as output labels 230.
  • Each binary classifier C i is responsible for assigning or predicting the presence or absence of label l i to the input instance 210.
  • C 1 221, C 2 222, ..., and C Lc 223 assign l 1 231, l 2 232, ..., l Lc 233, respectively.
  • Each classifier works independently, so the computation of all classifiers can be parallelized.
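  • A minimal sketch of the BR scheme of Fig. 21 is shown below, assuming scikit-learn logistic regressions as the binary classifiers C i and toy data shapes; it is an illustration, not the implementation of the cited literature (scikit-learn's OneVsRestClassifier provides an equivalent off-the-shelf variant).

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_br(X, Y):
        """Train one independent binary classifier C_i per label column of Y."""
        return [LogisticRegression(max_iter=1000).fit(X, Y[:, i])
                for i in range(Y.shape[1])]

    def predict_br(classifiers, X):
        """Each classifier predicts its own label; no inter-label dependency,
        so all predictions can run in parallel."""
        return np.column_stack([clf.predict(X) for clf in classifiers])

    # Toy usage: 100 instances, 10 features, L_c = 3 labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    Y = (rng.random(size=(100, 3)) > 0.5).astype(int)
    print(predict_br(train_br(X, Y), X).shape)  # (100, 3)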
  • Non-Patent Literature 2 discloses the CC, which is a multi-label classification method that labels the input instance 310 using cascaded binary classifiers as shown in Fig. 22 in order to leverage correlation between labels.
  • the CC system 320 includes L c binary classifiers C 1 , C 2 , ..., C Lc that are organized in a cascaded manner to assign L c labels as output labels 330.
  • the classifier C i 321 uses the input instance 310 and labels l 1 to l i-1 assigned by previous classifiers C 1 to C i-1 in the chain to assign the value of label l i .
  • C 1 321 assigns l 1 331 first from input instance 310, and then, C 2 322 assigns l 2 332 using input instance 310 and l 1 as its input, and so on. Finally, C Lc 323 assigns l Lc 333 using input instance 310 and l 1 , l 2 , ..., l Lc-1 as its input.
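  • For comparison, a minimal sketch of the CC scheme of Fig. 22 is shown below under the same assumptions (scikit-learn logistic regressions, teacher forcing during training); scikit-learn's ClassifierChain offers a comparable ready-made implementation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def train_cc(X, Y):
        """Classifier C_i is trained on the input plus labels l_1..l_{i-1}."""
        chain, augmented = [], X
        for i in range(Y.shape[1]):
            chain.append(LogisticRegression(max_iter=1000).fit(augmented, Y[:, i]))
            # Ground-truth labels are appended during training (teacher forcing).
            augmented = np.column_stack([augmented, Y[:, i]])
        return chain

    def predict_cc(chain, X):
        """At inference, each predicted label is fed to the next classifier."""
        augmented, preds = X, []
        for clf in chain:
            p = clf.predict(augmented)
            preds.append(p)
            augmented = np.column_stack([augmented, p])
        return np.column_stack(preds)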
  • Non-Patent Literatures 3, 4, and 5 disclose the RNN-based method and Patent Literature 1 discloses the recurrent method. As illustrated in Fig. 23, these methods feed the previously predicted labels 440 back to the label prediction process 420 and assign output labels 430 sequentially based on the input instance 410 and the previously predicted labels 440. These methods are capable of capturing label correlation.
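  • The recurrent prediction process of Fig. 23 can be summarized by the following schematic loop; predict_step is a hypothetical stand-in for the recurrent predictor used in the cited methods, not a function they define.

    import numpy as np

    def predict_recurrent(features, num_labels, predict_step):
        """Apply one shared prediction step per label, feeding the labels
        predicted so far back into the next step."""
        predicted = np.zeros(num_labels)
        for t in range(num_labels):
            predicted[t] = predict_step(features, predicted[:t])
        return predicted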
  • Non-Patent Literatures 6, 7, and 8 disclose the CNN-based method, in which a CNN functions as a binary descriptor and solves multi-label classification by interpreting each bit as one label. The method learns both features and a binary representation at the same time through an objective-based loss function.
  • NPL 1: Tsoumakas et al., "Mining Multi-label Data", Data Mining and Knowledge Discovery Handbook, pp. 667-685, 2010
  • NPL 2: Read et al., "Classifier Chain for Multi-label Classification", the European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 254-269, 2009
  • NPL 3: Wang et al., "CNN-RNN: A Unified Framework for Multi-label Image Classification", CVPR 2016
  • NPL 4: Chen et al., "Order-free RNN with Visual Attention for Multi-label Classification", AAAI-18, 2018
  • NPL 5: Yang et al., "Deep Learning with a Rethinking Structure for Multi-label Classification", arXiv preprint arXiv:1802.01697, February 15, 2018
  • NPL 6: Liong et al., "Deep Hashing for Compact Binary Codes Learning", CVPR 2015, pp. 2475-2483, 2015
  • NPL 7: Lin et al., "Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks", CVPR 2016, pp. 1183-1192, 2016
  • NPL 8: Zhang et al., "Instance Similarity Deep Hashing for Multi-label Image Retrieval", arXiv preprint arXiv:1803.02987, March 19, 2018
  • each of the above mentioned techniques has the following problems.
  • in the CC, since the assignment of each label depends on the other labels, the CC can exploit label correlation well, but it cannot be processed in parallel and finding an optimal label order is very difficult.
  • the RNN-based or recurrent method is costly in terms of the amount of computation for label prediction in each recurrent step and it may take as many label prediction time steps as the number of labels.
  • in the CNN-based method, even though the method is highly parallel, the loss function forces uncorrelated bits in the output binary code. Therefore, there is a problem that it is difficult to efficiently leverage the label correlation in the multi-label classification.
  • the present disclosure has been accomplished to solve the above problems and an object of the present disclosure is thus to provide an information processing apparatus, system, method and program to efficiently leverage the label correlation in the multi-label classification.
  • An information processing apparatus according to a first exemplary aspect of the present disclosure includes: a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, wherein each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and a choice unit configured to choose at least one candidate for a multi-label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • An information processing system according to a second exemplary aspect of the present disclosure includes: a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, wherein each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and a choice unit configured to choose at least one candidate for a multi-label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • An information processing method according to a third exemplary aspect of the present disclosure includes: constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, wherein each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; evaluating quality of each of the hierarchy structure information; and choosing at least one candidate for a multi-label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • a non-transitory computer readable medium storing a program according to a fourth exemplary aspect of the present disclosure causes a computer to execute: a process for constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, wherein each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; a process for evaluating quality of each of the hierarchy structure information; and a process for choosing at least one candidate for a multi-label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • Fig. 1 is a block diagram illustrating a structure of a first exemplary embodiment of the present disclosure.
  • Fig. 2 is a diagram illustrating a structure of a classifier hierarchy according to the first exemplary embodiment of the present disclosure.
  • Fig. 3 illustrates the theoretical trade-off graph between Parallelism and Accuracy with the examples of classifier hierarchy candidates according to the first exemplary embodiment of the present disclosure.
  • Fig. 4 is a flowchart for explaining an information processing method according to the first exemplary embodiment of the present disclosure.
  • Fig. 5 is a table illustrating an example of classifier hierarchy for a multi-label classification.
  • Fig. 6 is a table illustrating an example of classifier hierarchy for a multi-label classification.
  • Fig. 7 is a block diagram illustrating a structure of a second exemplary embodiment of the present disclosure.
  • Fig. 8 is a flowchart for explaining an information processing method according to the second exemplary embodiment of the present disclosure.
  • Fig. 9 illustrates the coexistence of labels.
  • Fig. 10 shows an example of a Graphical User Interface (GUI) according to a third exemplary embodiment of the present disclosure.
  • Fig. 11 is a block diagram illustrating a structure of a fourth exemplary embodiment of the present disclosure.
  • Fig. 12 is a flowchart for explaining an information processing method according to the fourth exemplary embodiment of the present disclosure.
  • Fig. 13 illustrates an example of GUI for choosing or alternating sub-problem partitioning according to the fourth exemplary embodiment of the present disclosure.
  • Fig. 14 illustrates an example of GUI for choosing or alternating usage of coexistence/correlation according to the fourth exemplary embodiment of the present disclosure.
  • Fig. 15 is a block diagram showing a configuration of a classifier hierarchy evaluation unit according to a fifth exemplary embodiment of the present disclosure.
  • Fig. 16 is a flowchart for explaining an evaluation method according to the fifth exemplary embodiment of the present disclosure.
  • Fig. 17 is a block diagram showing a configuration of an information processing apparatus according to a sixth exemplary embodiment of the present disclosure.
  • Fig. 18 is a flowchart for explaining an information processing method according to the sixth exemplary embodiment of the present disclosure.
  • Fig. 19 is a block diagram showing a configuration of an information processing system according to the sixth exemplary embodiment of the present disclosure.
  • Fig. 20 shows a concept of internal structures of a hierarchy structure information and quality, respectively.
  • Fig. 21 is a diagram illustrating a structure of a Binary Relevance system for multi-label classification.
  • Fig. 22 is a diagram illustrating a structure of a Classifier Chain system for multi-label classification.
  • Fig. 23 is a diagram illustrating a structure of a RNN-based or recurrent method for multi-label classification.
  • Fig. 24 is a block diagram showing a configuration of the system according to the related art.
  • a first problem is that label correlation cannot be leveraged in multi-label classification without either a large amount of repeated computation or losing the ability of inter-label parallelism, as in the CC and the RNN-based or recurrent methods.
  • the reason for the occurrence of the first problem is label dependencies in capturing label correlation. Since the prediction of a label depends on the prediction of some previous labels, the prediction of all labels cannot take place at once.
  • the CC predicts each label in a cascaded manner, while the RNN-based and recurrent methods iteratively predict labels of an input instance. Meanwhile, the methods that are capable of inter-label parallelism, such as BR and CNN-based methods, cannot take advantage of label correlation to improve prediction accuracy.
  • a second problem is the large architecture space of classifier hierarchies, a.k.a. cascaded classifiers, such that finding an optimal classifier architecture for multi-label classification manually is costly in terms of man-hours.
  • an example of a system according to the above mentioned related art is described below with reference to Fig. 24.
  • a multi-label classification system 1900 includes a classifier hierarchy construction unit 1910 and a classifier hierarchy evaluation unit 1920.
  • the classifier hierarchy construction unit 1910 receives a label list 21, training data 22 and constraints 23 as input from the outside, constructs a classifier hierarchy and outputs the constructed classifier hierarchy to the classifier hierarchy evaluation unit 1920.
  • the label list 21 is a set of a plurality of labels and the like.
  • the training data 22 is a set of pairs of input data and their correct labels.
  • the constraints 23 are predetermined constraints.
  • the classifier hierarchy evaluation unit 1920 receives the constructed classifier hierarchy from the classifier hierarchy construction unit 1910, generates a classifier chain 48 or a cascaded classifier 49 based on the constructed classifier hierarchy and outputs the classifier chain 48 or the cascaded classifier 49 to the outside.
  • the classifier hierarchy evaluation unit 1920 also evaluates the classifier chain 48 or the cascaded classifier 49.
  • systems according to the above mentioned techniques manually decide the labelling order to leverage label correlation on a statistical basis, such as occurrence frequencies, and then construct and evaluate the CC or cascaded classifiers as shown in Fig. 24, which is inefficient, and the resulting order may not be optimal.
  • the reason for the occurrence of the second problem is that multiple labels of multi-label classification can be correlated in several ways. As a consequence, there are a large number of possible classifier hierarchies in terms of label order and maximizing parallelism.
  • FIG. 1 is a block diagram illustrating the structure of the first exemplary embodiment of the present disclosure.
  • a classifier hierarchy exploration system 100 includes a classifier hierarchy exploration unit 110, a classifier hierarchy construction unit 120 and a classifier hierarchy evaluation unit 130.
  • the classifier hierarchy exploration system 100 can be implemented using, but is not limited to, a general-purpose processor system, a specific circuit, such as a Graphic Processing Unit (GPU), an Application-Specific Integrated Circuit (ASIC) or an Application-Specific Instruction set Processor (ASIP), or a reconfigurable device, such as a Field Programmable Gate Array (FPGA).
  • An input of the classifier hierarchy exploration system 100 includes a label list 21, training data 22 (X, Y) and constraints 23.
  • the label list 21 includes, for example, a plurality of labels and sub-problems of the multi-label classification.
  • the label indicates for example a type of label or label name.
  • the multi-label classification may be partitioned into smaller classification sub-problems of binary, multi-class, or multi-label classification by grouping one or more labels to be predicted by the same classifier.
  • the training data 22 includes pairs of input data and output labels of the multi-label classification for training and validation, and/or label coexistence or correlation. Note that the output labels included in the training data 22 must align with the label list 21.
  • the constraints 23 include conditions, regulations, priorities, or limitations raised by the user.
  • An example of the constraints includes, but is not limited to, information of hardware, classification accuracy, execution time for the computation in the inference phase on the specified hardware (referred to as inference time in the rest of this specification), constraints regarding dependencies of sub-problems, and the priority/criteria for choosing optimal candidates, e.g. the candidate that yields the highest accuracy or the candidate that consumes the shortest inference time when performing classification in the inference phase (model deployment).
  • the information of hardware means a specification of the hardware, which includes, but is not limited to, available storage, computing units (computing capacity), hardware architecture, etc.
  • the hardware can be, but is not limited to, a specific circuit, such as an Application-Specific Integrated Circuit (ASIC) or an Application-Specific Instruction set Processor (ASIP), or a reconfigurable device, such as a Field Programmable Gate Array (FPGA).
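  • As an illustration of how the constraints 23 might be encoded, the following sketch uses a plain Python dictionary; the field names and values are hypothetical and not defined by the present disclosure.

    # Illustrative (assumed) encoding of the constraints 23: hardware
    # information, target metrics, and the priority/criteria for choosing
    # among candidates. All field names below are hypothetical.
    constraints = {
        "hardware": {
            "device": "FPGA",            # e.g. ASIC, ASIP, FPGA
            "num_processing_units": 5,   # computing capacity
            "available_storage_mb": 64,
        },
        "min_accuracy": 0.55,            # classification accuracy constraint
        "max_inference_time": 3,         # in unit time on the specified hardware
        "priority": "highest_accuracy",  # or "shortest_inference_time"
    }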
  • An output of the classifier hierarchy exploration system 100 includes at least one classifier hierarchy candidate 41.
  • the classifier hierarchy candidate 41 indicates hierarchical architecture of the classifiers.
  • Fig. 2 is a diagram illustrating the hierarchical architecture of the classifiers according to the first exemplary embodiment of the present disclosure.
  • the hierarchical architecture of the classifier is, for example, the cascaded organization of label classifiers that predict output labels 550.
  • the hierarchical architecture of the classifiers has a plurality of classifier layers, and the plurality of classifier layers are executed in series. Each classifier layer consists of one or more classifiers.
  • the classifiers C 1 to C m belong to the classifier layer 1 520 and are aligned therein. Each of the classifiers C 1 to C m predicts labels l 1 to l m from the input instance or features 510, respectively. Note that the classifiers C 1 to C m can perform parallel processing.
  • the classifiers C m+1 to C n belong to the classifier layer 2 530 and are aligned therein. Each of the classifiers C m+1 to C n predicts labels l m+1 to l n from the input instance or features 510 and some or all of the labels l 1 to l m , respectively. Note that the classifiers C m+1 to C n can perform parallel processing.
  • classifiers belonging to the classifier layer k (k is a natural number of 2 or more) 540 are aligned therein, and each of the classifiers belonging to the classifier layer k 540 predicts labels from the input instance or features 510 and some or all of the labels predicted by the previous classifier layers, respectively.
  • the classifiers can be one or a mixture of machine learning or static classification models, such as, but not limited to, binary classifiers, Support Vector Machines, and Neural Networks.
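  • A minimal sketch of how such a hierarchy could be executed at inference time is given below; the (predict_fn, label_indices) interface for each classifier is an assumption made for illustration, not a definition from the present disclosure.

    import numpy as np

    def predict_hierarchy(layers, x):
        """layers: list of classifier layers; each layer is a list of
        (predict_fn, label_indices) pairs, where predict_fn(inputs) returns
        one value per label index it is responsible for (assumed interface).
        x: 1-D feature vector of the input instance 510."""
        num_labels = sum(len(idx) for layer in layers for _, idx in layer)
        labels = np.zeros(num_labels)
        known = []                      # indices of labels already predicted
        for layer in layers:
            # Classifiers in one layer see the input plus the labels assigned
            # by earlier layers, and are mutually independent, so they could
            # run in parallel on the specified hardware.
            inputs = np.concatenate([x, labels[known]]) if known else x
            for predict_fn, idx in layer:
                labels[list(idx)] = predict_fn(inputs)
            known.extend(i for _, ids in layer for i in ids)
        return labels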
  • the classifier hierarchy candidate 41 may include its evaluation results.
  • the evaluation results include the validation metrics and/or estimated inference time.
  • the classifier hierarchy exploration system 100 explores the architecture space of classifier hierarchies as a part of the training phase to find optimal classifier hierarchy candidates. Those classifier hierarchy candidates will further be deployed in the inference phase of the classification problem and operated on the hardware specified by the information of hardware. That is, the classifier hierarchy exploration system 100 chooses at least one candidate for a multi-label classification model from the plurality of classifier hierarchy candidates.
  • the above mentioned units generally operate as follows.
  • the classifier hierarchy exploration unit 110 receives the label list 21, the training data 22 and the constraints 23 from the outside of classifier hierarchy exploration system 100 and controls an exploration flow.
  • the exploration flow means exploring a list of classifier hierarchy candidates.
  • the exploration flow includes constructing, evaluating, and selecting optimal candidates. Note that the list includes candidates repeatedly constructed by the classifier hierarchy construction unit 120 to be described later. Therefore, the classifier hierarchy exploration unit 110 generates information, such as partitioned sub-problems or multiple classifiers' information, for the classifier hierarchy construction unit 120 to construct hierarchy structure information from the label list 21 and the like, and outputs the generated information to the classifier hierarchy construction unit 120.
  • the classifier hierarchy exploration unit 110 carries out the exploration flow and selects optimal candidates from the list of classifier hierarchy candidates according to the constraints 23, a classifier hierarchy 31 and evaluation results 32, which are received from the classifier hierarchy evaluation unit 130 to be described later.
  • the classifier hierarchy 31 is constructed by the classifier hierarchy construction unit 120.
  • the evaluation results 32 are produced by the classifier hierarchy evaluation unit 130.
  • the classifier hierarchy exploration unit 110 outputs the selected candidates to the outside.
  • the classifier hierarchy construction unit 120 constructs one classifier hierarchy candidate as illustrated in Fig. 2 based on the information received from the classifier hierarchy exploration unit 110.
  • the classifier hierarchy candidate is an example of a hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure.
  • Each of the plurality of groups is a group of the plurality of classifiers.
  • the above mentioned classifier layer is an example of the group.
  • Each of the classifiers assigns/predicts one or more labels of a sub-problem. In other words, the classifiers classify the input data into the plurality of labels.
  • the classifiers in each of the classifier layers are independent of each other; hence they can operate in parallel on the hardware specified by the information of hardware in the inference phase.
  • the classifiers in different classifier layers are dependent and operate sequentially.
  • the classifier hierarchy candidates are constructed considering one or more of the following: (1) constraints regarding dependencies of sub-problems; (2) information of hardware; (3) label coexistence.
  • the classifier hierarchy evaluation unit 130 evaluates quality of the classifier hierarchy candidate constructed by the classifier hierarchy construction unit 120.
  • the quality includes the validation metrics and/or estimated inference time. Examples of the validation metrics are, but are not limited to, classification accuracy, precision, and recall. Then, the classifier hierarchy evaluation unit 130 feeds the classifier hierarchy candidate 31 and its quality (the evaluation results 32) back to the classifier hierarchy exploration unit 110 for candidate selection.
  • the classifier hierarchy exploration unit 110, classifier hierarchy construction unit 120 and classifier hierarchy evaluation unit 130 mutually operate in such a way that the classifier hierarchy exploration system 100 gives one or more classifier hierarchy candidates 41 as output.
  • the classifier hierarchy exploration system 100 may output the best classifier hierarchy candidate according to the priority/criteria constraint. For example, if the priority/criteria constraint is the highest accuracy, the classifier hierarchy exploration system 100 outputs the classifier hierarchy candidate that consumes the shortest inference time among the classifier hierarchy candidates that yield the highest accuracy. Another example is that, if the priority/criteria constraint is the shortest inference time, the classifier hierarchy exploration system 100 outputs the classifier hierarchy candidate that yields the highest accuracy among the classifier hierarchy candidates that consume the shortest inference time.
  • the classifier hierarchy exploration system 100 may output Pareto optimal solutions of classifier hierarchy candidates along the Pareto frontier of the trade-off graph between the capability of parallelism and classification accuracy.
  • Pareto optimal solutions refer to the solutions for which no objective value can be improved without degrading another objective value.
  • Pareto frontier refers to the boundary formed by the Pareto optimal solutions in the multi-objective trade-off plot.
  • the present disclosure considers, but not limited to, the capability of parallelism (referred to as Parallelism in Fig. 3) and Classification accuracy (referred to as Accuracy in Fig. 3) as objective functions.
  • An alternative for the capability of parallelism can be inference time and an alternative for accuracy can be label correlation or other validation metrics.
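  • A minimal sketch of extracting Pareto optimal candidates from such a two-objective trade-off is shown below; the candidate names and numbers are illustrative placeholders, not values from the present disclosure.

    # Keep the candidates that are not dominated in (parallelism, accuracy),
    # both objectives being maximized, as in the trade-off of Fig. 3.
    def pareto_optimal(candidates):
        """candidates: list of (name, parallelism, accuracy) tuples."""
        front = []
        for name, p, a in candidates:
            dominated = any(p2 >= p and a2 >= a and (p2 > p or a2 > a)
                            for _, p2, a2 in candidates)
            if not dominated:
                front.append((name, p, a))
        return front

    # Toy usage with made-up numbers.
    print(pareto_optimal([("BR", 8, 0.50), ("2-layer", 4, 0.58),
                          ("3-layer", 3, 0.60), ("CC", 1, 0.62),
                          ("dominated", 2, 0.55)]))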
  • Fig. 3 illustrates the theoretical trade-off graph between Parallelism and Accuracy with the examples of classifier hierarchy candidates under the assumption that the accuracy is improved when label coexistence/correlation is leveraged by using cascaded classifier layers (each of which is referred to as CL in Fig. 3). Higher parallelism implies shorter inference time.
  • the BR 610 contains the highest level of inter-label parallelism among all other classifier hierarchy candidates 620, 630, and 640, but yields the lowest accuracy.
  • CC 640 is incapable of inter-label parallelism, but yields the highest accuracy since it leverages label correlations through cascaded classifiers.
  • the classifier hierarchy candidates shown as items 620 and 630 are compromises between the BR 610 and the CC 640. They are capable of leveraging both inter-label parallelism and label correlation in order to achieve a higher accuracy or a shorter inference time.
  • the CL1 621 and CL2 622 may assign different numbers of labels. Likewise, the CL1 631, CL2 632 and CL3 633 may assign different numbers of labels. The CL1 621 and CL2 622 assign a larger number of labels than the CL1 631, CL2 632 and CL3 633. From Fig. 3, the classifier hierarchy candidate 620 can operate with a higher level of parallelism, while the classifier hierarchy candidate 630 can leverage more label correlation with more CLs. Therefore, the classifier hierarchy candidate 620 consumes a shorter inference time, but the classifier hierarchy candidate 630 achieves a higher accuracy.
  • the present disclosure is not limited to multi-label classification in which each output label is either 0 (the characteristic represented by the label is absent) or 1 (the characteristic represented by the label is present), but is also applicable to the case in which the output is a real number.
  • the classifier hierarchy exploration unit 110 receives an input including the label list 21, the training data 22 (X, Y) and the constraints 23, and initializes an empty list of classifier hierarchy candidates (step A1). Then, according to the sub-problems specified in the label list 21, the classifier hierarchy exploration unit 110 partitions a multi-label classification problem into multiple binary, multi-class, or multi-label classification sub-problems (step A2), each of which assigns/predicts multiple labels with one classifier. This step A2 can be seen as grouping multiple individual labels into one sub-problem. Next, steps A3 to A9 operate repeatedly.
  • the classifier hierarchy construction unit 120 constructs a new classifier hierarchy candidate (step A3) by organizing classifiers of sub-problems into one or more classifier layers of the classifier hierarchy in Fig. 2.
  • the classifier hierarchy evaluation unit 130 evaluates quantitative quality of the constructed new classifier hierarchy candidate (step A4).
  • the evaluated quality includes, for example, validation metrics and inference time.
  • the classifier hierarchy exploration unit 110 compares the new classifier hierarchy candidate constructed in step A3 with the existing classifier hierarchy candidates in the list of classifier hierarchy candidates (step A5). If the new candidate satisfies all constraints 23 and is currently Pareto optimal or among the best candidates (step A6, YES), the classifier hierarchy exploration unit 110 stores the new candidate in the list of classifier hierarchy candidates (step A7). Then, if one or more existing candidates in the list of classifier hierarchy candidates are no longer Pareto optimal or among the best candidates, the classifier hierarchy exploration unit 110 discards the one or more existing candidates from the list of classifier hierarchy candidates (step A8). Finally, if all candidates have been searched (step A9, YES), the classifier hierarchy exploration system 100 ends its operation. Otherwise (step A9, NO), the classifier hierarchy exploration system 100 repeats step A3 to construct a new classifier hierarchy candidate.
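  • The exploration loop of steps A1 to A9 can be sketched as follows under assumed interfaces: construct_next enumerates candidate hierarchies (step A3), evaluate returns an (accuracy, inference time) pair (step A4), and satisfies checks the constraints 23; this is an illustrative sketch, not the exact algorithm of the flowchart.

    def explore(construct_next, evaluate, satisfies):
        candidates = []                                # step A1: empty list
        for hierarchy in construct_next():             # steps A3/A9: enumerate
            acc, time = evaluate(hierarchy)            # step A4: quality
            if not satisfies(acc, time):               # steps A5/A6: constraints
                continue
            dominated = any(a >= acc and t <= time and (a > acc or t < time)
                            for _, a, t in candidates)
            if dominated:                              # not Pareto optimal
                continue
            candidates.append((hierarchy, acc, time))  # step A7: store
            # Step A8: discard candidates that are no longer Pareto optimal.
            candidates = [(h, a, t) for h, a, t in candidates
                          if not (acc >= a and time <= t
                                  and (acc > a or time < t))]
        return candidates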
  • in step A3, the classifier hierarchy candidate construction considers the constraints 23 and/or the label coexistence/correlation (included in the training data 22).
  • if the labels are independent or uncorrelated, organizing the classifiers of the corresponding sub-problems in the same classifier layer to parallelize inter-label computation reduces inference time.
  • if the labels exhibit coexistence or correlation, in either a positive way (labels usually exist simultaneously, or values increase simultaneously) or a negative way (labels never exist simultaneously, or one value increases as the other decreases), organizing the classifiers of the corresponding sub-problems in different classifier layers to leverage the label coexistence/correlation improves validation metrics, e.g. accuracy.
  • the classifier hierarchy exploration system 100 is capable of offering one or more Pareto optimal classifier hierarchy candidates. It balances the trade-off between parallelism or inference time and classification accuracy or other validation metrics during classifier hierarchy candidate construction. Due to limitations in the available hardware resources, parallelizing inter-label classifiers might not reduce inference time. In such a case, organizing them in different classifier layers might improve the validation metrics without increasing the inference time. On the other hand, sacrificing improvement of the validation metrics by organizing the classifiers of sub-problems in the same classifier layer reduces the inference time in the case that there are idle hardware resources.
  • the classifier hierarchy exploration system 100 explores the architecture space of classifier hierarchies and constructs classifier hierarchy candidates using, but not limited to, brute force, a search tree method, simulated annealing, or a genetic algorithm.
  • the classifier hierarchy candidates may be constructed without consideration of label coexistence/correlation using brute force or a search tree.
  • considering label coexistence/correlation shortens the exploration because it narrows the architecture space of classifier hierarchies by defining and restricting effective organizations of classifier layers through coexistence/correlation.
  • the label list 21 includes 'male' and 'female' of the gender sub-problem, 'bald', 'short hair' and 'long hair' of the hair length sub-problem, and 'jeans', 'suit' and 'skirt' of the lower-body outfit sub-problem.
  • the classifier hierarchy exploration unit 110 partitions the 8-label classification problem into three classifiers, each of which is responsible for one sub-problem. Assume that (1) there is a strong correlation between gender and hair length, a weak correlation between gender and lower-body outfit, and no correlation between hair length and lower-body outfit; and (2) there are five processors and one processor can assign one label, for ease of explanation.
  • Fig. 5 and Fig. 6 show the 13 classifier hierarchy candidates that can be constructed in step A3 and their evaluation results that can be evaluated in step A4. It can be seen that Candidates 2 to 13 achieve higher accuracy in one or more labels by leveraging label correlation through the hierarchical architecture, compared to Candidate 1.
  • the Pareto optimal classifier hierarchy candidates when considering average accuracy and inference time are Candidate 5 (58% accuracy in 2 unit time) and Candidates 7, 8, 9 and 13 (60% accuracy in 3 unit time).
  • Candidate 7 achieves the same accuracy within the same inference time as Candidates 8 and 9, even though it has fewer classifier layers, because of the hardware resource constraints.
  • Candidates 8 and 9 might be preferable in the case that the label correlation might further improve accuracy.
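  • Under the simplifying assumptions of this example (five processors, one processor assigning one label, and one unit time per pass of a classifier layer), the inference time of a hierarchy can be estimated as sketched below; the layer splits shown are hypothetical examples, not the actual candidates of Fig. 5 and Fig. 6.

    import math

    def unit_inference_time(layer_sizes, num_processors=5):
        """layer_sizes: number of labels assigned in each classifier layer."""
        return sum(math.ceil(n / num_processors) for n in layer_sizes)

    # e.g. a hypothetical two-layer split of the 8 labels (5 labels, then 3)
    # needs 2 unit time; a hypothetical three-layer split needs 3 unit time.
    print(unit_inference_time([5, 3]))     # 2
    print(unit_inference_time([2, 3, 3]))  # 3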
  • since the present exemplary embodiment is configured in such a manner that the classifier hierarchy exploration system 100 explores the architecture space of classifier hierarchies with both the validation metrics of the classification problem and the information of hardware, it is possible to find multiple Pareto optimal classifier hierarchy candidates so that users can choose according to the deployment objective.
  • the exemplary embodiment is configured in such a manner that each classifier hierarchy leverages label correlation and maximally utilizes the available hardware to achieve the superior values of validation metrics in the shortest inference time.
  • a classifier hierarchy exploration system 900 includes a sub-problem partitioning unit 910, a coexistence/correlation analysis unit 920, a classifier hierarchy exploration unit 930 (same as item 110 in Fig. 1), a classifier hierarchy construction unit 940 (same as item 120 in Fig. 1), and a classifier hierarchy evaluation unit 950 (same as item 130 in Fig. 1).
  • the input of the classifier hierarchy exploration system 900 includes the label list 21, the training data 22 ( ⁇ X, Y ⁇ ) and the constraints 23.
  • the label list 21 includes the labels and sub-problems of the multi-label classification, where the sub-problems are optional.
  • the training data 22 includes pairs of input data and output labels of the multi-label classification for training and validation, and/or label coexistence or correlation, where label coexistence or correlation is optional.
  • the above mentioned means generally operate as follows.
  • the sub-problem partitioning unit 910 takes the label list 21 as input and partitions labels in the label list 21 into sub-problems.
  • the coexistence/correlation analysis unit 920 takes the training data 22 ( ⁇ X, Y ⁇ ) as input and analyzes label coexistence or correlation.
  • the classifier hierarchy exploration unit 930, classifier hierarchy construction unit 940 and classifier hierarchy evaluation unit 950 operates in the same way as the classifier hierarchy exploration unit 110, classifier hierarchy construction unit 120 and classifier hierarchy evaluation unit 130 of the classifier hierarchy exploration system 100, respectively.
  • the classifier hierarchy exploration system 900 may include only the sub-problem partitioning unit 910, or only the coexistence/correlation analysis unit 920, or both the sub-problem partitioning unit 910 and the coexistence/correlation analysis unit 920.
  • the classifier hierarchy exploration system 900 receives input including the label list 21, the training data 22 ( ⁇ X, Y ⁇ ) and the constraints 23.
  • the sub-problem partitioning unit 910 receives the label list 21.
  • the coexistence/correlation analysis unit 920 receives the training data 22.
  • the classifier hierarchy exploration unit 930 receives the label list 21, the training data 22 and the constraints 23. And the classifier hierarchy exploration unit 930 initializes an empty list of classifier hierarchy candidates (step B1).
  • the sub-problem partitioning unit 910 partitions a multi-label classification problem into multiple binary, multi-class, or multi-label classification sub-problems (step B2).
  • the coexistence/correlation analysis unit 920 analyzes label coexistence or correlation (step B3).
  • steps B4 to B10 are the same as steps A3 to A9 in the first exemplary embodiment.
  • the sub-problem partitioning unit 910 partitions the labels into sub-problems (step B2) based on, but not limited to, semantic analysis of the label names. For example, assume that there are 'male', 'female', 'bald', 'short hair', 'long hair', 'jeans', 'suit' and 'skirt' as the label names. In this case, the sub-problem partitioning unit 910 partitions the labels into the gender sub-problem ('male', 'female'), the hair length sub-problem ('bald', 'short hair', 'long hair') and the lower-body outfit sub-problem ('jeans', 'suit', 'skirt') by the semantic analysis. The partitioning can be done manually by the user (the same case as in the first exemplary embodiment, where the label list includes sub-problems) or done automatically using tools such as a word2vec model.
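  • A minimal sketch of such automatic partitioning is given below; the embed function is a hypothetical stand-in for a word2vec-style model that maps a label name to a vector, and the threshold is an illustrative choice.

    import numpy as np

    def partition_labels(label_names, embed, threshold=0.7):
        """Group labels whose name embeddings are semantically close into
        the same sub-problem (greedy, first-match grouping)."""
        groups = []
        for name in label_names:
            v = embed(name)
            for group in groups:
                rep = embed(group[0])
                cos = v @ rep / (np.linalg.norm(v) * np.linalg.norm(rep))
                if cos >= threshold:     # close enough: same sub-problem
                    group.append(name)
                    break
            else:
                groups.append([name])    # start a new sub-problem
        return groups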
  • the coexistence/correlation analysis unit 920 analyzes label coexistence or correlation (step B3).
  • the analysis can be done in, but not limited to, label-pairwise manner or sub-problem-wise manner to examine strength of the coexistence/correlation of the labels (strong, weak, or no coexistence/correlation).
  • the coexistence/correlation can be both positive and negative.
  • There are several methods to analyze coexistence/correlation, such as coexistence analysis or the Jaccard-Needham measure for binary correlation, and Pearson or Kendall correlation analysis for real-valued correlation.
  • Fig. 9 shows an example of a pairwise coexistence analysis of the aforementioned labels. It can be seen that, when the label 'male' is assigned to the input, the coexistence probabilities of the labels 'short hair' (0.9 means that 'short hair' is also assigned in 90% of the input instances to which 'male' is assigned) and 'skirt' (0.001 means that 'skirt' is also assigned in 0.1% of the input instances to which 'male' is assigned) show strong positive and strong negative coexistence, respectively.
  • the coexistence of 'male' and 'jeans' is about the same as the coexistence of 'female' and 'jeans'. This implies that the coexistence of gender and 'jeans' is weakly correlated or irrelevant.
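  • A pairwise coexistence analysis of this kind can be sketched as below for a binary label matrix; the toy label matrix is made up for illustration and is not the data behind Fig. 9.

    import numpy as np

    def coexistence_matrix(Y):
        """Y: binary matrix (instances x labels). Entry (i, j) of the result
        estimates P(label j present | label i present); values near 1 suggest
        strong positive and values near 0 strong negative coexistence."""
        Y = np.asarray(Y, dtype=float)
        counts = Y.T @ Y                    # co-occurrence counts
        present = Y.sum(axis=0)             # how often each label is present
        return counts / np.maximum(present[:, None], 1)

    # Toy example with columns ['male', 'short hair', 'skirt'] (assumed data).
    Y = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]])
    print(np.round(coexistence_matrix(Y), 2))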
  • FIG. 10 shows an example of a Graphical User Interface (GUI) according to the third exemplary embodiment of the present disclosure.
  • an item 1200 shows the screen of the classifier hierarchy exploration system 100 and/or 900.
  • a user can add labels and sub-problems (a part of the label list 21) manually through input devices, such as a keyboard, or by using one or more files, by pressing item 1210.
  • the training data 22 and the constraints 23 can be specified by pressing items 1220 and 1230, respectively.
  • the exploration starts when the button item 1240 is pressed. The items are not limited to a button interface.
  • a classifier hierarchy exploration system 1300 also takes User-specified choice of sub-problem and coexistence/correlation 24 as input.
  • the classifier hierarchy exploration system 1300 includes a sub-problem partitioning unit 1310, a coexistence/correlation analysis unit 1320, a classifier hierarchy exploration unit 1330, a classifier hierarchy construction unit 1340 and a classifier hierarchy evaluation unit 1350.
  • the sub-problem partitioning unit 1310 and the coexistence/correlation analysis unit 1320 are the same as items 910 and 920 in Fig. 7.
  • the classifier hierarchy construction unit 1340 and the classifier hierarchy evaluation unit 1350 are the same as items 120 and 130 in Fig. 1.
  • the classifier hierarchy exploration unit 1330 further receives the User-specified choice of sub-problem and coexistence/correlation 24 from the outside, in addition to the label list 21, the training data 22 and the constraints 23.
  • the classifier hierarchy exploration unit 1330 chooses or alters the sub-problems partitioned by the sub-problem partitioning unit 1310 and/or coexistence/correlation results analysed by the coexistence/correlation analysis unit 1320 using the User-specified choice of sub-problem and coexistence/correlation 24.
  • the classifier hierarchy exploration system 1300 receives input including the label list 21, the training data 22 ( ⁇ X, Y ⁇ ) and the constraints 23 (step C1). Then, the sub-problem partitioning unit 1310 partitions a multi-label classification problem into multiple binary, multi-class, or multi-label classification sub-problems (step C2). The coexistence/correlation analysis unit 1320 analyzes label coexistence or correlation (step C3).
  • in step C4, the classifier hierarchy exploration system 1300 shows a GUI for the user to choose or alter the sub-problems partitioned in step C2 and/or the coexistence/correlation results analysed in step C3.
  • the classifier hierarchy exploration unit 1330 constructs a new classifier hierarchy candidate by additionally reflecting the User-specified choice of sub-problem and coexistence/correlation 24 (step C5).
  • steps C5 to C11 are the same as steps A3 to A9 in the first exemplary embodiment.
  • examples of the GUI shown at step C4 are given in Fig. 13 and Fig. 14. Note that the GUI shown at step C4 is not limited to the following examples.
  • item 1500 shows the screen to choose or alter sub-problem.
  • Item 1510 represents the area to illustrate the sub-problem partitioned in step C2.
  • Item 1520 represents the labels that are not grouped into any sub-problem in step C2.
  • Each sub-problem in areas 1510 and 1520 consists of an item 1511 to specify the type of sub-problem, items to specify the labels in each sub-problem (in the case of sub-problem 1, items 1512, 1513 and 1514), and an item 1515 as, but not limited to, a button for deleting the sub-problem.
  • the item 1511 can be expanded in the form of a drop-down list (item 1516) by pressing the upside-down triangle button to alter the type of sub-problem to be, but not limited to, Binary, Multi-class or Multi-label classification.
  • the labels can be dragged to alter the sub-problem of a label. If item 1515 is pressed, the sub-problem will be deleted and its labels, such as items 1512, 1513 and 1514, will be moved to area 1520.
  • Item 1530 represents the sub-problems created by the user in step C4. Note that the items 1510 and 1530 may be the same area.
  • Item 1540 represents a button to create a new group. When the user presses the button item 1540, a new sub-problem is created in area 1530. Then the user can choose the type of sub-problem and drag labels into the sub-problem.
  • Item 1550 represents a button to save the user-specified choice to the classifier hierarchy exploration system. When the button item 1550 is pressed, the user-specified choice is saved to the system.
  • Item 1560 represents a button to choose or alter the coexistence or correlation. When the button item 1560 is pressed, the classifier hierarchy exploration system 1300 shows another screen (Fig. 14) to enable the user to choose or alter the coexistence or correlation.
  • Item 1570 represents a button to start the exploration. When the button item 1570 is pressed, the exploration is started by the classifier hierarchy exploration unit 1330.
  • item 1600 shows the screen to choose or alter coexistence or correlation.
  • Item 1610 represents the area to choose labels to show coexistence or correlation.
  • Item 1620 represents the area to show coexistence or correlation.
  • Items 1611 and 1612 are the choices of labels to show coexistence or correlation in area 1620.
  • the entries displayed in items 1611 and 1612 can be sub-problems or labels. When the user chooses a sub-problem or label, the selection can be shown by using a highlighted band similar to item 1613.
  • Item 1630 represents the area to select the usage of the label pairs.
  • the item 1631 shows pairs of labels, and the item 1632 is the drop-down list to choose whether to use the pair as "strong", "weak" or "don't use" for the coexistence or correlation, as shown in item 1633.
  • Item 1640 represents a button to save the user-specified choice to the classifier hierarchy exploration system. When the button item 1640 is pressed, the user-specified choice is saved to the system.
  • Item 1650 represents a button to choose or alter the sub-problems. When the button item 1650 is pressed, the classifier hierarchy exploration system 1300 shows the screen 1500 to enable the user to choose or alter the sub-problems.
  • Item 1660 represents a button to start the exploration. When the button item 1660 is pressed, the exploration is started by the classifier hierarchy exploration unit 1330.
  • the classifier hierarchy evaluation unit 1700 includes a classifier training unit 1710, a validation metric evaluation unit 1720, and an inference time evaluation unit 1730.
  • the above mentioned means generally operate as follows.
  • the classifier training unit 1710 trains the classifier hierarchy with the training data for training. That is, the classifier training unit 1710 receives the training data 22 and the classifier hierarchy 33 constructed by the classifier hierarchy construction unit 120 and the like. Then the classifier training unit 1710 trains a model in which each classifier is deployed based on the classifier hierarchy 33, using the training data 22. Note that the classifier training unit 1710 is optional.
  • the validation metric evaluation unit 1720 evaluates the validation metrics of the classifier hierarchy 33 with the training data 22 for validation. That is, the validation metric evaluation unit 1720 calculates the validation metrics of the trained model.
  • the inference time evaluation unit 1730 evaluates the inference time of the classifier hierarchy from the information of hardware. That is, the inference time evaluation unit 1730 receives the classifier hierarchy 33 and the information of hardware 232 included in the constraints 23. Then the inference time evaluation unit 1730 estimates the inference time required for the model using the information of hardware 232.
  • the model is one in which each classifier is deployed based on the classifier hierarchy 33.
  • the inference time is the time required to infer all labels from all classifiers. In other words, the inference time is the time required for classification of all labels by the plurality of classifiers for each classifier hierarchy.
  • the classifier hierarchy evaluation unit 1700 receives input including the training data 22 (X, Y), the classifier hierarchy 33 and the information of hardware 232 (step D1). Then the classifier training unit 1710 trains the model corresponding to the classifier hierarchy 33 using the training data 22 (step D2). For example, the classifier training unit 1710 inputs input data included in the training data 22 into the model. More specifically, the classifier training unit 1710 inputs the input data into each classifier deployed based on the classifier hierarchy 33. Then the classifier training unit 1710 compares the classified labels, which are output by each classifier in the model, with the correct labels included in the training data 22 that correspond to the input data. The classifier training unit 1710 adjusts the parameters in each classifier so that the difference between the classified labels and the correct labels becomes small.
  • the validation metric evaluation unit 1720 evaluates the validation metrics such as accuracy and the like of the trained model (step D3). For example, the validation metric evaluation unit 1720 calculates the validation metrics of the trained model using the training data 22 for validation.
  • the inference time evaluation unit 1730 evaluates the inference time of the model corresponding to the classifier hierarchy 33 using the information of hardware 232 (step D4). For example, the inference time evaluation unit 1730 estimates the time required to infer all labels from all classifiers as the inference time.
  • the classifier hierarchy evaluation unit 1700 outputs the results of the validation metric evaluation in step D3 and the inference time evaluation in step D4 as the evaluation results (step D5).
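  • The evaluation flow of steps D1 to D5 can be sketched as follows under simplifying assumptions: each classifier is a scikit-learn-style binary classifier responsible for exactly one label, teacher forcing is used during training, and the per-layer cost model (one unit time per pass, with a limited number of processing units) is an illustrative stand-in for a real hardware model.

    import math
    import numpy as np

    def evaluate_hierarchy(layers, X_train, Y_train, X_val, Y_val, hardware):
        """layers: list of classifier layers, each a list of (clf, label_index)
        pairs. hardware: dict with 'num_processing_units' (assumed format)."""
        # Step D2: train layer by layer, appending ground-truth labels of
        # earlier layers to the inputs (teacher forcing).
        known = []
        for layer in layers:
            Xa = np.column_stack([X_train, Y_train[:, known]]) if known else X_train
            for clf, idx in layer:
                clf.fit(Xa, Y_train[:, idx])
            known += [idx for _, idx in layer]
        # Step D3: validate layer by layer, feeding predicted labels forward.
        preds = np.zeros(Y_val.shape)
        known = []
        for layer in layers:
            Xa = np.column_stack([X_val, preds[:, known]]) if known else X_val
            for clf, idx in layer:
                preds[:, idx] = clf.predict(Xa)
            known += [idx for _, idx in layer]
        accuracy = (preds == Y_val).mean()
        # Step D4: assumed cost model: one unit time per pass of a layer, with
        # hardware["num_processing_units"] classifiers running in parallel.
        units = hardware["num_processing_units"]
        inference_time = sum(math.ceil(len(layer) / units) for layer in layers)
        return {"accuracy": accuracy, "inference_time": inference_time}  # D5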
  • a sixth exemplary embodiment of the present disclosure is elaborated below referring to the accompanying drawings.
  • the sixth exemplary embodiment indicates a minimum configuration common to the first to fifth exemplary embodiments.
  • Fig. 17 is a block diagram showing a configuration of an information processing apparatus according to a sixth exemplary embodiment of the present disclosure.
  • the information processing apparatus 1 corresponds to the above mentioned classifier hierarchy exploration system 100, 900 and 1300.
  • the information processing apparatus 1 includes a construction unit 11, an evaluation unit 12 and a choice unit 13.
  • the construction unit 11 is a minimum configuration common to the classifier hierarchy construction unit 120, 940 and 1340.
  • the construction unit 11 constructs a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, wherein each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels.
  • the hierarchy structure information corresponds to the above mentioned classifier hierarchy.
  • the evaluation unit 12 is a minimum configuration common to the classifier hierarchy evaluation unit 130, 950, 1350 and 1700.
  • the evaluation unit 12 evaluates quality of each of the hierarchy structure information.
  • the choice unit 13 is a minimum configuration common to the classifier hierarchy exploration unit 110, 930 or 1330.
  • the choice unit 13 chooses at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • Fig. 18 is a flowchart for explaining an information processing method according to a sixth exemplary embodiment of the present disclosure.
  • the construction unit 11 constructs the plurality of hierarchy structure information (S11).
  • the evaluation unit 12 evaluates quality of each of the hierarchy structure information (S12).
  • the choice unit 13 chooses at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints (S13).
  • FIG. 19 is a block diagram showing a configuration of an information processing system according to a sixth exemplary embodiment of the present disclosure.
  • the information processing system 6 corresponds to the above mentioned classifier hierarchy exploration system 100, 900 and 1300.
  • the information processing system 6 includes a storage 61, a processor 62, a memory 63 and an IF (InterFace) unit 64.
  • the storage 61 is a storage device such as a hard disk drive, flash memory or the like.
  • the storage 61 stores hierarchy structure information 6111, 6112, ... , quality 6121, 6122, ... , constraints 613 and a program 614.
  • Fig. 20 shows a concept of internal structures of the hierarchy structure information 7 and the quality 8, respectively.
  • the hierarchy structure information 7 is an example of the hierarchy structure information 6111, 6112 or the like.
  • the hierarchy structure information 7 includes group 711, 721, ... and layer 712, 722, ....
  • the group 711 corresponds to the layer 712 and the group 721 corresponds to the layer 722.
  • the group 711 includes a classifier 7111, 7112, ....
  • the classifier 7111 and the like are just information indicating at least an identifier of hardware or software on which the classifier is installed.
  • the group 721 and the like have the same configuration as the group 711 and include one or more classifiers, respectively. However, the classifiers included in each group are different from the classifiers included in the other groups.
  • the layer 712 and the like are information including an indication of the hierarchy, such as a layer number in the hierarchy, positional relationships with other layers, and the labels which are output from the classifiers included in the previous (upper) layer and are input into the classifiers included in the group corresponding to the own layer.
  • the quality 8 is an example of the quality 6121, 6122 or the like.
  • the quality 8 corresponds to the hierarchy structure information 7.
  • the quality 8 includes a validation metrics 81 and an estimated inference time 82.
  • the validation metrics 81 and the estimated inference time 82 are the same as those described above, respectively.
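  • The internal structures of Fig. 20 could be represented, for example, by the data classes sketched below; the field names are hypothetical and chosen only to mirror the description above.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class Group:
        classifier_ids: List[str]       # identifiers of the hardware/software
                                        # on which each classifier is installed

    @dataclass
    class Layer:
        number: int                     # position in the hierarchy
        input_labels: List[str]         # labels fed in from the previous layer

    @dataclass
    class HierarchyStructureInfo:
        groups: List[Group]
        layers: List[Layer]             # layers[i] corresponds to groups[i]

    @dataclass
    class Quality:
        validation_metrics: Dict[str, float]   # e.g. {"accuracy": 0.60}
        estimated_inference_time: float        # in unit time on the hardware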
  • the constraints 613 corresponds to the above mentioned constraints 23.
  • the program 614 is a computer program in which the information processing method according to at least one of the first to fifth exemplary embodiments is implemented.
  • the memory 63 is a volatile storage device, such as a RAM (Random Access Memory), and serves as a storage area for temporarily holding information generated during the operation of the processor 62.
  • the IF unit 64 is an interface that performs input and output with the outside.
  • the processor 62 is a control unit, such as a CPU or the like.
  • the processor 62 reads the program 614 from the storage 61 to the memory 63 and executes the program 614. By doing this, the processor 62 realizes functions of a reception unit 621, a partition unit 622, an analysis unit 623, a construction unit 624, an evaluation unit 625 and a choice unit 626.
  • the reception unit 621 corresponds to the classifier hierarchy exploration unit 110, the sub-problem partitioning unit 910, the coexistence/correlation analysis unit 920, the classifier hierarchy exploration unit 930, the sub-problem partitioning unit 1310, the coexistence/correlation analysis unit 1320 or the classifier hierarchy exploration unit 1330.
  • the partition unit 622 corresponds to the classifier hierarchy exploration unit 110, the sub-problem partitioning unit 910 or 1310.
  • the analysis unit 623 corresponds to the coexistence/correlation analysis unit 920 or 1320.
  • the construction unit 624 corresponds to the classifier hierarchy construction unit 120, 940 or 1340.
  • the evaluation unit 625 corresponds to the classifier hierarchy evaluation unit 130, 950 or 1350.
  • the choice unit 626 corresponds to the classifier hierarchy exploration unit 110, 930 or 1330.
  • a first effect is to ensure that the classifier hierarchy labels an instance accurately while maximizing hardware usage.
  • the reason for the effect is that it can both leverage label correlation through the hierarchy (similar to classifier chain) and parallelize the computation of uncorrelated labels.
  • a second effect is to ensure that the output of the system is a Pareto optimal classifier hierarchy, so that the user can choose an appropriate classifier hierarchy for deployment according to the constraints.
  • the reason for the effect is that the present disclosure automatically explores the architecture space of classifier hierarchy by repeatedly constructing and evaluating the classifier hierarchy according to label correlations and available hardware.
  • the present invention is not limited by the above exemplary embodiments, but various modifications can be made thereto without departing from the scope of the present invention.
  • the above exemplary embodiments explained the present invention as being a hardware configuration, but the present invention is not limited to this.
  • the present invention can also be realized by causing a CPU (Central Processing Unit) to execute arbitrary processes on a computer program.
  • the program can be stored and provided to a computer using any type of non-transitory computer readable media.
  • non-transitory computer readable media examples include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (Blu-ray (registered trademark) Disc), and semiconductor memories (such as mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc.).
  • the program may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g. electric wires, and optical fibers) or a wireless communication line.
  • Supplementary Note 1 An information processing apparatus comprising: a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • Supplementary Note 2 The information processing apparatus according to Supplementary Note 1, wherein each of the plurality of groups is a grouping of two or more classifiers, among the plurality of classifiers, for performing parallel processing.
  • Supplementary Note 3 The information processing apparatus according to Supplementary Note 1 or 2, wherein the construction unit constructs the hierarchy structure information based on a dependency among the plurality of labels.
  • Supplementary Note 4 The information processing apparatus according to any one of Supplementary Notes 1 to 3, wherein the construction unit constructs the hierarchy structure information which is defined so as to connect a part of the outputs of the classifiers belonging to a first group to the inputs of the classifiers belonging to a second group, the first and second groups being included in the hierarchy structure information.
  • the information processing apparatus comprises: a first evaluation unit configured to calculate the validation metrics as the quality of the hierarchy structure information; and a second evaluation unit configured to estimate the inference time as the quality of the hierarchy structure information.
  • the predetermined constraints include at least one of information of hardware, classification accuracy of each classifier in the hierarchy structure information, information which indicates a dependency between labels, and priority/criteria to choose the candidate.
  • Supplementary Note 10 The information processing apparatus according to any one of Supplementary Notes 1 to 9, further comprising: a partition unit configured to partition the plurality of labels to be input into a plurality of sub problems in a multi label classification problem, wherein the construction unit generates the plurality of groups by grouping the plurality of classifiers based on the plurality of sub problems, and constructs the hierarchy structure information from the generated plurality of groups.
  • the partition unit partitions the labels into the plurality of sub problems based on semantic analysis of label names corresponding to the plurality of labels.
  • Supplementary Note 12 The information processing apparatus according to any one of Supplementary Notes 1 to 11, further comprising: an analysis unit configured to analyze a strength of coexistence/correlation between labels based on a dependency among the plurality of labels to be input, wherein the construction unit constructs the hierarchy structure information based on the analyzed strength of coexistence/correlation between labels.
  • Supplementary Note 13 The information processing apparatus according to any one of Supplementary Notes 1 to 12, further comprising: a reception unit configured to receive an input of the predetermined constraints and training data which includes pairs of the input data and correct labels, wherein each of the construction unit, the evaluation unit and the choice unit starts processing according to the received input.
  • An information processing system comprising: a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • An information processing method using a computer comprising: constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; evaluating quality of each of the hierarchy structure information; and choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • A non-transitory computer readable medium storing a control program causing a computer to execute: a process for constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; a process for evaluating quality of each of the hierarchy structure information; and a process for choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  • the present disclosure is applicable to a system and an apparatus for solving a multi-label classification problem and/or labelling input instances with one or more labels.
  • the present disclosure is also applicable to applications such as object detection, human tracking, scene labelling, and other applications for classification and artificial intelligence.
  • 100 classifier hierarchy exploration system; 110 classifier hierarchy exploration unit; 120 classifier hierarchy construction unit; 130 classifier hierarchy evaluation unit; 21 label list; 22 training data; 23 constraints; 232 information of hardware; 29 labelling order; 31 classifier hierarchy; 32 evaluation results; 33 classifier hierarchy; 34 evaluation results; 41 classifier hierarchy candidate; 48 classifier chain; 49 cascaded classifier; 210 input instance; 220 BR system; 230 output labels; 310 input instance; 320 CC system; 330 output labels; 410 input instance; 420 label prediction process; 430 output labels; 440 previously predicted labels; 510 input instance or features; 520 classifier layer 1; 530 classifier layer 2; 540 classifier layer k; 550 output labels; 900 classifier hierarchy exploration system; 910 sub-problem partitioning unit; 920 coexistence/correlation analysis unit; 930 classifier hierarchy exploration unit; 940 classifier hierarchy construction unit; 950 classifier hierarchy evaluation unit; 1200 screen; 1300 classifier hierarchy exploration system; 1310 sub-problem partitioning unit; 1320 coexistence/correlation analysis unit; 1330 classifier hierarchy exploration unit; 1340 classifier hierarchy construction unit; 1350 classifier hierarchy evaluation unit; 1500 screen; 1600 screen

Abstract

An information processing apparatus (1) includes a construction unit (11) configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels, an evaluation unit (12) configured to evaluate quality of each of the hierarchy structure information, and a choice unit (13) configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.

Description

INFORMATION PROCESSING APPARATUS, SYSTEM, METHOD AND PROGRAM
  The present disclosure relates to an information processing apparatus, system, method and program. In particular, the present disclosure relates to an information processing apparatus, system, method and program for multi-label classification.
  <Part1 Explanation of Multi-Label Classification>
  A multi-label classification assigns more than one label, each of which represents a characteristic, to each input instance. Its applications include image labelling, music classification, text labelling, object detection, human tracking, scene labelling, and so on. Assume a multi-label classification problem of labelling an input instance with Lc labels.
  The problem is formulated as follows:
  (1) Regarding Input,
  $x$ : an input instance (for example, a feature vector)
  (2) Regarding Output,
  $y = (l_1, l_2, \ldots, l_{L_c})$ : a vector of $L_c$ output labels, where each $l_i$ takes the value 1 when the characteristic of label $l_i$ is present and 0 when it is absent
  (3) Regarding Objective,
  find a mapping $h$ such that $\hat{y} = h(x)$ assigns all $L_c$ labels to the input instance $x$ as accurately as possible
  Prior art in multi-label classification includes a Binary Relevance (BR), a Classifier Chain (CC), a Recurrent Neural Networks (RNN)-based or recurrent method, and a Convolutional Neural Networks (CNN)-based method.
  <Part2 BR>
  Non-Patent Literature 1 discloses the BR, which is a multi-label classification method using multiple binary classifiers. As illustrated in Fig. 21, for labelling an input instance 210, a BR system 220 includes Lc binary classifiers C1, C2, …, CLc to assign Lc labels l1, l2, …, lLc as output labels 230. Each binary classifier Ci is responsible for assigning or predicting the presence or absence of label li to the input instance 210. Specifically, C1 221, C2 222, …, and CLc 223 assign l1 231, l2 232, …, lLc 233, respectively. The value li = 1 means that the characteristic of label li is present and the value li = 0 means that the characteristic of label li is absent. Each classifier works independently, so the computation of all classifiers can be parallelized.
  <Part3 CC>
  Non-Patent Literature 2 discloses the CC, which is a multi-label classification method that labels the input instance 310 using cascaded binary classifiers as shown in Fig. 22 in order to leverage correlation between labels. The CC system 320 includes Lc binary classifiers C1, C2, …, CLc that are organized in a cascaded manner to assign Lc labels as output labels 330. The classifier Ci 321 uses the input instance 310 and labels l1 to li-1 assigned by previous classifiers C1 to Ci-1 in the chain to assign the value of label li. For example, C1 321 assigns l1 331 first from the input instance 310, and then, C2 322 assigns l2 332 using the input instance 310 and l1 as its input, and so on. Finally, CLc 323 assigns lLc 333 using the input instance 310 and l1, l2, …, lLc-1 as its input.
  <Part4 RNN-based>
  The RNN-based or recurrent method captures label correlation with repeated label assignment or prediction process. Non-Patent Literatures 3, 4, and 5 disclose the RNN-based method and Patent Literature 1 discloses the recurrent method. As illustrated in Fig. 23, these methods feed the previously predicted labels 440 back to the label prediction process 420 and assign output labels 430 sequentially based on the input instance 410 and the previously predicted labels 440. These methods are capable of capturing label correlation.
  <Part5 CNN-based>
  Non-Patent Literatures 6, 7, and 8 disclose the CNN-based method in which a CNN functions as a binary descriptor and solves multi-label classification by interpreting each bit as one label. The method learns both features and a binary representation at the same time through an objective-based loss function.
PTL 1: US Patent Application Publication No. US20180157743A1
NPL 1: Tsoumakas et al., "Mining Multi-label Data", Data Mining and Knowledge Discovery Handbook, PP. 667-685, 2010
NPL 2: Read et al., "Classifier Chain for Multi-label Classification", the European Conference on Machine Learning and Knowledge Discovery in Databases, PP. 254-269, 2009
NPL 3: Wang et al., "CNN-RNN: A Unified Framework for Multi-label Image Classification", CVPR2016, PP. 2285-2294, 2016
NPL 4: Chen et al., "Order-free RNN with Visual Attention for Multi-label Classification", AAAI-18, 2018
NPL 5: Yang et al., "Deep Learning with a Rethinking Structure for Multi-label Classification", arXiv preprint arXiv:1802.01697, February 15, 2018
NPL 6: Liong et al., "Deep Hashing for Compact Binary Codes Learning", CVPR2015, PP. 2475-2483, 2015
NPL 7: Lin et al., "Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks", CVPR2016, PP. 1183-1192, 2016
NPL 8: Zhang et al., "Instance Similarity Deep Hashing for Multi-label Image Retrieval", arXiv preprint arXiv:1803.02987, March 19, 2018
  However, each of the above mentioned techniques has the following problems. First, in the BR, if the BR system fails to capture label correlation, it may lead to low accuracy and the assigned labels may conflict with each other. Further, in the CC, since the assignment of each label depends on the other labels, the CC can exploit label correlation well, but it cannot be processed in parallel and finding an optimal label order is very difficult. Moreover, the RNN-based or recurrent method is costly in terms of the amount of computation for label prediction in each recurrent step and it may take as many label prediction time steps as the number of labels. Furthermore, in the CNN-based method, even though the method is highly parallel, the loss function forces uncorrelated bits in the output binary code. Therefore, there is a problem that it is difficult to efficiently leverage the label correlation in the multi-label classification.
  The present disclosure has been accomplished to solve the above problems and an object of the present disclosure is thus to provide an information processing apparatus, system, method and program to efficiently leverage the label correlation in the multi-label classification.
  An information processing apparatus according to a first exemplary aspect of the present disclosure includes a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  An information processing system according to a second exemplary aspect of the present disclosure includes a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  An information processing method according to a third exemplary aspect of the present disclosure includes constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; evaluating quality of each of the hierarchy structure information; and choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  A non-transitory computer readable medium storing a program according to a fourth exemplary aspect of the present disclosure, the program causes a computer to execute: a process for constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels; a process for evaluating quality of each of the hierarchy structure information; and a process for choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  According to the exemplary aspects of the present disclosure, it is possible to provide an information processing apparatus, system, method and program for efficiently leveraging the label correlation in the multi-label classification.
Fig. 1 is a block diagram illustrating a structure of a first exemplary embodiment of the present disclosure.
Fig. 2 is a diagram illustrating a structure of a classifier hierarchy according to the first exemplary embodiment of the present disclosure.
Fig. 3 illustrates the theoretical trade-off graph between Parallelism and Accuracy with the examples of classifier hierarchy candidates according to the first exemplary embodiment of the present disclosure.
Fig. 4 is a flowchart for explaining an information processing method according to the first exemplary embodiment of the present disclosure.
Fig. 5 is a table illustrating an example of classifier hierarchy for a multi-label classification.
Fig. 6 is a table illustrating an example of classifier hierarchy for a multi-label classification.
Fig. 7 is a block diagram illustrating a structure of a second exemplary embodiment of the present disclosure.
Fig. 8 is a flowchart for explaining an information processing method according to the second exemplary embodiment of the present disclosure.
Fig. 9 illustrates the coexistence of labels.
Fig. 10 shows an example of a Graphical User Interface (GUI) according to a third exemplary embodiment of the present disclosure.
Fig. 11 is a block diagram illustrating a structure of a fourth exemplary embodiment of the present disclosure.
Fig. 12 is a flowchart for explaining an information processing method according to the fourth exemplary embodiment of the present disclosure.
Fig. 13 illustrates an example of GUI for choosing or alternating sub-problem partitioning according to the fourth exemplary embodiment of the present disclosure.
Fig. 14 illustrates an example of GUI for choosing or alternating usage of coexistence/correlation according to the fourth exemplary embodiment of the present disclosure.
Fig. 15 is a block diagram showing a configuration of a classifier hierarchy evaluation unit according to a fifth exemplary embodiment of the present disclosure.
Fig. 16 is a flowchart for explaining an evaluation method according to the fifth exemplary embodiment of the present disclosure.
Fig. 17 is a block diagram showing a configuration of an information processing apparatus according to a sixth exemplary embodiment of the present disclosure.
Fig. 18 is a flowchart for explaining an information processing method according to the sixth exemplary embodiment of the present disclosure.
Fig. 19 is a block diagram showing a configuration of an information processing system according to the sixth exemplary embodiment of the present disclosure.
Fig. 20 shows a concept of internal structures of a hierarchy structure information and quality, respectively.
Fig. 21 is a diagram illustrating a structure of a Binary Relevance system for multi-label classification.
Fig. 22 is a diagram illustrating a structure of a Classifier Chain system for multi-label classification.
Fig. 23 is a diagram illustrating a structure of a RNN-based or recurrent method for multi-label classification.
Fig. 24 is a block diagram showing a configuration of the system according to the related art.
  Hereinafter, specific embodiments to which the present disclosure including the above-described example aspects is applied will be described in detail with reference to the drawings. In the drawings, the same elements are denoted by the same reference signs, and repeated descriptions are omitted for clarity of the description.
  The problem to be solved by the present disclosure is explained below from another aspect. There are two problems in the above mentioned multi-label classification systems.
  A first problem is that label correlation cannot be leveraged in multi-label classification without either a large amount of repeated computation or losing the ability of inter-label parallelism like the CC and RNN-based or recurrent methods. The reason for the occurrence of the first problem is label dependencies in capturing label correlation. Since the prediction of a label depends on the prediction of some previous labels, the prediction of all labels cannot take place at once. The CC predicts each label in a cascaded manner, while the RNN-based and recurrent methods iteratively predict labels of an input instance. Meanwhile, the methods that are capable of inter-label parallelism, such as BR and CNN-based methods, cannot take advantage of label correlation to improve prediction accuracy.
  A second problem is that the architecture space of classifier hierarchies, also known as cascaded classifiers, is so large that manually finding an optimal classifier architecture for multi-label classification is costly in terms of man-hours. Hereinafter, in order to explain the second problem, an example of a system according to the above mentioned related art is described.
  Fig. 24 is a block diagram showing a configuration of the system according to the above mentioned related art. A multi-label classification system 1900 includes a classifier hierarchy construction unit 1910 and a classifier hierarchy evaluation unit 1920. The classifier hierarchy construction unit 1910 receives a label list 21, training data 22 and constraints 23 as input from the outside, constructs a classifier hierarchy and outputs the constructed classifier hierarchy to the classifier hierarchy evaluation unit 1920. Note that, the label list 21 is a set of a plurality of labels and the like. The training data 22 is a set of pairs of input data and their correct labels. The constraints 23 are predetermined constraints. The classifier hierarchy evaluation unit 1920 receives the constructed classifier hierarchy from the classifier hierarchy construction unit 1910, generates a classifier chain 48 or a cascaded classifier 49 based on the constructed classifier hierarchy and outputs the classifier chain 48 or the cascaded classifier 49 to the outside. The classifier hierarchy evaluation unit 1920 also evaluates the classifier chain 48 or the cascaded classifier 49.
  Therefore, the system according to the above mentioned techniques manually decides the labelling order to leverage label correlation on a statistical basis such as occurrence frequencies, and constructs and evaluates the CC or cascaded classifiers as shown in Fig. 24, which is inefficient, and the resulting order may not be optimal.
  The reason for the occurrence of the second problem is that multiple labels of multi-label classification can be correlated in several ways. As a consequence, there are a large number of possible classifier hierarchies in terms of label order and maximizing parallelism.
  Thus, embodiments for solving at least one of the above-described problems will be described hereinafter.
<First exemplary embodiment>
  <Explanation of Structure>
  First, a first exemplary embodiment of the present disclosure is elaborated below referring to the accompanying drawings.
  Fig. 1 is a block diagram illustrating the structure of the first exemplary embodiment of the present disclosure. A classifier hierarchy exploration system 100 includes a classifier hierarchy exploration unit 110, a classifier hierarchy construction unit 120 and a classifier hierarchy evaluation unit 130.
  The classifier hierarchy exploration system 100 can be implemented using, but not limited to, a general-purpose processor system or a specific circuit, such as a Graphic Processing Unit (GPU), an Application-Specific Integrated Circuit (ASIC) and an Application-Specific Instruction set Processor (ASIP), and a reconfigurable device, such as a Field Programmable Gate Array (FPGA).
  An input of the classifier hierarchy exploration system 100 includes a label list 21, a training data 22 ({X, Y}) and a constraints 23. The label list 21 includes, for example, a plurality of labels and sub-problems of the multi-label classification. The label indicates, for example, a type of label or a label name. The multi-label classification may be partitioned into smaller classification sub-problems of binary, multi-class, or multi-label classification by grouping one or more labels to be predicted by the same classifier. The training data 22 includes pairs of input data and output labels of the multi-label classification for training and validation, and/or label coexistence or correlation. Note that, the output labels included in the training data 22 must align with the label list 21.
  The constraints 23 include conditions, regulations, priorities, or limitations to be raised by the user. Examples of the constraints include, but are not limited to, information of hardware, classification accuracy, execution time for the computation in the inference phase on the specified hardware (referred to as inference time for the rest of this specification), constraints regarding dependencies of sub-problems, and priority/criteria in choosing optimal candidates, e.g. the candidate that yields the highest accuracy or the candidate that consumes the shortest inference time when performing classification in the inference phase (model deployment). The information of hardware means a specification of the hardware, which includes, but is not limited to, available storage, computing unit (computing capacity), hardware architecture, etc. The hardware can be, but is not limited to, a specific circuit, such as an Application-Specific Integrated Circuit (ASIC) and an Application-Specific Instruction set Processor (ASIP), and a reconfigurable device, such as a Field Programmable Gate Array (FPGA).
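  As a non-limiting illustration only, the constraints 23 could be represented in software roughly as in the following Python sketch; the class and field names are assumptions introduced here for explanation and do not appear in the drawings or claims.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HardwareInfo:
    # Specification of the target hardware (available storage, computing
    # capacity, architecture); the field names are illustrative assumptions.
    available_storage_mb: int
    num_processing_units: int
    architecture: str  # e.g. "FPGA", "ASIC", "GPU"

@dataclass
class Constraints:
    # Software counterpart of the constraints 23; again, names are assumptions.
    hardware: HardwareInfo
    min_accuracy: Optional[float] = None        # required classification accuracy
    max_inference_time: Optional[float] = None  # inference-time budget
    sub_problem_dependencies: List[tuple] = field(default_factory=list)
    priority: str = "highest_accuracy"          # or "shortest_inference_time"
```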
  An output of the classifier hierarchy exploration system 100 includes at least one classifier hierarchy candidate 41. The classifier hierarchy candidate 41 indicates hierarchical architecture of the classifiers.
  Fig. 2 is a diagram illustrating the hierarchical architecture of the classifiers according to the first exemplary embodiment of the present disclosure. The hierarchical architecture of the classifiers is, for example, the cascaded organization of label classifiers that predict output labels 550. In other words, the hierarchical architecture of the classifiers has a plurality of classifier layers and the plurality of classifier layers are executed in series. Each of the classifier layers consists of one or more classifiers.
  The classifiers C1 to Cm belong to the classifier layer 1 520 and are aligned therein. Each of the classifiers C1 to Cm predicts labels l1 to lm from the input instance or features 510, respectively. Note that, each of the classifiers C1 to Cm can perform parallel processing. The classifiers Cm+1 to Cn belong to the classifier layer 2 530 and are aligned therein. Each of the classifiers Cm+1 to Cn predicts labels lm+1 to ln from the input instance or features 510 and some or all of the labels l1 to lm, respectively. Note that, each of the classifiers Cm+1 to Cn can perform parallel processing. There may be more than two classifier layers in the hierarchical architecture of the classifiers. Therefore, classifiers belonging to the classifier layer k (k is a natural number of 2 or more) 540 are aligned therein and each of the classifiers belonging to the classifier layer k 540 predicts labels from the input instance or features 510 and some or all of the labels predicted by the previous classifier layers, respectively. The classifiers can be one or a mixture of machine learning or static classification models, such as, but not limited to, a binary classifier, a Support Vector Machine, and Neural Networks.
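  The execution order implied by the classifier layers can be sketched, purely as an illustration, by the following Python code; the callable signature and all names are assumptions of this sketch rather than part of the disclosed apparatus.

```python
from typing import Callable, List, Sequence

# A classifier here is any callable that maps (input features, labels predicted
# so far) to a dict of label values. This signature is an assumption.
Classifier = Callable[[Sequence[float], dict], dict]

def run_classifier_hierarchy(layers: List[List[Classifier]],
                             features: Sequence[float]) -> dict:
    """Execute a classifier hierarchy layer by layer.

    Classifiers inside one layer are independent and could run in parallel;
    the layers themselves run sequentially, each seeing the labels produced
    by the earlier layers.
    """
    predicted: dict = {}
    for layer in layers:
        # In a real deployment the classifiers of one layer would be dispatched
        # to parallel hardware units; a plain loop keeps the sketch simple.
        layer_outputs = [clf(features, dict(predicted)) for clf in layer]
        for out in layer_outputs:
            predicted.update(out)
    return predicted
```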
  Moreover, the classifier hierarchy candidate 41 may include its evaluation results. The evaluation results include the validation metrics and/or the estimated inference time.
  The classifier hierarchy exploration system 100 searches the architecture space of the classifier hierarchy as a part of the training phase to find optimal classifier hierarchy candidates. Those classifier hierarchy candidates will further be deployed in the inference phase of the classification problem and operated on the hardware specified by the information of hardware. That is, the classifier hierarchy exploration system 100 chooses at least one candidate for a multi label classification model from the plurality of classifier hierarchy candidates.
  The above mentioned units generally operate as follows.
  The classifier hierarchy exploration unit 110 receives the label list 21, the training data 22 and the constraints 23 from the outside of the classifier hierarchy exploration system 100 and controls an exploration flow. The exploration flow means exploring a list of classifier hierarchy candidates. Further, the exploration flow includes constructing, evaluating, and selecting optimal candidates. Note that, the list includes candidates repeatedly constructed by the classifier hierarchy construction unit 120 to be described later. Therefore, the classifier hierarchy exploration unit 110 generates some information, such as partitioned sub-problems or multiple classifiers' information, for the classifier hierarchy construction unit 120 to construct a hierarchy structure information from the label list 21 and the like, and outputs the generated information to the classifier hierarchy construction unit 120. Further, the classifier hierarchy exploration unit 110 performs the exploration flow and selects optimal candidates from the list of classifier hierarchy candidates according to the constraints 23, a classifier hierarchy 31 and evaluation results 32 which are received from the classifier hierarchy evaluation unit 130 to be described later. The classifier hierarchy 31 is constructed by the classifier hierarchy construction unit 120. The evaluation results 32 are evaluated by the classifier hierarchy evaluation unit 130. The classifier hierarchy exploration unit 110 outputs the selected candidates to the outside.
  The classifier hierarchy construction unit 120 constructs one classifier hierarchy candidate as illustrated in Fig. 2 based on the information received from the classifier hierarchy exploration unit 110. The classifier hierarchy candidate is an example of a hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure. Each of the plurality of groups is a group of the plurality of classifiers. The above mentioned classifier layer is an example of the group. Each of the classifiers assigns/predicts one or more labels of a sub-problem. In other words, each of the classifiers classifies input data into a plurality of labels. The classifiers in each of the classifier layers are independent of each other, hence they can operate in parallel on the hardware specified by the information of hardware in the inference phase. The classifiers in different classifier layers are dependent and operate sequentially. The classifier hierarchy candidates are constructed considering one or more of the following: (1) constraints regarding dependencies of sub-problems; (2) information of hardware; (3) label coexistence or correlation.
  The classifier hierarchy evaluation unit 130 evaluates the quality of the classifier hierarchy candidate constructed by the classifier hierarchy construction unit 120. The quality includes the validation metrics and/or the estimated inference time. Examples of the validation metrics are, but not limited to, classification accuracy, precision, and recall. Then, the classifier hierarchy evaluation unit 130 feeds the classifier hierarchy candidate 31 and its quality (the evaluation results 32) back to the classifier hierarchy exploration unit 110 for candidate selection.
  The classifier hierarchy exploration unit 110, the classifier hierarchy construction unit 120 and the classifier hierarchy evaluation unit 130 mutually operate in such a way that the classifier hierarchy exploration system 100 gives one or more classifier hierarchy candidates 41 as output. On one hand, the classifier hierarchy exploration system 100 may output the best classifier hierarchy candidate according to the priority/criteria constraint. For example, if the priority/criteria constraint is the highest accuracy, the classifier hierarchy exploration system 100 outputs the classifier hierarchy candidate that consumes the shortest inference time among the classifier hierarchy candidates that yield the highest accuracy. Another example is that if the priority/criteria constraint is the shortest inference time, the classifier hierarchy exploration system 100 outputs the classifier hierarchy candidate that yields the highest accuracy among the classifier hierarchy candidates that consume the shortest inference time. On the other hand, the classifier hierarchy exploration system 100 may output Pareto optimal solutions of classifier hierarchy candidates along the Pareto frontier of the trade-off graph between the capability of parallelism and classification accuracy.
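  The selection according to the priority/criteria constraint described above may be sketched, for illustration only, as follows; the attribute names accuracy and inference_time are assumptions of this sketch.

```python
def choose_best_candidate(candidates, priority="highest_accuracy"):
    """Pick one candidate according to the priority/criteria constraint.

    Each candidate is assumed to carry `accuracy` and `inference_time`
    attributes (illustrative names only).
    """
    if priority == "highest_accuracy":
        best_acc = max(c.accuracy for c in candidates)
        tied = [c for c in candidates if c.accuracy == best_acc]
        # among the most accurate candidates, prefer the fastest one
        return min(tied, key=lambda c: c.inference_time)
    if priority == "shortest_inference_time":
        best_time = min(c.inference_time for c in candidates)
        tied = [c for c in candidates if c.inference_time == best_time]
        # among the fastest candidates, prefer the most accurate one
        return max(tied, key=lambda c: c.accuracy)
    raise ValueError(f"unknown priority: {priority}")
```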
  In multi-objective optimization problem that multiple objective functions are optimized simultaneously, a solution that optimizes all objective functions usually does not exist. Instead, there exists a number of solutions that are optimized for some objective functions. Pareto optimal solutions refers to the solutions that no objective values can be improved without degrading other objective values. Pareto frontier refers to the boundary formed by the Pareto optimal solutions in the multi-objective trade-off plot.
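  A minimal sketch of how the Pareto optimal candidates could be filtered from a list of evaluated candidates is given below, assuming that parallelism and accuracy are both objectives to be maximized; the attribute names are illustrative assumptions.

```python
def pareto_front(candidates):
    """Return the candidates that are Pareto optimal for (parallelism, accuracy).

    A candidate is dominated if another candidate is at least as good in both
    objectives and strictly better in at least one of them.
    """
    front = []
    for c in candidates:
        dominated = any(
            (o.parallelism >= c.parallelism and o.accuracy >= c.accuracy)
            and (o.parallelism > c.parallelism or o.accuracy > c.accuracy)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front
```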
  The present disclosure considers, but is not limited to, the capability of parallelism (referred to as Parallelism in Fig. 3) and classification accuracy (referred to as Accuracy in Fig. 3) as objective functions. An alternative for the capability of parallelism can be inference time and an alternative for accuracy can be label correlation or other validation metrics. Fig. 3 illustrates the theoretical trade-off graph between Parallelism and Accuracy with the examples of classifier hierarchy candidates under the assumption that the accuracy is improved when label coexistence/correlation is leveraged by using cascaded classifier layers (each of which is referred to as CL in Fig. 3). Higher parallelism implies shorter inference time. The BR 610 contains the highest level of inter-label parallelism among all other classifier hierarchy candidates 620, 630, and 640, but yields the lowest accuracy. On the contrary, the CC 640 is incapable of inter-label parallelism, but yields the highest accuracy since it leverages label correlations through cascaded classifiers. The classifier hierarchy candidates shown with items 620 and 630 are compromises between the BR 610 and the CC 640. They are capable of both leveraging inter-label parallelism and label correlation in order to achieve a higher accuracy or a shorter inference time.
  The CL1 621 and CL2 622 may assign different numbers of labels. Likewise, the CL1 631, CL2 632 and CL3 633 may assign different numbers of labels. The CL1 621 and CL2 622 assign a larger number of labels than the CL1 631, CL2 632 and CL3 633. From Fig. 3, the classifier hierarchy candidate 620 can operate with a higher level of parallelism, while the classifier hierarchy candidate 630 can leverage more label correlation with more CLs. Therefore, the classifier hierarchy candidate 620 consumes a shorter inference time, but the classifier hierarchy candidate 630 achieves a higher accuracy. The present disclosure is not limited to the multi-label classification in which the output label is either 0 (the characteristic represented by each label is absent) or 1 (the characteristic represented by each label is present),
  $l_i \in \{0, 1\}$ for $i = 1, 2, \ldots, L_c$,
  but is also applicable to the case in which the output is a real number,
  $l_i \in \mathbb{R}$ for $i = 1, 2, \ldots, L_c$.
  <Description of Operation>
  Next, referring to the flowchart in Fig. 4, the operation of the information processing method according to the first exemplary embodiment of the present disclosure is elaborated.
  First, the classifier hierarchy exploration unit 110 receives an input including the label list 21, the training data 22 ({X, Y}) and the constraints 23, and initializes an empty list of classifier hierarchy candidates (step A1). Then, according to the sub-problems specified in the label list 21, the classifier hierarchy exploration unit 110 partitions a multi-label classification problem into multiple binary, multi-class, or multi-label classification sub-problems (step A2), each of which assigns/predicts multiple labels with one classifier. This step A2 can be seen as grouping multiple individual labels into one sub-problem. Next, steps A3 to A9 operate repeatedly.
  The classifier hierarchy construction unit 120 constructs a new classifier hierarchy candidate (step A3) by organizing the classifiers of the sub-problems into one or more classifier layers of the classifier hierarchy in Fig. 2. The classifier hierarchy evaluation unit 130 evaluates the quantitative quality of the constructed new classifier hierarchy candidate (step A4). The evaluated quality includes, for example, validation metrics and inference time.
  The classifier hierarchy exploration unit 110 compares the new classifier hierarchy candidate constructed in step A3 with the existing classifier hierarchy candidates in the list of classifier hierarchy candidates (step A5). If the new candidate satisfies all constraints 23 and is among the current Pareto optimal or best candidates (step A6, YES), the classifier hierarchy exploration unit 110 stores the new candidate into the list of classifier hierarchy candidates (step A7). Then, if one or more existing candidates in the list of classifier hierarchy candidates are no longer Pareto optimal or the best candidates, the classifier hierarchy exploration unit 110 disposes of the one or more existing candidates from the list of classifier hierarchy candidates (step A8). Finally, if all candidates have been searched (step A9, YES), the classifier hierarchy exploration system 100 ends its operation. Otherwise (step A9, NO), the classifier hierarchy exploration system 100 repeats step A3 to construct a new classifier hierarchy candidate.
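  For illustration, the exploration flow of steps A1 to A9 may be sketched as follows; the four callables passed in (construct, evaluate, satisfies, dominates) are assumptions standing in for the classifier hierarchy construction unit 120, the classifier hierarchy evaluation unit 130, and the constraint check and candidate comparison of the classifier hierarchy exploration unit 110.

```python
def explore_classifier_hierarchies(sub_problems, constraints,
                                   construct, evaluate, satisfies, dominates):
    """Sketch of steps A1 to A9.

    `construct` yields candidate hierarchies one at a time (step A3),
    `evaluate` returns the quality of a candidate (step A4), `satisfies`
    checks the constraints 23, and `dominates(a, b)` is True when quality a
    Pareto-dominates quality b (steps A5/A6).
    """
    candidates = []                                   # step A1: empty candidate list
    for hierarchy in construct(sub_problems):         # step A3, repeated until A9 ends the loop
        quality = evaluate(hierarchy)                 # step A4
        if not satisfies(quality, constraints):       # step A6: constraint check
            continue
        if any(dominates(q, quality) for _, q in candidates):
            continue                                  # the new candidate is not Pareto optimal
        # step A7: keep the new candidate; step A8: drop candidates it now dominates
        candidates = [(h, q) for h, q in candidates if not dominates(quality, q)]
        candidates.append((hierarchy, quality))
    return candidates                                 # Pareto optimal candidates
```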
  In step A3, the classifier hierarchy candidate construction considers the constraints 23 and/or label coexistence/correlation (included in the training data 22). Usually, when the labels are independent or uncorrelated, organizing the classifiers of the sub-problems in the same classifier layer to parallelize inter-label computation reduces the inference time. When the labels exhibit coexistence or correlation in either a positive (labels usually exist simultaneously or their values increase simultaneously) or a negative (labels never exist simultaneously or one value increases as the other decreases) way, organizing the classifiers of the sub-problems in different classifier layers to leverage label coexistence/correlation improves validation metrics, e.g. accuracy.
  The classifier hierarchy exploration system 100 is capable of offering one or more Pareto optimal classifier hierarchy candidates. It balances the trade-off between parallelism or inference time and classification accuracy or other validation metrics during classifier hierarchy candidate construction. Due to the limitation in the available hardware resources, parallelizing inter-label classifiers might not reduce the inference time. In such a case, organizing them in different classifier layers might improve validation metrics without increasing the inference time. On the other hand, sacrificing the improvement of validation metrics by organizing the classifiers of the sub-problems in the same classifier layer reduces the inference time in the case that there are idle hardware resources.
  The classifier hierarchy exploration system 100 explores or searches the architecture space of the classifier hierarchy and constructs classifier hierarchy candidates using, but not limited to, a brute force method, a search tree method, simulated annealing, or a genetic algorithm. The classifier hierarchy candidates may be constructed without the consideration of label coexistence/correlation using the brute force and search tree methods. However, considering label coexistence/correlation has the benefit of a shorter exploration time because it narrows the architecture space of the classifier hierarchy by defining and restricting effective organizations of classifier layers through coexistence/correlation.
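  As one possible, non-limiting realization of the brute force construction mentioned above, the following sketch enumerates every ordering of the sub-problem classifiers and every way of cutting that ordering into consecutive classifier layers; it is exponential and, for simplicity, yields hierarchies that differ only in the within-layer order more than once.

```python
from itertools import permutations

def ordered_layerings(classifiers):
    """Brute-force enumeration of classifier hierarchies.

    Yields lists of classifier layers; layer order matters, order inside a
    layer does not (duplicates caused by within-layer reorderings are not
    removed in this sketch).
    """
    if not classifiers:
        return
    n = len(classifiers)
    for order in permutations(classifiers):
        for cut_mask in range(2 ** (n - 1)):   # bit i-1 set: cut before position i
            layers, current = [], [order[0]]
            for i in range(1, n):
                if cut_mask & (1 << (i - 1)):
                    layers.append(current)
                    current = [order[i]]
                else:
                    current.append(order[i])
            layers.append(current)
            yield layers
```

  For three sub-problem classifiers this enumeration contains 13 distinct hierarchies after within-layer reorderings are discarded, which is consistent with the 13 candidates of Fig. 5 and Fig. 6.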
  For example, assume a multi-label classification problem of labelling human appearance from an image. The label list 21 includes 'male' and 'female' of a gender sub-problem, 'bald', 'short hair' and 'long hair' of a hair length sub-problem, and 'jeans', 'suit' and 'skirt' of a lower-body outfit sub-problem. In step A2, according to the label list 21, the classifier hierarchy exploration unit 110 partitions the 8-label classification problem into three classifiers, each of which is responsible for one sub-problem. Assume that (1) there exists a strong correlation between gender and hair length, a weak correlation between gender and lower-body outfit, and no correlation between hair length and lower-body outfit; and (2) there are five processors and one processor can assign one label, for ease of explanation.
  Fig. 5 and Fig. 6 show the 13 classifier hierarchy candidates that can be constructed in step A3 and their evaluation results that can be evaluated in step A4. It can be seen that Candidates 2 to 13 achieve higher accuracy for one or more labels by leveraging label correlation through the hierarchical architecture compared to Candidate 1. The Pareto optimal classifier hierarchy candidates when considering average accuracy and inference time are Candidate 5 (58% accuracy in 2 unit time) and Candidates 7, 8, 9 and 13 (60% accuracy in 3 unit time). Candidate 7 achieves the same accuracy within the same inference time compared to Candidates 8 and 9 even though it has fewer classifier layers because of the hardware resource constraints. Candidates 8 and 9 might be preferable in the case that the label correlation might further improve accuracy.
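  One plausible way in which the unit-time figures of this example could be derived is sketched below, assuming that each label occupies one processor for one unit of time and that the classifier layers run sequentially; this simplification is an assumption of the sketch, not a definition taken from Fig. 5 or Fig. 6.

```python
import math

def estimate_inference_time(layer_label_counts, num_processors):
    """Rough inference-time estimate for a classifier hierarchy.

    layer_label_counts: number of labels assigned in each classifier layer.
    Within a layer, labels are packed onto the available processors; layers
    are executed one after another.
    """
    return sum(math.ceil(n / num_processors) for n in layer_label_counts)

# Examples consistent with the five-processor assumption above:
# all eight labels in a single layer -> ceil(8 / 5) = 2 unit time;
# three layers assigning 2, 3 and 3 labels -> 1 + 1 + 1 = 3 unit time.
```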
  <Description of Effect>
  Next, the effect of the present exemplary embodiment is described.
  As the present exemplary embodiment is configured in such a manner that the classifier hierarchy exploration system 100 explores the architecture space of classifier hierarchy with both the validation metrics of classification problem and the information of hardware, it is possible to find multiple Pareto optimal classifier hierarchy candidates so that users can choose according to deployment objective. In addition, the exemplary embodiment is configured in such a manner that each classifier hierarchy leverages label correlation and maximally utilizes the available hardware to achieve the superior values of validation metrics in the shortest inference time.
<Second Exemplary Embodiment>
  <Explanation of Structure>
  Next, a second exemplary embodiment of the present disclosure is elaborated referring to the accompanying drawings.
  Referring to Fig. 7, the second exemplary embodiment of the present disclosure, a classifier hierarchy exploration system 900, includes a sub-problem partitioning unit 910, a coexistence/correlation analysis unit 920, a classifier hierarchy exploration unit 930 (same as item 110 in Fig. 1), a classifier hierarchy construction unit 940 (same as item 120 in Fig. 1), and a classifier hierarchy evaluation unit 950 (same as item 130 in Fig. 1).
  Similar to the input of the classifier hierarchy exploration system 100, the input of the classifier hierarchy exploration system 900 includes the label list 21, the training data 22 ({X, Y}) and the constraints 23. The label list 21 includes the labels and sub-problems of the multi-label classification, where the sub-problems are optional. The training data 22 includes pairs of input data and output labels of the multi-label classification for training and validation, and/or label coexistence or correlation, where label coexistence or correlation is optional.
  The above mentioned means generally operate as follows.
  The sub-problem partitioning unit 910 takes the label list 21 as input and partitions labels in the label list 21 into sub-problems.
  The coexistence/correlation analysis unit 920 takes the training data 22 ({X, Y}) as input and analyzes label coexistence or correlation. The classifier hierarchy exploration unit 930, the classifier hierarchy construction unit 940 and the classifier hierarchy evaluation unit 950 operate in the same way as the classifier hierarchy exploration unit 110, the classifier hierarchy construction unit 120 and the classifier hierarchy evaluation unit 130 of the classifier hierarchy exploration system 100, respectively. The classifier hierarchy exploration system 900 may include only the sub-problem partitioning unit 910, or only the coexistence/correlation analysis unit 920, or both the sub-problem partitioning unit 910 and the coexistence/correlation analysis unit 920.
  <Description of Operation>
  Next, referring to the flowchart in Fig. 8, the operation of the information processing method according to the second exemplary embodiment of the present disclosure is elaborated.
  First, the classifier hierarchy exploration system 900 receives input including the label list 21, the training data 22 ({X, Y}) and the constraints 23. In particular, the sub-problem partitioning unit 910 receives the label list 21. The coexistence/correlation analysis unit 920 receives the training data 22. The classifier hierarchy exploration unit 930 receives the label list 21, the training data 22 and the constraints 23. And the classifier hierarchy exploration unit 930 initializes an empty list of classifier hierarchy candidates (step B1).
  Then, the sub-problem partitioning unit 910 partitions a multi-label classification problem into multiple binary, multi-class, or multi-label classification sub-problems (step B2). Next, the coexistence/correlation analysis unit 920 analyzes label coexistence or correlation (step B3). Steps B4 to B10 are the same as steps A3 to A9 in the first exemplary embodiment.
  The sub-problem partitioning unit 910 partitions the labels into sub-problems (step B2) based on, but not limited to, semantic analysis of the label names. For example, assume that there are 'male', 'female', 'bald', 'short hair', 'long hair', 'jeans', 'suit' and 'skirt' as the label names. In this case, the sub-problem partitioning unit 910 partitions the labels into a gender sub-problem ('male', 'female'), a hair length sub-problem ('bald', 'short hair', 'long hair') and a lower-body outfit sub-problem ('jeans', 'suit', 'skirt') by the semantic analysis. The partitioning can be done manually by the user (the same case as in the first exemplary embodiment, in which the label list includes the sub-problems) or done automatically using tools such as a word2vec model.
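  A minimal sketch of automatic partitioning by semantic similarity is shown below; the embed callable (for example, a word2vec lookup) and the greedy grouping rule with a similarity threshold are assumptions introduced for illustration only.

```python
import numpy as np

def partition_by_semantics(label_names, embed, threshold=0.5):
    """Greedily group labels into sub-problems by semantic similarity.

    `embed` maps a label name to a vector; it is an assumed dependency of
    this sketch, and the threshold-based greedy rule is only one simple
    alternative to manual partitioning.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    groups = []   # each group is a list of label names forming one sub-problem
    for name in label_names:
        vec = embed(name)
        for group in groups:
            # join the first group whose members are all similar enough
            if all(cosine(vec, embed(other)) >= threshold for other in group):
                group.append(name)
                break
        else:
            groups.append([name])
    return groups
```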
  The coexistence/correlation analysis unit 920 analyzes label coexistence or correlation (step B3). For example, the analysis can be done in, but not limited to, a label-pairwise manner or a sub-problem-wise manner to examine the strength of the coexistence/correlation of the labels (strong, weak, or no coexistence/correlation). The coexistence/correlation can be both positive and negative. There are several methods to analyze coexistence/correlation, such as coexistence analysis or the Jaccard-Needham measure for binary correlation, and Pearson or Kendall analysis for real-valued correlation.
  Fig. 9 shows an example of a pairwise coexistence analysis of the aforementioned labels. It can be seen that when the label 'male' is assigned to the input, the probability of coexistence of the label 'short hair' (0.9 means that 90% of the input instances are those to which 'male' is assigned and 'short hair' is also assigned) and 'skirt' (0.001 means that 0.1% of the input instances are those to which 'male' is assigned and 'skirt' is also assigned) shows strong positive and strong negative coexistence, respectively. However, the coexistence of 'male' and 'jeans' is about the same as the coexistence of 'female' and 'jeans'. This implies that the coexistence of gender and 'jeans' is weakly correlated or irrelevant.
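  The kind of pairwise statistic shown in Fig. 9 could be computed, for example, as the fraction of training instances in which both labels are assigned; the exact statistic of Fig. 9 is not fully specified here, so the following sketch is only one plausible reading and the array layout is an assumption.

```python
import numpy as np

def coexistence_matrix(Y):
    """Pairwise coexistence statistics from binary training labels.

    Y is an (instances x labels) 0/1 matrix. Entry (i, j) of the result is
    the fraction of instances in which both label i and label j are assigned;
    dividing row i by the frequency of label i instead would give the
    conditional form P(label j | label i).
    """
    Y = np.asarray(Y, dtype=float)
    n_instances = Y.shape[0]
    joint = Y.T @ Y            # co-occurrence counts for every label pair
    return joint / n_instances
```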
<Third Exemplary Embodiment>
  <Explanation of Operation>
  Next, a third exemplary embodiment of the present disclosure is elaborated referring to the accompanying drawings.
  Fig. 10 shows an example of a Graphical User Interface (GUI) according to the third exemplary embodiment of the present disclosure. Note that, the example of the GUI is also applicable to the first and second exemplary embodiments. An item 1200 shows the screen of the classifier hierarchy exploration system 100 and/or 900. A user can add labels and sub-problems (a part of the label list 21) manually through input devices, such as a keyboard, or by using one or more files by pressing item 1210. The training data 22 and the constraints 23 can be specified by pressing items 1220 and 1230, respectively. The exploration starts when the button item 1240 is pressed. All items are not limited to a button interface.
<Fourth Exemplary Embodiment>
  <Explanation of Structure>
  Next, a fourth exemplary embodiment of the present disclosure is elaborated referring to the accompanying drawings.
  Referring to Fig. 11, in addition to the input of the first and second exemplary embodiments, the fourth exemplary embodiment of the present disclosure, a classifier hierarchy exploration system 1300, also takes a User-specified choice of sub-problem and coexistence/correlation 24 as input. The classifier hierarchy exploration system 1300 includes a sub-problem partitioning unit 1310, a coexistence/correlation analysis unit 1320, a classifier hierarchy exploration unit 1330, a classifier hierarchy construction unit 1340 and a classifier hierarchy evaluation unit 1350. The sub-problem partitioning unit 1310 and the coexistence/correlation analysis unit 1320 are the same as items 910 and 920 in Fig. 7. The classifier hierarchy construction unit 1340 and the classifier hierarchy evaluation unit 1350 are the same as items 120 and 130 in Fig. 1.
  The classifier hierarchy exploration unit 1330 further receives the User-specified choice of sub-problem and coexistence/correlation 24 from the outside, in addition to the label list 21, the training data 22 and the constraints 23. The classifier hierarchy exploration unit 1330 chooses or alters the sub-problems partitioned by the sub-problem partitioning unit 1310 and/or coexistence/correlation results analysed by the coexistence/correlation analysis unit 1320 using the User-specified choice of sub-problem and coexistence/correlation 24.
  <Description of Operation>
  Next, referring to flowcharts in Fig. 12, the operation of the information processing method according to the fourth exemplary embodiment of the present disclosure is elaborated.
  First, the classifier hierarchy exploration system 1300 receives input including the label list 21, the training data 22 ({X, Y}) and the constraints 23 (step C1). Then, the sub-problem partitioning unit 1310 partitions a multi-label classification problem into multiple binary, multi-class, or multi-label classification sub-problems (step C2). The coexistence/correlation analysis unit 1320 analyzes label coexistence or correlation (step C3).
  In step C4, the classifier hierarchy exploration system 1300 shows a GUI for the user to choose or alter the sub-problems partitioned in step C2 and/or the coexistence/correlation results analysed in step C3. After receiving the User-specified choice of sub-problem and coexistence/correlation 24, the classifier hierarchy exploration unit 1330 constructs a new classifier hierarchy candidate by additionally reflecting the User-specified choice of sub-problem and coexistence/correlation 24 (step C5). Steps C5 to C11 are the same as steps A3 to A9 in the first exemplary embodiment.
  The example GUI shown at step C4 is illustrated in Fig. 13 and Fig. 14. Note that, the GUI shown at step C4 is not limited to the following example.
  In Fig. 13, item 1500 shows the screen to choose or alter sub-problem. Item 1510 represents the area to illustrate the sub-problem partitioned in step C2. Item 1520 represents the labels that are not grouped into any sub-problem in step C2. Each sub-problem in area 1510 and 1520 is consisted of item 1511 to specify type of sub-problem, item to specify labels in each sub-problem (in the case of sub-problem1, item 1512, 1513 and 1514), and item 1515 as, but not limited to, a button for deleting the sub-problem. The item 1511 can be expanded in the form of drop-down list (item 1516) by pressing the upside-down triangle button to alter type of sub-problem to be, but not limited to, Binary, Multi-class or Multi-label classification. The labels, such as item 1512, 1513, 1514, 1521, 1522 and 1523, can be drag to alter the sub-problem of the label. If item 1515 is pressed, the sub-problem will be deleted and the labels, such as item 1512, 1513 and 1514, will be moved to area 1520.
  Item 1530 represents the sub-problems created by the user in step C4. Note that items 1510 and 1530 may be the same area. Item 1540 represents a button for creating a new group. When the user presses button 1540, a new sub-problem is created in area 1530. The user can then choose the type of sub-problem and drag labels into it.
  Item 1550 represents a button for saving the user-specified choice to the classifier hierarchy exploration system. When button 1550 is pressed, the user-specified choice is saved to the system. Item 1560 represents a button for choosing or altering the coexistence or correlation. When button 1560 is pressed, the classifier hierarchy exploration system 1300 shows another screen (Fig. 14) to enable the user to choose or alter the coexistence or correlation. Item 1570 represents a button for starting the exploration. When button 1570 is pressed, the classifier hierarchy exploration unit 1330 starts the exploration.
  In Fig. 14, item 1600 shows the screen for choosing or altering coexistence or correlation. Item 1610 represents the area for choosing the labels whose coexistence or correlation is to be shown. Item 1620 represents the area that shows the coexistence or correlation. Items 1611 and 1612 are the choices of labels whose coexistence or correlation is shown in area 1620. The entries displayed in items 1611 and 1612 can be sub-problems or labels. When the user chooses a sub-problem or label, the selection can be indicated by a highlighted band similar to item 1613.
  Item 1630 represents the area for selecting the usage of label pairs. In area 1630, item 1631 shows pairs of labels and item 1632 is a drop-down list for choosing whether to treat the coexistence or correlation of the pair as "strong", "weak" or "don't use", as shown in item 1633.
  Item 1640 represents a button for saving the user-specified choice to the classifier hierarchy exploration system. When button 1640 is pressed, the user-specified choice is saved to the system. Item 1650 represents a button for choosing or altering sub-problems. When button 1650 is pressed, the classifier hierarchy exploration system 1300 shows screen 1500 to enable the user to choose or alter sub-problems. Item 1660 represents a button for starting the exploration. When button 1660 is pressed, the classifier hierarchy exploration unit 1330 starts the exploration.
<Fifth Exemplary Embodiment>
  <Explanation of Structure>
  Next, a fifth exemplary embodiment of the classifier hierarchy evaluation unit 1700 (item 130 in Fig. 1) is elaborated referring to the accompanying drawings.
  Referring to Fig. 15, the classifier hierarchy evaluation unit 1700 includes a classifier training unit 1710, a validation metric evaluation unit 1720, and an inference time evaluation unit 1730.
  The above mentioned means generally operate as follows.
  The classifier training unit 1710 trains the classifier hierarchy with the training data for training. That is, the classifier training unit 1710 receives the training data 22 and the classifier hierarchy 33 constructed by the classifier hierarchy construction unit 120 and the like. Then the classifier training unit 1710 trains a model in which each classifier is deployed based on the classifier hierarchy 33, using the training data 22. Note that the classifier training unit 1710 is optional.
  The validation metric evaluation unit 1720 evaluates the validation metrics of the classifier hierarchy 33 with the training data 22 for validation. That is, the validation metric evaluation unit 1720 calculates the validation metrics of the trained model.
  The inference time evaluation unit 1730 evaluates the inference time of the classifier hierarchy from the information of hardware. That is, the inference time evaluation unit 1730 receives the classifier hierarchy 33 and the information of hardware 232 included in the constraints 23. Then the inference time evaluation unit 1730 estimates the inference time required for the model using the information of hardware 232. The model is one in which each classifier is deployed based on the classifier hierarchy 33. The inference time is the time required to infer all labels from all classifiers. In other words, the inference time is the time required for classification of all labels by the plurality of classifiers for each classifier hierarchy.
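  As a minimal sketch of how such an estimate might be computed (the disclosure does not fix a particular model), the snippet below assumes the classifier hierarchy is given as a list of layers of classifier identifiers, that the information of hardware 232 provides a per-classifier latency and a number of processing units able to run classifiers in parallel, and that layers execute sequentially while classifiers within a layer are packed greedily onto the available units. All names and values are illustrative assumptions.

```python
def estimate_inference_time(hierarchy, latency, num_units):
    """hierarchy: list of layers, each layer a list of classifier ids.
    latency: dict mapping classifier id -> execution time on one unit.
    num_units: number of hardware units able to run classifiers in parallel."""
    total = 0.0
    for layer in hierarchy:
        times = sorted((latency[c] for c in layer), reverse=True)
        units = [0.0] * max(1, min(num_units, len(times)))
        for t in times:                       # greedy list scheduling, longest first
            idx = units.index(min(units))     # put the task on the least-loaded unit
            units[idx] += t
        total += max(units)                   # a layer finishes when its slowest unit does
    return total

# Example with assumed values: two layers scheduled on 2 parallel units.
hierarchy = [["C1", "C2", "C3"], ["C4"]]
latency = {"C1": 2.0, "C2": 1.0, "C3": 1.0, "C4": 3.0}
print(estimate_inference_time(hierarchy, latency, num_units=2))  # 2.0 + 3.0 = 5.0
```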
  <Description of Operation>
  Next, referring to the flowchart in Fig. 16, the operation of an evaluation method according to the fifth exemplary embodiment of the present disclosure is elaborated.
  First, the classifier hierarchy evaluation unit 1700 receives input including the training data 22 ({X, Y}), the classifier hierarchy 33 and the information of hardware 232 (step D1). Then the classifier training unit 1710 trains the model corresponding to the classifier hierarchy 33 using the training data 22 (step D2). For example, the classifier training unit 1710 inputs input data included in the training data 22 into the model. More specifically, the classifier training unit 1710 inputs the input data into each classifier deployed based on the classifier hierarchy 33. Then the classifier training unit 1710 compares the classified labels output by each classifier in the model with the correct labels, included in the training data 22, that correspond to the input data. The classifier training unit 1710 adjusts the parameters of each classifier so that the difference between the classified labels and the correct labels becomes small.
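  One possible reading of this training procedure, shown purely as a sketch, trains scikit-learn style binary classifiers layer by layer and feeds each lower layer the input features augmented with the labels handled by the upper layers, in the spirit of a classifier chain. The classifier type, the use of the correct labels as the augmenting features, and names such as train_hierarchy and label_index are assumptions of the example, not requirements of the disclosure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_hierarchy(hierarchy, X, Y, label_index):
    """hierarchy: list of layers; each layer is a list of label names.
    X: (N x d) input features.  Y: (N x Lc) binary matrix of correct labels.
    label_index: dict mapping a label name to its column index in Y."""
    models = {}
    upper_labels = []                                    # labels of already-trained layers
    for layer in hierarchy:
        if upper_labels:
            extra = Y[:, [label_index[l] for l in upper_labels]]
        else:
            extra = np.empty((X.shape[0], 0))
        X_aug = np.hstack([X, extra])                    # input features + upper-layer labels
        for label in layer:
            clf = LogisticRegression(max_iter=1000)      # assumes both classes occur for each label
            clf.fit(X_aug, Y[:, label_index[label]])
            models[label] = clf
        upper_labels.extend(layer)
    return models
```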
  After step D2, the validation metric evaluation unit 1720 evaluates the validation metrics, such as accuracy, of the trained model (step D3). For example, the validation metric evaluation unit 1720 calculates the validation metrics of the trained model using the training data 22 for validation.
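  The disclosure does not enumerate the metrics, so the following sketch simply illustrates common multi-label validation metrics (Hamming loss, subset accuracy and micro-F1), assuming the trained model's predictions and the correct labels of the validation split are available as binary matrices of the same shape.

```python
import numpy as np

def validation_metrics(y_true, y_pred):
    """y_true, y_pred: (N x Lc) binary matrices of correct and predicted labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    hamming_loss = np.mean(y_true != y_pred)                      # per-label error rate
    subset_accuracy = np.mean(np.all(y_true == y_pred, axis=1))   # exact-match ratio
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"hamming_loss": float(hamming_loss),
            "subset_accuracy": float(subset_accuracy),
            "micro_f1": float(f1)}
```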
  In parallel with steps D2 and D3, the inference time evaluation unit 1730 evaluates the inference time of the model corresponding to the classifier hierarchy 33 using the information of hardware 232 (step D4). For example, the inference time evaluation unit 1730 estimates the time required to infer all labels from all classifiers as the inference time.
  Finally, the classifier hierarchy evaluation unit 1700 outputs the results of the validation metric evaluation in step D3 and the inference time evaluation in step D4 as the evaluation results (step D5).
<Sixth Exemplary Embodiment>
  A sixth exemplary embodiment of the present disclosure is elaborated below referring to the accompanying drawings. The sixth exemplary embodiment indicates a minimum configuration common to the first to fifth exemplary embodiments.
  Fig. 17 is a block diagram showing a configuration of an information processing apparatus according to a sixth exemplary embodiment of the present disclosure. The information processing apparatus 1 corresponds to the above mentioned classifier hierarchy exploration system 100, 900 and 1300. The information processing apparatus 1 includes a construction unit 11, an evaluation unit 12 and a choice unit 13.
  The construction unit 11 is a minimum configuration common to the classifier hierarchy construction units 120, 940 and 1340. The construction unit 11 constructs a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, where each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels. The hierarchy structure information corresponds to the above mentioned classifier hierarchy.
  The evaluation unit 12 is a minimum configuration common to the classifier hierarchy evaluation unit 130, 950, 1350 and 1700. The evaluation unit 12 evaluates quality of each of the hierarchy structure information.
  The choice unit 13 is a minimum configuration common to the classifier hierarchy exploration unit 110, 930 or 1330. The choice unit 13 chooses at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  Fig. 18 is a flowchart for explaining an information processing method according to a sixth exemplary embodiment of the present disclosure. First, the construction unit 11 constructs the plurality of hierarchy structure information (S11). Next, the evaluation unit 12 evaluates quality of each of the hierarchy structure information (S12). Then, the choice unit 13 chooses at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints (S13).
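  As an illustrative sketch of step S13 (the disclosure does not prescribe a specific selection rule), the choice unit 13 can be viewed as filtering the evaluated candidates to a Pareto-optimal set under a constraint. The quality fields accuracy and inference_time, and the constraint max_inference_time, are assumptions of the example.

```python
def choose_candidates(candidates, max_inference_time):
    """candidates: list of dicts with 'accuracy' (higher is better) and
    'inference_time' (lower is better).  Returns the Pareto-optimal
    candidates that also satisfy the inference-time constraint."""
    feasible = [c for c in candidates if c["inference_time"] <= max_inference_time]
    pareto = []
    for c in feasible:
        dominated = any(
            o["accuracy"] >= c["accuracy"]
            and o["inference_time"] <= c["inference_time"]
            and (o["accuracy"] > c["accuracy"] or o["inference_time"] < c["inference_time"])
            for o in feasible
        )
        if not dominated:
            pareto.append(c)
    return pareto
```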
  Note that, the above mentioned information processing apparatus 1 may be implemented by an information processing system having a plurality of computers. Fig. 19 is a block diagram showing a configuration of an information processing system according to a sixth exemplary embodiment of the present disclosure. The information processing system 6 corresponds to the above mentioned classifier hierarchy exploration system 100, 900 and 1300. The information processing system 6 includes a storage 61, a processor 62, a memory 63 and an IF (InterFace) unit 64.
  The storage 61 is a storage device such as a hard disk drive, flash memory or the like. The storage 61 stores hierarchy structure information 6111, 6112, …, quality 6121, 6122, …, constraints 613 and a program 614. Fig. 20 shows a concept of the internal structures of the hierarchy structure information 7 and the quality 8, respectively. The hierarchy structure information 7 is an example of the hierarchy structure information 6111, 6112 or the like. The hierarchy structure information 7 includes groups 711, 721, … and layers 712, 722, …. The group 711 corresponds to the layer 712, and the group 721 corresponds to the layer 722. The group 711 includes classifiers 7111, 7112, …. The classifier 7111 and the like are information items indicating at least an identifier of the hardware or software on which the classifier is installed. The group 721 and the like have the same configuration as the group 711 and each include one or more classifiers. However, the classifiers included in each group are different from the classifiers included in the other groups. The layer 712 and the like are information items indicating the hierarchy, such as the layer number in the hierarchy, the positional relationship with other layers, and the labels which are output from the classifiers included in the previous (upper) layer and are input into each classifier included in the group corresponding to that layer.
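  Purely as an illustration of the internal structure described above (the field names are not part of the disclosure), the hierarchy structure information 7 could be represented by data classes along the following lines.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Classifier:
    identifier: str           # id of the hardware or software on which the classifier is installed
    output_labels: List[str]  # labels this classifier assigns

@dataclass
class Layer:
    number: int               # layer number in the hierarchy
    input_labels: List[str]   # labels output by the previous (upper) layer

@dataclass
class Group:
    classifiers: List[Classifier]
    layer: Layer              # each group corresponds to one layer

@dataclass
class HierarchyStructureInfo:
    groups: List[Group] = field(default_factory=list)
```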
  The quality 8 is an example of the quality 6121, 6122 or the like. The quality 8 corresponds to the hierarchy structure information 7. The quality 8 includes validation metrics 81 and an estimated inference time 82. The validation metrics 81 and the estimated inference time 82 are the same as those described above, respectively.
  Returning to Fig. 19, the explanation will be continued. The constraints 613 correspond to the above mentioned constraints 23. The program 614 is a computer program in which the information processing method according to at least one of the first to fifth exemplary embodiments is implemented.
  The memory 63 is a volatile storage device such as a RAM (Random Access Memory), and serves as a storage area for temporarily holding information generated during the operation of the processor 62. The IF unit 64 is an interface that performs input and output with the outside.
  The processor 62 is a control unit, such as a CPU or the like. The processor 62 reads the program 614 from the storage 61 to the memory 63 and executes the program 614. By doing this, the processor 62 realizes functions of a reception unit 621, a partition unit 622, an analysis unit 623, a construction unit 624, an evaluation unit 625 and a choice unit 626.
  The reception unit 621 corresponds to the classifier hierarchy exploration unit 110, the sub-problem partitioning unit 910, the coexistence/correlation analysis unit 920, the classifier hierarchy exploration unit 930, the sub-problem partitioning unit 1310, the coexistence/correlation analysis unit 1320 or the classifier hierarchy exploration unit 1330.
  The partition unit 622 corresponds to the classifier hierarchy exploration unit 110, the sub-problem partitioning unit 910 or 1310. The analysis unit 623 corresponds to the coexistence/correlation analysis unit 920 or 1320. The construction unit 624 corresponds to the classifier hierarchy construction unit 120, 940 or 1340. The evaluation unit 625 corresponds to the classifier hierarchy evaluation unit 130, 950 or 1350. The choice unit 626 corresponds to the classifier hierarchy exploration unit 110, 930 or 1330.
  A first effect is to ensure that the classifier hierarchy labels an instance accurately while maximizing hardware usage. The reason for the effect is that it can both leverage label correlation through the hierarchy (similar to a classifier chain) and parallelize the computation of uncorrelated labels.
  A second effect is to ensure that the output of the system is a Pareto-optimal classifier hierarchy, so that the user can choose an appropriate classifier hierarchy for deployment according to the constraints. The reason for the effect is that the present disclosure automatically explores the architecture space of classifier hierarchies by repeatedly constructing and evaluating classifier hierarchies according to label correlations and available hardware.
  <Other exemplary embodiments of the invention>
  Those skilled in the art will recognize that the system, operation and method of the present disclosure may be implemented in several manners and as such are not to be limited by the foregoing embodiments and examples. In other words, functional elements performed by single or multiple components in various combinations of hardware, software or firmware may be distributed among software applications on the server side (the SP side). Furthermore, the embodiments of the methods presented in the flowcharts in this disclosure are provided by way of example in order to provide a more complete understanding of the technology. Alternative embodiments can be contemplated wherein the various components can be altered functionally in order to attain the same goals. Although various embodiments have been described for the purposes of this disclosure, such embodiments should not be deemed to limit the teaching of this disclosure to those embodiments. Various changes and modifications may be made to the elements and operations described above to obtain a result that remains within the scope of the systems and operations described in this disclosure.
  Additionally, it is obvious that the present invention is not limited to the above exemplary embodiments, and various modifications can be made thereto without departing from the scope of the present invention. For example, the above exemplary embodiments explained the present invention as a hardware configuration, but the present invention is not limited to this. The present invention can also be realized by causing a CPU (Central Processing Unit) to execute a computer program that performs arbitrary processes. In this case, the program can be stored and provided to a computer using any type of non-transitory computer readable media.
  Examples of non-transitory computer readable media include magnetic storage media (such as floppy disks, magnetic tapes, hard disk drives, etc.), optical magnetic storage media (e.g. magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (Blu-ray (registered trademark) Disc), and semiconductor memories (such as mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, RAM (Random Access Memory), etc.). The program may be provided to a computer using any type of transitory computer readable media. Examples of transitory computer readable media include electric signals, optical signals, and electromagnetic waves. Transitory computer readable media can provide the program to a computer via a wired communication line (e.g. electric wires, and optical fibers) or a wireless communication line.
  Part of or all the foregoing embodiments can be described as in the following appendixes, but the present invention is not limited thereto.
(Supplementary Note 1)
  An information processing apparatus comprising:
  a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
  an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and
  a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
(Supplementary Note 2)
  The information processing apparatus according to Supplementary Note 1, wherein
  each of the plurality of groups is a group of two or more classifiers, among the plurality of classifiers, for performing parallel processing.
(Supplementary Note 3)
  The information processing apparatus according to Supplementary Note 1 or 2, wherein
  the construction unit constructs the hierarchy structure information based on a dependency among the plurality of labels.
(Supplementary Note 4)
  The information processing apparatus according to any one of Supplementary Notes 1 to 3, wherein
  the construction unit constructs the hierarchy structure information which is defined to connect part of the outputs of the classifiers belonging to a first group to the inputs of the classifiers belonging to a second group, the first and second groups being included in the hierarchy structure information.
(Supplementary Note 5)
  The information processing apparatus according to any one of Supplementary Notes 1 to 4, wherein
  the construction unit constructs the hierarchy structure information based on the predetermined constraints.
(Supplementary Note 6)
  The information processing apparatus according to any one of Supplementary Notes 1 to 5, wherein
  the construction unit constructs the hierarchy structure information based on information which indicates a dependency between labels.
(Supplementary Note 7)
  The information processing apparatus according to any one of Supplementary Notes 1 to 6, wherein
  the quality includes at least one of validation metrics on a classification accuracy of each classifier in the hierarchy structure information and an inference time required for classification of all labels by the plurality of classifiers for each classifier hierarchy.
(Supplementary Note 8)
  The information processing apparatus according to Supplementary Note 7, wherein the evaluation unit comprises
  a first evaluation unit configured to calculate the validation metrics as the quality of the hierarchy structure information; and
  a second evaluation unit configured to estimate the inference time as the quality of the hierarchy structure information.
(Supplementary Note 9)
  The information processing apparatus according to any one of Supplementary Notes 1 to 8, wherein
  the predetermined constraints include at least one of information of hardware, classification accuracy of each classifier in the hierarchy structure information, information which indicates a dependency between labels, and priority/criteria to choose the candidate.
(Supplementary Note 10)
  The information processing apparatus according to any one of Supplementary Notes 1 to 9, further comprising:
  a partition unit configured to partition the plurality of labels to be input into a plurality of sub problems in multi label classification problem; and
  wherein the construction unit
  generates the plurality of groups by grouping the plurality of classifiers based on the plurality of sub problems, and
  constructs the hierarchy structure information from the generated plurality of groups.
(Supplementary Note 11)
  The information processing apparatus according to Supplementary Note 10, wherein
  the partition unit partitions each label into the plurality of sub problems based on semantic analysis of label names corresponding to the plurality of labels.
(Supplementary Note 12)
  The information processing apparatus according to any one of Supplementary Notes 1 to 11, further comprising:
  an analysis unit configured to analyze a strength of coexistence/correlation between labels based on a dependency among the plurality of labels to be input; and
  the construction unit constructs the hierarchy structure information based on the analyzed strength of coexistence/correlation between labels.
(Supplementary Note 13)
  The information processing apparatus according to any one of Supplementary Notes 1 to 12, further comprising:
  a reception unit configured to receive an input of the predetermined constraints and training data which includes pairs of the input data and correct label; and
  each of the construction unit, the evaluation unit and the choice unit starts processing according to the received input.
(Supplementary Note 14)
  The information processing apparatus according to Supplementary Note 13, wherein
  the reception unit further receives designation of correspondence between the plurality of labels and a plurality of sub problems in multi label classification problem as the input, and saves the correspondence to a storage according to the designation; and
  wherein the construction unit constructs the hierarchy structure information based on the correspondence saved in the storage.
(Supplementary Note 15)
  The information processing apparatus according to Supplementary Note 13 or 14, wherein
  the reception unit further receives designation of information which indicates coexistence/correlation between labels as input, and saves the information to a storage according to the designation; and
  wherein the construction unit constructs the hierarchy structure information based on the information saved in the storage.
(Supplementary Note 16)
  The information processing apparatus according to any one of Supplementary Notes 13 to 15, wherein
  the evaluation unit trains a model in which each classifier is deployed based on the hierarchy structure information using the training data, and evaluates the quality of the trained model.
(Supplementary Note 17)
  An information processing system comprising:
  a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
  an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and
  a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
(Supplementary Note 18)
  The information processing system according to Supplementary Note 17, wherein
  each of the plurality of groups is a group of two or more classifiers, among the plurality of classifiers, for performing parallel processing.
(Supplementary Note 19)
  An information processing method using a computer comprising:
  constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
  evaluating quality of each of the hierarchy structure information; and
  choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
(Supplementary Note 20)
  A non-transitory computer readable medium storing a control program causing a computer to execute:
  a process for constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
  a process for evaluating quality of each of the hierarchy structure information; and
  a process for choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  The present disclosure is applicable to a system and an apparatus for solving a multi-label classification and/or labelling the input instances with one or more labels. The present disclosure is also applicable to applications such as object detection, human tracking, scene labelling, and other applications for classification and artificial intelligence.
1 information processing apparatus
11 construction unit
12 evaluation unit
13 choice unit
100 classifier hierarchy exploration system
110 classifier hierarchy exploration unit
120 classifier hierarchy construction unit
130 classifier hierarchy evaluation unit
21 label list
22 training data
23 constraints
232 information of hardware
29 labelling order
31 classifier hierarchy
32 evaluation results
33 classifier hierarchy
34 evaluation results
41 classifier hierarchy candidate
48 classifier chain
49 cascaded classifier
210 input instance
220 BR system
230 output labels
310 input instance
320 CC system
330 output labels
410 input instance
420 label prediction process
430 output labels
440 previously predicted labels
510 input instance or features
520 classifier layer 1
530 classifier layer 2
540 classifier layer k
550 output labels
900 classifier hierarchy exploration system
910 sub-problem partitioning unit
920 coexistence/correlation analysis unit
930 classifier hierarchy exploration unit
940 classifier hierarchy construction unit
950 classifier hierarchy evaluation unit
1200 screen
1300 classifier hierarchy exploration system
1310 sub-problem partitioning unit
1320 coexistence/correlation analysis unit
1330 classifier hierarchy exploration unit
1340 classifier hierarchy construction unit
1350 classifier hierarchy evaluation unit
1500 screen
1600 screen
1700 classifier hierarchy evaluation unit
1710 classifier training unit
1720 validation metric evaluation unit
1730 inference time evaluation unit
1900 multi-label classification system
1910 classifier hierarchy construction unit
1920 classifier hierarchy evaluation unit
6 information processing system
61 storage
6111 hierarchy structure information
6112 hierarchy structure information
6121 quality
6122 quality
613 constraints
614 program
62 processor
621 reception unit
622 partition unit
623 analysis unit
624 construction unit
625 evaluation unit
626 choice unit
63 memory
64 IF unit
7 hierarchy structure information
711 group
7111 classifier
7112 classifier
712 layer
721 group
722 layer
8 quality
81 validation metrics
82 estimated inference time

Claims (20)

  1.   An information processing apparatus comprising:
      a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
      an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and
      a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  2.   The information processing apparatus according to Claim 1, wherein
      each of the plurality of groups is a group of two or more classifiers, among the plurality of classifiers, for performing parallel processing.
  3.   The information processing apparatus according to Claim 1 or 2, wherein
      the construction unit constructs the hierarchy structure information based on a dependency among the plurality of labels.
  4.   The information processing apparatus according to any one of Claims 1 to 3, wherein
      the construction unit constructs the hierarchy structure information which is defined to connect part of the outputs of the classifiers belonging to a first group to the inputs of the classifiers belonging to a second group, the first and second groups being included in the hierarchy structure information.
  5.   The information processing apparatus according to any one of Claims 1 to 4, wherein
      the construction unit constructs the hierarchy structure information based on the predetermined constraints.
  6.   The information processing apparatus according to any one of Claims 1 to 5, wherein
      the construction unit constructs the hierarchy structure information based on information which indicates a dependency between labels.
  7.   The information processing apparatus according to any one of Claims 1 to 6, wherein
      the quality includes at least one of validation metrics on a classification accuracy of each classifier in the hierarchy structure information and an inference time required for classification of all labels by the plurality of classifiers for each classifier hierarchy.
  8.   The information processing apparatus according to Claim 7, wherein the evaluation unit comprises
      a first evaluation unit configured to calculate the validation metrics as the quality of the hierarchy structure information; and
      a second evaluation unit configured to estimate the inference time as the quality of the hierarchy structure information.
  9.   The information processing apparatus according to any one of Claims 1 to 8, wherein
      the predetermined constraints include at least one of information of hardware, classification accuracy of each classifier in the hierarchy structure information, information which indicates a dependency between labels, and priority/criteria to choose the candidate.
  10.   The information processing apparatus according to any one of Claims 1 to 9, further comprising:
      a partition unit configured to partition the plurality of labels to be input into a plurality of sub problems in multi label classification problem; and
      wherein the construction unit
      generates the plurality of groups by grouping the plurality of classifiers based on the plurality of sub problems, and
      constructs the hierarchy structure information from the generated plurality of groups.
  11.   The information processing apparatus according to Claim 10, wherein
      the partition unit partitions each label into the plurality of sub problems based on semantic analysis of label names corresponding to the plurality of labels.
  12.   The information processing apparatus according to any one of Claims 1 to 11, further comprising:
      an analysis unit configured to analyze a strength of coexistence/correlation between labels based on a dependency among the plurality of labels to be input; and
      the construction unit constructs the hierarchy structure information based on the analyzed strength of coexistence/correlation between labels.
  13.   The information processing apparatus according to any one of Claims 1 to 12, further comprising:
      a reception unit configured to receive an input of the predetermined constraints and training data which includes pairs of the input data and correct label; and
      each of the construction unit, the evaluation unit and the choice unit starts processing according to the received input.
  14.   The information processing apparatus according to Claim 13, wherein
      the reception unit further receives designation of correspondence between the plurality of labels and a plurality of sub problems in multi label classification problem as the input, and saves the correspondence to a storage according to the designation; and
      wherein the construction unit constructs the hierarchy structure information based on the correspondence saved in the storage.
  15.   The information processing apparatus according to Claim 13 or 14, wherein
      the reception unit further receives designation of information which indicates coexistence/correlation between labels as input, and saves the information to a storage according to the designation; and
      wherein the construction unit constructs the hierarchy structure information based on the information saved in the storage.
  16.   The information processing apparatus according to any one of Claims 13 to 15, wherein
      the evaluation unit trains a model in which each classifier is deployed based on the hierarchy structure information using the training data, and evaluates the quality of the trained model.
  17.   An information processing system comprising:
      a construction unit configured to construct a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
      an evaluation unit configured to evaluate quality of each of the hierarchy structure information; and
      a choice unit configured to choose at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  18.   The information processing system according to Claim 17, wherein
      each of the plurality of groups is a group of two or more classifiers, among the plurality of classifiers, for performing parallel processing.
  19.   An information processing method using a computer comprising:
      constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
      evaluating quality of each of the hierarchy structure information; and
      choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
  20.   A non-transitory computer readable medium storing a control program causing a computer to execute:
      a process for constructing a plurality of hierarchy structure information in which dependency relationships of execution order among a plurality of groups are defined in a hierarchical structure, each of the plurality of groups is a group of a plurality of classifiers for classifying input data into a plurality of labels;
      a process for evaluating quality of each of the hierarchy structure information; and
      a process for choosing at least one candidate for multi label classification model from the plurality of hierarchy structure information based on the evaluated quality and predetermined constraints.
PCT/JP2018/041485 2018-11-08 2018-11-08 Information processing apparatus, system, method and program WO2020095408A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2021523089A JP7156521B2 (en) 2018-11-08 2018-11-08 Information processing device, system, method and program
PCT/JP2018/041485 WO2020095408A1 (en) 2018-11-08 2018-11-08 Information processing apparatus, system, method and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2018/041485 WO2020095408A1 (en) 2018-11-08 2018-11-08 Information processing apparatus, system, method and program

Publications (1)

Publication Number Publication Date
WO2020095408A1 true WO2020095408A1 (en) 2020-05-14

Family ID=70611850

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2018/041485 WO2020095408A1 (en) 2018-11-08 2018-11-08 Information processing apparatus, system, method and program

Country Status (2)

Country Link
JP (1) JP7156521B2 (en)
WO (1) WO2020095408A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080154820A1 (en) * 2006-10-27 2008-06-26 Kirshenbaum Evan R Selecting a classifier to use as a feature for another classifier
JP2008242751A (en) * 2007-03-27 2008-10-09 Seiko Epson Corp Image discrimination method, image discrimination device, and program
US20090083010A1 (en) * 2007-09-21 2009-03-26 Microsoft Corporation Correlative Multi-Label Image Annotation
US20170061330A1 (en) * 2015-08-31 2017-03-02 International Business Machines Corporation Method, system and computer program product for learning classification model

Also Published As

Publication number Publication date
JP2022505985A (en) 2022-01-14
JP7156521B2 (en) 2022-10-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18939698

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021523089

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18939698

Country of ref document: EP

Kind code of ref document: A1