WO2018112514A1 - Deep learning systems and methods for use in computer vision - Google Patents

Deep learning systems and methods for use in computer vision

Info

Publication number
WO2018112514A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
training images
machine learning
learning model
specialised
Prior art date
Application number
PCT/AU2017/051388
Other languages
French (fr)
Inventor
Chris MCCOOL
Original Assignee
Queensland University Of Technology
Priority date
Filing date
Publication date
Priority claimed from AU2016905352A0
Application filed by Queensland University of Technology
Publication of WO2018112514A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Abstract

A method and system for generating a compressed machine learning model is provided. The method includes providing a first set of training images to a first machine learning model to generate a first set of outputs; providing a second set of training images to a second machine learning model to generate a second set of outputs, wherein the second set of training images corresponds to the first set of training images at a lower resolution; and updating the second machine learning model according to a difference between the first and second sets of outputs to generate the compressed machine learning model.

Description

DEEP LEARNING SYSTEMS AND METHODS FOR USE IN COMPUTER VISION
TECHNICAL FIELD
[0001] The present invention relates to computer vision. In particular, although not exclusively, the present invention relates to the generation of compressed machine learning models for image-based object classification, segmentation and recognition.
BACKGROUND ART
[0002] Machine learning is a valuable tool in image classification and recognition, where models are taught using training data to classify objects in images. As an illustrative example, machine learning may be used in agriculture to classify plants based upon training images.
[0003] One problem with machine learning is that it is generally complex. As a result, machine learning models may not be well suited to portable computing devices where processing and battery resources are very limited.
[0004] A further problem with machine learning is that models are generally hard to train, particularly if limited data is available. For example, several state of the art models have been trained with over 1 million training images spanning over 1000 categories. In practice, far fewer training images may be available for a specific classification task.
[0005] As a result, boutique solutions are generally used for classification in agriculture, including shape and pixel-based weed classification, stacked auto-encoders (SAEs) for almond segmentation, and combinations of features including local binary patterns, SAEs and histograms of gradients for sweet pepper segmentation and detection.
[0006] Boutique solutions are generally undesirable in that new solutions are required for new recognition problems, which is very time consuming and costly. Furthermore, the boutique solutions of the prior art generally still have difficulty classifying plants whose appearance can change considerably (e.g. from juvenile to adult), and can appear similar to other plants (i.e. where different plant varieties are only subtly different).
[0007] As such, there is clearly a need for improved methods and systems for use in computer vision.
[0008] It will be clearly understood that, if a prior art publication is referred to herein, this reference does not constitute an admission that the publication forms part of the common general knowledge in the art in Australia or in any other country.
SUMMARY OF INVENTION
[0009] The present invention is directed to methods and systems for generating machine learning models, classification and identification using said models, which may at least partially overcome at least one of the abovementioned disadvantages or provide the consumer with a useful or commercial choice.
[0010] With the foregoing in view, in a first aspect, the present invention resides broadly in a method for generating a compressed machine learning model including:
providing a first set of training images to a first machine learning model to generate a first set of outputs;
providing a second set of training images to a second machine learning model to generate a second set of outputs, wherein the second set of training images corresponds to the first set of training images at a lower resolution; and
updating the second machine learning model according to a difference between the first and second sets of outputs to generate the compressed machine learning model.
[0011] Advantageously, the compressed machine learning model is more compact than the first machine learning model, and thus able to operate at lower complexity and/or using less memory. The compressed models generally provide a good trade-off between computational efficiency (model complexity) and classification accuracy (model accuracy). As such, the compressed machine learning model may be particularly suited to portable computing devices.
[0012] Preferably, the first and second machine learning models are convolutional neural networks. Suitably, the machine learning models are deep convolutional neural networks. The second machine learning model may be updated to mimic, at least in part, the first machine learning model according to the first and second sets of outputs.
[0013] Preferably, the compressed machine learning model is configured to classify an image.
[0014] Preferably, the compressed machine learning model is configured to identify objects in an image. The objects may, for example, comprise a fruit, a plant, and/or a weed.
[0015] Preferably, the compressed machine learning model includes fewer layers and/or parameters than the first machine learning model.
[0016] Preferably, the first and second sets of outputs include logit outputs (these logit outputs are usually normalised, e.g. by a softmax function, to obtain class probabilities).
[0017] The second set of training images may be down-sampled from the first set of training images. Alternatively, the first set of training images may be up-sampled from the second set of training images. Alternatively again, the first and second sets of training images may be generated from another set of images.
[0018] Preferably, the method includes generating the first machine learning model, at least in part using the first set of training images.
[0019] Preferably, the first machine learning model comprises a specialised deep model that is generated by adapting a non-specialised (generic) deep model and using the first set of training images.
[0020] Preferably, the specialised deep model is generated by generating a new output layer, having outputs corresponding to classes of the training images, replacing a final decision layer of the non-specialised (generic) deep model with the new output layer, and using the first set of training images to train at least the replaced final decision layer of the non-specialised (generic) deep model.
[0021] Suitably, the new output layer is randomly initialised.
[0022] Preferably, one or more layers of the specialised deep model are fixed during training. Weights may be applied to the parameters defining how much each of the parameters is influenced during training.
[0023] Preferably, the non-specialised (generic) deep model has been trained with nonspecific (generic) image data. Suitably, the non-specific (generic) image data does not include data relating to classifiers of the training images.
[0024] Preferably, the non-specialised (generic) deep model has been trained using more images than the number of training images. Suitably, the non-specialised (generic) deep model has been trained using at least 10 times more images than the number of training images.
[0025] The non-specialised (generic) deep model may comprise the GoogLeNet model. The non-specialised (generic) deep model may be trained using the ImageNet dataset.
[0026] The method may include receiving a further image, and classifying the further image according to the compressed model.
[0027] Preferably, the further image is generated by applying a sliding window to a larger image.
[0028] The compressed model may include a plurality of classifiers.
[0029] The compressed model may be used in a computer vision task of a robot.
[0030] The compressed model may, for example, be configured for weed classification or crop segmentation.
[0031] The method may comprise generating a plurality of models, and using the plurality of models together to classify the further image.
[0032] Suitably, the plurality of models may be generated using the same method, but with the training images provided in a different order.
[0033] In a second aspect, the invention resides broadly in a system for generating a compressed machine learning model including:
a processor; and
a memory, coupled to the processor, the memory including instruction code executable by the processor for:
providing a first set of training images to a first machine learning model to generate a first set of outputs;
providing a second set of training images to a second machine learning model to generate a second set of outputs, wherein the second set of training images corresponds to the first set of training images at a lower resolution; and
updating the second machine learning model according to a difference between the first and second sets of outputs to generate the compressed machine learning model.
[0034] Preferably, the system further includes a camera, for capturing an image, wherein the instruction code is further executable to classify the captured image according to the compressed machine learning model.
[0035] Preferably, the system includes one or more actuators, configured to act based upon classification of the captured image. The actuators may form part of or be associated with a robotic arm, a spray nozzle, a harvesting tool or a mechanical implement.
[0036] In a third aspect, the present invention resides in a system for classifying images using a model generated according to the first or second aspects.
[0037] Any of the features described herein can be combined in any combination with any one or more of the other features described herein within the scope of the invention.
[0038] The reference to any prior art in this specification is not, and should not be taken as an acknowledgement or any form of suggestion that the prior art forms part of the common general knowledge.
BRIEF DESCRIPTION OF DRAWINGS
[0039] Various embodiments of the invention will be described with reference to the following drawings, in which:
[0040] Figure 1 illustrates a schematic of a deep learning model generation system, for use in computer vision, according to an embodiment of the present invention;
[0041] Figure 2 illustrates a schematic of an image classification system, according to an embodiment of the present invention;
[0042] Figure 3 illustrates a schematic of an image classification system, according to a further embodiment of the present invention;
[0043] Figure 4 illustrates a method for generating a model and classifying an image according to the generated model, according to an embodiment of the present invention;
[0044] Figure 5 illustrates a method of generating a specialised deep model by adapting a generic model, according to an embodiment of the present invention;
[0045] Figure 6 illustrates a method of compressing a specialised deep model, according to an embodiment of the present invention; and
[0046] Figure 7 illustrates a computing device, according to an embodiment of the present invention.
[0047] Preferred features, embodiments and variations of the invention may be discerned from the following Detailed Description which provides sufficient information for those skilled in the art to perform the invention. The Detailed Description is not to be regarded as limiting the scope of the preceding Summary of the Invention in any way.
DESCRIPTION OF EMBODIMENTS
[0048] Methods and systems are described below which provide for the generation of a compressed machine learning model, and classification using the compressed model. The methods include providing a first set of training images to a first machine learning model to generate a first set of outputs; providing a second set of training images to a second machine learning model to generate a second set of outputs, wherein the second set of training images corresponds to the first set of training images at a lower resolution; and updating the second machine learning model according to a difference between the first and second sets of outputs to generate the compressed machine learning model. The difference between the first and second sets of outputs may be defined by a loss function.
[0049] Figure 1 illustrates a schematic of a deep learning model generation system 100, for use in computer vision, according to an embodiment of the present invention.
[0050] The system 100 can be used for any suitable computer vision application, including weed classification based on detected regions, dense (per-pixel) weed segmentation and dense (per-pixel) crop segmentation. Weed classification based on regions is important for inter-row weed management in fallow periods as the weeds are generally sparse. Dense weed segmentation is important for intra-row weed management as the weeds can be close to the crop and their exact location needs to be determined. Dense crop segmentation is important for automated harvesting, as it requires knowledge of the exact location of the crop.
[0051] The system 100 includes a generic deep model 105, which has been trained with nonspecific (generic) image data. As an illustrative example, the model 105 may be similar or identical to the GoogLeNet model (Szegedy et al., 2015), which comprises a 22 layer deep network that was trained using the ImageNet dataset (Russakovsky et al., 2015) using 1,000 images each for 1,000 classes (1,000,000 images in total).
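By way of illustration only, a pretrained generic model of this kind can typically be obtained from a standard deep learning library. The sketch below assumes PyTorch and torchvision, neither of which is named in this document, and loads GoogLeNet weights pretrained on ImageNet.

```python
# Illustrative sketch only: obtain a generic deep model 105 pretrained on
# ImageNet. The choice of PyTorch/torchvision is an assumption made for this
# example; the document does not prescribe any particular framework.
import torchvision.models as models

generic_model = models.googlenet(pretrained=True)  # 22 layer network, 1,000 classes
generic_model.eval()
```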
[0052] The generic deep model 105 is trained using images having a wide variety of classifications, such as images of various types of animals, devices, foods, and objects. The generic deep model 105 has not, however, been trained to the specific task in question, such as weed detection. As such, the generic deep model 105 is adapted (fine-tuned) using high resolution images 110 to become a specialised deep model 115, and then compressed using low resolution images 120, as outlined below, to form an output model 125. The output model 125 may then be used in classification and segmentation tasks.
[0053] The generic deep model 105 is adapted (fine-tuned) by replacing a final decision layer of the model with a randomly initialised layer that includes the number of output classes C desired by the specialised model 115. As described above, the generic deep model 105 may have been trained for a large number of classes (e.g. C = 1000), but the specialised deep model 115 may only require a few classes (e.g. C < 10).
[0054] The number of output classes C also corresponds to the number of classes in the high resolution training images 110. This enables the high resolution training images 110 to train the model.
[0055] As an illustrative example, a model that is adapted to detect several different weed types may have a final decision layer of the model replaced by a decision layer including an output class corresponding to each weed type. In such case, the high resolution training images 110 also include training data for each of the output classes, and thus weed types.
[0056] The model with the replaced final decision layer is then retrained using the high resolution images 110, to update at least the new final decision layer. In some embodiments, parts of the model may be fixed during retraining, to prevent the image data from influencing certain aspects of the model too much. In such case, it is generally desirable to fix layers furthest away from the final decision layer.
[0057] According to certain embodiments of the present invention, several layers of the generic deep model 105 may be replaced as part of the fine tuning. This may be particularly useful when multiple layers of the generic deep model 105 are directly related to the final classification, and in such case, it is generally desirable to replace layers closest to the final decision layer.
[0058] In other embodiments, weights may be applied to parameters defining how much training applies to each of the parameters. For example, layers closer to the final decision layer may be weighted such that they are most influenced during training, whereas layers further from the final decision layer are least influenced.
[0059] According to certain embodiments, the replaced layers need not be randomly initialised. For example, the replaced layers may be initialised using default settings, or settings specific to the classification task of the model.
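The adaptation described in paragraphs [0053] to [0059] can be illustrated with a short sketch. The snippet below is an assumption-laden example in PyTorch: the layer names (fc, inception5a, inception5b), the class count C and the learning rates are illustrative and not taken from this document. It replaces the final decision layer with a randomly initialised C-class layer, fixes the layers furthest from it, and weights how strongly the remaining layers are influenced during retraining.

```python
# Illustrative sketch of adapting (fine-tuning) a generic deep model.
# Assumptions: PyTorch/torchvision; the layer names, class count C and learning
# rates are illustrative choices, not taken from the document.
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models

C = 3  # e.g. one output class per weed type in the specialised task
model = models.googlenet(pretrained=True)

# Replace the final decision layer with a randomly initialised layer of C outputs.
model.fc = nn.Linear(model.fc.in_features, C)

# Fix (freeze) the layers furthest from the final decision layer.
for name, param in model.named_parameters():
    if not name.startswith(("fc", "inception5")):
        param.requires_grad = False

# Weight how strongly the remaining layers are influenced during retraining:
# layers closer to the final decision layer receive a larger learning rate.
optimizer = optim.SGD(
    [
        {"params": model.fc.parameters(), "lr": 1e-2},
        {"params": model.inception5a.parameters(), "lr": 1e-3},
        {"params": model.inception5b.parameters(), "lr": 1e-3},
    ],
    momentum=0.9,
)
```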
[0060] Once the generic deep model 105 is adapted to form the specialised deep model 115, the specialised deep model 115 is compressed to generate the output model 125. This process is also referred to as distillation and results in an output model 125 that is simpler than the generic deep model 105, and thus better suited for low complexity tasks, such as classification on portable computing devices. Typically, the output model 125 will include fewer layers and/or parameters than the specialised deep model 115.
[0061] The output model 125 is generated using low resolution images that correspond to the high resolution images, the logit outputs of the specialised deep model 115, z, as well as the true class labels, y, that correspond to both the high and low resolution images. The ability of the output model to replicate the true class labels, y, and logit outputs (of the specialised deep model), z, is determined by a loss function. The loss function for producing the correct class labels and the loss function for replicating the logit outputs can then be combined to train the system. This ensures that the system learns how to replicate the output of the complex specialised deep model while still achieving high classification accuracy.
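A minimal sketch of the combined loss described in paragraph [0061] is given below. It assumes PyTorch; the squared-error term used for replicating the logits and the weighting factor alpha are illustrative choices, as the document does not specify the exact form of the loss functions or how they are weighted when combined.

```python
# Illustrative sketch of the combined training loss: one term for producing the
# correct class labels y and one term for replicating the teacher logits z.
# Assumptions: PyTorch; the MSE logit term and the weighting `alpha` are
# illustrative and not specified in the document.
import torch.nn.functional as F

def combined_loss(student_logits, teacher_logits, true_labels, alpha=0.5):
    label_loss = F.cross_entropy(student_logits, true_labels)   # correct labels y
    logit_loss = F.mse_loss(student_logits, teacher_logits)     # replicate logits z
    return alpha * label_loss + (1.0 - alpha) * logit_loss
```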
[0062] In short, the output model 125 is trained to approximate the function learned by the larger, more complex specialised deep model 115. By training the output model 125 to approximate the function learned by the deep model 115, a more accurate output model may be provided than if the output model 125 had been trained directly using the same data as the specialised deep model 115.
[0063] The specialised deep model 115 and the output model 125 include outputs at each of the layers, and an activation layer, to assist in interpreting the outputs as probabilities. As an illustrative example, the outputs may include logit outputs, which are logarithms of predicted probabilities of each of the classes, and the activation layer may comprise a softmax activation function to convert the logit outputs to probabilities.
[0064] By training the output model 125 using the logit outputs generated by the specialised deep model 115, rather than the output probabilities of the activation layer, the output model 125 is able to learn not only from the ultimate output of the specialised deep model 115, but also from the internal workings of the model. In short, such training does not suffer from the information loss that occurs when passing from logits to probabilities.
[0065] As an illustrative example, if the specialised deep model 115 predicts three targets with probability [2×10⁻⁹, 4×10⁻⁵, 0.9999] and those probabilities are used as prediction targets with cross entropy minimisation, the output model will focus on the third target and tend to ignore the first and second targets based upon their relative values. However, by training using the logits for these targets [10, 20, 30], the output model 125 is able to better learn to mimic the detailed behaviour of the specialised deep model 115.
[0066] As mentioned earlier, the low resolution training images 120 and the high resolution training images 110 correspond directly to each other. This enables the output model 125 to learn based upon the decisions made at the specialised deep model, while having a smaller amount of input data, which reduces complexity.
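As a quick numerical check of the logit example in paragraph [0065], applying a softmax to the logits [10, 20, 30] reproduces probabilities of roughly the magnitudes quoted there; PyTorch is assumed purely for the calculation.

```python
# Numerical check of the example in paragraph [0065]: softmax of the logits
# [10, 20, 30] yields probabilities of approximately [2e-9, 4.5e-5, 1.0].
import torch

logits = torch.tensor([10.0, 20.0, 30.0])
print(torch.softmax(logits, dim=0))  # ~tensor([2.0611e-09, 4.5398e-05, 9.9995e-01])
```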
[0067] The low resolution training images 120 may be down-sampled from the high resolution training images 110. Alternatively, the high resolution training images 110 may be up-sampled from the low resolution training images 120. Alternatively again, the high and low resolution training images may be generated from another common image.
[0068] As an illustrative example, the high resolution training images 110 may be 120x120 pixels in size, and the low resolution training images 120 may be 81x81 pixels in size.
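A short sketch of generating the corresponding low resolution training images 120 is given below. It assumes the Pillow imaging library and bilinear resampling, and uses the 120x120 and 81x81 sizes mentioned above purely as an example.

```python
# Illustrative sketch: derive a low resolution training image (e.g. 81x81) from
# its high resolution counterpart (e.g. 120x120) by bilinear down-sampling.
# Assumption: the Pillow library; any suitable resampling algorithm may be used.
from PIL import Image

def make_low_res(high_res_path, low_res_path, size=(81, 81)):
    image = Image.open(high_res_path)              # e.g. a 120x120 training image
    image.resize(size, Image.BILINEAR).save(low_res_path)
```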
[0069] Once the output model 125 is generated, it may then be used to classify images.
[0070] Figure 2 illustrates a schematic of an image classification system 200, according to an embodiment of the present invention.
[0071] The image classification system 200 includes a model 205, which may correspond to the output model 125, an input image 210, and a prediction 215.
[0072] The input image 210 is advantageously the same size as the low resolution training images 120 (e.g. 81x81 pixels). If the input image 210 is provided in another resolution, the image 210 may be cropped and/or resized to be the same size as the low resolution training images 120.
[0073] The prediction 215 comprises an output of the model 205, and may comprise data relating to probabilities of each of the possible classes (e.g. 0.1 and 0.9 for two classes), or a class label (e.g. "weed").
[0074] According to certain embodiments, the input image 210 is generated by applying a sliding window to a larger image. In particular, a sliding window corresponding to the size of the low resolution training images 120 may be used to generate a plurality of input images 210 corresponding to a plurality of regions of the larger image. This enables the system 200 to classify regions of an image.
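The sliding window approach of paragraph [0074] can be sketched as follows. PyTorch is assumed, and the stride value and helper name are illustrative.

```python
# Illustrative sketch of classifying regions of a larger image with a sliding
# window the size of the low resolution training images (81x81).
# Assumptions: PyTorch; the stride value and the function name are illustrative.
import torch

def classify_regions(model, large_image, window=81, stride=40):
    """large_image: tensor of shape (3, H, W); returns (row, col, class) tuples."""
    model.eval()
    _, height, width = large_image.shape
    results = []
    with torch.no_grad():
        for top in range(0, height - window + 1, stride):
            for left in range(0, width - window + 1, stride):
                region = large_image[:, top:top + window, left:left + window]
                probs = torch.softmax(model(region.unsqueeze(0)), dim=1)
                results.append((top, left, int(probs.argmax(dim=1))))
    return results
```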
[0075] According to certain embodiments, a plurality of models are generated by the system 100, and used together to make a prediction, as outlined below.
[0076] Figure 3 illustrates a schematic of an image classification system 300, according to a further embodiment of the present invention.
[0077] The image classification system 300 is similar to the image classification system 200, and includes an input image 210 for classification, but includes a plurality of models 305 rather than a single model.
[0078] Each model 305 may be generated using the system 100, but using training images in a different order, or otherwise adding variation into the system. As such, each model 305 has been generated to perform the same task as the other models 305.
[0079] While only two models 305 are illustrated, the skilled addressee will readily appreciate that any number of models may be provided.
[0080] Each model 305 is configured to provide an individual prediction 310, much like the prediction 215 of Figure 2. The individual predictions 310 are then combined to form the prediction 315. This may be performed using an average of outputs, median of outputs, or any other suitable selection and/or combination of outputs.
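A minimal sketch of combining the individual predictions 310 into the prediction 315 is shown below. It assumes PyTorch and uses a simple average of the probability outputs, although a median or other suitable combination could equally be used, as noted above.

```python
# Illustrative sketch: combine the individual predictions of several compressed
# models by averaging their probability outputs. Assumption: PyTorch.
import torch

def ensemble_predict(models, image):
    """image: tensor of shape (1, 3, H, W); returns averaged class probabilities."""
    with torch.no_grad():
        probs = [torch.softmax(m(image), dim=1) for m in models]
    return torch.stack(probs).mean(dim=0)
```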
[0081] By combining several individual predictions, simple models used in parallel can generate an improved prediction while maintaining low complexity.
[0082] Figure 4 illustrates a method 400 for generating a model and classifying an image according to the generated model, according to an embodiment of the present invention.
[0083] At step 405, a generic (non-specific) model is received. As discussed above, the generic model may comprise the GoogLeNet model, which has been trained using a large number of images illustrating various classes.
[0084] At step 410, specialised training images are received. The specialised training images relate to a specialised task, such as weed detection, and may include one or more different classes (e.g. different types of weeds).
[0085] At step 415, a specialised deep model is generated by adapting the generic model using the training images. The specialised deep model is able to classify images relating to the specialised task (e.g. weed classification). Further detail of how the specialised deep model is generated is provided below.
[0086] At step 420, low resolution versions of the specialised training images are generated. Any suitable sampling algorithm may be used, including bilinear and bicubic resampling algorithms, to generate the low resolution versions of the specialised training images.
[0087] At step 425, a compressed model is generated using the specialised deep model and the low resolution training images. The compressed model is also able to classify images relating to the specialised task (e.g. weed classification), and is trained to mimic the decisions of the specialised deep model. Further detail of how the compressed model is generated is provided below.
[0088] At step 430 a new image is received for classification. Unlike the training images, the new image is not associated with any class label y. The new image may, for example, be received from a camera of a classification device, or be part of a larger image captured.
[0089] At step 435 the new image is classified using the compressed model. As outlined above, the new image is classified using probabilities output from the compressed model, wherein a classification having a highest probability is chosen.
[0090] Steps 405-425 relate to the generation of the compressed model, and steps 430-435 relate to use of the compressed model. The skilled addressee will readily appreciate that the compressed model may be generated once only, and using different hardware than that used to apply the compressed model. For example, steps 405-425 may be performed off-site, whereas steps 430-435 may be performed on-site.
[0091] According to alternative embodiments (not shown), the specialised deep model may be trained from scratch, i.e. without adapting the generic model. This is useful if there are sufficient training images to train the specialised deep model and/or if no suitable generic model is available. In such case, steps 405-415 may be replaced by a step including receiving the specialised deep model.
[0092] Figure 5 illustrates a method 500 of generating a specialised deep model by adapting a generic model, according to an embodiment of the present invention. The method 500 may be similar or identical to that used in step 415 of the method 400, for example.
[0093] At step 505, a new output layer of a deep model is generated, where outputs of the new layer correspond to classes in the training images. In the case of weed classification, the outputs may correspond to each weed which may be classified.
[0094] At step 510, a final decision layer of the generic model is replaced with the new output layer generated at step 505. At this step, outputs of the earlier layers of the model are coupled to the new final decision layer.
[0095] At step 515, the model, including the new output layer, is trained using training images. In this case, the new layer is updated, and potentially other layers of the model.
Preferably, the layers closest to the new layer are updated more than the layers further from the new layer, which ensures that the model is trained to recognise the specialised data, while retaining general image classification features from the generic model.
[0096] The training process may include incrementally updating the layers. For example, the training images may be used to update the model, after which the training images are provided to the updated model, and so on, until a desired level of training is reached.
[0097] Figure 6 illustrates a method 600 of compressing a specialised deep model, according to an embodiment of the present invention. The method 600 may be similar or identical to that used in step 425 of the method 400, for example.
[0098] At step 605, a base model is generated as a starting point for the compressed model. The base model includes the same number of outputs as the specialised deep model which it will mimic, but fewer layers and parameters.
[0099] At step 610, low resolution images are provided to the base model, the low resolution images corresponding to the high resolution images on which the specialised deep model was trained.
[00100] At step 615, logit outputs of the base model generated using the low resolution images are compared with logit outputs of the specialised deep model for the corresponding high resolution images.
[00101] At step 620, the base model is updated based upon the difference in logit outputs. This enables the base model to mimic the specialised deep model, as described above.
[00102] Steps 610-620 are then repeated until a desired level of training is reached, upon which the base model mimics the specialised deep model.
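Steps 605 to 620 can be sketched, under the same assumptions as the earlier snippets (PyTorch; a data loader yielding matched low and high resolution image pairs with their true labels; illustrative optimiser settings and equal loss weighting), as a simple training loop.

```python
# Illustrative sketch of method 600: the base (compressed) model is trained to
# mimic the specialised deep model by comparing logit outputs on matched
# low/high resolution image pairs. Assumptions: PyTorch; the data loader,
# optimiser settings and equal loss weighting are illustrative.
import torch
import torch.nn.functional as F
import torch.optim as optim

def compress_model(specialised_model, base_model, loader, epochs=10):
    optimizer = optim.Adam(base_model.parameters(), lr=1e-3)
    specialised_model.eval()
    for _ in range(epochs):                       # repeat until trained (from step 620)
        for low_res, high_res, labels in loader:  # step 610
            with torch.no_grad():
                teacher_logits = specialised_model(high_res)   # specialised deep model
            student_logits = base_model(low_res)               # base model (step 615)
            loss = (F.cross_entropy(student_logits, labels)
                    + F.mse_loss(student_logits, teacher_logits))
            optimizer.zero_grad()
            loss.backward()                        # update from the logit difference (step 620)
            optimizer.step()
    return base_model
```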
[00103] Figure 7 illustrates a computing device 700, according to an embodiment of the present invention. The computing device 700 may, for example, implement the methods 400, 500, 600 disclosed above. Similarly, the systems 100, 200, 300 disclosed above may be implemented on similar or identical hardware to that of the computing device 700.
[00104] The computing device 700 includes a central processor 705, and a memory 710 coupled to the processor 705. The memory 710 includes instruction code, executable by the processor 705 to implement the methods 400, 500, 600, or parts thereof.
[00105] A camera 715 is coupled to the processor 705, and is configured to capture one or more images. The images may be stored on the memory 710 for immediate processing, or saved in a database 720, for later use (e.g. as training images).
[00106] The database 720 generally includes a plurality of training images, corresponding class identifiers, and one or more models. The training images enable the models to be adapted to suit specific data of the training images.
[00107] The processor 705 is further coupled to a data interface 725, which may in turn be coupled to a monitor and/or data input device, to enable a user to control the computing device 700. The data interface may also be coupled to input-output devices, such as portable memories, disk drives and the like.
[00108] Similarly, a network interface 730 is coupled to the processor to enable network based input, output and control. For example, the computing device may be configured to retrieve test images from an external data source on the Internet.
[00109] Finally, the computing device 700 includes a robotic arm 735, for picking fruit or destroying weeds, based upon the classification.
[00110] The skilled addressee will readily appreciate that the robotic arm may incorporate (or be replaced by) a spray nozzle (in case of herbicide based weed eradication), a harvesting tool (in case of a harvesting robot), mechanical implements (in case of mechanical weed destruction), or any other suitable tool or implement.
[00111] Similarly, the skilled addressee will readily appreciate that the computing device 700 may readily be adapted to suit other purposes, such as medical imaging, where blood cells, tissue and/or other material is classified, pedestrian recognition in the context of vehicle safety, classification of fruit or vegetables, or any other suitable classification or identification task.
[00112] EXAMPLE 1 - WEED CLASSIFICATION
[00113] A weed classification model generated according to the method 400 of Figure 4 was assessed for three weeds: volunteer cotton, sow thistle and wild oats. These weeds are herbicide resistant and of importance for Queensland.
[00114] The training images were taken in a field, and validation images were taken in a similar field, at a similar time. The evaluation image set is completely separate data that was captured four months later in a similar field.
[00115] The classification was based on detecting a region of interest (RoI) and then determining a class for the RoI.
[00116] TABLE I - CLASSIFICATION ACCURACY FOR THREE PLANTS USING 3,280 IMAGES WITH A SPLIT OF 2105:637:541 COTTON:SOW:WILD FOR VALIDATION AND 118 IMAGES FOR EVALUATION WITH A SPLIT OF 69:45:4.
[Table I appears as an image in the original publication and is not reproduced here.]
[00117] Adapted-IV3 corresponds to a specialised deep model according to the method 500 of Figure 5, obtained by adapting a generic model. WeedNet-v1 corresponds to the method 500 of Figure 5, where the specialised deep model is trained from scratch (i.e. without adapting a generic model). AgNet corresponds to the method 400 of Figure 4, where the specialised deep model is Adapted-IV3. For comparison purposes, LBP RF relates to local binary patterns (Ojala et al., 2002) with a random forest classifier.
[00118] The LBP RF system achieves an accuracy of 87.7% for the validation set and 83.9% for the test set. This is lower than the worst performing deep learnt feature, i.e. where the specialised deep model is trained from scratch, which achieves an accuracy of 97.1% for the validation set and 86.4% for the test set. Training a compressed model (AgNet), corresponding to the method 400 of Figure 4, leads to a more accurate model than training from scratch, as shown by the results for AgNet vs WeedNet-v1, where AgNet achieves an accuracy of 97.9% for the validation set and 89.8% for the evaluation set.
[00119] In short, deep networks (i.e. Adapted-IV3 and WeedNet-v1) perform the best; however, training from scratch (WeedNet-v1) does not perform as well as fine-tuning a well trained model (Adapted-IV3). Training a compressed model (AgNet) leads to a more accurate model than training from scratch, as shown by the results for AgNet and WeedNet-v1.
[00120] Finally, AgNet is about an order of magnitude faster than Adapted-IV3, thus providing a trade-off of speed and complexity against accuracy.
[00121] EXAMPLE 2 - WEED SEGMENTATION
[00122] A weed segmentation model was generated according to the method 400 of Figure 4 and was assessed on the Crop/Weed Field Image Dataset (CWFID) (Haug and Ostermann, 2014).
[00123] In this case, each pixel was classified as either crop or weed by extracting a region of interest (RoI) around the pixel.
[00124] TABLE II - ACCURACY RATES IN CLASSIFICATION OF WEEDS IN CWFID DATA SET
[Table II appears as an image in the original publication and is not reproduced here.]
[00125] Adapted-IV3 corresponds to the specialised deep model according to the method 500 of Figure 5. As can be seen, deep networks (i.e. Adapted-IV3, AgNet, Mix-AgNet, MiniInception and Mix-MiniInception) outperform handcrafted methods (LBP and Shape+Stat.). MiniInception and AgNet correspond to the method 400 of Figure 4 using different network architectures (MiniInception and AgNet). Mix-MiniInception and Mix-AgNet correspond to a combination of N compressed models corresponding to the system 300 of Figure 3. Using a mixture of compressed models (Mix-AgNet or Mix-MiniInception) rather than a single compressed model (AgNet or MiniInception) improves the accuracy of the system, but also increases the complexity (number of parameters) and decreases the speed of the system, as can be seen in Table II.
[00126] EXAMPLE 3 - CROP SEGMENTATION
[00127] A crop segmentation model was generated according to the method 400 of Figure 4 and was assessed in relation to capsicum (sweet pepper) segmentation on the Sweet Pepper Dataset (McCool et al., 2016). In this case, each pixel was classified as being capsicum or not by extracting a region of interest (RoI) around the pixel.
[00128] TABLE III - ACCURACY RATES IN CLASSIFICATION OF CAPSICUM USING AUC METRIC AS USED IN (McCool et al., 2016).
[Table III appears as an image in the original publication and is not reproduced here.]
[00129] As can be seen, deep networks (i.e. Adapted-IV3, MiniInception and Mix-MiniInception) outperform the two baseline methods (McCool et al., 2016), which comprise a combination of three visual features and colour (Baseline-Fusion), and the single best visual feature (Baseline-LBP). MiniInception and AgNet correspond to the method 400 of Figure 4, and Adapted-IV3 corresponds to the specialised deep model according to the method 500 of Figure 5. Mix-MiniInception corresponds to a combination of N compressed models (MiniInception) corresponding to the system 300 of Figure 3.
[00130] The methods and systems provide a clear trade-off of the number of parameters and speed against classification accuracy.
[00131] By adapting a deep network rather than training it from scratch when limited data is available, overall accuracy can be improved.
[00132] In addition to providing improved accuracy, embodiments of the present invention provide improvements in computational efficiency (speed) by more than an order of magnitude.
[00133] In the present specification and claims (if any), the word 'comprising' and its derivatives, including 'comprises' and 'comprise', include each of the stated integers but do not exclude the inclusion of one or more further integers.
[00134] Reference throughout this specification to 'one embodiment' or 'an embodiment' means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases 'in one embodiment' or 'in an embodiment' in various places throughout this specification are not necessarily all referring to the same embodiment.
Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more combinations.
[00135] In compliance with the statute, the invention has been described in language more or less specific to structural or methodical features. It is to be understood that the invention is not limited to specific features shown or described since the means herein described comprises preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims (if any) appropriately interpreted by those skilled in the art.
CITATION LIST
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A, Going Deeper with Convolutions, IEEE Conference on Computer Vision and Pattern Recognition, 2015.
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg A, Fei-Fei L, ImageNet Large Scale Visual Recognition Challenge, International Journal of Computer Vision, 2015.
Haug S, Ostermann J, A crop/weed field image dataset for the evaluation of computer vision based precision agriculture tasks, ECCV Workshop on Computer Vision Problems in Plant Phenotyping, 2014.
McCool C, Sa I, Dayoub F, Lehnert C, Perez T, Upcroft B, Visual detection of occluded crop: For automated harvesting, IEEE International Conference on Robotics and Automation, 2016.
Ojala T, Pietikainen M, Maenpaa T, Multi-Resolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns, IEEE Transactions on Pattern Analysis and Machine Intelligence 24(7): 971-987, 2002.

Claims

1. A method for generating a compressed machine learning model including:
providing a first set of training images to a first machine learning model to generate a first set of outputs;
providing a second set of training images to a second machine learning model to generate a second set of outputs, wherein the second set of training images corresponds to the first set of training images at a lower resolution; and
updating the second machine learning model according to a difference between the first and second sets of outputs to generate the compressed machine learning model.
2. The method of claim 1, wherein the first and second machine learning models comprise convolutional neural networks.
3. The method of claim 2, wherein the first and second machine learning models comprise deep convolutional neural networks.
4. The method of claim 1, wherein the second machine learning model is updated to mimic, at least in part, the first machine learning model according to the first and second sets of outputs.
5. The method of claim 1, wherein the compressed machine learning model is configured to classify an image.
6. The method of claim 5, wherein the compressed machine learning model is configured to identify objects in an image.
7. The method of claim 6, wherein the objects comprise one or more of a fruit, a plant, and/or a weed.
8. The method of claim 1, wherein the compressed machine learning model includes fewer layers and/or parameters than the first machine learning model.
9. The method of claim 1, wherein the first and second sets of outputs include logit outputs.
10. The method of claim 1, wherein either: the second set of training images is down-sampled from the first set of training images; or the first set of training images is up-sampled from the second set of training images.
11. The method of claim 1, wherein the method includes generating the first machine learning model, at least in part using the first set of training images.
12. The method of claim 1, wherein the first machine learning model comprises a specialised deep model that is generated by adapting a non-specialised deep model and using the first set of training images.
13. The method of claim 12, wherein the specialised deep model is generated by generating a new output layer, having outputs corresponding to classes of the training images, replacing a final decision layer of the non-specialised deep model with the new output layer, and using the first set of training images to train at least the replaced final decision layer of the non-specialised deep model.
14. The method of claim 13, wherein the new output layer is randomly initialised.
15. The method of claim 12, wherein one or more layers of the specialised deep model are fixed during training.
16. The method of claim 12, wherein weights are applied to parameters of the specialised deep model defining how much each of the parameters is influenced during training.
17. The method of claim 12, wherein the non-specialised deep model has been trained with non-specific or generic image data.
18. The method of claim 17, wherein the non-specific image data does not include data relating to classifiers of the training images.
19. The method of claim 12, wherein the non-specialised deep model has been trained using more images than the number of training images in the first set of training images and the second set of training images.
20. The method of claim 19, wherein the non-specialised deep model has been trained using at least 10 times more images than the number of training images in the first set of training images and the second set of training images.
21. The method of claim 1, further including receiving a further image, and classifying the further image according to the compressed model.
22. The method of claim 21, wherein the further image is generated by applying a sliding window to a larger image.
23. The method of claim 1, wherein the compressed model includes a plurality of classifiers.
24. The method of claim 1, wherein the compressed model is used in a computer vision task of a robot.
25. The method of claim 1, wherein the compressed model is configured for weed classification or crop segmentation.
26. The method of claim 1, further comprising generating a plurality of models, and using the plurality of models together to classify an image.
27. The method of claim 26, wherein the plurality of models are generated using the same method, but with the training images provided in a different order.
28. A system for generating a compressed machine learning model including:
a processor,
a memory, coupled to the processor, the memory including instruction code executable by the processor for:
providing a first set of training images to a first machine learning model to generate a first set of outputs;
providing a second set of training images to a second machine learning model to generate a second set of outputs, wherein the second set of training images corresponds to the first set of training images at a lower resolution; and
updating the second machine learning model according to a difference between the first and second sets of outputs to generate the compressed machine learning model.
29. The system of claim 28, further including a camera, for capturing an image, wherein the instruction code is further executable to classify the captured image according to the compressed machine learning model.
30. The system of claim 28, further including one or more actuators, configured to act based upon classification of the captured image.
31. The system of claim 30, wherein the actuators form part of, or are associated with, a robotic arm, a spray nozzle, a harvesting tool or a mechanical implement.
32. A system for classifying images using a model generated according to the method of claim 1 or the system of claim 28.
PCT/AU2017/051388 2016-12-23 2017-12-14 Deep learning systems and methods for use in computer vision WO2018112514A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2016905352A AU2016905352A0 (en) 2016-12-23 Deep learning systems and methods for use in computer vision
AU2016905352 2016-12-23

Publications (1)

Publication Number Publication Date
WO2018112514A1 true WO2018112514A1 (en) 2018-06-28

Family

ID=62624306

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2017/051388 WO2018112514A1 (en) 2016-12-23 2017-12-14 Deep learning systems and methods for use in computer vision

Country Status (1)

Country Link
WO (1) WO2018112514A1 (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4175860A (en) * 1977-05-31 1979-11-27 Rush-Presbyterian-St. Luke's Medical Center Dual resolution method and apparatus for use in automated classification of pap smear and other samples
US5331550A (en) * 1991-03-05 1994-07-19 E. I. Du Pont De Nemours And Company Application of neural networks as an aid in medical diagnosis and general anomaly detection
WO2000036524A1 (en) * 1998-12-16 2000-06-22 Sarnoff Corporation Method and apparatus for training a neural network to detect objects in an image
US20030233335A1 (en) * 2002-06-17 2003-12-18 Mims Aj Student neural network
US9129190B1 (en) * 2013-12-04 2015-09-08 Google Inc. Identifying objects in images
US20170200063A1 (en) * 2016-01-13 2017-07-13 Ford Global Technologies, Llc Low- and high-fidelity classifiers applied to road-scene images

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MCCOOL, C. ET AL.: "Mixtures of Lightweight Deep Convolutional Neural Networks: Applied to Agricultural Robotics", IEEE ROBOTICS AND AUTOMATION LETTERS, vol. 2, no. 3, July 2017 (2017-07-01), pages 1344 - 1351, XP055496006, [retrieved on 20170209] *
ZANGENEH, E. ET AL.: "Low Resolution Face Recognition Using a Two-Branch Deep Convolutional Neural Network Architecture", ONLINE JOURNAL, 20 June 2017 (2017-06-20), XP080771113, Retrieved from the Internet <URL:https://arxiv.org/abs/1706.06247> [retrieved on 20180201] *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020032661A1 (en) * 2018-08-10 2020-02-13 Samsung Electronics Co., Ltd. Electronic apparatus, method for controlling thereof, and method for controlling a server
US11388465B2 (en) 2018-08-10 2022-07-12 Samsung Electronics Co., Ltd. Electronic apparatus and method for upscaling a down-scaled image by selecting an improved filter set for an artificial intelligence model
US11825033B2 (en) 2018-08-10 2023-11-21 Samsung Electronics Co., Ltd. Apparatus and method with artificial intelligence for scaling image data
CN110334642A (en) * 2019-07-01 2019-10-15 河南牧业经济学院 Machine vision recognition method and system for pig behaviour
EP3809366A1 (en) 2019-10-15 2021-04-21 Aisapack Holding SA Manufacturing method
WO2021074708A1 (en) 2019-10-15 2021-04-22 Aisapack Holding Sa Manufacturing method
US20220207275A1 (en) * 2020-12-30 2022-06-30 Zoox, Inc. Multi-resolution top-down prediction
US11847831B2 (en) * 2020-12-30 2023-12-19 Zoox, Inc. Multi-resolution top-down prediction
WO2023086585A1 (en) * 2021-11-12 2023-05-19 Covera Health Re-weighted self-influence for labeling noise removal in medical imaging data

Similar Documents

Publication Publication Date Title
Bosilj et al. Transfer learning between crop types for semantic segmentation of crops versus weeds in precision agriculture
WO2018112514A1 (en) Deep learning systems and methods for use in computer vision
Häni et al. A comparative study of fruit detection and counting methods for yield mapping in apple orchards
Azizah et al. Deep learning implementation using convolutional neural network in mangosteen surface defect detection
CN110046631B (en) System and method for automatically inferring changes in spatiotemporal images
US20190164047A1 (en) Object recognition using a convolutional neural network trained by principal component analysis and repeated spectral clustering
CN110892409B (en) Method and device for analyzing images
Hall et al. Towards unsupervised weed scouting for agricultural robotics
Travieso et al. Pollen classification based on contour features
Omer et al. Plant disease diagnosing based on deep learning techniques
Škrabánek et al. Detection of grapes in natural environment using support vector machine classifier
Alashhab et al. Precise ship location with cnn filter selection from optical aerial images
Almalky et al. An Efficient Deep Learning Technique for Detecting and Classifying the Growth of Weeds on Fields
Kundur et al. Insect pest image detection and classification using deep learning
Dolezel et al. Detection of grapes in natural environment using feedforward neural network as a classifier
Popescu et al. Comparative Study of Neural Networks Used in Halyomorpha Halys Detection
Solberg Deep neural networks for object detection in agricultural robotics
Chaudhary et al. An efficient approach for automated system to identify the rice crop disease using intensity level based multi-fractal dimension and twin support vector machine
Chen et al. Inductive Clustering: automating low-level segmentation in high resolution images
Sahu et al. CNN based disease detection in Apple Leaf via transfer learning
Shobana et al. Fruit Freshness Detecting System Using Deep Learning and Raspberry PI
Baruque et al. WeVoS scale invariant map
Hi et al. Lantana Camara Flower Detection Using an Improved Lightweight Convolutional Neural Networks in YOLOv5
Balakrishnan et al. Computing WHERE-WHAT classification through FLIKM and deep learning algorithms
Altabaji et al. Identification of Banana Leaf Diseases and Detection

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17884426

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17884426

Country of ref document: EP

Kind code of ref document: A1