Image data requirements

This page describes the data requirements for training models on image data.

Data Requirements

Training data

AI Builder expects image classification data to be JPG or PNG files organized in folders that correspond to the categories of the classification. To load images into Model Builder, provide the path to a single top-level directory:

  • This top-level directory contains one subfolder for each of the categories to predict.

  • Each subfolder contains the image files belonging to its category.
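As a quick sanity check before loading a dataset, the folder layout described above can be inspected with a short script. This is a minimal sketch, not part of the product: the function name `count_images_per_category` and the accepted extension set are assumptions made for illustration.

```python
from pathlib import Path

# Extensions treated as JPG or PNG files; the exact set is an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def count_images_per_category(top_level_dir):
    """Return {category_name: image_count}, one entry per subfolder.

    Each subfolder of top_level_dir is treated as one category, and
    only JPG/PNG files directly inside it are counted.
    """
    counts = {}
    for subfolder in sorted(Path(top_level_dir).iterdir()):
        if not subfolder.is_dir():
            continue  # files at the top level do not belong to any category
        counts[subfolder.name] = sum(
            1
            for f in subfolder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling `count_images_per_category("path/to/dataset")` (a hypothetical path) returns one entry per category folder, which makes empty or misnamed folders easy to spot.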

Best practices for image data used to train AutoML models

  • AutoML models are optimized for photographs of objects in the real world.

  • The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry and low-resolution images (such as from a security camera), your training data should be composed of blurry, low-resolution images. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training images.

  • Robot AI models generally can't predict labels that humans can't assign. If a human can't be trained to assign a label after looking at the image for 1-2 seconds, the model likely can't be trained to do it either.

  • We recommend about 1,000 training images per label. The minimum per label is 10. In general, models with multiple labels per image need more examples per label, and the resulting scores are harder to interpret. The model works best when the most common label has at most 100 times as many images as the least common label. We recommend removing very low-frequency labels.
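The per-label guidance above (a minimum of 10 images, about 1000 recommended, and at most a 100x ratio between the most and least common labels) can be checked mechanically. The following is a minimal sketch under those stated numbers; the function name `check_label_counts` and the shape of its result are assumptions for illustration.

```python
MIN_IMAGES_PER_LABEL = 10            # minimum stated above
RECOMMENDED_IMAGES_PER_LABEL = 1000  # approximate recommendation stated above
MAX_LABEL_RATIO = 100                # most common : least common label


def check_label_counts(counts):
    """Summarize how a {label: image_count} mapping compares to the guidance."""
    below_minimum = sorted(
        label for label, n in counts.items() if n < MIN_IMAGES_PER_LABEL
    )
    below_recommended = sorted(
        label
        for label, n in counts.items()
        if MIN_IMAGES_PER_LABEL <= n < RECOMMENDED_IMAGES_PER_LABEL
    )
    most = max(counts.values(), default=0)
    least = min(counts.values(), default=0)
    # Imbalanced when the most common label has more than 100x the images
    # of the least common label.
    imbalanced = most > MAX_LABEL_RATIO * least
    return {
        "below_minimum": below_minimum,
        "below_recommended": below_recommended,
        "imbalanced": imbalanced,
    }
```

Labels reported under `below_minimum` are candidates for removal or for collecting more data, in line with the recommendation to drop very low-frequency labels.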

Object Detection

General image requirements

Supported file types

JPEG, PNG

Types of images

AutoML models are optimized for photographs of objects in the real world.

Training image file size (MB)

30 MB maximum.

Prediction image file size (MB)

1.5 MB maximum.
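The two file size limits above can be checked before upload. This is a minimal sketch: the helper names are hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption, since the page does not say how a megabyte is counted.

```python
import os

BYTES_PER_MB = 1024 * 1024  # assumption: the page does not define "MB"
MAX_SIZE_MB = {"training": 30, "prediction": 1.5}  # limits stated above


def within_size_limit(size_bytes, purpose):
    """Return True if a file of size_bytes fits the limit for the purpose."""
    if purpose not in MAX_SIZE_MB:
        raise ValueError(f"unknown purpose: {purpose!r}")
    return size_bytes <= MAX_SIZE_MB[purpose] * BYTES_PER_MB


def file_within_size_limit(path, purpose):
    """Same check, reading the size of the file at path from disk."""
    return within_size_limit(os.path.getsize(path), purpose)
```

A 2 MB image, for example, passes the training check but not the prediction check.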

Image size (pixels)

1024 pixels by 1024 pixels suggested maximum. For images much larger than 1024 pixels by 1024 pixels, some image quality may be lost during the image normalization process.

Label instances for training

10 annotations (instances) minimum.

Annotation requirements

For each label you must have at least 10 images, each with at least one annotation (bounding box and the label).

Label ratio (most common label to least common label)

The model works best when there are at most 100x more images for the most common label than for the least common label.

Bounding box edge length

At least 0.01 times the length of the corresponding side of the image. For example, a 1000 by 900 pixel image would require bounding boxes of at least 10 by 9 pixels. Bounding box minimum size: 8 pixels by 8 pixels. Note: The final bounding box pixel size is subject to preprocessing resizing. For more information, see Internal image preprocessing.
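The edge length rule can be worked through in code. This sketch assumes the two constraints combine as "whichever is larger" (1% of the image side, or 8 pixels); the page states both limits but not how they interact, so treat that combination and the function names as assumptions.

```python
MIN_BOX_PIXELS = 8  # absolute minimum edge length stated above


def min_box_size(image_width, image_height):
    """Return the (width, height) a bounding box must reach, in pixels.

    Each edge must be at least 1% of the matching image side.
    Assumption: the 8-pixel minimum acts as a floor on that value.
    """
    return (
        max(image_width / 100, MIN_BOX_PIXELS),
        max(image_height / 100, MIN_BOX_PIXELS),
    )


def box_is_large_enough(box_width, box_height, image_width, image_height):
    """Check one bounding box against the minimum for its image."""
    min_w, min_h = min_box_size(image_width, image_height)
    return box_width >= min_w and box_height >= min_h
```

For the 1000 by 900 pixel image in the example, `min_box_size(1000, 900)` gives 10 by 9 pixels. For a 500 by 400 pixel image, 1% of each side falls below 8 pixels, so under the assumption above the 8 by 8 floor applies.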

Training data and dataset requirements

The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry and low-resolution images (such as from a security camera), your training data should be composed of blurry, low-resolution images. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training images.

Internal image preprocessing

After images are imported, Robot AI preprocesses the data. The preprocessed images are the actual data used to train the model. Preprocessing (resizing) occurs only when the image's smaller side is greater than 1024 pixels. In that case, the smaller side is scaled down to 1024 pixels, and the larger side and any specified bounding boxes are scaled down by the same factor. Consequently, any scaled-down annotations (bounding boxes and labels) that end up smaller than 8 pixels by 8 pixels are removed. Images whose smaller side is less than or equal to 1024 pixels are not subject to preprocessing resizing.
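The resizing rule can be illustrated with a small simulation. This is a sketch of the described behavior, not the service's actual implementation: the function name, the `(label, x, y, width, height)` box format, the rounding of the new image size, and the choice to drop a box when either scaled edge falls below 8 pixels are all assumptions.

```python
MAX_SMALLER_SIDE = 1024  # resizing threshold stated above
MIN_BOX_PIXELS = 8       # scaled boxes smaller than this are removed


def simulate_preprocessing(width, height, boxes):
    """Return (new_width, new_height, kept_boxes) after the described resize.

    boxes is a list of (label, x, y, box_width, box_height) in pixels.
    """
    smaller = min(width, height)
    if smaller <= MAX_SMALLER_SIDE:
        # Not subject to preprocessing resizing: nothing changes.
        return width, height, list(boxes)

    scale = MAX_SMALLER_SIDE / smaller  # same factor for both sides and boxes
    kept = []
    for label, x, y, box_w, box_h in boxes:
        scaled_w, scaled_h = box_w * scale, box_h * scale
        # Assumption: a box is removed if either scaled edge is below 8 px.
        if scaled_w < MIN_BOX_PIXELS or scaled_h < MIN_BOX_PIXELS:
            continue
        kept.append((label, x * scale, y * scale, scaled_w, scaled_h))
    return round(width * scale), round(height * scale), kept
```

For a 4096 by 2048 pixel image the scale factor is 0.5, so a 20 by 20 pixel box survives as 10 by 10 pixels while a 12 by 12 pixel box shrinks to 6 by 6 pixels and is dropped. Running this kind of check on annotations for large images shows ahead of time which labels would lose instances.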
