Create, annotate, review a Dataset
In this tutorial we will go through the creation of a training-ready dataset from raw images.
Here are all the steps that you will accomplish:
Upload your images to the Datalake
Create a new dataset version with tags
Set the labels for your dataset
Annotate your images
Review your annotations
This journey starts in the Datalake
Click on the green Upload data button.
You should see a modal with an input field and a button. In the input field, enter some tags so that you can search for your images later, then click on Browse files to open your local disk and select some images.
Now click on Upload and go grab a cup of coffee while your files are uploading to your Datalake.
At the top of the page, you will see a text input that allows you to search through your Datalake using our Data Query Language (DQL).
For example, if you want to find the images we just uploaded, you can search for the tag 'new_tag' that we applied during upload.
We can see that it shows us only the 21 images we just uploaded!
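Conceptually, a tag search keeps only the assets that carry the requested tag. The sketch below illustrates that idea in plain Python. It is not DQL and does not call the platform's API: the `Asset` class, the `filter_by_tag` helper, and the file names are hypothetical, invented for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for an image stored in the Datalake."""
    filename: str
    tags: set[str] = field(default_factory=set)


def filter_by_tag(assets: list[Asset], tag: str) -> list[Asset]:
    """Return only the assets that carry the given tag, in their original order."""
    return [asset for asset in assets if tag in asset.tags]


# Made-up assets: two were uploaded with 'new_tag', one was not.
assets = [
    Asset("cat_01.jpg", {"new_tag"}),
    Asset("old_scan.jpg", {"archive"}),
    Asset("bird_07.jpg", {"new_tag", "outdoor"}),
]

selected = filter_by_tag(assets, "new_tag")
print([asset.filename for asset in selected])
```

Only the two assets tagged 'new_tag' remain in `selected`, which mirrors what the filtered Datalake view shows.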
Now that your images are filtered with your tag, you can click on the Select All button to select them.
You now have access to several options.
Now that we have selected some assets, we want to create a dataset that contains only them. To do this, click on the green Create dataset button.
A new modal should appear that allows you to choose a name for your dataset and also write a short description.
When you have entered the desired information, please click on the green Create button.
You should see a green message at the top of the screen saying that your dataset has been created successfully.
You are still in your Datalake at this point. Click on the 'All Datasets' link in the sidebar to see your brand new dataset.
In your dataset list, click on the card corresponding to your new dataset.
Now that we have created the dataset, let's get to the next step.
Before annotating, we have to choose what this dataset will be used for and set up the labels. To do this, click on the Settings tab, then click on Labels.
You are now on the page to configure your labels. First, you need to select the type of dataset you want.
This choice restricts the tools available for annotation. For example, if you select Object detection, you will be able to annotate with bounding boxes only.
Let's choose Object detection as the type of our dataset.
You can now enter your labels. Here we will create three labels named cat, bird, and person.
Click on Create Labels to finish the setup. You will be redirected to your dataset and you should now see the labels you just created displayed in the upper-middle card.
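As a rough mental model, the label setup boils down to a dataset type plus a fixed list of label names, and every annotation made afterwards has to use one of those names. The sketch below captures that in plain Python. The `LabelConfig` class and its `check` method are hypothetical names chosen for illustration, not part of the platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelConfig:
    """Hypothetical model of a dataset's label settings."""
    dataset_type: str
    labels: tuple[str, ...]

    def check(self, label: str) -> str:
        """Return the label unchanged if it is configured, otherwise raise ValueError."""
        if label not in self.labels:
            raise ValueError(f"unknown label: {label!r}")
        return label


# The type and the three labels chosen in this tutorial.
config = LabelConfig("Object detection", ("cat", "bird", "person"))

print(config.check("cat"))
```

Calling `config.check("dog")` raises a `ValueError`, because "dog" is not one of the three labels we created.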
Now that you have set up labels for this dataset, you should see green buttons in the last column of the table listing your images.
Click on Annotate in the first row to start annotating your dataset.
To start drawing bounding boxes, click on the desired label in the upper-right of the interface.
Now click and drag: you will see your bounding box appear on the image. Release the left mouse button to finish the shape.
Congratulations! You've just annotated your first object 🎉
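If you are curious how a click-and-drag gesture can become a box, here is a minimal sketch. It assumes pixel coordinates with the origin at the top-left of the image and a box stored as `(x, y, width, height)`. Both the storage convention and the `box_from_drag` helper are assumptions made for illustration, not a description of how the annotation interface works internally.

```python
def box_from_drag(press: tuple[int, int], release: tuple[int, int]) -> tuple[int, int, int, int]:
    """Turn the press and release points of a drag into (x, y, width, height).

    The top-left corner is the minimum of each coordinate, so the result
    is the same whichever direction the mouse was dragged in.
    """
    (x0, y0), (x1, y1) = press, release
    x, y = min(x0, x1), min(y0, y1)
    return x, y, abs(x1 - x0), abs(y1 - y0)


# Dragging from the top-right corner towards the bottom-left one.
print(box_from_drag((120, 80), (40, 200)))  # (40, 80, 80, 120)
```

Taking the minimum of each coordinate keeps the width and height positive even when the drag goes up or to the left.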
Repeat the operation with as many shapes as you need. Once you are finished, click on the Save button on the right of the interface to validate your annotations.
You will be automatically moved to the next image. Now that you know how to annotate one image and save your work, you can do it on as many images as you want.
When you have annotated enough images, click on Back to dataset in the upper-left of the screen.
You should now be on your dataset page. The table has changed and contains more information than earlier. Let's look at the first row.
Let's describe the new columns quickly:
Instances: The objects annotated in the image.
Annotations: The list of people who have annotated the image.
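To make the two columns concrete, the sketch below derives them from a list of boxes drawn on a single image. The record layout, the annotator names, and the `summarize` helper are hypothetical and only serve the illustration.

```python
from collections import Counter

# Made-up boxes drawn on one image, each with its annotator and label.
boxes = [
    {"annotator": "alice", "label": "cat"},
    {"annotator": "alice", "label": "cat"},
    {"annotator": "bob", "label": "person"},
]


def summarize(boxes: list[dict[str, str]]) -> tuple[dict[str, int], list[str]]:
    """Return (instances per label, sorted annotator names) for one image."""
    instances = Counter(box["label"] for box in boxes)
    annotators = sorted({box["annotator"] for box in boxes})
    return dict(instances), annotators


instances, annotators = summarize(boxes)
print(instances)   # {'cat': 2, 'person': 1}
print(annotators)  # ['alice', 'bob']
```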
Now imagine that one of your colleagues has annotated all the images (yes, you can thank him). You want to check that his annotations are accurate so they don't mess with your algorithm later.
This is what we call the review process. To start it, click on your profile picture thumbnail in the Annotations column of the first row.
You are back in the interface used for annotation, but if you pay attention, you will see that some elements have changed.
On the left side, you should see something like this
This means that you are looking at an annotation that you made on 2021-06-14. If someone else had made it, you would see their profile picture and username.
Now you will want to review all the annotated images and either accept or reject the annotations, so that they are flagged and you (or your annotator colleague) know which annotations to fix.
To do this, look at the bottom-right part of the screen. You should see two buttons.
If you think the annotation is OK, you can click on accepted. Otherwise, click on rejected so you will know that you have to pay more attention to this image later.
Once you click on accepted or rejected, you don't need to click on the Save button to validate. You can move to another image or leave the interface.
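One way to picture the outcome of a review is a status attached to each annotated image, from which the list of images to fix can be derived. The sketch below is a plain-Python illustration under that assumption. The `ReviewStatus` enum, the `images_to_fix` helper, and the file names are hypothetical and do not reflect the platform's internal model.

```python
from enum import Enum


class ReviewStatus(Enum):
    """Hypothetical review states for an annotated image."""
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


def images_to_fix(reviews: dict[str, ReviewStatus]) -> list[str]:
    """Return the names of the rejected images, sorted alphabetically."""
    return sorted(
        name for name, status in reviews.items()
        if status is ReviewStatus.REJECTED
    )


# Made-up review results for three images.
reviews = {
    "img_003.jpg": ReviewStatus.REJECTED,
    "img_001.jpg": ReviewStatus.ACCEPTED,
    "img_002.jpg": ReviewStatus.PENDING,
}

print(images_to_fix(reviews))  # ['img_003.jpg']
```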
When you have finished reviewing all of your images (if you need to) you can go back to your dataset.
Tadam, you now have a pixel-perfect annotated dataset ready for training.
If you want to use it as ground truth for your next training, please follow the next tutorial, which guides you through all the steps from project creation to having a trained model.