Meta has recently unveiled an artificial intelligence model called the Segment Anything Model (SAM), a program that can quickly identify and cut out specific objects within an image or video. Alongside SAM, Meta has also released the Segment Anything 1-Billion (SA-1B) dataset, which it claims is the largest segmentation dataset ever made.
Meta made the dataset publicly available to reduce the need for “task-specific modeling expertise, training compute, and custom data annotation for image segmentation”. SAM is said to be built using the SA-1B dataset, which is made up of 11 million high-resolution, privacy-respecting photos and over 1 billion segmentation masks.
SAM allows users to select objects in an image with just a click or by entering a text prompt. Take a photo of wild animals, for example: typing the word “tiger” would prompt the tool to identify and draw boxes around each tiger in the photo. When there is uncertainty about which object is being segmented, SAM generates multiple valid masks, a capability that is essential for solving segmentation problems in the real world.
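To make the multiple-mask idea concrete, here is a minimal, self-contained sketch of how a caller might handle several candidate masks for one ambiguous prompt. The data structures and scores below are hypothetical stand-ins, not SAM's actual API; the real model similarly returns candidate masks with predicted quality scores, from which a caller can keep the best.

```python
# Toy stand-in for handling an ambiguous prompt: a single click could mean
# a part of an object, the whole object, or the full scene. Each candidate
# mask carries a (hypothetical) predicted quality score.

def pick_best_mask(candidates):
    """candidates: list of (mask, score) pairs; return the highest-scoring mask."""
    best_mask, _ = max(candidates, key=lambda pair: pair[1])
    return best_mask

# Three made-up 2x2 binary masks for one ambiguous click.
part  = [[1, 0], [0, 0]]   # just the clicked part
whole = [[1, 1], [0, 0]]   # the whole object
scene = [[1, 1], [1, 1]]   # everything around it
candidates = [(part, 0.71), (whole, 0.93), (scene, 0.55)]

print(pick_best_mask(candidates) is whole)  # the whole-object mask scores highest
```

Returning several plausible masks instead of forcing one answer lets downstream tools or users resolve the ambiguity themselves.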
The model relies on a vision transformer network, an architecture well suited to finding connections between pieces of sequential data, such as words in a phrase or objects in a photo. Meta also revealed that the model can segment an object in under 50 milliseconds after receiving a prompt.
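The mechanism that lets a transformer relate pieces of a sequence to one another is attention. The following is a minimal pure-Python sketch of scaled dot-product attention on tiny toy vectors; real vision transformers apply the same idea, at scale, to image patches rather than these hand-made lists.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention: each query produces a weighted
    mix of the values, weighted by how well it matches each key."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# A query that closely matches the first key attends almost entirely
# to the first value.
result = attention(queries=[[10.0, 0.0]],
                   keys=[[10.0, 0.0], [0.0, 10.0]],
                   values=[[1.0, 0.0], [0.0, 1.0]])
print(result)  # close to [[1.0, 0.0]]
```

This matching-and-mixing step is what lets the network discover which elements of its input (words, or image patches) are related to each other.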
Today we're releasing the Segment Anything Model (SAM) — a step toward the first foundation model for image segmentation.
— Meta AI (@MetaAI) April 5, 2023
Previous approaches to segmentation fell into two broad groups. The first, interactive segmentation, required a person to guide the process by repeatedly refining a mask. The second, automatic segmentation, required a large number of manually annotated objects, along with significant computing resources and technical expertise to train the segmentation model.
SAM is a single model that can carry out both of these segmentation techniques with ease. This removes the need for users to gather their own segmentation data and customize a model, saving them time and effort. The model's promptable interface also allows people to access and use it in flexible ways.
In addition, Meta shared that it used SAM to interactively annotate photos, and the freshly annotated data was then used to update SAM in turn. However, the developers discovered that relying on image annotation alone wasn't sufficient to produce a dataset of 1 billion masks. This was one of the primary reasons Meta built the SA-1B dataset using a data engine with three main stages.
As mentioned above, the model assists annotators in the first stage. The second stage combines automatic and assisted annotation to increase the diversity of the gathered masks. The third and final stage is fully automatic mask creation, which allowed the dataset to grow to more than 1.1 billion segmentation masks.
The Segment Anything Model (SAM) by Meta AI is a step toward the first foundation model for image segmentation. SAM is capable of one-click segmentation of any object from photos or videos + zero-shot transfer to other segmentation tasks.
— Meta AI (@MetaAI) April 11, 2023
SAM – Additional uses
According to Meta, SAM has a wide range of applications and can be used right out of the box in new image domains, such as underwater photography or cell microscopy, without any extra training (zero-shot transfer). The company further added that the model could be used to power applications in numerous domains.
SAM also has use cases in other fields that require identifying and segmenting objects in images. For instance, it could help AI systems gain a clearer understanding of both the visual and text content of a webpage. Meta hopes that SAM will also prove useful in the VR domain, for example by letting a user select an object based on their gaze and then lift it into 3D. For content creators, the model can enhance creative applications by providing more options, such as extracting image sections for collages or video editing. Meta also stated that the model could benefit scientific studies of natural occurrences on Earth or even in space.
Photo credit: The feature image has been provided by Meta for press usage.