Deep learning package to classify objects using vision language models.
Deep learning package
Item created: Dec 11, 2024 Item updated: Mar 3, 2025 Number of downloads: 531
Description
This Deep Learning Package (DLPK) acts as a bridge between ArcGIS Pro and vision language models from OpenAI and Meta. Vision language models combine natural language understanding and human-like text generation with the ability to interpret images. Packaging them as a DLPK lets them process imagery and perform zero-shot classification of objects from within ArcGIS Pro.
Use this deep learning package to leverage the power of large vision language models to perform object classification on images and rasters within ArcGIS Pro. This DLPK allows for flexibility in classifying objects, as it is not restricted to predefined classes; users can specify custom class labels at the time of running the tool. This capability opens up new avenues for analysis and interpretation of spatial data, making it easier for professionals in fields such as environmental science, urban planning, and remote sensing to extract meaningful insights from their visual datasets.
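As an illustration of how run-time class labels can drive zero-shot classification, the sketch below turns a list of user-supplied labels into a text prompt and maps the model's free-form answer back to one of those labels. It is a minimal sketch only: the function names `build_prompt` and `parse_label`, the prompt wording, and the "unknown" fallback are assumptions made for this example, not the package's actual implementation.

```python
def build_prompt(class_labels):
    """Build a zero-shot classification prompt from user-supplied labels.

    Hypothetical helper: the wording of the prompt is an assumption.
    """
    labels = [label.strip() for label in class_labels if label.strip()]
    if not labels:
        raise ValueError("at least one class label is required")
    options = ", ".join(labels)
    return (
        "Classify the object in the image. "
        f"Answer with exactly one of: {options}."
    )


def parse_label(response_text, class_labels, fallback="unknown"):
    """Map free-form model text back to one of the allowed labels.

    Hypothetical helper: returns `fallback` when no label is mentioned.
    """
    answer = response_text.strip().lower()
    # Check longer labels first so "solar panel" wins over "panel".
    for label in sorted(class_labels, key=len, reverse=True):
        if label.lower() in answer:
            return label
    return fallback


labels = ["building", "tree", "solar panel"]
print(build_prompt(labels))
print(parse_label("It looks like a Solar Panel.", labels))
```

Because the labels are plain text assembled at run time, changing the set of classes only changes the prompt; no retraining step is involved.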
Note: The OpenAI GPT models require an internet connection. When you use them, the data used for classification, including the imagery and the candidate class labels, is shared with OpenAI. The Llama Vision model runs locally and does not require an internet connection, so your data remains on your machine and is not shared externally. This model is not supported in ArcGIS Online.
Using the model
Follow the guide to use the model. Before using the Llama Vision model, ensure that the supported deep learning libraries are installed. For more details, check the Deep Learning Libraries Installer for ArcGIS. The OpenAI models do not require the deep learning libraries to be installed.
Fine-tuning the model
This model cannot be fine-tuned using ArcGIS.
Input
8-bit RGB imagery.
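A quick pre-check of the input requirement might look like the sketch below. It assumes you have already read the raster's band count and pixel type (for example, from the raster's properties in ArcGIS Pro) and that the pixel type uses the "U8" code for 8-bit unsigned data; the helper name `is_8bit_rgb` is hypothetical and not part of the package.

```python
def is_8bit_rgb(band_count, pixel_type):
    """Return True when a raster matches the 8-bit RGB input requirement.

    Hypothetical helper. Assumes `pixel_type` is a short code such as
    "U8" (8-bit unsigned) or "U16" (16-bit unsigned).
    """
    return band_count == 3 and pixel_type.strip().upper() == "U8"


print(is_8bit_rgb(3, "U8"))   # three 8-bit bands
print(is_8bit_rgb(4, "U16"))  # four 16-bit bands
```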
Output
Feature class with classification of features in the imagery.
Applicable geographies
This model is expected to work well globally.
Model architecture
The implementation uses OpenAI's vision language models or Llama Vision models.
Sample results
Here are a few sample results from the model.
Terms of Use
No special restrictions or limitations on using the item's content have been provided.
Details
Applicable: 2d
Size: 29.926 MB
ID: 6ec47aa99588404aa74246d62896dbb9
Tags
dlpk, prompt, zeroshot, zeroshotclassifier, vision, language, vision language
Credits (Attribution)
OpenAI, Esri