Towards Zero-Shot Camera Trap Image Categorization

Files

978-3-031-92387-6_3.pdf (3.18 MB)

Date issued

2025

Authors

Vyskočil, Jiří

Picek, Lukáš

Publisher

Springer

Abstract

This paper explores alternative approaches to the automatic categorization of camera trap images. First, we benchmark state-of-the-art classifiers using a single model for all images. Next, we evaluate methods that combine MegaDetector with one or more classifiers and with Segment Anything to assess their impact on reducing location-specific overfitting. Last, we propose and test two approaches that use large language and foundation models, such as DINOv2, BioCLIP, BLIP, and ChatGPT, in a zero-shot scenario. Evaluation on two publicly available datasets (WCT from New Zealand, CCT20 from the Southwestern US) and a private dataset (CEF from Central Europe) revealed that combining MegaDetector with two separate classifiers achieves the highest accuracy. This approach reduced the relative error of a single BEiTV2 classifier by approximately 42% on CCT20, 48% on CEF, and 75% on WCT. Moreover, removing the background halves the error in new locations. The proposed zero-shot pipeline based on DINOv2 and FAISS achieved competitive results (1.0% and 4.7% lower on CCT20 and CEF, respectively), which highlights the potential of zero-shot approaches for camera trap image categorization.
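The abstract does not spell out how the DINOv2 and FAISS pipeline assigns a category, so the following is only a minimal sketch of one common reading: categorization by nearest-neighbor retrieval over image embeddings. Everything in it is an assumption made for illustration. The function names, the toy three-dimensional vectors, the cosine similarity, and the majority vote over `k` neighbors are hypothetical, and the brute-force search stands in for a FAISS index, just as the hand-written vectors stand in for DINOv2 embeddings.

```python
import math
from collections import Counter


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def classify_by_retrieval(query, gallery, labels, k=3):
    """Return the majority label among the k gallery embeddings closest to the query.

    Hypothetical helper: a brute-force stand-in for an index lookup.
    """
    q = l2_normalize(query)
    scored = []
    for emb, label in zip(gallery, labels):
        g = l2_normalize(emb)
        similarity = sum(a * b for a, b in zip(q, g))
        scored.append((similarity, label))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in scored[:k]]
    return Counter(top_labels).most_common(1)[0][0]


# Toy embeddings (assumed values, not real model outputs).
gallery = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.0],
]
labels = ["deer", "deer", "fox", "fox"]

prediction = classify_by_retrieval([0.8, 0.3, 0.0], gallery, labels, k=3)
print(prediction)
```

In a sketch like this no classifier is trained: adding a new category only requires adding labeled embeddings to the gallery, which is the property that makes retrieval attractive in a zero-shot setting.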

Subject(s)

camera traps, classification, retrieval, BLIP, DINOv2, zero-shot, vision and language, ChatGPT, SAM, MegaDetector

Item identifier

http://hdl.handle.net/11025/67353
https://doi.org/10.1007/978-3-031-92387-6_3

Collections

Conference papers (NTIS)
