Towards Zero-Shot Camera Trap Image Categorization

Vyskočil, Jiří

dc.contributor.author	Vyskočil, Jiří
dc.contributor.author	Picek, Lukáš
dc.date.accessioned	2026-03-24T19:05:25Z
dc.date.available	2026-03-24T19:05:25Z
dc.date.issued	2025
dc.date.updated	2026-03-24T19:05:25Z
dc.description.abstract	This paper explores alternative approaches to the automatic categorization of camera trap images. First, we benchmark state-of-the-art classifiers using a single model for all images. Next, we evaluate methods that combine MegaDetector with one or more classifiers and Segment Anything to assess their impact on reducing location-specific overfitting. Finally, we propose and test two approaches that use large language and foundation models, such as DINOv2, BioCLIP, BLIP, and ChatGPT, in a zero-shot scenario. Evaluation on two publicly available datasets (WCT from New Zealand, CCT20 from the Southwestern US) and a private dataset (CEF from Central Europe) revealed that combining MegaDetector with two separate classifiers achieves the highest accuracy. This approach reduced the relative error of a single BEiTV2 classifier by approximately 42% on CCT20, 48% on CEF, and 75% on WCT. Moreover, with the background removed, the accuracy error in new locations is halved. The proposed zero-shot pipeline based on DINOv2 and FAISS achieved competitive results (1.0% and 4.7% smaller on CCT20 and CEF, respectively), which highlights the potential of zero-shot approaches for camera trap image categorization.	en
dc.format	17
dc.identifier.doi	10.1007/978-3-031-92387-6_3
dc.identifier.isbn	978-3-031-92386-9
dc.identifier.issn	0302-9743
dc.identifier.obd	43944169
dc.identifier.orcid	Vyskočil, Jiří 0000-0002-6443-2051
dc.identifier.orcid	Picek, Lukáš 0000-0002-6041-9722
dc.identifier.uri	http://hdl.handle.net/11025/67353
dc.language.iso	en
dc.project.ID	SS05010008
dc.publisher	Springer
dc.relation.ispartofseries	Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024
dc.subject	camera traps	en
dc.subject	classification	en
dc.subject	retrieval	en
dc.subject	BLIP	en
dc.subject	DINOv2	en
dc.subject	zero-shot	en
dc.subject	vision and language	en
dc.subject	ChatGPT	en
dc.subject	SAM	en
dc.subject	MegaDetector	en
dc.title	Towards Zero-Shot Camera Trap Image Categorization	en
dc.type	Article in proceedings (D)
dc.type	ARTICLE IN PROCEEDINGS
dc.type.status	Published Version
local.files.count	1	*
local.files.size	3339477	*
local.has.files	yes	*
local.identifier.eid	2-s2.0-105007140291

Files

Original bundle

Name: 978-3-031-92387-6_3.pdf
Size: 3.18 MB
Format: Adobe Portable Document Format

Collections

Conference papers (NTIS)