Towards Zero-Shot Camera Trap Image Categorization
| dc.contributor.author | Vyskočil, Jiří | |
| dc.contributor.author | Picek, Lukáš | |
| dc.date.accessioned | 2026-03-24T19:05:25Z | |
| dc.date.available | 2026-03-24T19:05:25Z | |
| dc.date.issued | 2025 | |
| dc.date.updated | 2026-03-24T19:05:25Z | |
| dc.description.abstract | This paper describes the search for an alternative approach to the automatic categorization of camera trap images. First, we benchmark state-of-the-art classifiers using a single model for all images. Next, we evaluate methods that combine MegaDetector with one or more classifiers and Segment Anything to assess their impact on reducing location-specific overfitting. Last, we propose and test two approaches using large language and foundation models, such as DINOv2, BioCLIP, BLIP, and ChatGPT, in a zero-shot scenario. An evaluation on two publicly available datasets (WCT from New Zealand, CCT20 from the Southwestern US) and a private dataset (CEF from Central Europe) revealed that combining MegaDetector with two separate classifiers achieves the highest accuracy. This approach reduced the relative error of a single BEiTV2 classifier by approximately 42% on CCT20, 48% on CEF, and 75% on WCT. Moreover, removing the background halves the error in new locations. The proposed zero-shot pipeline based on DINOv2 and FAISS achieved competitive results (1.0% and 4.7% smaller on CCT20 and CEF, respectively), which highlights the potential of zero-shot approaches for camera trap image categorization. | en |
| dc.format | 17 | |
| dc.identifier.doi | 10.1007/978-3-031-92387-6_3 | |
| dc.identifier.isbn | 978-3-031-92386-9 | |
| dc.identifier.issn | 0302-9743 | |
| dc.identifier.obd | 43944169 | |
| dc.identifier.orcid | Vyskočil, Jiří 0000-0002-6443-2051 | |
| dc.identifier.orcid | Picek, Lukáš 0000-0002-6041-9722 | |
| dc.identifier.uri | http://hdl.handle.net/11025/67353 | |
| dc.language.iso | en | |
| dc.project.ID | SS05010008 | |
| dc.publisher | Springer | |
| dc.relation.ispartofseries | Workshops that were held in conjunction with the 18th European Conference on Computer Vision, ECCV 2024 | |
| dc.subject | camera traps | en |
| dc.subject | classification | en |
| dc.subject | retrieval | en |
| dc.subject | BLIP | en |
| dc.subject | DINOv2 | en |
| dc.subject | zero-shot | en |
| dc.subject | vision and language | en |
| dc.subject | ChatGPT | en |
| dc.subject | SAM | en |
| dc.subject | MegaDetector | en |
| dc.title | Towards Zero-Shot Camera Trap Image Categorization | en |
| dc.type | Paper in proceedings (D) | |
| dc.type | PAPER IN PROCEEDINGS | |
| dc.type.status | Published Version | |
| local.files.count | 1 | * |
| local.files.size | 3339477 | * |
| local.has.files | yes | * |
| local.identifier.eid | 2-s2.0-105007140291 |
Files
Original bundle
- Name: 978-3-031-92387-6_3.pdf
- Size: 3.18 MB
- Format: Adobe Portable Document Format