A multi-modal retrieval augmented framework for user editable 3D CAD model generation
| dc.contributor.author | Ananthakrishnan, A. | |
| dc.contributor.author | Bharathi, Anush | |
| dc.contributor.author | Dharanivendhan, V. | |
| dc.contributor.author | Ramanathan, M. | |
| dc.date.accessioned | 2025-07-30T08:18:52Z | |
| dc.date.available | 2025-07-30T08:18:52Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract-translated | Computer-Aided Design (CAD) has revolutionized design and manufacturing by enabling precise, complex models in collaborative environments. While similar CAD models with application-specific modifications are often required, designs are typically created from scratch due to challenges in retrieving existing models or generating editable ones. Although parametric CAD modeling has advanced through deep generative approaches that treat CAD as a language task to generate user-editable designs, building truly scalable multi-modal datasets and networks tailored for 3D design tasks, particularly in engineering domains, remains a significant challenge. Developing such datasets, especially those incorporating images, point clouds, user-like text, and hand-drawn sketches, is difficult because these modalities demand fine-grained geometric understanding and extensive human-in-the-loop evaluations. While large foundation models such as CLIP have improved cross-modal retrieval, they are primarily trained on natural images and fail to capture the geometric and structural complexities inherent to CAD data. In this paper, we propose a novel multi-modal pipeline for CAD command sequence generation using state-of-the-art Vision-Language Models (VLMs). We introduce a unique multi-modal CAD dataset comprising hand-drawn sketches, CAD command sequences, images, and basic text prompts. These modalities are integrated through a Multi-modal Retrieval-Augmented Generation (MM-RAG) framework to enable user-editable CAD model retrieval and generation. Our RAG-based pipeline streamlines the CAD design process by enabling iterative, user-guided model generation from simple sketches or text queries, providing an end-to-end pipeline that supports design workflows. The dataset and code will be made publicly available at: https://github.com/ananthu2014/cadrag. | en |
| dc.format | 12 p. | en |
| dc.format.mimetype | application/pdf | |
| dc.identifier.doi | http://www.doi.org/10.24132/JWSCG.2025-11 | |
| dc.identifier.issn | 1213-6972 (print) | |
| dc.identifier.issn | 1213-6964 (online) | |
| dc.identifier.uri | http://hdl.handle.net/11025/62205 | |
| dc.language.iso | en | en |
| dc.publisher | Václav Skala - UNION Agency | cs |
| dc.rights | © Václav Skala - UNION Agency | en |
| dc.rights.access | openAccess | en |
| dc.subject | počítačem podporované navrhování (CAD) | cs |
| dc.subject | vyhledávání 3D tvarů | cs |
| dc.subject | multimodální datová sada | cs |
| dc.subject.translated | Computer-Aided Design (CAD) | en |
| dc.subject.translated | 3D shape retrieval | en |
| dc.subject.translated | multi-modal dataset | en |
| dc.title | A multi-modal retrieval augmented framework for user editable 3D CAD model generation | en |
| dc.type | článek | cs |
| dc.type | article | en |
| dc.type.status | Peer-reviewed | en |
| dc.type.version | publishedVersion | en |
| local.files.count | 1 | * |
| local.files.size | 2440960 | * |
| local.has.files | yes | * |