A multi-modal retrieval augmented framework for user editable 3D CAD model generation

dc.contributor.author: Ananthakrishnan, A.
dc.contributor.author: Bharathi, Anush
dc.contributor.author: Dharanivendhan, V.
dc.contributor.author: Ramanathan, M.
dc.date.accessioned: 2025-07-30T08:18:52Z
dc.date.available: 2025-07-30T08:18:52Z
dc.date.issued: 2025
dc.description.abstract-translated: Computer-Aided Design (CAD) has revolutionized design and manufacturing by enabling precise, complex models in collaborative environments. Although similar CAD models with application-specific modifications are often required, designs are typically created from scratch because existing models are hard to retrieve and editable ones are hard to generate. Parametric CAD modeling has advanced through deep generative approaches that treat CAD as a language task to generate user-editable designs, but building truly scalable multi-modal datasets and networks tailored to 3D design tasks, particularly in engineering domains, remains a significant challenge. Developing such datasets, especially those incorporating images, point clouds, user-like text and hand-drawn sketches, is difficult because these modalities demand fine-grained geometric understanding and extensive human-in-the-loop evaluation. While large foundation models such as CLIP have improved cross-modal retrieval, they are trained primarily on natural images and fail to capture the geometric and structural complexity inherent in CAD data. In this paper, we propose a novel multi-modal pipeline for CAD command sequence generation using state-of-the-art Vision-Language Models (VLMs). We introduce a new multi-modal CAD dataset comprising hand-drawn sketches, CAD command sequences, images and basic text prompts. These modalities are integrated through a Multi-modal Retrieval-Augmented Generation (MM-RAG) framework to enable user-editable CAD model retrieval and generation. The resulting end-to-end RAG-based pipeline streamlines the CAD design process by supporting iterative, user-guided model generation from simple sketches or text queries. The dataset and code will be made publicly available at: https://github.com/ananthu2014/cadrag
dc.format: 12 pages
dc.format.mimetype: application/pdf
dc.identifier.doi: http://www.doi.org/10.24132/JWSCG.2025-11
dc.identifier.issn: 1213-6972 (print)
dc.identifier.issn: 1213-6964 (online)
dc.identifier.uri: http://hdl.handle.net/11025/62205
dc.language.iso: en
dc.publisher: Václav Skala - UNION Agency
dc.rights: © Václav Skala - UNION Agency
dc.rights.access: openAccess
dc.subject (Czech, translated): computer-aided design (CAD)
dc.subject (Czech, translated): 3D shape retrieval
dc.subject (Czech, translated): multi-modal dataset
dc.subject.translated: Computer-Aided Design (CAD)
dc.subject.translated: 3D shape retrieval
dc.subject.translated: multi-modal dataset
dc.title: A multi-modal retrieval augmented framework for user editable 3D CAD model generation
dc.type (Czech, translated): article
dc.type: article
dc.type.status: Peer-reviewed
dc.type.version: publishedVersion
local.files.count: 1
local.files.size: 2440960 bytes
local.has.files: yes

Files

Original bundle
Name: D11.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed to upon submission