A multi-modal retrieval augmented framework for user editable 3D CAD model generation

Ananthakrishnan, A.

dc.contributor.author	Ananthakrishnan, A.
dc.contributor.author	Bharathi, Anush
dc.contributor.author	Dharanivendhan, V.
dc.contributor.author	Ramanathan, M.
dc.date.accessioned	2025-07-30T08:18:52Z
dc.date.available	2025-07-30T08:18:52Z
dc.date.issued	2025
dc.description.abstract-translated	Computer-Aided Design (CAD) has revolutionized design and manufacturing by enabling precise, complex models in collaborative environments. Although designers often need models similar to existing ones with application-specific modifications, designs are typically created from scratch because retrieving existing models or generating editable ones is difficult. Parametric CAD modeling has advanced through deep generative approaches that treat CAD as a language task to generate user-editable designs, but building truly scalable multi-modal datasets and networks tailored for 3D design tasks, particularly in engineering domains, remains a significant challenge. Developing such datasets, especially those incorporating images, point clouds, user-like text and hand-drawn sketches, is difficult because these modalities demand fine-grained geometric understanding and extensive human-in-the-loop evaluation. While large foundation models like CLIP have improved cross-modal retrieval, they are primarily trained on natural images and fail to capture the geometric and structural complexities inherent to CAD data. In this paper, we propose a novel multi-modal pipeline for CAD command sequence generation using state-of-the-art Vision-Language Models (VLMs). We introduce a unique multi-modal CAD dataset comprising hand-drawn sketches, CAD command sequences, images and basic text prompts. These modalities are integrated through a Multi-modal Retrieval-Augmented Generation (MM-RAG) framework to enable user-editable CAD model retrieval and generation. Our RAG-based pipeline streamlines the CAD design process by enabling iterative, user-guided model generation based on simple sketches or text queries, providing an end-to-end pipeline that supports design workflows. The dataset and code will be made publicly available at: https://github.com/ananthu2014/cadrag.	en
dc.format	12 s.	cs
dc.format.mimetype	application/pdf
dc.identifier.doi	http://www.doi.org/10.24132/JWSCG.2025-11
dc.identifier.issn	1213-6972 (print)
dc.identifier.issn	1213-6964 (online)
dc.identifier.uri	http://hdl.handle.net/11025/62205
dc.language.iso	en	en
dc.publisher	Václav Skala - UNION Agency	cs
dc.rights	© Václav Skala - UNION Agency	en
dc.rights.access	openAccess	en
dc.subject	počítačem podporované navrhování (CAD)	cs
dc.subject	vyhledávání 3D tvarů	cs
dc.subject	multimodální datová sada	cs
dc.subject.translated	Computer-Aided Design (CAD)	en
dc.subject.translated	3D shape retrieval	en
dc.subject.translated	multi-modal dataset	en
dc.title	A multi-modal retrieval augmented framework for user editable 3D CAD model generation	en
dc.type	článek	cs
dc.type	article	en
dc.type.status	Peer-reviewed	en
dc.type.version	publishedVersion	en
local.files.count	1	*
local.files.size	2440960	*
local.has.files	yes	*

Files

Original bundle

Name: D11.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Volume 33, number 1-2 (2025)