Generative Design-for-Robot-Assembly (GDfRA) involves generating and physically constructing a design from a natural language prompt (e.g., “giraffe”) and an image of the available physical components. Blox-Net is a GDfRA system that combines a Vision-Language Model (VLM), a physics simulator, and a real robot to generate designs resembling a text prompt and reliably assemble them from 3D-printed blocks with no human input. Blox-Net achieves a 99.2% block placement success rate!
Blox-Net generates designs by sequentially prompting GPT-4o to develop a design plan and place blocks in a simulator. GPT-4o then receives visual and stability feedback from the simulator and adjusts the design. This process runs 10 times in parallel, and GPT-4o selects the best-looking design. Finally, an automated perturbation-based redesign process introduces tolerances and improves stability, making the design easier for the robot to construct.
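The generate, refine, and select loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Design`, `refine`, `generate_and_select`, the `(block_type, x, y, z)` block encoding, and the `propose`/`simulate`/`rate` callables are all hypothetical stand-ins for the GPT-4o calls and the physics simulator. Preferring stable candidates during selection is also an assumption, and the perturbation redesign step is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical block encoding: (block_type, x, y, z) in the simulator frame.
Block = Tuple[str, float, float, float]


@dataclass
class Design:
    blocks: List[Block]
    stable: bool = False


# Hypothetical stand-ins for a VLM call (e.g. GPT-4o) and a physics-simulator check.
Propose = Callable[[str, Optional[str]], Design]  # (prompt, feedback) -> new design
Simulate = Callable[[Design], Tuple[bool, str]]   # design -> (is_stable, feedback text)
Rate = Callable[[str, Design], float]             # (prompt, design) -> how well it matches


def refine(prompt: str, propose: Propose, simulate: Simulate, max_rounds: int = 3) -> Design:
    """Propose a design, then revise it with simulator feedback until stable or out of rounds."""
    design = propose(prompt, None)
    for round_idx in range(max_rounds):
        design.stable, feedback = simulate(design)
        if design.stable or round_idx == max_rounds - 1:
            break
        design = propose(prompt, feedback)  # the VLM adjusts the design using the feedback
    return design


def generate_and_select(prompt: str, propose: Propose, simulate: Simulate,
                        rate: Rate, n_candidates: int = 10) -> Design:
    """Run n independent refine loops in parallel and keep the highest-rated candidate."""
    with ThreadPoolExecutor(max_workers=n_candidates) as pool:
        candidates = list(pool.map(lambda _: refine(prompt, propose, simulate),
                                   range(n_candidates)))
    # Assumption: prefer stable candidates, falling back to all of them if none are stable.
    shortlist = [c for c in candidates if c.stable] or candidates
    return max(shortlist, key=lambda c: rate(prompt, c))


if __name__ == "__main__":
    # Toy stubs: the first proposal is one block; any feedback adds a supporting block.
    def toy_propose(prompt: str, feedback: Optional[str]) -> Design:
        blocks = [("cube", 0.0, 0.0, 0.0)]
        if feedback is not None:
            blocks.append(("cube", 0.0, 0.0, 1.0))
        return Design(blocks)

    def toy_simulate(design: Design) -> Tuple[bool, str]:
        ok = len(design.blocks) >= 2
        return ok, "stable" if ok else "unstable: add support"

    best = generate_and_select("giraffe", toy_propose, toy_simulate,
                               lambda p, d: float(len(d.blocks)))
    print(len(best.blocks), best.stable)  # 2 True
```

Running the candidates in parallel mirrors the 10 parallel runs described above; in a real system each `propose` and `rate` call would be a network request to the VLM, so a thread pool is a reasonable choice for I/O-bound work.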
If you use this work or find it helpful, please consider citing (BibTeX):
@misc{goldberg2024bloxnet,
  title={Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset},
  author={Andrew Goldberg and Kavish Kondap and Tianshuang Qiu and Zehan Ma and Letian Fu and Justin Kerr and Huang Huang and Kaiyuan Chen and Kuan Fang and Ken Goldberg},
  year={2024},
  eprint={2409.17126},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2409.17126},
}
Credit: The design of this project page is heavily based on LERF.