Blox-Net

Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset

UC Berkeley   Cornell

TL;DR: Blox-Net iteratively prompts a VLM with a physics simulator in the loop to generate designs from a text prompt using a set of blocks, then assembles them with a real robot!


Overview

Generative Design-for-Robot-Assembly (GDfRA) involves generating and physically constructing a design based on a natural language prompt (e.g., “giraffe”) and an image of available physical components. Blox-Net is a GDfRA system that leverages a vision-language model (VLM), a physics simulator, and a real robot to generate and reliably assemble designs resembling a text prompt from 3D-printed blocks with no human input. Blox-Net achieves a 99.2% block placement success rate!

Blox-Net Design Generation

Blox-Net generates designs by sequentially prompting GPT-4o to develop a design plan and place blocks in a simulator. GPT-4o then receives visual and stability feedback from the simulator and adjusts its placements. This process runs 10 times in parallel, and GPT-4o selects the best-looking design. Finally, an automated perturbation redesign step introduces tolerances and improves stability, making the design more reliably constructable.
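The loop above can be sketched in a few lines. This is an illustrative mock, not the authors' implementation: `mock_vlm_propose` and `simulate_stability` are hypothetical stand-ins for the GPT-4o calls and the physics simulator, and the feedback format is invented for the example.

```python
# Hedged sketch of Blox-Net's propose -> simulate -> refine -> select loop.
# All function names and data formats here are illustrative assumptions.

def mock_vlm_propose(prompt, feedback=None):
    """Stand-in for a GPT-4o call: returns block placements as (x, y, z).
    A real system would send the text prompt plus rendered sim images."""
    placements = [(float(i), 0.0, float(i)) for i in range(4)]
    if feedback:  # nudge placements in response to stability feedback
        placements = [(x * 0.9, y, z) for (x, y, z) in placements]
    return placements

def simulate_stability(placements):
    """Stand-in for a physics simulator: report which blocks 'fell'."""
    fallen = [i for i, (x, y, z) in enumerate(placements) if x > 3.0]
    return {"stable": not fallen, "fallen_blocks": fallen}

def design_once(prompt, n_rounds=3):
    """One design attempt: propose, simulate, refine with feedback."""
    feedback = None
    for _ in range(n_rounds):
        placements = mock_vlm_propose(prompt, feedback)
        report = simulate_stability(placements)
        if report["stable"]:
            break
        feedback = report
    return placements, report

def best_of_n(prompt, n=10):
    """Run n independent attempts; Blox-Net has the VLM pick the
    best-looking result. Here stability serves as a crude proxy."""
    candidates = [design_once(prompt) for _ in range(n)]
    stable = [c for c in candidates if c[1]["stable"]]
    return (stable or candidates)[0]

design, report = best_of_n("giraffe")
```

In the real system the selection step is itself a VLM call comparing rendered images of the candidate designs; the perturbation redesign stage would then jitter block poses in simulation to verify tolerance to placement error.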

Additional Generation Examples

Citation

If you use this work or find it helpful, please consider citing:

@misc{goldberg2024bloxnet,
    title={Blox-Net: Generative Design-for-Robot-Assembly Using VLM Supervision, Physics Simulation, and a Robot with Reset},
    author={Andrew Goldberg and Kavish Kondap and Tianshuang Qiu and Zehan Ma and Letian Fu and Justin Kerr and Huang Huang and Kaiyuan Chen and Kuan Fang and Ken Goldberg},
    year={2024},
    eprint={2409.17126},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2409.17126},
}

Credit: The design of this project page is heavily based on LERF.