High quality, resolution, and efficiency
TRELLIS.2 generates fully textured assets with high fidelity and efficiency, supporting output resolutions of 512 cubed, 1024 cubed, and 1536 cubed.
Native and compact structured latents
An open-source 4B-parameter image-to-3D model that generates PBR-textured assets at resolutions up to 1536 cubed, powered by native 3D VAEs, the O-Voxel representation, and 16x spatial compression.
Overview
The model uses native and compact structured latents to preserve geometry and appearance while keeping the representation small enough for large-scale generative modeling.
Conversion between textured meshes and model-ready data, for both training and inference, is designed to be rendering-free and optimization-free, which makes the path between the two more direct.
Key features
TRELLIS.2 handles open surfaces, non-manifold geometry, and enclosed interior structures. It also supports rich material attributes including base color, roughness, metallic, and opacity for physically based rendering and photorealistic relighting.
Total shape and material generation time reported on NVIDIA H100.
Higher-resolution textured asset generation with compact structured latents.
Large PBR asset output while preserving shape and material detail.
Tech innovations
Meshes are transformed into O-Voxel, a field-free sparse voxel structure that encodes precise geometry and complex appearance together.
Geometry uses flexible dual grids for arbitrary topology and sharp edges, while appearance stores PBR attributes for realistic material behavior.
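The idea of a sparse structure that keeps geometry and appearance in the same cell can be sketched as below. This is a minimal illustration, not the TRELLIS.2 data layout: the class names, field names, and the choice of a dictionary keyed by integer coordinates are all assumptions made for the example, and the dual-grid geometry is reduced to a single vertex offset per occupied cell.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

Coord = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


@dataclass
class VoxelRecord:
    """Hypothetical per-voxel payload; field names are illustrative only."""
    dual_vertex: Vec3   # vertex position inside the cell, each component in [0, 1]
    base_color: Vec3    # RGB, each component in [0, 1]
    roughness: float
    metallic: float
    opacity: float


class SparseVoxelGrid:
    """Stores records only for occupied cells, keyed by integer (x, y, z)."""

    def __init__(self, resolution: int) -> None:
        self.resolution = resolution
        self.cells: Dict[Coord, VoxelRecord] = {}

    def insert(self, coord: Coord, record: VoxelRecord) -> None:
        # Reject coordinates outside the resolution^3 grid.
        if not all(0 <= c < self.resolution for c in coord):
            raise ValueError(f"coordinate {coord} outside grid")
        self.cells[coord] = record

    def vertex_position(self, coord: Coord) -> Vec3:
        """Dual vertex in normalized [0, 1]^3 object space."""
        rec = self.cells[coord]
        return tuple((c + o) / self.resolution
                     for c, o in zip(coord, rec.dual_vertex))

    def occupancy(self) -> float:
        """Fraction of the dense grid that is actually stored."""
        return len(self.cells) / self.resolution ** 3


grid = SparseVoxelGrid(resolution=4)
grid.insert((1, 2, 3), VoxelRecord(
    dual_vertex=(0.5, 0.5, 0.5),
    base_color=(1.0, 0.0, 0.0),
    roughness=0.4, metallic=0.0, opacity=1.0,
))
print(grid.vertex_position((1, 2, 3)), grid.occupancy())
```

Because the vertex is stored freely inside each cell rather than derived from a scalar field, a layout like this places no watertightness requirement on the surface, which is the property the text attributes to the field-free design.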
SC-VAE directly compresses voxel data with sparse residual autoencoding, reaching 16x downsampling and roughly 9.6K latent tokens for 1024 cubed assets.
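To put the token count in context, the short calculation below works out the latent grid size for each output resolution. It assumes that the 16x downsampling applies along each spatial axis and that "9.6K" means roughly 9,600 tokens; both are readings of the text above rather than confirmed details, and the function names are made up for the example.

```python
def latent_side(resolution: int, downsample: int = 16) -> int:
    """Side length of the latent grid, assuming per-axis downsampling."""
    if resolution % downsample != 0:
        raise ValueError("resolution must be divisible by the downsample factor")
    return resolution // downsample


def dense_cells(resolution: int, downsample: int = 16) -> int:
    """Number of cells in the dense latent grid (side cubed)."""
    return latent_side(resolution, downsample) ** 3


for res in (512, 1024, 1536):
    print(f"{res} cubed -> latent side {latent_side(res)}, "
          f"dense cells {dense_cells(res):,}")

# Assumed reading of "roughly 9.6K latent tokens" for 1024 cubed assets.
reported_tokens = 9_600
fraction = reported_tokens / dense_cells(1024)
print(f"occupied fraction at 1024 cubed: {fraction:.1%}")
```

Under these assumptions a 1024 cubed asset maps to a 64 cubed latent grid of 262,144 cells, so about 9.6K tokens would cover under 4% of the dense grid. The saving therefore comes from sparsity as well as from downsampling.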
Converting a textured mesh to O-Voxel can run in under 10 seconds on a single CPU, while converting O-Voxel back to a textured mesh can complete in under 100 ms with CUDA acceleration.
Research snapshot
High-resolution image-to-3D generation for fully textured assets that can preserve both geometry and material detail.
O-Voxel unifies geometry and appearance in a sparse structure, avoiding the constraints of iso-surface fields.
SC-VAE makes the latent space compact enough for scalable modeling, with negligible perceptual degradation.
The project is presented for academic and research exploration of 3D generation technologies, with responsible AI considerations included in the research process.
FAQ
TRELLIS.2 is an open-source 4B-parameter image-to-3D model focused on native and compact structured latents for high-fidelity 3D generation.
O-Voxel encodes geometry and appearance together, including shape structure and PBR material attributes such as base color, metallic, roughness, and opacity (alpha).
Spatial compression shrinks the latent representation so that large, textured 3D assets can be modeled efficiently without a meaningful loss in perceptual quality.
The method is designed for arbitrary topology, including open surfaces, non-manifold geometry, and interior structures.