We present a real-time visual-inertial localization approach that can be integrated directly into a wearable immersive system for simulation and training. Whereas CAVE systems typically require a complex and expensive set-up, our approach relies on visual and inertial information provided by a consumer monocular camera and an Inertial Measurement Unit (IMU), both embedded in a wearable stereoscopic head-mounted display (HMD). Six-degree-of-freedom (6-DOF) localization is achieved through image registration against a 3D map of descriptors of the training room, combined with robust tracking of visual features. We propose a novel, efficient, and robust pipeline based on state-of-the-art image-based localization and sensor fusion approaches, which uses robust orientation information from the IMU to cope with fast camera motion and to limit motion jitter. The proposed system runs at 30 fps on a standard PC and requires very limited set-up for its intended application.