Hand Tracking

For specifics on how to design with hand tracking, please also look at the Interaction Design section. For how to align input with interfaces, follow up with Interface Elements and Behaviors, as well as the Button States and Object Manipulation sections.

Let's talk for a second...

TL;DR it's a rant about how hard this input is to design for at this present moment

Look, I'm aware of the benefits of hands. People in this industry cannot get enough of hand tracking. It is one step closer to the Truly Immersive™ place that we want VR/AR to be in, and we as an industry are working super hard to make fetch happen. Hand tracking lets you touch digital stuff outside of your normal range of motion; it requires no additional hardware, helps enrich social presence, and leaves your hands free to adjust and attend to other physical objects.

I'm not saying we can't get there and we shouldn't try, I'm just saying we're not there yet.

The Challenges

There are a TON of complications that come up when designing experiences for hands. Thanks to sci-fi movies and TV shows, people have exaggerated expectations of what hands can do in VR/AR.

There are inherent technological limitations like limited tracking volume and issues with occlusion

Virtual objects don't provide the haptic feedback we rely on when interacting with real-life objects, and presence breaks incredibly easily when object interactions go wrong

Hand tracking has a high rate of accidental triggers, because a lot of people simply talk with their hands and fingertips are hard to pinpoint exactly

Sensitivity to unnatural hand movement (body cringe) is incredibly prevalent: a tracking glitch can make the virtual representation of your hand appear to bend or break in ways a real hand never would

But the main three reasons why I truly ***** and moan so much about this input?

Human Embodiment, Perception, and Ergonomics

Human embodiment and ergonomics are really let down by the lack of fidelity in current VR/AR hand tracking. Hands (and fingers) are particularly good at dexterity, which involves both precision and flexibility. Just take for example that for the past two decades we've been learning to type exclusively with our thumbs in an area quite literally no bigger than the palm of our hands. This level of small, accurate microinteraction makes up a good chunk of how we use our hands. We are not currently in a place where we can say that VR/AR will reasonably catch up with that expectation without massive assistance from hardware and machine-learning adaptations.

Human perception is actually quite a bit more difficult to pin down, specifically because the way people pick up and interact with objects is deeply personal. Individuals exhibit physical, psychological, cognitive, behavioral, relational, and situational patterns throughout their lifetime. Not only that - these behaviors could stem from familial, cultural, or societal values that have been taught as expectations in order to fit in. That's a LOT of adaptation!

There isn't currently a standard for many physical-world interactions, from turning on a shower to turning on a car, and we're going to be designing a system that has to (at least allegorically) replicate all of it!

Look, I know things will get better...

We've come SO far, and I have no doubt that in 5 years (maybe even less) I'll be eating my words. I hope I do. Designing for hands is exhausting work. Demanding people use their hands in a brand-approved but totally-relatable way to play with objects that aren't really there is also not how I'd like to imagine spending my time on this planet. I truly hope we get to a place where hand input not only works, but is a cornerstone of how this technology is adopted globally.

Now, back to our typical programming...

Basics of Hand Tracking

There are two main ways hand tracking is used, and I talk about them more (and better) in Interaction Design, but here are a couple of sentences and some pictures:

Direct manipulation is an input model that involves touching content directly with your hands. The idea behind this concept is that objects behave just as they would in the real world. Buttons can be activated simply by pressing them, objects can be picked up by grabbing them, and 2D content behaves like a virtual touchscreen. Direct manipulation is affordance-based, meaning it's user-friendly. There are no symbolic gestures to teach users. All interactions are built around a visual element that you can touch or grab. It's considered a "near" input model in that it's used for interacting with content within arm's reach.
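
To make that concrete, here's a minimal sketch of the per-frame check a direct-manipulation push button might run. It assumes you already get a fingertip position from your hand-tracking runtime; the Vec3 type, the button shape, and the thresholds are illustrative, not any particular engine's API.

```typescript
// Illustrative only: Vec3, the thresholds, and the button shape are hypothetical,
// not any engine's real API. The fingertip position comes from your hand-tracking runtime.
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;

class PushButton {
  private pressed = false;

  constructor(
    private faceCenter: Vec3,    // center of the button's front face, world space (meters)
    private faceNormal: Vec3,    // unit normal pointing out of the face toward the user
    private pressDepth = 0.01,   // ~1 cm of travel past the face to activate
    private releaseDepth = 0.004 // must back out to ~4 mm before it can re-trigger
  ) {}

  /** Call once per frame with the index fingertip position. */
  update(fingertip: Vec3): { pressed: boolean; justPressed: boolean } {
    // Signed travel along the normal: positive means the fingertip has pushed "into" the button.
    // A real implementation would also confirm the fingertip is within the face's lateral bounds.
    const depth = -dot(sub(fingertip, this.faceCenter), this.faceNormal);
    const wasPressed = this.pressed;

    if (!this.pressed && depth >= this.pressDepth) this.pressed = true;
    else if (this.pressed && depth <= this.releaseDepth) this.pressed = false;

    return { pressed: this.pressed, justPressed: this.pressed && !wasPressed };
  }
}
```

The gap between the press and release depths is the point: without that hysteresis, fingertip jitter near the activation plane makes the button flutter, which is exactly the kind of broken-feeling interaction the Button States section tries to head off.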

Point and commit with hands is an input model that lets users hover, select, and manipulate out-of-reach content. This "far" interaction technique is unique to VR/AR because humans don't naturally interact with the real world that way.
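
Purely as a sketch of the mechanics (the ray pose, the target shape, and the 2 cm pinch threshold are my assumptions, not any platform's actual values): point and commit usually boils down to casting a ray from a stabilized hand pose, testing it against targets for hover, and treating a pinch - thumb tip near index tip - as the commit.

```typescript
// Illustrative only: the ray pose, target shape, and pinch threshold are assumptions.
type Vec3 = { x: number; y: number; z: number };

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });
const dot = (a: Vec3, b: Vec3): number => a.x * b.x + a.y * b.y + a.z * b.z;
const length = (a: Vec3): number => Math.sqrt(dot(a, a));

interface Target { center: Vec3; radius: number; }

// Shortest distance from a ray to a point (clamped so targets behind the hand never hover).
function rayDistanceToPoint(origin: Vec3, dir: Vec3, point: Vec3): number {
  const along = Math.max(0, dot(sub(point, origin), dir)); // dir is assumed unit length
  const closest: Vec3 = {
    x: origin.x + dir.x * along,
    y: origin.y + dir.y * along,
    z: origin.z + dir.z * along,
  };
  return length(sub(point, closest));
}

class PointAndCommit {
  private wasPinching = false;

  /** rayOrigin/rayDir: a stabilized pointer pose for the hand. thumbTip/indexTip: joint positions. */
  update(rayOrigin: Vec3, rayDir: Vec3, thumbTip: Vec3, indexTip: Vec3, targets: Target[]) {
    const hovered = targets.find(t => rayDistanceToPoint(rayOrigin, rayDir, t.center) <= t.radius);
    const pinching = length(sub(thumbTip, indexTip)) < 0.02; // ~2 cm fingertip gap reads as a pinch
    const committed = hovered !== undefined && pinching && !this.wasPinching; // rising edge only
    this.wasPinching = pinching;
    return { hovered, committed };
  }
}
```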

Audio and Visual Cues

Haptics provide an incredible amount of data that's integral to an experience, and hand tracking gives you none of it, so you have to compensate with an over-inclusion of audio and visual cues. Make sure every single place that could have a haptic UI interaction has a spatialized sound effect at that location. If I know anything about audio (of which I know close to nothing), it's the importance of using audio cues to replace the positional awareness gained by haptics.
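
As one concrete example of "a spatialized sound effect at that location," here's a browser-flavored sketch using the Web Audio API's PannerNode; a game engine would use its own 3D audio source instead, and the AudioContext and decoded AudioBuffer are assumed to already exist.

```typescript
// Sketch using the Web Audio API; a game engine would use its own 3D audio source instead.
// Assumes `ctx` is a running AudioContext whose listener is kept in sync with the user's
// head pose elsewhere, and that `clip` is an already-decoded AudioBuffer.
function playInteractionCue(
  ctx: AudioContext,
  clip: AudioBuffer,
  position: { x: number; y: number; z: number } // world-space point of the touch, in meters
): void {
  const panner = new PannerNode(ctx, {
    panningModel: "HRTF",     // head-related spatialization so the cue reads as coming from a point
    distanceModel: "inverse",
    positionX: position.x,
    positionY: position.y,
    positionZ: position.z,
  });

  const source = new AudioBufferSourceNode(ctx, { buffer: clip });
  source.connect(panner).connect(ctx.destination);
  source.start();
}
```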

Audio is of course important, but it must be paired with visual cues that relay information about the interaction itself. This comes up more often in AR, but very frequently you'll find user frustration pools around the inconsistency of how the device is perceiving their hands. The first key is to show some type of visual confirmation that the hands are being tracked. This could include a skeleton, an actual hand model, a point on the fingertip, particle effects dispersing - anything that clues your user into the idea that the device is registering their hands. In this way, you're affirming a form of neutral or passive interaction where players aren't actively performing any action, but they are aware of themselves as a form of input.
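
To illustrate that "the device sees my hands" feedback loop, here's a small state-machine sketch; the show/fade/hide callbacks are placeholders for whatever skeleton, hand model, or fingertip marker you actually render, and the grace period is an arbitrary value chosen to keep momentary tracking dropouts from flickering the visual.

```typescript
// Illustrative only: the callbacks stand in for whatever hand visual you actually render.
type TrackingState = "tracked" | "recentlyLost" | "lost";

class HandTrackingIndicator {
  private state: TrackingState = "lost";
  private lostAt = 0;

  constructor(
    private onShow: () => void,          // e.g. show a skeleton, hand model, or fingertip dot
    private onFade: (t: number) => void, // 0..1 progress of the fade-out while tracking is gone
    private onHide: () => void,
    private graceMs = 250                // arbitrary grace period so brief dropouts don't flicker
  ) {}

  /** Call every frame with whether the runtime returned hand joints this frame. */
  update(handsTrackedThisFrame: boolean, nowMs: number): void {
    if (handsTrackedThisFrame) {
      if (this.state !== "tracked") this.onShow();
      this.state = "tracked";
      return;
    }
    if (this.state === "tracked") {
      this.state = "recentlyLost";
      this.lostAt = nowMs;
    }
    if (this.state === "recentlyLost") {
      const t = (nowMs - this.lostAt) / this.graceMs;
      if (t >= 1) {
        this.state = "lost";
        this.onHide();
      } else {
        this.onFade(t);
      }
    }
  }
}
```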

The biggest of props to the designers at Ultraleap (Leap Motion) - for absolutely everything they've ever put out, but also these blog posts! They were doing this in 2016 & 2017! My god.

What Makes a Spoon a Spoon? Form and Function in VR Industrial Design

Building Blocks: A Deep Dive Into Leap Motion Interactive Design

Design Sprints at Leap Motion: A Playground of 3D User Interfaces

Interaction Engine 1.0: Object Interactions, UI Toolkit, Handheld Controller Support, and More

Beyond Flatland: User Interface Design for VR

Obviously I did just call out Keiichi Matsuda, Barrett Fox, and Martin Schubert for their amazing work above (please let me know if I missed more Leap Motion designers, because I WILL name them all!). I also want to give credit to Luca Mefisto and Jonathan Ravasz for their work as well!
