Remote collaboration includes any situation where multiple people are working or playing together but some of the participants are not at the same location.
You’re driving through a desert on vacation when smoke starts to billow out from under the hood of your car. You pull over, pop the hood, and stare blankly at the unfamiliar rental car engine. You don’t know how to find out what’s wrong and have little hope of fixing the problem even if you do figure it out. You pull out your phone and make a FaceTime call to your mechanic friend. Together you work to diagnose the issue, but it’s difficult because the phone camera and screen provide only a flat 2D experience and the car engine has nooks and crannies that matter. Your friend has difficulty pointing out specific parts of the engine because they are near, far, under, over, or around other parts. You unscrew the wrong thing and make matters worse.
Augmented Reality (AR) technology allows for the creation of an experience where both the real world and the virtual world may interact and influence each other.
Now consider the situation if you have an AR system that pulls in some data from the real world. You put on a head-mounted display (HMD) such as a HoloLens (which you take with you on vacation everywhere!) and look into the engine bay. Your mechanic friend is sent a 3D mapping of the engine; as you look around, you see screws and bolts in the real world glow yellow through the HMD view as she points and clicks on them from her end of the remote call. You tighten the right screw and the day is saved. You fix your engine with the help of your trusty mechanic friend, AR, and remote collaboration.
This is an ideal scenario and there are many challenges to face when developing a remote collaboration system like this. This article will focus on a seemingly small part of the above scenario: remote embodied interaction.
An interaction is Embodied when it incorporates some aspect of the real world—physical constructs and/or social relationships—into the interaction.
As your friend chooses which screws and bolts to point out to you, there are actually many possibilities for how that communication is handled in your AR experience. The yellow glow is an example of a remote embodied interaction: your friend might have simply touched the screw on her touch-screen display, but that physical interaction needs to be translated and then remotely communicated to you. This can be done in many ways: a virtual hand points at parts of the engine, a tiny virtual avatar sits atop your engine and follows your friend’s movements, your friend draws lines into your view, and so on. Let’s look at some of the solutions for remote embodied interaction that are being explored in research and commercial products.
Image from: http://leapmotion.com
One approach to delivering a remote participant’s interactions is to track the person’s limbs, commonly the hands and/or arms, and create a virtual simulation of them. The current trend (2019) is to use hardware dedicated to tracking the hands, such as Leap Motion. Leap Motion and similar hardware try to track your hands in a 1-to-1 way (Leap Motion uses two IR cameras and IR LED light sources) and sometimes provide additional software that performs inverse kinematics to help determine the location of fingers and forearms. While hands are tracked 1-to-1 in these systems, there is still a choice about what is shown remotely.
Teo et al. render a remote user’s hands in a 1-to-1 sense, in both the location of the hands and the tracking of fingers, using the Leap Motion. In a preliminary user study they found that enhancing the virtual hands with additional powers, such as ray pointing or drawing annotations, was beneficial. Users who used only “natural” hand gestures performed tasks more slowly, and users who had extra cues (i.e., ray pointing and drawing) made fewer errors in picking objects.
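To make the ray-pointing cue a little more concrete, here is a minimal sketch of how a ray cast from a tracked fingertip could pick the nearest object it hits. This is not Teo et al.’s implementation: it assumes every object can be approximated by a bounding sphere, and the names `ray_sphere_distance` and `pick_object`, as well as the example scene, are hypothetical.

```python
import math


def ray_sphere_distance(origin, direction, center, radius):
    """Distance along a ray to the first hit with a sphere, or None on a miss.

    `direction` must be a unit vector.
    """
    # Vector from the ray origin to the sphere center.
    oc = [c - o for c, o in zip(center, origin)]
    # Length of the projection of that vector onto the ray.
    t_closest = sum(a * b for a, b in zip(oc, direction))
    # Squared distance from the sphere center to the ray's line.
    d2 = sum(a * a for a in oc) - t_closest ** 2
    if d2 > radius ** 2:
        return None
    half_chord = math.sqrt(radius ** 2 - d2)
    t_hit = t_closest - half_chord
    if t_hit < 0:
        # The ray starts inside the sphere; use the exit point instead.
        t_hit = t_closest + half_chord
    return t_hit if t_hit >= 0 else None


def pick_object(origin, direction, objects):
    """Return the name of the nearest object hit by the ray, or None.

    `objects` maps a name to a (center, radius) bounding sphere.
    """
    length = math.sqrt(sum(d * d for d in direction))
    unit = [d / length for d in direction]
    best_name, best_t = None, math.inf
    for name, (center, radius) in objects.items():
        t = ray_sphere_distance(origin, unit, center, radius)
        if t is not None and t < best_t:
            best_name, best_t = name, t
    return best_name


# Hypothetical engine parts, in meters, relative to the fingertip.
scene = {
    "bolt": ((0.0, 0.0, 2.0), 0.1),
    "screw": ((0.0, 0.0, 5.0), 0.1),
    "cap": ((1.0, 0.0, 2.0), 0.1),
}
picked = pick_object((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
```

A ray gives the observer an unambiguous line to follow, which is one plausible reason the extra cue could reduce picking errors compared with reading a bare pointing gesture.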
Another approach, from Feick et al., is to use the tracking of the hands as input to manipulate a remote virtual object. In this preliminary work, a remote expert manipulates a virtual model by placing their hand inside the mesh of the model and closing their fist (a grabbing motion and pose). From there, the rotation of their fist directly rotates the virtual object for the remote worker. In this case, showing the virtual hands during the rotational fist pose is likely unnecessary, or even confusing: the important embodied information is the rotation of the object, not the visual appearance of the remote expert’s hands.
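One way to picture this grab-and-rotate mapping is with quaternions: remember the fist and object orientations at the moment of the grab, then apply the fist’s change in orientation to the object on every tracking frame. The sketch below is an assumption about how such a mapping could work, not Feick et al.’s code; orientations are unit quaternions stored as `(w, x, y, z)` tuples, and `GrabRotation` is a hypothetical name.

```python
import math


def q_mul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )


def q_conj(q):
    """Conjugate, which is the inverse for a unit quaternion."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_from_axis_angle(axis, angle_rad):
    """Unit quaternion for a rotation of `angle_rad` about `axis`."""
    ax, ay, az = axis
    n = math.sqrt(ax * ax + ay * ay + az * az)
    s = math.sin(angle_rad / 2) / n
    return (math.cos(angle_rad / 2), ax * s, ay * s, az * s)


class GrabRotation:
    """Maps the fist's rotation since the grab onto the grabbed object."""

    def __init__(self, fist_at_grab, object_at_grab):
        # Store the inverse of the fist orientation at grab time.
        self.fist_inv = q_conj(fist_at_grab)
        self.object_at_grab = object_at_grab

    def update(self, fist_now):
        # Rotation of the fist relative to where it was when the grab began.
        delta = q_mul(fist_now, self.fist_inv)
        # Apply that same rotation to the object's orientation at grab time.
        return q_mul(delta, self.object_at_grab)


# The fist closes at 30 degrees about z, then turns to 120 degrees.
z_axis = (0.0, 0.0, 1.0)
grab = GrabRotation(q_from_axis_angle(z_axis, math.radians(30)), (1.0, 0.0, 0.0, 0.0))
object_now = grab.update(q_from_axis_angle(z_axis, math.radians(120)))
# object_now is a 90 degree rotation about z: the fist's change, not its absolute pose.
```

Only the resulting object orientation needs to be sent to the remote worker, which matches the observation that the hands themselves do not have to be shown.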
In normal collaboration we naturally use many hand-based gestures, so communicating them 1-to-1 is readily understood by the observer
We naturally use hand gestures so there isn’t any learning required to make those gestures (if no additional “powers” are included)
Hand gestures can be misinterpreted
Current tracking hardware can be limited
Image from: http://empathiccomputing.org/project/mini-me/
Moving beyond disembodied limbs, both research and commercial products have used various forms of more full-bodied 3D avatars to represent remote participants in AR. In the simplest form, avatars try to place a remote user inside the remote workspace so that their collaborators can see them as if they were really there in the workspace together. For example, Spatial is a commercial product that lets users collaborate remotely across various devices, including AR HMDs such as the HoloLens. Spatial attempts to recreate the remote user with techniques such as texture mapping the user’s picture onto the avatar and moving the avatar’s mouth when the remote user speaks.
Current research into using avatars in mixed reality scenarios looks at how we might move beyond the 1-to-1 representation of user-to-avatar. The examples below, from Piumsomboon et al., show cases where altering the scale, position, and orientation of an avatar can be useful and lead to new possible interactions. For instance, a remote user in Snow Dome can transform themselves into a “giant” so they can grab and move trees in the scene.
Piumsomboon et al. created Snow Dome, a system that experimented with both miniature and giant-size avatars for remote participants. The AR user sees their remote collaborator (who is using VR) as a 3D avatar at miniature scale when the remote user enters the dome to interact with the miniature objects. Alternatively, the remote user appears as a giant to the AR user when the remote user scales the snow dome down to interact with it.
Piumsomboon et al. also created Mini-Me, an avatar that dynamically adjusts in size, position, and orientation based on both the remote worker and the AR viewer. As the AR user looks at different surfaces, the avatar of the remote user is positioned (e.g., standing on a table) and sized on those surfaces to fit into the AR view. Additionally, the pointing gesture and head direction of the virtual avatar are adjusted so that, despite the avatar being moved around the scene, it continues to point and look at whatever the remote user is pointing and looking at.
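The redirected pointing can be illustrated with a little vector math: if the system knows the world-space point the remote user is pointing at, then wherever the avatar is placed, its arm (or head) can be re-aimed along the line from its new position to that same point. The sketch below is only an illustration of that idea, not the Mini-Me implementation; the function names and the y-up, z-forward coordinate convention are assumptions.

```python
import math


def redirect_direction(avatar_position, target_position):
    """Unit vector from the relocated avatar toward the original target."""
    delta = [t - a for t, a in zip(target_position, avatar_position)]
    length = math.sqrt(sum(d * d for d in delta))
    if length == 0:
        raise ValueError("avatar and target are at the same position")
    return tuple(d / length for d in delta)


def yaw_pitch_degrees(direction):
    """Convert a unit direction to yaw and pitch in degrees.

    Assumes y is up and z is forward, so yaw is measured about the y axis.
    """
    x, y, z = direction
    yaw = math.degrees(math.atan2(x, z))
    # Clamp to guard against rounding slightly outside [-1, 1].
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    return yaw, pitch


# The remote user points at a bolt; the avatar is then moved onto a table.
bolt = (1.0, 1.0, 1.0)
before = yaw_pitch_degrees(redirect_direction((0.0, 1.0, 0.0), bolt))
after = yaw_pitch_degrees(redirect_direction((1.0, 1.0, 0.0), bolt))
# The yaw changes with the avatar's position, but both poses aim at the bolt.
```

The same calculation applies to head direction by substituting the gaze target for the pointing target.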
Feeling of co-presence with collaborators
Can provide a “natural” way to communicate remote user position and orientation in the scene
What can we expect to see in the future of communicating a remote user’s embodied interactions? Certainly we will see improvements in tracking technology. Tracking down to the fine motor movements of a user’s hand and fingers is sure to be useful in some remote collaborative scenarios, but I see potentially more intriguing opportunities in altering the representation of that movement rather than simply playing back the tracking 1-to-1 (user-to-avatar or user-to-virtual limb). For instance, in real life it can be difficult to see what someone is pointing at in the distance. In Warping Deixis, Sousa et al. warp the user’s avatar so that their pointing gestures can be followed more accurately than in the real-life scenario.
This research was performed in VR, and it would be interesting to see whether similar enhancements to pointing gestures would carry over to an AR experience. We can also think of other gestures that might be enhanced by modifying or warping a tracked remote collaborator. For instance, people sometimes “nod” their head or chin towards objects as a kind of pointing, but in real life this can be ambiguous. Maybe we can turn this head nod into a more accurate virtual pointer for remote collaboration?
Another opportunity for remote embodied interactions in an AR scenario is for collaborators to be able to record and play back actions. For instance, a remote expert could record themselves doing an action and then choose where to place the recording in the scene for their remote collaborators. Additionally, a user could then modify the recording in various ways such as using slow motion or changing the scale. This could be useful for teaching scenarios where learning requires watching actions repeatedly.
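As a rough sketch of what such a recording could look like in code, the example below stores timestamped positions of a tracked point (say, a hand) and plays them back with an adjustable speed, scale, and placement. It is purely illustrative: `ActionRecording` and its parameters are hypothetical, and a real system would record full poses (position and orientation) for many joints.

```python
from bisect import bisect_right


class ActionRecording:
    """Timestamped positions of one tracked point, replayable on demand."""

    def __init__(self):
        self.times = []
        self.positions = []

    def add_sample(self, t, position):
        # Samples must arrive in strictly increasing time order.
        if self.times and t <= self.times[-1]:
            raise ValueError("samples must be added in increasing time order")
        self.times.append(t)
        self.positions.append(tuple(position))

    def sample(self, playback_time, speed=1.0, scale=1.0, origin=(0.0, 0.0, 0.0)):
        """Position at `playback_time` seconds into playback.

        speed < 1 gives slow motion, `scale` resizes the motion, and
        `origin` is where the recording is placed in the viewer's scene.
        """
        if not self.times:
            raise ValueError("recording is empty")
        # Map playback time back to recorded time.
        t = self.times[0] + playback_time * speed
        if t <= self.times[0]:
            p = self.positions[0]
        elif t >= self.times[-1]:
            p = self.positions[-1]
        else:
            # Linearly interpolate between the two surrounding samples.
            i = bisect_right(self.times, t)
            t0, t1 = self.times[i - 1], self.times[i]
            a = (t - t0) / (t1 - t0)
            p = tuple(
                p0 + a * (p1 - p0)
                for p0, p1 in zip(self.positions[i - 1], self.positions[i])
            )
        return tuple(o + scale * c for o, c in zip(origin, p))


rec = ActionRecording()
rec.add_sample(0.0, (0.0, 0.0, 0.0))
rec.add_sample(1.0, (1.0, 0.0, 0.0))
rec.add_sample(2.0, (1.0, 1.0, 0.0))

# Half speed: one second of playback covers half a second of recording.
slow = rec.sample(1.0, speed=0.5)
# Same moment, doubled in size and placed 10 units along x.
placed = rec.sample(1.0, speed=0.5, scale=2.0, origin=(10.0, 0.0, 0.0))
```

Keeping the recording separate from the playback parameters means the expert records once, and each viewer can choose their own speed, scale, and placement.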
Students in class, please read Mini-Me: An Adaptive Avatar for Mixed Reality Remote Collaboration https://doi.org/10.1145/3173574.3173620
Theophilus Teo, Gun A. Lee, Mark Billinghurst, and Matt Adcock. 2018. Hand gestures and visual annotation in live 360 panorama-based mixed reality remote collaboration. In Proceedings of the 30th Australian Conference on Computer-Human Interaction (OzCHI '18). ACM, New York, NY, USA, 406-410. DOI: https://doi.org/10.1145/3292147.3292200
Martin Feick, Anthony Tang, and Scott Bateman. 2018. Mixed-Reality for Object-Focused Remote Collaboration. In The 31st Annual ACM Symposium on User Interface Software and Technology Adjunct Proceedings (UIST '18 Adjunct). ACM, New York, NY, USA, 63-65. DOI: https://doi.org/10.1145/3266037.3266102
Thammathip Piumsomboon, Gun A. Lee, Jonathon D. Hart, Barrett Ens, Robert W. Lindeman, Bruce H. Thomas, and Mark Billinghurst. 2018. Mini-Me: An Adaptive Avatar for Mixed Reality Remote Collaboration. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM, New York, NY, USA, Paper 46, 13 pages. DOI: https://doi.org/10.1145/3173574.3173620
Spatial 2018. Retrieved from https://www.spatial.is
Maurício Sousa, Rafael Kuffner dos Anjos, Daniel Mendes, Mark Billinghurst, and Joaquim Jorge. 2019. Warping Deixis: Distorting Gestures to Enhance Collaboration. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland, UK. ACM, New York, NY, USA, 12 pages. DOI: https://doi.org/10.1145/3290605.3300838
Thammathip Piumsomboon, Gun A. Lee, and Mark Billinghurst. 2018. Snow Dome: A Multi-Scale Interaction in Mixed Reality Remote Collaboration. In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems (CHI EA '18). ACM, New York, NY, USA, Paper D115, 4 pages. DOI: https://doi.org/10.1145/3170427.3186495
Sergio Orts-Escolano, Christoph Rhemann, Sean Fanello, Wayne Chang, Adarsh Kowdle, Yury Degtyarev, David Kim, Philip L. Davidson, Sameh Khamis, Mingsong Dou, Vladimir Tankovich, Charles Loop, Qin Cai, Philip A. Chou, Sarah Mennicken, Julien Valentin, Vivek Pradeep, Shenlong Wang, Sing Bing Kang, Pushmeet Kohli, Yuliya Lutchyn, Cem Keskin, and Shahram Izadi. 2016. Holoportation: Virtual 3D Teleportation in Real-time. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology (UIST '16). ACM, New York, NY, USA, 741-754. DOI: https://doi.org/10.1145/2984511.2984517