Skeleton Tracking in Isadora 3 with OpenNI Tracker : TroikaTronix

The new OpenNI Tracker Actor and Skeleton Decoder Actors fully integrate body tracking into Isadora 3. Once you’ve installed the plugins, there is no need to run third party software, configure OSC or MIDI data, or set up Syphon feeds.

This tutorial will walk you through setting up and using the Body Tracking actors for the first time, and demonstrate a few handy techniques.

Please keep in mind that this is a complex actor that we would generally recommend for intermediate or expert users. This tutorial is designed to help beginners get up and running as quickly as possible, but working with body/skeleton tracking devices requires both time and commitment to learn. If you are more of a beginner, then we strongly advise you to start with some simple projects and build up your skills over time. It is unreasonable to expect yourself to master this topic in one sitting!

The main part of this tutorial starts with the section entitled “Motion Tracking in Isadora with OpenNI Tracker Actor and the Microsoft Xbox Kinect”. Before we get started, we want to tell you how to install the plugins, describe the tracking cameras supported by OpenNI Tracker, and offer some advice about inviting a friend to help so you can learn these plugins more quickly.

Finally, if you have any problems getting OpenNI to work with your depth camera, please go through the steps outlined in our OpenNI Tracker Troubleshooter to ensure everything is set up as it should be.

How to Install the OpenNI Tracker and Associated Plugins

Download the OpenNI Tracker Plugins from the TroikaTronix Website (link)
Unzip the package to your desktop or another easily accessible folder on your computer
Open Isadora and from the Help menu choose Open Plugins Folder > TroikaTronix Actor Plugins
Copy the following three plugins from the unzipped folder into the actor plugins folder that appears:
- OpenNI Tracker.izzyplug
- Skeleton Decoder.izzyplug
- Skeleton Visualizer.izzyplug
Restart Isadora

The plugins are now installed and ready for use.

About the Supported Cameras: Microsoft Kinect for Xbox 360, Orbbec Astra Pro & Astra Mini, and Intel Realsense D435

Currently the OpenNI Tracker actor supports these depth sensor cameras:

Microsoft Kinect for Xbox 360 (aka the Kinect v1, models 1414 and 1473 only)
Microsoft Kinect for Xbox One (aka Kinect v2)
Orbbec Astra
Intel Realsense D435

Below we will briefly discuss some of the pros and cons of each sensor.

Microsoft Kinect for Xbox 360 (aka the Kinect v1)

The Microsoft Kinect for Xbox 360 (aka the Kinect v1) is the ubiquitous tracking camera introduced in 2010 that brought skeleton tracking to the masses. It has long been out of production, but you can often buy one used on eBay or similar sites for a very reasonable price. It is very important that you purchase the accompanying power supply, as the Kinect will not connect to your computer otherwise!

To ensure you are working with a model 1414 or 1473 Kinect, please look at the label on the bottom of the stand for the model number, emphasized here by the red square.

Orbbec Astra Pro and Astra Mini

The Orbbec Astra Pro and Astra Mini are still in production and thus can still be purchased. (The price at the time of this writing was US $149.95.) The sensing system is almost identical to the Kinect for Xbox 360 and so it works very well with the OpenNI Tracker actor. Another advantage of the Orbbec Astra is that it does not require an external power supply.

The Astra Pro has the advantage of offering a higher resolution 1280x720 RGB color image but is limited by the fact that it cannot provide this stream directly to the OpenNI Tracker actor. (Isadora will, however, see it as a normal web cam in the Live Capture Settings Window.)

In contrast, the Astra Mini does provide the RGB color camera stream directly to the OpenNI Tracker actor with its limitation being that the color image is 640x480 only.

Intel Realsense D435

The Intel Realsense D435 depth sensor uses a different technology than the Kinect or Astra, relying on a stereo pair of active infrared cameras to generate the depth map image. In our tests, it is not as reliable at acquiring bodies for skeleton tracking, but it does offer a big advantage by offering frame rates as high as 90 fps. That means lower latency and a higher density stream of tracking data. If you need to perform skeleton tracking on multiple bodies, you are probably better off with a Kinect v1 or an Astra. But if you want to use the depth map image for tracking using Isadora’s Eyes or Eyes++ actors, the high framerates offered by the Realsense could offer a notable advantage.

Special Instructions for the Orbbec Astra

Normally OpenNI will not recognize the Orbbec Astra for skeleton tracking. You can override this behavior by placing the file called "masquerade.txt" into a location where the plugin can find it. The settings in this file cause OpenNI to believe that the Astra is a Kinect, and thus to enable body tracking. Because OpenNI does not normally support this capability, we require that you, the end user, place this file in the proper location manually.

Here are the instructions for macOS and Windows.

macOS:

In Isadora, choose Help > Open Plugins Folder > TroikaTronix Actor Plugins.
Find the "OpenNI Tracker" plugin.
Right click the plugin and choose Show Package Contents.
Open the "Contents" folder and then open the "Resources" folder.
Place the "masquerade.txt" file in the "Resources" folder.

Windows:

In Isadora, choose Help > Open Plugins Folder > TroikaTronix Actor Plugins.
Find the "OpenNI Tracker.izzyplug" folder.
Place the "masquerade.txt" file in the "OpenNI Tracker.izzyplug" plugin folder.

Learning to Track Bodies: Get a Friend to Help!

When learning how to use the Body Tracking actors, it is very helpful to work with a collaborator or friend who will physically perform in front of the sensor.

Not only does this save a great deal of time, since you won’t need to step away from the computer to move around yourself when you want to make sure the tracking is configured correctly, it also helps you keep an eye on the skeleton data to determine exactly how the body is being tracked.

Just make sure to treat your collaborator to a coffee or lunch afterwards. Being the “test subject” for body tracking can be an exhausting experience, so thank your friend properly!

Motion Tracking in Isadora with OpenNI Tracker Actor and the Microsoft Xbox Kinect

Let’s get started:

While Isadora is not running, plug the Xbox Kinect, Orbbec Astra, or Intel Realsense D435 into a USB port on your computer. If you are working with a Kinect for Xbox 360, make sure to connect it to its power supply and to connect the power supply to a power source.
Launch Isadora
Add the OpenNI Tracker actor to a new Scene.

Note for users on machines with USB-C ports: You will need to use an adapter to plug the Kinect into your computer. In our testing, we discovered many current-generation Apple computers failed to detect the Kinect when using third-party USB to USB-C adapters. We had similar inconsistencies with many USB extension cables. However, we found reliable success using the first-party Apple USB-C VGA Multiport adapters in these cases. While these adapters are much more expensive than third party alternatives, we have not had much luck with cheaper adapters thus far.

The OpenNI tracker actor.

By default, the OpenNI Tracker automatically detects and initializes a connected motion tracking device. If you’re only using one sensor, no other steps are necessary. Leave the ‘sensor uri’ field blank, and keep the ‘sensor index’ set to ‘1’. (When using multiple devices, you can use these actor properties to determine which sensor the OpenNI Tracker uses)

Note: There are a lot of inputs that allow you to customize the behavior of this actor, too many to cover in this initial tutorial. The default values for these inputs are carefully chosen to work in the vast majority of situations. But if you want to learn more about what each of these inputs does, please take the time to read “Open NI Tracker Reference.rtf” which is included with the plugin download.

Once added to the Scene (or when entering a scene), the OpenNI tracker usually takes 1-2 seconds to initialize; with some Kinect for Xbox 360 models it can take as long as 15 seconds or more – so please be patient! The ‘status’ output will show ‘ready’ to indicate that the motion tracking device was detected.

If the status output doesn’t show ‘ready’, try the following steps based on the message received:

openni init err, tracker init err, device init err: Something went wrong initializing the OpenNI Tracker or the sensor device itself. Use the ‘reset’ input on the OpenNI tracker to try again. If you continue to experience this error, it may indicate an issue with the connection between the device and your computer. There may be a problem with the sensor itself, adapters, or the USB drivers on your computer.
no devices: No motion tracking devices are connected to your computer. If you see this message, close Isadora. Ensure your device is securely connected to your computer and powered on. Then restart Isadora and try again.
initialize, delete device, init device, init tracker: These outputs should only appear for a few moments when the OpenNI tracker is starting up. If the OpenNI Tracker continues to display this status, use the ‘reset’ input on the tracker to restart the process and try again.
inactive: The sensor has not yet been initialized.
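
If you ever log or script against these messages outside of Isadora, the list above boils down to a simple lookup. The sketch below is only an illustration of that idea: the status strings come from this tutorial, but the function name and the wording of each recommendation are our own and are not part of Isadora.

```python
# Status strings are taken from the list above. The helper itself and the
# wording of each recommendation are illustrative, not part of Isadora.
ERRORS = {"openni init err", "tracker init err", "device init err"}
STARTUP = {"initialize", "delete device", "init device", "init tracker"}


def recommended_action(status: str) -> str:
    """Return the step this tutorial suggests for a given 'status' value."""
    status = status.strip().lower()
    if status == "ready":
        return "tracking can start"
    if status in ERRORS:
        return "trigger 'reset'; if it persists, check the sensor, adapters, and USB drivers"
    if status == "no devices":
        return "close Isadora, check the connection and power, then restart Isadora"
    if status in STARTUP:
        return "wait a few moments; trigger 'reset' if the status does not change"
    if status == "inactive":
        return "the sensor has not been initialized yet"
    return "unknown status"


print(recommended_action("no devices"))
```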

Once we’re ready, let’s get a quick visualization of the depth image to see what our sensor is looking at:

Add a Projector actor to the Scene
Connect the ‘depth video’ output of the OpenNI Tracker to the ‘video’ input on the Projector actor
Choose Output > Show Stages or Output > Force Stage Preview to see the resulting image on the stage.
The Depth Image from the camera appears.

Several input properties on the OpenNI tracker allow you to customize the appearance of this depth image.

‘capture mode’ input: Many motion sensing devices, including the Microsoft Kinect for Xbox 360, feature both depth cameras and color or ‘RGB’ webcams. The OpenNI Tracker can receive video from one camera at a time or both cameras simultaneously, specified by the ‘capture mode’ actor input. (Note that the Orbbec Astra Pro will not provide the RGB Color stream via the OpenNI Tracker actor.)

NOTE: Whenever you change the ‘capture mode’, OpenNI Tracker must disconnect and reconnect to the sensor. As mentioned above, acquiring the sensor may take a long time, especially with certain models of the Kinect for Xbox 360. This input is probably not one you should change interactively.

The ‘depth video’ output supplies the depth camera video feed, and the ‘color video’ output supplies the RGB webcam video feed. Both outputs may be connected to any of Isadora’s video processing actors and the Projector actor. Notably, you can connect the ‘depth video’ output to Isadora Eyes or Eyes++ tracking actors. While these actors will not perform skeleton tracking, Eyes++ offers the advantage of tracking as many as 16 objects. Use of the depth stream with Eyes or Eyes++ will be the subject of a forthcoming tutorial.
‘resolution’ input: You can set the resolution of the depth and/or color streams by choosing one of the options from the ‘resolution’ input. (Remember, right clicking on an input that contains a list of options will display a popup menu that shows the available choices.) If the resolution you have chosen is not offered by the sensor to which you’ve connected, OpenNI Tracker will use the closest available resolution.
IMPORTANT! body and/or skeleton tracking will only function if the resolution is 640x480 or 320x240 pixels!
‘fps’ input: The ‘fps’ input allows you to set the frame rate of the sensor. Generally, you can leave this set to 30fps, but some cameras (like the Intel Realsense) offer frame rates as high as 90fps at certain resolutions. If the frame rate you have chosen is not offered by the sensor to which you’ve connected, OpenNI Tracker will use the closest available frame rate.
‘mirror’ input: Turn this input on to flip the image horizontally if you are pointing the sensor at yourself and the output image appears backward to you.
‘body tracking’ input: If you want to perform body and/or skeleton tracking, leave this input on. If you only want the depth map output, turning this input off disables all body and skeleton tracking features and will improve performance.
‘skeleton tracking’ input: If you want to find skeletons within the tracked bodies, leave this input on. But, if you only need OpenNI Tracker to identify and colorize the sensed bodies, turning this input off will improve performance.
‘output depth’ input: If you want to track skeletons, but do not require the depth image, turning this input off will improve performance.

The OpenNI tracker can track a maximum of six bodies at a time. Note that each additional tracked skeleton increases CPU usage — tracking multiple bodies at the same time with full skeletons can have a significant impact on performance.

Compositing using the Depth Image

You can easily use the ‘depth video’ output to produce composite images using the shape of the human body as a mask for another video stream, resulting in a “video silhouette.” We’ll use the Threshold actor and the Add Alpha Channel actors to create a simple yet effective visual effect.

On the OpenNI Tracker actor, set the ‘colorize bodies’ actor property to ‘hide bkg’. This removes the background of the depth image: only the shapes of the bodies being tracked appear in the depth image.
Set the ‘draw skeletons’ input to ‘off’. We don’t want the white lines of the skeleton appearing on the depth image for this exercise.
Set the ‘skeleton tracking’ input to off to improve efficiency.
Connect the ‘depth video’ output to the ‘video in’ of a Threshold actor. Then, configure the Threshold actor as follows:
a. Set the ‘threshold’ value to ‘5’.
b. Choose solid white for the ‘bright color’.
c. Choose solid black for the ‘dark color’.
Next, connect the ‘video out’ of the Threshold actor to the ‘mask’ input on an Add Alpha Channel actor.
Add a movie player, picture player, or any other video renderer actor to your scene. Play a movie or image using this actor, and connect its output to the ‘video’ input on the Add Alpha Channel actor. In this example, I used a GLSL shader actor to generate a seascape.
Connect the video output of the Add Alpha Channel actor to the input on a Projector actor. Your patch should look something like this (replace the GLSL shader actor with whatever actor you used in step 6).
Choose Output > Show Stages to see the visual results.
Stand in front of the sensor. You will see nothing on stage if a person is not being tracked.
The result is that the video you are playing perfectly fills the silhouette of the tracked human body.
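
To make the logic of this patch concrete, here is a minimal sketch of the same two operations on a tiny image held in plain Python lists. It is not how Isadora implements the Threshold or Add Alpha Channel actors. We assume brightness values on a 0 to 100 scale, a strict "greater than" comparison, and an alpha of 1.0 for visible and 0.0 for transparent; all function names are hypothetical.

```python
def threshold(depth, level, bright=100, dark=0):
    """Turn every pixel brighter than `level` into `bright`, the rest into `dark`."""
    return [[bright if px > level else dark for px in row] for row in depth]


def add_alpha(video, mask):
    """Pair each video pixel with an alpha value taken from the mask.

    Alpha is 1.0 (visible) where the mask is non-zero and 0.0 (transparent)
    elsewhere, so the video only shows inside the body silhouette.
    """
    return [
        [(px, 1.0 if m else 0.0) for px, m in zip(video_row, mask_row)]
        for video_row, mask_row in zip(video, mask)
    ]


# A 2x3 "depth image" with the background already hidden (0 = no body).
depth = [[0, 0, 40],
         [0, 60, 80]]
video = [["a", "b", "c"],
         ["d", "e", "f"]]

mask = threshold(depth, level=5)
composite = add_alpha(video, mask)
print(mask)
print(composite)
```

Only the pixels where a body was detected keep a non-zero alpha, which is why nothing appears on stage when no one is being tracked.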

Skeleton Tracking with the Skeleton Decoder

We’ve seen how to use the body tracking feature of OpenNI Tracker to composite an image. But if you wish to know the location of the body and its parts in space, you’ll need to use a companion actor called Skeleton Decoder.

While we can perform lots of powerful compositing using the depth image, the Skeleton Decoder is used to process motion tracking data.

When the OpenNI Tracker tracks a human body, it draws a ‘skeleton’ composed of fifteen different points:

Head
Neck
Torso
Left shoulder
Right shoulder
Left elbow
Right elbow
Left hand
Right hand
Left hip
Right hip
Left knee
Right knee
Left foot
Right foot

Each of these points specifies a point in 3D space: the horizontal location (x), the vertical location (y) and the depth location (z).

To help you visualize the skeletons currently being tracked by OpenNI Tracker, you should leave the ‘draw skeleton’ input turned on and the ‘depth video’ output connected to a Projector actor so you can see it on Isadora’s stage. Whenever the tracker senses a skeleton, it will be drawn on to the ‘depth video’ output. This feedback will give you a reliable indication of the skeletons that are being tracked and how well the tracking algorithm is doing. (For example, if the feet of the body are outside the frame, the legs of the skeleton will be shortened and may behave erratically.)

The skeleton point data is available at the various ‘skeleton’ outputs regardless of whether or not the ‘draw skeleton’ input is on or off.

‘max bodies’ input: This input determines the maximum number of bodies to be tracked at one time, from 1 to 6. We consider tracking multiple bodies an advanced topic. So, if you are just starting to experiment with body tracking, we strongly recommend that you limit yourself to tracking a single performer by setting ‘max bodies’ to ‘1’. When you do this, you’ll be sure that the skeleton data will always be sent to the ‘skeleton 1’ output.

When this value is set higher than ‘1’ , the skeleton data might be sent to ‘skeleton 2’ if two bodies were present and one leaves the scene – something that can lead to confusion as you program your patch.

The OpenNI Tracker reports the X, Y, Z coordinates for each point on a tracked body to the ‘skeleton’ output. This means that each point on the human body produces 3 pieces of data. In total, a full skeleton produces 45 individual values: the x, y, and z coordinates for all 15 points on the body! All of this data is combined into a special data type called ‘skeleton’, which you can see at the six ‘skeleton’ outputs. The Skeleton Decoder splits this combined data type into individual values that can be connected to other Isadora actors.
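
As a rough mental model of what the Skeleton Decoder does, the sketch below splits 45 numbers into 15 named (x, y, z) points. The internal layout of Isadora’s ‘skeleton’ data type is not described in this tutorial, so the flat list and the joint order used here are assumptions made purely for illustration.

```python
# Assumed joint order, for illustration only; the real 'skeleton' data type
# is internal to Isadora and may be laid out differently.
JOINTS = [
    "head", "neck", "torso",
    "left shoulder", "right shoulder",
    "left elbow", "right elbow",
    "left hand", "right hand",
    "left hip", "right hip",
    "left knee", "right knee",
    "left foot", "right foot",
]


def decode_skeleton(values):
    """Split a flat list of 45 numbers into {joint name: (x, y, z)}."""
    if len(values) != 3 * len(JOINTS):
        raise ValueError(f"expected {3 * len(JOINTS)} values, got {len(values)}")
    return {
        name: tuple(values[3 * i: 3 * i + 3])
        for i, name in enumerate(JOINTS)
    }


# Dummy data: 45 increasing numbers standing in for real coordinates.
skeleton = decode_skeleton([float(n) for n in range(45)])
print(skeleton["head"])        # first three values
print(skeleton["right foot"])  # last three values
```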

Ensure the ‘max bodies’ input of the OpenNI Tracker is set to ‘1’ and check to be sure the ‘skeleton tracking’ input is turned on.
Add the Skeleton Decoder actor to the scene.
Connect the output of ‘skeleton 1’ on the OpenNI Tracker to the ‘skeleton in’ of the Skeleton Decoder.
The Skeleton Decoder’s ‘channels’ input allows you to specify how many points on the body you want to track. You can choose from 1 point to 15 points. Notice that as you change this number, more outputs appear on the Skeleton Decoder, and that the name of the associated body part for the new points is shown at the output (e.g., left hand, right foot, etc.).
If you set the ‘channels’ input to show all fifteen points, you can see that it becomes very large which can be unwieldy if you don’t actually need all of those points.
But you don’t need to use all 15 points. We’ve set the order of the joints so that the most commonly used points appear first. For example, if you set the ‘channels’ input to 5, the Skeleton Decoder will output the head, left and right hands, and left and right feet.
If the default order doesn’t suit your needs and you want to receive data from a few specific points on the body, you can use the ‘point sel’ input to select which points you want to receive. For example, if you wanted to track only the left foot, right foot, and torso, you could set the ‘channels’ input of the Skeleton Decoder to ‘3’, and then enter the following into the ‘point sel’ input:
left foot, right foot, torso
After entering this text, the output names will change to ‘left foot’, ‘right foot’ and ‘torso’. The order of the outputs is determined by the order of the joint names in the ‘point sel’ input.
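
The interplay between ‘channels’ and ‘point sel’ can be pictured as follows. This is a hypothetical sketch, not the actor’s actual code: we assume the joint names are separated by commas, that extra whitespace and letter case are ignored, and that only the first ‘channels’ names are used.

```python
def select_points(skeleton, point_sel, channels):
    """Return [(name, (x, y, z)), ...] in the order given by `point_sel`.

    `skeleton` is a dict of joint name -> (x, y, z). How Isadora treats
    unknown names or a 'channels' value larger than the list is not covered
    here; this sketch raises KeyError for unknown names and simply stops
    at the end of the list.
    """
    names = [n.strip().lower() for n in point_sel.split(",") if n.strip()]
    return [(name, skeleton[name]) for name in names[:channels]]


# Made-up positions in meters.
skeleton = {
    "torso": (0.0, 1.0, 2.5),
    "left foot": (-0.25, 0.0, 2.5),
    "right foot": (0.25, 0.0, 2.5),
}

outputs = select_points(skeleton, "left foot, right foot, torso", channels=3)
for name, xyz in outputs:
    print(name, xyz)
```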

Working with the Data

At this point, if someone is moving in front of the sensor and you can see a skeleton being drawn on the ‘depth video’ output, you should also see the various numbers at the Skeleton Decoder outputs changing as the person moves. But what do you do with those numbers? Understanding how to transform those numbers into interactive control will be our next step.

Set the OpenNI Tracker’s ‘skeleton scale’ input to ‘m’ (meters) and take a moment to observe the values you see coming out of the Skeleton Decoder. These will be the absolute XYZ position of each body part in space, measured in meters. You might be able to see that as the person moves to the camera’s left, the x values will decrease; if they move to the right, the x values will increase. If they move a body part up, its y value will increase, and so on. Try to get a feeling for the relation between the location of the body in the room and these X/Y/Z values.

Now, it is certainly possible to work with these absolute values. Imagine a room that is 5m across (x), 3m high (y) and 5m deep (z) (roughly 16ft x 10ft x 16ft). Now, imagine defining a “target” at the exact center of this space (x = 2.5, y = 1.5, z = 2.5) and then creating an Isadora patch that measures the distance between the skeleton’s right hand and the position of this target. You could then use the distance value to control a parameter, e.g., causing a video to fade in or out.

This might be a workable scenario if you had an installation open to the public and wanted viewers to explore and discover these invisible targets in space. But if you want to create a situation where a performer can “play” the media with their body, this solution isn’t a great choice because these imaginary “targets” offer no tactile feedback to the performer.
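
The arithmetic behind such an invisible target is just the straight-line distance between two points. The sketch below assumes positions in meters and invents a 2m "reach" beyond which the video is fully faded out; both function names and that radius are our own choices for illustration.

```python
import math

TARGET = (2.5, 1.5, 2.5)  # center of the imagined 5m x 3m x 5m room


def distance_to_target(point, target=TARGET):
    """Straight-line distance in meters between a tracked point and the target."""
    return math.dist(point, target)


def fade_from_distance(distance, reach=2.0):
    """Map a distance to a 0.0-1.0 fade: 1.0 at the target, 0.0 at `reach` or beyond."""
    return max(0.0, min(1.0, 1.0 - distance / reach))


right_hand = (2.5, 1.5, 1.5)  # one meter in front of the target
d = distance_to_target(right_hand)
print(d, fade_from_distance(d))
```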

If you want to offer a performer precise control of the media, we suggest that you work with relative measurements, where a given point on the body is considered in relation to another point on the body. This works because we all have an ability called proprioception: the ability to perceive the location, movement, and action of parts of the body. You can expect a trained dancer to have proprioceptive ability that is incredibly refined compared to the average person.
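
A relative measurement is simply the difference between two points on the same body. In the sketch below (names and numbers are made up, positions assumed to be in meters), the hand is measured relative to the torso, so the result stays the same when the performer walks to a different spot in the room.

```python
def relative_to(point, reference):
    """Position of `point` as seen from `reference`, component by component."""
    return tuple(p - r for p, r in zip(point, reference))


# The performer stands in one spot...
hand_a, torso_a = (0.5, 1.5, 2.0), (0.25, 1.0, 2.0)
# ...then walks 1m to the side and 0.5m back, holding the same pose.
hand_b, torso_b = (1.5, 1.5, 2.5), (1.25, 1.0, 2.5)

print(relative_to(hand_a, torso_a))
print(relative_to(hand_b, torso_b))
```

Both lines print the same offset, which is what lets a performer rely on their own sense of body position rather than on a spot in the room.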

But please remember this: when you create an interactive scenario like this, you are creating a new instrument, almost as if you were making the first violin and handing it over to someone for the first time. That means you have to be a good instrument maker, ensuring that your instrument works reliably and that it produces predictable results for the performer. Furthermore, the performer will need to learn to play the instrument you have created. (Would you expect someone to pick up a violin, practice with it for 4 hours, and play something amazing after that short time?) We mention this because doing this well takes time: time to program, and time to explore and learn the instrument. Keep this in mind if you intend to use motion tracking in performance!

Powerful Actors to use with Body Tracking

As mentioned above, we find that performers work best when data measurements are generated by a body in relation to itself, rather than to a point in space. To help you achieve these kinds of measurements, we’re going to dig in to three actors that will help you do just that!

Limit Scale Value, Min Value Hold, and Max Value Hold

Understanding value scaling is essential to working well with body tracking data.

One of the most difficult elements of motion tracking is establishing the range of motion, that is, determining the highest and lowest “borders” which you are tracking. For example, if you wanted the movement of the hand to control the volume of a music track, you would need to define which hand position represents zero volume and where the hand would be for maximum volume. Is “zero volume” the hand touching the floor, or the hand held at rest by the waist? Is maximum volume the hand held at head height, or the hand held as high above the head as possible? What if the performer is taller or shorter?

This is where the Min Value Hold and Max Value Hold actors are particularly useful. These actors help us find the highest and lowest reported values.

Set the channels on the Skeleton Decoder to ‘1’
Enter ‘right hand’ in the point select to track the right hand.
Add a Max Value Hold actor to the scene
Add a Min Value Hold actor to the scene
Connect the ‘right hand x’ output on the Skeleton Decoder to the ‘value 1’ property on both the Max Value Hold and Min Value Hold actors.
Now, ask the performer to stand in front of the sensor and move their hand around freely: stretch out, touch the floor, reach wide, reach across the body, above the head, and so on.
The ‘min’ and ‘max’ outputs of the Min Value Hold and Max Value Hold actors now show the range of motion for the X position of the hand.
Repeat steps 3 through 7 for the Y and Z values, adding additional Min Value Hold and Max Value Hold actors each time.
Now, we have determined the full possible range of motion for the hand. We can then use these values with Isadora’s value scaling features and actors such as Limit Scale Value to ‘map’ the range of motion onto other values in Isadora. Let’s try this now with sound.
Import a video file with music, an .mp3 file, or .wav file into Isadora.
If you imported an .mp3 or video file, add a movie player to the scene. If you added a .wav file, add a sound player to the scene. Ensure the player is set to loop and is playing the media.
Let’s add a quick button to reset the Min Value Hold and Max Value Hold Actors:
Add a Keyboard Watcher to the scene.
Set the ‘key range’ to ‘r’.
Connect the ‘key’ output to the ‘reset’ input on the Min Value Hold and Max Value Hold actors.
Now, we can press the ‘r’ key to quickly reset the range when necessary.
Add a Limit Scale Value actor to the scene.
Find the Max Value Hold actor connected to the ‘right hand y’ output value of the Skeleton Decoder. Connect the ‘max’ output to the ‘limit max’ on the Limit Scale Value actor.
Find the Min Value Hold actor connected to the ‘right hand y’ output value of the Skeleton Decoder. Connect the ‘min’ output to the ‘limit min’ on the Limit Scale Value actor.
Connect the ‘right hand y’ output value of the Skeleton Decoder to the ‘value’ of the Limit Scale Value actor.
Connect the ‘output’ of the Limit Scale Value actor to the ‘volume’ of the Movie Player or Sound Player.
Press the ‘r’ key to reset the values.
Stand in front of the sensor. Once detected, raise your right hand as high above your head as possible. Then touch the floor. Now, raising or lowering your right hand should raise or lower the volume of the audio track!
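
If it helps to see the idea behind this patch outside of Isadora, the sketch below mimics it in a few lines: one object remembers the lowest and highest value seen so far, and one function maps a value from that observed range onto an output range. The class and function names are hypothetical, the 0 to 100 output range is an assumption standing in for a volume input, and the sample heights are invented.

```python
class RangeHold:
    """Remembers the smallest and largest value seen since the last reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.min = None
        self.max = None

    def update(self, value):
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)


def limit_scale(value, limit_min, limit_max, out_min=0.0, out_max=100.0):
    """Clamp `value` to [limit_min, limit_max], then scale it to [out_min, out_max]."""
    if limit_max == limit_min:
        return out_min  # no range measured yet
    clamped = min(max(value, limit_min), limit_max)
    return out_min + (clamped - limit_min) * (out_max - out_min) / (limit_max - limit_min)


# Calibration: the performer touches the floor and reaches overhead.
hold = RangeHold()
for hand_y in [1.0, 0.0, 2.0, 1.5]:
    hold.update(hand_y)

# Afterwards, the hand height drives the volume.
print(limit_scale(1.0, hold.min, hold.max))  # hand at mid height
print(limit_scale(3.0, hold.min, hold.max))  # beyond the measured range
```

Because the range is measured from the performer rather than typed in by hand, the same mapping adapts to a taller or shorter performer after a reset and a new calibration.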

Calc Angle 3D

This actor offers a fantastic way to measure the relationship between any two points on the body.

Set the channels on the Skeleton Decoder to ‘2’
Enter ‘left hand, right hand’ in the point select to track the hands.
Connect the X, Y, Z outputs of the left hand joints to the corresponding X1, Y1, and Z1 inputs of the Calc Angle 3D actor.
Connect the X, Y, Z outputs of the right hand joints to the corresponding X2, Y2, and Z2 inputs on the Calc Angle 3D actor.
Next, connect a Pulse Generator to the ‘trigger’ input of the Calc Angle 3D actor so that it reports new values.
Set the frequency of the Pulse Generator to match the fps of the OpenNI Tracker (30 Hz).
The result: you can calculate the distance between the hands, and even derive the x-y and z-x angles formed by the hands in relation to a vertical line.
Try connecting the ‘dist’ output of the Calc Angle 3D actor to inputs such as volume or intensity.
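
For readers who like to see the geometry, here is a minimal sketch of the two kinds of measurement involved: the distance between two points, and an angle in the x-y plane measured from a vertical line. The exact angle convention and output units used by the Calc Angle 3D actor are not documented in this tutorial, so the convention below (0 degrees when the second point is directly above the first, 90 degrees when it is directly to the side) is an assumption, and the function names are our own.

```python
import math


def distance(p1, p2):
    """Straight-line distance between two (x, y, z) points."""
    return math.dist(p1, p2)


def xy_angle_from_vertical(p1, p2):
    """Angle in degrees, in the x-y plane, of the line p1 -> p2 measured from vertical."""
    dx = p2[0] - p1[0]
    dy = p2[1] - p1[1]
    return math.degrees(math.atan2(dx, dy))


left_hand = (0.0, 1.0, 2.0)
right_hand = (1.0, 1.0, 2.0)   # same height, one meter to the side

print(distance(left_hand, right_hand))
print(xy_angle_from_vertical(left_hand, right_hand))
```

A z-x angle could be computed the same way from the z and x differences.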

3D Velocity

The 3D Velocity actor is perfect for measuring the speed at which a tracked body is moving. By connecting the X, Y, and Z outputs of a single point to the corresponding X, Y, and Z inputs on the 3D Velocity actor, the result is the velocity at which that point is moving. Try using the ‘vel’ output to control intensity, volume, color, or even the frequency of a pulse generator.
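
Conceptually, a speed measurement like this compares where a point is now with where it was a moment ago. The sketch below is only an approximation of that idea, not the actor’s implementation: the class name is hypothetical, positions are assumed to be in meters, and the time step is passed in explicitly (for a 30 fps sensor it would be 1/30 of a second).

```python
import math


class Velocity3D:
    """Estimates speed as distance travelled since the last sample, divided by time."""

    def __init__(self):
        self.previous = None

    def update(self, point, dt):
        """Feed one (x, y, z) sample taken `dt` seconds after the previous one."""
        if self.previous is None or dt <= 0:
            self.previous = point
            return 0.0  # nothing to compare against yet
        speed = math.dist(point, self.previous) / dt
        self.previous = point
        return speed


tracker = Velocity3D()
print(tracker.update((0.0, 0.0, 0.0), dt=0.5))  # first sample
print(tracker.update((3.0, 4.0, 0.0), dt=0.5))  # moved 5 units in half a second
```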
