To answer your question, I would need to know what you mean by "tracking".
To use the features of the array microphone, you need to use the SDK, which (as far as I know) means you need to use the Kinect speech model, and that means English only. Hurray Microsoft! <not>
If you are talking about tracking with the camera, then yes, you can use that and keep using the regular version of VoxCommando, which supports other languages, profiles, learning, etc.
Edit:
It might also be possible to detect the direction that sound is coming from and turn VC on and off based on that, while still using the regular stereo microphone input with regular VC, but I'm not sure it is worth the trouble.
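
For what it's worth, here is a rough sketch of what that on/off decision could look like, written in Python just to show the logic. It assumes you can get a sound source angle in degrees plus a confidence value from somewhere (I believe the Kinect SDK reports something along these lines, but I have not verified the details). `DirectionGate` and all of its parameters are made up for illustration, and actually switching VC on or off is left out because I am not assuming any particular VC interface here.

```python
from dataclasses import dataclass


@dataclass
class DirectionGate:
    """Hypothetical helper: decides whether listening should be on,
    based on the direction the sound is coming from."""

    center_deg: float = 0.0       # assumed direction of the user's seat
    half_width_deg: float = 15.0  # accepted window on each side of center
    min_confidence: float = 0.5   # ignore angle estimates below this
    listening: bool = False       # current on/off state

    def update(self, angle_deg: float, confidence: float) -> bool:
        # A low-confidence estimate leaves the current state unchanged.
        if confidence < self.min_confidence:
            return self.listening
        # Listen only when the sound falls inside the accepted window.
        offset = abs(angle_deg - self.center_deg)
        self.listening = offset <= self.half_width_deg
        return self.listening


gate = DirectionGate()
for angle, conf in [(5.0, 0.9), (40.0, 0.2), (40.0, 0.9)]:
    state = "on" if gate.update(angle, conf) else "off"
    print(f"angle={angle:+.0f} confidence={conf:.1f} -> listening {state}")
```

Ignoring low-confidence readings keeps it from flipping on and off with every stray noise, but the window and threshold values are guesses and would need tuning for a real room.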