Hey, this looks very promising. I've been up to my ears in code yesterday and today, so I have not been able to watch the entire video from end to end, but I did jump through most of it and it looks interesting.
I'm trying to figure out if there is a way to get the second "alternate" from AutoVoice. I notice that a lot of the time the recognition fails because it is using Google's recognition engine, which does not know what the correct phrases are. For example, when you say "Jarvis no" it just makes a wild guess that you said "Jarvis know", which VC would never do. This is why VoxWav and VoxCommando generally work a lot better, but since you are trying to do this remotely, I can see that it is worth trying to make it work another way.

It may be possible to send VoxCommando multiple recognition phrases and have it go through them until it finds a valid phrase. Most likely Google is returning both "Jarvis know" and "Jarvis no" as possible phrases, but guessing wrong about which one is the most likely match. I am going to take a look at AutoVoice to see if there is another variable that can be used. I notice that avcomm seems to be the "first match" or something, which would imply that there are more possible matches. Do you know if these are available to Tasker as variables?
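Just to illustrate what I mean by looping through the alternates, here is a rough sketch. The host, port, plain-text protocol, and the "no match" reply check are all placeholders I made up for the example, since I'd have to check how VC's remote interface actually responds, so treat it as the idea rather than working setup instructions:

```python
import socket

# Placeholder address/port of the PC running VoxCommando -- adjust to
# however your VC instance is actually listening (TCP, HTTP API, etc.).
VC_HOST = "192.168.1.50"
VC_PORT = 33000

def send_alternate(phrase: str) -> str:
    """Send one candidate phrase to VC and return its raw reply."""
    with socket.create_connection((VC_HOST, VC_PORT), timeout=2) as sock:
        sock.sendall((phrase + "\n").encode("utf-8"))
        return sock.recv(1024).decode("utf-8", errors="replace")

def try_alternates(alternates):
    """Walk Google's guesses in order and stop at the first one VC accepts."""
    for phrase in alternates:
        reply = send_alternate(phrase)
        # Checking for "no match" in the reply is an assumption; VC's real
        # response format may be different.
        if "no match" not in reply.lower():
            return phrase, reply
    return None, None

# Example: the two guesses Google might return for "Jarvis no"
matched, reply = try_alternates(["Jarvis know", "Jarvis no"])
print(matched, reply)
```

The point is just that if AutoVoice exposes the full list of matches as a variable, something on the phone or the VC side could try them in order instead of trusting Google's first pick.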
When I first started making VoxWav, I created an app called VoxCommandroid which used the Google recognition engine. It employed this technique of sending multiple matches to VoxCommando. I ultimately did not continue development, though, because the results were poor compared to a microphone with the VC engine, so I switched to streaming audio, which was much more difficult to code but ultimately yielded the results I was looking for.
For now your setup looks like a good choice for anyone who wants to talk to VC from a remote location, but it would be nice if we could increase the accuracy a bit.