@shanekuz,
10-4. I really didn't intend to be negative or anything and I apologize if I rambled on about things that you already know and believe.
I am familiar with Polycom from work (conference room mics). I hear what you are saying, as I ran into those exact same issues. I would say my biggest issue was background noise. It could be a TV, a washing machine, people in another room, dogs, just about anything. Unless I had everything adjusted spot on, had silence in the room, and was positioned properly relative to the microphone (in my case a Kinect or a conference mic I got), my recognition was not good enough. And rarely did I have the ideal conditions. I don't really blame anything. I understand that all sound goes in and I cannot expect it to be smart enough (at this point) to clearly pick out my voice from all of the other sounds. That's why I have found VoxWav to be ideal. I have great success with it, though my goal is still to be able to just say things and not have to get a phone out.
BTW - I have not even voice trained my Echo. I wonder how much better it would be with training. It is already blowing my mind with how well it works, even with noise. I can ask Alexa to tell me the definition of unusual words and I am surprised how often she gets it right: pretty much all the time, unless the noise in the room is particularly bad. I've even had success telling her to do things with the TV on in the same room. I do have to say it louder, but it would seem she is able to lock onto my voice and focus on it (my non-technical understanding).
In your case, or even with the Echo, I think it would be great to be able to train the device in such a way that it recognizes voices like fingerprints. Run through some kind of training routine and have it pick up on certain quantifiable values that it can use to know 'this is a voice, this is what I am supposed to hear, and as little of everything else as possible'. I've seen something similar to this in S.A.R.A.H, where you can use a Kinect and it will assemble a profile of you based on imaging, voice pitch, mood, things like that. But I'm thinking of something even more specific, using just the voice. Just something I have thought about.
Finally, even with a wake word, I would get false positives coming from the TV and not really understand how. A unique wake word should help with that. On a related note, I have watched YouTube videos on my TV about Alexa development, and when the person in the video issues a command, if it is one available to all Echo users, my Echo will pick it up and respond. The other fascinating thing is the timing: she responds to me at almost exactly the same time as she does for the person in the video.
Shanekuz, what are your plans for the future? Are you planning to try to integrate more with Alexa over time? Are you planning to add anything new to your environment? Thanks again for the responses, and please share any tips you have about Echo and Vox. One thing I am interested in knowing is whether you are using more than one instance of VoxCommando in your setup and, if so, how well routing the packets to the proper UDP listener is working.
Thanks,
tobias.