As James mentioned, it sounds as though you need to adjust the silence threshold (though it looks like he referred to it as the "volume threshold").
He links to a different video than I did -- I had forgotten about that one. What it explains is quite important if you want to use the always-on feature, particularly if you're in a noisy-ish environment. Here is the exact point where the explanation of the always-on feature begins:
&feature=youtu.be&t=2m35s
If you are in an environment with fairly constant background noise, you'll need to experiment with raising the silence threshold.
In terms of the messages you're seeing on the screen, as the video explains, when the mic is on it's always recording/sending the data until it detects silence. Only once it *detects silence* can VC do something with that data. So, if your silence threshold is too low, it will basically assume the background noise is data, keep recording, and never detect silence.
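To make that concrete, here's a rough sketch of how threshold-based silence detection generally works. This is not VC's actual code: the function name, the volume numbers, and the "three quiet frames in a row" rule are all made up for illustration.

```python
from typing import Optional, Sequence


def first_silence(
    levels: Sequence[float], threshold: float, quiet_frames: int = 3
) -> Optional[int]:
    """Return the index of the frame at which silence is detected, or None.

    Hypothetical rule: silence means `quiet_frames` consecutive readings
    that fall below `threshold`.
    """
    run = 0
    for i, level in enumerate(levels):
        if level < threshold:
            run += 1
            if run >= quiet_frames:
                return i
        else:
            run = 0  # anything at or above the threshold counts as "data"
    return None


# Made-up readings: a spoken command, then steady background noise around 12.
levels = [40, 55, 48, 12, 11, 13, 12, 11]

print(first_silence(levels, threshold=10))  # None: the noise never drops below 10
print(first_silence(levels, threshold=20))  # 5: the noise now counts as silence
```

With the threshold below the background noise level, the detector never sees silence and just keeps recording. Raising it above the noise level lets the recording end shortly after you stop talking.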
The important message on-screen is not the "sending audio" message, but the subsequent "silence detected" message--and following that, the data about min/avg/max volume thresholds (as explained in the video).
Maybe it seems to process the command when you touch the other button because that's the point at which it recognizes a significant enough change in the audio it's receiving? I don't know about that--seems weird. But the main point is: watch the video, experiment with the silence threshold setting accordingly, and let us know if that helps.
Also, maybe you know this, but if you tap on the previously recognized command displayed on-screen (whether or not the confidence was high enough), that also sends the message. Any chance that's what is happening, at least some of the time?