OK, I adjusted the levels and everything works as long as I pick up the webcam/mic and speak close to it, which kind of defeats the purpose. It does not recognize commands when the webcam is sitting on top of my monitor. Any suggestions?
Earlier you mentioned that with the lowered input levels your commands were recognized until you started playing music. Is that still the main problem?
You haven't described much about your setup, so my next point may or may not be relevant: if the microphone is hanging off your monitor and the music/audio is also coming out of speakers in the monitor (or somewhere else near the mic), that's going to be a problem. For speech recognition you always want to minimize background noise and keep your voice in the foreground.
As Kalle says, there's little one can do about that from a software perspective. The issue will be finding a suitable hardware setup. So, if there's a feasible way to keep your audio output (the speakers) as far as possible from your microphone, that's the way to go.
That said, if the main challenge is having Vox understand you when there's music playing and otherwise your room is pretty quiet, the one software-related solution you can try is "ducking" (as you probably saw in the FAQ). If you search the forum you'll find a bunch of posts on different ducking options.
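In case the idea is unfamiliar, here's a rough sketch of what ducking does: the music volume is lowered while you're speaking and restored shortly afterwards. This is only an illustration in Python, not how Vox or any particular ducking tool implements it; the function name `duck_gains` and the parameters `ducked_gain` and `hold_frames` are made up for the example, and it assumes something else has already decided, frame by frame, whether you're talking.

```python
def duck_gains(voice_active, ducked_gain=0.2, hold_frames=2):
    """Return a music gain (0.0 to 1.0) for each audio frame.

    voice_active: list of booleans, True where speech was detected
                  (hypothetical input; detection itself is not shown).
    ducked_gain:  gain applied to the music while ducked (assumed value).
    hold_frames:  frames to stay ducked after speech stops, so the music
                  doesn't jump back up between words (assumed value).
    """
    gains = []
    hold = 0
    for active in voice_active:
        if active:
            # Speech detected: duck the music and restart the hold timer.
            hold = hold_frames
            gains.append(ducked_gain)
        elif hold > 0:
            # Speech just stopped: stay ducked until the hold runs out.
            hold -= 1
            gains.append(ducked_gain)
        else:
            # No recent speech: music plays at full volume.
            gains.append(1.0)
    return gains


if __name__ == "__main__":
    frames = [False, True, False, False, False, False]
    print(duck_gains(frames))
```

Multiplying each frame of music by its gain gives the ducked output. The hold period is the part worth tuning: too short and the music pumps up and down between words, too long and it stays quiet after you've finished the command.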