"Open air" and "whole home" microphones / Re: How to interface with TcpMic?
« on: September 20, 2015, 05:16:46 AM »
Hi,
Thanks for your elaborate response. I asked my question with one big implicit assumption: that VoxCommando processes incoming sound online, i.e. that it starts processing audio as it comes in. Looking back, I could have deduced myself that it doesn't work that way, so I might as well, as you mention, copy wav files across SMB to the VoxCommando machine. My main reason for interfacing with VoxCommando directly was to reduce latency, but I'm probably getting ahead of myself and should start with something that runs, then shift my focus to making it faster.
Of course, as you say, when feeding wav files the problem becomes how to chunk the data. Judging from https://wolfpaulus.com/journal/embedded/raspberrypi2-sr/ , the speed of speech recognition with sphinx on the rPi2 isn't something to write home about. It would mean that I'd have to record all sound, buffer it and also pipe it to sphinx; then, as soon as sphinx recognizes a command start word, look up what audio came after it, find the next silence, and copy the audio between the start word and the silence to VoxCommando. At which point, to be honest, VoxCommando doesn't add much any more. I mean, at that point I might as well do all processing on the rPi and call the home automation server directly. Well, maybe for 'free form' recognition VoxCommando would be better (like when I'd say 'Jarvis google how many inhabitants in Ecuador' or whatever) - for me most initial applications are just keyword based though.
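To make the chunking step a bit more concrete, here is a rough sketch in Python of the "find the next silence after the start word" part. It is only an illustration under assumptions: the audio is taken to be a plain list of 16-bit PCM sample values, `trigger_end` is the sample index where the start word ended (which sphinx or whatever detector is used would have to supply), and the frame length, energy threshold and number of quiet frames are made-up values that would need tuning on real recordings.

```python
import math


def frame_rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def find_next_silence(samples, start, frame_len=160, threshold=500.0,
                      min_silent_frames=3):
    """Return the sample index where the first run of quiet frames begins.

    Scans whole frames from `start` onwards. A frame counts as quiet when
    its RMS is below `threshold` (an assumed value, not a measured one).
    Returns len(samples) if no long enough quiet run is found.
    """
    run_start = None
    run_len = 0
    for pos in range(start, len(samples) - frame_len + 1, frame_len):
        if frame_rms(samples[pos:pos + frame_len]) < threshold:
            if run_len == 0:
                run_start = pos
            run_len += 1
            if run_len >= min_silent_frames:
                return run_start
        else:
            # A loud frame interrupts the quiet run, so start counting again.
            run_len = 0
    return len(samples)


def extract_command(samples, trigger_end, **kwargs):
    """Return the audio between the end of the start word and the next silence."""
    end = find_next_silence(samples, trigger_end, **kwargs)
    return samples[trigger_end:end]


# Synthetic example: start word, spoken command, pause, then unrelated noise.
audio = [1000] * 320 + [2000] * 480 + [0] * 480 + [1500] * 160
command = extract_command(audio, trigger_end=320)
```

The returned chunk is what would then be written to a wav file and copied to the VoxCommando machine. A shorter `min_silent_frames` cuts latency but risks splitting a command at a natural pause between words.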
Thinking about it though, maybe some of the delay in the video above comes from waiting for a long enough period of silence; and maybe the time it takes to process the sound is linear in the length of the input audio, in which case it could be made much faster by chunking more aggressively and having to recognize only one trigger word. The heavy lifting could then be offloaded to the beefier hardware that VoxCommando runs on, where it can use faster speech recognition software.
Anyway, just thinking out loud here; it's clear that I have a lot of experimentation to do. I have a bunch of work before I'd get to writing software that would directly interface with TcpMic, so no need for you to invest your time in that now - I'll get back to you when I get to that point. Or maybe someone else will beat me to it and I can just use that.
Thanks for your time so far.
cheers,
roel