Messages - roel_v

1
"Open air" and "whole home" microphones / Re: How to interface with TcpMic?
« on: September 20, 2015, 05:16:46 AM »
Hi,

Thanks for your elaborate response. I asked my question with one big implicit assumption: that VoxCommando processes incoming sound online, i.e. that it starts processing audio as it comes in. Looking back, I could have deduced myself that it doesn't work that way, so I might as well, as you mention, copy WAV files across SMB to the VoxCommando machine. My main reason for interfacing with VoxCommando directly was to reduce latency, but I'm probably getting ahead of myself and should start with something that runs, then shift my focus to making it faster :)

Of course, as you say, when feeding WAV files the problem becomes how to chunk the data. Judging from https://wolfpaulus.com/journal/embedded/raspberrypi2-sr/ , the speed of speech recognition with Sphinx on the rPi2 isn't anything to write home about. It would mean that I'd have to record all sound, buffer it and also pipe it to Sphinx; then, as soon as Sphinx recognizes a command start word, look up what audio came after it, find the next silence, and copy the audio between the start word and the silence to VoxCommando. At which point, to be honest, VoxCommando doesn't add much any more :) I mean, at that point I might as well do all processing on the rPi and call the home automation server directly. Well, maybe for 'free form' recognition VoxCommando would be better (like when I'd say 'Jarvis google how many inhabitants in Ecuador' or whatever) - for me most initial applications are just keyword based though.
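The "find the next silence, then copy the audio" step could look roughly like this. It is only a sketch under assumptions: 16 kHz mono 16-bit PCM, a plain RMS energy threshold as the silence detector, and a start offset that some keyword spotter has already supplied. All constants and function names are made up for illustration and would need tuning per microphone.

```python
import array
import math

SAMPLE_RATE = 16000      # assumed: 16 kHz mono capture
FRAME_MS = 20            # assumed analysis frame length
SILENCE_RMS = 500        # assumed threshold on 16-bit samples; tune per mic
MIN_SILENCE_FRAMES = 15  # assumed: 300 ms of quiet ends a command


def frame_rms(samples):
    """Root-mean-square level of one frame of signed 16-bit samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def cut_command(pcm, start_sample):
    """Return the audio from start_sample up to the next long-enough silence.

    pcm is an array('h') of mono samples. start_sample is where the trigger
    word ended (it would come from the keyword spotter). If no silence is
    found, everything after start_sample is returned.
    """
    frame_len = SAMPLE_RATE * FRAME_MS // 1000
    quiet = 0
    pos = start_sample
    while pos + frame_len <= len(pcm):
        if frame_rms(pcm[pos:pos + frame_len]) < SILENCE_RMS:
            quiet += 1
            if quiet >= MIN_SILENCE_FRAMES:
                # Cut at the start of the silent run, not at its end.
                end = pos + frame_len - quiet * frame_len
                return pcm[start_sample:end]
        else:
            quiet = 0
        pos += frame_len
    return pcm[start_sample:]
```

The returned slice is what would be written out as a WAV file and copied to the VoxCommando machine.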

Thinking about it though, maybe some of the delay in the video above comes from waiting for a long enough period of silence, and maybe the time it takes to process the sound is linear in the size of the input. In that case it could be made much faster by chunking more aggressively and recognizing only a single trigger word on the rPi. The heavy lifting could then be offloaded to the beefier hardware that VoxCommando runs on, where it can use faster speech recognition software.

Anyway, just thinking out loud here; it's clear that I have a lot of experimentation to do. I have a bunch of work ahead of me before I get to writing software that interfaces directly with TcpMic, so no need for you to invest your time in that now - I'll get back to you when I get to that point. Or maybe someone else will beat me to it and I can just use that ;)

Thanks for your time so far.

cheers,

roel

2
"Open air" and "whole home" microphones / How to interface with TcpMic?
« on: September 19, 2015, 05:58:10 PM »
Hi,

I'm experimenting with VoxCommando to drive my home automation system. Getting commands right is the 'easy' part (for some values of 'easy'); my main problem is getting coverage throughout the house. I have a server in the basement but no way to directly hook up mics to that machine, so in my 'production' build (I'm just testing right now) I'd need 'satellite' mics in each room and a way to transmit audio to VoxCommando. What I'm thinking right now is to have my USB mic (MXL AC404 USB) hooked up to a Raspberry Pi, have the Raspberry do the preprocessing (equalizing, filtering) and then pipe the resulting raw audio to my main server with VoxCommando. I want to write my own (C++) daemon on the Raspberry that will read from the sound card, maybe do the preprocessing through ALSA filters or, if worst comes to worst, my own signal processing code, and send the raw sound data to the VoxCommando server.
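To make the idea concrete, here is a rough sketch of the forwarding loop in Python rather than C++, just to keep it short. It only shows the generic "read chunks, push them down a TCP socket" part. Whether TcpMic accepts a bare PCM stream like this is exactly what I don't know, so treat the wire format as an assumption. `read_chunk` is a hypothetical callable wrapping the capture device, and the chunk size is a guess.

```python
import socket

CHUNK_BYTES = 3200  # assumed: 100 ms of 16 kHz, 16-bit mono audio


def stream_pcm(read_chunk, sock):
    """Forward raw PCM chunks from read_chunk() to sock until the source ends.

    read_chunk(n) is a hypothetical callable wrapping the capture device; it
    returns up to n bytes, or b"" when capture stops. Returns the number of
    bytes sent.
    """
    sent = 0
    while True:
        chunk = read_chunk(CHUNK_BYTES)
        if not chunk:
            break
        sock.sendall(chunk)  # blocks until the whole chunk is handed to the OS
        sent += len(chunk)
    return sent
```

In the real daemon, `sock` would come from something like `socket.create_connection((host, port))`, with the host and port being whatever the receiving side listens on.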

Now I have the following questions:

- The TcpMic input format doesn't seem to be documented anywhere; in fact all documentation just points to a wiki page that doesn't exist any more. Do I just open a socket and send 16-bit PCM data there? Will that also work for an 'always on' situation, i.e. do I just send a never-ending stream of audio data?

- How do I handle multiple 'senders'? Do I need to start multiple VoxCommando instances (haven't tested if that's even possible) or can the TcpMic plugin listen on multiple ports (doesn't look so from the UI)? Would you be willing to modify it so that it can, or give me the source of the TcpMic plugin so that I can build my own custom version?

- Once I get multiple senders working (presuming that's possible), how does that work out in VoxCommando? My goal is of course location awareness - when I say 'lights on' in the office, I want it to turn on the office lights, and likewise in the living room. Let's assume that no mic can pick up sounds from 'other' rooms. Is there a way to pass the 'source id' along with the voice command so that it can decide which actions to perform? My goal in the end is just to do a single GET request to my (home-built) home automation system.
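For the last point, this is roughly the mapping I have in mind, sketched in Python. Everything here is hypothetical: the source ids, the room names, and the endpoint URL and its `room`/`action` parameters belong to my own setup and are not anything VoxCommando provides.

```python
from urllib.parse import urlencode

# Hypothetical endpoint of the home-built automation server.
BASE_URL = "http://automation.local/api/command"

# Hypothetical mapping from a sender's source id to the room it sits in.
SOURCE_ROOMS = {
    "pi-office": "office",
    "pi-living": "living room",
}


def command_url(source_id, action):
    """Build the single GET URL for a command heard by a given sender."""
    room = SOURCE_ROOMS[source_id]
    return BASE_URL + "?" + urlencode({"room": room, "action": action})
```

So 'lights on' heard by `pi-office` would become a request with `room=office`, and the same phrase heard by `pi-living` would target the living room instead.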

Thanks.

cheers,

roel
