Hi all,
I looked through the posts and I don't see much interest in using an Echo as a mic for VC. Despite that, I would still like to run this by you. If I could successfully use VC with the Echo, it has the potential to be very cool. The Kinect drives me crazy, and VoxWav is by far the best way to communicate, but it requires using the phone. Ideally I'd like to just speak and not get the phone out.
To keep this as short as possible, I'm not going to go into detail on the Alexa Skills Kit (what Amazon set up to allow developers to extend what the Echo can do).
Consider this an experiment, a 'will it work?' kind of thing. Skipping over how the Echo works for custom development, I will say this: there is code on GitHub where a person has bypassed the whole Amazon way and is just having the Echo send back what was heard as text to an HTTP endpoint on his home network. He then parses the text, looking for keywords to figure out the intent, and loads a module designed to handle the speech for that particular intent. My thoughts right now are on using this with Emby. So here is what I am thinking.
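To make the keyword idea concrete, here is a minimal sketch of how the endpoint might pick a handler from the text it receives. This is not the GitHub project's actual code; the handler names and keyword table are made up for illustration.

```python
from typing import Callable, Dict, Optional

# Hypothetical handlers: each one would deal with the speech for one intent.
def handle_open(text: str) -> str:
    return "open"

def handle_launch(text: str) -> str:
    return "launch"

# Assumed keyword table; a real setup would likely load modules from disk.
INTENT_KEYWORDS: Dict[str, Callable[[str], str]] = {
    "open": handle_open,
    "launch": handle_launch,
}

def find_intent(text: str) -> Optional[Callable[[str], str]]:
    """Return the handler for the first known keyword found in the text."""
    words = text.lower().split()
    for keyword, handler in INTENT_KEYWORDS.items():
        if keyword in words:
            return handler
    return None  # nothing recognized; the caller decides what to do
```

Calling find_intent("Launch Media Browser Theater") would hand back handle_launch, and anything without a known keyword comes back as None.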
I say 'Alexa, ask Emby to open'. That is sent to Amazon AWS, where a Lambda function I create does what I want, and the text is sent back to my listening web server. Alexa can do conversation, so she responds, 'ok, which room?'. I say 'game room' or 'bedroom' (I have multiple instances of Emby and VC). This gives the code on my endpoint the info it needs to know which address and port to use when sending commands to the VC UDP listener. Next, I say 'Alexa, ask Emby to launch Media Browser Theater'. The text comes back to my endpoint, which now knows what is going on and where to send it, evaluates the text to clean it up if needed, and sends it to the VC listener. Hopefully, VC then opens Emby Theater. Without grinding through the details, that is the basic idea: triggering a particular intent, coding on my end (the HTTP listener box) to know how to deal with the text coming back, and then sending the proper text to the VC listener, which will then do what it does.
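As a rough sketch of the room step, the endpoint could keep a small table that maps the spoken room name to an address and port, then send the text as a UDP datagram. The IP addresses, the port number, and the idea that VC accepts the raw text as the payload are all assumptions here; the real values and message format would have to come from the VC UDP settings.

```python
import socket
from typing import Dict, Optional, Tuple

# Assumed addresses and port, for illustration only.
ROOMS: Dict[str, Tuple[str, int]] = {
    "game room": ("192.168.1.50", 33000),
    "bedroom": ("192.168.1.51", 33000),
}

def resolve_room(spoken: str) -> Optional[Tuple[str, int]]:
    """Map a spoken room name to (address, port), ignoring case and extra spaces."""
    return ROOMS.get(" ".join(spoken.lower().split()))

def send_to_vc(target: Tuple[str, int], text: str) -> None:
    """Send the text as one UTF-8 UDP datagram (payload format is an assumption)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(text.encode("utf-8"), target)
```

Once the room is known, the endpoint would remember the target and call something like send_to_vc(target, "launch media browser theater") for each later command.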
Technically this should be possible, as far as I know, at least speaking from the Amazon side of things. Everything I described can be done. I do not know yet how well the recognition of dictated speech will work out. I also realize that the code I need to develop may be a pain, but in the examples I've reviewed, others are doing this with (according to them) good success. I know that the text going into VC has to be exactly correct to work.
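Since the text has to match exactly, one possible way to clean it up is to snap whatever comes back to the closest phrase in a list of known commands. This is just a sketch using difflib from the Python standard library; the phrase list and the 0.8 cutoff are guesses that would need tuning against real Echo output.

```python
import difflib
from typing import Optional

# Assumed list of phrases that VC is set up to recognize.
KNOWN_PHRASES = ["launch media browser theater", "pause", "play"]

def normalize(heard: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the closest known phrase, or None if nothing is similar enough."""
    cleaned = " ".join(heard.lower().split())
    matches = difflib.get_close_matches(cleaned, KNOWN_PHRASES, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this, a slightly off transcription such as "Launch Media Browser Theatre" would be corrected to the exact phrase, while unrelated speech returns None so nothing wrong gets sent to VC.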
You may say, instead of doing that with an Echo, why don't you do this or that? I am open to ideas. I'm also open to 'did you think about this?' or 'it probably won't work because...'. But in addition to my Emby goal, I am helping a person in another state who has all kinds of home automation devices and wants me to do many things for him using these devices. I'm doing it for fun because it is a great learning opportunity for me, and with all that he has, there is plenty of room for experimentation.
He has asked me to allow him to control any of 6 Sonos systems with the Amazon Echo. I've already seen where people have done this. But I'm hoping to create a web service for the home that doesn't just do one thing based on something someone else did, but looks at the big picture and provides an extensible foundation where the Echo can be used in home automation (beyond the built-in support). Actually, I'd be extremely happy just to be able to send viable speech commands as text to VoxCommando.
I'd like to leverage VC again for his Sonos request. So this post is sort of being driven by trying to solve his request, but at the same time I'd love to work with Emby for myself. And I know that in both cases there is the issue of speaking to a device while TV or music is playing, and how hard that can or will be. If you tell the Echo to play a Pandora station and then you want to say something else to it, it will hear you. But I imagine that will be tougher once the sound is coming from somewhere other than the Echo itself. Not to ramble, but she has impressed me with her ability to hear with the TV on, or the dogs all barking, or other noise. Those are things I could never manage with the Kinect. So the Echo seems to be good at picking out my voice in all of the noise.
Thank you all and again, thank you for VoxCommando. It is a truly awesome program.
tobias.